Re: [PATCH v4] powerpc/powernv: hwmon driver for power, fan rpm, voltage and temperature
On 07/04/2014 04:02 AM, Neelesh Gupta wrote: This patch adds basic kernel support for reading power values, fan speed rpm, voltage and temperature data on powernv platforms which will be exported to user space through sysfs interface. Hi Neelesh, Copying devicetree mailing list. Please copy it on the next (hopefully final) revision; devicetree maintainers at least need a chance to comment. [ Yes, I understand this is a shipping product, but still ... ] I am ok with the code except for a couple of minor nitpicks (see below). Thanks, Guenter Test results: - [root@tul163p1 ~]# sensors ibmpowernv-isa- Adapter: ISA adapter fan1:5567 RPM (min =0 RPM) fan2:5232 RPM (min =0 RPM) fan3:5532 RPM (min =0 RPM) fan4:4945 RPM (min =0 RPM) fan5: 0 RPM (min =0 RPM) fan6: 0 RPM (min =0 RPM) fan7:7392 RPM (min =0 RPM) fan8:7936 RPM (min =0 RPM) temp1:+39.0°C (high = +0.0°C) power1: 191.00 W [root@tul163p1 ~]# ls /sys/devices/platform/ alarmtimer ibmpowernv.0 power rtc-generic serial8250 uevent [root@tul163p1 ~]# ls /sys/devices/platform/ibmpowernv.0/hwmon/hwmon0/ device fan2_minfan4_minfan6_minfan8_min power fan1_fault fan3_fault fan5_fault fan7_fault in1_fault power1_input fan1_input fan3_input fan5_input fan7_input in2_fault subsystem fan1_minfan3_minfan5_minfan7_minin3_fault temp1_input fan2_fault fan4_fault fan6_fault fan8_fault in4_fault temp1_max fan2_input fan4_input fan6_input fan8_input name uevent [root@tul163p1 ~]# [root@tul163p1 ~]# ls /sys/class/hwmon/hwmon0/ device fan2_minfan4_minfan6_minfan8_min power fan1_fault fan3_fault fan5_fault fan7_fault in1_fault power1_input fan1_input fan3_input fan5_input fan7_input in2_fault subsystem fan1_minfan3_minfan5_minfan7_minin3_fault temp1_input fan2_fault fan4_fault fan6_fault fan8_fault in4_fault temp1_max fan2_input fan4_input fan6_input fan8_input name uevent [root@tul163p1 ~]# Signed-off-by: Neelesh Gupta --- Changes in v4 = - Replaced pr_err() with dev_err() for loggin print messages. - Using kstrtou32() function for converting string to u32 instead of sscanf(). Changes in v3 = - Fixed an endianness bug leading the driver to break on LE. - Fixed a bug that when one of the 'attribute_group' not populated, following groups attributes were dropped. - Rewrite the get_sensor_index_attr() function to handle all the error scenarios like 'sscanf' etc. - Fixed all the errors/warnings related to coding style/whitespace. - Added 'Documentation' files. - Addressed remaining review comments on V2. Changes in v2 = - Generic use of devm_* functions in hwmon like using devm_kzalloc() for dynamic memory request, avoiding the need to explicit free of memory. Adding 'struct attribute_group' as member of platform data structure to be populated and then passed to devm_hwmon_device_register_with_groups(). Note: Having an array of pointers of 'attribute_group' and each group corresponds to 'enum sensors' type. Not completely sure, if it's ideal or could have just one group populated with attributes of sensor types? - 'ibmpowernv' is not hot-pluggable device so moving 'platform_driver' callback function (probe) as part of __init code. - Fixed issues related to coding style. - Other general comments in v1. .../devicetree/bindings/hwmon/ibmpowernv.txt | 27 + Documentation/hwmon/ibmpowernv | 41 ++ drivers/hwmon/Kconfig | 11 + drivers/hwmon/Makefile |1 drivers/hwmon/ibmpowernv.c | 362 5 files changed, 442 insertions(+) create mode 100644 Documentation/devicetree/bindings/hwmon/ibmpowernv.txt create mode 100644 Documentation/hwmon/ibmpowernv create mode 100644 drivers/hwmon/ibmpowernv.c diff --git a/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt new file mode 100644 index 000..e3bd1eb --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt @@ -0,0 +1,27 @@ +IBM POWERNV platform sensors + + +Required node properties: +- compatible: must be one of + "ibm,opal-sensor-cooling-fan" + "ibm,opal-sensor-amb-temp" + "ibm,opal-sensor-power-supply" + "ibm,opal-sensor-power" +- sensor-id: an opaque id provided by the firmware to the kernel, identifies a +given sensor and its attribute data + +Example sensors node: + +cooling-fan#8-data { + sensor-id = <0x7052107>; + phandle = <0x1028>; + linux,phandle = <0x1028>; + compatible = "ibm,opal-sensor-cooling-fan"; phandle and linux-phandle are neither
Re: [PATCH v2] powerpc/kvm: support to handle sw breakpoint
On Thursday 03 July 2014 05:21 PM, Alexander Graf wrote: > > On 01.07.14 10:41, Madhavan Srinivasan wrote: >> This patch adds kernel side support for software breakpoint. >> Design is that, by using an illegal instruction, we trap to hypervisor >> via Emulation Assistance interrupt, where we check for the illegal >> instruction >> and accordingly we return to Host or Guest. Patch also adds support for >> software breakpoint in PR KVM. >> >> Patch mandates use of "abs" instruction as sw breakpoint instruction >> (primary opcode 31 and extended opcode 360). Based on PowerISA v2.01, >> ABS instruction has been dropped from the architecture and treated an >> illegal instruction. >> >> Changes v1->v2: >> >> Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM >> can also share it. >> Added code to use KVM get one reg infrastructure to get debug opcode. >> Updated emulate.c to include emulation of debug instruction incase >> of PR_KVM. >> Made changes to commit message. >> >> Signed-off-by: Madhavan Srinivasan >> --- >> arch/powerpc/include/asm/kvm_book3s.h |8 >> arch/powerpc/include/asm/ppc-opcode.h |5 + >> arch/powerpc/kvm/book3s.c |3 ++- >> arch/powerpc/kvm/book3s_hv.c |9 + >> arch/powerpc/kvm/book3s_pr.c |3 +++ >> arch/powerpc/kvm/emulate.c| 10 ++ >> 6 files changed, 37 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/include/asm/kvm_book3s.h >> b/arch/powerpc/include/asm/kvm_book3s.h >> index f52f656..180d549 100644 >> --- a/arch/powerpc/include/asm/kvm_book3s.h >> +++ b/arch/powerpc/include/asm/kvm_book3s.h >> @@ -24,6 +24,14 @@ >> #include >> #include >> +/* >> + * KVMPPC_INST_BOOK3S_DEBUG is debug Instruction for supporting >> Software Breakpoint. >> + * Instruction mnemonic is ABS, primary opcode is 31 and extended >> opcode is 360. >> + * Based on PowerISA v2.01, ABS instruction has been dropped from the >> architecture >> + * and treated an illegal instruction. >> + */ >> +#define KVMPPC_INST_BOOK3S_DEBUG0x7c0002d0 > > This will still break with LE guests. > I am told to try with all 0s opcode. So rewriting the patch. >> + >> struct kvmppc_bat { >> u64 raw; >> u32 bepi; >> diff --git a/arch/powerpc/include/asm/ppc-opcode.h >> b/arch/powerpc/include/asm/ppc-opcode.h >> index 3132bb9..3fbb4c1 100644 >> --- a/arch/powerpc/include/asm/ppc-opcode.h >> +++ b/arch/powerpc/include/asm/ppc-opcode.h >> @@ -111,6 +111,11 @@ >> #define OP_31_XOP_LHBRX 790 >> #define OP_31_XOP_STHBRX918 >> +/* KVMPPC_INST_BOOK3S_DEBUG -- Software breakpoint Instruction >> + * Instruction mnemonic is ABS, primary opcode is 31 and extended >> opcode is 360. >> + */ >> +#define OP_31_XOP_ABS360 >> + >> #define OP_LWZ 32 >> #define OP_LD 58 >> #define OP_LWZU 33 >> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c >> index c254c27..b40fe5d 100644 >> --- a/arch/powerpc/kvm/book3s.c >> +++ b/arch/powerpc/kvm/book3s.c >> @@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu >> *vcpu, >> int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, >> struct kvm_guest_debug *dbg) >> { >> -return -EINVAL; >> +vcpu->guest_debug = dbg->control; >> +return 0; >> } >> void kvmppc_decrementer_func(unsigned long data) >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 7a12edb..402c1ec 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -725,8 +725,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run >> *run, struct kvm_vcpu *vcpu, >>* we don't emulate any guest instructions at this stage. >>*/ >> case BOOK3S_INTERRUPT_H_EMUL_ASSIST: >> +if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG ) { >> +run->exit_reason = KVM_EXIT_DEBUG; >> +run->debug.arch.address = kvmppc_get_pc(vcpu); >> +r = RESUME_HOST; > > Phew - why can't we just go into the normal instruction emulator for > EMUL_ASSIST? > IIUC, using the emulation_assist_interrupt function (kernel/trap.c) ? Thanks for review Regards Maddy > > Alex > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64
ping ... Guenter On 06/30/2014 11:45 AM, Guenter Roeck wrote: powerpc:allmodconfig has been failing for some time with the following error. arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1 A number of attempts to fix the problem by moving around code have been unsuccessful and resulted in failed builds for some configurations and the discovery of toolchain bugs. Fix the problem by disabling RELOCATABLE for COMPILE_TEST builds instead. While this is less than perfect, it avoids substantial code changes which would otherwise be necessary just to make COMPILE_TEST builds happy and might have undesired side effects. Signed-off-by: Guenter Roeck --- arch/powerpc/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index bd6dd6e..fefe7c8 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -414,7 +414,7 @@ config KEXEC config CRASH_DUMP bool "Build a kdump crash kernel" depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP) - select RELOCATABLE if PPC64 || 44x || FSL_BOOKE + select RELOCATABLE if (PPC64 && !COMPILE_TEST) || 44x || FSL_BOOKE help Build a kernel suitable for use as a kdump capture kernel. The same kernel binary can be used as production kernel and dump @@ -1017,6 +1017,7 @@ endmenu if PPC64 config RELOCATABLE bool "Build a relocatable kernel" + depends on !COMPILE_TEST select NONSTATIC_KERNEL help This builds a kernel image that is capable of running anywhere ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 1/3] dmaengine: mpc512x: add device tree binding document
Hi Alexander, Apologies for the late reply. DT-related email is somewhat a firehose and unfortunately I lose track of things. On Thu, Jun 19, 2014 at 02:59:19PM +0100, Alexander Popov wrote: > 2014-06-18 18:56 GMT+04:00 Alexander Popov : > > 2014-06-18 17:37 GMT+04:00 Mark Rutland : > >> On Wed, Jun 18, 2014 at 11:48:10AM +0100, Alexander Popov wrote: > >>> Introduce a device tree binding document for the MPC512x DMA controller > >>> +Optional properties: > >>> +- #dma-cells: the length of the DMA specifier, must be <1>. > >>> + Each channel of this DMA controller has a peripheral request line, > >>> + the assignment is fixed in hardware. This one cell > >>> + in dmas property of a client device represents the channel number. > >> > >> Surely this is required to be able to refer to DMA channels on the > >> device? > > > > Excuse me, I didn't understand your question. > > Do you inquire about the reason of making #dma-cells an optional property? > > It's optional because device tree based lookup support is made > > optional (part 3/3). > > Mark, did I answer your question? > Should I fix anything in this patch series? I would move it under required properties even if we happen to not use it in certain edge cases. Moving forwards everything should be DT-driven, so it'll be necessary. Thanks, Mark. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 2/3] iommu/fsl: Fix the device domain attach condition.
> -Original Message- > From: Joerg Roedel [mailto:j...@8bytes.org] > Sent: Friday, July 04, 2014 4:25 PM > To: Sethi Varun-B16395 > Cc: io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > linuxppc-dev@lists.ozlabs.org; alex.william...@redhat.com > Subject: Re: [PATCH 2/3] iommu/fsl: Fix the device domain attach > condition. > > Hmm, > > On Tue, Jun 24, 2014 at 07:27:16PM +0530, Varun Sethi wrote: > > - old_domain_info = find_domain(dev); > > + old_domain_info = dev->archdata.iommu_domain; > > if (old_domain_info && old_domain_info->domain != dma_domain) { > > spin_unlock_irqrestore(&device_domain_lock, flags); > > detach_device(dev, old_domain_info->domain); > > Wouldn't this set dev->archdata.iommu_domain to NULL anyway, so that ... > Not for the case where device has multiple LIODNs. > > @@ -399,7 +394,7 @@ static void attach_device(struct fsl_dma_domain > *dma_domain, int liodn, struct d > > * the info for the first LIODN as all > > * LIODNs share the same domain > > */ > > - if (!old_domain_info) > > + if (!dev->archdata.iommu_domain) > > dev->archdata.iommu_domain = info; > > We already know that it _must_ be NULL here? > That won't be true for devices having multiple LIODNs > > spin_unlock_irqrestore(&device_domain_lock, flags); > > This would shrink down the patch to: > > diff --git a/drivers/iommu/fsl_pamu_domain.c > b/drivers/iommu/fsl_pamu_domain.c index 93072ba..d21b554 100644 > --- a/drivers/iommu/fsl_pamu_domain.c > +++ b/drivers/iommu/fsl_pamu_domain.c > @@ -399,8 +399,7 @@ static void attach_device(struct fsl_dma_domain > *dma_domain, int liodn, struct d >* the info for the first LIODN as all >* LIODNs share the same domain >*/ > - if (!old_domain_info) > - dev->archdata.iommu_domain = info; > + dev->archdata.iommu_domain = info; For devices having multiple LIODNs, we don't want to overwrite the info. -Varun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 1/3] iommu/fsl: Fix PAMU window size check.
> -Original Message- > From: Joerg Roedel [mailto:j...@8bytes.org] > Sent: Friday, July 04, 2014 4:15 PM > To: Sethi Varun-B16395 > Cc: io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > linuxppc-dev@lists.ozlabs.org; alex.william...@redhat.com > Subject: Re: [PATCH 1/3] iommu/fsl: Fix PAMU window size check. > > On Tue, Jun 24, 2014 at 07:27:15PM +0530, Varun Sethi wrote: > > /* window size is 2^(WSE+1) bytes */ > > - return __ffs(addrspace_size) - 1; > > + return fls64(addrspace_size) - 2; > > This looks bogus, why do you replace ffs (find-first-bit) by fls (find- > last-bit)? > Address space size is always a power of 2. This change was required to handle address sizes > 32bit width on 32 bit architectures. -Varun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4] powerpc/powernv: hwmon driver for power, fan rpm, voltage and temperature
This patch adds basic kernel support for reading power values, fan speed rpm, voltage and temperature data on powernv platforms which will be exported to user space through sysfs interface. Test results: - [root@tul163p1 ~]# sensors ibmpowernv-isa- Adapter: ISA adapter fan1:5567 RPM (min =0 RPM) fan2:5232 RPM (min =0 RPM) fan3:5532 RPM (min =0 RPM) fan4:4945 RPM (min =0 RPM) fan5: 0 RPM (min =0 RPM) fan6: 0 RPM (min =0 RPM) fan7:7392 RPM (min =0 RPM) fan8:7936 RPM (min =0 RPM) temp1:+39.0°C (high = +0.0°C) power1: 191.00 W [root@tul163p1 ~]# ls /sys/devices/platform/ alarmtimer ibmpowernv.0 power rtc-generic serial8250 uevent [root@tul163p1 ~]# ls /sys/devices/platform/ibmpowernv.0/hwmon/hwmon0/ device fan2_minfan4_minfan6_minfan8_min power fan1_fault fan3_fault fan5_fault fan7_fault in1_fault power1_input fan1_input fan3_input fan5_input fan7_input in2_fault subsystem fan1_minfan3_minfan5_minfan7_minin3_fault temp1_input fan2_fault fan4_fault fan6_fault fan8_fault in4_fault temp1_max fan2_input fan4_input fan6_input fan8_input name uevent [root@tul163p1 ~]# [root@tul163p1 ~]# ls /sys/class/hwmon/hwmon0/ device fan2_minfan4_minfan6_minfan8_min power fan1_fault fan3_fault fan5_fault fan7_fault in1_fault power1_input fan1_input fan3_input fan5_input fan7_input in2_fault subsystem fan1_minfan3_minfan5_minfan7_minin3_fault temp1_input fan2_fault fan4_fault fan6_fault fan8_fault in4_fault temp1_max fan2_input fan4_input fan6_input fan8_input name uevent [root@tul163p1 ~]# Signed-off-by: Neelesh Gupta --- Changes in v4 = - Replaced pr_err() with dev_err() for loggin print messages. - Using kstrtou32() function for converting string to u32 instead of sscanf(). Changes in v3 = - Fixed an endianness bug leading the driver to break on LE. - Fixed a bug that when one of the 'attribute_group' not populated, following groups attributes were dropped. - Rewrite the get_sensor_index_attr() function to handle all the error scenarios like 'sscanf' etc. - Fixed all the errors/warnings related to coding style/whitespace. - Added 'Documentation' files. - Addressed remaining review comments on V2. Changes in v2 = - Generic use of devm_* functions in hwmon like using devm_kzalloc() for dynamic memory request, avoiding the need to explicit free of memory. Adding 'struct attribute_group' as member of platform data structure to be populated and then passed to devm_hwmon_device_register_with_groups(). Note: Having an array of pointers of 'attribute_group' and each group corresponds to 'enum sensors' type. Not completely sure, if it's ideal or could have just one group populated with attributes of sensor types? - 'ibmpowernv' is not hot-pluggable device so moving 'platform_driver' callback function (probe) as part of __init code. - Fixed issues related to coding style. - Other general comments in v1. .../devicetree/bindings/hwmon/ibmpowernv.txt | 27 + Documentation/hwmon/ibmpowernv | 41 ++ drivers/hwmon/Kconfig | 11 + drivers/hwmon/Makefile |1 drivers/hwmon/ibmpowernv.c | 362 5 files changed, 442 insertions(+) create mode 100644 Documentation/devicetree/bindings/hwmon/ibmpowernv.txt create mode 100644 Documentation/hwmon/ibmpowernv create mode 100644 drivers/hwmon/ibmpowernv.c diff --git a/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt new file mode 100644 index 000..e3bd1eb --- /dev/null +++ b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt @@ -0,0 +1,27 @@ +IBM POWERNV platform sensors + + +Required node properties: +- compatible: must be one of + "ibm,opal-sensor-cooling-fan" + "ibm,opal-sensor-amb-temp" + "ibm,opal-sensor-power-supply" + "ibm,opal-sensor-power" +- sensor-id: an opaque id provided by the firmware to the kernel, identifies a +given sensor and its attribute data + +Example sensors node: + +cooling-fan#8-data { + sensor-id = <0x7052107>; + phandle = <0x1028>; + linux,phandle = <0x1028>; + compatible = "ibm,opal-sensor-cooling-fan"; +}; + +amb-temp#1-thrs { + sensor-id = <0x5096000>; + phandle = <0x1017>; + linux,phandle = <0x1017>; + compatible = "ibm,opal-sensor-amb-temp"; +}; diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv new file mode 100644 index 000..644245a --- /dev/null +++ b/Documentation/hwmon/ibmpowernv @@ -0,0 +1,41 @@ +Kernel Driver IBMPOWENV +===
Re: [PATCH 2/3] iommu/fsl: Fix the device domain attach condition.
Hmm, On Tue, Jun 24, 2014 at 07:27:16PM +0530, Varun Sethi wrote: > - old_domain_info = find_domain(dev); > + old_domain_info = dev->archdata.iommu_domain; > if (old_domain_info && old_domain_info->domain != dma_domain) { > spin_unlock_irqrestore(&device_domain_lock, flags); > detach_device(dev, old_domain_info->domain); Wouldn't this set dev->archdata.iommu_domain to NULL anyway, so that ... > @@ -399,7 +394,7 @@ static void attach_device(struct fsl_dma_domain > *dma_domain, int liodn, struct d >* the info for the first LIODN as all >* LIODNs share the same domain >*/ > - if (!old_domain_info) > + if (!dev->archdata.iommu_domain) > dev->archdata.iommu_domain = info; We already know that it _must_ be NULL here? > spin_unlock_irqrestore(&device_domain_lock, flags); This would shrink down the patch to: diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index 93072ba..d21b554 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu/fsl_pamu_domain.c @@ -399,8 +399,7 @@ static void attach_device(struct fsl_dma_domain *dma_domain, int liodn, struct d * the info for the first LIODN as all * LIODNs share the same domain */ - if (!old_domain_info) - dev->archdata.iommu_domain = info; + dev->archdata.iommu_domain = info; spin_unlock_irqrestore(&device_domain_lock, flags); } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] iommu/fsl: Fix PAMU window size check.
On Tue, Jun 24, 2014 at 07:27:15PM +0530, Varun Sethi wrote: > /* window size is 2^(WSE+1) bytes */ > - return __ffs(addrspace_size) - 1; > + return fls64(addrspace_size) - 2; This looks bogus, why do you replace ffs (find-first-bit) by fls (find-last-bit)? Joerg ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
On Fri, Jul 04, 2014 at 09:11:50AM +, David Laight wrote: > > I might be missing something, but we are talking of MSI address space > > here, aren't we? I am not getting how we could end up with a 'write' > > to a random kernel location when a unclaimed MSI vector sent. We could > > only expect a spurious interrupt at worst, which is handled and reported. > > > > Anyway, as I described in my reply to Bjorn, this is not a concern IMO. > > I'm thinking of the following - which might be MSI-X ? > 1) Hardware requests some interrupts and tells the host the BAR (and offset) >where the 'vectors' should be written. > 2) To raise an interrupt the hardware uses the 'vector' as the address >of a normal PCIe write cycle. > > So if the hardware requests 4 interrupts, but the driver (believing it > will only use 3) only write 3 vectors, and then the hardware uses the > 4th vector it can write to a random location. > > Debugging that would be hard! MSI base address is kind of hardcoded for a platform. A combination of MSI base address, PCI function number and MSI vector makes a PCI host to raise interrupt on a CPU. I might be inaccurate in details, but the scenario you described is impossible AFAICT. > David > > > -- Regards, Alexander Gordeev agord...@redhat.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
From: Alexander Gordeev ... > > Even if you do that, you ought to write valid interrupt information > > into the 4th slot (maybe replicating one of the earlier interrupts). > > Then, if the device does raise the 'unexpected' interrupt you don't > > get a write to a random kernel location. > > I might be missing something, but we are talking of MSI address space > here, aren't we? I am not getting how we could end up with a 'write' > to a random kernel location when a unclaimed MSI vector sent. We could > only expect a spurious interrupt at worst, which is handled and reported. > > Anyway, as I described in my reply to Bjorn, this is not a concern IMO. I'm thinking of the following - which might be MSI-X ? 1) Hardware requests some interrupts and tells the host the BAR (and offset) where the 'vectors' should be written. 2) To raise an interrupt the hardware uses the 'vector' as the address of a normal PCIe write cycle. So if the hardware requests 4 interrupts, but the driver (believing it will only use 3) only write 3 vectors, and then the hardware uses the 4th vector it can write to a random location. Debugging that would be hard! David ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
On Thu, Jul 03, 2014 at 09:20:52AM +, David Laight wrote: > From: Bjorn Helgaas > > On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote: > > > There are PCI devices that require a particular value written > > > to the Multiple Message Enable (MME) register while aligned on > > > power of 2 boundary value of actually used MSI vectors 'nvec' > > > is a lesser of that MME value: > > > > > > roundup_pow_of_two(nvec) < 'Multiple Message Enable' > > > > > > However the existing pci_enable_msi_block() interface is not > > > able to configure such devices, since the value written to the > > > MME register is calculated from the number of requested MSIs > > > 'nvec': > > > > > > 'Multiple Message Enable' = roundup_pow_of_two(nvec) > > > > For MSI, software learns how many vectors a device requests by reading > > the Multiple Message Capable (MMC) field. This field is encoded, so a > > device can only request 1, 2, 4, 8, etc., vectors. It's impossible > > for a device to request 3 vectors; it would have to round up that up > > to a power of two and request 4 vectors. > > > > Software writes similarly encoded values to MME to tell the device how > > many vectors have been allocated for its use. For example, it's > > impossible to tell the device that it can use 3 vectors; the OS has to > > round that up and tell the device it can use 4 vectors. > > > > So if I understand correctly, the point of this series is to take > > advantage of device-specific knowledge, e.g., the device requests 4 > > vectors via MMC, but we "know" the device is only capable of using 3. > > Moreover, we tell the device via MME that 4 vectors are available, but > > we've only actually set up 3 of them. > ... > > Even if you do that, you ought to write valid interrupt information > into the 4th slot (maybe replicating one of the earlier interrupts). > Then, if the device does raise the 'unexpected' interrupt you don't > get a write to a random kernel location. I might be missing something, but we are talking of MSI address space here, aren't we? I am not getting how we could end up with a 'write' to a random kernel location when a unclaimed MSI vector sent. We could only expect a spurious interrupt at worst, which is handled and reported. Anyway, as I described in my reply to Bjorn, this is not a concern IMO. > Plausibly something similar should be done when a smaller number of > interrupts is assigned. > > David > -- Regards, Alexander Gordeev agord...@redhat.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()
On Wed, Jul 02, 2014 at 02:22:01PM -0600, Bjorn Helgaas wrote: > On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote: > > There are PCI devices that require a particular value written > > to the Multiple Message Enable (MME) register while aligned on > > power of 2 boundary value of actually used MSI vectors 'nvec' > > is a lesser of that MME value: > > > > roundup_pow_of_two(nvec) < 'Multiple Message Enable' > > > > However the existing pci_enable_msi_block() interface is not > > able to configure such devices, since the value written to the > > MME register is calculated from the number of requested MSIs > > 'nvec': > > > > 'Multiple Message Enable' = roundup_pow_of_two(nvec) > > For MSI, software learns how many vectors a device requests by reading > the Multiple Message Capable (MMC) field. This field is encoded, so a > device can only request 1, 2, 4, 8, etc., vectors. It's impossible > for a device to request 3 vectors; it would have to round up that up > to a power of two and request 4 vectors. > > Software writes similarly encoded values to MME to tell the device how > many vectors have been allocated for its use. For example, it's > impossible to tell the device that it can use 3 vectors; the OS has to > round that up and tell the device it can use 4 vectors. Nod. > So if I understand correctly, the point of this series is to take > advantage of device-specific knowledge, e.g., the device requests 4 > vectors via MMC, but we "know" the device is only capable of using 3. > Moreover, we tell the device via MME that 4 vectors are available, but > we've only actually set up 3 of them. Exactly. > This makes me uneasy because we're lying to the device, and the device > is perfectly within spec to use all 4 of those vectors. If anything > changes the number of vectors the device uses (new device revision, > firmware upgrade, etc.), this is liable to break. If a device committed via non-MSI specific means to send only 3 vectors out of 4 available why should we expect it to send 4? The probability of a firmware sending 4/4 vectors in this case is equal to the probability of sending 5/4 or 16/4, with the very same reason - a bug in the firmware. Moreover, even vector 4/4 would be unexpected by the device driver, though it is perfectly within the spec. As of new device revision or firmware update etc. - it is just yet another case of device driver vs the firmware match/mismatch. Not including this change does not help here at all IMHO. > Can you quantify the benefit of this? Can't a device already use > MSI-X to request exactly the number of vectors it can use? (I know A Intel AHCI chipset requires 16 vectors written to MME while advertises (via AHCI registers) and uses only 6. Even attempt to init 8 vectors results in device's fallback to 1 (!). > not all devices support MSI-X, but maybe we should just accept MSI for > what it is and encourage the HW guys to use MSI-X if MSI isn't good > enough.) > > > In this case the result written to the MME register may not > > satisfy the aforementioned PCI devices requirement and therefore > > the PCI functions will not operate in a desired mode. > > I'm not sure what you mean by "will not operate in a desired mode." > I thought this was an optimization to save vectors and that these > changes would be completely invisible to the hardware. Yes, this should be invisible to the hardware. The above is an attempt to describe the Intel AHCI weirdness in general terms :) I think it could be omitted. > Bjorn > > > This update introduces pci_enable_msi_partial() extension to > > pci_enable_msi_block() interface that accepts extra 'nvec_mme' > > argument which is then written to MME register while the value > > of 'nvec' is still used to setup as many interrupts as requested. > > > > As result of this change, architecture-specific callbacks > > arch_msi_check_device() and arch_setup_msi_irqs() get an extra > > 'nvec_mme' parameter as well, but it is ignored for now. > > Therefore, this update is a placeholder for architectures that > > wish to support pci_enable_msi_partial() function in the future. > > > > Cc: linux-...@vger.kernel.org > > Cc: linux-m...@linux-mips.org > > Cc: linuxppc-dev@lists.ozlabs.org > > Cc: linux-s...@vger.kernel.org > > Cc: x...@kernel.org > > Cc: xen-de...@lists.xenproject.org > > Cc: io...@lists.linux-foundation.org > > Cc: linux-...@vger.kernel.org > > Cc: linux-...@vger.kernel.org > > Signed-off-by: Alexander Gordeev > > --- > > Documentation/PCI/MSI-HOWTO.txt | 36 ++-- > > arch/mips/pci/msi-octeon.c |2 +- > > arch/powerpc/kernel/msi.c |4 +- > > arch/s390/pci/pci.c |2 +- > > arch/x86/kernel/x86_init.c |2 +- > > drivers/pci/msi.c | 83 > > ++- > > include/linux/msi.h |5 +- > > include/linux/pci.h |3 + > > 8 files changed, 115 insertions(+), 22 deletio
Re: [PATCH v2] KVM: PPC: e500: Emulate power management control SPR
On 04.07.14 10:17, Mihai Caraman wrote: For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0) to enable idle power down for cores and devices by setting up the idle count period at boot time. With the host already controlling the power management configuration the guest could simply benefit from it, so emulate guest request as a general store. Signed-off-by: Mihai Caraman Thanks, applied to kvm-ppc-queue. Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 0/4] KVM Book3E support for HTW guests
On 03.07.14 16:45, Mihai Caraman wrote: KVM Book3E support for Hardware Page Tablewalk enabled guests. It looks reasonably straight forward to me, though I have to admit that I find the sind conditions pretty confusing. Scott, would you mind to have a look at this set too? :) Thanks a lot! Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2] KVM: PPC: e500: Emulate power management control SPR
For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0) to enable idle power down for cores and devices by setting up the idle count period at boot time. With the host already controlling the power management configuration the guest could simply benefit from it, so emulate guest request as a general store. Signed-off-by: Mihai Caraman --- v2: - treat the operation as a general store arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/e500_emulate.c | 12 2 files changed, 13 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 62b2cee..faf2f0e 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -584,6 +584,7 @@ struct kvm_vcpu_arch { u32 mmucfg; u32 eptcfg; u32 epr; + u32 pwrmgtcr0; u32 crit_save; /* guest debug registers*/ struct debug_reg dbg_reg; diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index 002d517..c99c40e 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -250,6 +250,14 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong spr_va spr_val); break; + case SPRN_PWRMGTCR0: + /* +* Guest relies on host power management configurations +* Treat the request as a general store +*/ + vcpu->arch.pwrmgtcr0 = spr_val; + break; + /* extra exceptions */ case SPRN_IVOR32: vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL] = spr_val; @@ -368,6 +376,10 @@ int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong *spr_v *spr_val = vcpu->arch.eptcfg; break; + case SPRN_PWRMGTCR0: + *spr_val = vcpu->arch.pwrmgtcr0; + break; + /* extra exceptions */ case SPRN_IVOR32: *spr_val = vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL]; -- 1.7.11.7 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH 2/4] KVM: PPC: Book3E: Handle LRAT error exception
On 03.07.14 16:45, Mihai Caraman wrote: Handle LRAT error exception with support for lrat mapping and invalidation. Signed-off-by: Mihai Caraman --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/include/asm/kvm_ppc.h| 2 + arch/powerpc/include/asm/mmu-book3e.h | 3 + arch/powerpc/include/asm/reg_booke.h | 13 arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/booke.c | 40 +++ arch/powerpc/kvm/bookehv_interrupts.S | 9 ++- arch/powerpc/kvm/e500_mmu_host.c | 125 ++ arch/powerpc/kvm/e500mc.c | 2 + 9 files changed, 195 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bb66d8b..7b6b2ec 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -433,6 +433,7 @@ struct kvm_vcpu_arch { u32 eplc; u32 epsc; u32 oldpir; + u64 fault_lper; #endif #if defined(CONFIG_BOOKE) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 9c89cdd..2730a29 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -86,6 +86,8 @@ extern gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, gva_t eaddr); extern void kvmppc_mmu_dtlb_miss(struct kvm_vcpu *vcpu); extern void kvmppc_mmu_itlb_miss(struct kvm_vcpu *vcpu); +extern void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn); +extern void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu); extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id); diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h index 088fd9f..ac6acf7 100644 --- a/arch/powerpc/include/asm/mmu-book3e.h +++ b/arch/powerpc/include/asm/mmu-book3e.h @@ -40,6 +40,8 @@ /* MAS registers bit definitions */ +#define MAS0_ATSEL 0x8000 +#define MAS0_ATSEL_SHIFT 31 #define MAS0_TLBSEL_MASK0x3000 #define MAS0_TLBSEL_SHIFT 28 #define MAS0_TLBSEL(x) (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK) @@ -53,6 +55,7 @@ #define MAS0_WQ_CLR_RSRV 0x2000 #define MAS1_VALID 0x8000 +#define MAS1_VALID_SHIFT 31 #define MAS1_IPROT0x4000 #define MAS1_TID(x) (((x) << 16) & 0x3FFF) #define MAS1_IND 0x2000 diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h index 75bda23..783d617 100644 --- a/arch/powerpc/include/asm/reg_booke.h +++ b/arch/powerpc/include/asm/reg_booke.h @@ -43,6 +43,8 @@ /* Special Purpose Registers (SPRNs)*/ #define SPRN_DECAR0x036 /* Decrementer Auto Reload Register */ +#define SPRN_LPER 0x038 /* Logical Page Exception Register */ +#define SPRN_LPERU 0x039 /* Logical Page Exception Register Upper */ #define SPRN_IVPR 0x03F /* Interrupt Vector Prefix Register */ #define SPRN_USPRG0 0x100 /* User Special Purpose Register General 0 */ #define SPRN_SPRG3R 0x103 /* Special Purpose Register General 3 Read */ @@ -358,6 +360,9 @@ #define ESR_ILK 0x0010 /* Instr. Cache Locking */ #define ESR_PUO 0x0004 /* Unimplemented Operation exception */ #define ESR_BO0x0002 /* Byte Ordering */ +#define ESR_DATA 0x0400 /* Page Table Data Access */ +#define ESR_TLBI 0x0200 /* Page Table TLB Ineligible */ +#define ESR_PT 0x0100 /* Page Table Translation */ #define ESR_SPV 0x0080 /* Signal Processing operation */ /* Bit definitions related to the DBCR0. */ @@ -649,6 +654,14 @@ #define EPC_EPID 0x3fff #define EPC_EPID_SHIFT0 +/* Bit definitions for LPER */ +#define LPER_ALPN 0x000FF000ULL +#define LPER_ALPN_SHIFT12 +#define LPER_WIMGE 0x0F80 +#define LPER_WIMGE_SHIFT 7 +#define LPER_LPS 0x000F +#define LPER_LPS_SHIFT 0 + /* * The IBM-403 is an even more odd special case, as it is much * older than the IBM-405 series. We put these down here incase someone diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index f5995a9..be6e329 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -713,6 +713,7 @@ int main(void) DEFINE(VCPU_HOST_MAS4, offsetof(struct kvm_vcpu, arch.host_mas4)); DEFINE(VCPU_HOST_MAS6, offsetof(struct kvm_vcpu, arch.host_mas6)); DEFINE(VCPU_EPLC, offsetof(struct kvm_vcpu, arch.eplc)); + DEFINE(VCPU_FAULT_LPER, offsetof(struct kvm_vcpu, arch.fault_lper)); #endif #ifdef CONFIG_KVM_EXIT_TIMING diff --git a/arch/powerpc/kvm/booke
Re: [PATCH 1/3] powerpc/kvm: Remove redundant save of SIER AND MMCR2
On 03.07.14 08:12, Joel Stanley wrote: These two registers are already saved in the block above. Aside from being unnecessary, by the time we get down to the second save location r8 no longer contains MMCR2, so we are clobbering the saved value with PMC5. Signed-off-by: Joel Stanley Reviewed-by: Alexander Graf Please CC kvm-ppc@vger and kvm@vger when you send kvm related patches :). Ben, I think this patch makes sense to go via your tree. Want to take it? Alex --- arch/powerpc/kvm/book3s_hv_interrupts.S | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S index 8c86422..731be74 100644 --- a/arch/powerpc/kvm/book3s_hv_interrupts.S +++ b/arch/powerpc/kvm/book3s_hv_interrupts.S @@ -127,11 +127,6 @@ BEGIN_FTR_SECTION stw r10, HSTATE_PMC + 24(r13) stw r11, HSTATE_PMC + 28(r13) END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) -BEGIN_FTR_SECTION - mfspr r9, SPRN_SIER - std r8, HSTATE_MMCR + 40(r13) - std r9, HSTATE_MMCR + 48(r13) -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) 31: /* ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support
On 03.07.14 18:11, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:34 PM To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support On 30.06.14 17:34, Mihai Caraman wrote: Add ONE_REG support for AltiVec on Book3E. Signed-off-by: Mihai Caraman Any chance we can handle these in generic code? I expected this request :) Can we let this for a second phase to have e6500 enabled first? I don't see the value of duplicating code in e500 specific code only to remove and combine it in common code in a follow-up patch after that. Can you share with us a Book3S setup so I can validate the requested changes? I already fell anxious touching strange hardware specific Book3S code without running it. Until a few weeks ago I had an externally reachable G5 machine that we could've used for this. Unfortunately I had to replace the box with another one that's not quite as stable. I'll try and see if I can fix or replace it soon. Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness
On 04.07.14 09:46, Alexander Graf wrote: On 03.07.14 17:46, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:29 PM To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness On 30.06.14 17:34, Mihai Caraman wrote: Increase FPU laziness by calling kvmppc_load_guest_fp() just before returning to guest instead of each sched in. Without this improvement an interrupt may also claim floting point corrupting guest state. How do you handle context switching with this patch applied? During most of the guest's lifetime we never exit kvmppc_vcpu_run(), so when the guest gets switched out all FPU state gets lost? No, we had this discussion in ver 1. The FP/VMX/VSX is implemented lazy in the kernel i.e. the unit state is not saved/restored until another thread that once claimed the unit is sched in. Since FP/VMX/VSX can be activated by the guest independent of the host, the vcpu thread is always using the unit (even if it did not claimed it once). Now, this patch optimize the sched in flow. Instead of checking on each vcpu sched in if the kernel unloaded unit's guest state for another competing host process we do this when we enter the guest. But we only do it when we enter the guest from QEMU, not when we enter the guest after a context switch on cond_resched(), no? Ah, I missed the call to the load function in handle_exit(). Ok, I think that approach should work. Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] powerpc/kvm: Remove redundant save of SIER AND MMCR2
On Thu, Jul 03, 2014 at 03:42:34PM +0930, Joel Stanley wrote: > These two registers are already saved in the block above. Aside from > being unnecessary, by the time we get down to the second save location > r8 no longer contains MMCR2, so we are clobbering the saved value with > PMC5. > > Signed-off-by: Joel Stanley Eek! Acked-by: Paul Mackerras ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness
On 03.07.14 17:46, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:29 PM To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness On 30.06.14 17:34, Mihai Caraman wrote: Increase FPU laziness by calling kvmppc_load_guest_fp() just before returning to guest instead of each sched in. Without this improvement an interrupt may also claim floting point corrupting guest state. How do you handle context switching with this patch applied? During most of the guest's lifetime we never exit kvmppc_vcpu_run(), so when the guest gets switched out all FPU state gets lost? No, we had this discussion in ver 1. The FP/VMX/VSX is implemented lazy in the kernel i.e. the unit state is not saved/restored until another thread that once claimed the unit is sched in. Since FP/VMX/VSX can be activated by the guest independent of the host, the vcpu thread is always using the unit (even if it did not claimed it once). Now, this patch optimize the sched in flow. Instead of checking on each vcpu sched in if the kernel unloaded unit's guest state for another competing host process we do this when we enter the guest. But we only do it when we enter the guest from QEMU, not when we enter the guest after a context switch on cond_resched(), no? Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] powerpc/kvm: support to handle sw breakpoint
On Friday 04 July 2014 12:18 PM, Alexander Graf wrote: > > On 04.07.14 06:34, Madhavan Srinivasan wrote: >> On Thursday 03 July 2014 05:21 PM, Alexander Graf wrote: >>> On 01.07.14 10:41, Madhavan Srinivasan wrote: This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch also adds support for software breakpoint in PR KVM. Patch mandates use of "abs" instruction as sw breakpoint instruction (primary opcode 31 and extended opcode 360). Based on PowerISA v2.01, ABS instruction has been dropped from the architecture and treated an illegal instruction. Changes v1->v2: Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM can also share it. Added code to use KVM get one reg infrastructure to get debug opcode. Updated emulate.c to include emulation of debug instruction incase of PR_KVM. Made changes to commit message. Signed-off-by: Madhavan Srinivasan --- arch/powerpc/include/asm/kvm_book3s.h |8 arch/powerpc/include/asm/ppc-opcode.h |5 + arch/powerpc/kvm/book3s.c |3 ++- arch/powerpc/kvm/book3s_hv.c |9 + arch/powerpc/kvm/book3s_pr.c |3 +++ arch/powerpc/kvm/emulate.c| 10 ++ 6 files changed, 37 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index f52f656..180d549 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -24,6 +24,14 @@ #include #include +/* + * KVMPPC_INST_BOOK3S_DEBUG is debug Instruction for supporting Software Breakpoint. + * Instruction mnemonic is ABS, primary opcode is 31 and extended opcode is 360. + * Based on PowerISA v2.01, ABS instruction has been dropped from the architecture + * and treated an illegal instruction. + */ +#define KVMPPC_INST_BOOK3S_DEBUG0x7c0002d0 >>> This will still break with LE guests. >>> >> I am told to try with all 0s opcode. So rewriting the patch. > > The problem with "all 0s" is that it's reasonably likely to occur on > real world code. Hence Segher was proposing something like 0x0000 > which should be the same regardless of endianness, but has a certain > appeal of intentional placement ;). > Ok Sure. >> + struct kvmppc_bat { u64 raw; u32 bepi; diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 3132bb9..3fbb4c1 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -111,6 +111,11 @@ #define OP_31_XOP_LHBRX 790 #define OP_31_XOP_STHBRX918 +/* KVMPPC_INST_BOOK3S_DEBUG -- Software breakpoint Instruction + * Instruction mnemonic is ABS, primary opcode is 31 and extended opcode is 360. + */ +#define OP_31_XOP_ABS360 + #define OP_LWZ 32 #define OP_LD 58 #define OP_LWZU 33 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index c254c27..b40fe5d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { -return -EINVAL; +vcpu->guest_debug = dbg->control; +return 0; } void kvmppc_decrementer_func(unsigned long data) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 7a12edb..402c1ec 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -725,8 +725,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, * we don't emulate any guest instructions at this stage. */ case BOOK3S_INTERRUPT_H_EMUL_ASSIST: +if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG ) { +run->exit_reason = KVM_EXIT_DEBUG; +run->debug.arch.address = kvmppc_get_pc(vcpu); +r = RESUME_HOST; >>> Phew - why can't we just go into the normal instruction emulator for >>> EMUL_ASSIST? >>> >> IIUC, using the emulation_assist_interrupt function (kernel/trap.c) ? > > I was more thinking of kvmppc_emulate_instruction() :). > This makes sense. Can use the same call for pr kvm also. awesome :) > > Alex > __