Re: [PATCH v4] powerpc/powernv: hwmon driver for power, fan rpm, voltage and temperature

2014-07-04 Thread Guenter Roeck

On 07/04/2014 04:02 AM, Neelesh Gupta wrote:

This patch adds basic kernel support for reading power values, fan
speed rpm, voltage and temperature data on powernv platforms which will
be exported to user space through sysfs interface.



Hi Neelesh,

Copying devicetree mailing list. Please copy it on the next (hopefully final)
revision; devicetree maintainers at least need a chance to comment.
[ Yes, I understand this is a shipping product, but still ... ]

I am ok with the code except for a couple of minor nitpicks (see below).

Thanks,
Guenter


Test results:
-
[root@tul163p1 ~]# sensors
ibmpowernv-isa-
Adapter: ISA adapter
fan1:5567 RPM  (min =0 RPM)
fan2:5232 RPM  (min =0 RPM)
fan3:5532 RPM  (min =0 RPM)
fan4:4945 RPM  (min =0 RPM)
fan5:   0 RPM  (min =0 RPM)
fan6:   0 RPM  (min =0 RPM)
fan7:7392 RPM  (min =0 RPM)
fan8:7936 RPM  (min =0 RPM)
temp1:+39.0°C  (high =  +0.0°C)
power1:  191.00 W

[root@tul163p1 ~]# ls /sys/devices/platform/
alarmtimer  ibmpowernv.0  power  rtc-generic  serial8250  uevent
[root@tul163p1 ~]# ls /sys/devices/platform/ibmpowernv.0/hwmon/hwmon0/
device  fan2_minfan4_minfan6_minfan8_min   power
fan1_fault  fan3_fault  fan5_fault  fan7_fault  in1_fault  power1_input
fan1_input  fan3_input  fan5_input  fan7_input  in2_fault  subsystem
fan1_minfan3_minfan5_minfan7_minin3_fault  temp1_input
fan2_fault  fan4_fault  fan6_fault  fan8_fault  in4_fault  temp1_max
fan2_input  fan4_input  fan6_input  fan8_input  name   uevent
[root@tul163p1 ~]#
[root@tul163p1 ~]# ls /sys/class/hwmon/hwmon0/
device  fan2_minfan4_minfan6_minfan8_min   power
fan1_fault  fan3_fault  fan5_fault  fan7_fault  in1_fault  power1_input
fan1_input  fan3_input  fan5_input  fan7_input  in2_fault  subsystem
fan1_minfan3_minfan5_minfan7_minin3_fault  temp1_input
fan2_fault  fan4_fault  fan6_fault  fan8_fault  in4_fault  temp1_max
fan2_input  fan4_input  fan6_input  fan8_input  name   uevent
[root@tul163p1 ~]#

Signed-off-by: Neelesh Gupta 
---

Changes in v4
=
- Replaced pr_err() with dev_err() for loggin print messages.
- Using kstrtou32() function for converting string to u32 instead of sscanf().

Changes in v3
=
- Fixed an endianness bug leading the driver to break on LE.
- Fixed a bug that when one of the 'attribute_group' not populated, following
   groups attributes were dropped.
- Rewrite the get_sensor_index_attr() function to handle all the error scenarios
   like 'sscanf' etc.
- Fixed all the errors/warnings related to coding style/whitespace.
- Added 'Documentation' files.
- Addressed remaining review comments on V2.

Changes in v2
=
- Generic use of devm_* functions in hwmon like using devm_kzalloc() for dynamic
   memory request, avoiding the need to explicit free of memory.
   Adding 'struct attribute_group' as member of platform data structure to be
   populated and then passed to devm_hwmon_device_register_with_groups().

   Note: Having an array of pointers of 'attribute_group' and each group
   corresponds to 'enum sensors' type. Not completely sure, if it's ideal or
   could have just one group populated with attributes of sensor types?

- 'ibmpowernv' is not hot-pluggable device so moving 'platform_driver' callback
   function (probe) as part of __init code.
- Fixed issues related to coding style.
- Other general comments in v1.

  .../devicetree/bindings/hwmon/ibmpowernv.txt   |   27 +
  Documentation/hwmon/ibmpowernv |   41 ++
  drivers/hwmon/Kconfig  |   11 +
  drivers/hwmon/Makefile |1
  drivers/hwmon/ibmpowernv.c |  362 
  5 files changed, 442 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/hwmon/ibmpowernv.txt
  create mode 100644 Documentation/hwmon/ibmpowernv
  create mode 100644 drivers/hwmon/ibmpowernv.c

diff --git a/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt 
b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt
new file mode 100644
index 000..e3bd1eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt
@@ -0,0 +1,27 @@
+IBM POWERNV platform sensors
+
+
+Required node properties:
+- compatible: must be one of
+   "ibm,opal-sensor-cooling-fan"
+   "ibm,opal-sensor-amb-temp"
+   "ibm,opal-sensor-power-supply"
+   "ibm,opal-sensor-power"
+- sensor-id: an opaque id provided by the firmware to the kernel, identifies a
+given sensor and its attribute data
+
+Example sensors node:
+
+cooling-fan#8-data {
+   sensor-id = <0x7052107>;
+   phandle = <0x1028>;
+   linux,phandle = <0x1028>;
+   compatible = "ibm,opal-sensor-cooling-fan";


phandle and linux-phandle are neither

Re: [PATCH v2] powerpc/kvm: support to handle sw breakpoint

2014-07-04 Thread Madhavan Srinivasan
On Thursday 03 July 2014 05:21 PM, Alexander Graf wrote:
> 
> On 01.07.14 10:41, Madhavan Srinivasan wrote:
>> This patch adds kernel side support for software breakpoint.
>> Design is that, by using an illegal instruction, we trap to hypervisor
>> via Emulation Assistance interrupt, where we check for the illegal
>> instruction
>> and accordingly we return to Host or Guest. Patch also adds support for
>> software breakpoint in PR KVM.
>>
>> Patch mandates use of "abs" instruction as sw breakpoint instruction
>> (primary opcode 31 and extended opcode 360). Based on PowerISA v2.01,
>> ABS instruction has been dropped from the architecture and treated an
>> illegal instruction.
>>
>> Changes v1->v2:
>>
>>   Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM
>> can also share it.
>>   Added code to use KVM get one reg infrastructure to get debug opcode.
>>   Updated emulate.c to include emulation of debug instruction incase
>> of PR_KVM.
>>   Made changes to commit message.
>>
>> Signed-off-by: Madhavan Srinivasan 
>> ---
>>   arch/powerpc/include/asm/kvm_book3s.h |8 
>>   arch/powerpc/include/asm/ppc-opcode.h |5 +
>>   arch/powerpc/kvm/book3s.c |3 ++-
>>   arch/powerpc/kvm/book3s_hv.c  |9 +
>>   arch/powerpc/kvm/book3s_pr.c  |3 +++
>>   arch/powerpc/kvm/emulate.c|   10 ++
>>   6 files changed, 37 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_book3s.h
>> b/arch/powerpc/include/asm/kvm_book3s.h
>> index f52f656..180d549 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s.h
>> @@ -24,6 +24,14 @@
>>   #include 
>>   #include 
>>   +/*
>> + * KVMPPC_INST_BOOK3S_DEBUG is debug Instruction for supporting
>> Software Breakpoint.
>> + * Instruction mnemonic is ABS, primary opcode is 31 and extended
>> opcode is 360.
>> + * Based on PowerISA v2.01, ABS instruction has been dropped from the
>> architecture
>> + * and treated an illegal instruction.
>> + */
>> +#define KVMPPC_INST_BOOK3S_DEBUG0x7c0002d0
> 
> This will still break with LE guests.
> 

I am told to try with all 0s opcode. So rewriting the patch.

>> +
>>   struct kvmppc_bat {
>>   u64 raw;
>>   u32 bepi;
>> diff --git a/arch/powerpc/include/asm/ppc-opcode.h
>> b/arch/powerpc/include/asm/ppc-opcode.h
>> index 3132bb9..3fbb4c1 100644
>> --- a/arch/powerpc/include/asm/ppc-opcode.h
>> +++ b/arch/powerpc/include/asm/ppc-opcode.h
>> @@ -111,6 +111,11 @@
>>   #define OP_31_XOP_LHBRX 790
>>   #define OP_31_XOP_STHBRX918
>>   +/* KVMPPC_INST_BOOK3S_DEBUG -- Software breakpoint Instruction
>> + * Instruction mnemonic is ABS, primary opcode is 31 and extended
>> opcode is 360.
>> + */
>> +#define OP_31_XOP_ABS360
>> +
>>   #define OP_LWZ  32
>>   #define OP_LD   58
>>   #define OP_LWZU 33
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index c254c27..b40fe5d 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu
>> *vcpu,
>>   int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>>   struct kvm_guest_debug *dbg)
>>   {
>> -return -EINVAL;
>> +vcpu->guest_debug = dbg->control;
>> +return 0;
>>   }
>> void kvmppc_decrementer_func(unsigned long data)
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 7a12edb..402c1ec 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -725,8 +725,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run
>> *run, struct kvm_vcpu *vcpu,
>>* we don't emulate any guest instructions at this stage.
>>*/
>>   case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
>> +if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG ) {
>> +run->exit_reason = KVM_EXIT_DEBUG;
>> +run->debug.arch.address = kvmppc_get_pc(vcpu);
>> +r = RESUME_HOST;
> 
> Phew - why can't we just go into the normal instruction emulator for
> EMUL_ASSIST?
> 

IIUC, using the emulation_assist_interrupt function (kernel/trap.c) ?

Thanks for review
Regards
Maddy

> 
> Alex
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64

2014-07-04 Thread Guenter Roeck

ping ...

Guenter

On 06/30/2014 11:45 AM, Guenter Roeck wrote:

powerpc:allmodconfig has been failing for some time with the following
error.

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1

A number of attempts to fix the problem by moving around code have been
unsuccessful and resulted in failed builds for some configurations and
the discovery of toolchain bugs.

Fix the problem by disabling RELOCATABLE for COMPILE_TEST builds instead.
While this is less than perfect, it avoids substantial code changes
which would otherwise be necessary just to make COMPILE_TEST builds
happy and might have undesired side effects.

Signed-off-by: Guenter Roeck 
---
  arch/powerpc/Kconfig | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bd6dd6e..fefe7c8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -414,7 +414,7 @@ config KEXEC
  config CRASH_DUMP
bool "Build a kdump crash kernel"
depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP)
-   select RELOCATABLE if PPC64 || 44x || FSL_BOOKE
+   select RELOCATABLE if (PPC64 && !COMPILE_TEST) || 44x || FSL_BOOKE
help
  Build a kernel suitable for use as a kdump capture kernel.
  The same kernel binary can be used as production kernel and dump
@@ -1017,6 +1017,7 @@ endmenu
  if PPC64
  config RELOCATABLE
bool "Build a relocatable kernel"
+   depends on !COMPILE_TEST
select NONSTATIC_KERNEL
help
  This builds a kernel image that is capable of running anywhere



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 1/3] dmaengine: mpc512x: add device tree binding document

2014-07-04 Thread Mark Rutland
Hi Alexander,

Apologies for the late reply. DT-related email is somewhat a firehose
and unfortunately I lose track of things.

On Thu, Jun 19, 2014 at 02:59:19PM +0100, Alexander Popov wrote:
> 2014-06-18 18:56 GMT+04:00 Alexander Popov :
> > 2014-06-18 17:37 GMT+04:00 Mark Rutland :
> >> On Wed, Jun 18, 2014 at 11:48:10AM +0100, Alexander Popov wrote:
> >>> Introduce a device tree binding document for the MPC512x DMA controller
> >>> +Optional properties:
> >>> +- #dma-cells: the length of the DMA specifier, must be <1>.
> >>> + Each channel of this DMA controller has a peripheral request line,
> >>> + the assignment is fixed in hardware. This one cell
> >>> + in dmas property of a client device represents the channel number.
> >>
> >> Surely this is required to be able to refer to DMA channels on the
> >> device?
> >
> > Excuse me, I didn't understand your question.
> > Do you inquire about the reason of making #dma-cells an optional property?
> > It's optional because device tree based lookup support is made
> > optional (part 3/3).
> 
> Mark, did I answer your question?
> Should I fix anything in this patch series?

I would move it under required properties even if we happen to not use
it in certain edge cases.

Moving forwards everything should be DT-driven, so it'll be necessary.

Thanks,
Mark.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 2/3] iommu/fsl: Fix the device domain attach condition.

2014-07-04 Thread Varun Sethi


> -Original Message-
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Friday, July 04, 2014 4:25 PM
> To: Sethi Varun-B16395
> Cc: io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; alex.william...@redhat.com
> Subject: Re: [PATCH 2/3] iommu/fsl: Fix the device domain attach
> condition.
> 
> Hmm,
> 
> On Tue, Jun 24, 2014 at 07:27:16PM +0530, Varun Sethi wrote:
> > -   old_domain_info = find_domain(dev);
> > +   old_domain_info = dev->archdata.iommu_domain;
> > if (old_domain_info && old_domain_info->domain != dma_domain) {
> > spin_unlock_irqrestore(&device_domain_lock, flags);
> > detach_device(dev, old_domain_info->domain);
> 
> Wouldn't this set dev->archdata.iommu_domain to NULL anyway, so that ...
> 
Not for the case where device has multiple LIODNs.

> > @@ -399,7 +394,7 @@ static void attach_device(struct fsl_dma_domain
> *dma_domain, int liodn, struct d
> >  * the info for the first LIODN as all
> >  * LIODNs share the same domain
> >  */
> > -   if (!old_domain_info)
> > +   if (!dev->archdata.iommu_domain)
> > dev->archdata.iommu_domain = info;
> 
> We already know that it _must_ be NULL here?
> 

That won't be true for devices having multiple LIODNs

> > spin_unlock_irqrestore(&device_domain_lock, flags);
> 
> This would shrink down the patch to:
> 
> diff --git a/drivers/iommu/fsl_pamu_domain.c
> b/drivers/iommu/fsl_pamu_domain.c index 93072ba..d21b554 100644
> --- a/drivers/iommu/fsl_pamu_domain.c
> +++ b/drivers/iommu/fsl_pamu_domain.c
> @@ -399,8 +399,7 @@ static void attach_device(struct fsl_dma_domain
> *dma_domain, int liodn, struct d
>* the info for the first LIODN as all
>* LIODNs share the same domain
>*/
> - if (!old_domain_info)
> - dev->archdata.iommu_domain = info;
> + dev->archdata.iommu_domain = info;

For devices having multiple LIODNs, we don't want to overwrite the info.

-Varun

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 1/3] iommu/fsl: Fix PAMU window size check.

2014-07-04 Thread Varun Sethi


> -Original Message-
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Friday, July 04, 2014 4:15 PM
> To: Sethi Varun-B16395
> Cc: io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; alex.william...@redhat.com
> Subject: Re: [PATCH 1/3] iommu/fsl: Fix PAMU window size check.
> 
> On Tue, Jun 24, 2014 at 07:27:15PM +0530, Varun Sethi wrote:
> > /* window size is 2^(WSE+1) bytes */
> > -   return __ffs(addrspace_size) - 1;
> > +   return fls64(addrspace_size) - 2;
> 
> This looks bogus, why do you replace ffs (find-first-bit) by fls (find-
> last-bit)?
> 
Address space size is always a power of 2. This change was required to handle 
address sizes > 32bit width on 32 bit architectures.

-Varun

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4] powerpc/powernv: hwmon driver for power, fan rpm, voltage and temperature

2014-07-04 Thread Neelesh Gupta
This patch adds basic kernel support for reading power values, fan
speed rpm, voltage and temperature data on powernv platforms which will
be exported to user space through sysfs interface.

Test results:
-
[root@tul163p1 ~]# sensors
ibmpowernv-isa-
Adapter: ISA adapter
fan1:5567 RPM  (min =0 RPM)
fan2:5232 RPM  (min =0 RPM)
fan3:5532 RPM  (min =0 RPM)
fan4:4945 RPM  (min =0 RPM)
fan5:   0 RPM  (min =0 RPM)
fan6:   0 RPM  (min =0 RPM)
fan7:7392 RPM  (min =0 RPM)
fan8:7936 RPM  (min =0 RPM)
temp1:+39.0°C  (high =  +0.0°C)
power1:  191.00 W

[root@tul163p1 ~]# ls /sys/devices/platform/
alarmtimer  ibmpowernv.0  power  rtc-generic  serial8250  uevent
[root@tul163p1 ~]# ls /sys/devices/platform/ibmpowernv.0/hwmon/hwmon0/
device  fan2_minfan4_minfan6_minfan8_min   power
fan1_fault  fan3_fault  fan5_fault  fan7_fault  in1_fault  power1_input
fan1_input  fan3_input  fan5_input  fan7_input  in2_fault  subsystem
fan1_minfan3_minfan5_minfan7_minin3_fault  temp1_input
fan2_fault  fan4_fault  fan6_fault  fan8_fault  in4_fault  temp1_max
fan2_input  fan4_input  fan6_input  fan8_input  name   uevent
[root@tul163p1 ~]# 
[root@tul163p1 ~]# ls /sys/class/hwmon/hwmon0/
device  fan2_minfan4_minfan6_minfan8_min   power
fan1_fault  fan3_fault  fan5_fault  fan7_fault  in1_fault  power1_input
fan1_input  fan3_input  fan5_input  fan7_input  in2_fault  subsystem
fan1_minfan3_minfan5_minfan7_minin3_fault  temp1_input
fan2_fault  fan4_fault  fan6_fault  fan8_fault  in4_fault  temp1_max
fan2_input  fan4_input  fan6_input  fan8_input  name   uevent
[root@tul163p1 ~]#

Signed-off-by: Neelesh Gupta 
---

Changes in v4
=
- Replaced pr_err() with dev_err() for loggin print messages.
- Using kstrtou32() function for converting string to u32 instead of sscanf().

Changes in v3
=
- Fixed an endianness bug leading the driver to break on LE.
- Fixed a bug that when one of the 'attribute_group' not populated, following
  groups attributes were dropped.
- Rewrite the get_sensor_index_attr() function to handle all the error scenarios
  like 'sscanf' etc.
- Fixed all the errors/warnings related to coding style/whitespace.
- Added 'Documentation' files.
- Addressed remaining review comments on V2.

Changes in v2
=
- Generic use of devm_* functions in hwmon like using devm_kzalloc() for dynamic
  memory request, avoiding the need to explicit free of memory.
  Adding 'struct attribute_group' as member of platform data structure to be
  populated and then passed to devm_hwmon_device_register_with_groups().

  Note: Having an array of pointers of 'attribute_group' and each group
  corresponds to 'enum sensors' type. Not completely sure, if it's ideal or
  could have just one group populated with attributes of sensor types?

- 'ibmpowernv' is not hot-pluggable device so moving 'platform_driver' callback
  function (probe) as part of __init code.
- Fixed issues related to coding style.
- Other general comments in v1.

 .../devicetree/bindings/hwmon/ibmpowernv.txt   |   27 +
 Documentation/hwmon/ibmpowernv |   41 ++
 drivers/hwmon/Kconfig  |   11 +
 drivers/hwmon/Makefile |1 
 drivers/hwmon/ibmpowernv.c |  362 
 5 files changed, 442 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/ibmpowernv.txt
 create mode 100644 Documentation/hwmon/ibmpowernv
 create mode 100644 drivers/hwmon/ibmpowernv.c

diff --git a/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt 
b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt
new file mode 100644
index 000..e3bd1eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/ibmpowernv.txt
@@ -0,0 +1,27 @@
+IBM POWERNV platform sensors
+
+
+Required node properties:
+- compatible: must be one of
+   "ibm,opal-sensor-cooling-fan"
+   "ibm,opal-sensor-amb-temp"
+   "ibm,opal-sensor-power-supply"
+   "ibm,opal-sensor-power"
+- sensor-id: an opaque id provided by the firmware to the kernel, identifies a
+given sensor and its attribute data
+
+Example sensors node:
+
+cooling-fan#8-data {
+   sensor-id = <0x7052107>;
+   phandle = <0x1028>;
+   linux,phandle = <0x1028>;
+   compatible = "ibm,opal-sensor-cooling-fan";
+};
+
+amb-temp#1-thrs {
+   sensor-id = <0x5096000>;
+   phandle = <0x1017>;
+   linux,phandle = <0x1017>;
+   compatible = "ibm,opal-sensor-amb-temp";
+};
diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv
new file mode 100644
index 000..644245a
--- /dev/null
+++ b/Documentation/hwmon/ibmpowernv
@@ -0,0 +1,41 @@
+Kernel Driver IBMPOWENV
+===

Re: [PATCH 2/3] iommu/fsl: Fix the device domain attach condition.

2014-07-04 Thread Joerg Roedel
Hmm,

On Tue, Jun 24, 2014 at 07:27:16PM +0530, Varun Sethi wrote:
> - old_domain_info = find_domain(dev);
> + old_domain_info = dev->archdata.iommu_domain;
>   if (old_domain_info && old_domain_info->domain != dma_domain) {
>   spin_unlock_irqrestore(&device_domain_lock, flags);
>   detach_device(dev, old_domain_info->domain);

Wouldn't this set dev->archdata.iommu_domain to NULL anyway, so that ...

> @@ -399,7 +394,7 @@ static void attach_device(struct fsl_dma_domain 
> *dma_domain, int liodn, struct d
>* the info for the first LIODN as all
>* LIODNs share the same domain
>*/
> - if (!old_domain_info)
> + if (!dev->archdata.iommu_domain)
>   dev->archdata.iommu_domain = info;

We already know that it _must_ be NULL here?

>   spin_unlock_irqrestore(&device_domain_lock, flags);

This would shrink down the patch to:

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 93072ba..d21b554 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -399,8 +399,7 @@ static void attach_device(struct fsl_dma_domain 
*dma_domain, int liodn, struct d
 * the info for the first LIODN as all
 * LIODNs share the same domain
 */
-   if (!old_domain_info)
-   dev->archdata.iommu_domain = info;
+   dev->archdata.iommu_domain = info;
spin_unlock_irqrestore(&device_domain_lock, flags);
 
 }


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] iommu/fsl: Fix PAMU window size check.

2014-07-04 Thread Joerg Roedel
On Tue, Jun 24, 2014 at 07:27:15PM +0530, Varun Sethi wrote:
>   /* window size is 2^(WSE+1) bytes */
> - return __ffs(addrspace_size) - 1;
> + return fls64(addrspace_size) - 2;

This looks bogus, why do you replace ffs (find-first-bit) by fls
(find-last-bit)?


Joerg


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-04 Thread Alexander Gordeev
On Fri, Jul 04, 2014 at 09:11:50AM +, David Laight wrote:
> > I might be missing something, but we are talking of MSI address space
> > here, aren't we? I am not getting how we could end up with a 'write'
> > to a random kernel location when a unclaimed MSI vector sent. We could
> > only expect a spurious interrupt at worst, which is handled and reported.
> > 
> > Anyway, as I described in my reply to Bjorn, this is not a concern IMO.
> 
> I'm thinking of the following - which might be MSI-X ?
> 1) Hardware requests some interrupts and tells the host the BAR (and offset)
>where the 'vectors' should be written.
> 2) To raise an interrupt the hardware uses the 'vector' as the address
>of a normal PCIe write cycle.
> 
> So if the hardware requests 4 interrupts, but the driver (believing it
> will only use 3) only write 3 vectors, and then the hardware uses the
> 4th vector it can write to a random location.
> 
> Debugging that would be hard!

MSI base address is kind of hardcoded for a platform. A combination of
MSI base address, PCI function number and MSI vector makes a PCI host to
raise interrupt on a CPU. I might be inaccurate in details, but the scenario
you described is impossible AFAICT.

>   David
> 
> 
> 

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-04 Thread David Laight
From: Alexander Gordeev
...
> > Even if you do that, you ought to write valid interrupt information
> > into the 4th slot (maybe replicating one of the earlier interrupts).
> > Then, if the device does raise the 'unexpected' interrupt you don't
> > get a write to a random kernel location.
> 
> I might be missing something, but we are talking of MSI address space
> here, aren't we? I am not getting how we could end up with a 'write'
> to a random kernel location when a unclaimed MSI vector sent. We could
> only expect a spurious interrupt at worst, which is handled and reported.
> 
> Anyway, as I described in my reply to Bjorn, this is not a concern IMO.

I'm thinking of the following - which might be MSI-X ?
1) Hardware requests some interrupts and tells the host the BAR (and offset)
   where the 'vectors' should be written.
2) To raise an interrupt the hardware uses the 'vector' as the address
   of a normal PCIe write cycle.

So if the hardware requests 4 interrupts, but the driver (believing it
will only use 3) only write 3 vectors, and then the hardware uses the
4th vector it can write to a random location.

Debugging that would be hard!

David



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-04 Thread Alexander Gordeev
On Thu, Jul 03, 2014 at 09:20:52AM +, David Laight wrote:
> From: Bjorn Helgaas
> > On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote:
> > > There are PCI devices that require a particular value written
> > > to the Multiple Message Enable (MME) register while aligned on
> > > power of 2 boundary value of actually used MSI vectors 'nvec'
> > > is a lesser of that MME value:
> > >
> > >   roundup_pow_of_two(nvec) < 'Multiple Message Enable'
> > >
> > > However the existing pci_enable_msi_block() interface is not
> > > able to configure such devices, since the value written to the
> > > MME register is calculated from the number of requested MSIs
> > > 'nvec':
> > >
> > >   'Multiple Message Enable' = roundup_pow_of_two(nvec)
> > 
> > For MSI, software learns how many vectors a device requests by reading
> > the Multiple Message Capable (MMC) field.  This field is encoded, so a
> > device can only request 1, 2, 4, 8, etc., vectors.  It's impossible
> > for a device to request 3 vectors; it would have to round up that up
> > to a power of two and request 4 vectors.
> > 
> > Software writes similarly encoded values to MME to tell the device how
> > many vectors have been allocated for its use.  For example, it's
> > impossible to tell the device that it can use 3 vectors; the OS has to
> > round that up and tell the device it can use 4 vectors.
> > 
> > So if I understand correctly, the point of this series is to take
> > advantage of device-specific knowledge, e.g., the device requests 4
> > vectors via MMC, but we "know" the device is only capable of using 3.
> > Moreover, we tell the device via MME that 4 vectors are available, but
> > we've only actually set up 3 of them.
> ...
> 
> Even if you do that, you ought to write valid interrupt information
> into the 4th slot (maybe replicating one of the earlier interrupts).
> Then, if the device does raise the 'unexpected' interrupt you don't
> get a write to a random kernel location.

I might be missing something, but we are talking of MSI address space
here, aren't we? I am not getting how we could end up with a 'write'
to a random kernel location when a unclaimed MSI vector sent. We could
only expect a spurious interrupt at worst, which is handled and reported.

Anyway, as I described in my reply to Bjorn, this is not a concern IMO.

> Plausibly something similar should be done when a smaller number of
> interrupts is assigned.
> 
>   David
> 

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] PCI/MSI: Add pci_enable_msi_partial()

2014-07-04 Thread Alexander Gordeev
On Wed, Jul 02, 2014 at 02:22:01PM -0600, Bjorn Helgaas wrote:
> On Tue, Jun 10, 2014 at 03:10:30PM +0200, Alexander Gordeev wrote:
> > There are PCI devices that require a particular value written
> > to the Multiple Message Enable (MME) register while aligned on
> > power of 2 boundary value of actually used MSI vectors 'nvec'
> > is a lesser of that MME value:
> > 
> > roundup_pow_of_two(nvec) < 'Multiple Message Enable'
> > 
> > However the existing pci_enable_msi_block() interface is not
> > able to configure such devices, since the value written to the
> > MME register is calculated from the number of requested MSIs
> > 'nvec':
> > 
> > 'Multiple Message Enable' = roundup_pow_of_two(nvec)
> 
> For MSI, software learns how many vectors a device requests by reading
> the Multiple Message Capable (MMC) field.  This field is encoded, so a
> device can only request 1, 2, 4, 8, etc., vectors.  It's impossible
> for a device to request 3 vectors; it would have to round up that up
> to a power of two and request 4 vectors.
> 
> Software writes similarly encoded values to MME to tell the device how
> many vectors have been allocated for its use.  For example, it's
> impossible to tell the device that it can use 3 vectors; the OS has to
> round that up and tell the device it can use 4 vectors.

Nod.

> So if I understand correctly, the point of this series is to take
> advantage of device-specific knowledge, e.g., the device requests 4
> vectors via MMC, but we "know" the device is only capable of using 3.
> Moreover, we tell the device via MME that 4 vectors are available, but
> we've only actually set up 3 of them.

Exactly.

> This makes me uneasy because we're lying to the device, and the device
> is perfectly within spec to use all 4 of those vectors.  If anything
> changes the number of vectors the device uses (new device revision,
> firmware upgrade, etc.), this is liable to break.

If a device committed via non-MSI specific means to send only 3 vectors
out of 4 available why should we expect it to send 4? The probability of
a firmware sending 4/4 vectors in this case is equal to the probability
of sending 5/4 or 16/4, with the very same reason - a bug in the firmware.
Moreover, even vector 4/4 would be unexpected by the device driver, though
it is perfectly within the spec.

As of new device revision or firmware update etc. - it is just yet another
case of device driver vs the firmware match/mismatch. Not including this
change does not help here at all IMHO.

> Can you quantify the benefit of this?  Can't a device already use
> MSI-X to request exactly the number of vectors it can use?  (I know

A Intel AHCI chipset requires 16 vectors written to MME while advertises
(via AHCI registers) and uses only 6. Even attempt to init 8 vectors results
in device's fallback to 1 (!).

> not all devices support MSI-X, but maybe we should just accept MSI for
> what it is and encourage the HW guys to use MSI-X if MSI isn't good
> enough.)
> 
> > In this case the result written to the MME register may not
> > satisfy the aforementioned PCI devices requirement and therefore
> > the PCI functions will not operate in a desired mode.
> 
> I'm not sure what you mean by "will not operate in a desired mode."
> I thought this was an optimization to save vectors and that these
> changes would be completely invisible to the hardware.

Yes, this should be invisible to the hardware. The above is an attempt
to describe the Intel AHCI weirdness in general terms :) I think it
could be omitted.

> Bjorn
> 
> > This update introduces pci_enable_msi_partial() extension to
> > pci_enable_msi_block() interface that accepts extra 'nvec_mme'
> > argument which is then written to MME register while the value
> > of 'nvec' is still used to setup as many interrupts as requested.
> > 
> > As result of this change, architecture-specific callbacks
> > arch_msi_check_device() and arch_setup_msi_irqs() get an extra
> > 'nvec_mme' parameter as well, but it is ignored for now.
> > Therefore, this update is a placeholder for architectures that
> > wish to support pci_enable_msi_partial() function in the future.
> > 
> > Cc: linux-...@vger.kernel.org
> > Cc: linux-m...@linux-mips.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-s...@vger.kernel.org
> > Cc: x...@kernel.org
> > Cc: xen-de...@lists.xenproject.org
> > Cc: io...@lists.linux-foundation.org
> > Cc: linux-...@vger.kernel.org
> > Cc: linux-...@vger.kernel.org
> > Signed-off-by: Alexander Gordeev 
> > ---
> >  Documentation/PCI/MSI-HOWTO.txt |   36 ++--
> >  arch/mips/pci/msi-octeon.c  |2 +-
> >  arch/powerpc/kernel/msi.c   |4 +-
> >  arch/s390/pci/pci.c |2 +-
> >  arch/x86/kernel/x86_init.c  |2 +-
> >  drivers/pci/msi.c   |   83 
> > ++-
> >  include/linux/msi.h |5 +-
> >  include/linux/pci.h |3 +
> >  8 files changed, 115 insertions(+), 22 deletio

Re: [PATCH v2] KVM: PPC: e500: Emulate power management control SPR

2014-07-04 Thread Alexander Graf


On 04.07.14 10:17, Mihai Caraman wrote:

For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
to enable idle power down for cores and devices by setting up the idle count
period at boot time. With the host already controlling the power management
configuration the guest could simply benefit from it, so emulate guest request
as a general store.

Signed-off-by: Mihai Caraman 


Thanks, applied to kvm-ppc-queue.


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH 0/4] KVM Book3E support for HTW guests

2014-07-04 Thread Alexander Graf


On 03.07.14 16:45, Mihai Caraman wrote:

KVM Book3E support for Hardware Page Tablewalk enabled guests.


It looks reasonably straight forward to me, though I have to admit that 
I find the sind conditions pretty confusing.


Scott, would you mind to have a look at this set too? :)


Thanks a lot!

Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2] KVM: PPC: e500: Emulate power management control SPR

2014-07-04 Thread Mihai Caraman
For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
to enable idle power down for cores and devices by setting up the idle count
period at boot time. With the host already controlling the power management
configuration the guest could simply benefit from it, so emulate guest request
as a general store.

Signed-off-by: Mihai Caraman 
---
v2:
 - treat the operation as a general store

 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/e500_emulate.c | 12 
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 62b2cee..faf2f0e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -584,6 +584,7 @@ struct kvm_vcpu_arch {
u32 mmucfg;
u32 eptcfg;
u32 epr;
+   u32 pwrmgtcr0;
u32 crit_save;
/* guest debug registers*/
struct debug_reg dbg_reg;
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index 002d517..c99c40e 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -250,6 +250,14 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, 
int sprn, ulong spr_va
spr_val);
break;
 
+   case SPRN_PWRMGTCR0:
+   /*
+* Guest relies on host power management configurations
+* Treat the request as a general store
+*/
+   vcpu->arch.pwrmgtcr0 = spr_val;
+   break;
+
/* extra exceptions */
case SPRN_IVOR32:
vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL] = spr_val;
@@ -368,6 +376,10 @@ int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, 
int sprn, ulong *spr_v
*spr_val = vcpu->arch.eptcfg;
break;
 
+   case SPRN_PWRMGTCR0:
+   *spr_val = vcpu->arch.pwrmgtcr0;
+   break;
+
/* extra exceptions */
case SPRN_IVOR32:
*spr_val = vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL];
-- 
1.7.11.7

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH 2/4] KVM: PPC: Book3E: Handle LRAT error exception

2014-07-04 Thread Alexander Graf


On 03.07.14 16:45, Mihai Caraman wrote:

Handle LRAT error exception with support for lrat mapping and invalidation.

Signed-off-by: Mihai Caraman 
---
  arch/powerpc/include/asm/kvm_host.h   |   1 +
  arch/powerpc/include/asm/kvm_ppc.h|   2 +
  arch/powerpc/include/asm/mmu-book3e.h |   3 +
  arch/powerpc/include/asm/reg_booke.h  |  13 
  arch/powerpc/kernel/asm-offsets.c |   1 +
  arch/powerpc/kvm/booke.c  |  40 +++
  arch/powerpc/kvm/bookehv_interrupts.S |   9 ++-
  arch/powerpc/kvm/e500_mmu_host.c  | 125 ++
  arch/powerpc/kvm/e500mc.c |   2 +
  9 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index bb66d8b..7b6b2ec 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -433,6 +433,7 @@ struct kvm_vcpu_arch {
u32 eplc;
u32 epsc;
u32 oldpir;
+   u64 fault_lper;
  #endif
  
  #if defined(CONFIG_BOOKE)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 9c89cdd..2730a29 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -86,6 +86,8 @@ extern gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned 
int gtlb_index,
gva_t eaddr);
  extern void kvmppc_mmu_dtlb_miss(struct kvm_vcpu *vcpu);
  extern void kvmppc_mmu_itlb_miss(struct kvm_vcpu *vcpu);
+extern void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu);
  
  extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm,

  unsigned int id);
diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index 088fd9f..ac6acf7 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,6 +40,8 @@
  
  /* MAS registers bit definitions */
  
+#define MAS0_ATSEL		0x8000

+#define MAS0_ATSEL_SHIFT   31
  #define MAS0_TLBSEL_MASK0x3000
  #define MAS0_TLBSEL_SHIFT   28
  #define MAS0_TLBSEL(x)  (((x) << MAS0_TLBSEL_SHIFT) & 
MAS0_TLBSEL_MASK)
@@ -53,6 +55,7 @@
  #define MAS0_WQ_CLR_RSRV  0x2000
  
  #define MAS1_VALID		0x8000

+#define MAS1_VALID_SHIFT   31
  #define MAS1_IPROT0x4000
  #define MAS1_TID(x)   (((x) << 16) & 0x3FFF)
  #define MAS1_IND  0x2000
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 75bda23..783d617 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -43,6 +43,8 @@
  
  /* Special Purpose Registers (SPRNs)*/

  #define SPRN_DECAR0x036   /* Decrementer Auto Reload Register */
+#define SPRN_LPER  0x038   /* Logical Page Exception Register */
+#define SPRN_LPERU 0x039   /* Logical Page Exception Register Upper */
  #define SPRN_IVPR 0x03F   /* Interrupt Vector Prefix Register */
  #define SPRN_USPRG0   0x100   /* User Special Purpose Register General 0 */
  #define SPRN_SPRG3R   0x103   /* Special Purpose Register General 3 Read */
@@ -358,6 +360,9 @@
  #define ESR_ILK   0x0010  /* Instr. Cache Locking */
  #define ESR_PUO   0x0004  /* Unimplemented Operation 
exception */
  #define ESR_BO0x0002  /* Byte Ordering */
+#define ESR_DATA   0x0400  /* Page Table Data Access */
+#define ESR_TLBI   0x0200  /* Page Table TLB Ineligible */
+#define ESR_PT 0x0100  /* Page Table Translation */
  #define ESR_SPV   0x0080  /* Signal Processing operation 
*/
  
  /* Bit definitions related to the DBCR0. */

@@ -649,6 +654,14 @@
  #define EPC_EPID  0x3fff
  #define EPC_EPID_SHIFT0
  
+/* Bit definitions for LPER */

+#define LPER_ALPN  0x000FF000ULL
+#define LPER_ALPN_SHIFT12
+#define LPER_WIMGE 0x0F80
+#define LPER_WIMGE_SHIFT   7
+#define LPER_LPS   0x000F
+#define LPER_LPS_SHIFT 0
+
  /*
   * The IBM-403 is an even more odd special case, as it is much
   * older than the IBM-405 series.  We put these down here incase someone
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index f5995a9..be6e329 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -713,6 +713,7 @@ int main(void)
DEFINE(VCPU_HOST_MAS4, offsetof(struct kvm_vcpu, arch.host_mas4));
DEFINE(VCPU_HOST_MAS6, offsetof(struct kvm_vcpu, arch.host_mas6));
DEFINE(VCPU_EPLC, offsetof(struct kvm_vcpu, arch.eplc));
+   DEFINE(VCPU_FAULT_LPER, offsetof(struct kvm_vcpu, arch.fault_lper));
  #endif
  
  #ifdef CONFIG_KVM_EXIT_TIMING

diff --git a/arch/powerpc/kvm/booke

Re: [PATCH 1/3] powerpc/kvm: Remove redundant save of SIER AND MMCR2

2014-07-04 Thread Alexander Graf


On 03.07.14 08:12, Joel Stanley wrote:

These two registers are already saved in the block above. Aside from
being unnecessary, by the time we get down to the second save location
r8 no longer contains MMCR2, so we are clobbering the saved value with
PMC5.

Signed-off-by: Joel Stanley 


Reviewed-by: Alexander Graf 

Please CC kvm-ppc@vger and kvm@vger when you send kvm related patches :).

Ben, I think this patch makes sense to go via your tree. Want to take it?


Alex


---
  arch/powerpc/kvm/book3s_hv_interrupts.S | 5 -
  1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S 
b/arch/powerpc/kvm/book3s_hv_interrupts.S
index 8c86422..731be74 100644
--- a/arch/powerpc/kvm/book3s_hv_interrupts.S
+++ b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -127,11 +127,6 @@ BEGIN_FTR_SECTION
stw r10, HSTATE_PMC + 24(r13)
stw r11, HSTATE_PMC + 28(r13)
  END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
-BEGIN_FTR_SECTION
-   mfspr   r9, SPRN_SIER
-   std r8, HSTATE_MMCR + 40(r13)
-   std r9, HSTATE_MMCR + 48(r13)
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
  31:
  
  	/*


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support

2014-07-04 Thread Alexander Graf


On 03.07.14 18:11, mihai.cara...@freescale.com wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:34 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support


On 30.06.14 17:34, Mihai Caraman wrote:

Add ONE_REG support for AltiVec on Book3E.

Signed-off-by: Mihai Caraman 

Any chance we can handle these in generic code?

I expected this request :) Can we let this for a second phase to have
e6500 enabled first?


I don't see the value of duplicating code in e500 specific code only to 
remove and combine it in common code in a follow-up patch after that.



Can you share with us a Book3S setup so I can validate the requested
changes? I already fell anxious touching strange hardware specific
Book3S code without running it.


Until a few weeks ago I had an externally reachable G5 machine that we 
could've used for this. Unfortunately I had to replace the box with 
another one that's not quite as stable. I'll try and see if I can fix or 
replace it soon.



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness

2014-07-04 Thread Alexander Graf


On 04.07.14 09:46, Alexander Graf wrote:


On 03.07.14 17:46, mihai.cara...@freescale.com wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:29 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness


On 30.06.14 17:34, Mihai Caraman wrote:

Increase FPU laziness by calling kvmppc_load_guest_fp() just before
returning to guest instead of each sched in. Without this improvement
an interrupt may also claim floting point corrupting guest state.
How do you handle context switching with this patch applied? During 
most

of the guest's lifetime we never exit kvmppc_vcpu_run(), so when the
guest gets switched out all FPU state gets lost?
No, we had this discussion in ver 1. The FP/VMX/VSX is implemented 
lazy in
the kernel i.e. the unit state is not saved/restored until another 
thread

that once claimed the unit is sched in.

Since FP/VMX/VSX can be activated by the guest independent of the 
host, the
vcpu thread is always using the unit (even if it did not claimed it 
once).


Now, this patch optimize the sched in flow. Instead of checking on 
each vcpu
sched in if the kernel unloaded unit's guest state for another 
competing host

process we do this when we enter the guest.


But we only do it when we enter the guest from QEMU, not when we enter 
the guest after a context switch on cond_resched(), no?


Ah, I missed the call to the load function in handle_exit(). Ok, I think 
that approach should work.



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc/kvm: Remove redundant save of SIER AND MMCR2

2014-07-04 Thread Paul Mackerras
On Thu, Jul 03, 2014 at 03:42:34PM +0930, Joel Stanley wrote:
> These two registers are already saved in the block above. Aside from
> being unnecessary, by the time we get down to the second save location
> r8 no longer contains MMCR2, so we are clobbering the saved value with
> PMC5.
> 
> Signed-off-by: Joel Stanley 

Eek!

Acked-by: Paul Mackerras 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness

2014-07-04 Thread Alexander Graf


On 03.07.14 17:46, mihai.cara...@freescale.com wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:29 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness


On 30.06.14 17:34, Mihai Caraman wrote:

Increase FPU laziness by calling kvmppc_load_guest_fp() just before
returning to guest instead of each sched in. Without this improvement
an interrupt may also claim floting point corrupting guest state.

How do you handle context switching with this patch applied? During most
of the guest's lifetime we never exit kvmppc_vcpu_run(), so when the
guest gets switched out all FPU state gets lost?

No, we had this discussion in ver 1. The FP/VMX/VSX is implemented lazy in
the kernel i.e. the unit state is not saved/restored until another thread
that once claimed the unit is sched in.

Since FP/VMX/VSX can be activated by the guest independent of the host, the
vcpu thread is always using the unit (even if it did not claimed it once).

Now, this patch optimize the sched in flow. Instead of checking on each vcpu
sched in if the kernel unloaded unit's guest state for another competing host
process we do this when we enter the guest.


But we only do it when we enter the guest from QEMU, not when we enter 
the guest after a context switch on cond_resched(), no?



Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2] powerpc/kvm: support to handle sw breakpoint

2014-07-04 Thread Madhavan Srinivasan
On Friday 04 July 2014 12:18 PM, Alexander Graf wrote:
> 
> On 04.07.14 06:34, Madhavan Srinivasan wrote:
>> On Thursday 03 July 2014 05:21 PM, Alexander Graf wrote:
>>> On 01.07.14 10:41, Madhavan Srinivasan wrote:
 This patch adds kernel side support for software breakpoint.
 Design is that, by using an illegal instruction, we trap to hypervisor
 via Emulation Assistance interrupt, where we check for the illegal
 instruction
 and accordingly we return to Host or Guest. Patch also adds support for
 software breakpoint in PR KVM.

 Patch mandates use of "abs" instruction as sw breakpoint instruction
 (primary opcode 31 and extended opcode 360). Based on PowerISA v2.01,
 ABS instruction has been dropped from the architecture and treated an
 illegal instruction.

 Changes v1->v2:

Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM
 can also share it.
Added code to use KVM get one reg infrastructure to get debug
 opcode.
Updated emulate.c to include emulation of debug instruction incase
 of PR_KVM.
Made changes to commit message.

 Signed-off-by: Madhavan Srinivasan 
 ---
arch/powerpc/include/asm/kvm_book3s.h |8 
arch/powerpc/include/asm/ppc-opcode.h |5 +
arch/powerpc/kvm/book3s.c |3 ++-
arch/powerpc/kvm/book3s_hv.c  |9 +
arch/powerpc/kvm/book3s_pr.c  |3 +++
arch/powerpc/kvm/emulate.c|   10 ++
6 files changed, 37 insertions(+), 1 deletion(-)

 diff --git a/arch/powerpc/include/asm/kvm_book3s.h
 b/arch/powerpc/include/asm/kvm_book3s.h
 index f52f656..180d549 100644
 --- a/arch/powerpc/include/asm/kvm_book3s.h
 +++ b/arch/powerpc/include/asm/kvm_book3s.h
 @@ -24,6 +24,14 @@
#include 
#include 
+/*
 + * KVMPPC_INST_BOOK3S_DEBUG is debug Instruction for supporting
 Software Breakpoint.
 + * Instruction mnemonic is ABS, primary opcode is 31 and extended
 opcode is 360.
 + * Based on PowerISA v2.01, ABS instruction has been dropped from the
 architecture
 + * and treated an illegal instruction.
 + */
 +#define KVMPPC_INST_BOOK3S_DEBUG0x7c0002d0
>>> This will still break with LE guests.
>>>
>> I am told to try with all 0s opcode. So rewriting the patch.
> 
> The problem with "all 0s" is that it's reasonably likely to occur on
> real world code. Hence Segher was proposing something like 0x0000
> which should be the same regardless of endianness, but has a certain
> appeal of intentional placement ;).
> 

Ok Sure.

>>
 +
struct kvmppc_bat {
u64 raw;
u32 bepi;
 diff --git a/arch/powerpc/include/asm/ppc-opcode.h
 b/arch/powerpc/include/asm/ppc-opcode.h
 index 3132bb9..3fbb4c1 100644
 --- a/arch/powerpc/include/asm/ppc-opcode.h
 +++ b/arch/powerpc/include/asm/ppc-opcode.h
 @@ -111,6 +111,11 @@
#define OP_31_XOP_LHBRX 790
#define OP_31_XOP_STHBRX918
+/* KVMPPC_INST_BOOK3S_DEBUG -- Software breakpoint Instruction
 + * Instruction mnemonic is ABS, primary opcode is 31 and extended
 opcode is 360.
 + */
 +#define OP_31_XOP_ABS360
 +
#define OP_LWZ  32
#define OP_LD   58
#define OP_LWZU 33
 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
 index c254c27..b40fe5d 100644
 --- a/arch/powerpc/kvm/book3s.c
 +++ b/arch/powerpc/kvm/book3s.c
 @@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu
 *vcpu,
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
{
 -return -EINVAL;
 +vcpu->guest_debug = dbg->control;
 +return 0;
}
  void kvmppc_decrementer_func(unsigned long data)
 diff --git a/arch/powerpc/kvm/book3s_hv.c
 b/arch/powerpc/kvm/book3s_hv.c
 index 7a12edb..402c1ec 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -725,8 +725,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run
 *run, struct kvm_vcpu *vcpu,
 * we don't emulate any guest instructions at this stage.
 */
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
 +if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG ) {
 +run->exit_reason = KVM_EXIT_DEBUG;
 +run->debug.arch.address = kvmppc_get_pc(vcpu);
 +r = RESUME_HOST;
>>> Phew - why can't we just go into the normal instruction emulator for
>>> EMUL_ASSIST?
>>>
>> IIUC, using the emulation_assist_interrupt function (kernel/trap.c) ?
> 
> I was more thinking of kvmppc_emulate_instruction() :).
> 

This makes sense. Can use the same call for pr kvm also. awesome :)

> 
> Alex
> 

__