Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

2018-07-18 Thread David Gibson
On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> From: Sam Bobroff 
> 
> It is not currently possible to create the full number of possible
> VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
> threads per core than it's core stride (or "VSMT mode"). This is
> because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
> even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> 
> To address this, "pack" the VCORE ID and XIVE offsets by using
> knowledge of the way the VCPU IDs will be used when there are less
> guest threads per core than the core stride. The primary thread of
> each core will always be used first. Then, if the guest uses more than
> one thread per core, these secondary threads will sequentially follow
> the primary in each core.
> 
> So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> VCPUs are being spaced apart, so at least half of each core is empty
> and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> into the second half of each core (4..7, in an 8-thread core).
> 
> Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> each core is being left empty, and we can map down into the second and
> third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> 
> Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> threads are being used and 7/8 of the core is empty, allowing use of
> the 1, 3, 5 and 7 thread slots.
> 
> (Strides less than 8 are handled similarly.)
> 
> This allows the VCORE ID or offset to be calculated quickly from the
> VCPU ID or XIVE server numbers, without access to the VCPU structure.
> 
> Signed-off-by: Sam Bobroff 

Reviewed-by: David Gibson 

> ---
> Hello everyone,
> 
> I've completed a trial merge with the guest native-XIVE code and found no
> problems; it's no more difficult than the host side and only requires a few
> calls to xive_vp().
> 
> On that basis, here is v3 (unchanged from v2) as non-RFC and it seems to be
> ready to go.
> 
> Patch set v3:
> Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
> 
> Patch set v2:
> Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
> * Corrected places in kvm/book3s_xive.c where IDs weren't packed.
> * Because kvmppc_pack_vcpu_id() is only called on P9, there is no need to 
> test "emul_smt_mode > 1", so remove it.
> * Re-ordered block_offsets[] to be more ascending.
> * Added more detailed description of the packing algorithm.
> 
> Patch set v1:
> Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
> 
>  arch/powerpc/include/asm/kvm_book3s.h | 44 
> +++
>  arch/powerpc/kvm/book3s_hv.c  | 14 +++
>  arch/powerpc/kvm/book3s_xive.c| 19 +--
>  3 files changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 1f345a0b6ba2..ba4b6e00fca7 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -390,4 +390,48 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu 
> *vcpu);
>  #define SPLIT_HACK_MASK  0xff00
>  #define SPLIT_HACK_OFFS  0xfb00
>  
> +/* Pack a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
> + * [0..KVM_MAX_VCPUS) space, while using knowledge of the guest's core stride
> + * (but not it's actual threading mode, which is not available) to avoid
> + * collisions.
> + *
> + * The implementation leaves VCPU IDs from the range [0..KVM_MAX_VCPUS) 
> (block
> + * 0) unchanged: if the guest is filling each VCORE completely then it will 
> be
> + * using consecutive IDs and it will fill the space without any packing.
> + *
> + * For higher VCPU IDs, the packed ID is based on the VCPU ID modulo
> + * KVM_MAX_VCPUS (effectively masking off the top bits) and then an offset is
> + * added to avoid collisions.
> + *
> + * VCPU IDs in the range [KVM_MAX_VCPUS..(KVM_MAX_VCPUS*2)) (block 1) are 
> only
> + * possible if the guest is leaving at least 1/2 of each VCORE empty, so IDs
> + * can be safely packed into the second half of each VCORE by adding an 
> offset
> + * of (stride / 2).
> + *
> + * Similarly, if VCPU IDs in the range [(KVM_MAX_VCPUS*2)..(KVM_MAX_VCPUS*4))
> + * (blocks 2 and 3) are seen, the guest must be leaving at least 3/4 of each
> + * VCORE empty so packed IDs can be offset by (stride / 4) and (stride * 3 / 
> 4).
> + *
> + * Finally, VCPU IDs from blocks 5..7 will only be seen if the guest is 
> using a
> + * stride of 8 and 1 thread per core so the remaining offsets of 1, 3, 5 and 
> 7
> + * must be free to use.
> + *
> + * (The offsets for each block are stored in block_offsets[], indexed by the
> + * block number if the stride is 8. For cases where the guest's stride is 
> less
> + * than 8, we can re-use the block_offsets array by multiplying the block
> + * 

Re: [PATCH v3] PCI: Data corruption happening due to race condition

2018-07-18 Thread Benjamin Herrenschmidt
On Wed, 2018-07-18 at 18:29 -0500, Bjorn Helgaas wrote:
> [+cc Paul, Michael, linuxppc-dev]
> 

   /...

> > Debugging revealed a race condition between pcie core driver
> > enabling is_added bit(pci_bus_add_device()) and nvme driver
> > reset work-queue enabling is_busmaster bit (by pci_set_master()).
> > As both fields are not handled in atomic manner and that clears
> > is_added bit.
> > 
> > Fix moves device addition is_added bit to separate private flag
> > variable and use different atomic functions to set and retrieve
> > device addition state. As is_added shares different memory
> > location so race condition is avoided.
> 
> Really nice bit of debugging!

Indeed. However I'm not fan of the solution. Shouldn't we instead have
some locking for the content of pci_dev ? I've always been wary of us
having other similar races in there.

As for the powerpc bits, I'm probably the one who wrote them, however,
I'm on vacation this week and right now, no bandwidth to context switch
all that back in :-) So give me a few days and/or ping me next week.

The powerpc PCI code contains a lot of cruft coming from the depth of
history, including rather nasty assumptions. We want to progressively
clean it up, starting with EEH, but it will take time.

Cheers,
Ben.

> > Signed-off-by: Hari Vyas 
> > ---
> >  arch/powerpc/kernel/pci-common.c  |  4 +++-
> >  arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
> >  arch/powerpc/platforms/pseries/setup.c|  3 ++-
> >  drivers/pci/bus.c |  6 +++---
> >  drivers/pci/hotplug/acpiphp_glue.c|  2 +-
> >  drivers/pci/pci.h | 11 +++
> >  drivers/pci/probe.c   |  4 ++--
> >  drivers/pci/remove.c  |  5 +++--
> >  include/linux/pci.h   |  1 -
> >  9 files changed, 27 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/pci-common.c 
> > b/arch/powerpc/kernel/pci-common.c
> > index fe9733f..471aac3 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -42,6 +42,8 @@
> >  #include 
> >  #include 
> >  
> > +#include "../../../drivers/pci/pci.h"
> 
> I see why you need it, but this include path is really ugly.  Outside
> of bootloaders and tools, there are very few instances of includes
> like this that reference a different top-level directory, and I'm not
> very keen about adding more.
> 
> Obviously powerpc is the only arch that needs dev->is_added.  It seems
> to be because "We can only call pcibios_setup_device() after bus setup
> is complete, since some of the platform specific DMA setup code
> depends on it."
> 
> I don't know powerpc, but it does raise the question in my mind of
> whether powerpc could be changed to do the DMA setup more like other
> arches do to remove this ordering dependency and the need to use
> dev->is_added.
> 
> That sounds like a lot of work, but it would have the benefit of
> unifying some code that is probably needlessly arch-specific.
> 
> >  /* hose_spinlock protects accesses to the the phb_bitmap. */
> >  static DEFINE_SPINLOCK(hose_spinlock);
> >  LIST_HEAD(hose_list);
> > @@ -1014,7 +1016,7 @@ void pcibios_setup_bus_devices(struct pci_bus *bus)
> > /* Cardbus can call us to add new devices to a bus, so ignore
> >  * those who are already fully discovered
> >  */
> > -   if (dev->is_added)
> > +   if (pci_dev_is_added(dev))
> > continue;
> >  
> > pcibios_setup_device(dev);
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 5bd0eb6..70b2e1e 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -46,6 +46,7 @@
> >  
> >  #include "powernv.h"
> >  #include "pci.h"
> > +#include "../../../../drivers/pci/pci.h"
> >  
> >  #define PNV_IODA1_M64_NUM  16  /* Number of M64 BARs   */
> >  #define PNV_IODA1_M64_SEGS 8   /* Segments per M64 BAR */
> > @@ -3138,7 +3139,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> > pci_dev *pdev)
> > struct pci_dn *pdn;
> > int mul, total_vfs;
> >  
> > -   if (!pdev->is_physfn || pdev->is_added)
> > +   if (!pdev->is_physfn || pci_dev_is_added(pdev))
> > return;
> >  
> > pdn = pci_get_pdn(pdev);
> > diff --git a/arch/powerpc/platforms/pseries/setup.c 
> > b/arch/powerpc/platforms/pseries/setup.c
> > index 139f0af..8a4868a 100644
> > --- a/arch/powerpc/platforms/pseries/setup.c
> > +++ b/arch/powerpc/platforms/pseries/setup.c
> > @@ -71,6 +71,7 @@
> >  #include 
> >  
> >  #include "pseries.h"
> > +#include "../../../../drivers/pci/pci.h"
> >  
> >  int CMO_PrPSP = -1;
> >  int CMO_SecPSP = -1;
> > @@ -664,7 +665,7 @@ static void pseries_pci_fixup_iov_resources(struct 
> > pci_dev *pdev)
> > const int *indexes;
> > struct device_node *dn = 

[PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

2018-07-18 Thread Sam Bobroff
From: Sam Bobroff 

It is not currently possible to create the full number of possible
VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses less
threads per core than it's core stride (or "VSMT mode"). This is
because the VCORE ID and XIVE offsets to grow beyond KVM_MAX_VCPUS
even though the VCPU ID is less than KVM_MAX_VCPU_ID.

To address this, "pack" the VCORE ID and XIVE offsets by using
knowledge of the way the VCPU IDs will be used when there are less
guest threads per core than the core stride. The primary thread of
each core will always be used first. Then, if the guest uses more than
one thread per core, these secondary threads will sequentially follow
the primary in each core.

So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
VCPUs are being spaced apart, so at least half of each core is empty
and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
into the second half of each core (4..7, in an 8-thread core).

Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
each core is being left empty, and we can map down into the second and
third quarters of each core (2, 3 and 5, 6 in an 8-thread core).

Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
threads are being used and 7/8 of the core is empty, allowing use of
the 1, 3, 5 and 7 thread slots.

(Strides less than 8 are handled similarly.)

This allows the VCORE ID or offset to be calculated quickly from the
VCPU ID or XIVE server numbers, without access to the VCPU structure.

Signed-off-by: Sam Bobroff 
---
Hello everyone,

I've completed a trial merge with the guest native-XIVE code and found no
problems; it's no more difficult than the host side and only requires a few
calls to xive_vp().

On that basis, here is v3 (unchanged from v2) as non-RFC and it seems to be
ready to go.

Patch set v3:
Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

Patch set v2:
Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
* Corrected places in kvm/book3s_xive.c where IDs weren't packed.
* Because kvmppc_pack_vcpu_id() is only called on P9, there is no need to test 
"emul_smt_mode > 1", so remove it.
* Re-ordered block_offsets[] to be more ascending.
* Added more detailed description of the packing algorithm.

Patch set v1:
Patch 1/1: KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space

 arch/powerpc/include/asm/kvm_book3s.h | 44 +++
 arch/powerpc/kvm/book3s_hv.c  | 14 +++
 arch/powerpc/kvm/book3s_xive.c| 19 +--
 3 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 1f345a0b6ba2..ba4b6e00fca7 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -390,4 +390,48 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu 
*vcpu);
 #define SPLIT_HACK_MASK0xff00
 #define SPLIT_HACK_OFFS0xfb00
 
+/* Pack a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
+ * [0..KVM_MAX_VCPUS) space, while using knowledge of the guest's core stride
+ * (but not it's actual threading mode, which is not available) to avoid
+ * collisions.
+ *
+ * The implementation leaves VCPU IDs from the range [0..KVM_MAX_VCPUS) (block
+ * 0) unchanged: if the guest is filling each VCORE completely then it will be
+ * using consecutive IDs and it will fill the space without any packing.
+ *
+ * For higher VCPU IDs, the packed ID is based on the VCPU ID modulo
+ * KVM_MAX_VCPUS (effectively masking off the top bits) and then an offset is
+ * added to avoid collisions.
+ *
+ * VCPU IDs in the range [KVM_MAX_VCPUS..(KVM_MAX_VCPUS*2)) (block 1) are only
+ * possible if the guest is leaving at least 1/2 of each VCORE empty, so IDs
+ * can be safely packed into the second half of each VCORE by adding an offset
+ * of (stride / 2).
+ *
+ * Similarly, if VCPU IDs in the range [(KVM_MAX_VCPUS*2)..(KVM_MAX_VCPUS*4))
+ * (blocks 2 and 3) are seen, the guest must be leaving at least 3/4 of each
+ * VCORE empty so packed IDs can be offset by (stride / 4) and (stride * 3 / 
4).
+ *
+ * Finally, VCPU IDs from blocks 5..7 will only be seen if the guest is using a
+ * stride of 8 and 1 thread per core so the remaining offsets of 1, 3, 5 and 7
+ * must be free to use.
+ *
+ * (The offsets for each block are stored in block_offsets[], indexed by the
+ * block number if the stride is 8. For cases where the guest's stride is less
+ * than 8, we can re-use the block_offsets array by multiplying the block
+ * number by (MAX_SMT_THREADS / stride) to reach the correct entry.)
+ */
+static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
+{
+   const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};
+   int stride = kvm->arch.emul_smt_mode;
+   int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS 

Re: [PATCH v3] PCI: Data corruption happening due to race condition

2018-07-18 Thread Bjorn Helgaas
[+cc Paul, Michael, linuxppc-dev]

On Tue, Jul 03, 2018 at 02:35:41PM +0530, Hari Vyas wrote:
> When a pci device is detected, a variable is_added is set to
> 1 in pci device structure and proc, sys entries are created.
> 
> When a pci device is removed, first is_added is checked for one
> and then device is detached with clearing of proc and sys
> entries and at end, is_added is set to 0.
> 
> is_added and is_busmaster are bit fields in pci_dev structure
> sharing same memory location.
> 
> A strange issue was observed with multiple times removal and
> rescan of a pcie nvme device using sysfs commands where is_added
> flag was observed as zero instead of one while removing device
> and proc,sys entries are not cleared.  This causes issue in
> later device addition with warning message "proc_dir_entry"
> already registered.
> 
> Debugging revealed a race condition between pcie core driver
> enabling is_added bit(pci_bus_add_device()) and nvme driver
> reset work-queue enabling is_busmaster bit (by pci_set_master()).
> As both fields are not handled in atomic manner and that clears
> is_added bit.
> 
> Fix moves device addition is_added bit to separate private flag
> variable and use different atomic functions to set and retrieve
> device addition state. As is_added shares different memory
> location so race condition is avoided.

Really nice bit of debugging!

> Signed-off-by: Hari Vyas 
> ---
>  arch/powerpc/kernel/pci-common.c  |  4 +++-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
>  arch/powerpc/platforms/pseries/setup.c|  3 ++-
>  drivers/pci/bus.c |  6 +++---
>  drivers/pci/hotplug/acpiphp_glue.c|  2 +-
>  drivers/pci/pci.h | 11 +++
>  drivers/pci/probe.c   |  4 ++--
>  drivers/pci/remove.c  |  5 +++--
>  include/linux/pci.h   |  1 -
>  9 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c 
> b/arch/powerpc/kernel/pci-common.c
> index fe9733f..471aac3 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -42,6 +42,8 @@
>  #include 
>  #include 
>  
> +#include "../../../drivers/pci/pci.h"

I see why you need it, but this include path is really ugly.  Outside
of bootloaders and tools, there are very few instances of includes
like this that reference a different top-level directory, and I'm not
very keen about adding more.

Obviously powerpc is the only arch that needs dev->is_added.  It seems
to be because "We can only call pcibios_setup_device() after bus setup
is complete, since some of the platform specific DMA setup code
depends on it."

I don't know powerpc, but it does raise the question in my mind of
whether powerpc could be changed to do the DMA setup more like other
arches do to remove this ordering dependency and the need to use
dev->is_added.

That sounds like a lot of work, but it would have the benefit of
unifying some code that is probably needlessly arch-specific.

>  /* hose_spinlock protects accesses to the the phb_bitmap. */
>  static DEFINE_SPINLOCK(hose_spinlock);
>  LIST_HEAD(hose_list);
> @@ -1014,7 +1016,7 @@ void pcibios_setup_bus_devices(struct pci_bus *bus)
>   /* Cardbus can call us to add new devices to a bus, so ignore
>* those who are already fully discovered
>*/
> - if (dev->is_added)
> + if (pci_dev_is_added(dev))
>   continue;
>  
>   pcibios_setup_device(dev);
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 5bd0eb6..70b2e1e 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -46,6 +46,7 @@
>  
>  #include "powernv.h"
>  #include "pci.h"
> +#include "../../../../drivers/pci/pci.h"
>  
>  #define PNV_IODA1_M64_NUM16  /* Number of M64 BARs   */
>  #define PNV_IODA1_M64_SEGS   8   /* Segments per M64 BAR */
> @@ -3138,7 +3139,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
> pci_dev *pdev)
>   struct pci_dn *pdn;
>   int mul, total_vfs;
>  
> - if (!pdev->is_physfn || pdev->is_added)
> + if (!pdev->is_physfn || pci_dev_is_added(pdev))
>   return;
>  
>   pdn = pci_get_pdn(pdev);
> diff --git a/arch/powerpc/platforms/pseries/setup.c 
> b/arch/powerpc/platforms/pseries/setup.c
> index 139f0af..8a4868a 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -71,6 +71,7 @@
>  #include 
>  
>  #include "pseries.h"
> +#include "../../../../drivers/pci/pci.h"
>  
>  int CMO_PrPSP = -1;
>  int CMO_SecPSP = -1;
> @@ -664,7 +665,7 @@ static void pseries_pci_fixup_iov_resources(struct 
> pci_dev *pdev)
>   const int *indexes;
>   struct device_node *dn = pci_device_to_OF_node(pdev);
>  
> - if (!pdev->is_physfn || 

Re: Improvements for the PS3

2018-07-18 Thread Geoff Levand
Hi Fredrik,

On 07/14/2018 09:49 AM, Fredrik Noring wrote:
> so I added a sleep with
> 
> + msleep(1);
> +
>   return 0;
> 
> et voilà, the screen came alive and the kernel panic was revealed! It seems
> the kernel panics so fast that the PS3 frame buffer is unprepared. This is,
> of course, very unfortunate because trying to debug the boot process without
> a screen or any other means of obtaining console text is quite difficult.

We could add a fixed delay there, but I'd like to avoid waiting that
long on every boot.  Why don't you add a kernel module_param named
something like ps3fb_delay that takes a value in milliseconds and a
default of zero.  See:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/video/fbdev/ps3fb.c?h=v4.17#n260

> I suppose the problem is that it relies on interrupts for ps3fb_sync_image
> to regularly copy the image, hence without them the screen isn't updated to
> show kernel panics, etc. Perhaps one way to fix that is to implement the
> struct fb_tile_ops API, so that the console is synchronously updated? Would
> that be acceptable?

I'm not sure if that would work or not.   Maybe Geert is more familiar with it.

-Geoff


Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required

2018-07-18 Thread Andrew Morton
On Wed, 18 Jul 2018 10:49:44 +0800 Baoquan He  wrote:

> For kexec_file loading, if kexec_buf.top_down is 'true', the memory which
> is used to load kernel/initrd/purgatory is supposed to be allocated from
> top to down. This is what we have been doing all along in the old kexec
> loading interface and the kexec loading is still default setting in some
> distributions. However, the current kexec_file loading interface doesn't
> do like this. The function arch_kexec_walk_mem() it calls ignores checking
> kexec_buf.top_down, but calls walk_system_ram_res() directly to go through
> all resources of System RAM from bottom to up, to try to find memory region
> which can contain the specific kexec buffer, then call 
> locate_mem_hole_callback()
> to allocate memory in that found memory region from top to down. This brings
> confusion especially when KASLR is widely supported , users have to make clear
> why kexec/kdump kernel loading position is different between these two
> interfaces in order to exclude unnecessary noises. Hence these two interfaces
> need be unified on behaviour.

As far as I can tell, the above is the whole reason for the patchset,
yes?  To avoid confusing users.

Is that sufficient?  Can we instead simplify their lives by providing
better documentation or informative printks or better Kconfig text,
etc?

And who *are* the people who are performing this configuration?  Random
system administrators?  Linux distro engineers?  If the latter then
they presumably aren't easily confused!

In other words, I'm trying to understand how much benefit this patchset
will provide to our users as a whole.



Re: [PATCH] MAINTAINERS: Drop inactive Vitaly Bordug's email

2018-07-18 Thread David Miller
From: Krzysztof Kozlowski 
Date: Tue, 17 Jul 2018 18:41:54 +0200

> The Vitaly Bordug's email bounces ("ru.mvista.com: Name or service not
> known") and there was no activity (ack, review, sign) since 2009.
> 
> Cc: Vitaly Bordug 
> Cc: Pantelis Antoniou 
> Cc: "David S. Miller" 
> Signed-off-by: Krzysztof Kozlowski 

Applied.


[PATCH] powerpc/ps3: Set driver coherent_dma_mask

2018-07-18 Thread Geoff Levand
Set the coherent_dma_mask for the PS3 ehci, ohci, and snd devices.

Silences WARN_ON_ONCE messages emitted by the dma_alloc_attrs() routine.

Reported-by: Fredrik Noring 
Signed-off-by: Geoff Levand 
---
Hi Michael,

This just silences some warnings.  Can you take it through the powerpc
tree?

-Geoff


 drivers/usb/host/ehci-ps3.c | 6 --
 drivers/usb/host/ohci-ps3.c | 6 --
 sound/ppc/snd_ps3.c | 5 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/ehci-ps3.c b/drivers/usb/host/ehci-ps3.c
index 8c733492d8fe..454d8c624a3f 100644
--- a/drivers/usb/host/ehci-ps3.c
+++ b/drivers/usb/host/ehci-ps3.c
@@ -86,7 +86,7 @@ static int ps3_ehci_probe(struct ps3_system_bus_device *dev)
int result;
struct usb_hcd *hcd;
unsigned int virq;
-   static u64 dummy_mask = DMA_BIT_MASK(32);
+   static u64 dummy_mask;
 
if (usb_disabled()) {
result = -ENODEV;
@@ -131,7 +131,9 @@ static int ps3_ehci_probe(struct ps3_system_bus_device *dev)
goto fail_irq;
}
 
-   dev->core.dma_mask = _mask; /* FIXME: for improper usb code */
+   dummy_mask = DMA_BIT_MASK(32);
+   dev->core.dma_mask = _mask;
+   dma_set_coherent_mask(>core, dummy_mask);
 
hcd = usb_create_hcd(_ehci_hc_driver, >core, 
dev_name(>core));
 
diff --git a/drivers/usb/host/ohci-ps3.c b/drivers/usb/host/ohci-ps3.c
index 20a23d795adf..395f9d3bc849 100644
--- a/drivers/usb/host/ohci-ps3.c
+++ b/drivers/usb/host/ohci-ps3.c
@@ -69,7 +69,7 @@ static int ps3_ohci_probe(struct ps3_system_bus_device *dev)
int result;
struct usb_hcd *hcd;
unsigned int virq;
-   static u64 dummy_mask = DMA_BIT_MASK(32);
+   static u64 dummy_mask;
 
if (usb_disabled()) {
result = -ENODEV;
@@ -115,7 +115,9 @@ static int ps3_ohci_probe(struct ps3_system_bus_device *dev)
goto fail_irq;
}
 
-   dev->core.dma_mask = _mask; /* FIXME: for improper usb code */
+   dummy_mask = DMA_BIT_MASK(32);
+   dev->core.dma_mask = _mask;
+   dma_set_coherent_mask(>core, dummy_mask);
 
hcd = usb_create_hcd(_ohci_hc_driver, >core, 
dev_name(>core));
 
diff --git a/sound/ppc/snd_ps3.c b/sound/ppc/snd_ps3.c
index 36f34f434ecb..abe031c9d592 100644
--- a/sound/ppc/snd_ps3.c
+++ b/sound/ppc/snd_ps3.c
@@ -930,6 +930,7 @@ static int snd_ps3_driver_probe(struct 
ps3_system_bus_device *dev)
 {
int i, ret;
u64 lpar_addr, lpar_size;
+   static u64 dummy_mask;
 
if (WARN_ON(!firmware_has_feature(FW_FEATURE_PS3_LV1)))
return -ENODEV;
@@ -970,6 +971,10 @@ static int snd_ps3_driver_probe(struct 
ps3_system_bus_device *dev)
goto clean_mmio;
}
 
+   dummy_mask = DMA_BIT_MASK(32);
+   dev->core.dma_mask = _mask;
+   dma_set_coherent_mask(>core, dummy_mask);
+
snd_ps3_audio_set_base_addr(dev->d_region->bus_addr);
 
/* CONFIG_SND_PS3_DEFAULT_START_DELAY */
-- 
2.14.1



Re: [PATCH] mtd: powernv_flash: set of_node in mtd's dev

2018-07-18 Thread Boris Brezillon
On Fri, 13 Jul 2018 10:15:59 +0200
Rafał Miłecki  wrote:

> From: Rafał Miłecki 
> 
> This enables some features implemented in mtd subsystem like reading
> label and partitioning info from DT.
> 
> Reported-by: Timothy Pearson 
> Signed-off-by: Rafał Miłecki 

Applied.

Thanks,

Boris

> ---
>  drivers/mtd/devices/powernv_flash.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mtd/devices/powernv_flash.c 
> b/drivers/mtd/devices/powernv_flash.c
> index c1312b141ae0..33593122e49b 100644
> --- a/drivers/mtd/devices/powernv_flash.c
> +++ b/drivers/mtd/devices/powernv_flash.c
> @@ -223,6 +223,7 @@ static int powernv_flash_set_driver_info(struct device 
> *dev,
>   mtd->_read = powernv_flash_read;
>   mtd->_write = powernv_flash_write;
>   mtd->dev.parent = dev;
> + mtd_set_of_node(mtd, dev->of_node);
>   return 0;
>  }
>  



[PATCH v6 0/2] hwmon/powernv: Add attributes to enable/disable sensors

2018-07-18 Thread Shilpasri G Bhat
This patch series adds new attribute to enable or disable a sensor at
runtime.

Changes from v5:
- Dont store the sensor node and parse the device-tree for each sensor
  to find the sensor-group during init

v5 : https://lkml.org/lkml/2018/7/15/15
v4 : https://lkml.org/lkml/2018/7/6/379
v3 : https://lkml.org/lkml/2018/7/5/476
v2 : https://lkml.org/lkml/2018/7/4/263
v1 : https://lkml.org/lkml/2018/3/22/214

Shilpasri G Bhat (2):
  powernv:opal-sensor-groups: Add support to enable sensor groups
  hwmon: ibmpowernv: Add attributes to enable/disable sensor groups

 Documentation/hwmon/ibmpowernv |  43 +++-
 arch/powerpc/include/asm/opal-api.h|   1 +
 arch/powerpc/include/asm/opal.h|   2 +
 .../powerpc/platforms/powernv/opal-sensor-groups.c |  28 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 drivers/hwmon/ibmpowernv.c | 249 ++---
 6 files changed, 290 insertions(+), 34 deletions(-)

-- 
1.8.3.1



[PATCH v6 1/2] powernv:opal-sensor-groups: Add support to enable sensor groups

2018-07-18 Thread Shilpasri G Bhat
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat 
---
 arch/powerpc/include/asm/opal-api.h|  1 +
 arch/powerpc/include/asm/opal.h|  2 ++
 .../powerpc/platforms/powernv/opal-sensor-groups.c | 28 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 3bab299..56a94a1 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -206,6 +206,7 @@
 #define OPAL_NPU_SPA_CLEAR_CACHE   160
 #define OPAL_NPU_TL_SET161
 #define OPAL_SENSOR_READ_U64   162
+#define OPAL_SENSOR_GROUP_ENABLE   163
 #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   164
 #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   165
 #define OPAL_LAST  165
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910..fc0550e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -292,6 +292,7 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t 
address,
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
+int opal_sensor_group_enable(u32 group_hndl, int token, bool enable);
 
 s64 opal_signal_system_reset(s32 cpu);
 s64 opal_quiesce(u64 shutdown_type, s32 cpu);
@@ -326,6 +327,7 @@ extern int opal_async_wait_response_interruptible(uint64_t 
token,
struct opal_msg *msg);
 extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
 extern int opal_get_sensor_data_u64(u32 sensor_hndl, u64 *sensor_data);
+extern int sensor_group_enable(u32 grp_hndl, bool enable);
 
 struct rtc_time;
 extern time64_t opal_get_boot_time(void);
diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c 
b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
index 541c9ea..f7d04b6 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor-groups.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
@@ -32,6 +32,34 @@ struct sg_attr {
struct sg_attr *sgattrs;
 } *sgs;
 
+int sensor_group_enable(u32 handle, bool enable)
+{
+   struct opal_msg msg;
+   int token, ret;
+
+   token = opal_async_get_token_interruptible();
+   if (token < 0)
+   return token;
+
+   ret = opal_sensor_group_enable(handle, token, enable);
+   if (ret == OPAL_ASYNC_COMPLETION) {
+   ret = opal_async_wait_response(token, );
+   if (ret) {
+   pr_devel("Failed to wait for the async response\n");
+   ret = -EIO;
+   goto out;
+   }
+   ret = opal_error_code(opal_get_async_rc(msg));
+   } else {
+   ret = opal_error_code(ret);
+   }
+
+out:
+   opal_async_release_token(token);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(sensor_group_enable);
+
 static ssize_t sg_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
 {
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a8d9b40..8268a1e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -327,3 +327,4 @@ OPAL_CALL(opal_npu_tl_set,  
OPAL_NPU_TL_SET);
 OPAL_CALL(opal_pci_get_pbcq_tunnel_bar,
OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_pci_set_pbcq_tunnel_bar,
OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
 OPAL_CALL(opal_sensor_read_u64,OPAL_SENSOR_READ_U64);
+OPAL_CALL(opal_sensor_group_enable,OPAL_SENSOR_GROUP_ENABLE);
-- 
1.8.3.1



[PATCH v6 2/2] hwmon: ibmpowernv: Add attributes to enable/disable sensor groups

2018-07-18 Thread Shilpasri G Bhat
On-Chip-Controller(OCC) is an embedded micro-processor in POWER9 chip
which measures various system and chip level sensors. These sensors
comprises of environmental sensors (like power, temperature, current
and voltage) and performance sensors (like utilization, frequency).
All these sensors are copied to main memory at a regular interval of
100ms. OCC provides a way to select a group of sensors that is copied
to the main memory to increase the update frequency of selected sensor
groups. When a sensor-group is disabled, OCC will not copy it to main
memory and those sensors read 0 values.

This patch provides support for enabling/disabling the sensor groups
like power, temperature, current and voltage. This patch adds new
per-senor sysfs attribute to disable and enable them.

Signed-off-by: Shilpasri G Bhat 
---
Changes from v5:
- Dont store the sensor node and parse the device-tree for each sensor
  to find the sensor-group during init

 Documentation/hwmon/ibmpowernv |  43 ++-
 drivers/hwmon/ibmpowernv.c | 249 +++--
 2 files changed, 258 insertions(+), 34 deletions(-)

diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv
index 8826ba2..5646825 100644
--- a/Documentation/hwmon/ibmpowernv
+++ b/Documentation/hwmon/ibmpowernv
@@ -33,9 +33,48 @@ fanX_input   Measured RPM value.
 fanX_min   Threshold RPM for alert generation.
 fanX_fault 0: No fail condition
1: Failing fan
+
 tempX_inputMeasured ambient temperature.
 tempX_max  Threshold ambient temperature for alert generation.
-inX_input  Measured power supply voltage
+tempX_highest  Historical maximum temperature
+tempX_lowest   Historical minimum temperature
+tempX_enable   Enable/disable all temperature sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its temperature sensors.
+   1: Enable
+   0: Disable
+
+inX_input  Measured power supply voltage (millivolt)
 inX_fault  0: No fail condition.
1: Failing power supply.
-power1_input   System power consumption (microWatt)
+inX_highestHistorical maximum voltage
+inX_lowest Historical minimum voltage
+inX_enable Enable/disable all voltage sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its voltage sensors.
+   1: Enable
+   0: Disable
+
+powerX_input   Power consumption (microWatt)
+powerX_input_highest   Historical maximum power
+powerX_input_lowestHistorical minimum power
+powerX_enable  Enable/disable all power sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its power sensors.
+   1: Enable
+   0: Disable
+
+currX_inputMeasured current (milliampere)
+currX_highest  Historical maximum current
+currX_lowest   Historical minimum current
+currX_enable   Enable/disable all current sensors belonging to the
+   sub-group. In POWER9, this attribute corresponds to
+   each OCC. Using this attribute each OCC can be asked to
+   disable/enable all of its current sensors.
+   1: Enable
+   0: Disable
+
+energyX_input  Cumulative energy (microJoule)
diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
index f829dad..167acf3 100644
--- a/drivers/hwmon/ibmpowernv.c
+++ b/drivers/hwmon/ibmpowernv.c
@@ -90,11 +90,20 @@ struct sensor_data {
char label[MAX_LABEL_LEN];
char name[MAX_ATTR_LEN];
struct device_attribute dev_attr;
+   struct sensor_group_data *sgrp_data;
+};
+
+struct sensor_group_data {
+   struct mutex mutex;
+   u32 gid;
+   bool enable;
 };
 
 struct platform_data {
const struct attribute_group *attr_groups[MAX_SENSOR_TYPE + 1];
+   struct sensor_group_data *sgrp_data;
u32 sensors_count; /* Total count of sensors from each group */
+   u32 nr_sensor_groups; /* Total number of sensor groups */
 };
 
 static ssize_t show_sensor(struct device *dev, struct device_attribute 
*devattr,
@@ -105,6 +114,9 @@ static ssize_t show_sensor(struct device *dev, struct 
device_attribute *devattr,
ssize_t ret;
u64 x;
 
+   if (sdata->sgrp_data && 

Re: [PATCH] Mark ams driver as orphaned in MAINTAINERS

2018-07-18 Thread Michael Hanselmann
On 18.07.2018 12:00, Michael Ellerman wrote:
> Michael Hanselmann  writes:
>> I no longer have any hardware with the Apple motion sensor and thus
>> relinquish maintainership of the driver.
> 
> Thanks. I think I'll just remove the whole entry, meaning it will be
> caught under the "LINUX FOR POWER MACINTOSH" entry.
> 
> Unless you object. Full patch below.

Sounds good to me.

Reviewed-by: Michael Hanselmann 



Re: [PATCH v14 22/22] selftests/vm: test correct behavior of pkey-0

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> Ensure pkey-0 is allocated on start.  Ensure pkey-0 can be attached
> dynamically in various modes, without failures.  Ensure pkey-0 can be
> freed and allocated.
> 
> Signed-off-by: Ram Pai 
> ---
>  tools/testing/selftests/vm/protection_keys.c |   66 
> +-
>  1 files changed, 64 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 569faf1..156b449 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -999,6 +999,67 @@ void close_test_fds(void)
>   return *ptr;
>  }
>  
> +void test_pkey_alloc_free_attach_pkey0(int *ptr, u16 pkey)
> +{
> + int i, err;
> + int max_nr_pkey_allocs;
> + int alloced_pkeys[NR_PKEYS];
> + int nr_alloced = 0;
> + int newpkey;
> + long size;
> +
> + assert(pkey_last_malloc_record);
> + size = pkey_last_malloc_record->size;
> + /*
> +  * This is a bit of a hack.  But mprotect() requires
> +  * huge-page-aligned sizes when operating on hugetlbfs.
> +  * So, make sure that we use something that's a multiple
> +  * of a huge page when we can.
> +  */
> + if (size >= HPAGE_SIZE)
> + size = HPAGE_SIZE;
> +
> +
> + /* allocate every possible key and make sure key-0 never got allocated 
> */
> + max_nr_pkey_allocs = NR_PKEYS;
> + for (i = 0; i < max_nr_pkey_allocs; i++) {
> + int new_pkey = alloc_pkey();
> + assert(new_pkey != 0);

Missed these earlier.  This needs to be pkey_assert().  We don't want
these tests to ever _actually_ crash.

> + /* attach key-0 in various modes */
> + err = sys_mprotect_pkey(ptr, size, PROT_READ, 0);
> + pkey_assert(!err);
> + err = sys_mprotect_pkey(ptr, size, PROT_WRITE, 0);
> + pkey_assert(!err);
> + err = sys_mprotect_pkey(ptr, size, PROT_EXEC, 0);
> + pkey_assert(!err);
> + err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE, 0);
> + pkey_assert(!err);
> + err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE|PROT_EXEC, 0);
> + pkey_assert(!err);

This is all fine.

> + /* free key-0 */
> + err = sys_pkey_free(0);
> + pkey_assert(!err);

This part is called out as undefined behavior in the manpage:

>An application should not call pkey_free() on any protection key
>which has been assigned to an address range by pkey_mprotect(2) and
>which is still in use.  The behavior in this case is undefined and
>may result in an error.

I don't think we should be testing for undefined behavior.

> + newpkey = sys_pkey_alloc(0, 0x0);
> + assert(newpkey == 0);
> +}
> +
>  void test_read_of_write_disabled_region(int *ptr, u16 pkey)
>  {
>   int ptr_contents;
> @@ -1144,10 +1205,10 @@ void 
> test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
>  void test_pkey_syscalls_on_non_allocated_pkey(int *ptr, u16 pkey)
>  {
>   int err;
> - int i = get_start_key();
> + int i;
>  
>   /* Note: 0 is the default pkey, so don't mess with it */
> - for (; i < NR_PKEYS; i++) {
> + for (i=1; i < NR_PKEYS; i++) {
>   if (pkey == i)
>   continue;

This seems to be randomly reverting earlier changes.


Re: [PATCH v14 20/22] selftests/vm: testcases must restore pkey-permissions

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> Generally the signal handler restores the state of the pkey register
> before returning. However there are times when the read/write operation
> can legitamely fail without invoking the signal handler.  Eg: A
> sys_read() operaton to a write-protected page should be disallowed.  In
> such a case the state of the pkey register is not restored to its
> original state.  Test cases may not remember to restoring the key
> register state. During cleanup generically restore the key permissions.

This would, indeed be a good thing to do for a well-behaved application.

But, for selftests, why does it matter what state we leave the key in?
Doesn't the test itself need to establish permissions?  Don't we *do*
that at pkey_alloc() anyway?

What problem does this solve?

> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 8a6afdd..ea3cf04 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -1476,8 +1476,13 @@ void run_tests_once(void)
>   pkey_tests[test_nr](ptr, pkey);
>   dprintf1("freeing test memory: %p\n", ptr);
>   free_pkey_malloc(ptr);
> +
> + /* restore the permission on the key after use */
> + pkey_access_allow(pkey);
> + pkey_write_allow(pkey);
>   sys_pkey_free(pkey);
>  
> +
>   dprintf1("pkey_faults: %d\n", pkey_faults);
>   dprintf1("orig_pkey_faults: %d\n", orig_pkey_faults);





Re: [PATCH v14 16/22] selftests/vm: fix an assertion in test_pkey_alloc_exhaust()

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> The maximum number of keys that can be allocated has to
> take into consideration, that some keys are reserved by
> the architecture for   specific   purpose. Hence cannot
> be allocated.

Back to incomplete sentences, I see. :)

How about:

Some pkeys which are valid to the hardware are not available
for application use.  Those can not be allocated.

test_pkey_alloc_exhaust() tries to account for these but
___FILL_IN_WHAT_IT_DID_WRONG.  We fix this by
___FILL_IN_WAY_IT_WAS_FIXED.

> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index d27fa5e..67d841e 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -1171,15 +1171,11 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
>   pkey_assert(i < NR_PKEYS*2);
>  
>   /*
> -  * There are 16 pkeys supported in hardware.  Three are
> -  * allocated by the time we get here:
> -  *   1. The default key (0)
> -  *   2. One possibly consumed by an execute-only mapping.
> -  *   3. One allocated by the test code and passed in via
> -  *  'pkey' to this function.
> -  * Ensure that we can allocate at least another 13 (16-3).
> +  * There are NR_PKEYS pkeys supported in hardware. arch_reserved_keys()
> +  * are reserved. And one key is allocated by the test code and passed
> +  * in via 'pkey' to this function.
>*/
> - pkey_assert(i >= NR_PKEYS-3);
> + pkey_assert(i >= (NR_PKEYS-arch_reserved_keys()-1));
>  
>   for (i = 0; i < nr_allocated_pkeys; i++) {
>   err = sys_pkey_free(allocated_pkeys[i]);

You also killed my nice, shiny, new comment.  You made an attempt to
make up for it two patches ago, but it pales in comparison to mine.  The
fact that I wrote it only a few short week ago makes me very attached to
it, kinda like a new puppy.  I don't want to throw it to the wolves
quite yet.  So, please preserve as much of it as possible, even if it
has to live in the x86 header.

For bonus points, axe this comment in the same patch that you create the
x86 header comment for easier review.


Re: [PATCH v14 15/22] selftests/vm: powerpc implementation to check support for pkey

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> -static inline int cpu_has_pku(void)
> +static inline bool is_pkey_supported(void)
>  {
> - return 1;
> + /*
> +  * No simple way to determine this.
> +  * Lets try allocating a key and see if it succeeds.
> +  */
> + int ret = sys_pkey_alloc(0, 0);
> +
> + if (ret > 0) {
> + sys_pkey_free(ret);
> + return true;
> + }
> + return false;
>  }

This actually works on x86 too.

>  static inline int arch_reserved_keys(void)
> diff --git a/tools/testing/selftests/vm/pkey-x86.h 
> b/tools/testing/selftests/vm/pkey-x86.h
> index f5d0ff2..887acf2 100644
> --- a/tools/testing/selftests/vm/pkey-x86.h
> +++ b/tools/testing/selftests/vm/pkey-x86.h
> @@ -105,7 +105,7 @@ static inline void __cpuid(unsigned int *eax, unsigned 
> int *ebx,
>  #define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */
>  #define X86_FEATURE_OSPKE  (1<<4) /* OS Protection Keys Enable */
>  
> -static inline int cpu_has_pku(void)
> +static inline bool is_pkey_supported(void)
>  {
>   unsigned int eax;
>   unsigned int ebx;
> @@ -118,13 +118,13 @@ static inline int cpu_has_pku(void)
>  
>   if (!(ecx & X86_FEATURE_PKU)) {
>   dprintf2("cpu does not have PKU\n");
> - return 0;
> + return false;
>   }
>   if (!(ecx & X86_FEATURE_OSPKE)) {
>   dprintf2("cpu does not have OSPKE\n");
> - return 0;
> + return false;
>   }
> - return 1;
> + return true;
>  }



>  #define XSTATE_PKEY_BIT  (9)
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 18e1bb7..d27fa5e 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -1389,8 +1389,8 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, 
> u16 pkey)
>   int size = PAGE_SIZE;
>   int sret;
>  
> - if (cpu_has_pku()) {
> - dprintf1("SKIP: %s: no CPU support\n", __func__);
> + if (is_pkey_supported()) {
> + dprintf1("SKIP: %s: no CPU/kernel support\n", __func__);
>   return;
>   }

I actually wanted a kernel-independent check, based entirely on CPUID.
That's specifically why I said "no CPU support".

If you want to do this, please do:

/* powerpc has no enumeration, just assume it has support: */
static inline bool cpu_has_cpu(void) { return true; };

if (cpu_has_pku()) {
dprintf1("SKIP: %s: no CPU support\n", __func__);
return
}

if (kernel_pkey_supported()) {
dprintf1("SKIP: %s: no kernel support\n", __func__);
return;
}




Re: [PATCH v14 14/22] selftests/vm: Introduce generic abstractions

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> Introduce generic abstractions and provide architecture
> specific implementation for the abstractions.

I really wanted to see these two things separated:
1. introduce abstractions
2. introduce ppc implementation

But, I guess most of it is done except for the siginfo stuff.

>  #if defined(__i386__) || defined(__x86_64__) /* arch */
>  #include "pkey-x86.h"
> +#elif defined(__powerpc64__) /* arch */
> +#include "pkey-powerpc.h"
>  #else /* arch */
>  #error Architecture not supported
>  #endif /* arch */
> @@ -186,7 +191,16 @@ static inline int open_hugepage_file(int flag)
>  
>  static inline int get_start_key(void)
>  {
> - return 1;
> + return 0;
> +}

How does this not now break x86?
>  #endif /* _PKEYS_X86_H */
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 304f74f..18e1bb7 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -197,17 +197,18 @@ void dump_mem(void *dumpme, int len_bytes)
>  
>  int pkey_faults;
>  int last_si_pkey = -1;
> +void pkey_access_allow(int pkey);

Please just move the function.

>  void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  {
>   ucontext_t *uctxt = vucontext;
>   int trapno;
>   unsigned long ip;
>   char *fpregs;
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
>   pkey_reg_t *pkey_reg_ptr;
> - u64 siginfo_pkey;
> +#endif /* defined(__i386__) || defined(__x86_64__) */
> + u32 siginfo_pkey;
>   u32 *si_pkey_ptr;
> - int pkey_reg_offset;
> - fpregset_t fpregset;
>  
>   dprint_in_signal = 1;
>   dprintf1("===SIGSEGV\n");
> @@ -217,12 +218,14 @@ void signal_handler(int signum, siginfo_t *si, void 
> *vucontext)
>  
>   trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
>   ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
> - fpregset = uctxt->uc_mcontext.fpregs;
> - fpregs = (void *)fpregset;
> + fpregs = (char *) uctxt->uc_mcontext.fpregs;
>  
>   dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
>   __func__, trapno, ip, si_code_str(si->si_code),
>   si->si_code);
> +
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
> +
>  #ifdef __i386__
>   /*
>* 32-bit has some extra padding so that userspace can tell whether
> @@ -230,20 +233,21 @@ void signal_handler(int signum, siginfo_t *si, void 
> *vucontext)
>* state.  We just assume that it is here.
>*/
>   fpregs += 0x70;
> -#endif
> - pkey_reg_offset = pkey_reg_xstate_offset();
> - pkey_reg_ptr = (void *)([pkey_reg_offset]);
> +#endif /* __i386__ */
>  
> - dprintf1("siginfo: %p\n", si);
> - dprintf1(" fpregs: %p\n", fpregs);
> + pkey_reg_ptr = (void *)([pkey_reg_xstate_offset()]);

There are unnecessary parenthesis here.

Also, why are you bothering to mess with this?  This is inside the x86
#ifdef, right?

>   /*
> -  * If we got a PKEY fault, we *HAVE* to have at least one bit set in
> +  * If we got a key fault, we *HAVE* to have at least one bit set in
>* here.
>*/
>   dprintf1("pkey_reg_xstate_offset: %d\n", pkey_reg_xstate_offset());
>   if (DEBUG_LEVEL > 4)
>   dump_mem(pkey_reg_ptr - 128, 256);
>   pkey_assert(*pkey_reg_ptr);
> +#endif /* defined(__i386__) || defined(__x86_64__) */
> +
> + dprintf1("siginfo: %p\n", si);
> + dprintf1(" fpregs: %p\n", fpregs);
>  
>   if ((si->si_code == SEGV_MAPERR) ||
>   (si->si_code == SEGV_ACCERR) ||
> @@ -252,22 +256,28 @@ void signal_handler(int signum, siginfo_t *si, void 
> *vucontext)
>   exit(4);
>   }
>  
> - si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
> + si_pkey_ptr = siginfo_get_pkey_ptr(si);
>   dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
> - dump_mem((u8 *)si_pkey_ptr - 8, 24);
> + dump_mem(si_pkey_ptr - 8, 24);

You removed the cast here, why?  That changes the pointer math.

Can we merge this as-is.  No, I do not think we can.  If it were _just_
the #ifdefs, we could let it pass, but there are a bunch of rough spots,
not just the #ifdefs.



Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public

2018-07-18 Thread Andy Shevchenko
On Wed, Jul 18, 2018 at 7:36 PM, Andy Shevchenko
 wrote:
> On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He  wrote:
>> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
>> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
>> so that it's shared.

>> + * Returns 0 on success, -ENOTSUPP if child resource is not completely
>> + * contained by 'res', -ECANCELED if no any conflicting entry found.

You also can refer to constants by prefixing them with %, e.g. %-ENOTSUPP.
But this is up to you completely.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public

2018-07-18 Thread Andy Shevchenko
On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He  wrote:
> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> so that it's shared.

Some minor stuff.

> +/**
> + * reparent_resources - reparent resource children of parent that res covers
> + * @parent: parent resource descriptor
> + * @res: resource descriptor desired by caller
> + *
> + * Returns 0 on success, -ENOTSUPP if child resource is not completely
> + * contained by 'res', -ECANCELED if no any conflicting entry found.

'res' -> @res

> + *
> + * Reparent resource children of 'parent' that conflict with 'res'

Ditto + 'parent' -> @parent

> + * under 'res', and make 'res' replace those children.

Ditto.

> + */
> +int reparent_resources(struct resource *parent, struct resource *res)
> +{
> +   struct resource *p, **pp;
> +   struct resource **firstpp = NULL;
> +
> +   for (pp = >child; (p = *pp) != NULL; pp = >sibling) {
> +   if (p->end < res->start)
> +   continue;
> +   if (res->end < p->start)
> +   break;
> +   if (p->start < res->start || p->end > res->end)
> +   return -ENOTSUPP;   /* not completely contained */
> +   if (firstpp == NULL)
> +   firstpp = pp;
> +   }
> +   if (firstpp == NULL)
> +   return -ECANCELED; /* didn't find any conflicting entries? */
> +   res->parent = parent;
> +   res->child = *firstpp;
> +   res->sibling = *pp;
> +   *firstpp = res;
> +   *pp = NULL;
> +   for (p = res->child; p != NULL; p = p->sibling) {
> +   p->parent = res;

> +   pr_debug("PCI: Reparented %s %pR under %s\n",
> +p->name, p, res->name);

Now, PCI is a bit confusing here.

> +   }
> +   return 0;
> +}
> +EXPORT_SYMBOL(reparent_resources);
> +
>  static void __init __reserve_region_with_split(struct resource *root,
> resource_size_t start, resource_size_t end,
> const char *name)
> --
> 2.13.6
>



-- 
With Best Regards,
Andy Shevchenko


Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.

2018-07-18 Thread Mahesh Jagannath Salgaonkar
On 07/17/2018 05:22 PM, Michal Hocko wrote:
> On Tue 17-07-18 16:58:10, Mahesh Jagannath Salgaonkar wrote:
>> On 07/16/2018 01:56 PM, Michal Hocko wrote:
>>> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
 One of the primary issues with Firmware Assisted Dump (fadump) on Power
 is that it needs a large amount of memory to be reserved. This reserved
 memory is used for saving the contents of old crashed kernel's memory 
 before
 fadump capture kernel uses old kernel's memory area to boot. However, This
 reserved memory area stays unused until system crash and isn't available
 for production kernel to use.
>>>
>>> How much memory are we talking about. Regular kernel dump process needs
>>> some reserved memory as well. Why that is not a big problem?
>>
>> We reserve around 5% of total system RAM. On large systems with
>> TeraBytes of memory, this reservation can be quite significant.
>>
>> The regular kernel dump uses the kexec method to boot into capture
>> kernel and it can control the parameters that are being passed to
>> capture kernel. This allows a capability to strip down the parameters
>> that can help lowering down the memory requirement for capture kernel to
>> boot. This allows regular kdump to reserve less memory to start with.
>>
>> Where as fadump depends on power firmware (pHyp) to load the capture
>> kernel after full reset and boots like a regular kernel. It needs same
>> amount of memory to boot as the production kernel. On large systems
>> production kernel needs significant amount of memory to boot. Hence
>> fadump needs to reserve enough memory for capture kernel to boot
>> successfully and execute dump capturing operations. By default fadump
>> reserves 5% of total system RAM and in most cases this has worked
>> flawlessly on variety of system configurations. Optionally,
>> 'crashkernel=X' can also be used to specify more fine-tuned memory size
>> for reservation.
> 
> So why do we even care about fadump when regular kexec provides
> (presumably) same functionality with a smaller memory footprint? Or is
> there any reason why kexec doesn't work well on ppc?

Kexec based kdump is loaded by crashing kernel. When OS crashes, the
system is in an inconsistent state, especially the devices. In some
cases, a rogue DMA or ill-behaving device drivers can cause the kdump
capture to fail.

On power platform, fadump solves these issues by taking help from power
firmware, to fully-reset the system, load the fresh copy of same kernel
to capture the dump with PCI and I/O devices reinitialized, making it
more reliable.

Fadump does full system reset, booting system through the regular boot
options i.e the dump capture kernel is booted in the same fashion and
doesn't have specialized kernel command line option. This implies, we
need to give more memory for the system boot. Since the new kernel boots
from the same memory location as crashed kernel, we reserve 5% of memory
where power firmware moves the crashed kernel's memory content. This
reserved memory is completely removed from the available memory. For
large memory systems like 64TB systems, this account to ~ 3TB, which is
a significant chunk of memory production kernel is deprived of. Hence,
this patch adds an improvement to exiting fadump feature to make the
reserved memory available to system for use, using zone movable.

Thanks,
-Mahesh.

> 
 Instead of setting aside a significant chunk of memory that nobody can use,
 take advantage ZONE_MOVABLE to mark a significant chunk of reserved memory
 as ZONE_MOVABLE, so that the kernel is prevented from using, but
 applications are free to use it.
>>>
>>> Why kernel cannot use that memory while userspace can?
>>
>> fadump needs to reserve memory to be able to save crashing kernel's
>> memory, with help from power firmware, before the capture kernel loads
>> into crashing kernel's memory area. Any contents present in this
>> reserved memory will be over-written. If kernel is allowed to use this
>> memory, then we loose that kernel data and won't be part of captured
>> dump, which could be critical to debug root cause of system crash.
> 
> But then you simply screw user memory sitting there. This might be not
> so critical as the kernel memory but still it sounds like you are
> reducing the usefulness of the dump just because of inherent limitations
> of fadump.
> 
>> Kdump and fadump both uses same infrastructure/tool (makedumpfile) to
>> capture the memory dump. While the tool provides flexibility to
>> determine what needs to be part of the dump and what memory to filter
>> out, all supported distributions defaults to "Capture only kernel data
>> and nothing else". Taking advantage of this default we can at least make
>> the reserved memory available for userspace to use.
>>
>> If someone wants to capture userspace data as well then
>> 'fadump=nonmovable' option can be used where reserved pages won't be
>> marked zone movable.
> 
> Ohh, so you have an 

Re: [PATCH] powerpc/prom_init: remove linux,stdout-package property

2018-07-18 Thread Murilo Opsfelder Araujo
On Wed, Jul 18, 2018 at 07:37:37PM +1000, Michael Ellerman wrote:
> Hi Murilo,
> 
> Thanks for the patch.
> 
> Murilo Opsfelder Araujo  writes:
> > This property was added in 2004 by
> >
> > 
> > https://github.com/mpe/linux-fullhistory/commit/689fe5072fe9a0dec914bfa4fa60aed1e54563e6
> >
> > and the only use of it, which was already inside `#if 0`, was removed a 
> > month
> > later by
> >
> > 
> > https://github.com/mpe/linux-fullhistory/commit/1fbe5a6d90f6cd4ea610737ef488719d1a875de7
> >
> > Fixes: https://github.com/linuxppc/linux/issues/125
> 
> That is going to confuse some scripts that are expecting that to be a
> "Fixes: " tag :)
> 
> The proper tag to use there would be "Link:".
> 
> But, I'd prefer we didn't add github issue links to the history, as I'm
> not sure they won't bit-rot eventually. Not because I'm a anti-Microsoft
> conspiracy person but just because it's a repo I set up and manage and
> there's no long term plan for it necessarily.
> 
> > ---
> >  arch/powerpc/kernel/prom_init.c | 2 --
> >  1 file changed, 2 deletions(-)
> 
> Including the link here would be ideal, as it means it doesn't end up in
> the commit history, but it does end up in the mail archive. So if we
> ever really need to find it, it should be there.
> 
> cheers

Hi, Michael.

Thanks for reviewing it.  I've sent v2 with your suggestions:

https://lkml.kernel.org/r/20180718161544.12134-1-muri...@linux.ibm.com

Cheers

--
Murilo



[PATCH v2] powerpc/prom_init: remove linux,stdout-package property

2018-07-18 Thread Murilo Opsfelder Araujo
This property was added in 2004 and the only use of it, which was already inside
`#if 0`, was removed a month later.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/prom_init.c | 2 --
 1 file changed, 2 deletions(-)

v1..v2:
- Move github.com links to this section so they can still end up in the mail
  archive.
- Remove Fixes: tag from commit message.
- Add Link: tag here.
- Link to v1: 
https://lkml.kernel.org/r/20180712171904.18971-1-muri...@linux.ibm.com

This property was added in 2004 by


https://github.com/mpe/linux-fullhistory/commit/689fe5072fe9a0dec914bfa4fa60aed1e54563e6

and the only use of it, which was already inside `#if 0`, was removed a month
later by


https://github.com/mpe/linux-fullhistory/commit/1fbe5a6d90f6cd4ea610737ef488719d1a875de7

Link: https://github.com/linuxppc/linux/issues/125

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 5425dd3d6a9f..c45fb463c9e5 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2102,8 +2102,6 @@ static void __init prom_init_stdout(void)
stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
if (stdout_node != PROM_ERROR) {
val = cpu_to_be32(stdout_node);
-   prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
-, sizeof(val));
 
/* If it's a display, note it */
memset(type, 0, sizeof(type));
-- 
2.17.1



Re: [PATCH v14 13/22] selftests/vm: generic cleanup

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> cleanup the code to satisfy coding styles.
> 
> cc: Dave Hansen 
> cc: Florian Weimer 
> Signed-off-by: Ram Pai 
> ---
>  tools/testing/selftests/vm/protection_keys.c |   64 +
>  1 files changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index f50cce8..304f74f 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -4,7 +4,7 @@
>   *
>   * There are examples in here of:
>   *  * how to set protection keys on memory
> - *  * how to set/clear bits in pkey registers (the rights register)
> + *  * how to set/clear bits in Protection Key registers (the rights register)

Huh?  Which coding style says that we can't say "pkey"?

>   *  * how to handle SEGV_PKUERR signals and extract pkey-relevant
>   *information from the siginfo
>   *
> @@ -13,13 +13,18 @@
>   *   prefault pages in at malloc, or not
>   *   protect MPX bounds tables with protection keys?
>   *   make sure VMA splitting/merging is working correctly
> - *   OOMs can destroy mm->mmap (see exit_mmap()), so make sure it is immune 
> to pkeys
> - *   look for pkey "leaks" where it is still set on a VMA but "freed" back 
> to the kernel
> - *   do a plain mprotect() to a mprotect_pkey() area and make sure the pkey 
> sticks
> + *   OOMs can destroy mm->mmap (see exit_mmap()),
> + *   so make sure it is immune to pkeys
> + *   look for pkey "leaks" where it is still set on a VMA
> + *but "freed" back to the kernel
> + *   do a plain mprotect() to a mprotect_pkey() area and make
> + *sure the pkey sticks

This makes it work substantially worse.  That's not acceptable, even if
you did move it under 80 columns.

>   * Compile like this:
> - *   gcc  -o protection_keys-O2 -g -std=gnu99 -pthread -Wall 
> protection_keys.c -lrt -ldl -lm
> - *   gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall 
> protection_keys.c -lrt -ldl -lm
> + *   gcc  -o protection_keys-O2 -g -std=gnu99
> + *-pthread -Wall protection_keys.c -lrt -ldl -lm
> + *   gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99
> + *-pthread -Wall protection_keys.c -lrt -ldl -lm
>   */

Why was this on one line?  Because it was easier to copy and paste.
Please leave it on one line, CodingStyle be damned.

>  #define _GNU_SOURCE
>  #include 
> @@ -263,10 +268,12 @@ void signal_handler(int signum, siginfo_t *si, void 
> *vucontext)
>   __read_pkey_reg());
>   dprintf1("pkey from siginfo: %jx\n", siginfo_pkey);
>   *(u64 *)pkey_reg_ptr = 0x;
> - dprintf1("WARNING: set PRKU=0 to allow faulting instruction to 
> continue\n");
> + dprintf1("WARNING: set PKEY_REG=0 to allow faulting instruction "
> + "to continue\n");

It's actually totally OK to let printk strings go over 80 columns.

>   pkey_faults++;
>   dprintf1("==\n");
>   dprint_in_signal = 0;
> + return;
>  }

Now we're just being silly.

>  
>  int wait_all_children(void)
> @@ -384,7 +391,7 @@ void pkey_disable_set(int pkey, int flags)
>  {
>   unsigned long syscall_flags = 0;
>   int ret;
> - int pkey_rights;
> + u32 pkey_rights;

This is not CodingStyle.  Shouldn't this be the pkey_reg_t that you
introduced earlier in the series?

> -int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
> +int sys_pkey_alloc(unsigned long flags, u64 init_val)
>  {

Um, this is actually a 'unsigned long' in the ABI.

Can you go back through this and actually make sure that these are real
coding style cleanups?


Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E

2018-07-18 Thread LEROY Christophe

Diana Madalina Craciun  a écrit :


On 7/17/2018 7:47 PM, LEROY Christophe wrote:

Diana Craciun  a écrit :


The NXP PPC Book3E platforms are not vulnerable to meltdown and
Spectre v4, so make them PPC_BOOK3S_64 specific.

Signed-off-by: Diana Craciun 
---
History:

v2-->v3
- used the existing functions for spectre v1/v2

 arch/powerpc/Kconfig   | 7 ++-
 arch/powerpc/kernel/security.c | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75f..116c953 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -165,7 +165,7 @@ config PPC
select GENERIC_CLOCKEVENTS_BROADCASTif SMP
select GENERIC_CMOS_UPDATE
select GENERIC_CPU_AUTOPROBE
-   select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
+   select GENERIC_CPU_VULNERABILITIES  if PPC_NOSPEC

I don't understand.  You say this patch is to make something specific
to book3s64 specific, and you are creating a new config param that
make things less specific

Christophe


In order to enable the vulnerabilities reporting on NXP socs I need to
enable them for PPC_FSL_BOOK3E. So they will be enabled for both
PPC_FSL_BOOK3E and PPC_BOOK3S_64. This is the reason for adding the
Kconfig. However this will enable: spectre v1/v2 and meltdown. NXP socs
are not vulnerable to meltdown, so I made the meltdown reporting
PPC_BOOK3S_64 specific. I guess I can have the PPC_NOSPEC definition in
a separate patch to be more clear.


Yes you can. Or keep it as a single patch and add the details you gave  
me in the patch description.


Christophe



Diana




select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
select GENERIC_SMP_IDLE_THREAD
@@ -240,6 +240,11 @@ config PPC
# Please keep this list sorted alphabetically.
#

+config PPC_NOSPEC
+bool
+default y
+depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
+
 config GENERIC_CSUM
def_bool n

diff --git a/arch/powerpc/kernel/security.c  
b/arch/powerpc/kernel/security.c

index 3a4e5c3..539c744 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -92,6 +92,7 @@ static __init int barrier_nospec_debugfs_init(void)
 device_initcall(barrier_nospec_debugfs_init);
 #endif /* CONFIG_DEBUG_FS */

+#ifdef CONFIG_PPC_BOOK3S_64
 ssize_t cpu_show_meltdown(struct device *dev, struct
device_attribute *attr, char *buf)
 {
bool thread_priv;
@@ -124,6 +125,7 @@ ssize_t cpu_show_meltdown(struct device *dev,
struct device_attribute *attr, cha

return sprintf(buf, "Vulnerable\n");
 }
+#endif

 ssize_t cpu_show_spectre_v1(struct device *dev, struct
device_attribute *attr, char *buf)
 {
--
2.5.5








Re: [PATCH v14 12/22] selftests/vm: pkey register should match shadow pkey

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> expected_pkey_fault() is comparing the contents of pkey
> register with 0. This may not be true all the time. There
> could be bits set by default by the architecture
> which can never be changed. Hence compare the value against
> shadow pkey register, which is supposed to track the bits
> accurately all throughout

This is getting dangerously close to full sentences that actually
describe the patch.  You forgot a period, but much this is a substantial
improvement over earlier parts of the series.  Thanks for writing this,
seriously.

> cc: Dave Hansen 
> cc: Florian Weimer 
> Signed-off-by: Ram Pai 
> ---
>  tools/testing/selftests/vm/protection_keys.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 2e448e0..f50cce8 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -913,10 +913,10 @@ void expected_pkey_fault(int pkey)
>   pkey_assert(last_si_pkey == pkey);
>  
>   /*
> -  * The signal handler shold have cleared out PKEY register to let the
> +  * The signal handler should have cleared out pkey-register to let the
>* test program continue.  We now have to restore it.
>*/

... while I appreciate the spelling corrections, and I would totally ack
a patch that fixed them in one fell swoop, could we please segregate the
random spelling corrections from code fixes unless you touch those lines
otherwise?

> - if (__read_pkey_reg() != 0)
> + if (__read_pkey_reg() != shadow_pkey_reg)
>   pkey_assert(0);
>  
>   __write_pkey_reg(shadow_pkey_reg);

I know this is a one-line change, but I don't fully understand it.

On x86, if we take a pkey fault, we clear PKRU entirely (via the
on-stack XSAVE state that is restored at sigreturn) which allows the
faulting instruction to resume and execute normally.  That's what this
check is looking for: Did the signal handler clear PKRU?

Now, you're saying that powerpc might not clear it.  That makes sense.

While PKRU's state here is obvious, it isn't patently obvious to me what
shadow_pkey_reg's state is.  In fact, looking at it, I don't see the
signal handler manipulating the shadow.  So, how can this patch work?


Re: [PATCH v14 11/22] selftests/vm: introduce two arch independent abstraction

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> open_hugepage_file() <- opens the huge page file

Folks, a sentence here would be nice:

Different architectures have different huge page sizes and thus
have different sysfs filees to manipulate when allocating huge
pages.

> get_start_key() <--  provides the first non-reserved key.

Does powerpc not start on key 0?  Why do you need this?

> cc: Dave Hansen 
> cc: Florian Weimer 
> Signed-off-by: Ram Pai 
> Signed-off-by: Thiago Jung Bauermann 
> Reviewed-by: Dave Hansen 
> ---
>  tools/testing/selftests/vm/pkey-helpers.h|   10 ++
>  tools/testing/selftests/vm/pkey-x86.h|1 +
>  tools/testing/selftests/vm/protection_keys.c |6 +++---
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
> b/tools/testing/selftests/vm/pkey-helpers.h
> index ada0146..52a1152 100644
> --- a/tools/testing/selftests/vm/pkey-helpers.h
> +++ b/tools/testing/selftests/vm/pkey-helpers.h
> @@ -179,4 +179,14 @@ static inline void __pkey_write_allow(int pkey, int 
> do_allow_write)
>  #define __stringify_1(x...) #x
>  #define __stringify(x...)   __stringify_1(x)
>  
> +static inline int open_hugepage_file(int flag)
> +{
> + return open(HUGEPAGE_FILE, flag);
> +}

open_nr_hugepages_file() if you revise this, please
> +
> +static inline int get_start_key(void)
> +{
> + return 1;
> +}

get_first_user_pkey(), please.

>  #endif /* _PKEYS_HELPER_H */
> diff --git a/tools/testing/selftests/vm/pkey-x86.h 
> b/tools/testing/selftests/vm/pkey-x86.h
> index 2b3780d..d5fa299 100644
> --- a/tools/testing/selftests/vm/pkey-x86.h
> +++ b/tools/testing/selftests/vm/pkey-x86.h
> @@ -48,6 +48,7 @@
>  #define MB   (1<<20)
>  #define pkey_reg_t   u32
>  #define PKEY_REG_FMT "%016x"
> +#define HUGEPAGE_FILE
> "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
>  
>  static inline u32 pkey_bit_position(int pkey)
>  {
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 2565b4c..2e448e0 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -788,7 +788,7 @@ void setup_hugetlbfs(void)
>* Now go make sure that we got the pages and that they
>* are 2M pages.  Someone might have made 1G the default.
>*/
> - fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", 
> O_RDONLY);
> + fd = open_hugepage_file(O_RDONLY);
>   if (fd < 0) {
>   perror("opening sysfs 2M hugetlb config");
>   return;

This is fine, and obviously necessary.

> @@ -1075,10 +1075,10 @@ void 
> test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
>  void test_pkey_syscalls_on_non_allocated_pkey(int *ptr, u16 pkey)
>  {
>   int err;
> - int i;
> + int i = get_start_key();
>  
>   /* Note: 0 is the default pkey, so don't mess with it */
> - for (i = 1; i < NR_PKEYS; i++) {
> + for (; i < NR_PKEYS; i++) {
>   if (pkey == i)
>   continue;

Grumble, grumble, you moved the code away from the comment connected to
it.



Re: [PATCH v14 10/22] selftests/vm: fix alloc_random_pkey() to make it really random

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> alloc_random_pkey() was allocating the same pkey every time.
> Not all pkeys were geting tested. fixed it.

This fixes a real issue but also unnecessarily munges whitespace.  If
you rev these again, please fix the munging.  Otherwise:

Acked-by: Dave Hansen 


Re: [PATCH v14 09/22] selftests/vm: fixed bugs in pkey_disable_clear()

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> instead of clearing the bits, pkey_disable_clear() was setting
> the bits. Fixed it.

Again, I know these are just selftests, but I'm seeing a real lack of
attention to detail with these.  Please at least go to the trouble of
writing actual changelogs with full sentences and actual capitalization.
 I think it's the least you can do if I'm going to spend time reviewing
these.

I'll go one better.  Can you just use this as a changelog?

pkey_disable_clear() does not work.  Instead of clearing bits,
it sets them, probably because it originated as a copy-and-paste
of pkey_disable_set().

This has been dead code up to now because the only callers are
pkey_access/write_allow() and _those_ are currently unused.

> Also fixed a wrong assertion in that function. When bits are
> cleared, the resulting bit value will be less than the original.
> 
> This hasn't been a problem so far because this code isn't currently
> used.
> 
> cc: Dave Hansen 
> cc: Florian Weimer 
> Signed-off-by: Ram Pai 
> ---
>  tools/testing/selftests/vm/protection_keys.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c 
> b/tools/testing/selftests/vm/protection_keys.c
> index 2dd94c3..8fa4f74 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -433,7 +433,7 @@ void pkey_disable_clear(int pkey, int flags)
>   pkey, pkey, pkey_rights);
>   pkey_assert(pkey_rights >= 0);
>  
> - pkey_rights |= flags;
> + pkey_rights &= ~flags;
>
>   ret = hw_pkey_set(pkey, pkey_rights, 0);
>   shadow_pkey_reg &= clear_pkey_flags(pkey, flags);
> @@ -446,7 +446,7 @@ void pkey_disable_clear(int pkey, int flags)
>   dprintf1("%s(%d) pkey_reg: 0x"PKEY_REG_FMT"\n", __func__,
>   pkey, read_pkey_reg());

If you add a better changelog:

Acked-by: Dave Hansen 

>   if (flags)
> - assert(read_pkey_reg() > orig_pkey_reg);
> + assert(read_pkey_reg() <= orig_pkey_reg);
>  }

This really is an orthogonal change that would have been better placed
with the other patch that did the exact same thing.  But I'd be OK with
it here, as long as it has an actual comment.


Re: [PATCH v14 08/22] selftests/vm: fix the wrong assert in pkey_disable_set()

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> If the flag is 0, no bits will be set. Hence we cant expect
> the resulting bitmap to have a higher value than what it
> was earlier.
...
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -415,7 +415,7 @@ void pkey_disable_set(int pkey, int flags)
>   dprintf1("%s(%d) pkey_reg: 0x"PKEY_REG_FMT"\n",
>   __func__, pkey, read_pkey_reg());
>   if (flags)
> - pkey_assert(read_pkey_reg() > orig_pkey_reg);
> + pkey_assert(read_pkey_reg() >= orig_pkey_reg);
>   dprintf1("END<---%s(%d, 0x%x)\n", __func__,
>   pkey, flags);
>  }

I know these are just selftests, but this change makes zero sense
without the context from how powerpc works.  It's also totally
non-obvious from the patch itself what is going on, even though I
specifically called this out in a previous review.

Please add a comment here that either specifically calls out powerpc or
talks about "an architecture that does this ..."



Re: [PATCH v14 07/22] selftests/vm: generic function to handle shadow key register

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> - shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY));
> + shifted_pkey_reg = right_shift_bits(pkey, pkey_reg);
>   dprintf2("%s() shifted_pkey_reg: "PKEY_REG_FMT"\n", __func__,
>   shifted_pkey_reg);
>   masked_pkey_reg = shifted_pkey_reg & mask;

I'm not a fan of how this looks.  This is almost certainly going to get
the argument order mixed up at some point.

Why do we need this?  The description doesn't say.


Re: [PATCH v14 06/22] selftests/vm: typecast the pkey register

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> This is in preparation to accomadate a differing size register
> across architectures.

This is pretty fugly, and reading it again, I wonder whether we should
have just made it 64-bit, at least in all the printk's.  Or even

prink("pk reg: %*llx\n", PKEY_FMT_LEN, pkey_reg);

But, I don't _really_ care in the end.

Acked-by: Dave Hansen 


Re: [PATCH v14 04/22] selftests/vm: move arch-specific definitions to arch-specific header

2018-07-18 Thread Dave Hansen
On 07/17/2018 06:49 AM, Ram Pai wrote:
> In preparation for multi-arch support, move definitions which have
> arch-specific values to x86-specific header.

Acked-by: Dave Hansen 


Re: [PATCH v14 03/22] selftests/vm: move generic definitions to header file

2018-07-18 Thread Dave Hansen
Acked-by: Dave Hansen 


Re: [PATCH v14 01/22] selftests/x86: Move protecton key selftest to arch neutral directory

2018-07-18 Thread Dave Hansen
Acked-by: Dave Hansen 



Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E

2018-07-18 Thread Diana Madalina Craciun
On 7/17/2018 7:47 PM, LEROY Christophe wrote:
> Diana Craciun  a écrit :
>
>> The NXP PPC Book3E platforms are not vulnerable to meltdown and
>> Spectre v4, so make them PPC_BOOK3S_64 specific.
>>
>> Signed-off-by: Diana Craciun 
>> ---
>> History:
>>
>> v2-->v3
>> - used the existing functions for spectre v1/v2
>>
>>  arch/powerpc/Kconfig   | 7 ++-
>>  arch/powerpc/kernel/security.c | 2 ++
>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 9f2b75f..116c953 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -165,7 +165,7 @@ config PPC
>>  select GENERIC_CLOCKEVENTS_BROADCASTif SMP
>>  select GENERIC_CMOS_UPDATE
>>  select GENERIC_CPU_AUTOPROBE
>> -select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
>> +select GENERIC_CPU_VULNERABILITIES  if PPC_NOSPEC
> I don't understand.  You say this patch is to make something specific  
> to book3s64 specific, and you are creating a new config param that  
> make things less specific
>
> Christophe

In order to enable the vulnerabilities reporting on NXP socs I need to
enable them for PPC_FSL_BOOK3E. So they will be enabled for both
PPC_FSL_BOOK3E and PPC_BOOK3S_64. This is the reason for adding the
Kconfig. However this will enable: spectre v1/v2 and meltdown. NXP socs
are not vulnerable to meltdown, so I made the meltdown reporting
PPC_BOOK3S_64 specific. I guess I can have the PPC_NOSPEC definition in
a separate patch to be more clear.

Diana

>
>>  select GENERIC_IRQ_SHOW
>>  select GENERIC_IRQ_SHOW_LEVEL
>>  select GENERIC_SMP_IDLE_THREAD
>> @@ -240,6 +240,11 @@ config PPC
>>  # Please keep this list sorted alphabetically.
>>  #
>>
>> +config PPC_NOSPEC
>> +bool
>> +default y
>> +depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
>> +
>>  config GENERIC_CSUM
>>  def_bool n
>>
>> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
>> index 3a4e5c3..539c744 100644
>> --- a/arch/powerpc/kernel/security.c
>> +++ b/arch/powerpc/kernel/security.c
>> @@ -92,6 +92,7 @@ static __init int barrier_nospec_debugfs_init(void)
>>  device_initcall(barrier_nospec_debugfs_init);
>>  #endif /* CONFIG_DEBUG_FS */
>>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>>  ssize_t cpu_show_meltdown(struct device *dev, struct  
>> device_attribute *attr, char *buf)
>>  {
>>  bool thread_priv;
>> @@ -124,6 +125,7 @@ ssize_t cpu_show_meltdown(struct device *dev,  
>> struct device_attribute *attr, cha
>>
>>  return sprintf(buf, "Vulnerable\n");
>>  }
>> +#endif
>>
>>  ssize_t cpu_show_spectre_v1(struct device *dev, struct  
>> device_attribute *attr, char *buf)
>>  {
>> --
>> 2.5.5
>
>



Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling

2018-07-18 Thread Arnd Bergmann
On Wed, Jul 18, 2018 at 3:49 PM, Finn Thain  wrote:
> On Wed, 18 Jul 2018, Arnd Bergmann wrote:

>>
>> -static long via_read_time(void)
>> +static time64_t via_read_time(void)
>>  {
>> union {
>> __u8 cdata[4];
>> -   long idata;
>> +   __u32 idata;
>> } result, last_result;
>> +   time64_t ret;
>
> ret isn't used.
>
>> int count = 1;
>>
>> via_pram_command(0x81, _result.cdata[3]);
>> @@ -279,12 +280,8 @@ static long via_read_time(void)
>> via_pram_command(0x89, [1]);
>> via_pram_command(0x8D, [0]);
>>
>> -   if (result.idata == last_result.idata) {
>> -   if (result.idata < RTC_OFFSET)
>> -   result.idata += 0x1ull;
>> -
>> -   return result.idata - RTC_OFFSET;
>> -   }
>> +   if (result.idata == last_result.idata)
>> +   return (time64_t(result.idata) - RTC_OFFSET);
>>
>
> Did you mean to write,
>
> return (time64_t)result.idata - RTC_OFFSET;
>
> ?

Right, I should have at least tried to build it again.

  Arnd


Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling

2018-07-18 Thread Finn Thain
On Wed, 18 Jul 2018, Arnd Bergmann wrote:

> Hmm, apparently I forgot to update via_read_time(), that one
> is indeed bogus and now inconsistent with the other functions.
> 
> The change in via_write_time() seems at least consistent wtih what we do 
> elsewhere, and using __u32 makes this code more portable. (yes, I 
> realize that 64-bit powermac doesn't use the VIA RTC, but it feels 
> better to write code portably anyway).
> 

As for portability, I think you just contradicted yourself. But I take 
your point about consistency. So I won't object to adopting __u32.

> I'd suggest we do it like below to make it consistent with the
> rest again, using the 1904..2040 range of dates and no warning
> for invalid dates.
> 
> If you agree, I'll send that as a proper patch.
> 

Geert may instead wish to fixup or revert the patch he has committed 
already...

>Arnd
> 
> diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
> index bf8df47a6d09..8335509969f1 100644
> --- a/arch/m68k/mac/misc.c
> +++ b/arch/m68k/mac/misc.c
> @@ -255,12 +255,13 @@ static void via_write_pram(int offset, __u8 data)
>   * is basically any machine with Mac II-style ADB.
>   */
> 
> -static long via_read_time(void)
> +static time64_t via_read_time(void)
>  {
> union {
> __u8 cdata[4];
> -   long idata;
> +   __u32 idata;
> } result, last_result;
> +   time64_t ret;

ret isn't used.

> int count = 1;
> 
> via_pram_command(0x81, _result.cdata[3]);
> @@ -279,12 +280,8 @@ static long via_read_time(void)
> via_pram_command(0x89, [1]);
> via_pram_command(0x8D, [0]);
> 
> -   if (result.idata == last_result.idata) {
> -   if (result.idata < RTC_OFFSET)
> -   result.idata += 0x1ull;
> -
> -   return result.idata - RTC_OFFSET;
> -   }
> +   if (result.idata == last_result.idata)
> +   return (time64_t(result.idata) - RTC_OFFSET);
> 

Did you mean to write,

return (time64_t)result.idata - RTC_OFFSET;

?

-- 


Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling

2018-07-18 Thread Arnd Bergmann
On Wed, Jul 18, 2018 at 2:02 PM, Finn Thain  wrote:
> On Wed, 18 Jul 2018, Geert Uytterhoeven wrote:
>
>>
>> Thanks for your patch!
>>
>> Applied and queued for v4.19, with the WARN_ON() dropped.
>>
>
> The patch you've committed to your for-v4.19 branch has this hunk:
>
> @@ -269,8 +275,12 @@ static long via_read_time(void)
> via_pram_command(0x89, [1]);
> via_pram_command(0x8D, [0]);
>
> -   if (result.idata == last_result.idata)
> +   if (result.idata == last_result.idata) {
> +   if (result.idata < RTC_OFFSET)
> +   result.idata += 0x1ull;
> +
> return result.idata - RTC_OFFSET;
> +   }
>
> if (++count > 10)
> break;
>
> That looks bogus to me, since result.idata is a long.



> Also, the following hunk seems a bit pointless (?)
>
> @@ -291,11 +301,11 @@ static long via_read_time(void)
>   * is basically any machine with Mac II-style ADB.
>   */
>
> -static void via_write_time(long time)
> +static void via_write_time(time64_t time)
>  {
> union {
> __u8 cdata[4];
> -   long idata;
> +   __u32 idata;
> } data;
> __u8 temp;
>
>
> But if data.idata needs to be changed to __u32 here, why not change the
> same struct member in via_read_time() also?

Hmm, apparently I forgot to update via_read_time(), that one
is indeed bogus and now inconsistent with the other functions.

The change in via_write_time() seems at least consistent wtih
what we do elsewhere, and using __u32 makes this code
more portable. (yes, I realize that 64-bit powermac doesn't
use the VIA RTC, but it feels better to write code portably
anyway).

I'd suggest we do it like below to make it consistent with the
rest again, using the 1904..2040 range of dates and no warning
for invalid dates.

If you agree, I'll send that as a proper patch.

   Arnd

diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
index bf8df47a6d09..8335509969f1 100644
--- a/arch/m68k/mac/misc.c
+++ b/arch/m68k/mac/misc.c
@@ -255,12 +255,13 @@ static void via_write_pram(int offset, __u8 data)
  * is basically any machine with Mac II-style ADB.
  */

-static long via_read_time(void)
+static time64_t via_read_time(void)
 {
union {
__u8 cdata[4];
-   long idata;
+   __u32 idata;
} result, last_result;
+   time64_t ret;
int count = 1;

via_pram_command(0x81, _result.cdata[3]);
@@ -279,12 +280,8 @@ static long via_read_time(void)
via_pram_command(0x89, [1]);
via_pram_command(0x8D, [0]);

-   if (result.idata == last_result.idata) {
-   if (result.idata < RTC_OFFSET)
-   result.idata += 0x1ull;
-
-   return result.idata - RTC_OFFSET;
-   }
+   if (result.idata == last_result.idata)
+   return (time64_t(result.idata) - RTC_OFFSET);

if (++count > 10)
break;


Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling

2018-07-18 Thread Finn Thain
On Wed, 18 Jul 2018, Geert Uytterhoeven wrote:

> 
> Thanks for your patch!
> 
> Applied and queued for v4.19, with the WARN_ON() dropped.
> 

The patch you've committed to your for-v4.19 branch has this hunk:

@@ -269,8 +275,12 @@ static long via_read_time(void)
via_pram_command(0x89, [1]);
via_pram_command(0x8D, [0]);
 
-   if (result.idata == last_result.idata)
+   if (result.idata == last_result.idata) {
+   if (result.idata < RTC_OFFSET)
+   result.idata += 0x1ull;
+
return result.idata - RTC_OFFSET;
+   }
 
if (++count > 10)
break;

That looks bogus to me, since result.idata is a long.

Also, the following hunk seems a bit pointless (?)

@@ -291,11 +301,11 @@ static long via_read_time(void)
  * is basically any machine with Mac II-style ADB.
  */
 
-static void via_write_time(long time)
+static void via_write_time(time64_t time)
 {
union {
__u8 cdata[4];
-   long idata;
+   __u32 idata;
} data;
__u8 temp;
 

But if data.idata needs to be changed to __u32 here, why not change the 
same struct member in via_read_time() also?

-- 


[PATCH 0/3] powerpc/dpaa: use the Cortina PHY driver

2018-07-18 Thread Camelia Groza
The Cortina PHY on T2080RDB and T4240RDB requires the use of the
dedicated Cortina PHY driver. This patch series enables it for
DPAA platforms and uses it on T4240RDB and T2080RDB.

Camelia Groza (3):
  powerpc/configs/dpaa: enable the Cortina PHY driver
  powerpc/dts/fsl: t4240rdb: use the Cortina PHY driver compatible
  powerpc/dts/fsl: t2080rdb: use the Cortina PHY driver compatible

 arch/powerpc/boot/dts/fsl/t2080rdb.dts | 4 ++--
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 8 
 arch/powerpc/configs/dpaa.config   | 1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

--
1.9.1



[PATCH 1/3] powerpc/configs/dpaa: enable the Cortina PHY driver

2018-07-18 Thread Camelia Groza
Cortina PHYs are present on T4240RDB and T2080RDB. Enable the driver
by default.

Signed-off-by: Camelia Groza 
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index 2fe76f5..4ffacaf 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -2,3 +2,4 @@ CONFIG_FSL_DPAA=y
 CONFIG_FSL_PAMU=y
 CONFIG_FSL_FMAN=y
 CONFIG_FSL_DPAA_ETH=y
+CONFIG_CORTINA_PHY=y
-- 
1.9.1



[PATCH 2/3] powerpc/dts/fsl: t4240rdb: use the Cortina PHY driver compatible

2018-07-18 Thread Camelia Groza
The Cortina PHY requires the use of the dedicated Cortina PHY driver
instead of the generic one.

Signed-off-by: Camelia Groza 
---
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts 
b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
index 15eb0a3..a56a705 100644
--- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
@@ -267,22 +267,22 @@
 
mdio@fd000 {
xfiphy1: ethernet-phy@10 {
-   compatible = 
"ethernet-phy-ieee802.3-c45";
+   compatible = "ethernet-phy-id13e5.1002";
reg = <0x10>;
};
 
xfiphy2: ethernet-phy@11 {
-   compatible = 
"ethernet-phy-ieee802.3-c45";
+   compatible = "ethernet-phy-id13e5.1002";
reg = <0x11>;
};
 
xfiphy3: ethernet-phy@13 {
-   compatible = 
"ethernet-phy-ieee802.3-c45";
+   compatible = "ethernet-phy-id13e5.1002";
reg = <0x13>;
};
 
xfiphy4: ethernet-phy@12 {
-   compatible = 
"ethernet-phy-ieee802.3-c45";
+   compatible = "ethernet-phy-id13e5.1002";
reg = <0x12>;
};
};
-- 
1.9.1



[PATCH 3/3] powerpc/dts/fsl: t2080rdb: use the Cortina PHY driver compatible

2018-07-18 Thread Camelia Groza
The Cortina PHY requires the use of the dedicated Cortina PHY driver
instead of the generic one.

Signed-off-by: Camelia Groza 
---
 arch/powerpc/boot/dts/fsl/t2080rdb.dts | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/t2080rdb.dts 
b/arch/powerpc/boot/dts/fsl/t2080rdb.dts
index 836e4c9..55c0210 100644
--- a/arch/powerpc/boot/dts/fsl/t2080rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t2080rdb.dts
@@ -97,12 +97,12 @@
 
mdio@fd000 {
xg_cs4315_phy1: ethernet-phy@c {
-   compatible = "ethernet-phy-ieee802.3-c45";
+   compatible = "ethernet-phy-id13e5.1002";
reg = <0xc>;
};
 
xg_cs4315_phy2: ethernet-phy@d {
-   compatible = "ethernet-phy-ieee802.3-c45";
+   compatible = "ethernet-phy-id13e5.1002";
reg = <0xd>;
};
 
-- 
1.9.1



Re: [PATCH 3/3] [v2] m68k: remove unused set_clock_mmss() helpers

2018-07-18 Thread Geert Uytterhoeven
On Tue, Jun 19, 2018 at 4:04 PM Arnd Bergmann  wrote:
> Commit 397ac99c6cef ("m68k: remove dead timer code") removed set_rtc_mmss()
> because it was unused in 2012. However, this was itself the only user of the
> mach_set_clock_mmss() callback and the many implementations of that callback,
> which are equally unused.
>
> This removes all of those as well.
>
> Acked-by: Greg Ungerer 
> Signed-off-by: Arnd Bergmann 

Thanks, applied and queued for v4.19.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling

2018-07-18 Thread Geert Uytterhoeven
On Tue, Jun 19, 2018 at 4:04 PM Arnd Bergmann  wrote:
> The real-time clock on m68k (and powerpc) mac systems uses an unsigned
> 32-bit value starting in 1904, which overflows in 2040, about two years
> later than everyone else, but this gets wrapped around in the Linux
> code in 2038 already because of the deprecated usage of time_t and/or
> long in the conversion.
>
> Getting rid of the deprecated interfaces makes it work until 2040 as
> documented, and it could be easily extended by reinterpreting
> the resulting time64_t as a positive number. For the moment, I'm
> adding a WARN_ON() that triggers if we encounter a time before 1970
> or after 2040 (the two are indistinguishable).
>
> This brings it in line with the corresponding code that we have on
> powerpc macintosh.
>
> Signed-off-by: Arnd Bergmann 

Thanks for your patch!

Applied and queued for v4.19, with the WARN_ON() dropped.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] Mark ams driver as orphaned in MAINTAINERS

2018-07-18 Thread Michael Ellerman
Michael Hanselmann  writes:

> I no longer have any hardware with the Apple motion sensor and thus
> relinquish maintainership of the driver.

Thanks. I think I'll just remove the whole entry, meaning it will be
caught under the "LINUX FOR POWER MACINTOSH" entry.

Unless you object. Full patch below.

cheers

commit 19fe15338d6935c1a61a24b83dd315970b1ba162
Author: Michael Hanselmann 
AuthorDate: Mon Jan 29 22:40:09 2018 +
Commit: Michael Ellerman 
CommitDate: Wed Jul 18 19:57:42 2018 +1000

MAINTAINERS: Remove the entry for the orphaned ams driver

I no longer have any hardware with the Apple motion sensor and thus
relinquish maintainership of the driver.

Remove the maintainers entry entirely, meaning the code will now fall
under "LINUX FOR POWER MACINTOSH".

Signed-off-by: Michael Hanselmann 
[mpe: Drop the entry entirely, munge change log]
Signed-off-by: Michael Ellerman 

diff --git a/MAINTAINERS b/MAINTAINERS
index 07d1576fc766..50e398a463f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -791,11 +791,6 @@ S: Supported
 F: drivers/net/ethernet/amd/xgbe/
 F: arch/arm64/boot/dts/amd/amd-seattle-xgbe*.dtsi
 
-AMS (Apple Motion Sensor) DRIVER
-M: Michael Hanselmann 
-S: Supported
-F: drivers/macintosh/ams/
-
 ANALOG DEVICES INC AD5686 DRIVER
 M: Stefan Popa 
 L: linux...@vger.kernel.org



Re: [PATCH] powerpc/prom_init: remove linux,stdout-package property

2018-07-18 Thread Michael Ellerman
Hi Murilo,

Thanks for the patch.

Murilo Opsfelder Araujo  writes:
> This property was added in 2004 by
>
> 
> https://github.com/mpe/linux-fullhistory/commit/689fe5072fe9a0dec914bfa4fa60aed1e54563e6
>
> and the only use of it, which was already inside `#if 0`, was removed a month
> later by
>
> 
> https://github.com/mpe/linux-fullhistory/commit/1fbe5a6d90f6cd4ea610737ef488719d1a875de7
>
> Fixes: https://github.com/linuxppc/linux/issues/125

That is going to confuse some scripts that are expecting that to be a
"Fixes: " tag :)

The proper tag to use there would be "Link:".

But, I'd prefer we didn't add github issue links to the history, as I'm
not sure they won't bit-rot eventually. Not because I'm a anti-Microsoft
conspiracy person but just because it's a repo I set up and manage and
there's no long term plan for it necessarily.

> ---
>  arch/powerpc/kernel/prom_init.c | 2 --
>  1 file changed, 2 deletions(-)

Including the link here would be ideal, as it means it doesn't end up in
the commit history, but it does end up in the mail archive. So if we
ever really need to find it, it should be there.

cheers

> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 5425dd3d6a9f..c45fb463c9e5 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -2102,8 +2102,6 @@ static void __init prom_init_stdout(void)
>   stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
>   if (stdout_node != PROM_ERROR) {
>   val = cpu_to_be32(stdout_node);
> - prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
> -  , sizeof(val));
>  
>   /* If it's a display, note it */
>   memset(type, 0, sizeof(type));
> -- 
> 2.17.1


[PATCH v2] powerpc/powernv : Save/Restore SPRG3 on entry/exit from stop.

2018-07-18 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On 64-bit servers, SPRN_SPRG3 and its userspace read-only mirror
SPRN_USPRG3 are used as userspace VDSO write and read registers
respectively.

SPRN_SPRG3 is lost when we enter stop4 and above, and is currently not
restored.  As a result, any read from SPRN_USPRG3 returns zero on an
exit from stop4 and above.

Thus in this situation, on POWER9, any call from sched_getcpu() always
returns zero, as on powerpc, we call __kernel_getcpu() which relies
upon SPRN_USPRG3 to report the CPU and NUMA node information.

Fix this by restoring SPRN_SPRG3 on wake up from a deep stop state
with the sprg_vdso value that is cached in PACA.

Fixes: e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs
for stop4 cpuidle")

Reported-by: Florian Weimer 
Cc:  # 4.14
Cc: Oleg Nesterov 
Cc: Michael Neuling 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Vaidyanathan Srinivasan 
Signed-off-by: Gautham R. Shenoy 
---
Change from v1: 
  Restoring the SPRG3 from paca->sprg_vdso instead of saving
  it separately during stop-entry, as suggested by Mikey.
  
 arch/powerpc/kernel/idle_book3s.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index d85d551..672ead8 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -144,7 +144,9 @@ power9_restore_additional_sprs:
mtspr   SPRN_MMCR1, r4
 
ld  r3, STOP_MMCR2(r13)
+   ld  r4, PACA_SPRG_VDSO(r13)
mtspr   SPRN_MMCR2, r3
+   mtspr   SPRN_SPRG3, r4
blr
 
 /*
-- 
1.8.3.1



Re: [PATCH] ipmi/powernv: Fix spurious warnings at boot

2018-07-18 Thread Jeremy Kerr

Hi William,


Sometimes we have stale messages showing up in the recv queue that are
being processed by the pollers. We don't want to print out warnings for
these messages not having an outstanding request as they can be expected
when doing a kexec from the petitroot environment or from another
running kernel.


OK, makes sense to do. The in_drain state partially shadows the value of
!smi->cur_msg; we may be able to use that instead.

This means that we'll no longer warn when getting an IPMI message after
the first successful send occurs, but do we really care about that
anyway? We could just drop those silently...

Cheers,


Jeremy


Re: [RESEND][PATCH] powerpc/powernv : Save/Restore SPRG3 on entry/exit from stop.

2018-07-18 Thread Gautham R Shenoy
Hello Mikey,

On Wed, Jul 18, 2018 at 09:24:19AM +1000, Michael Neuling wrote:
> 
> > DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> > diff --git a/arch/powerpc/kernel/idle_book3s.S
> > b/arch/powerpc/kernel/idle_book3s.S
> > index d85d551..5069d42 100644
> > --- a/arch/powerpc/kernel/idle_book3s.S
> > +++ b/arch/powerpc/kernel/idle_book3s.S
> > @@ -120,6 +120,9 @@ power9_save_additional_sprs:
> > mfspr   r4, SPRN_MMCR2
> > std r3, STOP_MMCR1(r13)
> > std r4, STOP_MMCR2(r13)
> > +
> > +   mfspr   r3, SPRN_SPRG3
> > +   std r3, STOP_SPRG3(r13)
> 
> We don't need to save it.  Just restore it from paca->sprg_vdso which should
> never change.

Ok. I will respin a patch to restore SPRG3 from paca->sprg_vdso.

> 
> How can we do better at catching these missing SPRGs?

We can go through the list of SPRs from the POWER9 User Manual and
document explicitly why we don't have to save/restore certain SPRs
during the execution of the stop instruction. Does this sound ok ?

(Ref: Table 4-8, Section 4.7.3.4 from the POWER9 User Manual
accessible from
https://openpowerfoundation.org/?resource_lib=power9-processor-users-manual)

> 
> We missed this one and looking at c1b25a17d249 we missed the AMOR a couple of
> months back. I'd rather we had some systematic way of finding the ones we are
> missing, rather than playing wake-a-mole.

I agree, we need something more systematic than the try-catch method
thing we have now.

For deep stop states on POWER9, we looked at the list of SPRs that
were being lost and restored during winkle on POWER8. The additional
SPRs that we took care of were the ones related to the Radix and IMC.

Now since winkle was used only in the context of CPU-Hotplug, the
CPU-online code would reinit some of the SPRs such as SPRG3, which is
why we didn't see this problem on POWER8. So, that is one obvious
place to audit.

AMOR was a bad miss. It was being restored in POWER8 as part of the
subcore restore code, so like RPR, it should have been restored along
with the other per-core SPRs.

> 
> Mikey 
> 
> > blr
> >  
> >  power9_restore_additional_sprs:
> > @@ -144,7 +147,9 @@ power9_restore_additional_sprs:
> > mtspr   SPRN_MMCR1, r4
> >  
> > ld  r3, STOP_MMCR2(r13)
> > +   ld  r4, STOP_SPRG3(r13)
> > mtspr   SPRN_MMCR2, r3
> > +   mtspr   SPRN_SPRG3, r4
> > blr
> >  
> >  /*



[PATCH 2/2] powerpc/mm/book3s/radix: Add mapping statistics

2018-07-18 Thread Aneesh Kumar K.V
Add statistics that show how memory is mapped within the kernel linear mapping.
This is similar to commit 37cd944c8d8f ("s390/pgtable: add mapping statistics")

We don't do this with Hash translation mode. Hash uses one size 
(mmu_linear_psize)
to map the kernel linear mapping and we print the linear psize during boot as
below.

"Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24"

A sample output looks like:

DirectMap4k:   0 kB
DirectMap64k:   18432 kB
DirectMap2M: 1030144 kB
DirectMap1G:11534336 kB

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 13 
 arch/powerpc/include/asm/book3s/64/radix.h   |  3 +++
 arch/powerpc/mm/pgtable-book3s64.c   | 22 
 arch/powerpc/mm/pgtable-radix.c  | 13 +++-
 4 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 01ee40f11f3a..1d2d69ae1bd2 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -208,4 +208,17 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t table,
 
 #define check_pgt_cache()  do { } while (0)
 
+extern atomic_long_t direct_pages_count[MMU_PAGE_COUNT];
+static inline void update_page_count(unsigned long mapping_shift, long count)
+{
+   int level;
+
+   if (IS_ENABLED(CONFIG_PROC_FS)) {
+   level = shift_to_mmu_psize(mapping_shift);
+   if (level < 0)
+   return;
+   atomic_long_add(count, _pages_count[level]);
+   }
+}
+
 #endif /* _ASM_POWERPC_BOOK3S_64_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 3ab3f7aef022..56a6e3f5f7c1 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -30,6 +30,9 @@
 #define RADIX_PUD_BAD_BITS 0x60e0UL
 #define RADIX_PGD_BAD_BITS 0x60e0UL
 
+#define RADIX_PMD_SHIFT(PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
+#define RADIX_PUD_SHIFT(RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
+#define RADIX_PGD_SHIFT(RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
 /*
  * Size of EA range mapped by our pagetables.
  */
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 4afbfbb64bfd..5d2328ef7958 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -450,3 +450,25 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, 
int index)
return pgtable_free(table, index);
 }
 #endif
+
+#ifdef CONFIG_PROC_FS
+atomic_long_t direct_pages_count[MMU_PAGE_COUNT];
+
+void arch_report_meminfo(struct seq_file *m)
+{
+   /*
+* Hash maps the memory with one size mmu_linear_psize.
+* So don't bother to print these on hash
+*/
+   if (!radix_enabled())
+   return;
+   seq_printf(m, "DirectMap4k:%8lu kB\n",
+  atomic_long_read(_pages_count[MMU_PAGE_4K]) << 2);
+   seq_printf(m, "DirectMap64k:%8lu kB\n",
+  atomic_long_read(_pages_count[MMU_PAGE_64K]) << 6);
+   seq_printf(m, "DirectMap2M:%8lu kB\n",
+  atomic_long_read(_pages_count[MMU_PAGE_2M]) << 11);
+   seq_printf(m, "DirectMap1G:%8lu kB\n",
+  atomic_long_read(_pages_count[MMU_PAGE_1G]) << 20);
+}
+#endif /* CONFIG_PROC_FS */
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index d9819e573103..5c5ca58e7d1f 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -259,6 +259,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 unsigned long end,
 int nid)
 {
+   int mapping_shift;
unsigned long vaddr, addr, mapping_size = 0;
pgprot_t prot;
 #ifdef CONFIG_STRICT_KERNEL_RWX
@@ -277,18 +278,19 @@ static int __meminit create_physical_mapping(unsigned 
long start,
 
if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
mmu_psize_defs[MMU_PAGE_1G].shift)
-   mapping_size = PUD_SIZE;
+   mapping_shift = RADIX_PUD_SHIFT;
else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
 mmu_psize_defs[MMU_PAGE_2M].shift)
-   mapping_size = PMD_SIZE;
+   mapping_shift = RADIX_PMD_SHIFT;
else
-   mapping_size = PAGE_SIZE;
+   mapping_shift = PAGE_SHIFT;
 
-   if (split_text_mapping && (mapping_size != PAGE_SIZE) &&
+   if (split_text_mapping && (mapping_shift != PAGE_SHIFT) &&

[PATCH 1/2] powerpc/mm/book3s/radix: Drop the unnecessary retry when creating linear mapping

2018-07-18 Thread Aneesh Kumar K.V
Checking for mapping_size != PAGE_SIZE should make sure we handle the
split_text_mapping correctly. Drop the retry loop.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-radix.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index bba168d02235..d9819e573103 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -261,7 +261,6 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 {
unsigned long vaddr, addr, mapping_size = 0;
pgprot_t prot;
-   unsigned long max_mapping_size;
 #ifdef CONFIG_STRICT_KERNEL_RWX
int split_text_mapping = 1;
 #else
@@ -275,12 +274,9 @@ static int __meminit create_physical_mapping(unsigned long 
start,
 
gap = end - addr;
previous_size = mapping_size;
-   max_mapping_size = PUD_SIZE;
 
-retry:
if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE &&
-   mmu_psize_defs[MMU_PAGE_1G].shift &&
-   PUD_SIZE <= max_mapping_size)
+   mmu_psize_defs[MMU_PAGE_1G].shift)
mapping_size = PUD_SIZE;
else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE &&
 mmu_psize_defs[MMU_PAGE_2M].shift)
@@ -288,14 +284,7 @@ static int __meminit create_physical_mapping(unsigned long 
start,
else
mapping_size = PAGE_SIZE;
 
-   if (split_text_mapping && (mapping_size == PUD_SIZE) &&
-   (addr <= __pa_symbol(__init_begin)) &&
-   (addr + mapping_size) >= __pa_symbol(_stext)) {
-   max_mapping_size = PMD_SIZE;
-   goto retry;
-   }
-
-   if (split_text_mapping && (mapping_size == PMD_SIZE) &&
+   if (split_text_mapping && (mapping_size != PAGE_SIZE) &&
(addr <= __pa_symbol(__init_begin)) &&
(addr + mapping_size) >= __pa_symbol(_stext))
mapping_size = PAGE_SIZE;
-- 
2.17.1



Re: [PATCH v5 2/2] hwmon: ibmpowernv: Add attributes to enable/disable sensor groups

2018-07-18 Thread Shilpasri G Bhat
Hi,

On 07/17/2018 07:17 PM, Guenter Roeck wrote:
> On 07/14/2018 11:54 PM, Shilpasri G Bhat wrote:
>> On-Chip-Controller(OCC) is an embedded micro-processor in POWER9 chip
>> which measures various system and chip level sensors. These sensors
>> comprises of environmental sensors (like power, temperature, current
>> and voltage) and performance sensors (like utilization, frequency).
>> All these sensors are copied to main memory at a regular interval of
>> 100ms. OCC provides a way to select a group of sensors that is copied
>> to the main memory to increase the update frequency of selected sensor
>> groups. When a sensor-group is disabled, OCC will not copy it to main
>> memory and those sensors read 0 values.
>>
>> This patch provides support for enabling/disabling the sensor groups
>> like power, temperature, current and voltage. This patch adds new
>> per-senor sysfs attribute to disable and enable them.
>>
>> Signed-off-by: Shilpasri G Bhat 
>> ---
>> Changes from v4:
>> - As per Mpe's suggestion store device_node instead of phandles and
>>clean it after init
>> - s/sg_data/sgrp_data
>>
>>   Documentation/hwmon/ibmpowernv |  43 ++-
>>   drivers/hwmon/ibmpowernv.c | 256 
>> +++--
>>   2 files changed, 265 insertions(+), 34 deletions(-)
>>
>> diff --git a/Documentation/hwmon/ibmpowernv b/Documentation/hwmon/ibmpowernv
>> index 8826ba2..5646825 100644
>> --- a/Documentation/hwmon/ibmpowernv
>> +++ b/Documentation/hwmon/ibmpowernv
>> @@ -33,9 +33,48 @@ fanX_inputMeasured RPM value.
>>   fanX_minThreshold RPM for alert generation.
>>   fanX_fault0: No fail condition
>>   1: Failing fan
>> +
>>   tempX_inputMeasured ambient temperature.
>>   tempX_maxThreshold ambient temperature for alert generation.
>> -inX_inputMeasured power supply voltage
>> +tempX_highestHistorical maximum temperature
>> +tempX_lowestHistorical minimum temperature
>> +tempX_enableEnable/disable all temperature sensors belonging to the
>> +sub-group. In POWER9, this attribute corresponds to
>> +each OCC. Using this attribute each OCC can be asked to
>> +disable/enable all of its temperature sensors.
>> +1: Enable
>> +0: Disable
>> +
>> +inX_inputMeasured power supply voltage (millivolt)
>>   inX_fault0: No fail condition.
>>   1: Failing power supply.
>> -power1_inputSystem power consumption (microWatt)
>> +inX_highestHistorical maximum voltage
>> +inX_lowestHistorical minimum voltage
>> +inX_enableEnable/disable all voltage sensors belonging to the
>> +sub-group. In POWER9, this attribute corresponds to
>> +each OCC. Using this attribute each OCC can be asked to
>> +disable/enable all of its voltage sensors.
>> +1: Enable
>> +0: Disable
>> +
>> +powerX_inputPower consumption (microWatt)
>> +powerX_input_highestHistorical maximum power
>> +powerX_input_lowestHistorical minimum power
>> +powerX_enableEnable/disable all power sensors belonging to the
>> +sub-group. In POWER9, this attribute corresponds to
>> +each OCC. Using this attribute each OCC can be asked to
>> +disable/enable all of its power sensors.
>> +1: Enable
>> +0: Disable
>> +
>> +currX_inputMeasured current (milliampere)
>> +currX_highestHistorical maximum current
>> +currX_lowestHistorical minimum current
>> +currX_enableEnable/disable all current sensors belonging to the
>> +sub-group. In POWER9, this attribute corresponds to
>> +each OCC. Using this attribute each OCC can be asked to
>> +disable/enable all of its current sensors.
>> +1: Enable
>> +0: Disable
>> +
>> +energyX_inputCumulative energy (microJoule)
>> diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
>> index f829dad..a509b9b 100644
>> --- a/drivers/hwmon/ibmpowernv.c
>> +++ b/drivers/hwmon/ibmpowernv.c
>> @@ -90,11 +90,23 @@ struct sensor_data {
>>   char label[MAX_LABEL_LEN];
>>   char name[MAX_ATTR_LEN];
>>   struct device_attribute dev_attr;
>> +struct sensor_group_data *sgrp_data;
>> +};
>> +
>> +struct sensor_group_data {
>> +struct mutex mutex;
>> +struct device_node **of_nodes;
>> +u32 gid;
>> +u32 nr_nodes;
>> +enum sensors type;
>> +bool enable;
>>   };
>> struct platform_data {
>>   const struct attribute_group *attr_groups[MAX_SENSOR_TYPE + 1];
>> +struct sensor_group_data *sgrp_data;
>>   u32 sensors_count; /* Total count of sensors from each group */
>> +u32 nr_sensor_groups; /* Total number of sensor groups */
>>   };
>> static ssize_t show_sensor(struct device *dev, struct device_attribute
>> *devattr,
>> @@ -105,6 +117,9 

Re: [PATCH 0/6] lib/crc32: treewide: Use existing define with polynomial

2018-07-18 Thread Krzysztof Kozlowski
On 18 July 2018 at 02:12, Eric Biggers  wrote:
> Hi Krzysztof,
>
> On Tue, Jul 17, 2018 at 06:05:35PM +0200, Krzysztof Kozlowski wrote:
>> Hi,
>>
>> Kernel defines same polynomial for CRC-32 in few places.
>> This is unnecessary duplication of the same value. Also this might
>> be error-prone for future code - every driver will define the
>> polynomial again.
>>
>> This is an attempt to unify definition of polynomial.  Few obvious
>> hard-coded locations are fixed with define.
>>
>> All series depend on each 1/6 and 2/6.
>>
>> This could be merged in two different merge windows (1st lib/crc and then
>> the rest) or taken through one tree.
>>
>> It would be nice to get some testing. Only generic lib/crc, bunzip, xz_crc32
>> and Freescale's Ethernet driver were tested on HW.  Rest got just different
>> builds.
>>
>> Best regards,
>> Krzysztof
>>
>> Krzysztof Kozlowski (6):
>>   lib/crc: Move polynomial definition to separate header
>>   lib/crc: Use consistent naming for CRC-32 polynomials
>>   crypto: stm32_crc32 - Use existing define with polynomial
>>   net: ethernet: Use existing define with polynomial
>>   staging: rtl: Use existing define with polynomial
>>   lib: Use existing define with polynomial
>>
>>  drivers/crypto/stm32/stm32_crc32.c   | 11 ---
>>  drivers/net/ethernet/amd/xgbe/xgbe-dev.c |  4 ++--
>>  drivers/net/ethernet/apple/bmac.c|  8 ++--
>>  drivers/net/ethernet/broadcom/tg3.c  |  3 ++-
>>  drivers/net/ethernet/freescale/fec_main.c|  4 ++--
>>  drivers/net/ethernet/freescale/fs_enet/fec.h |  3 ---
>>  drivers/net/ethernet/freescale/fs_enet/mac-fec.c |  3 ++-
>>  drivers/net/ethernet/micrel/ks8851_mll.c |  3 ++-
>>  drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c|  4 ++--
>>  drivers/staging/rtl8712/rtl871x_security.c   |  5 ++---
>>  drivers/staging/rtl8723bs/core/rtw_security.c|  5 ++---
>>  include/linux/crc32poly.h| 20 
>>  lib/crc32.c  | 11 ++-
>>  lib/crc32defs.h  | 14 --
>>  lib/decompress_bunzip2.c |  3 ++-
>>  lib/gen_crc32table.c |  5 +++--
>>  lib/xz/xz_crc32.c|  3 ++-
>>  17 files changed, 55 insertions(+), 54 deletions(-)
>>  create mode 100644 include/linux/crc32poly.h
>>
>
> Did you check whether any of these users can be converted to use the CRC
> implementations in lib/, so they wouldn't need the polynomial definition
> themselves?

I did not check but that's interesting point... The Ethernet drivers
(xgbe, tg3, fec, ks8851, dwc-xlgmac) look like could be converted to
generic implementation. The apple/bmac looks weird. The rtl WiFi
drivers in long term can be converted to use generic lib80211 for
encryption (see commit 0d4876f4e977 ("staging:r8188eu: Use lib80211 to
encrypt (TKIP) tx frames")) but that is much bigger task. The
remaining use the polynomials in different aspect:
1. XZ and BUNZIP use it to create CRC tables - probably generic
gen_crc32table.c could be used,
2. stm32_crc32.c uses it to initialize HW CRC accelerator.

I can work on Freescale FEC, xz and bunzip code because these I can
test but I would prefer to do it as follow up.

Best regards,
Krzysztof


Re: [PATCH] KVM: PPC: Book3S HV: fix constant size warning

2018-07-18 Thread Paul Mackerras
On Sat, Jul 07, 2018 at 11:07:25AM +0200, Nicholas Mc Guire wrote:
>  The constants are 64bit but not explicitly declared UL resulting
> in sparse warnings. Fixed by declaring the constants UL.
> 
> Signed-off-by: Nicholas Mc Guire 

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH] KVM: PPC: Book3S HV: add of_node_put() in success path

2018-07-18 Thread Paul Mackerras
On Sat, Jul 07, 2018 at 08:53:07AM +0200, Nicholas Mc Guire wrote:
>  The call to of_find_compatible_node() is returning a pointer with
> incremented refcount so it must be explicitly decremented after the
> last use. As here it is only being used for checking of node presence
> but the result is not actually used in the success path it can be
> dropped immediately.
> 
> Signed-off-by: Nicholas Mc Guire 
> Fixes: commit f725758b899f ("KVM: PPC: Book3S HV: Use OPAL XICS emulation on 
> POWER9")

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH v2] KVM: PPC: remove mmio_vsx_tx_sx_enabled in KVM MMIO emulation

2018-07-18 Thread Paul Mackerras
On Mon, May 28, 2018 at 09:48:26AM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> Originally PPC KVM MMIO emulation uses only 0~31#(5 bits) for VSR
> reg number, and use mmio_vsx_tx_sx_enabled field together for
> 0~63# VSR regs.
> 
> Currently PPC KVM MMIO emulation is reimplemented with analyse_instr()
> assistence. analyse_instr() returns 0~63 for VSR register number, so
> it is not necessary to use additional mmio_vsx_tx_sx_enabled field
> any more.
> 
> This patch extends related reg bits(expand io_gpr to u16 from u8
> and use 6 bits for VSR reg#), so that mmio_vsx_tx_sx_enabled can
> be removed.
> 
> v1 -> v2 change:
> rework the commit message to remove "PR KVM" specific word.
> 
> Signed-off-by: Simon Guo 

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH kernel] KVM: PPC: Fix hardware and emulated TCE tables matching

2018-07-18 Thread Paul Mackerras
On Wed, Jun 20, 2018 at 06:42:58PM +1000, Alexey Kardashevskiy wrote:
> When attaching a hardware table to LIOBN in KVM, we match table parameters
> such as page size, table offset and table size. However the tables are
> created via very different paths - VFIO and KVM - and the VFIO path goes
> through the platform code which has minimum TCE page size requirement
> (which is 4K but since we allocate memory by pages and cannot avoid
> alignment anyway, we align to 64k pages for powernv_defconfig).
> 
> So when we match the tables, one might be bigger that the other which
> means the hardware table cannot get attached to LIOBN and DMA mapping
> fails.
> 
> This removes the table size alignment from the guest visible table.
> This does not affect the memory allocation which is still aligned -
> kvmppc_tce_pages() takes care of this.
> 
> This relaxes the check we do when attaching tables to allow the hardware
> table be bigger than the guest visible table.
> 
> Ideally we want the KVM table to cover the same space as the hardware
> table does but since the hardware table may use multiple levels, and
> all levels must use the same table size (IODA2 design), the area it can
> actually cover might get very different from the window size which
> the guest requested, even though the guest won't map it all.
> 
> Fixes: ca1fc489cf "KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages 
> with smaller physical pages"
> Signed-off-by: Alexey Kardashevskiy 

Thanks, patch applied to my kvm-ppc-next branch.

Paul.


Re: [PATCH kernel v7 0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page

2018-07-18 Thread Paul Mackerras
On Tue, Jul 17, 2018 at 05:19:11PM +1000, Alexey Kardashevskiy wrote:
> This is to improve page boundaries checking and should probably
> be cc:stable. I came accross this while debugging nvlink2 passthrough
> but the lack of checking might be exploited by the existing userspace.
> 
> The get_user_pages() comment says it should be "phased out" but the only
> alternative seems to be get_user_pages_longterm(), should that be used
> instead (this is longterm reference elevation, however it is not DAX,
> whatever this implies)? get_user_pages_remote() seems unnecessarily
> complicated because of @locked.
> 
> 
> Changes:
> v7:
> * 2/2: do not fail if pte is not found, fall back to the default case instead
> 
> v6:
> * 2/2: read pageshift from pte
> 
> v5:
> * 2/2: changed compound pages handling
> 
> v4:
> * 2/2: implemented less strict but still safe max pageshift as David suggested
> 
> v3:
> * enforced huge pages not to cross preregistered chunk boundaries
> 
> v2:
> * 2/2: explicitly check for compound pages before calling compound_order()
> 
> 
> This is based on sha1
> 9d3cce1 Linus Torvalds "Linux 4.18-rc5".
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (2):
>   vfio/spapr: Use IOMMU pageshift rather than pagesize
>   KVM: PPC: Check if IOMMU page is contained in the pinned physical page
> 
>  arch/powerpc/include/asm/mmu_context.h |  4 ++--
>  arch/powerpc/kvm/book3s_64_vio.c   |  2 +-
>  arch/powerpc/kvm/book3s_64_vio_hv.c|  6 --
>  arch/powerpc/mm/mmu_context_iommu.c| 37 
> --
>  drivers/vfio/vfio_iommu_spapr_tce.c| 10 -
>  5 files changed, 47 insertions(+), 12 deletions(-)

Thanks, series applied to my kvm-ppc-next branch.

Paul.