Re: [RFC PATCH kernel] vfio/spapr_tce: Get rid of possible infinite loop

2018-10-01 Thread David Gibson
On Tue, Oct 02, 2018 at 01:22:31PM +1000, Alexey Kardashevskiy wrote:
> As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered
> memory. If there is a bug in memory release, the loop in
> tce_iommu_release() becomes infinite; this actually happened to me.
> 
> This makes the loop finite and prints a warning on every failure so that
> bugs are easier to spot.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

It does improve the current behaviour.  I do suspect, however, that
leaving the failed regions in the list will probably cause another
failure later on.

> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
> b/drivers/vfio/vfio_iommu_spapr_tce.c
> index b1a8ab3..ece0651 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -371,6 +371,7 @@ static void tce_iommu_release(void *iommu_data)
>  {
>   struct tce_container *container = iommu_data;
>   struct tce_iommu_group *tcegrp;
> + struct tce_iommu_prereg *tcemem, *tmtmp;
>   long i;
>  
>   while (tce_groups_attached(container)) {
> @@ -393,13 +394,8 @@ static void tce_iommu_release(void *iommu_data)
>   tce_iommu_free_table(container, tbl);
>   }
>  
> - while (!list_empty(&container->prereg_list)) {
> - struct tce_iommu_prereg *tcemem;
> -
> - tcemem = list_first_entry(&container->prereg_list,
> - struct tce_iommu_prereg, next);
> - WARN_ON_ONCE(tce_iommu_prereg_free(container, tcemem));
> - }
> + list_for_each_entry_safe(tcemem, tmtmp, &container->prereg_list, next)
> + WARN_ON(tce_iommu_prereg_free(container, tcemem));
>  
>   tce_iommu_disable(container);
>   if (container->mm)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[RFC PATCH kernel] vfio/spapr_tce: Get rid of possible infinite loop

2018-10-01 Thread Alexey Kardashevskiy
As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered
memory. If there is a bug in memory release, the loop in
tce_iommu_release() becomes infinite; this actually happened to me.

This makes the loop finite and prints a warning on every failure so that
bugs are easier to spot.

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index b1a8ab3..ece0651 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -371,6 +371,7 @@ static void tce_iommu_release(void *iommu_data)
 {
struct tce_container *container = iommu_data;
struct tce_iommu_group *tcegrp;
+   struct tce_iommu_prereg *tcemem, *tmtmp;
long i;
 
while (tce_groups_attached(container)) {
@@ -393,13 +394,8 @@ static void tce_iommu_release(void *iommu_data)
tce_iommu_free_table(container, tbl);
}
 
-   while (!list_empty(&container->prereg_list)) {
-   struct tce_iommu_prereg *tcemem;
-
-   tcemem = list_first_entry(&container->prereg_list,
-   struct tce_iommu_prereg, next);
-   WARN_ON_ONCE(tce_iommu_prereg_free(container, tcemem));
-   }
+   list_for_each_entry_safe(tcemem, tmtmp, &container->prereg_list, next)
+   WARN_ON(tce_iommu_prereg_free(container, tcemem));
 
tce_iommu_disable(container);
if (container->mm)
-- 
2.11.0



[PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2

2018-10-01 Thread Alexey Kardashevskiy
The skiboot firmware has a hot reset handler which fences the NVIDIA V100
GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
https://github.com/open-power/skiboot/commit/fca2b2b839a67

Now we are going to pass the V100 through via VFIO, which most certainly
involves KVM guests. These are often terminated without getting a chance
to offline GPU RAM, so we end up with a running machine with misconfigured
memory. Accessing this memory produces Hypervisor Maintenance Interrupts
(HMIs) which bring the host down.

To suppress HMIs, this patch wires the hot reset hook up to
vfio_pci_disable() via pci_disable_device(), which switches the NPU2 to a
safe mode.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* updated the commit log
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index cde7102..e37b9cc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
pnv_ioda_release_pe(pe);
 }
 
+static void pnv_npu_disable_device(struct pci_dev *pdev)
+{
+   struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
+   struct eeh_pe *eehpe = edev ? edev->pe : NULL;
+
+   if (eehpe && eeh_ops && eeh_ops->reset)
+   eeh_ops->reset(eehpe, EEH_RESET_HOT);
+}
+
 static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 {
struct pnv_phb *phb = hose->private_data;
@@ -3732,6 +3741,7 @@ static const struct pci_controller_ops 
pnv_npu_ioda_controller_ops = {
.reset_secondary_bus= pnv_pci_reset_secondary_bus,
.dma_set_mask   = pnv_npu_dma_set_mask,
.shutdown   = pnv_pci_ioda_shutdown,
+   .disable_device = pnv_npu_disable_device,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
-- 
2.11.0



Re: [PATCH] driver core: device: add BUS_ATTR_WO macro

2018-10-01 Thread Greg KH
On Mon, Oct 01, 2018 at 06:32:52PM +0300, Ioana Ciornei wrote:
> Add BUS_ATTR_WO macro to make it easier to add attributes without
> auditing the mode settings. Also, use the newly added macro where
> appropriate.
> 
> Signed-off-by: Ioana Ciornei 
> ---
>  arch/powerpc/platforms/pseries/ibmebus.c | 12 
>  drivers/block/rbd.c  | 48 
> 
>  drivers/scsi/fcoe/fcoe_sysfs.c   |  4 +--
>  drivers/scsi/fcoe/fcoe_transport.c   | 10 +++
>  include/linux/device.h   |  2 ++
>  include/scsi/libfcoe.h   |  8 +++---
>  6 files changed, 43 insertions(+), 41 deletions(-)

Nice!  This duplicates a lot of the work I did back in July but have not
pushed out very far due to the other things that ended up happening
around that time:

https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/log/?h=bus_cleanup

As the patch series seen at that link shows, you can do this in more
places than just what you did here.

Either way, you should break this up into the individual patches, like I
did or you can take my patches if you want.  Getting the BUS_ATTR_WO()
macro added is good to do now, and then you can go and hit up all of the
different subsystems that should be converted over to it.

thanks,

greg k-h


Re: [PATCH 2/2] powerpc/tm: Avoid SPR flush if TM is disabled

2018-10-01 Thread Michael Neuling
On Mon, 2018-10-01 at 16:47 -0300, Breno Leitao wrote:
> There is a bug in the flush_tmregs_to_thread() function, where it forces
> TM SPRs to be saved to the thread even if the TM facility is disabled.
> 
> This bug could be reproduced using a simple test case:
> 
>   mtspr(SPRN_TEXASR, XX);
>   sleep until load_tm == 0
>   cause a coredump
>   read SPRN_TEXASR in the coredump
> 
> In this case, the coredump may contain an invalid SPR, because the
> current code is flushing live SPRs (Used by the last thread with TM
> active) into the current thread, overwriting the latest SPRs (which were
> valid).
> 
> This patch checks if TM is enabled for the current task before saving the
> SPRs. Otherwise TM was lazily disabled, the thread values are already
> up-to-date and can be used directly, and saving is not required.

Acked-by: Michael Neuling 

Breno, can you also send your selftest upstream?

Mikey

> 
> Fixes: cd63f3cf1d5 ("powerpc/tm: Fix saving of TM SPRs in core dump")
> Signed-off-by: Breno Leitao 
> ---
>  arch/powerpc/kernel/ptrace.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 9667666eb18e..e0a2ee865032 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -138,7 +138,12 @@ static void flush_tmregs_to_thread(struct task_struct
> *tsk)
>  
>   if (MSR_TM_SUSPENDED(mfmsr())) {
>   tm_reclaim_current(TM_CAUSE_SIGNAL);
> - } else {
> + } else if (tm_enabled(tsk)) {
> + /*
> +  * Only flush TM SPRs to the thread if TM was enabled,
> +  * otherwise (TM lazily disabled), the thread already
> +  * contains the latest SPR value
> +  */
>   tm_enable();
>   tm_save_sprs(&(tsk->thread));
>   }


Re: [PATCH 2/2] powerpc/64: Increase stack redzone for 64-bit kernel to 512 bytes

2018-10-01 Thread Nicholas Piggin
On Mon, 1 Oct 2018 20:41:19 +0800
Bin Meng  wrote:

> Hi Nick,
> 
> On Mon, Oct 1, 2018 at 10:23 AM Nicholas Piggin  wrote:
> >
> > On Mon, 1 Oct 2018 09:11:04 +0800
> > Bin Meng  wrote:
> >  
> > > Hi Nick,
> > >
> > > On Mon, Oct 1, 2018 at 7:27 AM Nicholas Piggin  wrote: 
> > >  
> > > >
> > > > On Sat, 29 Sep 2018 23:25:20 -0700
> > > > Bin Meng  wrote:
> > > >  
> > > > > commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit
> > > > > userspace to 512 bytes") only changes stack userspace redzone size.
> > > > > We need to increase the kernel one to 512 bytes too, per the ABIv2 spec.  
> > > >
> > > > You're right we need 512 to be compatible with ABIv2, but as the
> > > > comment says, gcc limits this to 288 bytes so that's what is used
> > > > to save stack space. We can use a compiler version test to change
> > > > this if llvm or a new version of gcc does something different.
> > > >  
> > >
> > > I believe what the comment says is for ABIv1. At the time when commit
> > > 573ebfa6601f was submitted, kernel had not switched to ABIv2 build
> > > yet.  
> >
> > I see, yes you are right about that. However gcc still seems to be using
> > 288 bytes.
> >
> > static inline bool
> > offset_below_red_zone_p (HOST_WIDE_INT offset)
> > {
> >   return offset < (DEFAULT_ABI == ABI_V4
> >? 0
> >: TARGET_32BIT ? -220 : -288);
> > }
> >
> > llvm does as well AFAIKS
> >
> >   // DarwinABI has a 224-byte red zone. PPC32 SVR4ABI(Non-DarwinABI) has no
> >   // red zone and PPC64 SVR4ABI has a 288-byte red zone.
> >   unsigned  getRedZoneSize() const {
> > return isDarwinABI() ? 224 : (isPPC64() ? 288 : 0);
> >   }
> >
> > So I suspect we can get away with using 288 for the kernel. Although
> > the ELFv2 ABI allows 512, I suspect at this point compilers won't switch
> > over without an explicit red zone size flag.
> >  
> 
> Thanks for the info on the gcc/llvm code. I suspect for the red zone size
> gcc/llvm still uses the ABIv1-defined value, which is 288. If we get away
> with the kernel using 288, what's the point of having user as 512 (commit
> 573ebfa6601f)?

See Segher's reply -- they are two different things here. 288 bytes is
the red zone that compilers may use. But there is another region out to
512 bytes which can be used by other system code (not compilers). So
the kernel always has to assume 512 for user.

The kernel code itself knows that it does not use any red zone beyond
288 bytes, so it does not have to preserve any more for its own stack.

Thanks,
Nick


Re: [PATCH 2/2] powerpc/64: Increase stack redzone for 64-bit kernel to 512 bytes

2018-10-01 Thread Nicholas Piggin
On Mon, 1 Oct 2018 03:51:21 -0500
Segher Boessenkool  wrote:

> Hi!
> 
> On Mon, Oct 01, 2018 at 12:22:56PM +1000, Nicholas Piggin wrote:
> > On Mon, 1 Oct 2018 09:11:04 +0800
> > Bin Meng  wrote:  
> > > On Mon, Oct 1, 2018 at 7:27 AM Nicholas Piggin  wrote: 
> > >  
> > > > On Sat, 29 Sep 2018 23:25:20 -0700
> > > > Bin Meng  wrote:  
> > > > > commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit
> > > > > userspace to 512 bytes") only changes stack userspace redzone size.
> > > > > We need to increase the kernel one to 512 bytes too, per the ABIv2 spec.
> > > >
> > > > You're right we need 512 to be compatible with ABIv2, but as the
> > > > comment says, gcc limits this to 288 bytes so that's what is used
> > > > to save stack space. We can use a compiler version test to change
> > > > this if llvm or a new version of gcc does something different.
> > > >
> > > 
> > > I believe what the comment says is for ABIv1. At the time when commit
> > > 573ebfa6601f was submitted, kernel had not switched to ABIv2 build
> > > yet.  
> > 
> > I see, yes you are right about that. However gcc still seems to be using
> > 288 bytes.  
> 
> And that is required by the ABI!
> 
> """
> 2.2.2.4. Protected Zone
> 
> The 288 bytes below the stack pointer are available as volatile program
> storage that is not preserved across function calls. Interrupt handlers and
> any other functions that might run without an explicit call must take care
> to preserve a protected zone, also referred to as the red zone, of 512 bytes
> that consists of:
> 
>  * The 288-byte volatile program storage region that is used to hold saved
>registers and local variables
>  * An additional 224 bytes below the volatile program storage region that is
>set aside as a volatile system storage region for system functions
> 
> If a function does not call other functions and does not need more stack
> space than is available in the volatile program storage region (that is, 288
> bytes), it does not need to have a stack frame. The 224-byte volatile system
> storage region is not available to compilers for allocation to saved
> registers and local variables.
> """
> 
> A routine has a red zone of 288 bytes.  Below there is 224 more bytes of
> available storage, but that is not available to the routine itself: some
> (asynchronous) other code (like an interrupt) can use (i.e. clobber) it.

Thanks Segher, that explains it very well and shows we are safe with
288 in the kernel. So we can leave the code as-is, but the comment
could be updated.

What are "system functions" exactly? Can the kernel use that, or are
we talking about user mode system code like libraries? The kernel
could maybe use that for scratch space for synchronous interrupts to
avoid using a slow SPR for scratch.

Thanks,
Nick




> 
> 
> Segher



Re: [PATCH] powerpc: remove leftover code of old GCC version checks

2018-10-01 Thread Nicholas Piggin
On Mon,  1 Oct 2018 15:10:24 +0900
Masahiro Yamada  wrote:

> Clean up the leftover of commit f2910f0e6835 ("powerpc: remove old
> GCC version checks").
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> My patch had been sent earlier, with more clean-ups:
> https://lore.kernel.org/patchwork/patch/977805/

Sorry, I missed or forgot about your patch :(

Thanks for tidying up my mess!

Acked-by: Nicholas Piggin 

> 
> Anyway, this cleans up the leftover of Nicholas' patch.
> 
> 
>  arch/powerpc/Makefile | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 2ecd0976..b094375 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -400,10 +400,6 @@ archclean:
>  
>  archprepare: checkbin
>  
> -# Use the file '.tmp_gas_check' for binutils tests, as gas won't output
> -# to stdout and these checks are run even on install targets.
> -TOUT := .tmp_gas_check
> -
>  # Check toolchain versions:
>  # - gcc-4.6 is the minimum kernel-wide version so nothing required.
>  checkbin:
> @@ -414,7 +410,3 @@ checkbin:
>   echo -n '*** Please use a different binutils version.' ; \
>   false ; \
>   fi
> -
> -
> -CLEAN_FILES += $(TOUT)
> -
> -- 
> 2.7.4
> 



Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node

2018-10-01 Thread Tyrel Datwyler
On 10/01/2018 01:27 PM, Michal Hocko wrote:
> On Mon 01-10-18 13:56:25, Michael Bringmann wrote:
>> In some LPAR migration scenarios, device-tree modifications are
>> made to the affinity of the memory in the system.  For instance,
>> it may occur that memory is installed to nodes 0,3 on a source
>> system, and to nodes 0,2 on a target system.  Node 2 may not
>> have been initialized/allocated on the target system.
>>
>> After migration, if a RTAS PRRN memory remove is made to a
>> memory block that was in node 3 on the source system, then
>> try_offline_node tries to remove it from node 2 on the target.
>> The NODE_DATA(2) block would not be initialized on the target,
>> and there is no validation check in the current code to prevent
>> the use of a NULL pointer.
> 
> I am not familiar with ppc and the above doesn't really help me
> much. Sorry about that. But from the above it is not clear to me whether
> it is the caller which does something unexpected or the hotplug code
> being not robust enough. From your changelog I would suggest the latter,
> but why don't we see the same problem for other archs? Is this a problem
> of unrolling a partial failure?
> 
> dlpar_remove_lmb does the following
> 
>   nid = memory_add_physaddr_to_nid(lmb->base_addr);
> 
>   remove_memory(nid, lmb->base_addr, block_sz);
> 
>   /* Update memory regions for memory remove */
>   memblock_remove(lmb->base_addr, block_sz);
> 
>   dlpar_remove_device_tree_lmb(lmb);
> 
> Is the whole operation correct when remove_memory simply backs off
> silently. Why don't we have to care about memblock resp
> dlpar_remove_device_tree_lmb parts? In other words how come the physical
> memory range is valid while the node association is not?
> 

I guess, with respect to my previous reply, that patch goes in conjunction
with this patch set as well?

https://lore.kernel.org/linuxppc-dev/20181001125846.2676.89826.st...@ltcalpine2-lp9.aus.stglabs.ibm.com/T/#t

-Tyrel



Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node

2018-10-01 Thread Tyrel Datwyler
On 10/01/2018 01:27 PM, Michal Hocko wrote:
> On Mon 01-10-18 13:56:25, Michael Bringmann wrote:
>> In some LPAR migration scenarios, device-tree modifications are
>> made to the affinity of the memory in the system.  For instance,
>> it may occur that memory is installed to nodes 0,3 on a source
>> system, and to nodes 0,2 on a target system.  Node 2 may not
>> have been initialized/allocated on the target system.
>>
>> After migration, if a RTAS PRRN memory remove is made to a
>> memory block that was in node 3 on the source system, then
>> try_offline_node tries to remove it from node 2 on the target.
>> The NODE_DATA(2) block would not be initialized on the target,
>> and there is no validation check in the current code to prevent
>> the use of a NULL pointer.
> 
> I am not familiar with ppc and the above doesn't really help me
> much. Sorry about that. But from the above it is not clear to me whether
> it is the caller which does something unexpected or the hotplug code
> being not robust enough. From your changelog I would suggest the latter,
> but why don't we see the same problem for other archs? Is this a problem
> of unrolling a partial failure?
> 
> dlpar_remove_lmb does the following
> 
>   nid = memory_add_physaddr_to_nid(lmb->base_addr);
> 
>   remove_memory(nid, lmb->base_addr, block_sz);
> 
>   /* Update memory regions for memory remove */
>   memblock_remove(lmb->base_addr, block_sz);
> 
>   dlpar_remove_device_tree_lmb(lmb);
> 
> Is the whole operation correct when remove_memory simply backs off
> silently. Why don't we have to care about memblock resp
> dlpar_remove_device_tree_lmb parts? In other words how come the physical
> memory range is valid while the node association is not?
> 

I think the issue here is a race between the LPM code updating affinity and 
PRRN events being processed. Does your other patch[1] not fix the issue? Or is 
it that the LPM affinity updates don't do any of the initialization/allocation 
you mentioned?

-Tyrel

[1] 
https://lore.kernel.org/linuxppc-dev/20181001185603.11373.61650.st...@ltcalpine2-lp9.aus.stglabs.ibm.com/T/#u



Re: [PATCH] powerpc/lib: fix book3s/32 boot failure due to code patching

2018-10-01 Thread Michael Neuling
On Mon, 2018-10-01 at 12:21 +, Christophe Leroy wrote:
> Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init
> sections") accesses 'init_mem_is_free' flag too early, before the
> kernel is relocated. This provokes early boot failure (before the
> console is active).
> 
> As it is not necessary to do this verification that early, this
> patch moves the test into patch_instruction() instead of
> __patch_instruction().
> 
> This modification also has the advantage of avoiding unnecessary
> remappings.
> 
> Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections")
> Signed-off-by: Christophe Leroy 

Thanks

Acked-by: Michael Neuling 

The original patch was also marked for stable so we should do the same here.

Cc: sta...@vger.kernel.org # 4.13+

> ---
>  arch/powerpc/lib/code-patching.c | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-
> patching.c
> index 6ae2777c220d..5ffee298745f 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -28,12 +28,6 @@ static int __patch_instruction(unsigned int *exec_addr,
> unsigned int instr,
>  {
>   int err;
>  
> - /* Make sure we aren't patching a freed init section */
> - if (init_mem_is_free && init_section_contains(exec_addr, 4)) {
> - pr_debug("Skipping init section patching addr: 0x%px\n",
> exec_addr);
> - return 0;
> - }
> -
>   __put_user_size(instr, patch_addr, 4, err);
>   if (err)
>   return err;
> @@ -148,7 +142,7 @@ static inline int unmap_patch_area(unsigned long addr)
>   return 0;
>  }
>  
> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>  {
>   int err;
>   unsigned int *patch_addr = NULL;
> @@ -188,12 +182,22 @@ int patch_instruction(unsigned int *addr, unsigned int
> instr)
>  }
>  #else /* !CONFIG_STRICT_KERNEL_RWX */
>  
> -int patch_instruction(unsigned int *addr, unsigned int instr)
> +static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>  {
>   return raw_patch_instruction(addr, instr);
>  }
>  
>  #endif /* CONFIG_STRICT_KERNEL_RWX */
> +
> +int patch_instruction(unsigned int *addr, unsigned int instr)
> +{
> + /* Make sure we aren't patching a freed init section */
> + if (init_mem_is_free && init_section_contains(addr, 4)) {
> + pr_debug("Skipping init section patching addr: 0x%px\n", addr);
> + return 0;
> + }
> + return do_patch_instruction(addr, instr);
> +}
>  NOKPROBE_SYMBOL(patch_instruction);
>  
>  int patch_branch(unsigned int *addr, unsigned long target, int flags)


Re: [PATCH] kdb: print real address of pointers instead of hashed addresses

2018-10-01 Thread Jason Wessel

On 09/27/2018 12:17 PM, Christophe Leroy wrote:

Since commit ad67b74d2469 ("printk: hash addresses printed with %p"),
all pointers printed with %p are printed with hashed addresses
instead of real addresses in order to avoid leaking addresses in
dmesg and syslog. But this applies to kdb too, which is unfortunate:

 Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry
 kdb> ps
 15 sleeping system daemon (state M) processes suppressed,
 use 'ps A' to see all.
 Task Addr   Pid   Parent [*] cpu State Thread Command
 0x(ptrval)  329  328  10   R  0x(ptrval) *sh

 0x(ptrval)10  00   S  0x(ptrval)  init
 0x(ptrval)32  00   D  0x(ptrval)  rcu_gp
 0x(ptrval)42  00   D  0x(ptrval)  rcu_par_gp
 0x(ptrval)52  00   D  0x(ptrval)  kworker/0:0
 0x(ptrval)62  00   D  0x(ptrval)  kworker/0:0H
 0x(ptrval)72  00   D  0x(ptrval)  kworker/u2:0
 0x(ptrval)82  00   D  0x(ptrval)  mm_percpu_wq
 0x(ptrval)   102  00   D  0x(ptrval)  rcu_preempt

The whole purpose of kdb is to debug, and for debugging real addresses
need to be known. In addition, data displayed by kdb doesn't go into
dmesg.



I completely agree.  This is added to the merge queue.

Cheers,
Jason.


[PATCH 2/2] powerpc/time: Add set_state_oneshot_stopped decrementer callback

2018-10-01 Thread Anton Blanchard
If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to
0x7fff:

   if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
set_dec(0x7fff);
else
set_dec(decrementer_max);

If there are no future events, we don't reprogram the decrementer
after this and we end up with 0x7fff even on a large decrementer
capable system.

As suggested by Nick, add a set_state_oneshot_stopped callback
so we program the decrementer with decrementer_max if there are
no future events.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/kernel/time.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 6a1f0a084ca3..40868f3ee113 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -111,6 +111,7 @@ struct clock_event_device decrementer_clockevent = {
.rating = 200,
.irq= 0,
.set_next_event = decrementer_set_next_event,
+   .set_state_oneshot_stopped = decrementer_shutdown,
.set_state_shutdown = decrementer_shutdown,
.tick_resume= decrementer_shutdown,
.features   = CLOCK_EVT_FEAT_ONESHOT |
-- 
2.17.1



[PATCH 1/2] powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer

2018-10-01 Thread Anton Blanchard
We currently cap the decrementer clockevent at 4 seconds, even on systems
with large decrementer support. Fix this by converting the code to use
clockevents_register_device() which calculates the upper bound based on
the max_delta passed in.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/kernel/time.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 70f145e02487..6a1f0a084ca3 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -984,10 +984,10 @@ static void register_decrementer_clockevent(int cpu)
*dec = decrementer_clockevent;
dec->cpumask = cpumask_of(cpu);
 
+   clockevents_config_and_register(dec, ppc_tb_freq, 2, decrementer_max);
+
printk_once(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
dec->name, dec->mult, dec->shift, cpu);
-
-   clockevents_register_device(dec);
 }
 
 static void enable_large_decrementer(void)
@@ -1035,18 +1035,7 @@ static void __init set_decrementer_max(void)
 
 static void __init init_decrementer_clockevent(void)
 {
-   int cpu = smp_processor_id();
-
-   clockevents_calc_mult_shift(&decrementer_clockevent, ppc_tb_freq, 4);
-
-   decrementer_clockevent.max_delta_ns =
-   clockevent_delta2ns(decrementer_max, &decrementer_clockevent);
-   decrementer_clockevent.max_delta_ticks = decrementer_max;
-   decrementer_clockevent.min_delta_ns =
-   clockevent_delta2ns(2, &decrementer_clockevent);
-   decrementer_clockevent.min_delta_ticks = 2;
-
-   register_decrementer_clockevent(cpu);
+   register_decrementer_clockevent(smp_processor_id());
 }
 
 void secondary_cpu_time_init(void)
-- 
2.17.1



Re: [PATCH] kdb: use correct pointer when 'btc' calls 'btt'

2018-10-01 Thread Jason Wessel

On 09/28/2018 07:57 AM, Michael Ellerman wrote:

Christophe LEROY  writes:

Le 27/09/2018 à 13:09, Michael Ellerman a écrit :

Christophe LEROY  writes:

Le 26/09/2018 à 13:11, Daniel Thompson a écrit :

The Fixes: and now your Reviewed-by: appear automatically in patchwork
(https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=65715),
so I believe they'll be automatically included when Jason or someone
else takes the patch, no ?


patchwork won't add the Fixes tag from the reply, it needs to be in the
original mail.

See:
https://github.com/getpatchwork/patchwork/issues/151



Ok, so it accounts it and adds a '1' in the F column in the patches
list, but won't take it into account.


Yes. The logic that populates the columns is separate from the logic
that scrapes the tags, which is a bug :)


Then I'll send a v2 with revised commit text.





No need.  
https://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git/commit/?h=kgdb-next

Since it is a regression fix, we'll try and get it merged as soon as we can.

Cheers,
Jason.


Re: [PATCH 2/2] powerpc/time: Only cap decrementer when watchdog is enabled

2018-10-01 Thread Anton Blanchard
Hi Nick,

> Thanks for tracking this down. It's a fix for my breakage
> 
> a7cba02deced ("powerpc: allow soft-NMI watchdog to cover timer
> interrupts with large decrementers")
> 
> Taking another look... what I had expected here is the timer subsystem
> would have stopped the decrementer device after it processed the timer
> and found nothing left. And we should have set DEC to max at that
> time.
> 
> The above patch was really intended to only cover the timer interrupt
> itself locking up. I wonder if we need to add
> 
> .set_state_oneshot_stopped = decrementer_shutdown
> 
> In our decremementer clockevent device?

Thanks Nick, that looks much nicer, and passes my tests.

Anton


Re: [v4] powerpc: Avoid code patching freed init sections

2018-10-01 Thread Michael Neuling
On Mon, 2018-10-01 at 13:25 +0200, Christophe LEROY wrote:
> 
> Le 21/09/2018 à 13:59, Michael Ellerman a écrit :
> > On Fri, 2018-09-14 at 01:14:11 UTC, Michael Neuling wrote:
> > > This stops us from doing code patching in init sections after they've
> > > been freed.
> > > 
> > > In this chain:
> > >kvm_guest_init() ->
> > >  kvm_use_magic_page() ->
> > >fault_in_pages_readable() ->
> > >__get_user() ->
> > >  __get_user_nocheck() ->
> > >barrier_nospec();
> > > 
> > > We have a code patching location at barrier_nospec() and
> > > kvm_guest_init() is an init function. This whole chain gets inlined,
> > > so when we free the init section (hence kvm_guest_init()), this code
> > > goes away and hence should no longer be patched.
> > > 
> > > We seen this as userspace memory corruption when using a memory
> > > checker while doing partition migration testing on powervm (this
> > > starts the code patching post migration via
> > > /sys/kernel/mobility/migration). In theory, it could also happen when
> > > using /sys/kernel/debug/powerpc/barrier_nospec.
> > > 
> > > cc: sta...@vger.kernel.org # 4.13+
> > > Signed-off-by: Michael Neuling 
> > > Reviewed-by: Nicholas Piggin 
> > > Reviewed-by: Christophe Leroy 
> > 
> > Applied to powerpc fixes, thanks.
> > 
> > https://git.kernel.org/powerpc/c/51c3c62b58b357e8d35e4cc32f7b4e
> > 
> 
> This patch breaks booting on my MPC83xx board (book3s/32) very early 
> (before console is active), provoking restart.
> u-boot reports a checkstop reset at restart.
> 
> Reverting this commit fixes the issue.
> 
> The following patch fixes the issue as well, but I think it is not the 
> best solution. I still think the test should be in patch_instruction() 
> instead of being in __patch_instruction(), see my comment on v2

Arrh, sorry.

Can you write this up as a real patch with a signed off by so mpe can take it?

Mikey

> 
> Christophe
> 
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index 6ae2777..6192fda 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -29,7 +29,7 @@ static int __patch_instruction(unsigned int 
> *exec_addr, unsigned int instr,
>  int err;
> 
>  /* Make sure we aren't patching a freed init section */
> -   if (init_mem_is_free && init_section_contains(exec_addr, 4)) {
> +   if (*PTRRELOC(&init_mem_is_free) && 
> init_section_contains(exec_addr, 4)) {
>  pr_debug("Skipping init section patching addr: 
> 0x%px\n", exec_addr);
>  return 0;
>  }
> 
> 
> Christophe
> 


Re: [PATCH] powerpc: signedness bug in update_flash_db()

2018-10-01 Thread Geoff Levand
On 10/01/2018 09:44 AM, Dan Carpenter wrote:
> The "count < sizeof(struct os_area_db)" comparison is type promoted to
> size_t so negative values of "count" are treated as very high values and
> we accidentally return success instead of a negative error code.
> 
> This doesn't really change runtime much but it fixes a static checker
> warning.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/arch/powerpc/platforms/ps3/os-area.c 
> b/arch/powerpc/platforms/ps3/os-area.c
> index cdbfc5cfd6f3..f5387ad82279 100644
> --- a/arch/powerpc/platforms/ps3/os-area.c
> +++ b/arch/powerpc/platforms/ps3/os-area.c
> @@ -664,7 +664,7 @@ static int update_flash_db(void)
> >   db_set_64(db, &os_area_db_id_rtc_diff, saved_params.rtc_diff);
>  
>   count = os_area_flash_write(db, sizeof(struct os_area_db), pos);
> - if (count < sizeof(struct os_area_db)) {
> + if (count < 0 || count < sizeof(struct os_area_db)) {
>   pr_debug("%s: os_area_flash_write failed %zd\n", __func__,
>count);
>   error = count < 0 ? count : -EIO;
> 

Seems OK.

Acked-by: Geoff Levand 



Re: [PATCH v2 0/5] soc/fsl/qbman: DPAA QBMan fixes and additions

2018-10-01 Thread Li Yang
On Fri, Sep 28, 2018 at 3:44 AM Madalin Bucur  wrote:
>

Applied 1-4 to for-next while waiting for clarification on 5/5.   And
updated the prefix to "soc: fsl:" style to be aligned with arm-soc
convention.  Please try to use that style in the future for soc/fsl
patches.

> This patch set brings a number of fixes and the option to control
> the QMan portal interrupt coalescing.
>
> Changes from v1:
>  - change CPU 0 with any online CPU to allow CPU 0 to be taken offline
>  - move common code in a function
>  - address all places in the code where the portal interrupt was affined
>to CPU 0
>  - remove unrelated change from patch adding 64 bit DMA addressing
>requirement
>
> Madalin Bucur (2):
>   soc/fsl/qbman: replace CPU 0 with any online CPU in hotplug handlers
>   soc/fsl_qbman: export coalesce change API
>
> Roy Pledge (3):
>   soc/fsl/qbman: Check if CPU is offline when initializing portals
>   soc/fsl/qbman: Add 64 bit DMA addressing requirement to QBMan
>   soc/fsl/qbman: Use last response to determine valid bit
>
>  drivers/soc/fsl/qbman/Kconfig   |  2 +-
>  drivers/soc/fsl/qbman/bman.c|  6 ++---
>  drivers/soc/fsl/qbman/bman_portal.c |  4 ++-
>  drivers/soc/fsl/qbman/dpaa_sys.h| 20 ++
>  drivers/soc/fsl/qbman/qman.c| 53 
> -
>  drivers/soc/fsl/qbman/qman_portal.c |  6 +++--
>  include/soc/fsl/qman.h  | 27 +++
>  7 files changed, 104 insertions(+), 14 deletions(-)
>
> --
> 2.1.0
>


Re: [PATCH v3 4/6] drivers: clk-qoriq: Add clockgen support for lx2160a

2018-10-01 Thread Stephen Boyd
Same subject comment.

Quoting Vabhav Sharma (2018-09-23 17:08:59)
> From: Yogesh Gaur 
> 
> Add clockgen support for lx2160a.


Re: [PATCH v3 3/6] drivers: clk-qoriq: increase array size of cmux_to_group

2018-10-01 Thread Stephen Boyd
Subject should be "clk: qoriq: increase array size ..."



Re: [PATCH v2 5/5] soc/fsl_qbman: export coalesce change API

2018-10-01 Thread Li Yang
On Fri, Sep 28, 2018 at 3:45 AM Madalin Bucur  wrote:
>
> Export the API required to control the QMan portal interrupt coalescing
> settings.

These are new APIs, not just old APIs being exported.  What is the user
of these APIs?  Is the user being submitted?  We cannot have APIs in
the kernel that have no users.

>
> Signed-off-by: Madalin Bucur 
> ---
>  drivers/soc/fsl/qbman/qman.c | 31 +++
>  include/soc/fsl/qman.h   | 27 +++
>  2 files changed, 58 insertions(+)
>
> diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
> index 99d0f87889b8..8ab75bb44c4d 100644
> --- a/drivers/soc/fsl/qbman/qman.c
> +++ b/drivers/soc/fsl/qbman/qman.c
> @@ -1012,6 +1012,37 @@ static inline void put_affine_portal(void)
>
>  static struct workqueue_struct *qm_portal_wq;
>
> +void qman_dqrr_set_ithresh(struct qman_portal *portal, u8 ithresh)
> +{
> +   if (!portal)
> +   return;
> +
> +   qm_dqrr_set_ithresh(&portal->p, ithresh);
> +   portal->p.dqrr.ithresh = ithresh;
> +}
> +EXPORT_SYMBOL(qman_dqrr_set_ithresh);
> +
> +void qman_dqrr_get_ithresh(struct qman_portal *portal, u8 *ithresh)
> +{
> +   if (portal && ithresh)
> +   *ithresh = portal->p.dqrr.ithresh;
> +}
> +EXPORT_SYMBOL(qman_dqrr_get_ithresh);
> +
> +void qman_portal_get_iperiod(struct qman_portal *portal, u32 *iperiod)
> +{
> +   if (portal && iperiod)
> +   *iperiod = qm_in(&portal->p, QM_REG_ITPR);
> +}
> +EXPORT_SYMBOL(qman_portal_get_iperiod);
> +
> +void qman_portal_set_iperiod(struct qman_portal *portal, u32 iperiod)
> +{
> +   if (portal)
> +   qm_out(&portal->p, QM_REG_ITPR, iperiod);
> +}
> +EXPORT_SYMBOL(qman_portal_set_iperiod);
> +
>  int qman_wq_alloc(void)
>  {
> qm_portal_wq = alloc_workqueue("qman_portal_wq", 0, 1);
> diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
> index d4dfefdee6c1..42f50eb51529 100644
> --- a/include/soc/fsl/qman.h
> +++ b/include/soc/fsl/qman.h
> @@ -1186,4 +1186,31 @@ int qman_alloc_cgrid_range(u32 *result, u32 count);
>   */
>  int qman_release_cgrid(u32 id);
>
> +/**
> + * qman_dqrr_get_ithresh - Get coalesce interrupt threshold
> + * @portal: portal to get the value for
> + * @ithresh: threshold pointer
> + */
> +void qman_dqrr_get_ithresh(struct qman_portal *portal, u8 *ithresh);
> +
> +/**
> + * qman_dqrr_set_ithresh - Set coalesce interrupt threshold
> + * @portal: portal to set the new value on
> + * @ithresh: new threshold value
> + */
> +void qman_dqrr_set_ithresh(struct qman_portal *portal, u8 ithresh);
> +
> +/**
> + * qman_portal_get_iperiod - Get coalesce interrupt period
> + * @portal: portal to get the value for
> + * @iperiod: period pointer
> + */
> +void qman_portal_get_iperiod(struct qman_portal *portal, u32 *iperiod);
> +/**
> + * qman_portal_set_iperiod - Set coalesce interrupt period
> + * @portal: portal to set the new value on
> + * @iperiod: new period value
> + */
> +void qman_portal_set_iperiod(struct qman_portal *portal, u32 iperiod);
> +
>  #endif /* __FSL_QMAN_H */
> --
> 2.1.0
>


Re: dma mask related fixups (including full bus_dma_mask support) v2

2018-10-01 Thread Benjamin Herrenschmidt
On Mon, 2018-10-01 at 16:32 +0200, Christoph Hellwig wrote:
> FYI, I've pulled this into the dma-mapping tree to make forward
> progress.  All but patch 4 had formal ACKs, and for that one Robin
> was fine even without an explicit ack.  I'll also send a patch to
> better document the zone selection as it confuses even well versed
> people like Ben.

Thanks. I'll try to dig out some older systems to test.

Cheers,
Ben.



Re: [PATCH v4 6/9] kbuild: consolidate Devicetree dtb build rules

2018-10-01 Thread Masahiro Yamada
On Tue, Oct 2, 2018 at 0:26 Rob Herring :
>
> There is nothing arch specific about building dtb files other than their
> location under /arch/*/boot/dts/. Keeping each arch aligned is a pain.
> The dependencies and supported targets are all slightly different.
> Also, a cross-compiler for each arch is needed, but really the host
> compiler preprocessor is perfectly fine for building dtbs. Move the
> build rules to a common location and remove the arch specific ones. This
> is done in a single step to avoid warnings about overriding rules.
>
> The build dependencies had been a mixture of 'scripts' and/or 'prepare'.
> These pull in several dependencies some of which need a target compiler
> (specifically devicetable-offsets.h) and aren't needed to build dtbs.
> All that is really needed is dtc, so adjust the dependencies to only be
> dtc.
>
> This change enables support for 'dtbs_install' on some arches which were
> missing the target.
>
> Acked-by: Will Deacon 
> Acked-by: Paul Burton 
> Acked-by: Ley Foon Tan 
> Cc: Masahiro Yamada 

Please change this to

Acked-by: Masahiro Yamada 


Thanks.


> Cc: Michal Marek 
> Cc: Vineet Gupta 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Yoshinori Sato 
> Cc: Michal Simek 
> Cc: Ralf Baechle 
> Cc: James Hogan 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Chris Zankel 
> Cc: Max Filippov 
> Cc: linux-kbu...@vger.kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: uclinux-h8-de...@lists.sourceforge.jp
> Cc: linux-m...@linux-mips.org
> Cc: nios2-...@lists.rocketboards.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-xte...@linux-xtensa.org
> Signed-off-by: Rob Herring 
> ---
> v4:
>  - Make dtbs and %.dtb rules depend on arch/$ARCH/boot/dts path rather than
>CONFIG_OF_EARLY_FLATTREE
>  - Fix install path missing kernel version for dtbs_install
>  - Fix "make CONFIG_OF_ALL_DTBS=y" for arches like ARM which selectively
>enable CONFIG_OF (and therefore dtc)




-- 
Best Regards
Masahiro Yamada


Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node

2018-10-01 Thread Michal Hocko
On Mon 01-10-18 13:56:25, Michael Bringmann wrote:
> In some LPAR migration scenarios, device-tree modifications are
> made to the affinity of the memory in the system.  For instance,
> it may occur that memory is installed to nodes 0,3 on a source
> system, and to nodes 0,2 on a target system.  Node 2 may not
> have been initialized/allocated on the target system.
> 
> After migration, if a RTAS PRRN memory remove is made to a
> memory block that was in node 3 on the source system, then
> try_offline_node tries to remove it from node 2 on the target.
> The NODE_DATA(2) block would not be initialized on the target,
> and there is no validation check in the current code to prevent
> the use of a NULL pointer.

I am not familiar with ppc and the above doesn't really help me
much. Sorry about that. But from the above it is not clear to me whether
it is the caller which does something unexpected or the hotplug code
being not robust enough. From your changelog I would suggest the latter,
but why don't we see the same problem for other archs? Is this a problem
of unrolling a partial failure?

dlpar_remove_lmb does the following

nid = memory_add_physaddr_to_nid(lmb->base_addr);

remove_memory(nid, lmb->base_addr, block_sz);

/* Update memory regions for memory remove */
memblock_remove(lmb->base_addr, block_sz);

dlpar_remove_device_tree_lmb(lmb);

Is the whole operation correct when remove_memory simply backs off
silently? Why don't we have to care about the memblock resp.
dlpar_remove_device_tree_lmb parts? In other words how come the physical
memory range is valid while the node association is not?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node

2018-10-01 Thread Kees Cook
On Mon, Oct 1, 2018 at 11:56 AM, Michael Bringmann
 wrote:
> In some LPAR migration scenarios, device-tree modifications are
> made to the affinity of the memory in the system.  For instance,
> it may occur that memory is installed to nodes 0,3 on a source
> system, and to nodes 0,2 on a target system.  Node 2 may not
> have been initialized/allocated on the target system.
>
> After migration, if a RTAS PRRN memory remove is made to a
> memory block that was in node 3 on the source system, then
> try_offline_node tries to remove it from node 2 on the target.
> The NODE_DATA(2) block would not be initialized on the target,
> and there is no validation check in the current code to prevent
> the use of a NULL pointer.  Call traces such as the following
> may be observed:
>
> A similar problem of moving memory to an unitialized node has
> also been observed on systems where multiple PRRN events occur
> prior to a complete update of the device-tree.
>
> pseries-hotplug-mem: Attempting to update LMB, drc index 8002
> Offlined Pages 4096
> ...
> Oops: Kernel access of bad area, sig: 11 [#1]
> ...
> Workqueue: pseries hotplug workque pseries_hp_work_fn
> ...
> NIP [c02bc088] try_offline_node+0x48/0x1e0
> LR [c02e0b84] remove_memory+0xb4/0xf0
> Call Trace:
> [c002bbee7a30] [c002bbee7a70] 0xc002bbee7a70 (unreliable)
> [c002bbee7a70] [c02e0b84] remove_memory+0xb4/0xf0
> [c002bbee7ab0] [c0097784] dlpar_remove_lmb+0xb4/0x160
> [c002bbee7af0] [c0097f38] dlpar_memory+0x328/0xcb0
> [c002bbee7ba0] [c00906d0] handle_dlpar_errorlog+0xc0/0x130
> [c002bbee7c10] [c00907d4] pseries_hp_work_fn+0x94/0xa0
> [c002bbee7c40] [c00e1cd0] process_one_work+0x1a0/0x4e0
> [c002bbee7cd0] [c00e21b0] worker_thread+0x1a0/0x610
> [c002bbee7d80] [c00ea458] kthread+0x128/0x150
> [c002bbee7e30] [c000982c] ret_from_kernel_thread+0x5c/0xb0
>
> This patch adds a check for an uninitialized node data structure to
> the beginning of try_offline_node, and exits the routine early if so.
>
> Another patch is being developed for powerpc to track the
> node Id to which an LMB belongs, so that we can remove the
> LMB from there instead of the nid as currently interpreted
> from the device tree.
>
> Signed-off-by: Michael Bringmann 

Reviewed-by: Kees Cook 

-Kees

> ---
>  mm/memory_hotplug.c |   10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 38d94b7..e48a4d0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1831,10 +1831,16 @@ static int check_and_unmap_cpu_on_node(pg_data_t 
> *pgdat)
>  void try_offline_node(int nid)
>  {
> pg_data_t *pgdat = NODE_DATA(nid);
> -   unsigned long start_pfn = pgdat->node_start_pfn;
> -   unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
> +   unsigned long start_pfn;
> +   unsigned long end_pfn;
> unsigned long pfn;
>
> +   if (WARN_ON(pgdat == NULL))
> +   return;
> +
> +   start_pfn = pgdat->node_start_pfn;
> +   end_pfn = start_pfn + pgdat->node_spanned_pages;
> +
> for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
>



-- 
Kees Cook
Pixel Security


[PATCH 2/2] powerpc/tm: Avoid SPR flush if TM is disabled

2018-10-01 Thread Breno Leitao
There is a bug in the flush_tmregs_to_thread() function, where it forces
TM SPRs to be saved to the thread even if the TM facility is disabled.

This bug could be reproduced using a simple test case:

  mtspr(SPRN_TEXASR, XX);
  sleep until load_tm == 0
  cause a coredump
  read SPRN_TEXASR in the coredump

In this case, the coredump may contain an invalid SPR, because the
current code is flushing live SPRs (used by the last thread with TM
active) into the current thread, overwriting the latest SPRs (which were
valid).

This patch checks if TM is enabled for the current task before
saving the SPRs; otherwise, TM is lazily disabled and the thread
value is already up-to-date and can be used directly, so saving is
not required.

Fixes: cd63f3cf1d5 ("powerpc/tm: Fix saving of TM SPRs in core dump")
Signed-off-by: Breno Leitao 
---
 arch/powerpc/kernel/ptrace.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 9667666eb18e..e0a2ee865032 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -138,7 +138,12 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
 
if (MSR_TM_SUSPENDED(mfmsr())) {
tm_reclaim_current(TM_CAUSE_SIGNAL);
-   } else {
+   } else if (tm_enabled(tsk)) {
+   /*
+* Only flush TM SPRs to the thread if TM was enabled,
+* otherwise (TM lazily disabled), the thread already
+* contains the latest SPR value
+*/
tm_enable();
tm_save_sprs(&(tsk->thread));
}
-- 
2.19.0



[PATCH 1/2] powerpc/tm: Move tm_enable definition

2018-10-01 Thread Breno Leitao
The goal of this patch is to move function tm_enabled() to tm.h in order
to allow this function to be used by other files as an inline function.

This patch also removes the double inclusion of tm.h in the traps.c
source code. One inclusion is inside a CONFIG_PPC64 ifdef block, and
another one is in the generic part. This double inclusion causes a
redefinition of tm_enabled(), which is why it is being fixed here.

There is generic code (non CONFIG_PPC64, thus, non
CONFIG_PPC_TRANSACTIONAL_MEM also) using some TM definitions, which
explains why tm.h is being imported in the generic code. This is
not correct, and this code is now surrounded by a
CONFIG_PPC_TRANSACTIONAL_MEM ifdef block.

These ifdef blocks avoid calling tm_abort_check() completely, but this
is not a problem since that function just returns 'false' when
CONFIG_PPC_TRANSACTIONAL_MEM is not defined.

Signed-off-by: Breno Leitao 
---
 arch/powerpc/include/asm/tm.h | 5 +
 arch/powerpc/kernel/process.c | 5 -
 arch/powerpc/kernel/traps.c   | 5 -
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index e94f6db5e367..646d45a2aaae 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -19,4 +19,9 @@ extern void tm_restore_sprs(struct thread_struct *thread);
 
 extern bool tm_suspend_disabled;
 
+static inline bool tm_enabled(struct task_struct *tsk)
+{
+   return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
+}
+
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 913c5725cdb2..c1ca2451fa3b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -862,11 +862,6 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint 
*a,
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 
-static inline bool tm_enabled(struct task_struct *tsk)
-{
-   return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
-}
-
 static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
 {
/*
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c85adb858271..a3d6298b8074 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -64,7 +64,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1276,9 +1275,11 @@ static int emulate_instruction(struct pt_regs *regs)
 
/* Emulate load/store string insn. */
if ((instword & PPC_INST_STRING_GEN_MASK) == PPC_INST_STRING) {
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
if (tm_abort_check(regs,
   TM_CAUSE_EMULATE | TM_CAUSE_PERSISTENT))
return -EINVAL;
+#endif
PPC_WARN_EMULATED(string, regs);
return emulate_string_inst(regs, instword);
}
@@ -1508,8 +1509,10 @@ void alignment_exception(struct pt_regs *regs)
if (!arch_irq_disabled_regs(regs))
local_irq_enable();
 
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
if (tm_abort_check(regs, TM_CAUSE_ALIGNMENT | TM_CAUSE_PERSISTENT))
goto bail;
+#endif
 
/* we don't implement logging of alignment exceptions */
if (!(current->thread.align_ctl & PR_UNALIGN_SIGBUS))
-- 
2.19.0



Re: [PATCH] powerpc: signedness bug in update_flash_db()

2018-10-01 Thread Dan Carpenter
On Mon, Oct 01, 2018 at 10:02:54PM +0300, Dan Carpenter wrote:
> On Mon, Oct 01, 2018 at 08:22:01PM +0200, christophe leroy wrote:
> > 
> > 
> > On 01/10/2018 at 18:44, Dan Carpenter wrote:
> > > The "count < sizeof(struct os_area_db)" comparison is type promoted to
> > > size_t so negative values of "count" are treated as very high values and
> > > we accidentally return success instead of a negative error code.
> > > 
> > > This doesn't really change runtime much but it fixes a static checker
> > > warning.
> > > 
> > > Signed-off-by: Dan Carpenter 
> > > 
> > > diff --git a/arch/powerpc/platforms/ps3/os-area.c 
> > > b/arch/powerpc/platforms/ps3/os-area.c
> > > index cdbfc5cfd6f3..f5387ad82279 100644
> > > --- a/arch/powerpc/platforms/ps3/os-area.c
> > > +++ b/arch/powerpc/platforms/ps3/os-area.c
> > > @@ -664,7 +664,7 @@ static int update_flash_db(void)
> > >   db_set_64(db, &os_area_db_id_rtc_diff, saved_params.rtc_diff);
> > >   count = os_area_flash_write(db, sizeof(struct os_area_db), pos);
> > > - if (count < sizeof(struct os_area_db)) {
> > > + if (count < 0 || count < sizeof(struct os_area_db)) {
> > 
> > Why not simply add a cast ? :
> > 
> > if (count < (ssize_t)sizeof(struct os_area_db)) {
> > 
> 
> There are so many ways to solve these and no accounting for taste.  Do
> you need me to resend or can you redo it yourself?
> 

Btw, I just went on vacation, and I'm not going to be back until next
week.

regards,
dan carpenter



Re: [PATCH] powerpc: signedness bug in update_flash_db()

2018-10-01 Thread Dan Carpenter
On Mon, Oct 01, 2018 at 08:22:01PM +0200, christophe leroy wrote:
> 
> 
> On 01/10/2018 at 18:44, Dan Carpenter wrote:
> > The "count < sizeof(struct os_area_db)" comparison is type promoted to
> > size_t so negative values of "count" are treated as very high values and
> > we accidentally return success instead of a negative error code.
> > 
> > This doesn't really change runtime much but it fixes a static checker
> > warning.
> > 
> > Signed-off-by: Dan Carpenter 
> > 
> > diff --git a/arch/powerpc/platforms/ps3/os-area.c 
> > b/arch/powerpc/platforms/ps3/os-area.c
> > index cdbfc5cfd6f3..f5387ad82279 100644
> > --- a/arch/powerpc/platforms/ps3/os-area.c
> > +++ b/arch/powerpc/platforms/ps3/os-area.c
> > @@ -664,7 +664,7 @@ static int update_flash_db(void)
> > db_set_64(db, &os_area_db_id_rtc_diff, saved_params.rtc_diff);
> > count = os_area_flash_write(db, sizeof(struct os_area_db), pos);
> > -   if (count < sizeof(struct os_area_db)) {
> > +   if (count < 0 || count < sizeof(struct os_area_db)) {
> 
> Why not simply add a cast ? :
> 
> if (count < (ssize_t)sizeof(struct os_area_db)) {
> 

There are so many ways to solve these and no accounting for taste.  Do
you need me to resend or can you redo it yourself?

regards,
dan carpenter



[PATCH] migration/mm: Add WARN_ON to try_offline_node

2018-10-01 Thread Michael Bringmann
In some LPAR migration scenarios, device-tree modifications are
made to the affinity of the memory in the system.  For instance,
it may occur that memory is installed to nodes 0,3 on a source
system, and to nodes 0,2 on a target system.  Node 2 may not
have been initialized/allocated on the target system.

After migration, if a RTAS PRRN memory remove is made to a
memory block that was in node 3 on the source system, then
try_offline_node tries to remove it from node 2 on the target.
The NODE_DATA(2) block would not be initialized on the target,
and there is no validation check in the current code to prevent
the use of a NULL pointer.  Call traces such as the following
may be observed:

A similar problem of moving memory to an unitialized node has
also been observed on systems where multiple PRRN events occur
prior to a complete update of the device-tree.

pseries-hotplug-mem: Attempting to update LMB, drc index 8002
Offlined Pages 4096
...
Oops: Kernel access of bad area, sig: 11 [#1]
...
Workqueue: pseries hotplug workque pseries_hp_work_fn
...
NIP [c02bc088] try_offline_node+0x48/0x1e0
LR [c02e0b84] remove_memory+0xb4/0xf0
Call Trace:
[c002bbee7a30] [c002bbee7a70] 0xc002bbee7a70 (unreliable)
[c002bbee7a70] [c02e0b84] remove_memory+0xb4/0xf0
[c002bbee7ab0] [c0097784] dlpar_remove_lmb+0xb4/0x160
[c002bbee7af0] [c0097f38] dlpar_memory+0x328/0xcb0
[c002bbee7ba0] [c00906d0] handle_dlpar_errorlog+0xc0/0x130
[c002bbee7c10] [c00907d4] pseries_hp_work_fn+0x94/0xa0
[c002bbee7c40] [c00e1cd0] process_one_work+0x1a0/0x4e0
[c002bbee7cd0] [c00e21b0] worker_thread+0x1a0/0x610
[c002bbee7d80] [c00ea458] kthread+0x128/0x150
[c002bbee7e30] [c000982c] ret_from_kernel_thread+0x5c/0xb0

This patch adds a check for an uninitialized node data structure to the
beginning of try_offline_node, and exits the routine early if so.

Another patch is being developed for powerpc to track the
node Id to which an LMB belongs, so that we can remove the
LMB from there instead of the nid as currently interpreted
from the device tree.

Signed-off-by: Michael Bringmann 
---
 mm/memory_hotplug.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 38d94b7..e48a4d0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1831,10 +1831,16 @@ static int check_and_unmap_cpu_on_node(pg_data_t *pgdat)
 void try_offline_node(int nid)
 {
pg_data_t *pgdat = NODE_DATA(nid);
-   unsigned long start_pfn = pgdat->node_start_pfn;
-   unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
+   unsigned long start_pfn;
+   unsigned long end_pfn;
unsigned long pfn;
 
+   if (WARN_ON(pgdat == NULL))
+   return;
+
+   start_pfn = pgdat->node_start_pfn;
+   end_pfn = start_pfn + pgdat->node_spanned_pages;
+
for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
unsigned long section_nr = pfn_to_section_nr(pfn);
 



[PATCH] powerpc/mobility: Extend start/stop topology update scope

2018-10-01 Thread Michael Bringmann
The PPC mobility code may receive RTAS requests to perform PRRN
topology changes at any time, including during LPAR migration
operations.  In some configurations where the affinity of CPUs
or memory is being changed on that platform, the PRRN requests
may apply or refer to outdated information prior to the complete
update of the device-tree.  This patch changes the duration for
which topology updates are suppressed during LPAR migrations from
just the rtas_ibm_suspend_me / 'ibm,suspend-me' call(s) to cover
the entire 'migration_store' operation to allow all changes to
the device-tree to be applied prior to accepting and applying any
PRRN requests.

For tracking purposes, pr_info notices are added to the functions
start_topology_update() and stop_topology_update() of 'numa.c'.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/kernel/rtas.c|4 
 arch/powerpc/mm/numa.c|6 ++
 arch/powerpc/platforms/pseries/mobility.c |5 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 8afd146..28d8b57 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -981,8 +981,6 @@ int rtas_ibm_suspend_me(u64 handle)
goto out;
}
 
-   stop_topology_update();
-
/* Call function on all CPUs.  One of us will make the
 * rtas call
 */
@@ -994,8 +992,6 @@ int rtas_ibm_suspend_me(u64 handle)
if (atomic_read() != 0)
printk(KERN_ERR "Error doing global join\n");
 
-   start_topology_update();
-
/* Take down CPUs not online prior to suspend */
cpuret = rtas_offline_cpus_mask(offline_mask);
if (cpuret)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b5a71ba..0ade0a1 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1518,6 +1518,10 @@ int start_topology_update(void)
}
}
 
+   pr_info("Starting topology update%s%s\n",
+   (prrn_enabled ? " prrn_enabled" : ""),
+   (vphn_enabled ? " vphn_enabled" : ""));
+
return rc;
 }
 
@@ -1539,6 +1543,8 @@ int stop_topology_update(void)
rc = del_timer_sync(_timer);
}
 
+   pr_info("Stopping topology update\n");
+
return rc;
 }
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 23fb9ac..49ebefd 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -373,6 +373,8 @@ static ssize_t migration_store(struct class *class,
if (rc)
return rc;
 
+   stop_topology_update();
+
do {
rc = rtas_ibm_suspend_me(streamid);
if (rc == -EAGAIN)
@@ -383,6 +385,9 @@ static ssize_t migration_store(struct class *class,
return rc;
 
post_mobility_fixup();
+
+   start_topology_update();
+
return count;
 }
 



Re: [PATCH] powerpc: signedness bug in update_flash_db()

2018-10-01 Thread christophe leroy




On 01/10/2018 at 18:44, Dan Carpenter wrote:

The "count < sizeof(struct os_area_db)" comparison is type promoted to
size_t so negative values of "count" are treated as very high values and
we accidentally return success instead of a negative error code.

This doesn't really change runtime much but it fixes a static checker
warning.

Signed-off-by: Dan Carpenter 

diff --git a/arch/powerpc/platforms/ps3/os-area.c 
b/arch/powerpc/platforms/ps3/os-area.c
index cdbfc5cfd6f3..f5387ad82279 100644
--- a/arch/powerpc/platforms/ps3/os-area.c
+++ b/arch/powerpc/platforms/ps3/os-area.c
@@ -664,7 +664,7 @@ static int update_flash_db(void)
db_set_64(db, &os_area_db_id_rtc_diff, saved_params.rtc_diff);
  
  	count = os_area_flash_write(db, sizeof(struct os_area_db), pos);

-   if (count < sizeof(struct os_area_db)) {
+   if (count < 0 || count < sizeof(struct os_area_db)) {


Why not simply add a cast ? :

if (count < (ssize_t)sizeof(struct os_area_db)) {


Christophe


pr_debug("%s: os_area_flash_write failed %zd\n", __func__,
 count);
error = count < 0 ? count : -EIO;






[PATCH] driver core: device: add BUS_ATTR_WO macro

2018-10-01 Thread Ioana Ciornei
Add BUS_ATTR_WO macro to make it easier to add attributes without
auditing the mode settings. Also, use the newly added macro where
appropriate.

Signed-off-by: Ioana Ciornei 
---
 arch/powerpc/platforms/pseries/ibmebus.c | 12 
 drivers/block/rbd.c  | 48 
 drivers/scsi/fcoe/fcoe_sysfs.c   |  4 +--
 drivers/scsi/fcoe/fcoe_transport.c   | 10 +++
 include/linux/device.h   |  2 ++
 include/scsi/libfcoe.h   |  8 +++---
 6 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/ibmebus.c 
b/arch/powerpc/platforms/pseries/ibmebus.c
index c7c1140..c75006c 100644
--- a/arch/powerpc/platforms/pseries/ibmebus.c
+++ b/arch/powerpc/platforms/pseries/ibmebus.c
@@ -261,8 +261,8 @@ static char *ibmebus_chomp(const char *in, size_t count)
return out;
 }
 
-static ssize_t ibmebus_store_probe(struct bus_type *bus,
-  const char *buf, size_t count)
+static ssize_t probe_store(struct bus_type *bus,
+  const char *buf, size_t count)
 {
struct device_node *dn = NULL;
struct device *dev;
@@ -298,10 +298,10 @@ static ssize_t ibmebus_store_probe(struct bus_type *bus,
return rc;
return count;
 }
-static BUS_ATTR(probe, 0200, NULL, ibmebus_store_probe);
+static BUS_ATTR_WO(probe);
 
-static ssize_t ibmebus_store_remove(struct bus_type *bus,
-   const char *buf, size_t count)
+static ssize_t remove_store(struct bus_type *bus,
+   const char *buf, size_t count)
 {
struct device *dev;
char *path;
@@ -325,7 +325,7 @@ static ssize_t ibmebus_store_remove(struct bus_type *bus,
return -ENODEV;
}
 }
-static BUS_ATTR(remove, 0200, NULL, ibmebus_store_remove);
+static BUS_ATTR_WO(remove);
 
 static struct attribute *ibmbus_bus_attrs[] = {
&bus_attr_probe.attr,
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 73ed5f3..703d875 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -428,14 +428,14 @@ enum rbd_dev_flags {
 module_param(single_major, bool, 0444);
 MODULE_PARM_DESC(single_major, "Use a single major number for all rbd devices 
(default: true)");
 
-static ssize_t rbd_add(struct bus_type *bus, const char *buf,
-  size_t count);
-static ssize_t rbd_remove(struct bus_type *bus, const char *buf,
- size_t count);
-static ssize_t rbd_add_single_major(struct bus_type *bus, const char *buf,
-   size_t count);
-static ssize_t rbd_remove_single_major(struct bus_type *bus, const char *buf,
-  size_t count);
+static ssize_t add_store(struct bus_type *bus, const char *buf,
+size_t count);
+static ssize_t remove_store(struct bus_type *bus, const char *buf,
+   size_t count);
+static ssize_t add_single_major_store(struct bus_type *bus, const char *buf,
+ size_t count);
+static ssize_t remove_single_major_store(struct bus_type *bus, const char *buf,
+size_t count);
 static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth);
 
 static int rbd_dev_id_to_minor(int dev_id)
@@ -469,10 +469,10 @@ static ssize_t rbd_supported_features_show(struct 
bus_type *bus, char *buf)
return sprintf(buf, "0x%llx\n", RBD_FEATURES_SUPPORTED);
 }
 
-static BUS_ATTR(add, 0200, NULL, rbd_add);
-static BUS_ATTR(remove, 0200, NULL, rbd_remove);
-static BUS_ATTR(add_single_major, 0200, NULL, rbd_add_single_major);
-static BUS_ATTR(remove_single_major, 0200, NULL, rbd_remove_single_major);
+static BUS_ATTR_WO(add);
+static BUS_ATTR_WO(remove);
+static BUS_ATTR_WO(add_single_major);
+static BUS_ATTR_WO(remove_single_major);
 static BUS_ATTR(supported_features, 0444, rbd_supported_features_show, NULL);
 
 static struct attribute *rbd_bus_attrs[] = {
@@ -5930,9 +5930,9 @@ static ssize_t do_rbd_add(struct bus_type *bus,
goto out;
 }
 
-static ssize_t rbd_add(struct bus_type *bus,
-  const char *buf,
-  size_t count)
+static ssize_t add_store(struct bus_type *bus,
+const char *buf,
+size_t count)
 {
if (single_major)
return -EINVAL;
@@ -5940,9 +5940,9 @@ static ssize_t rbd_add(struct bus_type *bus,
return do_rbd_add(bus, buf, count);
 }
 
-static ssize_t rbd_add_single_major(struct bus_type *bus,
-   const char *buf,
-   size_t count)
+static ssize_t add_single_major_store(struct bus_type *bus,
+ const char *buf,
+ size_t count)
 {
return do_rbd_add(bus, buf, count);
 }
@@ -6046,9 +6046,9 @@ static 

Re: [RFC PATCH 11/11] selftests/powerpc: Adapt the test

2018-10-01 Thread Breno Leitao
Hi Mikey,

On 09/28/2018 02:25 AM, Michael Neuling wrote:
>> Perfect, and if the transaction fail, the CPU will rollback the changes and
>> restore the checkpoint registers (replacing the r3 that contains the pid
>> value), thus, it will be like "getpid" system call didn't execute.
> 
> No.  If we are suspended, then we go back right after the sc. We don't get
> rolled back till the tresume.

Yes, but the test code (below) just runs tresume after 'sc', i.e. the syscall
will execute, but there is a tresume just after the syscall, which will cause
the transaction to roll back and jump to the "1:" label, which replaces r3
with -1.

So, the difference now (with this patchset) is that we are calling
treclaim/trecheckpoint in kernel space, which will doom the transaction. That
was not being done before, which is why the test used to pass and no longer does.

Anyway, the other way to check for it is to call 'blr' just after 'sc'
(before 'tresume.'), and then tresume after checking whether pid == getpid().

If you prefer this method, I can implement it.

>> For this test specifically, it assumes the syscall didn't execute if the
>> transaction failed. Take a look:
>>
>>  FUNC_START(getppid_tm_suspended)
>>  tbegin.
>>  beq 1f
>>  li  r0, __NR_getppid
>>  tsuspend.
>>  sc
>>  tresume.
>>  tend.
>>  blr
>>  1:
>>  li  r3, -1
>>  blr
>>

Thank you!


[PATCH] powerpc: signedness bug in update_flash_db()

2018-10-01 Thread Dan Carpenter
The "count < sizeof(struct os_area_db)" comparison is type promoted to
size_t, so negative values of "count" are treated as very high values and
we accidentally return success instead of a negative error code.

This doesn't really change runtime much but it fixes a static checker
warning.

Signed-off-by: Dan Carpenter 

diff --git a/arch/powerpc/platforms/ps3/os-area.c b/arch/powerpc/platforms/ps3/os-area.c
index cdbfc5cfd6f3..f5387ad82279 100644
--- a/arch/powerpc/platforms/ps3/os-area.c
+++ b/arch/powerpc/platforms/ps3/os-area.c
@@ -664,7 +664,7 @@ static int update_flash_db(void)
db_set_64(db, _area_db_id_rtc_diff, saved_params.rtc_diff);
 
count = os_area_flash_write(db, sizeof(struct os_area_db), pos);
-   if (count < sizeof(struct os_area_db)) {
+   if (count < 0 || count < sizeof(struct os_area_db)) {
pr_debug("%s: os_area_flash_write failed %zd\n", __func__,
 count);
error = count < 0 ? count : -EIO;


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-01 Thread Dave Hansen
> How should a policy in user space look like when new memory gets added
> - on s390x? Not onlining paravirtualized memory is very wrong.

Because we're going to balloon it away in a moment anyway?

We have auto-onlining.  Why isn't that being used on s390?


> So the type of memory is very important here to have in user space.
> Relying on checks like "isS390()", "isKVMGuest()" or "isHyperVGuest()"
> to decide whether to online memory and how to online memory is wrong.
> Only some specific memory types (which I call "normal") are to be
> handled by user space.
> 
> For the other ones, we exactly know what to do:
> - standby? don't online

I think you're horribly conflating the software desire for what the state
should be with the hardware itself.

>> As for the OOM issues, that sounds like something we need to fix by
>> refusing to do (or delaying) hot-add operations once we consume too much
>> ZONE_NORMAL from memmap[]s rather than trying to indirectly tell
>> userspace to hurry thing along.
> 
> That is a moving target and doing that automatically is basically
> impossible.

Nah.  We know how much metadata we've allocated.  We know how much
ZONE_NORMAL we are eating.  We can *easily* add something to
add_memory() that just sleeps until the ratio is not out-of-whack.

> You can add a lot of memory to the movable zone and
> everything is fine. Suddenly a lot of processes are started - boom.
> MOVABLE should only every be used if you expect an unplug. And for
> paravirtualized devices, a "typical" unplug does not exist.

No, it's more complicated than that.  People use MOVABLE, for instance,
to allow more consistent huge page allocations.  It's certainly not just
hot-remove.


[PATCH v4 7/9] powerpc: enable building all dtbs

2018-10-01 Thread Rob Herring
Enable the 'dtbs' target for powerpc. This allows building all the dts
files in arch/powerpc/boot/dts/ when COMPILE_TEST and OF_ALL_DTBS are
enabled.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 arch/powerpc/boot/dts/Makefile | 5 +
 arch/powerpc/boot/dts/fsl/Makefile | 4 
 2 files changed, 9 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/fsl/Makefile

diff --git a/arch/powerpc/boot/dts/Makefile b/arch/powerpc/boot/dts/Makefile
index f66554cd5c45..fb335d05aae8 100644
--- a/arch/powerpc/boot/dts/Makefile
+++ b/arch/powerpc/boot/dts/Makefile
@@ -1 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
+
+subdir-y += fsl
+
+dtstree:= $(srctree)/$(src)
+dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts))
diff --git a/arch/powerpc/boot/dts/fsl/Makefile b/arch/powerpc/boot/dts/fsl/Makefile
new file mode 100644
index ..3bae982641e9
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+dtstree:= $(srctree)/$(src)
+dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts))
-- 
2.17.1



[PATCH v4 6/9] kbuild: consolidate Devicetree dtb build rules

2018-10-01 Thread Rob Herring
There is nothing arch specific about building dtb files other than their
location under /arch/*/boot/dts/. Keeping each arch aligned is a pain.
The dependencies and supported targets are all slightly different.
Also, a cross-compiler for each arch is needed, but really the host
compiler preprocessor is perfectly fine for building dtbs. Move the
build rules to a common location and remove the arch specific ones. This
is done in a single step to avoid warnings about overriding rules.

The build dependencies had been a mixture of 'scripts' and/or 'prepare'.
These pull in several dependencies some of which need a target compiler
(specifically devicetable-offsets.h) and aren't needed to build dtbs.
All that is really needed is dtc, so adjust the dependencies to only be
dtc.

This change enables support for 'dtbs_install' on some arches that were
missing the target.

Acked-by: Will Deacon 
Acked-by: Paul Burton 
Acked-by: Ley Foon Tan 
Cc: Masahiro Yamada 
Cc: Michal Marek 
Cc: Vineet Gupta 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Yoshinori Sato 
Cc: Michal Simek 
Cc: Ralf Baechle 
Cc: James Hogan 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-kbu...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-m...@linux-mips.org
Cc: nios2-...@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-xte...@linux-xtensa.org
Signed-off-by: Rob Herring 
---
v4:
 - Make dtbs and %.dtb rules depend on arch/$ARCH/boot/dts path rather than
   CONFIG_OF_EARLY_FLATTREE
 - Fix install path missing kernel version for dtbs_install
 - Fix "make CONFIG_OF_ALL_DTBS=y" for arches like ARM which selectively
   enable CONFIG_OF (and therefore dtc)


 Makefile  | 37 ++-
 arch/arc/Makefile |  6 -
 arch/arm/Makefile | 20 +
 arch/arm64/Makefile   | 17 +-
 arch/c6x/Makefile |  2 --
 arch/h8300/Makefile   | 11 +
 arch/microblaze/Makefile  |  4 +---
 arch/microblaze/boot/dts/Makefile |  2 ++
 arch/mips/Makefile| 15 +
 arch/nds32/Makefile   |  2 +-
 arch/nios2/Makefile   |  7 --
 arch/nios2/boot/Makefile  |  4 
 arch/powerpc/Makefile |  3 ---
 arch/xtensa/Makefile  | 12 +-
 scripts/Makefile  |  3 +--
 scripts/Makefile.lib  |  2 +-
 scripts/dtc/Makefile  |  2 +-
 17 files changed, 48 insertions(+), 101 deletions(-)

diff --git a/Makefile b/Makefile
index 6c3da3e10f07..251875470c5b 100644
--- a/Makefile
+++ b/Makefile
@@ -1061,7 +1061,7 @@ include/config/kernel.release: $(srctree)/Makefile FORCE
 # Carefully list dependencies so we do not try to build scripts twice
 # in parallel
 PHONY += scripts
-scripts: scripts_basic asm-generic gcc-plugins $(autoksyms_h)
+scripts: scripts_basic scripts_dtc asm-generic gcc-plugins $(autoksyms_h)
$(Q)$(MAKE) $(build)=$(@)

 # Things we need to do before we recursively start building the kernel
@@ -1205,6 +1205,35 @@ kselftest-merge:
$(srctree)/tools/testing/selftests/*/config
+$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig

+# ---
+# Devicetree files
+
+ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/boot/dts/),)
+dtstree := arch/$(SRCARCH)/boot/dts
+endif
+
+ifneq ($(dtstree),)
+
+%.dtb: prepare3 scripts_dtc
+   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
+
+PHONY += dtbs dtbs_install
+dtbs: prepare3 scripts_dtc
+   $(Q)$(MAKE) $(build)=$(dtstree)
+
+dtbs_install:
+   $(Q)$(MAKE) $(dtbinst)=$(dtstree)
+
+ifdef CONFIG_OF_EARLY_FLATTREE
+all: dtbs
+endif
+
+endif
+
+PHONY += scripts_dtc
+scripts_dtc: scripts_basic
+   $(Q)$(MAKE) $(build)=scripts/dtc
+
 # ---
 # Modules

@@ -1414,6 +1443,12 @@ help:
	@echo  '  kselftest-merge - Merge all the config dependencies of kselftest to existing'
@echo  '.config.'
@echo  ''
+   @$(if $(dtstree), \
+   echo 'Devicetree:'; \
+	echo 'Devicetree:'; \
+	echo '* dtbs- Build device tree blobs for enabled boards'; \
+	echo '  dtbs_install- Install dtbs to $(INSTALL_DTBS_PATH)'; \
+   echo '')
+
@echo 'Userspace tools targets:'
@echo '  use "make tools/help"'
@echo '  or  "cd tools; make help"'
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index 99cce77ab98f..caece8866080 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -124,11 +124,5 @@ boot_targets += uImage uImage.bin uImage.gz
 $(boot_targets): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@

-%.dtb %.dtb.S %.dtb.o: 

[PATCH v4 1/9] powerpc: build .dtb files in dts directory

2018-10-01 Thread Rob Herring
Align powerpc with other architectures which build the dtb files in the
same directory as the dts files. This is also in line with most other
build targets which are located in the same directory as the source.
This move will help enable the 'dtbs' target which builds all the dtbs
regardless of kernel config.

This transition could break some scripts if they expect dtb files in the
old location.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---

PPC maintainers, really need your review/ack on this.

 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/boot/Makefile | 55 --
 arch/powerpc/boot/dts/Makefile |  1 +
 3 files changed, 28 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/Makefile

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 11a1acba164a..53ea887eb34e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -294,7 +294,7 @@ bootwrapper_install:
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)

 %.dtb: scripts
-   $(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
+   $(Q)$(MAKE) $(build)=$(boot)/dts $(patsubst %,$(boot)/dts/%,$@)

 # Used to create 'merged defconfigs'
 # To use it $(call) it with the first argument as the base defconfig
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 0fb96c26136f..bca5c23767df 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -304,9 +304,9 @@ image-$(CONFIG_PPC_ADDER875)+= 
cuImage.adder875-uboot \
   dtbImage.adder875-redboot

 # Board ports in arch/powerpc/platform/52xx/Kconfig
-image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200 lite5200.dtb
-image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200b lite5200b.dtb
-image-$(CONFIG_PPC_MEDIA5200)  += cuImage.media5200 media5200.dtb
+image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200
+image-$(CONFIG_PPC_LITE5200)   += cuImage.lite5200b
+image-$(CONFIG_PPC_MEDIA5200)  += cuImage.media5200

 # Board ports in arch/powerpc/platform/82xx/Kconfig
 image-$(CONFIG_MPC8272_ADS)+= cuImage.mpc8272ads
@@ -381,11 +381,11 @@ $(addprefix $(obj)/, $(sort $(filter zImage.%, 
$(image-y: vmlinux $(wrapperb
$(call if_changed,wrap,$(subst $(obj)/zImage.,,$@))

 # dtbImage% - a dtbImage is a zImage with an embedded device tree blob
-$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE
-   $(call if_changed,wrap,$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE
+   $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/%.dtb FORCE
-   $(call if_changed,wrap,$*,,$(obj)/$*.dtb)
+$(obj)/dtbImage.%: vmlinux $(wrapperbits) $(obj)/dts/%.dtb FORCE
+   $(call if_changed,wrap,$*,,$(obj)/dts/$*.dtb)

 # This cannot be in the root of $(src) as the zImage rule always adds a $(obj)
 # prefix
@@ -395,36 +395,33 @@ $(obj)/vmlinux.strip: vmlinux
 $(obj)/uImage: vmlinux $(wrapperbits) FORCE
$(call if_changed,wrap,uboot)

-$(obj)/uImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/uImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/uImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,uboot-$*,,$(obj)/$*.dtb)
+$(obj)/uImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,uboot-$*,,$(obj)/dts/$*.dtb)

-$(obj)/cuImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/cuImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/cuImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,cuboot-$*,,$(obj)/$*.dtb)
+$(obj)/cuImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,cuboot-$*,,$(obj)/dts/$*.dtb)

-$(obj)/simpleImage.initrd.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb,$(obj)/ramdisk.image.gz)
+$(obj)/simpleImage.initrd.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb,$(obj)/ramdisk.image.gz)

-$(obj)/simpleImage.%: vmlinux $(obj)/%.dtb $(wrapperbits) FORCE
-   $(call if_changed,wrap,simpleboot-$*,,$(obj)/$*.dtb)
+$(obj)/simpleImage.%: vmlinux $(obj)/dts/%.dtb $(wrapperbits) FORCE
+   $(call if_changed,wrap,simpleboot-$*,,$(obj)/dts/$*.dtb)

-$(obj)/treeImage.initrd.%: vmlinux 

[PATCH v4 0/9] Devicetree build consolidation

2018-10-01 Thread Rob Herring
This series addresses a couple of issues I have with building dts files.

First, the ability to build all the dts files in the tree. This has been
supported on most arches for some time with powerpc being the main
exception. The reason powerpc wasn't supported was that it needed a change
in the location where built dtb files are put.

Secondly, it's a pain to acquire all the cross-compilers needed to build
dtbs for each arch. There's no reason to build with a cross compiler; the
host compiler is perfectly fine as we only need the preprocessor.

I started addressing just those 2 problems, but kept finding small
differences such as target dependencies and dtbs_install support across
architectures. Instead of trying to align all these, I've consolidated the
build targets moving them out of the arch makefiles.

I'd like to take the series via the DT tree.

PPC maintainers, really need your review/ack on this, especially patch 1.

Rob


v4:
 - Make dtbs and %.dtb rules depend on arch/$ARCH/boot/dts path rather than
   CONFIG_OF_EARLY_FLATTREE
 - Fix install path missing kernel version for dtbs_install
 - Fix "make CONFIG_OF_ALL_DTBS=y" for arches like ARM which selectively
   enable CONFIG_OF (and therefore dtc)

v3:
 - Rework dtc dependency to avoid 2 entry paths to scripts/dtc/. Essentially,
   I copied 'scripts_basic'.
 - Add missing scripts_basic dependency for dtc and missing PHONY tag.
 - Drop the '|' order only from dependencies
 - Drop %.dtb.S and %.dtb.o as top-level targets
 - PPC: remove duplicate mpc5200 dtbs from image-y targets

v2:
 - Fix $arch/boot/dts path check for out of tree builds
 - Fix dtc dependency for building built-in dtbs
 - Fix microblaze built-in dtb building
 - Add dtbs target for microblaze


Rob Herring (9):
  powerpc: build .dtb files in dts directory
  nios2: build .dtb files in dts directory
  nios2: use common rules to build built-in dtb
  nios2: fix building all dtbs
  c6x: use common built-in dtb support
  kbuild: consolidate Devicetree dtb build rules
  powerpc: enable building all dtbs
  c6x: enable building all dtbs
  microblaze: enable building all dtbs

 Makefile   | 37 +++-
 arch/arc/Makefile  |  6 
 arch/arm/Makefile  | 20 +--
 arch/arm64/Makefile| 17 +
 arch/c6x/Makefile  |  2 --
 arch/c6x/boot/dts/Makefile | 17 -
 arch/c6x/boot/dts/linked_dtb.S |  2 --
 arch/c6x/include/asm/sections.h|  1 -
 arch/c6x/kernel/setup.c|  4 +--
 arch/c6x/kernel/vmlinux.lds.S  | 10 --
 arch/h8300/Makefile| 11 +-
 arch/microblaze/Makefile   |  4 +--
 arch/microblaze/boot/dts/Makefile  |  4 +++
 arch/mips/Makefile | 15 +---
 arch/nds32/Makefile|  2 +-
 arch/nios2/Makefile| 11 +-
 arch/nios2/boot/Makefile   | 22 
 arch/nios2/boot/dts/Makefile   |  6 
 arch/nios2/boot/linked_dtb.S   | 19 ---
 arch/powerpc/Makefile  |  3 --
 arch/powerpc/boot/Makefile | 55 ++
 arch/powerpc/boot/dts/Makefile |  6 
 arch/powerpc/boot/dts/fsl/Makefile |  4 +++
 arch/xtensa/Makefile   | 12 +--
 scripts/Makefile   |  3 +-
 scripts/Makefile.lib   |  2 +-
 scripts/dtc/Makefile   |  2 +-
 27 files changed, 102 insertions(+), 195 deletions(-)
 delete mode 100644 arch/c6x/boot/dts/linked_dtb.S
 create mode 100644 arch/nios2/boot/dts/Makefile
 delete mode 100644 arch/nios2/boot/linked_dtb.S
 create mode 100644 arch/powerpc/boot/dts/Makefile
 create mode 100644 arch/powerpc/boot/dts/fsl/Makefile

--
2.17.1


Re: [PATCH v2] powerpc/rtas: Fix a potential race between CPU-Offline & Migration

2018-10-01 Thread Nathan Fontenot
On 10/01/2018 05:40 AM, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> Live Partition Migrations require all the present CPUs to execute the
> H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
> before initiating the migration for this purpose.
> 
> The commit 85a88cabad57
> ("powerpc/pseries: Disable CPU hotplug across migrations")
> disables any CPU-hotplug operations once all the offline CPUs are
> brought online to prevent any further state change. Once the
> CPU-Hotplug operation is disabled, the code assumes that all the CPUs
> are online.
> 
> However, there is a minor window in rtas_ibm_suspend_me() between
> onlining the offline CPUs and disabling CPU-Hotplug when a concurrent
> CPU-offline operation initiated by userspace can succeed, thereby
> nullifying the aforementioned assumption. In this unlikely case
> these offlined CPUs will not call H_JOIN, resulting in a system hang.
> 
> Fix this by verifying that all the present CPUs are actually online
> after CPU-Hotplug has been disabled, failing which we restore the
> state of the offline CPUs in rtas_ibm_suspend_me() and return an
> -EBUSY.
> 
> Cc: Nathan Fontenot 
> Cc: Tyrel Datwyler 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Nathan Fontenot 

> ---
> v2: Restore the state of the offline CPUs if all CPUs aren't onlined.
> 
>  arch/powerpc/kernel/rtas.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 2c7ed31..d4468cb 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -982,6 +982,15 @@ int rtas_ibm_suspend_me(u64 handle)
>   }
> 
>   cpu_hotplug_disable();
> +
> + /* Check if we raced with a CPU-Offline Operation */
> + if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
> + pr_err("%s: Raced against a concurrent CPU-Offline\n",
> +__func__);
> + atomic_set(, -EBUSY);
> + goto out_hotplug_enable;
> + }
> +
>   stop_topology_update();
> 
>   /* Call function on all CPUs.  One of us will make the
> @@ -996,6 +1005,8 @@ int rtas_ibm_suspend_me(u64 handle)
>   printk(KERN_ERR "Error doing global join\n");
> 
>   start_topology_update();
> +
> +out_hotplug_enable:
>   cpu_hotplug_enable();
> 
>   /* Take down CPUs not online prior to suspend */
> 



Re: [PATCH v3 6/9] kbuild: consolidate Devicetree dtb build rules

2018-10-01 Thread Masahiro Yamada
Hi Rob,


Mon, Oct 1, 2018 at 22:26 Rob Herring :
>
> On Mon, Oct 1, 2018 at 12:49 AM Masahiro Yamada
>  wrote:
> >
> > Hi Rob,
> >
> >
> > Sat, Sep 29, 2018 at 0:43 Rob Herring :
> >
> > > +#
> > > ---
> > > +# Devicetree files
> > > +
> > > +ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/boot/dts/),)
> > > +dtstree := arch/$(SRCARCH)/boot/dts
> > > +endif
> > > +
> > > +ifneq ($(dtstree),)
> > > +
> > > +%.dtb : scripts_dtc
> >
> > %.dtb: prepare3 prepare
>
> I assume you didn't mean to drop scripts_dtc as that doesn't work.
>
> Why "prepare" here and not on dtbs?


Sorry, my mistake.


%.dtb: prepare3 scripts_dtc

is the correct one.



> > because we need to make sure KERNELRELEASE
> > is correctly defined before dtbs_install happens.
>
> Yes, indeed. With prepare3 added I get:
>
> cp: cannot create regular file '/boot/dtbs/4.19.0-rc3-9-g0afba9b7b2ea-dirty': No such file or directory
>
> vs. with it:
>
> cp: cannot create regular file '/boot/dtbs/': Not a directory
>
> >
> >
> > > +   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
> > > +
> > > +PHONY += dtbs dtbs_install
> > > +dtbs: scripts_dtc
> >
> >
> > dtbs: prepare3 scripts_dtc
> >
> >
> >
> > > +   $(Q)$(MAKE) $(build)=$(dtstree)
> > > +
> > > +dtbs_install: dtbs
> >
> >
> > Please do not have dtbs_install to depend on dtbs.
> >
> > No install targets should ever trigger building anything
> > in the source tree.
> >
> >
> > For the background, see the commit log of
> > 19514fc665ffbce624785f76ee7ad0ea6378a527
>
> Okay, thanks.
>
> Rob



-- 
Best Regards
Masahiro Yamada


Re: dma mask related fixups (including full bus_dma_mask support) v2

2018-10-01 Thread Christoph Hellwig
FYI, I've pulled this into the dma-mapping tree to make forward
progress.  All but patch 4 had formal ACKs, and for that one Robin
was fine even without an explicit ack.  I'll also send a patch to
better document the zone selection as it confuses even well-versed
people like Ben.


Re: [PATCH ppc-next] powerpc/fsl-booke: don't load early TLB at once

2018-10-01 Thread David Lamparter
On Sat, Sep 22, 2018 at 12:45:16AM -0500, Scott Wood wrote:
> I don't suppose you're running a relocatable kernel at a non-zero

# CONFIG_RELOCATABLE is not set

> address, and/or are running in an environment that sets
> HID0[EN_L2MMU_MHD] (neither standard U-boot nor Linux sets this bit,

It's u-boot v2018.09-rc3; bit "33" / 30 isn't even defined as a
constant, much less touched anywhere.  HID0 in u-boot seems to be
0x80000080, aka EMCP | EN_MAS7_UPDATE.

> though they probably should)?  On 32-bit, we're already running in an AS1
> trampoline when loadcam_multi() is called, but loadcam_multi() sets up
> its own.  This happens to not be catastrophic in standard scenarios, but
> it does add a duplicate TLB entry, and we return to AS0 sooner than
> expected.  I think your patch, plus ifdefs to make the change 32-bit
> only, is the appropriate fix.

I'll resend it with ifdefs inserted.  Meanwhile, still trying the
default config as mentioned in my previous mail.

> I also got an earlier udbg for e500 working (and happened to decide to
> test it with a relocatable kernel); I'll send that out once I've cleaned
> it up (or sooner with the extra TLB dumping included if the above doesn't
> explain why you're hitting this bug).

Thanks & Cheers,


-David


Re: [PATCH v9 0/3] powerpc: Detection and scheduler optimization for POWER9 bigcore

2018-10-01 Thread Dave Hansen
On 10/01/2018 06:16 AM, Gautham R. Shenoy wrote:
> 
> Patch 3: Creates a pair of sysfs attributes named
> /sys/devices/system/cpu/cpuN/topology/smallcore_thread_siblings
> and
> /sys/devices/system/cpu/cpuN/topology/smallcore_thread_siblings_list
> exposing the small-core siblings that share the L1 cache
> to the userspace.

I really don't think you've justified the existence of a new user/kernel
interface here.  We already have information about which threads share L1
caches in here:

/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list

The only question would be if anything would break because it assumes
that all SMT siblings share all caches.  But, it breaks if your new
interface is there or not; it's old software that we care about.


Re: [PATCH v3 6/9] kbuild: consolidate Devicetree dtb build rules

2018-10-01 Thread Rob Herring
On Mon, Oct 1, 2018 at 12:49 AM Masahiro Yamada
 wrote:
>
> Hi Rob,
>
>
> Sat, Sep 29, 2018 at 0:43 Rob Herring :
>
> > +#
> > ---
> > +# Devicetree files
> > +
> > +ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/boot/dts/),)
> > +dtstree := arch/$(SRCARCH)/boot/dts
> > +endif
> > +
> > +ifneq ($(dtstree),)
> > +
> > +%.dtb : scripts_dtc
>
> %.dtb: prepare3 prepare

I assume you didn't mean to drop scripts_dtc as that doesn't work.

Why "prepare" here and not on dtbs?

> because we need to make sure KERNELRELEASE
> is correctly defined before dtbs_install happens.

Yes, indeed. With prepare3 added I get:

cp: cannot create regular file '/boot/dtbs/4.19.0-rc3-9-g0afba9b7b2ea-dirty': No such file or directory

vs. with it:

cp: cannot create regular file '/boot/dtbs/': Not a directory

>
>
> > +   $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@
> > +
> > +PHONY += dtbs dtbs_install
> > +dtbs: scripts_dtc
>
>
> dtbs: prepare3 scripts_dtc
>
>
>
> > +   $(Q)$(MAKE) $(build)=$(dtstree)
> > +
> > +dtbs_install: dtbs
>
>
> Please do not have dtbs_install to depend on dtbs.
>
> No install targets should ever trigger building anything
> in the source tree.
>
>
> For the background, see the commit log of
> 19514fc665ffbce624785f76ee7ad0ea6378a527

Okay, thanks.

Rob


Re: [PATCH ppc-next] powerpc/fsl-booke: don't load early TLB at once

2018-10-01 Thread David Lamparter
(Sorry about the delay on my end, deadlines ...)

On Fri, Sep 21, 2018 at 06:07:48PM +, York Sun wrote:
> On 09/21/2018 10:47 AM, Scott Wood wrote:
> > On Fri, 2018-09-21 at 17:40 +, York Sun wrote:
> >> On 09/20/2018 05:31 PM, Scott Wood wrote:
> >>> On Fri, 2018-09-21 at 00:48 +0200, David Lamparter wrote:
>  My dusty old P4080DS just completely fails to boot (no output at all)
>  without this revert.  I have no clue what's going on here, I just
>  bisected it down and since it looks like an optimization to me I just
>  reverted it - and voilá, the P4080 boots again.
> >>>
> >>> It's not an optimization; it was required to get kdump working, at least
> >>> for certain choices of crash kernel location.
[...]
> >>> York, can you try booting the latest kernel on p4080ds?
[...]
>
> Thanks for the instruction. Linux comes up OK with corenet32_smp_defconfig.
>
> root@p4080ds:~# uname -a
> Linux p4080ds 4.19.0-rc4-00206-ga27fb6d #1 SMP Fri Sep 21 10:56:36 PDT
> 2018 ppc GNU/Linux

Well, shoot.  I guess the next likely thing is that I have something
weird enabled in my .config, so I'll try booting corenet32_smp_defconfig
on my P4080DS.  (I vaguely think I did that at some point but I can't
swear to it so I'll just retry.)  If that doesn't boot, could either of
you provide me with your built uImage so I can exclude toolchain
stupidity?

=> I'll send another mail when I got to try corenet32_smp_defconfig.


Aaaand, during the past week I noticed this is a rev1.0 P4080 chip, which
apparently only exists on some early P4080DS boards - and Freescale even
did a replacement program to get rev2.0 boards out (I guess this one was
sitting in some broom closet.)  I don't have access to errata for this,
I only know the entire QMan/FMan stuff is royally f*cked - no idea
whether the PPC core has issues too.

FWIW, my board is running perfectly stable with that patch I posted and
I'll just carry it locally if it's an issue for this one specific board
I have here.  I just don't have sufficient information to tell if that
is indeed the case.

Thanks a lot for your input and help,


-David


Re: [PATCH] powerpc/numa: Skip onlining a offline node in kdump path

2018-10-01 Thread Hari Bathini

Thanks for the fix, Srikar..


On Friday 28 September 2018 09:17 AM, Srikar Dronamraju wrote:

With Commit 2ea626306810 ("powerpc/topology: Get topology for shared
processors at boot"), kdump kernel on shared lpar may crash.

The necessary conditions are
- Shared Lpar with at least 2 nodes having memory and CPUs.
- Memory requirement for kdump kernel must be met by the first N-1 nodes
   where there are at least N nodes with memory and CPUs.

Example numactl of such a machine.
  numactl -H
available: 5 nodes (0,2,5-7)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 255 MB
node 2 free: 189 MB
node 5 cpus: 24 25 26 27 28 29 30 31
node 5 size: 4095 MB
node 5 free: 4024 MB
node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 6 size: 6353 MB
node 6 free: 5998 MB
node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39
node 7 size: 7640 MB
node 7 free: 7164 MB
node distances:
node   0   2   5   6   7
   0:  10  40  40  40  40
   2:  40  10  40  40  40
   5:  40  40  10  40  40
   6:  40  40  40  10  20
   7:  40  40  40  20  10

Steps to reproduce.
1. Load / start kdump service.
2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger)

When booting a kdump kernel with 2048M
kexec: Starting switchover sequence.
I'm in purgatory
Using 1TB segments
hash-mmu: Initializing hash mmu with SLB
Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018
Found initrd at 0xc9e7:0xcae554b4
Using pSeries machine description
-
ppc64_pft_size= 0x1e
phys_mem_size = 0x8800
dcache_bsize  = 0x80
icache_bsize  = 0x80
cpu_features  = 0x00ff8f5d91a7
   possible= 0xfbffcf5fb1a7
   always  = 0x006f8b5c91a1
cpu_user_features = 0xdc0065c2 0xef00
mmu_features  = 0x7c006001
firmware_features = 0x0007c45bfc57
htab_hash_mask= 0x7f
physical_start= 0x800
-
numa:   NODE_DATA [mem 0x87d5e300-0x87d67fff]
numa: NODE_DATA(0) on node 6
numa:   NODE_DATA [mem 0x87d54600-0x87d5e2ff]
Top of RAM: 0x8800, Total RAM: 0x8800
Memory hole size: 0MB
Zone ranges:
   DMA  [mem 0x-0x87ff]
   DMA32empty
   Normal   empty
Movable zone start for each node
Early memory node ranges
   node   6: [mem 0x-0x87ff]
Could not find start_pfn for node 0
Initmem setup node 0 [mem 0x-0x]
On node 0 totalpages: 0
Initmem setup node 6 [mem 0x-0x87ff]
On node 6 totalpages: 34816

Unable to handle kernel paging request for data at address 0x0060
Faulting instruction address: 0xc8703a54
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1
NIP:  c8703a54 LR: c8703a38 CTR: 
REGS: cb673440 TRAP: 0380   Not tainted  (4.19.0-rc5-master+)
MSR:  82009033   CR: 24022022  XER: 2002
CFAR: c86fc238 IRQMASK: 0
GPR00: c8703a38 cb6736c0 c9281900 
GPR04:   f001 cb660080
GPR08:    0220
GPR12: 2200 c9e51400  0008
GPR16:  c8c152e8 c8c152a8 
GPR20: c9422fd8 c9412fd8 c9426040 0008
GPR24:   c9168bc8 c9168c78
GPR28: cb126410  c916a0b8 cb126400
NIP [c8703a54] bus_add_device+0x84/0x1e0
LR [c8703a38] bus_add_device+0x68/0x1e0
Call Trace:
[cb6736c0] [c8703a38] bus_add_device+0x68/0x1e0 (unreliable)
[cb673740] [c8700194] device_add+0x454/0x7c0
[cb673800] [c872e660] __register_one_node+0xb0/0x240
[cb673860] [c839a6bc] __try_online_node+0x12c/0x180
[cb673900] [c839b978] try_online_node+0x58/0x90
[cb673930] [c80846d8] find_and_online_cpu_nid+0x158/0x190
[cb673a10] [c80848a0] numa_update_cpu_topology+0x190/0x580
[cb673c00] [c8d3f2e4] smp_cpus_done+0x94/0x108
[cb673c70] [c8d5c00c] smp_init+0x174/0x19c
[cb673d00] [c8d346b8] kernel_init_freeable+0x1e0/0x450
[cb673dc0] [c80102e8] kernel_init+0x28/0x160
[cb673e30] [c800b65c] ret_from_kernel_thread+0x5c/0x80
Instruction dump:
6000 6000 e89e0020 7fe3fb78 4bff87d5 6000 7c7d1b79 4082008c
e8bf0050 e93e0098 3b9f0010 2fa5  38630018 419e0114 7f84e378
---[ end trace 593577668c2daa65 ]---

However, a regular kernel with 4096M (2048M reserved for the
crash kernel) boots properly.


[PATCH v9 3/3] powerpc/sysfs: Add topology/smallcore_thread_siblings[_list]

2018-10-01 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

This patch adds two sysfs attributes named smallcore_thread_siblings
and smallcore_thread_siblings_list to the "topology" attribute group
for each CPU device.

The read-only attributes
/sys/devices/system/cpu/cpuN/topology/smallcore_thread_siblings and
/sys/devices/system/cpu/cpuN/topology/smallcore_thread_siblings_list
will show the online siblings of CPU N that share the L1 cache with it
on big-core configurations, in cpumask format and cpu-list format
respectively.

Signed-off-by: Gautham R. Shenoy 
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 14 
 arch/powerpc/kernel/sysfs.c| 91 ++
 2 files changed, 105 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 7331822..2a80dc2 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -511,3 +511,17 @@ Description:   Control Symetric Multi Threading (SMT)
 
 If control status is "forceoff" or "notsupported" 
writes
 are rejected.
+
+What:  /sys/devices/system/cpu/cpu#/topology/smallcore_thread_siblings
+   
/sys/devices/system/cpu/cpu#/topology/smallcore_thread_siblings_list
+Date:  Sept 2018
+Contact:   Linux for PowerPC mailing list 
+Description:   CPU topology files that describe the thread siblings of a
+   logical CPU that share the L1-cache with it on POWER9
+   big-core configurations.
+
+   smallcore_thread_siblings: internal kernel map of
+   cpu#'s hardware threads that share L1-cache with cpu#.
+
+   smallcore_thread_siblings_list: human-readable list of
+   cpu#'s hardware threads that share L1-cache with cpu#.
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 755dc98..3049511 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "cacheinfo.h"
 #include "setup.h"
@@ -1060,3 +1061,93 @@ static int __init topology_init(void)
return 0;
 }
 subsys_initcall(topology_init);
+
+#ifdef CONFIG_SMP
+static ssize_t smallcore_thread_siblings_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   int cpu = dev->id;
+
+   return cpumap_print_to_pagebuf(false, buf, cpu_smallcore_mask(cpu));
+}
+static DEVICE_ATTR_RO(smallcore_thread_siblings);
+
+static ssize_t
+   smallcore_thread_siblings_list_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   int cpu = dev->id;
+
+   return cpumap_print_to_pagebuf(true, buf, cpu_smallcore_mask(cpu));
+}
+static DEVICE_ATTR_RO(smallcore_thread_siblings_list);
+
+static struct attribute *smallcore_attrs[] = {
+   &dev_attr_smallcore_thread_siblings.attr,
+   &dev_attr_smallcore_thread_siblings_list.attr,
+   NULL
+};
+
+static const struct attribute_group smallcore_attr_group = {
+   .name = "topology",
+   .attrs = smallcore_attrs
+};
+
+static int smallcore_register_cpu_online(unsigned int cpu)
+{
+   int err;
+   struct device *cpu_dev = get_cpu_device(cpu);
+
+   if (!has_big_cores)
+   return 0;
+
+   err = sysfs_merge_group(&cpu_dev->kobj, &smallcore_attr_group);
+
+   return err;
+}
+
+static int smallcore_unregister_cpu_online(unsigned int cpu)
+{
+   struct device *cpu_dev = get_cpu_device(cpu);
+
+   if (!has_big_cores)
+   return 0;
+
+   sysfs_unmerge_group(&cpu_dev->kobj, &smallcore_attr_group);
+
+   return 0;
+}
+
+/*
+ * NOTE: The smallcore_register_cpu_online
+ *   (resp. smallcore_unregister_cpu_online) callback will merge
+ *   (resp. unmerge) a couple of additional attributes to the
+ *   "topology" attribute group of a CPU device when the CPU comes
+ *   online (resp. goes offline).
+ *
+ *   Hence, the registration of these callbacks must happen after
+ *   topology_sysfs_init() is called so that the topology
+ *   attribute group is created before these additional attributes
+ *   can be merged/unmerged. We cannot register these callbacks in
+ *   topology_init() since this function is called before
+ *   topology_sysfs_init(). Hence we define the following
+ *   late_initcall for this purpose.
+ */
+static int __init smallcore_topology_init(void)
+{
+   int r;
+
+   if (!has_big_cores)
+   return 0;
+
+   r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+ "powerpc/topology/smallcore:online",
+ smallcore_register_cpu_online,
+ smallcore_unregister_cpu_online);
+   

[PATCH v9 1/3] powerpc: Detect the presence of big-cores via "ibm, thread-groups"

2018-10-01 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

On IBM POWER9, the device tree exposes a property array identified by
"ibm,thread-groups" which indicates which groups of threads share a
particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch adds helper functions to parse the contents of
"ibm,thread-groups" and populate a per-cpu variable to cache
information about siblings of each CPU that share the L1, translation
cache and instruction data-flow.

It also defines a new global variable named "has_big_cores" which
indicates if the cores on this configuration have multiple groups of
threads that share L1 cache.

For each online CPU, it maintains a cpu_smallcore_mask, which
indicates the online siblings which share the L1-cache with it.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/cputhreads.h |   2 +
 arch/powerpc/include/asm/smp.h|   6 +
 arch/powerpc/kernel/smp.c | 222 ++
 3 files changed, 230 insertions(+)

diff --git a/arch/powerpc/include/asm/cputhreads.h 
b/arch/powerpc/include/asm/cputhreads.h
index d71a909..deb99fd 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,11 +23,13 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
+extern bool has_big_cores;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core   1
 #define threads_per_subcore1
 #define threads_shift  0
+#define has_big_cores  0
 #define threads_core_mask  (*get_cpu_mask(0))
 #endif
 
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 95b66a0..4439893 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -100,6 +100,7 @@ static inline void set_hard_smp_processor_id(int cpu, int 
phys)
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_map);
+DECLARE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
 
 static inline struct cpumask *cpu_sibling_mask(int cpu)
 {
@@ -116,6 +117,11 @@ static inline struct cpumask *cpu_l2_cache_mask(int cpu)
return per_cpu(cpu_l2_cache_map, cpu);
 }
 
+static inline struct cpumask *cpu_smallcore_mask(int cpu)
+{
+   return per_cpu(cpu_smallcore_map, cpu);
+}
+
 extern int cpu_to_core_id(int cpu);
 
 /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 61c1fad..22a14a9 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,14 +74,32 @@
 #endif
 
 struct thread_info *secondary_ti;
+bool has_big_cores;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
+DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
 
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
 EXPORT_PER_CPU_SYMBOL(cpu_core_map);
+EXPORT_SYMBOL_GPL(has_big_cores);
+
+#define MAX_THREAD_LIST_SIZE   8
+#define THREAD_GROUP_SHARE_L1   1
+struct thread_groups {
+   unsigned int property;
+   unsigned int nr_groups;
+   unsigned int threads_per_group;
+   unsigned int thread_list[MAX_THREAD_LIST_SIZE];
+};
+
+/*
+ * On big-core systems, cpu_l1_cache_map for each CPU corresponds to
+ * the set of its siblings that share the L1-cache.
+ */
+DEFINE_PER_CPU(cpumask_var_t, cpu_l1_cache_map);
 
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
@@ -674,6 +692,185 @@ static void set_cpus_unrelated(int i, int j,
 }
 #endif
 
+/*
+ * parse_thread_groups: Parses the "ibm,thread-groups" device tree
+ *  property for the CPU device node @dn and stores
+ *  the parsed output in the thread_groups
+ *  structure @tg if the ibm,thread-groups[0]
+ *  matches @property.
+ *
+ * @dn: The device node of the CPU device.
+ * @tg: Pointer to a thread group structure into which the parsed
+ *  output of "ibm,thread-groups" is stored.
+ * @property: The property of the thread-group that the caller is
+ *interested in.
+ *
+ * ibm,thread-groups[0..N-1] array defines which group of threads in
+ * the CPU-device node can be grouped together based on the property.
+ *
+ * ibm,thread-groups[0] tells us the property based on which the
+ * threads are being grouped together. If this value is 1, it implies
+ * that the threads in the same group share L1, translation cache.
+ *
+ * ibm,thread-groups[1] tells us how many such thread groups exist.
+ *
+ * ibm,thread-groups[2] tells us the number of threads in each such
+ * group.
+ *
+ * ibm,thread-groups[3..N-1] is the list of threads identified by
+ * "ibm,ppc-interrupt-server#s" 

[PATCH v9 0/3] powerpc: Detection and scheduler optimization for POWER9 bigcore

2018-10-01 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Hi,

This is the ninth iteration of the patchset to add support for
big-core on POWER9. This patch also optimizes the task placement on
such big-core systems.

The previous versions can be found here:
v8: https://lkml.org/lkml/2018/9/20/899
v7: https://lkml.org/lkml/2018/8/20/52
v6: https://lkml.org/lkml/2018/8/9/119
v5: https://lkml.org/lkml/2018/8/6/587
v4: https://lkml.org/lkml/2018/7/24/79
v3: https://lkml.org/lkml/2018/7/6/255
v2: https://lkml.org/lkml/2018/7/3/401
v1: https://lkml.org/lkml/2018/5/11/245

Changes :
v8 --> v9:
   - Rebased it on v4.19-rc5
   - Updated the commit log for the second patch as per Dave Hansen's
   suggestion.
   - Fixed the build errors reported by Michael Neuling and the Kernel
   Build bot.

Description:


IBM POWER9 SMT8 cores consist of two groups of small-cores where each
group has its own L1 cache, translation cache and instruction-data
flow. This can be discovered via the "ibm,thread-groups" CPU property
in the device tree. Furthermore, on POWER9 the thread-ids of such a
big-core are obtained by interleaving the thread-ids of the two
small-cores.

Eg: In an SMT8 core with thread ids {0,1,2,3,4,5,6,7}, the thread-ids
of the threads in the two small-cores will be {0,2,4,6} and {1,3,5,7}
respectively.

            -------------------------
            |       L1 Cache        |
         ----------------------------
         |L2|     |     |     |     |
         |  |  0  |  2  |  4  |  6  |  Small Core0
         |C |     |     |     |     |
 Big     |a -------------------------
 Core    |c |     |     |     |     |
         |h |  1  |  3  |  5  |  7  |  Small Core1
         |e |     |     |     |     |
         ----------------------------
            |       L1 Cache        |
            -------------------------

On such a big-core system, when multiple tasks are scheduled to run on
the big-core, we get the best performance when the tasks are spread
across the pair of small-cores.

Eg: Suppose 4 tasks {p1, p2, p3, p4} are run on a big core, then

An example of optimal task placement:
        -------------------------
        |     |     |     |     |
        |  0  |  2  |  4  |  6  |  Small Core0
        | (p1)| (p2)|     |     |
 Big    -------------------------
 Core   |     |     |     |     |
        |  1  |  3  |  5  |  7  |  Small Core1
        |     | (p3)|     | (p4)|
        -------------------------

An example of suboptimal task placement:
        -------------------------
        |     |     |     |     |
        |  0  |  2  |  4  |  6  |  Small Core0
        | (p1)| (p2)|     | (p4)|
 Big    -------------------------
 Core   |     |     |     |     |
        |  1  |  3  |  5  |  7  |  Small Core1
        |     | (p3)|     |     |
        -------------------------

Currently on the big-core systems, the sched domain hierarchy is:

SMT   : group of CPUs in the SMT8 core.
DIE   : groups of CPUs on the same die.
NUMA  : all the CPUs in the system.

Thus the scheduler doesn't distinguish between the CPUs in the core
that share the L1-cache and the ones that don't, resulting in
run-to-run variance when multithreaded applications are run on an SMT8
core.

In this patch-set, we address this by defining the sched-domain on the
big-core systems to be:

SMT   : group of CPUs sharing the L1 cache
CACHE : group of CPUs in the SMT8 core.
DIE   : groups of CPUs on the same die.
NUMA  : all the CPUs in the system.

With this, the Linux Kernel load-balancer will ensure that the tasks
are spread across all the component small cores in the system, thereby
yielding optimum performance.

Furthermore, this solution works correctly across all SMT modes
(8,4,2), as the interleaved thread-ids ensures that when we go to
lower SMT modes (4,2) the threads are offlined in a descending order,
thereby leaving equal number of threads from the component small cores
online as illustrated below.

This patchset contains three patches which on detecting the presence
of big-cores, defines the SMT level sched domain to correspond to the
threads of the small cores.

Patch 1: adds support to detect the presence of big-cores and parses
the "ibm,thread-groups" device-tree property, using which it updates a
per-cpu mask named cpu_smallcore_mask

Patch 2: Defines the SMT level sched domain to correspond to the
threads of the small cores.

Patch 3: Creates a pair of sysfs attributes named
  /sys/devices/system/cpu/cpuN/topology/smallcore_thread_siblings
  and
  /sys/devices/system/cpu/cpuN/topology/smallcore_thread_siblings_list
  exposing the small-core siblings that share the L1 cache
  to the userspace.

Results:
~~~~~~~~
1) 2 thread ebizzy
~~~~~~~~~~~~~~~~~~
Experimental results for ebizzy with 2 threads, bound to a single big-core
show a marked improvement with this patchset over the 4.19.0-rc5 vanilla
kernel.


[PATCH v9 2/3] powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores

2018-10-01 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

POWER9 SMT8 cores consist of two groups of threads, where the threads
in each group share the L1-cache. The scheduler is not aware of this
distinction as the current sched-domain hierarchy has all the threads
of the core defined at the SMT domain.

SMT  [Thread siblings of the SMT8 core]
DIE  [CPUs in the same die]
NUMA [All the CPUs in the system]

Due to this, we can observe run-to-run variance when we run a
multi-threaded benchmark bound to a single core based on how the
scheduler spreads the software threads across the two groups in the
core.

We fix this in this patch by defining each group of threads which
share L1-cache to be the SMT level. The group of threads in the SMT8
core is defined to be the CACHE level. The sched-domain hierarchy
after this patch will be :

SMT [Thread siblings in the core that share L1 cache]
CACHE   [Thread siblings that are in the SMT8 core]
DIE [CPUs in the same die]
NUMA[All the CPUs in the system]

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/kernel/smp.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 22a14a9..356751e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1266,6 +1266,7 @@ static void add_cpu_to_masks(int cpu)
 void start_secondary(void *unused)
 {
unsigned int cpu = smp_processor_id();
+   struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
 
	mmgrab(&init_mm);
	current->active_mm = &init_mm;
@@ -1291,11 +1292,13 @@ void start_secondary(void *unused)
/* Update topology CPU masks */
add_cpu_to_masks(cpu);
 
+   if (has_big_cores)
+   sibling_mask = cpu_smallcore_mask;
/*
 * Check for any shared caches. Note that this must be done on a
 * per-core basis because one core in the pair might be disabled.
 */
-   if (!cpumask_equal(cpu_l2_cache_mask(cpu), cpu_sibling_mask(cpu)))
+   if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
shared_caches = true;
 
set_numa_node(numa_cpu_lookup_table[cpu]);
@@ -1362,6 +1365,13 @@ static const struct cpumask *shared_cache_mask(int cpu)
return cpu_l2_cache_mask(cpu);
 }
 
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *smallcore_smt_mask(int cpu)
+{
+   return cpu_smallcore_mask(cpu);
+}
+#endif
+
 static struct sched_domain_topology_level power9_topology[] = {
 #ifdef CONFIG_SCHED_SMT
{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
@@ -1389,6 +1399,13 @@ void __init smp_cpus_done(unsigned int max_cpus)
shared_proc_topology_init();
dump_numa_cpu_topology();
 
+#ifdef CONFIG_SCHED_SMT
+   if (has_big_cores) {
+   pr_info("Using small cores at SMT level\n");
+   power9_topology[0].mask = smallcore_smt_mask;
+   powerpc_topology[0].mask = smallcore_smt_mask;
+   }
+#endif
/*
 * If any CPU detects that it's sharing a cache with another CPU then
 * use the deeper topology that is aware of this sharing.
-- 
1.9.4



[PATCH v03 1/5] powerpc/drmem: Export 'dynamic-memory' loader

2018-10-01 Thread Michael Bringmann
powerpc/drmem: Export many of the functions of DRMEM to parse
"ibm,dynamic-memory" and "ibm,dynamic-memory-v2" during hotplug
operations and for Post Migration events.

Also modify the DRMEM initialization code to allow it to:

* Be called after system initialization
* Provide a separate user copy of the LMB array that it produces
* Free the user copy upon request

In addition, a couple of changes were made to make the creation
of additional copies of the LMB array more useful, including:

* Add new iterator to work through a pair of drmem_info arrays.
* Modify the DRMEM code to replace usages of dt_root_addr_cells and
  dt_mem_next_cell, as these are only available at first boot.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/drmem.h |   15 
 arch/powerpc/mm/drmem.c  |   75 --
 2 files changed, 70 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index ce242b9..b0e70fd 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -35,6 +35,18 @@ struct drmem_lmb_info {
		&drmem_info->lmbs[0],	\
		&drmem_info->lmbs[drmem_info->n_lmbs - 1])
 
+#define for_each_dinfo_lmb(dinfo, lmb) \
+   for_each_drmem_lmb_in_range((lmb),  \
+			&dinfo->lmbs[0],		\
+			&dinfo->lmbs[dinfo->n_lmbs - 1])
+
+#define for_each_pair_dinfo_lmb(dinfo1, lmb1, dinfo2, lmb2)\
+	for ((lmb1) = (&dinfo1->lmbs[0]),			\
+	     (lmb2) = (&dinfo2->lmbs[0]);			\
+	     ((lmb1) <= (&dinfo1->lmbs[dinfo1->n_lmbs - 1])) &&	\
+	     ((lmb2) <= (&dinfo2->lmbs[dinfo2->n_lmbs - 1]));	\
+(lmb1)++, (lmb2)++)
+
 /*
  * The of_drconf_cell_v1 struct defines the layout of the LMB data
  * specified in the ibm,dynamic-memory device tree property.
@@ -94,6 +106,9 @@ void __init walk_drmem_lmbs(struct device_node *dn,
void (*func)(struct drmem_lmb *, const __be32 **));
 int drmem_update_dt(void);
 
+struct drmem_lmb_info *drmem_lmbs_init(struct property *prop);
+void drmem_lmbs_free(struct drmem_lmb_info *dinfo);
+
 #ifdef CONFIG_PPC_PSERIES
 void __init walk_drmem_lmbs_early(unsigned long node,
void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 3f18036..13d2abb 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -20,6 +20,7 @@
 
 static struct drmem_lmb_info __drmem_info;
 struct drmem_lmb_info *drmem_info = &__drmem_info;
+static int n_root_addr_cells;
 
 u64 drmem_lmb_memory_max(void)
 {
@@ -193,12 +194,13 @@ int drmem_update_dt(void)
return rc;
 }
 
-static void __init read_drconf_v1_cell(struct drmem_lmb *lmb,
+static void read_drconf_v1_cell(struct drmem_lmb *lmb,
   const __be32 **prop)
 {
const __be32 *p = *prop;
 
-   lmb->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+   lmb->base_addr = of_read_number(p, n_root_addr_cells);
+   p += n_root_addr_cells;
lmb->drc_index = of_read_number(p++, 1);
 
p++; /* skip reserved field */
@@ -209,7 +211,7 @@ static void __init read_drconf_v1_cell(struct drmem_lmb 
*lmb,
*prop = p;
 }
 
-static void __init __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v1_lmbs(const __be32 *prop, const __be32 *usm,
void (*func)(struct drmem_lmb *, const __be32 **))
 {
struct drmem_lmb lmb;
@@ -225,13 +227,14 @@ static void __init __walk_drmem_v1_lmbs(const __be32 
*prop, const __be32 *usm,
}
 }
 
-static void __init read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
+static void read_drconf_v2_cell(struct of_drconf_cell_v2 *dr_cell,
   const __be32 **prop)
 {
const __be32 *p = *prop;
 
dr_cell->seq_lmbs = of_read_number(p++, 1);
-   dr_cell->base_addr = dt_mem_next_cell(dt_root_addr_cells, &p);
+   dr_cell->base_addr = of_read_number(p, n_root_addr_cells);
+   p += n_root_addr_cells;
dr_cell->drc_index = of_read_number(p++, 1);
dr_cell->aa_index = of_read_number(p++, 1);
dr_cell->flags = of_read_number(p++, 1);
@@ -239,7 +242,7 @@ static void __init read_drconf_v2_cell(struct 
of_drconf_cell_v2 *dr_cell,
*prop = p;
 }
 
-static void __init __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
+static void __walk_drmem_v2_lmbs(const __be32 *prop, const __be32 *usm,
void (*func)(struct drmem_lmb *, const __be32 **))
 {
struct of_drconf_cell_v2 dr_cell;
@@ -275,6 +278,9 @@ void __init walk_drmem_lmbs_early(unsigned long node,
const __be32 *prop, *usm;
int len;
 
+   if (n_root_addr_cells == 0)
+   n_root_addr_cells = dt_root_addr_cells;

[PATCH v03 2/5] powerpc/drmem: Add internal_flags feature

2018-10-01 Thread Michael Bringmann
powerpc/drmem: Add internal_flags field to each LMB to allow
marking of kernel software-specific operations that need not
be exported to other users.  For instance, if information about
selected LMBs needs to be maintained for subsequent passes
through the system, it can be encoded into the LMB array itself
without requiring the allocation and maintenance of additional
data structures.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/drmem.h |   18 ++
 arch/powerpc/mm/drmem.c  |2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index b0e70fd..acb6539 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -17,6 +17,7 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
+   u32 internal_flags;
 };
 
 struct drmem_lmb_info {
@@ -101,6 +102,23 @@ static inline bool drmem_lmb_reserved(struct drmem_lmb 
*lmb)
return lmb->flags & DRMEM_LMB_RESERVED;
 }
 
+#define DRMEM_LMBINT_UPDATE0x0001
+
+static inline void drmem_mark_lmb_update(struct drmem_lmb *lmb)
+{
+   lmb->internal_flags |= DRMEM_LMBINT_UPDATE;
+}
+
+static inline void drmem_remove_lmb_update(struct drmem_lmb *lmb)
+{
+   lmb->internal_flags &= ~DRMEM_LMBINT_UPDATE;
+}
+
+static inline bool drmem_lmb_update(struct drmem_lmb *lmb)
+{
+   return lmb->internal_flags & DRMEM_LMBINT_UPDATE;
+}
+
 u64 drmem_lmb_memory_max(void);
 void __init walk_drmem_lmbs(struct device_node *dn,
void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 13d2abb..fd2cae92 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -207,6 +207,7 @@ static void read_drconf_v1_cell(struct drmem_lmb *lmb,
 
lmb->aa_index = of_read_number(p++, 1);
lmb->flags = of_read_number(p++, 1);
+   lmb->internal_flags = 0;
 
*prop = p;
 }
@@ -265,6 +266,7 @@ static void __walk_drmem_v2_lmbs(const __be32 *prop, const 
__be32 *usm,
 
lmb.aa_index = dr_cell.aa_index;
lmb.flags = dr_cell.flags;
+   lmb.internal_flags = 0;
 
	func(&lmb, &usm);
}



[PATCH v03 0/5] powerpc/migration: Affinity fix for memory

2018-10-01 Thread Michael Bringmann
The migration of LPARs across Power systems affects many attributes
including that of the associativity of memory blocks.  The patches
in this set execute when a system is coming up fresh upon a migration
target.  They are intended to:

* Recognize changes to the associativity of memory recorded in
  internal data structures when compared to the latest copies in
  the device tree (e.g. ibm,dynamic-memory, ibm,dynamic-memory-v2).
* Recognize changes to the associativity mapping (e.g. ibm,
  associativity-lookup-arrays), locate all assigned memory blocks
  corresponding to each changed row, and readd all such blocks.
* Generate calls to other code layers to reset the data structures
  related to associativity of memory.
* Re-register the 'changed' entities into the target system.
  Re-registration of memory blocks mostly entails acting as if they
  have been newly hot-added into the target system.

This code builds upon features introduced in a previous patch set
that updates CPUs for affinity changes that may occur during LPM.

Signed-off-by: Michael Bringmann 

Michael Bringmann (5):
  powerpc/drmem: Export 'dynamic-memory' loader
  powerpc/drmem: Add internal_flags feature
  migration/memory: Add hotplug flags READD_MULTIPLE
  migration/memory: Evaluate LMB assoc changes
  migration/memory: Support 'ibm,dynamic-memory-v2'
---
Changes in v03:
  -- Change operation to tag changed LMBs in DRMEM array instead
 of queuing a potentially huge number of structures.
  -- Added another hotplug queue event for CPU/memory operations
  -- Added internal_flags feature to DRMEM
  -- Improve the patch description language for the patch set.
  -- Revise patch set to queue worker for memory association
 updates directly to pseries worker queue.



[PATCH v03 4/5] migration/memory: Evaluate LMB assoc changes

2018-10-01 Thread Michael Bringmann
migration/memory: This patch adds code that recognizes changes to
the associativity of memory blocks described by the device-tree
properties in order to drive equivalent 'hotplug' operations to
update local and general kernel data structures to reflect those
changes.  These differences may include:

* Evaluate 'ibm,dynamic-memory' properties when processing the
  updated device-tree properties of the system during Post Migration
  events (migration_store).  The new functionality looks for changes
  to the aa_index values for each drc_index/LMB to identify any memory
  blocks that should be readded.

* In an LPAR migration scenario, the "ibm,associativity-lookup-arrays"
  property may change.  In the event that a row of the array differs,
  locate all assigned memory blocks with that 'aa_index' and 're-add'
  them to the system memory block data structures.  In the process of
  the 're-add', the system routines will update the corresponding entry
  for the memory in the LMB structures and any other relevant kernel
  data structures.

A number of previous extensions made to the DRMEM code for scanning
device-tree properties and creating LMB arrays are used here to
ensure that the resulting code is simpler and more usable:

* Use new paired list iterator for the DRMEM LMB info arrays to find
  differences in old and new versions of properties.
* Use new iterator for copies of the DRMEM info arrays to evaluate
  completely new structures.
* Combine common code for parsing and evaluating memory description
  properties based on the DRMEM LMB array model to greatly simplify
  extension from the older property 'ibm,dynamic-memory' to the new
  property model of 'ibm,dynamic-memory-v2'.

Signed-off-by: Michael Bringmann 
---
Changes in v03:
  -- Modify the code that parses the memory affinity attributes to
 mark relevant DRMEM LMB array entries using the internal_flags
mechanism instead of generating unique hotplug actions for each
 memory block to be readded.  The change is intended to both
 simplify the code, and to require fewer resources on systems
 with huge amounts of memory.
  -- Save up notice about any and all LMB entries until the end of the
 'migration_store' operation, at which point a single action is
 queued to scan the entire DRMEM array.
  -- Add READD_MULTIPLE function for memory that scans the DRMEM
 array to identify multiple entries that were marked previously.
 The corresponding memory blocks are to be readded to the system
 to update relevant data structures outside of the powerpc-
 specific code.
  -- Change dlpar_memory_pmt_changes_action to directly queue worker
 to pseries work queue.
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |  220 +++
 arch/powerpc/platforms/pseries/mobility.c   |4 
 arch/powerpc/platforms/pseries/pseries.h|4 
 3 files changed, 194 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f5..68bde2e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -561,8 +561,11 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
}
}
 
-   if (!lmb_found)
-   rc = -EINVAL;
+   if (!lmb_found) {
+   pr_info("Failed to update memory for drc index %lx\n",
+   (unsigned long) drc_index);
+   return -EINVAL;
+   }
 
if (rc)
pr_info("Failed to update memory at %llx\n",
@@ -573,6 +576,30 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
return rc;
 }
 
+static int dlpar_memory_readd_multiple(void)
+{
+   struct drmem_lmb *lmb;
+   int rc = 0;
+
+   pr_info("Attempting to update multiple LMBs\n");
+
+   for_each_drmem_lmb(lmb) {
+   if (drmem_lmb_update(lmb)) {
+   rc = dlpar_remove_lmb(lmb);
+
+   if (!rc) {
+   rc = dlpar_add_lmb(lmb);
+   if (rc)
+   dlpar_release_drc(lmb->drc_index);
+   }
+
+   drmem_remove_lmb_update(lmb);
+   }
+   }
+
+   return rc;
+}
+
 static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 {
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
@@ -673,6 +700,10 @@ static int dlpar_memory_readd_by_index(u32 drc_index)
 {
return -EOPNOTSUPP;
 }
+static int dlpar_memory_readd_multiple(void)
+{
+   return -EOPNOTSUPP;
+}
 
 static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 {
@@ -952,6 +983,9 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
drc_index = hp_elog->_drc_u.drc_index;
rc = dlpar_memory_readd_by_index(drc_index);
break;
+   case 

[PATCH v03 3/5] migration/memory: Add hotplug READD_MULTIPLE

2018-10-01 Thread Michael Bringmann
migration/memory: This patch adds a new pseries hotplug action
for CPU and memory operations, PSERIES_HP_ELOG_ACTION_READD_MULTIPLE.
This is a variant of the READD operation which performs the action
upon multiple instances of the resource at one time.  The operation
is to be triggered by device-tree analysis of updates by RTAS events
analyzed by 'migration_store' during post-migration processing.  It
will be used for memory updates, initially.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/rtas.h |1 +
 arch/powerpc/mm/drmem.c |1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 71e393c..e510d82 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -320,6 +320,7 @@ struct pseries_hp_errorlog {
 #define PSERIES_HP_ELOG_ACTION_ADD 1
 #define PSERIES_HP_ELOG_ACTION_REMOVE  2
 #define PSERIES_HP_ELOG_ACTION_READD   3
+#define PSERIES_HP_ELOG_ACTION_READD_MULTIPLE  4
 
 #define PSERIES_HP_ELOG_ID_DRC_NAME1
 #define PSERIES_HP_ELOG_ID_DRC_INDEX   2
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index fd2cae92..2228586 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -422,6 +422,7 @@ static void init_drmem_v2_lmbs(const __be32 *prop,
 
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
+   lmb->internal_flags = 0;
}
}
 }



[PATCH v03 5/5] migration/memory: Support 'ibm,dynamic-memory-v2'

2018-10-01 Thread Michael Bringmann
migration/memory: This patch adds recognition for changes to the
associativity of memory blocks described by 'ibm,dynamic-memory-v2'.
If the associativity of an LMB has changed, it should be readded to
the system in order to update local and general kernel data structures.
This patch builds upon previous enhancements that scan the device-tree
"ibm,dynamic-memory" properties using the base LMB array, and a copy
derived from the updated properties.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
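The widened property check above can be sketched as a small predicate; the helper name is illustrative, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of the widened property-name check: with this patch, an
 * OF_RECONFIG_UPDATE_PROPERTY event on either the v1 or the v2
 * dynamic-memory property takes the drmem_lmbs_init() update path. */
static bool is_dynamic_memory_prop(const char *name)
{
	return !strcmp(name, "ibm,dynamic-memory") ||
	       !strcmp(name, "ibm,dynamic-memory-v2");
}
```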

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 68bde2e..3e65aeb 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -1203,7 +1203,8 @@ static int pseries_memory_notifier(struct notifier_block 
*nb,
err = pseries_remove_mem_node(rd->dn);
break;
case OF_RECONFIG_UPDATE_PROPERTY:
-   if (!strcmp(rd->prop->name, "ibm,dynamic-memory")) {
+   if (!strcmp(rd->prop->name, "ibm,dynamic-memory") ||
+   !strcmp(rd->prop->name, "ibm,dynamic-memory-v2")) {
struct drmem_lmb_info *dinfo =
drmem_lmbs_init(rd->prop);
if (!dinfo)



Re: [PATCH 2/2] powerpc/64: Increase stack redzone for 64-bit kernel to 512 bytes

2018-10-01 Thread Bin Meng
Hi Nick,

On Mon, Oct 1, 2018 at 10:23 AM Nicholas Piggin  wrote:
>
> On Mon, 1 Oct 2018 09:11:04 +0800
> Bin Meng  wrote:
>
> > Hi Nick,
> >
> > On Mon, Oct 1, 2018 at 7:27 AM Nicholas Piggin  wrote:
> > >
> > > On Sat, 29 Sep 2018 23:25:20 -0700
> > > Bin Meng  wrote:
> > >
> > > > commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit
> > > > userspace to 512 bytes") only changes stack userspace redzone size.
> > > > We need increase the kernel one to 512 bytes too per ABIv2 spec.
> > >
> > > You're right we need 512 to be compatible with ABIv2, but as the
> > > comment says, gcc limits this to 288 bytes so that's what is used
> > > to save stack space. We can use a compiler version test to change
> > > this if llvm or a new version of gcc does something different.
> > >
> >
> > I believe what the comment says is for ABIv1. At the time when commit
> > 573ebfa6601f was submitted, kernel had not switched to ABIv2 build
> > yet.
>
> I see, yes you are right about that. However gcc still seems to be using
> 288 bytes.
>
> static inline bool
> offset_below_red_zone_p (HOST_WIDE_INT offset)
> {
>   return offset < (DEFAULT_ABI == ABI_V4
>? 0
>: TARGET_32BIT ? -220 : -288);
> }
>
> llvm does as well AFAIKS
>
>   // DarwinABI has a 224-byte red zone. PPC32 SVR4ABI(Non-DarwinABI) has no
>   // red zone and PPC64 SVR4ABI has a 288-byte red zone.
>   unsigned  getRedZoneSize() const {
> return isDarwinABI() ? 224 : (isPPC64() ? 288 : 0);
>   }
>
> So I suspect we can get away with using 288 for the kernel. Although
> the ELFv2 ABI allows 512, I suspect at this point compilers won't switch
> over without an explicit red zone size flag.
>

Thanks for the info on the gcc/llvm code. I suspect that for the red
zone size gcc/llvm still use the ABIv1-defined value, which is 288. If
we can get away with the kernel using 288, what's the point of having
the userspace value at 512 (commit 573ebfa6601f)?

Regards,
Bin
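The compiler behaviour quoted above can be captured in a one-line sketch mirroring the llvm `getRedZoneSize()` logic; the function name is illustrative, not a real API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Minimal sketch of the red zone sizes discussed above, mirroring the
 * quoted llvm logic: Darwin gets 224 bytes, 32-bit SVR4 gets none, and
 * 64-bit ELF ABIs get 288 bytes in practice, even though ELFv2 would
 * permit 512.
 */
static unsigned int red_zone_size(bool darwin_abi, bool ppc64)
{
	return darwin_abi ? 224 : (ppc64 ? 288 : 0);
}
```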


[RFC PATCH v3 7/7] powerpc/64: Modify CURRENT_THREAD_INFO()

2018-10-01 Thread Christophe Leroy
CURRENT_THREAD_INFO() now uses the PACA to retrieve the 'current'
pointer; it no longer uses 'sp'.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  2 +-
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 8 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 47578b79f0fb..e38d84c267b8 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -672,7 +672,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   CURRENT_THREAD_INFO(r3);\
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -722,7 +722,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   CURRENT_THREAD_INFO(r11);   \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 1c42df627bf3..a339de87806b 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -18,7 +18,7 @@
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
 #ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
+#define CURRENT_THREAD_INFO(dest)  stringify_in_c(ld dest, 
PACACURRENT(r13))
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 697406572592..331b9e9b6d78 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -158,7 +158,7 @@ system_call:/* label this so stack 
traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   CURRENT_THREAD_INFO(r11)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -205,7 +205,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   CURRENT_THREAD_INFO(r12)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -336,7 +336,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   CURRENT_THREAD_INFO(r10)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -731,7 +731,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   CURRENT_THREAD_INFO(r9)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -845,7 +845,7 @@ resume_kernel:
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   CURRENT_THREAD_INFO(r9)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 231d066b4a3d..f48d9aa07a73 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -469,7 +469,7 @@ exc_##n##_bad_stack:
\
  * interrupts happen before the wait instruction.
  */
 #define CHECK_NAPPING()
\
-   CURRENT_THREAD_INFO(r11, r1);   \
+   CURRENT_THREAD_INFO(r11);   \
ld  r10,TI_LOCAL_FLAGS(r11);\
andi.   r9,r10,_TLF_NAPPING;\
beq+1f; \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 89d32bb79d5e..07701063d36e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ 

[RFC PATCH v3 6/7] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2018-10-01 Thread Christophe Leroy
Now that thread_info is contained within task_struct, its address is
in r2, so the CURRENT_THREAD_INFO() macro is useless. This patch
removes it.

At the same time, as the 'cpu' field is no longer in thread_info,
this patch renames TI_CPU to TASK_CPU.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 --
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 43 --
 arch/powerpc/kernel/epapr_hcalls.S |  5 ++--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 ++--
 arch/powerpc/kernel/idle_6xx.S |  8 +++
 arch/powerpc/kernel/idle_e500.S|  8 +++
 arch/powerpc/kernel/misc_32.S  |  3 +--
 arch/powerpc/mm/hash_low_32.S  | 14 ---
 arch/powerpc/sysdev/6xx-suspend.S  |  5 ++--
 11 files changed, 35 insertions(+), 62 deletions(-)
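The Makefile hunk below only changes the awk key from "TI_CPU" to "TASK_CPU"; as a sketch, here is the same extraction run against a mock generated header (the offsets below are made up for illustration):

```shell
# Sketch of the task_cpu_prepare extraction against a mock
# include/generated/asm-offsets.h.
cat > asm-offsets.h <<'EOF'
#define TASK_STACK 24 /* offsetof(struct task_struct, stack) */
#define TASK_CPU 104 /* offsetof(struct task_struct, cpu) */
EOF

# Same awk program as the Makefile rule, now keyed on "TASK_CPU":
# print field 3 of the line whose second field matches.
TASK_CPU_OFFSET=$(awk '{if ($2 == "TASK_CPU") print $3;}' asm-offsets.h)
echo "TASK_CPU offset: $TASK_CPU_OFFSET"
```

kbuild then feeds this value back into the build as `-D_TASK_CPU=<offset>`, so code that cannot include the struct definition can still reach the field.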

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 4e98989b5512..e2a0843028bc 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -426,5 +426,5 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 62eb9ff31292..1c42df627bf3 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index ae7eda4ca09e..08b8bfd98737 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -89,7 +89,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index b45da00b01ef..5d12b26e20a4 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -168,8 +168,7 @@ transfer_to_handler:
tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
slwir9,r9,3
add r11,r11,r9
 #endif
@@ -180,8 +179,7 @@ transfer_to_handler:
stw r12,4(r11)
 #endif
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
+   tophys(r9, r2)
ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
 #endif
 
@@ -195,8 +193,7 @@ transfer_to_handler:
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_6xx) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
+   tophys(r9,r2)   /* check local flags */
lwz r12,TI_LOCAL_FLAGS(r9)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
@@ -345,8 +342,7 @@ _GLOBAL(DoSyscall)
mtmsr   r11
 1:
 #endif /* CONFIG_TRACE_IRQFLAGS */
-   CURRENT_THREAD_INFO(r10, r1)
-   lwz r11,TI_FLAGS(r10)
+   lwz r11,TI_FLAGS(r2)
andi.   r11,r11,_TIF_SYSCALL_DOTRACE
bne-syscall_dotrace
 syscall_dotrace_cont:
@@ -379,13 +375,12 @@ ret_from_syscall:
lwz r3,GPR3(r1)
 #endif
mr  r6,r3
-   CURRENT_THREAD_INFO(r12, r1)
/* disable interrupts so current_thread_info()->flags can't change */
LOAD_MSR_KERNEL(r10,MSR_KERNEL) /* doesn't include MSR_EE */
/* Note: We don't bother telling lockdep about it */
SYNC
MTMSRD(r10)
-   lwz r9,TI_FLAGS(r12)
+   lwz r9,TI_FLAGS(r2)
li  r8,-MAX_ERRNO
andi.   
r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne-syscall_exit_work
@@ -432,8 +427,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
andi.   r4,r8,MSR_PR
beq 3f
-   CURRENT_THREAD_INFO(r4, r1)
-   ACCOUNT_CPU_USER_EXIT(r4, r5, r7)
+   ACCOUNT_CPU_USER_EXIT(r2, r5, r7)
 3:
 #endif
lwz r4,_LINK(r1)
@@ -526,7 +520,7 @@ syscall_exit_work:
/* Clear per-syscall TIF flags if any are set.  */
 
li  r11,_TIF_PERSYSCALL_MASK
-   addir12,r12,TI_FLAGS
+   addir12,r2,TI_FLAGS
 3: lwarx   r8,0,r12
andcr8,r8,r11
 #ifdef 

[RFC PATCH v3 4/7] powerpc: regain entire stack space

2018-10-01 Thread Christophe Leroy
thread_info no longer lives in the stack, so the entire stack
can now be used.

In the meantime, pointers to the stacks are no longer pointers
to thread_info, so this patch changes their type to void*.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   | 10 +-
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 --
 arch/powerpc/kernel/irq.c| 19 +--
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c|  9 +++--
 arch/powerpc/kernel/setup_64.c   |  8 
 8 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 8108d1fe33ca..3987929408d3 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,9 +48,9 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 extern void exc_lvl_ctx_init(void);
 #else
 #define exc_lvl_ctx_init()
@@ -59,8 +59,8 @@ extern void exc_lvl_ctx_init(void);
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
 extern void call_do_softirq(void *tp);
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 31873614392f..834d0d701e19 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -332,8 +332,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
-#define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long) &init_stack)
+#define INIT_SP_LIMIT	((unsigned long) &init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index b042d85325f5..ae7eda4ca09e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -85,7 +85,6 @@ int main(void)
DEFINE(NMI_MASK, NMI_MASK);
OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index a14f9b5f2762..b45da00b01ef 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@ crit_transfer_to_handler:
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 699f0f816687..00dbee440bc2 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - sizeof(struct thread_info));
+   if (unlikely(sp < 2048)) {
+   pr_err("do_IRQ: stack overflow: %ld\n", sp);
dump_stack();
}
 #endif
@@ -660,7 +659,7 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = 

[RFC PATCH v3 5/7] powerpc: 'current_set' is now a table of task_struct pointers

2018-10-01 Thread Christophe Leroy
The 'current_set' table of pointers has been used for retrieving
the stack and current. Its entries used to be thread_info pointers:
they pointed to the stack, and current was taken from the 'task'
field of the thread_info.

Now that thread_info sits at the top of task_struct, each entry of
'current_set' is simultaneously a pointer to a task_struct and a
pointer to its thread_info.

As the entries are used to get current, and the stack pointer is
retrieved from current's 'stack' field, this patch changes their
type to task_struct, and renames secondary_ti to
secondary_current.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 78ed3c3f879a..e74d24821931 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 44dfd73b2a62..ba0341bd5a00 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -842,9 +842,9 @@ __secondary_start:
 #endif /* CONFIG_6xx */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 2c7e90f36358..48e4de4dfd0c 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1021,8 +1021,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index b8a2b789677e..0d27bfff52dd 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1076,8 +1076,8 @@ __secondary_start:
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f22fcbeb9898..00193643f0da 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,7 +74,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
@@ -644,7 +644,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -724,7 +724,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -809,15 +809,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
  THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.13.3



[RFC PATCH v3 3/7] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2018-10-01 Thread Christophe Leroy
This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

This has the following consequences:
- thread_info is now located at the top of task_struct.
- The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
- thread_info no longer has the 'task' field.

This patch:
- Removes all recopy of thread_info struct when the stack changes.
- Changes the CURRENT_THREAD_INFO() macro to point to current.
- Selects CONFIG_THREAD_INFO_IN_TASK.
- Modifies raw_smp_processor_id() to get ->cpu from current without
including linux/sched.h to avoid circular inclusion.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  6 +
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/smp.h |  8 +-
 arch/powerpc/include/asm/thread_info.h | 17 ++--
 arch/powerpc/kernel/asm-offsets.c  |  5 ++--
 arch/powerpc/kernel/entry_32.S |  9 +++
 arch/powerpc/kernel/exceptions-64e.S   | 11 
 arch/powerpc/kernel/head_32.S  |  6 ++---
 arch/powerpc/kernel/head_44x.S |  4 +--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +-
 arch/powerpc/kernel/head_fsl_booke.S   |  7 +++--
 arch/powerpc/kernel/irq.c  | 47 +-
 arch/powerpc/kernel/kgdb.c | 28 
 arch/powerpc/kernel/machine_kexec_64.c |  6 ++---
 arch/powerpc/kernel/setup-common.c |  2 +-
 arch/powerpc/kernel/setup_64.c | 21 ---
 arch/powerpc/kernel/smp.c  |  2 +-
 19 files changed, 39 insertions(+), 152 deletions(-)
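The raw_smp_processor_id() change in this patch reads a field through a build-time byte offset instead of a struct definition. A minimal sketch of that technique, using a stand-in struct rather than the real task_struct layout:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration of the raw_smp_processor_id() workaround: a header that
 * cannot pull in the definition of task_struct can still read one
 * field through a byte offset exported at build time (_TASK_CPU, from
 * asm-offsets).  The struct below is a stand-in, not the real layout.
 */
struct task_struct {
	void *stack;
	unsigned int cpu;
};

/* In the kernel this comes from kbuild via -D_TASK_CPU=... */
#define _TASK_CPU offsetof(struct task_struct, cpu)

/* Equivalent of: (*(unsigned int *)((void *)current + _TASK_CPU)) */
static unsigned int task_cpu(const struct task_struct *tsk)
{
	return *(const unsigned int *)((const char *)tsk + _TASK_CPU);
}
```

Using `char *` arithmetic here keeps the sketch portable; the kernel macro does the same thing with `void *` arithmetic, a GCC extension.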

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a80669209155..c6c0b91ebd33 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -237,6 +237,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 07d9dce7eda6..4e98989b5512 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -422,3 +422,9 @@ checkbin:
 
 CLEAN_FILES += $(TOUT)
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") 
print $$3;}' include/generated/asm-offsets.h))
+endif
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 447cbd1bee99..3a7e5561630b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -120,7 +120,7 @@ extern int ptrace_put_reg(struct task_struct *task, int 
regno,
  unsigned long data);
 
 #define current_pt_regs() \
-   ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) 
- 1)
+   ((struct pt_regs *)((unsigned long)task_stack_page(current) + 
THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 95b66a0c639b..df519b7322e5 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -83,7 +83,13 @@ int is_cpu_dead(unsigned int cpu);
 /* 32-bit */
 extern int smp_hw_index[];
 
-#define raw_smp_processor_id() (current_thread_info()->cpu)
+/*
+ * This is particularly ugly: it appears we can't actually get the definition
+ * of task_struct here, but we need access to the CPU this task is running on.
+ * Instead of using task_struct we're using _TASK_CPU which is extracted from
+ * asm-offsets.h by kbuild to get the current processor ID.
+ */
+#define raw_smp_processor_id() (*(unsigned int*)((void*)current + 
_TASK_CPU))
 #define hard_smp_processor_id()(smp_hw_index[smp_processor_id()])
 
 static inline int get_hard_smp_processor_id(int cpu)
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index 406eb952b808..62eb9ff31292 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -18,9 +18,9 @@
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
 #ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(clrrdi dest, sp, 
THREAD_SHIFT)
+#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, 
PACACURRENT(r13))
 #else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(rlwinm dest, sp, 0, 0, 
31-THREAD_SHIFT)
+#define 

[RFC PATCH v3 2/7] powerpc: Prepare for moving thread_info into task_struct

2018-10-01 Thread Christophe Leroy
This patch cleans up the powerpc kernel before activating
CONFIG_THREAD_INFO_IN_TASK:
- The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point to the new stack, so change it to void*
- Don't use CURRENT_THREAD_INFO() to locate the stack.
- Fix a few comments.
- TI_CPU is only used when CONFIG_SMP is set.
- Replace current_thread_info()->task by current
- Remove unnecessary casts to thread_info, as they'll become invalid
once thread_info is no longer in the stack.
- Ensure the task_struct 'cpu' field is not used directly outside of
SMP code
- Rename THREAD_INFO to TASK_STACK: as it is in fact the offset of the
pointer to the stack in task_struct, this pointer will not be impacted
by the move of THREAD_INFO.
- Make TASK_STACK available to PPC64, which will need it to get the
stack pointer from current once thread_info has been moved.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/irq.h   |  4 ++--
 arch/powerpc/include/asm/livepatch.h |  2 +-
 arch/powerpc/include/asm/processor.h |  4 ++--
 arch/powerpc/include/asm/reg.h   |  2 +-
 arch/powerpc/kernel/asm-offsets.c|  2 +-
 arch/powerpc/kernel/entry_32.S   |  2 +-
 arch/powerpc/kernel/entry_64.S   |  2 +-
 arch/powerpc/kernel/head_32.S|  4 ++--
 arch/powerpc/kernel/head_40x.S   |  4 ++--
 arch/powerpc/kernel/head_44x.S   |  2 +-
 arch/powerpc/kernel/head_8xx.S   |  2 +-
 arch/powerpc/kernel/head_booke.h |  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S |  6 --
 arch/powerpc/kernel/irq.c|  2 +-
 arch/powerpc/kernel/misc_32.S|  8 ++--
 arch/powerpc/kernel/process.c|  6 +++---
 arch/powerpc/kernel/setup_32.c   | 15 +--
 arch/powerpc/kernel/smp.c|  4 +++-
 arch/powerpc/xmon/xmon.c |  2 +-
 19 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..8108d1fe33ca 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+extern void call_do_softirq(void *tp);
+extern void call_do_irq(struct pt_regs *regs, void *tp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..818451bf629c 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -49,7 +49,7 @@ static inline void klp_init_thread_info(struct thread_info 
*ti)
ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
 }
 #else
-static void klp_init_thread_info(struct thread_info *ti) { }
+static inline void klp_init_thread_info(struct thread_info *ti) { }
 #endif /* CONFIG_LIVEPATCH */
 
 #endif /* _ASM_POWERPC_LIVEPATCH_H */
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 353879db3e98..31873614392f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -333,7 +333,7 @@ struct thread_struct {
 
 #define INIT_SP		(sizeof(init_stack) + (unsigned long) &init_stack)
 #define INIT_SP_LIMIT \
-	(_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) &init_stack)
+	(_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long) &init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index e5b314ed054e..f3a9cf19a986 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1053,7 +1053,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index ba9d0fc98730..d1f161e48945 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -85,10 +85,10 @@ int main(void)
DEFINE(NMI_MASK, NMI_MASK);
OFFSET(TASKTHREADPPR, task_struct, thread.ppr);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);
DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
+   OFFSET(TASK_STACK, task_struct, stack);
 
 #ifdef CONFIG_LIVEPATCH
OFFSET(TI_livepatch_sp, thread_info, livepatch_sp);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S

[RFC PATCH v3 1/7] book3s/64: avoid circular header inclusion in mmu-hash.h

2018-10-01 Thread Christophe Leroy
When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h
includes asm/current.h. This generates a circular dependency.
To avoid that, asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves the information from
asm/processor.h that is required by mmu-hash.h into a new header
called asm/task_size.h.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/include/asm/processor.h  | 34 +-
 arch/powerpc/include/asm/task_size.h  | 42 +++
 arch/powerpc/kvm/book3s_hv_hmi.c  |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index bbeaf6adf93c..7788e35f19f0 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 350c584ca179..353879db3e98 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -101,40 +101,8 @@ void release_thread(struct task_struct *);
 #endif
 
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x0000400000000000UL)
-#define TASK_SIZE_128TB (0x0000800000000000UL)
-#define TASK_SIZE_512TB (0x0002000000000000UL)
-#define TASK_SIZE_1PB   (0x0004000000000000UL)
-#define TASK_SIZE_2PB   (0x0008000000000000UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010000000000000UL)
 
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
+#include <asm/task_size.h>
 
 /*
  * 32-bit user address space is 4GB - 1 page
diff --git a/arch/powerpc/include/asm/task_size.h 
b/arch/powerpc/include/asm/task_size.h
new file mode 100644
index ..ca45638617b0
--- /dev/null
+++ b/arch/powerpc/include/asm/task_size.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_TASK_SIZE_H
+#define _ASM_POWERPC_TASK_SIZE_H
+
+#ifdef CONFIG_PPC64
+/*
+ * 64-bit user address space can have multiple limits
+ * For now supported values are:
+ */
+#define TASK_SIZE_64TB  (0x0000400000000000UL)
+#define TASK_SIZE_128TB (0x0000800000000000UL)
+#define TASK_SIZE_512TB (0x0002000000000000UL)
+#define TASK_SIZE_1PB   (0x0004000000000000UL)
+#define TASK_SIZE_2PB   (0x0008000000000000UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010000000000000UL)
+
+/*
+ * For now 512TB is only supported with book3s and 64K linux page size.
+ */
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
+/*
+ * Max value currently used:
+ */
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
+#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
+#else
+#define TASK_SIZE_USER64   TASK_SIZE_64TB
+#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
+/*
+ * We don't need to allocate extended context ids for 4K page size, because
+ * we limit the max effective address on this config to 64TB.
+ */
+#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
+#endif
+
+#endif /* CONFIG_PPC64 */
+#endif /* _ASM_POWERPC_TASK_SIZE_H */
diff --git a/arch/powerpc/kvm/book3s_hv_hmi.c b/arch/powerpc/kvm/book3s_hv_hmi.c
index e3f738eb1cac..64b5011475c7 100644
--- a/arch/powerpc/kvm/book3s_hv_hmi.c
+++ b/arch/powerpc/kvm/book3s_hv_hmi.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void wait_for_subcore_guest_exit(void)
 {
-- 
2.13.3



[RFC PATCH v3 0/7] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2018-10-01 Thread Christophe Leroy
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK,
which moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
- It protects thread_info from corruption in the case of stack
overflows.
- Its address is harder to determine if stack addresses are
leaked, making a number of attacks more difficult.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.

Christophe Leroy (7):
  book3s/64: avoid circular header inclusion in mmu-hash.h
  powerpc: Prepare for moving thread_info into task_struct
  powerpc: Activate CONFIG_THREAD_INFO_IN_TASK
  powerpc: regain entire stack space
  powerpc: 'current_set' is now a table of task_struct pointers
  powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
  powerpc/64: Modify CURRENT_THREAD_INFO()

 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  6 +++
 arch/powerpc/include/asm/asm-prototypes.h  |  4 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h  |  2 +-
 arch/powerpc/include/asm/exception-64s.h   |  4 +-
 arch/powerpc/include/asm/irq.h | 14 +++---
 arch/powerpc/include/asm/livepatch.h   |  2 +-
 arch/powerpc/include/asm/processor.h   | 39 ++-
 arch/powerpc/include/asm/ptrace.h  |  2 +-
 arch/powerpc/include/asm/reg.h |  2 +-
 arch/powerpc/include/asm/smp.h |  8 +++-
 arch/powerpc/include/asm/task_size.h   | 42 
 arch/powerpc/include/asm/thread_info.h | 17 +--
 arch/powerpc/kernel/asm-offsets.c  |  8 ++--
 arch/powerpc/kernel/entry_32.S | 66 +-
 arch/powerpc/kernel/entry_64.S | 12 ++---
 arch/powerpc/kernel/epapr_hcalls.S |  5 +-
 arch/powerpc/kernel/exceptions-64e.S   | 13 +
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/head_32.S  | 14 +++---
 arch/powerpc/kernel/head_40x.S |  4 +-
 arch/powerpc/kernel/head_44x.S |  8 ++--
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_8xx.S |  2 +-
 arch/powerpc/kernel/head_booke.h   | 12 ++---
 arch/powerpc/kernel/head_fsl_booke.S   | 16 +++
 arch/powerpc/kernel/idle_6xx.S |  8 ++--
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_e500.S|  8 ++--
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/irq.c  | 66 --
 arch/powerpc/kernel/kgdb.c | 28 ---
 arch/powerpc/kernel/machine_kexec_64.c |  6 +--
 arch/powerpc/kernel/misc_32.S  | 17 +++
 arch/powerpc/kernel/process.c  | 15 +++---
 arch/powerpc/kernel/setup-common.c |  2 +-
 arch/powerpc/kernel/setup_32.c | 15 ++
 arch/powerpc/kernel/setup_64.c | 29 ++-
 arch/powerpc/kernel/smp.c  | 16 +++
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +--
 arch/powerpc/kvm/book3s_hv_hmi.c   |  1 +
 arch/powerpc/mm/hash_low_32.S  | 14 ++
 arch/powerpc/sysdev/6xx-suspend.S  |  5 +-
 arch/powerpc/xmon/xmon.c   |  2 +-
 44 files changed, 203 insertions(+), 345 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size.h

-- 
2.13.3



RE: [PATCH v3 6/6] arm64: dts: add LX2160ARDB board support

2018-10-01 Thread Vabhav Sharma


> -Original Message-
> From: devicetree-ow...@vger.kernel.org 
> On Behalf Of Li Yang
> Sent: Saturday, September 29, 2018 1:07 AM
> To: Vabhav Sharma 
> Cc: Sudeep Holla ; Scott Wood ;
> lkml ; open list:OPEN FIRMWARE AND
> FLATTENED DEVICE TREE BINDINGS ; Rob Herring
> ; Mark Rutland ; linuxppc-dev
> ; moderated list:ARM/FREESCALE IMX / MXC
> ARM ARCHITECTURE ; Michael Turquette
> ; sb...@kernel.org; Rafael J. Wysocki
> ; Viresh Kumar ; linux-clk 
>  c...@vger.kernel.org>; linux...@vger.kernel.org; linux-kernel-
> ow...@vger.kernel.org; Catalin Marinas ; Will
> Deacon ; Greg Kroah-Hartman
> ; Arnd Bergmann ; Kate
> Stewart ; yamada.masah...@socionext.com;
> Udit Kumar ; Priyanka Jain ;
> Russell King ; Varun Sethi ; Sriram
> Dash 
> Subject: Re: [PATCH v3 6/6] arm64: dts: add LX2160ARDB board support
> 
> On Mon, Sep 24, 2018 at 7:51 AM Vabhav Sharma 
> wrote:
> >
> > LX2160A reference design board (RDB) is a high-performance computing,
> > evaluation, and development platform with LX2160A SoC.
> 
> Please send the next version with Shawn Guo and me in the "to" recipients so
> that it's less likely we will miss it.
My mistake, not sure how it was missed.
> 
> >
> > Signed-off-by: Priyanka Jain 
> > Signed-off-by: Sriram Dash 
> > Signed-off-by: Vabhav Sharma 
> > ---
> >  arch/arm64/boot/dts/freescale/Makefile|  1 +
> >  arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 88
> > +++
> >  2 files changed, 89 insertions(+)
> >  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> >
> > diff --git a/arch/arm64/boot/dts/freescale/Makefile
> > b/arch/arm64/boot/dts/freescale/Makefile
> > index 86e18ad..445b72b 100644
> > --- a/arch/arm64/boot/dts/freescale/Makefile
> > +++ b/arch/arm64/boot/dts/freescale/Makefile
> > @@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-
> rdb.dtb
> >  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
> >  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
> >  dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
> > +dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> > b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> > new file mode 100644
> > index 000..1bbe663
> > --- /dev/null
> > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
> > @@ -0,0 +1,88 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> > +//
> > +// Device Tree file for LX2160ARDB
> > +//
> > +// Copyright 2018 NXP
> > +
> > +/dts-v1/;
> > +
> > +#include "fsl-lx2160a.dtsi"
> > +
> > +/ {
> > +   model = "NXP Layerscape LX2160ARDB";
> > +   compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
> > +
> > +   chosen {
> > +   stdout-path = "serial0:115200n8";
> > +   };
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +   i2c-mux@77 {
> > +   compatible = "nxp,pca9547";
> > +   reg = <0x77>;
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   i2c@2 {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   reg = <0x2>;
> > +
> > +   power-monitor@40 {
> > +   compatible = "ti,ina220";
> > +   reg = <0x40>;
> > +   shunt-resistor = <1000>;
> > +   };
> > +   };
> > +
> > +   i2c@3 {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   reg = <0x3>;
> > +
> > +   temperature-sensor@4c {
> > +   compatible = "nxp,sa56004";
> > +   reg = <0x4c>;
> 
> Need a vcc-supply property according to the binding.
Ok
> 
> > +   };
> > +
> > +   temperature-sensor@4d {
> > +   compatible = "nxp,sa56004";
> > +   reg = <0x4d>;
> 
> Ditto.
Ok
> 
> > +   };
> > +   };
> > +   };
> > +};
> > +
> > + {
> > +   status = "okay";
> > +
> > +   rtc@51 {
> > +   compatible = "nxp,pcf2129";
> > +   reg = <0x51>;
> > +   // IRQ10_B
> > +   interrupts = <0 150 0x4>;
> > +   };
> > +
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > --
> > 2.7.4
> >


[PATCH] powerpc/lib: fix book3s/32 boot failure due to code patching

2018-10-01 Thread Christophe Leroy
Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init
sections") accesses the 'init_mem_is_free' flag too early, before the
kernel is relocated. This provokes an early boot failure (before the
console is active).

As it is not necessary to do this verification that early, this
patch moves the test into patch_instruction() instead of
__patch_instruction().

This modification also has the advantage of avoiding unnecessary
remappings.

Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/code-patching.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 6ae2777c220d..5ffee298745f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -28,12 +28,6 @@ static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
 {
int err;
 
-   /* Make sure we aren't patching a freed init section */
-   if (init_mem_is_free && init_section_contains(exec_addr, 4)) {
-   pr_debug("Skipping init section patching addr: 0x%px\n", exec_addr);
-   return 0;
-   }
-
__put_user_size(instr, patch_addr, 4, err);
if (err)
return err;
@@ -148,7 +142,7 @@ static inline int unmap_patch_area(unsigned long addr)
return 0;
 }
 
-int patch_instruction(unsigned int *addr, unsigned int instr)
+static int do_patch_instruction(unsigned int *addr, unsigned int instr)
 {
int err;
unsigned int *patch_addr = NULL;
@@ -188,12 +182,22 @@ int patch_instruction(unsigned int *addr, unsigned int instr)
 }
 #else /* !CONFIG_STRICT_KERNEL_RWX */
 
-int patch_instruction(unsigned int *addr, unsigned int instr)
+static int do_patch_instruction(unsigned int *addr, unsigned int instr)
 {
return raw_patch_instruction(addr, instr);
 }
 
 #endif /* CONFIG_STRICT_KERNEL_RWX */
+
+int patch_instruction(unsigned int *addr, unsigned int instr)
+{
+   /* Make sure we aren't patching a freed init section */
+   if (init_mem_is_free && init_section_contains(addr, 4)) {
+   pr_debug("Skipping init section patching addr: 0x%px\n", addr);
+   return 0;
+   }
+   return do_patch_instruction(addr, instr);
+}
 NOKPROBE_SYMBOL(patch_instruction);
 
 int patch_branch(unsigned int *addr, unsigned long target, int flags)
-- 
2.13.3



Re: [PATCH 1/2] powerpc/64: Remove duplicated -mabi=elfv2 for little endian targets

2018-10-01 Thread Bin Meng
On Mon, Oct 1, 2018 at 4:59 PM Segher Boessenkool
 wrote:
>
> On Sat, Sep 29, 2018 at 11:25:19PM -0700, Bin Meng wrote:
> > The -mabi=elfv2 is currently specified twice in the makefile. Remove
> > the one that does not test compiler.
>
> This was
>
> ifdef CONFIG_PPC64
> cflags-$(CONFIG_CPU_BIG_ENDIAN) += $(call cc-option,-mabi=elfv1)
> cflags-$(CONFIG_CPU_BIG_ENDIAN) += $(call cc-option,-mcall-aixdesc)
> aflags-$(CONFIG_CPU_BIG_ENDIAN) += $(call cc-option,-mabi=elfv1)
> aflags-$(CONFIG_CPU_LITTLE_ENDIAN)  += -mabi=elfv2
> endif
>
> and the later setting is
>
> ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
> CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
> AFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv2)
> else
> CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv1)
> CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mcall-aixdesc)
> AFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv1)
> endif
>
> Maybe these two pieces should be joined completely?
>

Ah, yes! I was only looking at elfv2 stuff before :)

Regards,
Bin


RE: [PATCH v3 5/6] arm64: dts: add QorIQ LX2160A SoC support

2018-10-01 Thread Vabhav Sharma


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Li Yang
> Sent: Saturday, September 29, 2018 2:11 AM
> To: Vabhav Sharma 
> Cc: Sudeep Holla ; Scott Wood ;
> lkml ; open list:OPEN FIRMWARE AND
> FLATTENED DEVICE TREE BINDINGS ; Rob Herring
> ; Mark Rutland ; linuxppc-dev
> ; moderated list:ARM/FREESCALE IMX / MXC
> ARM ARCHITECTURE ; Michael Turquette
> ; sb...@kernel.org; Rafael J. Wysocki
> ; Viresh Kumar ; linux-clk 
>  c...@vger.kernel.org>; linux...@vger.kernel.org; linux-kernel-
> ow...@vger.kernel.org; Catalin Marinas ; Will
> Deacon ; Greg Kroah-Hartman
> ; Arnd Bergmann ; Kate
> Stewart ; yamada.masah...@socionext.com;
> Yogesh Narayan Gaur ; Udit Kumar
> ; Priyanka Jain ; Ying Zhang
> ; Russell King ; Ramneek
> Mehresh ; Varun Sethi ;
> Nipun Gupta ; Sriram Dash 
> Subject: Re: [PATCH v3 5/6] arm64: dts: add QorIQ LX2160A SoC support
> 
> On Mon, Sep 24, 2018 at 7:47 AM Vabhav Sharma 
> wrote:
> >
> > LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.
> >
> > LX2160A features 16 64-bit ARM v8 Cortex-A72 processor cores in 8
> > clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C
> > controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, and 4
> > PL011 SBSA UARTs.
> >
> > Signed-off-by: Ramneek Mehresh 
> > Signed-off-by: Zhang Ying-22455 
> > Signed-off-by: Nipun Gupta 
> > Signed-off-by: Priyanka Jain 
> > Signed-off-by: Yogesh Gaur 
> > Signed-off-by: Sriram Dash 
> > Signed-off-by: Vabhav Sharma 
> > ---
> >  arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 693
> > +
> >  1 file changed, 693 insertions(+)
> >  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > new file mode 100644
> > index 000..46eea16
> > --- /dev/null
> > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> > @@ -0,0 +1,693 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> > +//
> > +// Device Tree Include file for Layerscape-LX2160A family SoC.
> > +//
> > +// Copyright 2018 NXP
> > +
> > +#include 
> 
> You included the header file, but you didn't use the MACROs in most of the
> interrupts property below.  It is recommended to use them for better 
> readibity.
Ok, I will update it.
> 
> > +
> > +/memreserve/ 0x8000 0x0001;
> > +
> > +/ {
> > +   compatible = "fsl,lx2160a";
> > +   interrupt-parent = <>;
> > +   #address-cells = <2>;
> > +   #size-cells = <2>;
> > +
> > +   cpus {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   // 8 clusters having 2 Cortex-A72 cores each
> > +   cpu@0 {
> > +   device_type = "cpu";
> > +   compatible = "arm,cortex-a72";
> > +   reg = <0x0>;
> > +   clocks = < 1 0>;
> > +   d-cache-size = <0x8000>;
> > +   d-cache-line-size = <64>;
> > +   d-cache-sets = <128>;
> > +   i-cache-size = <0xC000>;
> > +   i-cache-line-size = <64>;
> > +   i-cache-sets = <192>;
> > +   next-level-cache = <_l2>;
> 
> enable-method is a required property for this and cpu below.
Ok
> 
> > +   };
> > +
> > +   cpu@1 {
> > +   device_type = "cpu";
> > +   compatible = "arm,cortex-a72";
> > +   reg = <0x1>;
> > +   clocks = < 1 0>;
> > +   d-cache-size = <0x8000>;
> > +   d-cache-line-size = <64>;
> > +   d-cache-sets = <128>;
> > +   i-cache-size = <0xC000>;
> > +   i-cache-line-size = <64>;
> > +   i-cache-sets = <192>;
> > +   next-level-cache = <_l2>;
> > +   };
> > +
> > +   cpu@100 {
> > +   device_type = "cpu";
> > +   compatible = "arm,cortex-a72";
> > +   reg = <0x100>;
> > +   clocks = < 1 1>;
> > +   d-cache-size = <0x8000>;
> > +   d-cache-line-size = <64>;
> > +   d-cache-sets = <128>;
> > +   i-cache-size = <0xC000>;
> > +   i-cache-line-size = <64>;
> > +   i-cache-sets = <192>;
> > +   next-level-cache = <_l2>;
> > +   };
> > +
> > +   cpu@101 {
> > +   device_type = "cpu";
> > +   compatible = "arm,cortex-a72";
> > +   reg = <0x101>;
> > +   clocks = < 1 1>;
> > +   d-cache-size = <0x8000>;
> > + 

Re: [v4] powerpc: Avoid code patching freed init sections

2018-10-01 Thread Christophe LEROY




On 21/09/2018 13:59, Michael Ellerman wrote:

On Fri, 2018-09-14 at 01:14:11 UTC, Michael Neuling wrote:

This stops us from doing code patching in init sections after they've
been freed.

In this chain:
  kvm_guest_init() ->
    kvm_use_magic_page() ->
      fault_in_pages_readable() ->
        __get_user() ->
          __get_user_nocheck() ->
            barrier_nospec();

We have a code patching location at barrier_nospec() and
kvm_guest_init() is an init function. This whole chain gets inlined,
so when we free the init section (hence kvm_guest_init()), this code
goes away and hence should no longer be patched.

We have seen this as userspace memory corruption when using a memory
checker while doing partition migration testing on PowerVM (this
starts the code patching post migration via
/sys/kernel/mobility/migration). In theory, it could also happen when
using /sys/kernel/debug/powerpc/barrier_nospec.

cc: sta...@vger.kernel.org # 4.13+
Signed-off-by: Michael Neuling 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Christophe Leroy 


Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/51c3c62b58b357e8d35e4cc32f7b4e



This patch breaks booting on my MPC83xx board (book3s/32) very early 
(before console is active), provoking restart.

u-boot reports a checkstop reset at restart.

Reverting this commit fixes the issue.

The following patch fixes the issue as well, but I think it is not the
best solution. I still think the test should be in patch_instruction()
instead of __patch_instruction(); see my comment on v2.


Christophe

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 6ae2777..6192fda 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -29,7 +29,7 @@ static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
 	int err;
 
 	/* Make sure we aren't patching a freed init section */
-	if (init_mem_is_free && init_section_contains(exec_addr, 4)) {
+	if (*PTRRELOC(&init_mem_is_free) && init_section_contains(exec_addr, 4)) {
 		pr_debug("Skipping init section patching addr: 0x%px\n", exec_addr);
 		return 0;
 	}


Christophe


[PATCH v2] powerpc/rtas: Fix a potential race between CPU-Offline & Migration

2018-10-01 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Live Partition Migrations require all the present CPUs to execute the
H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
before initiating the migration for this purpose.

The commit 85a88cabad57
("powerpc/pseries: Disable CPU hotplug across migrations")
disables any CPU-hotplug operations once all the offline CPUs are
brought online to prevent any further state change. Once the
CPU-Hotplug operation is disabled, the code assumes that all the CPUs
are online.

However, there is a small window in rtas_ibm_suspend_me() between
onlining the offline CPUs and disabling CPU-Hotplug during which a
concurrent CPU-offline operation initiated by userspace can succeed,
thereby nullifying the aforementioned assumption. In this unlikely
case these offlined CPUs will not call H_JOIN, resulting in a system
hang.

Fix this by verifying that all the present CPUs are actually online
after CPU-Hotplug has been disabled, failing which we restore the
state of the offline CPUs in rtas_ibm_suspend_me() and return
-EBUSY.

Cc: Nathan Fontenot 
Cc: Tyrel Datwyler 
Suggested-by: Michael Ellerman 
Signed-off-by: Gautham R. Shenoy 
---
v2: Restore the state of the offline CPUs if all CPUs aren't onlined.

 arch/powerpc/kernel/rtas.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 2c7ed31..d4468cb 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -982,6 +982,15 @@ int rtas_ibm_suspend_me(u64 handle)
}
 
cpu_hotplug_disable();
+
+   /* Check if we raced with a CPU-Offline Operation */
+   if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
+   pr_err("%s: Raced against a concurrent CPU-Offline\n",
+  __func__);
+   atomic_set(&data.error, -EBUSY);
+   goto out_hotplug_enable;
+   }
+
stop_topology_update();
 
/* Call function on all CPUs.  One of us will make the
@@ -996,6 +1005,8 @@ int rtas_ibm_suspend_me(u64 handle)
printk(KERN_ERR "Error doing global join\n");
 
start_topology_update();
+
+out_hotplug_enable:
cpu_hotplug_enable();
 
/* Take down CPUs not online prior to suspend */
-- 
1.9.4



Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-01 Thread David Hildenbrand
On 01/10/2018 10:40, Michal Hocko wrote:
> On Fri 28-09-18 17:03:57, David Hildenbrand wrote:
> [...]
> 
> I haven't read the patch itself but I just wanted to note one thing
> about this part
> 
>> For paravirtualized devices it is relevant that memory is onlined as
>> quickly as possible after adding - and that it is added to the NORMAL
>> zone. Otherwise, it could happen that too much memory in a row is added
>> (but not onlined), resulting in out-of-memory conditions due to the
>> additional memory for "struct pages" and friends. MOVABLE zone as well
>> as delays might be very problematic and lead to crashes (e.g. zone
>> imbalance).
> 
> I have proposed (but haven't finished this due to other stuff) a
> solution for this. Newly added memory can host memmaps itself and then
> you do not have the problem in the first place. For vmemmap it would
> have an advantage that you do not really have to beg for 2MB pages to
> back the whole section but you would get it for free because the initial
> part of the section is by definition properly aligned and unused.

So the plan is to "host metadata for new memory on the memory itself".
Just want to note that this is basically impossible for s390x with the
current mechanisms. (added memory is dead, until onlining notifies the
hypervisor and memory is allocated). It will also be problematic for
paravirtualized memory devices (e.g. XEN's "not backed by the
hypervisor" hacks).

This would only be possible for memory DIMMs, memory that is completely
accessible as far as I can see. Or at least, some specified "first part"
is accessible.

Other problems are other metadata like extended struct pages and friends.

(I really like the idea of adding memory without allocating memory in
the hypervisor in the first place, please keep me tuned).

And please note: This solves some problematic part ("adding too much
memory to the movable zone or not onlining it"), but not the issue of
zone imbalance in the first place. And not one issue I try to tackle
here: don't add paravirtualized memory to the movable zone.

> 
> I yet have to think about the whole proposal but I am missing the most
> important part. _Who_ is going to use the new exported information and
> for what purpose. You said that distributions have hard time to
> distinguish different types of onlinining policies but isn't this
> something that is inherently usecase specific?
> 

Let's think about a distribution. We have a clash of use cases here
(just what you describe). What I propose solves one part of it ("handle
what you know how to handle right in the kernel").

1. Users of DIMMs usually expect that they can be unplugged again. That
is why you want to control how to online memory in user space (== add it
to the movable zone).

2. Users of standby memory (s390) expect that memory will never be
onlined automatically. It will be onlined manually.

3. Users of paravirtualized devices (esp. Hyper-V) don't care about
memory unplug in the sense of MOVABLE at all. They (or Hyper-V!) will
add a whole bunch of memory and expect that everything works fine. So
that memory is onlined immediately and that memory is added to the
NORMAL zone. Users never want the MOVABLE zone.

1. is a reason why distributions usually don't configure
"MEMORY_HOTPLUG_DEFAULT_ONLINE", because you really want the option for
MOVABLE zone. That however implies, that e.g. for x86, you have to
handle all new memory in user space, especially also HyperV memory.
There, you then have to check for things like "isHyperV()" to decide
"oh, yes, this should definitely not go to the MOVABLE zone".

As you know, I am working on virtio-mem, which can basically be combined
with 1 or 2. And user space has no idea about the difference between
added memory blocks. Was it memory from a DIMM (== ZONE_MOVABLE)? Was it
memory from a paravirtualized device (== ZONE_NORMAL)? Was it standby
memory? (don't online)


That part, I try to solve with this interface.

To answer your question: User space will only care about "normal" memory
and then decide how to online it (for now, usually MOVABLE, because
that's what customers expect with DIMMs). The use case of DIMMS, we
don't know and therefore we can't expose. The use case of the other
cases, we know exactly already in the kernel.

Existing user space hacks will continue to work but can be replaced by a
new check against "normal" memory block types.

Thanks for looking into this!

-- 

Thanks,

David / dhildenb


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-01 Thread David Hildenbrand
On 28/09/2018 19:02, Dave Hansen wrote:
> It's really nice if these kinds of things are broken up.  First, replace
> the old want_memblock parameter, then add the parameter to the
> __add_page() calls.

Definitely, once we agree that it is not nuts, I will split it up for
the next version :)

> 
>> +/*
>> + * NONE: No memory block is to be created (e.g. device memory).
>> + * NORMAL:   Memory block that represents normal (boot or hotplugged) memory
>> + *   (e.g. ACPI DIMMs) that should be onlined either automatically
>> + *   (memhp_auto_online) or manually by user space to select a
>> + *   specific zone.
>> + *   Applicable to memhp_auto_online.
>> + * STANDBY:  Memory block that represents standby memory that should only
>> + *   be onlined on demand by user space (e.g. standby memory on
>> + *   s390x), but never automatically by the kernel.
>> + *   Not applicable to memhp_auto_online.
>> + * PARAVIRT: Memory block that represents memory added by
>> + *   paravirtualized mechanisms (e.g. hyper-v, xen) that will
>> + *   always automatically get onlined. Memory will be unplugged
>> + *   using ballooning, not by relying on the MOVABLE ZONE.
>> + *   Not applicable to memhp_auto_online.
>> + */
>> +enum {
>> +MEMORY_BLOCK_NONE,
>> +MEMORY_BLOCK_NORMAL,
>> +MEMORY_BLOCK_STANDBY,
>> +MEMORY_BLOCK_PARAVIRT,
>> +};
> 
> This does not seem like the best way to expose these.
> 
> STANDBY, for instance, seems to be essentially a replacement for a check
> against running on s390 in userspace to implement a _typical_ s390
> policy.  It seems rather weird to try to make the userspace policy
> determination easier by telling userspace about the typical s390 policy
> via the kernel.

Now comes the fun part: I am working on another paravirtualized memory
hotplug way for KVM guests, based on virtio ("virtio-mem").

These devices can potentially be used concurrently with
- s390x standby memory
- DIMMs

How should a policy in user space look like when new memory gets added
- on s390x? Not onlining paravirtualized memory is very wrong.
- on e.g. x86? Onlining memory to the MOVABLE zone is very wrong.

So the type of memory is very important here to have in user space.
Relying on checks like "isS390()", "isKVMGuest()" or "isHyperVGuest()"
to decide whether to online memory and how to online memory is wrong.
Only some specific memory types (which I call "normal") are to be
handled by user space.

For the other ones, we exactly know what to do:
- standby? don't online
- paravirt? always online to normal zone

I will add some more details as reply to Michal.

> 
> As for the OOM issues, that sounds like something we need to fix by
> refusing to do (or delaying) hot-add operations once we consume too much
> ZONE_NORMAL from memmap[]s rather than trying to indirectly tell
> userspace to hurry thing along.

That is a moving target and doing that automatically is basically
impossible. You can add a lot of memory to the movable zone and
everything is fine. Suddenly a lot of processes are started - boom.
MOVABLE should only every be used if you expect an unplug. And for
paravirtualized devices, a "typical" unplug does not exist.

> 
> So, to my eye, we need:
> 
>  +enum {
>  +MEMORY_BLOCK_NONE,
>  +MEMORY_BLOCK_STANDBY, /* the default */
>  +MEMORY_BLOCK_AUTO_ONLINE,
>  +};

auto-online is strongly misleading, that's why I called it "normal", but
I am open to suggestions. The information about these devices is handled
fully in the kernel - "paravirt" is the key for me.

> 
> and we can probably collapse NONE into AUTO_ONLINE because userspace
> ends up doing the same thing for both: nothing.

For external reasons, yes, for internal reasons no (see hmm/device
memory). In user space, we will never end up with MEMORY_BLOCK_NONE,
because there is no memory block.

> 
>>  struct memory_block {
>>  unsigned long start_section_nr;
>>  unsigned long end_section_nr;
>> @@ -34,6 +58,7 @@ struct memory_block {
>>  int (*phys_callback)(struct memory_block *);
>>  struct device dev;
>>  int nid;/* NID for this memory block */
>> +int type;   /* type of this memory block */
>>  };
> 
> Shouldn't we just be creating and using an actual named enum type?
> 

That makes sense.

Thanks!

-- 

Thanks,

David / dhildenb


Re: [PATCH 2/2] powerpc/64: Increase stack redzone for 64-bit kernel to 512 bytes

2018-10-01 Thread Segher Boessenkool
On Sat, Sep 29, 2018 at 11:25:20PM -0700, Bin Meng wrote:
>  /*
> - * Size of redzone that userspace is allowed to use below the stack
> + * Size of redzone that kernel/userspace is allowed to use below the stack
>   * pointer.  This is 288 in the 64-bit big-endian ELF ABI, and 512 in
>   * the new ELFv2 little-endian ABI, so we allow the larger amount.
> - *
> - * For kernel code we allow a 288-byte redzone, in order to conserve
> - * kernel stack space; gcc currently only uses 288 bytes, and will
> - * hopefully allow explicit control of the redzone size in future.
>   */

Btw: patches welcome!  This will never be useful for userland code, so no
one in GCC land is looking at this (we did not even know it was wanted).


Segher


Re: [PATCH 1/2] powerpc/64: Remove duplicated -mabi=elfv2 for little endian targets

2018-10-01 Thread Segher Boessenkool
On Sat, Sep 29, 2018 at 11:25:19PM -0700, Bin Meng wrote:
> The -mabi=elfv2 option is currently specified twice in the Makefile. Remove
> the one that does not test the compiler.

This was

ifdef CONFIG_PPC64
cflags-$(CONFIG_CPU_BIG_ENDIAN) += $(call cc-option,-mabi=elfv1)
cflags-$(CONFIG_CPU_BIG_ENDIAN) += $(call cc-option,-mcall-aixdesc)
aflags-$(CONFIG_CPU_BIG_ENDIAN) += $(call cc-option,-mabi=elfv1)
aflags-$(CONFIG_CPU_LITTLE_ENDIAN)  += -mabi=elfv2
endif

and the later setting is

ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
AFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv2)
else
CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv1)
CFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mcall-aixdesc)
AFLAGS-$(CONFIG_PPC64)  += $(call cc-option,-mabi=elfv1)
endif

Maybe these two pieces should be joined completely?
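One possible joined form might look like the fragment below. This is only a sketch: it assumes the `cflags-`/`CFLAGS-` distinction in the powerpc Makefile can be collapsed here and that the `cc-option` fallbacks behave identically when merged; it is untested.

```make
# Hypothetical merged fragment: select the ABI flags once, per endianness.
ifdef CONFIG_PPC64
ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
CFLAGS-y	+= $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
AFLAGS-y	+= $(call cc-option,-mabi=elfv2)
else
CFLAGS-y	+= $(call cc-option,-mabi=elfv1)
CFLAGS-y	+= $(call cc-option,-mcall-aixdesc)
AFLAGS-y	+= $(call cc-option,-mabi=elfv1)
endif
endif
```

Merging would remove the duplication entirely rather than just one of the two settings, at the cost of touching both hunks at once.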


Segher


Re: [PATCH 2/2] powerpc/64: Increase stack redzone for 64-bit kernel to 512 bytes

2018-10-01 Thread Segher Boessenkool
Hi!

On Mon, Oct 01, 2018 at 12:22:56PM +1000, Nicholas Piggin wrote:
> On Mon, 1 Oct 2018 09:11:04 +0800
> Bin Meng  wrote:
> > On Mon, Oct 1, 2018 at 7:27 AM Nicholas Piggin  wrote:
> > > On Sat, 29 Sep 2018 23:25:20 -0700
> > > Bin Meng  wrote:
> > > > commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit
> > > > userspace to 512 bytes") only changes stack userspace redzone size.
> > > > We need to increase the kernel one to 512 bytes too, per the ABIv2 spec.
> > >
> > > You're right we need 512 to be compatible with ABIv2, but as the
> > > comment says, gcc limits this to 288 bytes so that's what is used
> > > to save stack space. We can use a compiler version test to change
> > > this if llvm or a new version of gcc does something different.
> > >  
> > 
> > I believe what the comment says is for ABIv1. At the time when commit
> > 573ebfa6601f was submitted, kernel had not switched to ABIv2 build
> > yet.
> 
> I see, yes you are right about that. However gcc still seems to be using
> 288 bytes.

And that is required by the ABI!

"""
2.2.2.4. Protected Zone

The 288 bytes below the stack pointer are available as volatile program
storage that is not preserved across function calls. Interrupt handlers and
any other functions that might run without an explicit call must take care
to preserve a protected zone, also referred to as the red zone, of 512 bytes
that consists of:

 * The 288-byte volatile program storage region that is used to hold saved
   registers and local variables
 * An additional 224 bytes below the volatile program storage region that is
   set aside as a volatile system storage region for system functions

If a function does not call other functions and does not need more stack
space than is available in the volatile program storage region (that is, 288
bytes), it does not need to have a stack frame. The 224-byte volatile system
storage region is not available to compilers for allocation to saved
registers and local variables.
"""

A routine has a red zone of 288 bytes.  Below that there are 224 more bytes
of available storage, but they are not available to the routine itself: some
(asynchronous) other code (like an interrupt) can use (i.e. clobber) them.


Segher


Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-01 Thread Michal Hocko
On Fri 28-09-18 17:03:57, David Hildenbrand wrote:
[...]

I haven't read the patch itself but I just wanted to note one thing
about this part

> For paravirtualized devices it is relevant that memory is onlined as
> quickly as possible after adding - and that it is added to the NORMAL
> zone. Otherwise, it could happen that too much memory in a row is added
> (but not onlined), resulting in out-of-memory conditions due to the
> additional memory for "struct pages" and friends. MOVABLE zone as well
> as delays might be very problematic and lead to crashes (e.g. zone
> imbalance).

I have proposed (but haven't finished this due to other stuff) a
solution for this. Newly added memory can host memmaps itself and then
you do not have the problem in the first place. For vmemmap it would
have an advantage that you do not really have to beg for 2MB pages to
back the whole section but you would get it for free because the initial
part of the section is by definition properly aligned and unused.

I yet have to think about the whole proposal but I am missing the most
important part. _Who_ is going to use the new exported information and
for what purpose. You said that distributions have hard time to
distinguish different types of onlinining policies but isn't this
something that is inherently usecase specific?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 6/9] kbuild: consolidate Devicetree dtb build rules

2018-10-01 Thread Geert Uytterhoeven
On Fri, Sep 28, 2018 at 8:42 PM Rob Herring  wrote:
> On Fri, Sep 28, 2018 at 12:21 PM Andreas Färber  wrote:
> > Am 13.09.18 um 17:51 schrieb Geert Uytterhoeven:
> > > On Wed, Sep 12, 2018 at 3:02 AM Masahiro Yamada
> > >  wrote:
> > >> Even x86 can enable OF and OF_UNITTEST.
> > >>
> > >> Another solution might be,
> > >> guard it by 'depends on ARCH_SUPPORTS_OF'.
> > >>
> > >> This is actually what ACPI does.
> > >>
> > >> menuconfig ACPI
> > >> bool "ACPI (Advanced Configuration and Power Interface) Support"
> > >> depends on ARCH_SUPPORTS_ACPI
> > >>  ...
> > >
> > > ACPI is a real platform feature, as it depends on firmware.
> > >
> > > CONFIG_OF can be enabled, and DT overlays can be loaded, on any platform,
> > > even if it has ACPI ;-)
> >
> > How would loading a DT overlay work on an ACPI platform? I.e., what
> > would it overlay against and how to practically load such a file?
>
> The DT unittests do just that. I run them on x86 and UM builds. In
> this case, the loading source is built-in.
>
> > I wonder whether that could be helpful for USB devices and serdev...
>
> How to load the overlays is pretty orthogonal to the issues to be
> solved here. It would certainly be possible to move forward with
> prototyping this and just have the overlay built-in. It may not even
> need to be an overlay if we can support multiple root nodes.

You indeed need to refer to some anchors for most use cases, although a
simple MMIO device could just be anchored to the root node.

Topologies hanging off a USB device would be my first use case, too,
for serdev, or for e.g. the mcp2210 USB-SPI bridge.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] powerpc/rtas: Fix a potential race between CPU-Offline & Migration

2018-10-01 Thread Gautham R Shenoy
On Fri, Sep 28, 2018 at 03:36:08PM -0500, Nathan Fontenot wrote:
> On 09/28/2018 02:02 AM, Gautham R Shenoy wrote:
> > Hi Nathan,
> > 
> > On Thu, Sep 27, 2018 at 12:31:34PM -0500, Nathan Fontenot wrote:
> >> On 09/27/2018 11:51 AM, Gautham R. Shenoy wrote:
> >>> From: "Gautham R. Shenoy" 
> >>>
> >>> Live Partition Migrations require all the present CPUs to execute the
> >>> H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
> >>> before initiating the migration for this purpose.
> >>>
> >>> The commit 85a88cabad57
> >>> ("powerpc/pseries: Disable CPU hotplug across migrations")
> >>> disables any CPU-hotplug operations once all the offline CPUs are
> >>> brought online to prevent any further state change. Once the
> >>> CPU-Hotplug operation is disabled, the code assumes that all the CPUs
> >>> are online.
> >>>
> >>> However, there is a minor window in rtas_ibm_suspend_me() between
> >>> onlining the offline CPUs and disabling CPU-Hotplug when a concurrent
> >>> CPU-offline operation initiated by userspace can succeed, thereby
> >>> nullifying the aforementioned assumption. In this unlikely case
> >>> these offlined CPUs will not call H_JOIN, resulting in a system hang.
> >>>
> >>> Fix this by verifying that all the present CPUs are actually online
> >>> after CPU-Hotplug has been disabled, failing which we return from
> >>> rtas_ibm_suspend_me() with -EBUSY.
> >>
> >> Would we also want to have the ability to retry onlining all of the cpus
> >> before failing the migration?
> > 
> > Given that we haven't been able to hit the issue in practice after your
> > fix to disable CPU hotplug across migrations, it indicates that the
> > race-window, if it is not merely a theoretical one, is extremely
> > narrow. So, this current patch addresses the safety aspect, as in,
> > should someone manage to exploit this narrow race-window, it ensures
> > that the system doesn't go to a hang state.
> > 
> > Having the ability to retry onlining all the CPUs is only required for
> > progress of LPM in this rarest of cases. We should add the code to
> > retry onlining the CPUs if the consequence of failing an LPM is high,
> > even in this rarest of case. Otherwise IMHO we should be ok not adding
> > the additional code.
> 
> I believe you're correct. One small update to the patch below...
> 
> > 
> >>
> >> This would involve a bigger code change as the current code to online all
> >> CPUs would work in its current form.
> >>
> >> -Nathan
> >>
> >>>
> >>> Cc: Nathan Fontenot 
> >>> Cc: Tyrel Datwyler 
> >>> Suggested-by: Michael Ellerman 
> >>> Signed-off-by: Gautham R. Shenoy 
> >>> ---
> >>>  arch/powerpc/kernel/rtas.c | 10 ++
> >>>  1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> >>> index 2c7ed31..27f6fd3 100644
> >>> --- a/arch/powerpc/kernel/rtas.c
> >>> +++ b/arch/powerpc/kernel/rtas.c
> >>> @@ -982,6 +982,16 @@ int rtas_ibm_suspend_me(u64 handle)
> >>>   }
> >>>
> >>>   cpu_hotplug_disable();
> >>> +
> >>> + /* Check if we raced with a CPU-Offline Operation */
> >>> + if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
> >>> + pr_err("%s: Raced against a concurrent CPU-Offline\n",
> >>> +__func__);
> >>> + atomic_set(, -EBUSY);
> >>> + cpu_hotplug_enable();
> 
> Before returning, we return all CPUs that were offline prior to the migration
> back to the offline state. We should be doing that here as well. This should
> be as simple as adding a call to rtas_offline_cpus_mask() here.

You are right. I will add the code to undo the offline and send it.

Thanks for the review!

> 
> -Nathan
> 
> >>> + goto out;
> >>> + }
> >>> +
> >>>   stop_topology_update();
> >>>
> >>>   /* Call function on all CPUs.  One of us will make the
> >>>



[PATCH] powerpc/nohash: fix undefined behaviour when testing page size support

2018-10-01 Thread Daniel Axtens
When enumerating page size definitions to check hardware support,
we construct a constant which is (1U << (def->shift - 10)).

However, the array of page size definitions is only initialised for
various MMU_PAGE_* constants, so it contains a number of 0-initialised
elements with def->shift == 0. This means we end up shifting by a
very large number, which gives the following UBSan splat:


UBSAN: Undefined behaviour in 
/home/dja/dev/linux/linux/arch/powerpc/mm/tlb_nohash.c:506:21
shift exponent 4294967286 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-00045-ga604f927b012-dirty #6
Call Trace:
[c101bc20] [c0a13d54] .dump_stack+0xa8/0xec (unreliable)
[c101bcb0] [c04f20a8] .ubsan_epilogue+0x18/0x64
[c101bd30] [c04f2b10] 
.__ubsan_handle_shift_out_of_bounds+0x110/0x1a4
[c101be20] [c0d21760] .early_init_mmu+0x1b4/0x5a0
[c101bf10] [c0d1ba28] .early_setup+0x100/0x130
[c101bf90] [c528] start_here_multiplatform+0x68/0x80


Fix this by first checking if the element exists (shift != 0) before
constructing the constant.

Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/tlb_nohash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 15fe5f0c8665..ae5d568e267f 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -503,6 +503,9 @@ static void setup_page_sizes(void)
for (psize = 0; psize < MMU_PAGE_COUNT; ++psize) {
struct mmu_psize_def *def = _psize_defs[psize];
 
+   if (!def->shift)
+   continue;
+
if (tlb1ps & (1U << (def->shift - 10))) {
def->flags |= MMU_PAGE_SIZE_DIRECT;
 
-- 
2.17.1



[PATCH] powerpc: remove leftover code of old GCC version checks

2018-10-01 Thread Masahiro Yamada
Clean up the leftover of commit f2910f0e6835 ("powerpc: remove old
GCC version checks").

Signed-off-by: Masahiro Yamada 
---

My patch had been sent earlier, with more clean-ups:
https://lore.kernel.org/patchwork/patch/977805/

Anyway, this cleans up the leftover of Nicholas' one.


 arch/powerpc/Makefile | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 2ecd0976..b094375 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -400,10 +400,6 @@ archclean:
 
 archprepare: checkbin
 
-# Use the file '.tmp_gas_check' for binutils tests, as gas won't output
-# to stdout and these checks are run even on install targets.
-TOUT   := .tmp_gas_check
-
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
@@ -414,7 +410,3 @@ checkbin:
echo -n '*** Please use a different binutils version.' ; \
false ; \
fi
-
-
-CLEAN_FILES += $(TOUT)
-
-- 
2.7.4