Re: [FIX] powerpc: discard .exit.data at runtime

2015-10-15 Thread Michael Ellerman
On Wed, 2015-07-10 at 23:28:28 UTC, Stephen Rothwell wrote:
> .exit.text is discarded at run time and there are some references from
> that to .exit.data, so we need to discard .exit.data at run time as well.
> 
> Fixes these errors:
> 
> `.exit.data' referenced in section `.exit.text' of drivers/built-in.o: 
> defined in discarded section `.exit.data' of drivers/built-in.o
> `.exit.data' referenced in section `.exit.text' of drivers/built-in.o: 
> defined in discarded section `.exit.data' of drivers/built-in.o
> 
> Signed-off-by: Stephen Rothwell 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4c8123181d692c5b78650ee5

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: powerpc: Delete old orphaned PrPMC 280/2800 DTS and boot file.

2015-10-15 Thread Michael Ellerman
On Tue, 2015-13-10 at 23:20:51 UTC, Paul Gortmaker wrote:
> In commit 3c8464a9b12bf83807b6e2c896d7e7b633e1cae7 ("powerpc:
> Delete old PrPMC 280/2800 support") we got rid of most of the C
> code, and the Makefile/Kconfig hooks, but it seems I left the
> platform's DTS file orphaned in the tree as well as the boot code.
> Here we get rid of them both.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Paul Gortmaker 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5fab1d1cb18d27d1a2a5f110

cheers

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Will Deacon
Dammit guys, it's never simple is it?

On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> To that end, the herd tool can make a diagram of what it thought
> happened, and I have attached it.  I used this diagram to try and force
> this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> and succeeded.  Here is the sequence of events:
> 
> o Commit P0's write.  The model offers to propagate this write
>   to the coherence point and to P1, but don't do so yet.
> 
> o Commit P1's write.  Similar offers, but don't take them up yet.
> 
> o Commit P0's lwsync.
> 
> o Execute P0's lwarx, which reads a=0.  Then commit it.
> 
> o Commit P0's stwcx. as successful.  This stores a=1.

On arm64, this is a conditional-store-*release* and therefore cannot be
observed before the initial write to x...

> o Commit P0's branch (not taken).
> 
> o Commit P0's final register-to-register move.
> 
> o Commit P1's sync instruction.
> 
> o There is now nothing that can happen in either processor.
>   P0 is done, and P1 is waiting for its sync.  Therefore,
>   propagate P1's a=2 write to the coherence point and to
>   the other thread.

... therefore this is illegal, because you haven't yet propagated that
prior write...

> 
> o There is still nothing that can happen in either processor.
>   So pick the barrier propagate, then the acknowledge sync.
> 
> o P1 can now execute its read from x.  Because P0's write to
>   x is still waiting to propagate to P1, this still reads
>   x=0.  Execute and commit, and we now have both r3 registers
>   equal to zero and the final value a=2.

... and P1 would have to read x == 1.

So arm64 is ok. Doesn't lwsync order store->store observability for PPC?

Will

Re: powerpc/pci: export pcibios_free_controller()

2015-10-15 Thread Michael Ellerman
On Thu, 2015-10-09 at 06:28:34 UTC, Andrew Donnellan wrote:
> Export pcibios_free_controller(), so it can be used by the cxl module to
> free virtual PHBs.
> 
> Signed-off-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6b8b252f40d39e5815be17aa

cheers

Re: drivers/macintosh: adb: fix misleading Kconfig help text

2015-10-15 Thread Michael Ellerman
On Thu, 2015-01-10 at 19:41:40 UTC, Aaro Koskinen wrote:
> CONFIG_INPUT_KEYBDEV does not exist and no additional keyboard-specific
> options are needed to get the keyboard working.
> 
> Signed-off-by: Aaro Koskinen 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f27b86dc1ec41ff4b5b58094

cheers

Re: powerpc/numa: Use of_get_next_parent to simplify code

2015-10-15 Thread Michael Ellerman
On Sun, 2015-11-10 at 20:23:27 UTC, Christophe Jaillet wrote:
> of_get_next_parent can be used to simplify the while() loop and
> avoid the need of a temp variable.
> 
> Signed-off-by: Christophe JAILLET 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1def37586fb1f3bbbedeaa64

cheers
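
The simplification described in the commit message can be illustrated with a toy model. This is a sketch only: `struct node`, `node_get()`, `node_put()`, `get_parent()`, `get_next_parent()` and the two `find_root*()` walkers are made-up user-space stand-ins, not the kernel's device tree API, and the walk to the root is a hypothetical loop rather than the one in the patch. The assumption modelled here is that the helper takes a reference on the parent and drops the reference held on the child in one call.

```c
#include <stddef.h>

/* Toy stand-in for struct device_node: only a parent link and a refcount. */
struct node {
	struct node *parent;
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Take a reference on the parent without touching the child. */
static struct node *get_parent(struct node *n)
{
	return n ? node_get(n->parent) : NULL;
}

/* Take a reference on the parent and drop the one held on the child. */
static struct node *get_next_parent(struct node *n)
{
	struct node *parent = get_parent(n);

	node_put(n);
	return parent;
}

/* Old shape: a temporary is needed to release the child after stepping up. */
static struct node *find_root_with_tmp(struct node *n)
{
	node_get(n);
	while (n && n->parent) {
		struct node *tmp = n;

		n = get_parent(n);
		node_put(tmp);
	}
	return n;
}

/* New shape: the helper folds the get/put pair into a single call. */
static struct node *find_root(struct node *n)
{
	node_get(n);
	while (n && n->parent)
		n = get_next_parent(n);
	return n;
}
```

Both walkers leave the same reference counts behind; the second one simply has no temporary to forget about.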

Re: [1/4] powerpc/pseries: Make PCI non-optional

2015-10-15 Thread Michael Ellerman
On Thu, 2015-01-10 at 06:44:31 UTC, Michael Ellerman wrote:
> The pseries build with PCI=n looks to have been broken for at least 5
> years, and no one's noticed or cared.
> 
> Following the obvious breakages backward, the first commit I can find
> that builds is the parent of 2eb4afb69ff3 ("powerpc/pci: Move pseries
> code into pseries platform specific area") from April 2009.
> 
> A distro would never ship a PCI=n kernel, so it is only useful for folks
> building custom kernels. Also on KVM the virtio devices appear on PCI,
> so it would only be useful if you were building kernels specifically to
> run on PowerVM and with no PCI devices.
> 
> The added code complexity, and testing load (which we've clearly not
> been doing), is not justified by the small reduction in kernel size for
> such a niche use case.
> 
> So just make PCI non-optional on pseries.
> 
> Signed-off-by: Michael Ellerman 

Series applied to powerpc next.

https://git.kernel.org/powerpc/c/4c9cd468b348c9e47f9380a5

cheers

Re: powerpc/eeh: atomic_dec_if_positive() to update passthru count

2015-10-15 Thread Michael Ellerman
On Thu, 2015-27-08 at 05:58:27 UTC, Gavin Shan wrote:
> No need to have two atomic operations (update and fetch/check) when
> decreasing PE's number of passed devices as one atomic operation
> is enough.
> 
> Signed-off-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/54f9a64a36e4fc041721a954

cheers
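
For readers unfamiliar with the primitive named in the subject, here is a minimal user-space sketch of the dec-if-positive idea. It uses C11 `<stdatomic.h>` rather than the kernel's `atomic_t` API, and the name `dec_if_positive()` is made up for illustration. The semantics (return the old value minus one, and store it only when the result is non-negative) are an assumption modelled on how the kernel helper is documented, not a copy of its implementation.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's atomic_dec_if_positive(). */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;		/* would go negative: leave *v untouched */
		/* On failure, old is refreshed with the current value and we retry. */
	} while (!atomic_compare_exchange_weak(v, &old, dec));

	return dec;
}
```

A caller can decrement and check in one step: a negative return value means the counter was already zero and was not modified.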

Re: [v3,1/2] powerpc/xmon: Paged output for paca display

2015-10-15 Thread Michael Ellerman
On Thu, 2015-08-10 at 00:50:23 UTC, Sam bobroff wrote:
> The paca display is already more than 24 lines, which can be problematic
> if you have an old school 80x24 terminal, or more likely you are on a
> virtual terminal which does not scroll for whatever reason.
...
> Signed-off-by: Sam Bobroff 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/958b7c80507a6eb84b0d

cheers

Re: [v2,1/1] powerpc: Individual System V IPC system calls

2015-10-15 Thread Michael Ellerman
On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote:
> This patch provides individual system call numbers for the following
> System V IPC system calls, on PowerPC, so that they do not need to be
> multiplexed:
> * semop, semget, semctl, semtimedop
> * msgsnd, msgrcv, msgget, msgctl
> * shmat, shmdt, shmget, shmctl
> 
> Signed-off-by: Sam Bobroff 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a34236155afb1cc41945e583

cheers
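
To make "multiplexed" concrete, the sketch below models how a single ipc()-style entry point has to decode a call number before it can dispatch, which is the step that individual system call numbers make unnecessary. The call numbers are the ones I believe `<linux/ipc.h>` defines; the `MUX_` prefix, `demux_ipc()` and the idea of returning an operation name are all made up for illustration, and the masking of the upper 16 bits (a version field) is an assumption about the multiplexer's convention.

```c
#include <stddef.h>

/* Call numbers of the multiplexed entry point (assumed to match <linux/ipc.h>). */
enum {
	MUX_SEMOP = 1, MUX_SEMGET = 2, MUX_SEMCTL = 3, MUX_SEMTIMEDOP = 4,
	MUX_MSGSND = 11, MUX_MSGRCV = 12, MUX_MSGGET = 13, MUX_MSGCTL = 14,
	MUX_SHMAT = 21, MUX_SHMDT = 22, MUX_SHMGET = 23, MUX_SHMCTL = 24,
};

/* Return the name of the operation a call number selects, or NULL if unknown. */
static const char *demux_ipc(unsigned int call)
{
	call &= 0xffff;		/* upper bits carry a version, not an operation */

	switch (call) {
	case MUX_SEMOP:		return "semop";
	case MUX_SEMGET:	return "semget";
	case MUX_SEMCTL:	return "semctl";
	case MUX_SEMTIMEDOP:	return "semtimedop";
	case MUX_MSGSND:	return "msgsnd";
	case MUX_MSGRCV:	return "msgrcv";
	case MUX_MSGGET:	return "msgget";
	case MUX_MSGCTL:	return "msgctl";
	case MUX_SHMAT:		return "shmat";
	case MUX_SHMDT:		return "shmdt";
	case MUX_SHMGET:	return "shmget";
	case MUX_SHMCTL:	return "shmctl";
	default:		return NULL;
	}
}
```

With one system call number per operation, the switch disappears from the call path and each operation can be identified directly by its number.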

Re: cxl: Free virtual PHB when removing

2015-10-15 Thread Michael Ellerman
On Tue, 2015-13-10 at 04:09:44 UTC, Andrew Donnellan wrote:
> When adding a vPHB in cxl_pci_vphb_add(), we allocate a pci_controller
> struct using pcibios_alloc_controller(). However, we don't free it in
> cxl_pci_vphb_remove(), causing a leak.
> 
> Call pcibios_free_controller() in cxl_pci_vphb_remove() to free the vPHB
> data structure correctly.
> 
> Signed-off-by: Daniel Axtens 
> Signed-off-by: Andrew Donnellan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2e1a2556ebbbe7b53a05b721

cheers

Re: powerpc/mpc5xxx: Use of_get_next_parent to simplify code

2015-10-15 Thread Michael Ellerman
On Sun, 2015-11-10 at 20:27:40 UTC, Christophe Jaillet wrote:
> of_get_next_parent can be used to simplify the while() loop and
> avoid the need of a temp variable.
> 
> Signed-off-by: Christophe JAILLET 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b340587e68b479e52039f800

cheers

Re: [PATCH] scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O=

2015-10-15 Thread Michal Marek
Dne 15.10.2015 v 08:05 Michael Ellerman napsal(a):
> My recent commit d2036f30cfe1 ("scripts/kconfig/Makefile: Allow
> KBUILD_DEFCONFIG to be a target"), contained a bug in that when it
> checks if KBUILD_DEFCONFIG is a file it forgets to prepend $(srctree) to
> the path.
> 
> This causes the build to fail when building out of tree (with O=), and
> when the value of KBUILD_DEFCONFIG is 'defconfig'. In that case we will
> fail to find the 'defconfig' file, because we look in the build
> directory not $(srctree), and so we will call Make again with
> 'defconfig' as the target. From there we loop infinitely calling 'make
> defconfig' again and again.
> 
> The fix is simple, we need to look for the file under $(srctree).
> 
> Fixes: d2036f30cfe1 ("scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be 
> a target")
> Reported-by: Olof Johansson 
> Signed-off-by: Michael Ellerman 

Acked-by: Michal Marek 

I could have spotted it myself :-/.

Michal


RE: [v3,1/5] powerpc/fsl: Move fsl_guts.h out of arch/powerpc

2015-10-15 Thread Hou Zhiqiang

> -----Original Message-----
> From patchwork Sun Sep 20 04:29:53 2015
> Content-Type: text/plain; charset="utf-8"
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
> Subject: [v3,1/5] powerpc/fsl: Move fsl_guts.h out of arch/powerpc
> From: Scott Wood
> X-Patchwork-Id: 7225421
> Message-Id: <1442723397-26329-2-git-send-email-scottw...@freescale.com>
> To: Michael Turquette, Stephen Boyd, "Rafael J. Wysocki",
>   Viresh Kumar, Russell King
> Cc: linux...@vger.kernel.org, Tang Yuantian,
>   Scott Wood, linuxppc-dev@lists.ozlabs.org,
>   linux-...@vger.kernel.org, linux-arm-ker...@lists.infradead.org
> Date: Sat, 19 Sep 2015 23:29:53 -0500
>
> Freescale's Layerscape ARM chips use the same structure.
>
> Signed-off-by: Scott Wood
>
> ---
> v3: was patch 2/5
>
>  arch/powerpc/include/asm/fsl_guts.h        | 192 -
>  arch/powerpc/platforms/85xx/mpc85xx_mds.c  |   2 +-
>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c  |   2 +-
>  arch/powerpc/platforms/85xx/p1022_ds.c     |   2 +-
>  arch/powerpc/platforms/85xx/p1022_rdk.c    |   2 +-
>  arch/powerpc/platforms/85xx/smp.c          |   2 +-
>  arch/powerpc/platforms/85xx/twr_p102x.c    |   2 +-
>  arch/powerpc/platforms/86xx/mpc8610_hpcd.c |   2 +-
>  drivers/iommu/fsl_pamu.c                   |   2 +-
>  include/linux/fsl/guts.h                   | 192 +
>  sound/soc/fsl/mpc8610_hpcd.c               |   2 +-
>  sound/soc/fsl/p1022_ds.c                   |   2 +-
>  sound/soc/fsl/p1022_rdk.c                  |   2 +-
>  13 files changed, 203 insertions(+), 203 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/fsl_guts.h
>  create mode 100644 include/linux/fsl/guts.h

The LS1043A clock driver depends on this patchset, which I have tested on an
LS1043ARDB board.

Thanks,
Zhiqiang

Re: devicetree and IRQ7 mapping for T1042(mpic)

2015-10-15 Thread Scott Wood
On Thu, 2015-10-15 at 07:11 +0000, Joakim Tjernlund wrote:
> On Wed, 2015-10-14 at 19:11 -0500, Scott Wood wrote:
> > On Wed, 2015-10-14 at 19:37 +0000, Joakim Tjernlund wrote:
> > > I am trying to figure out how to describe/map external IRQ7 in the
> > > devicetree.
> > >
> > > Basically either IRQ7 to be left alone by Linux (because u-boot already
> > > set it up)
> > > or map IRQ7 to sie 0 (MPIC_EILR7=0xf0) and prio=0xf (MPIC_EIVPR7=0x4f)
> > >
> > > There is no need for SW handler because IRQ7 will be routed to the DDR
> > > controller
> > > and cause an automatic Self Refresh just before CPU reset.
> > > 
> > > I cannot figure out how to do this. Any ideas?
> > > 
> > > If not possible from devicetree, then can one do it from board code?
> > 
> > The device tree describes the hardware.  Priority is configuration, and 
> > thus 
> > doesn't belong there.  You can call mpic_irq_set_priority() from board 
> > code.
> 
> Right.
> 
> > 
> > Likewise, the fact that you want to route irq7 to sie0 is configuration,
> > not hardware description.  At most, the device tree should describe what
> > is connected to each sie output.  There's no current Linux support for
> > routing an interrupt to sie or anything other than "int".
> 
> That explains why I could not find any mpic function for that ..
> 
> I found the mpic device tree property "protected-sources", which might do
> what I want, i.e. just leave the irq alone, but I cannot figure out what
> value to write there.
> Could you give me an example of how to calculate the device tree irq number
> for IRQ7?
>
> The mpic.txt mentions "Interrupt Source Configuration Registers" but google
> did not turn up anything useful for me.

The device tree number for external IRQ 7 is 7.  Another option is to use the 
pic-no-reset property.

-Scott


Re: [RFC, 1/2] scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target

2015-10-15 Thread Michal Marek
Dne 15.10.2015 v 05:27 Michael Ellerman napsal(a):
> On Wed, 2015-10-14 at 09:54 -0700, Olof Johansson wrote:
>> On Tue, Oct 13, 2015 at 4:43 PM, Michael Ellerman  
>> wrote:
>>> On Tue, 2015-10-13 at 14:02 -0700, Olof Johansson wrote:
>>>> On Fri, Oct 2, 2015 at 12:47 AM, Michael Ellerman
>>>> wrote:
>>>>> On Wed, 2015-23-09 at 05:40:34 UTC, Michael Ellerman wrote:
>>>>>> Arch Makefiles can set KBUILD_DEFCONFIG to tell kbuild the name of the
>>>>>> defconfig that should be built by default.
>>>>>>
>>>>>> However currently there is an assumption that KBUILD_DEFCONFIG points to
>>>>>> a file at arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG).
>>>>>>
>>>>>> We would like to use a target, using merge_config, as our defconfig, so
>>>>>> adapt the logic in scripts/kconfig/Makefile to allow that.
>>>>>>
>>>>>> To minimise the chance of breaking anything, we first check if
>>>>>> KBUILD_DEFCONFIG is a file, and if so we do the old logic. If it's not a
>>>>>> file, then we call the top-level Makefile with KBUILD_DEFCONFIG as the
>>>>>> target.
>>>>>>
>>>>>> Signed-off-by: Michael Ellerman
>>>>>> Acked-by: Michal Marek
>>>>>
>>>>> Applied to powerpc next.
>>>>>
>>>>> https://git.kernel.org/powerpc/c/d2036f30cfe1daa19e63ce75
>>>>
>>>> This breaks arm64 defconfig for me:
>>>>
>>>> mkdir obj-tmp
>>>> make -f Makefile O=obj-tmp ARCH=arm64 defconfig
>>>> ... watch loop of:
>>>> *** Default configuration is based on target 'defconfig'
>>>>   GEN ./Makefile
>>>
>>> Crap, sorry. I knew I shouldn't have touched that code!
>>>
>>> Does this fix it for you?
>>
>> Yes, it does, however:
>>
>>> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
>>> index b2b9c87..3043d6b 100644
>>> --- a/scripts/kconfig/Makefile
>>> +++ b/scripts/kconfig/Makefile
>>> @@ -96,7 +96,7 @@ savedefconfig: $(obj)/conf
>>>  defconfig: $(obj)/conf
>>>  ifeq ($(KBUILD_DEFCONFIG),)
>>> $< $(silent) --defconfig $(Kconfig)
>>> -else ifneq ($(wildcard arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
>>> +else ifneq ($(wildcard 
>>> $(srctree)/arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
>>> @$(kecho) "*** Default configuration is based on 
>>> '$(KBUILD_DEFCONFIG)'"
>>> $(Q)$< $(silent) 
>>> --defconfig=arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG) $(Kconfig)
>>
>> Do you need a $(srctree) prefix here too? I'm not entirely sure what I
>> would do to reproduce a run that goes down this path so I can't
>> confirm.
> 
> That is the path you're going down, now that it's fixed. That's the path where
> KBUILD_DEFCONFIG is a real file, ie. the old behaviour.
> 
> I'm not sure why it doesn't have a $(srctree) there, but it's never had one.
> 
> It looks like it eventually boils down to zconf_fopen() which looks for the
> file in both .  and $(srctree).

Yes, the kconfig frontends do part of what would ideally be the job of
make or the Makefile.

Michal

Re: devicetree and IRQ7 mapping for T1042(mpic)

2015-10-15 Thread Joakim Tjernlund
On Wed, 2015-10-14 at 19:11 -0500, Scott Wood wrote:
> On Wed, 2015-10-14 at 19:37 +0000, Joakim Tjernlund wrote:
> > I am trying to figure out how to describe/map external IRQ7 in the
> > devicetree.
> >
> > Basically either IRQ7 to be left alone by Linux (because u-boot already set
> > it up)
> > or map IRQ7 to sie 0 (MPIC_EILR7=0xf0) and prio=0xf (MPIC_EIVPR7=0x4f)
> >
> > There is no need for SW handler because IRQ7 will be routed to the DDR
> > controller
> > and cause an automatic Self Refresh just before CPU reset.
> > 
> > I cannot figure out how to do this. Any ideas?
> > 
> > If not possible from devicetree, then can one do it from board code?
> 
> The device tree describes the hardware.  Priority is configuration, and thus 
> doesn't belong there.  You can call mpic_irq_set_priority() from board code.

Right.

> 
> Likewise, the fact that you want to route irq7 to sie0 is configuration, not
> hardware description.  At most, the device tree should describe what is
> connected to each sie output.  There's no current Linux support for routing
> an interrupt to sie or anything other than "int".

That explains why I could not find any mpic function for that ..

I found the mpic device tree property "protected-sources", which might do what
I want, i.e. just leave the irq alone, but I cannot figure out what value to
write there.
Could you give me an example of how to calculate the device tree irq number for
IRQ7?

The mpic.txt mentions "Interrupt Source Configuration Registers" but google did
not turn up anything useful for me.

 Jocke

Re: [PATCH V5 3/6] powerpc/powernv: use one M64 BAR in Single PE mode for one VF BAR

2015-10-15 Thread Wei Yang
On Wed, Oct 14, 2015 at 12:15:32PM +1100, Gavin Shan wrote:
>On Tue, Oct 13, 2015 at 01:29:44PM +0800, Wei Yang wrote:
>>On Tue, Oct 13, 2015 at 02:32:49PM +1100, Gavin Shan wrote:
>>>On Tue, Oct 13, 2015 at 10:50:42AM +0800, Wei Yang wrote:
On Tue, Oct 13, 2015 at 10:55:27AM +1100, Gavin Shan wrote:
>On Fri, Oct 09, 2015 at 10:46:53AM +0800, Wei Yang wrote:
>>In current implementation, when VF BAR is bigger than 64MB, it uses 4 M64
>>BARs in Single PE mode to cover the number of VFs required to be enabled.
>>By doing so, several VFs would be in one VF Group and leads to 
>>interference
>>between VFs in the same group.
>>
>>And in this patch, m64_wins is renamed to m64_map, which means index 
>>number
>>of the M64 BAR used to map the VF BAR. Based on Gavin's comments.
>>
>>This patch changes the design by using one M64 BAR in Single PE mode for
>>one VF BAR. This gives absolute isolation for VFs.
>>
>>Signed-off-by: Wei Yang 
>>Reviewed-by: Gavin Shan 
>>Acked-by: Alexey Kardashevskiy 
>>---
>> arch/powerpc/include/asm/pci-bridge.h |   5 +-
>> arch/powerpc/platforms/powernv/pci-ioda.c | 169 
>> --
>> 2 files changed, 68 insertions(+), 106 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 712add5..8aeba4c 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -214,10 +214,9 @@ struct pci_dn {
>>  u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
>>  u16 num_vfs;/* number of VFs enabled*/
>>  int offset; /* PE# for the first VF PE */
>>-#define M64_PER_IOV 4
>>- int m64_per_iov;
>>+ boolm64_single_mode;/* Use M64 BAR in Single Mode */
>> #define IODA_INVALID_M64(-1)
>>- int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
>>+ int (*m64_map)[PCI_SRIOV_NUM_BARS];
>> #endif /* CONFIG_PCI_IOV */
>> #endif
>>  struct list_head child_list;
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7da476b..2886f90 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -1148,29 +1148,36 @@ static void pnv_pci_ioda_setup_PEs(void)
>> }
>>
>> #ifdef CONFIG_PCI_IOV
>>-static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
>>+static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
>> {
>>  struct pci_bus*bus;
>>  struct pci_controller *hose;
>>  struct pnv_phb*phb;
>>  struct pci_dn *pdn;
>>  inti, j;
>>+ intm64_bars;
>>
>>  bus = pdev->bus;
>>  hose = pci_bus_to_host(bus);
>>  phb = hose->private_data;
>>  pdn = pci_get_pdn(pdev);
>>
>>+ if (pdn->m64_single_mode)
>>+ m64_bars = num_vfs;
>>+ else
>>+ m64_bars = 1;
>>+
>>  for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>- for (j = 0; j < M64_PER_IOV; j++) {
>>- if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
>>+ for (j = 0; j < m64_bars; j++) {
>>+ if (pdn->m64_map[j][i] == IODA_INVALID_M64)
>>  continue;
>>  opal_pci_phb_mmio_enable(phb->opal_id,
>>- OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
>>- clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
>>- pdn->m64_wins[i][j] = IODA_INVALID_M64;
>>+ OPAL_M64_WINDOW_TYPE, pdn->m64_map[j][i], 0);
>>+ clear_bit(pdn->m64_map[j][i], &phb->ioda.m64_bar_alloc);
>>+ pdn->m64_map[j][i] = IODA_INVALID_M64;
>>  }
>>
>>+ kfree(pdn->m64_map);
>>  return 0;
>> }
>>
>>@@ -1187,8 +1194,7 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
>>*pdev, u16 num_vfs)
>>  inttotal_vfs;
>>  resource_size_tsize, start;
>>  intpe_num;
>>- intvf_groups;
>>- intvf_per_group;
>>+ intm64_bars;
>>
>>  bus = pdev->bus;
>>  hose = pci_bus_to_host(bus);
>>@@ -1196,26 +1202,26 @@ static int pnv_pci_vf_assign_m64(struct pci_dev 
>>*pdev, u16 num_vfs)
>>  pdn = pci_get_pdn(pdev);
>>  total_vfs = pci_sriov_get_totalvfs(pdev);
>>
>>- /* Initialize the m64_wins to IODA_INVALID_M64 */
>>- for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
>>- for (j = 0; j < M64_PER_IOV; j++)
>>- pdn->m64_wins[i][j] = 
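
(The quoted patch is cut off here in the archive.) The data-structure change it describes, from a fixed `m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV]` array to a dynamically sized `m64_map` indexed as `[j][i]`, can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch's code: `NUM_BARS`, `INVALID_M64`, `m64_row`, `m64_bars_needed()`, `m64_map_alloc()` and `m64_map_count_valid()` are made-up names, `malloc()` stands in for whatever kernel allocator the patch uses, and the value 6 for the number of IOV BARs is assumed.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NUM_BARS	6	/* stands in for PCI_SRIOV_NUM_BARS (assumed value) */
#define INVALID_M64	(-1)	/* stands in for IODA_INVALID_M64 */

/* One row holds the M64 BAR index for each IOV BAR of one VF (or of all VFs). */
typedef int m64_row[NUM_BARS];

/* Mirrors the quoted hunk: one row per VF in single mode, otherwise one row. */
static int m64_bars_needed(bool single_mode, int num_vfs)
{
	return single_mode ? num_vfs : 1;
}

/* Allocate the map and mark every entry as unused. */
static m64_row *m64_map_alloc(int m64_bars)
{
	m64_row *map = malloc(sizeof(*map) * m64_bars);

	if (!map)
		return NULL;
	for (int j = 0; j < m64_bars; j++)
		for (int i = 0; i < NUM_BARS; i++)
			map[j][i] = INVALID_M64;
	return map;
}

/* Count entries that currently name an M64 BAR. */
static int m64_map_count_valid(m64_row *map, int m64_bars)
{
	int count = 0;

	for (int i = 0; i < NUM_BARS; i++)
		for (int j = 0; j < m64_bars; j++)
			if (map[j][i] != INVALID_M64)
				count++;
	return count;
}
```

Declaring the map as a pointer to rows keeps the `map[j][i]` indexing of the patch while letting the number of rows follow the number of VFs instead of a compile-time constant.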

Re: [PATCH] scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O=

2015-10-15 Thread Michael Ellerman
On Thu, 2015-10-15 at 09:27 +0200, Michal Marek wrote:
> Dne 15.10.2015 v 08:05 Michael Ellerman napsal(a):
> > My recent commit d2036f30cfe1 ("scripts/kconfig/Makefile: Allow
> > KBUILD_DEFCONFIG to be a target"), contained a bug in that when it
> > checks if KBUILD_DEFCONFIG is a file it forgets to prepend $(srctree) to
> > the path.
> > 
> > This causes the build to fail when building out of tree (with O=), and
> > when the value of KBUILD_DEFCONFIG is 'defconfig'. In that case we will
> > fail to find the 'defconfig' file, because we look in the build
> > directory not $(srctree), and so we will call Make again with
> > 'defconfig' as the target. From there we loop infinitely calling 'make
> > defconfig' again and again.
> > 
> > The fix is simple, we need to look for the file under $(srctree).
> > 
> > Fixes: d2036f30cfe1 ("scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to 
> > be a target")
> > Reported-by: Olof Johansson 
> > Signed-off-by: Michael Ellerman 
> 
> Acked-by: Michal Marek 
> 
> I could have spotted it myself :-/.

It was pretty easy to miss in the diff, especially as the kconfig invocation
doesn't use $(srctree).

I should have noticed it in my testing, but it didn't actually break powerpc,
so the only clue was that the message says "based on target". Anyway fixed now
hopefully.

cheers



[PATCH] selfttest/powerpc: Add memory page migration tests

2015-10-15 Thread Anshuman Khandual
This adds two tests for memory page migration. One for normal page
migration which works for both 4K or 64K base page size kernel and
the other one is for 16MB huge page migration which will work both
4K or 64K base page sized 16MB huge pages as and when we support
huge page migration.

Signed-off-by: Anshuman Khandual 
---
- Works for normal page migration on both 64K and 4K base pages
- Works for 16MB huge page migration (64K) on Aneesh's V2 PTE changes

 tools/testing/selftests/powerpc/mm/Makefile|  14 +-
 .../selftests/powerpc/mm/hugepage-migration.c  |  30 
 tools/testing/selftests/powerpc/mm/migration.h | 196 +
 .../testing/selftests/powerpc/mm/page-migration.c  |  33 
 tools/testing/selftests/powerpc/mm/run_mmtests |  21 +++
 5 files changed, 289 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/hugepage-migration.c
 create mode 100644 tools/testing/selftests/powerpc/mm/migration.h
 create mode 100644 tools/testing/selftests/powerpc/mm/page-migration.c
 create mode 100755 tools/testing/selftests/powerpc/mm/run_mmtests

diff --git a/tools/testing/selftests/powerpc/mm/Makefile 
b/tools/testing/selftests/powerpc/mm/Makefile
index ee179e2..c482614 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -1,12 +1,16 @@
 noarg:
$(MAKE) -C ../
 
-TEST_PROGS := hugetlb_vs_thp_test subpage_prot
-TEST_FILES := tempfile
+TEST_PROGS := run_mmtests
+TEST_FILES := hugetlb_vs_thp_test
+TEST_FILES += subpage_prot
+TEST_FILES += tempfile
+TEST_FILES += hugepage-migration
+TEST_FILES += page-migration
 
-all: $(TEST_PROGS) $(TEST_FILES)
+all: $(TEST_FILES)
 
-$(TEST_PROGS): ../harness.c
+$(TEST_FILES): ../harness.c
 
 include ../../lib.mk
 
@@ -14,4 +18,4 @@ tempfile:
dd if=/dev/zero of=tempfile bs=64k count=1
 
 clean:
-   rm -f $(TEST_PROGS) tempfile
+   rm -f $(TEST_FILES)
diff --git a/tools/testing/selftests/powerpc/mm/hugepage-migration.c 
b/tools/testing/selftests/powerpc/mm/hugepage-migration.c
new file mode 100644
index 0000000..b60bc10
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/hugepage-migration.c
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include "migration.h"
+
+static int hugepage_migration(void)
+{
+   int ret = 0;
+
+   if ((unsigned long)getpagesize() == 0x1000)
+   printf("Running on base page size 4K\n");
+
+   if ((unsigned long)getpagesize() == 0x10000)
+   printf("Running on base page size 64K\n");
+
+   ret = test_huge_migration(16 * MEM_MB);
+   ret = test_huge_migration(256 * MEM_MB);
+   ret = test_huge_migration(512 * MEM_MB);
+
+   return ret;
+}
+
+int main(void)
+{
+   return test_harness(hugepage_migration, "hugepage_migration");
+}
diff --git a/tools/testing/selftests/powerpc/mm/migration.h 
b/tools/testing/selftests/powerpc/mm/migration.h
new file mode 100644
index 0000000..2f9e3f9
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/migration.h
@@ -0,0 +1,196 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+#define HPAGE_OFF  0
+#define HPAGE_ON   1
+
+#define PAGE_SHIFT_4K  12
+#define PAGE_SHIFT_64K 16
+#define PAGE_SIZE_4K   0x1000
+#define PAGE_SIZE_64K  0x10000
+#define PAGE_SIZE_HUGE 16UL * 1024 * 1024
+
+#define MEM_GB 1024UL * 1024 * 1024
+#define MEM_MB 1024UL * 1024
+#define MME_KB 1024UL
+
+#define PMAP_FILE  "/proc/self/pagemap"
+#define PMAP_PFN   0x007FFFFFFFFFFFFFUL
+#define PMAP_SIZE  8
+
+#define SOFT_OFFLINE   "/sys/devices/system/memory/soft_offline_page"
+#define HARD_OFFLINE   "/sys/devices/system/memory/hard_offline_page"
+
+#define MMAP_LENGTH(256 * MEM_MB)
+#define MMAP_ADDR  (void *)(0x0UL)
+#define MMAP_PROT  (PROT_READ | PROT_WRITE)
+#define MMAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS)
+#define MMAP_FLAGS_HUGE(MAP_SHARED)
+
+#define FILE_NAME  "huge/hugepagefile"
+
+static void write_buffer(char *addr, unsigned long length)
+{
+   unsigned long i;
+
+   for (i = 0; i < length; i++)
+   *(addr + i) = (char)i;
+}
+
+static int read_buffer(char *addr, unsigned long length)
+{
+   unsigned long i;
+
+   for (i = 0; i < length; i++) {
+   if (*(addr + i) != (char)i) {
+   printf("Data miscompare at addr[%lu]\n", i);
+   return 

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 11:35:44AM +0100, Will Deacon wrote:
> Dammit guys, it's never simple is it?
> 
> On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > To that end, the herd tool can make a diagram of what it thought
> > happened, and I have attached it.  I used this diagram to try and force
> > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > and succeeded.  Here is the sequence of events:
> > 
> > o   Commit P0's write.  The model offers to propagate this write
> > to the coherence point and to P1, but don't do so yet.
> > 
> > o   Commit P1's write.  Similar offers, but don't take them up yet.
> > 
> > o   Commit P0's lwsync.
> > 
> > o   Execute P0's lwarx, which reads a=0.  Then commit it.
> > 
> > o   Commit P0's stwcx. as successful.  This stores a=1.
> 
> On arm64, this is a conditional-store-*release* and therefore cannot be
> observed before the initial write to x...
> 
> > o   Commit P0's branch (not taken).
> > 
> > o   Commit P0's final register-to-register move.
> > 
> > o   Commit P1's sync instruction.
> > 
> > o   There is now nothing that can happen in either processor.
> > P0 is done, and P1 is waiting for its sync.  Therefore,
> > propagate P1's a=2 write to the coherence point and to
> > the other thread.
> 
> ... therefore this is illegal, because you haven't yet propagated that
> prior write...

OK.  Power distinguishes between propagating to the coherence point
and to each of the other CPUs.

> > o   There is still nothing that can happen in either processor.
> > So pick the barrier propagate, then the acknowledge sync.
> > 
> > o   P1 can now execute its read from x.  Because P0's write to
> > x is still waiting to propagate to P1, this still reads
> > x=0.  Execute and commit, and we now have both r3 registers
> > equal to zero and the final value a=2.
> 
> ... and P1 would have to read x == 1.

Good!  Do ARMMEM and herd agree with you?

> So arm64 is ok. Doesn't lwsync order store->store observability for PPC?

Yes.  But this is not store->store observability, but rather store->load
visibility.  Furthermore, as I understand it, lwsync controls the
visibility to other CPUs, but not necessarily the coherence order.

Let's look at the example C code again:

    CPU 0                   CPU 1
    -----                   -----

    WRITE_ONCE(x, 1);       WRITE_ONCE(a, 2);
    r3 = xchg(&a, 1);       smp_mb();
                            r3 = READ_ONCE(x);

The problem is that we are applying intuitions obtained from a
release-acquire chain, which hands off from stores to loads.  In contrast,
this example is quite weird in that we have a store handing off to another
store, but with reads also involved.  Making that work on Power requires
full memory barriers on both sides.  Intuitively, the coherence order
can be established after the fact as long as all readers see a consistent
set of values based on the subset of the sequence that each reader sees.

Anyway, it looks like Power does need a sync before and after for
value-returning atomics.  That certainly simplifies the analysis.
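
As a sanity check on what "fully ordered" is supposed to buy us, the C example can be brute-forced under plain sequential consistency. This is only a toy model, not herd or ppcmem, and every name in it is invented for illustration; it treats xchg() as one indivisible step and smp_mb() as a no-op, which is what a fully ordered implementation is meant to look like from the outside.

```c
/* Toy sequentially consistent model of the two-CPU example above. */
struct sb_state {
    int x, a;          /* shared variables, both initially zero */
    int r3_cpu0;       /* value returned by CPU 0's xchg(&a, 1) */
    int r3_cpu1;       /* value read by CPU 1's READ_ONCE(x) */
};

static void cpu0_step(struct sb_state *s, int step)
{
    if (step == 0) {
        s->x = 1;                  /* WRITE_ONCE(x, 1) */
    } else {
        s->r3_cpu0 = s->a;         /* xchg(&a, 1): read the old value ... */
        s->a = 1;                  /* ... and store 1, as one indivisible step */
    }
}

static void cpu1_step(struct sb_state *s, int step)
{
    if (step == 0)
        s->a = 2;                  /* WRITE_ONCE(a, 2) */
    else
        s->r3_cpu1 = s->x;         /* smp_mb(); r3 = READ_ONCE(x) */
}

/*
 * Try all 6 interleavings of the two 2-step programs and count those
 * where CPU 0's r3 is 0, the final a is 2, and CPU 1's r3 is 0.
 * Bit k of "mask" selects which CPU runs global step k (0 = CPU 0).
 */
int sb_count_forbidden(void)
{
    int hits = 0;

    for (unsigned int mask = 0; mask < 16; mask++) {
        struct sb_state s = { 0, 0, -1, -1 };
        int next0 = 0, next1 = 0, bits = 0;

        for (int k = 0; k < 4; k++)
            bits += (mask >> k) & 1;
        if (bits != 2)
            continue;              /* each CPU must run exactly two steps */

        for (int k = 0; k < 4; k++) {
            if ((mask >> k) & 1)
                cpu1_step(&s, next1++);
            else
                cpu0_step(&s, next0++);
        }
        if (s.r3_cpu0 == 0 && s.a == 2 && s.r3_cpu1 == 0)
            hits++;
    }
    return hits;
}
```

Under this model sb_count_forbidden() finds no interleaving with both r3 values zero and a=2, which is the behaviour the leading and trailing sync are meant to recover on real hardware.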

Thanx, Paul


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 10:49:23PM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 14, 2015 at 11:55:56PM +0800, Boqun Feng wrote:
> > > According to memory-barriers.txt, xchg, cmpxchg and their atomic{,64}_
> > > versions all need to imply a full barrier, however they are now just
> > > RELEASE+ACQUIRE, which is not a full barrier.
> > > 
> > > So replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
> > > PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
> > > __{cmp,}xchg_{u32,u64} respectively to guarantee a full barrier
> > > semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg().
> > > 
> > > This patch is a complement of commit b97021f85517 ("powerpc: Fix
> > > atomic_xxx_return barrier semantics").
> > > 
> > > Acked-by: Michael Ellerman 
> > > Cc:  # 3.4+
> > > Signed-off-by: Boqun Feng 
> > > ---
> > >  arch/powerpc/include/asm/cmpxchg.h | 16 
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/include/asm/cmpxchg.h 
> > > b/arch/powerpc/include/asm/cmpxchg.h
> > > index ad6263c..d1a8d93 100644
> > > --- a/arch/powerpc/include/asm/cmpxchg.h
> > > +++ b/arch/powerpc/include/asm/cmpxchg.h
> > > @@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
> > >   unsigned long prev;
> > > 
> > >   __asm__ __volatile__(
> > > - PPC_RELEASE_BARRIER
> > > + PPC_ATOMIC_ENTRY_BARRIER
> > 
> > This looks to be the lwsync instruction.
> > 
> > >  "1:  lwarx   %0,0,%2 \n"
> > >   PPC405_ERR77(0,%2)
> > >  "stwcx.  %3,0,%2 \n\
> > >   bne-    1b"
> > > - PPC_ACQUIRE_BARRIER
> > > + PPC_ATOMIC_EXIT_BARRIER
> > 
> > And this looks to be the sync instruction.
> > 
> > >   : "=&r" (prev), "+m" (*(volatile unsigned int *)p)
> > >   : "r" (p), "r" (val)
> > >   : "cc", "memory");
> > 
> > Hmmm...
> > 
> > Suppose we have something like the following, where "a" and "x" are both
> > initially zero:
> > 
> >     CPU 0                   CPU 1
> >     -----                   -----
> > 
> >     WRITE_ONCE(x, 1);       WRITE_ONCE(a, 2);
> >     r3 = xchg(&a, 1);       smp_mb();
> >                             r3 = READ_ONCE(x);
> > 
> > If xchg() is fully ordered, we should never observe both CPUs'
> > r3 values being zero, correct?
> > 
> > And wouldn't this be represented by the following litmus test?
> > 
> > PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > ""
> > {
> > 0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > 1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > }
> >  P0 | P1 ;
> >  stw r1,0(r2)   | stw r1,0(r12)  ;
> >  lwsync | sync   ;
> >  lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
> >  stwcx. r1,r10,r12  | ;
> >  bne Fail0  | ;
> >  mr r3,r11  | ;
> >  Fail0: | ;
> > exists
> > (0:r3=0 /\ a=2 /\ 1:r3=0)
> > 
> > I left off P0's trailing sync because there is nothing for it to order
> > against in this particular litmus test.  I tried adding it and verified
> > that it has no effect.
> > 
> > Am I missing something here?  If not, it seems to me that you need
> > the leading lwsync to instead be a sync.
> > 
> 
> If so, I will define PPC_ATOMIC_ENTRY_BARRIER as "sync" in the next
> version of this patch. Any concerns?
> 
> Of course, I will wait to do that until we all understand this is
> necessary and agree to make the change.

I am in favor, but I am not the maintainer.  ;-)

Thanx, Paul

> > Of course, if I am not missing something, then this applies also to the
> > value-returning RMW atomic operations that you pulled this pattern from.
> 
> For the value-returning RMW atomics, if the leading barrier does need
> to be "sync", I will just remove my __atomic_op_fence() in patch 4, but
> I will leave patch 3 unchanged for the consistency of the
> __atomic_op_*() macros' definitions. Peter and Will, does that work for
> you both?
> 
> Regards,
> Boqun
> 
> > If so, it would seem that I didn't think through all the possibilities
> > back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> > that I worried about the RMW atomic operation acting as a barrier,
> > but not as the load/store itself.  :-/
> > 
> > Thanx, Paul
> > 



Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Boqun Feng
On Thu, Oct 15, 2015 at 11:35:44AM +0100, Will Deacon wrote:
> 
> So arm64 is ok. Doesn't lwsync order store->store observability for PPC?
> 

I ran some litmus tests and put the results here. My understanding might be
wrong, and I think Paul can explain the lwsync and store->store order
better ;-)


When a store->lwsync->store pairs with load->lwsync->load, according to
herd, YES.

PPC W+lwsync+W-R+lwsync+R
"
  2015-10-15
  herd said (1:r1=0 /\ 1:r2=2) doesn't exist,
  so if P1 observes the write to 'b', it must also observe P0's write
  to 'a'
"
{
0:r1=1; 0:r2=2; 0:r12=a; 0:r13=b;
1:r1=0; 1:r2=0; 1:r12=a; 1:r13=b;
}

 P0  | P1 ;
 stw r1, 0(r12)  | lwz r2, 0(r13) ;
 lwsync  | lwsync ;
 stw r2, 0(r13)  | lwz r1, 0(r12) ;

exists
(1:r1=0 /\ 1:r2=2)


If observation also includes "a write on one CPU -override- another
write on another CPU", then

when a store->lwsync->store pairs(?) with store->sync->load, according
to herd, NO(?).

PPC W+lwsync+W-W+sync+R
"
  2015-10-15
  herd said (1:r1=0 /\ b=3) exists sometimes,
  so if P1 observes P0's write to 'b' (by 'overriding' this write to
  'b'), it may not observe P0's write to 'a'.
"
{
0:r1=1; 0:r2=2; 0:r12=a; 0:r13=b;
1:r1=0; 1:r2=3; 1:r12=a; 1:r13=b;
}

 P0  | P1 ;
 stw r1, 0(r12)  | stw r2, 0(r13) ;
 lwsync  | sync ;
 stw r2, 0(r13)  | lwz r1, 0(r12) ;

exists
(1:r1=0 /\ b=3)


Regards,
Boqun



Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Will Deacon
On Thu, Oct 15, 2015 at 11:35:10AM +0100, Will Deacon wrote:
> Dammit guys, it's never simple is it?

I re-read this and it's even more confusing than I first thought.

> On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > To that end, the herd tool can make a diagram of what it thought
> > happened, and I have attached it.  I used this diagram to try and force
> > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > and succeeded.  Here is the sequence of events:
> > 
> > o   Commit P0's write.  The model offers to propagate this write
> > to the coherence point and to P1, but don't do so yet.
> > 
> > o   Commit P1's write.  Similar offers, but don't take them up yet.
> > 
> > o   Commit P0's lwsync.
> > 
> > o   Execute P0's lwarx, which reads a=0.  Then commit it.
> > 
> > o   Commit P0's stwcx. as successful.  This stores a=1.
> 
> On arm64, this is a conditional-store-*release* and therefore cannot be
> observed before the initial write to x...
> 
> > o   Commit P0's branch (not taken).
> > 
> > o   Commit P0's final register-to-register move.
> > 
> > o   Commit P1's sync instruction.
> > 
> > o   There is now nothing that can happen in either processor.
> > P0 is done, and P1 is waiting for its sync.  Therefore,
> > propagate P1's a=2 write to the coherence point and to
> > the other thread.
> 
> ... therefore this is illegal, because you haven't yet propagated that
> prior write...

I misread this as a propagation of P0's conditional store. What actually
happens on arm64 is that the early conditional store can only succeed
once it is placed into the coherence order of the location which it is
updating (but note that this is subtly different from multi-copy
atomicity!).

So, given that the previous conditional store succeeded, the coherence
order on A must be either {0, 1, 2} or {0, 2, 1}.

If it's {0, 1, 2} (as required by your complete example), that means
P1's a=2 write "observes" the conditional store by P0, and therefore
(because the conditional store has release semantics), also observes
P0's x=1 write.

On the other hand, if P1's a=2 write propagates first and we have a
coherence order of {0, 2, 1}, then P0 must have r3=2, because an
exclusive load returning zero would have led to a failed conditional
store thanks to the intervening write by P1.
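
The two cases above can be walked through mechanically with a toy reservation model. This is a sketch under simplifying assumptions, not the arm64 or Power architecture definition: there is a single reservation, any write by the other CPU clears it, and the retry loop is left out, so a failed conditional store is simply reported. The type and function names are made up for illustration.

```c
#include <stdbool.h>

struct xchg_outcome {
    int  loaded;     /* value P0's exclusive load returned (P0's r3) */
    bool success;    /* did P0's conditional store succeed? */
    int  final_a;    /* last value in the coherence order of a */
};

/*
 * where_p1_store: 0 = P1's a=2 lands before P0's exclusive load,
 *                 1 = between the load and the conditional store,
 *                 2 = after the conditional store.
 */
struct xchg_outcome run_xchg_vs_store(int where_p1_store)
{
    struct xchg_outcome o;
    bool reserved;
    int a = 0;

    if (where_p1_store == 0)
        a = 2;

    o.loaded = a;            /* exclusive load takes a reservation */
    reserved = true;

    if (where_p1_store == 1) {
        a = 2;
        reserved = false;    /* intervening write kills the reservation */
    }

    o.success = reserved;    /* conditional store needs the reservation */
    if (o.success)
        a = 1;

    if (where_p1_store == 2)
        a = 2;

    o.final_a = a;
    return o;
}
```

In this model a successful store with r3=0 only occurs for where_p1_store == 2, which is the {0, 1, 2} order, and the {0, 2, 1} order only occurs with r3=2.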

I find it pretty weird if PPC allows the conditional store to succeed in
this way, as I think that would break simple cases like two threads
incrementing a shared variable in parallel:


""
{
0:r1=1; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
1:r1=1; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
}
P0 | P1 ;
lwarx  r11,r10,r12 | lwarx  r11,r10,r12 ;
add r11,r1,r11 | add r11,r1,r11 ;
stwcx. r11,r10,r12 | stwcx. r11,r10,r12 ;
bne Fail0  | bne Fail1  ;
mr r3,r1   | mr r3,r1   ;
Fail0: | Fail1: ;
exists
(0:r3=1 /\ a=1 /\ 1:r3=1)


This is also allowed by herd and forbidden by ppcmem, for example.
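
For comparison, here is a small brute-force model of that increment test. It is a sketch only: it assumes sequentially consistent interleaving and a per-CPU reservation that is cleared by the other CPU's successful store, and all names are invented for illustration. It says nothing about what herd or ppcmem implement internally.

```c
#include <stdbool.h>

struct llsc_cpu {
    int  r11;        /* value loaded by the exclusive load */
    bool reserved;   /* does this CPU still hold its reservation? */
    bool success;    /* did the conditional store succeed? */
};

static void load_reserve(struct llsc_cpu *c, const int *a)
{
    c->r11 = *a;
    c->reserved = true;
}

static void store_conditional(struct llsc_cpu *c, struct llsc_cpu *other, int *a)
{
    c->success = c->reserved;
    if (c->success) {
        *a = c->r11 + 1;           /* add r11,r1,r11 with r1 = 1 */
        other->reserved = false;   /* our store breaks the other reservation */
    }
    c->reserved = false;
}

/*
 * Run all 6 interleavings of {load, store} on two CPUs and count the
 * runs where both conditional stores succeed yet a ends up as 1.
 */
int count_lost_updates(void)
{
    int hits = 0;

    for (unsigned int mask = 0; mask < 16; mask++) {
        struct llsc_cpu cpu[2] = { { 0, false, false }, { 0, false, false } };
        int step[2] = { 0, 0 };
        int a = 0, bits = 0;

        for (int k = 0; k < 4; k++)
            bits += (mask >> k) & 1;
        if (bits != 2)
            continue;              /* each CPU must run exactly two steps */

        for (int k = 0; k < 4; k++) {
            int id = (mask >> k) & 1;

            if (step[id]++ == 0)
                load_reserve(&cpu[id], &a);
            else
                store_conditional(&cpu[id], &cpu[1 - id], &a);
        }
        if (cpu[0].success && cpu[1].success && a == 1)
            hits++;
    }
    return hits;
}
```

With these assumptions count_lost_updates() finds no such run, i.e. the toy model sides with the "forbidden" verdict.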

Will

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Boqun Feng
On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 14, 2015 at 11:55:56PM +0800, Boqun Feng wrote:
> > According to memory-barriers.txt, xchg, cmpxchg and their atomic{,64}_
> > versions all need to imply a full barrier, however they are now just
> > RELEASE+ACQUIRE, which is not a full barrier.
> > 
> > So replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
> > PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
> > __{cmp,}xchg_{u32,u64} respectively to guarantee a full barrier
> > semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg().
> > 
> > This patch is a complement of commit b97021f85517 ("powerpc: Fix
> > atomic_xxx_return barrier semantics").
> > 
> > Acked-by: Michael Ellerman 
> > Cc:  # 3.4+
> > Signed-off-by: Boqun Feng 
> > ---
> >  arch/powerpc/include/asm/cmpxchg.h | 16 
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/cmpxchg.h 
> > b/arch/powerpc/include/asm/cmpxchg.h
> > index ad6263c..d1a8d93 100644
> > --- a/arch/powerpc/include/asm/cmpxchg.h
> > +++ b/arch/powerpc/include/asm/cmpxchg.h
> > @@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
> > unsigned long prev;
> > 
> > __asm__ __volatile__(
> > -   PPC_RELEASE_BARRIER
> > +   PPC_ATOMIC_ENTRY_BARRIER
> 
> This looks to be the lwsync instruction.
> 
> >  "1:lwarx   %0,0,%2 \n"
> > PPC405_ERR77(0,%2)
> >  "  stwcx.  %3,0,%2 \n\
> >   bne-    1b"
> > -   PPC_ACQUIRE_BARRIER
> > +   PPC_ATOMIC_EXIT_BARRIER
> 
> And this looks to be the sync instruction.
> 
> >   : "=&r" (prev), "+m" (*(volatile unsigned int *)p)
> > : "r" (p), "r" (val)
> > : "cc", "memory");
> 
> Hmmm...
> 
> Suppose we have something like the following, where "a" and "x" are both
> initially zero:
> 
>     CPU 0                   CPU 1
>     -----                   -----
> 
>     WRITE_ONCE(x, 1);       WRITE_ONCE(a, 2);
>     r3 = xchg(&a, 1);       smp_mb();
>                             r3 = READ_ONCE(x);
> 
> If xchg() is fully ordered, we should never observe both CPUs'
> r3 values being zero, correct?
> 
> And wouldn't this be represented by the following litmus test?
> 
>   PPC SB+lwsync-RMW2-lwsync+st-sync-leading
>   ""
>   {
>   0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
>   1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
>   }
>P0 | P1 ;
>stw r1,0(r2)   | stw r1,0(r12)  ;
>lwsync | sync   ;
>lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
>stwcx. r1,r10,r12  | ;
>bne Fail0  | ;
>mr r3,r11  | ;
>Fail0: | ;
>   exists
>   (0:r3=0 /\ a=2 /\ 1:r3=0)
> 
> I left off P0's trailing sync because there is nothing for it to order
> against in this particular litmus test.  I tried adding it and verified
> that it has no effect.
> 
> Am I missing something here?  If not, it seems to me that you need
> the leading lwsync to instead be a sync.
> 

If so, I will define PPC_ATOMIC_ENTRY_BARRIER as "sync" in the next
version of this patch. Any concerns?

Of course, I will wait to do that until we all understand this is
necessary and agree to make the change.

> Of course, if I am not missing something, then this applies also to the
> value-returning RMW atomic operations that you pulled this pattern from.

For the value-returning RMW atomics, if the leading barrier does need
to be "sync", I will just remove my __atomic_op_fence() in patch 4, but
I will leave patch 3 unchanged for the consistency of the
__atomic_op_*() macros' definitions. Peter and Will, does that work for
you both?

Regards,
Boqun

> If so, it would seem that I didn't think through all the possibilities
> back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> that I worried about the RMW atomic operation acting as a barrier,
> but not as the load/store itself.  :-/
> 
>   Thanx, Paul
> 



Re: [PATCH v2 1/3] ppc64: Fix warnings

2015-10-15 Thread Simon Horman
On Tue, Oct 06, 2015 at 05:55:48PM -0500, Scott Wood wrote:
> Produce a warning-free build on ppc64 (at least, when built as 64-bit
> userspace -- if a 64-bit binary for ppc64 is a requirement, why is -m64
> set only on purgatory?).  Mostly unused (or write-only) variable
> warnings, but also one nasty one where reserve() was used without a
> prototype, causing long long arguments to be passed as int.
> 
> Signed-off-by: Scott Wood 

Thanks, applied.

I would have slightly preferred one patch per problem.

Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-15 Thread Michael Ellerman
On Thu, 2015-10-15 at 21:00 +0200, Laurent Vivier wrote:
> On kexec, all secondary offline CPUs are onlined before
> starting the new kernel; this is not done in the case of kdump.
> 
> If kdump is configured and a kernel crash occurs while
> some secondary CPUs are offline (SMT=off),
> the new kernel is not able to start them and displays some
> "Processor X is stuck.".

Do we know why they are stuck?

I really don't like this fix. The reason we're doing a kdump is because the
first kernel has panicked, possibly with locks held or data structures
corrupted. Calling cpu_up() then goes and tries to run a bunch of code in the
crashed kernel, which increases the chance of us just wedging completely.

cheers



Re: [PATCH v2 3/3] ppc64: Add a flag to tell the kernel it's booting from kexec

2015-10-15 Thread Simon Horman
On Tue, Oct 06, 2015 at 05:55:50PM -0500, Scott Wood wrote:
> It needs to know this because the SMP release mechanism for Freescale
> book3e is different from when booting with normal hardware.  In theory
> we could simulate the normal spin table mechanism, but not (easily) at
> the addresses U-Boot put in the device tree -- so there'd need to be
> even more communication between the kernel and kexec to set that up.
> 
> Signed-off-by: Scott Wood 
> ---
> v2: Use a device tree property rather than setting a flag in the kernel
> image, as requested by Michael Ellerman.

I'd value a review of this from someone more familiar with ppc than I.

Re: [PATCH v2 2/3] ppc64: Avoid rfid if no need to clear MSR_LE

2015-10-15 Thread Simon Horman
On Tue, Oct 06, 2015 at 05:55:49PM -0500, Scott Wood wrote:
> Commit a304e2d82a8c3 ("ppc64: purgatory: Reset primary cpu endian to
> big-endian") changed bctr to rfid.  rfid is book3s-only and will cause a
> fatal exception on book3e.
> 
> Purgatory is an isolated environment which makes importing information
> about the subarch awkward, so instead rely on the fact that MSR_LE
> should never be set on book3e, and the rfid is only needed if MSR_LE is
> set (and thus needs to be cleared).  In theory that MSR bit is reserved
> on book3e, rather than zero, but in practice I have not seen it set.
> 
> Signed-off-by: Scott Wood 
> Cc: Samuel Mendoza-Jonas 
> ---
> v2: new patch

Could I get a review of this from at least one ppc person?
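
For review purposes, the control flow the patch adds can be paraphrased in C roughly as follows. This is only a sketch: the helper names are invented, and MSR_LE being the lowest MSR bit is taken from the `andi. 10,5,1` test and the `clrrdi 5,5,1` in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_LE 0x1ULL   /* lowest MSR bit, as tested by "andi. 10,5,1" */

/* True when purgatory must take the rfid path to drop little-endian mode. */
bool needs_rfid(uint64_t msr)
{
    return (msr & MSR_LE) != 0;
}

/* MSR value loaded into SRR1 on the rfid path: same MSR with LE cleared. */
uint64_t msr_without_le(uint64_t msr)
{
    return msr & ~MSR_LE;   /* mirrors "clrrdi 5,5,1" */
}
```

When needs_rfid() is false the patch branches to the kernel with mtctr/bctr and never executes rfid, which is the case that matters for book3e.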

>  purgatory/arch/ppc64/v2wrap.S | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/purgatory/arch/ppc64/v2wrap.S b/purgatory/arch/ppc64/v2wrap.S
> index 179ade9..3534080 100644
> --- a/purgatory/arch/ppc64/v2wrap.S
> +++ b/purgatory/arch/ppc64/v2wrap.S
> @@ -116,9 +116,17 @@ master:
>   stw 7,0x5c(4)   # and patch it into the kernel
>   mr  3,16        # restore dt address
>  
> + mfmsr   5
> + andi.   10,5,1  # test MSR_LE
> + bne little_endian
> +
> + li  5,0 # r5 will be 0 for kernel
> + mtctr   4   # prepare branch to
> + bctr            # start kernel
> + 
> +little_endian:   # book3s-only
>   mtsrr0  4   # prepare branch to
>  
> - mfmsr   5
>   clrrdi  5,5,1   # clear MSR_LE
>   mtsrr1  5
>  
> -- 
> 2.1.4
> 
> 

Re: [PATCH] PPC: fix LOGMPP instruction opcode and inline asm

2015-10-15 Thread Michael Ellerman
On Fri, 2015-10-16 at 12:20 +1100, Stewart Smith wrote:
> Back in 9678cda when we started using the Micro Partition Prefetch Engine
> in POWER8 for KVM, there were two mistakes introduced from the original
> patch used for investigation and microbenchmarks.
> 
> One mistake was that the opcode was constructed incorrectly, putting
> the register in the wrong field in the opcode, meaning that we were
> asking the chip to read the memory address from some other register than
> what we intended - probably r0.

Where is the logmpp instruction documented?

cheers



[PATCH] PPC: fix LOGMPP instruction opcode and inline asm

2015-10-15 Thread Stewart Smith
Back in 9678cda when we started using the Micro Partition Prefetch Engine
in POWER8 for KVM, there were two mistakes introduced from the original
patch used for investigation and microbenchmarks.

One mistake was that the opcode was constructed incorrectly, putting
the register in the wrong field in the opcode, meaning that we were
asking the chip to read the memory address from some other register than
what we intended - probably r0. For those unfortunate enough to have r0
point somewhere in memory they cared about, the prefetch engine would
gleefully trash all over it leading to some data you cared about being
replaced with a list of physical addresses.

In addition, the logmpp inline function incorrectly used R1 rather than
%0, so even if we had constructed the instruction correctly, we'd still
have generated the wrong thing: it looked at the address in r1 rather
than whatever we were asked to look at.

So, this patch fixes the following:
- the inline logmpp function's inline asm to be correct
- puts the register in the right field of the instruction

This bug would overwrite a single 64k page.

https://bugzilla.redhat.com/show_bug.cgi?id=1269653
https://bugzilla.redhat.com/show_bug.cgi?id=1271997

Cc: sta...@vger.kernel.org
Fixes: 9678cda ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV")
Reported-by: David Gibson 
Reported-by: Benjamin Herrenschmidt 
Suggested-by: Benjamin Herrenschmidt 
Suggested-by: Paul Mackerras 
Signed-off-by: Stewart Smith 
---
 arch/powerpc/include/asm/cache.h  | 2 +-
 arch/powerpc/include/asm/ppc-opcode.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 34a05a1a990b..3af1c1e35435 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -43,7 +43,7 @@ extern struct ppc64_caches ppc64_caches;
 
 static inline void logmpp(u64 x)
 {
-   asm volatile(PPC_LOGMPP(R1) : : "r" (x));
+   asm volatile(PPC_LOGMPP(%0) : : "r" (x));
 }
 
 #endif /* __powerpc64__ && ! __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 65136928a572..0dc2f6f9b445 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -304,8 +304,8 @@
 #define PPC_LDARX(t, a, b, eh) stringify_in_c(.long PPC_INST_LDARX | \
___PPC_RT(t) | ___PPC_RA(a) | \
___PPC_RB(b) | __PPC_EH(eh))
-#define PPC_LOGMPP(b)  stringify_in_c(.long PPC_INST_LOGMPP | \
-   __PPC_RB(b))
+#define PPC_LOGMPP(a)  stringify_in_c(.long PPC_INST_LOGMPP | \
+   ___PPC_RA(a))
 #define PPC_LWARX(t, a, b, eh) stringify_in_c(.long PPC_INST_LWARX | \
___PPC_RT(t) | ___PPC_RA(a) | \
___PPC_RB(b) | __PPC_EH(eh))
-- 
2.1.4


Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-15 Thread kbuild test robot
Hi Laurent,

[auto build test ERROR on powerpc/next -- if it's inappropriate base, please 
suggest rules for selecting the more suitable base]

url:    https://github.com/0day-ci/linux/commits/Laurent-Vivier/powerpc-on-crash-kexec-ed-kernel-needs-all-CPUs-are-online/20151016-030306
config: powerpc-wii_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/crash.c: In function 'wake_offline_cpus':
>> arch/powerpc/kernel/crash.c:315:4: error: implicit declaration of function 'cpu_up' [-Werror=implicit-function-declaration]
   cpu_up(cpu);
   ^
   cc1: all warnings being treated as errors

vim +/cpu_up +315 arch/powerpc/kernel/crash.c

   309  {
   310          int cpu = 0;
   311  
   312          for_each_present_cpu(cpu) {
   313                  if (!cpu_online(cpu)) {
   314                          pr_info("kexec: Waking offline cpu %d.\n", cpu);
 > 315                          cpu_up(cpu);
   316                  }
   317          }
   318  }

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH] PPC: fix LOGMPP instruction opcode and inline asm

2015-10-15 Thread Stewart Smith
Michael Ellerman  writes:
> On Fri, 2015-10-16 at 12:20 +1100, Stewart Smith wrote:
>> Back in 9678cda when we started using the Micro Partition Prefetch Engine
>> in POWER8 for KVM, there were two mistakes introduced from the original
>> patch used for investigation and microbenchmarks.
>> 
>> One mistake was that the opcode was constructed incorrectly, putting
>> the register in the wrong field in the opcode, meaning that we were
>> asking the chip to read the memory address from some other register than
>> what we intended - probably r0.
>
> Where is the logmpp instruction documented?

BookIV Section 5.1 - although I think this is now meant to all be in the
"P8 User Manual".
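
For readers without the manual to hand, the field mix-up described in the patch can be sketched in plain C. The shifts of 16 and 11 are the usual positions of the RA and RB register fields in a 32-bit Power instruction word; the base opcode value and all function names below are assumptions made for illustration, not copied from the patch.

```c
#include <stdint.h>

#define INST_LOGMPP 0x7c0007e4u   /* assumed base opcode, register fields zero */

static uint32_t ra_field(uint32_t reg) { return (reg & 0x1f) << 16; }
static uint32_t rb_field(uint32_t reg) { return (reg & 0x1f) << 11; }

/* Encoding before the fix: the register number lands in the RB field. */
uint32_t logmpp_before_fix(uint32_t reg)
{
    return INST_LOGMPP | rb_field(reg);
}

/* Encoding after the fix: the register number lands in the RA field. */
uint32_t logmpp_after_fix(uint32_t reg)
{
    return INST_LOGMPP | ra_field(reg);
}

/* Register number sitting in the RA field of an instruction word. */
uint32_t decode_ra(uint32_t insn)
{
    return (insn >> 16) & 0x1f;
}
```

With the old encoding decode_ra() yields 0 whatever register was requested, which matches the "probably r0" remark in the commit message.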


Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-15 Thread Nishanth Aravamudan
On 14.10.2015 [08:42:51 -0700], Christoph Hellwig wrote:
> Hi Nishanth,
> 
> sorry for the late reply.
> 
> > > On Power, since it's technically variable, we'd need a function. So are
> > > you suggesting define'ing it to a function just on Power and leaving it
> > > a constant elsewhere?
> > > 
> > > I noticed that sparc has a IOMMU_PAGE_SHIFT already, fwiw.
> > 
> > Sorry, I should have been more specific -- I'm ready to spin out a v3,
> > with a sparc-specific function.
> > 
> > Are you ok with leaving it a function for now (the only caller is in
> > NVMe obviously).
> 
> 
> I guess we do indeed need a function then.  I'll take a look at your
> patch, but as long you found a way to avoid adding too much boilerplate
> code it should be fine.

Ok, so I've got the moved function (include/linux/dma-mapping.h instead
of dma-mapping-common.h) ready to go, which should only involve changing
the first patch in the series. But I'm really mystified by what to do
for sparc, which defines IOMMU_PAGE_SHIFT and IO_PAGE_SHIFT in
arch/sparc/kernel/iommu_common.h.

1) Which constant reflects the value we mean for this function on sparc?
I assume it should be IOMMU_PAGE_SHIFT, but they are the same value and
I want to make sure I get the semantics right.

2) Where would I put sparc's definition of dma_get_page_shift()? Should
it be in an asm/dma-mapping.h? Should we move some of the constants from
arch/sparc/kernel/iommu_common.h to
arch/sparc/include/asm/iommu_common.h and then #include that in
asm/dma-mapping.h?

Dave M., any opinions/insights? Essentially, this helper function
assists the NVMe driver in determining what page size it should use to
satisfy both the device and IOMMU's requirements. Maybe I misunderstand
the constants on sparc and PAGE_SHIFT is fine there too?
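
To make the question concrete, here is one plausible reading of how a caller might combine the value from dma_get_page_shift() with the device's own limits. It is a hypothetical sketch, not the NVMe driver's code: the function name, its parameters and the -1 error convention are all invented for illustration.

```c
/*
 * Hypothetical helper: pick a page shift that both the device and the
 * IOMMU can live with.
 *
 *   dev_min_shift, dev_max_shift: smallest/largest page shift the device
 *                                 supports
 *   iommu_shift:                  page shift reported for the IOMMU
 *
 * Returns the chosen shift, or -1 if the device cannot use pages as
 * small as the IOMMU page.
 */
int pick_page_shift(int dev_min_shift, int dev_max_shift, int iommu_shift)
{
    /* Never exceed what the device can handle. */
    int shift = iommu_shift < dev_max_shift ? iommu_shift : dev_max_shift;

    if (shift < dev_min_shift)
        return -1;
    return shift;
}
```

On an architecture where the IOMMU page size equals PAGE_SIZE, the helper just returns PAGE_SHIFT clamped to the device maximum, which is why a constant would be enough there.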

-Nish


Re: [RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-15 Thread Michael Ellerman
On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> dynamic virtual-physical mapping for any given processor. Currently we
> use VPHN node ID information only after getting either a PRRN or a VPHN
> event. But during boot time inside the function numa_setup_cpu, we still
> query the OF device tree for the node ID value which might be different
> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> scenario where there are no PRRN or VPHN events after boot, all node-cpu
> mappings will remain incorrect thereafter.
> 
> With this proposed change, numa_setup_cpu will try to override the OF
> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> hcall fetched node ID value. Right now shared processor property of the
> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
> during boot time. So the initmem_init function has been moved after
> ppc_md.setup_arch inside setup_arch during boot.
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8b9502a..e404d05 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
>  
>   nid = of_node_to_nid_single(cpu);
>  
> + /*
> +  * Override the OF device tree fetched node number
> +  * with VPHN based node number in case of a shared
> +  * processor LPAR on PHYP platform.
> +  */
> +#ifdef CONFIG_PPC_SPLPAR
> + if (lppaca_shared_proc(get_lppaca())) {
> + nid = vphn_get_node(lcpu);
> + }
> +#endif


That logic exposes a potential problem which you don't seem to have addressed.

You're not updating the logic in of_node_to_nid[_single](), instead you're
overriding it in *this one location*. But what about other code that uses
of_node_to_nid()? It will still get the old device-tree value and so will have
the wrong nid, won't it?

cheers



Re: [RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-15 Thread Michael Ellerman
On Wed, 2015-10-14 at 15:43 +0530, Anshuman Khandual wrote:
> On 10/14/2015 02:49 PM, Michael Ellerman wrote:
> > On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> >> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> >> dynamic virtual-physical mapping for any given processor. Currently we
> >> use VPHN node ID information only after getting either a PRRN or a VPHN
> >> event. But during boot time inside the function numa_setup_cpu, we still
> >> query the OF device tree for the node ID value which might be different
> >> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> >> scenario where there are no PRRN or VPHN events after boot, all node-cpu
> >> mappings will remain incorrect thereafter.
> >>
> >> With this proposed change, numa_setup_cpu will try to override the OF
> >> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> >> hcall fetched node ID value. Right now shared processor property of the
> >> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
> >> during boot time. So the initmem_init function has been moved after
> >> ppc_md.setup_arch inside setup_arch during boot.
> > 
> > I would be *very* reluctant to change the order of initmem_init() vs
> > setup_arch().
> > 
> > At a minimum you'd need to go through every setup_arch() implementation and
> > carefully determine if the ordering of what it does matters vs 
> > initmem_init().
> > And then you'd need to test on every affected platform.
> > 
> > So I suggest you think of a different way to do it if at all possible.
> 
> vpa_init() is being called inside pSeries_setup_arch, which is
> ppc_md.setup_arch for the platform. It's called directly for the boot
> cpu and through smp_init_pseries_xics for other cpus on the system. Not
> sure what the reason is behind calling vpa_init() from XICS init
> though.
> 
> If we can move all these vpa_init() calls from pSeries_setup_arch
> to initmem_init just before calling numa_setup_cpu, the VPA area
> would be initialized when we need it during boot. Will look in
> this direction.

Back up a bit. The dependency on vpa_init() is only because you want to call
lppaca_shared_proc() right?

But do you really need to? What happens if you call VPHN on a non-shared proc
machine? Does it 1) give you something sane or 2) give you an error or 3) give
you a junk value?

If it's either of 1 or 2 then you should be OK to just call it. You either use
the value it returned which is sane or you see the error and just fall back to
the device tree nid.
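
In other words, something shaped like the sketch below, where a negative value stands in for "the hcall failed". The function name and the error convention are hypothetical, and this deliberately does not cover case 3 (a junk value), which no caller-side check can detect.

```c
/*
 * Hypothetical fallback logic: prefer the VPHN answer when there is one,
 * otherwise keep the node ID taken from the device tree.
 *
 *   vphn_nid: node ID from the hcall, or a negative value on error
 *   of_nid:   node ID derived from the device tree
 */
int choose_nid(int vphn_nid, int of_nid)
{
    if (vphn_nid >= 0)
        return vphn_nid;   /* cases 1: hcall gave a sane value */
    return of_nid;         /* case 2: hcall errored, fall back */
}
```

That keeps numa_setup_cpu() independent of lppaca_shared_proc(), and therefore of when vpa_init() runs.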

cheers



Re: [PATCH v3 3/5] clk: qoriq: Add ls2080a support.

2015-10-15 Thread Stephen Boyd
On 09/19, Scott Wood wrote:
> LS2080A is the first implementation of the chassis 3 clockgen, which
> has a different register layout than previous chips.  It is also little
> endian, unlike previous chips.
> 
> Signed-off-by: Scott Wood 
> ---

Acked-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v3 4/5] clk: Add consumer APIs for discovering possible parent clocks

2015-10-15 Thread Scott Wood
On Sat, 2015-09-19 at 23:29 -0500, Scott Wood wrote:
> Commit fc4a05d4b0eb ("clk: Remove unused provider APIs") removed
> __clk_get_num_parents() and clk_hw_get_parent_by_index(), leaving only
> true provider API versions that operate on struct clk_hw.
> 
> qoriq-cpufreq needs these functions in order to determine the options
> it has for calling clk_set_parent() and thus populate the cpufreq
> table, so revive them as legitimate consumer APIs.
> 
> Signed-off-by: Scott Wood 
> ---
> v3: new patch
> 
>  drivers/clk/clk.c   | 19 +++
>  include/linux/clk.h | 31 +++
>  2 files changed, 50 insertions(+)

Russell, could you ACK this if there are no objections?

Thanks,
Scott

> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 43e2c3a..9436356 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -290,6 +290,12 @@ struct clk_hw *__clk_get_hw(struct clk *clk)
>  }
>  EXPORT_SYMBOL_GPL(__clk_get_hw);
>  
> +unsigned int clk_get_num_parents(struct clk *clk)
> +{
> + return !clk ? 0 : clk->core->num_parents;
> +}
> +EXPORT_SYMBOL_GPL(clk_get_num_parents);
> +
>  unsigned int clk_hw_get_num_parents(const struct clk_hw *hw)
>  {
>   return hw->core->num_parents;
> @@ -359,6 +365,19 @@ static struct clk_core *clk_core_get_parent_by_index(struct clk_core *core,
>   return core->parents[index];
>  }
>  
> +struct clk *clk_get_parent_by_index(struct clk *clk, unsigned int index)
> +{
> + struct clk_core *parent;
> +
> + if (!clk)
> + return NULL;
> +
> + parent = clk_core_get_parent_by_index(clk->core, index);
> +
> + return !parent ? NULL : parent->hw->clk;
> +}
> +EXPORT_SYMBOL_GPL(clk_get_parent_by_index);
> +
>  struct clk_hw *
>  clk_hw_get_parent_by_index(const struct clk_hw *hw, unsigned int index)
>  {
> diff --git a/include/linux/clk.h b/include/linux/clk.h
> index 0df4a51..937de0e 100644
> --- a/include/linux/clk.h
> +++ b/include/linux/clk.h
> @@ -392,6 +392,26 @@ int clk_set_parent(struct clk *clk, struct clk *parent);
>  struct clk *clk_get_parent(struct clk *clk);
>  
>  /**
> + * clk_get_parent_by_index - get a possible parent clock by index
> + * @clk: clock source
> + * @index: index into the array of possible parents of this clock
> + *
> + * Returns struct clk corresponding to the requested possible
> + * parent clock source, or NULL.
> + */
> +struct clk *clk_get_parent_by_index(struct clk *clk,
> + unsigned int index);
> +
> +/**
> + * clk_get_num_parents - get number of possible parents
> + * @clk: clock source
> + *
> + * Returns the number of possible parents of this clock,
> + * which can then be enumerated using clk_get_parent_by_index().
> + */
> +unsigned int clk_get_num_parents(struct clk *clk);
> +
> +/**
>   * clk_get_sys - get a clock based upon the device name
>   * @dev_id: device name
>   * @con_id: connection ID
> @@ -461,6 +481,17 @@ static inline struct clk *clk_get_parent(struct clk *clk)
>   return NULL;
>  }
>  
> +static inline struct clk *clk_get_parent_by_index(struct clk *clk,
> + unsigned int index)
> +{
> + return NULL;
> +}
> +
> +static inline unsigned int clk_get_num_parents(struct clk *clk)
> +{
> + return 0;
> +}
> +
>  #endif
>  
>  /* clk_prepare_enable helps cases using clk_enable in non-atomic context. */
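
A minimal userspace sketch of how a consumer such as qoriq-cpufreq might walk the possible parents with the two calls added in the patch above. This is a mock, not kernel code: the layout of struct clk, the rate_hz field and build_freq_table() are assumptions made up for illustration. Only the two function names and their NULL/0 behaviour for a NULL clock come from the patch; the index bounds check here stands in for what clk_core_get_parent_by_index() does in the kernel.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's opaque struct clk (hypothetical layout). */
struct clk {
	unsigned long rate_hz;
	unsigned int num_parents;
	struct clk **parents;
};

/* Mirrors the patch: a NULL clock has no parents. */
unsigned int clk_get_num_parents(struct clk *clk)
{
	return !clk ? 0 : clk->num_parents;
}

/* Mirrors the patch for NULL; the bounds check is part of this mock only. */
struct clk *clk_get_parent_by_index(struct clk *clk, unsigned int index)
{
	if (!clk || index >= clk->num_parents)
		return NULL;
	return clk->parents[index];
}

/*
 * Hypothetical consumer: fill table[] with the rate of every possible
 * parent of mux, up to max entries.  Returns the number of entries written.
 */
size_t build_freq_table(struct clk *mux, unsigned long *table, size_t max)
{
	unsigned int i, n = clk_get_num_parents(mux);
	size_t count = 0;

	for (i = 0; i < n && count < max; i++) {
		struct clk *parent = clk_get_parent_by_index(mux, i);

		if (parent)
			table[count++] = parent->rate_hz;
	}
	return count;
}
```

A real driver would then pick one of the enumerated parents and pass it to clk_set_parent(); that step is left out because it needs the actual clock framework.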

Re: [PATCH v3 2/5] clk: qoriq: Move chip-specific knowledge into driver

2015-10-15 Thread Stephen Boyd
On 09/19, Scott Wood wrote:
> The device tree should describe the chips (or chip-like subblocks) in
> the system, but it generally does not describe individual registers --
> it should identify, rather than describe, a programming interface.
> 
> This has not been the case with the QorIQ clockgen nodes.  The
> knowledge of what each bit setting of CLKCnCSR means is encoded in
> three places (binding, pll node, and mux node), and the last also needs
> to know which options are valid on a particular chip.  All three of
> these locations are considered stable ABI, making it difficult to fix
> mistakes (of which I have found several), much less refactor the
> abstraction to be able to address problems, limitations, or new chips.
> 
> Under the current binding, a pll clock specifier of 2 means that the
> PLL is divided by 4 -- and the driver implements this, unless there
> happen to be four clock-output-names rather than 3, in which case it
> interprets it as PLL divided by 3.  This does not appear in the binding
> documentation at all.  That hack is now considered stable ABI.
> 
> The current device tree nodes contain errors, such as saying that
> T1040 can set a core clock to PLL/4 when only PLL and PLL/2 are options.
> The current binding also ignores some restrictions on clock selection,
> such as p5020's requirement that if a core uses the "wrong" PLL, that
> PLL must be clocked lower than the "correct" PLL and be at most 80% of
> the rated CPU frequency.
> 
> Possibly because of the lack of the ability to express such nuance in
> the binding, some valid options are omitted from the device trees, such
> as the ability on p4080 to run cores 0-3 from PLL3 and cores 4-7 from
> PLL1 (again, only if they are at most 80% of rated CPU frequency).
> This omission, combined with excessive caution in the cpufreq driver
> (addressed in a subsequent patch), means that currently on a 1500 MHz
> p4080 with typical PLL configuration, cpufreq can lower the frequency
> to 1200 MHz on half the CPUs and do nothing on the others.  With this
> patchset, all CPUs can be lowered to 1200 MHz on a rev2 p4080, and on a
> rev3 p4080 half can be lowered to 750 MHz and the other half to 600
> MHz.
> 
> The current binding only deals with CPU clocks.  To describe FMan in
> the device tree, we need to describe its clock.  Some chips have
> additional muxes that work like the CPU muxes, but are not described in
> the device tree.  Others require inspecting the Reset Control Word to
> determine which PLL is used.  Rather than continue to extend this mess,
> replace it.  Have the driver bind to the chip-specific clockgen
> compatible, and keep the detailed description of quirky chip variations
> in the driver, where it can be easily fixed, refactored, and extended.
> 
> Older device trees will continue to work (including a workaround for
> old ls1021a device trees that are missing compatible and reg in the
> clockgen node, which even the old binding required).  The pll/mux
> details in old device trees will be ignored, but "clocks" properties
> pointing at the old nodes will still work, and be directed at the
> corresponding new clock.
> 
> Signed-off-by: Scott Wood 
> ---

Acked-by: Stephen Boyd 
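
The p5020 restriction quoted above (an alternate PLL must be clocked lower than the "correct" PLL and be at most 80% of the rated CPU frequency) can be written as a small predicate. This is only a sketch of the rule as stated in the commit message: alt_pll_allowed() and its parameters are hypothetical names, not something taken from the driver.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: may a core be clocked from a PLL other than its
 * "correct" one?  Encodes the two conditions from the commit message:
 * the alternate PLL must run slower than the correct PLL, and at no
 * more than 80% of the rated CPU frequency.
 */
bool alt_pll_allowed(unsigned long alt_pll_hz,
		     unsigned long correct_pll_hz,
		     unsigned long rated_cpu_hz)
{
	if (alt_pll_hz >= correct_pll_hz)
		return false;

	/* alt <= 0.8 * rated, in integer math: 5 * alt <= 4 * rated */
	return (unsigned long long)alt_pll_hz * 5 <=
	       (unsigned long long)rated_cpu_hz * 4;
}
```

With the numbers from the message, 1200 MHz is exactly 80% of a 1500 MHz rating, so it sits right on the allowed boundary.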

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-15 Thread David Gibson
On Thu, 15 Oct 2015 21:00:58 +0200
Laurent Vivier  wrote:

> On kexec, all secondary offline CPUs are onlined before
> starting the new kernel; this is not done in the case of kdump.
> 
> If kdump is configured and a kernel crash occurs while
> some secondary CPUs are offline (SMT=off),
> the new kernel is not able to start them and displays
> "Processor X is stuck." messages.
> 
> Starting with POWER8, subcore logic relies on all threads of
> a core being booted. So, on startup the kernel tries to start all
> threads, and asks OPAL (or RTAS) to start all CPUs (including
> threads). If a CPU has been offlined by the previous kernel,
> it has not been returned to OPAL, and thus OPAL cannot restart
> it: this CPU has been lost...
> 
> Signed-off-by: Laurent Vivier 

Nice analysis of the problem.  But, I'm a bit uneasy about this approach
to fixing it: Onlining potentially hundreds of CPU threads seems like
a risky operation in a kernel that's already crashed.

I don't have a terribly clear idea of what is the best way to address
this.  Here are a few ideas in the right general direction:

  * I'm already looking into a kdump userspace fix to stop it
    attempting to bring up secondary CPUs

  * A working kernel option to say "only allow this many online cpus
    ever" which we could pass to the kdump kernel would be nice

  * Paulus had an idea about offline threads returning themselves
    directly to OPAL by kicking a flag at kdump/kexec time.


BenH, Paulus,

OPAL <-> kernel cpu transitions don't seem to work quite how I thought
they would.  IIUC there's a register we can use to directly control
which threads on a core are active.  Given that, I would have thought
cpu "ownership" OPAL vs. kernel would be on a per-core, rather than
per-thread basis.

Is there some way we can change the CPU onlining / offlining code so
that if threads aren't in OPAL, we directly enable them, rather than
just hoping they're in a nap loop somewhere?

-- 
David Gibson 
Senior Software Engineer, Virtualization, Red Hat



Re: [RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-15 Thread Anshuman Khandual
On 10/16/2015 07:54 AM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 15:43 +0530, Anshuman Khandual wrote:
>> On 10/14/2015 02:49 PM, Michael Ellerman wrote:
>>> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>>>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>>>> dynamic virtual-physical mapping for any given processor. Currently we
>>>> use VPHN node ID information only after getting either a PRRN or a VPHN
>>>> event. But during boot time inside the function numa_setup_cpu, we still
>>>> query the OF device tree for the node ID value which might be different
>>>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>>>> scenario where there are no PRRN or VPHN events after boot, all node-cpu
>>>> mappings will remain incorrect thereafter.
>>>>
>>>> With this proposed change, numa_setup_cpu will try to override the OF
>>>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>>>> hcall fetched node ID value. Right now shared processor property of the
>>>> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
>>>> during boot time. So initmem_init function has been moved after ppc_md.
>>>> setup_arch inside setup_arch during boot.
>>>
>>> I would be *very* reluctant to change the order of initmem_init() vs
>>> setup_arch().
>>>
>>> At a minimum you'd need to go through every setup_arch() implementation and
>>> carefully determine if the ordering of what it does matters vs initmem_init().
>>> And then you'd need to test on every affected platform.
>>>
>>> So I suggest you think of a different way to do it if at all possible.
>>
>> vpa_init() is being called inside pSeries_setup_arch which is ppc_md
>> .setup_arch for the platform. It's called directly for the boot cpu
>> and through smp_init_pseries_xics for other cpus on the system. Not
>> sure what is the reason behind calling vpa_init() from XICS init
>> though.
>>
>> If we can move all these vpa_init() calls from pSeries_setup_arch
>> to initmem_init just before calling numa_setup_cpu, the VPA area
>> would be initialized when we need it during boot. Will look in
>> this direction.
> 
> Back up a bit. The dependency on vpa_init() is only because you want to call
> lppaca_shared_proc() right?

Right.

> 
> But do you really need to? What happens if you call VPHN on a non-shared proc
> machine? Does it 1) give you something sane or 2) give you an error or 3) give
> you a junk value?
> 
> If it's either of 1 or 2 then you should be OK to just call it. You either use
> the value it returned which is sane or you see the error and just fall back to
> the device tree nid.

Most probably it will be a sane value without any error. But the
decision to override the DT fetched value will be based on whether
we are running on a shared processor LPAR or not. Hence dependency
on lppaca_shared_proc(). In case of error from VPHN on a shared
processor LPAR, we will still have the DT fetched value to fall
back on (will update the logic in the patch for this).
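
A rough sketch of the decision described above, as a plain C function that can be compiled outside the kernel. The names choose_nid() and struct vphn_result are hypothetical, and treating rc == 0 as success is an assumption of this sketch, not something taken from the hcall interface.

```c
#include <stdbool.h>

/* Hypothetical result of a VPHN query: rc == 0 means nid is valid. */
struct vphn_result {
	int rc;
	int nid;
};

/*
 * Pick the node ID for a CPU.  The device-tree value is the default;
 * it is overridden only on a shared-processor LPAR, and only when the
 * VPHN query succeeded and returned a non-negative node ID.
 */
int choose_nid(int dt_nid, bool shared_proc, struct vphn_result vphn)
{
	if (!shared_proc)
		return dt_nid;
	if (vphn.rc != 0 || vphn.nid < 0)
		return dt_nid;	/* fall back to the device-tree nid */
	return vphn.nid;
}
```

The shared_proc argument is where the lppaca_shared_proc() dependency shows up: the value has to be known before this decision can be made.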


Re: [RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-15 Thread Anshuman Khandual
On 10/16/2015 07:57 AM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>> dynamic virtual-physical mapping for any given processor. Currently we
>> use VPHN node ID information only after getting either a PRRN or a VPHN
>> event. But during boot time inside the function numa_setup_cpu, we still
>> query the OF device tree for the node ID value which might be different
>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>> scenario where there are no PRRN or VPHN events after boot, all node-cpu
>> mappings will remain incorrect thereafter.
>>
>> With this proposed change, numa_setup_cpu will try to override the OF
>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>> hcall fetched node ID value. Right now shared processor property of the
>> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
>> during boot time. So initmem_init function has been moved after ppc_md.
>> setup_arch inside setup_arch during boot.
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 8b9502a..e404d05 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
>>  
>>  nid = of_node_to_nid_single(cpu);
>>  
>> +/*
>> + * Override the OF device tree fetched node number
>> + * with VPHN based node number in case of a shared
>> + * processor LPAR on PHYP platform.
>> + */
>> +#ifdef CONFIG_PPC_SPLPAR
>> +if (lppaca_shared_proc(get_lppaca())) {
>> +nid = vphn_get_node(lcpu);
>> +}
>> +#endif
> 
> 
> That logic exposes a potential problem which you don't seem to have addressed.

You are right.

> 
> You're not updating the logic in of_node_to_nid[_single](), instead you're
> overriding it in *this one location*. But what about other code that uses
> of_node_to_nid()? It will still get the old device-tree value and so will have
> the wrong nid, won't it?

Yeah, it will. of_node_to_nid() calls of_node_to_nid_single(), so we
can move this VPHN override logic into of_node_to_nid_single() to
make it available across the board. But the original timing problem
remains: vpa_init() must run early enough for the lppaca_shared_proc()
check to work inside numa_setup_cpu() during boot.


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 03:50:44PM +0100, Will Deacon wrote:
> On Thu, Oct 15, 2015 at 11:35:10AM +0100, Will Deacon wrote:
> > Dammit guys, it's never simple is it?
> 
> I re-read this and it's even more confusing than I first thought.
> 
> > On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > > To that end, the herd tool can make a diagram of what it thought
> > > happened, and I have attached it.  I used this diagram to try and force
> > > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > > and succeeded.  Here is the sequence of events:
> > > 
> > > o Commit P0's write.  The model offers to propagate this write
> > >   to the coherence point and to P1, but don't do so yet.
> > > 
> > > o Commit P1's write.  Similar offers, but don't take them up yet.
> > > 
> > > o Commit P0's lwsync.
> > > 
> > > o Execute P0's lwarx, which reads a=0.  Then commit it.
> > > 
> > > o Commit P0's stwcx. as successful.  This stores a=1.
> > 
> > On arm64, this is a conditional-store-*release* and therefore cannot be
> > observed before the initial write to x...
> > 
> > > o Commit P0's branch (not taken).
> > > 
> > > o Commit P0's final register-to-register move.
> > > 
> > > o Commit P1's sync instruction.
> > > 
> > > o There is now nothing that can happen in either processor.
> > >   P0 is done, and P1 is waiting for its sync.  Therefore,
> > >   propagate P1's a=2 write to the coherence point and to
> > >   the other thread.
> > 
> > ... therefore this is illegal, because you haven't yet propagated that
> > prior write...
> 
> I misread this as a propagation of P0's conditional store. What actually
> happens on arm64, is that the early conditional store can only succeed
> once it is placed into the coherence order of the location which it is
> updating (but note that this is subtly different from multi-copy
> atomicity!).
> 
> So, given that the previous conditional store succeeded, the coherence
> order on A must be either {0, 1, 2} or {0, 2, 1}.
> 
> If it's {0, 1, 2} (as required by your complete example), that means
> P1's a=2 write "observes" the conditional store by P0, and therefore
> (because the conditional store has release semantics), also observes
> P0's x=1 write.
> 
> On the other hand, if P1's a=2 write propagates first and we have a
> coherence order of {0, 2, 1}, then P0 must have r3=2, because an
> exclusive load returning zero would have led to a failed conditional
> store thanks to the intervening write by P1.
> 
> I find it pretty weird if PPC allows the conditional store to succeed in
> this way, as I think that would break simple cases like two threads
> incrementing a shared variable in parallel:
> 
> 
> ""
> {
> 0:r1=1; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> 1:r1=1; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> }
> P0 | P1 ;
> lwarx  r11,r10,r12 | lwarx  r11,r10,r12 ;
> add r11,r1,r11 | add r11,r1,r11 ;
> stwcx. r11,r10,r12 | stwcx. r11,r10,r12 ;
> bne Fail0  | bne Fail1  ;
> mr r3,r1   | mr r3,r1   ;
> Fail0: | Fail1: ;
> exists
> (0:r3=1 /\ a=1 /\ 1:r3=1)
> 
> 
> Is also allowed by herd and forbidden by ppcmem, for example.

I had no idea that either herd or ppcmem knew about the add instruction.
It looks like they at least try to understand.  Needless to say, in this
case I agree with ppcmem.
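
For anyone following along without ppcmem at hand, here is a toy sequential model of the reservation behaviour that the increment example relies on. It is a sketch under strong assumptions: load_reserve() and store_conditional() are made-up stand-ins for lwarx and stwcx., the interleaving is fixed by hand, and nothing here models propagation or the rest of the Power memory model.

```c
#include <stdbool.h>

#define NCPUS 2

/* Toy model of one memory word with per-CPU reservations (hypothetical). */
struct llsc_word {
	int value;
	bool reserved[NCPUS];
};

/* Load-reserve (lwarx-like): read the value and take a reservation. */
int load_reserve(struct llsc_word *w, int cpu)
{
	w->reserved[cpu] = true;
	return w->value;
}

/*
 * Store-conditional (stwcx.-like): succeeds only if this CPU still holds
 * its reservation.  A successful store clears every reservation.
 */
bool store_conditional(struct llsc_word *w, int cpu, int val)
{
	int i;

	if (!w->reserved[cpu])
		return false;
	w->value = val;
	for (i = 0; i < NCPUS; i++)
		w->reserved[i] = false;
	return true;
}

/*
 * Hand-picked interleaving: both CPUs load-reserve before either stores.
 * The second store-conditional fails, so that CPU reloads and retries.
 * Returns the final value of the shared word.
 */
int racing_increments(void)
{
	struct llsc_word a = { 0, { false, false } };
	int r0 = load_reserve(&a, 0);
	int r1 = load_reserve(&a, 1);

	while (!store_conditional(&a, 0, r0 + 1))
		r0 = load_reserve(&a, 0);
	while (!store_conditional(&a, 1, r1 + 1))
		r1 = load_reserve(&a, 1);
	return a.value;
}
```

In this model the outcome where both conditional stores succeed and a ends up as 1 cannot occur, which is the behaviour the increment litmus test expects.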

Thanx, Paul


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-15 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 12:48:03PM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 08:07:05PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> [snip]
> > > 
> > > I'm afraid more than that, the above litmus also shows that
> > > 
> > >   CPU 0                           CPU 1
> > >   -----                           -----
> > > 
> > >   WRITE_ONCE(x, 1);               WRITE_ONCE(a, 2);
> > >   r3 = xchg_release(&a, 1);       smp_mb();
> > >                                   r3 = READ_ONCE(x);
> > > 
> > >   (0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited
> > > 
> > > in the implementation of this patchset, which should be disallowed by
> > > the semantics of RELEASE, right?
> > 
> > Not necessarily.  If you had the read first on CPU 1, and you had a
> > similar problem, I would be more worried.
> > 
> 
> Sometimes I think maybe we should say that a single unpaired ACQUIRE or
> RELEASE doesn't have any order guarantee because of the above case.
> 
> But seems that's not a normal or even existing case, my bad ;-(
> 
> > > And even:
> > > 
> > >   CPU 0                           CPU 1
> > >   -----                           -----
> > > 
> > >   WRITE_ONCE(x, 1);               WRITE_ONCE(a, 2);
> > >   smp_store_release(&a, 1);       smp_mb();
> > >                                   r3 = READ_ONCE(x);
> > > 
> > >   (1:r3 == 0 && a == 2) is not prohibited
> > > 
> > > shows by:
> > > 
> > >   PPC weird-lwsync
> > >   ""
> > >   {
> > >   0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
> > >   1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
> > >   }
> > >P0 | P1 ;
> > >stw r1,0(r2)   | stw r1,0(r12)  ;
> > >lwsync | sync   ;
> > >stw  r1,0(r12) | lwz r3,0(r2)   ;
> > >   exists
> > >   (a=2 /\ 1:r3=0)
> > > 
> > > Please find something I'm (or the tool is) missing, maybe we can't use
> > > (a == 2) as an indication that STORE on CPU 1 happens after STORE on CPU
> > > 0?
> > 
> > Again, if you were pairing the smp_store_release() with an
> > smp_load_acquire() or even a READ_ONCE() followed by a barrier, I would
> > be quite concerned.
> > I am not at all worried about the above two litmus tests.
> > 
> 
> Understood, thank you for thinking through that ;-)
> 
> > > And there is really something I find strange, see below.
> > > 
> > > > > 
> > > > > So the scenario that would fail would be this one, right?
> > > > > 
> > > > > a = x = 0
> > > > > 
> > > > >   CPU0                            CPU1
> > > > > 
> > > > >   r3 = load_locked (&a);
> > > > >                                   a = 2;
> > > > >                                   sync();
> > > > >                                   r3 = x;
> > > > >   x = 1;
> > > > >   lwsync();
> > > > >   if (!store_cond(&a, 1))
> > > > >           goto again
> > > > > 
> > > > > 
> > > > > Where we hoist the load way up because lwsync allows this.
> > > > 
> > > > That scenario would end up with a==1 rather than a==2.
> > > > 
> > > > > I always thought this would fail because CPU1's store to @a would fail
> > > > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > > > load and now seeing the new value (2).
> > > > 
> > > > The stwcx. failure was one thing that prevented a number of other
> > > > misordering cases.  The problem is that we have to let go of the notion
> > > > of an implicit global clock.
> > > > 
> > > > To that end, the herd tool can make a diagram of what it thought
> > > > happened, and I have attached it.  I used this diagram to try and force
> > > > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > > > and succeeded.  Here is the sequence of events:
> > > > 
> > > > o   Commit P0's write.  The model offers to propagate this write
> > > > to the coherence point and to P1, but don't do so yet.
> > > > 
> > > > o   Commit P1's write.  Similar offers, but don't take them up yet.
> > > > 
> > > > o   Commit P0's lwsync.
> > > > 
> > > > o   Execute P0's lwarx, which reads a=0.  Then commit it.
> > > > 
> > > > o   Commit P0's stwcx. as successful.  This stores a=1.
> > > > 
> > > > o   Commit P0's branch (not taken).
> > > 
> > > So at this point, P0's write to 'a' has propagated to P1, right? But
> > > P0's write to 'x' hasn't, even there is a lwsync between them, right?
> > > Doesn't the lwsync prevent this from happening?
> > 
> > No, because lwsync is quite a bit weaker than sync aside from just
> > the store-load ordering.
> > 
> 
> Understood, I've tried the ppcmem, much clearer now ;-)
> 
> > > If at this point P0's write to 'a' hasn't propagated then when?
> > 
> > Later.  At the very end of the test, in this case.  ;-)
> > 
> 
> Hmm.. I tried exactly this sequence in ppcmem, seems propagation of P0's
> write to 'a' is never an option...
> 
> > Why not try creating a longer litmus test that requires P0's write to
> > "a" to 

[PATCH] scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O=

2015-10-15 Thread Michael Ellerman
My recent commit d2036f30cfe1 ("scripts/kconfig/Makefile: Allow
KBUILD_DEFCONFIG to be a target"), contained a bug in that when it
checks if KBUILD_DEFCONFIG is a file it forgets to prepend $(srctree) to
the path.

This causes the build to fail when building out of tree (with O=), and
when the value of KBUILD_DEFCONFIG is 'defconfig'. In that case we will
fail to find the 'defconfig' file, because we look in the build
directory not $(srctree), and so we will call Make again with
'defconfig' as the target. From there we loop infinitely calling 'make
defconfig' again and again.

The fix is simple, we need to look for the file under $(srctree).

Fixes: d2036f30cfe1 ("scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target")
Reported-by: Olof Johansson 
Signed-off-by: Michael Ellerman 
---
 scripts/kconfig/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


This works for me and is a minimal fix. I'll merge this into the powerpc#next
branch unless anyone yells.

diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index b2b9c87cec50..3043d6b0b51d 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -96,7 +96,7 @@ savedefconfig: $(obj)/conf
 defconfig: $(obj)/conf
 ifeq ($(KBUILD_DEFCONFIG),)
$< $(silent) --defconfig $(Kconfig)
-else ifneq ($(wildcard arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
+else ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
@$(kecho) "*** Default configuration is based on '$(KBUILD_DEFCONFIG)'"
$(Q)$< $(silent) --defconfig=arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG) $(Kconfig)
 else
-- 
2.1.4


[PATCH] book3s_hv: Handle H_DOORBELL on the guest exit path

2015-10-15 Thread Gautham R. Shenoy
Currently a CPU running a guest can receive a H_DOORBELL in the
following two cases:
1) When the CPU is napping due to CEDE or there not being a guest
vcpu.
2) The CPU is running the guest vcpu.

Case 1), the doorbell message is not cleared since we were waking up
from nap. Hence when the EE bit gets set on transition from guest to
host, the H_DOORBELL interrupt is delivered to the host and the
corresponding handler is invoked.

However in Case 2), the message gets cleared by the action of taking
the H_DOORBELL interrupt. Since the CPU was running a guest, instead
of invoking the doorbell handler, the code invokes the second-level
interrupt handler to switch the context from the guest to the host. At
this point the setting of the EE bit doesn't result in the CPU getting
the doorbell interrupt since it has already been delivered once. So,
the handler for this doorbell is never invoked!

This causes softlockups if the missed DOORBELL was an IPI sent from a
sibling subcore CPU.

This patch fixes it by explicitly invoking the doorbell handler on the
exit path if the exit reason is H_DOORBELL, similar to the way an
EXTERNAL interrupt is handled. Since this will also handle Case 1), we
can unconditionally clear the doorbell message in
kvmppc_check_wake_reason.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..106c7f9 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -150,6 +150,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
+   cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
+   beq 15f /* Invoke the H_DOORBELL handler */
cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
beq cr2, 14f/* HMI check */
 
@@ -174,6 +176,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_HSRR1, r7
b   hmi_exception_after_realmode
 
+15:    mtspr SPRN_HSRR0, r8
+       mtspr SPRN_HSRR1, r7
+       ba    0xe80
+
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
@@ -2436,14 +2442,19 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
/* hypervisor doorbell */
 3: li  r12, BOOK3S_INTERRUPT_H_DOORBELL
+
+   /*
+* Clear the doorbell as we will invoke the handler
+* explicitly in the guest exit path.
+*/
+   lis r6, (PPC_DBELL_SERVER << (63-36))@h
+   PPC_MSGCLR(6)
/* see if it's a host IPI */
li  r3, 1
lbz r0, HSTATE_HOST_IPI(r13)
cmpwi   r0, 0
bnelr
-   /* if not, clear it and return -1 */
-   lis r6, (PPC_DBELL_SERVER << (63-36))@h
-   PPC_MSGCLR(6)
+   /* if not, return -1 */
li  r3, -1
blr
 
-- 
1.9.3
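
The difference between the two cases in the description above can be illustrated with a toy model in C. This is a sketch under stated assumptions, not the real exit path: struct cpu_model, the exit_reason enum and all function names are hypothetical, and the model only tracks whether the host doorbell handler ends up running.

```c
#include <stdbool.h>

enum exit_reason { EXIT_EXTERNAL, EXIT_H_DOORBELL, EXIT_OTHER };

/* Toy per-CPU state (hypothetical; not the kernel's data structures). */
struct cpu_model {
	bool msg_pending;	/* doorbell message still latched */
	int handled;		/* times the host doorbell handler ran */
};

static void host_doorbell_handler(struct cpu_model *c)
{
	c->handled++;
}

/* Case 2: taking the interrupt while in the guest consumes the message. */
enum exit_reason doorbell_hits_running_guest(struct cpu_model *c)
{
	c->msg_pending = false;
	return EXIT_H_DOORBELL;
}

/*
 * Before the patch: the host handler runs only if a message is still
 * latched when EE is set on the way back to the host.
 */
void guest_exit_before(struct cpu_model *c, enum exit_reason why)
{
	(void)why;
	if (c->msg_pending) {
		c->msg_pending = false;
		host_doorbell_handler(c);
	}
}

/*
 * After the patch: the exit path dispatches on the exit reason and
 * invokes the handler explicitly, so a consumed message is not lost.
 */
void guest_exit_after(struct cpu_model *c, enum exit_reason why)
{
	if (why == EXIT_H_DOORBELL) {
		c->msg_pending = false;
		host_doorbell_handler(c);
	}
}
```

In the "before" variant a doorbell taken while the guest is running leaves handled at zero, which corresponds to the missed IPI; the "after" variant runs the handler once.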


[PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online

2015-10-15 Thread Laurent Vivier
On kexec, all secondary offline CPUs are onlined before
starting the new kernel; this is not done in the case of kdump.

If kdump is configured and a kernel crash occurs while
some secondary CPUs are offline (SMT=off),
the new kernel is not able to start them and displays
"Processor X is stuck." messages.

Starting with POWER8, subcore logic relies on all threads of
a core being booted. So, on startup the kernel tries to start all
threads, and asks OPAL (or RTAS) to start all CPUs (including
threads). If a CPU has been offlined by the previous kernel,
it has not been returned to OPAL, and thus OPAL cannot restart
it: this CPU has been lost...

Signed-off-by: Laurent Vivier 
---
 arch/powerpc/kernel/crash.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 51dbace..3ca9452 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -299,11 +300,30 @@ int crash_shutdown_unregister(crash_shutdown_t handler)
 }
 EXPORT_SYMBOL(crash_shutdown_unregister);
 
+/*
+ * The next kernel will try to start all secondary CPUs and if
+ * they are not online it will fail to start them.
+ *
+ */
+static void wake_offline_cpus(void)
+{
+   int cpu = 0;
+
+   for_each_present_cpu(cpu) {
+   if (!cpu_online(cpu)) {
+   pr_info("kexec: Waking offline cpu %d.\n", cpu);
+   cpu_up(cpu);
+   }
+   }
+}
+
 void default_machine_crash_shutdown(struct pt_regs *regs)
 {
unsigned int i;
int (*old_handler)(struct pt_regs *regs);
 
+   wake_offline_cpus();
+
/*
 * This function is only called after the system
 * has panicked or is otherwise in a critical state.
-- 
2.1.0
