Re: [bug report] ocxl: Add AFU interrupt support

2018-02-13 Thread Dan Carpenter
On Tue, Feb 13, 2018 at 08:29:26PM +0100, Frederic Barrat wrote:
> Hi,
> 
> Thanks for the report. I'll fix the first issue. The 2nd is already on its
> way to upstream:
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dedab7f0d3137441a97fe7cf9b9ca5
> 
> (though we still have a useless cast in there; will fix as well).
> 
> May I ask what static checker you're using?
> 

These are Smatch warnings.

regards,
dan carpenter



Re: 4.16-rc1 virtual machine crash on boot

2018-02-13 Thread Cyril Bur
On Tue, 2018-02-13 at 21:12 -0800, Tyrel Datwyler wrote:
> On 02/13/2018 05:20 PM, Cyril Bur wrote:
> > Hello all,
> 
> Does reverting commit 02ef6dd8109b581343ebeb1c4c973513682535d6 alleviate the 
> issue?
> 

Hi Tyrel,

No it doesn't. Same backtrace.

> -Tyrel
> 
> > 
> > I'm seeing this crash trying to boot a KVM virtual machine. This kernel
> > was compiled with pseries_le_defconfig and run using the following qemu
> > commandline:
> > 
> > qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries
> > -nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive
> > file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0
> > -device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append
> > 'root=/dev/vdb1 rw cloud-init=disabled'
> > 
> > qemu-system-ppc64 --version
> > QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright
> > (c) 2003-2008 Fabrice Bellard
> > 
> > 
> > Key type dns_resolver registered
> > Unable to handle kernel paging request for data at address 0x0010
> > Faulting instruction address: 0xc18f2bbc
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > LE SMP NR_CPUS=2048 NUMA pSeries
> > CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8
> > NIP:  c18f2bbc LR: c18f2bb4 CTR: 
> > REGS: c000fea838d0 TRAP: 0380   Not tainted  (4.16.0-rc1v4.16-rc1)
> > MSR:  82009033   CR: 84000248  XER:
> > 2000
> > CFAR: c19591a0 SOFTE: 0 
> > GPR00: c18f2bb4 c000fea83b50 c1bd8400
> >  
> > GPR04: c000fea83b70  002f
> > 0022 
> > GPR08:  c22a3e90 
> > 0220 
> > GPR12:  cfb40980 c000d698
> >  
> > GPR16:   
> >  
> > GPR20:   
> >  
> > GPR24:  c18b9248 c18e36d8
> > c19738a8 
> > GPR28: 0007 c000fc68 c000fea83bf0
> > 0010 
> > NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c
> > LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
> > Call Trace:
> > [c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
> > (unreliable)
> > [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
> > [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
> > [c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
> > [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
> > [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
> > Instruction dump:
> > 7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e
> > e8690002 
> > f9440021 4806657d 6000 e9210020  39090004 39490010
> > f9010020 
> > ---[ end trace bd9f49f482d30e03 ]---
> > 
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> > 
> > WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883
> > do_unblank_screen+0x1f0/0x270
> > CPU: 1 PID: 1 Comm: swapper/0 Tainted: G  D  4.16.0-
> > rc1v4.16-rc1 #8
> > NIP:  c09aa800 LR: c09aa63c CTR: c148f5f0
> > REGS: c000fea832c0 TRAP: 0700   Tainted:
> > G  D   (4.16.0-rc1v4.16-rc1)
> > MSR:  82029033   CR: 2800  XER:
> > 2000
> > CFAR: c09aa658 SOFTE: 1 
> > GPR00: c09aa63c c000fea83540 c1bd8400
> >  
> > GPR04: 0001 c000fb0c200e 1dd7
> > c000fea834d0 
> > GPR08: fe43  
> > 0001 
> > GPR12: 28002428 cfb40980 c000d698
> >  
> > GPR16:   
> >  
> > GPR20:   
> >  
> > GPR24: c000fea4 c000feadf910 c1a4a7a8
> > c1cc4ea0 
> > GPR28: c173f4f0 c1cc4ec8 
> >  
> > NIP [c09aa800] do_unblank_screen+0x1f0/0x270
> > LR [c09aa63c] do_unblank_screen+0x2c/0x270
> > Call Trace:
> > [c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x270
> > (unreliable)
> > [c000fea835b0] [c08a2a70] bust_spinlocks+0x40/0x80
> > [c000fea835d0] [c00da90c] panic+0x1b8/0x32c
> > [c000fea83670] [c00e1bd4] do_exit+0xcb4/0xcc0
> > [c000fea83730] [c00275fc] die+0x29c/0x450
> > [c000fea837c0] [c0053f88] bad_page_fault+0xe8/0x160
> > [c000fea83830] [c0028a90] slb_miss_bad_addr+0x40/0x90
> > [c000fea83860] [c0008b08] bad_addr_slb+0x158/0x160
> > --- interrupt: 380 at read_drconf_v1_cell+0x50/0x9c
> > LR = read_drconf_v1_cell+0x48/0x9c
> > [c000fea83b90] 

Re: [PATCH] powerpc/xive: use hw CPU ids when configuring the CPU queues

2018-02-13 Thread Michael Ellerman
Cédric Le Goater  writes:

> On 02/13/2018 10:18 AM, Michael Ellerman wrote:
>> Cédric Le Goater  writes:
>> 
>>> The CPU event notification queues on sPAPR should be configured using
>>> a hardware CPU identifier.
>>>
>>> The problem did not show up on the Power Hypervisor because pHyp
>>> supports 8 threads per core, which keeps CPU numbers contiguous. This is
>>> not the case on all sPAPR virtual machines, some use SMT=1.
>>>
>>> Also improve error logging by adding the CPU number.
>>>
>>> Signed-off-by: Cédric Le Goater 
>>> ---
>>>
>>>  I think we should send this one to stable also.
>> 
>> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt 
>> controller")
>
> yes.
>
>> Cc: sta...@vger.kernel.org # v4.14+
>
> yes. I just added the Cc:. I am not sure that will work with 
> patchwork though.

They don't accept patches that way.

I'll add the tags and commit it.

cheers


Re: [V3] powerpc/mm/hash64: memset the pagetable pages on allocation.

2018-02-13 Thread Michael Ellerman
On Tue, 2018-02-13 at 11:09:33 UTC, "Aneesh Kumar K.V" wrote:
> On powerpc we allocate page table pages from slab caches of different sizes.
> For now we have a constructor that zeroes out the objects when we allocate
> them for the first time. We expect the objects to be zeroed out when we free
> the object back to the slab cache. This happens in the unmap path. For
> hugetlb pages we call huge_pte_get_and_clear to do that. With the current
> configuration of page table size, both pud and pgd level tables get
> allocated from the same slab cache. At the pud level, we use the second half
> of the table to store the slot information, but never clear that when
> unmapping. When such a freed object gets allocated at the pgd level, we will
> have part of the page table page not initialized correctly. This results in
> a kernel crash.
> 
> Simplify this by calling the object initialization after kmem_cache_alloc.
> 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/fc5c2f4a55a2c258e12013cdf287cf

cheers


Re: [2/2] powerpc/pseries: Declare optional dummy function for find_and_online_cpu_nid

2018-02-13 Thread Michael Ellerman
On Mon, 2018-02-12 at 22:34:08 UTC, Guenter Roeck wrote:
> Commit e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with
> memoryless nodes") adds an unconditional call to find_and_online_cpu_nid(),
> which is only declared if CONFIG_PPC_SPLPAR is enabled. This results in
> the following build error if this is not the case.
> 
> arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
> arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
>   undefined reference to `.find_and_online_cpu_nid'
> 
> Follow the guideline provided by similar functions and provide a dummy
> function if CONFIG_PPC_SPLPAR is not enabled. This also moves the external
> function declaration into an include file where it should be.
> 
> Fixes: e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with ...")
> Cc: Michael Bringmann 
> Cc: Michael Ellerman 
> Cc: Nathan Fontenot 
> Signed-off-by: Guenter Roeck 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/82343484a2d4c97a03bfd81303b549

cheers


Re: selftests/powerpc: Fix: use ucontext_t instead of struct ucontext

2018-02-13 Thread Michael Ellerman
On Tue, 2018-02-13 at 06:32:55 UTC, Harish wrote:
> With glibc 2.26 'struct ucontext' is removed to improve POSIX
> compliance, which breaks powerpc/alignment_handler selftest.
> Fix the test by using ucontext_t. Tested on ppc, works with older
> glibc versions as well.
> 
> Fixes the following:
> alignment_handler.c: In function ‘sighandler’:
> alignment_handler.c:68:5: error: dereferencing pointer to incomplete type 
> ‘struct ucontext’
>   ucp->uc_mcontext.gp_regs[PT_NIP] += 4;
>  ^~
> 
> Signed-off-by: Harish 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/ecdf06e1ea5376bba03c155751f686

cheers


Re: [1/2] powerpc/kdump: Add missing optional dummy functions

2018-02-13 Thread Michael Ellerman
On Mon, 2018-02-12 at 22:34:07 UTC, Guenter Roeck wrote:
> If KEXEC_CORE is not enabled, PowerNV builds fail as follows.
> 
> arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
> arch/powerpc/platforms/powernv/smp.c:236:4: error:
>   implicit declaration of function 'crash_ipi_callback'
> 
> Add dummy function calls, similar to kdump_in_progress(), to solve the
> problem.
> 
> Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel ...")
> Cc: Balbir Singh 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Signed-off-by: Guenter Roeck 
> Acked-by: Balbir Singh 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/910961754572a2f4c83ad7e610d180

cheers


Re: [1/1] powerpc/pseries: Enable RAS hotplug events late

2018-02-13 Thread Michael Ellerman
On Mon, 2018-02-12 at 00:19:29 UTC, Sam Bobroff wrote:
> Currently if the kernel receives a memory hot-unplug event early
> enough, it may get stuck in an infinite loop in
> dissolve_free_huge_pages(). This appears as a stall just after:
> 
> pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at 
> 
> It appears to be caused by "minimum_order" being uninitialized, due to
> init_ras_IRQ() executing before hugetlb_init().
> 
> To correct this, extract the part of init_ras_IRQ() that enables
> hotplug event processing and place it in the machine_late_initcall
> phase, which is guaranteed to be after hugetlb_init() is called.
> 
> Signed-off-by: Sam Bobroff 
> Acked-by: Balbir Singh 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/c9dccf1d074a67d36c510845f66398

cheers


Re: [V2, 3/4] powerpc/mm/hash64: Store the slot information at the right offset.

2018-02-13 Thread Michael Ellerman
On Sun, 2018-02-11 at 15:00:08 UTC, "Aneesh Kumar K.V" wrote:
> The hugetlb pte entries are at the PMD and PUD level. Use the right offset
> for them to get the second half of the table.
> 
> Signed-off-by: Aneesh Kumar K.V 
> Reviewed-by: Ram Pai 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/ff31e105464d8c8c97301964682702

cheers


Re: [V2, 2/4] powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled.

2018-02-13 Thread Michael Ellerman
On Sun, 2018-02-11 at 15:00:07 UTC, "Aneesh Kumar K.V" wrote:
> Signed-off-by: Aneesh Kumar K.V 
> Reviewed-by: Ram Pai 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/4a7aa4fecbbf94b5c6fae8a983

cheers


Re: [V2,1/4] powerpc/mm: Fix crashes with PUD level hugetlb config

2018-02-13 Thread Michael Ellerman
On Sun, 2018-02-11 at 15:00:06 UTC, "Aneesh Kumar K.V" wrote:
> To support memory keys, we moved the hash pte slot information to the second
> half of the page table. This was ok with PTE entries at level 4 and level 3.
> We already allocate larger page table pages at those levels to accommodate
> the extra details. For level 4 we already have the extra space, which was
> used to track 4k hash page table entry details, and at the pmd level the
> extra space was allocated to track the THP details.
> 
> With hugetlbfs PTE, we used this extra space at the PMD level to store the
> slot details. But we also support hugetlbfs PTE at the PUD level, and PUD
> level pages didn't allocate extra space. This resulted in memory corruption.
> 
> Fix this by allocating extra space at the PUD level when HUGETLB is enabled.
> We may need further changes to allocate larger space at the PMD level when
> we enable HUGETLB. That will be done in the next patch.
> 
> Fixes: bf9a95f9a6481bc6e ("powerpc: Free up four 64K PTE bits in 64K backed
> HPTE pages")
> 
> Signed-off-by: Aneesh Kumar K.V 
> Reviewed-by: Ram Pai 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/fae2211697c9490414e974431051f7

cheers


Re: powerpc/vas: do not set uses_vas for kernel windows

2018-02-13 Thread Michael Ellerman
On Thu, 2018-02-08 at 09:18:38 UTC, Nicholas Piggin wrote:
> cp_abort is only required for user windows, because kernel context
> must not be preempted between a copy/paste pair.
> 
> Without this patch, the init task gets used_vas set when it runs
> the nx842_powernv_init initcall, which opens windows for kernel
> usage.
> 
> used_vas is then never cleared anywhere, so it gets propagated
> into all other tasks. It's a property of the address space, so it
> should really be cleared when a new mm is created (or in dup_mmap
> if the mmaps are marked as VM_DONTCOPY). For now we seem to have
> no such driver, so leave that for another patch.
> 
> Cc: Sukadev Bhattiprolu 
> Signed-off-by: Nicholas Piggin 
> Reviewed-by: Sukadev Bhattiprolu 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/b00b62898631b756c3e123542bbb04

cheers


Re: [kernel, v2] powerpc/mm: Flush radix process translations when setting MMU type

2018-02-13 Thread Michael Ellerman
On Thu, 2018-02-01 at 05:09:44 UTC, Alexey Kardashevskiy wrote:
> Radix guests normally invalidate process-scoped translations when a new
> pid is allocated, but migrated guests do not invalidate these, so migrated
> guests sometimes crash. This is especially easy to reproduce when migration
> happens within the first 10 seconds after guest boot starts, on the same
> machine.
> 
> This adds the "Invalidate process-scoped translations" flush to fix
> radix guests migration.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Tested-by: Laurent Vivier 
> Tested-by: Daniel Henrique Barboza 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/62e984ddfd6b056d399e24113f5e6a

cheers


Re: 4.16-rc1 virtual machine crash on boot

2018-02-13 Thread Tyrel Datwyler
On 02/13/2018 05:20 PM, Cyril Bur wrote:
> Hello all,

Does reverting commit 02ef6dd8109b581343ebeb1c4c973513682535d6 alleviate the 
issue?

-Tyrel

> 
> I'm seeing this crash trying to boot a KVM virtual machine. This kernel
> was compiled with pseries_le_defconfig and run using the following qemu
> commandline:
> 
> qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries
> -nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive
> file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0
> -device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append
> 'root=/dev/vdb1 rw cloud-init=disabled'
> 
> qemu-system-ppc64 --version
> QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright
> (c) 2003-2008 Fabrice Bellard
> 
> 
> Key type dns_resolver registered
> Unable to handle kernel paging request for data at address 0x0010
> Faulting instruction address: 0xc18f2bbc
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE SMP NR_CPUS=2048 NUMA pSeries
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8
> NIP:  c18f2bbc LR: c18f2bb4 CTR: 
> REGS: c000fea838d0 TRAP: 0380   Not tainted  (4.16.0-rc1v4.16-rc1)
> MSR:  82009033   CR: 84000248  XER:
> 2000
> CFAR: c19591a0 SOFTE: 0 
> GPR00: c18f2bb4 c000fea83b50 c1bd8400
>  
> GPR04: c000fea83b70  002f
> 0022 
> GPR08:  c22a3e90 
> 0220 
> GPR12:  cfb40980 c000d698
>  
> GPR16:   
>  
> GPR20:   
>  
> GPR24:  c18b9248 c18e36d8
> c19738a8 
> GPR28: 0007 c000fc68 c000fea83bf0
> 0010 
> NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c
> LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
> Call Trace:
> [c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
> (unreliable)
> [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
> [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
> [c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
> [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
> [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
> Instruction dump:
> 7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e
> e8690002 
> f9440021 4806657d 6000 e9210020  39090004 39490010
> f9010020 
> ---[ end trace bd9f49f482d30e03 ]---
> 
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> 
> WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883
> do_unblank_screen+0x1f0/0x270
> CPU: 1 PID: 1 Comm: swapper/0 Tainted: G  D  4.16.0-
> rc1v4.16-rc1 #8
> NIP:  c09aa800 LR: c09aa63c CTR: c148f5f0
> REGS: c000fea832c0 TRAP: 0700   Tainted:
> G  D   (4.16.0-rc1v4.16-rc1)
> MSR:  82029033   CR: 2800  XER:
> 2000
> CFAR: c09aa658 SOFTE: 1 
> GPR00: c09aa63c c000fea83540 c1bd8400
>  
> GPR04: 0001 c000fb0c200e 1dd7
> c000fea834d0 
> GPR08: fe43  
> 0001 
> GPR12: 28002428 cfb40980 c000d698
>  
> GPR16:   
>  
> GPR20:   
>  
> GPR24: c000fea4 c000feadf910 c1a4a7a8
> c1cc4ea0 
> GPR28: c173f4f0 c1cc4ec8 
>  
> NIP [c09aa800] do_unblank_screen+0x1f0/0x270
> LR [c09aa63c] do_unblank_screen+0x2c/0x270
> Call Trace:
> [c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x270
> (unreliable)
> [c000fea835b0] [c08a2a70] bust_spinlocks+0x40/0x80
> [c000fea835d0] [c00da90c] panic+0x1b8/0x32c
> [c000fea83670] [c00e1bd4] do_exit+0xcb4/0xcc0
> [c000fea83730] [c00275fc] die+0x29c/0x450
> [c000fea837c0] [c0053f88] bad_page_fault+0xe8/0x160
> [c000fea83830] [c0028a90] slb_miss_bad_addr+0x40/0x90
> [c000fea83860] [c0008b08] bad_addr_slb+0x158/0x160
> --- interrupt: 380 at read_drconf_v1_cell+0x50/0x9c
> LR = read_drconf_v1_cell+0x48/0x9c
> [c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
> [c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
> [c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
> [c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
> [c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
> Instruction dump:

Re: [PATCH] powerpc/npu-dma.c: Fix deadlock in mmio_invalidate

2018-02-13 Thread Alistair Popple
> > +struct mmio_atsd_reg {
> > +   struct npu *npu;
> > +   int reg;
> > +};
> > +
> 
> Is it just easier to move reg to inside of struct npu?

I don't think so. struct npu is global to all npu contexts, whereas this is
specific to the given invalidation. We don't have enough registers to assign
each NPU context its own dedicated register, so I'm not sure it makes sense
to put it there either.

> > +static void acquire_atsd_reg(struct npu_context *npu_context,
> > +   struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
> > +{
> > +   int i, j;
> > +   struct npu *npu;
> > +   struct pci_dev *npdev;
> > +   struct pnv_phb *nphb;
> >  
> > -   /*
> > -* The GPU requires two flush ATSDs to ensure all entries have
> > -* been flushed. We use PID 0 as it will never be used for a
> > -* process on the GPU.
> > -*/
> > -   if (flush)
> > -   mmio_invalidate_pid(npu, 0, true);
> > +   for (i = 0; i <= max_npu2_index; i++) {
> > +   mmio_atsd_reg[i].reg = -1;
> > +   for (j = 0; j < NV_MAX_LINKS; j++) {
> 
> Is it safe to assume that npu_context->npdev will not change in this
> loop? I guess it would need to be stronger than just this loop.

It is not safe to assume that npu_context->npdev won't change during this
loop; however, I don't think it is a problem if it does, as we only read each
element once during the invalidation.

There are two possibilities for how this could change. pnv_npu2_init_context()
will add an nvlink to the npdev, which will result in the TLB invalidation
being sent to that GPU as well, which should not be a problem.

pnv_npu2_destroy_context() will remove the nvlink from npdev. If it happens
prior to this loop it should not be a problem (as the destruction will have
already invalidated the GPU TLB). If it happens after this loop it shouldn't be
a problem either (it will just result in an extra TLB invalidate being sent to
this GPU).

> > +   npdev = npu_context->npdev[i][j];
> > +   if (!npdev)
> > +   continue;
> > +
> > +   nphb = pci_bus_to_host(npdev->bus)->private_data;
> > +   npu = &nphb->npu;
> > +   mmio_atsd_reg[i].npu = npu;
> > +   mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
> > +   while (mmio_atsd_reg[i].reg < 0) {
> > +   mmio_atsd_reg[i].reg = get_mmio_atsd_reg(npu);
> > +   cpu_relax();
> 
> A cond_resched() as well if we have too many tries?

I don't think we can as the invalidate_range() function is called under the ptl
spin-lock and is not allowed to sleep (at least according to
include/linux/mmu_notifier.h).

- Alistair

> Balbir
> 




[PATCH] powerpc: Expose TSCR via sysfs only on powernv

2018-02-13 Thread Cyril Bur
The TSCR can only be accessed in hypervisor mode.

Fixes: 88b5e12eeb11 ("powerpc: Expose TSCR via sysfs")
Signed-off-by: Cyril Bur 
---
 arch/powerpc/kernel/sysfs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 5a8bfee6e187..04d0bbd7a1dd 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -788,7 +788,8 @@ static int register_cpu_online(unsigned int cpu)
 	if (cpu_has_feature(CPU_FTR_PPCAS_ARCH_V2))
 		device_create_file(s, &dev_attr_pir);
 
-	if (cpu_has_feature(CPU_FTR_ARCH_206))
+	if (cpu_has_feature(CPU_FTR_ARCH_206) &&
+	    !firmware_has_feature(FW_FEATURE_LPAR))
 		device_create_file(s, &dev_attr_tscr);
 #endif /* CONFIG_PPC64 */
 
@@ -873,7 +874,8 @@ static int unregister_cpu_online(unsigned int cpu)
 	if (cpu_has_feature(CPU_FTR_PPCAS_ARCH_V2))
 		device_remove_file(s, &dev_attr_pir);
 
-	if (cpu_has_feature(CPU_FTR_ARCH_206))
+	if (cpu_has_feature(CPU_FTR_ARCH_206) &&
+	    !firmware_has_feature(FW_FEATURE_LPAR))
 		device_remove_file(s, &dev_attr_tscr);
 #endif /* CONFIG_PPC64 */
 
-- 
2.16.1



Re: [PATCH 1/2] powerpc/kdump: Add missing optional dummy functions

2018-02-13 Thread Balbir Singh
On Mon, 12 Feb 2018 15:25:51 -0800
Guenter Roeck  wrote:

> On Tue, Feb 13, 2018 at 10:01:57AM +1100, Balbir Singh wrote:
> > On Tue, Feb 13, 2018 at 9:34 AM, Guenter Roeck  wrote:  
> > > If KEXEC_CORE is not enabled, PowerNV builds fail as follows.
> > >
> > > arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
> > > arch/powerpc/platforms/powernv/smp.c:236:4: error:
> > > implicit declaration of function 'crash_ipi_callback'
> > >
> > > Add dummy function calls, similar to kdump_in_progress(), to solve the
> > > problem.
> > >
> > > Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel 
> > > ...")
> > > Cc: Balbir Singh 
> > > Cc: Michael Ellerman 
> > > Cc: Nicholas Piggin 
> > > Signed-off-by: Guenter Roeck 
> > > ---  
> > 
> > Thanks for working on this.
> > 
> > You've added two functions, I understand the crash_send_ipi() bits
> > that I broke. Looks like crash_ipi_callback broken without KEXEC_CORE?
> >   
> 
> If I recall correctly, 4145f358644b introduced the call to 
> crash_ipi_callback().
> After I declared the dummy function for that, I got an error about the missing
> crash_send_ipi(). I didn't spend more time on it but just added another dummy
> function. It may well be that another problem was introduced in the same time
> frame. On the other hand, maybe I got it all wrong, and my patch is not worth
> the computer it was written on.
> 

The patches worked for me with CONFIG_KEXEC=n and CONFIG_KEXEC_CORE=n

Tested-by: Balbir Singh 

Balbir Singh


Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2018-02-13 Thread Mike Kravetz
On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> Andrew Morton  writes:
> 
>> On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal  
>> wrote:
>>

 So I don't think that the above test result means that errors are properly
 handled, and the proposed patch should help for arm64.
>>>
>>> Although, the deviation of pud_huge() avoids a kernel crash the code
>>> would be easier to maintain and reason about if arm64 helpers are
>>> consistent with expectations by core code.
>>>
>>> I'll look to update the arm64 helpers once this patch gets merged. But
>>> it would be helpful if there was a clear expression of semantics for
>>> pud_huge() for various cases. Is there any version that can be used as
>>> reference?
>>
>> Is that an ack or tested-by?
>>
>> Mike keeps plaintively asking the powerpc developers to take a look,
>> but they remain steadfastly in hiding.
> 
> Cc'ing linuxppc-dev is always a good idea :)
> 

Thanks Michael,

I was mostly concerned about use cases for soft/hard offline of huge pages
larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
huge pages, and soft/hard offline support was specifically added for this.
See commit 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
at PGD level"

This patch will disable that functionality.  So, at a minimum this is a
'heads up'.  If there are actual use cases that depend on this, then more
work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
support, I can not tell if there is a real use case or this is just a
'nice to have'.

-- 
Mike Kravetz

>> Folks, this patch fixes a BUG and is marked for -stable.  Can we please
>> prioritize it?
> 
> It's not crashing for me (on 4.16-rc1):
> 
>   # ./huge-poison 
>   Poisoning page...once
>   Poisoning page...once again
>   madvise: Bad address
> 
> And I guess the above is the expected behaviour?
> 
> Looking at the function trace it looks like the 2nd madvise is going
> down reasonable code paths, but I don't know for sure:
> 
>   8)   |  SyS_madvise() {
>   8)   |capable() {
>   8)   |  ns_capable_common() {
>   8)   0.094 us|cap_capable();
>   8)   0.516 us|  }
>   8)   1.052 us|}
>   8)   |get_user_pages_fast() {
>   8)   0.354 us|  gup_pgd_range();
>   8)   |  get_user_pages_unlocked() {
>   8)   0.050 us|down_read();
>   8)   |__get_user_pages() {
>   8)   |  find_extend_vma() {
>   8)   |find_vma() {
>   8)   0.148 us|  vmacache_find();
>   8)   0.622 us|}
>   8)   1.064 us|  }
>   8)   0.028 us|  arch_vma_access_permitted();
>   8)   |  follow_hugetlb_page() {
>   8)   |huge_pte_offset() {
>   8)   0.128 us|  __find_linux_pte();
>   8)   0.580 us|}
>   8)   0.048 us|_raw_spin_lock();
>   8)   |hugetlb_fault() {
>   8)   |  huge_pte_offset() {
>   8)   0.034 us|__find_linux_pte();
>   8)   0.434 us|  }
>   8)   0.028 us|  is_hugetlb_entry_migration();
>   8)   0.032 us|  is_hugetlb_entry_hwpoisoned();
>   8)   2.118 us|}
>   8)   4.940 us|  }
>   8)   7.468 us|}
>   8)   0.056 us|up_read();
>   8)   8.722 us|  }
>   8) + 10.264 us   |}
>   8) + 12.212 us   |  }
> 
> 
> cheers
> 
> 


4.16-rc1 virtual machine crash on boot

2018-02-13 Thread Cyril Bur
Hello all,

I'm seeing this crash trying to boot a KVM virtual machine. This kernel
was compiled with pseries_le_defconfig and run using the following qemu
commandline:

qemu-system-ppc64 -enable-kvm -cpu POWER8 -smp 4 -m 4G -M pseries
-nographic -vga none -drive file=vm.raw,if=virtio,format=raw -drive
file=mkvmconf2xeO,if=virtio,format=raw -netdev type=user,id=net0
-device virtio-net-pci,netdev=net0 -kernel vmlinux_tscr -append
'root=/dev/vdb1 rw cloud-init=disabled'

qemu-system-ppc64 --version
QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright
(c) 2003-2008 Fabrice Bellard


Key type dns_resolver registered
Unable to handle kernel paging request for data at address 0x0010
Faulting instruction address: 0xc18f2bbc
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1v4.16-rc1 #8
NIP:  c18f2bbc LR: c18f2bb4 CTR: 
REGS: c000fea838d0 TRAP: 0380   Not tainted  (4.16.0-rc1v4.16-rc1)
MSR:  82009033   CR: 84000248  XER:
2000
CFAR: c19591a0 SOFTE: 0 
GPR00: c18f2bb4 c000fea83b50 c1bd8400
 
GPR04: c000fea83b70  002f
0022 
GPR08:  c22a3e90 
0220 
GPR12:  cfb40980 c000d698
 
GPR16:   
 
GPR20:   
 
GPR24:  c18b9248 c18e36d8
c19738a8 
GPR28: 0007 c000fc68 c000fea83bf0
0010 
NIP [c18f2bbc] read_drconf_v1_cell+0x50/0x9c
LR [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
Call Trace:
[c000fea83b50] [c18f2bb4] read_drconf_v1_cell+0x48/0x9c
(unreliable)
[c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
[c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
[c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
[c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
[c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
Instruction dump:
7c7f1b78 6000 6000 7c240b78 3d22ffdc 3929f0a4 e95e
e8690002 
f9440021 4806657d 6000 e9210020  39090004 39490010
f9010020 
---[ end trace bd9f49f482d30e03 ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b

WARNING: CPU: 1 PID: 1 at drivers/tty/vt/vt.c:3883
do_unblank_screen+0x1f0/0x270
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G  D  4.16.0-
rc1v4.16-rc1 #8
NIP:  c09aa800 LR: c09aa63c CTR: c148f5f0
REGS: c000fea832c0 TRAP: 0700   Tainted:
G  D   (4.16.0-rc1v4.16-rc1)
MSR:  82029033   CR: 2800  XER:
2000
CFAR: c09aa658 SOFTE: 1 
GPR00: c09aa63c c000fea83540 c1bd8400
 
GPR04: 0001 c000fb0c200e 1dd7
c000fea834d0 
GPR08: fe43  
0001 
GPR12: 28002428 cfb40980 c000d698
 
GPR16:   
 
GPR20:   
 
GPR24: c000fea4 c000feadf910 c1a4a7a8
c1cc4ea0 
GPR28: c173f4f0 c1cc4ec8 
 
NIP [c09aa800] do_unblank_screen+0x1f0/0x270
LR [c09aa63c] do_unblank_screen+0x2c/0x270
Call Trace:
[c000fea83540] [c09aa63c] do_unblank_screen+0x2c/0x270
(unreliable)
[c000fea835b0] [c08a2a70] bust_spinlocks+0x40/0x80
[c000fea835d0] [c00da90c] panic+0x1b8/0x32c
[c000fea83670] [c00e1bd4] do_exit+0xcb4/0xcc0
[c000fea83730] [c00275fc] die+0x29c/0x450
[c000fea837c0] [c0053f88] bad_page_fault+0xe8/0x160
[c000fea83830] [c0028a90] slb_miss_bad_addr+0x40/0x90
[c000fea83860] [c0008b08] bad_addr_slb+0x158/0x160
--- interrupt: 380 at read_drconf_v1_cell+0x50/0x9c
LR = read_drconf_v1_cell+0x48/0x9c
[c000fea83b90] [c18f305c] drmem_init+0x13c/0x2ec
[c000fea83c40] [c18e4288] do_one_initcall+0xdc/0x1ac
[c000fea83d00] [c18e45d4] kernel_init_freeable+0x27c/0x358
[c000fea83dc0] [c000d6bc] kernel_init+0x2c/0x160
[c000fea83e30] [c000bc20] ret_from_kernel_thread+0x5c/0xbc
Instruction dump:
3c62ffbf 38840001 7c8407b4 38639ca8 4b7ae0ed 6000 38210070
e8010010 
ebc1fff0 ebe1fff8 7c0803a6 4e800020 <0fe0> 4bfffe58 6000
6042 
---[ end trace bd9f49f482d30e04 ]---
Rebooting in 10 seconds..


Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-13 Thread Randy Dunlap
On 02/11/2018 11:27 PM, Ingo Molnar wrote:
> 
> * Randy Dunlap  wrote:
> 
>> From: Randy Dunlap 
>>
>> Currently  #includes  for no obvious
>> reason. It looks like it's only a convenience, so remove kmemleak.h
>> from slab.h and add  to any users of kmemleak_*
>> that don't already #include it.
>> Also remove  from source files that do not use it.
>>
>> This is tested on i386 allmodconfig and x86_64 allmodconfig. It
>> would be good to run it through the 0day bot for other $ARCHes.
>> I have neither the horsepower nor the storage space for the other
>> $ARCHes.
>>
>> [slab.h is the second most used header file after module.h; kernel.h
>> is right there with slab.h. There could be some minor error in the
>> counting due to some #includes having comments after them and I
>> didn't combine all of those.]
>>
>> This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
>> header files).
>>
>> Signed-off-by: Randy Dunlap 
> 
> Nice find:
> 
> Reviewed-by: Ingo Molnar 
> 
> I agree that it needs to go through 0-day to find any hidden dependencies we 
> might 
> have grown due to this.

Andrew,

This patch has mostly survived both 0day and ozlabs multi-arch testing with
2 build errors being reported by both of them.  I have posted patches for
those separately. (and are attached here)

other-patch-1:
lkml.kernel.org/r/5664ced1-a0cd-7e4e-71b6-9c3a97d68...@infradead.org
"lib/test_firmware: add header file to prevent build errors"

other-patch-2:
lkml.kernel.org/r/b3b7eebb-0e9f-f175-94a8-379c5ddca...@infradead.org
"integrity/security: fix digsig.c build error"

Will you see that these are merged or do you want me to repost them?

thanks,
-- 
~Randy
From: Randy Dunlap 

security/integrity/digsig.c has build errors on some $ARCH due to a
missing header file, so add it.

  security/integrity/digsig.c:146:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

Reported-by: Michael Ellerman 
Signed-off-by: Randy Dunlap 
Cc: Mimi Zohar 
Cc: linux-integr...@vger.kernel.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
---
 security/integrity/digsig.c |1 +
 1 file changed, 1 insertion(+)

--- lnx-416-rc1.orig/security/integrity/digsig.c
+++ lnx-416-rc1/security/integrity/digsig.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 



From: Randy Dunlap 

lib/test_firmware.c has build errors on some $ARCH due to a
missing header file, so add it.

  lib/test_firmware.c:134:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  lib/test_firmware.c:620:25: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]

Reported-by: Michael Ellerman 
Signed-off-by: Randy Dunlap 
Cc: Wei Yongjun 
Cc: Luis R. Rodriguez 
Cc: Greg Kroah-Hartman 
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
---
 lib/test_firmware.c |1 +
 1 file changed, 1 insertion(+)

--- lnx-416-rc1.orig/lib/test_firmware.c
+++ lnx-416-rc1/lib/test_firmware.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define TEST_FIRMWARE_NAME	"test-firmware.bin"
 #define TEST_FIRMWARE_NUM_REQS	4





[no subject]

2018-02-13 Thread Shan Hai
confirm 179e695f420474677205db49a8cbfe950329975c


[no subject]

2018-02-13 Thread Shan Hai
confirm 0da5e6b1343dcc6395ebcc8054c362d930498440


Re: [PATCH] powerpc/xmon: Dont register sysrq key when kernel param xmon=off

2018-02-13 Thread Balbir Singh
On Mon, Feb 12, 2018 at 11:35 PM, Vaibhav Jain
 wrote:
> Thanks for reviewing this patch Balbir
>
> Balbir Singh  writes:
>
>> Any specific issue you've run into without this patch?
> Without this patch, since xmon is still accessible via sysrq, there is
> no indication/warning on the xmon console mentioning that it is not
> fully functional. Specifically, the xmon console would still allow the
> user to set instruction/data breakpoints even though they won't work
> and will result in a kernel oops.
>
> Below is command log illustrating this problem on one of my test system
> where I tried setting an instruction breakpoint on cmdline_proc_show()
> with xmon=off:
>
> ~# cat /proc/cmdline
> root=UUID=248ad10e-a272-4187-8672-5b25f701e8b9 ro xmon=off
>
> ~# echo 'x' > /proc/sysrq-trigger
> [  458.904802] sysrq: SysRq : Entering xmon
>
> [ snip ]
>
> 78:mon> ls cmdline_proc_show
> cmdline_proc_show: c04196e0
> 78:mon> bi c04196e0
> 78:mon> x
>
> ~# cat /proc/cmdline
> [  505.618702] Oops: Exception in kernel mode, sig: 5 [#1]
> [ snip ]
> [  505.620082] NIP [c04196e4] cmdline_proc_show+0x4/0x60
> [  505.620136] LR [c03b1db0] seq_read+0x130/0x5e0
> [  505.620177] Call Trace:
> [  505.620202] [c000200e5078fc00] [c03b1d74] seq_read+0xf4/0x5e0 (unreliable)
> [  505.620267] [c000200e5078fca0] [c040cae0] proc_reg_read+0xb0/0x110
> [  505.620322] [c000200e5078fcf0] [c037687c] __vfs_read+0x6c/0x1b0
> [  505.620376] [c000200e5078fd90] [c0376a7c] vfs_read+0xbc/0x1b0
> [  505.620430] [c000200e5078fde0] [c037724c] SyS_read+0x6c/0x110
> [  505.620485] [c000200e5078fe30] [c000b320] system_call+0x58/0x6c
> [  505.620536] Instruction dump:
> [  505.620570] 3c82ff2a 7fe3fb78 38a0 3884dee0 4bf98c05 6000 38210030 e8010010
> [  505.620656] ebe1fff8 7c0803a6 4e800020 3c4c00d6 <38422120> 7c0802a6 f8010010 f821ff91
> [  505.620728] ---[ end trace eaf583921860b3de ]---
> [  506.629019]
> Trace/breakpoint trap
> ~#
>
>
>> I presume running xmon=off indicates we don't want xmon to take over in case 
>> of
>> panic/die/oops,
> I believe that when the xmon console is available it should be fully
> functional rather than partially so, otherwise it gets really confusing
> to the user as to why instruction/data breakpoints aren't working.
>

OK, so kernel breakpoints are broken with xmon=off and lead to oops as
opposed to passing them on to a kprobe handler perhaps?

Balbir Singh.


Re: samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-13 Thread Kees Cook
On Tue, Feb 13, 2018 at 2:32 AM, Michal Hocko  wrote:
> On Tue 13-02-18 21:16:55, Michael Ellerman wrote:
>> Kees Cook  writes:
>>
>> > On Mon, Feb 12, 2018 at 7:25 PM, Michael Ellerman  
>> > wrote:
>> >> Michal Hocko  writes:
>> >>> Hi,
>> >>> my build test machinery chokes on samples/seccomp when cross compiling
>> >>> s390 and ppc64 allyesconfig. This has been the case for quite some
>> >>> time already but I never found time to look at the problem and report
>> >>> it. It seems this is not new issue and similar thing happend for
>> >>> MIPS e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
>> >>> cross-compiling for MIPS").
>> >>>
>> >>> The build logs are attached.
>> >>>
>> >>> What is the best way around this? Should we simply skip compilation on
>> >>> cross compile or is actually anybody relying on that? Or should I simply
>> >>> disable it for s390 and ppc?
>> >>
>> >> The whole thing seems very confused. It's not building for the target,
>> >> it's building for the host, ie. the Makefile sets hostprogs-m and
>> >> HOSTCFLAGS etc.
>> >>
>> >> So it can't possibly work with cross compiling as it's currently
>> >> written.
>> >>
>> >> Either the Makefile needs some serious work to properly support cross
>> >> compiling or it should just be disabled when cross compiling.
>> >
>> > Hrm, yeah, the goal was to entirely disable cross compiling, but I
>> > guess we didn't hit it with a hard enough hammer. :)
>>
>> Do you know why it is written that way? Why doesn't it just try to cross
>> compile like normal code?
>
> No idea, sorry. All I know about this code is that it breaks my build
> testing.

IIRC, one of the problems is build ordering: the kernel
headers used by the samples aren't available when cross compiling.

I'm happy to kill it entirely with Michal's patch, though. Feel free
to carry in your tree!

Acked-by: Kees Cook 

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-13 Thread James Smart

On 2/12/2018 9:59 PM, Michael Ellerman wrote:

Johannes Thumshirn  writes:


On Wed, Feb 07, 2018 at 10:51:57AM +0100, Johannes Thumshirn wrote:

+   /* Enable combined writes for DPP aperture */
+   pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK;
+#ifdef CONFIG_X86
+   rc = set_memory_wc(pg_addr, 1);
+   if (rc) {
+   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+   "3272 Cannot setup Combined "
+   "Write on WQ[%d] - disable DPP\n",
+   wq->queue_id);
+   phba->cfg_enable_dpp = 0;
+   }
+#else
+   phba->cfg_enable_dpp = 0;
+#endif
+   } else
+   wq->db_regaddr = phba->sli4_hba.WQDBregaddr;


I don't really like the set_memory_wc() call here. Neither do I like the ifdef
CONFIG_X86 special casing.

If you really need write combining, can't you at least use ioremap_wc()?


Coming back to this again (after talking to our ARM/POWER folks internally).
Is this really x86 specific here? I know there are servers with other
architectures using lpfcs out there.

I _think_ write combining should be possible on other architectures (that have
PCIe and aren't dead) as well.

The ioremap_wc() I suggested is probably wrong.

So can you please revisit this? I CCed Mark and Michael, maybe they can help
here.


I'm not much of an I/O guy, but I do know that on powerpc we don't
implement set_memory_wc(). So if you're using that then you do need the
ifdef.

I couldn't easily find the rest of this thread, so I'm not sure if
ioremap_wc() is an option. We do implement that and on modern CPUs at
least it will give you something that's not just a plain uncached
mapping.


I went back and looked at things.  It does appear that we should be
using ioremap_wc().  There's a PCI routine that wraps it, but as
we're already using the other routines in the wrapper, it's not very
interesting.  ioremap_wc() seems to be supported pretty much anywhere,
with platforms managing what it resolves to. Granted, some platforms 
may not do write combining but will relax the caching aspects (as 
Michael indicates).


The interesting thing is - when wc is truly on, we see a substantial 
difference. But in cases where wc isn't on and we perform the individual 
writes plus the flush before the doorbell write to synchronize things, 
it turns out it takes longer than if we don't use the feature. So, in 
cases where we don't have real wc, I'm going to turn it off. Based on 
what we've tested so far (includes ppc p8), we'll be leaving it enabled 
on X86 only.


-- james



Re: [PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-13 Thread Bert Kenward
On 12/02/18 21:03, Boris Brezillon wrote:
> MTD users are no longer checking erase_info->state to determine if the
> erase operation failed or succeeded. Moreover, mtd_erase_callback() is
> now a NOP.
> 
> We can safely get rid of all mtd_erase_callback() calls and all
> erase_info->state assignments. While at it, get rid of the
> erase_info->state field, all MTD_ERASE_XXX definitions and the
> mtd_erase_callback() function.
> 
> Signed-off-by: Boris Brezillon 

For sfc parts:

Acked-by: Bert Kenward 


Thanks,

Bert.


[PATCH] powerpc: Revert support for ibm,drc-info devtree property

2018-02-13 Thread Michael Bringmann
This reverts commit 02ef6dd8109b581343ebeb1c4c973513682535d6.

The earlier patch tried to enable support for a new property
"ibm,drc-info" on powerpc systems.

Unfortunately, some errors in the associated patch set break things
in some of the DLPAR operations.  In particular when attempting to
hot-add a new CPU or set of CPUs, the original patch failed to
properly calculate the available resources, and aborted the operation.
In addition, the original set missed several opportunities to compress
and reuse common code.

As the associated patch set was meant to provide an optimization of
storage and performance of a set of device-tree properties for future
systems with large amounts of resources, reverting just restores
the previous behavior for existing systems.  It seems unnecessary
to enable this feature and introduce the consequent problems that it
will cause in the field at this time, so please revert it for now
until testing of the corrections is finished.

Signed-off-by: Michael W. Bringmann 
---
 arch/powerpc/kernel/prom_init.c | 1 +
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index adf044d..d22c41c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -874,7 +874,7 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = {
.mmu = 0,
.hash_ext = 0,
.radix_ext = 0,
-   .byte22 = OV5_FEAT(OV5_DRC_INFO),
+   .byte22 = 0,
},
 
/* option vector 6: IBM PAPR hints */



Re: [PATCH] cxl: Remove function write_timebase_ctrl_psl9() for PSL9

2018-02-13 Thread Frederic Barrat



On 09/02/2018 at 05:10, Vaibhav Jain wrote:

For PSL9 the time-base enable bit has moved from PSL_TB_CTLSTAT
register to PSL_CONTROL register. Hence we don't need an sl_ops
implementation for 'write_timebase_ctrl' for PSL9.

Hence this patch removes function write_timebase_ctrl_psl9() and its
references from the code.

Signed-off-by: Vaibhav Jain 
---


The code change looks ok, but am I the only one to think the commit 
message doesn't match? The enable bit has always been in the PSL_CONTROL 
register, it was just badly documented on p8. What's been removed is 
much of the configuration found in PSL_TB_CTLSTAT.


  Fred



  drivers/misc/cxl/pci.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index c983f23cc2ed..9bc30c20b66b 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -572,12 +572,6 @@ static int init_implementation_adapter_regs_xsl(struct cxl *adapter, struct pci_
  /* For the PSL this is a multiple for 0 < n <= 7: */
  #define PSL_2048_250MHZ_CYCLES 1

-static void write_timebase_ctrl_psl9(struct cxl *adapter)
-{
-   cxl_p1_write(adapter, CXL_PSL9_TB_CTLSTAT,
-TBSYNC_CNT(2 * PSL_2048_250MHZ_CYCLES));
-}
-
  static void write_timebase_ctrl_psl8(struct cxl *adapter)
  {
cxl_p1_write(adapter, CXL_PSL_TB_CTLSTAT,
@@ -639,7 +633,8 @@ static void cxl_setup_psl_timebase(struct cxl *adapter, struct pci_dev *dev)
 * Setup PSL Timebase Control and Status register
 * with the recommended Timebase Sync Count value
 */
-   adapter->native->sl_ops->write_timebase_ctrl(adapter);
+   if (adapter->native->sl_ops->write_timebase_ctrl)
+   adapter->native->sl_ops->write_timebase_ctrl(adapter);

/* Enable PSL Timebase */
cxl_p1_write(adapter, CXL_PSL_Control, 0x);
@@ -1805,7 +1800,6 @@ static const struct cxl_service_layer_ops psl9_ops = {
.psl_irq_dump_registers = cxl_native_irq_dump_regs_psl9,
.err_irq_dump_registers = cxl_native_err_irq_dump_regs_psl9,
.debugfs_stop_trace = cxl_stop_trace_psl9,
-   .write_timebase_ctrl = write_timebase_ctrl_psl9,
.timebase_read = timebase_read_psl9,
.capi_mode = OPAL_PHB_CAPI_MODE_CAPI,
.needs_reset_before_disable = true,





[PATCH] Fix cleanup when VAS is not configured

2018-02-13 Thread Sukadev Bhattiprolu
From: Sukadev Bhattiprolu 
Date: Fri, 9 Feb 2018 11:49:06 -0600
Subject: [PATCH 1/1] powerpc/vas: Fix cleanup when VAS is not configured

When VAS is not configured, unregister the platform driver. Also simplify
cleanup by delaying vas debugfs init until we know VAS is configured.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]
- [Michael Ellerman] Move vas_init_dbgdir() into a lower level
  function to keep vas_init() cleaner.
---
 arch/powerpc/platforms/powernv/vas-debug.c | 11 +++
 arch/powerpc/platforms/powernv/vas.c   |  6 +++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/vas-debug.c b/arch/powerpc/platforms/powernv/vas-debug.c
index b4de4c6..4f7276e 100644
--- a/arch/powerpc/platforms/powernv/vas-debug.c
+++ b/arch/powerpc/platforms/powernv/vas-debug.c
@@ -179,6 +179,7 @@ void vas_instance_init_dbgdir(struct vas_instance *vinst)
 {
struct dentry *d;
 
+   vas_init_dbgdir();
if (!vas_debugfs)
return;
 
@@ -201,8 +202,18 @@ void vas_instance_init_dbgdir(struct vas_instance *vinst)
vinst->dbgdir = NULL;
 }
 
+/*
+ * Set up the "root" VAS debugfs dir. Return if we already set it up
+ * (or failed to) in an earlier instance of VAS.
+ */
 void vas_init_dbgdir(void)
 {
+   static bool first_time = true;
+
+   if (!first_time)
+   return;
+
+   first_time = false;
vas_debugfs = debugfs_create_dir("vas", NULL);
if (IS_ERR(vas_debugfs))
vas_debugfs = NULL;
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
index aebbe95..5a2b24c 100644
--- a/arch/powerpc/platforms/powernv/vas.c
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -160,8 +160,6 @@ static int __init vas_init(void)
int found = 0;
struct device_node *dn;
 
-   vas_init_dbgdir();
-
platform_driver_register(&vas_driver);
 
for_each_compatible_node(dn, NULL, "ibm,vas") {
@@ -169,8 +167,10 @@ static int __init vas_init(void)
found++;
}
 
-   if (!found)
+   if (!found) {
+   platform_driver_unregister(&vas_driver);
return -ENODEV;
+   }
 
pr_devel("Found %d instances\n", found);
 
-- 
2.7.4



Re: [PATCH V3] powerpc/mm/hash64: memset the pagetable pages on allocation.

2018-02-13 Thread Ram Pai
On Tue, Feb 13, 2018 at 04:39:33PM +0530, Aneesh Kumar K.V wrote:
> On powerpc we allocate page table pages from slab caches of different sizes.
> For now we have a constructor that zeroes out the objects when we allocate
> them for the first time. We expect the objects to be zeroed out when we free
> the object back to the slab cache. This happens in the unmap path. For
> hugetlb pages we call huge_pte_get_and_clear to do that. With the current
> configuration of page table size, both pud and pgd level tables get
> allocated from the same slab cache. At the pud level, we use the second
> half of the table to store the slot information, but never clear that when
> unmapping. When such a freed object gets allocated at the pgd level, part
> of the page table page will not be initialized correctly, resulting in a
> kernel crash.
> 
> Simplify this by calling the object initialization after kmem_cache_alloc.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/pgalloc.h | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
> index 53df86d3cfce..e4d154a4d114 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
> @@ -73,10 +73,13 @@ static inline void radix__pgd_free(struct mm_struct *mm, pgd_t *pgd)
> 
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> + pgd_t *pgd;
>   if (radix_enabled())
>   return radix__pgd_alloc(mm);
> - return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
> - pgtable_gfp_flags(mm, GFP_KERNEL));

kmem_cache_zalloc() won't work?

RP

> + pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
> +pgtable_gfp_flags(mm, GFP_KERNEL));
> + memset(pgd, 0, PGD_TABLE_SIZE);
> + return pgd;
>  }
> 
>  static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> -- 
> 2.14.3

-- 
Ram Pai



Re: [PATCH 2/2] powerpc/pseries: Declare optional dummy function for find_and_online_cpu_nid

2018-02-13 Thread Tyrel Datwyler
On 02/12/2018 02:34 PM, Guenter Roeck wrote:
> Commit e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with
> memoryless nodes") adds an unconditional call to find_and_online_cpu_nid(),
> which is only declared if CONFIG_PPC_SPLPAR is enabled. This results in
> the following build error if this is not the case.
> 
> arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
> arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
>   undefined reference to `.find_and_online_cpu_nid'
> 
> Follow the guideline provided by similar functions and provide a dummy
> function if CONFIG_PPC_SPLPAR is not enabled. This also moves the external
> function declaration into an include file where it should be.
> 
> Fixes: e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with ...")
> Cc: Michael Bringmann 
> Cc: Michael Ellerman 
> Cc: Nathan Fontenot 
> Signed-off-by: Guenter Roeck 

Nathan already sent a patch on the 9th for this issue to the list.

-Tyrel

> ---
>  arch/powerpc/include/asm/topology.h  | 5 +
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 --
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 88187c285c70..52815982436f 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -82,6 +82,7 @@ static inline int numa_update_cpu_topology(bool cpus_locked)
>  extern int start_topology_update(void);
>  extern int stop_topology_update(void);
>  extern int prrn_is_enabled(void);
> +extern int find_and_online_cpu_nid(int cpu);
>  #else
>  static inline int start_topology_update(void)
>  {
> @@ -95,6 +96,10 @@ static inline int prrn_is_enabled(void)
>  {
>   return 0;
>  }
> +static inline int find_and_online_cpu_nid(int cpu)
> +{
> + return 0;
> +}
>  #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
> 
>  #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_NEED_MULTIPLE_NODES)
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index dceb51454d8d..f5c6a8cd2926 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -340,8 +340,6 @@ static void pseries_remove_processor(struct device_node *np)
>   cpu_maps_update_done();
>  }
> 
> -extern int find_and_online_cpu_nid(int cpu);
> -
>  static int dlpar_online_cpu(struct device_node *dn)
>  {
>   int rc = 0;
> 



Re: [bug report] ocxl: Add AFU interrupt support

2018-02-13 Thread Frederic Barrat

Hi,

Thanks for the report. I'll fix the first issue. The 2nd is already on 
its way to upstream:

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dedab7f0d3137441a97fe7cf9b9ca5

(though we still have a useless cast in there; will fix as well).

May I ask what static checker you're using?

Thanks,

  Fred

On 13/02/2018 at 09:12, Dan Carpenter wrote:

Hello Frederic Barrat,

The patch aeddad1760ae: "ocxl: Add AFU interrupt support" from Jan
23, 2018, leads to the following static checker warning:

 drivers/misc/ocxl/file.c:163 afu_ioctl()
 warn: maybe return -EFAULT instead of the bytes remaining?

drivers/misc/ocxl/file.c
111  static long afu_ioctl(struct file *file, unsigned int cmd,
112  unsigned long args)
113  {
114  struct ocxl_context *ctx = file->private_data;
115  struct ocxl_ioctl_irq_fd irq_fd;
116  u64 irq_offset;
117  long rc;
118
119  pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
120  CMD_STR(cmd));
121
122  if (ctx->status == CLOSED)
123  return -EIO;
124
125  switch (cmd) {
126  case OCXL_IOCTL_ATTACH:
127  rc = afu_ioctl_attach(ctx,
128  (struct ocxl_ioctl_attach __user *) args);
129  break;
130
131  case OCXL_IOCTL_IRQ_ALLOC:
132  rc = ocxl_afu_irq_alloc(ctx, &irq_offset);
133  if (!rc) {
134  rc = copy_to_user((u64 __user *) args, &irq_offset,
135  sizeof(irq_offset));
136  if (rc)
 ^^
copy_to_user() returns the number of bytes remaining but we want to
return -EFAULT on error.

137  ocxl_afu_irq_free(ctx, irq_offset);
138  }
139  break;
140

 drivers/misc/ocxl/file.c:320 afu_read()
 warn: unsigned 'used' is never less than zero.

drivers/misc/ocxl/file.c
279  ssize_t rc;
280  size_t used = 0;
 ^^
This should be ssize_t

281  DEFINE_WAIT(event_wait);
282
283  memset(&header, 0, sizeof(header));
284
285  /* Require offset to be 0 */
286  if (*off != 0)
287  return -EINVAL;
288
289  if (count < (sizeof(struct ocxl_kernel_event_header) +
290  AFU_EVENT_BODY_MAX_SIZE))
291  return -EINVAL;
292
293  for (;;) {
294  prepare_to_wait(&ctx->events_wq, &event_wait,
295  TASK_INTERRUPTIBLE);
296
297  if (afu_events_pending(ctx))
298  break;
299
300  if (ctx->status == CLOSED)
301  break;
302
303  if (file->f_flags & O_NONBLOCK) {
304  finish_wait(&ctx->events_wq, &event_wait);
305  return -EAGAIN;
306  }
307
308  if (signal_pending(current)) {
309  finish_wait(&ctx->events_wq, &event_wait);
310  return -ERESTARTSYS;
311  }
312
313  schedule();
314  }
315
316  finish_wait(&ctx->events_wq, &event_wait);
317
318  if (has_xsl_error(ctx)) {
319  used = append_xsl_error(ctx, &header, buf + sizeof(header));
320  if (used < 0)
 
Impossible.

321  return used;
322  }
323
324  if (!afu_events_pending(ctx))
325  header.flags |= OCXL_KERNEL_EVENT_FLAG_LAST;
326
327  if (copy_to_user(buf, &header, sizeof(header)))
328  return -EFAULT;
329
330  used += sizeof(header);
331
332  rc = (ssize_t) used;
  ^^
You could remove the cast.

333  return rc;
334  }

regards,
dan carpenter





[PATCH 1/2] KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n

2018-02-13 Thread Christian Zigotzky
I successfully compiled the latest Git kernel with this patch, without AltiVec
enabled, for my Freescale P5020 board today. The patch works without any
problems.

— Christian

Sent from my iPhone

On 13. Feb 2018, at 05:51, Paul Mackerras  wrote:

Commit accb757d798c ("KVM: Move vcpu_load to arch-specific
kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out"
statement and an "out:" label to kvm_arch_vcpu_ioctl_run().
Since the only "goto out" is inside a CONFIG_VSX block,
compiling with CONFIG_VSX=n gives a warning that label "out"
is defined but not used, and because arch/powerpc is compiled
with -Werror, that becomes a compile error that makes the kernel
build fail.

Merge commit 1ab03c072feb ("Merge tag 'kvm-ppc-next-4.16-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc",
2018-02-09) added a similar block of code inside a #ifdef
CONFIG_ALTIVEC, with a "goto out" statement.

In order to make the build succeed, this adds a #ifdef around the
"out:" label.  This is a minimal, ugly fix, to be replaced later
by a refactoring of the code.  Since CONFIG_VSX depends on
CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here.

Fixes: accb757d798c ("KVM: Move vcpu_load to arch-specific 
kvm_arch_vcpu_ioctl_run")
Reported-by: Christian Zigotzky 
Signed-off-by: Paul Mackerras 
---
arch/powerpc/kvm/powerpc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 403e642c78f5..0083142c2f84 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1608,7 +1608,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)

   kvm_sigset_deactivate(vcpu);

+#ifdef CONFIG_ALTIVEC
out:
+#endif
   vcpu_put(vcpu);
   return r;
}
-- 
2.11.0



Re: [PATCH] powerpc/via-pmu: Fix section mismatch warning

2018-02-13 Thread Laurent Vivier
On 07/02/2018 20:44, Mathieu Malaterre wrote:
> Remove the __init annotation from pmu_init() to avoid the
> following warning.
> 
> WARNING: vmlinux.o(.data+0x4739c): Section mismatch in reference from the variable via_pmu_driver to the function .init.text:pmu_init()
> The variable via_pmu_driver references
> the function __init pmu_init()
> If the reference is valid then annotate the
> variable with __init* or __refdata (see linux/init.h) or name the variable:
> *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
> 
> Signed-off-by: Mathieu Malaterre 
> ---
>  drivers/macintosh/via-pmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/macintosh/via-pmu.c b/drivers/macintosh/via-pmu.c
> index 08849e33c567..5f378272d5b2 100644
> --- a/drivers/macintosh/via-pmu.c
> +++ b/drivers/macintosh/via-pmu.c
> @@ -378,7 +378,7 @@ static int pmu_probe(void)
>   return vias == NULL? -ENODEV: 0;
>  }
>  
> -static int __init pmu_init(void)
> +static int pmu_init(void)
>  {
>   if (vias == NULL)
>   return -ENODEV;
> 

pmu_init() is really an init function, only called by another init
function (adb_init()).

So I think it would be good to keep the __init marker.

Did you try:

--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -198,7 +198,7 @@ static const struct file_operations
pmu_battery_proc_fops;
 static const struct file_operations pmu_options_proc_fops;

 #ifdef CONFIG_ADB
-struct adb_driver via_pmu_driver = {
+const struct adb_driver via_pmu_driver = {
"PMU",
pmu_probe,
pmu_init,


Thanks,
Laurent


Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-13 Thread Randy Dunlap
On 02/13/2018 02:09 AM, Michael Ellerman wrote:
> Randy Dunlap  writes:
> 
>> On 02/12/2018 04:28 AM, Michael Ellerman wrote:
>>> Randy Dunlap  writes:
>>>
 From: Randy Dunlap 

 Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
 reason. It looks like it's only a convenience, so remove kmemleak.h
 from slab.h and add <linux/kmemleak.h> to any users of kmemleak_*
 that don't already #include it.
 Also remove <linux/kmemleak.h> from source files that do not use it.

 This is tested on i386 allmodconfig and x86_64 allmodconfig. It
 would be good to run it through the 0day bot for other $ARCHes.
 I have neither the horsepower nor the storage space for the other
 $ARCHes.

 [slab.h is the second most used header file after module.h; kernel.h
 is right there with slab.h. There could be some minor error in the
 counting due to some #includes having comments after them and I
 didn't combine all of those.]

 This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
 header files).

 Signed-off-by: Randy Dunlap 
>>>
>>> I threw it at a random selection of configs and so far the only failures
>>> I'm seeing are:
>>>
>>>   lib/test_firmware.c:134:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
>>>   lib/test_firmware.c:620:25: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
>>>   lib/test_firmware.c:620:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
>>>   security/integrity/digsig.c:146:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
>>
>> Both of those source files need to #include <linux/vmalloc.h>.
> 
> Yep, I added those and rebuilt. I don't see any more failures that look
> related to your patch.

Great, thanks.

I also sent patches for both of those.

>   http://kisskb.ellerman.id.au/kisskb/head/13399/
> 
> 
> I haven't gone through the defconfigs I have enabled for a while, so
> it's possible I have some missing but it's still a reasonable cross
> section.


-- 
~Randy


Re: pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-13 Thread Meelis Roos
> Does this fix your warning?
> 
> diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
> index 62f541f968f6..07074820a167 100644
> --- a/drivers/macintosh/macio_asic.c
> +++ b/drivers/macintosh/macio_asic.c
> @@ -375,6 +375,7 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip,
>   dev->ofdev.dev.of_node = np;
>   dev->ofdev.archdata.dma_mask = 0xffffffffUL;
>   dev->ofdev.dev.dma_mask = &dev->ofdev.archdata.dma_mask;
> + dev->ofdev.dev.coherent_dma_mask = dev->ofdev.archdata.dma_mask;
>   dev->ofdev.dev.parent = parent;
>   dev->ofdev.dev.bus = &macio_bus_type;
>   dev->ofdev.dev.release = macio_release_dev;

Yes, it does - thank you!

Tested-by: Meelis Roos 

-- 
Meelis Roos (mr...@linux.ee)


Re: pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-13 Thread Mathieu Malaterre
Hi,

On Tue, Feb 13, 2018 at 3:51 PM, Christoph Hellwig  wrote:
> Does this fix your warning?
>
> diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
> index 62f541f968f6..07074820a167 100644
> --- a/drivers/macintosh/macio_asic.c
> +++ b/drivers/macintosh/macio_asic.c
> @@ -375,6 +375,7 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip,
> dev->ofdev.dev.of_node = np;
> dev->ofdev.archdata.dma_mask = 0xffffffffUL;
> dev->ofdev.dev.dma_mask = &dev->ofdev.archdata.dma_mask;
> +   dev->ofdev.dev.coherent_dma_mask = dev->ofdev.archdata.dma_mask;
> dev->ofdev.dev.parent = parent;
> dev->ofdev.dev.bus = &macio_bus_type;
> dev->ofdev.dev.release = macio_release_dev;

Indeed, Thanks much! If needed:

Tested-by: Mathieu Malaterre 

System: Mac Mini G4

ref:
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg128662.html


[PATCH 14/14] powerpc/64s/radix: allocate kernel page tables node-local if possible

2018-02-13 Thread Nicholas Piggin
Try to allocate kernel page tables for direct mapping and vmemmap
according to the node of the memory they will map. The node is not
available for the linear map in early boot, so use range allocation
to allocate the page tables from the region they map, which is
effectively node-local.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/pgtable-radix.c | 111 ++--
 1 file changed, 85 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 4c5cc69c92c2..66b07718875a 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -48,11 +48,26 @@ static int native_register_process_table(unsigned long base, unsigned long pg_sz
return 0;
 }
 
-static __ref void *early_alloc_pgtable(unsigned long size)
+static __ref void *early_alloc_pgtable(unsigned long size, int nid,
+   unsigned long region_start, unsigned long region_end)
 {
+   unsigned long pa = 0;
void *pt;
 
-   pt = __va(memblock_alloc_base(size, size, MEMBLOCK_ALLOC_ANYWHERE));
+   if (region_start || region_end) /* has region hint */
+   pa = memblock_alloc_range(size, size, region_start, region_end,
+   MEMBLOCK_NONE);
+   else if (nid != -1) /* has node hint */
+   pa = memblock_alloc_base_nid(size, size,
+   MEMBLOCK_ALLOC_ANYWHERE,
+   nid, MEMBLOCK_NONE);
+
+   if (!pa)
+   pa = memblock_alloc_base(size, size, MEMBLOCK_ALLOC_ANYWHERE);
+
+   BUG_ON(!pa);
+
+   pt = __va(pa);
memset(pt, 0, size);
 
return pt;
@@ -60,8 +75,11 @@ static __ref void *early_alloc_pgtable(unsigned long size)
 
 static int early_map_kernel_page(unsigned long ea, unsigned long pa,
  pgprot_t flags,
- unsigned int map_page_size)
+ unsigned int map_page_size,
+ int nid,
+ unsigned long region_start, unsigned long region_end)
 {
+   unsigned long pfn = pa >> PAGE_SHIFT;
pgd_t *pgdp;
pud_t *pudp;
pmd_t *pmdp;
@@ -69,8 +87,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
 
pgdp = pgd_offset_k(ea);
if (pgd_none(*pgdp)) {
-   pudp = early_alloc_pgtable(PUD_TABLE_SIZE);
-   BUG_ON(pudp == NULL);
+   pudp = early_alloc_pgtable(PUD_TABLE_SIZE, nid,
+   region_start, region_end);
pgd_populate(&init_mm, pgdp, pudp);
}
pudp = pud_offset(pgdp, ea);
@@ -79,8 +97,8 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
goto set_the_pte;
}
if (pud_none(*pudp)) {
-   pmdp = early_alloc_pgtable(PMD_TABLE_SIZE);
-   BUG_ON(pmdp == NULL);
+   pmdp = early_alloc_pgtable(PMD_TABLE_SIZE, nid,
+   region_start, region_end);
pud_populate(&init_mm, pudp, pmdp);
}
pmdp = pmd_offset(pudp, ea);
@@ -89,23 +107,29 @@ static int early_map_kernel_page(unsigned long ea, 
unsigned long pa,
goto set_the_pte;
}
if (!pmd_present(*pmdp)) {
-   ptep = early_alloc_pgtable(PAGE_SIZE);
-   BUG_ON(ptep == NULL);
+   ptep = early_alloc_pgtable(PAGE_SIZE, nid,
+   region_start, region_end);
pmd_populate_kernel(&init_mm, pmdp, ptep);
}
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, flags));
+   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
smp_wmb();
return 0;
 }
 
-
-int radix__map_kernel_page(unsigned long ea, unsigned long pa,
+/*
+ * nid, region_start, and region_end are hints to try to place the page
+ * table memory in the same node or region.
+ */
+static int __map_kernel_page(unsigned long ea, unsigned long pa,
  pgprot_t flags,
- unsigned int map_page_size)
+ unsigned int map_page_size,
+ int nid,
+ unsigned long region_start, unsigned long region_end)
 {
+   unsigned long pfn = pa >> PAGE_SHIFT;
pgd_t *pgdp;
pud_t *pudp;
pmd_t *pmdp;
@@ -115,9 +139,15 @@ int radix__map_kernel_page(unsigned long ea, unsigned long 
pa,
 */
BUILD_BUG_ON(TASK_SIZE_USER64 > RADIX_PGTABLE_RANGE);
 
-   if (!slab_is_available())
-   return early_map_kernel_page(ea, pa, flags, map_page_size);
+   if (unlikely(!slab_is_available()))
+   return early_map_kernel_page(ea, pa, flags, 

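The hint precedence used by `early_alloc_pgtable()` in the patch above is: try a region hint first, then a node hint, and only then fall back to an unconstrained allocation. A minimal stand-alone sketch of that decision follows; `alloc_with_hints`, the callback type, and the test doubles are hypothetical stand-ins for the memblock calls, not kernel API:

```c
#include <stddef.h>

/* Hypothetical model of the fallback order in early_alloc_pgtable().
 * Each alloc_fn stands in for one memblock allocator and returns a
 * physical address, or 0 on failure (as the real allocators do). */
typedef unsigned long (*alloc_fn)(void);

unsigned long alloc_with_hints(unsigned long region_start,
                               unsigned long region_end, int nid,
                               alloc_fn try_region,   /* memblock_alloc_range() */
                               alloc_fn try_node,     /* memblock_alloc_base_nid() */
                               alloc_fn try_anywhere) /* memblock_alloc_base() */
{
        unsigned long pa = 0;

        if (region_start || region_end) /* has region hint */
                pa = try_region();
        else if (nid != -1)             /* has node hint */
                pa = try_node();

        if (!pa)                        /* hinted attempt failed: take anything */
                pa = try_anywhere();

        return pa;
}

/* Trivial test doubles. */
unsigned long alloc_ok(void)   { return 0x1000; }
unsigned long alloc_fail(void) { return 0; }
```

Note the region hint shadows the node hint entirely; a node hint is only consulted when no region was given, but either hinted path can still fall back to the unconstrained allocator.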
[PATCH 13/14] powerpc/64s/radix: split early page table mapping to its own function

2018-02-13 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/pgtable-radix.c | 114 +++-
 1 file changed, 66 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 435b19e74508..4c5cc69c92c2 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -58,6 +58,50 @@ static __ref void *early_alloc_pgtable(unsigned long size)
return pt;
 }
 
+static int early_map_kernel_page(unsigned long ea, unsigned long pa,
+ pgprot_t flags,
+ unsigned int map_page_size)
+{
+   pgd_t *pgdp;
+   pud_t *pudp;
+   pmd_t *pmdp;
+   pte_t *ptep;
+
+   pgdp = pgd_offset_k(ea);
+   if (pgd_none(*pgdp)) {
+   pudp = early_alloc_pgtable(PUD_TABLE_SIZE);
+   BUG_ON(pudp == NULL);
+   pgd_populate(&init_mm, pgdp, pudp);
+   }
+   pudp = pud_offset(pgdp, ea);
+   if (map_page_size == PUD_SIZE) {
+   ptep = (pte_t *)pudp;
+   goto set_the_pte;
+   }
+   if (pud_none(*pudp)) {
+   pmdp = early_alloc_pgtable(PMD_TABLE_SIZE);
+   BUG_ON(pmdp == NULL);
+   pud_populate(&init_mm, pudp, pmdp);
+   }
+   pmdp = pmd_offset(pudp, ea);
+   if (map_page_size == PMD_SIZE) {
+   ptep = pmdp_ptep(pmdp);
+   goto set_the_pte;
+   }
+   if (!pmd_present(*pmdp)) {
+   ptep = early_alloc_pgtable(PAGE_SIZE);
+   BUG_ON(ptep == NULL);
+   pmd_populate_kernel(&init_mm, pmdp, ptep);
+   }
+   ptep = pte_offset_kernel(pmdp, ea);
+
+set_the_pte:
+   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, flags));
+   smp_wmb();
+   return 0;
+}
+
+
 int radix__map_kernel_page(unsigned long ea, unsigned long pa,
  pgprot_t flags,
  unsigned int map_page_size)
@@ -70,54 +114,28 @@ int radix__map_kernel_page(unsigned long ea, unsigned long 
pa,
 * Make sure task size is correct as per the max adddr
 */
BUILD_BUG_ON(TASK_SIZE_USER64 > RADIX_PGTABLE_RANGE);
-   if (slab_is_available()) {
-   pgdp = pgd_offset_k(ea);
-   pudp = pud_alloc(&init_mm, pgdp, ea);
-   if (!pudp)
-   return -ENOMEM;
-   if (map_page_size == PUD_SIZE) {
-   ptep = (pte_t *)pudp;
-   goto set_the_pte;
-   }
-   pmdp = pmd_alloc(&init_mm, pudp, ea);
-   if (!pmdp)
-   return -ENOMEM;
-   if (map_page_size == PMD_SIZE) {
-   ptep = pmdp_ptep(pmdp);
-   goto set_the_pte;
-   }
-   ptep = pte_alloc_kernel(pmdp, ea);
-   if (!ptep)
-   return -ENOMEM;
-   } else {
-   pgdp = pgd_offset_k(ea);
-   if (pgd_none(*pgdp)) {
-   pudp = early_alloc_pgtable(PUD_TABLE_SIZE);
-   BUG_ON(pudp == NULL);
-   pgd_populate(&init_mm, pgdp, pudp);
-   }
-   pudp = pud_offset(pgdp, ea);
-   if (map_page_size == PUD_SIZE) {
-   ptep = (pte_t *)pudp;
-   goto set_the_pte;
-   }
-   if (pud_none(*pudp)) {
-   pmdp = early_alloc_pgtable(PMD_TABLE_SIZE);
-   BUG_ON(pmdp == NULL);
-   pud_populate(&init_mm, pudp, pmdp);
-   }
-   pmdp = pmd_offset(pudp, ea);
-   if (map_page_size == PMD_SIZE) {
-   ptep = pmdp_ptep(pmdp);
-   goto set_the_pte;
-   }
-   if (!pmd_present(*pmdp)) {
-   ptep = early_alloc_pgtable(PAGE_SIZE);
-   BUG_ON(ptep == NULL);
-   pmd_populate_kernel(&init_mm, pmdp, ptep);
-   }
-   ptep = pte_offset_kernel(pmdp, ea);
+
+   if (!slab_is_available())
+   return early_map_kernel_page(ea, pa, flags, map_page_size);
+
+   pgdp = pgd_offset_k(ea);
+   pudp = pud_alloc(&init_mm, pgdp, ea);
+   if (!pudp)
+   return -ENOMEM;
+   if (map_page_size == PUD_SIZE) {
+   ptep = (pte_t *)pudp;
+   goto set_the_pte;
+   }
+   pmdp = pmd_alloc(&init_mm, pudp, ea);
+   if (!pmdp)
+   return -ENOMEM;
+   if (map_page_size == PMD_SIZE) {
+   ptep = pmdp_ptep(pmdp);
+   goto set_the_pte;
}
+   ptep = pte_alloc_kernel(pmdp, ea);
+   if (!ptep)
+   return -ENOMEM;
 
 set_the_pte:
set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, flags));
@@ -864,7 +882,7 @@ static void remove_pagetable(unsigned long start, unsigned 
long end)
 
 

[PATCH 12/14] powerpc: pass node id into create_section_mapping

2018-02-13 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/hash.h  | 2 +-
 arch/powerpc/include/asm/book3s/64/radix.h | 2 +-
 arch/powerpc/include/asm/sparsemem.h   | 2 +-
 arch/powerpc/mm/hash_utils_64.c| 2 +-
 arch/powerpc/mm/mem.c  | 4 ++--
 arch/powerpc/mm/pgtable-book3s64.c | 6 +++---
 arch/powerpc/mm/pgtable-radix.c| 4 ++--
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 0920eff731b3..b1ace9619e94 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -201,7 +201,7 @@ extern int __meminit hash__vmemmap_create_mapping(unsigned 
long start,
 extern void hash__vmemmap_remove_mapping(unsigned long start,
 unsigned long page_size);
 
-int hash__create_section_mapping(unsigned long start, unsigned long end);
+int hash__create_section_mapping(unsigned long start, unsigned long end, int nid);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 365010f66570..705193e7192f 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -313,7 +313,7 @@ static inline unsigned long radix__get_tree_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int radix__create_section_mapping(unsigned long start, unsigned long end);
+int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
 int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/sparsemem.h 
b/arch/powerpc/include/asm/sparsemem.h
index a7916ee6dfb6..bc66712bdc3c 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -17,7 +17,7 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end);
+extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7d07c7e17db6..ceb5494804b2 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -781,7 +781,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
}
 }
 
-int hash__create_section_mapping(unsigned long start, unsigned long end)
+int hash__create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
int rc = htab_bolt_mapping(start, end, __pa(start),
   pgprot_val(PAGE_KERNEL), mmu_linear_psize,
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 4eee46ea4d96..f50ce66dd6bd 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -117,7 +117,7 @@ int memory_add_physaddr_to_nid(u64 start)
 }
 #endif
 
-int __weak create_section_mapping(unsigned long start, unsigned long end)
+int __weak create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
return -ENODEV;
 }
@@ -137,7 +137,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct 
vmem_altmap *altmap,
resize_hpt_for_hotplug(memblock_phys_mem_size());
 
start = (unsigned long)__va(start);
-   rc = create_section_mapping(start, start + size);
+   rc = create_section_mapping(start, start + size, nid);
if (rc) {
pr_warn("Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
start, start + size, rc);
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 422e80253a33..c736280068ce 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -155,12 +155,12 @@ void mmu_cleanup_all(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int create_section_mapping(unsigned long start, unsigned long end)
+int create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
if (radix_enabled())
-   return radix__create_section_mapping(start, end);
+   return radix__create_section_mapping(start, end, nid);
 
-   return hash__create_section_mapping(start, end);
+   return hash__create_section_mapping(start, end, nid);
 }
 
 int remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 328ff9abc333..435b19e74508 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -862,9 +862,9 @@ static void remove_pagetable(unsigned long start, unsigned 
long end)

[PATCH 11/14] powerpc/64: allocate per-cpu stacks node-local if possible

2018-02-13 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 51 ++
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 02fa358982e6..16ea71fa1ead 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -611,6 +611,21 @@ __init u64 ppc64_bolted_size(void)
 #endif
 }
 
+static void *__init alloc_stack(unsigned long limit, int cpu)
+{
+   unsigned long pa;
+
+   pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
+   early_cpu_to_node(cpu), MEMBLOCK_NONE);
+   if (!pa) {
+   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
+   if (!pa)
+   panic("cannot allocate stacks");
+   }
+
+   return __va(pa);
+}
+
 void __init irqstack_early_init(void)
 {
u64 limit = ppc64_bolted_size();
@@ -622,12 +637,8 @@ void __init irqstack_early_init(void)
 * accessed in realmode.
 */
for_each_possible_cpu(i) {
-   softirq_ctx[i] = (struct thread_info *)
-   __va(memblock_alloc_base(THREAD_SIZE,
-   THREAD_SIZE, limit));
-   hardirq_ctx[i] = (struct thread_info *)
-   __va(memblock_alloc_base(THREAD_SIZE,
-   THREAD_SIZE, limit));
+   softirq_ctx[i] = alloc_stack(limit, i);
+   hardirq_ctx[i] = alloc_stack(limit, i);
}
 }
 
@@ -635,20 +646,21 @@ void __init irqstack_early_init(void)
 void __init exc_lvl_early_init(void)
 {
unsigned int i;
-   unsigned long sp;
 
for_each_possible_cpu(i) {
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   critirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->crit_kstack = __va(sp + THREAD_SIZE);
+   void *sp;
 
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   dbgirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->dbg_kstack = __va(sp + THREAD_SIZE);
+   sp = alloc_stack(ULONG_MAX, i);
+   critirq_ctx[i] = sp;
+   paca_ptrs[i]->crit_kstack = sp + THREAD_SIZE;
 
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   mcheckirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->mc_kstack = __va(sp + THREAD_SIZE);
+   sp = alloc_stack(ULONG_MAX, i);
+   dbgirq_ctx[i] = sp;
+   paca_ptrs[i]->dbg_kstack = sp + THREAD_SIZE;
+
+   sp = alloc_stack(ULONG_MAX, i);
+   mcheckirq_ctx[i] = sp;
+   paca_ptrs[i]->mc_kstack = sp + THREAD_SIZE;
}
 
if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC))
@@ -702,20 +714,21 @@ void __init emergency_stack_init(void)
 
for_each_possible_cpu(i) {
struct thread_info *ti;
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->nmi_emergency_sp = (void *)ti + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->mc_emergency_sp = (void *)ti + THREAD_SIZE;
-- 
2.16.1


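In `exc_lvl_early_init()` above, each allocated block does double duty: the `struct thread_info` lives at the base of the block, while the paca's kstack pointer is set to the top of the block, since stacks grow downward. A minimal sketch of that layout arithmetic follows; `fake_thread_info`, `stack_pair`, and `place_stack` are hypothetical names standing in for the kernel's types:

```c
#include <stddef.h>

#define THREAD_SIZE (16 * 1024) /* illustrative; the real value is per-config */

struct fake_thread_info { int cpu; };   /* stand-in for struct thread_info */

struct stack_pair {
        struct fake_thread_info *ti;    /* lives at the base of the block */
        void *kstack;                   /* initial SP: top of the block */
};

/* Mirrors the pattern: critirq_ctx[i] = sp; ...->crit_kstack = sp + THREAD_SIZE */
struct stack_pair place_stack(void *sp)
{
        struct stack_pair p;

        p.ti = (struct fake_thread_info *)sp;
        p.kstack = (char *)sp + THREAD_SIZE;
        return p;
}

/* Scratch block for demonstration. */
static char demo_block[THREAD_SIZE];
```

The patch's `alloc_stack()` helper hides the node-hinted memblock call behind this pattern, so every caller gets both pointers from a single THREAD_SIZE-aligned allocation.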

[PATCH 10/14] powerpc/64: allocate pacas per node

2018-02-13 Thread Nicholas Piggin
Per-node allocations are possible on 64s with radix, which does
not have the bolted SLB limitation.

Hash would be able to do the same if all CPUs had the bottom of
their node-local memory bolted as well. This is left as an
exercise for the reader.
---
 arch/powerpc/kernel/paca.c | 41 +++--
 arch/powerpc/kernel/setup_64.c |  4 
 2 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 12d329467631..470ce21af8b5 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -20,6 +20,37 @@
 
 #include "setup.h"
 
+static void *__init alloc_paca_data(unsigned long size, unsigned long align,
+   unsigned long limit, int cpu)
+{
+   unsigned long pa;
+   int nid;
+
+   /*
+* boot_cpuid paca is allocated very early before cpu_to_node is up.
+* Set bottom-up mode, because the boot CPU should be on node-0,
+* which will put its paca in the right place.
+*/
+   if (cpu == boot_cpuid) {
+   nid = -1;
+   memblock_set_bottom_up(true);
+   } else {
+   nid = early_cpu_to_node(cpu);
+   }
+
+   pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE);
+   if (!pa) {
+   pa = memblock_alloc_base(size, align, limit);
+   if (!pa)
+   panic("cannot allocate paca data");
+   }
+
+   if (cpu == boot_cpuid)
+   memblock_set_bottom_up(false);
+
+   return __va(pa);
+}
+
 #ifdef CONFIG_PPC_PSERIES
 
 /*
@@ -52,7 +83,7 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   lp = __va(memblock_alloc_base(size, 0x400, limit));
+   lp = alloc_paca_data(size, 0x400, limit, cpu);
init_lppaca(lp);
 
return lp;
@@ -82,7 +113,7 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, 
unsigned long limit)
return NULL;
}
 
-   s = __va(memblock_alloc_base(sizeof(*s), L1_CACHE_BYTES, limit));
+   s = alloc_paca_data(sizeof(*s), L1_CACHE_BYTES, limit, cpu);
memset(s, 0, sizeof(*s));
 
s->persistent = cpu_to_be32(SLB_NUM_BOLTED);
@@ -173,7 +204,6 @@ void __init allocate_paca_ptrs(void)
 void __init allocate_paca(int cpu)
 {
u64 limit;
-   unsigned long pa;
struct paca_struct *paca;
 
BUG_ON(cpu >= paca_nr_cpu_ids);
@@ -188,9 +218,8 @@ void __init allocate_paca(int cpu)
limit = ppc64_rma_size;
 #endif
 
-   pa = memblock_alloc_base(sizeof(struct paca_struct),
-   L1_CACHE_BYTES, limit);
-   paca = __va(pa);
+   paca = alloc_paca_data(sizeof(struct paca_struct), L1_CACHE_BYTES,
+   limit, cpu);
paca_ptrs[cpu] = paca;
memset(paca, 0, sizeof(struct paca_struct));
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index dde34d35d1e7..02fa358982e6 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -312,6 +312,10 @@ void __init early_setup(unsigned long dt_ptr)
early_init_devtree(__va(dt_ptr));
 
/* Now we know the logical id of our boot cpu, setup the paca. */
+   if (boot_cpuid != 0) {
+   /* Poison paca_ptrs[0] again if it's not the boot cpu */
+   memset(&paca_ptrs[0], 0x88, sizeof(paca_ptrs[0]));
+   }
setup_paca(paca_ptrs[boot_cpuid]);
fixup_boot_paca();
 
-- 
2.16.1

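The node-selection logic in `alloc_paca_data()` above treats the boot CPU specially: its paca is allocated before `cpu_to_node` works, so the code passes nid = -1 and temporarily switches memblock to bottom-up mode so the allocation lands low in memory, on node 0. A side-effect-free sketch of just that decision follows; `paca_hint`, `paca_alloc_hint`, and the `node_of` callback are hypothetical names, with `node_of` standing in for `early_cpu_to_node()`:

```c
#include <stdbool.h>

/* Hypothetical model of the nid choice in alloc_paca_data(). */
struct paca_alloc_hint {
        int nid;        /* -1 means "no node hint available yet" */
        bool bottom_up; /* allocate low so the boot CPU lands on node 0 */
};

struct paca_alloc_hint paca_hint(int cpu, int boot_cpuid, int (*node_of)(int))
{
        struct paca_alloc_hint h;

        if (cpu == boot_cpuid) {
                /* cpu_to_node is not up yet for the boot CPU */
                h.nid = -1;
                h.bottom_up = true;
        } else {
                h.nid = node_of(cpu);
                h.bottom_up = false;
        }
        return h;
}

/* Test double: pretend every two CPUs share a node. */
int demo_node_of(int cpu) { return cpu / 2; }
```

In the real patch the bottom-up flag is set and cleared around the allocation with `memblock_set_bottom_up()`; modeling it as a returned field keeps the sketch free of global state.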


[PATCH 09/14] powerpc/64: defer paca allocation until memory topology is discovered

2018-02-13 Thread Nicholas Piggin
---
 arch/powerpc/include/asm/paca.h|  3 +-
 arch/powerpc/kernel/paca.c | 90 --
 arch/powerpc/kernel/prom.c |  5 ++-
 arch/powerpc/kernel/setup-common.c | 24 +++---
 4 files changed, 51 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index f266b0a7be95..407a8076edd7 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -252,7 +252,8 @@ extern void copy_mm_to_paca(struct mm_struct *mm);
 extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
-extern void allocate_pacas(void);
+extern void allocate_paca_ptrs(void);
+extern void allocate_paca(int cpu);
 extern void free_unused_pacas(void);
 
 #else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index e560072f122b..12d329467631 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -57,16 +57,6 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
 
return lp;
 }
-
-static void __init free_lppaca(struct lppaca *lp)
-{
-   size_t size = 0x400;
-
-   if (early_cpu_has_feature(CPU_FTR_HVMODE))
-   return;
-
-   memblock_free(__pa(lp), size);
-}
 #endif /* CONFIG_PPC_BOOK3S */
 
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -169,12 +159,24 @@ void setup_paca(struct paca_struct *new_paca)
 
 static int __initdata paca_nr_cpu_ids;
 static int __initdata paca_ptrs_size;
+static int __initdata paca_struct_size;
+
+void __init allocate_paca_ptrs(void)
+{
+   paca_nr_cpu_ids = nr_cpu_ids;
+
+   paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+   paca_ptrs = __va(memblock_alloc(paca_ptrs_size, 0));
+   memset(paca_ptrs, 0x88, paca_ptrs_size);
+}
 
-void __init allocate_pacas(void)
+void __init allocate_paca(int cpu)
 {
u64 limit;
-   unsigned long size = 0;
-   int cpu;
+   unsigned long pa;
+   struct paca_struct *paca;
+
+   BUG_ON(cpu >= paca_nr_cpu_ids);
 
 #ifdef CONFIG_PPC_BOOK3S_64
/*
@@ -186,69 +188,30 @@ void __init allocate_pacas(void)
limit = ppc64_rma_size;
 #endif
 
-   paca_nr_cpu_ids = nr_cpu_ids;
-
-   paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
-   paca_ptrs = __va(memblock_alloc_base(paca_ptrs_size, 0, limit));
-   memset(paca_ptrs, 0, paca_ptrs_size);
-
-   size += paca_ptrs_size;
-
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-   unsigned long pa;
-
-   pa = memblock_alloc_base(sizeof(struct paca_struct),
-   L1_CACHE_BYTES, limit);
-   paca_ptrs[cpu] = __va(pa);
-   memset(paca_ptrs[cpu], 0, sizeof(struct paca_struct));
-
-   size += sizeof(struct paca_struct);
-   }
-
-   printk(KERN_DEBUG "Allocated %lu bytes for %u pacas\n",
-   size, nr_cpu_ids);
-
-   /* Can't use for_each_*_cpu, as they aren't functional yet */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-   struct paca_struct *paca = paca_ptrs[cpu];
+   pa = memblock_alloc_base(sizeof(struct paca_struct),
+   L1_CACHE_BYTES, limit);
+   paca = __va(pa);
+   paca_ptrs[cpu] = paca;
+   memset(paca, 0, sizeof(struct paca_struct));
 
-   initialise_paca(paca, cpu);
+   initialise_paca(paca, cpu);
 #ifdef CONFIG_PPC_PSERIES
-   paca->lppaca_ptr = new_lppaca(cpu, limit);
+   paca->lppaca_ptr = new_lppaca(cpu, limit);
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
-   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
+   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
 #endif
-   }
+   paca_struct_size += sizeof(struct paca_struct);
 }
 
 void __init free_unused_pacas(void)
 {
-   unsigned long size = 0;
int new_ptrs_size;
-   int cpu;
-
-   for (cpu = 0; cpu < paca_nr_cpu_ids; cpu++) {
-   if (!cpu_possible(cpu)) {
-   unsigned long pa = __pa(paca_ptrs[cpu]);
-#ifdef CONFIG_PPC_PSERIES
-   free_lppaca(paca_ptrs[cpu]->lppaca_ptr);
-#endif
-   memblock_free(pa, sizeof(struct paca_struct));
-   paca_ptrs[cpu] = NULL;
-   size += sizeof(struct paca_struct);
-   }
-   }
 
new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
-   if (new_ptrs_size < paca_ptrs_size) {
+   if (new_ptrs_size < paca_ptrs_size)
memblock_free(__pa(paca_ptrs) + new_ptrs_size,
paca_ptrs_size - new_ptrs_size);
-   size += paca_ptrs_size - new_ptrs_size;
-   }
-
-   if (size)
-   printk(KERN_DEBUG "Freed %lu bytes for unused pacas\n", size);
 

[PATCH 08/14] powerpc/setup: cpu_to_phys_id array

2018-02-13 Thread Nicholas Piggin
Build an array that maps logical CPU number to hardware CPU
number during firmware CPU discovery. Use that rather than
setting the pacas of other CPUs directly, to begin with. A
subsequent patch will not have pacas allocated at this point.
---
 arch/powerpc/include/asm/smp.h |  1 +
 arch/powerpc/kernel/prom.c |  7 +++
 arch/powerpc/kernel/setup-common.c | 15 ++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ec7b299350d9..cfecfee1194b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -31,6 +31,7 @@
 
 extern int boot_cpuid;
 extern int spinning_secondaries;
+extern u32 *cpu_to_phys_id;
 
 extern void cpu_die(void);
 extern int cpu_to_chip_id(int cpu);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 4dffef947b8a..5979e34ba90e 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -874,5 +874,12 @@ EXPORT_SYMBOL(cpu_to_chip_id);
 
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
+   /*
+* Early firmware scanning must use this rather than
+* get_hard_smp_processor_id because we don't have pacas allocated
+* until memory topology is discovered.
+*/
+   if (cpu_to_phys_id != NULL)
+   return (int)phys_id == cpu_to_phys_id[cpu];
return (int)phys_id == get_hard_smp_processor_id(cpu);
 }
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 9eaf26318d20..bd79a5644c78 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -437,6 +437,8 @@ static void __init cpu_init_thread_core_maps(int tpc)
 }
 
 
+u32 *cpu_to_phys_id = NULL;
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *  cpu_possible_mask
@@ -463,6 +465,10 @@ void __init smp_setup_cpu_maps(void)
 
DBG("smp_setup_cpu_maps()\n");
 
+   cpu_to_phys_id = __va(memblock_alloc(nr_cpu_ids * sizeof(u32),
+   __alignof__(u32)));
+   memset(cpu_to_phys_id, 0, nr_cpu_ids * sizeof(u32));
+
for_each_node_by_type(dn, "cpu") {
const __be32 *intserv;
__be32 cpu_be;
@@ -480,6 +486,7 @@ void __init smp_setup_cpu_maps(void)
intserv = of_get_property(dn, "reg", &len);
if (!intserv) {
cpu_be = cpu_to_be32(cpu);
+   /* XXX: what is this? uninitialized?? */
intserv = &cpu_be;  /* assume logical == phys */
len = 4;
}
@@ -499,8 +506,8 @@ void __init smp_setup_cpu_maps(void)
"enable-method", "spin-table");
 
set_cpu_present(cpu, avail);
-   set_hard_smp_processor_id(cpu, be32_to_cpu(intserv[j]));
set_cpu_possible(cpu, true);
+   cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
cpu++;
}
 
@@ -570,6 +577,12 @@ void __init smp_setup_cpu_maps(void)
setup_nr_cpu_ids();
 
free_unused_pacas();
+
+   for_each_possible_cpu(cpu) {
+   if (cpu == smp_processor_id())
+   continue;
+   set_hard_smp_processor_id(cpu, cpu_to_phys_id[cpu]);
+   }
 }
 #endif /* CONFIG_SMP */
 
-- 
2.16.1

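The `arch_match_cpu_phys_id()` change above consults the new array when it has been allocated and only falls back to the paca-based lookup otherwise. The lookup itself is a plain logical-to-hardware table; a sketch with hypothetical names follows, where `fallback_hw_id` stands in for `get_hard_smp_processor_id()`:

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical miniature of the cpu_to_phys_id scheme: a table built
 * during firmware scan maps logical cpu -> hardware id, with a
 * fallback lookup used while the table is not yet allocated. */
static unsigned int fallback_hw_id(int cpu)
{
        return 100u + (unsigned int)cpu;  /* arbitrary stand-in values */
}

bool cpu_matches_hw_id(const unsigned int *table, int cpu, unsigned int hw_id)
{
        if (table != NULL)                /* firmware scan has filled it */
                return hw_id == table[cpu];
        return hw_id == fallback_hw_id(cpu);
}

/* Pretend scan result: logical cpus 0..3 map to hw ids 8, 12, 16, 20. */
static const unsigned int demo_table[4] = { 8, 12, 16, 20 };
```

This mirrors why the patch adds the NULL check: early device-tree matching runs before pacas exist, so the array is the only source of truth at that point.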


[PATCH 07/14] powerpc/64: move default SPR recording

2018-02-13 Thread Nicholas Piggin
Move this into the early setup code, and don't iterate over CPU masks.
We don't want to call into sysfs so early from setup, and a future patch
won't initialize CPU masks by the time this is called.
---
 arch/powerpc/kernel/paca.c |  3 +++
 arch/powerpc/kernel/setup.h|  9 +++--
 arch/powerpc/kernel/setup_64.c |  8 
 arch/powerpc/kernel/sysfs.c| 18 +++---
 4 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 2699f9009286..e560072f122b 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -133,6 +133,9 @@ void __init initialise_paca(struct paca_struct *new_paca, 
int cpu)
new_paca->kexec_state = KEXEC_STATE_NONE;
new_paca->__current = &init_task;
new_paca->data_offset = 0xfeeeeeeeeeeeeeeeULL;
+#ifdef CONFIG_PPC64
+   new_paca->dscr_default = spr_default_dscr;
+#endif
 #ifdef CONFIG_PPC_BOOK3S_64
new_paca->slb_shadow_ptr = NULL;
 #endif
diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
index 3fc11e30308f..d144df54ad40 100644
--- a/arch/powerpc/kernel/setup.h
+++ b/arch/powerpc/kernel/setup.h
@@ -45,14 +45,11 @@ void emergency_stack_init(void);
 static inline void emergency_stack_init(void) { };
 #endif
 
-#ifdef CONFIG_PPC64
-void record_spr_defaults(void);
-#else
-static inline void record_spr_defaults(void) { };
-#endif
-
 #ifdef CONFIG_PPC64
 u64 ppc64_bolted_size(void);
+
+/* Default SPR values from firmware/kexec */
+extern unsigned long spr_default_dscr;
 #endif
 
 /*
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3ce12af4906f..dde34d35d1e7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -254,6 +254,14 @@ static void cpu_ready_for_interrupts(void)
get_paca()->kernel_msr = MSR_KERNEL;
 }
 
+unsigned long spr_default_dscr = 0;
+
+void __init record_spr_defaults(void)
+{
+   if (early_cpu_has_feature(CPU_FTR_DSCR))
+   spr_default_dscr = mfspr(SPRN_DSCR);
+}
+
 /*
  * Early initialization entry point. This is called by head.S
  * with MMU translation disabled. We rely on the "feature" of
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 1f9d94dac3a6..ab4eb61fe659 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -588,21 +588,17 @@ static DEVICE_ATTR(dscr_default, 0600,
 
 static void sysfs_create_dscr_default(void)
 {
-   int err = 0;
-   if (cpu_has_feature(CPU_FTR_DSCR))
-   err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
-}
-
-void __init record_spr_defaults(void)
-{
-   int cpu;
-
if (cpu_has_feature(CPU_FTR_DSCR)) {
-   dscr_default = mfspr(SPRN_DSCR);
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+   int err = 0;
+   int cpu;
+
+   for_each_possible_cpu(cpu)
paca_ptrs[cpu]->dscr_default = dscr_default;
+
+   err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
}
 }
+
 #endif /* CONFIG_PPC64 */
 
 #ifdef HAS_PPC_PMC_PA6T
-- 
2.16.1



[PATCH 06/14] powerpc/mm/numa: move numa topology discovery earlier

2018-02-13 Thread Nicholas Piggin
Split sparsemem initialisation from basic numa topology discovery.
Move the parsing earlier in boot, before pacas are allocated.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/setup.h   |  1 +
 arch/powerpc/kernel/setup-common.c |  3 +++
 arch/powerpc/mm/mem.c  |  5 -
 arch/powerpc/mm/numa.c | 32 +++-
 4 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 469b7fdc9be4..d2bf233aebd5 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -23,6 +23,7 @@ extern void reloc_got2(unsigned long);
 #define PTRRELOC(x)((typeof(x)) add_reloc_offset((unsigned long)(x)))
 
 void check_for_initrd(void);
+void mem_topology_setup(void);
 void initmem_init(void);
 void setup_panic(void);
 #define ARCH_PANIC_TIMEOUT 180
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index d73ec518ef80..9eaf26318d20 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -888,6 +888,9 @@ void __init setup_arch(char **cmdline_p)
/* Check the SMT related command line arguments (ppc64). */
check_smt_enabled();
 
+   /* Parse memory topology */
+   mem_topology_setup();
+
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index fe8c61149fb8..4eee46ea4d96 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -212,7 +212,7 @@ walk_system_ram_range(unsigned long start_pfn, unsigned 
long nr_pages,
 EXPORT_SYMBOL_GPL(walk_system_ram_range);
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
-void __init initmem_init(void)
+void __init mem_topology_setup(void)
 {
max_low_pfn = max_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
min_low_pfn = MEMORY_START >> PAGE_SHIFT;
@@ -224,7 +224,10 @@ void __init initmem_init(void)
 * memblock_regions
 */
memblock_set_node(0, (phys_addr_t)ULLONG_MAX, &memblock.memory, 0);
+}
 
+void __init initmem_init(void)
+{
/* XXX need to clip this if using highmem? */
sparse_memory_present_with_active_regions(0);
sparse_init();
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9c3eb62bced5..57a5029b4521 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -831,18 +831,13 @@ static void __init find_possible_nodes(void)
of_node_put(rtas);
 }
 
-void __init initmem_init(void)
+void __init mem_topology_setup(void)
 {
-   int nid, cpu;
-
-   max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
-   max_pfn = max_low_pfn;
+   int cpu;
 
if (parse_numa_properties())
setup_nonnuma();
 
-   memblock_dump_all();
-
/*
 * Modify the set of possible NUMA nodes to reflect information
 * available about the set of online nodes, and the set of nodes
@@ -853,6 +848,23 @@ void __init initmem_init(void)
 
find_possible_nodes();
 
+   setup_node_to_cpumask_map();
+
+   reset_numa_cpu_lookup_table();
+
+   for_each_present_cpu(cpu)
+   numa_setup_cpu(cpu);
+}
+
+void __init initmem_init(void)
+{
+   int nid;
+
+   max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
+   max_pfn = max_low_pfn;
+
+   memblock_dump_all();
+
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 
@@ -863,10 +875,6 @@ void __init initmem_init(void)
 
sparse_init();
 
-   setup_node_to_cpumask_map();
-
-   reset_numa_cpu_lookup_table();
-
/*
 * We need the numa_cpu_lookup_table to be accurate for all CPUs,
 * even before we online them, so that we can use cpu_to_{node,mem}
@@ -876,8 +884,6 @@ void __init initmem_init(void)
 */
	cpuhp_setup_state_nocalls(CPUHP_POWER_NUMA_PREPARE, "powerpc/numa:prepare",
  ppc_numa_cpu_prepare, ppc_numa_cpu_dead);
-   for_each_present_cpu(cpu)
-   numa_setup_cpu(cpu);
 }
 
 static int __init early_numa(char *p)
-- 
2.16.1



[PATCH 05/14] mm: make memblock_alloc_base_nid non-static

2018-02-13 Thread Nicholas Piggin
This will be used by powerpc to allocate per-cpu stacks and other
data structures node-local where possible.

Signed-off-by: Nicholas Piggin 
---
 include/linux/memblock.h | 5 -
 mm/memblock.c| 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 8be5077efb5f..8cab51398705 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -316,9 +316,12 @@ static inline bool memblock_bottom_up(void)
 #define MEMBLOCK_ALLOC_ANYWHERE	(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE  0
 
-phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
+phys_addr_t memblock_alloc_range(phys_addr_t size, phys_addr_t align,
phys_addr_t start, phys_addr_t end,
ulong flags);
+phys_addr_t memblock_alloc_base_nid(phys_addr_t size,
+   phys_addr_t align, phys_addr_t max_addr,
+   int nid, ulong flags);
 phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
phys_addr_t max_addr);
 phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..cea2af494da0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1190,7 +1190,7 @@ phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
flags);
 }
 
-static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
+phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t max_addr,
int nid, ulong flags)
 {
-- 
2.16.1



[PATCH 04/14] powerpc/64s: allocate slb_shadow structures individually

2018-02-13 Thread Nicholas Piggin
Allocate slb_shadow structures individually.

slb_shadow structures are not allocated in a radix environment.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/paca.c | 65 +-
 1 file changed, 30 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 6cddb9bdc151..2699f9009286 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -72,41 +72,28 @@ static void __init free_lppaca(struct lppaca *lp)
 #ifdef CONFIG_PPC_BOOK3S_64
 
 /*
- * 3 persistent SLBs are registered here.  The buffer will be zero
+ * 3 persistent SLBs are allocated here.  The buffer will be zero
  * initially, hence will all be invalid until we actually write them.
  *
  * If you make the number of persistent SLB entries dynamic, please also
  * update PR KVM to flush and restore them accordingly.
  */
-static struct slb_shadow * __initdata slb_shadow;
-
-static void __init allocate_slb_shadows(int nr_cpus, int limit)
-{
-   int size = PAGE_ALIGN(sizeof(struct slb_shadow) * nr_cpus);
-
-   if (early_radix_enabled())
-   return;
-
-   slb_shadow = __va(memblock_alloc_base(size, PAGE_SIZE, limit));
-   memset(slb_shadow, 0, size);
-}
-
-static struct slb_shadow * __init init_slb_shadow(int cpu)
+static struct slb_shadow * __init new_slb_shadow(int cpu, unsigned long limit)
 {
struct slb_shadow *s;
 
-   if (early_radix_enabled())
-   return NULL;
+   if (cpu != boot_cpuid) {
+   /*
+* Boot CPU comes here before early_radix_enabled
+* is parsed (e.g., for disable_radix). So allocate
+* always and this will be fixed up in free_unused_pacas.
+*/
+   if (early_radix_enabled())
+   return NULL;
+   }
 
-   s = &slb_shadow[cpu];
-
-   /*
-* When we come through here to initialise boot_paca, the slb_shadow
-* buffers are not allocated yet. That's OK, we'll get one later in
-* boot, but make sure we don't corrupt memory at 0.
-*/
-   if (!slb_shadow)
-   return NULL;
+   s = __va(memblock_alloc_base(sizeof(*s), L1_CACHE_BYTES, limit));
+   memset(s, 0, sizeof(*s));
 
s->persistent = cpu_to_be32(SLB_NUM_BOLTED);
s->buffer_length = cpu_to_be32(sizeof(*s));
@@ -114,10 +101,6 @@ static struct slb_shadow * __init init_slb_shadow(int cpu)
return s;
 }
 
-#else /* !CONFIG_PPC_BOOK3S_64 */
-
-static void __init allocate_slb_shadows(int nr_cpus, int limit) { }
-
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 /* The Paca is an array with one entry per processor.  Each contains an
@@ -151,7 +134,7 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
	new_paca->__current = &init_task;
new_paca->data_offset = 0xfeeeULL;
 #ifdef CONFIG_PPC_BOOK3S_64
-   new_paca->slb_shadow_ptr = init_slb_shadow(cpu);
+   new_paca->slb_shadow_ptr = NULL;
 #endif
 
 #ifdef CONFIG_PPC_BOOK3E
@@ -222,13 +205,16 @@ void __init allocate_pacas(void)
printk(KERN_DEBUG "Allocated %lu bytes for %u pacas\n",
size, nr_cpu_ids);
 
-   allocate_slb_shadows(nr_cpu_ids, limit);
-
/* Can't use for_each_*_cpu, as they aren't functional yet */
for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-   initialise_paca(paca_ptrs[cpu], cpu);
+   struct paca_struct *paca = paca_ptrs[cpu];
+
+   initialise_paca(paca, cpu);
 #ifdef CONFIG_PPC_PSERIES
-   paca_ptrs[cpu]->lppaca_ptr = new_lppaca(cpu, limit);
+   paca->lppaca_ptr = new_lppaca(cpu, limit);
+#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
 #endif
}
 }
@@ -263,6 +249,15 @@ void __init free_unused_pacas(void)
 
paca_nr_cpu_ids = nr_cpu_ids;
paca_ptrs_size = new_ptrs_size;
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (early_radix_enabled()) {
+   /* Ugly fixup, see new_slb_shadow() */
+   memblock_free(__pa(paca_ptrs[boot_cpuid]->slb_shadow_ptr),
+   sizeof(struct slb_shadow));
+   paca_ptrs[boot_cpuid]->slb_shadow_ptr = NULL;
+   }
+#endif
 }
 
 void copy_mm_to_paca(struct mm_struct *mm)
-- 
2.16.1



[PATCH 03/14] powerpc/64s: allocate lppacas individually

2018-02-13 Thread Nicholas Piggin
Allocate LPPACAs individually.

We no longer allocate lppacas in an array, so this patch removes the 1kB
static alignment for the structure, and enforces the PAPR alignment
requirements at allocation time. We cannot reduce the 1kB allocation size
however, due to existing KVM hypervisors.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/lppaca.h  | 24 -
 arch/powerpc/kernel/machine_kexec_64.c | 15 --
 arch/powerpc/kernel/paca.c | 89 --
 arch/powerpc/kvm/book3s_hv.c   |  3 +-
 arch/powerpc/mm/numa.c |  4 +-
 arch/powerpc/platforms/pseries/kexec.c |  7 ++-
 6 files changed, 63 insertions(+), 79 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 6e4589eee2da..65d589689f01 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -36,14 +36,16 @@
 #include 
 
 /*
- * We only have to have statically allocated lppaca structs on
- * legacy iSeries, which supports at most 64 cpus.
- */
-#define NR_LPPACAS 1
-
-/*
- * The Hypervisor barfs if the lppaca crosses a page boundary.  A 1k
- * alignment is sufficient to prevent this
+ * The lppaca is the "virtual processor area" registered with the hypervisor,
+ * H_REGISTER_VPA etc.
+ *
+ * According to PAPR, the structure is 640 bytes long, must be L1 cache line
+ * aligned, and must not cross a 4kB boundary. Its size field must be at
+ * least 640 bytes (but may be more).
+ *
+ * Pre-v4.14 KVM hypervisors reject the VPA if its size field is smaller than
+ * 1kB, so we dynamically allocate 1kB and advertise size as 1kB, but keep
+ * this structure as the canonical 640 byte size.
  */
 struct lppaca {
/* cacheline 1 contains read-only data */
@@ -97,11 +99,9 @@ struct lppaca {
 
__be32  page_ins;   /* CMO Hint - # page ins by OS */
u8  reserved11[148];
-   volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
+   volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
u8  reserved12[96];
-} __attribute__((__aligned__(0x400)));
-
-extern struct lppaca lppaca[];
+} cacheline_aligned;
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index a250e3331f94..1044bf15d5ed 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -323,17 +323,24 @@ void default_machine_kexec(struct kimage *image)
kexec_stack.thread_info.cpu = current_thread_info()->cpu;
 
/* We need a static PACA, too; copy this CPU's PACA over and switch to
-* it.  Also poison per_cpu_offset to catch anyone using non-static
-* data.
+* it. Also poison per_cpu_offset and NULL lppaca to catch anyone using
+* non-static data.
 */
	memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
kexec_paca.data_offset = 0xedeaddeadeeeUL;
+#ifdef CONFIG_PPC_PSERIES
+   kexec_paca.lppaca_ptr = NULL;
+#endif
	paca_ptrs[kexec_paca.paca_index] = &kexec_paca;
+
	setup_paca(&kexec_paca);
 
-   /* XXX: If anyone does 'dynamic lppacas' this will also need to be
-* switched to a static version!
+   /*
+* The lppaca should be unregistered at this point so the HV won't
+* touch it. In the case of a crash, none of the lppacas are
+* unregistered so there is not much we can do about it here.
 */
+
/*
 * On Book3S, the copy must happen with the MMU off if we are either
 * using Radix page tables or we are not in an LPAR since we can
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index eef4891c9af6..6cddb9bdc151 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -23,82 +23,50 @@
 #ifdef CONFIG_PPC_PSERIES
 
 /*
- * The structure which the hypervisor knows about - this structure
- * should not cross a page boundary.  The vpa_init/register_vpa call
- * is now known to fail if the lppaca structure crosses a page
- * boundary.  The lppaca is also used on POWER5 pSeries boxes.
- * The lppaca is 640 bytes long, and cannot readily
- * change since the hypervisor knows its layout, so a 1kB alignment
- * will suffice to ensure that it doesn't cross a page boundary.
+ * See asm/lppaca.h for more detail.
+ *
+ * lppaca structures must be 1kB in size, L1 cache line aligned,
+ * and not cross 4kB boundary. A 1kB size and 1kB alignment will satisfy
+ * these requirements.
  */
-struct lppaca lppaca[] = {
-   [0 ... (NR_LPPACAS-1)] = {
+static inline void init_lppaca(struct lppaca *lppaca)
+{
+   BUILD_BUG_ON(sizeof(struct lppaca) != 640);
+
+   *lppaca = (struct lppaca) {
.desc = cpu_to_be32(0xd397d781),/* "LpPa" */
-   .size = cpu_to_be16(sizeof(struct lppaca)),
+   

[PATCH 02/14] powerpc/64: Use array of paca pointers and allocate pacas individually

2018-02-13 Thread Nicholas Piggin
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot,
rather than allocating all possible ones up-front and then freeing the
unused ones.

This adds slightly more overhead (one additional indirection) for
cross-CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  8 ++--
 arch/powerpc/include/asm/lppaca.h|  2 +-
 arch/powerpc/include/asm/paca.h  |  4 +-
 arch/powerpc/include/asm/smp.h   |  4 +-
 arch/powerpc/kernel/crash.c  |  2 +-
 arch/powerpc/kernel/head_64.S| 19 
 arch/powerpc/kernel/machine_kexec_64.c   | 22 -
 arch/powerpc/kernel/paca.c   | 70 +++-
 arch/powerpc/kernel/setup_64.c   | 23 -
 arch/powerpc/kernel/smp.c| 10 ++--
 arch/powerpc/kernel/sysfs.c  |  2 +-
 arch/powerpc/kvm/book3s_hv.c | 31 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c |  2 +-
 arch/powerpc/mm/tlb-radix.c  |  2 +-
 arch/powerpc/platforms/85xx/smp.c|  8 ++--
 arch/powerpc/platforms/cell/smp.c|  4 +-
 arch/powerpc/platforms/powernv/idle.c| 13 +++---
 arch/powerpc/platforms/powernv/setup.c   |  4 +-
 arch/powerpc/platforms/powernv/smp.c |  2 +-
 arch/powerpc/platforms/powernv/subcore.c |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 +-
 arch/powerpc/platforms/pseries/lpar.c|  4 +-
 arch/powerpc/platforms/pseries/setup.c   |  2 +-
 arch/powerpc/platforms/pseries/smp.c |  4 +-
 arch/powerpc/sysdev/xics/icp-native.c|  2 +-
 arch/powerpc/xmon/xmon.c |  2 +-
 26 files changed, 143 insertions(+), 107 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 9db18287b5f4..8908481cdfd7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -432,15 +432,15 @@ struct openpic;
 extern void kvm_cma_reserve(void) __init;
 static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 {
-   paca[cpu].kvm_hstate.xics_phys = (void __iomem *)addr;
+   paca_ptrs[cpu]->kvm_hstate.xics_phys = (void __iomem *)addr;
 }
 
 static inline void kvmppc_set_xive_tima(int cpu,
unsigned long phys_addr,
void __iomem *virt_addr)
 {
-   paca[cpu].kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
-   paca[cpu].kvm_hstate.xive_tima_virt = virt_addr;
+   paca_ptrs[cpu]->kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
+   paca_ptrs[cpu]->kvm_hstate.xive_tima_virt = virt_addr;
 }
 
 static inline u32 kvmppc_get_xics_latch(void)
@@ -454,7 +454,7 @@ static inline u32 kvmppc_get_xics_latch(void)
 
 static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
 {
-   paca[cpu].kvm_hstate.host_ipi = host_ipi;
+   paca_ptrs[cpu]->kvm_hstate.host_ipi = host_ipi;
 }
 
 static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index d0a2a2f99564..6e4589eee2da 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -103,7 +103,7 @@ struct lppaca {
 
 extern struct lppaca lppaca[];
 
-#define lppaca_of(cpu) (*paca[cpu].lppaca_ptr)
+#define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
 /*
  * We are using a non architected field to determine if a partition is
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 57fe8aa0c257..f266b0a7be95 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -246,10 +246,10 @@ struct paca_struct {
void *rfi_flush_fallback_area;
u64 l1d_flush_size;
 #endif
-};
+} cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
-extern struct paca_struct *paca;
+extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
 extern void allocate_pacas(void);
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index fac963e10d39..ec7b299350d9 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -170,12 +170,12 @@ static inline const struct cpumask *cpu_sibling_mask(int cpu)
 #ifdef CONFIG_PPC64
 static inline int get_hard_smp_processor_id(int cpu)
 {
-   return paca[cpu].hw_cpu_id;
+   return paca_ptrs[cpu]->hw_cpu_id;
 }
 
 static inline void set_hard_smp_processor_id(int cpu, int phys)
 {
-   

[PATCH 01/14] powerpc/64s: do not allocate lppaca if we are not virtualized

2018-02-13 Thread Nicholas Piggin
The "lppaca" is a structure registered with the hypervisor. This
is unnecessary when running on non-virtualised platforms. One field
from the lppaca (pmcregs_in_use) is also used by the host, so move
the host part out into the paca (lppaca field is still updated in
guest mode).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/paca.h |  9 +++--
 arch/powerpc/include/asm/pmc.h  | 13 -
 arch/powerpc/kernel/asm-offsets.c   |  5 +
 arch/powerpc/kernel/paca.c  | 16 +---
 arch/powerpc/kvm/book3s_hv_interrupts.S |  3 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  3 +--
 6 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index b62c31037cad..57fe8aa0c257 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -58,7 +58,7 @@ struct task_struct;
  * processor.
  */
 struct paca_struct {
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
/*
 * Because hw_cpu_id, unlike other paca fields, is accessed
 * routinely from other CPUs (from the IRQ code), we stick to
@@ -67,7 +67,8 @@ struct paca_struct {
 */
 
struct lppaca *lppaca_ptr;  /* Pointer to LpPaca for PLIC */
-#endif /* CONFIG_PPC_BOOK3S */
+#endif /* CONFIG_PPC_PSERIES */
+
/*
 * MAGIC: the spinlock functions in arch/powerpc/lib/locks.c 
 * load lock_token and paca_index with a single lwz
@@ -160,10 +161,14 @@ struct paca_struct {
u64 saved_msr;  /* MSR saved here by enter_rtas */
u16 trap_save;  /* Used when bad stack is encountered */
u8 irq_soft_mask;   /* mask for irq soft masking */
+   u8 soft_enabled;/* irq soft-enable flag */
u8 irq_happened;/* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
	u8 irq_work_pending;/* IRQ_WORK interrupt while soft-disable */
u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   u8 pmcregs_in_use;  /* pseries puts this in lppaca */
+#endif
u64 sprg_vdso;  /* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
u64 tm_scratch; /* TM scratch area for reclaim */
diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index 5a9ede4962cb..7ac3586c38ab 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -31,10 +31,21 @@ void ppc_enable_pmcs(void);
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
+#include 
 
 static inline void ppc_set_pmu_inuse(int inuse)
 {
-   get_lppaca()->pmcregs_in_use = inuse;
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+#ifdef CONFIG_PPC_PSERIES
+   get_lppaca()->pmcregs_in_use = inuse;
+#endif
+   } else {
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   get_paca()->pmcregs_in_use = inuse;
+#endif
+   }
+#endif
 }
 
 extern void power4_enable_pmcs(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 88b84ac76b53..b9b52490acfd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -221,12 +221,17 @@ int main(void)
OFFSET(PACA_EXMC, paca_struct, exmc);
OFFSET(PACA_EXSLB, paca_struct, exslb);
OFFSET(PACA_EXNMI, paca_struct, exnmi);
+#ifdef CONFIG_PPC_PSERIES
OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
+#endif
OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
	OFFSET(SLBSHADOW_STACKVSID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid);
	OFFSET(SLBSHADOW_STACKESID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].esid);
OFFSET(SLBSHADOW_SAVEAREA, slb_shadow, save_area);
OFFSET(LPPACA_PMCINUSE, lppaca, pmcregs_in_use);
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   OFFSET(PACA_PMCINUSE, paca_struct, pmcregs_in_use);
+#endif
OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx);
OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count);
OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 95ffedf14885..5900540e2ff8 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -20,7 +20,7 @@
 
 #include "setup.h"
 
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
 
 /*
  * The structure which the hypervisor knows about - this structure
@@ -47,6 +47,9 @@ static long __initdata lppaca_size;
 
 static void __init allocate_lppacas(int nr_cpus, unsigned long limit)
 {
+   if (early_cpu_has_feature(CPU_FTR_HVMODE))
+   return;
+
if (nr_cpus <= NR_LPPACAS)
return;
 
@@ -60,6 +63,9 @@ 

[PATCH 00/14] numa aware allocation for pacas, stacks, pagetables

2018-02-13 Thread Nicholas Piggin
This series allows numa aware allocations for various early data
structures for radix. Hash still has a bolted SLB limitation that
prevents at least pacas and stacks from node-affine allocations.

Fixed up a number of bugs, got pSeries working, added a couple more
cases where page tables can be allocated node-local.

Thanks,
Nick

Nicholas Piggin (14):
  powerpc/64s: do not allocate lppaca if we are not virtualized
  powerpc/64: Use array of paca pointers and allocate pacas individually
  powerpc/64s: allocate lppacas individually
  powerpc/64s: allocate slb_shadow structures individually
  mm: make memblock_alloc_base_nid non-static
  powerpc/mm/numa: move numa topology discovery earlier
  powerpc/64: move default SPR recording
  powerpc/setup: cpu_to_phys_id array
  powerpc/64: defer paca allocation until memory topology is discovered
  powerpc/64: allocate pacas per node
  powerpc/64: allocate per-cpu stacks node-local if possible
  powerpc: pass node id into create_section_mapping
  powerpc/64s/radix: split early page table mapping to its own function
  powerpc/64s/radix: allocate kernel page tables node-local if possible

 arch/powerpc/include/asm/book3s/64/hash.h|   2 +-
 arch/powerpc/include/asm/book3s/64/radix.h   |   2 +-
 arch/powerpc/include/asm/kvm_ppc.h   |   8 +-
 arch/powerpc/include/asm/lppaca.h|  26 +--
 arch/powerpc/include/asm/paca.h  |  16 +-
 arch/powerpc/include/asm/pmc.h   |  13 +-
 arch/powerpc/include/asm/setup.h |   1 +
 arch/powerpc/include/asm/smp.h   |   5 +-
 arch/powerpc/include/asm/sparsemem.h |   2 +-
 arch/powerpc/kernel/asm-offsets.c|   5 +
 arch/powerpc/kernel/crash.c  |   2 +-
 arch/powerpc/kernel/head_64.S|  19 ++-
 arch/powerpc/kernel/machine_kexec_64.c   |  37 +++--
 arch/powerpc/kernel/paca.c   | 238 ++-
 arch/powerpc/kernel/prom.c   |  12 +-
 arch/powerpc/kernel/setup-common.c   |  30 +++-
 arch/powerpc/kernel/setup.h  |   9 +-
 arch/powerpc/kernel/setup_64.c   |  80 ++---
 arch/powerpc/kernel/smp.c|  10 +-
 arch/powerpc/kernel/sysfs.c  |  18 +-
 arch/powerpc/kvm/book3s_hv.c |  34 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c |   2 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S  |   3 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   3 +-
 arch/powerpc/mm/hash_utils_64.c  |   2 +-
 arch/powerpc/mm/mem.c|   9 +-
 arch/powerpc/mm/numa.c   |  36 ++--
 arch/powerpc/mm/pgtable-book3s64.c   |   6 +-
 arch/powerpc/mm/pgtable-radix.c  | 203 ---
 arch/powerpc/mm/tlb-radix.c  |   2 +-
 arch/powerpc/platforms/85xx/smp.c|   8 +-
 arch/powerpc/platforms/cell/smp.c|   4 +-
 arch/powerpc/platforms/powernv/idle.c|  13 +-
 arch/powerpc/platforms/powernv/setup.c   |   4 +-
 arch/powerpc/platforms/powernv/smp.c |   2 +-
 arch/powerpc/platforms/powernv/subcore.c |   2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   2 +-
 arch/powerpc/platforms/pseries/kexec.c   |   7 +-
 arch/powerpc/platforms/pseries/lpar.c|   4 +-
 arch/powerpc/platforms/pseries/setup.c   |   2 +-
 arch/powerpc/platforms/pseries/smp.c |   4 +-
 arch/powerpc/sysdev/xics/icp-native.c|   2 +-
 arch/powerpc/xmon/xmon.c |   2 +-
 include/linux/memblock.h |   5 +-
 mm/memblock.c|   2 +-
 45 files changed, 543 insertions(+), 355 deletions(-)

-- 
2.16.1



Re: pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-13 Thread Christoph Hellwig
Does this fix your warning?

diff --git a/drivers/macintosh/macio_asic.c b/drivers/macintosh/macio_asic.c
index 62f541f968f6..07074820a167 100644
--- a/drivers/macintosh/macio_asic.c
+++ b/drivers/macintosh/macio_asic.c
@@ -375,6 +375,7 @@ static struct macio_dev * macio_add_one_device(struct macio_chip *chip,
dev->ofdev.dev.of_node = np;
	dev->ofdev.archdata.dma_mask = 0xffffffffUL;
	dev->ofdev.dev.dma_mask = &dev->ofdev.archdata.dma_mask;
+   dev->ofdev.dev.coherent_dma_mask = dev->ofdev.archdata.dma_mask;
dev->ofdev.dev.parent = parent;
dev->ofdev.dev.bus = _bus_type;
dev->ofdev.dev.release = macio_release_dev;


Re: [PATCH] powerpc/xive: use hw CPU ids when configuring the CPU queues

2018-02-13 Thread Cédric Le Goater
On 02/13/2018 10:18 AM, Michael Ellerman wrote:
> Cédric Le Goater  writes:
> 
>> The CPU event notification queues on sPAPR should be configured using
>> a hardware CPU identifier.
>>
>> The problem did not show up on the Power Hypervisor because pHyp
>> supports 8 threads per core which keeps CPU number contiguous. This is
>> not the case on all sPAPR virtual machines, some use SMT=1.
>>
>> Also improve error logging by adding the CPU number.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>
>>  I think we should send this one to stable also.
> 
> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt 
> controller")

yes.

> Cc: sta...@vger.kernel.org # v4.14+

yes. I just added the Cc:. I am not sure that will work with 
patchwork though.

Thanks,

C. 


> ?
> 
> cheers
> 
>> diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
>> index d9c4c9366049..091f1d0d0af1 100644
>> --- a/arch/powerpc/sysdev/xive/spapr.c
>> +++ b/arch/powerpc/sysdev/xive/spapr.c
>> @@ -356,7 +356,8 @@ static int xive_spapr_configure_queue(u32 target, struct xive_q *q, u8 prio,
>>  
>>  rc = plpar_int_get_queue_info(0, target, prio, _page, _size);
>>  if (rc) {
>> -pr_err("Error %lld getting queue info prio %d\n", rc, prio);
>> +pr_err("Error %lld getting queue info CPU %d prio %d\n", rc,
>> +   target, prio);
>>  rc = -EIO;
>>  goto fail;
>>  }
>> @@ -370,7 +371,8 @@ static int xive_spapr_configure_queue(u32 target, struct xive_q *q, u8 prio,
>>  /* Configure and enable the queue in HW */
>>  rc = plpar_int_set_queue_config(flags, target, prio, qpage_phys, order);
>>  if (rc) {
>> -pr_err("Error %lld setting queue for prio %d\n", rc, prio);
>> +pr_err("Error %lld setting queue for CPU %d prio %d\n", rc,
>> +   target, prio);
>>  rc = -EIO;
>>  } else {
>>  q->qpage = qpage;
>> @@ -389,8 +391,8 @@ static int xive_spapr_setup_queue(unsigned int cpu, struct xive_cpu *xc,
>>  if (IS_ERR(qpage))
>>  return PTR_ERR(qpage);
>>  
>> -return xive_spapr_configure_queue(cpu, q, prio, qpage,
>> -  xive_queue_shift);
>> +return xive_spapr_configure_queue(get_hard_smp_processor_id(cpu),
>> +  q, prio, qpage, xive_queue_shift);
>>  }
>>  
>>  static void xive_spapr_cleanup_queue(unsigned int cpu, struct xive_cpu *xc,
>> @@ -399,10 +401,12 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, struct xive_cpu *xc,
>>  struct xive_q *q = &xc->queue[prio];
>>  unsigned int alloc_order;
>>  long rc;
>> +int hw_cpu = get_hard_smp_processor_id(cpu);
>>  
>> -rc = plpar_int_set_queue_config(0, cpu, prio, 0, 0);
>> +rc = plpar_int_set_queue_config(0, hw_cpu, prio, 0, 0);
>>  if (rc)
>> -pr_err("Error %ld setting queue for prio %d\n", rc, prio);
>> +pr_err("Error %ld setting queue for CPU %d prio %d\n", rc,
>> +   hw_cpu, prio);
>>  
>>  alloc_order = xive_alloc_order(xive_queue_shift);
>>  free_pages((unsigned long)q->qpage, alloc_order);
>> -- 
>> 2.13.6



Re: [PATCH 1/2] powerpc/kdump: Add missing optional dummy functions

2018-02-13 Thread Michael Ellerman
Guenter Roeck  writes:

> If KEXEC_CORE is not enabled, PowerNV builds fail as follows.
>
> arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
> arch/powerpc/platforms/powernv/smp.c:236:4: error:
>   implicit declaration of function 'crash_ipi_callback'
>
> Add dummy function calls, similar to kdump_in_progress(), to solve the
> problem.
>
> Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel ...")

Personally I prefer these untruncated.

> Cc: Balbir Singh 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Signed-off-by: Guenter Roeck 
> ---
>  arch/powerpc/include/asm/kexec.h | 6 ++
>  1 file changed, 6 insertions(+)

Thanks. Nathan sent a fix for this a few days ago, but I like this
version because it uses the existing #ifdefs.

I've taken this version but used the subject from his patch.

cheers

> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index 9dcbfa6bbb91..d8b1e8e7e035 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -140,6 +140,12 @@ static inline bool kdump_in_progress(void)
>   return false;
>  }
>  
> +static inline void crash_ipi_callback(struct pt_regs *regs) { }
> +
> +static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
> +{
> +}
> +
>  #endif /* CONFIG_KEXEC_CORE */
>  #endif /* ! __ASSEMBLY__ */
>  #endif /* __KERNEL__ */
> -- 
> 2.7.4


Re: [PATCH kernel v2] powerpc/mm: Flush radix process translations when setting MMU type

2018-02-13 Thread Laurent Vivier
On 07/02/2018 18:49, Daniel Henrique Barboza wrote:
> 
> 
> On 02/07/2018 12:33 PM, Laurent Vivier wrote:
>> On 01/02/2018 06:09, Alexey Kardashevskiy wrote:
>>> Radix guests normally invalidate process-scoped translations when a
>>> new pid is allocated, but migrated guests do not, so migrated guests
>>> sometimes crash. This is especially easy to reproduce when migration
>>> happens within the first 10 seconds after guest boot starts on the
>>> same machine.
>>>
>>> This adds the "Invalidate process-scoped translations" flush to fix
>>> radix guests migration.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>> Changes:
>>> v2:
>>> * removed PPC_TLBIE_5() from the !(old_HR) case as it is pointless
>>> on hash
>>>
>>> ---
>>>
>>>
>>> Not so sure that "process-scoped translations" only require flushing
>>> at pid allocation and migration.
>>>
>>> ---
>>>   arch/powerpc/mm/pgtable_64.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
>>> index c9a623c..d75dd52 100644
>>> --- a/arch/powerpc/mm/pgtable_64.c
>>> +++ b/arch/powerpc/mm/pgtable_64.c
>>> @@ -471,6 +471,8 @@ void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
>>>   if (old & PATB_HR) {
>>>   asm volatile(PPC_TLBIE_5(%0,%1,2,0,1) : :
>>>    "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
>>> +    asm volatile(PPC_TLBIE_5(%0,%1,2,1,1) : :
>>> + "r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
>>>   trace_tlbie(lpid, 0, TLBIEL_INVAL_SET_LPID, lpid, 2, 0, 1);
>>>   } else {
>>>   asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
>>>
>> This patch fixes for me a VM migration crash on POWER9.
> 
> Same here.
> 
> Tested-by: Daniel Henrique Barboza 
> 
>>
>> Tested-by: Laurent Vivier 

Any hope of having this patch merged soon?

It fixes a real problem, and VM migration is not reliable without it.

Thanks,
Laurent



Re: [RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

2018-02-13 Thread Christophe LEROY



Le 13/02/2018 à 09:40, Nicholas Piggin a écrit :

On Mon, 12 Feb 2018 18:42:21 +0100
Christophe LEROY  wrote:


Le 12/02/2018 à 16:24, Nicholas Piggin a écrit :

On Mon, 12 Feb 2018 16:02:23 +0100
Christophe LEROY  wrote:
   

Le 10/02/2018 à 09:11, Nicholas Piggin a écrit :

This series intends to improve performance and reduce stack
consumption in the slice allocation code. It does this by keeping slice
masks in the mm_context rather than computing them for each allocation,
and by moving bitmaps and slice_masks off the stack, using pointers
instead where possible.

checkstack.pl gives, before:
0x0de4 slice_get_unmapped_area [slice.o]:   656
0x1b4c is_hugepage_only_range [slice.o]:512
0x075c slice_find_area_topdown [slice.o]:   416
0x04c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
0x1aa0 slice_set_range_psize [slice.o]: 240
0x0a64 slice_find_area [slice.o]:   176
0x0174 slice_check_fit [slice.o]:   112

after:
0x0d70 slice_get_unmapped_area [slice.o]:   320
0x08f8 slice_find_area [slice.o]:   144
0x1860 slice_set_range_psize [slice.o]: 144
0x18ec is_hugepage_only_range [slice.o]:144
0x0750 slice_find_area_bottomup.isra.4 [slice.o]:   128

The benchmark in https://github.com/linuxppc/linux/issues/49 gives, before:
$ time ./slicemask
real0m20.712s
user0m5.830s
sys 0m15.105s

after:
$ time ./slicemask
real0m13.197s
user0m5.409s
sys 0m7.779s


Hi,

I tested your series on an 8xx, on top of patch
https://patchwork.ozlabs.org/patch/871675/

I don't get a result as significant as yours, but there is some
improvement anyway:

ITERATION 50

Before:

root@vgoip:~# time ./slicemask
real0m 33.26s
user0m 1.94s
sys 0m 30.85s

After:
root@vgoip:~# time ./slicemask
real0m 29.69s
user0m 2.11s
sys 0m 27.15s

The most significant improvement is obtained with the first patch of your series:
root@vgoip:~# time ./slicemask
real0m 30.85s
user0m 1.80s
sys 0m 28.57s


Okay, thanks. Are you still spending significant time in the slice
code?


Do you mean am I still updating my patches ? No I hope we are at last


Actually I was wondering about CPU time spent for the microbenchmark :)


Lol.

I've got the following perf report (functions over 0.50%)

# Overhead  CommandShared Object  Symbol
#   .  .  ..
#
 7.13%  slicemask  [kernel.kallsyms]  [k] do_brk_flags
 6.19%  slicemask  [kernel.kallsyms]  [k] DoSyscall
 5.81%  slicemask  [kernel.kallsyms]  [k] perf_event_mmap
 5.55%  slicemask  [kernel.kallsyms]  [k] do_munmap
 4.55%  slicemask  [kernel.kallsyms]  [k] sys_brk
 4.43%  slicemask  [kernel.kallsyms]  [k] find_vma
 3.42%  slicemask  [kernel.kallsyms]  [k] vma_compute_subtree_gap
 3.08%  slicemask  libc-2.23.so   [.] __brk
 2.95%  slicemask  [kernel.kallsyms]  [k] slice_get_unmapped_area
 2.81%  slicemask  [kernel.kallsyms]  [k] __vm_enough_memory
 2.78%  slicemask  [kernel.kallsyms]  [k] kmem_cache_free
 2.51%  slicemask  [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.84
 2.40%  slicemask  [kernel.kallsyms]  [k] unmap_page_range
 2.27%  slicemask  [kernel.kallsyms]  [k] perf_iterate_sb
 2.21%  slicemask  [kernel.kallsyms]  [k] vmacache_find
 2.04%  slicemask  [kernel.kallsyms]  [k] vma_gap_update
 1.91%  slicemask  [kernel.kallsyms]  [k] unmap_region
 1.81%  slicemask  [kernel.kallsyms]  [k] memset_nocache_branch
 1.59%  slicemask  [kernel.kallsyms]  [k] kmem_cache_alloc
 1.57%  slicemask  [kernel.kallsyms]  [k] get_unmapped_area.part.7
 1.55%  slicemask  [kernel.kallsyms]  [k] up_write
 1.44%  slicemask  [kernel.kallsyms]  [k] vma_merge
 1.28%  slicemask  slicemask  [.] main
 1.27%  slicemask  [kernel.kallsyms]  [k] lru_add_drain
 1.22%  slicemask  [kernel.kallsyms]  [k] vma_link
 1.19%  slicemask  [kernel.kallsyms]  [k] tlb_gather_mmu
 1.17%  slicemask  [kernel.kallsyms]  [k] tlb_flush_mmu_free
 1.15%  slicemask  libc-2.23.so   [.] got_label
 1.11%  slicemask  [kernel.kallsyms]  [k] unlink_anon_vmas
 1.06%  slicemask  [kernel.kallsyms]  [k] lru_add_drain_cpu
 1.02%  slicemask  [kernel.kallsyms]  [k] free_pgtables
 1.01%  slicemask  [kernel.kallsyms]  [k] remove_vma
 0.98%  slicemask  [kernel.kallsyms]  [k] strlcpy
 0.98%  slicemask  [kernel.kallsyms]  [k] perf_event_mmap_output
 0.95%  slicemask  [kernel.kallsyms]  [k] may_expand_vm
 0.90%  slicemask  [kernel.kallsyms]  [k] unmap_vmas
 0.86%  slicemask  [kernel.kallsyms]  [k] down_write_killable
 0.83%  slicemask  [kernel.kallsyms]  [k] __vma_link_list
 0.83%  slicemask  [kernel.kallsyms]  [k] arch_vma_name
 0.81%  slicemask  [kernel.kallsyms]  [k] 

[PATCH] cxl: Check if PSL data-cache is available before issue flush request

2018-02-13 Thread Vaibhav Jain
PSL9D doesn't have a data-cache that needs to be flushed before
resetting the card. However, when cxl tries to flush the data-cache on
such a card, the operation times out, because with no data-cache the
PSL_Control register never indicates that the flush has completed. This
is usually indicated in the kernel logs with this message:

"WARNING: cache flush timed out"

To fix this, the patch checks the PSL_Debug register CDC field (bit 27),
which indicates the absence of a data-cache, and sets a flag
'no_data_cache' in 'struct cxl_native' to record this. When
cxl_data_cache_flush() is called, it checks the flag and, if set, bails
out early without requesting a data-cache flush operation from the PSL.

Signed-off-by: Vaibhav Jain 
---
 drivers/misc/cxl/cxl.h|  4 
 drivers/misc/cxl/native.c | 11 ++-
 drivers/misc/cxl/pci.c| 19 +--
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 4f015da78f28..4949b8d5a748 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -369,6 +369,9 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An = {0x0A0};
 #define CXL_PSL_TFC_An_AE (1ull << (63-30)) /* Restart PSL with address error 
*/
 #define CXL_PSL_TFC_An_R  (1ull << (63-31)) /* Restart PSL transaction */
 
+/** CXL_PSL_DEBUG */
+#define CXL_PSL_DEBUG_CDC  (1ull << (63-27)) /* Coherent Data cache support */
+
 /** CXL_XSL9_IERAT_ERAT - CAIA 2 **/
 #define CXL_XSL9_IERAT_MLPID(1ull << (63-0))  /* Match LPID */
 #define CXL_XSL9_IERAT_MPID (1ull << (63-1))  /* Match PID */
@@ -669,6 +672,7 @@ struct cxl_native {
irq_hw_number_t err_hwirq;
unsigned int err_virq;
u64 ps_off;
+   bool no_data_cache; /* set if no data cache on the card */
const struct cxl_service_layer_ops *sl_ops;
 };
 
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 1b3d7c65ea3f..98f867fcef24 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -353,8 +353,17 @@ int cxl_data_cache_flush(struct cxl *adapter)
u64 reg;
unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
 
-   pr_devel("Flushing data cache\n");
+   /*
+    * Do a data-cache flush only if a data-cache is available.
+    * PSL9D has no data-cache, so the flush operation would
+    * time out on it.
+    */
+   if (adapter->native->no_data_cache) {
+   pr_devel("No PSL data cache. Ignoring cache flush req.\n");
+   return 0;
+   }
 
+   pr_devel("Flushing data cache\n");
reg = cxl_p1_read(adapter, CXL_PSL_Control);
reg |= CXL_PSL_Control_Fr;
cxl_p1_write(adapter, CXL_PSL_Control, reg);
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 758842f65a1b..39ddf89c3c14 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -456,6 +456,7 @@ static int init_implementation_adapter_regs_psl9(struct cxl 
*adapter,
u64 chipid;
u32 phb_index;
u64 capp_unit_id;
+   u64 psl_debug;
int rc;
 
rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id);
@@ -506,6 +507,16 @@ static int init_implementation_adapter_regs_psl9(struct 
cxl *adapter,
} else
cxl_p1_write(adapter, CXL_PSL9_DEBUG, 0x4000ULL);
 
+   /* Check if the PSL has a data-cache. We need to flush the adapter
+    * data-cache when the adapter is about to be removed, but the
+    * flush operation is not supported on P9 DD1.
+    */
+   psl_debug = cxl_p1_read(adapter, CXL_PSL9_DEBUG);
+   if (cxl_is_power9_dd1() || (psl_debug & CXL_PSL_DEBUG_CDC)) {
+   dev_info(&dev->dev, "No data-cache present\n");
+   adapter->native->no_data_cache = true;
+   }
+
return 0;
 }
 
@@ -1449,10 +1460,8 @@ int cxl_pci_reset(struct cxl *adapter)
 
/*
 * The adapter is about to be reset, so ignore errors.
-* Not supported on P9 DD1
 */
-   if ((cxl_is_power8()) || (!(cxl_is_power9_dd1())))
-   cxl_data_cache_flush(adapter);
+   cxl_data_cache_flush(adapter);
 
/* pcie_warm_reset requests a fundamental pci reset which includes a
 * PERST assert/deassert.  PERST triggers a loading of the image
@@ -1936,10 +1945,8 @@ static void cxl_pci_remove_adapter(struct cxl *adapter)
 
/*
 * Flush adapter datacache as its about to be removed.
-* Not supported on P9 DD1.
 */
-   if ((cxl_is_power8()) || (!(cxl_is_power9_dd1())))
-   cxl_data_cache_flush(adapter);
+   cxl_data_cache_flush(adapter);
 
cxl_deconfigure_adapter(adapter);
 
-- 
2.14.3



[PATCH V3] powerpc/mm/hash64: memset the pagetable pages on allocation.

2018-02-13 Thread Aneesh Kumar K.V
On powerpc we allocate page table pages from slab caches of different sizes. For
now we have a constructor that zeroes out the objects when we allocate them for
the first time. We expect the objects to be zeroed out when we free the object
back to the slab cache. This happens in the unmap path. For hugetlb pages we
call huge_pte_get_and_clear() to do that. With the current configuration of
page table sizes, both pud and pgd level tables get allocated from the same slab
cache. At the pud level, we use the second half of the table to store the slot
information, but we never clear that when unmapping. When such a freed object
gets allocated at the pgd level, part of the page table page is not initialized
correctly. This results in a kernel crash.

Simplify this by calling the object initialization after kmem_cache_alloc().

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 53df86d3cfce..e4d154a4d114 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -73,10 +73,13 @@ static inline void radix__pgd_free(struct mm_struct *mm, 
pgd_t *pgd)
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
+   pgd_t *pgd;
if (radix_enabled())
return radix__pgd_alloc(mm);
-   return kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
-   pgtable_gfp_flags(mm, GFP_KERNEL));
+   pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
+  pgtable_gfp_flags(mm, GFP_KERNEL));
+   memset(pgd, 0, PGD_TABLE_SIZE);
+   return pgd;
 }
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
-- 
2.14.3



Re: [PATCH 2/3] cxl: Introduce module parameter 'enable_psltrace'

2018-02-13 Thread Vaibhav Jain
Frederic Barrat  writes:

> Le 11/02/2018 à 18:10, Vaibhav Jain a écrit :
>> Thanks for reviewing the patch Christophe,
>> 
>> christophe lombard  writes:
 +bool cxl_enable_psltrace = true;
 +module_param_named(enable_psltrace, cxl_enable_psltrace, bool, 0600);
 +MODULE_PARM_DESC(enable_psltrace, "Set PSL traces on probe. default: on");
 +
>>> I don't really agree with adding a new parameter. It can cause confusion.
>>> PSL team has confirmed that enabling traces has no impact.
>>> Do you see any reason to disable the traces ?
>> 
>> Traces on the PSL follow a 'set and fetch' model: once the trace buffer for
>> a specific array is full, it stops and switches to the 'FIN' state, and at
>> that point we need to fetch the trace data and reinit the array to
>> re-arm it.
>
> If the PSL trace arrays don't wrap, is there anything to gain by 
> enabling tracing by default instead of letting the developer handle it 
> through sysfs? I was under the (now wrong) impression that the PSL would 
> wrap.
Enabling the traces early enough should let AFU developers debug init
issues, specifically for AFUs that rely on the cxl kernel APIs.

> I'm not a big fan of the module parameter. It seems we're giving a 
> second way of activating traces on top of sysfs, more cumbersome and 
> limited.
Yes, this indeed is providing a second way of activating traces on top
of sysfs. The way I see it, there are two ways PSL traces can be
managed:

1. Let userspace handle the state machine of the traces entirely via sysfs.
2. Let cxl handle the PSL trace machinery: it starts the traces when a card
is probed and stops them when the card is reset.

-- 
Vaibhav Jain 
Linux Technology Center, IBM India Pvt. Ltd.



Re: samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-13 Thread Michal Hocko
On Tue 13-02-18 21:16:55, Michael Ellerman wrote:
> Kees Cook  writes:
> 
> > On Mon, Feb 12, 2018 at 7:25 PM, Michael Ellerman  
> > wrote:
> >> Michal Hocko  writes:
> >>> Hi,
> >>> my build test machinery chokes on samples/seccomp when cross compiling
> >>> s390 and ppc64 allyesconfig. This has been the case for quite some
> >>> time already but I never found time to look at the problem and report
> >>> it. It seems this is not a new issue and a similar thing happened for
> >>> MIPS e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
> >>> cross-compiling for MIPS").
> >>>
> >>> The build logs are attached.
> >>>
> >>> What is the best way around this? Should we simply skip compilation on
> >>> cross compile or is actually anybody relying on that? Or should I simply
> >>> disable it for s390 and ppc?
> >>
> >> The whole thing seems very confused. It's not building for the target,
> >> it's building for the host, ie. the Makefile sets hostprogs-m and
> >> HOSTCFLAGS etc.
> >>
> >> So it can't possibly work with cross compiling as it's currently
> >> written.
> >>
> >> Either the Makefile needs some serious work to properly support cross
> >> compiling or it should just be disabled when cross compiling.
> >
> > Hrm, yeah, the goal was to entirely disable cross compiling, but I
> > guess we didn't hit it with a hard enough hammer. :)
> 
> Do you know why it is written that way? Why doesn't it just try to cross
> compile like normal code?

No idea, sorry. All I know about this code is that it breaks my build
testing.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] powerpc/xive: use hw CPU ids when configuring the CPU queues

2018-02-13 Thread Michael Ellerman
Cédric Le Goater  writes:

> The CPU event notification queues on sPAPR should be configured using
> a hardware CPU identifier.
>
> The problem did not show up on the Power Hypervisor because pHyp
> supports 8 threads per core which keeps CPU number contiguous. This is
> not the case on all sPAPR virtual machines, some use SMT=1.
>
> Also improve error logging by adding the CPU number.
>
> Signed-off-by: Cédric Le Goater 
> ---
>
>  I think we should send this one to stable also.

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt 
controller")
Cc: stable@vger.kernel.org # v4.14+

?

cheers

> diff --git a/arch/powerpc/sysdev/xive/spapr.c 
> b/arch/powerpc/sysdev/xive/spapr.c
> index d9c4c9366049..091f1d0d0af1 100644
> --- a/arch/powerpc/sysdev/xive/spapr.c
> +++ b/arch/powerpc/sysdev/xive/spapr.c
> @@ -356,7 +356,8 @@ static int xive_spapr_configure_queue(u32 target, struct 
> xive_q *q, u8 prio,
>  
>   rc = plpar_int_get_queue_info(0, target, prio, &esn_page, &esn_size);
>   if (rc) {
> - pr_err("Error %lld getting queue info prio %d\n", rc, prio);
> + pr_err("Error %lld getting queue info CPU %d prio %d\n", rc,
> +target, prio);
>   rc = -EIO;
>   goto fail;
>   }
> @@ -370,7 +371,8 @@ static int xive_spapr_configure_queue(u32 target, struct 
> xive_q *q, u8 prio,
>   /* Configure and enable the queue in HW */
>   rc = plpar_int_set_queue_config(flags, target, prio, qpage_phys, order);
>   if (rc) {
> - pr_err("Error %lld setting queue for prio %d\n", rc, prio);
> + pr_err("Error %lld setting queue for CPU %d prio %d\n", rc,
> +target, prio);
>   rc = -EIO;
>   } else {
>   q->qpage = qpage;
> @@ -389,8 +391,8 @@ static int xive_spapr_setup_queue(unsigned int cpu, 
> struct xive_cpu *xc,
>   if (IS_ERR(qpage))
>   return PTR_ERR(qpage);
>  
> - return xive_spapr_configure_queue(cpu, q, prio, qpage,
> -   xive_queue_shift);
> + return xive_spapr_configure_queue(get_hard_smp_processor_id(cpu),
> +   q, prio, qpage, xive_queue_shift);
>  }
>  
>  static void xive_spapr_cleanup_queue(unsigned int cpu, struct xive_cpu *xc,
> @@ -399,10 +401,12 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
> struct xive_cpu *xc,
>   struct xive_q *q = &xc->queue[prio];
>   unsigned int alloc_order;
>   long rc;
> + int hw_cpu = get_hard_smp_processor_id(cpu);
>  
> - rc = plpar_int_set_queue_config(0, cpu, prio, 0, 0);
> + rc = plpar_int_set_queue_config(0, hw_cpu, prio, 0, 0);
>   if (rc)
> - pr_err("Error %ld setting queue for prio %d\n", rc, prio);
> + pr_err("Error %ld setting queue for CPU %d prio %d\n", rc,
> +hw_cpu, prio);
>  
>   alloc_order = xive_alloc_order(xive_queue_shift);
>   free_pages((unsigned long)q->qpage, alloc_order);
> -- 
> 2.13.6


Re: samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-13 Thread Michael Ellerman
Kees Cook  writes:

> On Mon, Feb 12, 2018 at 7:25 PM, Michael Ellerman  wrote:
>> Michal Hocko  writes:
>>> Hi,
>>> my build test machinery chokes on samples/seccomp when cross compiling
>>> s390 and ppc64 allyesconfig. This has been the case for quite some
>>> time already but I never found time to look at the problem and report
>>> it. It seems this is not a new issue and a similar thing happened for
>>> MIPS e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
>>> cross-compiling for MIPS").
>>>
>>> The build logs are attached.
>>>
>>> What is the best way around this? Should we simply skip compilation on
>>> cross compile or is actually anybody relying on that? Or should I simply
>>> disable it for s390 and ppc?
>>
>> The whole thing seems very confused. It's not building for the target,
>> it's building for the host, ie. the Makefile sets hostprogs-m and
>> HOSTCFLAGS etc.
>>
>> So it can't possibly work with cross compiling as it's currently
>> written.
>>
>> Either the Makefile needs some serious work to properly support cross
>> compiling or it should just be disabled when cross compiling.
>
> Hrm, yeah, the goal was to entirely disable cross compiling, but I
> guess we didn't hit it with a hard enough hammer. :)

Do you know why it is written that way? Why doesn't it just try to cross
compile like normal code?

cheers


Re: [PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-13 Thread Miquel Raynal
Hi Boris,

On Tue, 13 Feb 2018 09:17:14 +0100, Boris Brezillon
 wrote:

> On Tue, 13 Feb 2018 08:42:46 +0100
> Miquel Raynal  wrote:
> 
> > Hi Boris,
> > 
> > Just a few comments about the form.
> > 
> > Otherwise:
> > Reviewed-by: Miquel Raynal 
> > 
> >   
> > > diff --git a/drivers/mtd/devices/lart.c b/drivers/mtd/devices/lart.c
> > > index 555b94406e0b..3d6c8ffd351f 100644
> > > --- a/drivers/mtd/devices/lart.c
> > > +++ b/drivers/mtd/devices/lart.c
> > > @@ -415,7 +415,6 @@ static int flash_erase (struct mtd_info *mtd,struct 
> > > erase_info *instr)
> > >{
> > >   if (!erase_block (addr))
> > > {
> > > -  instr->state = MTD_ERASE_FAILED;
> > >return (-EIO);
> > > }
> > 
> > You can also safely remove these '{' '}'  
> 
> Well, this patch is not about fixing coding style issues, otherwise I'd
> have a lot more work on this driver :-)

Sure, I was not referring to the weird style, just that you go from two
lines to one in the block, so the braces are no longer needed.

> 
> >   
> > >  
> > > @@ -425,9 +424,6 @@ static int flash_erase (struct mtd_info *mtd,struct 
> > > erase_info *instr)
> > >   if (addr == mtd->eraseregions[i].offset + 
> > > (mtd->eraseregions[i].erasesize * mtd->eraseregions[i].numblocks)) i++;
> > >}
> > >  
> > > -   instr->state = MTD_ERASE_DONE;
> > > -   mtd_erase_callback(instr);
> > > -
> > > return (0);
> > >  }
> > >  
> > > diff --git a/drivers/mtd/devices/mtd_dataflash.c 
> > > b/drivers/mtd/devices/mtd_dataflash.c
> > > index 5dc8bd042cc5..aaaeaae01e1d 100644
> > > --- a/drivers/mtd/devices/mtd_dataflash.c
> > > +++ b/drivers/mtd/devices/mtd_dataflash.c
> > > @@ -220,10 +220,6 @@ static int dataflash_erase(struct mtd_info *mtd, 
> > > struct erase_info *instr)
> > >   }
> > >   mutex_unlock(&priv->lock);
> > >  
> > > - /* Inform MTD subsystem that erase is complete */
> > > - instr->state = MTD_ERASE_DONE;
> > > - mtd_erase_callback(instr);
> > > -
> > >   return 0;
> > >  }
> > >  
> > > diff --git a/drivers/mtd/devices/mtdram.c b/drivers/mtd/devices/mtdram.c
> > > index 0bf4aeaf0cb8..efef43c6684b 100644
> > > --- a/drivers/mtd/devices/mtdram.c
> > > +++ b/drivers/mtd/devices/mtdram.c
> > > @@ -60,8 +60,6 @@ static int ram_erase(struct mtd_info *mtd, struct 
> > > erase_info *instr)
> > >   if (check_offs_len(mtd, instr->addr, instr->len))
> > >   return -EINVAL;
> > >   memset((char *)mtd->priv + instr->addr, 0xff, instr->len);
> > > - instr->state = MTD_ERASE_DONE;
> > > - mtd_erase_callback(instr);
> > 
> > Space ?  
> 
> I could add a blank line, but again, I'm just following the coding style
> in place in this file :-).

Ok.

> 
> >   
> > >   return 0;
> > >  }
> > >  
> > > diff --git a/drivers/mtd/devices/phram.c b/drivers/mtd/devices/phram.c
> > > index 7287696a21f9..a963c88d392d 100644
> > > --- a/drivers/mtd/devices/phram.c
> > > +++ b/drivers/mtd/devices/phram.c
> > > @@ -44,8 +44,6 @@ static int phram_erase(struct mtd_info *mtd, struct 
> > > erase_info *instr)
> > >* I don't feel at all ashamed. This kind of thing is possible anyway
> > >* with flash, but unlikely.
> > >*/
> > 
> > Not sure this comment is still relevant? Maybe you could remove it
> > or at least change it? 
> >   
> > > - instr->state = MTD_ERASE_DONE;
> > > - mtd_erase_callback(instr);
> > 
> > Space ?
> >   
> > >   return 0;
> > >  }
> > >  
> > > diff --git a/drivers/mtd/devices/pmc551.c b/drivers/mtd/devices/pmc551.c
> > > index cadea0620cd0..5d842cbca3de 100644
> > > --- a/drivers/mtd/devices/pmc551.c
> > > +++ b/drivers/mtd/devices/pmc551.c
> > > @@ -184,12 +184,10 @@ static int pmc551_erase(struct mtd_info *mtd, 
> > > struct erase_info *instr)
> > >   }
> > >  
> > >out:
> > > - instr->state = MTD_ERASE_DONE;
> > >  #ifdef CONFIG_MTD_PMC551_DEBUG
> > >   printk(KERN_DEBUG "pmc551_erase() done\n");
> > >  #endif
> > >  
> > > - mtd_erase_callback(instr);
> > >   return 0;
> > >  }
> > >  
> > > diff --git a/drivers/mtd/devices/powernv_flash.c 
> > > b/drivers/mtd/devices/powernv_flash.c
> > > index 26f9feaa5d17..5f383630c16f 100644
> > > --- a/drivers/mtd/devices/powernv_flash.c
> > > +++ b/drivers/mtd/devices/powernv_flash.c
> > > @@ -175,19 +175,12 @@ static int powernv_flash_erase(struct mtd_info 
> > > *mtd, struct erase_info *erase)
> > >  {
> > >   int rc;
> > >  
> > > - erase->state = MTD_ERASING;
> > > -
> > >   /* todo: register our own notifier to do a true async implementation */
> > >   rc =  powernv_flash_async_op(mtd, FLASH_OP_ERASE, erase->addr,
> > >   erase->len, NULL, NULL);
> > 
> > Are you sure this is still needed? Maybe this should go away in your
> > first patch?  
> 
> Hm, indeed. This comment should be dropped.
> 
> >   
> > > -
> > > - if (rc) {
> > > + if (rc)
> > >   erase->fail_addr = erase->addr;
> > > -  

Re: [PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-13 Thread Miquel Raynal
Hi Boris,

Just a few comments about the form.

Otherwise:
Reviewed-by: Miquel Raynal 


> diff --git a/drivers/mtd/devices/lart.c b/drivers/mtd/devices/lart.c
> index 555b94406e0b..3d6c8ffd351f 100644
> --- a/drivers/mtd/devices/lart.c
> +++ b/drivers/mtd/devices/lart.c
> @@ -415,7 +415,6 @@ static int flash_erase (struct mtd_info *mtd,struct 
> erase_info *instr)
>{
>   if (!erase_block (addr))
> {
> -  instr->state = MTD_ERASE_FAILED;
>return (-EIO);
> }

You can also safely remove these '{' '}'

>  
> @@ -425,9 +424,6 @@ static int flash_erase (struct mtd_info *mtd,struct 
> erase_info *instr)
>   if (addr == mtd->eraseregions[i].offset + 
> (mtd->eraseregions[i].erasesize * mtd->eraseregions[i].numblocks)) i++;
>}
>  
> -   instr->state = MTD_ERASE_DONE;
> -   mtd_erase_callback(instr);
> -
> return (0);
>  }
>  
> diff --git a/drivers/mtd/devices/mtd_dataflash.c 
> b/drivers/mtd/devices/mtd_dataflash.c
> index 5dc8bd042cc5..aaaeaae01e1d 100644
> --- a/drivers/mtd/devices/mtd_dataflash.c
> +++ b/drivers/mtd/devices/mtd_dataflash.c
> @@ -220,10 +220,6 @@ static int dataflash_erase(struct mtd_info *mtd, struct 
> erase_info *instr)
>   }
>   mutex_unlock(&priv->lock);
>  
> - /* Inform MTD subsystem that erase is complete */
> - instr->state = MTD_ERASE_DONE;
> - mtd_erase_callback(instr);
> -
>   return 0;
>  }
>  
> diff --git a/drivers/mtd/devices/mtdram.c b/drivers/mtd/devices/mtdram.c
> index 0bf4aeaf0cb8..efef43c6684b 100644
> --- a/drivers/mtd/devices/mtdram.c
> +++ b/drivers/mtd/devices/mtdram.c
> @@ -60,8 +60,6 @@ static int ram_erase(struct mtd_info *mtd, struct 
> erase_info *instr)
>   if (check_offs_len(mtd, instr->addr, instr->len))
>   return -EINVAL;
>   memset((char *)mtd->priv + instr->addr, 0xff, instr->len);
> - instr->state = MTD_ERASE_DONE;
> - mtd_erase_callback(instr);

Space ?

>   return 0;
>  }
>  
> diff --git a/drivers/mtd/devices/phram.c b/drivers/mtd/devices/phram.c
> index 7287696a21f9..a963c88d392d 100644
> --- a/drivers/mtd/devices/phram.c
> +++ b/drivers/mtd/devices/phram.c
> @@ -44,8 +44,6 @@ static int phram_erase(struct mtd_info *mtd, struct 
> erase_info *instr)
>* I don't feel at all ashamed. This kind of thing is possible anyway
>* with flash, but unlikely.
>*/

Not sure this comment is still relevant? Maybe you could remove it
or at least change it? 

> - instr->state = MTD_ERASE_DONE;
> - mtd_erase_callback(instr);

Space ?

>   return 0;
>  }
>  
> diff --git a/drivers/mtd/devices/pmc551.c b/drivers/mtd/devices/pmc551.c
> index cadea0620cd0..5d842cbca3de 100644
> --- a/drivers/mtd/devices/pmc551.c
> +++ b/drivers/mtd/devices/pmc551.c
> @@ -184,12 +184,10 @@ static int pmc551_erase(struct mtd_info *mtd, struct 
> erase_info *instr)
>   }
>  
>out:
> - instr->state = MTD_ERASE_DONE;
>  #ifdef CONFIG_MTD_PMC551_DEBUG
>   printk(KERN_DEBUG "pmc551_erase() done\n");
>  #endif
>  
> - mtd_erase_callback(instr);
>   return 0;
>  }
>  
> diff --git a/drivers/mtd/devices/powernv_flash.c 
> b/drivers/mtd/devices/powernv_flash.c
> index 26f9feaa5d17..5f383630c16f 100644
> --- a/drivers/mtd/devices/powernv_flash.c
> +++ b/drivers/mtd/devices/powernv_flash.c
> @@ -175,19 +175,12 @@ static int powernv_flash_erase(struct mtd_info *mtd, 
> struct erase_info *erase)
>  {
>   int rc;
>  
> - erase->state = MTD_ERASING;
> -
>   /* todo: register our own notifier to do a true async implementation */
>   rc =  powernv_flash_async_op(mtd, FLASH_OP_ERASE, erase->addr,
>   erase->len, NULL, NULL);

Are you sure this is still needed? Maybe this should go away in your
first patch?

> -
> - if (rc) {
> + if (rc)
>   erase->fail_addr = erase->addr;
> - erase->state = MTD_ERASE_FAILED;
> - } else {
> - erase->state = MTD_ERASE_DONE;
> - }
> - mtd_erase_callback(erase);
> +
>   return rc;
>  }
>  
> diff --git a/drivers/mtd/devices/slram.c b/drivers/mtd/devices/slram.c
> index 0ec85f316d24..2f05e1801047 100644
> --- a/drivers/mtd/devices/slram.c
> +++ b/drivers/mtd/devices/slram.c
> @@ -88,8 +88,6 @@ static int slram_erase(struct mtd_info *mtd, struct 
> erase_info *instr)
>* I don't feel at all ashamed. This kind of thing is possible anyway
>* with flash, but unlikely.
>*/

Same with this comment.

> - instr->state = MTD_ERASE_DONE;
> - mtd_erase_callback(instr);

Space ?

>   return(0);
>  }
>  




-- 
Miquel Raynal, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-13 Thread Michael Ellerman
Randy Dunlap  writes:

> On 02/12/2018 04:28 AM, Michael Ellerman wrote:
>> Randy Dunlap  writes:
>> 
>>> From: Randy Dunlap 
>>>
>>> Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
>>> reason. It looks like it's only a convenience, so remove kmemleak.h
>>> from slab.h and add <linux/kmemleak.h> to any users of kmemleak_*
>>> that don't already #include it.
>>> Also remove <linux/kmemleak.h> from source files that do not use it.
>>>
>>> This is tested on i386 allmodconfig and x86_64 allmodconfig. It
>>> would be good to run it through the 0day bot for other $ARCHes.
>>> I have neither the horsepower nor the storage space for the other
>>> $ARCHes.
>>>
>>> [slab.h is the second most used header file after module.h; kernel.h
>>> is right there with slab.h. There could be some minor error in the
>>> counting due to some #includes having comments after them and I
>>> didn't combine all of those.]
>>>
>>> This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
>>> header files).
>>>
>>> Signed-off-by: Randy Dunlap 
>> 
>> I threw it at a random selection of configs and so far the only failures
>> I'm seeing are:
>> 
>>   lib/test_firmware.c:134:2: error: implicit declaration of function 'vfree' 
>> [-Werror=implicit-function-declaration]  
>> 
>>   lib/test_firmware.c:620:25: error: implicit declaration of function 
>> 'vzalloc' [-Werror=implicit-function-declaration]
>>   lib/test_firmware.c:620:2: error: implicit declaration of function 
>> 'vzalloc' [-Werror=implicit-function-declaration]
>>   security/integrity/digsig.c:146:2: error: implicit declaration of function 
>> 'vfree' [-Werror=implicit-function-declaration]
>
> Both of those source files need to #include <linux/vmalloc.h>.

Yep, I added those and rebuilt. I don't see any more failures that look
related to your patch.

  http://kisskb.ellerman.id.au/kisskb/head/13399/


I haven't gone through the defconfigs I have enabled for a while, so
it's possible I have some missing but it's still a reasonable cross
section.

cheers


[PATCH] powerpc/xive: use hw CPU ids when configuring the CPU queues

2018-02-13 Thread Cédric Le Goater
The CPU event notification queues on sPAPR should be configured using
a hardware CPU identifier.

The problem did not show up on the Power Hypervisor because pHyp
supports 8 threads per core which keeps CPU number contiguous. This is
not the case on all sPAPR virtual machines, some use SMT=1.

Also improve error logging by adding the CPU number.

Signed-off-by: Cédric Le Goater 
---

 I think we should send this one to stable also.

 arch/powerpc/sysdev/xive/spapr.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index d9c4c9366049..091f1d0d0af1 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -356,7 +356,8 @@ static int xive_spapr_configure_queue(u32 target, struct 
xive_q *q, u8 prio,
 
rc = plpar_int_get_queue_info(0, target, prio, &esn_page, &esn_size);
if (rc) {
-   pr_err("Error %lld getting queue info prio %d\n", rc, prio);
+   pr_err("Error %lld getting queue info CPU %d prio %d\n", rc,
+  target, prio);
rc = -EIO;
goto fail;
}
@@ -370,7 +371,8 @@ static int xive_spapr_configure_queue(u32 target, struct 
xive_q *q, u8 prio,
/* Configure and enable the queue in HW */
rc = plpar_int_set_queue_config(flags, target, prio, qpage_phys, order);
if (rc) {
-   pr_err("Error %lld setting queue for prio %d\n", rc, prio);
+   pr_err("Error %lld setting queue for CPU %d prio %d\n", rc,
+  target, prio);
rc = -EIO;
} else {
q->qpage = qpage;
@@ -389,8 +391,8 @@ static int xive_spapr_setup_queue(unsigned int cpu, struct 
xive_cpu *xc,
if (IS_ERR(qpage))
return PTR_ERR(qpage);
 
-   return xive_spapr_configure_queue(cpu, q, prio, qpage,
- xive_queue_shift);
+   return xive_spapr_configure_queue(get_hard_smp_processor_id(cpu),
+ q, prio, qpage, xive_queue_shift);
 }
 
 static void xive_spapr_cleanup_queue(unsigned int cpu, struct xive_cpu *xc,
@@ -399,10 +401,12 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
struct xive_cpu *xc,
struct xive_q *q = &xc->queue[prio];
unsigned int alloc_order;
long rc;
+   int hw_cpu = get_hard_smp_processor_id(cpu);
 
-   rc = plpar_int_set_queue_config(0, cpu, prio, 0, 0);
+   rc = plpar_int_set_queue_config(0, hw_cpu, prio, 0, 0);
if (rc)
-   pr_err("Error %ld setting queue for prio %d\n", rc, prio);
+   pr_err("Error %ld setting queue for CPU %d prio %d\n", rc,
+  hw_cpu, prio);
 
alloc_order = xive_alloc_order(xive_queue_shift);
free_pages((unsigned long)q->qpage, alloc_order);
-- 
2.13.6



Re: samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-13 Thread Michal Hocko
On Mon 12-02-18 21:54:39, Kees Cook wrote:
> On Mon, Feb 12, 2018 at 7:25 PM, Michael Ellerman  wrote:
> > Michal Hocko  writes:
> >> Hi,
> >> my build test machinery chokes on samples/seccomp when cross compiling
> >> s390 and ppc64 allyesconfig. This has been the case for quite some
> >> time already but I never found time to look at the problem and report
> >> it. It seems this is not a new issue and a similar thing happened for
> >> MIPS e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
> >> cross-compiling for MIPS").
> >>
> >> The build logs are attached.
> >>
> >> What is the best way around this? Should we simply skip compilation on
> >> cross compile or is actually anybody relying on that? Or should I simply
> >> disable it for s390 and ppc?
> >
> > The whole thing seems very confused. It's not building for the target,
> > it's building for the host, ie. the Makefile sets hostprogs-m and
> > HOSTCFLAGS etc.
> >
> > So it can't possibly work with cross compiling as it's currently
> > written.
> >
> > Either the Makefile needs some serious work to properly support cross
> > compiling or it should just be disabled when cross compiling.
> 
> Hrm, yeah, the goal was to entirely disable cross compiling, but I
> guess we didn't hit it with a hard enough hammer. :)

Hammer like this?

diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
index 0e349b80686e..ba942e3ead89 100644
--- a/samples/seccomp/Makefile
+++ b/samples/seccomp/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+ifndef CROSS_COMPILE
 hostprogs-$(CONFIG_SAMPLE_SECCOMP) := bpf-fancy dropper bpf-direct
 
 HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
@@ -16,7 +17,6 @@ HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
 bpf-direct-objs := bpf-direct.o
 
 # Try to match the kernel target.
-ifndef CROSS_COMPILE
 ifndef CONFIG_64BIT
 
 # s390 has -m31 flag to build 31 bit binaries
@@ -35,12 +35,4 @@ HOSTLOADLIBES_bpf-fancy += $(MFLAG)
 HOSTLOADLIBES_dropper += $(MFLAG)
 endif
 always := $(hostprogs-m)
-else
-# MIPS system calls are defined based on the -mabi that is passed
-# to the toolchain which may or may not be a valid option
-# for the host toolchain. So disable tests if target architecture
-# is MIPS but the host isn't.
-ifndef CONFIG_MIPS
-always := $(hostprogs-m)
-endif
 endif
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

2018-02-13 Thread Nicholas Piggin
On Mon, 12 Feb 2018 18:42:21 +0100
Christophe LEROY  wrote:

> Le 12/02/2018 à 16:24, Nicholas Piggin a écrit :
> > On Mon, 12 Feb 2018 16:02:23 +0100
> > Christophe LEROY  wrote:
> >   
> >> Le 10/02/2018 à 09:11, Nicholas Piggin a écrit :  
> >>> This series intends to improve performance and reduce stack
> >>> consumption in the slice allocation code. It does it by keeping slice
> >>> masks in the mm_context rather than compute them for each allocation,
> >>> and by reducing bitmaps and slice_masks from stacks, using pointers
> >>> instead where possible.
> >>>
> >>> checkstack.pl gives, before:
> >>> 0x0de4 slice_get_unmapped_area [slice.o]:   656
> >>> 0x1b4c is_hugepage_only_range [slice.o]:512
> >>> 0x075c slice_find_area_topdown [slice.o]:   416
> >>> 0x04c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
> >>> 0x1aa0 slice_set_range_psize [slice.o]: 240
> >>> 0x0a64 slice_find_area [slice.o]:   176
> >>> 0x0174 slice_check_fit [slice.o]:   112
> >>>
> >>> after:
> >>> 0x0d70 slice_get_unmapped_area [slice.o]:   320
> >>> 0x08f8 slice_find_area [slice.o]:   144
> >>> 0x1860 slice_set_range_psize [slice.o]: 144
> >>> 0x18ec is_hugepage_only_range [slice.o]:144
> >>> 0x0750 slice_find_area_bottomup.isra.4 [slice.o]:   128
> >>>
> >>> The benchmark in https://github.com/linuxppc/linux/issues/49 gives, 
> >>> before:
> >>> $ time ./slicemask
> >>> real  0m20.712s
> >>> user  0m5.830s
> >>> sys   0m15.105s
> >>>
> >>> after:
> >>> $ time ./slicemask
> >>> real  0m13.197s
> >>> user  0m5.409s
> >>> sys   0m7.779s  
> >>
> >> Hi,
> >>
> >> I tested your series on an 8xx, on top of patch
> >> https://patchwork.ozlabs.org/patch/871675/
> >>
> >> I don't get a result as significant as yours, but there is some
> >> improvement anyway:
> >>
> >> ITERATION 50
> >>
> >> Before:
> >>
> >> root@vgoip:~# time ./slicemask
> >> real0m 33.26s
> >> user0m 1.94s
> >> sys 0m 30.85s
> >>
> >> After:
> >> root@vgoip:~# time ./slicemask
> >> real0m 29.69s
> >> user0m 2.11s
> >> sys 0m 27.15s
> >>
> >> The most significant improvement is obtained with the first patch of your series:
> >> root@vgoip:~# time ./slicemask
> >> real0m 30.85s
> >> user0m 1.80s
> >> sys 0m 28.57s  
> > 
> > Okay, thanks. Are you still spending significant time in the slice
> > code?  
> 
> Do you mean, am I still updating my patches? No, I hope we are at last 

Actually I was wondering about CPU time spent for the microbenchmark :)

> run with v4 now that Aneesh has tagged all of them as reviewed-by himself.
> Once the series has been accepted, my next step will be to backport at
> least the first 3 into kernel 4.14.
> 
> >   
> >>
> >> I had to modify your series a bit; if you are interested I can post it.
> >>  
> > 
> > Sure, that would be good.  
> 
> Ok, let's share it. The patches are not 100% clean.

Those look pretty good, thanks for doing that work.

Thanks,
Nick


Re: [PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-13 Thread Boris Brezillon
On Tue, 13 Feb 2018 08:42:46 +0100
Miquel Raynal  wrote:

> Hi Boris,
> 
> Just a few comments about the form.
> 
> Otherwise:
> Reviewed-by: Miquel Raynal 
> 
> 
> > diff --git a/drivers/mtd/devices/lart.c b/drivers/mtd/devices/lart.c
> > index 555b94406e0b..3d6c8ffd351f 100644
> > --- a/drivers/mtd/devices/lart.c
> > +++ b/drivers/mtd/devices/lart.c
> > @@ -415,7 +415,6 @@ static int flash_erase (struct mtd_info *mtd,struct erase_info *instr)
> >  {
> > if (!erase_block (addr))
> >   {
> > -instr->state = MTD_ERASE_FAILED;
> >  return (-EIO);
> >   }  
> 
> You can also safely remove these '{' '}'

Well, this patch is not about fixing coding style issues, otherwise I'd
have a lot more work on this driver :-)

> 
> >  
> > @@ -425,9 +424,6 @@ static int flash_erase (struct mtd_info *mtd,struct erase_info *instr)
> > if (addr == mtd->eraseregions[i].offset + (mtd->eraseregions[i].erasesize * mtd->eraseregions[i].numblocks)) i++;
> >  }
> >  
> > -   instr->state = MTD_ERASE_DONE;
> > -   mtd_erase_callback(instr);
> > -
> > return (0);
> >  }
> >  
> > diff --git a/drivers/mtd/devices/mtd_dataflash.c b/drivers/mtd/devices/mtd_dataflash.c
> > index 5dc8bd042cc5..aaaeaae01e1d 100644
> > --- a/drivers/mtd/devices/mtd_dataflash.c
> > +++ b/drivers/mtd/devices/mtd_dataflash.c
> > @@ -220,10 +220,6 @@ static int dataflash_erase(struct mtd_info *mtd, struct erase_info *instr)
> > }
> > mutex_unlock(&priv->lock);
> >  
> > -   /* Inform MTD subsystem that erase is complete */
> > -   instr->state = MTD_ERASE_DONE;
> > -   mtd_erase_callback(instr);
> > -
> > return 0;
> >  }
> >  
> > diff --git a/drivers/mtd/devices/mtdram.c b/drivers/mtd/devices/mtdram.c
> > index 0bf4aeaf0cb8..efef43c6684b 100644
> > --- a/drivers/mtd/devices/mtdram.c
> > +++ b/drivers/mtd/devices/mtdram.c
> > @@ -60,8 +60,6 @@ static int ram_erase(struct mtd_info *mtd, struct erase_info *instr)
> > if (check_offs_len(mtd, instr->addr, instr->len))
> > return -EINVAL;
> > memset((char *)mtd->priv + instr->addr, 0xff, instr->len);
> > -   instr->state = MTD_ERASE_DONE;
> > -   mtd_erase_callback(instr);  
> 
> Space ?

I could add a blank line, but again, I'm just following the coding style
in place in this file :-).

> 
> > return 0;
> >  }
> >  
> > diff --git a/drivers/mtd/devices/phram.c b/drivers/mtd/devices/phram.c
> > index 7287696a21f9..a963c88d392d 100644
> > --- a/drivers/mtd/devices/phram.c
> > +++ b/drivers/mtd/devices/phram.c
> > @@ -44,8 +44,6 @@ static int phram_erase(struct mtd_info *mtd, struct erase_info *instr)
> >  * I don't feel at all ashamed. This kind of thing is possible anyway
> >  * with flash, but unlikely.
> >  */  
> 
> Not sure this comment is still relevant? Maybe you could remove it
> or at least change it? 
> 
> > -   instr->state = MTD_ERASE_DONE;
> > -   mtd_erase_callback(instr);  
> 
> Space ?
> 
> > return 0;
> >  }
> >  
> > diff --git a/drivers/mtd/devices/pmc551.c b/drivers/mtd/devices/pmc551.c
> > index cadea0620cd0..5d842cbca3de 100644
> > --- a/drivers/mtd/devices/pmc551.c
> > +++ b/drivers/mtd/devices/pmc551.c
> > @@ -184,12 +184,10 @@ static int pmc551_erase(struct mtd_info *mtd, struct erase_info *instr)
> > }
> >  
> >out:
> > -   instr->state = MTD_ERASE_DONE;
> >  #ifdef CONFIG_MTD_PMC551_DEBUG
> > printk(KERN_DEBUG "pmc551_erase() done\n");
> >  #endif
> >  
> > -   mtd_erase_callback(instr);
> > return 0;
> >  }
> >  
> > diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
> > index 26f9feaa5d17..5f383630c16f 100644
> > --- a/drivers/mtd/devices/powernv_flash.c
> > +++ b/drivers/mtd/devices/powernv_flash.c
> > @@ -175,19 +175,12 @@ static int powernv_flash_erase(struct mtd_info *mtd, struct erase_info *erase)
> >  {
> > int rc;
> >  
> > -   erase->state = MTD_ERASING;
> > -
> > /* todo: register our own notifier to do a true async implementation */
> > rc =  powernv_flash_async_op(mtd, FLASH_OP_ERASE, erase->addr,
> > erase->len, NULL, NULL);  
> 
> Are you sure this is still needed? Maybe this should go away in your
> first patch?

Hm, indeed. This comment should be dropped.

> 
> > -
> > -   if (rc) {
> > +   if (rc)
> > erase->fail_addr = erase->addr;
> > -   erase->state = MTD_ERASE_FAILED;
> > -   } else {
> > -   erase->state = MTD_ERASE_DONE;
> > -   }
> > -   mtd_erase_callback(erase);
> > +
> > return rc;
> >  }
> >  
> > diff --git a/drivers/mtd/devices/slram.c b/drivers/mtd/devices/slram.c
> > index 0ec85f316d24..2f05e1801047 100644
> > --- a/drivers/mtd/devices/slram.c
> > +++ b/drivers/mtd/devices/slram.c
> > @@ -88,8 +88,6 @@ static int slram_erase(struct mtd_info *mtd, struct erase_info *instr)
> >  * I 

[bug report] ocxl: Add AFU interrupt support

2018-02-13 Thread Dan Carpenter
Hello Frederic Barrat,

The patch aeddad1760ae: "ocxl: Add AFU interrupt support" from Jan
23, 2018, leads to the following static checker warning:

drivers/misc/ocxl/file.c:163 afu_ioctl()
warn: maybe return -EFAULT instead of the bytes remaining?

drivers/misc/ocxl/file.c
   111  static long afu_ioctl(struct file *file, unsigned int cmd,
   112  unsigned long args)
   113  {
   114  struct ocxl_context *ctx = file->private_data;
   115  struct ocxl_ioctl_irq_fd irq_fd;
   116  u64 irq_offset;
   117  long rc;
   118  
   119  pr_debug("%s for context %d, command %s\n", __func__, ctx->pasid,
   120  CMD_STR(cmd));
   121  
   122  if (ctx->status == CLOSED)
   123  return -EIO;
   124  
   125  switch (cmd) {
   126  case OCXL_IOCTL_ATTACH:
   127  rc = afu_ioctl_attach(ctx,
   128  (struct ocxl_ioctl_attach __user *) args);
   129  break;
   130  
   131  case OCXL_IOCTL_IRQ_ALLOC:
   132  rc = ocxl_afu_irq_alloc(ctx, &irq_offset);
   133  if (!rc) {
   134  rc = copy_to_user((u64 __user *) args, &irq_offset,
   135  sizeof(irq_offset));
   136  if (rc)
^^
copy_to_user() returns the number of bytes remaining but we want to
return -EFAULT on error.

   137  ocxl_afu_irq_free(ctx, irq_offset);
   138  }
   139  break;
   140  

drivers/misc/ocxl/file.c:320 afu_read()
warn: unsigned 'used' is never less than zero.

drivers/misc/ocxl/file.c
   279  ssize_t rc;
   280  size_t used = 0;
^^
This should be ssize_t

   281  DEFINE_WAIT(event_wait);
   282  
   283  memset(&header, 0, sizeof(header));
   284  
   285  /* Require offset to be 0 */
   286  if (*off != 0)
   287  return -EINVAL;
   288  
   289  if (count < (sizeof(struct ocxl_kernel_event_header) +
   290  AFU_EVENT_BODY_MAX_SIZE))
   291  return -EINVAL;
   292  
   293  for (;;) {
   294  prepare_to_wait(&ctx->events_wq, &event_wait,
   295  TASK_INTERRUPTIBLE);
   296  
   297  if (afu_events_pending(ctx))
   298  break;
   299  
   300  if (ctx->status == CLOSED)
   301  break;
   302  
   303  if (file->f_flags & O_NONBLOCK) {
   304  finish_wait(&ctx->events_wq, &event_wait);
   305  return -EAGAIN;
   306  }
   307  
   308  if (signal_pending(current)) {
   309  finish_wait(&ctx->events_wq, &event_wait);
   310  return -ERESTARTSYS;
   311  }
   312  
   313  schedule();
   314  }
   315  
   316  finish_wait(&ctx->events_wq, &event_wait);
   317  
   318  if (has_xsl_error(ctx)) {
   319  used = append_xsl_error(ctx, &header, buf + sizeof(header));
   320  if (used < 0)

Impossible.

   321  return used;
   322  }
   323  
   324  if (!afu_events_pending(ctx))
   325  header.flags |= OCXL_KERNEL_EVENT_FLAG_LAST;
   326  
   327  if (copy_to_user(buf, &header, sizeof(header)))
   328  return -EFAULT;
   329  
   330  used += sizeof(header);
   331  
   332  rc = (ssize_t) used;
 ^^
You could remove the cast.

   333  return rc;
   334  }

regards,
dan carpenter