Re: [PATCH kernel v3] powerpc/pci: Fix broken INTx configuration via OF

2018-02-10 Thread Bjorn Helgaas
On Fri, Feb 09, 2018 at 12:07:41PM -0600, Bjorn Helgaas wrote: > On Fri, Feb 09, 2018 at 05:23:58PM +1100, Alexey Kardashevskiy wrote: > > Commit 59f47eff03a0 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper") > > replaced of_irq_parse_pci() + irq_create_of_mapping() with > >

[RFC PATCH 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-10 Thread Nicholas Piggin
The number of high slices a process might use now depends on its address space size, and what allocation address it has requested. This patch uses that limit throughout call chains where possible, rather than use the fixed SLICE_NUM_HIGH for bitmap operations. This saves some cost for processes

[RFC PATCH 4/5] powerpc/mm: Add support for handling > 512TB address in SLB miss

2018-02-10 Thread Aneesh Kumar K.V
For addresses above 512TB we allocate an additional mmu context. To keep this simple, addresses above 512TB are handled with IR/DR=1 and with a stack frame set up. We do the additional context allocation in the SLB miss handler. If the context is not allocated, we enable interrupts and allocate the context and

Re: [PATCH v1] PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports

2018-02-10 Thread Christian Zigotzky
Hi All, The AmigaOne X1000 doesn’t boot anymore since the PCI updates. I have seen, that the PCI updates are different to the updates below. The code below works but the latest not. Is there a problem with the latest PCI updates currently? Thanks, Christian Sent from my iPhone On 2. Dec

[RFC PATCH 4/5] powerpc/mm/slice: Use const pointers to cached slice masks where possible

2018-02-10 Thread Nicholas Piggin
The slice_mask cache was a basic conversion which copied the slice mask into caller's structures, because that's how the original code worked. In most cases the pointer can be used directly instead, saving a copy and an on-stack structure. This also converts the slice_mask bit operation helpers

[RFC PATCH 3/5] powerpc/mm/slice: implement slice_check_range_fits

2018-02-10 Thread Nicholas Piggin
Rather than build slice masks from a range then use that to check for fit in a candidate mask, implement slice_check_range_fits that checks if a range fits in a mask directly. This allows several structures to be removed from stacks, and also we don't expect a huge range in a lot of these cases,

Re: [PATCH 1/2] powerpc/mm: Fix crashes with PUD level hugetlb config

2018-02-10 Thread Aneesh Kumar K.V
Aneesh Kumar K.V writes: > "Aneesh Kumar K.V" writes: > >> To support memory keys, we moved the hash pte slot information to the second >> half of the page table. This was ok with PTE entries at level 4 and level 3. >> We already

[RFC PATCH 0/5] Add support for 4PB virtual address space on hash

2018-02-10 Thread Aneesh Kumar K.V
This patch series extends the max virtual address space value from 512TB to 4PB with 64K page size. We do that by allocating one vsid context for each 512TB range. More details are explained in patch 4. Aneesh Kumar K.V (5): powerpc: Don't do runtime futex_cmpxchg test
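The arithmetic behind "one vsid context for each 512TB range" can be sketched in user-space C. This is only an illustration of the address split, not code from the series: the SKETCH_* macros and sketch_context_index() are hypothetical names, and the only facts used are that 512TB is 2^49 bytes and 4PB is 2^52 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CONTEXT_SHIFT 49   /* 512TB = 2^49 bytes covered per context */
#define SKETCH_MAX_VA_SHIFT  52   /* 4PB   = 2^52 bytes of address space    */
#define SKETCH_NR_CONTEXTS   (1u << (SKETCH_MAX_VA_SHIFT - SKETCH_CONTEXT_SHIFT))

/* Return which 512TB range (0..7) an effective address falls in. */
static unsigned int sketch_context_index(uint64_t ea)
{
    /* Addresses at or beyond 4PB are outside the range this sketch models. */
    assert(ea < (UINT64_C(1) << SKETCH_MAX_VA_SHIFT));
    return (unsigned int)(ea >> SKETCH_CONTEXT_SHIFT);
}
```

Under these assumptions a 4PB address space needs 8 contexts, and everything below 512TB stays in range 0.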

[RFC PATCH 1/5] powerpc: Don't do runtime futex_cmpxchg test

2018-02-10 Thread Aneesh Kumar K.V
futex_detect_cmpxchg() does a cmpxchg_futex_value_locked on a NULL user addr to runtime detect whether the architecture implements atomic cmpxchg for futex. POWER does implement the feature and hence we can enable the config instead of depending on runtime detection. We could possibly enable this on

[RFC PATCH 5/5] powerpc/mm/hash64: Increase the VA range

2018-02-10 Thread Aneesh Kumar K.V
--- arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +- arch/powerpc/include/asm/processor.h | 9 - arch/powerpc/mm/init_64.c | 6 -- arch/powerpc/mm/pgtable_64.c | 5 - 4 files changed, 9 insertions(+), 13 deletions(-) diff --git

[RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

2018-02-10 Thread Nicholas Piggin
This series intends to improve performance and reduce stack consumption in the slice allocation code. It does so by keeping slice masks in the mm_context rather than computing them for each allocation, and by removing bitmaps and slice_masks from stacks, using pointers instead where possible.

[RFC PATCH 1/5] powerpc/mm/slice: pass pointers to struct slice_mask where possible

2018-02-10 Thread Nicholas Piggin
Pass around const pointers to struct slice_mask where possible, rather than copies of slice_mask, to reduce stack and call overhead. checkstack.pl gives, before: 0x0de4 slice_get_unmapped_area [slice.o]: 656 0x1b4c is_hugepage_only_range [slice.o]:512 0x075c

[RFC PATCH 2/5] powerpc/mm/slice: implement a slice mask cache

2018-02-10 Thread Nicholas Piggin
Calculating the slice mask can become a significant overhead for get_unmapped_area. This patch adds a struct slice_mask for each page size in the mm_context, and keeps these in sync with the slices psize arrays and slb_addr_limit. This saves about 30% kernel time on a single-page mmap/munmap

[RFC PATCH 2/5] powerpc/mm/slice: Update documentation in the file.

2018-02-10 Thread Aneesh Kumar K.V
We will make code changes in the next patch. To make the review easier, split the documentation update into a separate patch. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/slice.c | 27 +++ 1 file changed, 19 insertions(+), 8

[RFC PATCH 3/5] powerpc/mm/slice: Reduce the stack usage in slice_get_unmapped_area

2018-02-10 Thread Aneesh Kumar K.V
This patch kills the potential_mask and compat_mask variables and instead uses tmp_mask so that we can reduce the stack usage. This is required so that we can increase the high_slices bitmap to a larger value. The patch does result in extra computation in the final stage, where it ends up recomputing the

Re: [PATCH v4 1/5] powerpc/mm/slice: Remove intermediate bitmap copy

2018-02-10 Thread Nicholas Piggin
On Sat, 10 Feb 2018 13:54:25 +0100 (CET) Christophe Leroy wrote: > bitmap_or() and bitmap_andnot() can work properly with dst identical > to src1 or src2. There is no need of an intermediate result bitmap > that is copied back to dst in a second step. Everyone seems to

Re: [PATCH v4 1/5] powerpc/mm/slice: Remove intermediate bitmap copy

2018-02-10 Thread Christophe LEROY
On 10/02/2018 at 15:43, Nicholas Piggin wrote: On Sat, 10 Feb 2018 13:54:25 +0100 (CET) Christophe Leroy wrote: bitmap_or() and bitmap_andnot() can work properly with dst identical to src1 or src2. There is no need of an intermediate result bitmap that is copied

Re: [PATCH v3 2/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-02-10 Thread Christophe LEROY
On 29/01/2018 at 07:23, Aneesh Kumar K.V wrote: Christophe Leroy writes: In preparation for the following patch which will fix an issue on the 8xx by re-using the 'slices', this patch enhances the 'slices' implementation to support 32-bit CPUs. On PPC32, the

Re: [PATCH v3 4/5] powerpc/mm: Allow up to 64 low slices

2018-02-10 Thread Christophe LEROY
On 29/01/2018 at 07:29, Aneesh Kumar K.V wrote: Christophe Leroy writes: While the implementation of the "slices" address space allows a significant amount of high slices, it limits the number of low slices to 16 due to the use of a single u64 low_slices_psize

[PATCH v4 2/5] powerpc/mm/slice: Enhance for supporting PPC32

2018-02-10 Thread Christophe Leroy
In preparation for the following patch which will fix an issue on the 8xx by re-using the 'slices', this patch enhances the 'slices' implementation to support 32-bit CPUs. On PPC32, the address space is limited to 4Gbytes, hence only the low slices will be used. This patch moves "slices"

[PATCH v4 3/5] powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx

2018-02-10 Thread Christophe Leroy
On the 8xx, the page size is set in the PMD entry and applies to all pages of the page table pointed to by the said PMD entry. When an app has some regular pages allocated (e.g. see below) and tries to mmap() a huge page at a hint address covered by the same PMD entry, the kernel accepts the hint

[PATCH v4 5/5] powerpc/8xx: Increase number of slices to 64

2018-02-10 Thread Christophe Leroy
On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, i.e. 4M in 4K pages mode and 64M in 16K pages mode. This patch increases the number of slices from 16 to 64 on the 8xx. Signed-off-by: Christophe Leroy --- v4: New

[PATCH v4 1/5] powerpc/mm/slice: Remove intermediate bitmap copy

2018-02-10 Thread Christophe Leroy
bitmap_or() and bitmap_andnot() can work properly with dst identical to src1 or src2. There is no need of an intermediate result bitmap that is copied back to dst in a second step. Signed-off-by: Christophe Leroy Reviewed-by: Aneesh Kumar K.V
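Why dst may alias a source can be seen from a word-by-word model of the two operations. The following is a user-space sketch, not the kernel implementation: sketch_bitmap_or() and sketch_bitmap_andnot() are hypothetical stand-ins that mirror the argument order of the kernel helpers, and the sketch assumes nbits is a whole number of longs (it does not mask a partial last word).

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* dst = src1 | src2. Word i is read from both sources before word i of dst
 * is written, so dst being the same array as src1 or src2 is harmless. */
static void sketch_bitmap_or(unsigned long *dst, const unsigned long *src1,
                             const unsigned long *src2, unsigned int nbits)
{
    for (size_t i = 0; i < SKETCH_BITS_TO_LONGS(nbits); i++)
        dst[i] = src1[i] | src2[i];
}

/* dst = src1 & ~src2, with the same aliasing property as above. */
static void sketch_bitmap_andnot(unsigned long *dst, const unsigned long *src1,
                                 const unsigned long *src2, unsigned int nbits)
{
    for (size_t i = 0; i < SKETCH_BITS_TO_LONGS(nbits); i++)
        dst[i] = src1[i] & ~src2[i];
}
```

With this shape, a call such as sketch_bitmap_or(mask, mask, other, nbits) updates mask in place, which is the pattern that makes a temporary result bitmap and the copy back unnecessary.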

[PATCH v4 4/5] powerpc/mm/slice: Allow up to 64 low slices

2018-02-10 Thread Christophe Leroy
While the implementation of the "slices" address space allows a significant amount of high slices, it limits the number of low slices to 16 due to the use of a single u64 low_slices_psize element in struct mm_context_t. On the 8xx, the minimum slice size is the size of the area covered by a single
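The limit of 16 follows from packing one page-size index per slice into a single 64-bit word. The sketch below assumes 4 bits per slice, which is what 16 slices in a u64 implies; the helper names are hypothetical and this is a user-space illustration rather than the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PSIZE_BITS     4                         /* assumed bits per slice */
#define SKETCH_SLICES_PER_U64 (64 / SKETCH_PSIZE_BITS)  /* 16 slices fit in a u64 */

/* Read the page-size index stored for one low slice. */
static unsigned int sketch_get_psize(uint64_t psizes, unsigned int slice)
{
    assert(slice < SKETCH_SLICES_PER_U64);
    return (unsigned int)((psizes >> (slice * SKETCH_PSIZE_BITS)) & 0xf);
}

/* Return a copy of psizes with the index for one low slice replaced. */
static uint64_t sketch_set_psize(uint64_t psizes, unsigned int slice,
                                 unsigned int psize)
{
    assert(slice < SKETCH_SLICES_PER_U64 && psize <= 0xf);
    unsigned int shift = slice * SKETCH_PSIZE_BITS;

    psizes &= ~(UINT64_C(0xf) << shift);          /* clear the old nibble */
    return psizes | ((uint64_t)psize << shift);   /* install the new one  */
}
```

A seventeenth slice has nowhere to go in this layout, so supporting 64 low slices needs a wider container than one u64.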

Re: [PATCH 1/2] powerpc/mm: Fix crashes with PUD level hugetlb config

2018-02-10 Thread Aneesh Kumar K.V
On 02/10/2018 10:20 PM, Ram Pai wrote: On Sat, Feb 10, 2018 at 03:17:02PM +0530, Aneesh Kumar K.V wrote: Aneesh Kumar K.V writes: "Aneesh Kumar K.V" writes: To support memory keys, we moved the hash pte slot information

Re: [RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-10 Thread Sandipan Das
On 02/10/2018 06:08 AM, Alexei Starovoitov wrote: > On 2/9/18 8:54 AM, Naveen N. Rao wrote: >> Naveen N. Rao wrote: >>> Alexei Starovoitov wrote: On 2/8/18 4:03 AM, Sandipan Das wrote: > The imm field of a bpf_insn is a signed 32-bit integer. For > JIT-ed bpf-to-bpf function calls,
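The constraint under discussion is that a signed 32-bit imm can only carry values in the int32_t range. A minimal check for whether a 64-bit offset would fit is sketched below; sketch_fits_in_s32() is a hypothetical helper written for illustration and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 64-bit offset can be stored in a signed 32-bit field
 * without losing information. */
static bool sketch_fits_in_s32(int64_t off)
{
    return off >= INT32_MIN && off <= INT32_MAX;
}
```

Any offset for which this returns false would be truncated if it were stored in a 32-bit imm.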

Re: [PATCH 1/2] powerpc/mm: Fix crashes with PUD level hugetlb config

2018-02-10 Thread Ram Pai
On Sat, Feb 10, 2018 at 03:17:02PM +0530, Aneesh Kumar K.V wrote: > Aneesh Kumar K.V writes: > > > "Aneesh Kumar K.V" writes: > > > >> To support memory keys, we moved the hash pte slot information to the > >> second > >> half

Re: [PATCH v1] PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports

2018-02-10 Thread Bjorn Helgaas
On Sat, Feb 10, 2018 at 09:05:40AM +0100, Christian Zigotzky wrote: > Hi All, > > The AmigaOne X1000 doesn’t boot anymore since the PCI updates. I > have seen, that the PCI updates are different to the updates below. > The code below works but the latest not. Is there a problem with the > latest

[GIT PULL] PCI fixes for v4.16

2018-02-10 Thread Bjorn Helgaas
PCI fixes: - fix POWER9/powernv INTx regression from the merge window (Alexey Kardashevskiy) The following changes since commit ab8c609356fbe8dbcd44df11e884ce8cddf3739e: Merge branch 'pci/spdx' into next (2018-02-01 11:40:07 -0600) are available in the Git repository at: