> This patch series adds OCXL support to the cxlflash driver. With this
> support, new devices using the OCXL transport will be supported by the
> cxlflash driver along with the existing CXL devices. An effort is made
> to keep this transport specific function independent of the existing
> core
Hi Cyril,
On 03/05/2018 08:49 PM, Cyril Bur wrote:
> On Mon, 2018-03-05 at 15:48 -0500, Gustavo Romero wrote:
>> Some processor revisions do not support transactional memory, and
>> additionally kernel support can be disabled. In either case the
>> tm-unavailable test should be skipped, otherwise
On Tue, 6 Mar 2018 16:02:20 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > The number of high slices a process might use now depends on its
> > address space size, and what allocation address it has requested.
> >
> > This patch
On Wed, 21 Feb 2018 10:15:49 -0700 Khalid Aziz wrote:
> A protection flag may not be valid across entire address space and
> hence arch_validate_prot() might need the address a protection bit is
> being set on to ensure it is a valid protection flag. For example, sparc
>
On Tue, 6 Mar 2018 16:12:34 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > This is a tidy up which removes radix MMU calls into the slice
> > code.
> >
> > Signed-off-by: Nicholas Piggin
> > ---
> >
On Tue, Mar 6, 2018 at 4:48 AM, Christian Borntraeger
wrote:
>
>
> On 03/06/2018 01:46 PM, Arnd Bergmann wrote:
>> On Mon, Mar 5, 2018 at 10:30 AM, Christian Borntraeger
>> wrote:
>>> On 01/16/2018 03:18 AM, Deepa Dinamani wrote:
All the
On Tue, 6 Mar 2018 15:41:00 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > +static bool slice_check_range_fits(struct mm_struct *mm,
> > + const struct slice_mask *available,
> > + unsigned
On Tue, 6 Mar 2018 15:55:04 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > @@ -572,11 +555,19 @@ unsigned long slice_get_unmapped_area(unsigned long
> > addr, unsigned long len,
> > #ifdef CONFIG_PPC_64K_PAGES
> > /* If we
On Tue, 6 Mar 2018 15:44:46 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > This converts the slice_mask bit operation helpers to be the usual
> > 3-operand kind, which is clearer to work with.
>
> What's the real benefit of doing
This converts the slice_mask bit operation helpers to be the usual
3-operand kind, which allows two inputs to set a different output
without an extra copy; this is used in the next patch.
Adds slice_copy_mask, which will be used in the next patch.
Signed-off-by: Nicholas Piggin
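A userspace sketch of the 2-operand vs 3-operand helper shapes described above. The struct here is a simplified stand-in for the kernel's slice_mask (which also carries a high-slice bitmap); the names follow the patch, but the bodies are illustrative only:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's slice_mask. */
struct slice_mask {
	unsigned long low_slices;
};

/* 2-operand form: modifies dst in place, so combining two inputs
 * into a third mask requires an extra copy first. */
static void slice_or_2op(struct slice_mask *dst, const struct slice_mask *src)
{
	dst->low_slices |= src->low_slices;
}

/* 3-operand form: two inputs, separate output, no intermediate copy. */
static void slice_or(struct slice_mask *dst, const struct slice_mask *src1,
		     const struct slice_mask *src2)
{
	dst->low_slices = src1->low_slices | src2->low_slices;
}

/* Plain copy helper, as added by the patch. */
static void slice_copy_mask(struct slice_mask *dst,
			    const struct slice_mask *src)
{
	dst->low_slices = src->low_slices;
}
```

With the 2-operand form, `dst = a | b` needs `copy(dst, a); or_2op(dst, b);`; the 3-operand form does it in one call.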
On 13/02/18 16:51, Alexey Kardashevskiy wrote:
> GPUs and the corresponding NVLink bridges get different PEs as they have
> separate translation validation entries (TVEs). We put these PEs to
> the same IOMMU group so they cannot be passed through separately.
> So the
On Monday 05 March 2018 01:51 PM, Naveen N. Rao wrote:
Madhavan Srinivasan wrote:
Sampled Data Address Register (SDAR) is a 64-bit
register that contains the effective address of
the storage operand of an instruction that was
being executed, possibly out-of-order, at or around
the time that
The slice state of an mm gets zeroed then initialised upon exec.
This is the only caller of slice_set_user_psize now, so that can be
removed and instead implement a faster and simplified approach that
requires no locking or checking existing state.
This speeds up vfork+exec+exit performance on
Rather than build slice masks from a range then use that to check for
fit in a candidate mask, implement slice_check_range_fits that checks
if a range fits in a mask directly.
This allows several structures to be removed from stacks, and also we
don't expect a huge range in a lot of these cases,
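The idea of testing a range against a mask directly, instead of building a second mask and comparing, can be sketched like this. Low slices only, with an assumed 256MB slice size as on powerpc; the function name follows the patch but the implementation is illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed slice size: 256MB, i.e. one slice per 2^28 bytes. */
#define SLICE_LOW_SHIFT 28

struct slice_mask {
	unsigned long low_slices;
};

/* Check each slice bit covered by [addr, addr + len) directly against
 * the available mask, rather than constructing a range mask first. */
static bool slice_check_range_fits(const struct slice_mask *available,
				   unsigned long addr, unsigned long len)
{
	unsigned long start = addr >> SLICE_LOW_SHIFT;
	unsigned long end = (addr + len - 1) >> SLICE_LOW_SHIFT;
	unsigned long i;

	for (i = start; i <= end; i++) {
		if (!(available->low_slices & (1ul << i)))
			return false;
	}
	return true;
}
```

Since most ranges span only a few slices, the loop is short in practice, and no temporary slice_mask has to live on the stack.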
This patch increases the max virtual address value to 4PB. With the 4K page size
config we continue to limit ourselves to 64TB.
Signed-off-by: Aneesh Kumar K.V
---
arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +-
arch/powerpc/include/asm/processor.h | 9
On Monday 05 March 2018 11:46 AM, Balbir Singh wrote:
On Sun, Mar 4, 2018 at 10:55 PM, Madhavan Srinivasan
wrote:
The current Branch History Rolling Buffer (BHRB) code does
not check for any privilege levels before updating the data
from BHRB. This leaks kernel
On Wed, 7 Mar 2018 11:37:18 +1000
Nicholas Piggin wrote:
> The number of high slices a process might use now depends on its
> address space size, and what allocation address it has requested.
>
> This patch uses that limit throughout call chains where possible,
> rather than
On 27/02/18 09:20, Uma Krishnan wrote:
When an adapter is initialized, transport specific configuration and MMIO
mapping details need to be saved. For CXL, this data is managed by the
underlying kernel module. To maintain a separation between the cxlflash
core and underlying transports,
On 27/02/18 09:20, Uma Krishnan wrote:
Per the OCXL specification, the underlying host can have multiple AFUs
per function with each function supporting its own configuration. The host
function configuration is read on the initialization path to evaluate the
number of functions present and
/linux/commits/Madhavan-Srinivasan/powerpc-perf-Fix-kernel-address-leaks-via-Sampling-registers/20180306-041036
config: powerpc-pmac32_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget
https://raw.githubusercontent.com/intel/lkp-tests
This is a tidy up which removes radix MMU calls into the slice
code.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/include/asm/hugetlb.h | 7 ---
arch/powerpc/mm/hugetlbpage.c | 6 --
arch/powerpc/mm/slice.c | 17 -
3 files changed,
The number of high slices a process might use now depends on its
address space size, and what allocation address it has requested.
This patch uses that limit throughout call chains where possible,
rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
This saves some cost for processes
We need to zero out the pgd table only if we share the slab cache with the
pud/pmd level caches. With the support of 4PB, we don't share the slab cache
anymore. Instead of removing the code completely, hide it within an #ifdef.
We don't need
to do this with any other page table level, because they all
Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.
checkstack.pl gives, before:
0x0d1c slice_get_unmapped_area [slice.o]: 592
0x1864 is_hugepage_only_range [slice.o]: 448
0x0754
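The stack saving measured above comes from not copying the struct at each call. A minimal illustration, with a made-up struct size standing in for slice_mask:

```c
#include <assert.h>

/* Illustrative stand-in: large enough that by-value passing costs
 * a copy onto the callee's stack frame at every call. */
struct slice_mask {
	unsigned long low_slices;
	unsigned long high_slices[8];
};

/* By value: the whole struct is copied into the callee's frame. */
static int count_slices_by_value(struct slice_mask m)
{
	int n = __builtin_popcountl(m.low_slices);
	for (int i = 0; i < 8; i++)
		n += __builtin_popcountl(m.high_slices[i]);
	return n;
}

/* By const pointer: only a pointer is passed, no copy, and const
 * documents that the callee does not modify the mask. */
static int count_slices_by_ptr(const struct slice_mask *m)
{
	int n = __builtin_popcountl(m->low_slices);
	for (int i = 0; i < 8; i++)
		n += __builtin_popcountl(m->high_slices[i]);
	return n;
}
```

Both return the same result; the difference shows up in stack depth and copy cost, which is what checkstack.pl measures.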
The slice_mask cache was a basic conversion which copied the slice
mask into caller's structures, because that's how the original code
worked. In most cases the pointer can be used directly instead, saving
a copy and an on-stack structure.
On POWER8, this increases vfork+exec+exit performance by
On Tue, Mar 06, 2018 at 04:49:48PM +1100, Russell Currey wrote:
> On Tue, 2018-03-06 at 11:00 +1100, Sam Bobroff wrote:
> > Checking for a "fully active" device state requires testing two flag
> > bits, which is open coded in several places, so add a function to do
> > it.
> >
> > Signed-off-by:
This patch series extends the max virtual address space value from 512TB
to 4PB with 64K page size. We do that by allocating one vsid context for
each 512TB range. More details are explained in patch 3.
Changes from V2:
* Rebased on top of slice_mask series from Nick Piggin
* Fixed
Le 07/03/2018 à 00:12, Nicholas Piggin a écrit :
On Tue, 6 Mar 2018 15:41:00 +0100
Christophe LEROY wrote:
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
+static bool slice_check_range_fits(struct mm_struct *mm,
+ const struct
Make these loops look the same, and change their form so the
important part is not wrapped over so many lines.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/mm/slice.c | 22 --
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git
Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.
On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice
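A rough userspace model of such a per-psize mask cache, assuming one byte per slice for the page size and a handful of page sizes; the real mm_context_t layout and update paths differ:

```c
#include <assert.h>
#include <string.h>

#define NR_SLICES 16
#define NR_PSIZES 4   /* illustrative: e.g. 4K, 64K, 16M, 16G */

struct slice_mask {
	unsigned long low_slices;
};

struct mm_context {
	unsigned char slice_psize[NR_SLICES];     /* authoritative state */
	struct slice_mask mask_cache[NR_PSIZES];  /* derived, kept in sync */
};

/* Rebuild the per-psize masks from the psize array. Called whenever
 * slice_psize changes, so get_unmapped_area can look a mask up
 * instead of recomputing it on every call. */
static void recalc_slice_masks(struct mm_context *ctx)
{
	int i;

	memset(ctx->mask_cache, 0, sizeof(ctx->mask_cache));
	for (i = 0; i < NR_SLICES; i++)
		ctx->mask_cache[ctx->slice_psize[i]].low_slices |= 1ul << i;
}
```

The trade-off is exactly the one the patch quotes: a few hundred bytes added per mm in exchange for removing the mask computation from the hot path.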
On 01/02/18 16:07, Alexey Kardashevskiy wrote:
> Fixes: 912cc87a6 "powerpc/mm/radix: Add LPID based tlb flush helpers"
> Signed-off-by: Alexey Kardashevskiy
Ping?
> ---
> arch/powerpc/mm/tlb-radix.c | 14 +++---
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
>
For addresses above 512TB we allocate an additional mmu context. To keep it
simple, addresses above 512TB are handled with IR/DR=1 and with a stack frame
setup. We do the additional context allocation in the SLB miss handler. If the
context is not allocated, we enable interrupts and allocate the context and
On 27/02/18 09:20, Uma Krishnan wrote:
Add initial infrastructure to support a new cxlflash transport, OCXL.
Claim a dependency on OCXL and add a new file, ocxl_hw.c, which will host
the backend routines that are specific to OCXL.
Signed-off-by: Uma Krishnan
On Wed, 7 Mar 2018 07:12:23 +0100
Christophe LEROY wrote:
> Le 07/03/2018 à 00:12, Nicholas Piggin a écrit :
> > On Tue, 6 Mar 2018 15:41:00 +0100
> > Christophe LEROY wrote:
> >
> >> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> >>>
Overall on POWER8, this series increases vfork+exec+exit
microbenchmark rate by 15.6%, and mmap+munmap rate by 81%. Slice
code/data size is reduced by 1kB, and max stack overhead through
slice_get_unmapped_area call goes from 992 to 448 bytes. The cost is
288 bytes added to the mm_context_t per mm
This code is never compiled in, and it gets broken by the next
patch, so remove it.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/mm/slice.c | 6 --
1 file changed, 6 deletions(-)
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index
* Benjamin Herrenschmidt [2018-03-01 08:40:22]:
> On Thu, 2018-03-01 at 01:03 +0530, Akshay Adiga wrote:
> > commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
> > states via stop API.") uses stop-api provided by the firmware to restore
> > PSSCR. PSSCR
Anju T Sudhakar writes:
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 4437c70..caefb64 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -757,6 +759,9 @@ static int register_cpu_online(unsigned int cpu)
On Mon, 5 Mar 2018 19:46:12 -0300
Mauricio Faria de Oliveira wrote:
> Hi Michael, Michal,
>
> I got back from vacation. Checking this one.
>
> On 02/20/2018 02:06 PM, Michal Suchánek wrote:
> >> I did it the way I did because otherwise we waste memory on every
> >>
Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.
On Book3S/64 this adds 288 bytes to the mm_context_t for the
slice
On Thu, 2018-02-22 at 14:27:20 UTC, Christophe Leroy wrote:
> bitmap_or() and bitmap_andnot() can work properly with dst identical
> to src1 or src2. There is no need of an intermediate result bitmap
> that is copied back to dst in a second step.
>
> Signed-off-by: Christophe Leroy
This converts the slice_mask bit operation helpers to be the usual
3-operand kind, which is clearer to work with.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/mm/slice.c | 38 +++---
1 file changed, 23 insertions(+), 15 deletions(-)
diff
The slice_mask cache was a basic conversion which copied the slice
mask into caller's structures, because that's how the original code
worked. In most cases the pointer can be used directly instead, saving
a copy and an on-stack structure.
On POWER8, this increases vfork+exec+exit performance by
On Thu, 2018-03-01 at 01:02:49 UTC, Kees Cook wrote:
> From: Segher Boessenkool
>
> Newer gcc will support "-mno-readonly-in-sdata"[1], which makes sure that
> the optimization on PPC32 for variables getting moved into the .sdata
> section will not apply to const
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
Rather than build slice masks from a range then use that to check for
fit in a candidate mask, implement slice_check_range_fits that checks
if a range fits in a mask directly.
This allows several structures to be removed from stacks, and also
-leaks-via-Sampling-registers/20180306-041036
config: powerpc-pmac32_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O
~/bin/make.cross
chmod +x ~/bin
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
This is a tidy up which removes radix MMU calls into the slice
code.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/include/asm/hugetlb.h | 9 ++---
arch/powerpc/mm/hugetlbpage.c | 5 +++--
The number of high slices a process might use now depends on its
address space size, and what allocation address it has requested.
This patch uses that limit throughout call chains where possible,
rather than use the fixed SLICE_NUM_HIGH for bitmap operations.
This saves some cost for processes
This is a tidy up which removes radix MMU calls into the slice
code.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/include/asm/hugetlb.h | 9 ++---
arch/powerpc/mm/hugetlbpage.c | 5 +++--
arch/powerpc/mm/slice.c | 17 -
3 files
On Tue, 2018-03-06 at 08:14:32 UTC, Bharata B Rao wrote:
> With ibm,dynamic-memory-v2 and ibm,drc-info coming around the
> same time, byte22 in vector5 of ibm architecture vector table
> got set twice separately. The end result is that guest kernel
> isn't advertising support for
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.
checkstack.pl gives, before:
0x0d1c slice_get_unmapped_area [slice.o]: 592
0x1864
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.
On Book3S/64
On Tue, 6 Mar 2018 14:43:40 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > @@ -147,7 +149,8 @@ static void slice_mask_for_free(struct mm_struct *mm,
> > struct slice_mask *ret,
> > __set_bit(i,
On Tue, 6 Mar 2018 14:49:57 +0100
Christophe LEROY wrote:
> Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
> > @@ -201,6 +206,15 @@ typedef struct {
> > unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
> > unsigned char high_slices_psize[0];
> > unsigned
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
This converts the slice_mask bit operation helpers to be the usual
3-operand kind, which is clearer to work with.
What's the real benefit of doing that?
If it helps a subsequent patch, say so.
Otherwise, I really can't see the point.
Rather than build slice masks from a range then use that to check for
fit in a candidate mask, implement slice_check_range_fits that checks
if a range fits in a mask directly.
This allows several structures to be removed from stacks, and also we
don't expect a huge range in a lot of these cases,
On Tue, 6 Mar 2018 23:24:59 +1000
Nicholas Piggin wrote:
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c
> b/arch/powerpc/mm/mmu_context_book3s64.c
> index 929d9ef7083f..80acad52b006 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++
Le 06/03/2018 à 14:25, Nicholas Piggin a écrit :
The number of high slices a process might use now depends on its
address space size, and what allocation address it has requested.
This patch uses that limit throughout call chains where possible,
rather than use the fixed SLICE_NUM_HIGH for
Le 02/03/2018 à 10:56, Philippe Bergheaud a écrit :
P9 supports PCI tunneled operations (atomics and as_notify). This
patch adds support for tunneled operations on powernv, with a new
API, to be called by device drivers:
pnv_pci_enable_tunnel()
Enable tunnel operations, tell driver the
Le 02/03/2018 à 10:56, Philippe Bergheaud a écrit :
Configure the P9 XSL_DSNCTL register with PHB indications found
in the device tree, or else use legacy hard-coded values.
Signed-off-by: Philippe Bergheaud
Reviewed-by: Frederic Barrat
With ibm,dynamic-memory-v2 and ibm,drc-info coming around the
same time, byte22 in vector5 of ibm architecture vector table
got set twice separately. The end result is that the guest kernel
isn't advertising support for ibm,dynamic-memory-v2.
Fix this by removing the duplicate assignment of byte22.
On Tue, 06 Mar 2018 23:15:45 +1100
Michael Ellerman wrote:
> Mauricio Faria de Oliveira writes:
>
> > Hi Michael, Michal,
> >
> > I got back from vacation. Checking this one.
>
> Yeah it got stuck in a rut.
>
> > On 02/20/2018 02:06 PM,
Since this was last posted, it's been ported on top of Christophe's
8xx slice implementation that is merged in powerpc next, and takes
into account some feedback and bug fixes from Aneesh and Christophe --
thanks.
A few significant changes, first is refactoring slice_set_user_psize,
which makes it
Mauricio Faria de Oliveira writes:
> Hi Michael, Michal,
>
> I got back from vacation. Checking this one.
Yeah it got stuck in a rut.
> On 02/20/2018 02:06 PM, Michal Suchánek wrote:
>>> I did it the way I did because otherwise we waste memory on every
>>> system
On Mon, Mar 5, 2018 at 10:30 AM, Christian Borntraeger
wrote:
> On 01/16/2018 03:18 AM, Deepa Dinamani wrote:
>> All the current architecture specific defines for these
>> are the same. Refactor these common defines to a common
>> header file.
>>
>> The new common
Signed-off-by: Nicholas Piggin
---
.../selftests/powerpc/benchmarks/.gitignore| 2 +
.../testing/selftests/powerpc/benchmarks/Makefile | 8 +-
.../selftests/powerpc/benchmarks/exec_target.c | 5 +
tools/testing/selftests/powerpc/benchmarks/fork.c | 339
The slice state of an mm gets zeroed then initialised upon exec.
This is the only caller of slice_set_user_psize now, so that can be
removed and instead implement a faster and simplified approach that
requires no locking or checking existing state.
This speeds up vfork+exec+exit performance on
Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.
checkstack.pl gives, before:
0x0d1c slice_get_unmapped_area [slice.o]: 592
0x1864 is_hugepage_only_range [slice.o]: 448
0x0754
On 03/06/2018 01:46 PM, Arnd Bergmann wrote:
> On Mon, Mar 5, 2018 at 10:30 AM, Christian Borntraeger
> wrote:
>> On 01/16/2018 03:18 AM, Deepa Dinamani wrote:
>>> All the current architecture specific defines for these
>>> are the same. Refactor these common defines to
Make these loops look the same, and change their form so the
important part is not wrapped over so many lines.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/mm/slice.c | 22 --
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git