Re: [PATCH v2] mm/slub: fix panic in slab_alloc_node()

2020-10-28 Thread Christopher Lameter
On Tue, 27 Oct 2020, Laurent Dufour wrote: > The issue is that object is not NULL while page is NULL which is odd but > may happen if the cache flush happened after loading object but before > loading page. Thus checking for the page pointer is required too. Ok then let's revert commit
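
A minimal sketch of the kind of combined check under discussion (not the actual SLUB fastpath; variable names follow the quoted context):

    /*
     * Sketch only: both per-cpu loads must be validated before use,
     * because a cache flush between the two loads can leave
     * object != NULL while page == NULL.
     */
    object = c->freelist;
    page = c->page;
    if (unlikely(!object || !page || !node_match(page, node)))
            object = __slab_alloc(s, gfpflags, node, addr, c);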

Re: [PATCH v3 0/3] Actually fix freelist pointer vs redzoning

2020-10-15 Thread Christopher Lameter
On Wed, 14 Oct 2020, Kees Cook wrote: > Note on patch 2: Christopher NAKed it, but I actually think this is a > reasonable thing to add -- the "too small" check is only made when built > with CONFIG_DEBUG_VM, so it *is* actually possible for someone to trip > over this directly, even if it would

Re: [PATCH] mm: Make allocator take care of memoryless numa node

2020-10-12 Thread Christopher Lameter
On Mon, 12 Oct 2020, Xianting Tian wrote: > In architectures like powerpc, we can have cpus without any local memory > attached to them. In such cases the node does not have real memory. > > In many places of current kernel code, it doesn't check whether the node is > a memoryless NUMA node before

Re: [PATCH v2 2/3] mm/slub: Fix redzoning for small allocations

2020-10-12 Thread Christopher Lameter
On Fri, 9 Oct 2020, Kees Cook wrote: > Store the freelist pointer out of line when object_size is smaller than > sizeof(void *) and redzoning is enabled. > > (Note that no caches with such a size are known to exist in the kernel > currently.) Ummm... The smallest allowable cache size is

Re: [RFC][PATCH 03/12] mm/vmscan: replace implicit RECLAIM_ZONE checks with explicit checks

2020-10-07 Thread Christopher Lameter
On Tue, 6 Oct 2020, Dave Hansen wrote: > These zero checks are not great because it is not obvious what a zero > mode *means* in the code. Replace them with a helper which makes it > more obvious: node_reclaim_enabled(). Well it uselessly checks bits. But whatever. It will prevent future code
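
Based on the quoted description, the helper presumably looks something like this (a sketch; flag names per the rest of the series):

    static inline bool node_reclaim_enabled(void)
    {
            /* Is any node_reclaim_mode bit set? */
            return node_reclaim_mode & (RECLAIM_ZONE | RECLAIM_WRITE | RECLAIM_UNMAP);
    }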

Re: [RFC][PATCH 01/12] mm/vmscan: restore zone_reclaim_mode ABI

2020-10-07 Thread Christopher Lameter
On Tue, 6 Oct 2020, Dave Hansen wrote: > But, when the bit was removed (bit 0) the _other_ bit locations also > got changed. That's not OK because the bit values are documented to > mean one specific thing and users surely rely on them meaning that one > thing and not changing from kernel to

Re: [RFC][PATCH 02/12] mm/vmscan: move RECLAIM* bits to uapi header

2020-10-07 Thread Christopher Lameter
On Tue, 6 Oct 2020, Dave Hansen wrote: > It is currently not obvious that the RECLAIM_* bits are part of the > uapi since they are defined in vmscan.c. Move them to a uapi header > to make it obvious. Acked-by: Christoph Lameter
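
For reference, the bits being moved are a small set of documented flag values; a sketch of their definitions (exact comments assumed):

    /* These bit values are part of the documented sysctl ABI. */
    #define RECLAIM_ZONE    (1 << 0)   /* Run shrink_inactive_list on the node */
    #define RECLAIM_WRITE   (1 << 1)   /* Writeout pages during reclaim */
    #define RECLAIM_UNMAP   (1 << 2)   /* Unmap pages during reclaim */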

Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-10-06 Thread Christopher Lameter
On Tue, 6 Oct 2020, Matthew Wilcox wrote: > On Tue, Oct 06, 2020 at 12:56:33AM +0200, Jann Horn wrote: > > It seems to me like, if you want to make UAF exploitation harder at > > the heap allocator layer, you could do somewhat more effective things > > with a probably much smaller performance

Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-10-06 Thread Christopher Lameter
On Mon, 5 Oct 2020, Kees Cook wrote: > > TYPESAFE_BY_RCU, but if forcing that on by default would enhance security > > by a measurable amount, it wouldn't be a terribly hard sell ... > > Isn't the "easy" version of this already controlled by slab_merge? (i.e. > do not share same-sized/flagged

Re: [PATCH v4 00/10] Independent per-CPU data section for nVHE

2020-09-24 Thread Christopher Lameter
On Tue, 22 Sep 2020, David Brazdil wrote: > Introduce '.hyp.data..percpu' as part of ongoing effort to make nVHE > hyp code self-contained and independent of the rest of the kernel. The percpu subsystem's point is to enable the use of special hardware instructions that can perform address

Re: [PATCH v2 05/10] mm, kfence: insert KFENCE hooks for SLUB

2020-09-17 Thread Christopher Lameter
On Tue, 15 Sep 2020, Marco Elver wrote: > void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags) > { > - void *ret = slab_alloc(s, gfpflags, _RET_IP_); > + void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size); The additional size parameter is a part of a struct

Re: [PATCH v2 04/10] mm, kfence: insert KFENCE hooks for SLAB

2020-09-17 Thread Christopher Lameter
On Tue, 15 Sep 2020, Marco Elver wrote: > @@ -3206,7 +3207,7 @@ static void *cache_alloc_node(struct kmem_cache > *cachep, gfp_t flags, > } > > static __always_inline void * > -slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, > +slab_alloc_node(struct kmem_cache

Re: [PATCH] mm/slub: branch optimization in free slowpath

2020-08-13 Thread Christopher Lameter
On Thu, 13 Aug 2020, wuyun...@huawei.com wrote: > The two conditions are mutually exclusive and gcc compiler will > optimise this into if-else-like pattern. Given that the majority > of free_slowpath is free_frozen, let's provide some hint to the > compilers. Acked-by: Christoph Lameter
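
The change amounts to making the two mutually exclusive tests an explicit if/else with a branch hint; a hedged sketch of the pattern in __slab_free():

    /* Sketch: was_frozen and new.frozen-just-set exclude each other,
     * and the frozen case dominates, so hint the compiler. */
    if (likely(was_frozen)) {
            stat(s, FREE_FROZEN);
    } else if (new.frozen) {
            /* slab just became frozen: queue it on the per-cpu partial list */
            put_cpu_partial(s, page, 1);
            stat(s, CPU_PARTIAL_FREE);
    }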

Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-08-11 Thread Christopher Lameter
On Fri, 7 Aug 2020, Pekka Enberg wrote: > Why do you consider this to be a fast path? This is all partial list > accounting when we allocate/deallocate a slab, no? Just like > ___slab_alloc() says, I assumed this to be the slow path... What am I > missing? I thought these were per object

Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-08-07 Thread Christopher Lameter
On Fri, 7 Aug 2020, Pekka Enberg wrote: > I think we can just default to the counters. After all, if I > understood correctly, we're talking about up to 100 ms time period > with IRQs disabled when count_partial() is called. As this is > triggerable from user space, that's a performance bug

Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-07-09 Thread Christopher Lameter
On Tue, 7 Jul 2020, Pekka Enberg wrote: > On Fri, Jul 3, 2020 at 12:38 PM xunlei wrote: > > > > On 2020/7/2 PM 7:59, Pekka Enberg wrote: > > > On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang > > > wrote: > > >> The node list_lock in count_partial() spend long time iterating > > >> in case of large

Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial objects

2020-07-07 Thread Christopher Lameter
On Thu, 2 Jul 2020, Xunlei Pang wrote: > This patch introduces two counters to maintain the actual number > of partial objects dynamically instead of iterating the partial > page lists with list_lock held. > > New counters of kmem_cache_node are: pfree_objects, ptotal_objects. > The main
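
A sketch of the proposed bookkeeping (field names from the quoted patch; types and placement are assumptions):

    struct kmem_cache_node {
            spinlock_t list_lock;
            unsigned long nr_partial;
            struct list_head partial;
            /* proposed: let count_partial() read counters instead of
             * walking the whole partial list under list_lock */
            atomic_long_t pfree_objects;    /* free objects in partial slabs */
            atomic_long_t ptotal_objects;   /* total objects in partial slabs */
    };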

Re: [PATCH v1] mm:free unused pages in kmalloc_order

2020-07-01 Thread Christopher Lameter
On Mon, 29 Jun 2020, Matthew Wilcox wrote: > Sounds like we need a test somewhere that checks this behaviour. > > > In order to make such allocations possible one would have to create yet > > another kmalloc array for high memory. > > Not for this case because it goes straight to kmalloc_order().

Re: [PATCH v1] mm:free unused pages in kmalloc_order

2020-06-29 Thread Christopher Lameter
On Mon, 29 Jun 2020, Matthew Wilcox wrote: > Slab used to disallow GFP_HIGHMEM allocations earlier than this, It is still not allowed and not supported.

Re: [PATCH v1] mm:free unused pages in kmalloc_order

2020-06-29 Thread Christopher Lameter
On Sat, 27 Jun 2020, Long Li wrote: > In an environment using the slub allocator, with 1G memory on my ARM32 system, > kmalloc(1024, GFP_HIGHUSER) can allocate memory normally, but > kmalloc(64*1024, GFP_HIGHUSER) will cause a memory leak, because > alloc_pages returns highmem physical pages, but it cannot be directly
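
The underlying problem, in a hedged illustrative fragment: a large kmalloc() goes straight to the page allocator, and a highmem page has no direct kernel mapping for page_address() to return:

    /* Illustration only; 32-bit with highmem assumed. */
    struct page *page = alloc_pages(GFP_HIGHUSER, order);
    void *addr = page ? page_address(page) : NULL;
    /* addr is NULL when the page came from highmem, so no usable
     * pointer can be handed back to the kmalloc() caller */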

Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-06-29 Thread Christopher Lameter
On Wed, 24 Jun 2020, Srikar Dronamraju wrote: > Currently Linux kernel with CONFIG_NUMA on a system with multiple > possible nodes, marks node 0 as online at boot. However in practice, > there are systems which have node 0 as memoryless and cpuless. Maybe add something to explain why you are

Re: [PATCH] mm: ksize() should silently accept a NULL pointer

2020-06-17 Thread Christopher Lameter
On Tue, 16 Jun 2020, William Kucharski wrote: > Other mm routines such as kfree() and kzfree() silently do the right > thing if passed a NULL pointer, so ksize() should do the same. Ok so the size of a nonexistent object is zero? Ignoring the freeing of a nonexistent object makes sense. But
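
A sketch of the proposed behavior, mirroring kfree()'s silent acceptance of NULL (the real ksize() also deals with KASAN instrumentation, omitted here):

    size_t ksize(const void *objp)
    {
            /* proposed: a nonexistent object has size zero; this also
             * covers the ZERO_SIZE_PTR returned for kmalloc(0) */
            if (unlikely(ZERO_OR_NULL_PTR(objp)))
                    return 0;
            return __ksize(objp);
    }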

Re: [PATCH 0/3] mm/slub: Fix slabs_node return value

2020-06-17 Thread Christopher Lameter
On Sun, 14 Jun 2020, Muchun Song wrote: > slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled. > But some code determines whether a slab is empty by checking the return > value of slabs_node(). As you know, the result is not correct. We move > the nr_slabs of kmem_cache_node out

Re: [PATCH v3 04/19] mm: slub: implement SLUB version of obj_to_index()

2020-05-15 Thread Christopher Lameter
On Tue, 12 May 2020, Roman Gushchin wrote: > > Add it to the metadata at the end of the object. Like the debugging > > information or the pointer for RCU freeing. > > Enabling debugging metadata currently disables the cache merging. > I doubt that it's acceptable to sacrifice the cache merging in

Re: [PATCH v4 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-05-12 Thread Christopher Lameter
On Tue, 12 May 2020, Srikar Dronamraju wrote: > +#ifdef CONFIG_NUMA > + [N_ONLINE] = NODE_MASK_NONE, Again. Same issue as before. If you do this then you do a global change for all architectures. You need to put something in the early boot sequence (in a non architecture specific way) that

Re: [PATCH v3 04/19] mm: slub: implement SLUB version of obj_to_index()

2020-05-08 Thread Christopher Lameter
On Mon, 4 May 2020, Roman Gushchin wrote: > On Sat, May 02, 2020 at 11:54:09PM +, Christoph Lameter wrote: > > On Thu, 30 Apr 2020, Roman Gushchin wrote: > > > > > Sorry, but what exactly do you mean? > > > > I think the right approach is to add a pointer to each slab object for > > memcg

Re: [PATCH] slub: limit count of partial slabs scanned to gather statistics

2020-05-07 Thread Christopher Lameter
On Mon, 4 May 2020, Andrew Morton wrote: > But I guess it's better than nothing at all, unless there are > alternative ideas? It is highly unusual to have such large partial lists. In a typical case allocations would reduce the size of the lists. 1000s? That is scary. Are there inodes or

Re: [PATCH] mm: slub: add panic_on_error to the debug facilities

2020-05-07 Thread Christopher Lameter
On Sun, 3 May 2020, Rafael Aquini wrote: > On Sat, May 02, 2020 at 11:16:30PM +0000, Christopher Lameter wrote: > > On Fri, 1 May 2020, Rafael Aquini wrote: > > > > > Sometimes it is desirable to override SLUB's debug facilities > > > default behavior upon stu

Re: [PATCH v3 04/19] mm: slub: implement SLUB version of obj_to_index()

2020-05-02 Thread Christopher Lameter
On Thu, 30 Apr 2020, Roman Gushchin wrote: > Sorry, but what exactly do you mean? I think the right approach is to add a pointer to each slab object for memcg support.

Re: [PATCH] mm: slub: add panic_on_error to the debug facilities

2020-05-02 Thread Christopher Lameter
On Fri, 1 May 2020, Rafael Aquini wrote: > Sometimes it is desirable to override SLUB's debug facilities > default behavior upon stumbling on a cache or object error > and just stop the execution in order to grab a coredump, at > the error-spotting time, instead of trying to fix the issue > and

Re: [PATCH v3 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-05-02 Thread Christopher Lameter
On Fri, 1 May 2020, Srikar Dronamraju wrote: > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy); > */ > nodemask_t node_states[NR_NODE_STATES] __read_mostly = { > [N_POSSIBLE] = NODE_MASK_ALL, > +#ifdef CONFIG_NUMA > + [N_ONLINE] =

Re: [PATCH v3 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-05-02 Thread Christopher Lameter
On Fri, 1 May 2020, Srikar Dronamraju wrote: > - for_each_present_cpu(cpu) > - numa_setup_cpu(cpu); > + for_each_possible_cpu(cpu) { > + /* > + * Powerpc with CONFIG_NUMA always used to have a node 0, > + * even if it was memoryless or

Re: [PATCH v3 04/19] mm: slub: implement SLUB version of obj_to_index()

2020-04-30 Thread Christopher Lameter
On Mon, 27 Apr 2020, Roman Gushchin wrote: > > Why do you need this? Just slap a pointer to the cgroup as additional > > metadata onto the slab object. Is that not much simpler, safer and faster? > > > > So, the problem is that not all slab objects are accounted, and sometimes > we don't know if

Re: [PATCH 02/16] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat

2019-10-21 Thread Christopher Lameter
On Mon, 21 Oct 2019, Roman Gushchin wrote: > So far I haven't noticed any regression on the set of workloads where I did > test > the patchset, but if you know any benchmark or realistic test which can be > affected > by this check, I'll be happy to try. > > Also, less-than-word-sized operations

Re: [PATCH 02/16] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat

2019-10-20 Thread Christopher Lameter
On Thu, 17 Oct 2019, Roman Gushchin wrote: > Currently s8 type is used for per-cpu caching of per-node statistics. > It works fine because the overfill threshold can't exceed 125. > > But if some counters are in bytes (and the next commit in the series > will convert slab counters to bytes), it's
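
The widening under discussion, sketched on the structure (only the diff array changes; the threshold stays s8):

    struct per_cpu_nodestat {
            s8 stat_threshold;
            /* was s8: byte-valued counters can exceed 127 between
             * flushes even when the item threshold is not exceeded */
            s32 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
    };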

Re: [PATCH 02/16] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat

2019-10-20 Thread Christopher Lameter
On Thu, 17 Oct 2019, Roman Gushchin wrote: > But if some counters are in bytes (and the next commit in the series > will convert slab counters to bytes), it's not gonna work: > value in bytes can easily exceed s8 without exceeding the threshold > converted to bytes. So to avoid overfilling

Re: [PATCH 25/34] mm: Use CONFIG_PREEMPTION

2019-10-16 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH] mm, page_alloc: drop pointless static qualifier in build_zonelists()

2019-10-08 Thread Christopher Lameter
On Sat, 28 Sep 2019, Kaitao Cheng wrote: > There is no need to make the 'node_order' variable static > since a new value is always assigned before it is used. In the past MAX_NUMNODES could become quite large like 512 or 1k. Large array allocations on the stack are problematic. Maybe that is no

Re: [PATCH v5 0/7] mm, slab: Make kmalloc_info[] contain all types of names

2019-09-16 Thread Christopher Lameter
On Mon, 16 Sep 2019, Pengfei Li wrote: > The name of KMALLOC_NORMAL is contained in kmalloc_info[].name, > but the names of KMALLOC_RECLAIM and KMALLOC_DMA are dynamically > generated by kmalloc_cache_name(). > > Patch1 predefines the names of all types of kmalloc to save > the time spent

Re: [PATCH v5 7/7] mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]

2019-09-16 Thread Christopher Lameter
On Mon, 16 Sep 2019, Pengfei Li wrote: > KMALLOC_NORMAL is the most frequently accessed, and kmalloc_caches[] > is initialized by different types of the same size. > > So modifying kmalloc_caches[type][idx] to kmalloc_caches[idx][type] > will benefit performance. Why would that increase
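
The reordering in question, sketched at the declaration (a hedged sketch; the claimed benefit is locality for the common KMALLOC_NORMAL lookups):

    /* before: type-major layout */
    /* extern struct kmem_cache *kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; */

    /* after: size-major layout, so the kmalloc_caches[idx][KMALLOC_NORMAL]
     * entries for nearby sizes land on nearby cachelines */
    extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1][NR_KMALLOC_TYPES];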

Re: [PATCH v2 4/4] mm: lock slub page when listing objects

2019-09-13 Thread Christopher Lameter
On Wed, 11 Sep 2019, Yu Zhao wrote: > Though I have no idea what the side effect of such race would be, > apparently we want to prevent the free list from being changed > while debugging the objects. process_slab() is called under the list_lock which prevents any allocation from the free list in

Re: [PATCH 0/5] mm, slab: Make kmalloc_info[] contain all types of names

2019-09-04 Thread Christopher Lameter
On Wed, 4 Sep 2019, Pengfei Li wrote: > There are three types of kmalloc, KMALLOC_NORMAL, KMALLOC_RECLAIM > and KMALLOC_DMA. I only got a few patches of this set. Can I see the complete patchset somewhere?

Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

2019-09-03 Thread Christopher Lameter
On Sat, 31 Aug 2019, Matthew Wilcox wrote: > > The current behavior without special alignment for these caches has been > > in the wild for over a decade. And this is now coming up? > > In the wild ... and rarely enabled. When it is enabled, it may or may > not be noticed as data corruption, or

Re: [PATCH v2 2/2] mm, slab: Show last shrink time in us when slab/shrink is read

2019-07-18 Thread Christopher Lameter
On Wed, 17 Jul 2019, Waiman Long wrote: > The show method of /sys/kernel/slab//shrink sysfs file currently > returns nothing. This is now modified to show the time of the last > cache shrink operation in us. What is this useful for? Any use cases? > CONFIG_SLUB_DEBUG depends on CONFIG_SYSFS. So

Re: [PATCH v2 1/2] mm, slab: Extend slab/shrink to shrink all memcg caches

2019-07-18 Thread Christopher Lameter
On Wed, 17 Jul 2019, Waiman Long wrote: > Currently, a value of '1' is written to /sys/kernel/slab//shrink > file to shrink the slab by flushing out all the per-cpu slabs and free > slabs in partial lists. This can be useful to squeeze out a bit more memory > under extreme condition as well as

Re: [PATCH v5 4/5] mm/slab: Refactor common ksize KASAN logic into slab_common.c

2019-07-08 Thread Christopher Lameter
On Mon, 8 Jul 2019, Marco Elver wrote: > This refactors common code of ksize() between the various allocators > into slab_common.c: __ksize() is the allocator-specific implementation > without instrumentation, whereas ksize() includes the required KASAN > logic. Acked-by: Christoph Lameter

Re: [PATCH] mm/slab: One function call less in verify_redzone_free()

2019-07-05 Thread Christopher Lameter
On Fri, 5 Jul 2019, Markus Elfring wrote: > Avoid an extra function call by using a ternary operator instead of > a conditional statement for a string literal selection. Well. I thought the compiler does that on its own? And the ternary operator makes the code difficult to read.
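
The transformation in question is tiny; an illustrative example with hypothetical names (a modern compiler typically emits identical code for both forms):

    const char *label;

    /* conditional statement */
    if (at_end)
            label = "end";
    else
            label = "start";

    /* ternary form proposed in the patch */
    label = at_end ? "end" : "start";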

Re: [PATCH] mm, slab: Extend slab/shrink to shrink all the memcg caches

2019-07-03 Thread Christopher Lameter
On Wed, 3 Jul 2019, Waiman Long wrote: > On 7/3/19 2:56 AM, Michal Hocko wrote: > > On Tue 02-07-19 14:37:30, Waiman Long wrote: > >> Currently, a value of '1' is written to /sys/kernel/slab//shrink > >> file to shrink the slab by flushing all the per-cpu slabs and free > >> slabs in partial

Re: [PATCH 2/2] mm, slab: Extend vm/drop_caches to shrink kmem slabs

2019-06-28 Thread Christopher Lameter
On Thu, 27 Jun 2019, Roman Gushchin wrote: > so that objects belonging to different memory cgroups can share the same page > and kmem_caches. > > It's a fairly big change though. Could this be done at another level? Put a cgroup pointer into the corresponding structures and then go back to just a

Re: [PATCH v1 4/4] mm: introduce MADV_PAGEOUT

2019-06-04 Thread Christopher Lameter
On Mon, 3 Jun 2019, Minchan Kim wrote: > @@ -415,6 +416,128 @@ static long madvise_cold(struct vm_area_struct *vma, > return 0; > } > > +static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, > + unsigned long end, struct mm_walk *walk) > +{ > +

Re: [PATCH] slab: remove /proc/slab_allocators

2019-05-21 Thread Christopher Lameter
On Thu, 16 May 2019, Qian Cai wrote: > It turned out that DEBUG_SLAB_LEAK is still broken even after recent > rescue efforts that when there is a large number of objects like > kmemleak_object which is normal on a debug kernel, Acked-by: Christoph Lameter

Re: [PATCH v4 5/7] mm: rework non-root kmem_cache lifecycle management

2019-05-15 Thread Christopher Lameter
On Tue, 14 May 2019, Roman Gushchin wrote: > To make this possible we need to introduce a new percpu refcounter > for non-root kmem_caches. The counter is initialized to the percpu > mode, and is switched to atomic mode after deactivation, so we never > shutdown an active cache. The counter is

Re: [PATCH v3 4/7] mm: unify SLAB and SLUB page accounting

2019-05-13 Thread Christopher Lameter
On Wed, 8 May 2019, Roman Gushchin wrote: > Currently the page accounting code is duplicated in SLAB and SLUB > internals. Let's move it into new (un)charge_slab_page helpers > in the slab_common.c file. These helpers will be responsible > for statistics (global and memcg-aware) and memcg

Re: DISCONTIGMEM is deprecated

2019-04-30 Thread Christopher Lameter
On Mon, 29 Apr 2019, Christoph Hellwig wrote: > So maybe it is time to mark SN2 broken and see if anyone screams? > > Without SN2 the whole machvec mess could basically go away - the > only real difference between the remaining machvecs is which iommu > if any we set up. SPARSEMEM with VMEMMAP

Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-25 Thread Christopher Lameter
On Wed, 24 Apr 2019, Matthew Garrett wrote: > Applications that hold secrets and wish to avoid them leaking can use > mlock() to prevent the page from being pushed out to swap and > MADV_DONTDUMP to prevent it from being included in core dumps. Applications > can also use atexit() handlers to

Re: DISCONTIGMEM is deprecated

2019-04-22 Thread Christopher Lameter
On Fri, 19 Apr 2019, Matthew Wilcox wrote: > ia64 (looks complicated ...) Well as far as I can tell it was not even used 12 or so years ago on Itanium when I worked on that stuff.

Re: [PATCH 4/5] mm: rework non-root kmem_cache lifecycle management

2019-04-18 Thread Christopher Lameter
On Wed, 17 Apr 2019, Roman Gushchin wrote: > static __always_inline int memcg_charge_slab(struct page *page, >gfp_t gfp, int order, >struct kmem_cache *s) > { > - if (is_root_cache(s)) > + int idx =

Re: [PATCH 4/5] mm: rework non-root kmem_cache lifecycle management

2019-04-18 Thread Christopher Lameter
On Wed, 17 Apr 2019, Roman Gushchin wrote: > Let's make every page to hold a reference to the kmem_cache (we > already have a stable pointer), and make kmem_caches to hold a single > reference to the memory cgroup. Ok you are freeing one word in the page struct that can be used for other

Re: [External] Re: Basics : Memory Configuration

2019-04-10 Thread Christopher Lameter
Please respond to my comments in the way that everyone else communicates here. I cannot distinguish what you said from what I said before.

Re: Basics : Memory Configuration

2019-04-09 Thread Christopher Lameter
On Tue, 9 Apr 2019, Pankaj Suryawanshi wrote: > I am confused about memory configuration and I have the below questions Hmmm... Yes some of the terminology that you use is a bit confusing. > 1. if 32-bit os maximum virtual address is 4GB, When i have 4 gb of ram > for 32-bit os, What about the

Re: [PATCH] slab: fix a crash by reading /proc/slab_allocators

2019-04-08 Thread Christopher Lameter
On Sun, 7 Apr 2019, Linus Torvalds wrote: > On Sat, Apr 6, 2019 at 12:59 PM Qian Cai wrote: > > > > The commit 510ded33e075 ("slab: implement slab_root_caches list") > > changes the name of the list node within "struct kmem_cache" from > > "list" to "root_caches_node", but leaks_show() still use

Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

2019-04-04 Thread Christopher Lameter
On Wed, 13 Mar 2019, Barret Rhoden wrote: > > It is very expensive. VSMP exchanges 4K segments via RDMA between servers > > to build a large address space and run a kernel in the large address > > space. Using smaller segments can cause a lot of > > "cacheline" bouncing (meaning transfers of 4K

Re: [RFC 2/2] mm, slub: add missing kmem_cache_debug() checks

2019-04-04 Thread Christopher Lameter
On Thu, 4 Apr 2019, Vlastimil Babka wrote: > Some debugging checks in SLUB are not hidden behind kmem_cache_debug() check. > Add the check so that those places can also benefit from reduced overhead > thanks to the static key added by the previous patch. Hmmm... I would not expect too much

Re: [RFC 0/2] add static key for slub_debug

2019-04-04 Thread Christopher Lameter
On Thu, 4 Apr 2019, Vlastimil Babka wrote: > I looked a bit at SLUB debugging capabilities and first thing I noticed is > there's no static key guarding the runtime enablement as is common for similar > debugging functionalities, so here's a RFC to add it. Can be further improved > if there's

Re: [RFC PATCH v2 14/14] dcache: Implement object migration

2019-04-04 Thread Christopher Lameter
On Wed, 3 Apr 2019, Al Viro wrote: > > This is an RFC and we want to know how to do this right. > > If by "how to do it right" you mean "expedite kicking out something with > non-zero refcount" - there's no way to do that. Nothing even remotely > sane. Sure we know that. > If you mean "kick out

Re: [RFC PATCH v2 14/14] dcache: Implement object migration

2019-04-03 Thread Christopher Lameter
On Wed, 3 Apr 2019, Al Viro wrote: > Let's do d_invalidate() on random dentries and hope they go away. > With convoluted and brittle logics for deciding which ones to > spare, which is actually wrong. This will pick mountpoints > and tear them out, to start with. > > NAKed-by: Al Viro > > And

Re: [PATCH v5 6/7] slab: Use slab_list instead of lru

2019-04-03 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH v5 3/7] slob: Use slab_list instead of lru

2019-04-03 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH v5 1/7] list: Add function list_rotate_to_front()

2019-04-03 Thread Christopher Lameter
On Wed, 3 Apr 2019, Tobin C. Harding wrote: > Add function list_rotate_to_front() to rotate a list until the specified > item is at the front of the list. Reviewed-by: Christoph Lameter
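
A sketch consistent with the quoted description (the whole helper presumably reduces to one existing list primitive):

    static inline void list_rotate_to_front(struct list_head *list,
                                            struct list_head *head)
    {
            /* Deleting @head and re-inserting it just before @list
             * rotates the circular list so @list becomes the first
             * entry after @head. */
            list_move_tail(head, list);
    }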

Re: [PATCH v5 2/7] slob: Respect list_head abstraction layer

2019-04-03 Thread Christopher Lameter
On Wed, 3 Apr 2019, Tobin C. Harding wrote: > Currently we reach inside the list_head. This is a violation of the > layer of abstraction provided by the list_head. It makes the code > fragile. More importantly it makes the code wicked hard to understand. Great! It definitely makes it

Re: [PATCH v3] kmemleak: survive in a low-memory situation

2019-03-26 Thread Christopher Lameter
On Tue, 26 Mar 2019, Qian Cai wrote: > + if (!object) { > + /* > + * The tracked memory was allocated successfully, if the kmemleak > + * object failed to allocate for some reason, it ends up with > + * the whole kmemleak disabled, so let it

Re: [PATCH 2/4] signal: Make flush_sigqueue() use free_q to release memory

2019-03-25 Thread Christopher Lameter
On Mon, 25 Mar 2019, Matthew Wilcox wrote: > Options: > > 1. Dispense with this optimisation and always store the size of the > object before the object. I think that's how SLOB handled it at some point in the past. Let's go back to that setup so it's compatible with the other allocators?

Re: [PATCH 2/4] signal: Make flush_sigqueue() use free_q to release memory

2019-03-25 Thread Christopher Lameter
On Fri, 22 Mar 2019, Matthew Wilcox wrote: > On Fri, Mar 22, 2019 at 07:39:31PM +0000, Christopher Lameter wrote: > > On Fri, 22 Mar 2019, Waiman Long wrote: > > > > > > > > > >> I am looking forward to it. > > > > There is also a

Re: [PATCH 2/4] signal: Make flush_sigqueue() use free_q to release memory

2019-03-22 Thread Christopher Lameter
On Fri, 22 Mar 2019, Waiman Long wrote: > > > >> I am looking forward to it. > > There is also already rcu being used in these paths. kfree_rcu() would not > > be enough? It is an established mechanism that is mature and well > > understood. > > > In this case, the memory objects are from kmem

Re: [PATCH 1/4] mm: Implement kmem objects freeing queue

2019-03-22 Thread Christopher Lameter
On Thu, 21 Mar 2019, Waiman Long wrote: > When releasing kernel data structures, freeing up the memory > occupied by those objects is usually the last step. To avoid races, > the release operation is commonly done with a lock held. However, the > freeing operations do not need to be under lock,

Re: [PATCH 2/4] signal: Make flush_sigqueue() use free_q to release memory

2019-03-22 Thread Christopher Lameter
On Fri, 22 Mar 2019, Waiman Long wrote: > I am looking forward to it. There is also already rcu being used in these paths. kfree_rcu() would not be enough? It is an established mechanism that is mature and well understood.
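
For reference, a sketch of the established pattern being suggested (hypothetical struct; kfree_rcu() needs an rcu_head embedded in the object being freed):

    struct sigqueue_like {                  /* hypothetical example type */
            int data;
            struct rcu_head rcu;            /* storage kfree_rcu() uses */
    };

    static void release_entry(struct sigqueue_like *q)
    {
            /* frees q after a grace period; no callback to write */
            kfree_rcu(q, rcu);
    }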

Re: [PATCH] mm, slab: remove unneed check in cpuup_canceled

2019-03-22 Thread Christopher Lameter
On Thu, 21 Mar 2019, Li RongQing wrote: > nc is a member of the percpu allocation memory, and cannot be NULL Acked-by: Christoph Lameter

Re: [PATCH v4 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-19 Thread Christopher Lameter
On Tue, 19 Mar 2019, John Hubbard wrote: > > > > My concerns do not affect this patchset which just marks the get/put for > > the pagecache. The problem was that the description was making claims that > > were a bit misleading and seemed to prescribe a solution. > > > > So let's get this merged.

Re: [PATCH v4 1/1] mm: introduce put_user_page*(), placeholder versions

2019-03-19 Thread Christopher Lameter
On Wed, 20 Mar 2019, Dave Chinner wrote: > So the plan for GUP vs writeback so far is "break fsync()"? :) Well if it's an anonymous page and not a file backed page then the semantics are preserved. Disallow GUP long term pinning (marking stuff like in this patchset may make that possible) and

Re: [PATCH v4 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-19 Thread Christopher Lameter
On Fri, 8 Mar 2019, john.hubb...@gmail.com wrote: > We seem to have pretty solid consensus on the concept and details of the > put_user_pages() approach. Or at least, if we don't, someone please speak > up now. Christopher Lameter, especially, since you had some concerns > recently.

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-13 Thread Christopher Lameter
On Wed, 13 Mar 2019, Christoph Hellwig wrote: > On Wed, Mar 13, 2019 at 09:11:13AM +1100, Dave Chinner wrote: > > On Tue, Mar 12, 2019 at 03:39:33AM -0700, Ira Weiny wrote: > > > IMHO I don't think that the copy_file_range() is going to carry us > > > through the > > > next wave of user

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-13 Thread Christopher Lameter
On Tue, 12 Mar 2019, Jerome Glisse wrote: > > > This has been discussed extensively already. GUP usage is now widespread in > > > multiple drivers, removing that would regress userspace ie break existing > > > applications. We all know what the rules for that is. You are still misstating the issue.

Re: [PATCH v2 5/5] mm: Remove stale comment from page struct

2019-03-13 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH v2 4/5] slob: Use slab_list instead of lru

2019-03-13 Thread Christopher Lameter
On Wed, 13 Mar 2019, Tobin C. Harding wrote: > @@ -297,7 +297,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int > align, int node) > continue; > > /* Attempt to alloc */ > - prev = sp->lru.prev; > + prev = sp->slab_list.prev; >

Re: [PATCH v2 2/5] slub: Use slab_list instead of lru

2019-03-13 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH v2 1/5] slub: Add comments to endif pre-processor macros

2019-03-13 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-11 Thread Christopher Lameter
On Mon, 11 Mar 2019, Dave Chinner wrote: > > Direct IO on a mmapped file backed page doesn't make any sense. > > People have used it for many, many years as a zero-copy data movement > pattern. i.e. mmap the destination file, use direct IO to DMA direct > into the destination file page cache pages,

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-11 Thread Christopher Lameter
On Fri, 8 Mar 2019, Jerome Glisse wrote: > > > > It would be good if that understanding would be enforced somehow given the > > problems > > that we see. > > This has been discussed extensively already. GUP usage is now widespread in > multiple drivers, removing that would regress userspace ie break

Re: [RFC 04/15] slub: Enable Slab Movable Objects (SMO)

2019-03-11 Thread Christopher Lameter
On Mon, 11 Mar 2019, Roman Gushchin wrote: > > +static inline void *alloc_scratch(struct kmem_cache *s) > > +{ > > + unsigned int size = oo_objects(s->max); > > + > > + return kmalloc(size * sizeof(void *) + > > + BITS_TO_LONGS(size) * sizeof(unsigned long), > > +

Re: [RFC 02/15] slub: Add isolate() and migrate() methods

2019-03-11 Thread Christopher Lameter
On Mon, 11 Mar 2019, Roman Gushchin wrote: > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -4325,6 +4325,34 @@ int __kmem_cache_create(struct kmem_cache *s, > > slab_flags_t flags) > > return err; > > } > > > > +void kmem_cache_setup_mobility(struct kmem_cache *s, > > +

Re: [RFC 02/15] slub: Add isolate() and migrate() methods

2019-03-08 Thread Christopher Lameter
On Fri, 8 Mar 2019, Tycho Andersen wrote: > On Fri, Mar 08, 2019 at 03:14:13PM +1100, Tobin C. Harding wrote: > > diff --git a/mm/slab_common.c b/mm/slab_common.c > > index f9d89c1b5977..754acdb292e4 100644 > > --- a/mm/slab_common.c > > +++ b/mm/slab_common.c > > @@ -298,6 +298,10 @@ int

Re: [PATCH v3 0/1] mm: introduce put_user_page*(), placeholder versions

2019-03-07 Thread Christopher Lameter
On Wed, 6 Mar 2019, john.hubb...@gmail.com wrote: > GUP was first introduced for Direct IO (O_DIRECT), allowing filesystem code > to get the struct page behind a virtual address and to let storage hardware > perform a direct copy to or from that page. This is a short-lived access > pattern, and

Re: [PATCH v3 1/1] mm: introduce put_user_page*(), placeholder versions

2019-03-07 Thread Christopher Lameter
On Wed, 6 Mar 2019, john.hubb...@gmail.com wrote: > Dave Chinner's description of this is very clear: > > "The fundamental issue is that ->page_mkwrite must be called on every > write access to a clean file backed page, not just the first one. > How long the GUP reference lasts is

Re: [PATCH] percpu/module resevation: change resevation size iff X86_VSMP is set

2019-03-01 Thread Christopher Lameter
On Fri, 1 Mar 2019, Barret Rhoden wrote: > I'm not familiar with VSMP - how bad is it to use L1 cache alignment instead > of 4K page alignment? Maybe some structures can use the smaller alignment? > Or maybe have VSMP require SRCU-using modules to be built-in? It is very expensive. VSMP

Re: [PATCH] cxgb4: fix undefined behavior in mem.c

2019-03-01 Thread Christopher Lameter
On Thu, 28 Feb 2019, Shaobo He wrote: > I think maybe the more problematic issue is that the value of a freed pointer > is indeterminate. The pointer is not affected by freeing the data it points to. Thus it definitely has the same value as before and is not indeterminate. The pointer points now

Re: [PATCH 1/2] percpu: km: remove SMP check

2019-02-26 Thread Christopher Lameter
On Mon, 25 Feb 2019, Dennis Zhou wrote: > > @@ -27,7 +27,7 @@ > > * chunk size is not aligned. percpu-km code will whine about it. > > */ > > > > -#if defined(CONFIG_SMP) && defined(CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK) > > +#if defined(CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK) > > #error

Re: [PATCH 2/2] percpu: km: no need to consider pcpu_group_offsets[0]

2019-02-26 Thread Christopher Lameter
On Mon, 25 Feb 2019, den...@kernel.org wrote: > > @@ -67,7 +67,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) > > pcpu_set_page_chunk(nth_page(pages, i), chunk); > > > > chunk->data = pages; > > - chunk->base_addr = page_address(pages) - pcpu_group_offsets[0]; > > +

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-16 Thread Christopher Lameter
On Fri, 15 Feb 2019, Ira Weiny wrote: > > > > for filesystems and processes. The only problems come in for the things > > > > which bypass the page cache like O_DIRECT and DAX. > > > > > > It makes a lot of sense since the filesystems play COW etc games with the > > > pages and RDMA is very much

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-15 Thread Christopher Lameter
On Fri, 15 Feb 2019, Matthew Wilcox wrote: > > Since RDMA is something similar: Can we say that a file that is used for > > RDMA should not use the page cache? > > That makes no sense. The page cache is the standard synchronisation point > for filesystems and processes. The only problems come

Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

2019-02-15 Thread Christopher Lameter
On Fri, 15 Feb 2019, Dave Chinner wrote: > Which tells us filesystem people that the applications are doing > something that _will_ cause data corruption and hence not to spend > any time triaging data corruption reports because it's not a > filesystem bug that caused it. > > See open(2): > >
