Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

2017-08-15 Thread Christopher Lameter
On Tue, 15 Aug 2017, Paul E. McKenney wrote: > Don't the HPC guys just disable idle_balance(), or am I out of date again? Ummm.. Why does idle management matter when your goal is to keep all processors busy working at maximum throughput?

Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

2017-08-16 Thread Christopher Lameter
On Tue, 15 Aug 2017, Mike Galbraith wrote: > On Tue, 2017-08-15 at 10:52 -0500, Christopher Lameter wrote: > > On Tue, 15 Aug 2017, Paul E. McKenney wrote: > > > > > Don't the HPC guys just disable idle_balance(), or am I out of date again? > > > > Ummm.. Why d

Re: [linux-next][PATCH v2] mm/slub.c: add a naive detection of double free or corruption

2017-08-11 Thread Christopher Lameter
On Fri, 11 Aug 2017, Alexander Popov wrote: > Add an assertion similar to "fasttop" check in GNU C Library allocator > as a part of SLAB_FREELIST_HARDENED feature. An object added to a singly > linked freelist should not point to itself. That helps to detect some > double free errors (e.g.

Re: [RFC PATCH 0/9] Introduce housekeeping subsystem

2017-08-11 Thread Christopher Lameter
On Fri, 11 Aug 2017, Chris Metcalf wrote: > > > Maybe a CONFIG_HOUSEKEEPING_BOOT_ONLY as a way to restrict housekeeping > > > by default to just the boot cpu. In conjunction with NOHZ_FULL_ALL you > > > would > > > then get the expected semantics. > > A big box with only the boot cpu for

Re: [PATCH 1/1] mm/slub.c: add a naive detection of double free or corruption

2017-07-17 Thread Christopher Lameter
On Mon, 17 Jul 2017, Alexander Popov wrote: > Add an assertion similar to "fasttop" check in GNU C Library allocator: > an object added to a singly linked freelist should not point to itself. > That helps to detect some double free errors (e.g. CVE-2017-2636) without > slub_debug and KASAN.
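
The check described in the excerpt above can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the patch itself: the names toy_object, toy_freelist and toy_free are hypothetical, and the sketch returns false where the proposed kernel change would assert.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical free object: its first word links to the next free object. */
struct toy_object {
    struct toy_object *next;
};

/* Hypothetical singly linked freelist (LIFO). */
struct toy_freelist {
    struct toy_object *head;
};

/*
 * Push obj onto the freelist. If obj is already the list head, linking it
 * would make the object point to itself, which is the signature of an
 * immediate double free. In that case the list is left untouched and
 * false is returned.
 */
static bool toy_free(struct toy_freelist *fl, struct toy_object *obj)
{
    if (obj == fl->head)
        return false;       /* obj->next would equal obj */
    obj->next = fl->head;
    fl->head = obj;
    return true;
}
```

The sketch also shows why the subject line calls the detection naive: freeing a, then b, then a again passes the check, because the second free of a no longer finds a at the head of the list.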

Re: [PATCH 1/1] mm/slub.c: add a naive detection of double free or corruption

2017-07-17 Thread Christopher Lameter
On Mon, 17 Jul 2017, Matthew Wilcox wrote: > On Mon, Jul 17, 2017 at 07:45:07PM +0300, Alexander Popov wrote: > > Add an assertion similar to "fasttop" check in GNU C Library allocator: > > an object added to a singly linked freelist should not point to itself. > > That helps to detect some

Re: [RFC][PATCH] slub: Introduce 'alternate' per cpu partial lists

2017-07-12 Thread Christopher Lameter
On Thu, 8 Jun 2017, Laura Abbott wrote: > - Some of this code is redundant and can probably be combined. > - The fast path is very sensitive and it was suggested I leave it alone. The > approach I took means the fastpath cmpxchg always fails before trying the > alternate cmpxchg. From some of my

Re: [PATCH] slub: make sure struct kmem_cache_node is initialized before publication

2017-07-12 Thread Christopher Lameter
On Wed, 12 Jul 2017, Andrew Morton wrote: > - free_kmem_cache_nodes() frees the cache node before nulling out a > reference to it > > - init_kmem_cache_nodes() publishes the cache node before initializing it > > Neither of these matter at runtime because the cache nodes cannot be > looked up by

Re: BUG: using __this_cpu_read() in preemptible [00000000] code: mm_percpu_wq/7

2017-07-12 Thread Christopher Lameter
On Wed, 7 Jun 2017, Andre Wild wrote: > I'm currently seeing the following message running kernel version 4.11.0. > It looks like it was introduced with the patch > 4037d452202e34214e8a939fa5621b2b3bbb45b7. A 2007 patch? At that point we did not have __this_cpu_read() nor refresh_cpu_vmstats

Re: [PATCH 1/1] mm/slub.c: add a naive detection of double free or corruption

2017-07-18 Thread Christopher Lameter
On Mon, 17 Jul 2017, Alexander Popov wrote: > Christopher, if I change BUG_ON() to VM_BUG_ON(), it will be disabled by > default > again, right? It will be enabled if the distro ships with VM debugging on by default.

Re: [PATCH 2/5] mm: slub: constify attribute_group structures.

2017-07-27 Thread Christopher Lameter
On Thu, 27 Jul 2017, Arvind Yadav wrote: > attribute_group are not supposed to change at runtime. All functions > working with attribute_group provided by <linux/sysfs.h> work with > const attribute_group. So mark the non-const structs as const. Acked-by: Christoph Lameter

Re: [v3] mm: Add SLUB free list pointer obfuscation

2017-07-27 Thread Christopher Lameter
On Wed, 26 Jul 2017, Kees Cook wrote: > > Although in either case we are adding code to the fastpath... > > While I'd like it unconditionally, I think Alexander's proposal was to > put it behind CONFIG_SLAB_FREELIST_HARDENED. Sounds good. > BTW, while I've got your attention, can you Ack the

Re: [PATCH v4] mm: Add SLUB free list pointer obfuscation

2017-07-27 Thread Christopher Lameter
On Tue, 25 Jul 2017, Kees Cook wrote: > +/* > + * Returns freelist pointer (ptr). With hardening, this is obfuscated > + * with an XOR of the address where the pointer is held and a per-cache > + * random number. > + */ > +static inline void *freelist_ptr(const struct kmem_cache *s, void *ptr, >
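
The comment quoted above describes the obfuscation as an XOR of the pointer with the address where it is stored and a per-cache random number. A minimal user-space sketch of that idea follows. The struct and function names (toy_cache, toy_freelist_ptr, toy_set_freepointer, toy_get_freepointer) are hypothetical stand-ins, and the ptr_addr parameter is an assumption, since the quoted signature is cut off in the excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-cache state; not the kernel's struct kmem_cache. */
struct toy_cache {
    uintptr_t random;   /* per-cache secret, assumed to be chosen at cache creation */
};

/*
 * XOR the pointer with the per-cache secret and with the address of the
 * slot that holds it. XOR is its own inverse, so the same function both
 * obfuscates and recovers the pointer.
 */
static inline void *toy_freelist_ptr(const struct toy_cache *s, void *ptr,
                                     uintptr_t ptr_addr)
{
    return (void *)((uintptr_t)ptr ^ s->random ^ ptr_addr);
}

/* Store an obfuscated free pointer into slot. */
static inline void toy_set_freepointer(const struct toy_cache *s,
                                       void **slot, void *fp)
{
    *slot = toy_freelist_ptr(s, fp, (uintptr_t)slot);
}

/* Read the slot and undo the obfuscation. */
static inline void *toy_get_freepointer(const struct toy_cache *s,
                                        void **slot)
{
    return toy_freelist_ptr(s, *slot, (uintptr_t)slot);
}
```

Mixing in the slot address means the same free pointer is stored as a different value in every slot, so a value leaked from one object cannot simply be copied into another.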

Re: [v3] mm: Add SLUB free list pointer obfuscation

2017-07-26 Thread Christopher Lameter
On Tue, 25 Jul 2017, Kees Cook wrote: > > @@ -290,6 +290,10 @@ static inline void set_freepointer(struct kmem_cache > > *s, > > void *object, void *fp) > > { > > unsigned long freeptr_addr = (unsigned long)object + s->offset; > > > > +#ifdef CONFIG_SLAB_FREELIST_HARDENED > > +

Re: [v3] mm: Add SLUB free list pointer obfuscation

2017-07-26 Thread Christopher Lameter
On Wed, 26 Jul 2017, Kees Cook wrote: > >> What happens if, instead of BUG_ON, we do: > >> > >> if (unlikely(WARN_RATELIMIT(object == fp, "double-free detected")) > >> return; > > > > This may work for the free fastpath but the set_freepointer function is > > use in multiple other

Re: [RFC PATCH] mm/slub: fix a deadlock due to incomplete patching of cpusets_enabled()

2017-07-26 Thread Christopher Lameter
On Wed, 26 Jul 2017, Dima Zavin wrote: > The fix is to cache the value that's returned by cpusets_enabled() at the > top of the loop, and only operate on the seqlock (both begin and retry) if > it was true. I think the proper fix would be to ensure that the calls to

Re: [PATCH 1/1] mm/slub.c: add a naive detection of double free or corruption

2017-07-19 Thread Christopher Lameter
On Tue, 18 Jul 2017, Kees Cook wrote: > I think there are two issues: first, this should likely be under > CONFIG_FREELIST_HARDENED since Christoph hasn't wanted to make these > changes enabled by default (if I'm understanding his earlier review > comments to me). The second issue is what to DO

Re: [RFC][PATCH] slub: Introduce 'alternate' per cpu partial lists

2017-07-12 Thread Christopher Lameter
On Wed, 14 Jun 2017, Joonsoo Kim wrote: > > - Some of this code is redundant and can probably be combined. > > - The fast path is very sensitive and it was suggested I leave it alone. The > > approach I took means the fastpath cmpxchg always fails before trying the > > alternate cmpxchg. From

Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods

2017-07-19 Thread Christopher Lameter
On Wed, 19 Jul 2017, Paul E. McKenney wrote: > > Do we have any problem if we skip RCU idle enter/exit under a fast idle > > scenario? > > My understanding is, if tick is not stopped, then we don't need inform RCU > > in > > idle path, it can be informed in irq exit. > > Indeed, the problem

Re: [PATCH 0/3] IPI: Avoid to use 2 cache lines for one call_single_data

2017-08-02 Thread Christopher Lameter
On Wed, 2 Aug 2017, Huang, Ying wrote: > To allocate cache line size aligned percpu memory dynamically, > alloc_percpu_aligned() is introduced and used in iova drivers too. alloc_percpu() already aligns objects as specified when they are declared. Moreover the function is improperly named since

Re: [PATCH 1/3] percpu: Add alloc_percpu_aligned()

2017-08-02 Thread Christopher Lameter
On Wed, 2 Aug 2017, Huang, Ying wrote: > --- a/include/linux/percpu.h > +++ b/include/linux/percpu.h > @@ -129,5 +129,8 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr); > #define alloc_percpu(type) \ > (typeof(type) __percpu

Re: [PATCH -mm] mm: Clear to access sub-page last when clearing huge page

2017-08-07 Thread Christopher Lameter
On Mon, 7 Aug 2017, Huang, Ying wrote: > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4374,9 +4374,31 @@ void clear_huge_page(struct page *page, > } > > might_sleep(); > - for (i = 0; i < pages_per_huge_page; i++) { > + VM_BUG_ON(clamp(addr_hint, addr, addr + > +

Re: [RFC][PATCH] mm/slub.c: Allow poisoning to use the fast path

2017-08-08 Thread Christopher Lameter
On Mon, 7 Aug 2017, Kees Cook wrote: > > To clarify, this is desirable to kill exploitation of > exposure-after-free flaws and some classes of use-after-free flaws, > since the contents will have be wiped out after a free. (Verification > of poison is nice, but is expensive compared to the

Re: [RFC][PATCH] mm/slub.c: Allow poisoning to use the fast path

2017-08-07 Thread Christopher Lameter
On Fri, 4 Aug 2017, Laura Abbott wrote: > All slub debug features currently disable the fast path completely. > Some features such as consistency checks require this to allow taking of > locks. Poisoning and red zoning don't require this and can safely use > the per-cpu fast path. Introduce a

Re: [v3] mm: Add SLUB free list pointer obfuscation

2017-07-27 Thread Christopher Lameter
On Fri, 28 Jul 2017, Alexander Popov wrote: > I don't really like ignoring double-free. I think, that: > - it will hide dangerous bugs in the kernel, > - it can make some kernel exploits more stable. > I would rather add BUG_ON to set_freepointer() behind SLAB_FREELIST_HARDENED. > Is > it

Re: FSGSBASE ABI considerations

2017-08-07 Thread Christopher Lameter
I hope this will finally enable thread local support to work in a sane way in gcc so that we can actually use it in kernel space and get rid of all the this_cpu_xxx() macros? And thread local RMW primitives may actually be provided by gcc and be usable in user space so that we can write user

Re: [RFC][PATCH] mm/slub.c: Allow poisoning to use the fast path

2017-08-07 Thread Christopher Lameter
On Mon, 7 Aug 2017, Laura Abbott wrote: > > Ok I see that the objects are initialized with poisoning and redzoning but > > I do not see that there is fastpath code to actually check the values > > before the object is reinitialized. Is that intentional or am > > I missing something? > > Yes,

Re: [PATCH 0/2] Separate NUMA statistics from zone statistics

2017-08-22 Thread Christopher Lameter
Can we simply get rid of the stats or make them configurable (off by default)? I agree they are rarely used and have been rarely used in the past. Maybe some instrumentation for perf etc will allow similar statistics these days? Thus it's possible to drop them? The space in the pcp pageset is

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Christopher Lameter
On Tue, 22 Aug 2017, Andi Kleen wrote: > We only see it on 4S+ today. But systems are always getting larger, > so what's a large system today, will be a normal medium scale system > tomorrow. > > BTW we also collected PT traces for the long hang cases, but it was > hard to find a consistent

Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit

2017-09-14 Thread Christopher Lameter
On Wed, 13 Sep 2017, Tim Chen wrote: > Here's what the customer think happened and is willing to tell us. > They have a parent process that spawns off 10 children per core and > kicked them to run. The child processes all access a common library. > We have 384 cores so 3840 child processes

Re: [kernel-hardening] Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

2017-09-21 Thread Christopher Lameter
On Thu, 21 Sep 2017, Kees Cook wrote: > > So what is the point of this patch? > > The DMA kmalloc caches are not whitelisted: The DMA kmalloc caches are pretty obsolete and mostly there for obscure drivers. ?? > >> kmalloc_dma_caches[i] = create_kmalloc_cache(n, > >> -

Re: [PATCH v3 01/31] usercopy: Prepare for usercopy whitelisting

2017-09-21 Thread Christopher Lameter
On Wed, 20 Sep 2017, Kees Cook wrote: > diff --git a/include/linux/stddef.h b/include/linux/stddef.h > index 9c61c7cda936..f00355086fb2 100644 > --- a/include/linux/stddef.h > +++ b/include/linux/stddef.h > @@ -18,6 +18,8 @@ enum { > #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE

Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches

2017-09-21 Thread Christopher Lameter
On Wed, 20 Sep 2017, Kees Cook wrote: > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void) >*/ > kmalloc_caches[INDEX_NODE] = create_kmalloc_cache( > kmalloc_info[INDEX_NODE].name, > -

Re: [PATCH v3 02/31] usercopy: Enforce slab cache usercopy region boundaries

2017-09-21 Thread Christopher Lameter
On Wed, 20 Sep 2017, Kees Cook wrote: > diff --git a/mm/slab.c b/mm/slab.c > index 87b6e5e0cdaf..df268999cf02 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -4408,7 +4408,9 @@ module_init(slab_proc_init); > > #ifdef CONFIG_HARDENED_USERCOPY > /* > - * Rejects objects that are incorrectly

Re: [PATCH 1/3] mm: slab: output reclaimable flag in /proc/slabinfo

2017-09-14 Thread Christopher Lameter
Well, /proc/slabinfo is a legacy interface. The information on whether a slab is reclaimable is available via the slabinfo tool. We would break a format that is relied upon by numerous tools.

Re: [PATCH 2/3] tools: slabinfo: add "-U" option to show unreclaimable slabs only

2017-09-14 Thread Christopher Lameter
Acked-by: Christoph Lameter

Re: [PATCH 3/3] mm: oom: show unreclaimable slab info when kernel panic

2017-09-14 Thread Christopher Lameter
I am not sure that this is generally useful at OOM times unless this is not a rare occurrence. Certainly information like that would create more support for making objects movable.

Re: [RFC] mmap(MAP_CONTIG)

2017-10-04 Thread Christopher Lameter
On Wed, 4 Oct 2017, Anshuman Khandual wrote: > > - Using 'pre-allocated' pages in the fault paths may be intrusive. > > But we have already faulted in all of them for the mapping and they > are also locked. Hence there should not be any page faults any more > for the VMA. Am I missing something

Re: [PATCH v3] mm, sysctl: make NUMA stats configurable

2017-10-10 Thread Christopher Lameter
On Tue, 10 Oct 2017, Michal Hocko wrote: > > But, let's be honest, this leaves us with an option that nobody is ever > > going to turn on. IOW, nobody except a very small portion of our users > > will ever see any benefit from this. > > But aren't those small groups who would like to squeeze

Re: [RFC] mmap(MAP_CONTIG)

2017-10-05 Thread Christopher Lameter
On Thu, 5 Oct 2017, Vlastimil Babka wrote: > On 10/04/2017 01:56 AM, Mike Kravetz wrote: > > At Plumbers this year, Guy Shattah and Christoph Lameter gave a presentation > > titled 'User space contiguous memory allocation for DMA' [1]. The slides > Hm I didn't find slides on that link, are they

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Christopher Lameter
On Mon, 16 Oct 2017, Michal Hocko wrote: > > So I mmap(MAP_CONTIG) 1GB working of working memory, prefer some data > > structures there, maybe receive from network, then decide to write > > some and not write some other. > > Why would you want this? Because we are receiving a 1GB block of data

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Christopher Lameter
On Mon, 16 Oct 2017, Michal Hocko wrote: > On Mon 16-10-17 11:02:24, Cristopher Lameter wrote: > > On Mon, 16 Oct 2017, Michal Hocko wrote: > > > > > > So I mmap(MAP_CONTIG) 1GB working of working memory, prefer some data > > > > structures there, maybe receive from network, then decide to write

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Christopher Lameter
On Mon, 16 Oct 2017, Michal Hocko wrote: > > We already have that issue and have ways to control that by tracking > > pinned and mlocked pages as well as limits on their allocations. > > Ohh, it is very different because mlock limit is really small (64kB) > which is not even close to what this is

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Christopher Lameter
On Mon, 16 Oct 2017, Michal Hocko wrote: > But putting that aside. Pinning a lot of memory might cause many > performance issues and misbehavior. There are still kernel users > who need high order memory to work properly. On top of that you are > basically allowing an untrusted user to deplete

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-13 Thread Christopher Lameter
On Fri, 13 Oct 2017, Michal Hocko wrote: > > There is a generic posix interface that could be used for a variety of > > specific hardware dependent use cases. > > Yes you wrote that already and my counter argument was that this generic > posix interface shouldn't bypass virtual memory

Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel

2017-10-13 Thread Christopher Lameter
On Thu, 12 Oct 2017, Josh Poimboeuf wrote: > > Can you run SLUB with full debug? specify slub_debug on the commandline or > > set CONFIG_SLUB_DEBUG_ON > > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I > got with CONFIG_SLOB. The trapping instruction is: > > vmovdqa

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-13 Thread Christopher Lameter
On Fri, 13 Oct 2017, Michal Hocko wrote: > I would, quite contrary, suggest a device specific mmap implementation > which would guarantee both the best memory wrt. physical contiguous > aspect as well as the placement - what if the device have a restriction > on that as well? Contemporary high

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-13 Thread Christopher Lameter
On Thu, 12 Oct 2017, Anshuman Khandual wrote: > > +static long __alloc_vma_contig_range(struct vm_area_struct *vma) > > +{ > > + gfp_t gfp = GFP_HIGHUSER | __GFP_ZERO; > > Would it be GFP_HIGHUSER_MOVABLE instead ? Why __GFP_ZERO ? If its > coming from Buddy, every thing should have already

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-13 Thread Christopher Lameter
On Fri, 13 Oct 2017, Michal Hocko wrote: > On Fri 13-10-17 10:20:06, Cristopher Lameter wrote: > > On Fri, 13 Oct 2017, Michal Hocko wrote: > [...] > > > I am not really convinced this is a good interface. You are basically > > > trying to bypass virtual memory abstraction and that is quite > > >

Re: [lkp-robot] [x86/kconfig] 81d3871900: BUG:unable_to_handle_kernel

2017-10-12 Thread Christopher Lameter
On Wed, 11 Oct 2017, Josh Poimboeuf wrote: > I failed to add the slab maintainers to CC on the last attempt. Trying > again. Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple allocator and the K&R mechanism that was used in the early kernels. > > Adding the slub

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-08 Thread Christopher Lameter
On Thu, 7 Sep 2017, David Rientjes wrote: > > It has *nothing* to do with zillions of tasks. Its amusing that the SGI > > ghost is still haunting the discussion here. The company died a couple of > > years ago finally (ok somehow HP has an "SGI" brand now I believe). But > > there are multiple

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-07 Thread Christopher Lameter
On Thu, 7 Sep 2017, Roman Gushchin wrote: > > Really? From what I know and worked on way back when: The reason was to be > > able to contain the affected application in a cpuset. Multiple apps may > > have been running in multiple cpusets on a large NUMA machine and the OOM > > condition in one

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-07 Thread Christopher Lameter
On Wed, 6 Sep 2017, David Rientjes wrote: > > The oom_kill_allocating_task sysctl which causes the OOM killer > > to simple kill the allocating task is useless. Killing the random > > task is not the best idea. > > > > Nobody likes it, and hopefully nobody uses it. > > We want to completely

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-07 Thread Christopher Lameter
On Tue, 5 Sep 2017, Michal Hocko wrote: > I would argue that we should simply deprecate and later drop the sysctl. > I _strongly_ suspect anybody is using this. If yes it is not that hard > to change the kernel command like rather than select the sysctl. The > deprecation process would be >

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-07 Thread Christopher Lameter
On Thu, 7 Sep 2017, Roman Gushchin wrote: > On Thu, Sep 07, 2017 at 10:03:24AM -0500, Christopher Lameter wrote: > > On Thu, 7 Sep 2017, Roman Gushchin wrote: > > > > > > Really? From what I know and worked on way back when: The reason was to > > > >

Re: [v7 2/5] mm, oom: cgroup-aware OOM killer

2017-09-07 Thread Christopher Lameter
On Mon, 4 Sep 2017, Roman Gushchin wrote: > To address these issues, cgroup-aware OOM killer is introduced. You are missing a major issue here. Processes may have allocation constraints to memory nodes, special DMA zones etc etc. OOM conditions on such resource-constrained allocations need to be

Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

2017-09-07 Thread Christopher Lameter
On Wed, 6 Sep 2017, Michal Hocko wrote: > I am not sure this is how things evolved actually. This is way before > my time so my git log interpretation might be imprecise. We do have > oom_badness heuristic since out_of_memory has been introduced and > oom_kill_allocating_task has been introduced

Re: [PATCH 2/2] mm/slub: don't use reserved memory for optimistic try

2017-09-06 Thread Christopher Lameter
On Wed, 6 Sep 2017, Vlastimil Babka wrote: > I think it's wasteful to do a function call for this, inline definition > in header would be better (gfp_pfmemalloc_allowed() is different as it > relies on a rather heavyweight __gfp_pfmemalloc_flags(). Right.

Re: [PATCH 1/2] mm/slub: wake up kswapd for initial high order allocation

2017-09-06 Thread Christopher Lameter
On Wed, 6 Sep 2017, js1...@gmail.com wrote: > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache > *s, gfp_t flags, int node) >* so we fall-back to the minimum order allocation. >*/ > alloc_gfp = (flags |

Re: [PATCH] percpu: make this_cpu_generic_read() atomic w.r.t. interrupts

2017-09-26 Thread Christopher Lameter
On Tue, 26 Sep 2017, Thomas Gleixner wrote: > > because it could also occur before or after the preempt_enable/disable > > without the code being able to distinguish that case. > > > > The fetch of a scalar value from memory is an atomic operation and that is > > required from all arches. There

Re: [GIT PULL] Introduce housekeeping subsystem v4

2017-10-01 Thread Christopher Lameter
On Fri, 29 Sep 2017, Frederic Weisbecker wrote: > Indeed I feel that housekeeping is probably not the best concept to express > all these things. I'm all for something clearer. Hmmm some ideas: OS maintenance tasks Delayed maintenance Supervisor OS Management

Re: [PATCH 2/3] mm: oom: show unreclaimable slab info when kernel panic

2017-10-01 Thread Christopher Lameter
On Thu, 28 Sep 2017, Yang Shi wrote: > > CONFIG_SLABINFO and /proc/slabinfo have nothing to do with the > > unreclaimable slab info. > > The current design uses "struct slabinfo" and get_slabinfo() to retrieve some > info, i.e. active objs, etc. They are protected by CONFIG_SLABINFO. Ok I guess

Re: [PATCH 2/2] mm: oom: show unreclaimable slab info when unreclaimable slabs > user memory

2017-10-01 Thread Christopher Lameter
On Thu, 28 Sep 2017, Yang Shi wrote: > diff --git a/mm/slab.h b/mm/slab.h > index 0733628..b0496d1 100644 > --- a/mm/slab.h > +++ b/mm/slab.h > @@ -505,6 +505,14 @@ static inline struct kmem_cache_node *get_node(struct > kmem_cache *s, int node) > void memcg_slab_stop(struct seq_file *m, void

Re: [PATCH 2/3] mm: oom: show unreclaimable slab info when kernel panic

2017-09-27 Thread Christopher Lameter
On Wed, 27 Sep 2017, Yang Shi wrote: > Print out unreclaimable slab info (used size and total size) which > actual memory usage is not zero (num_objs * size != 0) when: > - unreclaimable slabs : all user memory > unreclaim_slabs_oom_ratio > - panic_on_oom is set or no killable process Ok. I

Re: [PATCH 1/6] mm: add kmalloc_array_node and kcalloc_node

2017-09-27 Thread Christopher Lameter
On Wed, 27 Sep 2017, Johannes Thumshirn wrote: > +static inline void *kmalloc_array_node(size_t n, size_t size, gfp_t flags, > +int node) > +{ > + if (size != 0 && n > SIZE_MAX / size) > + return NULL; > + if (__builtin_constant_p(n) &&
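
The first test in the quoted helper guards against the product n * size wrapping around. A user-space sketch of the same guard is shown below. It assumes malloc() as a stand-in for the kernel allocator and uses hypothetical names (toy_mul_overflows, toy_alloc_array); the NUMA node argument and the gfp flags of the real helper are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * True if n * size cannot be represented in size_t. Dividing SIZE_MAX by
 * size avoids performing the multiplication that might overflow; the
 * size != 0 test avoids a division by zero.
 */
static bool toy_mul_overflows(size_t n, size_t size)
{
    return size != 0 && n > SIZE_MAX / size;
}

/*
 * Allocate an array of n elements of the given size, returning NULL
 * instead of a too-small buffer when the total size would wrap around.
 */
static void *toy_alloc_array(size_t n, size_t size)
{
    if (toy_mul_overflows(n, size))
        return NULL;
    return malloc(n * size);
}
```

Without the guard, a request such as SIZE_MAX / 2 + 1 elements of 2 bytes would wrap to a tiny allocation, and later indexing would run past the end of the buffer.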

Re: [PATCH 2/6] block: use kmalloc_array_node

2017-09-27 Thread Christopher Lameter
Reviewed-by: Christoph Lameter

Re: [PATCH 3/6] IB/qib: use kmalloc_array_node

2017-09-27 Thread Christopher Lameter
Reviewed-by: Christoph Lameter

Re: [PATCH 6/6] rds: ib: use kmalloc_array_node

2017-09-27 Thread Christopher Lameter
Reviewed-by: Christoph Lameter

Re: [PATCH 1/6] mm: add kmalloc_array_node and kcalloc_node

2017-09-27 Thread Christopher Lameter
On Wed, 27 Sep 2017, Michal Hocko wrote: > > Introduce a combination of the two above cases to have a NUMA-node aware > > version of kmalloc_array() and kcalloc(). > > Yes, this is helpful. I am just wondering why we cannot have > kmalloc_array to call kmalloc_array_node with the local node as a

Re: [PATCH 5/6] mm, mempool: use kmalloc_array_node

2017-09-27 Thread Christopher Lameter
Reviewed-by: Christoph Lameter

Re: [PATCH 4/6] IB/rdmavt: use kmalloc_array_node

2017-09-27 Thread Christopher Lameter
Reviewed-by: Christoph Lameter

Re: [PATCH 2/3] mm: oom: show unreclaimable slab info when kernel panic

2017-09-27 Thread Christopher Lameter
On Thu, 28 Sep 2017, Yang Shi wrote: > > CONFIG_SLABINFO? How does this relate to the oom info? /proc/slabinfo > > support is optional. Oom info could be included even if CONFIG_SLABINFO > > goes away. Remove the #ifdef? > > Because we want to dump the unreclaimable slab info in oom info.

Re: [PATCH] percpu: make this_cpu_generic_read() atomic w.r.t. interrupts

2017-09-26 Thread Christopher Lameter
On Mon, 25 Sep 2017, Tejun Heo wrote: > Hello, > > On Mon, Sep 25, 2017 at 04:33:02PM +0100, Mark Rutland wrote: > > Unfortunately, the generic this_cpu_read(), which is intended to be > > irq-safe, is not: > > > > #define this_cpu_generic_read(pcp) \ > > ({

Re: [RFC PATCH 12/12] housekeeping: Reimplement isolcpus on housekeeping

2017-08-28 Thread Christopher Lameter
On Mon, 28 Aug 2017, Peter Zijlstra wrote: > > I think that change is good maybe even a bugfix. I had some people be very > > surprised when they set affinities to multiple cpus and the processes > > kept sticking to one cpu because of isolcpus. > > Those people cannot read. And no it's not a bug

Re: [RFC PATCH 12/12] housekeeping: Reimplement isolcpus on housekeeping

2017-08-28 Thread Christopher Lameter
On Mon, 28 Aug 2017, Peter Zijlstra wrote: > Well, ideally something like this would start the system with all the > 'crap' threads in !root cgroup. But that means cgroupfs needs to be > populated with at least two directories on boot. And current cgroup > cruft doesn't expect that. Maybe an

Re: [RFC PATCH 12/12] housekeeping: Reimplement isolcpus on housekeeping

2017-08-23 Thread Christopher Lameter
On Wed, 23 Aug 2017, Frederic Weisbecker wrote: > While at it, this is a proposition for a reimplementation of isolcpus= > that doesn't involve scheduler domain isolation. Therefore this > brings a behaviour change: all user tasks inherit init/1 affinity which > avoid the isolcpus= range. But if

Re: [PATCH 1/2 v2] sched/wait: Break up long wake list walk

2017-08-25 Thread Christopher Lameter
On Fri, 25 Aug 2017, Tim Chen wrote: > for a long time. It is a result of the numa balancing migration of hot > pages that are shared by many threads. I think that would also call for some work to limit NUMA balancing of hot shared pages. The cache lines of hot pages are likely present in the

Re: [RFC PATCH for 4.16 02/21] rseq: Introduce restartable sequences system call (v12)

2017-12-14 Thread Christopher Lameter
On Thu, 14 Dec 2017, Mathieu Desnoyers wrote: > On x86, yet another possible approach would be to use the gs segment > selector to point to user-space per-cpu data. This approach performs > similarly to the cpu id cache, but it has two disadvantages: it is > not portable, and it is incompatible

Re: [RFC PATCH for 4.16 02/21] rseq: Introduce restartable sequences system call (v12)

2017-12-14 Thread Christopher Lameter
On Thu, 14 Dec 2017, Mathieu Desnoyers wrote: > > I think the proper way to think about gs and fs on x86 is as base > > registers. They are essentially values in registers added to the address > > generated in an instruction. As such the approach is transferable to other > > processor

Re: [RFC PATCH for 4.16 02/21] rseq: Introduce restartable sequences system call (v12)

2017-12-14 Thread Christopher Lameter
On Thu, 14 Dec 2017, Mathieu Desnoyers wrote: > If we port this concept to kernel-space (as I start to understand > would be your wish), then a simple pointer store to the current > task_struct would suffice. Certainly such a port would be beneficial for non x86 archs. But my company has

Re: [PATCH 4/5] sched/isolation: Residual 1Hz scheduler tick offload

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Frederic Weisbecker wrote: > Adding the boot parameter "isolcpus=nohz_offload" will now outsource > these scheduler ticks to the global workqueue so that a housekeeping CPU > handles that tick remotely. The vmstat processing required per cpu area access. How does that work

Re: [PATCH 4/5] sched/isolation: Residual 1Hz scheduler tick offload

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 04:23:57AM +0100, Frederic Weisbecker wrote: > > When a CPU runs in full dynticks mode, a 1Hz tick remains in order to > > keep the scheduler stats alive. However this residual tick is a burden > > for Real-Time tasks that can't

Re: [PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat from s8 to s16

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Kemi Wang wrote: > The type s8 used for vm_diff_nodestat[] as local cpu counters has the > limitation of global counters update frequency, especially for those > monotone increasing type of counters like NUMA counters with more and more > cpus/nodes. This patch extends the

Re: [PATCH 4/5] sched/isolation: Residual 1Hz scheduler tick offload

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Peter Zijlstra wrote: > > Depends what one means by RT. > > Real Time computing as per the literature. Any other definition is > wrong and confusing. That is an understanding of language rooted in the positivism of the early 20th century which was intending to assign a

Re: [PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat from s8 to s16

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Michal Hocko wrote: > > Well the reason for s8 was to keep the data structures small so that they > > fit in the higher level cpu caches. The larger these structures become the > > more cachelines are used by the counters and the larger the performance > > influence on the

Re: [PATCH 4/5] sched/isolation: Residual 1Hz scheduler tick offload

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Frederic Weisbecker wrote: > > The vmstat processing required per cpu area access. How does that work if > > the code is running on a remote processor? > > It seems that current::sched_class::task_tick() is ok with this, as it > uses per runqueues or per task datas. And both

Re: [PATCH 4/5] sched/isolation: Residual 1Hz scheduler tick offload

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 10:38:39AM -0600, Christopher Lameter wrote: > > And the term RT has been heavily abused by marketing folks to mean any > > number of things so people can use RT to refer to a variety of things. So > > pleas

Re: [RFC PATCH for 4.16 02/21] rseq: Introduce restartable sequences system call (v12)

2017-12-15 Thread Christopher Lameter
On Thu, 14 Dec 2017, Peter Zijlstra wrote: > > But my company has extensive user space code that maintains a lot of > > counters and does other tricks to get full performance out of the > > hardware. Such a mechanism would also be good from user space. Why keep > > the good stuff only inside the

Re: [RFC PATCH for 4.16 02/21] rseq: Introduce restartable sequences system call (v12)

2017-12-15 Thread Christopher Lameter
On Fri, 15 Dec 2017, Mathieu Desnoyers wrote: > Another aspect that worries me is applications using the gs segment selector > for other purposes. Suddenly reserving the gs segment selector for use by a > library like glibc may lead to incompatibilities with applications already > using it.

Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, rao.sho...@oracle.com wrote: > This patch updates kfree_rcu to use new bulk memory free functions as they > are more efficient. It also moves kfree_call_rcu() out of rcu related code to > mm/slab_common.c It would be great to have separate patches so that we can review it

Re: [PATCH] kfree_rcu() should use the new kfree_bulk() interface for freeing rcu structures

2017-12-19 Thread Christopher Lameter
On Tue, 19 Dec 2017, Rao Shoaib wrote: > > > mm/slab_common.c > > It would be great to have separate patches so that we can review it > > properly: > > > > 1. Move the code into slab_common.c > > 2. The actual code changes to the kfree rcu mechanism > > 3. The whitespace changes > I can

Re: [PATCH] slub: Fix sysfs duplicate filename creation when slub_debug=O

2017-11-10 Thread Christopher Lameter
On Fri, 10 Nov 2017, Miles Chen wrote: > By checking disable_higher_order_debug & (slub_debug & > SLAB_NEVER_MERGE), we can detect if a cache is unmergeable but becomes > mergeable because of the disable_higher_order_debug=1 logic. Those kinds of > caches should be kept unmergeable. Acked-by:

Re: [PATCH RFC v2 4/4] mm/mempolicy: add nodes_empty check in SYSC_migrate_pages

2017-11-06 Thread Christopher Lameter
On Mon, 6 Nov 2017, Vlastimil Babka wrote: > I'm not sure what exactly is the EPERM intention. Should really the > capability of THIS process override the cpuset restriction of the TARGET > process? Maybe yes. Then, does "insufficient privilege (CAP_SYS_NICE) to CAP_SYS_NICE never overrides

Re: [PATCH v16 00/13] support "task_isolation" mode

2017-11-06 Thread Christopher Lameter
On Fri, 3 Nov 2017, Chris Metcalf wrote: > However, it doesn't seem possible to do the synchronous cancellation of > the vmstat deferred work with irqs disabled, though if there's a way, > it would be a little cleaner to do that; Christoph? We can certainly > update the statistics with

Re: [PATCH v16 00/13] support "task_isolation" mode

2017-11-07 Thread Christopher Lameter
On Mon, 6 Nov 2017, Chris Metcalf wrote: > On 11/6/2017 10:38 AM, Christopher Lameter wrote: > > > What about that d*mn 1 Hz clock? > > > > > > It's still there, so this code still requires some further work before > > > it can actually get a process

Re: [PATCH RFC v2 4/4] mm/mempolicy: add nodes_empty check in SYSC_migrate_pages

2017-11-09 Thread Christopher Lameter
On Thu, 9 Nov 2017, Yisheng Xie wrote: > > The caller of migrate_pages should be able to migrate the target process > > pages anywhere the caller can allocate memory. If that is outside the > > target processes cpuset then that is fine. Pagecache pages that are not > > allocated by the target

Re: [PATCH] slub: Fix sysfs duplicate filename creation when slub_debug=O

2017-11-09 Thread Christopher Lameter
On Thu, 9 Nov 2017, Miles Chen wrote: > In this fix patch, it disables slab merging if SLUB_DEBUG=O and > CONFIG_SLUB_DEBUG_ON=y but the debug features are disabled by the > disable_higher_order_debug logic and it holds the "slab merging is off > if any debug features are enabled" behavior.

Re: [PATCH] slub: Fix sysfs duplicate filename creation when slub_debug=O

2017-11-08 Thread Christopher Lameter
On Wed, 8 Nov 2017, Miles Chen wrote: > > Ok then the aliasing failed for some reason. The creation of the unique id > > and the alias detection needs to be in sync otherwise duplicate filenames > > are created. What is the difference there? > > The aliasing failed because find_mergeable()

Re: [PATCH RFC v2 4/4] mm/mempolicy: add nodes_empty check in SYSC_migrate_pages

2017-11-08 Thread Christopher Lameter
On Wed, 8 Nov 2017, Yisheng Xie wrote: > Another case is current process is *not* the same as target process, and > when current process try to migrate pages of target process from old_nodes > to new_nodes, the new_nodes should be a subset of target process cpuset. The caller of migrate_pages

Re: [PATCH RFC v2 4/4] mm/mempolicy: add nodes_empty check in SYSC_migrate_pages

2017-11-07 Thread Christopher Lameter
On Tue, 7 Nov 2017, Vlastimil Babka wrote: > > Migrate pages moves the pages of a single process there is no TARGET > > process. > > migrate_pages(2) takes a pid argument > > "migrate_pages() attempts to move all pages of the process pid that > are in memory nodes old_nodes to the memory nodes
