Re: [RFC 0/2] add static key for slub_debug

2019-04-04 Thread Vlastimil Babka
On 4/4/19 5:57 PM, Christopher Lameter wrote:
> On Thu, 4 Apr 2019, Vlastimil Babka wrote:
> 
>> I looked a bit at SLUB debugging capabilities and first thing I noticed is
>> there's no static key guarding the runtime enablement as is common for 
>> similar
>> debugging functionalities, so here's a RFC to add it. Can be further improved
>> if there's interest.
> 
> Well the runtime enablement is per slab cache and static keys are global.

Sure, but the most common scenario worth optimizing for is that no slab
cache has debugging enabled, and thus the global static key is disabled.

Once it becomes enabled, the flags are checked for each cache, no change
there.

> Adding static key adds code to the critical paths.

It's effectively a NOP as long as it's disabled. When enabled, the NOP
is livepatched into a jump (unconditional!) to the flags check.

> Since the flags for a
> kmem_cache have to be inspected anyways there may not be that much of a
> benefit.

The point is that as long as it's disabled (the common case), no flag
check (most likely involving a conditional jump) is being executed at
all (unlike now). NOP is obviously cheaper than a flag check. For the
(uncommon) case with debugging enabled, it adds unconditional jump which
is also rather cheap. So the tradeoff looks good.

>> It's true that in the alloc fast path the debugging check overhead is AFAICS
>> amortized by the per-cpu cache, i.e. when the allocation is from there, no
>> debugging functionality is performed. IMHO that's kinda a weakness, 
>> especially
>> for SLAB_STORE_USER, so I might also look at doing something about it, and 
>> then
>> the static key might be more critical for overhead reduction.
> 
> Moving debugging out of the per cpu fastpath allows that fastpath to be
> much simpler and faster.
> 
> SLAB_STORE_USER is mostly used only for debugging in which case we are
> less concerned with performance.

Agreed, so it would be nice if we could do e.g. SLAB_STORE_USER for all
allocations in such case.

> If you want to use SLAB_STORE_USER in the fastpath then we have to do some
> major redesign there.

Sure. Just saying that benefit of static key in alloc path is currently
limited as the debugging itself is limited due to alloc fast path being
effective. But there's still immediate benefit in free path.

>> In the freeing fast path I quickly checked the stats and it seems that in
>> do_slab_free(), the "if (likely(page == c->page))" is not as likely as it
>> declares, as in the majority of cases, freeing doesn't happen on the object
>> that belongs to the page currently cached. So the advantage of a static key 
>> in
>> slow path __slab_free() should be more useful immediately.
> 
> Right. The freeing logic is actuall a weakness in terms of performance for
> SLUB due to the need to operate on a per page queue immediately.
> 



Re: [RFC 0/2] add static key for slub_debug

2019-04-04 Thread Christopher Lameter
On Thu, 4 Apr 2019, Vlastimil Babka wrote:

> I looked a bit at SLUB debugging capabilities and first thing I noticed is
> there's no static key guarding the runtime enablement as is common for similar
> debugging functionalities, so here's a RFC to add it. Can be further improved
> if there's interest.

Well the runtime enablement is per slab cache and static keys are global.

Adding static key adds code to the critical paths. Since the flags for a
kmem_cache have to be inspected anyways there may not be that much of a
benefit.

> It's true that in the alloc fast path the debugging check overhead is AFAICS
> amortized by the per-cpu cache, i.e. when the allocation is from there, no
> debugging functionality is performed. IMHO that's kinda a weakness, especially
> for SLAB_STORE_USER, so I might also look at doing something about it, and 
> then
> the static key might be more critical for overhead reduction.

Moving debugging out of the per cpu fastpath allows that fastpath to be
much simpler and faster.

SLAB_STORE_USER is mostly used only for debugging in which case we are
less concerned with performance.

If you want to use SLAB_STORE_USER in the fastpath then we have to do some
major redesign there.

> In the freeing fast path I quickly checked the stats and it seems that in
> do_slab_free(), the "if (likely(page == c->page))" is not as likely as it
> declares, as in the majority of cases, freeing doesn't happen on the object
> that belongs to the page currently cached. So the advantage of a static key in
> slow path __slab_free() should be more useful immediately.

Right. The freeing logic is actuall a weakness in terms of performance for
SLUB due to the need to operate on a per page queue immediately.


[RFC 0/2] add static key for slub_debug

2019-04-04 Thread Vlastimil Babka
Hi,

I looked a bit at SLUB debugging capabilities and first thing I noticed is
there's no static key guarding the runtime enablement as is common for similar
debugging functionalities, so here's a RFC to add it. Can be further improved
if there's interest.

It's true that in the alloc fast path the debugging check overhead is AFAICS
amortized by the per-cpu cache, i.e. when the allocation is from there, no
debugging functionality is performed. IMHO that's kinda a weakness, especially
for SLAB_STORE_USER, so I might also look at doing something about it, and then
the static key might be more critical for overhead reduction.

In the freeing fast path I quickly checked the stats and it seems that in
do_slab_free(), the "if (likely(page == c->page))" is not as likely as it
declares, as in the majority of cases, freeing doesn't happen on the object
that belongs to the page currently cached. So the advantage of a static key in
slow path __slab_free() should be more useful immediately.

Vlastimil Babka (2):
  mm, slub: introduce static key for slub_debug
  mm, slub: add missing kmem_cache_debug() checks

 mm/slub.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

-- 
2.21.0