Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-08 Thread Herbert Xu
On Wed, Dec 09, 2015 at 03:18:26AM +0100, Thomas Graf wrote:
> 
> Assuming that we only encounter this scenario with very large
> table sizes, it might be OK to assume that deferring the actual
> resize via the worker thread while continuing to insert above
> 100% utilization in atomic context is safe.

As test_rhashtable has already demonstrated, this approach doesn't
work.  There is nothing in the kernel that will ensure that the
worker thread gets to run at all.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-08 Thread Thomas Graf
On 12/09/15 at 10:24am, Herbert Xu wrote:
> On Wed, Dec 09, 2015 at 03:18:26AM +0100, Thomas Graf wrote:
> > 
> > Assuming that we only encounter this scenario with very large
> > table sizes, it might be OK to assume that deferring the actual
> > resize via the worker thread while continuing to insert above
> > 100% utilization in atomic context is safe.
> 
> As test_rhashtable has already demonstrated, this approach doesn't
> work.  There is nothing in the kernel that will ensure that the
> worker thread gets to run at all.

If we define "work" as guaranteeing that an insertion in atomic
context never fails, then yes.  I'm not sure you can guarantee that
with a segmented table either, though.  I do agree that the
insertion behaviour is much better defined.

My question is this: if we are in a situation in which the worker
thread is never invoked and we have already grown 2x from the
original table size, do we still need insertions into the table to
succeed, or can we fail them?
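
For concreteness, such a failure policy might look something like
this (a sketch with invented names, not existing rhashtable code):

/* Hypothetical sketch: refuse atomic-context insertions once chain
 * lengths imply the table should have doubled but the resize worker
 * has not caught up. Name and threshold are invented for illustration. */
static bool rht_atomic_insert_allowed(const struct rhashtable *ht,
				      const struct bucket_table *tbl)
{
	/* above 200% utilization we are effectively two doublings behind */
	return atomic_read(&ht->nelems) < 2 * tbl->size;
}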

Without knowing your exact implementation plans: introducing an
additional reference indirection for every lookup will have a
huge performance penalty as well.

Is your plan to only introduce the master table after an
allocation has failed?


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-08 Thread Thomas Graf

On 12/05/15 at 03:06pm, Herbert Xu wrote:
> Unless we can make __vmalloc work with BH disabled, I guess we'll
> have to go back to multi-level lookups unless someone has a better
> suggestion.

Assuming that we only encounter this scenario with very large
table sizes, it might be OK to assume that deferring the actual
resize via the worker thread while continuing to insert above
100% utilization in atomic context is safe.
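
Roughly like this (a sketch; rht_insert_into_bucket() is a
hypothetical stand-in for the real bucket-insertion code, while
rht_grow_above_100() and the run_work worker exist today):

/* Sketch of the deferred-resize idea: in atomic context we never
 * allocate a new table inline, we only kick the worker and keep
 * inserting into the current, over-utilized table. */
static int rht_insert_atomic(struct rhashtable *ht, struct rhash_head *obj,
			     struct bucket_table *tbl)
{
	if (rht_grow_above_100(ht, tbl))
		schedule_work(&ht->run_work);	/* resize later, in process context */

	/* chains simply grow longer until the worker has rehashed */
	return rht_insert_into_bucket(ht, tbl, obj);
}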

On 12/07/15 at 02:29pm, David Miller wrote:
> You can't issue the cross-cpu TLB flushes from atomic contexts.
> It's the kernel page table updates that create the restriction.


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-08 Thread Herbert Xu
On Wed, Dec 09, 2015 at 03:36:32AM +0100, Thomas Graf wrote:
> 
> Without knowing your exact implementation plans: introducing an
> additional reference indirection for every lookup will have a
> huge performance penalty as well.
> 
> Is your plan to only introduce the master table after an
> allocation has failed?

Right, obviously the extra indirections would only come into play
after a failed allocation.  As soon as we can run the worker thread
it'll try to remove the extra indirections by doing vmalloc.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-08 Thread Thomas Graf
On 12/09/15 at 10:38am, Herbert Xu wrote:
> On Wed, Dec 09, 2015 at 03:36:32AM +0100, Thomas Graf wrote:
> > 
> > Without knowing your exact implementation plans: introducing an
> > additional reference indirection for every lookup will have a
> > huge performance penalty as well.
> > 
> > Is your plan to only introduce the master table after an
> > allocation has failed?
> 
> Right, obviously the extra indirections would only come into play
> after a failed allocation.  As soon as we can run the worker thread
> it'll try to remove the extra indirections by doing vmalloc.

OK, this sounds like a good compromise. The penalty is confined
to the duration of the atomic burst.
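
For readers following along, such an indirection layer could look
roughly like this (layout and names are assumptions for illustration,
not Herbert's actual implementation; the point is that only page-sized
GFP_ATOMIC allocations are ever needed):

/* Sketch of a segmented fallback table: the first level is a single
 * page of segment pointers, each segment another page of bucket heads.
 * Lookups pay one extra dereference until the worker replaces this
 * with a flat, vmalloc'ed table. */
struct rht_seg_table {
	unsigned int size;			/* total number of buckets */
	struct rhash_head __rcu **seg[];	/* page-sized array of segments */
};

static struct rhash_head __rcu **rht_seg_bucket(struct rht_seg_table *tbl,
						unsigned int hash)
{
	unsigned int per_seg = PAGE_SIZE / sizeof(void *);
	unsigned int idx = hash & (tbl->size - 1);

	return &tbl->seg[idx / per_seg][idx % per_seg];
}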


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-07 Thread Thomas Graf
On 12/05/15 at 03:06pm, Herbert Xu wrote:
> On Fri, Dec 04, 2015 at 07:15:55PM +0100, Phil Sutter wrote:
> >
> > > Only one should really do this, while others are waiting.
> > 
> > Sure, that was my previous understanding of how this thing works.
> 
> Yes that's clearly how it should be.  Unfortunately while adding
> the locking to do this, I found out that you can't actually call
> __vmalloc with BH disabled so this is a no-go.
> 
> Unless we can make __vmalloc work with BH disabled, I guess we'll
> have to go back to multi-level lookups unless someone has a better
> suggestion.

Thanks for fixing the race.

As for the remaining problem, I think we'll have to find a way to
serve a hard pounding user if we want to convert TCP hashtables
later on.

Did you look into what prevents __vmalloc from working with BH disabled?


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-07 Thread David Miller
From: Thomas Graf 
Date: Mon, 7 Dec 2015 16:35:24 +0100

> Did you look into what prevents __vmalloc from working with BH disabled?

You can't issue the cross-cpu TLB flushes from atomic contexts.
It's the kernel page table updates that create the restriction.


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-05 Thread David Miller
From: Herbert Xu 
Date: Sat, 5 Dec 2015 15:03:54 +0800

> Sorry Dave but you'll have to revert this because I've been able
> to trigger the following crash with the patch:
> 
> Testing concurrent rhashtable access from 50 threads
> [ cut here ]
> kernel BUG at ../mm/vmalloc.c:1337!
> invalid opcode:  [#1] PREEMPT SMP 
> 
> The reason is that I was testing insertions with BH disabled, and
> __vmalloc doesn't like that, even with GFP_ATOMIC.  As we obviously
> want to continue to support rhashtable users inserting entries with
> BH disabled, we'll have to look for an alternate solution.

Ok, reverted, thanks for the heads up.


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-04 Thread Eric Dumazet
On Fri, 2015-12-04 at 18:01 +0100, Phil Sutter wrote:
> On Fri, Dec 04, 2015 at 10:39:56PM +0800, Herbert Xu wrote:
> > On Thu, Dec 03, 2015 at 08:08:39AM -0800, Eric Dumazet wrote:
> > >
> > > Anyway, __vmalloc() can be used with GFP_ATOMIC, have you tried this ?
> > 
> > OK I've tried it and I no longer get any ENOMEM errors!
> 
> I can't confirm this, sadly. Using 50 threads, results seem to be stable
> and good. But by increasing the number of threads I can provoke the
> ENOMEM condition again. See the attached log, which shows a failing test
> run with 100 threads.
> 
> I tried to extract logs of a test run with as few failing threads as
> possible, but wasn't successful. It seems like the error amplifies
> itself: while I had stable success with fewer than 70 threads, beyond a
> threshold I could not pinpoint exactly, many more threads failed than
> expected. For instance, the attached log shows 70 out of 100 threads
> failing, while for me every single test with 50 threads was successful.
> 
> HTH, Phil

But this patch is about GFP_ATOMIC allocations; I doubt your test is
using GFP_ATOMIC.

Threads (process context) should use GFP_KERNEL allocations.

BTW, if 100 threads are simultaneously trying to vmalloc(32 MB), this
might not be very wise :(

Only one should really do this, while others are waiting.

If we really want parallelism (multiple cpus coordinating their effort),
it should be done very differently.
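
In sketch form, as if inside lib/rhashtable.c (hypothetical; publishing
the new table and the rehash itself are omitted for brevity):

/* Sketch of "only one allocates, others wait": a single thread performs
 * the large allocation under a mutex; late arrivals notice the table
 * has already grown and skip their own attempt. */
static DEFINE_MUTEX(resize_mutex);

static struct bucket_table *grow_once(struct rhashtable *ht,
				      unsigned int new_size)
{
	struct bucket_table *tbl;

	mutex_lock(&resize_mutex);
	tbl = rcu_dereference_protected(ht->tbl,
					lockdep_is_held(&resize_mutex));
	if (tbl->size < new_size)	/* nobody grew it while we slept */
		tbl = bucket_table_alloc(ht, new_size, GFP_KERNEL);
	mutex_unlock(&resize_mutex);

	return tbl;	/* NULL on allocation failure */
}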





Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-04 Thread Phil Sutter
On Fri, Dec 04, 2015 at 10:39:56PM +0800, Herbert Xu wrote:
> On Thu, Dec 03, 2015 at 08:08:39AM -0800, Eric Dumazet wrote:
> >
> > Anyway, __vmalloc() can be used with GFP_ATOMIC, have you tried this ?
> 
> OK I've tried it and I no longer get any ENOMEM errors!

I can't confirm this, sadly. Using 50 threads, results seem to be stable
and good. But by increasing the number of threads I can provoke the
ENOMEM condition again. See the attached log, which shows a failing test
run with 100 threads.

I tried to extract logs of a test run with as few failing threads as
possible, but wasn't successful. It seems like the error amplifies
itself: while I had stable success with fewer than 70 threads, beyond a
threshold I could not pinpoint exactly, many more threads failed than
expected. For instance, the attached log shows 70 out of 100 threads
failing, while for me every single test with 50 threads was successful.

HTH, Phil
[ 5196.212230] Running rhashtable test nelem=8, max_size=0, shrinking=0
[ 5196.243846] Test 00:
[ 5196.245990]   Adding 5 keys
[ 5196.250787] Info: encountered resize
[ 5196.251631] Info: encountered resize
[ 5196.252773] Info: encountered resize
[ 5196.256076]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=3
[ 5196.261961]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=0
[ 5196.263282]   Deleting 5 keys
[ 5196.267054]   Duration of test: 20359448 ns
[ 5196.267762] Test 01:
[ 5196.270278]   Adding 5 keys
[ 5196.276804] Info: encountered resize
[ 5196.278164]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=1
[ 5196.287668]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=0
[ 5196.289246]   Deleting 5 keys
[ 5196.293725]   Duration of test: 22689015 ns
[ 5196.294902] Test 02:
[ 5196.297878]   Adding 5 keys
[ 5196.305348] Info: encountered resize
[ 5196.306093] Info: encountered resize
[ 5196.306815] Info: encountered resize
[ 5196.307529] Info: encountered resize
[ 5196.308262] Info: encountered resize
[ 5196.308973] Info: encountered resize
[ 5196.309699] Info: encountered resize
[ 5196.310449] Info: encountered resize
[ 5196.311228] Info: encountered resize
[ 5196.311996] Info: encountered resize
[ 5196.312957] Info: encountered resize
[ 5196.314178] Info: encountered resize
[ 5196.318068]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=12
[ 5196.324140]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=0
[ 5196.325661]   Deleting 5 keys
[ 5196.338875]   Duration of test: 39997796 ns
[ 5196.339718] Test 03:
[ 5196.341610]   Adding 5 keys
[ 5196.349677]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=0
[ 5196.356153]   Traversal complete: counted=5, nelems=5, entries=5, table-jumps=0
[ 5196.357704]   Deleting 5 keys
[ 5196.362173]   Duration of test: 19844019 ns
[ 5196.363055] Average test time: 25722569
[ 5196.363815] Testing concurrent rhashtable access from 100 threads
[ 5196.684648] vmalloc: allocation failure, allocated 22126592 of 33562624 bytes
[ 5196.685195]   thread[87]: rhashtable_insert_fast failed
[ 5196.687075] rhashtable_thra: page allocation failure: order:0, mode:0x2088022
[ 5196.687652] vmalloc: allocation failure, allocated 22245376 of 33562624 bytes
[ 5196.687653] rhashtable_thra: page allocation failure: order:0, mode:0x2088022
[ 5196.687655] CPU: 1 PID: 12259 Comm: rhashtable_thra Not tainted 4.4.0-rc1rhashtable+ #141
[ 5196.687656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 5196.687659]   25caf0f8 88003094bc70 81308384
[ 5196.687660]  02088022 88003094bd00 8117b18c 81815d58
[ 5196.687661]  88003094bc90 0018 88003094bd10 88003094bcb0
[ 5196.687661] Call Trace:
[ 5196.687667]  [] dump_stack+0x44/0x60
[ 5196.687669]  [] warn_alloc_failed+0xfc/0x170
[ 5196.687673]  [] ? alloc_pages_current+0x8c/0x110
[ 5196.687675]  [] __vmalloc_node_range+0x18e/0x290
[ 5196.687677]  [] ? bucket_table_alloc+0x4a/0xf0
[ 5196.687681]  [] __vmalloc+0x4a/0x50
[ 5196.687682]  [] ? bucket_table_alloc+0x4a/0xf0
[ 5196.687683]  [] bucket_table_alloc+0x4a/0xf0
[ 5196.687684]  [] rhashtable_insert_rehash+0x5d/0xe0
[ 5196.687687]  [] insert_retry.isra.9.constprop.15+0x177/0x270 [test_rhashtable]
[ 5196.687689]  [] threadfunc+0xa6/0x38e [test_rhashtable]
[ 5196.687691]  [] ? __schedule+0x2ac/0x920
[ 5196.687693]  [] ? insert_retry.isra.9.constprop.15+0x270/0x270 [test_rhashtable]
[ 5196.687695]  [] ? insert_retry.isra.9.constprop.15+0x270/0x270 [test_rhashtable]
[ 5196.687697]  [] kthread+0xd8/0xf0
[ 5196.687698]  [] ? kthread_park+0x60/0x60
[ 5196.687700]  [] ret_from_fork+0x3f/0x70
[ 5196.687701]  [] ? kthread_park+0x60/0x60
[ 5196.687702] Mem-Info:
[ 5196.687704] active_anon:16125 

Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-04 Thread David Miller
From: Herbert Xu 
Date: Fri, 4 Dec 2015 22:39:56 +0800

> When an rhashtable user pounds rhashtable hard with back-to-back
> insertions we may end up growing the table in GFP_ATOMIC context.
> Unfortunately when the table reaches a certain size this often
> fails because we don't have enough physically contiguous pages
> to hold the new table.
> 
> Eric Dumazet suggested (and in fact wrote this patch) using
> __vmalloc instead which can be used in GFP_ATOMIC context.
> 
> Reported-by: Phil Sutter 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Herbert Xu 

Applied, thanks Herbert.
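
For readers without the commit in front of them, the applied change
amounts to roughly the following in bucket_table_alloc() (a sketch
reconstructed from this thread, not the verbatim patch):

static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
					       size_t nbuckets, gfp_t gfp)
{
	size_t size = sizeof(struct bucket_table) +
		      nbuckets * sizeof(struct rhash_head *);
	struct bucket_table *tbl;

	/* try physically contiguous memory first ... */
	tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
	if (tbl == NULL) {
		/* ... then fall back to vmalloc; __vmalloc() takes a gfp
		 * mask, so per Eric's suggestion this path now also runs
		 * for GFP_ATOMIC callers instead of only GFP_KERNEL */
		if (gfp != GFP_KERNEL)
			tbl = __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL);
		else
			tbl = vzalloc(size);
	}

	/* bucket and lock initialization omitted from this sketch */
	return tbl;
}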


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-04 Thread Herbert Xu
On Fri, Dec 04, 2015 at 07:15:55PM +0100, Phil Sutter wrote:
>
> > Only one should really do this, while others are waiting.
> 
> Sure, that was my previous understanding of how this thing works.

Yes that's clearly how it should be.  Unfortunately while adding
the locking to do this, I found out that you can't actually call
__vmalloc with BH disabled so this is a no-go.

Unless we can make __vmalloc work with BH disabled, I guess we'll
have to go back to multi-level lookups unless someone has a better
suggestion.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-04 Thread Herbert Xu
On Fri, Dec 04, 2015 at 04:53:34PM -0500, David Miller wrote:
> From: Herbert Xu 
> Date: Fri, 4 Dec 2015 22:39:56 +0800
> 
> > When an rhashtable user pounds rhashtable hard with back-to-back
> > insertions we may end up growing the table in GFP_ATOMIC context.
> > Unfortunately when the table reaches a certain size this often
> > fails because we don't have enough physically contiguous pages
> > to hold the new table.
> > 
> > Eric Dumazet suggested (and in fact wrote this patch) using
> > __vmalloc instead which can be used in GFP_ATOMIC context.
> > 
> > Reported-by: Phil Sutter 
> > Suggested-by: Eric Dumazet 
> > Signed-off-by: Herbert Xu 
> 
> Applied, thanks Herbert.

Sorry Dave but you'll have to revert this because I've been able
to trigger the following crash with the patch:

Testing concurrent rhashtable access from 50 threads
[ cut here ]
kernel BUG at ../mm/vmalloc.c:1337!
invalid opcode:  [#1] PREEMPT SMP 

The reason is that I was testing insertions with BH disabled, and
__vmalloc doesn't like that, even with GFP_ATOMIC.  As we obviously
want to continue to support rhashtable users inserting entries with
BH disabled, we'll have to look for an alternate solution.
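
The failing pattern boils down to something like this (a minimal
sketch; the BUG appears to be the in_interrupt() check on __vmalloc's
area-setup path, which a BH-disabled section trips because
local_bh_disable() raises the softirq count):

static DEFINE_SPINLOCK(demo_lock);

/* With BH disabled, in_interrupt() is true, and __vmalloc() hits a
 * BUG_ON() in mm/vmalloc.c regardless of the GFP_ATOMIC flag. */
static void *alloc_with_bh_disabled(size_t size)
{
	void *p;

	spin_lock_bh(&demo_lock);		/* disables BH */
	p = __vmalloc(size, GFP_ATOMIC | __GFP_ZERO, PAGE_KERNEL);
	spin_unlock_bh(&demo_lock);

	return p;
}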

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation

2015-12-04 Thread Phil Sutter
On Fri, Dec 04, 2015 at 09:45:20AM -0800, Eric Dumazet wrote:
> On Fri, 2015-12-04 at 18:01 +0100, Phil Sutter wrote:
> > On Fri, Dec 04, 2015 at 10:39:56PM +0800, Herbert Xu wrote:
> > > On Thu, Dec 03, 2015 at 08:08:39AM -0800, Eric Dumazet wrote:
> > > >
> > > > Anyway, __vmalloc() can be used with GFP_ATOMIC, have you tried this ?
> > > 
> > > OK I've tried it and I no longer get any ENOMEM errors!
> > 
> > I can't confirm this, sadly. Using 50 threads, results seem to be stable
> > and good. But by increasing the number of threads I can provoke the
> > ENOMEM condition again. See the attached log, which shows a failing test
> > run with 100 threads.
> > 
> > I tried to extract logs of a test run with as few failing threads as
> > possible, but wasn't successful. It seems like the error amplifies
> > itself: while I had stable success with fewer than 70 threads, beyond a
> > threshold I could not pinpoint exactly, many more threads failed than
> > expected. For instance, the attached log shows 70 out of 100 threads
> > failing, while for me every single test with 50 threads was successful.
> 
> But this patch is about GFP_ATOMIC allocations; I doubt your test is
> using GFP_ATOMIC.
> 
> Threads (process context) should use GFP_KERNEL allocations.

Well, I assumed Herbert did his tests using test_rhashtable, and
therefore fixed whatever code path it triggers. Maybe I'm wrong,
though.

Looking at the vmalloc allocation failure trace, it seems like it is
indeed trying to use GFP_ATOMIC from inside those threads: if I'm not
missing anything, bucket_table_alloc is called from
rhashtable_insert_rehash, which passes GFP_ATOMIC unconditionally. But
then again, bucket_table_alloc should use kzalloc if 'gfp != GFP_KERNEL',
so I'm probably just cross-eyed right now.
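
For reference, the retry loop in the trace looks roughly like this (a
sketch reconstructed from the stack trace above, not verbatim
test_rhashtable source):

/* Sketch of the call chain from the trace: the test kthread retries on
 * -EBUSY, and each retry may go through rhashtable_insert_rehash(),
 * which always allocates with GFP_ATOMIC since it can run with BH
 * disabled; with the patch applied, that allocation can now end up
 * in __vmalloc() and fail as shown above. */
static int insert_retry(struct rhashtable *ht, struct rhash_head *obj,
			const struct rhashtable_params params)
{
	int err;

	do {
		err = rhashtable_insert_fast(ht, obj, params);
		/* -EBUSY: a rehash, and hence a GFP_ATOMIC table
		 * allocation, was attempted; try the insertion again */
	} while (err == -EBUSY);

	return err;
}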

> BTW, if 100 threads are simultaneously trying to vmalloc(32 MB), this
> might not be very wise :(
> 
> Only one should really do this, while others are waiting.

Sure, that was my previous understanding of how this thing works.

> If we really want parallelism (multiple cpus coordinating their effort),
> it should be done very differently.

Maybe my approach of stress-testing rhashtable was too naive in the
first place.

Thanks, Phil