Re: pool_grow hangs (Re: CVS commit: src/sys)
On Fri, 29 Dec 2017 16:40:52 + co...@sdf.org wrote: > On Fri, Dec 29, 2017 at 03:41:42PM +0100, Tobias Nygren wrote: > > The machine has survived for 30+ minutes where it previously hung after > > just 20 seconds. > > Do you mean "30+ minutes so far" or does it still hang somewhere? where? No, I did not manage to break it yet.
Re: pool_grow hangs (Re: CVS commit: src/sys)
On Fri, Dec 29, 2017 at 03:41:42PM +0100, Tobias Nygren wrote: > The machine has survived for 30+ minutes where it previously hung after > just 20 seconds. Do you mean "30+ minutes so far" or does it still hang somewhere? where?
Re: pool_grow hangs (Re: CVS commit: src/sys)
> Meanwhile another process, waiting for the grower to finish, is > spinning forever at 100% doing the mutex_exit/mutex_enter/ERESTART > thing on the same pool. It looks to me like the grower never actually > gets scheduled to run. The attached diff works around the problem by not releasing the lock during allocation in the PR_NOWAIT case. I don't know if doing it that way could have any negative side effects? The machine has survived for 30+ minutes where it previously hung after just 20 seconds. Kind regards, -Tobias Index: subr_pool.c === RCS file: /cvsroot/src/sys/kern/subr_pool.c,v retrieving revision 1.219 diff -u -r1.219 subr_pool.c --- subr_pool.c 16 Dec 2017 03:13:29 - 1.219 +++ subr_pool.c 29 Dec 2017 14:33:40 - @@ -1091,7 +1091,9 @@ if ((flags & PR_WAITOK) == 0) pp->pr_flags |= PR_GROWINGNOWAIT; - mutex_exit(&pp->pr_lock); + if (flags & PR_WAITOK) + mutex_exit(&pp->pr_lock); + char *cp = pool_allocator_alloc(pp, flags); if (__predict_false(cp == NULL)) goto out; @@ -1102,7 +1104,8 @@ goto out; } - mutex_enter(&pp->pr_lock); + if (flags & PR_WAITOK) + mutex_enter(&pp->pr_lock); pool_prime_page(pp, cp, ph); pp->pr_npagealloc++; KASSERT(pp->pr_flags & PR_GROWING); @@ -1115,8 +1118,9 @@ return 0; out: KASSERT(pp->pr_flags & PR_GROWING); + if (flags & PR_WAITOK) + mutex_enter(&pp->pr_lock); pp->pr_flags &= ~(PR_GROWING|PR_GROWINGNOWAIT); - mutex_enter(&pp->pr_lock); return ENOMEM; }
pool_grow hangs (Re: CVS commit: src/sys)
On Sat, 16 Dec 2017 03:13:29 + matthew green wrote: > Module Name: src > Committed By: mrg > Date: Sat Dec 16 03:13:29 UTC 2017 > > Modified Files: > src/sys/kern: subr_pool.c > src/sys/sys: pool.h > > Log Message: > hopefully workaround the irregularly "fork fails in init" problem. > > if a pool is growing, and the grower is PR_NOWAIT, mark this. > if another caller wants to grow the pool and is also PR_NOWAIT, > busy-wait for the original caller, which should either succeed > or hard-fail fairly quickly. > > implement the busy-wait by unlocking and relocking this pools > mutex and returning ERESTART. other methods (such as having > the caller do this) were significantly more code and this hack > is fairly localised. > > ok chs@ riastradh@ Hi! I have an easily reproducable system hang that I believe originates from this change. It can be triggered by doing lots of block and network i/o (like 3 multiple rsyncs) on a uniprocessor system running under Linux KVM. Basically what happens is that for unknown reasons the PR_NOWAIT grower blocks forever when it tries to reaquire the pool lock to do pool_prime_page() in pool_grow(). Meanwhile another process, waiting for the grower to finish, is spinning forever at 100% doing the mutex_exit/mutex_enter/ERESTART thing on the same pool. It looks to me like the grower never actually gets scheduled to run. Also, although it doesn't fix the issue, this pr_flags modification looks like it should be moved to after the mutex is acquired: pp->pr_flags &= ~(PR_GROWING|PR_GROWINGNOWAIT); mutex_enter(&pp->pr_lock); Kind regards, -Tobias