Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-11 Thread William Allen Simpson

On 9/9/17 12:16 AM, William Allen Simpson wrote:

On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:

On 09/08/2017 09:07 AM, William Allen Simpson wrote:

On 9/7/17 10:47 PM, Malahal Naineni wrote:

Last time I tried, I got the same. A thread was waiting in epoll_wait() with a
29-second timeout, and shutdown completed only after that timeout expired.


I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1-second intervals (was 5 seconds) waiting
for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().



I'm looking at using an eventfd to wake up threads on shutdown.  That way, we 
can sleep for a long time while polling.


There's already a signal to awaken the threads on shutdown.

Finally figured it out, but it was complicated and took too long for
review and inclusion into this week's dev release:

(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with
SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.

(2) somewhere else calls clnt_vc_ncreatef() and clnt_vc_call() over and
over, which sets up another transport epoll fd and then deletes it after
each reply.

Presumably this is unregistering services.  Should probably unregister
services *before* nfs_rpc_dispatch_stop() kills the listeners?


Done.  Removed nfs_rpc_dispatch_stop() entirely.



Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call()
repeatedly instead.  No need to emulate UDP with TCP!


This still needs to be looked at, but not in this patch.



(3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(),
svc_rqst_shutdown(), and work_pool_shutdown().

(4) svc_xprt_shutdown() kills any remaining open transports.

(5) svc_rqst_shutdown() didn't kill epolls that have no transports.  The
fix is to kill the channels from step #1 again, even though they no
longer have any open transports.


Done.  Especially as #1 is removed.



(6) work_pool_shutdown() waited until timeout caused that one remaining
channel for the epoll fd (step #2) to terminate.


Still takes an extra second or two for all the cleanup threads to complete.




This whole process has obviously been a problem in the past, and there
were several otherwise extraneous state flags.  This fix means they are
not needed anymore.
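
For illustration only, the point in step (2) about creating the client
once and then calling it repeatedly can be sketched as below.  The demo_*
names are hypothetical stand-ins, not the ntirpc clnt_vc_ncreatef() or
clnt_vc_call() signatures; the counter simply tallies how many connection
setups each pattern would incur.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCP RPC client handle.  This is NOT the
 * ntirpc client structure or API. */
struct demo_clnt {
	int connected;
};

static int demo_setups;	/* counts simulated connection setups */

static void demo_clnt_create(struct demo_clnt *c)
{
	c->connected = 1;
	demo_setups++;	/* real code would connect and register an epoll fd */
}

static int demo_clnt_call(struct demo_clnt *c, int proc)
{
	assert(c->connected);
	return proc;	/* pretend the reply echoes the procedure number */
}

static void demo_clnt_destroy(struct demo_clnt *c)
{
	c->connected = 0;
}

/* Pattern described in step (2): a fresh client for every request.
 * Returns the number of connection setups performed. */
int unregister_per_call(const int *procs, size_t n)
{
	demo_setups = 0;
	for (size_t i = 0; i < n; i++) {
		struct demo_clnt c;

		demo_clnt_create(&c);
		(void)demo_clnt_call(&c, procs[i]);
		demo_clnt_destroy(&c);
	}
	return demo_setups;
}

/* Suggested pattern: create one client, then issue every call on it.
 * Returns the number of connection setups performed. */
int unregister_shared(const int *procs, size_t n)
{
	struct demo_clnt c;

	demo_setups = 0;
	demo_clnt_create(&c);
	for (size_t i = 0; i < n; i++)
		(void)demo_clnt_call(&c, procs[i]);
	demo_clnt_destroy(&c);
	return demo_setups;
}
```

With n requests, the first pattern pays for n setups and teardowns and
the second pays for one, which is the cost being pointed at above.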



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread William Allen Simpson

On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:

On 09/08/2017 09:07 AM, William Allen Simpson wrote:

On 9/7/17 10:47 PM, Malahal Naineni wrote:

Last time I tried, I got the same. A thread was waiting in epoll_wait() with a
29-second timeout, and shutdown completed only after that timeout expired.


I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1-second intervals (was 5 seconds) waiting
for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().



I'm looking at using an eventfd to wake up threads on shutdown.  That way, we 
can sleep for a long time while polling.


There's already a signal to awaken the threads on shutdown.

Finally figured it out, but it was complicated and took too long for
review and inclusion into this week's dev release:

(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with
SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.

(2) somewhere else calls clnt_vc_ncreatef() and clnt_vc_call() over and
over, which sets up another transport epoll fd and then deletes it after
each reply.

Presumably this is unregistering services.  Should probably unregister
services *before* nfs_rpc_dispatch_stop() kills the listeners?

Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call()
repeatedly instead.  No need to emulate UDP with TCP!

(3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(),
svc_rqst_shutdown(), and work_pool_shutdown().

(4) svc_xprt_shutdown() kills any remaining open transports.

(5) svc_rqst_shutdown() didn't kill epolls that have no transports.  The
fix is to kill the channels from step #1 again, even though they no
longer have any open transports.

(6) work_pool_shutdown() waited until timeout caused that one remaining
channel for the epoll fd (step #2) to terminate.

This whole process has obviously been a problem in the past, and there
were several otherwise extraneous state flags.  This fix means they are
not needed anymore.



Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread Daniel Gryniewicz

On 09/08/2017 09:07 AM, William Allen Simpson wrote:

On 9/7/17 10:47 PM, Malahal Naineni wrote:
Last time I tried, I got the same. A thread was waiting
in epoll_wait() with a 29-second timeout, and shutdown completed only
after that timeout expired.



I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1-second intervals (was 5 seconds) waiting
for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().



I'm looking at using an eventfd to wake up threads on shutdown.  That 
way, we can sleep for a long time while polling.


Daniel
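
A minimal sketch of the eventfd idea, assuming Linux epoll and
eventfd(2).  The struct poller and poller_* names are hypothetical and
are not part of ntirpc or its svc_rqst code; the sketch only shows how
one write to an eventfd registered with the epoll set wakes a thread
that is blocked in epoll_wait() with a long timeout.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical poller state; not the ntirpc svc_rqst structures. */
struct poller {
	int epfd;		/* epoll instance the worker blocks on */
	int shutdown_fd;	/* eventfd written once to request shutdown */
};

/* Create the epoll instance and register the shutdown eventfd with it.
 * Returns 0 on success, -1 on failure. */
int poller_init(struct poller *p)
{
	struct epoll_event ev = { .events = EPOLLIN };

	assert(p != NULL);
	p->epfd = epoll_create1(EPOLL_CLOEXEC);
	if (p->epfd < 0)
		return -1;
	p->shutdown_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	if (p->shutdown_fd < 0) {
		close(p->epfd);
		return -1;
	}
	ev.data.fd = p->shutdown_fd;
	if (epoll_ctl(p->epfd, EPOLL_CTL_ADD, p->shutdown_fd, &ev) < 0) {
		close(p->shutdown_fd);
		close(p->epfd);
		return -1;
	}
	return 0;
}

/* Ask pollers to stop.  The eventfd becomes readable, so a blocked
 * epoll_wait() returns at once instead of waiting for its timeout. */
int poller_request_shutdown(struct poller *p)
{
	uint64_t one = 1;

	if (write(p->shutdown_fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		return -1;
	return 0;
}

/* Block for up to timeout_ms (-1 means no timeout).  Returns 1 if
 * shutdown was requested, 0 on timeout, -1 on error (including EINTR). */
int poller_wait(struct poller *p, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(p->epfd, &ev, 1, timeout_ms);

	if (n < 0)
		return -1;
	if (n == 1 && ev.data.fd == p->shutdown_fd)
		return 1;
	return 0;
}

void poller_destroy(struct poller *p)
{
	close(p->shutdown_fd);
	close(p->epfd);
}
```

The eventfd is never read here, so it stays readable and every thread
polling the same epoll fd sees the shutdown request, not only the first
one to wake.  A worker loop would call poller_wait() with whatever long
timeout it likes and exit when it returns 1.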



Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-08 Thread William Allen Simpson

On 9/7/17 10:47 PM, Malahal Naineni wrote:

Last time I tried, I got the same. A thread was waiting in epoll_wait() with a
29-second timeout, and shutdown completed only after that timeout expired.


I have seen the same, after I sped up the work pool shutdown.  The work
pool shutdown will nanosleep in 1-second intervals (was 5 seconds) waiting
for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the
window between svc_rqst_shutdown() and work_pool_shutdown(), but that's
what happens sometimes.

Probably need yet another flag in svc_rqst_shutdown().
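
The polling wait described above can be sketched roughly as follows.
This is an assumption-laden illustration, not the code in work_pool.c:
struct pool, its n_threads counter, and pool_wait_drained() are
hypothetical names, and the interval is a parameter so the 1 second
versus 5 second choice is visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical pool: only the count of still-running worker threads. */
struct pool {
	atomic_int n_threads;
};

/* Poll until every worker has exited or max_polls intervals have passed.
 * Returns the number of sleeps taken, or -1 if workers still remain. */
int pool_wait_drained(struct pool *pool, long interval_ms, int max_polls)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	int polls = 0;

	assert(interval_ms >= 0);
	while (atomic_load(&pool->n_threads) > 0) {
		if (polls == max_polls)
			return -1;
		/* If a signal interrupts the sleep, we just poll sooner. */
		nanosleep(&ts, NULL);
		polls++;
	}
	return polls;
}
```

A shorter interval only reduces how long the caller oversleeps after the
last worker exits.  It does nothing for a worker that is itself blocked
in epoll_wait() until its own timeout fires, which is the delay reported
in this thread.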



Re: [Nfs-ganesha-devel] shutdown hangs/delays

2017-09-07 Thread Malahal Naineni
Last time I tried, I got the same. A thread was waiting in epoll_wait()
with a 29-second timeout, and shutdown completed only after that timeout
expired.

On Fri, Sep 8, 2017 at 3:46 AM, Frank Filz wrote:

> I wanted to see what is up with shutdown lately...
>
> Running under gdb, I hit a long pause, but shutdown is completing for me,
> during that pause, these are the active threads:
>
> (gdb) thread apply all bt
>
> Thread 276 (Thread 0x7fff6b2fb700 (LWP 5364)):
> #0  0x7638027d in nanosleep () from /lib64/libpthread.so.0
> #1  0x75f4f3dd in work_pool_shutdown (pool=0x76167ac0
> ) at
> /home/ffilz/ganesha/review/src/libntirpc/src/work_pool.c:318
> #2  0x75f4116d in svc_shutdown (flags=0) at
> /home/ffilz/ganesha/review/src/libntirpc/src/svc.c:811
> #3  0x0045a5c8 in do_shutdown () at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_admin_thread.c:512
> #4  0x0045a8b6 in admin_thread (UnusedArg=0x0) at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_admin_thread.c:545
> #5  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #6  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 274 (Thread 0x7fff6c2fd700 (LWP 5362)):
> #0  0x7637fd9d in accept () from /lib64/libpthread.so.0
> #1  0x00458287 in _9p_dispatcher_thread (Arg=0x0) at
> /home/ffilz/ganesha/review/src/MainNFSD/9p_dispatcher.c:582
> #2  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #3  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 9 (Thread 0x77f0e700 (LWP 5083)):
> #0  0x75a49043 in epoll_wait () from /lib64/libc.so.6
> #1  0x75f45ce1 in svc_rqst_epoll_loop (sr_rec=0x72ece8c0) at
> /home/ffilz/ganesha/review/src/libntirpc/src/svc_rqst.c:893
> #2  0x75f45e1e in svc_rqst_run_task (wpe=0x72ece8d0) at
> /home/ffilz/ganesha/review/src/libntirpc/src/svc_rqst.c:945
> #3  0x75f4ede1 in work_pool_thread (arg=0x7fffeffd7080) at
> /home/ffilz/ganesha/review/src/libntirpc/src/work_pool.c:171
> #4  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #5  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x723fe700 (LWP 5077)):
> #0  0x7637ceb9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x004fe9cf in fridgethr_freeze (fr=0x72c44480,
> thr_ctx=0x72c13580) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:416
> #2  0x004ff1f9 in fridgethr_start_routine (arg=0x72c13580) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:554
> #3  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #4  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x72bff700 (LWP 5076)):
> #0  0x7637ceb9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x004fe9cf in fridgethr_freeze (fr=0x72c44480,
> thr_ctx=0x72c13300) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:416
> #2  0x004ff1f9 in fridgethr_start_routine (arg=0x72c13300) at
> /home/ffilz/ganesha/review/src/support/fridgethr.c:554
> #3  0x7637760a in start_thread () from /lib64/libpthread.so.0
> #4  0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x77f43140 (LWP 5049)):
> #0  0x763786ad in pthread_join () from /lib64/libpthread.so.0
> #1  0x004529f1 in nfs_start (p_start_info=0x7d2ef8
> ) at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_init.c:960
> #2  0x0041d253 in main (argc=8, argv=0x7fffe3d8) at
> /home/ffilz/ganesha/review/src/MainNFSD/nfs_main.c:494