Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 9/9/17 12:16 AM, William Allen Simpson wrote:
> On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:
>> On 09/08/2017 09:07 AM, William Allen Simpson wrote:
>>> On 9/7/17 10:47 PM, Malahal Naineni wrote:
>>>> Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout that, it was working after such a timeout.
>>>
>>> I have seen the same, after I sped up the work pool shutdown. The work pool shutdown will nanosleep 1 second intervals (was 5 seconds) waiting for that last thread.
>>>
>>> I don't know how/why a thread is getting into epoll_wait() during the window between svc_rqst_shutdown() and work_pool_shutdown(), but that's what happens sometimes. Probably need yet another flag in svc_rqst_shutdown().
>>
>> I'm looking at using an eventfd to wake up threads on shutdown. That way, we can sleep for a long time while polling.
>
> There's already a signal to awaken the threads on shutdown.
>
> Finally figured it out, but it was complicated and took too long for review and inclusion into this week's dev release:
>
> (1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.
>
> (2) somewhere else calls clnt_vc_ncreatef() and clnt_vc_call() over and over, which sets up another transport epoll fd and then deletes it after each reply. Presumably this is unregistering services.
>
> Should probably unregister services *before* nfs_rpc_dispatch_stop() kills the listeners?

Done. Removed nfs_rpc_dispatch_stop() entirely.

> Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call() repeatedly instead. No need to emulate UDP with TCP!

This still needs to be looked at, but not in this patch.

> (3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(), svc_rqst_shutdown(), and work_pool_shutdown().
>
> (4) svc_xprt_shutdown() kills any remaining open transports.
>
> (5) svc_rqst_shutdown() didn't kill epolls that have no transports. The fix is to kill again channels previously killed in step #1, even though they no longer have any open transports.

Done. Especially as #1 is removed.

> (6) work_pool_shutdown() waited until timeout caused that one remaining channel for the epoll fd (step #2) to terminate.

Still takes an extra second or two for all the cleanup threads to complete.

> This whole process has obviously been a problem in the past, and there were several otherwise extraneous state flags. This fix means they are not needed anymore.

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 9/8/17 9:44 AM, Daniel Gryniewicz wrote:
> On 09/08/2017 09:07 AM, William Allen Simpson wrote:
>> On 9/7/17 10:47 PM, Malahal Naineni wrote:
>>> Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout that, it was working after such a timeout.
>>
>> I have seen the same, after I sped up the work pool shutdown. The work pool shutdown will nanosleep 1 second intervals (was 5 seconds) waiting for that last thread.
>>
>> I don't know how/why a thread is getting into epoll_wait() during the window between svc_rqst_shutdown() and work_pool_shutdown(), but that's what happens sometimes. Probably need yet another flag in svc_rqst_shutdown().
>
> I'm looking at using an eventfd to wake up threads on shutdown. That way, we can sleep for a long time while polling.

There's already a signal to awaken the threads on shutdown.

Finally figured it out, but it was complicated and took too long for review and inclusion into this week's dev release:

(1) nfs_rpc_dispatch_stop() calls svc_rqst_thrd_signal() with SVC_RQST_SIGNAL_SHUTDOWN for each service listener channel.

(2) somewhere else calls clnt_vc_ncreatef() and clnt_vc_call() over and over, which sets up another transport epoll fd and then deletes it after each reply. Presumably this is unregistering services.

Should probably unregister services *before* nfs_rpc_dispatch_stop() kills the listeners?

Should also call clnt_vc_ncreatef() once, and then call clnt_vc_call() repeatedly instead. No need to emulate UDP with TCP!

(3) then calls svc_shutdown(), which in turn calls svc_xprt_shutdown(), svc_rqst_shutdown(), and work_pool_shutdown().

(4) svc_xprt_shutdown() kills any remaining open transports.

(5) svc_rqst_shutdown() didn't kill epolls that have no transports. The fix is to kill again channels previously killed in step #1, even though they no longer have any open transports.

(6) work_pool_shutdown() waited until timeout caused that one remaining channel for the epoll fd (step #2) to terminate.

This whole process has obviously been a problem in the past, and there were several otherwise extraneous state flags. This fix means they are not needed anymore.
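Step (5) can be pictured with a toy model. The sketch below is only an illustration of the described behaviour, not ntirpc code: `struct demo_chan` and both functions are hypothetical names, and the real channel bookkeeping in svc_rqst.c is more involved. It assumes the old pass skipped any channel with no open transports, while the fixed pass signals every channel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an event channel; NOT ntirpc's real structure. */
struct demo_chan {
	int n_xprts;     /* open transports still attached to the channel */
	bool signalled;  /* true once a shutdown signal has been delivered */
};

/* Assumed old behaviour: channels without transports are skipped, so
 * their epoll thread keeps sleeping until its own timeout expires.
 * Returns the number of channels left unsignalled. */
size_t demo_shutdown_skip_empty(struct demo_chan *chans, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (chans[i].n_xprts > 0)
			chans[i].signalled = true;
		if (!chans[i].signalled)
			left++;
	}
	return left;
}

/* Assumed fixed behaviour: every channel is signalled, whether or not
 * it still has transports. Returns the number left unsignalled (0). */
size_t demo_shutdown_all(struct demo_chan *chans, size_t n)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		chans[i].signalled = true;
		if (!chans[i].signalled)
			left++;
	}
	return left;
}
```

In this model a channel whose last transport was just removed, like the one left behind by step (2), is exactly the case the first function misses.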
Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 09/08/2017 09:07 AM, William Allen Simpson wrote:
> On 9/7/17 10:47 PM, Malahal Naineni wrote:
>> Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout that, it was working after such a timeout.
>
> I have seen the same, after I sped up the work pool shutdown. The work pool shutdown will nanosleep 1 second intervals (was 5 seconds) waiting for that last thread.
>
> I don't know how/why a thread is getting into epoll_wait() during the window between svc_rqst_shutdown() and work_pool_shutdown(), but that's what happens sometimes. Probably need yet another flag in svc_rqst_shutdown().

I'm looking at using an eventfd to wake up threads on shutdown. That way, we can sleep for a long time while polling.

Daniel
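For readers unfamiliar with the eventfd idea, here is a minimal sketch of the general Linux technique: register an eventfd on the epoll instance and write to it at shutdown, so epoll_wait() returns at once regardless of its timeout. This is not ntirpc code and not the approach that was merged; the `waker_*` names are hypothetical, and only the eventfd is registered on the epoll fd here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance with one eventfd
 * registered on it. Returns 0 on success, -1 on failure. */
int waker_init(int *epfd, int *evfd)
{
	struct epoll_event ev = { .events = EPOLLIN };

	assert(epfd != NULL && evfd != NULL);
	*epfd = epoll_create1(EPOLL_CLOEXEC);
	if (*epfd < 0)
		return -1;
	*evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	if (*evfd < 0) {
		close(*epfd);
		return -1;
	}
	ev.data.fd = *evfd;
	if (epoll_ctl(*epfd, EPOLL_CTL_ADD, *evfd, &ev) < 0) {
		close(*evfd);
		close(*epfd);
		return -1;
	}
	return 0;
}

/* Shutdown path: bump the eventfd counter so it becomes readable. */
int waker_signal(int evfd)
{
	uint64_t one = 1;

	return write(evfd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Event-loop side: returns 1 if the wake-up eventfd fired, 0 if the
 * timeout expired first, -1 on error. */
int waker_wait(int epfd, int evfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n < 0)
		return -1;
	if (n == 0)
		return 0;
	return ev.data.fd == evfd ? 1 : 0;
}
```

Because the eventfd is level-triggered and never drained in this sketch, every later waker_wait() call also returns immediately, which is usually what a shutdown path wants.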
Re: [Nfs-ganesha-devel] shutdown hangs/delays
On 9/7/17 10:47 PM, Malahal Naineni wrote:
> Last time I tried, I got the same. A thread was waiting in epoll_wait() with 29 second timeout that, it was working after such a timeout.

I have seen the same, after I sped up the work pool shutdown. The work pool shutdown will nanosleep in 1-second intervals (was 5 seconds), waiting for that last thread.

I don't know how/why a thread is getting into epoll_wait() during the window between svc_rqst_shutdown() and work_pool_shutdown(), but that's what happens sometimes. Probably need yet another flag in svc_rqst_shutdown().
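The polling wait described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in work_pool.c: `struct demo_pool`, `demo_pool_wait()`, and the `max_polls` cap are all hypothetical. It shows why the interval matters: the waiter notices the last thread's exit only at the next wake-up, so a shorter interval shortens the worst-case delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <time.h>

/* Hypothetical pool; NOT ntirpc's struct work_pool. */
struct demo_pool {
	atomic_int n_threads;  /* worker threads that have not yet exited */
};

/* Sleep in fixed intervals until every worker has exited or max_polls
 * intervals have elapsed. Returns the number of intervals slept. */
int demo_pool_wait(struct demo_pool *pool, long interval_ns, int max_polls)
{
	struct timespec ts = {
		.tv_sec = interval_ns / 1000000000L,
		.tv_nsec = interval_ns % 1000000000L,
	};
	int polls = 0;

	while (atomic_load(&pool->n_threads) > 0 && polls < max_polls) {
		nanosleep(&ts, NULL);
		polls++;
	}
	return polls;
}
```

With a 1-second interval, a worker stuck in epoll_wait() for 29 seconds would keep a loop like this spinning through roughly 29 polls before shutdown could finish.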
Re: [Nfs-ganesha-devel] shutdown hangs/delays
Last time I tried, I got the same. A thread was waiting in epoll_wait() with a 29-second timeout; shutdown went through once that timeout expired.

On Fri, Sep 8, 2017 at 3:46 AM, Frank Filz wrote:
> I wanted to see what is up with shutdown lately...
>
> Running under gdb, I hit a long pause, but shutdown is completing for me, during that pause, these are the active threads:
>
> (gdb) thread apply all bt
>
> Thread 276 (Thread 0x7fff6b2fb700 (LWP 5364)):
> #0 0x7638027d in nanosleep () from /lib64/libpthread.so.0
> #1 0x75f4f3dd in work_pool_shutdown (pool=0x76167ac0 ) at /home/ffilz/ganesha/review/src/libntirpc/src/work_pool.c:318
> #2 0x75f4116d in svc_shutdown (flags=0) at /home/ffilz/ganesha/review/src/libntirpc/src/svc.c:811
> #3 0x0045a5c8 in do_shutdown () at /home/ffilz/ganesha/review/src/MainNFSD/nfs_admin_thread.c:512
> #4 0x0045a8b6 in admin_thread (UnusedArg=0x0) at /home/ffilz/ganesha/review/src/MainNFSD/nfs_admin_thread.c:545
> #5 0x7637760a in start_thread () from /lib64/libpthread.so.0
> #6 0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 274 (Thread 0x7fff6c2fd700 (LWP 5362)):
> #0 0x7637fd9d in accept () from /lib64/libpthread.so.0
> #1 0x00458287 in _9p_dispatcher_thread (Arg=0x0) at /home/ffilz/ganesha/review/src/MainNFSD/9p_dispatcher.c:582
> #2 0x7637760a in start_thread () from /lib64/libpthread.so.0
> #3 0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 9 (Thread 0x77f0e700 (LWP 5083)):
> #0 0x75a49043 in epoll_wait () from /lib64/libc.so.6
> #1 0x75f45ce1 in svc_rqst_epoll_loop (sr_rec=0x72ece8c0) at /home/ffilz/ganesha/review/src/libntirpc/src/svc_rqst.c:893
> #2 0x75f45e1e in svc_rqst_run_task (wpe=0x72ece8d0) at /home/ffilz/ganesha/review/src/libntirpc/src/svc_rqst.c:945
> #3 0x75f4ede1 in work_pool_thread (arg=0x7fffeffd7080) at /home/ffilz/ganesha/review/src/libntirpc/src/work_pool.c:171
> #4 0x7637760a in start_thread () from /lib64/libpthread.so.0
> #5 0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x723fe700 (LWP 5077)):
> #0 0x7637ceb9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x004fe9cf in fridgethr_freeze (fr=0x72c44480, thr_ctx=0x72c13580) at /home/ffilz/ganesha/review/src/support/fridgethr.c:416
> #2 0x004ff1f9 in fridgethr_start_routine (arg=0x72c13580) at /home/ffilz/ganesha/review/src/support/fridgethr.c:554
> #3 0x7637760a in start_thread () from /lib64/libpthread.so.0
> #4 0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x72bff700 (LWP 5076)):
> #0 0x7637ceb9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x004fe9cf in fridgethr_freeze (fr=0x72c44480, thr_ctx=0x72c13300) at /home/ffilz/ganesha/review/src/support/fridgethr.c:416
> #2 0x004ff1f9 in fridgethr_start_routine (arg=0x72c13300) at /home/ffilz/ganesha/review/src/support/fridgethr.c:554
> #3 0x7637760a in start_thread () from /lib64/libpthread.so.0
> #4 0x75a48a4d in clone () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x77f43140 (LWP 5049)):
> #0 0x763786ad in pthread_join () from /lib64/libpthread.so.0
> #1 0x004529f1 in nfs_start (p_start_info=0x7d2ef8 ) at /home/ffilz/ganesha/review/src/MainNFSD/nfs_init.c:960
> #2 0x0041d253 in main (argc=8, argv=0x7fffe3d8) at /home/ffilz/ganesha/review/src/MainNFSD/nfs_main.c:494