Re: nfsd kernel threads won't die via SIGKILL

2018-06-26 Thread Rick Macklem
Konstantin Belousov wrote:
On Mon, Jun 25, 2018 at 02:04:32AM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
> >> During testing of the pNFS server I have been frequently
> >> killing/restarting the nfsd.
> >> Once in a while, the "slave" nfsd process doesn't terminate and a
> >> "ps axHl" shows:
> >>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >> ... more of the same
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
> >>
> >> You can see that the top thread (the one that was created with the
> >> process) is stuck in "D" on "svcexit".
> >> The rest of the threads are still servicing NFS RPCs.
[lots of stuff snipped]
>Signals are put onto a signal queue between the time when the signal is
>generated and the time when the thread actually consumes it.  I.e. the
>signal queue is a container for the signals which are not yet acted upon.
>There is one signal queue per process, and one signal queue for each
>thread belonging to the process.  When you signal the process, the signal
>is put into some thread's signal queue, where the only criterion for the
>selection of the thread is that the signal is not blocked.  Since SIGKILL
>is never blocked, it can be put anywhere.
>
>Until a signal is delivered by the cursig()/postsig() loop, typically at
>the AST handler, the only consequence of its presence is the EINTR/ERESTART
>errors returned from the PCATCH-enabled sleeps.
Ok, now I think I understand how this works. Thanks a lot for the explanation.

> >Your description at the start of the message of the behaviour after
> >SIGKILL, where other threads continued to serve RPCs, exactly matches
> >the above explanation. You need to add some global 'stop' flag, if it is not ...
I looked at the code and there is already basically a "global stop flag".
It's done by setting the sg_state variable to CLOSING for all thread
groups in a function called svc_exit(). (I missed this when I looked
before, so I didn't understand how all the threads normally terminate.)
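
For anyone following along, svc_exit() in sys/rpc/svc.c does roughly the
following (a paraphrase from memory, so check the real source for the exact
field and list names):

/*
 * Sketch of svc_exit(): mark every thread group CLOSING and wake the
 * idle threads so their interruptible sleeps return and they re-check
 * the group state.
 */
void
svc_exit(SVCPOOL *pool)
{
	SVCGROUP *grp;
	SVCTHREAD *st;
	int g;

	pool->sp_state = SVCPOOL_CLOSING;
	for (g = 0; g < pool->sp_groupcount; g++) {
		grp = &pool->sp_groups[g];
		mtx_lock(&grp->sg_lock);
		if (grp->sg_state != SVCPOOL_CLOSING) {
			grp->sg_state = SVCPOOL_CLOSING;
			/* Wake each idle thread so it sees the new state. */
			LIST_FOREACH(st, &grp->sg_idlethreads, st_ilink)
				cv_signal(&st->st_cond);
		}
		mtx_unlock(&grp->sg_lock);
	}
}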

So, when I looked at svc_run_internal(), there is a loop
"while (state != closing)" that calls cv_wait_sig()/cv_timedwait_sig(),
and when these return EINTR/ERESTART the call to svc_exit() is done to
make the threads all return from the function.
--> The only way I can get into the broken situation I see sometimes is
  if the top thread (called "ismaster" by the code) somehow returns from
  svc_run_internal() without calling svc_exit(), so that the state isn't
  set to "closing".

  Turns out there is only one place this can happen. It's this line:
	if (grp->sg_threadcount > grp->sg_maxthreads)
		break;
  I wouldn't have thought that sg_threadcount would have become ">"
  sg_maxthreads, but when I looked at the output of "ps" that I pasted
  into the first message, there are 33 threads. (When I started the
  nfsd, I specified 32 threads, so I think it did the "break;" at this
  place to get out of the loop and return from svc_run_internal()
  without calling svc_exit().)

  I think changing the above line to:
	if (!ismaster && grp->sg_threadcount > grp->sg_maxthreads)
  will fix it.
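
  As a diff (hunk location and context approximate; just to show the
  shape of the change):

--- rpc/svc.c.orig
+++ rpc/svc.c
@@ svc_run_internal()
-		if (grp->sg_threadcount > grp->sg_maxthreads)
+		if (!ismaster && grp->sg_threadcount > grp->sg_maxthreads)
 			break;

  That way only the dynamically added worker threads trim themselves
  back to sg_maxthreads, and the master thread can only leave the loop
  via the EINTR/ERESTART path that calls svc_exit().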

  I'll test this and see if I can get it to fail.

Thanks again for your help, rick



Re: nfsd kernel threads won't die via SIGKILL

2018-06-25 Thread Konstantin Belousov
On Mon, Jun 25, 2018 at 02:04:32AM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
> >> During testing of the pNFS server I have been frequently
> >> killing/restarting the nfsd.
> >> Once in a while, the "slave" nfsd process doesn't terminate and a
> >> "ps axHl" shows:
> >>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >> ... more of the same
> >>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
> >>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
> >>
> >> You can see that the top thread (the one that was created with the
> >> process) is stuck in "D" on "svcexit".
> >> The rest of the threads are still servicing NFS RPCs. If you still
> >> have an NFS mount on the server, the mount continues to work and the
> >> CPU time for the last two threads slowly climbs, due to NFS RPC
> >> activity. A SIGKILL was posted for the process and these threads
> >> (created by kthread_add) are here, but the
> >> cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these
> >> other threads.
> >>
> >>           if (ismaster || (!ismaster &&
> >> 1207          grp->sg_threadcount > grp->sg_minthreads))
> >> 1208              error = cv_timedwait_sig(&st->st_cond,
> >> 1209                  &grp->sg_lock, 5 * hz);
> >> 1210          else
> >> 1211              error = cv_wait_sig(&st->st_cond,
> >> 1212                  &grp->sg_lock);
> >>
> >> The top thread (referred to in svc.c as "ismaster") did return from
> >> here with EINTR and has now done an msleep() here, waiting for the
> >> other threads to terminate.
> >>
> >>           /* Waiting for threads to stop. */
> >> 1387      for (g = 0; g < pool->sp_groupcount; g++) {
> >> 1388          grp = &pool->sp_groups[g];
> >> 1389          mtx_lock(&grp->sg_lock);
> >> 1390          while (grp->sg_threadcount > 0)
> >> 1391              msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> >> 1392          mtx_unlock(&grp->sg_lock);
> >> 1393      }
> >>
> >> Although I can't be sure if this patch has fixed the problem because
> >> it happens intermittently, I have not seen the problem since applying
> >> this patch:
> >> --- rpc/svc.c.sav 2018-06-21 22:52:11.623955000 -0400
> >> +++ rpc/svc.c 2018-06-22 09:01:40.271803000 -0400
> >> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
> >>   		grp = &pool->sp_groups[g];
> >>   		mtx_lock(&grp->sg_lock);
> >>   		while (grp->sg_threadcount > 0)
> >> - 			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> >> + 			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
> >>   		mtx_unlock(&grp->sg_lock);
> >>   	}
> >>  }
> >>
> >> As you can see, all it does is add a timeout to the msleep().
> >> I am not familiar with the signal delivery code in sleepqueue, so it
> >> probably isn't correct, but my theory is along the lines of...
> >>
> >> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
> >> and if that happens before the other threads return EINTR from
> >> cv_wait_sig(), they no longer do so?
> >> And I thought that waking up from the msleep() via timeouts would
> >> maybe allow the other threads to return EINTR from cv_wait_sig()?
> >>
> >> Does this make sense? rick
> >> ps: I'll post if I see the problem again with the patch applied.
> >> pss: This is a single core i386 system, just in case that might
> >> affect this.
> >
> >No, the patch does not make sense. I think it was just coincidental
> >that with the patch you did not get the hang.
> >
> >Signals are delivered to a thread, which should take the appropriate
> >actions. For a kernel process like the rpc pool, the signals are never
> >delivered; they are queued in a randomly selected thread's signal
> >queue and sit there. The interruptible sleeps are aborted in the
> >context of that thread, but nothing else happens. So if you need to
> >make svc pools properly killable, all threads must check at least for
> >EINTR and instruct other threads to exit as well.
> I'm not sure I understand what the "randomly selected thread signal
> queue" means, but it seems strange that this usually works. (The code
> is at least 10 years old. Originally committed by dfr@. I've added him
> to the cc list in case he understands this?)
> Is it that, usually, the threads will all return EINTR before the
> master one gets to where the msleep() happens if the count is > 0?

Signals are put onto a signal queue between the time when the signal is
generated and the time when the thread actually consumes it.  I.e. the
signal queue is a container for the signals which are not yet acted upon.
There is one signal queue per process, and one signal queue for each
thread belonging to the process.  When you signal the process, the signal
is put into some thread's signal queue, where the only criterion for the
selection of the thread is that the signal is not blocked.  Since SIGKILL
is never blocked, it can be put anywhere.

Until a signal is delivered by the cursig()/postsig() loop, typically at
the AST handler, the only consequence of its presence is the EINTR/ERESTART
errors returned from the PCATCH-enabled sleeps.

Re: nfsd kernel threads won't die via SIGKILL

2018-06-24 Thread Rick Macklem
Konstantin Belousov wrote:
>On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
>> During testing of the pNFS server I have been frequently
>> killing/restarting the nfsd.
>> Once in a while, the "slave" nfsd process doesn't terminate and a
>> "ps axHl" shows:
>>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
>>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
>> ... more of the same
>>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
>>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
>>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
>>
>> You can see that the top thread (the one that was created with the
>> process) is stuck in "D" on "svcexit".
>> The rest of the threads are still servicing NFS RPCs. If you still
>> have an NFS mount on the server, the mount continues to work and the
>> CPU time for the last two threads slowly climbs, due to NFS RPC
>> activity. A SIGKILL was posted for the process and these threads
>> (created by kthread_add) are here, but the
>> cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these
>> other threads.
>>
>>           if (ismaster || (!ismaster &&
>> 1207          grp->sg_threadcount > grp->sg_minthreads))
>> 1208              error = cv_timedwait_sig(&st->st_cond,
>> 1209                  &grp->sg_lock, 5 * hz);
>> 1210          else
>> 1211              error = cv_wait_sig(&st->st_cond,
>> 1212                  &grp->sg_lock);
>>
>> The top thread (referred to in svc.c as "ismaster") did return from
>> here with EINTR and has now done an msleep() here, waiting for the
>> other threads to terminate.
>>
>>           /* Waiting for threads to stop. */
>> 1387      for (g = 0; g < pool->sp_groupcount; g++) {
>> 1388          grp = &pool->sp_groups[g];
>> 1389          mtx_lock(&grp->sg_lock);
>> 1390          while (grp->sg_threadcount > 0)
>> 1391              msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> 1392          mtx_unlock(&grp->sg_lock);
>> 1393      }
>>
>> Although I can't be sure if this patch has fixed the problem because
>> it happens intermittently, I have not seen the problem since applying
>> this patch:
>> --- rpc/svc.c.sav 2018-06-21 22:52:11.623955000 -0400
>> +++ rpc/svc.c 2018-06-22 09:01:40.271803000 -0400
>> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>>   		grp = &pool->sp_groups[g];
>>   		mtx_lock(&grp->sg_lock);
>>   		while (grp->sg_threadcount > 0)
>> - 			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> + 			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>>   		mtx_unlock(&grp->sg_lock);
>>   	}
>>  }
>>
>> As you can see, all it does is add a timeout to the msleep().
>> I am not familiar with the signal delivery code in sleepqueue, so it
>> probably isn't correct, but my theory is along the lines of...
>>
>> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
>> and if that happens before the other threads return EINTR from
>> cv_wait_sig(), they no longer do so?
>> And I thought that waking up from the msleep() via timeouts would
>> maybe allow the other threads to return EINTR from cv_wait_sig()?
>>
>> Does this make sense? rick
>> ps: I'll post if I see the problem again with the patch applied.
>> pss: This is a single core i386 system, just in case that might
>> affect this.
>
>No, the patch does not make sense. I think it was just coincidental
>that with the patch you did not get the hang.
>
>Signals are delivered to a thread, which should take the appropriate
>actions. For a kernel process like the rpc pool, the signals are never
>delivered; they are queued in a randomly selected thread's signal
>queue and sit there. The interruptible sleeps are aborted in the
>context of that thread, but nothing else happens. So if you need to
>make svc pools properly killable, all threads must check at least for
>EINTR and instruct other threads to exit as well.
I'm not sure I understand what the "randomly selected thread signal
queue" means, but it seems strange that this usually works. (The code
is at least 10 years old. Originally committed by dfr@. I've added him
to the cc list in case he understands this?)
Is it that, usually, the threads will all return EINTR before the
master one gets to where the msleep() happens if the count is > 0?

>Your description at the start of the message of the behaviour after
>SIGKILL, where other threads continued to serve RPCs, exactly matches
>the above explanation. You need to add some global 'stop' flag, if it
>is not yet present, and recheck it after each RPC handled. Any thread
>which notes EINTR or does a direct check for the pending signal should
>set the flag and wake up every other thread in the pool.

Re: nfsd kernel threads won't die via SIGKILL

2018-06-24 Thread Konstantin Belousov
On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
> During testing of the pNFS server I have been frequently
> killing/restarting the nfsd.
> Once in a while, the "slave" nfsd process doesn't terminate and a
> "ps axHl" shows:
>   0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server
>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
> ... more of the same
>   0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server
>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server
>   0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server
> 
> You can see that the top thread (the one that was created with the
> process) is stuck in "D" on "svcexit".
> The rest of the threads are still servicing NFS RPCs. If you still
> have an NFS mount on the server, the mount continues to work and the
> CPU time for the last two threads slowly climbs, due to NFS RPC
> activity. A SIGKILL was posted for the process and these threads
> (created by kthread_add) are here, but the
> cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these
> other threads.
> 
>           if (ismaster || (!ismaster &&
> 1207          grp->sg_threadcount > grp->sg_minthreads))
> 1208              error = cv_timedwait_sig(&st->st_cond,
> 1209                  &grp->sg_lock, 5 * hz);
> 1210          else
> 1211              error = cv_wait_sig(&st->st_cond,
> 1212                  &grp->sg_lock);
> 
> The top thread (referred to in svc.c as "ismaster") did return from
> here with EINTR and has now done an msleep() here, waiting for the
> other threads to terminate.
> 
>           /* Waiting for threads to stop. */
> 1387      for (g = 0; g < pool->sp_groupcount; g++) {
> 1388          grp = &pool->sp_groups[g];
> 1389          mtx_lock(&grp->sg_lock);
> 1390          while (grp->sg_threadcount > 0)
> 1391              msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> 1392          mtx_unlock(&grp->sg_lock);
> 1393      }
> 
> Although I can't be sure if this patch has fixed the problem because
> it happens intermittently, I have not seen the problem since applying
> this patch:
> --- rpc/svc.c.sav 2018-06-21 22:52:11.623955000 -0400
> +++ rpc/svc.c 2018-06-22 09:01:40.271803000 -0400
> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>   		grp = &pool->sp_groups[g];
>   		mtx_lock(&grp->sg_lock);
>   		while (grp->sg_threadcount > 0)
> - 			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
> + 			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>   		mtx_unlock(&grp->sg_lock);
>   	}
>  }
> 
> As you can see, all it does is add a timeout to the msleep().
> I am not familiar with the signal delivery code in sleepqueue, so it
> probably isn't correct, but my theory is along the lines of...
> 
> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
> and if that happens before the other threads return EINTR from
> cv_wait_sig(), they no longer do so?
> And I thought that waking up from the msleep() via timeouts would
> maybe allow the other threads to return EINTR from cv_wait_sig()?
> 
> Does this make sense? rick
> ps: I'll post if I see the problem again with the patch applied.
> pss: This is a single core i386 system, just in case that might
> affect this.

No, the patch does not make sense. I think it was just coincidental that
with the patch you did not get the hang.

Signals are delivered to a thread, which should take the appropriate
actions. For a kernel process like the rpc pool, the signals are never
delivered; they are queued in a randomly selected thread's signal
queue and sit there. The interruptible sleeps are aborted in the
context of that thread, but nothing else happens. So if you need to
make svc pools properly killable, all threads must check at least for
EINTR and instruct other threads to exit as well.

Your description at the start of the message of the behaviour after
SIGKILL, where other threads continued to serve RPCs, exactly matches
the above explanation. You need to add some global 'stop' flag, if it
is not yet present, and recheck it after each RPC handled. Any thread
which notes EINTR or does a direct check for the pending signal should
set the flag and wake up every other thread in the pool.
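
A userland analogue of this "global stop flag" pattern (illustrative names
only, not the rpc/svc.c code): only one thread observes the interruption,
sets a shared flag under the lock, and broadcasts so every waiter rechecks
the flag and exits.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool stop = false;		/* the global stop flag */

static void *
worker(void *arg)
{
	int id = *(int *)arg;

	pthread_mutex_lock(&lock);
	while (!stop) {
		/*
		 * In the kernel this would be cv_wait_sig(); here we
		 * simply wait until someone sets the flag.
		 */
		pthread_cond_wait(&cond, &lock);
	}
	pthread_mutex_unlock(&lock);
	printf("worker %d: saw stop flag, exiting\n", id);
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NWORKERS];
	int id[NWORKERS];

	for (int i = 0; i < NWORKERS; i++) {
		id[i] = i;
		pthread_create(&tid[i], NULL, worker, &id[i]);
	}

	sleep(1);	/* let the workers reach their wait */

	/*
	 * The thread that notices EINTR (or a pending signal) does this:
	 * set the flag, then wake *every* waiter, not just one.
	 */
	pthread_mutex_lock(&lock);
	stop = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);

	for (int i = 0; i < NWORKERS; i++)
		pthread_join(tid[i], NULL);
	return (0);
}

(Compile with "cc stopflag.c -lpthread". The key points carry over to the
kernel case: the flag is checked under the same lock used for the wait,
and the wakeup is a broadcast so no thread keeps sleeping.)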


nfsd kernel threads won't die via SIGKILL

2018-06-23 Thread Rick Macklem
During testing of the pNFS server I have been frequently killing/restarting the 
nfsd.
Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl" 
shows:
  0 48889 1   0   20  0  5884  812 svcexit  D -   0:00.01 nfsd: server 
  0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server 
... more of the same
  0 48889 1   0   40  0  5884  812 rpcsvc   I -   0:00.00 nfsd: server 
  0 48889 1   0   -8  0  5884  812 rpcsvc   I -   1:51.78 nfsd: server 
  0 48889 1   0   -8  0  5884  812 rpcsvc   I -   2:27.75 nfsd: server 

You can see that the top thread (the one that was created with the process)
is stuck in "D" on "svcexit".
The rest of the threads are still servicing NFS RPCs. If you still have an
NFS mount on the server, the mount continues to work and the CPU time for
the last two threads slowly climbs, due to NFS RPC activity. A SIGKILL was
posted for the process and these threads (created by kthread_add) are here,
but the cv_wait_sig/cv_timedwait_sig never seems to return EINTR for these
other threads.

          if (ismaster || (!ismaster &&
1207          grp->sg_threadcount > grp->sg_minthreads))
1208              error = cv_timedwait_sig(&st->st_cond,
1209                  &grp->sg_lock, 5 * hz);
1210          else
1211              error = cv_wait_sig(&st->st_cond,
1212                  &grp->sg_lock);

The top thread (referred to in svc.c as "ismaster") did return from here
with EINTR and has now done an msleep() here, waiting for the other threads
to terminate.

          /* Waiting for threads to stop. */
1387      for (g = 0; g < pool->sp_groupcount; g++) {
1388          grp = &pool->sp_groups[g];
1389          mtx_lock(&grp->sg_lock);
1390          while (grp->sg_threadcount > 0)
1391              msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
1392          mtx_unlock(&grp->sg_lock);
1393      }
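
(A side note on the flags argument: passing 0 here means the sleep has no
PCATCH, so it cannot be interrupted by a signal. An interruptible sleep
would look like the sketch below; this is just to illustrate what PCATCH
changes, not a suggested fix.)

	/*
	 * Hypothetical variant: PCATCH makes msleep() return EINTR or
	 * ERESTART when a signal is pending for this thread.
	 */
	error = msleep(grp, &grp->sg_lock, PCATCH, "svcexit", 0);
	if (error == EINTR || error == ERESTART)
		break;	/* give up waiting and handle the signal */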

Although I can't be sure if this patch has fixed the problem because it
happens intermittently, I have not seen the problem since applying this
patch:
--- rpc/svc.c.sav   2018-06-21 22:52:11.623955000 -0400
+++ rpc/svc.c   2018-06-22 09:01:40.271803000 -0400
@@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
 		grp = &pool->sp_groups[g];
 		mtx_lock(&grp->sg_lock);
 		while (grp->sg_threadcount > 0)
-			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
+			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
 		mtx_unlock(&grp->sg_lock);
 	}
 }

As you can see, all it does is add a timeout to the msleep().
I am not familiar with the signal delivery code in sleepqueue, so it
probably isn't correct, but my theory is along the lines of...

Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
and if that happens before the other threads return EINTR from
cv_wait_sig(), they no longer do so?
And I thought that waking up from the msleep() via timeouts would maybe
allow the other threads to return EINTR from cv_wait_sig()?

Does this make sense? rick
ps: I'll post if I see the problem again with the patch applied.
pss: This is a single core i386 system, just in case that might affect this.