The recent change of initializing sls_sig = 0 in sleep_setup() (r1.164 of kern_synch.c) has introduced a regression with execve(2). execve(2) sets the current process in single thread mode for replacing the program image. In this mode, sleep_setup_signal() cancels a sleep, making tsleep(9) with PCATCH return immediately. Previously, when sls_sig was initialized with value 1, tsleep() with PCATCH returned EINTR during exec. Now it returns zero.
The previous behaviour was not exactly right but did not seem to cause apparent harm. The current situation is different. For example, the call chain shown below can hang. The sleep does not actually happen so vwaitforio() can spin because the pending I/O does not get finished. tsleep vwaitforio nfs_flush nfs_close VOP_CLOSE vn_closefile fdrop closef fdcloseexec sys_execve The hang was reported and offending commit located by Pavel Korovin. I think the proper way to fix the problem is to tighten the conditions when sleep_setup_signal() cancels the sleep. Instead of checking p->p_p->ps_single != NULL, which affects the whole process, the thread should determine if P_SUSPSINGLE is set in p->p_flag. This limits the cancelling to the threads that have been suspended for single thread mode. The thread that runs execve(2) does not have the flag set. OK? Index: kern/kern_synch.c =================================================================== RCS file: src/sys/kern/kern_synch.c,v retrieving revision 1.166 diff -u -p -r1.166 kern_synch.c --- kern/kern_synch.c 20 Mar 2020 17:13:51 -0000 1.166 +++ kern/kern_synch.c 22 Mar 2020 04:25:32 -0000 @@ -480,7 +480,7 @@ sleep_setup_signal(struct sleep_state *s * stopped, p->p_wchan will be 0 upon return from CURSIG. */ atomic_setbits_int(&p->p_flag, P_SINTR); - if (p->p_p->ps_single != NULL || (sls->sls_sig = CURSIG(p)) != 0) { + if ((p->p_flag & P_SUSPSINGLE) || (sls->sls_sig = CURSIG(p)) != 0) { unsleep(p); p->p_stat = SONPROC; sls->sls_do_sleep = 0;