Re: wait skips signals but first one
On 2/5/24 12:22 PM, Mykyta Dorokhin wrote:
> Note 1: forgot to mention that I'm cross-compiling.
> Note 2: it probably makes sense to add a warning or something that states that HAVE_POSIX_SIGSETJMP is disabled due to cross-compiling.

The autoconf macro that tests for this (BASH_FUNC_POSIX_SETJMP) prints a warning if cross-compiling and defaults to the same setting as whether or not it thinks it has POSIX signals available (bash_cv_posix_signals). You probably didn't notice it the first time and used the cached value from then on.

> Thank you for your time! You are doing a great job!

Thanks for your kind words.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
OpenPGP_signature.asc Description: OpenPGP digital signature
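When cross-compiling, configure can only fall back to a default for this check, so one way to get a real answer is to build a small probe with the cross toolchain and run it on the device. The C sketch below is hypothetical (the function name is invented, and it is not the BASH_FUNC_POSIX_SETJMP test itself): it jumps out of a SIGUSR1 handler with siglongjmp and checks whether SIGUSR1 is unblocked again, which is what a working sigsetjmp(env, 1)/siglongjmp pair should guarantee.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-alone probe; not bash's configure test. */
static sigjmp_buf probe_env;

static void probe_handler(int sig)
{
    (void)sig;
    /* The kernel has added SIGUSR1 to the mask while this handler runs. */
    siglongjmp(probe_env, 1);
}

/* Returns 1 if siglongjmp() restored the mask saved by sigsetjmp(env, 1),
   i.e. SIGUSR1 is deliverable again after jumping out of its handler;
   0 if SIGUSR1 is still blocked; -1 if the probe could not be set up. */
int probe_siglongjmp_restores_mask(void)
{
    struct sigaction sa, old_sa;
    sigset_t empty, old_mask, after;
    int result = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;

    /* Start from an empty mask so the caller's mask cannot skew the result. */
    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, &old_mask);

    if (sigsetjmp(probe_env, 1) == 0) {
        raise(SIGUSR1);  /* the handler jumps back, so raise() does not return here */
    } else {
        sigprocmask(SIG_BLOCK, NULL, &after);  /* query the current mask only */
        result = sigismember(&after, SIGUSR1) ? 0 : 1;
    }

    /* Put back the caller's mask and disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return result;
}
```

Called from a small main() on the target, a result of 1 suggests sigsetjmp/siglongjmp work there and that the cross-compile default is what needs overriding; 0 would point at the C library instead.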
Re: wait skips signals but first one
On 2/3/24 7:01 PM, Mykyta Dorokhin wrote:
> There is a line in trap.c with your change. If I revert it then everything works again:
>
> -  if (interrupt_immediately && wait_intr_flag)
> +  if (/* interrupt_immediately && */wait_intr_flag)
>
> So if I put interrupt_immediately back and rebuild the code with only this fix, it starts working properly: signals are received as expected.

OK. Let's look at that. By this time, interrupt_immediately was no longer set anywhere, so the code before this change did nothing but inhibit the siglongjmp/longjmp call from trap_handler, which means the signal handler returned and (possibly) did not interrupt the wait builtin. That is what this means (replace SIGINT with SIGUSR1 here):

>> The one change that might make a difference is a bug fix: if the wait builtin is waiting for a process and receives a trapped signal, it's supposed to cause wait to return immediately and then run the trap. Bash didn't do that consistently for SIGINT, and would run the trap when it shouldn't, or before it should, and sometimes not return from the wait at all. So maybe the longjmp back to the wait builtin is what changed things, even though longjmp is one of the functions that POSIX says is safe to call from a signal handler context, and it restores the signal mask if you're running on a system that has sigsetjmp/siglongjmp.

So the effect of this change is to longjmp/siglongjmp back to the wait builtin, so it can return from there before running the trap. If you use siglongjmp, it restores the original signal mask (look at the wait builtin's call to setjmp_sigs, a macro that calls sigsetjmp with 1 as the second argument), which means the trapped signal is no longer blocked.

Since this works as intended on all other systems, I would check whether your system has sigsetjmp/siglongjmp and whether they behave correctly.
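The mechanism described here, jumping out of the trap handler back into the wait and relying on siglongjmp to restore the mask saved by sigsetjmp(..., 1), can be modeled in a few lines of C. This is a hypothetical sketch, not bash code: `count_interrupted_waits` is an invented name, `wait_intr_flag` and `trap_handler` are borrowed only for readability, and `raise()` stands in for a signal arriving while wait is blocked. Passing 0 as the savemask argument models a jump that does not restore the mask.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical model of "trap handler jumps back into wait"; not bash source. */
static sigjmp_buf wait_env;
static volatile sig_atomic_t wait_intr_flag;

static void trap_handler(int sig)
{
    (void)sig;
    if (wait_intr_flag)
        siglongjmp(wait_env, 1);  /* abandon the simulated wait */
}

/* Raises SIGUSR1 `nsignals` times, each inside a simulated wait.
   `savemask` is passed straight to sigsetjmp(). Returns how many of the
   waits were actually interrupted by the signal, or -1 on setup failure. */
int count_interrupted_waits(int savemask, int nsignals)
{
    struct sigaction sa, old_sa;
    sigset_t empty, old_mask;
    volatile int interrupted = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = trap_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;
    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, &old_mask);

    for (int i = 0; i < nsignals; i++) {
        if (sigsetjmp(wait_env, savemask) == 0) {
            wait_intr_flag = 1;
            raise(SIGUSR1);      /* stands in for a signal arriving mid-wait */
            wait_intr_flag = 0;  /* reached only if SIGUSR1 stayed blocked */
        } else {
            wait_intr_flag = 0;
            interrupted++;       /* the wait returned early; a trap would run now */
        }
    }

    /* Unblock while our handler is still installed, so a SIGUSR1 left pending
       in the savemask == 0 case is delivered harmlessly (wait_intr_flag is 0),
       assuming the caller's mask does not block SIGUSR1. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return interrupted;
}
```

On a POSIX system, `count_interrupted_waits(1, 3)` should return 3, while `count_interrupted_waits(0, 3)` should return 1: after the first jump SIGUSR1 stays blocked, so later signals only become pending. That resembles the `[USR1 CHLD]` mask in the strace output from the device.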
Re: wait skips signals but first one
On 2/3/24 10:28 AM, Mykyta Dorokhin wrote:
> Analysis with strace. After receiving SIGUSR1, Debian only blocks SIGCHLD, then clears the block:
>
> 205295 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=205327, si_uid=1040} ---
> 205295 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 205295 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f063bdb5fd0}, {sa_handler=0x5637247940b0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f063bdb5fd0}, 8) = 0
> 205295 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> 205295 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 # unblocks all signals
>
> The above is the correct action. On our device, it blocks SIGUSR1 as well as SIGCHLD and keeps doing it over and over again:

One explanation for this is SIGUSR1 being blocked when the shell is invoked. Another is that sigsetjmp/siglongjmp are either not available (or configure doesn't think they are) or don't properly save and restore the signal mask.

> 6707 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=6724, si_uid=0} ---
> 6707 rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
> 6707 rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
> 6707 rt_sigprocmask(SIG_BLOCK, NULL, [USR1 CHLD], 8) = 0
> 6707 write(1, ">>> TRAPPED USR1 <<<\n", 21) = 21
> 6707 rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
> 6707 rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
> 6707 rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
> 6707 rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
> 6707 write(1, "Iteration\n", 10) = 10
>
> On modern systems, the OS blocks the signal being caught while its handler runs, and unblocks it afterwards, so that signal handlers are not called recursively. The exception to this is if SA_NODEFER is set. On some very old UNIX systems you had to block the signal yourself, and there was a small window where things could go wrong.
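The kernel behavior described in that paragraph can be checked directly. This hypothetical C sketch (the function name and its return encoding are invented for illustration) installs a SIGUSR1 handler that records whether SIGUSR1 is in the signal mask while the handler runs, then checks the mask again after the handler returns normally. Without SA_NODEFER the signal should be blocked inside the handler and unblocked afterwards; with SA_NODEFER it should not be blocked at all.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t blocked_inside;

static void record_mask(int sig)
{
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);  /* query only; async-signal-safe */
    blocked_inside = sigismember(&cur, sig) == 1;
}

/* Installs record_mask for SIGUSR1 with the given sa_flags and raises it once.
   Returns bit 0 set if SIGUSR1 was blocked while the handler ran, bit 1 set
   if it was still blocked after the handler returned, or -1 on failure. */
int probe_handler_mask(int sa_flags)
{
    struct sigaction sa, old_sa;
    sigset_t empty, old_mask, after;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_mask;
    sa.sa_flags = sa_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
        return -1;
    sigemptyset(&empty);
    sigprocmask(SIG_SETMASK, &empty, &old_mask);

    blocked_inside = 0;
    raise(SIGUSR1);  /* delivered to this thread before raise() returns */
    sigprocmask(SIG_BLOCK, NULL, &after);
    result = (blocked_inside ? 1 : 0) | (sigismember(&after, SIGUSR1) == 1 ? 2 : 0);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return result;
}
```

The relevant detail for this thread is "returns normally": the automatic unblock happens on the kernel's signal-return path. A handler that leaves by jumping skips that step, so the mask afterwards depends on what siglongjmp restores.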
> I suspect BASH probably has a build option to allow blocking signals in handlers for compatibility with other systems, and is not being built correctly for Linux.

Bash does have an autoconf test for this, but it didn't change as part of this push. You can check what MUST_REINSTALL_SIGHANDLERS is set to in config.h, but I suspect it won't be different. And `not being built correctly for Linux' would mean your Debian and my Red Hat tests would fail.

> I suspect on those very old systems the signal was automatically unblocked on return, but that is not done here, because the POSIX sigprocmask is called, which requires calling it again to unblock the signal in Linux. And since wait is restarted, it is never unblocked.

If you mean wait(2), it doesn't get restarted. waitpid(2) will return -1/EINTR since it received a caught signal.

> According to strace, no additional user flags are set when the BASH signal handler is put in place for SIGUSR1.

Correct, the trap signal handler doesn't assume that system calls are restarted.
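Why waitpid(2) is not restarted follows from the handler being installed without SA_RESTART. A hypothetical C sketch (the function name is invented, and SIGALRM with alarm() stands in for the trapped SIGUSR1) makes the difference visible: a caught signal interrupts a blocking waitpid() with EINTR unless SA_RESTART is set, in which case the kernel resumes the wait.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_alarm(int sig)
{
    (void)sig;  /* nothing to do; being caught is enough to interrupt waitpid */
}

/* Forks a child that sleeps 2 seconds, then waits for it while a caught
   SIGALRM (installed with the given sa_flags) fires after 1 second.
   Returns errno from an interrupted waitpid(), 0 if waitpid() simply
   returned the child, or -1 on setup failure. */
int waitpid_errno_on_signal(int sa_flags)
{
    struct sigaction sa, old_sa;
    pid_t child;
    int status, err = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sa.sa_flags = sa_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    child = fork();
    if (child == -1) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }
    if (child == 0) {
        sleep(2);  /* outlive the parent's one-second alarm */
        _exit(0);
    }

    alarm(1);
    if (waitpid(child, &status, 0) == -1) {
        err = errno;           /* expected: EINTR without SA_RESTART */
        kill(child, SIGKILL);  /* child is still running: stop and reap it */
        while (waitpid(child, &status, 0) == -1 && errno == EINTR)
            ;
    }
    alarm(0);
    sigaction(SIGALRM, &old_sa, NULL);
    return err;
}
```

With `sa_flags` of 0 the call should return `EINTR` after about a second; with `SA_RESTART` it should return 0 once the child exits.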
Re: wait skips signals but first one
On 2/3/24 10:00 AM, Mykyta Dorokhin wrote:
> I have found the commit on the devel branch which breaks things for me (and probably other Yocto-based builds):
>
> This one still works
> ==
> commit 89d788fb0152724a93e0fdab8c15116e5c76572b
> Author: Chet Ramey
> Date:   Mon Feb 17 11:41:35 2020 -0500
>
>     commit bash-20200214 snapshot
>
> This one does not
> ==
> commit 0df4ddca3f371bc258fe4185cdec36fce3e7be7b
> Author: Chet Ramey
> Date:   Mon Feb 24 10:41:37 2020 -0500
>
>     commit bash-20200221 snapshot
>
> Please take a look. Maybe you'll notice something suspicious there. I don't know... uninitialized variables, endian-dependent code, etc.

There are changes there, of course, but it's hard to see how they make a difference.

The wait builtin was changed not to interrupt the wait for a trapped SIGCHLD, but to delay running any SIGCHLD trap until the wait exited. Since your example doesn't trap SIGCHLD, it doesn't seem significant. Any other trapped signal still interrupts the wait.

Subshells clear the process substitution FIFO list, but you're not using process substitution.

The one change that might make a difference is a bug fix: if the wait builtin is waiting for a process and receives a trapped signal, it's supposed to cause wait to return immediately and then run the trap. Bash didn't do that consistently for SIGINT, and would run the trap when it shouldn't, or before it should, and sometimes not return from the wait at all. So maybe the longjmp back to the wait builtin is what changed things, even though longjmp is one of the functions that POSIX says is safe to call from a signal handler context, and it restores the signal mask if you're running on a system that has sigsetjmp/siglongjmp.
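The SIGCHLD change described above (note the signal, keep waiting, run the trap once the wait is over) is an instance of a common deferred-handling pattern. The C sketch below illustrates that pattern in isolation; it is not how bash implements it, and the function name and the `sigchld_pending` flag are hypothetical. The handler only records the signal, an interrupted waitpid() is simply retried, and the "trap" runs after the wait returns.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t sigchld_pending;

static void note_sigchld(int sig)
{
    (void)sig;
    sigchld_pending = 1;  /* record only; never jump out of the wait */
}

/* Hypothetical sketch: waits for one child while deferring SIGCHLD handling.
   Stores the child's exit status in *exit_status and returns how many times
   the deferred SIGCHLD "trap" ran (0 or 1), or -1 on failure. */
int wait_with_deferred_sigchld_trap(int *exit_status)
{
    struct sigaction sa, old_sa;
    pid_t child, r;
    int status, trap_runs = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;
    sigchld_pending = 0;

    child = fork();
    if (child == -1) {
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }
    if (child == 0)
        _exit(7);

    /* A SIGCHLD that interrupts waitpid() only causes a retry. */
    do
        r = waitpid(child, &status, 0);
    while (r == -1 && errno == EINTR);

    /* The wait is over; now run the deferred trap, if one is pending. */
    if (sigchld_pending) {
        sigchld_pending = 0;
        trap_runs++;
    }

    sigaction(SIGCHLD, &old_sa, NULL);
    if (r == -1)
        return -1;
    *exit_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return trap_runs;
}
```

Because the child's SIGCHLD is generated before waitpid() can report it, the flag should already be set when the loop ends, so the deferred trap runs exactly once and after the wait.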
Re: wait skips signals but first one
On 1/5/24 2:46 PM, Mykyta Dorokhin wrote:
> Bash Version: 5.1
> Patch Level: 16
> Release Status: release
>
> Description:
> I'm working on a custom project within the Yocto framework. After a recent build system update, the bash version updated to 5.1.16. Subsequently, I've noticed peculiar side effects related to using 'wait' and signals. Below is a script demonstrating the issue.
>
> The problem lies in the fact that, in the case of waiting on 'wait' in the script, only the first signal interrupts 'wait'; subsequent signals of the same type do not interrupt 'wait', and it remains blocked.
>
> I manually switched bash versions and compiled a table (bash version - Yocto version):
>
> 5.0.18 (dunfell): No issues
> 5.1.4 (hardknott): Has issues
> 5.1.8 (honister): Has issues
> 5.1.16 (kirkstone): Has issues
> 5.2.21 (master): Has issues
>
> Meanwhile, in my home desktop distribution, Ubuntu 22.04, I tested the same scenario, and everything works correctly; signals are processed as expected.
>
> My assumption is that the problem may be related to my Yocto build being intended for a 32-bit device, and perhaps the bug only manifests in this case.

The only way I could see a potential problem was if you're using a 32-bit build on a 64-bit device.

> I consider this a significant issue. Could you confirm whether any testing has been conducted on 32-bit platforms, as it seems that everything works correctly on 64-bit desktops?

I don't have any 32-bit platforms available for testing, but I have a hard time believing that it would make a difference. Like you, I can't reproduce it on the desktop platforms I have available right now.

The bash devel git branch has fairly fine granularity. If you can automate the signal sending somewhat, maybe by having a child process send signals to $$, you could use your script and `git bisect' to find the commit where the behavior changed.

bash-5.0 was frozen 12/31/2018, and bash-5.1 was frozen 12/14/2020, so that should get you started with the devel branch commits you want to inspect.

http://git.savannah.gnu.org/cgit/bash.git/log/?h=devel