Re: wait skips signals but first one

2024-02-05 Thread Chet Ramey

On 2/5/24 12:22 PM, Mykyta Dorokhin wrote:


> Note 1: forgot to mention that I'm cross-compiling.
> Note 2: it probably makes sense to add a warning or something that states
> that HAVE_POSIX_SIGSETJMP is disabled due to cross-compiling.


The autoconf macro that tests for this (BASH_FUNC_POSIX_SETJMP) prints a
warning if cross-compiling and defaults to the same setting as whether or
not it thinks it has POSIX signals available (bash_cv_posix_signals).

You probably didn't notice it the first time and used the cached value
from then on.
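
One way to confirm what configure decided is to inspect the generated files directly. The following is a minimal sketch, assuming a `config.h` in the build directory; the helper names `has_posix_sigsetjmp` and `cached_sigsetjmp_lines` are hypothetical, and only `HAVE_POSIX_SIGSETJMP` comes from this thread.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given config.h enables
# HAVE_POSIX_SIGSETJMP. A disabled feature is normally left behind as a
# commented-out "#undef" line, which this pattern does not match.
has_posix_sigsetjmp() {
    local config_h=$1
    grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_POSIX_SIGSETJMP([[:space:]]|$)' "$config_h"
}

# Hypothetical helper: show every line mentioning sigsetjmp in the given
# configure cache or log files, so a stale "(cached)" answer is easy to spot.
cached_sigsetjmp_lines() {
    grep -n 'sigsetjmp' "$@"
}
```

For example, `has_posix_sigsetjmp config.h && echo enabled || echo disabled` could be run in the build directory after configure finishes.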


> Thank you for your time! You are doing a great job!


Thanks for your kind words.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re[2]: wait skips signals but first one

2024-02-05 Thread Mykyta Dorokhin
Hello again,

configure log says:

checking if getcwd() will dynamically allocate memory with 0 size... (cached) yes
checking for presence of POSIX-style sigsetjmp/siglongjmp... (cached) missing
checking whether or not strcoll and strcmp differ... (cached) no

This is most likely the problem.

Note 1: forgot to mention that I'm cross-compiling.
Note 2: it probably makes sense to add a warning or something that states that
HAVE_POSIX_SIGSETJMP is disabled due to cross-compiling.

Will try to find a way to fix this.

Thank you for your time! You are doing a great job!


M




February 5, 2024, 16:28:36, from "Chet Ramey":

On 2/3/24 7:01 PM, Mykyta Dorokhin wrote:

> There is a line in trap.c with your change. If I revert it then everything 
> works again:
> 
> - if (interrupt_immediately && wait_intr_flag)
> + if (/* interrupt_immediately && */wait_intr_flag)
> 
> So if I put interrupt_immediately back and rebuild the code with only this
> fix, it starts working properly; signals are received as expected.

OK. Let's look at that. By this time, interrupt_immediately was no longer
set anywhere, so the code before this change did nothing but inhibit the
siglongjmp/longjmp call from trap_handler, which means the sighandler
returned and (possibly) did not interrupt the wait builtin.

That is what this means (replace SIGINT with SIGUSR1 here):

> The one change that might make a difference is a bug fix: if the wait
> builtin is waiting for a process and receives a trapped signal, it's
> supposed to cause wait to return immediately and then run the trap. Bash
> didn't do that consistently for SIGINT, and would run the trap when it
> shouldn't, or before it should, and sometimes not return from the wait
> at all. So maybe the longjmp back to the wait builtin is what changed
> things, even though longjmp is one of the functions that POSIX says is
> safe to call from a signal handler context, and it restores the signal
> mask if you're running on a system that has sigsetjmp/siglongjmp.

So the effect of this change is to longjmp/siglongjmp back to the wait
builtin, so it can return from there before running the trap. If you use
siglongjmp, it restores the original signal mask (look at the wait
builtin's call to setjmp_sigs, a macro that calls sigsetjmp with 1 as the
second argument), which means the trapped signal is no longer blocked.

Since this works as intended on all other systems, I would check to see
if your system has sigsetjmp/siglongjmp and whether or not they are
behaving correctly.



Re: wait skips signals but first one

2024-02-05 Thread Chet Ramey

On 2/3/24 7:01 PM, Mykyta Dorokhin wrote:

> There is a line in trap.c with your change. If I revert it then everything
> works again:
>
> - if (interrupt_immediately && wait_intr_flag)
> + if (/* interrupt_immediately && */wait_intr_flag)
>
> So if I put interrupt_immediately back and rebuild the code with only this
> fix, it starts working properly; signals are received as expected.


OK. Let's look at that. By this time, interrupt_immediately was no longer
set anywhere, so the code before this change did nothing but inhibit the
siglongjmp/longjmp call from trap_handler, which means the sighandler
returned and (possibly) did not interrupt the wait builtin.

That is what this means (replace SIGINT with SIGUSR1 here):


> The one change that might make a difference is a bug fix: if the wait
> builtin is waiting for a process and receives a trapped signal, it's
> supposed to cause wait to return immediately and then run the trap. Bash
> didn't do that consistently for SIGINT, and would run the trap when it
> shouldn't, or before it should, and sometimes not return from the wait
> at all. So maybe the longjmp back to the wait builtin is what changed
> things, even though longjmp is one of the functions that POSIX says is
> safe to call from a signal handler context, and it restores the signal
> mask if you're running on a system that has sigsetjmp/siglongjmp.


So the effect of this change is to longjmp/siglongjmp back to the wait
builtin, so it can return from there before running the trap. If you use
siglongjmp, it restores the original signal mask (look at the wait
builtin's call to setjmp_sigs, a macro that calls sigsetjmp with 1 as the
second argument), which means the trapped signal is no longer blocked.

Since this works as intended on all other systems, I would check to see
if your system has sigsetjmp/siglongjmp and whether or not they are
behaving correctly.



Re[3]: wait skips signals but first one

2024-02-04 Thread Mykyta Dorokhin
I've just tried the same with 5.1.16 and it also fixes the problem.

diff --git a/trap.c b/trap.c
index 1b27fb3..65e8f66 100644
--- a/trap.c
+++ b/trap.c
@@ -526,7 +526,7 @@ trap_handler (sig)
      if (this_shell_builtin && (this_shell_builtin == wait_builtin))
 {
   wait_signal_received = sig;
-   if (waiting_for_child && wait_intr_flag)
+   if (interrupt_immediately && waiting_for_child && wait_intr_flag)
     sh_longjmp (wait_intr_buf, 1);
 }

--
2.25.1

Mykyta


February 4, 2024, 02:01:28, from "Mykyta Dorokhin":

Hello,

Again, I'm on the "commit bash-20200221 snapshot" commit, the one I think
breaks things:

https://git.savannah.gnu.org/cgit/bash.git/commit/?h=devel&id=0df4ddca3f371bc258fe4185cdec36fce3e7be7b

There is a line in trap.c with your change. If I revert it then everything 
works again:

- if (interrupt_immediately && wait_intr_flag)
+ if (/* interrupt_immediately && */wait_intr_flag)

So if I put interrupt_immediately back and rebuild the code with only this fix,
it starts working properly; signals are received as expected.

Can you comment? Maybe you want me to provide some additional debug info?

Thank you,
Mykyta




February 3, 2024, 22:09:33, from "Chet Ramey":

On 2/3/24 10:00 AM, Mykyta Dorokhin wrote:

> I have found the commit on devel branch which breaks things for me (and 
> probably other Yocto-based builds):
> 
> This one still works
> ==
> 
> commit 89d788fb0152724a93e0fdab8c15116e5c76572b
> Author: Chet Ramey 
> Date:   Mon Feb 17 11:41:35 2020 -0500
> 
>     commit bash-20200214 snapshot
> 
> This one not
> ==
> 
> 
> commit 0df4ddca3f371bc258fe4185cdec36fce3e7be7b
> Author: Chet Ramey 
> Date:   Mon Feb 24 10:41:37 2020 -0500
> 
>     commit bash-20200221 snapshot
> 
> 
> 
> Please take a look. Maybe you'll notice something suspicious there. I don't 
> know... uninitialized variables, endian-dependent code, etc.

There are changes there, of course, but it's hard to see how they make a
difference. The wait builtin was changed not to interrupt the wait for a
trapped SIGCHLD, but to delay running any SIGCHLD trap until the wait
exited. Since your example doesn't trap SIGCHLD, it doesn't seem
significant. Any other trapped signal still interrupts the wait. Subshells
clear the process substitution FIFO list, but you're not using process
substitution.

The one change that might make a difference is a bug fix: if the wait
builtin is waiting for a process and receives a trapped signal, it's
supposed to cause wait to return immediately and then run the trap. Bash
didn't do that consistently for SIGINT, and would run the trap when it
shouldn't, or before it should, and sometimes not return from the wait
at all. So maybe the longjmp back to the wait builtin is what changed
things, even though longjmp is one of the functions that POSIX says is
safe to call from a signal handler context, and it restores the signal
mask if you're running on a system that has sigsetjmp/siglongjmp.



Re[2]: wait skips signals but first one

2024-02-03 Thread Mykyta Dorokhin
Hello,

Again, I'm on the "commit bash-20200221 snapshot" commit, the one I think
breaks things:

https://git.savannah.gnu.org/cgit/bash.git/commit/?h=devel&id=0df4ddca3f371bc258fe4185cdec36fce3e7be7b

There is a line in trap.c with your change. If I revert it then everything 
works again:

- if (interrupt_immediately && wait_intr_flag)
+ if (/* interrupt_immediately && */wait_intr_flag)

So if I put interrupt_immediately back and rebuild the code with only this fix,
it starts working properly; signals are received as expected.

Can you comment? Maybe you want me to provide some additional debug info?

Thank you,
Mykyta




February 3, 2024, 22:09:33, from "Chet Ramey":

On 2/3/24 10:00 AM, Mykyta Dorokhin wrote:

> I have found the commit on devel branch which breaks things for me (and 
> probably other Yocto-based builds):
> 
> This one still works
> ==
> 
> commit 89d788fb0152724a93e0fdab8c15116e5c76572b
> Author: Chet Ramey 
> Date:   Mon Feb 17 11:41:35 2020 -0500
> 
>     commit bash-20200214 snapshot
> 
> This one not
> ==
> 
> 
> commit 0df4ddca3f371bc258fe4185cdec36fce3e7be7b
> Author: Chet Ramey 
> Date:   Mon Feb 24 10:41:37 2020 -0500
> 
>     commit bash-20200221 snapshot
> 
> 
> 
> Please take a look. Maybe you'll notice something suspicious there. I don't 
> know... uninitialized variables, endian-dependent code, etc.

There are changes there, of course, but it's hard to see how they make a
difference. The wait builtin was changed not to interrupt the wait for a
trapped SIGCHLD, but to delay running any SIGCHLD trap until the wait
exited. Since your example doesn't trap SIGCHLD, it doesn't seem
significant. Any other trapped signal still interrupts the wait. Subshells
clear the process substitution FIFO list, but you're not using process
substitution.

The one change that might make a difference is a bug fix: if the wait
builtin is waiting for a process and receives a trapped signal, it's
supposed to cause wait to return immediately and then run the trap. Bash
didn't do that consistently for SIGINT, and would run the trap when it
shouldn't, or before it should, and sometimes not return from the wait
at all. So maybe the longjmp back to the wait builtin is what changed
things, even though longjmp is one of the functions that POSIX says is
safe to call from a signal handler context, and it restores the signal
mask if you're running on a system that has sigsetjmp/siglongjmp.



Re: wait skips signals but first one

2024-02-03 Thread Chet Ramey

On 2/3/24 10:28 AM, Mykyta Dorokhin wrote:


> Analysis with strace.
>
> After receiving SIGUSR1, Debian only blocks SIGCHLD, then clears the block:
>
> 205295 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=205327,
> si_uid=1040} ---
>
> 205295 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 205295 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[],
> sa_flags=SA_RESTORER, sa_restorer=0x7f063bdb5fd0},
> {sa_handler=0x5637247940b0, sa_mask=[], sa_flags=SA_RESTORER,
> sa_restorer=0x7f063bdb5fd0}, 8) = 0
>
> 205295 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> 205295 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0  # unblocks all signals
>
>
> The above is the correct action.
>
> On our device, it blocks SIGUSR1 as well as SIGCHLD and keeps doing it over
> and over again:


One explanation for this is SIGUSR1 being blocked when the shell is
invoked. Another is that sigsetjmp/siglongjmp are either not available
(or configure doesn't think they are) or don't properly save and restore
the signal mask.
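
The first explanation can be checked from inside the script itself. The sketch below is one possible approach, assuming a Linux `/proc` filesystem; the function names are hypothetical. The `SigBlk` line of `/proc/PID/status` is a hexadecimal bitmask in which bit N-1 stands for signal number N.

```shell
#!/bin/bash
# Succeed if signal number $2 is set in the hexadecimal mask $1
# (the format of the SigBlk line in /proc/PID/status on Linux).
sig_in_mask() {
    local mask=$((16#$1)) signum=$2
    (( (mask >> (signum - 1)) & 1 ))
}

# Print the blocked-signal mask of process $1 (default: this shell).
blocked_mask() {
    awk '/^SigBlk:/ { print $2 }' "/proc/${1:-$$}/status"
}

# Succeed if the named signal (e.g. USR1) is blocked in process $2
# (default: this shell). `kill -l NAME` maps the name to its number,
# which differs between architectures.
is_blocked() {
    local signum
    signum=$(kill -l "$1") || return 2
    sig_in_mask "$(blocked_mask "${2:-$$}")" "$signum"
}
```

Calling `is_blocked USR1 && echo "USR1 blocked at startup"` as the first command of the failing script would show whether the shell inherited a blocked SIGUSR1, as opposed to the mask not being restored later.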



> 6707  --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=6724,
> si_uid=0} ---
>
> 6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
> 6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
> 6707  rt_sigprocmask(SIG_BLOCK, NULL, [USR1 CHLD], 8) = 0
> 6707  write(1, ">>> TRAPPED USR1 <<<\n", 21) = 21
> 6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
> 6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
> 6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
> 6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
> 6707  write(1, "Iteration\n", 10)       = 10


> On modern systems, the OS blocks the signal being handled while its handler
> runs and unblocks it afterwards, so that signal handlers are not called
> recursively. The exception to this is if SA_NODEFER is set. On some very
> old UNIX systems you had to block the signal yourself, and there was a
> small window where things could go wrong. I suspect BASH probably has a
> build option to allow blocking signals in handlers for compatibility with
> other systems, and is not being built correctly for Linux.


Bash does have an autoconf test for this, but it didn't change as part
of this push. You can check what MUST_REINSTALL_SIGHANDLERS is set to
in config.h, but I suspect it won't be different.

And `not being built correctly for Linux' would mean your Debian and my
Red Hat tests would fail.


> I suspect on those very old systems the signal was automatically unblocked
> on return, but that is not done here, because the POSIX sigprocmask is
> called, which requires calling it again to unblock the signal in Linux.
> And since wait is restarted, it is never unblocked.


If you mean wait(2), it doesn't get restarted. waitpid(2) will return
-1/EINTR since it received a caught signal.



> According to strace no additional user flags are set when the BASH signal
> handler is put in place for SIGUSR1.


Correct, the trap signal handler doesn't assume that system calls are
restarted.



Re: wait skips signals but first one

2024-02-03 Thread Chet Ramey

On 2/3/24 10:00 AM, Mykyta Dorokhin wrote:

> I have found the commit on devel branch which breaks things for me (and
> probably other Yocto-based builds):
>
> This one still works
> ==
>
> commit 89d788fb0152724a93e0fdab8c15116e5c76572b
> Author: Chet Ramey
> Date:   Mon Feb 17 11:41:35 2020 -0500
>
>     commit bash-20200214 snapshot
>
> This one not
> ==
>
> commit 0df4ddca3f371bc258fe4185cdec36fce3e7be7b
> Author: Chet Ramey
> Date:   Mon Feb 24 10:41:37 2020 -0500
>
>     commit bash-20200221 snapshot
>
> Please take a look. Maybe you'll notice something suspicious there. I don't
> know... uninitialized variables, endian-dependent code, etc.


There are changes there, of course, but it's hard to see how they make a
difference. The wait builtin was changed not to interrupt the wait for a
trapped SIGCHLD, but to delay running any SIGCHLD trap until the wait
exited. Since your example doesn't trap SIGCHLD, it doesn't seem
significant. Any other trapped signal still interrupts the wait. Subshells
clear the process substitution FIFO list, but you're not using process
substitution.

The one change that might make a difference is a bug fix: if the wait
builtin is waiting for a process and receives a trapped signal, it's
supposed to cause wait to return immediately and then run the trap. Bash
didn't do that consistently for SIGINT, and would run the trap when it
shouldn't, or before it should, and sometimes not return from the wait
at all. So maybe the longjmp back to the wait builtin is what changed
things, even though longjmp is one of the functions that POSIX says is
safe to call from a signal handler context, and it restores the signal
mask if you're running on a system that has sigsetjmp/siglongjmp.



Re[2]: wait skips signals but first one

2024-02-03 Thread Mykyta Dorokhin
Hello again,


Here is another analysis that my colleague made on the issue:



Bash Compiled for wrong OS?

Analysis with strace.

After receiving SIGUSR1, Debian only blocks SIGCHLD, then clears the block:

205295 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=205327, 
si_uid=1040} ---
205295 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
205295 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f063bdb5fd0}, {sa_handler=0x5637247940b0, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f063bdb5fd0}, 8) = 0
205295 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
205295 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0  # unblocks all signals


The above is the correct action.

On our device, it blocks SIGUSR1 as well as SIGCHLD and keeps doing it over and 
over again:

6707  --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=6724, si_uid=0} ---
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, NULL, [USR1 CHLD], 8) = 0
6707  write(1, ">>> TRAPPED USR1 <<<\n", 21) = 21
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  write(1, "Iteration\n", 10)       = 10
6707  rt_sigprocmask(SIG_BLOCK, NULL, [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [INT TERM CHLD], [USR1 CHLD], 8) = 0
6707  clone(child_stack=NULL, 
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x76fe9028) 
= 6725
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigprocmask(SIG_SETMASK, [USR1 CHLD], NULL, 8) = 0
6707  rt_sigprocmask(SIG_BLOCK, [CHLD], [USR1 CHLD], 8) = 0
6707  rt_sigaction(SIGINT, {sa_handler=0x46e15, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x76e90711},  
6707  <... rt_sigaction resumed>{sa_handler=0x46e15, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x76e90711}, 8) = 0
6707  wait4(-1,  

On modern systems, the OS blocks the signal being handled while its handler
runs and unblocks it afterwards, so that signal handlers are not called
recursively. The exception to this is if SA_NODEFER is set. On some very old
UNIX systems you had to block the signal yourself, and there was a small
window where things could go wrong. I suspect BASH probably has a build option
to allow blocking signals in handlers for compatibility with other systems,
and is not being built correctly for Linux. I suspect on those very old
systems the signal was automatically unblocked on return, but that is not done
here, because the POSIX sigprocmask is called, which requires calling it again
to unblock the signal in Linux. And since wait is restarted, it is never
unblocked.

According to strace no additional user flags are set when the BASH signal 
handler is put in place for SIGUSR1.

We need to look at bash build options, and possibly the signal handling code,
and sigprocmask or whatever C API they are using to call sigprocmask().


Re[2]: wait skips signals but first one

2024-02-03 Thread Mykyta Dorokhin


> Like you, I can't reproduce it on the desktop platforms I have available
> right now.
>
> The bash devel git branch has fairly fine granularity. If you can automate
> the signal sending somewhat, maybe by having a child process send signals
> to $$, you could use your script and `git bisect' to find the commit where
> the behavior changed. bash-5.0 was frozen 12/31/2018, and bash-5.1 was
> frozen 12/14/2020, so that should get you started with the devel branch
> commits you want to inspect.
>
> http://git.savannah.gnu.org/cgit/bash.git/log/?h=devel

I have found the commit on devel branch which breaks things for me (and 
probably other Yocto-based builds):

This one still works
==

commit 89d788fb0152724a93e0fdab8c15116e5c76572b
Author: Chet Ramey 
Date:   Mon Feb 17 11:41:35 2020 -0500

   commit bash-20200214 snapshot

This one not
==


commit 0df4ddca3f371bc258fe4185cdec36fce3e7be7b
Author: Chet Ramey 
Date:   Mon Feb 24 10:41:37 2020 -0500

   commit bash-20200221 snapshot


Please take a look. Maybe you'll notice something suspicious there. I don't 
know... uninitialized variables, endian-dependent code, etc.


Thank you,
Mykyta
 






Re: wait skips signals but first one

2024-01-08 Thread Chet Ramey

On 1/5/24 2:46 PM, Mykyta Dorokhin wrote:


> Bash Version: 5.1
> Patch Level: 16
> Release Status: release
>
> Description:
>
> I'm working on a custom project within the Yocto framework. After a recent
> build system update, the bash
> version updated to 5.1.16. Subsequently, I've noticed peculiar side effects
> related to using 'wait' and signals.
> Below is a script demonstrating the issue.
>
> The problem lies in the fact that, in the case of waiting on 'wait' in the
> script, only the first signal
> interrupts 'wait'; subsequent signals of the same type do not interrupt 'wait',
> and it remains blocked.
>
> I manually switched bash versions and compiled a table (bash version - Yocto
> version):
>
> 5.0.18 (dunfell): No issues
> 5.1.4 (hardknott): Has issues
> 5.1.8 (honister): Has issues
> 5.1.16 (kirkstone): Has issues
> 5.2.21 (master): Has issues
>
> Meanwhile, in my home desktop distribution, Ubuntu 22.04, I tested the same
> scenario, and everything works correctly; signals are processed as expected.
>
> My assumption is that the problem may be related to my Yocto build being
> intended for a 32-bit device, and perhaps the bug only manifests in this case.


The only way I could see a potential problem was if you're using a 32-bit
build on a 64-bit device.



> I consider this a significant issue. Could you confirm whether any testing has
> been conducted on 32-bit platforms,
> as it seems that everything works correctly on 64-bit desktops?


I don't have any 32-bit platforms available for testing, but I have a hard
time believing that it would make a difference.

Like you, I can't reproduce it on the desktop platforms I have available
right now.

The bash devel git branch has fairly fine granularity. If you can automate
the signal sending somewhat, maybe by having a child process send signals
to $$, you could use your script and `git bisect' to find the commit where
the behavior changed. bash-5.0 was frozen 12/31/2018, and bash-5.1 was
frozen 12/14/2020, so that should get you started with the devel branch
commits you want to inspect.

http://git.savannah.gnu.org/cgit/bash.git/log/?h=devel
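
The automation suggested above might look roughly like the following sketch. It is not taken from the thread: the function name `probe_wait_signals`, the global `trapped`, and the timing values are assumptions. A child process sends USR1 to the shell several times while the shell sits in `wait`, and the function counts how many times the trap ran. It uses `$BASHPID` rather than `$$` so that it still targets the right process when called from a subshell.

```shell
#!/bin/bash
# Hypothetical probe: count how many of N USR1 signals are handled by a
# shell that is blocked in `wait`. Sets the global `trapped` and succeeds
# only if every signal ran the trap.
probe_wait_signals() {
    local n=${1:-3} target=$BASHPID i child sender
    trapped=0
    trap 'trapped=$((trapped + 1))' USR1

    # Sender: a child process that signals the waiting shell N times.
    ( for ((i = 0; i < n; i++)); do sleep 0.3; kill -USR1 "$target"; done ) &
    sender=$!

    for ((i = 0; i < n; i++)); do
        sleep 2 &
        child=$!
        wait "$child"                # a trapped signal should end this early
        kill "$child" 2>/dev/null    # discard the leftover sleep, if any
        wait "$child" 2>/dev/null
    done

    wait "$sender" 2>/dev/null
    trap - USR1
    [ "$trapped" -eq "$n" ]
}
```

A script that ends with `probe_wait_signals 5` exits non-zero when signals are lost, so it could serve as the test command for `git bisect run` wherever the freshly built bash can be executed; for a cross-compiled target the binary would first have to be copied to the device.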




wait skips signals but first one

2024-01-05 Thread Mykyta Dorokhin


From: ki...@ukr.net
To: bug-bash@gnu.org
Subject: wait skips signals but first one

Configuration Information [Automatically generated, do not change]:
Machine: arm
OS: linux-gnueabi
Compiler: arm-mydistro-linux-gnueabi-gcc  -mthumb -mfpu=neon -mfloat-abi=hard 
-mcpu=cortex-a7
Compilation CFLAGS:  -O2 -pipe -g -feliminate-unused-debug-types  
-DNON_INTERACTIVE_LOGIN_SHELLS -DHEREDOC_PIPESIZE=65536 
-DBRACKETED_PASTE_DEFAULT=0
uname output: Linux mtcap3 5.15.87 #1 PREEMPT Tue Sep 19 16:09:21 UTC 2023 
armv7l GNU/Linux
Machine Type: arm-mydistro-linux-gnueabi

Bash Version: 5.1
Patch Level: 16
Release Status: release

Description:


I'm working on a custom project within the Yocto framework. After a recent
build system update, the bash version updated to 5.1.16. Subsequently, I've
noticed peculiar side effects related to using 'wait' and signals.
Below is a script demonstrating the issue.

The problem is that when the script is blocked in 'wait', only the first
signal interrupts it; subsequent signals of the same type do not interrupt
'wait', and it remains blocked.

I manually switched bash versions and compiled a table (bash version - Yocto 
version):

5.0.18 (dunfell): No issues
5.1.4 (hardknott): Has issues
5.1.8 (honister): Has issues
5.1.16 (kirkstone): Has issues
5.2.21 (master): Has issues

Meanwhile, in my home desktop distribution, Ubuntu 22.04, I tested the same 
scenario, and everything works correctly; signals are processed as expected.

My assumption is that the problem may be related to my Yocto build being 
intended for a 32-bit device, and perhaps the bug only manifests in this case.

I consider this a significant issue. Could you confirm whether any testing has 
been conducted on 32-bit platforms,
as it seems that everything works correctly on 64-bit desktops?

Repeat-By:

Here is a simple script:

#!/bin/bash

trap 'echo ">>> TRAPPED HUP <<<"' HUP
trap 'echo ">>> TRAPPED USR1 <<<"' USR1

while true; do
 echo "Iteration"
 /bin/sleep 15 &
 wait $!
done


When I send HUP or USR1 multiple times I see the TRAPPED message only once.

strace says:

rt_sigprocmask(SIG_BLOCK, [CHLD], [HUP USR1 CHLD], 8) = 0

Meaning that HUP and USR1 remain blocked.
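
The third argument of rt_sigprocmask is the signal mask that was in effect before the call, which is why the line above shows HUP and USR1 as already blocked. A small filter can pull that field out of a longer trace. This is only a sketch: the function names are hypothetical, and it assumes strace prints masks in the plain `[NAME NAME]` form seen in this thread.

```shell
#!/bin/bash
# Read strace output on stdin and print the previous signal mask (third
# argument of rt_sigprocmask) for every call that reports one. Calls whose
# third argument is NULL are skipped.
old_masks() {
    sed -n 's/.*rt_sigprocmask([A-Z_]*, [^,]*, \(\[[^]]*\]\), [0-9]*).*/\1/p'
}

# Succeed if signal name $1 (without the SIG prefix) appears in any
# previous mask found in the strace output on stdin.
seen_blocked() {
    old_masks | grep -qw -- "$1"
}
```

For instance, `seen_blocked USR1 < trace.log` (with `trace.log` standing for a saved strace output) would report whether USR1 was ever blocked when rt_sigprocmask was called.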