Re: Segfault on recursive trap/kill
On Mon, Oct 08, 2018 at 11:59:53 -0600, Bob Proulx wrote:

> Some of those must be maintaining the stack in program data space
> instead of in the machine stack space. (shrug)

Indeed. But as I mentioned in another comment, such an implementation
detail shouldn't matter to the user IMO.

> My interpretation of the above is that you would want bash to use
> libunwind (or whatever is appropriate) to set up a stack overflow
> exception trap in order to handle stack overflow specially and then to
> make an improved error reporting to the user when it happens.

Either handle SIGSEGV and output a user-friendly message or do
something like FUNCNEST does today. libunwind wouldn't be necessary,
but I don't know enough about it to say whether or not it may be
useful.

But seeing as how the segfault isn't a bug after all (I'd consider it
a lack of a feature to provide the user with a more useful message),
I'm no longer concerned. But if someone _is_ interested in providing
an improvement, I think it'd be a good one to have. I unfortunately
am stretched far too thin to work on a patch.

>> I also agree. But the context is very different. Shell is a very,
>> very high-level language.
>
> My mind reels at the statement, "Shell is a very, very high-level
> language." What?! The shell is a very simple low level command and
> control language. It is very good at what it does. But if one needs
> to do high level things then one should switch over to a high level
> language. :-)

I mean "high level" in the sense that machine code is low-level, x86
assembly is somewhat high-level (because it was designed for use by a
programmer and includes many conveniences for doing so), C is
high-level, Perl/Python/etc. are very high-level, and Bash is so
high-level that you're dealing with process manipulation---something
very far abstracted from how a computer actually works.

So, high-level in the sense of:

  https://en.wikipedia.org/wiki/High-level_programming_language

> After sifting out the non-useful information. :-)

That information was useful regardless of whether I was already aware
of it. :) I'm sure I'm not the only person who read your message.

> You are always eloquent! :-)

You as well!

-- 
Mike Gerwitz
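If someone does pick this up, scripts need not wait for it: a trap
handler can guard itself against re-entry with a sentinel variable.
A minimal sketch of that idea (the `on_term` and `handled` names are
invented for illustration, and it self-signals with `$$` rather than
`kill 0` so the demo stays contained to the child shell):

```shell
# Guard a TERM trap against re-entry with a sentinel variable.  Without
# the guard, a handler that re-sends the signal it is handling recurses
# until the stack is exhausted -- the segfault this thread started from.
bash -c '
  handled=
  on_term() {
    if [ -z "$handled" ]; then
      handled=1
      echo "caught TERM once; further deliveries are ignored" >&2
      kill -TERM $$    # re-delivered, but the guard stops the recursion
    fi
  }
  trap on_term TERM
  kill -TERM $$
'
```

The handler runs exactly once; the second delivery finds the sentinel
set and returns immediately, so the shell exits normally instead of
recursing.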
Re: Segfault on recursive trap/kill
Mike Gerwitz wrote:
> Bob Proulx wrote:
> > Let me give the discussion this way and I think you will be
> > convinced. :-)
>
> Well, thanks for taking the time for such a long reply. :)
>
> > How is your example any different from a C program? Or Perl,
> > Python, Ruby, and so forth? All of those also allow infinite
> > recursion and the kernel will terminate them with a segfault.
> > Because all of those also allow infinite recursion. A program
> > that executes an infinite recursion would use infinite stack
> > space. But real machines have a finite amount of stack available
> > and therefore die when the stack is exceeded.
>
> I expect this behavior when writing in C, certainly. But in
> languages where the user does not deal with memory management, I'm
> used to a more graceful abort when the stack gets out of control. A
> segfault means something to a C hacker. It means very little to
> users who are unfamiliar with the concepts that you were describing.
>
> I don't have enough experience with Perl, Python, or Ruby to know
> how they handle stack issues. But, out of interest, I gave it a try:

Some of those must be maintaining the stack in program data space
instead of in the machine stack space. (shrug)

> $ perl -e 'sub foo() { foo(); }; foo()'
> Out of memory!

Perl apparently will let you use all available memory without
limitation.

> $ python <<< 'def foo():
> >     foo()
> > foo()' |& tail -n1
> RuntimeError: maximum recursion depth exceeded
> ...

I wonder how they decide to set the depth.

> $ php -r 'function foo() { foo(); } foo();'
>
> Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
> to allocate 262144 bytes) in Command line code on line 1

134M seems arbitrarily small but at least it does say exactly what it
is doing there.

> $ guile -e '(let x () (+ (x)))'
> allocate_stack failed: Cannot allocate memory
> Warning: Unwind-only `stack-overflow' exception; skipping pre-unwind
> handler.
>
> $ emacs --batch --eval '(message (defun foo () (foo)) (foo))'
> Lisp nesting exceeds ‘max-lisp-eval-depth’

I think these two are different. It looks like guile is using
libunwind to set up a stack exception handler whereas emacs appears
to be using a tracked max-lisp-eval-depth variable defaulting to 800
on my system in emacs v25.

    This limit serves to catch infinite recursions for you before they
    cause actual stack overflow in C, which would be fatal for Emacs.
    You can safely make it considerably larger than its default value,
    if that proves inconveniently small. However, if you increase it
    too far, Emacs could overflow the real C stack, and crash.

> I understand that in C you usually don't manage your own stack and,
> consequently, you can't say that it falls under "memory management"
> in the sense of malloc(3) and brk(2) and such. But C programmers
> are aware of the mechanisms behind the stack (or at least better be)
> and won't be surprised when they get a segfault in this situation.
>
> But if one of my coworkers who knows some web programming and not
> much about system programming gets a segfault, that's not a friendly
> error. If Bash instead said something like the above languages, then
> that would be useful.
>
> When I first saw the error, I didn't know that my trap was
> recursing. My immediate reaction was "shit, I found a bug". Once I
> saw it was the trap, I _assumed_ it was just exhausting the stack,
> but I wanted to report it regardless, just in case; I didn't have
> the time to dig deeper, and even so, I wasn't sure if it was
> intended behavior to just let the kernel handle it.

My interpretation of the above is that you would want bash to use
libunwind (or whatever is appropriate) to set up a stack overflow
exception trap in order to handle stack overflow specially and then
to make an improved error reporting to the user when it happens.

Frankly I wasn't even aware of libunwind before this. And haven't
learned enough about it to even know if that is what I should be
mentioning here yet. :-)

> > Shell script code is program source code. Infinite loops or
> > infinite recursion are bugs in the shell script source code not
> > the interpreter that is executing the code as written.
>
> I also agree. But the context is very different. Shell is a very,
> very high-level language.

My mind reels at the statement, "Shell is a very, very high-level
language." What?! The shell is a very simple low level command and
control language. It is very good at what it does. But if one needs
to do high level things then one should switch over to a high level
language. :-)

> > Hope this helps!
>
> There was useful information, yes.

After sifting out the non-useful information. :-)

> I hope I was able to further clarify my concerns as well.

You are always eloquent! :-)

Bob
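On the question of how the interpreters decide on a depth: for
CPython at least, the cap is an ordinary runtime setting rather than a
hard-wired constant. A sketch, assuming a `python3` on PATH (the
default limit is 1000 on stock CPython builds, though distributions
can change it):

```shell
# CPython's recursion cap is queryable and tunable at runtime.  It is a
# heuristic guess at how many Python frames fit on the real C stack,
# which is presumably why it is exposed as a setting rather than being
# fixed by the language.
python3 -c '
import sys
print(sys.getrecursionlimit())   # typically 1000 by default
sys.setrecursionlimit(50)        # lower it and recursion fails sooner
def foo():
    foo()
try:
    foo()
except RecursionError as e:
    print("caught:", e)
'
```

Either way the failure is a catchable exception with a readable
message, not a kernel-delivered SIGSEGV.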
Re: Segfault on recursive trap/kill
On Sun, Oct 07, 2018 at 09:42:01 +0700, Robert Elz wrote:

> I expect that if you did look, you'd probably find that while
> technically the former, it isn't a reference to some wild pointer,
> but rather simply growing the stack until the OS says "no more"
> and returns a SIGSEGV instead of allocating a new stack page.

That makes sense. Thanks.

Though, it is an implementation detail that IMO a user of bash
shouldn't have to worry about---if bash instead implemented its
interpreter stack on the heap rather than on the same stack as bash
itself, a segfault could have represented an actual bug.

-- 
Mike Gerwitz
Re: Segfault on recursive trap/kill
On Sun, Oct 07, 2018 at 08:52:25 +0200, Valentin Bajrami wrote:

> As earlier explained, you are calling the foo function recursively.
> To mitigate this behaviour you simply set FUNCNEST=N; foo() { foo; };
> foo, where N denotes the number of nested functions to be called.

This is perfect and clear behavior, actually:

  $ FUNCNEST=10; foo() { foo; }; foo
  bash: foo: maximum function nesting level exceeded (10)

If bash were to set a default value for FUNCNEST then a useful error
would be provided rather than segfaulting (and possibly triggering a
coredump). Of course, if bash itself is sharing a stack with the
interpreter, then it's hard to come up with a good predetermined
value.

FUNCNEST doesn't seem to help with the issue of recursive traps,
though (understandably).

-- 
Mike Gerwitz
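The clean-failure case is easy to reproduce in a throwaway child
shell, so the recursion can't hurt the shell you paste it into:

```shell
# With FUNCNEST set, runaway recursion stops with a readable diagnostic
# from bash itself instead of a kernel-delivered SIGSEGV.
bash -c 'FUNCNEST=10; foo() { foo; }; foo'
echo "exit status: $?"
```

The exact exit status of the nesting error isn't something I'd rely
on; the point is simply that it is an ordinary error return rather
than death by signal (128+11).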
Re: Segfault on recursive trap/kill
Hi Mike,

As earlier explained, you are calling the foo function recursively.
To mitigate this behaviour you simply set FUNCNEST=N; foo() { foo; };
foo, where N denotes the number of nested functions to be called.

On Sun, 7 Oct 2018 at 07:57, Mike Gerwitz wrote:

> Hey, Bob!
>
> On Sat, Oct 06, 2018 at 22:44:17 -0600, Bob Proulx wrote:
> > Let me give the discussion this way and I think you will be
> > convinced. :-)
>
> Well, thanks for taking the time for such a long reply. :)
>
> > How is your example any different from a C program? Or Perl,
> > Python, Ruby, and so forth? All of those also allow infinite
> > recursion and the kernel will terminate them with a segfault.
> > Because all of those also allow infinite recursion. A program
> > that executes an infinite recursion would use infinite stack
> > space. But real machines have a finite amount of stack available
> > and therefore die when the stack is exceeded.
>
> I expect this behavior when writing in C, certainly. But in
> languages where the user does not deal with memory management, I'm
> used to a more graceful abort when the stack gets out of control. A
> segfault means something to a C hacker. It means very little to
> users who are unfamiliar with the concepts that you were describing.
>
> I don't have enough experience with Perl, Python, or Ruby to know
> how they handle stack issues. But, out of interest, I gave it a try:
>
> $ perl -e 'sub foo() { foo(); }; foo()'
> Out of memory!
>
> $ python <<< 'def foo():
> >     foo()
> > foo()' |& tail -n1
> RuntimeError: maximum recursion depth exceeded
>
> $ ruby -e 'def foo()
> >   foo()
> > end
> > foo()'
> -e:2: stack level too deep (SystemStackError)
>
> Some languages I'm more familiar with:
>
> $ node -e '(function foo() { foo(); })()'
> [eval]:1
> (function foo() { foo(); })()
>                   ^
> RangeError: Maximum call stack size exceeded
>
> $ php -r 'function foo() { foo(); } foo();'
>
> Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
> to allocate 262144 bytes) in Command line code on line 1
>
> $ guile -e '(let x () (+ (x)))'
> allocate_stack failed: Cannot allocate memory
> Warning: Unwind-only `stack-overflow' exception; skipping pre-unwind
> handler.
>
> $ emacs --batch --eval '(message (defun foo () (foo)) (foo))'
> Lisp nesting exceeds ‘max-lisp-eval-depth’
>
> And so on.
>
> I understand that in C you usually don't manage your own stack and,
> consequently, you can't say that it falls under "memory management"
> in the sense of malloc(3) and brk(2) and such. But C programmers
> are aware of the mechanisms behind the stack (or at least better be)
> and won't be surprised when they get a segfault in this situation.
>
> But if one of my coworkers who knows some web programming and not
> much about system programming gets a segfault, that's not a friendly
> error. If Bash instead said something like the above languages, then
> that would be useful.
>
> When I first saw the error, I didn't know that my trap was
> recursing. My immediate reaction was "shit, I found a bug". Once I
> saw it was the trap, I _assumed_ it was just exhausting the stack,
> but I wanted to report it regardless, just in case; I didn't have
> the time to dig deeper, and even so, I wasn't sure if it was
> intended behavior to just let the kernel handle it.
>
> > This following complete C program recurses infinitely. Or at
> > least until the stack is exhausted. At which time it triggers a
> > segfault because it tries to use memory beyond the page mapped
> > stack.
> > [...]
> > Would you say that is a bug in the C language? A bug in gcc that
> > compiled it? A bug in the Unix/Linux kernel for memory management
> > that trapped the error? The parent shell that reported the exit
> > code of the program? Or in the program source code? I am hoping
> > that we will all agree that it is a bug in the program source code
> > and not either gcc or the kernel. :-)
>
> I agree, yes.
>
> > Shell script code is program source code. Infinite loops or
> > infinite recursion are bugs in the shell script source code not
> > the interpreter that is executing the code as written.
>
> I also agree. But the context is very different. Shell is a very,
> very high-level language.
>
> > This feels to me to be related to The Halting Problem.
>
> Knowing in advance whether there may be a problem certainly is, but
> we don't need to do that; we'd just need to detect it at runtime to
> provide a more useful error message.
>
> > Other shells are also fun to check:
> >
> > $ dash -c 'trap "kill 0" TERM; kill 0'
> > Segmentation fault
> >
> > $ ash -c 'trap "kill 0" TERM; kill 0'
> > Segmentation fault
> >
> > $ mksh -c 'trap "kill 0" TERM; kill 0'
> > Segmentation fault
>
> Heh, interesting!
>
> > $ ksh93 -c 'trap "kill 0" TERM; kill 0'
> > $ echo $?
> > 0
>
> This is not the behavior I'd want.
>
> > $ posh -c 'trap "kill 0" TERM; kill 0'
Re: Segfault on recursive trap/kill
Hey, Bob!

On Sat, Oct 06, 2018 at 22:44:17 -0600, Bob Proulx wrote:
> Let me give the discussion this way and I think you will be
> convinced. :-)

Well, thanks for taking the time for such a long reply. :)

> How is your example any different from a C program? Or Perl, Python,
> Ruby, and so forth? All of those also allow infinite recursion and
> the kernel will terminate them with a segfault. Because all of those
> also allow infinite recursion. A program that executes an infinite
> recursion would use infinite stack space. But real machines have a
> finite amount of stack available and therefore die when the stack is
> exceeded.

I expect this behavior when writing in C, certainly. But in languages
where the user does not deal with memory management, I'm used to a
more graceful abort when the stack gets out of control. A segfault
means something to a C hacker. It means very little to users who are
unfamiliar with the concepts that you were describing.

I don't have enough experience with Perl, Python, or Ruby to know how
they handle stack issues. But, out of interest, I gave it a try:

  $ perl -e 'sub foo() { foo(); }; foo()'
  Out of memory!

  $ python <<< 'def foo():
  >     foo()
  > foo()' |& tail -n1
  RuntimeError: maximum recursion depth exceeded

  $ ruby -e 'def foo()
  >   foo()
  > end
  > foo()'
  -e:2: stack level too deep (SystemStackError)

Some languages I'm more familiar with:

  $ node -e '(function foo() { foo(); })()'
  [eval]:1
  (function foo() { foo(); })()
                    ^
  RangeError: Maximum call stack size exceeded

  $ php -r 'function foo() { foo(); } foo();'

  Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
  to allocate 262144 bytes) in Command line code on line 1

  $ guile -e '(let x () (+ (x)))'
  allocate_stack failed: Cannot allocate memory
  Warning: Unwind-only `stack-overflow' exception; skipping pre-unwind
  handler.

  $ emacs --batch --eval '(message (defun foo () (foo)) (foo))'
  Lisp nesting exceeds ‘max-lisp-eval-depth’

And so on.

I understand that in C you usually don't manage your own stack and,
consequently, you can't say that it falls under "memory management" in
the sense of malloc(3) and brk(2) and such. But C programmers are
aware of the mechanisms behind the stack (or at least better be) and
won't be surprised when they get a segfault in this situation.

But if one of my coworkers who knows some web programming and not much
about system programming gets a segfault, that's not a friendly error.
If Bash instead said something like the above languages, then that
would be useful.

When I first saw the error, I didn't know that my trap was recursing.
My immediate reaction was "shit, I found a bug". Once I saw it was the
trap, I _assumed_ it was just exhausting the stack, but I wanted to
report it regardless, just in case; I didn't have the time to dig
deeper, and even so, I wasn't sure if it was intended behavior to just
let the kernel handle it.

> This following complete C program recurses infinitely. Or at least
> until the stack is exhausted. At which time it triggers a segfault
> because it tries to use memory beyond the page mapped stack.
> [...]
> Would you say that is a bug in the C language? A bug in gcc that
> compiled it? A bug in the Unix/Linux kernel for memory management
> that trapped the error? The parent shell that reported the exit code
> of the program? Or in the program source code? I am hoping that we
> will all agree that it is a bug in the program source code and not
> either gcc or the kernel. :-)

I agree, yes.

> Shell script code is program source code. Infinite loops or infinite
> recursion are bugs in the shell script source code not the
> interpreter that is executing the code as written.

I also agree. But the context is very different. Shell is a very,
very high-level language.

> This feels to me to be related to The Halting Problem.

Knowing in advance whether there may be a problem certainly is, but we
don't need to do that; we'd just need to detect it at runtime to
provide a more useful error message.

> Other shells are also fun to check:
>
>   $ dash -c 'trap "kill 0" TERM; kill 0'
>   Segmentation fault
>
>   $ ash -c 'trap "kill 0" TERM; kill 0'
>   Segmentation fault
>
>   $ mksh -c 'trap "kill 0" TERM; kill 0'
>   Segmentation fault

Heh, interesting!

>   $ ksh93 -c 'trap "kill 0" TERM; kill 0'
>   $ echo $?
>   0

This is not the behavior I'd want.

>   $ posh -c 'trap "kill 0" TERM; kill 0'
>   Terminated
>   Terminated
>   Terminated
>   ...
>   Terminated
>   ^C

> This finds what look like bugs in posh and ksh93.

That's a fair assessment.

>> it's just that most users assume that a segfault represents a
>> problem with the program
>
> Yes. And here it indicates a bug too. It is indicating a bug in the
> shell program code which sets up the infinite recursion. Programs
> should avoid doing that. :-)

Indeed they should, but inevitably, such bugs do
Re: Segfault on recursive trap/kill
Hi Mike,

Mike Gerwitz wrote:
> ... but are you saying that terminating with a segfault is the
> intended behavior for runaway recursion?

Let me give the discussion this way and I think you will be
convinced. :-)

How is your example any different from a C program? Or Perl, Python,
Ruby, and so forth? All of those also allow infinite recursion and the
kernel will terminate them with a segfault. Because all of those also
allow infinite recursion. A program that executes an infinite
recursion would use infinite stack space. But real machines have a
finite amount of stack available and therefore die when the stack is
exceeded.

This following complete C program recurses infinitely. Or at least
until the stack is exhausted. At which time it triggers a segfault
because it tries to use memory beyond the page mapped stack.

  int main() { return main(); }

  $ gcc -o forever forever.c
  $ ./forever
  Segmentation fault
  $ echo $?
  139   # Signal 11 + 128

    The return value of a simple command is its exit status, or 128+n
    if the command is terminated by signal n.

Would you say that is a bug in the C language? A bug in gcc that
compiled it? A bug in the Unix/Linux kernel for memory management that
trapped the error? The parent shell that reported the exit code of the
program? Or in the program source code? I am hoping that we will all
agree that it is a bug in the program source code and not either gcc
or the kernel. :-)

Shell script code is program source code. Infinite loops or infinite
recursion are bugs in the shell script source code not the interpreter
that is executing the code as written.

This feels to me to be related to The Halting Problem.

> As long as there is no exploitable flaw here, then I suppose this
> isn't a problem;

It's not a privilege escalation. Nor a buffer overflow. Whether this
is otherwise exploitable depends upon the surrounding environment
usage.

> I haven't inspected the code to see if this is an access violation
> or if Bash is intentionally signaling SIGSEGV.

It is the kernel that manages memory, maps pages, detects page faults,
kills the program. The parent bash shell is only reporting the exit
code that resulted. The interpreting shell executed the shell script
source code as written.

Other shells are also fun to check:

  $ dash -c 'trap "kill 0" TERM; kill 0'
  Segmentation fault

  $ ash -c 'trap "kill 0" TERM; kill 0'
  Segmentation fault

  $ mksh -c 'trap "kill 0" TERM; kill 0'
  Segmentation fault

  $ ksh93 -c 'trap "kill 0" TERM; kill 0'
  $ echo $?
  0

  $ posh -c 'trap "kill 0" TERM; kill 0'
  Terminated
  Terminated
  Terminated
  ...
  Terminated
  ^C

Testing zsh is interesting because it seems to keep the interpreter
stack in data space and therefore can consume a large amount of memory
if it is available. And then can trap the result of being out of data
memory and then kills itself with a SIGTERM. Note that in my testing
I have Linux memory overcommit disabled.

This finds what look like bugs in posh and ksh93.

> it's just that most users assume that a segfault represents a
> problem with the program

Yes. And here it indicates a bug too. It is indicating a bug in the
shell program code which sets up the infinite recursion. Programs
should avoid doing that. :-)

  bash -c 'trap "kill 0" TERM; kill 0'

The trap handler was not set back to the default before the program
sent the signal to itself. The way to fix this is:

  $ bash -c 'trap "trap - TERM; kill 0" TERM; kill 0'
  Terminated
  $ echo $?
  143   # killed on SIGTERM as desired, good

    If ARG is absent (and a single SIGNAL_SPEC is supplied) or `-',
    each specified signal is reset to its original value.

The proper way for a program to terminate itself upon catching a
signal is to set the signal handler back to the default value and then
send the signal to itself so that it will be terminated as a result of
the signal and therefore the exit status will be set correctly.

For example the following is useful boilerplate:

  unset tmpfile
  cleanup() {
    test -n "$tmpfile" && rm -f "$tmpfile" && unset tmpfile
  }
  trap "cleanup" EXIT
  trap "cleanup; trap - HUP; kill -HUP $$" HUP
  trap "cleanup; trap - INT; kill -INT $$" INT
  trap "cleanup; trap - QUIT; kill -QUIT $$" QUIT
  trap "cleanup; trap - TERM; kill -TERM $$" TERM

  tmpfile=$(mktemp) || exit 1

If a program traps a signal then it should restore the default signal
handler for that signal and send the signal back to itself. Otherwise
the exit code will be incorrect. Otherwise parent programs won't know
that the child was killed with a signal.

For a highly recommended deep dive into this:

  https://www.cons.org/cracauer/sigint.html

Hope this helps!

Bob
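Bob's examples use `kill 0`, which signals the entire process group
and would also take down whatever shell you paste them into. The same
reset-and-re-raise mechanics can be tried safely by aiming at the
child shell's own PID instead (an adaptation of Bob's command, not his
literal example):

```shell
# The handler restores the default TERM disposition and then re-sends
# the signal to itself, so the kernel records death-by-SIGTERM and the
# parent observes 128 + 15 = 143.
bash -c 'trap "trap - TERM; kill -TERM $$" TERM; kill -TERM $$'
echo "exit status: $?"    # 143
```

Without the `trap - TERM` reset, the handler would swallow the signal
and the shell would exit 0, hiding the termination from its parent.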
Re: Segfault on recursive trap/kill
    Date:        Sat, 06 Oct 2018 19:53:25 -0400
    From:        Mike Gerwitz
    Message-ID:  <874ldy1vka@gnu.org>

  | I haven't inspected the code to see if this is an access
  | violation or if Bash is intentionally signaling SIGSEGV.

I expect that if you did look, you'd probably find that while
technically the former, it isn't a reference to some wild pointer, but
rather simply growing the stack until the OS says "no more" and
returns a SIGSEGV instead of allocating a new stack page.

kre
Re: Segfault on recursive trap/kill
On Sat, Oct 06, 2018 at 12:33:22 -0400, Chet Ramey wrote:
> On 10/5/18 9:33 PM, Mike Gerwitz wrote:
>> The following code will cause a segfault on bash-4.4.19(1) on
>> GNU Guix. I reproduced the issue on an old Ubuntu 14.04 LTS running
>> bash-4.3.11(1) as well as a Trisquel system running the same
>> version.
>>
>>   bash -c 'trap "kill 0" TERM; kill 0'
>>
>> Also segfaults when replacing `0' with `$$', and presumably in any
>> other situation that would trigger the trap recursively.
>
> Yes. Bash has allowed recursive trap handlers since early 2014
> (pre-4.3) due to requests for the feature and compatibility with
> other shells that allow it.
>
> If you manage to create infinite recursion, bash won't stop you.

Sure, I agree that the feature is useful, but are you saying that
terminating with a segfault is the intended behavior for runaway
recursion?

Upon further inspection, it does look like `foo() { foo; }; foo' also
causes a segfault, so the behavior is consistent with trap recursion.

As long as there is no exploitable flaw here, then I suppose this
isn't a problem; it's just that most users assume that a segfault
represents a problem with the program (unless they're dealing with
their own memory management). I haven't inspected the code to see if
this is an access violation or if Bash is intentionally signaling
SIGSEGV.

In any case, thanks for the reply.

-- 
Mike Gerwitz
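The function-recursion crash can be reproduced quickly and cheaply by
shrinking the stack limit in a subshell first — a sketch assuming
Linux semantics (the 1 MiB figure is arbitrary; coredumps are disabled
for tidiness):

```shell
# Cap the stack at 1 MiB so the recursion exhausts it almost
# immediately, then let bash recurse until the kernel sends SIGSEGV.
( ulimit -s 1024; ulimit -c 0; bash -c 'foo() { foo; }; foo' )
echo "exit status: $?"    # 139 = 128 + SIGSEGV(11)
```

The subshell keeps the lowered limits from leaking into the calling
shell, and the 139 status is the same 128+n convention discussed
elsewhere in this thread.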
Re: Segfault on recursive trap/kill
On 10/5/18 9:33 PM, Mike Gerwitz wrote:
> The following code will cause a segfault on bash-4.4.19(1) on
> GNU Guix. I reproduced the issue on an old Ubuntu 14.04 LTS running
> bash-4.3.11(1) as well as a Trisquel system running the same version.
>
>   bash -c 'trap "kill 0" TERM; kill 0'
>
> Also segfaults when replacing `0' with `$$', and presumably in any
> other situation that would trigger the trap recursively.

Yes. Bash has allowed recursive trap handlers since early 2014
(pre-4.3) due to requests for the feature and compatibility with other
shells that allow it.

If you manage to create infinite recursion, bash won't stop you.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Segfault on recursive trap/kill
The following code will cause a segfault on bash-4.4.19(1) on
GNU Guix. I reproduced the issue on an old Ubuntu 14.04 LTS running
bash-4.3.11(1) as well as a Trisquel system running the same version.

  bash -c 'trap "kill 0" TERM; kill 0'

Also segfaults when replacing `0' with `$$', and presumably in any
other situation that would trigger the trap recursively.

I don't have the debug symbols, but here's the backtrace:

  #0  0x76f7ad77 in kill () at ../sysdeps/unix/syscall-template.S:78
  #1  0x00446513 in kill_pid ()
  #2  0x004817a6 in kill_builtin ()
  #3  0x0043248d in execute_builtin.isra ()
  #4  0x00434924 in execute_simple_command ()
  #5  0x00435c2f in execute_command_internal ()
  #6  0x004357e6 in execute_command_internal ()
  #7  0x0047d88f in parse_and_execute ()
  #8  0x0041be48 in run_one_command ()
  #9  0x0041da19 in main ()

I don't have a strong opinion on what the expected behavior ought to
be in this situation; I certainly didn't intend to discover this
issue. :)

For context: I discovered this when my trap tried to kill a
subprocess, but the integer variable storing the pid of that process
was not properly set.

-- 
Mike Gerwitz
Free Software Hacker+Activist | GNU Maintainer & Volunteer
GPG: D6E9 B930 028A 6C38 F43B 2388 FEF6 3574 5E6F 6D05
https://mikegerwitz.com