Re: [PATCH] Fix segfault with self-modifying array PROMPT_COMMAND
2020-08-31 10:36 Koichi Murase : > Description: > > In the devel branch (e4d38c2d: commit bash-20200819 snapshot), a > segmentation fault occurs when a prompt command stored in the array > version of PROMPT_COMMAND unsets the corresponding element in > PROMPT_COMMAND. I'm sorry, but I forgot to attach the patch files. -- Koichi 0001-Fix-a-segmentation-fault-of-array-PROPMT_COMMAND.patch Description: Binary data 0002-Fix-memleak-in-bash_execute_unix_command.patch Description: Binary data
[PATCH] Fix segfault with self-modifying array PROMPT_COMMAND
Hi, I hit a segmentation fault with the array PROMPT_COMMAND. Here is the report and a patch. There is also another trivial patch. Configuration Information [Automatically generated, do not change]: Machine: x86_64 OS: linux-gnu Compiler: gcc Compilation CFLAGS: -march=native -O3 uname output: Linux chatoyancy 5.6.13-100.fc30.x86_64 #1 SMP Fri May 15 00:36:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Machine Type: x86_64-pc-linux-gnu Bash Version: 5.1 Patch Level: 0 Release Status: release Description: In the devel branch (e4d38c2d: commit bash-20200819 snapshot), a segmentation fault occurs when a prompt command stored in the array version of PROMPT_COMMAND unsets the corresponding element in PROMPT_COMMAND. The same happens for Bash 5.1-alpha with PROMPT_COMMANDS. This is caused because the original array element data is free'd in the evaluation process of the command string. The array scan and the copy of all the command strings in PROMPT_COMMAND need to be finished before executing the command strings. Repeat-By: For devel branch (PROMPT_COMMAND), $ cat test19-devel.bashrc my-prompt-command() { unset 'PROMPT_COMMAND[0]'; } PROMPT_COMMAND=(my-prompt-command) $ bash-dev --rcfile test19-devel.bashrc Segmentation fault (core dumped) For 5.1-alpha (PROMPT_COMMANDS), $ cat test19-5.1.bashrc my-prompt-command() { unset 'PROMPT_COMMANDS[0]'; } PROMPT_COMMANDS=(my-prompt-command) $ bash-5.1-alpha --rcfile test19-5.1.bashrc Segmentation fault (core dumped) As a related behavior, when one of the prompt commands assigns new elements to the array PROMPT_COMMAND, the subsequent prompt commands will not be executed. With the devel branch, $ cat test19b.bashrc function unregister-prompt_command { local -a new=() cmd for cmd in "${PROMPT_COMMAND[@]}"; do [[ $cmd != "$1" ]] && new+=("$cmd") done PROMPT_COMMAND=("${new[@]}") } function my-prompt_command { echo "$FUNCNAME" # remove itself from PROMPT_COMMAND unregister-prompt_command "$FUNCNAME" } PROMPT_COMMAND+=('echo test1') PROMPT_COMMAND+=(my-prompt_command) PROMPT_COMMAND+=('echo test2') $ bash-dev --rcfile test19b.bashrc test1 my-prompt_command # <-- test2 is expected but missing bash-dev$ test1 test2 # <-- test2 is correctly called for the bash-dev$ # next execution of PROMPT_COMMAND Fix: There is no problem with the scalar PROMPT_COMMAND. For the scalar PROMPT_COMMAND, it uses the function `execute_variable_command' (parse.y) which first copies the command string using `savestring(cmd)' and pass it to `parse_and_execute'. This process can be illustrated in the following schematic code. { Copy the value of PROMPT_COMMAND; Execute the copy; /* this can modify the original variable */ Free the copy; } In this way, it doesn't cause problems even when the variable PROMPT_COMMAND is rewritten in the processing of the command string because the executed string is a copy of the original PROMPT_COMMAND. In the case of the array PROMPT_COMMAND, it uses the utility `execute_variable_command' for each element, so the schematic code would be: for (PROMPT_COMMAND array_elements) { Copy the element; Execute the copy; /* this can modify the array */ Free the copy; } This is robust for the case that the executed command modifies just the string of the corresponding element, but vulnerable for the case that the array structure is changed. This should be modified in the following way: { Copy all the elements of PROMPT_COMMAND. for (copied strings) Execute the command strings. Free the copied strings. } In the patch `0001-Fix-a-segmentation-fault-of-array-PROPMT_COMMAND.patch', I have added a function `execute_array_command' which is the array version of `execute_variable_command'. By the way, I suspect there is a memory leak in `bashline.c'. bashline.c:4343: r = parse_and_execute (savestring (cmd), "bash_execute_unix_command", SEVAL_NOHIST|SEVAL_NOFREE); It appears to me that SEVAL_NOFREE shouldn't be specified here [see the patch `0002-Fix-memleak-in-bash_execute_unix_command.patch'], or otherwise, the string allocated by `savestring' will not be free'd. In fact, I can observe that the memory use of the process is monotonically increasing with the following operation: $ bash --norc $ arr=({1..10}) $ bind -x "\"\C-t\":: '${arr[*]}" (hit C-t many times) Actually I have sent a related patch two years ago at https://lists.gnu.org/archive/html/bug-bash/2018-05/msg00020.html In the patch at that time, I added `savestring' and removed `SEVAL_NOFREE' as well. - r = parse_and_execute (cmd, "bash_execute_unix_command",
Re: Incorrect / Inconsistent behavior with nameref assignments in functions
2020-08-30 19:24 Binarus : > Actually, this is what first happened to me and what led me to the > problem described in my original post. > > [...] Thank you for your explanation! Now I see your situation. >> * Another way is to copy to the local array only when the name is >> different from `myArray': >> >> [...] > > However, eval is evil. If I ever had to provide that function to > other users (which currently is not the case), then I would have a > problem if another user would call it like that: Yes, I recognize the problem when the function isn't properly used. But, the use of eval itself is not fatal. When another user can call the function as in your example, Dummy 'myArray1[@]}"); echo Gotcha!; #' that means that the user can already run arbitrary commands. The user could have directly written echo 'Gotcha!' The real problems occur when the user write like Dummy "$input_to_program" with `input_to_program' provided by the third user who should not be able to run arbitrary commands, and the input is not checked nor sanitized. In this case, the problem should be evaded by checking or sanitizing the input. This check can be made inside the function Dummy, but it is also possible to make it at the time when the shell receives the input. > declare -a -i myArray1=('1' '2' '3') > Dummy 'myArray1[@]}"); echo Gotcha!; #' > > Output: > > root@cerberus:~/scripts# ./test6 > Gotcha! > declare -a myArray=([0]="1" [1]="2" [2]="3") Unfortunately, your original function `Dummy' also has the same vulnerability. As Greg has written, there are many other places that cause the command execution in the shell because that is the purpose of the shell. With your original method, $ cat testR2d.sh function Dummy { local -n namerefArray="$1" local -a -i myArray=("${namerefArray[@]}") } myArray1=("$@") Dummy 'myArray1' $ bash testR2d.sh 'a[$(echo Gotcha1 >/dev/tty)]' Gotcha1 This is caused by the arithmetic evaluation caused by `-i' flag. In arithmetic evaluations, the array subscripts are subject to the extra expansions so that the string $(echo Gotcha1 >/dev/tty) is expanded as a command substitution. Actually, the nameref also has the same behavior, so the use of `nameref' is not much safer than the use of `eval'. $ cat testR2e.sh function Dummy2 { local -n namerefScalar=$1 local var=$namerefScalar } Dummy2 "$1" $ bash testR2e.sh 'a[$(echo Gotcha2 >/dev/tty)]' Gotcha2 Yes, I guess it would be a valid strategy to disallow any use of `eval' because humans will make mistakes no matter how careful we are. But, there are still different traps, so anyway we need to carefully check or sanitize inputs even when we don't use `eval'. > I guess it would be very complicated, if possible at all, to protect > the code inside eval against every sort of such attacks. I think the standard way is to check the input before passing it to `eval' and is not complicated. You can just check if the array name has an expected form: function is-valid-array-name { local reg='^[_[:alpha:]][_[:alnum:]]*$' [[ $1 =~ $reg ]] } # Check it inside Dummy function Dummy { is-valid-array-name "$1" || return 1 [[ $1 == myArray ]] || eval "local -a myArray=(\"\${$1[@]}\")" declare -p myArray } # Or check it when it receives the array name (I prefer this) is-valid-array-name "$1" || exit 1 input_data=$1 Dummy "$input_data" >> * If you want to use namerefs to eliminate the use of `eval', maybe >> you could do like the following [...] > > However (hoping that I don't get flamed for a dumb question), I > don't understand why we need inputArray at all in that code. Sorry, I should have explained it in detail. The step of `inputArray' is only needed when you want to modify `myArray' locally in the function `Dummy' keeping the original array unmodified. Without `inputArray', $ cat testR2f.sh function Dummy { [[ $1 == refArray ]] || local -n refArray=$1 [[ $1 == myArray ]] || local -ia myArray=("${refArray[@]}") myArray[0]=${myArray[0]%/} } myArray=(my/dir/) declare -p Dummy Dummy myArray declare -p Dummy $ bash testR2f.sh declare -a myArray=([0]="my/dir/") declare -a myArray=([0]="my/dir") This is because, when the outer array has the same name `myArray', the function Dummy sees the outer array directly instead of the local copy. > Wouldn't the following function be sufficient? > > function Dummy { > [[ $1 == refArray ]] || local -n refArray=$1 > local -ia myArray=("${refArray[@]}") > declare -p myArray > } No, that function solves the problem of the collision with `refArray' (the circular reference) but not the problem of the collision with `myArray' (the problem in your original post). > Unfortunately, these solutions (while solving the circular reference > problem) don't solve my original problem. Have you tried? In the examples in my previous reply, I intended to solve both problems with older
Re: Incorrect / Inconsistent behavior with nameref assignments in functions
On Sun, Aug 30, 2020 at 12:24:03PM +0200, Binarus wrote: > On 30.08.2020 02:59, Koichi Murase wrote: > > * Another way is to copy to the local array only when the name is > > different from `myArray': > > > > function Dummy { > > [[ $1 == myArray ]] || > > eval "local -a myArray=(\"\${$1[@]}\")" > > declare -p myArray > > } > > Thank you very much for that idea! > > However, eval is evil. If I ever had to provide that function to other > users (which currently is not the case), then I would have a problem if > another user would call it like that: > > declare -a -i myArray1=('1' '2' '3') > Dummy 'myArray1[@]}"); echo Gotcha!; #' > > Output: > > root@cerberus:~/scripts# ./test6 > Gotcha! > declare -a myArray=([0]="1" [1]="2" [2]="3") The evil thing here is code injection. Obviously eval is one way to perform code injection, but it's not the *only* way. Eval itself isn't evil; if anything, it's all of the other forms of code injection, which people don't suspect, that are truly insidious. https://mywiki.wooledge.org/CodeInjection https://mywiki.wooledge.org/BashWeaknesses You're trying to do something that you feel should be possible -- passing an array to a function by reference. Every other language can do this, right? So bash should be able to do this... right? Nope. Passing variables by reference (especially arrays) is one of the major missing features of bash. Everyone wants it. Many, many people have attempted it. The sheer insanity of some of the attempts is astounding. https://fvue.nl/wiki/Bash:_Passing_variables_by_reference That's a slightly older page, but he found an exploit in "unset" which does bizarre things when called at different function scope levels, and managed to use it to manipulate the existence of variables at various function scopes. If you absolutely *need* to pass a variable by reference, don't use bash. That's the best advice I can give you.
Re: Incorrect / Inconsistent behavior with nameref assignments in functions
On 30.08.2020 02:59, Koichi Murase wrote: > 2020-08-29 14:46 Binarus : >> I am wondering when debian will include bash 5.1. It looks like >> debian testing and debian unstable are on bash 5.0, so it will >> probably take several years. > > Actually the problem of the function `Dummy' will not be solved even > in bash 5.1. There is another but similar problem with your function. > A user might specify `namerefArray' as the name of an outer array, > which results in a circular-reference error. Actually, this is what first happened to me and what led me to the problem described in my original post. I was trying to solve the circular reference problem by establishing a mixture of a simple variable naming convention and copying method. The idea was that all nameref variable names would start with the string "nameref" (designating the nameref "type"), that all other variable names would start with other strings, and that I would never pass a variable which is already a nameref to other functions as a parameter. Then, as the first action in each function which deals with namerefs, I would copy the contents of each nameref (whose name starts with "nameref") to a local variable (whose name doesn't start with "nameref"), and (if needed) would write back the contents from the local variable to the nameref variable at the end of the function. This would make sure that circular references couldn't occur. My example script 1 is the implementation of this idea, which worked in that it prevented the circular references ... In my case, this idea is not as dumb as it first sounds due to the seemingly unnecessary copying, because most of my functions have to operate on copies of the data they get passed (mostly arrays) anyway. Later, while the problem of the circular reference error had indeed been solved, I noticed that the script silently produced wrong results. This is the key point and the difference between the two sorts of error: The bug I have reported (I still believe that "bug" is the correct term) makes scripts silently produce wrong results. This is an absolute no-go. If I hadn't tested it thoroughly with edge cases, I eventually wouldn't have noticed it, which could have led to serious damage (the scripts in question will be part of a backup system). In contrast, a circular reference makes bash throw a visible, appropriate message. I am fine with bash throwing errors or warnings and possibly aborting script execution. When I see this, I can fix the problem. Therefore, I don't have any problem with bash 5.1 still not allowing that sort of circular reference (as long as it keeps throwing messages when it encounters that situation). But again, I consider it a massive problem when data just silently is not copied as expected and even the contents of the variable which is referenced by the nameref get destroyed (at least as long as we are in the respective function itself) by using it as the RHS in an assignment. > $ cat testR2c.sh > function Dummy { > local -n namerefArray="$1" > local -a -i myArray=("${namerefArray[@]}") > local -p > } > declare -a -i namerefArray=('1' '2' '3') > Dummy namerefArray > > $ bash-5.1-alpha testR2c.sh > testR2c.sh: line 4: local: warning: namerefArray: circular name reference > testR2c.sh: line 4: warning: namerefArray: circular name reference > testR2c.sh: line 5: warning: namerefArray: circular name reference > testR2c.sh: line 5: warning: namerefArray: circular name reference > declare -a myArray=([0]="1" [1]="2" [2]="3") > declare -n namerefArray="namerefArray" > > If you want to work around the problem, there are several ways. > > * One of the simplest ways is to use different variable names as > already suggested by other people. However, when the variable name > is not under control for some reason (that, e.g., the functon is > provided to users who may use it in an arbitrary way, or it imports > different shell-script frameworks), the probability of the name > collision is not 0%. This solution would be good enough for me, because bash warns me about the problem if it occurs, and I then can change the script accordingly; I (currently) don't need to provide the functions to other users. > * Another way is to copy to the local array only when the name is > different from `myArray': > > function Dummy { > [[ $1 == myArray ]] || > eval "local -a myArray=(\"\${$1[@]}\")" > declare -p myArray > } Thank you very much for that idea! However, eval is evil. If I ever had to provide that function to other users (which currently is not the case), then I would have a problem if another user would call it like that: declare -a -i myArray1=('1' '2' '3') Dummy 'myArray1[@]}"); echo Gotcha!; #' Output: root@cerberus:~/scripts# ./test6 Gotcha! declare -a myArray=([0]="1" [1]="2" [2]="3") I guess it would be very complicated, if possible at all, to protect the code inside eval against every sort of such
Re: Bash parameter expansion (remove largest trailing match, remove largest leading match, pattern replacement) does not work
Date:Sat, 29 Aug 2020 22:08:14 -0400 From:Bruce Lilly Message-ID: | dash also doesn't have adequate pattern matching for the example | task (building a path while ensuring no empty components); it | has no way to specify one-or-more (or zero-or-more) occurrences | of a pattern such as a slash. It does, you just don't get to do it in one absurdly complex variable expansion. There is very little that can't be done in sh that is reasonable to attempt in sh code (floating point matrix inversions are probably not a candidate for example) - including even the original 7th edition Bourne sh (with no functions, nothing builtin, bugs all over the place, ...) Sometimes it takes a bunch of code, and sometimes it might not be extremely fast, but it is possible. kre
Re: Bash parameter expansion (remove largest trailing match, remove largest leading match, pattern replacement) does not work
On Sat, Aug 29, 2020 at 11:13 PM Bruce Lilly wrote: > It's a bit more complicated than that; if, for example, some excerpt ended > up in regression tests, there would be a question about whether or not > there was a copyright violation. As I understand the GPL (IANAL), it > requires all parts of a "work" to be GPL'd, and that wouldn't be possible > for any parts of the script that ended up in bash regression tests. > That's hilarious. People post proof-of-concept scripts and code snippets as part of bug reports and such every day. If you'd just reduced the problem to a simple demonstration (below), you could have explicitly licensed it under the GPL if you were afraid someone might want to include it to a GPL'd software. In any case, for a one-liner like this, it might not even be copyrightable (at least not everywhere) as pretty much lacks any creativity. I'd also assume that test scripts often aren't even compiled with the main program, just aggregated to the code distribution. Anyway, it wouldn't be you doing the copyright violation if someone used your snippet without license. Bash and ksh indeed differ in this: $ bash -c 'str=foo/; sep="\057"; printf %s\\n ${str%%$sep}' foo/ $ ksh -c 'str=foo/; sep="\057"; printf %s\\n ${str%%$sep}' foo And there's nothing in the Bash manuals that says \057 should be taken as an octal escape in a pattern match. The workarounds to that are either sep=$'\057' which is documented to accept escapes like this, or sep=/ which just works in the obvious manner. I do wonder why you'd even bother with trying the octal escape here instead of just writing the slash as a slash. Something like \001 would be different, of course. The fact that $'\057' does what you seem to want is exactly the part where you might have used a form of quoting which would have worked, but there was no way for the reader to check that because you hid the code. $'' is described here: https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html (search for 'octal') Of course, you also appear to have missed extglob, which I guess is understandable if you're coming from ksh. But even so, reducing the problem to smaller, easier to debug pieces would have shown the difference there, too, separately from the differences in handling of octal escapes. And perhaps led you to read the rest of what the manual says on Pattern Matching, from the exact page you linked to. ("If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized...")