Re: [PATCH] Fix segfault with self-modifying array PROMPT_COMMAND

2020-08-30 Thread Koichi Murase
2020-08-31 10:36 Koichi Murase :
> Description:
>
>   In the devel branch (e4d38c2d: commit bash-20200819 snapshot), a
>   segmentation fault occurs when a prompt command stored in the array
>   version of PROMPT_COMMAND unsets the corresponding element in
>   PROMPT_COMMAND.

I'm sorry, but I forgot to attach the patch files.

--
Koichi


0001-Fix-a-segmentation-fault-of-array-PROPMT_COMMAND.patch
Description: Binary data


0002-Fix-memleak-in-bash_execute_unix_command.patch
Description: Binary data


[PATCH] Fix segfault with self-modifying array PROMPT_COMMAND

2020-08-30 Thread Koichi Murase
Hi, I hit a segmentation fault with the array PROMPT_COMMAND.  Here is
the report and a patch. There is also another trivial patch.

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -march=native -O3
uname output: Linux chatoyancy 5.6.13-100.fc30.x86_64 #1 SMP Fri May
15 00:36:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.1
Patch Level: 0
Release Status: release

Description:

  In the devel branch (e4d38c2d: commit bash-20200819 snapshot), a
  segmentation fault occurs when a prompt command stored in the array
  version of PROMPT_COMMAND unsets the corresponding element in
  PROMPT_COMMAND.

  The same happens for Bash 5.1-alpha with PROMPT_COMMANDS.

  This is caused because the original array element data is free'd in
  the evaluation process of the command string.  The array scan and
  the copy of all the command strings in PROMPT_COMMAND need to be
  finished before executing the command strings.

Repeat-By:

  For devel branch (PROMPT_COMMAND),

$ cat test19-devel.bashrc
my-prompt-command() { unset 'PROMPT_COMMAND[0]'; }
PROMPT_COMMAND=(my-prompt-command)
$ bash-dev --rcfile test19-devel.bashrc
Segmentation fault (core dumped)

  For 5.1-alpha (PROMPT_COMMANDS),

$ cat test19-5.1.bashrc
my-prompt-command() { unset 'PROMPT_COMMANDS[0]'; }
PROMPT_COMMANDS=(my-prompt-command)
$ bash-5.1-alpha --rcfile test19-5.1.bashrc
Segmentation fault (core dumped)

  As a related behavior, when one of the prompt commands assigns new
  elements to the array PROMPT_COMMAND, the subsequent prompt commands
  will not be executed.  With the devel branch,

$ cat test19b.bashrc
function unregister-prompt_command {
  local -a new=() cmd
  for cmd in "${PROMPT_COMMAND[@]}"; do
[[ $cmd != "$1" ]] && new+=("$cmd")
  done
  PROMPT_COMMAND=("${new[@]}")
}
function my-prompt_command {
  echo "$FUNCNAME"

  # remove itself from PROMPT_COMMAND
  unregister-prompt_command "$FUNCNAME"
}
PROMPT_COMMAND+=('echo test1')
PROMPT_COMMAND+=(my-prompt_command)
PROMPT_COMMAND+=('echo test2')
$ bash-dev --rcfile test19b.bashrc
test1
my-prompt_command   # <-- test2 is expected but missing
bash-dev$
test1
test2   # <-- test2 is correctly called for the
bash-dev$   # next execution of PROMPT_COMMAND

Fix:

  There is no problem with the scalar PROMPT_COMMAND.  For the scalar
  PROMPT_COMMAND, it uses the function `execute_variable_command'
  (parse.y) which first copies the command string using
  `savestring(cmd)' and pass it to `parse_and_execute'.  This process
  can be illustrated in the following schematic code.

{
  Copy the value of PROMPT_COMMAND;
  Execute the copy; /* this can modify the original variable */
  Free the copy;
}

  In this way, it doesn't cause problems even when the variable
  PROMPT_COMMAND is rewritten in the processing of the command string
  because the executed string is a copy of the original
  PROMPT_COMMAND.

  In the case of the array PROMPT_COMMAND, it uses the utility
  `execute_variable_command' for each element, so the schematic code
  would be:

for (PROMPT_COMMAND array_elements)
  {
Copy the element;
Execute the copy; /* this can modify the array */
Free the copy;
  }

  This is robust for the case that the executed command modifies just
  the string of the corresponding element, but vulnerable for the case
  that the array structure is changed.  This should be modified in the
  following way:

{
  Copy all the elements of PROMPT_COMMAND.
  for (copied strings)
Execute the command strings.
  Free the copied strings.
}

  In the patch
  `0001-Fix-a-segmentation-fault-of-array-PROPMT_COMMAND.patch', I
  have added a function `execute_array_command' which is the array
  version of `execute_variable_command'.

  

  By the way, I suspect there is a memory leak in `bashline.c'.

bashline.c:4343:  r = parse_and_execute (savestring (cmd),
"bash_execute_unix_command", SEVAL_NOHIST|SEVAL_NOFREE);

  It appears to me that SEVAL_NOFREE shouldn't be specified here [see
  the patch `0002-Fix-memleak-in-bash_execute_unix_command.patch'], or
  otherwise, the string allocated by `savestring' will not be free'd.
  In fact, I can observe that the memory use of the process is
  monotonically increasing with the following operation:

$ bash --norc
$ arr=({1..10})
$ bind -x "\"\C-t\":: '${arr[*]}"
(hit C-t many times)

  Actually I have sent a related patch two years ago at

https://lists.gnu.org/archive/html/bug-bash/2018-05/msg00020.html

  In the patch at that time, I added `savestring' and removed
  `SEVAL_NOFREE' as well.

  -  r = parse_and_execute (cmd, "bash_execute_unix_command",

Re: Incorrect / Inconsistent behavior with nameref assignments in functions

2020-08-30 Thread Koichi Murase
2020-08-30 19:24 Binarus :
> Actually, this is what first happened to me and what led me to the
> problem described in my original post.
>
> [...]

Thank you for your explanation! Now I see your situation.

>> * Another way is to copy to the local array only when the name is
>>   different from `myArray':
>>
>>   [...]
>
> However, eval is evil. If I ever had to provide that function to
> other users (which currently is not the case), then I would have a
> problem if another user would call it like that:

Yes, I recognize the problem when the function isn't properly used.
But, the use of eval itself is not fatal.  When another user can call
the function as in your example,

  Dummy 'myArray1[@]}"); echo Gotcha!; #'

that means that the user can already run arbitrary commands.  The user
could have directly written

  echo 'Gotcha!'

The real problems occur when the user write like

  Dummy "$input_to_program"

with `input_to_program' provided by the third user who should not be
able to run arbitrary commands, and the input is not checked nor
sanitized.  In this case, the problem should be evaded by checking or
sanitizing the input.  This check can be made inside the function
Dummy, but it is also possible to make it at the time when the shell
receives the input.

> declare -a -i myArray1=('1' '2' '3')
> Dummy 'myArray1[@]}"); echo Gotcha!; #'
>
> Output:
>
> root@cerberus:~/scripts# ./test6
> Gotcha!
> declare -a myArray=([0]="1" [1]="2" [2]="3")

Unfortunately, your original function `Dummy' also has the same
vulnerability.  As Greg has written, there are many other places that
cause the command execution in the shell because that is the purpose
of the shell.  With your original method,

  $ cat testR2d.sh
  function Dummy {
local -n namerefArray="$1"
local -a -i myArray=("${namerefArray[@]}")
  }
  myArray1=("$@")
  Dummy 'myArray1'
  $ bash testR2d.sh 'a[$(echo Gotcha1 >/dev/tty)]'
  Gotcha1

This is caused by the arithmetic evaluation caused by `-i' flag.  In
arithmetic evaluations, the array subscripts are subject to the extra
expansions so that the string $(echo Gotcha1 >/dev/tty) is expanded as
a command substitution.

Actually, the nameref also has the same behavior, so the use of
`nameref' is not much safer than the use of `eval'.

  $ cat testR2e.sh
  function Dummy2 {
local -n namerefScalar=$1
local var=$namerefScalar
  }
  Dummy2 "$1"
  $ bash testR2e.sh 'a[$(echo Gotcha2 >/dev/tty)]'
  Gotcha2

Yes, I guess it would be a valid strategy to disallow any use of
`eval' because humans will make mistakes no matter how careful we are.
But, there are still different traps, so anyway we need to carefully
check or sanitize inputs even when we don't use `eval'.

> I guess it would be very complicated, if possible at all, to protect
> the code inside eval against every sort of such attacks.

I think the standard way is to check the input before passing it to
`eval' and is not complicated.  You can just check if the array name
has an expected form:

  function is-valid-array-name {
local reg='^[_[:alpha:]][_[:alnum:]]*$'
[[ $1 =~ $reg ]]
  }

  # Check it inside Dummy
  function Dummy {
is-valid-array-name "$1" || return 1
[[ $1 == myArray ]] || eval "local -a myArray=(\"\${$1[@]}\")"
declare -p myArray
  }

  # Or check it when it receives the array name (I prefer this)
  is-valid-array-name "$1" || exit 1
  input_data=$1
  Dummy "$input_data"

>> * If you want to use namerefs to eliminate the use of `eval', maybe
>>   you could do like the following [...]
>
> However (hoping that I don't get flamed for a dumb question), I
> don't understand why we need inputArray at all in that code.

Sorry, I should have explained it in detail.  The step of `inputArray'
is only needed when you want to modify `myArray' locally in the
function `Dummy' keeping the original array unmodified.  Without
`inputArray',

  $ cat testR2f.sh
  function Dummy {
[[ $1 == refArray ]] || local -n refArray=$1
[[ $1 == myArray ]] || local -ia myArray=("${refArray[@]}")
myArray[0]=${myArray[0]%/}
  }
  myArray=(my/dir/)
  declare -p Dummy
  Dummy myArray
  declare -p Dummy
  $ bash testR2f.sh
  declare -a myArray=([0]="my/dir/")
  declare -a myArray=([0]="my/dir")

This is because, when the outer array has the same name `myArray', the
function Dummy sees the outer array directly instead of the local
copy.

> Wouldn't the following function be sufficient?
>
> function Dummy {
>   [[ $1 == refArray ]] || local -n refArray=$1
>   local -ia myArray=("${refArray[@]}")
>   declare -p myArray
> }

No, that function solves the problem of the collision with `refArray'
(the circular reference) but not the problem of the collision with
`myArray' (the problem in your original post).

> Unfortunately, these solutions (while solving the circular reference
> problem) don't solve my original problem.

Have you tried?  In the examples in my previous reply, I intended to
solve both problems with older 

Re: Incorrect / Inconsistent behavior with nameref assignments in functions

2020-08-30 Thread Greg Wooledge
On Sun, Aug 30, 2020 at 12:24:03PM +0200, Binarus wrote:
> On 30.08.2020 02:59, Koichi Murase wrote:
> > * Another way is to copy to the local array only when the name is
> >   different from `myArray':
> > 
> >   function Dummy {
> > [[ $1 == myArray ]] ||
> >   eval "local -a myArray=(\"\${$1[@]}\")"
> > declare -p myArray
> >   }
> 
> Thank you very much for that idea!
> 
> However, eval is evil. If I ever had to provide that function to other
> users (which currently is not the case), then I would have a problem if
> another user would call it like that:
> 
> declare -a -i myArray1=('1' '2' '3')
> Dummy 'myArray1[@]}"); echo Gotcha!; #'
> 
> Output:
> 
> root@cerberus:~/scripts# ./test6
> Gotcha!
> declare -a myArray=([0]="1" [1]="2" [2]="3")

The evil thing here is code injection.  Obviously eval is one way to
perform code injection, but it's not the *only* way.  Eval itself isn't
evil; if anything, it's all of the other forms of code injection,
which people don't suspect, that are truly insidious.

https://mywiki.wooledge.org/CodeInjection
https://mywiki.wooledge.org/BashWeaknesses

You're trying to do something that you feel should be possible -- passing
an array to a function by reference.  Every other language can do this,
right?  So bash should be able to do this... right?  Nope.

Passing variables by reference (especially arrays) is one of the
major missing features of bash.  Everyone wants it.  Many, many people
have attempted it.  The sheer insanity of some of the attempts is
astounding.

https://fvue.nl/wiki/Bash:_Passing_variables_by_reference

That's a slightly older page, but he found an exploit in "unset" which
does bizarre things when called at different function scope levels, and
managed to use it to manipulate the existence of variables at various
function scopes.

If you absolutely *need* to pass a variable by reference, don't use bash.
That's the best advice I can give you.



Re: Incorrect / Inconsistent behavior with nameref assignments in functions

2020-08-30 Thread Binarus


On 30.08.2020 02:59, Koichi Murase wrote:
> 2020-08-29 14:46 Binarus :
>> I am wondering when debian will include bash 5.1. It looks like
>> debian testing and debian unstable are on bash 5.0, so it will
>> probably take several years.
> 
> Actually the problem of the function `Dummy' will not be solved even
> in bash 5.1.  There is another but similar problem with your function.
> A user might specify `namerefArray' as the name of an outer array,
> which results in a circular-reference error.

Actually, this is what first happened to me and what led me to the
problem described in my original post.

I was trying to solve the circular reference problem by establishing a
mixture of a simple variable naming convention and copying method. The
idea was that all nameref variable names would start with the string
"nameref" (designating the nameref "type"), that all other variable
names would start with other strings, and that I would never pass a
variable which is already a nameref to other functions as a parameter.
Then, as the first action in each function which deals with namerefs, I
would copy the contents of each nameref (whose name starts with
"nameref") to a local variable (whose name doesn't start with
"nameref"), and (if needed) would write back the contents from the local
variable to the nameref variable at the end of the function.

This would make sure that circular references couldn't occur. My example
script 1 is the implementation of this idea, which worked in that it
prevented the circular references ...

In my case, this idea is not as dumb as it first sounds due to the
seemingly unnecessary copying, because most of my functions have to
operate on copies of the data they get passed (mostly arrays) anyway.

Later, while the problem of the circular reference error had indeed been
solved, I noticed that the script silently produced wrong results.

This is the key point and the difference between the two sorts of error:

The bug I have reported (I still believe that "bug" is the correct term)
makes scripts silently produce wrong results. This is an absolute no-go.
If I hadn't tested it thoroughly with edge cases, I eventually wouldn't
have noticed it, which could have led to serious damage (the scripts in
question will be part of a backup system).

In contrast, a circular reference makes bash throw a visible,
appropriate message. I am fine with bash throwing errors or warnings and
possibly aborting script execution. When I see this, I can fix the
problem. Therefore, I don't have any problem with bash 5.1 still not
allowing that sort of circular reference (as long as it keeps throwing
messages when it encounters that situation).

But again, I consider it a massive problem when data just silently is
not copied as expected and even the contents of the variable which is
referenced by the nameref get destroyed (at least as long as we are in
the respective function itself) by using it as the RHS in an assignment.

>   $ cat testR2c.sh
>   function Dummy {
> local -n namerefArray="$1"
> local -a -i myArray=("${namerefArray[@]}")
> local -p
>   }
>   declare -a -i namerefArray=('1' '2' '3')
>   Dummy namerefArray
> 
>   $ bash-5.1-alpha testR2c.sh
>   testR2c.sh: line 4: local: warning: namerefArray: circular name reference
>   testR2c.sh: line 4: warning: namerefArray: circular name reference
>   testR2c.sh: line 5: warning: namerefArray: circular name reference
>   testR2c.sh: line 5: warning: namerefArray: circular name reference
>   declare -a myArray=([0]="1" [1]="2" [2]="3")
>   declare -n namerefArray="namerefArray"
> 
> If you want to work around the problem, there are several ways.
> 
> * One of the simplest ways is to use different variable names as
>   already suggested by other people.  However, when the variable name
>   is not under control for some reason (that, e.g., the functon is
>   provided to users who may use it in an arbitrary way, or it imports
>   different shell-script frameworks), the probability of the name
>   collision is not 0%.

This solution would be good enough for me, because bash warns me about
the problem if it occurs, and I then can change the script accordingly;
I (currently) don't need to provide the functions to other users.

> * Another way is to copy to the local array only when the name is
>   different from `myArray':
> 
>   function Dummy {
> [[ $1 == myArray ]] ||
>   eval "local -a myArray=(\"\${$1[@]}\")"
> declare -p myArray
>   }

Thank you very much for that idea!

However, eval is evil. If I ever had to provide that function to other
users (which currently is not the case), then I would have a problem if
another user would call it like that:

declare -a -i myArray1=('1' '2' '3')
Dummy 'myArray1[@]}"); echo Gotcha!; #'

Output:

root@cerberus:~/scripts# ./test6
Gotcha!
declare -a myArray=([0]="1" [1]="2" [2]="3")

I guess it would be very complicated, if possible at all, to protect the
code inside eval against every sort of such 

Re: Bash parameter expansion (remove largest trailing match, remove largest leading match, pattern replacement) does not work

2020-08-30 Thread Robert Elz
Date:Sat, 29 Aug 2020 22:08:14 -0400
From:Bruce Lilly 
Message-ID:  


  | dash also doesn't have adequate pattern matching for the example
  | task (building a path while ensuring no empty components); it
  | has no way to specify one-or-more (or zero-or-more) occurrences
  | of a pattern such as a slash.

It does, you just don't get to do it in one absurdly complex variable
expansion.   There is very little that can't be done in sh that is
reasonable to attempt in sh code (floating point matrix inversions are
probably not a candidate for example) - including even the original
7th edition Bourne sh (with no functions, nothing builtin, bugs all over
the place, ...)   Sometimes it takes a bunch of code, and sometimes it
might not be extremely fast, but it is possible.

kre




Re: Bash parameter expansion (remove largest trailing match, remove largest leading match, pattern replacement) does not work

2020-08-30 Thread Ilkka Virta
On Sat, Aug 29, 2020 at 11:13 PM Bruce Lilly  wrote:

> It's a bit more complicated than that; if, for example, some excerpt ended
> up in regression tests, there would be a question about whether or not
> there was a copyright violation.  As I understand the GPL (IANAL), it
> requires all parts of a "work" to be GPL'd, and that wouldn't be possible
> for any parts of the script that ended up in bash regression tests.
>

That's hilarious. People post proof-of-concept scripts and code snippets as
part of bug
reports and such every day. If you'd just reduced the problem to a simple
demonstration
(below), you could have explicitly licensed it under the GPL if you were
afraid someone might
want to include it to a GPL'd software. In any case, for a one-liner like
this, it might not even
be copyrightable (at least not everywhere) as pretty much lacks any
creativity. I'd also assume
that test scripts often aren't even compiled with the main program, just
aggregated to the
code distribution. Anyway, it wouldn't be you doing the copyright violation
if someone used
your snippet without license.

Bash and ksh indeed differ in this:

 $ bash -c 'str=foo/; sep="\057"; printf %s\\n ${str%%$sep}'
 foo/

 $ ksh -c 'str=foo/; sep="\057"; printf %s\\n ${str%%$sep}'
 foo

And there's nothing in the Bash manuals that says \057 should be taken as
an octal
escape in a pattern match. The workarounds to that are either sep=$'\057'
which is
documented to accept escapes like this, or sep=/ which just works in the
obvious manner.
I do wonder why you'd even bother with trying the octal escape here instead
of just writing
the slash as a slash. Something like \001 would be different, of course.
The fact that $'\057'
does what you seem to want is exactly the part where you might have used a
form of quoting
which would have worked, but there was no way for the reader to check that
because you
hid the code.

$'' is described here:
https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
(search for 'octal')

Of course, you also appear to have missed extglob, which I guess is
understandable if you're
coming from ksh. But even so, reducing the problem to smaller, easier to
debug pieces would
have shown the difference there, too, separately from the differences in
handling of octal escapes.
And perhaps led you to read the rest of what the manual says on Pattern
Matching, from the exact
page you linked to. ("If the extglob shell option is enabled using the
shopt builtin, several extended
pattern matching operators are recognized...")