Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-23 Thread Greg Wooledge
On Wed, May 22, 2019 at 10:23:04PM +, Charles-Henri Gros wrote:
> But unfortunately, grep was just illustrative, I'm using another tool
> that takes a regex but has no "-F" option

1. The questioner's first description of the problem/question will be
   misleading.

9. All examples given by the questioner will be broken, misleading,
   wrong, and/or not representative of the actual question.

25. The newbie won't accept any answer that uses practical or standard
    tools.

26. The newbie will not TELL you about this restriction until you have
    wasted half an hour.



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Robert Elz
Date:Wed, 22 May 2019 22:23:04 +
From:Charles-Henri Gros 
Message-ID:  


  | But unfortunately, grep was just illustrative, I'm using another tool
  | that takes a regex but has no "-F" option (though admittedly with some
  | effort I could add one, I wrote the tool in question).

You can still do the sed to hide any $ in the command line the way
you were doing.   The important thing is to not expose the results
to pathname expansion, and if you're going to use the shell to
break apart the file names (field splitting) make sure IFS is set
correctly - you might find
IFS=$'\n'
works better for your usage than the default (so filenames with
spaces don't give problems).

You might also want to use Chet's suggestion, and disable pathname
expansion with "set -f".
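
A minimal sketch of those two suggestions combined, assuming a scratch
directory with made-up file names (one containing a space, one containing
a '*'). IFS is set to a newline with a literal newline so the sketch does
not depend on $'\n' being available; in bash IFS=$'\n' does the same.

```shell
# Hypothetical scratch directory with awkward file names.
workdir=$(mktemp -d)
touch "$workdir/two words.txt" "$workdir/star*.txt"

newline='
'
old_ifs=$IFS
IFS=$newline   # field splitting on newlines only
set -f         # no pathname expansion on the split results

count=0
for path in $(find "$workdir" -type f -print); do
  count=$((count + 1))
done

# Restore the defaults so later code is not surprised.
set +f
IFS=$old_ifs
rm -rf "$workdir"

echo "$count"
```

With the default IFS the name containing a space would be split into two
words. File names containing newlines are still not handled.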

But this kind of thing is what happens when you don't provide all of
the info about the problem you're having - people tend to provide
answers to the problem you say that you have, rather than the
actual issue.

It is all good (and helpful) to find a simple test case for a problem
you're seeing, and provide that as well - but always give the actual
problem details.

Here without knowing what kind of input your "tool in question" takes
it is impossible for anyone to work out what a good solution would be.

  | Yes I'm not expecting any special characters except "$".

It is best not to make too many assumptions - remember that even '.'
is special in RE's and '.' is very common in filenames.
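
As a rough illustration, assuming the tool takes POSIX basic regular
expressions (the thread never says which flavour it uses), escaping every
BRE metacharacter rather than only '$' could look like this. The function
name bre_escape and the sample strings are invented for the example.

```shell
# Put a backslash in front of each character that is special in a
# POSIX basic regular expression: [ \ . * ^ $
bre_escape() {
  printf '%s\n' "$1" | sed 's/[[\.*^$]/\\&/g'
}

name='a$.class'                  # hypothetical file name
pattern=$(bre_escape "$name")    # plain assignment: no splitting, no globbing

# Only the first line contains the literal name; in the second, 'X'
# would have matched an unescaped '.'.
hits=$(printf '%s\n' 'ref to a$.class' 'ref to a$Xclass' | grep -c -e "$pattern")
echo "$pattern $hits"
```

An extended-RE or PCRE tool would need a longer list of characters.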

kre




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Charles-Henri Gros
On 5/22/19 3:13 PM, Robert Elz wrote:
> Date:Wed, 22 May 2019 17:34:22 +
> From:Charles-Henri Gros 
> Message-ID:  
> 
>
>   | The problem I'm trying to solve is to iterate over regex-escaped file
>   | names obtained from a "find" command. I don't know how to make this
>   | work. It works with other versions of bash and with other shells.
>
> You were relying upon a common bug, which has been fixed in bash, but
> your technique is all wrong, you don't need any kind of loop at all, not
> a for loop, and not the while read loop that Greg suggested.
>
> find -print produces a list of names, one per line.   Those are simple
> strings, which fgrep (or grep -F as Andreas suggested) can handle finding.
>
> What I'd do is
>
>   fgrep "$(find  -print)" wherever

Interesting, I didn't realize you could pass newline-separated patterns
to "grep" on the command line. Good to know for the future.

But unfortunately, grep was just illustrative, I'm using another tool
that takes a regex but has no "-F" option (though admittedly with some
effort I could add one, I wrote the tool in question).


>
> (You can use grep -F if you have an aversion to using its traditional name,
> but fgrep was once a different program to grep / egrep).
>
> This version will have a problem with filenames with embedded newlines,
> but so did your original, so I am simply assuming that you have none of
> those (using any variant of grep to search for strings containing newlines
> tends to be "difficult" as grep is a line at a time tool).
Yes I'm not expecting any special characters except "$".
>
> If your version of grep cannot handle the pattern list not having a
> terminating \n (the $() removes it) then you can add it back
>
>   fgrep "$(find ... -print)"$'\n' wherever.
>
> You're probably still going to need a | into sed inside the command
> substitution, as I doubt that you actually want to look for filenames
> in the format that find prints them (you have never shown your actual
> command) and I suspect that you want to delete the pathname component
> (a leading "./" or whatever) and it isn't clear what you want to
> happen with filenames in subdirectories.  But none of those manipulations
> will affect anything.
>
> The other difference between this method and the one that you were
> using, is that this one will mix up the output for all of the different
> file names (it reads the target files just once, looking for all of the
> filenames simultaneously) whereas your original scheme looked for each
> file name in the target sequentially (re-reading the target file(s) over
> and over again for each new file name).   That would group output lines
> for each file name together, whereas the technique above does not.

-- 
Charles-Henri Gros




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Robert Elz
Date:Wed, 22 May 2019 17:34:22 +
From:Charles-Henri Gros 
Message-ID:  


  | The problem I'm trying to solve is to iterate over regex-escaped file
  | names obtained from a "find" command. I don't know how to make this
  | work. It works with other versions of bash and with other shells.

You were relying upon a common bug, which has been fixed in bash, but
your technique is all wrong, you don't need any kind of loop at all, not
a for loop, and not the while read loop that Greg suggested.

find -print produces a list of names, one per line.   Those are simple
strings, which fgrep (or grep -F as Andreas suggested) can handle finding.

What I'd do is

fgrep "$(find  -print)" wherever

(You can use grep -F if you have an aversion to using its traditional name,
but fgrep was once a different program to grep / egrep).

This version will have a problem with filenames with embedded newlines,
but so did your original, so I am simply assuming that you have none of
those (using any variant of grep to search for strings containing newlines
tends to be "difficult" as grep is a line at a time tool).

If your version of grep cannot handle the pattern list not having a
terminating \n (the $() removes it) then you can add it back

fgrep "$(find ... -print)"$'\n' wherever.

You're probably still going to need a | into sed inside the command
substitution, as I doubt that you actually want to look for filenames
in the format that find prints them (you have never shown your actual
command) and I suspect that you want to delete the pathname component
(a leading "./" or whatever) and it isn't clear what you want to
happen with filenames in subdirectories.  But none of those manipulations
will affect anything.

The other difference between this method and the one that you were
using, is that this one will mix up the output for all of the different
file names (it reads the target files just once, looking for all of the
filenames simultaneously) whereas your original scheme looked for each
file name in the target sequentially (re-reading the target file(s) over
and over again for each new file name).   That would group output lines
for each file name together, whereas the technique above does not.
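
A small sketch of that single-grep approach, under assumptions that are
not from the original report: the directory layout, the file names and the
manifest.txt target are invented, and grep -F stands in for fgrep. The sed
step only strips the leading "./" that find prints.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src"
touch "$workdir/src/a\$.class" "$workdir/src/b.class"
printf '%s\n' 'uses a$.class here' 'nothing relevant' > "$workdir/manifest.txt"

# One fixed-string pattern per line.
patterns=$(cd "$workdir/src" && find . -type f -print | sed 's|^\./||')

# A single grep reads the target once and looks for every name at the
# same time, so matching lines come out in file order, not per name.
matches=$(grep -F -e "$patterns" "$workdir/manifest.txt")

rm -rf "$workdir"
printf '%s\n' "$matches"
```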

kre




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Andreas Kusalananda Kähäri
On Wed, May 22, 2019 at 05:34:22PM +, Charles-Henri Gros wrote:
[cut]
> The problem I'm trying to solve is to iterate over regex-escaped file
> names obtained from a "find" command. I don't know how to make this
> work. It works with other versions of bash and with other shells.
> 
> The original is closer to something like this:
> 
> for file in $(find ... | sed 's/\$/\\$/g'); do grep -e "$file"
> someinput; done

You may want to use "grep -F" to match fixed strings (not regular
expressions):

find ... -exec grep -F -e {} someinput \;


Add -x to grep if you want full line matches only.

This is assuming you'd want to look for the found pathnames in
"someinput".
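
A sketch of how that could be used to list only the pathnames that are
actually mentioned, with made-up names under a temporary directory. The
-q and -print additions are my assumption about what is wanted, not part
of the command above.

```shell
workdir=$(mktemp -d)
orig_dir=$PWD
cd "$workdir" || exit 1
mkdir src
touch 'src/a$.class' 'src/unused.class'
printf '%s\n' 'main depends on src/a$.class' > someinput

# grep -q succeeds when the pathname occurs in someinput; find then
# prints that pathname because -exec evaluated to true.
referenced=$(find src -type f -exec grep -F -q -e {} someinput \; -print)

cd "$orig_dir" && rm -rf "$workdir"
printf '%s\n' "$referenced"
```

Note that {} is replaced by the pathname exactly as find prints it, so
someinput has to spell the path the same way.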



> 
> It used to work. Now it doesn't. I do not know how to make it work again.
> 
> 
> -- 
> Charles-Henri Gros
> 

-- 
Kusalananda
Sweden



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Andreas Schwab
On May 22 2019, Charles-Henri Gros wrote:

> The file name is the regex (argument to "-e"), not the file "grep"
> reads. I want to check that some text file contains a reference to a file.
>
> But it looks like this would work:
>
> for file in $(find ...); do grep -e "$(echo -n "$file" | sed 's/\$/\\$/g')" 
> someinput; done

Use grep -F instead.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Chet Ramey
On 5/22/19 3:14 PM, Charles-Henri Gros wrote:

> That's what I find a bit surprising (but shells are complicated, so
> maybe this is right. All I know is that the code used to work). I didn't
> think glob expansions applied to command expansions.

Command substitution is one of the word expansions, as is parameter
(variable) expansion. Pathname expansion (globbing) is applied to the
results of the other expansions and word splitting. The order is detailed
here:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06

> 
> All I want here is word split (which is why I can't use quotes)

You could always try turning off pathname expansion temporarily with
`set -f'.
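
To make the ordering concrete, here is a small sketch using a glob
character that every shell agrees on. The emit function and the file
names are invented for the illustration.

```shell
workdir=$(mktemp -d)
orig_dir=$PWD
cd "$workdir" || exit 1
touch a.txt b.txt

emit() { printf '%s\n' '*.txt'; }   # hypothetical command printing a pattern

# Unquoted: the output is word-split, then pathname expansion runs on
# the resulting word and finds the two files.
set -- $(emit)
globbed=$#

# Pathname expansion switched off: word splitting still happens, but
# the word is left exactly as the command printed it.
set -f
set -- $(emit)
literal=$1
set +f

cd "$orig_dir" && rm -rf "$workdir"
echo "$globbed $literal"
```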

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Charles-Henri Gros
On 5/22/19 10:47 AM, Greg Wooledge wrote:
> On Wed, May 22, 2019 at 05:34:22PM +, Charles-Henri Gros wrote:
>> On 5/22/19 5:43 AM, Greg Wooledge wrote:
>>> Standard disclaimers apply.  Stop using unquoted variables and these
>>> bugs will stop affecting you.  Nevertheless, Chet may want to take a
>>> peek.
>> What unquoted variables? Are you talking about the "$()" expansion?
> Yes.  I used a variable instead of a command substitution to make it
> easier to reproduce the problem.  Both have the same behavior in this
> case.

That's what I find a bit surprising (but shells are complicated, so
maybe this is right. All I know is that the code used to work). I didn't
think glob expansions applied to command expansions.

All I want here is word split (which is why I can't use quotes)

>
>> The problem I'm trying to solve is to iterate over regex-escaped file
>> names obtained from a "find" command. I don't know how to make this
>> work. It works with other versions of bash and with other shells.
> First step: do not "regex-escape" them, whatever that means.  Just use
> the actual filenames as printed by find -print0.
>
>> The original is closer to something like this:
>>
>> for file in $(find ... | sed 's/\$/\\$/g'); do grep -e "$file"
>> someinput; done
> Yeah, that's just the wrong approach.  It's also the first thing on
> the BashPitfalls page[1] (for a good reason).
>
> You have two choices here:
>
> 1) Use find -exec.
>
>    find ... -exec grep -e someinput /dev/null {} +
>
> 2) Use find -print0 and a bash while read loop.  (NOT a for loop.)
>
>    find ... -print0 |
>    while IFS= read -rd '' file; do
>      something "$file"
>    done
>
>    (A variant of this uses < <() instead of a pipeline, so that the while
>    loop runs in the main shell and variable assignments can persist.)
>
> Since you only show a simple grep as your action, find -exec is a better
> choice for this problem.  (Assuming you didn't fatally misrepresent the
> problem.)  Calling grep once for every file would be inefficient.

I don't think I fatally misrepresented the problem, however I do think
that you fatally misunderstood it (FWIW I know about -print0 and xargs -0)

The file name is the regex (argument to "-e"), not the file "grep"
reads. I want to check that some text file contains a reference to a file.

But it looks like this would work:

for file in $(find ...); do grep -e "$(echo -n "$file" | sed 's/\$/\\$/g')" 
someinput; done


-- 
Charles-Henri Gros




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Greg Wooledge
On Wed, May 22, 2019 at 07:14:44PM +, Charles-Henri Gros wrote:
> The file name is the regex (argument to "-e"), not the file "grep"
> reads. I want to check that some text file contains a reference to a file.
> 
> But it looks like this would work:
> 
> for file in $(find ...); do grep -e "$(echo -n "$file" | sed 's/\$/\\$/g')" 
> someinput; done

That still has the same problems.  I've already given the BashPitfalls
link so I won't repeat that whole speech.

Since it seems you want to repeat your search for every file name
individually, the while read loop becomes a viable choice.

find ... -print0 |
while IFS= read -rd '' file; do
  grep -F -e "$file" /some/textfile
done

This is still going to be inefficient compared to running *one* grep
with all of the input filenames as matchable patterns, but if you're
set on doing it the slow way, so be it.

The faster way can be done safely as long as there aren't *too* many
filenames to pass:

args=()
while IFS= read -rd '' file; do
  args+=(-e "$file")
done < <(find ... -print0)
grep -F "${args[@]}" /some/textfile

That will fail if there are too many arguments.  A less-safe but still
fast way would involve generating a newline-delimited list of the
matchable filename-patterns, which means it'll fail if any filenames
have newlines in them.  Ignoring that for now, we get:

grep -F -f <(find ... -print) /some/textfile

You can add something like ! -name $'*\n*' to the find arguments to
prevent such filenames from being handled.

(Actually, the whole thing fails pretty catastrophically if any of
your filename-patterns have newlines in them, since grep can't handle
patterns with newlines... so you'd want to filter those out even in
the array-based alternative.)
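
Putting the array version and the newline filter together might look
like the following bash-only sketch. The directory, the file names and
the choice to match on the base name only are assumptions made for the
example.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src"
touch "$workdir/src/a\$.class" "$workdir/src/plain.class" "$workdir/src/"$'bad\nname'
printf '%s\n' 'loads a$.class' 'loads other.class' > "$workdir/textfile"

args=()
while IFS= read -r -d '' file; do
  args+=(-e "${file##*/}")    # assumption: search for the base name only
done < <(find "$workdir/src" -type f ! -name $'*\n*' -print0)

# One grep, one -e per surviving file name.
matches=$(grep -F "${args[@]}" "$workdir/textfile")

rm -rf "$workdir"
printf '%s\n' "$matches"
```

If find returns nothing, args is empty and grep would treat the text file
name as its pattern, so a real script should check for that case first.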

In any case, for f in $(find...) is always wrong.  Sorry, but it's true.
There's no salvaging it.



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Greg Wooledge
On Wed, May 22, 2019 at 05:34:22PM +, Charles-Henri Gros wrote:
> On 5/22/19 5:43 AM, Greg Wooledge wrote:
> > Standard disclaimers apply.  Stop using unquoted variables and these
> > bugs will stop affecting you.  Nevertheless, Chet may want to take a
> > peek.
> 
> What unquoted variables? Are you talking about the "$()" expansion?

Yes.  I used a variable instead of a command substitution to make it
easier to reproduce the problem.  Both have the same behavior in this
case.

> The problem I'm trying to solve is to iterate over regex-escaped file
> names obtained from a "find" command. I don't know how to make this
> work. It works with other versions of bash and with other shells.

First step: do not "regex-escape" them, whatever that means.  Just use
the actual filenames as printed by find -print0.

> The original is closer to something like this:
> 
> for file in $(find ... | sed 's/\$/\\$/g'); do grep -e "$file"
> someinput; done

Yeah, that's just the wrong approach.  It's also the first thing on
the BashPitfalls page[1] (for a good reason).

You have two choices here:

1) Use find -exec.

   find ... -exec grep -e someinput /dev/null {} +

2) Use find -print0 and a bash while read loop.  (NOT a for loop.)

   find ... -print0 |
   while IFS= read -rd '' file; do
     something "$file"
   done

   (A variant of this uses < <() instead of a pipeline, so that the while
   loop runs in the main shell and variable assignments can persist.)

Since you only show a simple grep as your action, find -exec is a better
choice for this problem.  (Assuming you didn't fatally misrepresent the
problem.)  Calling grep once for every file would be inefficient.
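
The difference between the pipeline and the < <() variant can be seen
with a counter. This is a bash-only sketch; list_names is a made-up
stand-in for find ... -print0, and it assumes the lastpipe option is off,
which is the default.

```shell
list_names() { printf 'one\0two\0three\0'; }   # stand-in for find ... -print0

# Pipeline: the loop runs in a subshell, so its count is lost.
count=0
list_names | while IFS= read -r -d '' name; do count=$((count + 1)); done
after_pipeline=$count

# Process substitution: the loop runs in the main shell.
count=0
while IFS= read -r -d '' name; do count=$((count + 1)); done < <(list_names)
after_procsub=$count

echo "$after_pipeline $after_procsub"
```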

[1] https://mywiki.wooledge.org/BashPitfalls#pf1



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Charles-Henri Gros
On 5/22/19 5:43 AM, Greg Wooledge wrote:
> On Wed, May 22, 2019 at 05:25:43PM +0700, Robert Elz wrote:
>> Date:Tue, 21 May 2019 22:11:20 +
>> From:Charles-Henri Gros 
>> Message-ID:  
>> 
>>
>>   | The existence or not of the file should not have any effect.
>>
>> But it does, and is intended to.   If the pattern matches a file
>> (when pathname expanded as a result of the unquoted command substitution)
>> you get the file name produced.   If it does not match a file,
>> the pattern is left untouched.   That is the way that things are
>> supposed to work.
> With glob metacharacters, sure.  But none of the characters in his
> variable are glob metacharacters.
>
> There is definitely something weird happening here.
>
> wooledg:/tmp/x$ echo "$BASH_VERSION"
> 5.0.3(1)-release
> wooledg:/tmp/x$ touch 'a$.class'
> wooledg:/tmp/x$ i='a\$.class'; echo {$i} "{$i}"
> {a\$.class} {a\$.class}
> wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
> a$.class {a\$.class}
>
> Other versions of bash, plus ksh and dash, don't behave this way.
>
> wooledg:/tmp/x$ bash-2.05b
> wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> wooledg:/tmp/x$ bash-4.4
> wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> wooledg:/tmp/x$ ksh
> $ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> wooledg:/tmp/x$ dash
> $ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> It seems to be unique to bash 5.  If it's a bug fix, then I'm not
> understanding the rationale.  Backslashes shouldn't be consumed during
> glob expansion.
>
> This is also not limited to $ alone.  It happens with letters too.
>
> wooledg:/tmp/x$ touch i
> wooledg:/tmp/x$ i='\i' j='\j'
> wooledg:/tmp/x$ echo $i $j
> i \j
>
> Standard disclaimers apply.  Stop using unquoted variables and these
> bugs will stop affecting you.  Nevertheless, Chet may want to take a
> peek.

What unquoted variables? Are you talking about the "$()" expansion?

The problem I'm trying to solve is to iterate over regex-escaped file
names obtained from a "find" command. I don't know how to make this
work. It works with other versions of bash and with other shells.

The original is closer to something like this:

for file in $(find ... | sed 's/\$/\\$/g'); do grep -e "$file"
someinput; done

It used to work. Now it doesn't. I do not know how to make it work again.


-- 
Charles-Henri Gros




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Chet Ramey
On 5/22/19 9:33 AM, Robert Elz wrote:
> Date:Wed, 22 May 2019 08:43:00 -0400
> From:Greg Wooledge 
> Message-ID:  <20190522124300.gz1...@eeg.ccf.org>
> 
>   | It seems to be unique to bash 5.  If it's a bug fix, then I'm not
>   | understanding the rationale.  Backslashes shouldn't be consumed during
>   | glob expansion.
> 
> They should - when a pattern comes from an expansion (be that a
> variable expansion, or as here, a command substitution) there needs
> to be a way to indicate whether the potential magic chars are in
> fact intended as magic chars, or as literals.   \ is used for that.

There is more discussion in

http://lists.gnu.org/archive/html/bug-bash/2019-02/msg00151.html

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Robert Elz
Date:Wed, 22 May 2019 08:43:00 -0400
From:Greg Wooledge 
Message-ID:  <20190522124300.gz1...@eeg.ccf.org>

  | It seems to be unique to bash 5.  If it's a bug fix, then I'm not
  | understanding the rationale.  Backslashes shouldn't be consumed during
  | glob expansion.

They should - when a pattern comes from an expansion (be that a
variable expansion, or as here, a command substitution) there needs
to be a way to indicate whether the potential magic chars are in
fact intended as magic chars, or as literals.   \ is used for that.

If quoted, everything is literal, and there's no issue, but when
unquoted there needs to be this mechanism.   So, I think it was a
bug fix (I recently made very similar fixes to the NetBSD shell).

Uses of this kind of thing are obscure, but they exist.

Here, the $ isn't magic to pathname expansion (glob is not a RE)
so the \ doesn't do anything useful, but consider

ls $( printf %s '\**.c' )

what that should do is list all files that end in .c and start
with an asterisk (star).   There the first '*' is to be treated
literally, and the 2nd is the "match anything" meta char.   Only
the presence of the \ can distinguish those two cases.   (Well, here
one could make the pattern be [*]*.c but that isn't always easy, or
even possible).
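
A quick way to try that, using two invented file names in a temporary
directory. This sketch only exercises a pattern that also contains a real
glob character.

```shell
workdir=$(mktemp -d)
orig_dir=$PWD
cd "$workdir" || exit 1
touch '*star.c' 'plain.c'

# The pattern arrives through an unquoted command substitution, so the
# backslash is the only thing marking the first '*' as literal.
set -- $(printf %s '\**.c')
matched=$*

cd "$orig_dir" && rm -rf "$workdir"
printf '%s\n' "$matched"
```

Only the name that begins with an actual asterisk should come back;
plain.c is not matched.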

kre




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Greg Wooledge
On Wed, May 22, 2019 at 05:25:43PM +0700, Robert Elz wrote:
> Date:Tue, 21 May 2019 22:11:20 +
> From:Charles-Henri Gros 
> Message-ID:  
> 
> 
>   | The existence or not of the file should not have any effect.
> 
> But it does, and is intended to.   If the pattern matches a file
> (when pathname expanded as a result of the unquoted command substitution)
> you get the file name produced.   If it does not match a file,
> the pattern is left untouched.   That is the way that things are
> supposed to work.

With glob metacharacters, sure.  But none of the characters in his
variable are glob metacharacters.

There is definitely something weird happening here.

wooledg:/tmp/x$ echo "$BASH_VERSION"
5.0.3(1)-release
wooledg:/tmp/x$ touch 'a$.class'
wooledg:/tmp/x$ i='a\$.class'; echo {$i} "{$i}"
{a\$.class} {a\$.class}
wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
a$.class {a\$.class}

Other versions of bash, plus ksh and dash, don't behave this way.

wooledg:/tmp/x$ bash-2.05b
wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
a\$.class {a\$.class}

wooledg:/tmp/x$ bash-4.4
wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
a\$.class {a\$.class}

wooledg:/tmp/x$ ksh
$ i='a\$.class'; echo $i "{$i}"
a\$.class {a\$.class}

wooledg:/tmp/x$ dash
$ i='a\$.class'; echo $i "{$i}"
a\$.class {a\$.class}

It seems to be unique to bash 5.  If it's a bug fix, then I'm not
understanding the rationale.  Backslashes shouldn't be consumed during
glob expansion.

This is also not limited to $ alone.  It happens with letters too.

wooledg:/tmp/x$ touch i
wooledg:/tmp/x$ i='\i' j='\j'
wooledg:/tmp/x$ echo $i $j
i \j

Standard disclaimers apply.  Stop using unquoted variables and these
bugs will stop affecting you.  Nevertheless, Chet may want to take a
peek.



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Robert Elz
Date:Tue, 21 May 2019 22:11:20 +
From:Charles-Henri Gros 
Message-ID:  


  | The existence or not of the file should not have any effect.

But it does, and is intended to.   If the pattern matches a file
(when pathname expanded as a result of the unquoted command substitution)
you get the file name produced.   If it does not match a file,
the pattern is left untouched.   That is the way that things are
supposed to work.

I suspect that you meant to say

for i in "$(echo "a\\\$.class")"; do echo "$i"; done

then there would be no pathname expansion happening (more correctly,
there still is, but the pathname to be expanded contains no magic
chars, only chars that match literally, so you either get the file
with the exact same name, or the pattern untouched, which is the same
thing - shells generally optimise away the attempt to match in that
case, as the result is always known in advance).
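
The contrast between the quoted and unquoted forms can be checked with a
sketch like this one. The emit function is a hypothetical stand-in for
the real command, and printf is used instead of echo so that the
backslash is printed the same way by every shell.

```shell
workdir=$(mktemp -d)
orig_dir=$PWD
cd "$workdir" || exit 1
touch 'a$.class'

emit() { printf '%s\n' 'a\$.class'; }   # hypothetical stand-in command

# Quoted: a single word that is never split or pathname-expanded, so
# the backslash survives whether or not the file exists.
for i in "$(emit)"; do quoted=$i; done

# Unquoted: the result depends on the shell. In this thread bash 5.0.3
# gave a$.class, while bash 4.4, ksh and dash gave a\$.class.
for i in $(emit); do unquoted=$i; done

cd "$orig_dir" && rm -rf "$workdir"
printf 'quoted:   %s\nunquoted: %s\n' "$quoted" "$unquoted"
```

Quoting does mean the loop body runs once with the whole output as one
string, so it only suits the case where a single name is expected.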

kre




Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-21 Thread Charles-Henri Gros
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2
-fdebug-prefix-map=/build/bash-Dl674z/bash-5.0=.
-fstack-protector-strong -Wformat -Werror=format-security -Wall
-Wno-parentheses -Wno-format-security
uname output: Linux d-us6a-ubuntu-03 5.0.0-13-generic #14-Ubuntu SMP Mon
Apr 15 14:59:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 3
Release Status: release

Description:
Backslash mysteriously disappears in command expansion when
unescaping would reference an existing file

Repeat-By:
> touch a\$.class
> for i in $(echo "a\\\$.class"); do echo "$i"; done
a$.class
> rm a\$.class
> for i in $(echo "a\\\$.class"); do echo "$i"; done
a\$.class

The existence or not of the file should not have any effect.