Re: bash-4.3.33 regexp bug

2015-03-06 Thread Chet Ramey
On 3/6/15 4:35 PM, Stephane Chazelas wrote:
> 2015-03-06 11:43:24 -0500, Chet Ramey:
>> On 3/5/15 12:36 PM, Greg Wooledge wrote:
>>> On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
>>>> The bash manual only points to regex(3).
>>>>
>>>> So it's down to your system's regex library (uses
>>>> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
>>>
>>> I see.  So it's another nonportable feature like printf '%(%s)T'.
>>
>> It's an imperfect world.  I'd rather not replace the system's regexp
>> library, just like I'd rather use the system's strftime(3) if it's
>> available.
> [...]
> 
> BTW:
> 
> $ ltrace -e regcomp bash -c 'bs="\\"; [[ a =~ ${bs}[\.s ]]'
> bash->regcomp(0x7fff4d48bef0, "\\[.s", 1)
> 
> I think it should be
>  
> bash->regcomp(0x7fff4d48bef0, "\\[\\.s", 1)

Thanks for the report.  I think you're right, and the code needs to handle
the absence of a closing bracket better.  This will be fixed for the next
release of bash.

Chet


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: bash-4.3.33 regexp bug

2015-03-06 Thread Stephane Chazelas
2015-03-06 11:43:24 -0500, Chet Ramey:
> On 3/5/15 12:36 PM, Greg Wooledge wrote:
> > On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
> >> The bash manual only points to regex(3).
> >>
> >> So it's down to your system's regex library (uses
> >> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
> > 
> > I see.  So it's another nonportable feature like printf '%(%s)T'.
> 
> It's an imperfect world.  I'd rather not replace the system's regexp
> library, just like I'd rather use the system's strftime(3) if it's
> available.
[...]

BTW:

$ ltrace -e regcomp bash -c 'bs="\\"; [[ a =~ ${bs}[\.s ]]'
bash->regcomp(0x7fff4d48bef0, "\\[.s", 1)

I think it should be
 
bash->regcomp(0x7fff4d48bef0, "\\[\\.s", 1)

-- 
Stephane



Re: bash-4.3.33 regexp bug

2015-03-06 Thread Chet Ramey
On 3/5/15 9:51 AM, Eduardo A. Bustamante López wrote:
> On Thu, Mar 05, 2015 at 02:26:48PM +, Jason Vas Dias wrote:
>> Good day list, Chet -
>>
>> I think this is a bug:
>> ( set -x ;  tab=$'\011';  s="some text: 1.2.3";
>>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
>> echo "${BASH_REMATCH[1]}";
>>   fi
>> )
>> -bash: syntax error in conditional expression
>> -bash: syntax error near `$tab]+([0-9.]+)'
>>
> From a quick glance, it does seem like a parsing bug, it should not break with
> a syntax error.

It's not a bug; the space between ^some and text needs to be escaped
somehow to prevent it breaking words.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: bash-4.3.33 regexp bug

2015-03-06 Thread Chet Ramey
On 3/5/15 12:36 PM, Greg Wooledge wrote:
> On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
>> The bash manual only points to regex(3).
>>
>> So it's down to your system's regex library (uses
>> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
> 
> I see.  So it's another nonportable feature like printf '%(%s)T'.

It's an imperfect world.  I'd rather not replace the system's regexp
library, just like I'd rather use the system's strftime(3) if it's
available.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: bash-4.3.33 regexp bug

2015-03-06 Thread Chet Ramey
On 3/5/15 9:26 AM, Jason Vas Dias wrote:
> Good day list, Chet -
> 
> I think this is a bug:
> ( set -x ;  tab=$'\011';  s="some text: 1.2.3";
>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
> echo "${BASH_REMATCH[1]}";
>   fi
> )
> -bash: syntax error in conditional expression
> -bash: syntax error near `$tab]+([0-9.]+)'
> 
> Do you agree ?

No.

This is a conditional command that looks like

[[ "$s" =~ ^some garbage ]]

(four tokens).

When bash tries to make sense of the stuff after the space between
`^some' and `text', it can't find any operators that make sense in
that context -- or the closing `]]' -- and reports a syntax error.

The parser still has to tokenize the words in a conditional command,
and an unescaped space separates words.  If you escape that space, the
regular expression should match like you want.  It does in my testing.
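
A minimal sketch of that fix, assuming bash 3.2 or later. It reuses the
data from the report; the variable name "version" is only illustrative
and does not appear in the original message:

```shell
#!/usr/bin/env bash
# Sketch only: same expression as in the report, but with the space
# between "^some" and "text" escaped so it no longer splits the word.
tab=$'\011'
s="some text: 1.2.3"
version=""
if [[ "$s" =~ ^some\ text:[\ $tab]+([0-9.]+) ]]; then
  version=${BASH_REMATCH[1]}
fi
echo "$version"
```

Quoting just the space (' ' or " ") instead of backslash-escaping it
should have the same effect on word splitting.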

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: bash-4.3.33 regexp bug

2015-03-05 Thread Stephane Chazelas
2015-03-05 12:36:39 -0500, Greg Wooledge:
> On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
> > The bash manual only points to regex(3).
> > 
> > So it's down to your system's regex library (uses
> > regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
> 
> I see.  So it's another nonportable feature like printf '%(%s)T'.
> Good to know!
> 
> imadev:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
> imadev:~$ printf '%(%s)T\n' -1
> s
> 
> wooledg@wooledg:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
> match
> wooledg@wooledg:~$ printf '%(%s)T\n' -1
> 1425576833
[...]

It's a bit worse than %(%s)T (another ksh93 feature, or rather a subset
thereof, as ksh93 also parses the argument as a date) in that
while %(xxx)T passes xxx to strftime verbatim, in [[ ... =~ xxx
]], bash modifies xxx, making some assumptions about
the syntax of that regex (which is provided by a 3rd party).

Since 3.2, shell-quoting (with \, ', or ") any part of a regexp
"escapes" the regular expression operators in that part.

That (I think) was done for compatibility with ksh93, but while
ksh93 has its own AT&T regexps, bash uses 3rd parties'.

So for instance, when you write:

[[ foo =~ ".". ]]

bash calls regcomp() with "\..".

There used to be a bug in that: ["."] would be turned into [\.]
(matching backslash in addition to dot).

Now bash should work as long as you use POSIX compatible
regexps and the system's regexp library is POSIX compliant.

It starts to get tricky when you want to make use of extensions
in your system's regexps; there it helps to know how bash
works in that regard.

[[ foo =~ \s ]]

would call regcomp with "s" (the backslash is taken as shell
quoting, and s is not a POSIX regex operator, so no \ is added),
while \\s or "\s" would call it with "\\s" (backslash backslash s:
the \ is quoted, and \ is also a regexp operator, so a \ is added).
That's why you need the variable to be able to use that non-POSIX
\s extension.
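
A small sketch of the first case, assuming bash 3.2 or later without
compat31. It only relies on the literal letter s, so it should not
depend on whether the system's regex library knows \s. The result
variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# \s here is shell quoting of the letter s, so regcomp() sees just "s".
if [[ s =~ \s ]]; then letter=match; else letter=nomatch; fi
# The same regexp "s" therefore cannot match a lone space.
if [[ ' ' =~ \s ]]; then space=match; else space=nomatch; fi
echo "letter: $letter, space: $space"
```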

[[ foo =~ $var ]] passes the content of $var verbatim to regcomp,
while [[ foo =~ "$var" ]] passes the content of $var with regexp
operators escaped.

You can also do:

bs='\'
[[ " " =~ ${bs}s ]]

to pass "\s" to regcomp().
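
To illustrate the difference between the unquoted and the quoted
expansion with a POSIX-only regexp (a sketch, assuming bash 3.2 or
later without compat31; the variable names are illustrative):

```shell
#!/usr/bin/env bash
re='a.c'
# Unquoted: $re is handed to regcomp() as is, so . matches any character.
if [[ abc =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi
# Quoted: the operators in $re are escaped, so only a literal "a.c" matches.
if [[ abc =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi
if [[ a.c =~ "$re" ]]; then literal=match; else literal=nomatch; fi
echo "$unquoted $quoted $literal"
```

The quoted form is handy when the right-hand side is arbitrary text
that must be matched as a fixed string rather than as a regexp.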

-- 
Stephane




Re: bash-4.3.33 regexp bug

2015-03-05 Thread Greg Wooledge
On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
> The bash manual only points to regex(3).
> 
> So it's down to your system's regex library (uses
> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.

I see.  So it's another nonportable feature like printf '%(%s)T'.
Good to know!

imadev:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
imadev:~$ printf '%(%s)T\n' -1
s

wooledg@wooledg:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
match
wooledg@wooledg:~$ printf '%(%s)T\n' -1
1425576833



Re: bash-4.3.33 regexp bug

2015-03-05 Thread Stephane Chazelas
2015-03-05 12:14:21 -0500, Greg Wooledge:
> On Thu, Mar 05, 2015 at 05:07:44PM +, Stephane Chazelas wrote:
> > bash also supports \s, but that's more for [[:space:]] (so
> > includes vertical spacing like CR, LF), and you need to use an
> > intermediary variable:
> > 
> > r='^some text:\s+([0-9.]+)'
> > [[ $s =~ $r ]]
> 
> Woah!  What?  Where is *that* documented?  The only \s in the man page
> is in PS1.

The bash manual only points to regex(3).

So it's down to your system's regex library (uses
regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.

The need for the intermediary variable is down to the broken way
bash tries to overload \ as a quoting operator and regexp
operator since bash-3.2.
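
Since \s may or may not be there, a script can probe for it at run time
and fall back to the POSIX class. A rough sketch, not something bash
provides itself; the names "probe" and "ws" are hypothetical:

```shell
#!/usr/bin/env bash
# Probe whether the system regex library treats \s as whitespace.
# The regexp goes through a variable so bash passes the backslash on untouched.
probe='\s'
if [[ ' ' =~ $probe ]]; then
  ws='\s'            # extension available (e.g. recent GNU libc)
else
  ws='[[:space:]]'   # POSIX character class, should work everywhere
fi

s="some text: 1.2.3"
r="^some text:${ws}+([0-9.]+)"
if [[ $s =~ $r ]]; then
  echo "${BASH_REMATCH[1]}"
fi
```

Using [[:space:]] unconditionally is simpler still; the probe only
shows how the library dependence can be detected.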

-- 
Stephane



Re: bash-4.3.33 regexp bug

2015-03-05 Thread Greg Wooledge
On Thu, Mar 05, 2015 at 05:07:44PM +, Stephane Chazelas wrote:
> bash also supports \s, but that's more for [[:space:]] (so
> includes vertical spacing like CR, LF), and you need to use an
> intermediary variable:
> 
> r='^some text:\s+([0-9.]+)'
> [[ $s =~ $r ]]

Woah!  What?  Where is *that* documented?  The only \s in the man page
is in PS1.



Re: bash-4.3.33 regexp bug

2015-03-05 Thread Stephane Chazelas
2015-03-05 14:26:48 +, Jason Vas Dias:
> Good day list, Chet -
> 
> I think this is a bug:
> ( set -x ;  tab=$'\011';  s="some text: 1.2.3";
>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
> echo "${BASH_REMATCH[1]}";
>   fi
> )
> -bash: syntax error in conditional expression
> -bash: syntax error near `$tab]+([0-9.]+)'
[...]

You forgot to quote the first space. Should be

  if [[ "$s" =~ ^some\ text:[\ $tab]+([0-9.]+) ]]; then

I'd use [[:blank:]] to match blanks though.
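
For instance (a sketch assuming a POSIX-compliant regex library;
"version" is an illustrative name, and the sample string has a tab
added to show that both kinds of blank are covered):

```shell
#!/usr/bin/env bash
s=$'some text: \t1.2.3'   # one space and one tab after the colon
version=""
# [[:blank:]] matches space and tab, so no $tab variable is needed.
if [[ "$s" =~ ^some\ text:[[:blank:]]+([0-9.]+) ]]; then
  version=${BASH_REMATCH[1]}
fi
echo "$version"
```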

bash also supports \s, but that's more for [[:space:]] (so
includes vertical spacing like CR, LF), and you need to use an
intermediary variable:

r='^some text:\s+([0-9.]+)'
[[ $s =~ $r ]]

or use

shopt -s compat31
[[ $s =~ '^some text:\s+([0-9.]+)' ]]

or use zsh.

In bash-3.2, the behaviour was changed (broken IMO) to match
ksh93's.

-- 
Stephane



Re: bash-4.3.33 regexp bug

2015-03-05 Thread Eduardo A . Bustamante López
On Thu, Mar 05, 2015 at 02:26:48PM +, Jason Vas Dias wrote:
> Good day list, Chet -
> 
> I think this is a bug:
> ( set -x ;  tab=$'\011';  s="some text: 1.2.3";
>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
> echo "${BASH_REMATCH[1]}";
>   fi
> )
> -bash: syntax error in conditional expression
> -bash: syntax error near `$tab]+([0-9.]+)'
> 
From a quick glance, it does seem like a parsing bug; it should not break with
a syntax error.

Though, you can work around this issue by using the recommended approach to
regex matching: storing the regex in a variable.

|  dualbus@dualbus ~ % bash bug
|  + s='some text: 1.2.3'
|  + r='^some text:[   ]+([0-9.]+)'
|  + [[ some text: 1.2.3 =~ ^some text:[   ]+([0-9.]+) ]]
|  + echo 1.2.3
|  1.2.3
|  
|  dualbus@dualbus ~ % cat bug
|  #!/bin/bash
|  ( set -x ; s="some text: 1.2.3";
|    r=$'^some text:[ \t]+([0-9.]+)'
|    if [[ "$s" =~ $r ]]; then
|      echo "${BASH_REMATCH[1]}";
|    fi
|  )

See http://mywiki.wooledge.org/BashGuide/Patterns#Regular_Expressions-1 and
http://tiswww.case.edu/php/chet/bash/FAQ (E14).



bash-4.3.33 regexp bug

2015-03-05 Thread Jason Vas Dias
Good day list, Chet -

I think this is a bug:
( set -x ;  tab=$'\011';  s="some text: 1.2.3";
  if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
echo "${BASH_REMATCH[1]}";
  fi
)
-bash: syntax error in conditional expression
-bash: syntax error near `$tab]+([0-9.]+)'

Do you agree ?
If not, what sort of regexp should I use to match ':[<space><tab>]+[0-9]+' ?
The problem happens regardless of whether I use the $tab variable or
a literal '\'$'\011' sequence (sorry, I can't type <tab> in this mailer).

Thanks in advance for any replies,
Regards,
Jason