Re: bash-4.3.33 regexp bug
On 3/6/15 4:35 PM, Stephane Chazelas wrote:
> 2015-03-06 11:43:24 -0500, Chet Ramey:
>> On 3/5/15 12:36 PM, Greg Wooledge wrote:
>>> On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
>>>> The bash manual only points to regex(3).
>>>>
>>>> So it's down to your system's regex library (uses
>>>> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
>>>
>>> I see. So it's another nonportable feature like printf '%(%s)T'.
>>
>> It's an imperfect world. I'd rather not replace the system's regexp
>> library, just like I'd rather use the system's strftime(3) if it's
>> available.
> [...]
>
> BTW:
>
> $ ltrace -e regcomp bash -c 'bs="\\"; [[ a =~ ${bs}[\.s ]]'
> bash->regcomp(0x7fff4d48bef0, "\\[.s", 1)
>
> I think it should be
>
> bash->regcomp(0x7fff4d48bef0, "\\[\\.s", 1)

Thanks for the report. I think you're right, and the code needs to handle
the absence of a closing bracket better. This will be fixed for the next
release of bash.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: bash-4.3.33 regexp bug
2015-03-06 11:43:24 -0500, Chet Ramey:
> On 3/5/15 12:36 PM, Greg Wooledge wrote:
> > On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
> >> The bash manual only points to regex(3).
> >>
> >> So it's down to your system's regex library (uses
> >> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
> >
> > I see. So it's another nonportable feature like printf '%(%s)T'.
>
> It's an imperfect world. I'd rather not replace the system's regexp
> library, just like I'd rather use the system's strftime(3) if it's
> available.
[...]

BTW:

$ ltrace -e regcomp bash -c 'bs="\\"; [[ a =~ ${bs}[\.s ]]'
bash->regcomp(0x7fff4d48bef0, "\\[.s", 1)

I think it should be

bash->regcomp(0x7fff4d48bef0, "\\[\\.s", 1)

-- 
Stephane
Re: bash-4.3.33 regexp bug
On 3/5/15 9:51 AM, Eduardo A. Bustamante López wrote:
> On Thu, Mar 05, 2015 at 02:26:48PM +, Jason Vas Dias wrote:
>> Good day list, Chet -
>>
>> I think this is a bug:
>> ( set -x ; tab=$'\011'; s="some text: 1.2.3";
>>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
>>     echo "${BASH_REMATCH[1]}";
>>   fi
>> )
>> -bash: syntax error in conditional expression
>> -bash: syntax error near `$tab]+([0-9.]+)'
>>
> From a quick glance, it does seem like a parsing bug, it should not break with
> a syntax error.

It's not a bug; the space between ^some and text needs to be escaped somehow
to prevent it breaking words.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: bash-4.3.33 regexp bug
On 3/5/15 12:36 PM, Greg Wooledge wrote:
> On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
>> The bash manual only points to regex(3).
>>
>> So it's down to your system's regex library (uses
>> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
>
> I see. So it's another nonportable feature like printf '%(%s)T'.

It's an imperfect world. I'd rather not replace the system's regexp
library, just like I'd rather use the system's strftime(3) if it's
available.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: bash-4.3.33 regexp bug
On 3/5/15 9:26 AM, Jason Vas Dias wrote:
> Good day list, Chet -
>
> I think this is a bug:
> ( set -x ; tab=$'\011'; s="some text: 1.2.3";
>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
>     echo "${BASH_REMATCH[1]}";
>   fi
> )
> -bash: syntax error in conditional expression
> -bash: syntax error near `$tab]+([0-9.]+)'
>
> Do you agree ?

No. This is a conditional command that looks like

[[ "$s" =~ ^some garbage ]]

(four tokens). When bash tries to make sense of the stuff after the space
between `^some' and `text', it can't find any operators that make sense in
that context -- or the closing `]]' -- and reports a syntax error.

The parser still has to tokenize the words in a conditional command, and
an unescaped space separates words.

If you escape that space, the regular expression should match like you
want. It does in my testing.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
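For reference, a minimal sketch of the fix described above: the original test with only the space after `^some' escaped, so the whole regexp stays one word. The `version` variable is a name made up for this illustration, and the sketch assumes bash 3.2 or later with compat31 unset.

```shell
#!/usr/bin/env bash
# Sketch only: the reporter's test with the first space escaped.
# "version" is a hypothetical variable name used for this illustration.
tab=$'\011'
s="some text: 1.2.3"
version=""

# The backslash keeps "^some text:..." together as a single word,
# so the parser sees: [[ word =~ word ]]
if [[ "$s" =~ ^some\ text:[\ $tab]+([0-9.]+) ]]; then
    version="${BASH_REMATCH[1]}"
fi

echo "$version"
```

Without the backslash, the parser splits the pattern at the space and reports the syntax error from the original post.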
Re: bash-4.3.33 regexp bug
2015-03-05 12:36:39 -0500, Greg Wooledge:
> On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
> > The bash manual only points to regex(3).
> >
> > So it's down to your system's regex library (uses
> > regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
>
> I see. So it's another nonportable feature like printf '%(%s)T'.
> Good to know!
>
> imadev:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
> imadev:~$ printf '%(%s)T\n' -1
> s
>
> wooledg@wooledg:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
> match
> wooledg@wooledg:~$ printf '%(%s)T\n' -1
> 1425576833
[...]

It's a bit worse than %(%s)T (another ksh93 feature, or rather a subset
of one, as ksh93 also parses the argument as a date): %(xxx)T passes xxx
to strftime verbatim, whereas in [[ ... =~ xxx ]] bash modifies xxx
first, making some assumptions about the syntax of that regex (which is
provided by a 3rd party).

Since 3.2, shell quoting (with \, ' or ") inside a regexp "escapes" the
regular expression operators. That (I think) was done for compatibility
with ksh93, but while ksh93 has its own AT&T regexps, bash uses 3rd
parties'.

So for instance, when you write:

[[ foo =~ ".". ]]

bash calls regcomp() with "\..".

There used to be a bug in that: ["."] would be turned into [\.]
(matching backslash in addition to dot).

Now bash should work as long as you use POSIX compatible regexps and the
system's regexp library is POSIX compliant.

It gets tricky when you want to use extensions in your system's regexps,
and there it helps to know how bash works in that regard.

[[ foo =~ \s ]]

would call regcomp with "s" (the backslash is taken as shell quoting, and
s is not a POSIX regex operator, so no \ is added), while \\s or "\s"
would call it with "\\s" (double backslash s: the \ is quoted, and \ is
also a regexp operator, so a \ is added).

That's why you need the variable to be able to use that non-POSIX \s
extension.

[[ foo =~ $var ]]

passes the content of $var verbatim to regcomp, while

[[ foo =~ "$var" ]]

passes the content of $var with regexp operators escaped.

You can also do:

bs='\'
[[ " " =~ ${bs}s ]]

to pass "\s" to regcomp().

-- 
Stephane
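The difference between the unquoted and quoted variable can be seen with a POSIX-only pattern, so the result does not depend on library extensions such as \s. This is a sketch assuming bash 3.2 or later with compat31 unset; the variable names (`re`, `unquoted`, `quoted`, `literal`) are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sketch only: unquoted vs quoted expansion on the right of =~.
# All variable names here are hypothetical, chosen for this illustration.
re='^a.c$'
unquoted=no
quoted=no
literal=no

# Unquoted: $re is handed to regcomp() as is, so ^ . $ are operators.
[[ abc =~ $re ]] && unquoted=yes

# Quoted: the operators are escaped, so "abc" no longer matches...
[[ abc =~ "$re" ]] && quoted=yes

# ...but a string containing the literal text ^a.c$ does.
[[ 'x^a.c$x' =~ "$re" ]] && literal=yes

echo "unquoted=$unquoted quoted=$quoted literal=$literal"
```

Quoting the expansion therefore turns the match into a plain substring test, which is why extensions like \s have to go through an unquoted variable.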
Re: bash-4.3.33 regexp bug
On Thu, Mar 05, 2015 at 05:26:00PM +, Stephane Chazelas wrote:
> The bash manual only points to regex(3).
>
> So it's down to your system's regex library (uses
> regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.

I see. So it's another nonportable feature like printf '%(%s)T'.
Good to know!

imadev:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
imadev:~$ printf '%(%s)T\n' -1
s

wooledg@wooledg:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
match
wooledg@wooledg:~$ printf '%(%s)T\n' -1
1425576833
Re: bash-4.3.33 regexp bug
2015-03-05 12:14:21 -0500, Greg Wooledge:
> On Thu, Mar 05, 2015 at 05:07:44PM +, Stephane Chazelas wrote:
> > bash also supports \s, but that's more for [[:space:]] (so
> > includes vertical spacing like CR, LF), and you need to use an
> > intermediary variable:
> >
> > r='^some text:\s+([0-9.]+)'
> > [[ $s =~ $r ]]
>
> Woah! What? Where is *that* documented? The only \s in the man page
> is in PS1.

The bash manual only points to regex(3).

So it's down to your system's regex library (uses
regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.

The need for the intermediary variable is down to the broken way
bash tries to overload \ as a quoting operator and regexp
operator since bash-3.2.

-- 
Stephane
Re: bash-4.3.33 regexp bug
On Thu, Mar 05, 2015 at 05:07:44PM +, Stephane Chazelas wrote:
> bash also supports \s, but that's more for [[:space:]] (so
> includes vertical spacing like CR, LF), and you need to use an
> intermediary variable:
>
> r='^some text:\s+([0-9.]+)'
> [[ $s =~ $r ]]

Woah! What? Where is *that* documented? The only \s in the man page
is in PS1.
Re: bash-4.3.33 regexp bug
2015-03-05 14:26:48 +, Jason Vas Dias:
> Good day list, Chet -
>
> I think this is a bug:
> ( set -x ; tab=$'\011'; s="some text: 1.2.3";
>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
>     echo "${BASH_REMATCH[1]}";
>   fi
> )
> -bash: syntax error in conditional expression
> -bash: syntax error near `$tab]+([0-9.]+)'
[...]

You forgot to quote the first space. Should be

if [[ "$s" =~ ^some\ text:[\ $tab]+([0-9.]+) ]]; then

I'd use [[:blank:]] to match blanks though.

bash also supports \s, but that's more for [[:space:]] (so
includes vertical spacing like CR, LF), and you need to use an
intermediary variable:

r='^some text:\s+([0-9.]+)'
[[ $s =~ $r ]]

or use

shopt -s compat31
[[ $s =~ '^some text:\s+([0-9.]+)' ]]

or use zsh.

In bash-3.2, the behaviour was changed (broken IMO) to match
ksh93's.

-- 
Stephane
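The [[:blank:]] suggestion can be combined with the intermediary variable, which avoids both the literal tab and the shell-quoting questions. A minimal sketch, assuming bash 3.2 or later; the sample string and the names `re` and `version` are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sketch only: POSIX [[:blank:]] class plus an intermediary variable.
# The sample input and the names "re" and "version" are hypothetical.
s=$'some text: \t1.2.3'        # a space and a tab after the colon
re='^some text:[[:blank:]]+([0-9.]+)'
version=""

# $re is left unquoted so that it is used as a regexp, not as a literal.
if [[ $s =~ $re ]]; then
    version=${BASH_REMATCH[1]}
fi

echo "$version"
```

Since [[:blank:]] is a POSIX character class, this form should not depend on the \s extension of the system's regex library.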
Re: bash-4.3.33 regexp bug
On Thu, Mar 05, 2015 at 02:26:48PM +, Jason Vas Dias wrote:
> Good day list, Chet -
>
> I think this is a bug:
> ( set -x ; tab=$'\011'; s="some text: 1.2.3";
>   if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
>     echo "${BASH_REMATCH[1]}";
>   fi
> )
> -bash: syntax error in conditional expression
> -bash: syntax error near `$tab]+([0-9.]+)'
>
From a quick glance, it does seem like a parsing bug, it should not break with
a syntax error.

Though, you can work around this issue by following the recommended approach to
regex matching: storing the regex in a variable.

| dualbus@dualbus ~ % bash bug
| + s='some text: 1.2.3'
| + r='^some text:[ ]+([0-9.]+)'
| + [[ some text: 1.2.3 =~ ^some text:[ ]+([0-9.]+) ]]
| + echo 1.2.3
| 1.2.3
|
| dualbus@dualbus ~ % cat bug
| #!/bin/bash
| ( set -x ; s="some text: 1.2.3";
|   r=$'^some text:[ \t]+([0-9.]+)'
|   if [[ "$s" =~ $r ]]; then
|     echo "${BASH_REMATCH[1]}";
|   fi
| )

See http://mywiki.wooledge.org/BashGuide/Patterns#Regular_Expressions-1 and
http://tiswww.case.edu/php/chet/bash/FAQ (E14).
bash-4.3.33 regexp bug
Good day list, Chet -

I think this is a bug:
( set -x ; tab=$'\011'; s="some text: 1.2.3";
  if [[ "$s" =~ ^some text:[\ $tab]+([0-9.]+) ]]; then
    echo "${BASH_REMATCH[1]}";
  fi
)
-bash: syntax error in conditional expression
-bash: syntax error near `$tab]+([0-9.]+)'

Do you agree ?

If not, what sort of regexp should I use to match ':[]+[0-9]+' ?
The problem happens regardless of whether I use the $tab variable
or a literal '\'$'\011' sequence (sorry, I can't type in this mailer).

Thanks in advance for any replies,
Regards,
Jason