Re: Incorrect example for `[[` command.

2019-09-22 Thread Chet Ramey
On 9/21/19 5:34 AM, Ilkka Virta wrote:
> On 21.9. 03:12, hk wrote:
>> Thanks for the reply. I was wrong in my report. It does match values like
>> aab and  aab  in its original form.
> 
> In some systems, yes. (It does that on my Debian, but doesn't work at all
> on my Mac.)
> 
> >> It is syntactically correct as a regular expression.
> 
> [[:space:]]*?(a)b  isn't a well-defined POSIX ERE:
> 
>   9.4.6 EREs Matching Multiple Characters
> 
>   The behavior of multiple adjacent duplication symbols ( '+', '*', '?',
>   and intervals) produces undefined results.

It's ambiguous, but it can be interpreted as valid. I wonder why they used
"undefined" instead of the usual "unspecified."


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Incorrect example for `[[` command.

2019-09-21 Thread Chet Ramey
On 9/20/19 8:12 PM, hk wrote:

> What is wrong is the description `zero or one instances of 'a''. But if we
> correct the right hand side word to be `[[:space:]]*(a)?b' then it does
> match what the description says (the parentheses around `a' could be omitted).

Yeah, that's the typo.

> I was also wrong saying it was a pattern instead of a regular expression.
> It is syntactically correct as a regular expression.

That's true. According to the POSIX ERE definition, the `?' is a special
ERE character in an invalid position (it's only special after a specifier
that matches a single character, not after a separate specifier that
matches multiple characters), so it matches itself.

Chet




Re: Incorrect example for `[[` command.

2019-09-21 Thread Ilkka Virta

On 21.9. 21:55, Dmitry Goncharov wrote:
> On Sat, Sep 21, 2019 at 12:34:39PM +0300, Ilkka Virta wrote:
>> [[:space:]]*?(a)b  isn't a well-defined POSIX ERE:
>>
>>   9.4.6 EREs Matching Multiple Characters
>>
>>   The behavior of multiple adjacent duplication symbols ( '+', '*', '?',
>>   and intervals) produces undefined results.
>>
>> https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap09.html
>
> This is unfortunate.
> *? and +? are widely used non-greedy regexes.


In Perl-compatible regexes. Bash uses POSIX extended regular expressions.

And on a GNU system, while *? and +? don't give errors when used in an 
ERE, they still don't make the repetition non-greedy. They just act the 
same as a single * (as far as I can tell anyway).


 bash$ re='<.+?>'
 bash$ [[ "a<b>c<d>e" =~ $re ]] && echo $BASH_REMATCH
 <b>c<d>
 bash$ [[ "a<>e" =~ $re ]] && echo $BASH_REMATCH
 <>
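
For readers who want the effect of a non-greedy match while staying within a well-defined POSIX ERE, one common workaround is a negated bracket expression. The following is only a sketch; the sample input and the variable names are chosen here for illustration and are not from the thread.

```shell
#!/usr/bin/env bash
# Sketch: emulate a non-greedy match in a POSIX ERE.
# '<', then one or more characters that are not '>', then '>'.
re='<[^>]+>'
input='a<b>c<d>e'    # illustrative input, not from the thread

first_tag=''
if [[ $input =~ $re ]]; then
    # BASH_REMATCH[0] holds the text matched by the whole regex
    first_tag=${BASH_REMATCH[0]}
fi
printf '%s\n' "$first_tag"    # prints <b>
```

Because the bracket expression cannot cross a '>', the match stops at the first closing bracket without relying on adjacent duplication symbols.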

--
Ilkka Virta / itvi...@iki.fi



Re: Incorrect example for `[[` command.

2019-09-21 Thread Dmitry Goncharov via Bug reports for the GNU Bourne Again SHell
On Sat, Sep 21, 2019 at 12:34:39PM +0300, Ilkka Virta wrote:
> [[:space:]]*?(a)b  isn't a well-defined POSIX ERE:
> 
>   9.4.6 EREs Matching Multiple Characters
> 
>   The behavior of multiple adjacent duplication symbols ( '+', '*', '?',
>   and intervals) produces undefined results.
> 
> https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap09.html

This is unfortunate.
*? and +? are widely used non-greedy regexes.

regards, Dmitry



Re: Incorrect example for `[[` command.

2019-09-21 Thread hk
Thanks. Have learnt a lot from your replies.

On Sat, Sep 21, 2019 at 5:34 PM Ilkka Virta wrote:

> On 21.9. 03:12, hk wrote:
> > Thanks for the reply. I was wrong in my report. It does match values like
> > aab and  aab  in its original form.
>
> In some systems, yes. (It does that on my Debian, but doesn't work at
> all on my Mac.)
>
> > It is syntactically correct as a regular expression.
>
> [[:space:]]*?(a)b  isn't a well-defined POSIX ERE:
>
>   9.4.6 EREs Matching Multiple Characters
>
>   The behavior of multiple adjacent duplication symbols ( '+', '*', '?',
>   and intervals) produces undefined results.
>
>
> https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap09.html
>
>
> --
> Ilkka Virta / itvi...@iki.fi
>


Re: Incorrect example for `[[` command.

2019-09-21 Thread Ilkka Virta

On 21.9. 03:12, hk wrote:
> Thanks for the reply. I was wrong in my report. It does match values like
> aab and  aab  in its original form.

In some systems, yes. (It does that on my Debian, but doesn't work at 
all on my Mac.)

> It is syntactically correct as a regular expression.


[[:space:]]*?(a)b  isn't a well-defined POSIX ERE:

  9.4.6 EREs Matching Multiple Characters

  The behavior of multiple adjacent duplication symbols ( '+', '*', '?',
  and intervals) produces undefined results.

https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap09.html


--
Ilkka Virta / itvi...@iki.fi



Re: Incorrect example for `[[` command.

2019-09-20 Thread hk
Thanks for the reply. I was wrong in my report. It does match values like
aab and  aab  in its original form.

What is wrong is the description `zero or one instances of 'a''. But if we
correct the right hand side word to be `[[:space:]]*(a)?b' then it does
match what the description says (the parentheses around `a' could be
omitted).

I was also wrong saying it was a pattern instead of a regular expression.
It is syntactically correct as a regular expression. A word can be
syntactically correct as a pattern and a regular expression at the same
time, though the semantics might be different in most cases.

On Fri, Sep 20, 2019 at 11:48 PM Chet Ramey wrote:

> On 9/20/19 1:40 AM, hk wrote:
>
> > Bash Version: 5.0
> > Patch Level: 0
> > Release Status: release
> >
> > Description:
> > In section 3.2.4.2 of the Bash Reference Manual, the example on
> > [[...]] (page 13 in the PDF) is incorrect. Specifically, the example says
> > [[ $line =~ [[:space:]]*?(a)b ]] will match values like 'aab' and
> > ' aab'. But it won't. The operator is =~, but the operand on the
> > right side is a pattern while it should be a regular expression.
>
> Thanks for the report, this is a good catch. It's been this way since 2011.
>
> It's supposed to be a regular expression, and there's a typo. You're right
> that it doesn't match the same things as if it were interpreted as a shell
> pattern.
>
> The pattern would match the description if it were `[[:space:]]*(a)?b'.
>
> The pattern, once corrected, does match the strings in the example below,
> since, as the description says, it matches "a sequence of characters in the
> value."
>
> The regexp is unanchored, though you can anchor it yourself. That's
> arguably less useful than the anchored case (like, say, grep), but that's
> what you get from regcomp/regexec, and you have $BASH_REMATCH to see what
> you matched.
>
> Chet
>
>


Re: Incorrect example for `[[` command.

2019-09-20 Thread Ilkka Virta

On 20.9. 21:39, Chet Ramey wrote:
> The portion of the manual before the example explains BASH_REMATCH and
> BASH_REMATCH[0]. It also says "a sequence of characters in the value..."
> when describing the pattern.


Yeah, though the preceding paragraph contains both the general 
description of the regex match, and the mention of BASH_REMATCH, so the 
BASH_REMATCH angle could be a bit more explicit.


So I'd probably say that the pattern would match e.g. 'xxx aabyyy', or 
'xxxbyyy' and set $BASH_REMATCH to 'ab', or 'b', respectively. And 
then mention that the ^ and $ anchors could be used.
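
As a rough illustration of what ends up in BASH_REMATCH, here is a minimal sketch. It assumes the corrected regex `[[:space:]]*(a)?b' from elsewhere in the thread; the helper name matched_text and the input strings are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: show which part of a value an unanchored regex matched.
re='[[:space:]]*(a)?b'    # corrected regex discussed in the thread

# matched_text is a hypothetical helper: prints the text matched by $re
# in its first argument, or nothing when there is no match.
matched_text() {
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[0]}"
    fi
}

m1=$(matched_text 'xxx abyyy')   # match starts at the space
m2=$(matched_text 'xxxbyyy')     # only the 'b' matches
printf '[%s]\n' "$m1" "$m2"      # prints [ ab] then [b]
```

The surrounding 'xxx' and 'yyy' never appear in BASH_REMATCH, which is what makes the unanchored behavior visible.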


I know the usual regex behavior is to find a match anywhere within the 
value, but since it's exactly the opposite of how regular pattern 
matches work, it's probably worth mentioning in some way.  (Though I do 
think it's better to document things rather explicitly in general.)



--
Ilkka Virta / itvi...@iki.fi



Re: Incorrect example for `[[` command.

2019-09-20 Thread Chet Ramey
On 9/20/19 9:30 AM, Ilkka Virta wrote:
> On 20.9. 15:48, Greg Wooledge wrote:
>> but after the regex-glob-thing, it says:
>>
>>    That means values like ‘aab’ and ‘ aab’ will match
>>
>> So there's a shift in intent between a? and a+ in what's supposed to be
>> a regular expression.  Although of course the sentence is *literally*
>> true because the regex would be unanchored, and therefore it's sufficient
>> to match only the 'ab', and the rest of the input doesn't matter.
>> But that's just confusing, and doesn't belong in this kind of document.
> 
> It goes on to say "as will a line containing a 'b' anywhere in its value",
> so the text does recognize the zero-width-matching parts don't affect what
> matches. (I suppose they would affect what goes to BASH_REMATCH[0], but the
> text doesn't mention that.)

The portion of the manual before the example explains BASH_REMATCH and
BASH_REMATCH[0]. It also says "a sequence of characters in the value..."
when describing the pattern. This is the usual behavior of regcomp/regexec
(and grep/egrep, for that matter, since grep will print lines when a
substring matches the supplied pattern).
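
The grep comparison can be tried directly. This is a small sketch using the corrected regex and made-up input lines; -E and -x are standard grep options, where -x requires the whole line to match.

```shell
#!/usr/bin/env bash
# Sketch: grep prints a line when any substring matches the regex.
re='[[:space:]]*(a)?b'
lines=$(printf '%s\n' 'xxx abyyy' 'b' 'zzz')   # illustrative input

# Default behavior: substring match, like regexec without anchors.
substring_hits=$(printf '%s\n' "$lines" | grep -E "$re")

# With -x the regex must match the entire line.
whole_line_hits=$(printf '%s\n' "$lines" | grep -E -x "$re")

printf 'substring:\n%s\n' "$substring_hits"
printf 'whole line:\n%s\n' "$whole_line_hits"
```

Here the substring search reports 'xxx abyyy' and 'b', while the whole-line search reports only 'b'.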

> I think it would be a better example with the anchored version also
> presented for comparison.

How about saying you can anchor the match with the usual ^ and $ special
pattern characters?
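
A side-by-side comparison might look like the sketch below. The test values and variable names are illustrative only, and the regex is the corrected form proposed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: compare the unanchored regex with a ^...$ anchored version.
unanchored='[[:space:]]*(a)?b'
anchored='^[[:space:]]*(a)?b$'

results=()
for value in 'b' ' ab' 'aab' 'xxxbyyy'; do
    u=no
    a=no
    if [[ $value =~ $unanchored ]]; then u=yes; fi
    if [[ $value =~ $anchored ]]; then a=yes; fi
    results+=("$value:$u:$a")
    printf '%-10s unanchored=%s anchored=%s\n' "'$value'" "$u" "$a"
done
```

All four values match the unanchored regex because each contains a 'b'. Only 'b' and ' ab' match once the anchors are added.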

Chet




Re: Incorrect example for `[[` command.

2019-09-20 Thread Chet Ramey
On 9/20/19 1:40 AM, hk wrote:

> Bash Version: 5.0
> Patch Level: 0
> Release Status: release
> 
> Description:
> In section 3.2.4.2 of the Bash Reference Manual, the example on
> [[...]] (page 13 in the PDF) is incorrect. Specifically, the example says
> [[ $line =~ [[:space:]]*?(a)b ]] will match values like 'aab' and
> ' aab'. But it won't. The operator is =~, but the operand on the
> right side is a pattern while it should be a regular expression.

Thanks for the report, this is a good catch. It's been this way since 2011.

It's supposed to be a regular expression, and there's a typo. You're right
that it doesn't match the same things as if it were interpreted as a shell
pattern.

The pattern would match the description if it were `[[:space:]]*(a)?b'.

The pattern, once corrected, does match the strings in the example below,
since, as the description says, it matches "a sequence of characters in the
value."

The regexp is unanchored, though you can anchor it yourself. That's
arguably less useful than the anchored case (like, say, grep), but that's
what you get from regcomp/regexec, and you have $BASH_REMATCH to see what
you matched.

Chet




Re: Incorrect example for `[[` command.

2019-09-20 Thread Ilkka Virta

On 20.9. 15:48, Greg Wooledge wrote:
> but after the regex-glob-thing, it says:
>
>    That means values like ‘aab’ and ‘ aab’ will match
>
> So there's a shift in intent between a? and a+ in what's supposed to be
> a regular expression.  Although of course the sentence is *literally*
> true because the regex would be unanchored, and therefore it's sufficient
> to match only the 'ab', and the rest of the input doesn't matter.
> But that's just confusing, and doesn't belong in this kind of document.


It goes on to say "as will a line containing a 'b' anywhere in its 
value", so the text does recognize the zero-width-matching parts don't 
affect what matches. (I suppose they would affect what goes to 
BASH_REMATCH[0], but the text doesn't mention that.)


I think it would be a better example with the anchored version also 
presented for comparison.


--
Ilkka Virta / itvi...@iki.fi



Re: Incorrect example for `[[` command.

2019-09-20 Thread Greg Wooledge
On Fri, Sep 20, 2019 at 01:40:00PM +0800, hk wrote:
> Description:
> In section 3.2.4.2 of the Bash Reference Manual, the example on
> [[...]] (page 13 in the PDF) is incorrect. Specifically, the example says
> [[ $line =~ [[:space:]]*?(a)b ]] will match values like 'aab' and
> ' aab'. But it won't. The operator is =~, but the operand on the
> right side is a pattern while it should be a regular expression.

Nice catch.

Actually it's a mixture of a regular expression (the postfix * closure
operator) and an extended glob (the ?() enclosing syntax).  So it wouldn't
work with either = or =~ .
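
To make the two syntaxes concrete, here is a sketch that writes the manual's description ("any number of space characters, zero or one 'a', then a 'b'") once as an extended glob and once as an anchored ERE. Both spellings are assumptions about the intended meaning, not text from the manual, and the test values are made up.

```shell
#!/usr/bin/env bash
# Sketch: the same description as an extended glob and as an ERE.
shopt -s extglob                 # enable extended globs explicitly

glob='*([[:space:]])?(a)b'       # glob: always matches the whole value
regex='^[[:space:]]*(a)?b$'      # ERE: anchored to behave the same way

outcomes=()
for value in 'b' ' ab' 'aab' 'xab'; do
    g=no
    r=no
    # Unquoted right-hand sides so they are treated as pattern and regex
    if [[ $value == $glob ]]; then g=yes; fi
    if [[ $value =~ $regex ]]; then r=yes; fi
    outcomes+=("$value:$g:$r")
done
printf '%s\n' "${outcomes[@]}"
```

In the glob the repetition operators come before the parenthesized group, while in the ERE they follow the item they apply to, which is why mixing the two styles in one word gives a result that is neither.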

Even more subtly wrong, the sentence before the first instance of this
regex-glob-thing says:

  any number, including zero, of space characters, zero or one instances
  of ‘a’, then a ‘b’

but after the regex-glob-thing, it says:

  That means values like ‘aab’ and ‘ aab’ will match

So there's a shift in intent between a? and a+ in what's supposed to be
a regular expression.  Although of course the sentence is *literally*
true because the regex would be unanchored, and therefore it's sufficient
to match only the 'ab', and the rest of the input doesn't matter.
But that's just confusing, and doesn't belong in this kind of document.



Incorrect example for `[[` command.

2019-09-20 Thread hk
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux hk 4.15.0-62-generic #69-Ubuntu SMP Wed Sep 4 20:55:53
UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 0
Release Status: release

Description:
In section 3.2.4.2 of the Bash Reference Manual, the example on
[[...]] (page 13 in the PDF) is incorrect. Specifically, the example says
[[ $line =~ [[:space:]]*?(a)b ]] will match values like 'aab' and
' aab'. But it won't. The operator is =~, but the operand on the
right side is a pattern while it should be a regular expression.
