Re: parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour
On Mon, 2017-12-04 at 16:49 -0500, Chet Ramey wrote:
> The thing is, bash doesn't "implement" its regular expressions, per se.
> Bash uses the Posix standard library functions (regcomp/regexec) if they
> are available in the C library when it's configured and built. I'm not
> wild about adding a dependency on pcre, or a configure test for it, just
> to have two varieties of regular expressions available.
>
> Chet

O.k., so close this as „not a bug“.

--
H.-Dirk Schmitt
Dipl.Math.
eMail: dirk.schm...@computer42.org
mobile: +49 177 616 8564
phone: +49 2642 99 41 14
fax: +49 2642 99 41 15
Schillerstr. 42, D-53489 Sinzig
pgp: http://www.computer42.org/~dirk/OpenPGP-fingerprint.html
Re: parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour
On 12/4/17 1:42 PM, H.-Dirk Schmitt wrote:
> From the 2 replies I understand that the implementation in bash is
> correct according to the „official“ standard.
>
> For myself I have solved the issue in my script - but the regular
> expressions I developed for my problem are, without the 'non-greedy'
> operator, more difficult to read and maintain. From that point of view
> it would be an improvement for bash to implement the non-greedy
> operator.

The thing is, bash doesn't "implement" its regular expressions, per se.
Bash uses the Posix standard library functions (regcomp/regexec) if they
are available in the C library when it's configured and built. I'm not
wild about adding a dependency on pcre, or a configure test for it, just
to have two varieties of regular expressions available.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
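For scripts that only need "shortest match" splitting, one option that avoids regular expressions altogether is bash parameter expansion, where `#` and `%%` control how much of a glob pattern is removed. The following is only a sketch: the URL is a made-up stand-in (not the one from the report), and the variable names `rest`, `first`, `after` and `second` are invented for illustration.

```shell
#!/bin/bash
# Hypothetical URL, for illustration only.
url='http://toolbox.contentspread.net/container/track/xx.dyn?csRdu=https://www.example.org/?anid=M99&src=mail'

# Remove the shortest prefix that ends in the host name and slash.
rest=${url#*://toolbox.contentspread.net/}

# Everything before the first '=' (remove the longest suffix starting at '=').
first=${rest%%=*}

# Everything after the first '=' (remove the shortest prefix ending in '=').
after=${rest#*=}

# Everything before the first '&' in what remains.
second=${after%%&*}

printf '1 -> %s\n2 -> %s\n' "$first" "$second"
```

These patterns are shell globs, not regular expressions, so the dots in the host name are literal here. Unlike `=~`, this approach does not check that the URL has the expected shape, so a separate test may still be needed.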
Re: parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour
From the 2 replies I understand that the implementation in bash is correct according to the „official“ standard.

For myself I have solved the issue in my script - but the regular expressions I developed for my problem are, without the 'non-greedy' operator, more difficult to read and maintain. From that point of view it would be an improvement for bash to implement the non-greedy operator.

Also, from the point of view of a „normal developer“, I think it is a common pitfall, since many testing tools and regexp implementations support the 'non-greedy' operator.

Maybe there could be a switch/option to enable the 'non-greedy' operator in a future release.

So please feel free to change the „bug report“ to a „feature request“ ;-)

Best Regards,
H.-Dirk Schmitt

On Sun, 2017-12-03 at 15:23 -0500, Chet Ramey wrote:
> On 12/1/17 12:40 PM, d...@computer42.org wrote:
>
> > Bash Version: 4.4
> > Patch Level: 12
> > Release Status: release
> >
> > Description:
> > I'm sanitising urls from advertisement crap. As described below
> > I'm getting a wrong resolution of parenthesised expression defined
> > with non-greedy operator '?'.
> >
> > The test url is: http://toolbox.contentspread.net/container/medim
> > ops/track/xx.dyn?csRdu=https://www.medimops.de/?anid=M9
> > 9=details=M99_source=CRM_medium=email
> > m_campaign=OS
> >
> > The regular expression is:
> > https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*
>
> The Bash =~ operator uses Posix extended regexps (EREs) as defined in
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04.
> There's no concept of a `non-greedy' operator
> in the Posix ERE definition.
>
> Chet
Re: parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour
On 12/1/17 12:40 PM, d...@computer42.org wrote:

> Bash Version: 4.4
> Patch Level: 12
> Release Status: release
>
> Description:
> I'm sanitising urls from advertisement crap. As described below I'm getting
> a wrong resolution of parenthesised expression defined with non-greedy
> operator '?'.
>
> The test url is:
> http://toolbox.contentspread.net/container/medimops/track/xx.dyn?csRdu=https://www.medimops.de/?anid=M99=details=M99_source=CRM_medium=email_campaign=OS
>
> The regular expression is:
> https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*

The Bash =~ operator uses Posix extended regexps (EREs) as defined in
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04.
There's no concept of a `non-greedy' operator in the Posix ERE definition.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
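To make the point about ERE quantifiers concrete, here is a small sketch of how `*` and `+` behave under `=~`. The input string is made up for illustration, and the results noted in the comments assume a typical regexec implementation such as the one in glibc.

```shell
#!/bin/bash
# Made-up input, not taken from the report.
s='k1=v1&k2=v2&end'

# In an ERE, * and + always take as much as they can.
re='(.*)=(.+)&'

if [[ $s =~ $re ]]; then
    g1=${BASH_REMATCH[1]}
    g2=${BASH_REMATCH[2]}
    echo "1 -> $g1"   # k1=v1&k2 (runs to the last '=' that still allows a match)
    echo "2 -> $g2"   # v2
fi
```

This mirrors the pattern seen in the report: the first group extends to the last '=' that still leaves something for the second group and the '&' to match.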
Re: parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour
On Fri, Dec 01, 2017 at 06:40:35PM +0100, d...@computer42.org wrote:
> I'm sanitising urls from advertisement crap. As described below I'm getting
> a wrong resolution of parenthesised expression defined with non-greedy
> operator '?'.

> re='https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*'
>
> if [[ ${url} =~ ${re} ]]

Bash's =~ operator uses Extended Regular Expressions. There is no
non-greedy operator (.*? or .+?) in an ERE. It's a perl extension.

Also, you don't need to escape / but you *do* need to escape dots.
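One common way to get the intended captures with a plain ERE is to replace each non-greedy quantifier with a negated bracket expression that cannot run past the delimiter. The sketch below applies that idea together with the escaping advice above. The URL is a hypothetical stand-in (not the one from the report), and the variable names `key` and `target` are invented for illustration.

```shell
#!/bin/bash
# Hypothetical URL, for illustration only.
url='http://toolbox.contentspread.net/container/track/xx.dyn?csRdu=https://www.example.org/?anid=M99&src=mail&campaign=OS'

# [^=]* cannot cross the first '=', and [^&]+ cannot cross the first '&'.
# Dots are escaped so they match literally; slashes need no escaping.
re='^https?://toolbox\.contentspread\.net/([^=]*)=([^&]+)&.*'

if [[ $url =~ $re ]]; then
    key=${BASH_REMATCH[1]}
    target=${BASH_REMATCH[2]}
    printf '1 -> %s\n2 -> %s\n' "$key" "$target"
fi
```

The regex is kept in a variable and expanded unquoted on the right-hand side of `=~`, as in the original script, so that it is treated as a pattern rather than a literal string.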
parenthesised regular expressions and non-greedy operator ? - non standard bash behaviour
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I../. -I.././include -I.././lib -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wall -no-pie -Wno-parentheses -Wno-format-security
uname output: Linux dilbert 4.10.0-41-generic #45~16.04.1-Ubuntu SMP Fri Nov 24 15:06:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 4.4
Patch Level: 12
Release Status: release

Description:
    I'm sanitising urls from advertisement crap. As described below I'm getting
    a wrong resolution of parenthesised expression defined with non-greedy
    operator '?'.

    The test url is:
    http://toolbox.contentspread.net/container/medimops/track/xx.dyn?csRdu=https://www.medimops.de/?anid=M99=details=M99_source=CRM_medium=email_campaign=OS

    The regular expression is:
    https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*

    As I understand the specification and verified with 'visual regexp' and
    https://regex101.com/ the result should be:
    1 → container/medimops/track/xx.dyn?csRdu
    2 → https://www.medimops.de/?anid=M99

    Running the script below I got instead:
    1 → container/medimops/track/xx.dyn?csRdu=https://www.medimops.de/?anid=M99=details=M99_source=CRM_medium
    2 → email

Repeat-By:
    Test script:

    #!/bin/bash
    url='http://toolbox.contentspread.net/container/medimops/track/xx.dyn?csRdu=https://www.medimops.de/?anid=M99=details=M99_source=CRM_medium=email_campaign=OS'
    re='https?:\/\/toolbox.contentspread.net\/(.*?)=(.+?)&.*'

    if [[ ${url} =~ ${re} ]]
    then
        echo "0 → ${BASH_REMATCH[0]}"
        echo "1 → ${BASH_REMATCH[1]}"
        echo "2 → ${BASH_REMATCH[2]}"
    fi