May I chip in at this point? I agree the bug report was invalid, but as far as I can see many of the replies missed the point. It isn't the backslash escape that "Fan" is *mainly* confused about (although he obviously is confused about that too), but the use of the different brackets: [] and ().
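
For what it's worth, the difference between the two kinds of brackets is easy to check with grep -E. The following is only a minimal sketch under my own assumptions: the input lines "x|y" and "u-v" and the helper name match are additions for illustration and do not come from the thread, and I have left the backslash out of the patterns because its meaning in that position is undefined in POSIX.

```shell
# "a-a" and "b" come from the thread; "x|y" and "u-v" are extra lines
# added here (an assumption) to expose how "|" and "-" are treated.
input=$(printf 'a-a\nb\nx|y\nu-v\n')

# Hypothetical helper: print the lines of $input matched by an ERE.
# "|| true" keeps a no-match result from counting as a failure.
match() {
  printf '%s\n' "$input" | LC_ALL=C grep -E -e "$1" || true
}

# Bracket expression: ONE character from a set. "|-|" is a range from
# "|" to "|", so the set is { a, |, c } and contains no hyphen.
class_mid=$(match '[a|-|c]')

# Group with alternation: "a" OR "-" OR "c".
group=$(match '(a|-|c)')

# Hyphen placed first in the bracket expression is literal,
# so the set is { -, a, |, c }.
class_first=$(match '[-a|c]')

printf '[a|-|c] matches:\n%s\n\n' "$class_mid"
printf '(a|-|c) matches:\n%s\n\n' "$group"
printf '[-a|c] matches:\n%s\n' "$class_first"
```

Only the parenthesised form treats "|" as "or". Inside [] it is just another member of the set, and "-" is literal only when it comes first or last.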

He/She was expecting this:

  egrep '[a|\-|c]' foo.txt

to work the same as:

  egrep '(a|\-|c)' foo.txt

which it does not. The two are totally different: the first is a bracket expression that matches a single character from a set, while the second is a group of alternatives separated by "|". (He doesn't know the proper use of "|" either, so we have basically established that "Fan" doesn't understand how \, |, [] and () are used in regular expressions.)

HTL

[EMAIL PROTECTED] wrote:
> Both Solaris 8 grep and GNU grep 2.5.1 give
>
> gannet% cat > foo.txt
> a-a
> b
> gannet% egrep '[d|-|c]' foo.txt
> gannet% egrep '[-|c]' foo.txt
> a-a
>
> agreeing exactly with R (and the POSIX standard) and contradicting 'Fan'.
>
>
> On Thu, 4 Jan 2007, Fan wrote:
>
>> Let me detail a bit my bug report:
>>
>> the two commands ("expected" vs "strange") should return the
>> same result, the objective of the commands is to test the presence
>> of several characters, '-'included.
>>
>> The order in which we specify the different characters must not be
>> an issue, i.e., to test the presence of several characters, including
>> say char_1, the regular expressions [char_1|char_2|char_3] and
>> [char_2|char_1|char_3] should play the same role. Other softwares
>> work just like this.
>>
>> What's reported is that R actually returns different result for the
>> character "-" (\- in a RE) regarding it's position in the regular
>> expression, and the "perl" option would not be relevant.
>
> As described in the relevant international standard and R's own
> documentation.
>
>> Prof Brian Ripley wrote:
>>> Why do you think this is a bug in R? You have not told us what you
>>> expected, but the character range |-| contains only | . Not agreeing with
>>> your expectations (unstated or otherwise) is not a bug in R.
>>>
>>> \- is the same as -, and - is special in character classes. (If it is
>>> first or last it is treated literally.) And | is not a metacharacter
>>> inside a character class.
>>> Also,
>>>
>>>> grep("[d\\-c]", c("a-a","b"))
>>> [1] 1 2
>>>
>>>> grep("[d\\-c]", c("a-a","b"), perl=TRUE)
>>> [1] 1
>>>
>>> shows that escaping - works only in perl (which you will find from the
>>> background references mentioned, e.g.
>>>
>>> The interpretation of an ordinary character preceded by a backslash
>>> ('\') is undefined.
>>>
>>> .)
>>>
>>> This is all carefully documented in ?regexp, e.g.
>>>
>>> Patterns are described here as they would be printed by 'cat': do
>>> remember that backslashes need to be doubled in entering R
>>> character strings from the keyboard.
>>>
>>>
>>> This is not the first time you have wasted our resources with false bug
>>> reports, so please show more respect for the R developers' time.
>>> You were also explicitly asked not to report on obselete versions of R.
>>>
>>> On Wed, 3 Jan 2007, [EMAIL PROTECTED] wrote:
>>>
>>>> Full_Name: FAN
>>>> Version: 2.4.0
>>>> OS: Windows
>>>> Submission from: (NULL) (159.50.101.9)
>>>>
>>>>
>>>> These are expected:
>>>>
>>>>> grep("[\-|c]", c("a-a","b"))
>>>> [1] 1
>>>>
>>>>> gsub("[\-|c]", "&", c("a-a","b"))
>>>> [1] "a&a" "b"
>>>>
>>>> but these are strange:
>>>>
>>>>> grep("[d|\-|c]", c("a-a","b"))
>>>> integer(0)
>>>>
>>>>> gsub("[d|\-|c]", "&", c("a-a","b"))
>>>> [1] "a-a" "b"
>>>>
>>>> Thanks
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>