On Wed, Mar 5, 2008 at 6:40 PM, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 5, 2008 at 6:18 PM, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> > On 05/03/2008 8:56 PM, Henrik Bengtsson wrote:
> > > Hi,
> > >
> > > just curious, but does anyone know the source/reason of observing the
> > > following error on OSX but not on WinXP and Linux?
> >
> > Presumably in the locale you're using on OSX, "a" < "Z" is false.  This
> > is the ascii sort order used in the C locale.  On my Windows box, "a" <
> > "Z" is true, because it uses the English_Canada.1252 collation order.
>
> That's it indeed.  The person who first reported the error had
> sessionInfo() locale
> 'en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8' and I
> missed that 'C' in the middle, which I guess his system falls back to
> if none of the previous ones exist?!?
>
> Now I can reproduce it on both Windows and Linux:
>
> > Sys.setlocale("LC_ALL", "C")
> [1] "C"
> > regexpr("[a-Z]", "foo")
> Error in regexpr("[a-Z]", "foo") : invalid regular expression '[a-Z]'
> > Sys.setlocale("LC_ALL", "en")
> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> > regexpr("[a-Z]", "foo")
> [1] 1
> attr(,"match.length")
> [1] 1
>
> Case almost closed, but then the question is why don't you get an
> error in one of the two cases '[a-Z]' and '[A-z]' then with the other
> locale(s)?
>
> > Sys.setlocale("LC_ALL", "en")
> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> > regexpr("[a-Z]", "foo")
> [1] 1
> attr(,"match.length")
> [1] 1
> > regexpr("[A-z]", "foo")
> [1] 1
> attr(,"match.length")
> [1] 1
> > "a" < "Z"
> [1] TRUE
> > "a" > "Z"
> [1] FALSE
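The explanation above comes down to comparing the two range endpoints in byte (code-point) order, which is what the C locale does for single ASCII characters. A minimal sketch of that check follows; `code_point_range_ok` is a hypothetical helper for illustration only, and `utf8ToInt()` is used because, unlike `"a" < "Z"`, its result does not depend on the collation locale.

```r
# Hypothetical helper (not part of base R): TRUE if the endpoints of a
# bracket range such as [from-to] are in ascending code-point order,
# which is the order the C locale uses for single ASCII characters.
code_point_range_ok <- function(from, to) {
  utf8ToInt(from) <= utf8ToInt(to)
}

utf8ToInt("a")                 # 97
utf8ToInt("Z")                 # 90
code_point_range_ok("a", "Z")  # FALSE: '[a-Z]' runs backwards
code_point_range_ok("A", "z")  # TRUE: '[A-z]' is ascending, but 65..122
                               # also covers [ \ ] ^ _ and the backtick (91..96)
```

The last line hints at why '[a-zA-Z]' is preferable even where '[A-z]' is accepted: the single wide range silently includes six punctuation characters.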
My bad...

> "A" < "z"
[1] TRUE
> > regexpr("[A-z]", "foo")
> [1] 1
> attr(,"match.length")
> [1] 1

> "z" < "A"
[1] FALSE
> regexpr("[z-A]", "foo")
Error in regexpr("[z-A]", "foo") : invalid regular expression '[z-A]'

Case closed

/Henrik

>
> Thanks
>
> /Henrik
>
> >
> > Duncan Murdoch
> >
> > >
> > > I've tried with a
> > > few different versions of R (v2.5.1, v2.6.1, v2.6.2, v2.7.0devel).
> > > The locale does not seem to affect the error, i.e. I've tested a few
> > > different ones and it is still only OSX that gives the error but not
> > > the other two.
> > >
> > >> regexpr("[a-Z]", "foo")
> > > Error in regexpr(pattern, text, extended, fixed, useBytes) :
> > >   invalid regular expression '[a-Z]'
> > >> regexpr("[a-zA-Z]", "foo")
> > > [1] 1
> > > attr(,"match.length")
> > > [1] 1
> > >> regexpr("[A-z]", "foo")
> > > [1] 1
> > > attr(,"match.length")
> > > [1] 1
> > >
> > > At least now I know that the safest is to use '[a-zA-Z]' (or
> > > possibly '[[:alpha:]]').
> > >
> > > /Henrik
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
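Following the conclusion quoted above, here is a minimal sketch of the two locale-robust patterns, run with collation temporarily set to the C locale as in Henrik's reproduction. It assumes a current R; the thread used R 2.5.1 to 2.7.0devel, and because the handling of an invalid range such as '[a-Z]' may differ between versions, that case is deliberately not shown. The vector `x` is made-up sample input.

```r
# Temporarily switch collation to the C locale, as in the reproduction
# above, and remember the previous setting so it can be restored.
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))

x <- c("foo", "Foo", "123", "_bar")  # made-up sample input

# Two explicit ranges, each ascending in byte order.
grepl("^[a-zA-Z]", x)     # TRUE TRUE FALSE FALSE

# POSIX character class: letters as defined by the character-type locale.
grepl("^[[:alpha:]]", x)  # TRUE TRUE FALSE FALSE

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Neither pattern relies on how the locale orders lower- versus upper-case letters, which is why both behave the same regardless of the collation setting.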