On Fri, 2004-06-11 at 10:28, Prof Brian Ripley wrote:
> This is actually PCRE. Something is wrong with your build of R-patched
> (1.9.1 alpha, I assume): I get 84 everywhere. You are asking for a first
> character l, then one or more characters of `word' then tmean. In your
> example this is the same as (in a suitable locale, including C)
>
> length(grep("^l[A-Za-z0-9]+tmean", x, perl = TRUE, value = TRUE))
> length(grep("^l[[:alnum:]_]+tmean", x, perl = TRUE, value = TRUE))
>
> which each give 84.
>
> One issue: PCRE is locale-dependent. Did you use the same locale for
> each? What happens if you force LANG=C?
>
> (I've just checked an R-devel Solaris system. This gave 13 on a build
> from Weds, and 84 when remade today. The result with 13 seems truncated,
> as they are the first 13. Might be coincidental, of course.)

The above is confirmed using Version 1.9.1 alpha (2004-06-10) on FC2:

> x <- dget(file = url("http://www.biostat.jhsph.edu/~rpeng/names.R"))
> length(grep("^l[A-Za-z0-9]+tmean", x, perl = TRUE, value = TRUE))
[1] 84
> length(grep("^l[[:alnum:]_]+tmean", x, perl = TRUE, value = TRUE))
[1] 84

Also, to demonstrate Roger's follow-up example:

> d <- replicate(1000, length(grep("^l\\w+tmean", x, perl = TRUE, value = TRUE)))
> summary(d)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  13.00   13.00   13.00   14.14   13.00   84.00

BTW: pcre-4.5-2

HTH,

Marc Schwartz