So the consensus is

- it happens equally in 1.9.0 and 1.9.1 alpha current
- it happens in the C locale
- it is random and bursty, as in

> d
  [1] 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84
 [25] 84 84 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
 [49] 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
 [73] 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
 [97] 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
[121] 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
[145] 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 84 84 84 84 84 84
[169] 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84
[193] 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 84 13 13 84 84 84 13 13 13
[217] 84 84 84 13 13 13 84 84 84 13 13 13 84 84 84 13 13 13 13 13 13 13 13 13
...

So looks like a problem in the PCRE compiled code.

On Fri, 11 Jun 2004, Marc Schwartz wrote:

> On Fri, 2004-06-11 at 10:28, Prof Brian Ripley wrote:
> > This is actually PCRE. Something is wrong with your build of R-patched
> > (1.9.1 alpha, I assume): I get 84 everywhere. You are asking for a first
> > character l, then one or more characters of `word' then tmean. In your
> > example this is the same as (in a suitable locale, including C)
> >
> > length(grep("^l[A-Za-z0-9]+tmean", x, perl = TRUE, value = TRUE))

I omitted _ there, not that it mattered.

> > length(grep("^l[[:alnum:]_]+tmean", x, perl = TRUE, value = TRUE))
> >
> > which each give 84.
> >
> > One issue: PCRE is locale-dependent. Did you use the same locale for
> > each? What happens if you force LANG=C?
> >
> > (I've just checked an R-devel Solaris system. This gave 13 on a build
> > from Weds, and 84 when remade today. The result with 13 seems truncated,
> > as they are the first 13. Might be coincidental, of course.)
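
On the locale question, a minimal sketch of one way to pin the session to the C locale before repeating the count. The vector x here is a small hypothetical stand-in, not the real names from names.R, and old and n are names introduced only for this illustration:

```r
# Remember the locale the session is currently using for character classes.
old <- Sys.getlocale("LC_CTYPE")

# Switch character classification to the C locale for the test.
Sys.setlocale("LC_CTYPE", "C")

# Hypothetical stand-in for x (the real data are not reproduced here).
x <- c("l1tmean", "l_2tmean", "tmean", "lxtmin")

# Same pattern as in the thread: l, one or more word characters, then tmean.
n <- length(grep("^l\\w+tmean", x, perl = TRUE, value = TRUE))
print(n)

# Restore the original locale.
Sys.setlocale("LC_CTYPE", old)
```

Wrapping the grep() call in replicate() as in the quoted example would then show whether the count still varies with the locale fixed.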
>
> The above is confirmed using Version 1.9.1 alpha (2004-06-10) on FC2:
>
> > x <- dget(file = url("http://www.biostat.jhsph.edu/~rpeng/names.R"))
>
> > length(grep("^l[A-Za-z0-9]+tmean", x, perl = TRUE, value = TRUE))
> [1] 84
>
> > length(grep("^l[[:alnum:]_]+tmean", x, perl = TRUE, value = TRUE))
> [1] 84
>
>
> Also, to demonstrate Roger's follow up example:
>
> > d <- replicate(1000, length(grep("^l\\w+tmean", x, perl = TRUE, value = TRUE)))
>
> > summary(d)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   13.00   13.00   13.00   14.14   13.00   84.00

table(d) is more informative.

> BTW: pcre-4.5-2

Did you use --with-pcre, though?

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
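
P.S. A minimal sketch of why table(d) says more than summary(d), and of how rle() makes the burstiness visible. It does not rerun grep(): d is rebuilt from the run lengths I read off the first 240 values printed at the top of this message (the final run is cut off by the "..." there), and r is just a name chosen for this illustration.

```r
# First 240 values of d, transcribed as (value, run length) pairs from the
# printout at the top of the message. This is a transcription, not a new run.
d <- rep(c(84, 13, 84, 13, 84, 13, 84, 13, 84, 13, 84, 13),
         times = c(26, 136, 46, 2, 3, 3, 3, 3, 3, 3, 3, 9))

# Counts of each distinct result: only two values ever occur.
print(table(d))

# Run-length encoding: consecutive identical results collapse into one row,
# so long bursts and short alternations are easy to tell apart.
r <- rle(d)
print(data.frame(value = r$values, length = r$lengths))
```

On the real d <- replicate(1000, ...) vector the same two calls apply unchanged.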