>>>>> <dietmar.schind...@manroland-web.com> >>>>> on Tue, 4 Apr 2017 08:45:30 +0000 writes:
> Dear Sirs,
> while

>> regexpr('(.{1,2})\\1', 'foo')
> [1] 2
> attr(,"match.length")
> [1] 2
> attr(,"useBytes")
> [1] TRUE

> yields the correct match, an incremented upper bound in

>> regexpr('(.{1,3})\\1', 'foo')
> [1] -1
> attr(,"match.length")
> [1] -1
> attr(,"useBytes")
> [1] TRUE

> incorrectly yields no match.

Hmm, yes, I would also say that this is incorrect
(though I'm always cautious: The ?regex help page explicitly
mentions greedy repetitions, and these can "bite you" ..)

The behavior is also different from the perl=TRUE one, which is
correct (according to the above understanding).

Using grep() instead of regexpr() makes the behavior easier to parse.
The following code

----------------------------------------------------------------------

tx <- c("ab","abc", paste0("foo", c("", "b", "o", "bar", "oofy")))
setNames(nchar(tx), tx)
##      ab     abc     foo    foob    fooo  foobar foooofy
##       2       3       3       4       4       6       7

grep1r <- function(n, txt, ...) {
    pattern <- paste0('(.{1,',n,'})\\1', collapse="") ## can have empty n
    ans <- grep(pattern, txt, value=TRUE, ...)
    cat(sprintf("pattern '%s' : ", pattern)); print(ans, quote=FALSE)
    invisible(ans)
}

grep1r({}, tx)# '.{1,}' : because of _greedy_ matching there is __no__ repetition!
grep1r(100,tx)# i.e., these both give an empty match :   character(0)

## matching at most once:
grep1r(1, tx)# matches all 5 starting with "foo"
grep1r(2, tx)# ditto : all have more than 2 chars
grep1r(3, tx)# not "foo": those with more than 3 chars
grep1r(4, tx)# .. those with more than 4 characters
grep1r(5, tx)# .. those with more than 5 characters
grep1r(6, tx)# .. those with more than 6 characters
grep1r(7, tx)# NONE (= those with more than 7 characters)

for(p in c(FALSE,TRUE)) {
    cat("\ngrep(*, perl =", p, ") :\n")
    for(n in c(list(NULL), 1:7))
        grep1r(n, tx, perl = p)
}

----------------------------------------------------------------------

ends with

> for(p in c(FALSE,TRUE)) {
+     cat("\ngrep(*, perl =", p, ") :\n")
+     for(n in c(list(NULL), 1:7))
+         grep1r(n, tx, perl = p)
+ }

grep(*, perl = FALSE ) :
pattern '(.{1,})\1' : character(0)
pattern '(.{1,1})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,2})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,3})\1' : [1] foob    fooo    foobar  foooofy
pattern '(.{1,4})\1' : [1] foobar  foooofy
pattern '(.{1,5})\1' : [1] foobar  foooofy
pattern '(.{1,6})\1' : [1] foooofy
pattern '(.{1,7})\1' : character(0)

grep(*, perl = TRUE ) :
pattern '(.{1,})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,1})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,2})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,3})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,4})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,5})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,6})\1' : [1] foo     foob    fooo    foobar  foooofy
pattern '(.{1,7})\1' : [1] foo     foob    fooo    foobar  foooofy
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel