On 11/17/10 6:06 PM, "Joris Meys" <jorism...@gmail.com> wrote:

>Indeed, I get it. If the pattern is "xx", it is only matched against 2
>letters at the same time. All the rest doesn't matter. But still that
>doesn't explain
>
>>agrep("ANNTCG", "ANNXXTCG", max = list(ins=3))
>integer(0)
>>agrep("ANNTCG", "ANNXTCG", max = list(ins=3))
>[1] 1
>>agrep("ANNTCG", "ANTCG", max = list(del=3))
>[1] 1
>>agrep("ANNTCG", "ATCG", max = list(del=3))
>integer(0)

It looks like R's agrep defaults max.distance$all to 0.1 when the argument
does not specify it, so that explains these examples: the first and last
ones have a net distance of 2, which exceeds
ceiling(0.1 * nchar(pattern)) = 1 for the 6-character pattern.
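
As a rough sketch of that arithmetic (the variable names are mine, not taken
from agrep's source, and I have not checked the last call against every R
version), the default overall limit for this pattern works out as follows:

```r
# Illustrative only: `pattern` and `budget` are names chosen for this sketch.
pattern <- "ANNTCG"

# Default overall limit described above: 10% of the pattern length, rounded up.
budget <- ceiling(0.1 * nchar(pattern))   # ceiling(0.6), i.e. 1
print(budget)

# Assumption: giving "all" explicitly alongside "ins" should lift the overall
# limit, so that two insertions are no longer rejected by the 10% default.
agrep(pattern, "ANNXXTCG", max.distance = list(all = 2, ins = 3))
```

If the explanation is right, the per-operation limits (ins, del) only matter
once the overall "all" limit is large enough to admit the edit in the first
place.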

The attachment is a completely untested fix that turns the pattern into a
regex (I haven't yet succeeded in setting up an environment to compile R
from source).  Since TRE defaults to Basic POSIX regex syntax, in theory
only backslashes in the user-provided pattern need to be escaped, and \^
and \$ added to the pattern.  Hopefully somebody can review this to see if
it looks correct.
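
To make the idea concrete, here is a minimal R-level sketch of the same
transformation. It is not the attached patch (which changes the C code), the
helper name is hypothetical, and it assumes the anchors are written as plain
^ and $ as in POSIX basic syntax:

```r
# Hypothetical helper, not the attached patch: turns a literal string into an
# anchored basic-regex pattern, under the assumption stated above that only
# backslashes need escaping.
literal_to_anchored_regex <- function(pattern) {
  # Double every backslash so it is matched literally.
  escaped <- gsub("\\", "\\\\", pattern, fixed = TRUE)
  # Anchor at both ends so the whole string has to match.
  paste0("^", escaped, "$")
}

print(literal_to_anchored_regex("ANNTCG"))
```

Note that this sketch does nothing about other characters that basic syntax
treats specially (such as ".", "*" and "["), so a reviewer should check
whether those need escaping too.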

Daniel



Daniel Dickison
Research Programmer
ddicki...@carnegielearning.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
