Is this fast enough for you? Matching 2,000 tags against 2 million tags takes 0.2 seconds:

> str(x)
 chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> str(z)
 chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA" "ABCAD" ...
> system.time(y <- match(x,z))
   user  system elapsed
    0.2     0.0     0.2
> str(y)
 int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...
>

On Mon, Jan 12, 2009 at 10:17 PM, Gundala Viswanath <gunda...@gmail.com> wrote:
> Yes Jim, exactly.
>
> BTW, I found this in ?match:
>
> " Matching for lists is potentially very slow and best avoided
> except in simple cases."
>
> Since I am doing this for millions of tags, is there a faster alternative?
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
> On Tue, Jan 13, 2009 at 12:14 PM, jim holtman <jholt...@gmail.com> wrote:
>> Is this what you want:
>>
>>> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>> qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>> match(qr, repo)
>> [1] 3 6 6 3 6 6 2 6 6
>>>
>>
>> On Mon, Jan 12, 2009 at 9:22 PM, Gundala Viswanath <gunda...@gmail.com>
>> wrote:
>>> Hi Jorge and all,
>>>
>>> How can I modify your code when the query size can be bigger
>>> than the repository, meaning that it can contain repeats?
>>>
>>> e.g. qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>>
>>> Sorry, I should have mentioned this earlier.
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>>> On Tue, Jan 13, 2009 at 11:11 AM, Jorge Ivan Velez
>>> <jorgeivanve...@gmail.com> wrote:
>>>>
>>>> Perhaps
>>>> which(repo%in%qr)
>>>> ?
>>>> HTH,
>>>>
>>>> Jorge
>>>>
>>>> On Mon, Jan 12, 2009 at 9:07 PM, Gundala Viswanath <gunda...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> Suppose I have the following vector as repository:
>>>>>
>>>>> > repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>>>>
>>>>> Given another query vector
>>>>>
>>>>> > qr <- c("AAC", "ATT")
>>>>>
>>>>> is there a fast way to find the index of each query in the repository?
>>>>>
>>>>> Giving:
>>>>>
>>>>> [1] 3 6
>>>>>
>>>>> Typically the repo holds around 12 million elements and the
>>>>> query around 1 million elements.
>>>>>
>>>>> - Gundala Viswanath
>>>>> Jakarta - Indonesia
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
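
For anyone who wants to reproduce the comparison discussed in this thread, here is a minimal sketch. The thread does not show how x and z were built, so the construction below is an assumption: random 5-letter tags over the letters A to E, with the queries taken from the start of the repository. The helper make_tags and the seed are hypothetical and only for illustration; timings will differ by machine.

```r
# Part 1: match() versus which(%in%) when the query contains repeats,
# using the small vectors quoted in the thread.
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

idx_match <- match(qr, repo)       # one index per query element, in query order
idx_in    <- which(repo %in% qr)   # positions in repo that occur anywhere in qr

print(idx_match)
print(idx_in)

# Part 2: a timing at roughly the scale shown in the session.
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Hypothetical helper: n random tags of length len drawn from alphabet.
make_tags <- function(n, len = 5, alphabet = LETTERS[1:5]) {
  m <- matrix(sample(alphabet, n * len, replace = TRUE), nrow = n)
  do.call(paste0, lapply(seq_len(len), function(j) m[, j]))
}

z <- make_tags(2e6)   # repository; duplicate tags are expected
x <- z[1:2000]        # queries (assumed to come from the start of z)

timing <- system.time(y <- match(x, z))
print(timing)

# match() returns the first position of each query in z (NA if absent),
# so indexing z by y must give back the queries.
stopifnot(identical(z[y], x))
```

Note that match() keeps the order and the repeats of the query, while which(repo %in% qr) only reports which repository positions are hit, which is why the two answers differ once the query has duplicates.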