Impressive stuff. Nice to see people giving some thought to this. I will explore the packages you mentioned.
Thank you
Saptarshi Guha

On Mon, May 11, 2009 at 12:37 AM, Patrick Aboyoun <paboy...@fhcrc.org> wrote:
> Saptarshi,
> I know of two alternatives you can use to do fast extraction of consecutive
> subsequences of a vector:
>
> 1) Fast copy: The method you mentioned of creating a memcpy'd vector
> 2) Pointer management: Creating an externalptr object in R and manage the
> start and end of your data
>
> If you are looking for a prototyping environment to try, I recommend using
> the IRanges and Biostrings packages from the Bioconductor project. The
> IRanges package contains a function called subseq for performing 1) on all
> basic vector types (raw, logical, integer, etc.) and Biostrings package
> contains a subseq method on an externalptr based class that implements 2.
>
> I was going to lobby R core members quietly about adding something akin to
> subseq from IRanges into base R since it is extremely useful for all long
> vectors and could replace all a:b calls with a <= b in R code, but this
> publicity can't hurt.
>
> Here is an example:
>
> > source("http://bioconductor.org/biocLite.R")
> > biocLite(c("IRanges", "Biostrings"))
>
> << download output omitted >>
>
> > suppressMessages(library(Biostrings))
> > x <- rep(charToRaw("a"), 1e7)
> > y <- BString(rawToChar(x))
> > system.time(x[13:1e7])
>    user  system elapsed
>   0.304   0.073   0.378
> > system.time(subseq(x, 13))
>    user  system elapsed
>   0.011   0.007   0.019
> > system.time(subseq(y, 13))
>    user  system elapsed
>   0.003   0.000   0.004
> > identical(x[13:1e7], subseq(x, 13))
> [1] TRUE
> > identical(x[13:1e7], charToRaw(as.character(subseq(y, 13))))
> [1] TRUE
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
> i386-apple-darwin9.6.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.13.5 IRanges_1.3.5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.2
>
> Quoting Saptarshi Guha <saptarshi.g...@gmail.com>:
>
>> Hello,
>> Suppose in the following code,
>> PROTECT(sr = R_tryEval( .... ))
>>
>> sr is a RAWSXP vector. I wish to return another RAWSXP starting at
>> position 13 onwards (base=0).
>>
>> I could create another RAWSXP of the correct length and then memcpy
>> the required bytes and length to this new one.
>>
>> However is there a more efficient method?
>>
>> Regards
>> Saptarshi Guha
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
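
For reference, the two alternatives listed in the quoted reply (fast copy and pointer management) can be sketched in plain C, outside of R. This is only an illustration under assumptions: it uses no R headers, and the names byte_view, copy_tail and view_tail are hypothetical and are not part of R's C API, IRanges or Biostrings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view type: borrows the parent's bytes and owns nothing. */
typedef struct {
    const unsigned char *data;
    size_t len;
} byte_view;

/* Alternative 1 (fast copy): allocate a new buffer and memcpy the tail
 * that starts at `offset` (0-based). The caller must free() the result.
 * Returns NULL if the allocation fails. Cost is O(len - offset). */
unsigned char *copy_tail(const unsigned char *src, size_t len,
                         size_t offset, size_t *out_len)
{
    assert(offset <= len);
    size_t n = len - offset;
    unsigned char *dst = malloc(n ? n : 1); /* avoid a zero-size request */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src + offset, n);
    *out_len = n;
    return dst;
}

/* Alternative 2 (pointer management): record only a start pointer and a
 * length. Nothing is copied, so the cost is O(1), but the view is valid
 * only while the parent buffer is alive and unchanged. */
byte_view view_tail(const unsigned char *src, size_t len, size_t offset)
{
    assert(offset <= len);
    byte_view v = { src + offset, len - offset };
    return v;
}
```

With a 20-byte buffer and offset 13, both functions describe the same 7 bytes: copy_tail returns an independent copy, while view_tail returns a pointer into the original storage. In an R setting the second approach would additionally require the parent vector to stay reachable for as long as the view is in use.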