Re: [R] Successive subsets from a vector?
Here is a solution that uses gsub with a perl-style lookahead regexp (the (?=...) construct is a positive lookahead, so the four following numbers are captured but not consumed):

    VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
    e <- "([[:digit:]]+),(?=([[:digit:]]+),([[:digit:]]+),([[:digit:]]+),([[:digit:]]+))"
    out <- gsub(e, "\\1\\2\\3\\4\\5 ", paste(VECTOR, collapse = ","), perl = TRUE)
    head(strsplit(out, " ")[[1]], -1) # uses head from R 2.4.0

On 8/22/06, kone <[EMAIL PROTECTED]> wrote:
> I'd like to pick every imbricated (overlapping) five-element subset
> from a vector. I guess there is some efficient way to do this without
> loops... Here is a for-loop version and a model for the output:
>
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
>
> ADDRESSES=c();
> for(i in 1:(length(VECTOR)-4)){
>   ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
> }
>
> > ADDRESSES
> [1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043"
> [7] "1110436" "104368"  "43686"
>
> Atte Tenkanen
> University of Turku, Finland

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
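The lookahead idea can be written for other window sizes. The following is only a sketch: the helper name window_regex and its argument k are made up for illustration, it assumes the elements are non-negative integers that paste() prints as plain digits, and it assumes k is between 2 and 9 because gsub() backreferences stop at \\9.

```r
# Hypothetical helper (not from the thread): overlapping windows of size k
# built with the same lookahead trick. Assumes 2 <= k <= 9 and elements
# that print as plain digit strings.
window_regex <- function(x, k = 5) {
  num <- "([[:digit:]]+)"
  # The first number and its comma are consumed; the next k - 1 numbers are
  # only peeked at, so the following match can start on the second number.
  pattern <- paste0(num, ",(?=", paste(rep(num, k - 1), collapse = ","), ")")
  replacement <- paste0(paste0("\\", seq_len(k), collapse = ""), " ")
  out <- gsub(pattern, replacement, paste(x, collapse = ","), perl = TRUE)
  pieces <- strsplit(out, " ")[[1]]
  pieces[-length(pieces)]  # the last piece is the unmatched tail
}

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
window_regex(VECTOR, 5)
```

Round-tripping numbers through a string is fragile (decimals, negative values, or scientific notation would break the pattern), so the index-based answers elsewhere in this thread are probably the safer choice for real data.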
Re: [R] Successive subsets from a vector?
Like this:

    > do.call( paste, c( list(sep=""), lapply(1:5,function(x)
    +     VECTOR[x:(length(VECTOR)-5+x)]) ))
    [1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436"
    [8] "104368"  "43686"

HTH,

Chuck

On Tue, 22 Aug 2006, kone wrote:

> I'd like to pick every imbricated (overlapping) five-element subset
> from a vector. I guess there is some efficient way to do this without
> loops... Here is a for-loop version and a model for the output:
>
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);

Charles C. Berry                        (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]              UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0717
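The call above pastes five shifted slices of VECTOR together element by element. A minimal sketch of the same idea for an arbitrary window size follows; the function name window_paste and the argument k are illustrative and do not come from the thread.

```r
# Hypothetical helper: paste k shifted slices of x together element-wise.
window_paste <- function(x, k = 5) {
  n <- length(x) - k + 1               # number of complete windows
  if (n < 1) return(character(0))      # vector shorter than one window
  # Slice j holds the j-th element of every window.
  slices <- lapply(seq_len(k), function(j) x[j:(j + n - 1)])
  do.call(paste, c(slices, sep = ""))
}

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
window_paste(VECTOR, 5)
```

Because paste() is vectorised over its arguments, this makes one call on k long vectors rather than one call per window.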
Re: [R] Successive subsets from a vector?
On Tue, 22 Aug 2006, hadley wickham wrote:

> > The loop method took 195 secs. Just assigning to an answer of the correct
> > length reduced this to 5 secs. e.g. use
> >
> >     ADDRESSES <- character(length(VECTOR)-4)
> >
> > Moral: don't grow vectors repeatedly.
>
> Other languages (eg. Java) grow the size of the vector independently
> of the number of observations in it (I think Java doubles the size
> whenever the vector is filled), thus changing the number of
> reallocations from O(n) to O(log n). I've always wondered why R
> doesn't do this.

At one point at least that was too expensive on memory/address space (and it may still be for 32-bit OSes). There is even a 'truelength' field in the vector header to allow for such a strategy, and the strategy is used in scan() and elsewhere.

In my experience it is relatively rare not to know the vector length in advance in R code.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [R] Successive subsets from a vector?
> The loop method took 195 secs. Just assigning to an answer of the correct
> length reduced this to 5 secs. e.g. use
>
>     ADDRESSES <- character(length(VECTOR)-4)
>
> Moral: don't grow vectors repeatedly.

Other languages (eg. Java) grow the size of the vector independently of the number of observations in it (I think Java doubles the size whenever the vector is filled), thus changing the number of reallocations from O(n) to O(log n). I've always wondered why R doesn't do this.

Hadley
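The difference between the two growth strategies can be counted with a toy model. This is only an illustration of the arithmetic, not a description of how R or Java manage memory internally; the function simulate_growth and its arguments are invented for the example.

```r
# Toy model: append n elements one at a time to a buffer that starts with
# capacity 1, and count how often the buffer must be replaced and how many
# existing elements are copied into the replacements.
simulate_growth <- function(n, grow) {
  capacity <- 1   # slots currently allocated
  reallocs <- 0   # number of times the buffer was replaced
  copied <- 0     # elements copied into new buffers
  for (size in seq_len(n)) {
    if (size > capacity) {
      capacity <- grow(capacity)
      reallocs <- reallocs + 1
      copied <- copied + (size - 1)  # existing elements move to the new buffer
    }
  }
  c(reallocs = reallocs, copied = copied)
}

simulate_growth(1000, function(cap) cap + 1)  # reallocs 999, copied 499500
simulate_growth(1000, function(cap) cap * 2)  # reallocs 10,  copied 1023
```

In this model, growing by one slot gives n - 1 reallocations and roughly n^2/2 copied elements, while doubling gives about log2(n) reallocations and fewer than 2n copied elements.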
Re: [R] Successive subsets from a vector?
Thanks!

I have used tons of for- and while-loops (I'm ashamed to reveal these scripts, but I'm primarily a musician ;-) http://users.utu.fi/attenka/SetTheoryScripts.r), had a cup of cocoa or more, and mostly been happy ;-)

Now I have so many new ways to do these things that it will take a while to ruminate on all the ideas here.

Atte
Re: [R] Successive subsets from a vector?
    > VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6)
    > x <- lapply(seq(length(VECTOR)-4),function(z)paste(VECTOR[z:(z+4)], collapse=''))
    > unlist(x)
    [1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436"
    [8] "104368"  "43686"

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
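In current versions of R the lapply()/unlist() pair above could also be written with vapply(), which returns a character vector directly and checks that each result is a single string. This is a sketch that assumes a newer R than the 2006 release used in the thread; the variable name starts is illustrative.

```r
VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)

# Start index of every complete window; max(..., 0) keeps seq_len() valid
# when VECTOR has fewer than five elements.
starts <- seq_len(max(length(VECTOR) - 4, 0))

ADDRESSES <- vapply(
  starts,
  function(z) paste(VECTOR[z:(z + 4)], collapse = ""),
  character(1)   # each window must collapse to exactly one string
)
ADDRESSES
```

Like the lapply() version, this still calls paste() once per window, so it tidies the loop without making it faster.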
Re: [R] Successive subsets from a vector?
    embed(VECTOR, 5)[, 5:1]

gives the subsets, so something like

    apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse="")

does the job.

The following is a bit more efficient

    ind <- 1:(length(VECTOR)-4)
    do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))

but by looking at how embed() works it could be made as efficient.

Larger example:

    VECTOR <- sample(1:10, 1e5, replace=TRUE)
    > system.time(apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse=""))
    [1] 5.73 0.05 5.81   NA   NA
    > system.time({ind <- 1:(length(VECTOR)-4)
    + do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
    + })
    [1] 1.00 0.01 1.01   NA   NA

The loop method took 195 secs. Just assigning to an answer of the correct length reduced this to 5 secs. e.g. use

    ADDRESSES <- character(length(VECTOR)-4)

Moral: don't grow vectors repeatedly.

On Tue, 22 Aug 2006, kone wrote:

> I'd like to pick every imbricated (overlapping) five-element subset
> from a vector. I guess there is some efficient way to do this without
> loops... Here is a for-loop version and a model for the output:
>
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
>
> ADDRESSES=c();

You do not need the semicolons, and they just confuse readers.

> for(i in 1:(length(VECTOR)-4)){
>   ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
> }
>
> > ADDRESSES
> [1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043"
> [7] "1110436" "104368"  "43686"
>
> Atte Tenkanen
> University of Turku, Finland

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
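A small sketch of why the column reversal is needed: embed() returns each window with its most recent element first, so the columns have to be reversed to read the windows left to right. The short vector x is made up for illustration, and drop = FALSE is an added safeguard (not in the original message) so that a single window stays a one-row matrix.

```r
x <- 1:6

embed(x, 3)
# Rows come out newest-first: 3 2 1 / 4 3 2 / 5 4 3 / 6 5 4

w <- embed(x, 3)[, 3:1, drop = FALSE]
w
# After reversing the columns: 1 2 3 / 2 3 4 / 3 4 5 / 4 5 6

# The same idea on the thread's data, collapsing each row to one string.
VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
ADDRESSES <- apply(embed(VECTOR, 5)[, 5:1, drop = FALSE], 1, paste, collapse = "")
ADDRESSES
```

The apply() step still calls paste() once per row, which is consistent with the slower timing reported above for the embed() version.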
[R] Successive subsets from a vector?
I'd like to pick every imbricated (overlapping) five-element subset from a vector. I guess there is some efficient way to do this without loops... Here is a for-loop version and a model for the output:

    VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);

    ADDRESSES=c();
    for(i in 1:(length(VECTOR)-4)){
      ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
    }

    > ADDRESSES
    [1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043"
    [7] "1110436" "104368"  "43686"

Atte Tenkanen
University of Turku, Finland
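For comparison, here is a sketch of the same loop with the result vector allocated up front, following the advice given in the replies. The variable n is introduced for illustration; everything else mirrors the loop above.

```r
VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)

# Number of complete five-element windows (zero if VECTOR is too short).
n <- max(length(VECTOR) - 4, 0)

# Allocate the full result once instead of growing it on every iteration.
ADDRESSES <- character(n)
for (i in seq_len(n)) {
  ADDRESSES[i] <- paste(VECTOR[i:(i + 4)], collapse = "")
}
ADDRESSES
```

Using seq_len(n) rather than 1:n also avoids the loop running over c(1, 0) when no window fits.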