[R] Successive subsets from a vector?

2006-08-22 Thread kone
I'd like to pick every overlapping (imbricated) five-element subset from a 
vector. I guess there is some efficient way to do this without loops...
Here is a for-loop-version and a model for output:

VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);

ADDRESSES=c();
for(i in 1:(length(VECTOR)-4)){
  ADDRESSES[i]=paste(VECTOR[i:(i+4)],collapse="")
}

> ADDRESSES
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436" "104368"
[9] "43686"


Atte Tenkanen
University of Turku, Finland


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Successive subsets from a vector?

2006-08-22 Thread Prof Brian Ripley
embed(VECTOR, 5)[, 5:1]

gives the subsets, so something like

apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse="")

does the job.
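
A minimal sketch of what embed() returns for the example vector, and why the columns are reversed with 5:1. The names m and w are only illustrative.

```r
VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)

# embed() puts the latest value first in each row,
# so every window comes out in reverse order.
m <- embed(VECTOR, 5)
m[1, ]   # 5 6 2 4 1

# Reversing the columns restores the original order within each window.
w <- m[, 5:1]
w[1, ]   # 1 4 2 6 5
dim(w)   # 9 5: one row per window, one column per position

# Paste each row into a single string.
ADDRESSES <- apply(w, 1, paste, collapse = "")
```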

The following is a bit more efficient

ind <- 1:(length(VECTOR)-4)
do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))

but by looking at how embed() works it could be made as efficient.
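
The shifted-index idea could be wrapped for an arbitrary window width, roughly as follows. This is a sketch only: paste_windows and its arguments x and k are hypothetical names, not an existing R function.

```r
# Hypothetical helper: paste together every run of k consecutive elements of x.
paste_windows <- function(x, k) {
  n <- length(x) - k + 1              # number of windows
  if (n < 1) return(character(0))     # vector shorter than one window
  ind <- seq_len(n)
  # Build k shifted copies of x (offsets 0..k-1) and paste them element-wise.
  do.call(paste, c(lapply(0:(k - 1), function(j) x[ind + j]), sep = ""))
}

VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
ADDRESSES <- paste_windows(VECTOR, 5)
```

Only k vectorised subsetting operations and a single paste() call are needed, however long x is.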

Larger example:

> VECTOR <- sample(1:10, 1e5, replace=TRUE)
> system.time(apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse=""))
[1] 5.73 0.05 5.81   NA   NA
> system.time({ind <- 1:(length(VECTOR)-4)
+ do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
+ })
[1] 1.00 0.01 1.01   NA   NA

The loop method took 195 secs.  Just assigning to an answer of the correct 
length reduced this to 5 secs.  e.g. use

ADDRESSES <- character(length(VECTOR)-4)

Moral: don't grow vectors repeatedly.
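
For concreteness, a sketch of the loop with the result allocated up front, plus a vapply() variant. It assumes a version of R that provides seq_len() and vapply(); ADDRESSES2 is an illustrative name.

```r
VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
n <- length(VECTOR) - 4

# Allocate the full result once, then fill it in place.
ADDRESSES <- character(n)
for (i in seq_len(n)) {
  ADDRESSES[i] <- paste(VECTOR[i:(i + 4)], collapse = "")
}

# vapply() allocates the result itself and checks that every
# value is a single character string.
ADDRESSES2 <- vapply(seq_len(n),
                     function(i) paste(VECTOR[i:(i + 4)], collapse = ""),
                     character(1))
```

seq_len(n) is used rather than 1:n so that the loop body is skipped when n is 0.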

On Tue, 22 Aug 2006, kone wrote:

> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
>
> ADDRESSES=c();

You do not need the semicolons, and they just confuse readers.


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Successive subsets from a vector?

2006-08-22 Thread jim holtman
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6)
> x <- lapply(seq(length(VECTOR)-4), function(z) paste(VECTOR[z:(z+4)], collapse=''))
> unlist(x)
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436"
[8] "104368"  "43686"




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Successive subsets from a vector?

2006-08-22 Thread Atte Tenkanen
Thanks!

I have used tons of for- and while-loops (I'm ashamed to reveal these scripts, 
but I'm primarily a musician ;-) 
http://users.utu.fi/attenka/SetTheoryScripts.r), had a cup or more of 
cocoa and mostly been happy ;-) Now I have so many new ways to do these things 
that it will take a while to digest all the ideas here.

Atte



Re: [R] Successive subsets from a vector?

2006-08-22 Thread hadley wickham
> The loop method took 195 secs.  Just assigning to an answer of the correct
> length reduced this to 5 secs.  e.g. use
>
> ADDRESSES <- character(length(VECTOR)-4)
>
> Moral: don't grow vectors repeatedly.

Other languages (e.g. Java) grow the capacity of the vector independently
of the number of observations in it (I think Java doubles the size
whenever the vector is filled), thus reducing the number of reallocations
from O(n) to O(log n).  I've always wondered why R doesn't do this.
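
To make the doubling idea concrete, here is a rough sketch written in R itself. append_doubling is a hypothetical helper for illustration, not something R provides; it only counts how often the buffer has to be enlarged.

```r
# Hypothetical helper: append values one at a time, doubling the
# buffer whenever it is full, and count the reallocations.
append_doubling <- function(values) {
  buf <- numeric(1)    # start with capacity 1
  used <- 0            # number of slots actually filled
  reallocs <- 0
  for (v in values) {
    if (used == length(buf)) {
      length(buf) <- 2 * length(buf)   # double the capacity
      reallocs <- reallocs + 1
    }
    used <- used + 1
    buf[used] <- v
  }
  # Drop the unused tail before returning.
  list(result = buf[seq_len(used)], reallocs = reallocs)
}

res <- append_doubling(1:1000)
res$reallocs   # 10, i.e. about log2(1000), rather than one per element
```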

Hadley



Re: [R] Successive subsets from a vector?

2006-08-22 Thread Prof Brian Ripley
On Tue, 22 Aug 2006, hadley wickham wrote:

> > The loop method took 195 secs.  Just assigning to an answer of the correct
> > length reduced this to 5 secs.  e.g. use
> >
> > ADDRESSES <- character(length(VECTOR)-4)
> >
> > Moral: don't grow vectors repeatedly.
>
> Other languages (e.g. Java) grow the capacity of the vector independently
> of the number of observations in it (I think Java doubles the size
> whenever the vector is filled), thus reducing the number of reallocations
> from O(n) to O(log n).  I've always wondered why R doesn't do this.

At one point at least that was too expensive on memory/address space (and 
it may still be for 32-bit OSes). There is even a 'truelength' field in 
the vector header to allow for such a strategy, and the strategy is used 
in scan() and elsewhere.

In my experience it is relatively rare not to know the vector length in 
advance in R code.




Re: [R] Successive subsets from a vector?

2006-08-22 Thread Charles C. Berry


Like this:
> do.call( paste, c( list(sep=""), lapply(1:5,function(x) 
+ VECTOR[x:(length(VECTOR)-5+x)]) ))
[1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436"
[8] "104368"  "43686"


HTH,

Chuck



Charles C. Berry                        (858) 534-2098
                                        Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]              UC San Diego
http://biostat.ucsd.edu/~cberry/        La Jolla, San Diego 92093-0717



Re: [R] Successive subsets from a vector?

2006-08-22 Thread Gabor Grothendieck
Here is a solution that uses gsub with a positive lookahead perl-style
regexp to do it:

VECTOR <- c(1,4,2,6,5,0,11,10,4,3,6,8,6)
e <- "([[:digit:]]+),(?=([[:digit:]]+),([[:digit:]]+),([[:digit:]]+),([[:digit:]]+))"
out <- gsub(e, "\\1\\2\\3\\4\\5 ", paste(VECTOR, collapse = ","), perl = TRUE)
head(strsplit(out, " ")[[1]], -1)  # uses head from R 2.4.0
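
A smaller sketch with windows of width 2 may show why the lookahead matters. The names s, consumed and looked are illustrative only.

```r
s <- paste(c(1, 4, 2, 6), collapse = ",")   # "1,4,2,6"

# Without lookahead both numbers are consumed, so matches cannot
# overlap and the window (4, 2) is skipped.
consumed <- gsub("([[:digit:]]+),([[:digit:]]+)", "\\1\\2 ", s, perl = TRUE)
consumed   # "14 ,26 "

# With (?=...) the second number is captured but not consumed,
# so the next match can start at it.
looked <- gsub("([[:digit:]]+),(?=([[:digit:]]+))", "\\1\\2 ", s, perl = TRUE)
looked     # "14 42 26 6"
```

The trailing piece ("6" here, "3,6,8,6" in the five-wide case) is whatever is left once no full window remains, which is why the last element of the split is dropped.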

