Re: [R] Efficiency challenge: MANY subsets

2009-01-20 Thread Johannes Graumann
Many thanks for this example, which doesn't entirely cover my case since I 
have as many indexes entries as sequences entries. It was very 
educational nonetheless, and I used it to come up with something a bit 
faster than what I had before. The main trick I used, though, was naming all 
entries in sequences and indexes like so 
  names(sequences) <- seq(length(sequences))
  names(indexes) <- seq(length(indexes))
and then doing an lapply on names(indexes), which allows me to access both 
lists easily. What I end up with is this:

fragments <- lapply(
  names(indexes),
  function(x){
    lapply(
      indexes[[x]],
      function(.range){
        .range <- seq.int(
          .range[1], .range[2]
        )
        unlist(lapply(sequences[x], '[', .range), use.names=FALSE)
      }
    )
  }
)

Although this is still quite slow, it's much faster than what I had before. 
Any further comments are highly welcome. I can send the real sequences and 
indexes as exported R objects ...

Thanks, Joh
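
For comparison, here is a minimal sketch of pairing the two lists without naming them first. It assumes small toy data (not the real sequences and indexes) and uses base R's Map(), which walks both lists in parallel; whether it is actually faster on the real data has not been measured here.

```r
# Toy data standing in for the real lists (hypothetical values)
sequences <- list(
  c("M", "G", "L", "W", "I", "S"),
  c("N", "H", "K", "L")
)
indexes <- list(
  list(c(1, 3), c(2, 6)),
  list(c(1, 4))
)

# Map() hands the i-th sequence and the i-th list of ranges to the function
# together, so no names are needed to line the two lists up
fragments <- Map(
  function(s, ranges) {
    lapply(ranges, function(r) s[seq.int(r[1], r[2])])
  },
  sequences, indexes
)
```

Each element of fragments is then a plain list of character vectors, one per range, in the same order as sequences.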

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Efficiency challenge: MANY subsets

2009-01-16 Thread Johannes Graumann
Hello,

I have a list of character vectors like this:

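sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)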

and another list of subset ranges like this:

indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)

What I now want to do is to subset each entry in sequences 
(sequences[[1]]) with all ranges in the corresponding low level list in 
indexes (indexes[[1]]). Here is what I came up with.

fragments <- list()
for(iN in seq(length(sequences))){
  cat(paste(iN,"\n"))
  tmpFragments <- sapply(
    indexes[[iN]],
    function(x){
      sequences[[iN]][seq.int(x[1],x[2])]
    }
  )
  fragments[[iN]] <- tmpFragments
}

This works fine, but sequences contains thousands of entries and the 
corresponding indexes are sometimes hundreds of ranges long, so this whole 
process is EXTREMELY inefficient.

Will somebody out there take up the challenge and show me a way to speed 
this up?

Thanks for any hints,

Joh
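
One low-effort variation on the loop above, sketched under the assumption that growing the result list and printing a progress line per iteration account for part of the cost: allocate the result once and drop the cat() call. The data here is a shortened toy example, and lapply() replaces sapply() so the result is always a list, even when all ranges have the same length.

```r
# Shortened toy data (hypothetical, not the real sequences)
sequences <- list(c("M", "G", "L", "W", "I", "S", "F", "G"))
indexes <- list(list(c(1, 3), c(3, 8), c(1, 8)))

n <- length(sequences)
fragments <- vector("list", n)   # allocate the result once instead of growing it

for (iN in seq_len(n)) {
  s <- sequences[[iN]]           # look the sequence up once per iteration
  fragments[[iN]] <- lapply(
    indexes[[iN]],
    function(x) s[x[1]:x[2]]     # x[1]:x[2] expands the range to positions
  )
}
```

seq_len(n) is used instead of seq(length(sequences)) so that the loop body is skipped cleanly when sequences is empty.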



Re: [R] Efficiency challenge: MANY subsets

2009-01-16 Thread Jorge Ivan Velez
Dear Johannes,
Try this:


sequences <- c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T",
"Y","L","L","I","M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V",
"H","T","Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")

indexes <- matrix(c(1,22,22,46,46,51,1,46,22,51,1,51),ncol=2,byrow=TRUE)

apply(indexes,1,function(x){
  ind <- x[1]:x[2]
  sequences[ind]
  }
  )


HTH,

Jorge
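
A small sketch of how the list-of-ranges layout from the original post could be turned into the two-column matrix used above, assuming every range is a length-2 vector. The data is a toy example. One caveat worth knowing: apply() returns a list only when the fragments differ in length; when they all have the same length it returns a matrix with one fragment per column.

```r
# Toy data (hypothetical): one sequence and three ranges
sequence <- c("M", "G", "L", "W", "I")
ranges <- list(c(1, 3), c(3, 5), c(1, 5))

# Stack the ranges into a matrix with one row per range (start, end)
range_matrix <- do.call(rbind, ranges)

# Fragments of unequal length: apply() returns a list
fragments <- apply(range_matrix, 1, function(x) sequence[x[1]:x[2]])

# Fragments of equal length: apply() simplifies to a character matrix
same_length <- apply(rbind(c(1, 2), c(3, 4)), 1,
                     function(x) sequence[x[1]:x[2]])
```

If a list is always wanted, looping over the rows with lapply(seq_len(nrow(range_matrix)), ...) avoids the simplification.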





Re: [R] Efficiency challenge: MANY subsets

2009-01-16 Thread Henrique Dallazuanna
Try this:

lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
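
To unpack the one-liner: do.call(seq, as.list(g)) passes the two elements of g to seq() as its first two arguments, so it produces the same positions as seq(g[1], g[2]). The sketch below, using one of the ranges from the original post, compares it with two other base R spellings; which of them is fastest on the real data is left open.

```r
g <- c(22, 46)   # one start/end pair from the original post

via_do_call <- do.call(seq, as.list(g))   # equivalent to seq(22, 46)
via_colon   <- g[1]:g[2]
via_seq_int <- seq.int(g[1], g[2])

# All three describe the same run of positions
same_positions <- isTRUE(all.equal(via_do_call, via_colon)) &&
  isTRUE(all.equal(via_colon, via_seq_int))
```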


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] Efficiency challenge: MANY subsets

2009-01-16 Thread Johannes Graumann
Thanks. Very elegant, but doesn't solve the problem of the outer for loop, 
since I now would rewrite the code like so:

fragments <- list()
for(iN in seq(length(sequences))){
  cat(paste(iN,"\n"))
  # index by iN so that each sequence is cut with its own ranges
  fragments[[iN]] <- 
    lapply(indexes[[iN]], function(g)sequences[[iN]][do.call(seq, as.list(g))])
}

still very slow for length(sequences) ~ 7000.

Joh




Re: [R] Efficiency challenge: MANY subsets

2009-01-16 Thread jim holtman
Try this one;  it is doing a list of 7000 in under 2 seconds:

sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)

indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)

indexes <- rep(indexes,10)
sequences <- rep(sequences,7000)

system.time({
  fragments <- lapply(indexes, function(.seq){
    lapply(.seq, function(.range){
      .range <- seq(.range[1], .range[2])  # save since we use several times
      lapply(sequences, '[', .range)
    })
  })
})
#    user  system elapsed
#    1.24    0.00    1.26
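
A scaled-down sketch of the same nested lapply, meant only to show how the result is laid out. The repeat counts (3 and 2 instead of 7000 and 10) and the shortened sequence are assumptions made for illustration. Note that every range is applied to every sequence, so the nesting order is index set, then range, then sequence.

```r
# Scaled-down toy data (hypothetical sizes and values)
sequences <- rep(list(c("M", "G", "L", "W", "I", "S")), 3)
indexes <- rep(list(list(c(1, 2), c(2, 6))), 2)

fragments <- lapply(indexes, function(.seq) {
  lapply(.seq, function(.range) {
    .range <- seq(.range[1], .range[2])  # expand once, reuse for all sequences
    lapply(sequences, "[", .range)
  })
})

n_index_sets <- length(fragments)            # one entry per index set
n_ranges     <- length(fragments[[1]])       # one entry per range in that set
n_sequences  <- length(fragments[[1]][[1]])  # one entry per sequence
```

If each sequence has its own list of ranges, this layout computes more than is needed, because index set i is applied to all sequences rather than only to sequence i.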





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
