Re: [R] Sorting and subsetting

2010-09-21 Thread Matthew Dowle
All the solutions in this thread so far use the lapply(split(...)) paradigm either directly or indirectly. That paradigm doesn't scale. That's the likely source of quite a few 'out of memory' errors and performance issues in R. data.table doesn't do that internally, and its syntax is pretty
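
One rough way to check the scaling point on your own machine is sketched below. It times the split()-based idiom against a vectorized alternative that numbers rows within each group directly from the sorted index, without building one data frame per group. The data size (1000 groups of 50 rows), the seed and N = 5 are arbitrary assumptions, no particular timings are implied, and the vectorized version assumes the data are already sorted by index.

```r
# Hypothetical larger data set; sizes and seed are arbitrary assumptions
set.seed(42)
n_groups <- 1000L
big <- data.frame(index = gl(n_groups, 50), foo = rnorm(n_groups * 50))
big <- big[order(big$index, big$foo), ]
N <- 5L

# split() idiom: one small data frame per group, then rbind them all
t_split <- system.time(
  res_split <- do.call(rbind, lapply(split(big, big$index), head, N))
)

# Vectorized: position of each row within its run of equal index values,
# computed from run start positions (requires data sorted by index)
t_vec <- system.time({
  g <- as.integer(big$index)             # integer codes, no character copy
  n <- length(g)
  starts <- c(TRUE, g[-1L] != g[-n])     # TRUE at the first row of each run
  pos <- seq_len(n) - cummax(ifelse(starts, seq_len(n), 0L)) + 1L
  res_vec <- big[pos <= N, ]
})

# Timings vary by machine; compare user/system/elapsed columns yourself
print(rbind(split = t_split, vectorized = t_vec)[, 1:3])
```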

Re: [R] Sorting and subsetting

2010-09-21 Thread Joshua Wiley
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: All the solutions in this thread so far use the lapply(split(...)) paradigm either directly or indirectly. That paradigm doesn't scale. That's the likely source of quite a few 'out of memory' errors and performance

Re: [R] Sorting and subsetting

2010-09-21 Thread Matthew Dowle
Probably true, that's cunning, but look at base::match. The first thing it does is coerce factor to character (an allocation and copy are needed internally). data.table doesn't do that either; see data.table:::sortedmatch. I have made the first basic steps towards a proper reproducible test suite (timings.Rnw).
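
The match() idea can be sketched as follows, assuming the data are sorted by index (seed and sizes are illustrative). In sorted data, match(x, x) returns the first row of each row's group, so subtracting it from the row number gives the position within the group. Matching the factor's integer codes instead of the factor itself sidesteps the factor-to-character conversion that the match() documentation describes.

```r
set.seed(1)  # added for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Position within group: row number minus first row of the group, plus one.
# Matching the factor itself: match() converts factors to character first.
pos_factor <- seq_along(tmp$index) - match(tmp$index, tmp$index) + 1L

# Same computation on the factor's integer codes, avoiding that conversion
codes <- as.integer(tmp$index)
pos_codes <- seq_along(codes) - match(codes, codes) + 1L

# First 5 rows of each index
first5 <- tmp[pos_codes <= 5, ]
```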

Re: [R] Sorting and subsetting

2010-09-21 Thread peter dalgaard
On Sep 21, 2010, at 16:27 , Joshua Wiley wrote: On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle mdo...@mdowle.plus.com wrote: All the solutions in this thread so far use the lapply(split(...)) paradigm either directly or indirectly. That paradigm doesn't scale. That's the likely source

Re: [R] Sorting and subsetting

2010-09-21 Thread Matthew Dowle
See data.table:::duplist, which does that (or something very similar) in C, for multiple columns too. Matthew http://datatable.r-forge.r-project.org/ peter dalgaard pda...@gmail.com wrote in message news:660991c3-b52b-4d58-b819-eadc95ecc...@gmail.com... On Sep 21, 2010, at 16:27 , Joshua
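
The underlying idea, finding where each group begins in sorted data across several key columns, can be sketched in plain R. This is not data.table:::duplist or its interface; `group_starts` is a hypothetical helper and the two-key data set is made up for illustration. A row starts a new group when it differs from the previous row in at least one key column.

```r
# Hypothetical two-key data, sorted by both keys
set.seed(2)
dat <- data.frame(a = sample(1:3, 30, TRUE),
                  b = sample(c("x", "y"), 30, TRUE),
                  foo = rnorm(30))
dat <- dat[order(dat$a, dat$b, dat$foo), ]

# Hypothetical helper: TRUE where a row begins a new group of `keys`,
# assuming `d` is already sorted by those keys
group_starts <- function(d, keys) {
  n <- nrow(d)
  if (n == 0L) return(logical(0))
  changed <- lapply(keys, function(k) {
    x <- d[[k]]
    c(TRUE, x[-1L] != x[-n])   # differs from the previous row in this key
  })
  Reduce(`|`, changed)         # new group if any key changed
}

starts <- group_starts(dat, c("a", "b"))
first_rows <- which(starts)    # row positions where each group begins
```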

[R] Sorting and subsetting

2010-09-20 Thread Doran, Harold
Suppose I have a data frame, such as the one below: tmp <- data.frame(index = gl(2,20), foo = rnorm(40)) And further assume it is sorted by index and then by the variable foo. tmp <- tmp[order(tmp$index, tmp$foo) , ] Now, I want to grab the first N rows of tmp for each index. In the end, what I
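
A reproducible version of the setup, with set.seed() added so runs can be compared, followed by the explicit subset-then-bind approach the question wants to avoid. How tmp1 and tmp2 were obtained is not shown in the thread; subsetting by index level is an assumption, and N = 5 is arbitrary.

```r
set.seed(1)  # added for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))

# Sort by index, then by foo within each index
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Explicit version: assumption that tmp1/tmp2 are the rows of each level
tmp1 <- tmp[tmp$index == "1", ]
tmp2 <- tmp[tmp$index == "2", ]
tmp1 <- tmp1[1:5, ]
tmp2 <- tmp2[1:5, ]
result <- rbind(tmp1, tmp2)
print(result)
```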

Re: [R] Sorting and subsetting

2010-09-20 Thread Phil Spector
Harold - Two ways that come to mind: 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,])) 2) subset(tmp,unlist(tapply(foo,index,seq))<=5) - Phil Spector Statistical Computing Facility
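
Both of Phil's suggestions, made runnable on the same seeded data. The second relies on tmp being sorted by index, because unlist(tapply(...)) returns the per-group counters in group order rather than in the original row order.

```r
set.seed(1)  # added for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# 1) Split by index, keep rows 1:5 of each piece, bind back together
res1 <- do.call(rbind, lapply(split(tmp, tmp$index), function(x) x[1:5, ]))

# 2) Number rows 1..n within each index and keep those numbered <= 5.
#    seq() acts like seq_along() here because every group has more than
#    one element; the numbering lines up with tmp's rows only because
#    tmp is sorted by index.
res2 <- subset(tmp, unlist(tapply(foo, index, seq)) <= 5)
```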

Re: [R] Sorting and subsetting

2010-09-20 Thread Doran, Harold
Very nice, Phil. Thank you. -----Original Message----- From: Phil Spector [mailto:spec...@stat.berkeley.edu] Sent: Monday, September 20, 2010 1:28 PM To: Doran, Harold Cc: R-help Subject: Re: [R] Sorting and subsetting Harold - Two ways that come to mind: 1) do.call(rbind,lapply(split(tmp

Re: [R] Sorting and subsetting

2010-09-20 Thread Tal Galili
Hi Harold, I thought of one way to do this, but maybe (probably) there is a faster way: tmp <- data.frame(index = gl(3,20), foo = rnorm(60)) subset.first.x.elements <- function(INDEX, num.of.elements = 5) { t.INDEX <- table(factor(INDEX, levels = unique(INDEX))) running.indexes <-
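
Tal's function is cut off above, so the rest of its body is unknown. The sketch below is a hypothetical table-based helper in the same spirit: it counts rows per group in order of first appearance and numbers the rows within each group with sequence(). It assumes the rows of each group are contiguous.

```r
set.seed(3)  # added for reproducibility
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Hypothetical helper (not Tal's truncated function). Returns a logical
# vector marking the first num.of.elements rows of each contiguous group.
first_n_by_table <- function(INDEX, num.of.elements = 5) {
  # Group sizes, in order of first appearance
  counts <- table(factor(INDEX, levels = unique(INDEX)))
  # 1..n_g for each group in turn
  running <- sequence(as.vector(counts))
  running <= num.of.elements
}

first5 <- tmp[first_n_by_table(tmp$index), ]
```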

Re: [R] Sorting and subsetting

2010-09-20 Thread Joshua Wiley
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector spec...@stat.berkeley.edu wrote: Harold - Two ways that come to mind: 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,])) 2) subset(tmp,unlist(tapply(foo,index,seq))<=5) 3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5,
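
The by() variant is truncated after `1:5,`. A sketch of the same approach with an explicit function in place of .Primitive("["), on seeded data: by() passes each group's rows to the function as a data frame and forwards extra arguments (here 1:5) to it.

```r
set.seed(1)  # added for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# by() calls the function once per level of index; `rows` receives 1:5
# through by()'s ... argument
res <- do.call(rbind, by(tmp, tmp$index, function(d, rows) d[rows, ], 1:5))
```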

Re: [R] Sorting and subsetting

2010-09-20 Thread Peter Dalgaard
On 09/20/2010 07:16 PM, Doran, Harold wrote: tmp1 <- tmp1[1:5,] tmp2 <- tmp2[1:5,] result <- rbind(tmp1, tmp2) Does anyone see a way to subset and subsequently bind without a loop? do.call(rbind,lapply(split(tmp,tmp$index),head,5)) index foo 1.11 1 -1.5124909 1.10 1
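
Peter's head() version on seeded data. Row names such as 1.11 in the output come from rbind() combining each list element's name (the index level) with the original row name. Unlike x[1:5, ], head(x, 5) returns fewer rows rather than rows of NA when a group has fewer than 5.

```r
set.seed(1)  # added for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# split() gives a named list of data frames (one per index level);
# head(x, 5) keeps up to 5 rows of each, rbind() stacks them
res <- do.call(rbind, lapply(split(tmp, tmp$index), head, 5))
print(res)
```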

Re: [R] Sorting and subsetting

2010-09-20 Thread David Winsemius
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote: On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector spec...@stat.berkeley.edu wrote: Harold - Two ways that come to mind: 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,])) 2) subset(tmp,unlist(tapply(foo,index,seq))<=5) 3)

Re: [R] Sorting and subsetting

2010-09-20 Thread David Winsemius
On Sep 20, 2010, at 2:01 PM, David Winsemius wrote: On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote: On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector spec...@stat.berkeley.edu wrote: Harold - Two ways that come to mind: 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,])) 2)

Re: [R] Sorting and subsetting

2010-09-20 Thread William Dunlap
Sent: Monday, September 20, 2010 10:16 AM To: R-help Subject: [R] Sorting and subsetting Suppose I have a data frame, such as the one below: tmp <- data.frame(index = gl(2,20), foo = rnorm(40)) And further assume it is sorted by index and then by the variable foo. tmp <- tmp[order(tmp$index, tmp

Re: [R] Sorting and subsetting

2010-09-20 Thread Joshua Wiley
On Mon, Sep 20, 2010 at 11:15 AM, David Winsemius dwinsem...@comcast.net wrote: On Sep 20, 2010, at 2:01 PM, David Winsemius wrote: On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote: On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector spec...@stat.berkeley.edu wrote: Harold - Two ways that

Re: [R] Sorting and subsetting

2010-09-20 Thread Peter Dalgaard
On 09/20/2010 08:01 PM, David Winsemius wrote: index foo 1.6 1 -3.0267759 1.7 1 -1.3725536 1.19 1 -1.1476048 1.16 1 -1.0963967 1.2 1 -1.0684793 2.29 2 -1.6601486 2.21 2 -1.2633632 2.22 2 -0.9875626 2.38 2 -0.9515301 2.30 2