All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the likely
source of quite a few 'out of memory' errors and performance issues in R.
data.table doesn't do that internally, and its syntax is pretty
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the likely
source of quite a few 'out of memory' errors and performance
Probably true, that's cunning, but look at base::match. The
first thing it does is coerce factor to character (an allocation
and a copy are needed internally). data.table doesn't do that
either, see data.table:::sortedmatch.
I made first basic steps towards a proper reproducible test
suite (timings.Rnw).
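For readers following along, here is a minimal base R sketch of taking the first N rows per group from sorted data without going through split(). It is not data.table's internal code and not necessarily what the posters had in mind; the seed and the names n, first_pos and rank_in_group are assumptions added for illustration. It relies on base::match, which (as noted above) coerces a factor to character first.

```r
# Example data as in the original question; the seed is an assumption,
# added only so the result is reproducible.
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

n <- 5  # hypothetical name for "N rows per index"

# match(x, x) gives the position of the first occurrence of each value.
# Because tmp is sorted by index, the distance from that position is the
# row number within the group.
first_pos <- match(tmp$index, tmp$index)
rank_in_group <- seq_along(tmp$index) - first_pos + 1L

# Keep the first n rows of every index in one vectorized subset.
res <- tmp[rank_in_group <= n, ]
```

This only works when the rows of each index are contiguous, which the sort above guarantees.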
On Sep 21, 2010, at 16:27 , Joshua Wiley wrote:
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the likely
source
See data.table:::duplist which does that (or at least very similar) in C,
for multiple columns too.
Matthew
http://datatable.r-forge.r-project.org/
peter dalgaard pda...@gmail.com wrote in message
news:660991c3-b52b-4d58-b819-eadc95ecc...@gmail.com...
On Sep 21, 2010, at 16:27 , Joshua
Suppose I have a data frame, such as the one below:
tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
And further assume it is sorted by index and then by the variable foo.
tmp <- tmp[order(tmp$index, tmp$foo) , ]
Now, I want to grab the first N rows of tmp for each index. In the end, what I
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
- Phil Spector
Statistical Computing Facility
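A runnable sketch of the two suggestions above, using the example data from the original question. The seed and the names res1 and res2 are assumptions added for illustration. Approach 2 depends on tmp already being sorted by index, since unlist(tapply(...)) returns the row numbers in group order.

```r
# Example data from the question; the seed is an assumption for reproducibility.
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# 1) Split by index, take the first five rows of each piece, bind them back.
res1 <- do.call(rbind, lapply(split(tmp, tmp$index), function(x) x[1:5, ]))

# 2) Number the rows within each index and keep those numbered 1 to 5.
res2 <- subset(tmp, unlist(tapply(foo, index, seq)) <= 5)
```

Both return the same ten rows here; only the row names differ, because rbind() prefixes them with the group label in the first approach.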
Very nice, Phil. Thank you.
-----Original Message-----
From: Phil Spector [mailto:spec...@stat.berkeley.edu]
Sent: Monday, September 20, 2010 1:28 PM
To: Doran, Harold
Cc: R-help
Subject: Re: [R] Sorting and subsetting
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp
Hi Harold,
I thought of one way to do this, but maybe (probably) there is a faster way:
tmp <- data.frame(index = gl(3,20), foo = rnorm(60))
subset.first.x.elements <- function(INDEX, num.of.elements = 5)
{
t.INDEX <- table(factor(INDEX, levels = unique(INDEX)))
running.indexes <-
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
spec...@stat.berkeley.edu wrote:
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5,
On 09/20/2010 07:16 PM, Doran, Harold wrote:
tmp1 <- tmp1[1:5,]
tmp2 <- tmp2[1:5,]
result <- rbind(tmp1, tmp2)
Does anyone see a way to subset and subsequently bind without a loop?
do.call(rbind,lapply(split(tmp,tmp$index),head,5))
     index        foo
1.11     1 -1.5124909
1.10     1
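One practical difference between the head() variant above and the x[1:5,] form used elsewhere in the thread shows up when a group has fewer than five rows. The sketch below uses a small made-up data frame (small, by_bracket and by_head are hypothetical names, not from the thread) to illustrate it.

```r
# Made-up data: index 1 has only 3 rows, index 2 has 6.
small <- data.frame(index = factor(rep(1:2, times = c(3, 6))), foo = 1:9)

# Indexing with 1:5 pads a short group with all-NA rows.
by_bracket <- do.call(rbind,
                      lapply(split(small, small$index), function(x) x[1:5, ]))

# head() returns at most 5 rows and never pads.
by_head <- do.call(rbind, lapply(split(small, small$index), head, 5))

nrow(by_bracket)  # 5 rows per index, two of them NA padding for index 1
nrow(by_head)     # 3 rows for index 1 plus 5 for index 2
```

With the 20-row groups of the original example the two forms agree, so this only matters for unbalanced data.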
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
spec...@stat.berkeley.edu wrote:
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
3)
On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
spec...@stat.berkeley.edu wrote:
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2)
Sent: Monday, September 20, 2010 10:16 AM
To: R-help
Subject: [R] Sorting and subsetting
Suppose I have a data frame, such as the one below:
tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
And further assume it is sorted by index and then by the variable foo.
tmp <- tmp[order(tmp$index, tmp
On Mon, Sep 20, 2010 at 11:15 AM, David Winsemius
dwinsem...@comcast.net wrote:
On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
spec...@stat.berkeley.edu wrote:
Harold -
Two ways that
On 09/20/2010 08:01 PM, David Winsemius wrote:
     index        foo
1.6      1 -3.0267759
1.7      1 -1.3725536
1.19     1 -1.1476048
1.16     1 -1.0963967
1.2      1 -1.0684793
2.29     2 -1.6601486
2.21     2 -1.2633632
2.22     2 -0.9875626
2.38     2 -0.9515301
2.30     2