Hi Harold,
Generally: you cannot beat data.table, unless you can represent your
data in a matrix (or array or vector). For some specific cases, Hervé's
suggestion might also be competitive.
Your problem is that you did not put any effort into reading at least
part of the very extensive ...
On 09/28/2016 02:53 PM, Hervé Pagès wrote:
Hi,
I'm surprised nobody suggested split(). Splitting the data.frame
upfront is faster than repeatedly subsetting it:
tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
idList <- unique(tmp$id)
system.time(for (i in idList) tmp[which(tmp$id == i),])
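Hervé's point can be sketched as follows (a minimal benchmark on the same toy data; the timings themselves will vary by machine):

```r
## Build the same toy data as above, then compare repeated
## subsetting against a single upfront split().
tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
idList <- unique(tmp$id)

## Repeated subsetting: scans tmp$id once per id.
system.time(for (i in idList) tmp[which(tmp$id == i), ])

## Splitting upfront: one pass over the data, then cheap list lookups.
tmpList <- split(tmp, tmp$id)
system.time(for (i in idList) tmpList[[as.character(i)]])
```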
"I'm surprised nobody suggested split(). "
I did.
by() is a data-frame-oriented version of tapply(), which uses split().
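For completeness, a by() call over the same toy data might look like this (a sketch; by() splits the data frame on id internally and returns a list-like result, one element per group):

```r
tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))

## Apply a per-group summary; by() handles the split on tmp$id itself.
res <- by(tmp, tmp$id, function(d) mean(d$foo))
res
```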
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" blog)
> From: Dominik Schneider [mailto:dosc3...@colorado.edu]
> Sent: Wednesday, September 28, 2016 12:27 PM
> To: Doran, Harold <hdo...@air.org>
> Cc: r-help@r-project.org
> Subject: Re: [R] Faster Subsetting
>
I regularly crunch through this amount of data with tidyverse. You can also
try the data.table package. They are optimized for speed, as long as you
have the memory.
Dominik
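The data.table route Dominik mentions can be sketched like this (an illustration assuming the data.table package is installed; setkey() sorts the table once so later subsets use a binary search instead of a full vector scan):

```r
library(data.table)

tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
tmp2 <- as.data.table(tmp)
setkey(tmp2, id)          # sort once; enables binary-search subsetting

idList <- unique(tmp$id)
system.time(replicate(500, tmp2[.(idList[1])]))  # keyed lookup on id
```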
On Wed, Sep 28, 2016 at 10:09 AM, Doran, Harold wrote:
> I have an extremely large data frame (~13 million rows) ...
On Wed, 28 Sep 2016, "Doran, Harold" writes:
> I have an extremely large data frame (~13 million rows) that resembles
> the structure of the object tmp below in the reproducible code. In my
> real data, the variable, 'id' may or may not be ordered, but I think
> that is irrelevant. I have a process that requires subsetting the data
> by id and then ...
... each time compared to the indexing method.
Perhaps I'm using it incorrectly?
-Original Message-
From: Constantin Weiser [mailto:constantin.wei...@hhu.de]
Sent: Wednesday, September 28, 2016 12:55 PM
To: r-help@r-project.org
Cc: Doran, Harold <hdo...@air.org>
Subject: Re: [R] Faster Subsetting
> tmp2 <- as.data.table(tmp) # data.table
>
> system.time(replicate(500, tmp2[which(tmp$id == idList[1]),]))
>
> system.time(replicate(500, subset(tmp2, id == idList[1])))
>
Hello,
If you work with a matrix instead of a data.frame, it usually runs
faster, but your column vectors must all be numeric.
### Fast, but not fast enough
system.time(replicate(500, tmp[which(tmp$id == idList[1]),]))
user system elapsed
0.05    0.00    0.04
### Not fast at all, ...
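Constantin's matrix suggestion can be sketched like this (an illustration under his stated assumption that every column is numeric; row selection on a matrix skips the data.frame method dispatch and attribute copying):

```r
## Same toy data, stored as a numeric matrix instead of a data.frame.
tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
idList <- unique(tmp$id)
m <- as.matrix(tmp)   # valid here because both columns are numeric

## Matrix row subsetting: plain logical indexing on a numeric matrix.
system.time(replicate(500, m[m[, "id"] == idList[1], , drop = FALSE]))
```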
I have an extremely large data frame (~13 million rows) that resembles the
structure of the object tmp below in the reproducible code. In my real data,
the variable, 'id' may or may not be ordered, but I think that is irrelevant.
I have a process that requires subsetting the data by id and then ...