You may want to try isplit (from the iterators package). Combined with
foreach, it's an efficient way of iterating through a data frame by groups
of rows defined by common values of a column (which I think is what you're
after). If you have a multiprocessor system, you can speed things up
further by using the doMC package to run the iterations in parallel.
There's an example here:
http://blog.revolution-computing.com/2009/08/blockprocessing-a-data-frame-with-isplit.html
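
As a rough sketch of the idea (not the code from the blog post), the
group count from your question might look something like this. The data
frame below is a made-up stand-in that reuses your my.df and my.field
names; the "value" column and the "counts" and "chunk" names are
assumptions for illustration only.

```r
library(iterators)
library(foreach)

# Hypothetical stand-in for the original poster's data frame
set.seed(1)
my.df <- data.frame(my.field = sample(letters[1:5], 1000, replace = TRUE),
                    value = rnorm(1000),
                    stringsAsFactors = FALSE)

# isplit() returns an iterator; each element is a list with
# $value (the rows of one group) and $key (a list holding the group label)
it <- isplit(my.df, my.df$my.field)

# Visit one group at a time and collect one row of results per group
counts <- foreach(chunk = it, .combine = rbind) %do% {
  data.frame(my.field = chunk$key[[1]], n = nrow(chunk$value))
}

print(counts)
```

To run the iterations in parallel, the usual pattern is to load doMC,
call registerDoMC(), and replace %do% with %dopar%. Whether that pays off
depends on how much work is done per group.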

Hope this helps,
David Smith

On Wed, Sep 2, 2009 at 3:39 PM, Leo Alekseyev <dnqu...@gmail.com> wrote:

> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group.  I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> the split() function was slow (I killed it before it completed).  Is
> there a way to efficiently accomplish this in R?..  I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
David M Smith <da...@revolution-computing.com>
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at
www.revolution-computing.com/events

