"Steve Lianoglou" <mailinglist.honey...@gmail.com> wrote in message news:t2ybbdc7ed01004290812n433515b5vb15b49c170f5a...@mail.gmail.com...
> Thanks for directing me to the data.table package. I read through some
> of the vignettes, and it looks quite nice.
>
> While your sample code would provide an answer if I wanted to just
> compute some summary statistic/function of groups of my data.frame
> (using `by=symbol`), what's the best way to produce several pieces of
> info per subset?
>
> For instance, I see that I can do something like this:
>
> summaries[, list(counts=sum(counts), width=sum(exon.width)), by=symbol]

Yes, that's it.

> But what if I need to do some more complex processing within the
> subsets defined in `by=symbol` -- like several lines of programming
> logic for 1 result, say.
>
> I guess I can open a new block that just returns a data.table? Like:
>
> summaries[, {
>   cnts <- sum(counts)
>   ew <- sum(exon.width)
>   # ... some complex things
>   complex <- # .. result of complex things
>   data.table(counts=cnts, width=ew, cplx=complex)
> }, by=symbol]
>
> Is that right? (I mean, it looks like it's working, but maybe there's
> a more idiomatic way(?))

Yes, you've got it. Rather than returning a data.table at the end, though,
just return a list; it's faster. Shorter vectors will still be recycled to
match any longer ones.

Or just this:

summaries[, list(
    counts = sum(counts),
    width  = sum(exon.width),
    cplx   = # .. result of complex things
), by=symbol]

It sounds like it's working, but could you give us an idea of whether it is
quick and memory efficient?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
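A minimal, self-contained sketch of the list-returning `{ ... }` block recommended in the reply. It assumes a small hypothetical `summaries` table (the poster's real data isn't shown), and `cnts / ew` is only an illustrative stand-in for the elided "complex things". The second call shows a length-1 result being recycled across each group's rows; as far as I know, current data.table versions recycle only length-1 columns, so other mismatched lengths raise an error rather than being recycled.

```r
library(data.table)

# Hypothetical toy data standing in for the poster's `summaries` table
summaries <- data.table(
  symbol     = c("A", "A", "B", "B", "B"),
  counts     = c(10L, 5L, 3L, 7L, 2L),
  exon.width = c(100L, 50L, 30L, 20L, 10L)
)

# Multi-line j: intermediate values are computed per group, and the
# final expression (a plain list, not a data.table) becomes the result row.
res <- summaries[, {
  cnts <- sum(counts)
  ew   <- sum(exon.width)
  # Hypothetical placeholder for "some complex things"
  cplx <- cnts / ew
  list(counts = cnts, width = ew, cplx = cplx)
}, by = symbol]
print(res)

# A scalar (sum) alongside a per-row vector: the length-1 `total`
# is recycled to the number of rows in each group.
per_row <- summaries[, list(total = sum(counts), each = counts), by = symbol]
print(per_row)
```

Groups come back in order of first appearance, one row per group for `res` and one row per input row for `per_row`.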