On Nov 19, 2012, at 1:25 PM, Sam Steingold wrote:

> Thanks Steve,
> what is the analogue of .N for min and max?

?seq

> i.e., what is the data.table's version of
> aggregate(infl$delay,by=list(infl$share.id),FUN=min)
> aggregate(infl$delay,by=list(infl$share.id),FUN=max)

> DT[, list( max(v)), by=x]
   x V1
1: a  3
2: b  6
3: c  9

> thanks!
> Sam.
>
> On Fri, Sep 14, 2012 at 3:40 PM, Steve Lianoglou
> <[email protected]> wrote:
>> Hi,
>>
>> On Fri, Sep 14, 2012 at 3:26 PM, Sam Steingold <[email protected]> wrote:
>>> I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17
>>> columns).
>>> I want to get the result of
>>> table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
>>> alas, aggregate has been running for ~30 minute, RSS is 14G, VIRT is
>>> 24.3G, and no end in sight.
>>> both V1 and V2 are characters (not factors).
>>> Is there anything I could do to speed this up?
>>> Thanks.
>>
>> You might find you'll get a lot of mileage out of data.table when
>> working with such large data.frames ...
>>
>> To get something close to what you're after, you can try:
>>
>> R> library(data.table)
>> R> Z <- as.data.table(Z)
>> R> setkeyv(Z, 'V2')
>> R> agg <- Z[, list(count=.N), by='V2']
>>
>> From here you might
>>
>> R> tab1 <- table(agg$count)
>>
>> I think that'll get you where you want to be ... I'm ashamed to say
>> that I haven't really done much w/ aggregate since I mostly have used
>> plyr and data.table like stuff, so I might be missing your end goal --
>> providing a reproducible example with a small data.frame from you can
>> help here (for me at least).
>>
>> HTH,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>
>
> --
> Sam Steingold <http://sds.podval.org> <http://www.childpsy.net/>
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA
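
For the min/max question, a minimal sketch may help. It assumes a small made-up table standing in for infl: only the column names share.id and delay come from the question, and the values are invented for illustration. As far as I know there is no special symbol like .N for these; ordinary min() and max() in j are simply evaluated once per group:

```r
library(data.table)

# Toy data standing in for 'infl'. The column names share.id and delay
# are taken from the question; the values are made up for illustration.
infl <- data.table(
  share.id = c("a", "a", "b", "b", "b", "c"),
  delay    = c(5, 2, 9, 4, 7, 1)
)

# One grouped call: row count (.N), minimum and maximum of delay.
# min() and max() are plain R functions applied to each group's rows.
agg <- infl[, list(count     = .N,
                   min.delay = min(delay),
                   max.delay = max(delay)),
            by = share.id]

print(agg)
```

Computing both summaries in one call should avoid scanning the groups twice, which is the rough equivalent of the two aggregate() calls in the question.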

