Simon,

Absolutely, the post was about RDS, but R is all about choices, and the underlying issue was the time it takes to read in data, which fread and feather are quite fast at. I assume that when you say efficient you are referring to disk space?
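For anyone who wants to check the trade-off on their own machine, here is a minimal sketch in base R. The example data frame, its size, and the file names are made up for illustration. It times a single read of the same data from a compressed RDS, an uncompressed RDS, and a plain CSV, and reports the size of each file on disk. It is a rough sketch under those assumptions, not a proper benchmark.

```r
# Hypothetical example data: one numeric column and one string column.
set.seed(1)
n <- 1e5
df <- data.frame(
  x = rnorm(n),
  s = sample(c("alpha", "beta", "gamma"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Temporary files (the names are arbitrary).
f_rds_gz  <- tempfile(fileext = ".rds")
f_rds_raw <- tempfile(fileext = ".rds")
f_csv     <- tempfile(fileext = ".csv")

saveRDS(df, f_rds_gz)                     # default: gzip compression
saveRDS(df, f_rds_raw, compress = FALSE)  # no compression
write.csv(df, f_csv, row.names = FALSE)

# Time one read of each file; "elapsed" is wall-clock seconds.
read_sec <- c(
  rds_compressed   = system.time(a <- readRDS(f_rds_gz))[["elapsed"]],
  rds_uncompressed = system.time(b <- readRDS(f_rds_raw))[["elapsed"]],
  csv_base         = system.time(d <- read.csv(f_csv, stringsAsFactors = FALSE))[["elapsed"]]
)

# File sizes in megabytes, in the same order as the timings.
size_mb <- file.size(c(f_rds_gz, f_rds_raw, f_csv)) / 1024^2

results <- data.frame(read_sec = read_sec, size_mb = round(size_mb, 2))
print(results)
```

If data.table or feather is installed, fread()/fwrite() or read_feather()/write_feather() could be timed the same way by adding one more entry to read_sec. A single read is noisy, so repeating each timing a few times would give a fairer picture.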

I put together a script to look at this further, with and without compression*. If speed is a priority over disk space, then Feather and data.table (CSV) are good options**. CSV is portable to any system, and Feather can be used from Python/Julia. RDS/RDA save a lot of space but are slower to write and read due to compression.

I hope that's helpful to those thinking about their priorities for file IO in R.

Brandon

* http://rpubs.com/bhive01/fileioinr
** Writing a CSV with data.table is freaky fast if you can get OpenMP working on your machine:
https://github.com/Rdatatable/data.table/issues/1692
Reading that same CSV is comparable to RDS.

On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek <[email protected]> wrote:
> Brandon,
> note that the post was about RDS which is more efficient than all the options
> you list (in particular when not compressed). General advice is to avoid
> strings. Numeric vectors are several orders of magnitude faster than strings
> to load/save.
> Cheers,
> Simon
>
>
>> On May 5, 2016, at 6:49 PM, Brandon Hurr <[email protected]> wrote:
>>
>> You might be interested in the speed wars that are happening in the
>> file reading/writing space currently.
>>
>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>> McKinney's Feather have made huge speed advances in reading/writing
>> large datasets from disks (mostly csv).
>>
>> Data Table fread()/fwrite():
>> https://github.com/Rdatatable/data.table
>> https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
>> http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/
>>
>> Feather read_feather()/write_feather()
>> https://github.com/wesm/feather
>>
>> I don't often have big datasets (10s of MBs) so I don't see the
>> benefits of these much, but you might.
>>
>> HTH,
>> B
>>
>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio
>> <[email protected]> wrote:
>>> Been a while, but wanted to close the page on a previous post describing R
>>> hanging on readRDS() and load() for largish (say 500MB or larger) files.
>>> Tried again with recent release (3.3.0). Am able to read in large files
>>> under El Cap. While the file is reading in, I get a disconcerting spinning
>>> pinwheel of death and a check under Force Quit reports R is not responding.
>>> But if I wait it out, it eventually reads in. Odd. But I can live with
>>> it.
>>>
>>> Cheers
>>>
>>> Charles
>>>
>>> Charles DiMaggio, PhD, MPH
>>> Professor of Surgery and Population Health
>>> Director of Injury Research
>>> Department of Surgery
>>> New York University School of Medicine
>>> 462 First Avenue, NBV 15
>>> New York, NY 10016-9196
>>> [email protected]
>>> Office: 212.263.3202
>>> Mobile: 516.308.6426
>>>
>>> _______________________________________________
>>> R-SIG-Mac mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
