Simon,

It absolutely was about RDS, but R is all about choices, and the
underlying issue was the time to read in data, which fread() and
feather are quite fast at. I assume that by "efficient" you are
referring to disk space?

I put together a script to look at this further, with and without
compression*. If speed is a priority over disk space, then Feather and
data.table (CSV) are good options**. CSV is portable to any system,
and feather can be read from Python/Julia. RDS/RDA save a lot of space
but are slower to write and read due to compression.
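As a rough sketch of the kind of comparison the script below runs (the
sample data and file names here are illustrative, not the actual
benchmark), using the data.table and feather packages from CRAN:

```r
# Compare write/read speed of CSV (data.table), feather, and RDS
# on the same data frame.
library(data.table)
library(feather)

df <- data.frame(x = runif(1e6), y = runif(1e6))

# data.table: fwrite()/fread() on CSV -- portable to any system
system.time(fwrite(df, "test.csv"))
system.time(fread("test.csv"))

# feather: columnar format, also readable from Python/Julia
system.time(write_feather(df, "test.feather"))
system.time(read_feather("test.feather"))

# RDS: compressed by default (smaller on disk, slower I/O);
# compress = FALSE trades disk space back for speed
system.time(saveRDS(df, "test.rds"))
system.time(readRDS("test.rds"))
system.time(saveRDS(df, "test_nocomp.rds", compress = FALSE))
```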

I hope that's helpful to those thinking about their priorities for
file IO in R.

Brandon

* http://rpubs.com/bhive01/fileioinr
** writing a CSV with data.table is freaky fast if you can get OpenMP
working on your machine
(https://github.com/Rdatatable/data.table/issues/1692). Reading that
same CSV back in is comparable to RDS.
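A quick way to check whether data.table is actually using OpenMP on
your machine (a diagnostic sketch, assuming a recent data.table build;
on macOS with the default clang toolchain this often reports a single
thread):

```r
library(data.table)

# Number of threads data.table will use; 1 usually means OpenMP
# was not available when the package was compiled.
getDTthreads()

# Optionally pin the thread count for a run
setDTthreads(2)
```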


On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek
<[email protected]> wrote:
> Brandon,
> note that the post was about RDS which is more efficient than all the options 
> you list (in particular when not compressed). General advice is to avoid 
> strings. Numeric vectors are several orders of magnitude faster than strings 
> to load/save.
> Cheers,
> Simon
>
>
>> On May 5, 2016, at 6:49 PM, Brandon Hurr <[email protected]> wrote:
>>
>> You might be interested in the speed wars that are happening in the
>> file reading/writing space currently.
>>
>> Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes
>> McKinney's Feather have made huge speed advances in reading/writing
>> large datasets from disks (mostly csv).
>>
>> Data Table fread()/fwrite():
>> https://github.com/Rdatatable/data.table
>> https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
>> http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/
>>
>>
>> Feather read_feather()/write_feather()
>> https://github.com/wesm/feather
>>
>> I don't often have big datasets (10s of MBs) so I don't see the
>> benefits of these much, but you might.
>>
>> HTH,
>> B
>>
>> On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio
>> <[email protected]> wrote:
>>> Been a while, but wanted to close the page on a previous post describing R 
>>> hanging on readRDS() and load() for largish (say 500MB or larger) files. 
>>> Tried again with recent release (3.3.0).  Am able to read in large files 
>>> under El Cap.  While the file is reading in, I get a disconcerting spinning 
>>> pinwheel of death and a check under Force Quit reports R is not responding. 
>>>  But if I wait it out, it eventually reads in.  Odd.  But I can live with 
>>> it.
>>>
>>> Cheers
>>>
>>> Charles
>>>
>>>
>>>
>>>
>>>
>>>
>>> Charles DiMaggio, PhD, MPH
>>> Professor of Surgery and Population Health
>>> Director of Injury Research
>>> Department of Surgery
>>> New York University School of Medicine
>>> 462 First Avenue, NBV 15
>>> New York, NY 10016-9196
>>> [email protected]
>>> Office: 212.263.3202
>>> Mobile: 516.308.6426
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> R-SIG-Mac mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>
>>
>
