Re: [Rd] Memory allocation in read.table

Simon Urbanek Wed, 28 Aug 2013 11:11:16 -0700

On Aug 28, 2013, at 1:59 PM, Hadley Wickham wrote:

>>> Why do those lines need any allocations? I thought class<- and attr<-
>>> were primitives, and hence would modify in place.
>>> 
>> 
>> .. but only if there is no other reference to the data (i.e. NAMED < 2). If
>> there are two references, they have to copy, because modifying in place
>> would also change the value seen through the other reference.
>> Here, however, it already has NAMED=2 because of
>> 
>> data <- data[keep]
> 
> Ah, got it - thanks!
> 
>> PS: if you are loading any sizable data, the one thing you don't want to do 
>> is to use read.table() ;)
> 
> Yes ;)  Romain and I (mostly Romain) are working on some faster
> alternatives at https://github.com/romainfrancois/fastread.
> 
> One surprising finding so far (to me at least), is that when loading a
> file full of doubles, you pretty quickly get to the point where strtod
> is the bottleneck.
>


Yup - parsing is the most expensive part. That's why you don't want to use an
ASCII representation for high-throughput data. It's amazing that disk speeds
are now so high that the CPU is the bottleneck, not the other way around.
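
To illustrate the ASCII vs. binary point, here is a minimal sketch in C. It is
not taken from read.table, fastread or iotools; the helper names
parse_ascii_doubles and load_binary_doubles are made up for this example, and
the binary path assumes the buffer was written on the same platform (same
endianness and double format).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse up to `max` whitespace-separated doubles from
 * a NUL-terminated ASCII buffer. Every value goes through strtod, which has
 * to skip whitespace, scan digits and handle exponents. */
static size_t parse_ascii_doubles(const char *text, double *out, size_t max)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    const char *p = text;
    while (n < max) {
        char *end;
        double v = strtod(p, &end);
        if (end == p)      /* no conversion: end of input or a non-number */
            break;
        out[n++] = v;
        p = end;           /* continue right after the parsed number */
    }
    return n;
}

/* Hypothetical helper: "load" doubles stored in native binary form.
 * No per-value parsing is needed, only a block copy. */
static size_t load_binary_doubles(const void *buf, size_t nbytes,
                                  double *out, size_t max)
{
    assert(buf != NULL && out != NULL);
    size_t n = nbytes / sizeof(double);
    if (n > max)
        n = max;
    memcpy(out, buf, n * sizeof(double));
    return n;
}
```

Calling parse_ascii_doubles("1.5 -2.25 3e2\n4\n", a, 8) fills a with four
values, at the cost of one strtod call per value. load_binary_doubles gets the
same values from their in-memory bytes with a single memcpy, which is roughly
why a binary representation takes the CPU out of the critical path.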

Re fast loading - yes, that's something I have also been working on in iotools:
https://github.com/s-u/iotools

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
