On 27/08/2014 14:43, Simon Urbanek wrote:
Mario,

On Aug 27, 2014, at 4:03 AM, Mario Emmenlauer <ma...@emmenlauer.de> wrote:


Hello,

I'm very new to R and don't know much about it yet. I would like
to develop R-programs that work with data of sizes of 10^10 - 10^11
data points. We have very-high-memory machines with ~256 GB, but it
would significantly help if I could store the data points in single
precision in RAM instead of double precision. Is that possible?


You can (e.g. in raw vectors), but it may not help much: you can't operate
on them directly, because no functions in R know how to deal with
single-precision floats, and all arithmetic is done on double-precision
vectors. If you want to load the data in memory but only work on small pieces,
then it would work, since you could extract the piece, convert it to doubles
and carry on.
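
As a rough illustration of the raw-vector approach, here is a minimal sketch that relies on base R's writeBin() and readBin() with size = 4. The helper names pack_single and unpack_single are made up for this example, and it has only been thought through for small vectors; whether it scales to 10^10 elements is untested.

```r
# Hypothetical helpers (not part of base R): pack doubles into a raw vector
# as 4-byte floats, then pull out a slice and convert it back to doubles.
pack_single <- function(x) writeBin(as.double(x), raw(), size = 4L)

unpack_single <- function(r, from = 1, to = length(r) %/% 4) {
  # Byte offsets are computed in double arithmetic to avoid integer overflow.
  first <- 4 * (from - 1) + 1
  last  <- 4 * to
  readBin(r[first:last], what = "double", n = to - from + 1, size = 4L)
}

x <- c(0.5, 1.25, 3, 1e-3)
r <- pack_single(x)                 # 4 bytes per element instead of 8
piece <- unpack_single(r, 2, 3)     # elements 2..3, now ordinary doubles
doubled <- piece * 2                # regular double-precision arithmetic
```

Values that are not exactly representable in single precision (such as 1e-3 here) come back slightly rounded, so a round trip is only accurate to about seven significant digits.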

We have almost no idea what you want to do with the data, but in my experience datasets of a billion cases are best divided into homogeneous groups for analysis followed by a meta-analysis. I've yet to see an example where storing the data in an efficient RDBMS and loading sections into multiple R sessions did not make a better workflow. They may exist: they are not the norm.

And BTW, 256GB is not really a lot of RAM, and storing as floats would only halve the footprint.
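
For scale, the footprint can be worked out directly. This is plain arithmetic on the sizes mentioned in the question, assuming 8 bytes per double, 4 bytes per single and decimal gigabytes, and ignoring any working copies R makes during computation.

```r
# Storage needed for the data alone, in decimal GB (1 GB = 1e9 bytes).
n_points  <- c(1e10, 1e11)
double_gb <- n_points * 8 / 1e9   # 80 and 800 GB
single_gb <- n_points * 4 / 1e9   # 40 and 400 GB
footprint <- data.frame(points = n_points,
                        double_GB = double_gb,
                        single_GB = single_gb)
print(footprint)
```

At the upper end of 10^11 points, even the single-precision copy exceeds a 256 GB machine.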


In the documentation I found a sentence saying it's not supported,
at least not out of the box. But I am quite desperate and would also
consider working with an alpha version or with extension packages?

Ideally I would like type promotion to work, i.e. when the data are
used in math operations, they should be promoted to double.


That won't work automatically that way, but you could write methods for
operators on your new type class and implement it as coercion + call to the
regular operators. You may take a hint from the 64-bit int packages and I dimly
recall that some of the mem-mapping packages (bigmemory, ff, ..) may also
support single-precision storage.
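
A minimal sketch of the "coercion + call to the regular operators" idea, using an S3 Ops group method. The class name single32 and the constructor as_single are invented for this example and do not come from any package; results are returned as plain doubles rather than re-packed.

```r
# Hypothetical S3 class: a list holding 4-byte floats in a raw vector.
as_single <- function(x) {
  structure(list(bytes = writeBin(as.double(x), raw(), size = 4L)),
            class = "single32")
}

# Coercion back to ordinary doubles.
as.double.single32 <- function(x, ...) {
  readBin(x$bytes, what = "double", n = length(x$bytes) %/% 4L, size = 4L)
}

# Group method for arithmetic and comparison operators: promote any
# single32 operand to double, then call the regular operator.
Ops.single32 <- function(e1, e2) {
  op <- get(.Generic)
  if (inherits(e1, "single32")) e1 <- as.double(e1)
  if (missing(e2)) return(op(e1))          # unary minus / plus
  if (inherits(e2, "single32")) e2 <- as.double(e2)
  op(e1, e2)
}

s <- as_single(c(1.5, 2.5))
s + 1      # promoted to double before the addition
2 * s      # also works with the single32 object on the right
```

Each operation materialises a full double-precision copy of its operands, so this only saves memory while the data are at rest, not during the computation.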

Cheers,
Simon



Any help is greatly appreciated! All the best,

    Mario



--
Mario Emmenlauer BioDataAnalysis             Mobil: +49-(0)151-68108489
Balanstrasse 43                    mailto: mario.emmenlauer * unibas.ch
D-81669 München                          http://www.marioemmenlauer.de/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
