I'm trying to read in datasets with roughly 150,000 rows and 600 features. I wrote a function using scan() to read them in (I have a 4GB Linux machine) and it works like a charm. Unfortunately, converting the scanned list into a data.frame with as.data.frame() causes memory usage to explode (it can go from 300MB for the scanned list to 1.4GB for a data.frame of just 30,000 rows), and it fails claiming it cannot allocate memory, even though it is still nowhere near the 3GB per-process limit on my Linux box (the message is "unable to allocate vector of size 522K").
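In case it helps, here is a minimal sketch of what my reading code looks like (the file name, separator and column count below are placeholders; the real file has ~600 numeric columns):

    ## scan() with a list template reads one list component per column
    read.big <- function(file, ncols = 600) {
        dat <- scan(file, what = as.list(rep(0, ncols)), sep = "\t")
        names(dat) <- paste("V", seq_len(ncols), sep = "")
        dat
    }

    x  <- read.big("mydata.txt")   # the scanned list, roughly 300MB
    df <- as.data.frame(x)         # this is where memory usage blows up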
So I have three questions:

1) Why is it failing even though there seems to be enough memory available?
2) Why does converting the list into a data.frame cause memory usage to explode? Am I using as.data.frame() wrongly? Should I be using some other command?
3) All the model fitting packages seem to want a data.frame as their input. If I cannot convert my list into a data.frame, what can I do? Is there any way around this?

Many thanks!
Nawaaz
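P.S. For context on question 3, the kind of call I eventually want to make is along these lines (the formula and model are purely illustrative):

    fit <- lm(V1 ~ ., data = df)   # or glm(), etc. -- they all expect a data.frame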
