Hi,

Here is what I want to do. I have a dataset containing 4.2 *million* rows and about 10 columns, and I want to do some statistics with it, mainly using it as a prediction set for GAM and GLM models. I tried to load it from a CSV file, but after filling up memory and part of the swap (1 GB each), I get a segmentation fault and R stops. I use R under Linux. Here are my questions (a sketch of the sort of chunked approach I have in mind follows the questions):

1) Has anyone ever tried to use such a big dataset?
2) Do you think it would be possible on a more powerful machine, such as a cluster of computers?
3) Finally, does R have some "memory limitation", or does it just depend on the machine I'm using?
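
In case it helps to see what I mean by question 1, here is a rough sketch of the chunked approach I am considering: read the CSV a piece at a time, call predict() on each piece, and append the results to an output file, so that only one chunk of the 4.2 million rows is in memory at once. This is only a sketch under some assumptions: the file is plain comma-separated with an unquoted header row, "mydata.csv", "predictions.csv" and the chunk size are placeholders, and 'fit' stands for a GLM/GAM fitted beforehand on a smaller training set.

con <- file("mydata.csv", open = "r")
hdr <- strsplit(readLines(con, n = 1), ",")[[1]]  # column names from header
chunk.size <- 100000                              # placeholder chunk size
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk.size, col.names = hdr),
    error = function(e) NULL)                     # connection exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  yhat <- predict(fit, newdata = chunk, type = "response")
  write.table(cbind(chunk, yhat), "predictions.csv", sep = ",",
              append = TRUE, row.names = FALSE, col.names = FALSE)
  if (nrow(chunk) < chunk.size) break             # last, partial chunk
}
close(con)

The point is just that memory use would be bounded by the chunk size rather than by the whole file.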


Best wishes

Fabien Fivaz

