On 3/3/2006 2:42 PM, Berton Gunter wrote:
> What you propose is not really a solution, as even if your data set
> didn't break the modified precision, another would. And of course,
> there is a price to be paid for reduced numerical precision.
>
> The real issue is that R's current design is incapable of dealing with
> data sets larger than what can fit in physical memory (expert
> comment/correction?).
It can deal with big data sets, just not nearly as conveniently as it
deals with ones that fit in memory. The most straightforward way is
probably to put them in a database, and use RODBC or one of the
database-specific packages to read the data in blocks. (You could also
leave the data in a flat file and read it a block at a time from there,
but the database is probably worth the trouble: other people have
already done the work involved in sorting, selecting, etc.)

The main problem you'll run into is that almost none of the R functions
know about databases, so you'll end up doing a lot of work rewriting
your algorithms to work one block at a time, or on a random sample of
the data, or whatever.

The original poster didn't say what he wanted to do with his data, but
if he only needs to work with a few variables at a time, he can easily
fit an 820,000 x N data frame in memory, for small values of N. Reading
it in from a database would be easy. (A short sketch of both approaches
is at the end of this message, below the quoted text.)

Duncan Murdoch

> My understanding is that there is no way to change this without a
> fundamental redesign of R. This means that you must either live with
> R's limitations or use other software for "large" data sets.
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific
> learning process." - George E. P. Box
>
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Dimitri Joe
>> Sent: Friday, March 03, 2006 11:28 AM
>> To: R-Help
>> Subject: [R] memory once again
>>
>> Dear all,
>>
>> A few weeks ago, I asked this list why small Stata files became huge
>> R files. Thomas Lumley said it was because "Stata uses
>> single-precision floating point by default and can use 1-byte and
>> 2-byte integers. R uses double precision floating point and four-byte
>> integers." And it seemed I couldn't do anything about it.
>>
>> Is it true? I mean, isn't there a (more or less simple) way to change
>> how R stores data (maybe by changing the source code and recompiling
>> it)?
>>
>> The reason I insist on this point is that I am trying to work with a
>> data frame of more than 820,000 observations and 80 variables. The
>> Stata file is 150 MB. With my Pentium IV 2 GHz with 1 GB RAM, running
>> Windows XP, I couldn't do the import using the read.dta() function
>> from package foreign. With Stat Transfer I managed to convert the
>> Stata file to an S file of 350 MB, but my machine still didn't manage
>> to import it using read.S().
>>
>> I even tried to "increase" my memory with memory.limit(4000), but it
>> still didn't work.
>>
>> Regardless of the answer to my question, I'd appreciate hearing about
>> your experience/suggestions for working with big files in R.
>>
>> Thank you for youR-Help,
>>
>> Dimitri Szerman
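P.S. To make the block-at-a-time suggestion concrete, here is a minimal
sketch using RODBC. The data source name ("mydb"), table name
("survey"), column names, and block size are all invented for the
example; substitute whatever your database actually contains.

  library(RODBC)
  con <- odbcConnect("mydb")   # hypothetical ODBC data source name

  ## (a) Pull in only the variables you need: an 820,000-row data frame
  ##     with a handful of columns fits comfortably in 1 GB of RAM.
  ##     (Column names here are made up for illustration.)
  few <- sqlQuery(con, "SELECT income, age, region FROM survey")

  ## (b) Or walk through the whole table one block at a time, updating
  ##     whatever running summary or model fit you are after as you go.
  block <- sqlFetch(con, "survey", max = 10000)
  while (is.data.frame(block) && nrow(block) > 0) {
    ## ... process this block: update totals, take a subsample, etc. ...
    block <- sqlFetchMore(con, max = 10000)
  }

  odbcClose(con)

Either way only one block (or one small set of columns) is ever held in
memory at a time, which is the whole point of going through the
database.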
