Take a look at the filehash package. It lets you work with objects that are larger than your RAM by storing them on disk. The objects are represented in R as pointers with a small memory footprint. You can load them all into an environment and access them with the $ operator. I think filehash is more general than R.huge; R.huge works well only with numerical 2D data.
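A minimal sketch of that workflow, assuming the current filehash API (the database name and object names below are arbitrary):

library(filehash)

## Create and open a disk-backed database
dbCreate("bigdata_db")
db <- dbInit("bigdata_db")

## Store a large object on disk instead of in RAM
dbInsert(db, "x", rnorm(1e6))

## Expose the database as an environment and use $ as usual;
## the object is fetched from disk only when you touch it
e <- db2env(db)
mean(e$x)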
Adrian Dragulescu

On 7/31/07, Eric Doviak <[EMAIL PROTECTED]> wrote:
>
> Just a note of thanks for all the help I have received. I haven't gotten a
> chance to implement any of your suggestions because I'm still trying to
> catalog all of them! Thank you so much!
>
> Just to recap (for my own benefit and to create a summary for others):
>
> Bruce Bernzweig suggested using the R.huge package.
>
> Ben Bolker pointed out that my original message wasn't clear and asked
> what I want to do with the data. At this point, just getting a dataset
> loaded would be wonderful, so I'm trying to trim variables (and, if
> possible, I would also like to trim observations). He also provided an
> example of "vectorizing."
>
> Ted Harding suggested that I use AWK to process the data and provided the
> necessary code. He also tested his code on older hardware running
> GNU/Linux (or Unix?) and showed that AWK can process the data even when
> the computer has very little memory and processing power. Jim Holtman had
> similar success when he used Cygwin's Unix utilities on a machine running
> MS Windows. They both used the following code:
>
> gawk 'BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," $(5678)}' < tempxx.txt > newdata.csv
>
> Fortunately, there is a version of GAWK for MS Windows. ... Not that I
> like MS Windows. It's just that I'm forced to use that 19th-century
> operating system on the job. (After using Debian at home and happily
> running RKWard for my dissertation, returning to Windows World is
> downright depressing.)
>
> Roland Rau suggested that I use a database with RSQLite and pointed out
> that RODBC can work with MS Access. He also pointed me to a sub-chapter in
> Venables and Ripley's _S Programming_ and "The Whole-Object View" pages in
> John Chambers's _Programming with Data_.
>
> Greg Snow recommended biglm for regression analysis with data that is too
> large to fit into memory.
>
> Last, but not least, Peter Dalgaard pointed out that there are options
> within R. He suggests using the colClasses= argument when "reading" data
> and the what= argument when "scanning" data, so that you don't load more
> columns than necessary. He also provided the following script:
>
> dict <- readLines("ftp://www.sipp.census.gov/pub/sipp/2004/l04puw1d.txt")
> D.lines <- grep("^D ", dict)
> vdict <- read.table(con <- textConnection(dict[D.lines])); close(con)
> head(vdict)
>
> I'll try these solutions and report back on my success.
>
> Thanks again!
> - Eric
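A rough sketch of the RSQLite route Roland described might look like the following. The database name, table name, and chunk size are invented for illustration, and whether dbWriteTable(..., append = TRUE) creates the table on the first call may depend on your RSQLite version, so treat this as a starting point rather than a recipe:

library(RSQLite)

## Open (or create) an on-disk database
con <- dbConnect(SQLite(), dbname = "sipp.db")

chunk.size <- 10000
skip <- 0
repeat {
  ## read.csv() errors once 'skip' runs past the end of file; try() stops the loop there
  chunk <- try(read.csv("tempxx.txt", header = FALSE,
                        skip = skip, nrows = chunk.size),
               silent = TRUE)
  if (inherits(chunk, "try-error") || nrow(chunk) == 0) break
  dbWriteTable(con, "sipp", chunk, append = TRUE)  # may need a non-append first write
  skip <- skip + nrow(chunk)
  if (nrow(chunk) < chunk.size) break
}

## Afterwards, pull back only the columns you actually need
sub <- dbGetQuery(con, "SELECT V1, V1000, V1275, V5678 FROM sipp")
dbDisconnect(con)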
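For Greg's biglm suggestion, a minimal sketch of chunked fitting. The toy data frames below stand in for pieces of the real file read with read.csv(..., skip =, nrows =), and the variable names are made up:

library(biglm)

## biglm keeps only the model's sufficient statistics in memory,
## so the data can be supplied one chunk at a time
chunk1 <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
chunk2 <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

fit <- biglm(y ~ x1 + x2, data = chunk1)   # fit on the first chunk
fit <- update(fit, chunk2)                 # fold in further chunks as they are read
summary(fit)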
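And a sketch of Peter's column-trimming idea, keeping the same four columns as the gawk one-liner (the total column count of 6000 is an assumption about the file):

## read.csv(): mark unwanted columns as "NULL" in colClasses so they are
## skipped rather than read into memory
cls <- rep("NULL", 6000)
cls[c(1, 1000, 1275, 5678)] <- "numeric"
dat <- read.csv("tempxx.txt", header = FALSE, colClasses = cls)

## scan(): NULL components of 'what' skip the corresponding fields
flds <- vector("list", 6000)             # all NULL, i.e. skip everything ...
flds[c(1, 1000, 1275, 5678)] <- list(0)  # ... except these numeric fields
dat2 <- scan("tempxx.txt", what = flds, sep = ",")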