Eric Doviak wrote:

> > I need to find some way to overcome these constraints and work
> > with large datasets. Does anyone have any suggestions?

I may not be the most authoritative person on this subject, but I put
all of my large datasets [1] into an SQLite database and
extract/summarize data from it in R using the RSQLite package. If
your data come in ASCII format, it is rather easy to read them into
an SQLite DB.
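Roughly along these lines -- this is only a sketch, and the file name
("sipp_core.csv"), the table name ("core") and the variable names
("id", "earnings") are made-up placeholders for whatever your data
actually contain:

library(RSQLite)

con <- dbConnect(SQLite(), dbname = "sipp.db")

## Column names are taken from the header line of the text file.
cols <- names(read.csv("sipp_core.csv", nrows = 1))

## Read the file in blocks and add each block to the table "core",
## so the whole file never has to sit in memory at once.
chunk <- 10000                  # rows per block; adjust to taste
skip  <- 0
first <- TRUE
repeat {
    block <- tryCatch(
        read.csv("sipp_core.csv", header = FALSE, col.names = cols,
                 skip = skip + 1, nrows = chunk),
        error = function(e) NULL)            # nothing left to read
    if (is.null(block) || nrow(block) == 0) break
    dbWriteTable(con, "core", block, overwrite = first, append = !first)
    first <- FALSE
    skip  <- skip + chunk
}

## In later sessions, pull out only the variables (and rows) you need:
d <- dbGetQuery(con, "SELECT id, earnings FROM core WHERE earnings > 0")
dbDisconnect(con)

Once the data are in the database, picking up a variable you forgot
the first time is just a change to the SELECT statement rather than a
re-run of the whole import.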
> > I've read that I should "carefully vectorize my code." What does
> > that mean ??? !!!

The book "S Programming" by Venables & Ripley has a sub-chapter on
this, and if you happen to have John Chambers' book "Programming with
Data", there are a few pages on "The Whole-Object View". A tiny
illustration of the idea is in the PS at the end of this message.

> > I wrote a script which loads large datasets a few lines at a time,
> > writes the dozen or so variables of interest to a CSV file, removes
> > the loaded data and then (via a "for" loop) loads the next few
> > lines .... I managed to get it to work with one of the SIPP core
> > files, but it's SLOOOOW. Worse, if I discover later that I omitted
> > a relevant variable, then I'll have to run the whole script all
> > over again.

That means you have huge datasets, but you never need the whole
dataset -- only a selection of variables, which leaves files of
manageable size? If that is the case, RSQLite (or any other DB
package; RODBC is also very easy to use if you have, for example, an
MS Access database) is a good option.

Alternatively, are you familiar with the old-fashioned Unix tools?
Ports for MS Windows exist as well, and the program 'cut' could help
you considerably (see the second PS below for a way of combining it
with R).

Please note:

- I am only a casual user of the DB interfaces, so there may well be
  better solutions and people with more detailed knowledge of them.
- All the tools mentioned here are licensed under the same or similar
  free software licenses as R, so you should have no problems
  obtaining and installing them.
- A good source of information is the R Data Import/Export manual,
  shipped with every R distribution and available online at
  http://cran.at.r-project.org/doc/manuals/R-data.html

I hope this helps,
Roland

[1] The largest one is 1 GB -- so probably not really large.
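PS: regarding the vectorization question, here is a tiny made-up
illustration of the general idea -- R can apply an operation to a
whole vector in one call instead of looping over its elements, and
the vectorized form is usually much faster:

x <- rnorm(1e6)                   # some made-up data

## element-by-element loop: every iteration is interpreted separately
y <- numeric(length(x))
for (i in seq_along(x)) y[i] <- x[i]^2 + 1

## vectorized equivalent: one operation on the whole object
y2 <- x^2 + 1

all.equal(y, y2)                  # TRUE -- same result, much faster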
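PPS: if you do try 'cut', it can feed R directly through pipe(), so
that only the selected columns ever reach R. The file name and field
numbers below are just placeholders, and this assumes 'cut' is
available on your system:

## Keep only fields 2, 5 and 11 of a comma-separated file;
## 'cut' does the column selection before R sees the data.
few <- read.csv(pipe("cut -d, -f2,5,11 sipp_core.csv"))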