"José E. Lozano" <[EMAIL PROTECTED]> writes: >> Maybe you've not lurked on R-help for long enough :) Apologies! > > Probably. > >> So, how much "design" is in this data? If none, and what you've >> basically got is a 2000x500000 grid of numbers, then maybe a more raw > > Exactly, raw data, but a little more complex since all the 500000 variables > are in text format, so the width is around 2,500,000. > >> http://cran.r-project.org/web/packages/RNetCDF/index.html >> http://cran.r-project.org/web/packages/hdf5/index.html > > Thanks, I will check. Right now I am reading line by line the file. It's > time consuming, but since I will do it only once, just to rearrange the data > into smaller tables to query, it's ok. > >> Thinking back to your 4GB file with 1,000,000,000 entries, that's >> only 3 bytes per entry (+1 for the comma). What is this data? There >> may be more efficient ways to handle it. > > Is genetic DNA data (individuals genotyped), hence the large amount of > columns to analyze.
The Bioconductor package snpMatrix is designed for this type of
data. See

  http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html

and if that looks promising

  > source('http://bioconductor.org/biocLite.R')
  > biocLite('snpMatrix')

Likely you'll quickly want a 64 bit (linux or Mac) machine.

Martin

> Best Regards,
> Jose Lozano
> ------------------------------------------
> Jose E. Lozano Alonso
> Observatorio de Salud Pública.
> Direccion General de Salud Pública e I+D+I.
> Junta de Castilla y León.
> Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793