Hi all,

My institute will hopefully be working on cutting-edge genetic sequencing data by the fall of 2010. The datasets will be tens of GB in size and growing. I'd like to use R for the primary analyses. Memory itself is not the obstacle, because we can throw money at the problem and run 64-bit R with lots of RAM. However, we are still running up against the fact that vectors in R cannot contain more than 2^31-1 elements.

I know there are "ways around" this issue, and trust me, I think I've tried them all (e.g., bringing in portions of the data at a time, using large-dataset packages in R, using SQL databases, etc.). But all these "solutions" are, at the end of the day, much more cumbersome, programming-wise, than just doing things in native R. Maybe that's just the cost of doing what I'm doing. But my questions, which may well be naive (I'm not a computer programmer), are:

1) Is there an *inherent* reason that vectors are limited to 2^31-1 elements? I.e., in an alternative history of R's development, would it have been feasible for R not to have this limitation?

2) Is there any possibility that this limit will be overcome in future revisions of R?

I'm very grateful to the people who have spent important parts of their professional lives developing R. I don't think anyone back in, say, 1995 could have foreseen that datasets would be >> 2^31-1 elements in size. For better or worse, however, in many fields of science that is routinely the case today. *If* it's possible to get around this limit, then I'd like to know whether the R Development Team takes seriously the needs of large-data users, or if they feel that (perhaps not mutually exclusively) developing such capacity is best left to ad hoc R packages and alternative analysis programs.

Best,

Matt

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com