Or, more succinctly, "Pinard's Law": The demands of ever more data always exceed the capabilities of ever better hardware.
;-D

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of François Pinard
> Sent: Tuesday, July 18, 2006 3:56 PM
> To: Thomas Lumley
> Cc: [email protected]
> Subject: Re: [R] FW: Large datasets in R
>
> [Thomas Lumley]
>
> >People have used R in this way, storing data in a database and
> >reading it as required. There are also some efforts to provide
> >facilities to support this sort of programming (such as the current
> >project funded by Google Summer of Code:
> >http://tolstoy.newcastle.edu.au/R/devel/06/05/5525.html).
>
> Interesting project indeed! However, if R needs more swapping because
> arrays do not all fit in physical memory, crudely replacing swapping
> with database accesses is not necessarily going to buy a drastic
> speed improvement: the paging gets done in user space instead of
> being done in the kernel. (A minimal sketch of this chunked-reading
> approach is appended after this message.)
>
> Long ago, while working on CDC mainframes (astonishing at the time
> but tiny by today's standards), there was a program able to invert or
> run simplex computations on very big matrices. I do not remember the
> name of the program, and never studied it more than superficially (I
> was in computer support for researchers, not a researcher myself).
> The program was documented as being extremely careful about
> organising accesses to rows and columns (or parts thereof) so that
> real memory was used as well as possible. In other words, at the core
> of this program was a paging system that was highly specialised and
> cooperative with the problems it was meant to solve. (A toy
> block-wise sketch is appended below as well.)
>
> However, the source of this program was just plain huge (from memory,
> about three or four times the size of the optimizing FORTRAN
> compiler, which I already knew to be an impressive algorithmic
> undertaking). So, rightly or wrongly, the prejudice stuck solidly
> with me at the time that handling big arrays the right way,
> speed-wise, ought to be very difficult.
>
> >One reason there isn't more of this is that relying on Moore's Law
> >has worked very well over the years.
>
> On the other hand, the computational needs of scientific problems
> grow quickly to match our ability to meet them. Take weather
> forecasting, for example: 3-D geographical grids are never fine
> enough for the resolution meteorologists would like, and the time
> required for each prediction step grows very rapidly for only a
> modest gain in precision (halving the grid spacing multiplies the
> number of cells by eight, and the shorter time step needed for
> stability pushes the cost of a forecast closer to sixteen times). By
> merely tuning a few parameters, these people can easily pump nearly
> all the available cycles out of the supercomputers given to them, and
> they do so without hesitation. Moore's Law will never succeed at
> calming their insatiable hunger! :-)
>
> --
> François Pinard http://pinard.progiciels-bpi.ca
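
As an illustration of the database-backed approach Thomas mentions, here is a
minimal sketch (not from the original thread) that keeps the data in SQLite and
computes a mean by fetching rows in chunks through DBI. The file name, table,
and column ("measurements.db", measurements, value) are made up for the
example, and the current dbFetch()-style DBI interface is assumed:

library(DBI)

## Hypothetical database: one table "measurements" with a numeric column "value".
con <- dbConnect(RSQLite::SQLite(), "measurements.db")
res <- dbSendQuery(con, "SELECT value FROM measurements")

n <- 0; total <- 0
repeat {
  chunk <- dbFetch(res, n = 10000)   # pull 10,000 rows at a time
  if (nrow(chunk) == 0) break        # no rows left
  n     <- n + nrow(chunk)
  total <- total + sum(chunk$value)
}
dbClearResult(res)
dbDisconnect(con)

total / n                            # running mean, never holding the full column

Only one chunk is ever held in memory, which is the point of the approach;
whether that beats letting the kernel page a large in-memory vector is exactly
the question raised above.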

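And a toy version of the block-wise idea behind that old CDC program: work on
one block of each matrix at a time so the active pieces fit in real memory.
This is only an in-memory sketch (the block size and test dimensions are
arbitrary); a real out-of-core version would read and write the blocks from
disk instead of subsetting ordinary R matrices:

## Block-wise matrix multiply: only a few blocks of A, B and C are "hot"
## at any one time.
block_multiply <- function(A, B, bs = 256) {
  stopifnot(ncol(A) == nrow(B))
  C <- matrix(0, nrow(A), ncol(B))
  for (i in seq(1, nrow(A), by = bs)) {
    ii <- i:min(i + bs - 1, nrow(A))
    for (j in seq(1, ncol(B), by = bs)) {
      jj <- j:min(j + bs - 1, ncol(B))
      for (k in seq(1, ncol(A), by = bs)) {
        kk <- k:min(k + bs - 1, ncol(A))
        ## accumulate the contribution of block (ii, kk) x (kk, jj)
        C[ii, jj] <- C[ii, jj] +
          A[ii, kk, drop = FALSE] %*% B[kk, jj, drop = FALSE]
      }
    }
  }
  C
}

A <- matrix(rnorm(500 * 400), 500, 400)
B <- matrix(rnorm(400 * 300), 400, 300)
all.equal(block_multiply(A, B, bs = 128), A %*% B)   # TRUE

The sketch shows the structure of the technique only; doing it well (choosing
block sizes, scheduling the I/O) is where the real difficulty described above
comes in.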