Thanks Brian for pointing this out... Yes indeed my thesis involved
distributed computing and R. It consisted of two parts: a distributed
scoping feature for limiting data movement, and a parallel computing
interface for speeding up computations. The former used CORBA and the
latter PVM (plus embedded R-s and ScaLAPACK).

There are three documents available describing this in more detail:

http://www.stats.ox.ac.uk/~feic/Rs/thesis.pdf
    my thesis
http://www.stats.ox.ac.uk/~feic/Rs/shorter.pdf
    a shorter summary
http://www.stats.ox.ac.uk/~feic/Rs/DSC2003.pdf
    the DSC document Brian pointed out.

I haven't publicized this mainly because the distributed scoping piece
involved modifying internal R code, most notably the R_eval() function,
which is a bit non-portable... But if there's interest in how I did
things I can certainly clean up my code and make it available. The
parallel engine part uses standard R so it should be easier to set up.

Cheers,

fei

On Wed, 24 Mar 2004, Prof Brian Ripley wrote:

> Fei Chen implemented distribution of data and ScaLAPACK as part of his
> DPhil thesis, with a high-level R interface. Moving data around is often
> the major limiting factor on large-scale model fitting (he was
> experimenting with glm's).
>
> There are two brief papers at
>
> http://www.isi-2003.de/guest/3427.pdf?MItabObj=pcoabstract&MIcolObj=uploadpaper&MInamObj=id&MIvalObj=3427&MItypeObj=application/pdf
>
> and in the DSC2003 proceedings (but the ci.tuwien server is currently not
> available, at least from here).
>
> Now Fei's process is complete, perhaps he will make the thesis available
> on line.
>
>
> On Tue, 23 Mar 2004 [EMAIL PROTECTED] wrote:
>
> Quoting someone unnamed! --
>
> > > My inclination would be to, whenever possible, replace the core scalar
> > > libraries with compatible parallel versions (lapack -> scalapack),
> > > rather than make it an add-on package. If the R client code is general
> > > enough, and the make file can automatically find the parallel version,
> > > then it's a simple matter of compiling with the parallel libs. (Don't
> > > know if this is possible at run-time.) No rewriting (high level) R code
> > > at all. I tried to contact the plapack folks here at UT about
> > > integrating with R, but it appears the project is no longer active.
> >
> > Unfortunately, there is a major complication to this approach: the distribution
> > of data. ScaLAPACK (and PLAPACK) requires the data to be distributed in a
> > special way before calculation functions can be called. Given a generic R
> > matrix, we have to distribute the data before we can call ScaLAPACK functions on
> > it. We then have to collect the answer before we can return it to R. Because
> > of this serious overhead, replacing all LAPACK calls with ScaLAPACK calls would
> > not be recommended. Future versions of our package [1] may include some type of
> > automatic benchmarking to decide when problems are large enough to be worth
> > sending to ScaLAPACK.
> >
> >
> > David Bauer
> >
> > [1] http://www.aspect-sdm.org/Parallel-R/
> >
> > ______________________________________________
> > [EMAIL PROTECTED] mailing list
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
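
For readers unfamiliar with the "special way" of distributing data that David Bauer's quoted message mentions: ScaLAPACK uses a 2D block-cyclic layout over a grid of processes. The following is only a rough sketch of that index mapping, plus the distribute and collect steps around a computation. It is plain Python with no real ScaLAPACK, PVM or R calls, and every name in it (owner_and_local, distribute, collect, the block sizes mb and nb, the grid shape prow x pcol) is a hypothetical choice made for illustration, not code from the thesis or from the Parallel-R package.

```python
# Sketch of a 2D block-cyclic layout, the kind of distribution ScaLAPACK
# expects. All names are hypothetical; no real ScaLAPACK or R API is used.

def owner_and_local(g, nb, nprocs):
    """Map a 0-based global index to (process coordinate, local index).

    Blocks of nb consecutive indices are dealt out to the processes
    round-robin, like cards.
    """
    block = g // nb
    return block % nprocs, (block // nprocs) * nb + g % nb


def distribute(matrix, mb, nb, prow, pcol):
    """Scatter a dense matrix (list of rows) over a prow x pcol process grid.

    Returns a dict keyed by process coordinates; each value maps a local
    (row, column) index to the matrix entry stored on that process.
    """
    local = {(r, c): {} for r in range(prow) for c in range(pcol)}
    for i, row in enumerate(matrix):
        pr, li = owner_and_local(i, mb, prow)
        for j, value in enumerate(row):
            pc, lj = owner_and_local(j, nb, pcol)
            local[(pr, pc)][(li, lj)] = value
    return local


def collect(local, nrows, ncols, mb, nb, prow, pcol):
    """Gather the distributed pieces back into one dense matrix."""
    out = [[None] * ncols for _ in range(nrows)]
    for i in range(nrows):
        pr, li = owner_and_local(i, mb, prow)
        for j in range(ncols):
            pc, lj = owner_and_local(j, nb, pcol)
            out[i][j] = local[(pr, pc)][(li, lj)]
    return out
```

In this sketch every matrix entry is touched once on the way out and once on the way back, which is the overhead the quoted message refers to. In a real setup those copies would also cross process boundaries, so the cost would likely be higher than the in-memory loops here suggest.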