On 4/9/07, Simon Urbanek <[EMAIL PROTECTED]> wrote:
>
> On Apr 7, 2007, at 10:56 AM, Ramon Diaz-Uriarte wrote:
>
> > Dear All,
> >
> > The "clients.txt" file of the latest Rserve package, by Simon
> > Urbanek, says, regarding its R client,
> >
> > "(...) a simple R client, i.e. it allows you to connect to Rserve
> > from R itself. It is very simple and limited, because Rserve was
> > not primarily meant for R-to-R communication (there are better ways
> > to do that), but it is useful for quick interactive connection to
> > an Rserve farm."
> >
> > Which are those better ways to do it? I am thinking about using
> > Rserve to have an R process send jobs to a bunch of Rserves on
> > different machines. It is like what we could do with Rmpi (or pvm),
> > but without the MPI layer. Therefore, presumably it'd be easier to
> > deal with network problems, machine failures, using checkpoints,
> > etc. (i.e., to try to get better fault tolerance).
> >
> > It seems that Rserve would provide the basic infrastructure for
> > doing that and save me from reinventing the wheel of using
> > sockets, etc., directly from R.
> >
> > However, Simon's comment about better ways of R-to-R communication
> > made me wonder if this idea really makes sense. What is the catch?
> > Have other people tried similar approaches?
> >
>
> I was commenting on direct R-to-R communication using sockets +
> 'serialize' in R or the 'snow' package for parallel processing. The
> latter could be useful for what you have in mind, because it includes
> a socket-based implementation which allows you to spawn multiple
> children (across multiple machines) and collect their results. It
> uses regular rsh or ssh to start the jobs, so if you can use that, it
> should work for you. 'snow' also has PVM and MPI implementations; the
> PVM one is really easy to set up (on unix) and that was what I was
> using for parallel computing in R on a cluster.
>
I think I now understand your comments. I've used snow and Rmpi quite a
bit. But the problem with Rmpi (or, rather, MPI) is the lack of fault
tolerance: if a node goes down, the whole MPI universe breaks, and with
it the complete set of slaves. Setting up some kind of fault-tolerant
scheme with Rserve seemed possible, and simpler, since it does not
depend on the MPI layer. (Yes, maybe I should check snowFT, but it uses
PVM, and I recall that a while back there was a reason why we decided to
go with MPI instead of PVM.)

> Rserve is sort of comparable, but in addition it provides the
> spawning infrastructure due to its client/server concept. What it
> doesn't have is the convenience functions that snow provides like
> clusterApply etc. Thinking of it, it would be actually possible to
> add them, although I admit that the original goal of Rserve was not
> parallel computing :). The idea was to have one Rserve server and
> multiple clients

Aha. I should have seen that. I think I understand the differences
better now.

> whereas in 'snow' you sort of have one client and
> multiple servers. You could spawn multiple Rserves on multiple
> machines, but Rserve itself doesn't provide any load-balancing out of
> the box, so you'd have to do that yourself.
>

Yes, sure. I think that should be doable, though, if I decide to go
down this route.

> I don't know if that helps... :)
>

It does help! Thanks a lot.

Best,

R.

> Cheers,
> Simon
>

--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
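
For readers following the thread, here is a minimal sketch of the "one client, multiple servers" arrangement described above for snow's socket-based implementation. It assumes the 'snow' package is installed; the two "localhost" workers and the squaring job are placeholders chosen only so the example runs on a single machine, and it makes no attempt at fault tolerance or load balancing.

```r
library(snow)  # assumption: the 'snow' package is installed

# Start two socket workers. "localhost" is a stand-in; for a real setup
# these would be host names reachable through rsh or ssh.
cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")

# The client sends one job per element to the workers and collects
# the results as a list, in the order of the input.
jobs <- 1:4
results <- clusterApply(cl, jobs, function(i) i^2)

# Shut the workers down once the results are collected.
stopCluster(cl)

print(unlist(results))
```

With Rserve the roles would be reversed (one server, many clients), so an equivalent of clusterApply, and any load balancing across several Rserve instances, would have to be written by hand.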
