On 11/7/06 11:28 AM, "Ramon Diaz-Uriarte" <[EMAIL PROTECTED]> wrote:
> On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote:
>> Hello everyone,
>> I've been fiddling around with the snow and Rmpi packages on my new Intel
>> Mac, and have run into a few problems. When I make a cluster on my machine,
>> both slaves start up just fine, and everything works as expected. When I
>> try to make a cluster including another networked machine it hangs. I've
>> followed the suggestions at
>> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and
>> http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail.
>> Everything seems to start up fine using lamboot, but then hangs when making
>> the cluster in R. Making a cluster with 2 slaves seems to work fine, but if
>> I increase the number (to use the networked machines) it hangs again.
>>
>> I've tried networking to another Mac, and also to a machine running Red Hat
>> Linux. Both machines can set up their own local clusters. Does anyone have
>> any ideas?
>
> Dear Randy,
>
> A few suggestions:
>
> a) make sure there are no firewalls; I assume this is actually the case, but
> anyway;

I don't think I have any firewalls running. I checked and they all seem to
be disabled...

> b) what happens if you lamboot outside R (and create a universe with a local
> and a networked machine) and then you do: "lamexec -np 6 hostname"?

This prints out the host names of each machine as expected.

> c) are the Rmpi and snow installed in the same directories in the different
> machines? are there version differences in Rmpi (or Snow) between machines?

I've installed the same versions, but they are in different directories...

I also tried an example per Luke Tierney's suggestion using only Rmpi, and I
get the following error when trying to spawn the Rslaves after starting up
with lamboot (outside of R). I tried to use laminfo, but I'm not sure what
I'm looking for or how to use the information given...
> library(Rmpi)
> mpi.spawn.Rslaves()
----------------------------------------------------------------------------
It seems that [at least] one of the child processes that was started by
MPI_Comm_spawn* chose a different RPI than the parent MPI application.
For example, one (of the) child process(es) that differed from the parent
is shown below:

Parent application: MPI_Comm_spawn
Child MPI_COMM_WORLD rank usysv (v7.1.0): 0

All MPI processes must choose the same RPI module and version when
they start. Check your SSI settings and/or the local environment
variables on each node.
----------------------------------------------------------------------------
R(26444) malloc: *** Deallocation of a pointer not malloced: 0x16379a0;
This could be a double free(), or free() called with the middle of an
allocated block; Try setting environment variable MallocHelp to see tools
to help debug
Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"), :
        MPI_Error_string: unclassified

>
> HTH,
>
> R.
>
>
>
>>
>> Thanks,
>> Randy
>>
>>> sessionInfo()
>>
>> R version 2.4.0 Patched (2006-10-03 r39576)
>> i386-apple-darwin8.8.2
>>
>> locale:
>> C
>>
>> attached base packages:
>> [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets"
>> [7] "base"
>>
>> other attached packages:
>> Rmpi snow
>> "0.5-3" "0.2-2"
>>
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
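
The error above asks for the same RPI module on every node and points at the
SSI settings and environment variables. One thing that might be worth trying
is to pin the RPI explicitly before booting LAM. The sketch below is only
that, a sketch. It assumes LAM/MPI 7.x reads SSI parameters from environment
variables named LAM_MPI_SSI_<parameter>, that "tcp" is an RPI module
available on all the machines, and that the variable also gets set on the
remote nodes (for example in the shell startup file there). The host names
and the boot schema path are made up for illustration.

```shell
#!/bin/sh
# Assumption: LAM/MPI 7.x picks up SSI parameters from LAM_MPI_SSI_* variables.
# "tcp" is an assumed choice; any RPI module works if every node uses the same one.
LAM_MPI_SSI_rpi=tcp
export LAM_MPI_SSI_rpi

# Hypothetical boot schema: one local and one networked machine.
# Replace the host names and CPU counts with your own.
HOSTFILE="${TMPDIR:-/tmp}/lamhosts.$$"
printf '%s\n' "localhost cpu=2" "othermac.example.org cpu=2" > "$HOSTFILE"

# Print the next steps instead of running them, so the script is safe to
# execute on a machine without LAM installed.
echo "RPI pinned to: $LAM_MPI_SSI_rpi"
echo "Boot schema written to: $HOSTFILE"
echo "Next: lamboot -v $HOSTFILE"
echo "Then: lamexec -np 6 hostname"
```

If the hosts still report different RPI modules after this, the per-node
environment is the next thing to compare.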