If you want, you can upgrade to the last release in the 1.2 series, available from the www.open-mpi.org web site. Any 1.2.x release will work with Xgrid, but nothing later will.
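
For reference, here is a minimal sketch of the setup Brian describes in the quoted message below. The two variable names come from his note. The hostname and password values are placeholders I made up, and the `xgrid_env_ok` helper is purely illustrative and not part of Open MPI. The `mpirun` line is the same test command used earlier in this thread, minus `-nolocal`, which Brian says does not work with Xgrid.

```shell
#!/bin/sh
# Placeholder values (assumptions): substitute your own controller's details.
export XGRID_CONTROLLER_HOSTNAME="xgrid-controller.example.com"
export XGRID_CONTROLLER_PASSWORD="changeme"

# Illustrative helper (not an Open MPI command): succeeds only when both
# variables named in the thread are set to non-empty values.
xgrid_env_ok() {
    [ -n "${XGRID_CONTROLLER_HOSTNAME:-}" ] && [ -n "${XGRID_CONTROLLER_PASSWORD:-}" ]
}

if xgrid_env_ok; then
    if command -v mpirun >/dev/null 2>&1; then
        # Same test job as in the thread, without -nolocal.
        mpirun -n 2 /bin/hostname
    else
        echo "mpirun not found in PATH" >&2
    fi
else
    echo "Set XGRID_CONTROLLER_HOSTNAME and XGRID_CONTROLLER_PASSWORD first" >&2
fi
```

Whether the job then lands on the Xgrid agents still depends on the installed Open MPI being a 1.2.x build with Xgrid support.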

On Jun 21, 2010, at 1:40 PM, Barrett, Brian W wrote:

> You have to set two environment variables (XGRID_CONTROLLER_HOSTNAME and
> XGRID_CONTROLLER_PASSWORD) with the correct information in order for the
> XGrid starter to work. Due to the way XGrid works, the nolocal option will
> not work properly when launching with XGrid.
>
> Brian
>
> On Jun 21, 2010, at 1:28 PM, charlie strauss wrote:
>
>> Perhaps I was mistaken about 1.5rc1. As for the installed openMPI on Mac
>> OSX, my 10.5 OSX has v1.2.3; when I try to run it, it works fine locally but
>> it never finds the xgrid.
>>
>> Any mpi job I run will run on the localhost, not the xgrid agents. If I try
>> to force the issue by specifying -nolocal then it just complains there are
>> no nodes.
>>
>> So how do I use openMPI so that it uses the nodes of an xgrid cluster?
>>
>> mpirun -nolocal -n 32 /bin/hostname
>> --------------------------------------------------------------------------
>> There are no available nodes allocated to this job. This could be because
>> no nodes were found or all the available nodes were already used.
>>
>> Note that since the -nolocal option was given no processes can be
>> launched on the local node.
>> --------------------------------------------------------------------------
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in
>> file
>> /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_support_fns.c
>> at line 168
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in
>> file
>> /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/round_robin/rmaps_rr.c
>> at line 402
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in
>> file
>> /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_map_job.c
>> at line 210
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in
>> file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmgr/urm/rmgr_urm.c at
>> line 372
>>
>> On Jun 16, 2010, at 1:36 PM, Ralph Castain wrote:
>>
>>> Where did you see that 1.5 works with xgrid? That support has been broken
>>> since the 1.2 series, unfortunately, so it would help to ensure we don't
>>> have stale docs out there to the contrary.
>>>
>>> As for the 1.2 results, you are aware (I imagine) that OSX ships with the
>>> last 1.2 release already installed? You don't have to do anything to use it
>>> but run.
>>>
>>> If you are getting peer timeouts, that is almost always a firewall issue.
>>> But I would try the factory-installed version first to be sure.
>>>
>>> On Jun 16, 2010, at 1:14 PM, Charlie E. Strauss wrote:
>>>
>>>> I'm new to openMPI. I'm trying to set it up for using xgrid. I have read
>>>> that v1.3 and v1.4 are broken on OSX 10.5 and 10.6 although I have seen
>>>> some discussions in the archives of this mail list saying some people have
>>>> v1.4 running on 10.6.
>>>>
>>>> I have now compiled both openMPI 1.2 and openMPI1.5rc and neither of
>>>> these is working for me with xgrid. Both of these say they work with
>>>> xgrid.
>>>>
>>>> The failure modes are different.
>>>>
>>>> Anyone know how to get a working install? I am building this on an OSX
>>>> 10.5.8 machine. The xgrid controller is on an OSX 10.6 server machine.
>>>> I have tried configuring with and without the --with-xgrid option.
>>>>
>>>> Behaviour of openMPI1.2
>>>> $ /usr/local/openmpi/bin/mpirun -nolocal -n 2 /bin/hostname
>>>>
>>>> The job appears in the xgrid queue, and the logs show it is running on a
>>>> remote machine. However nothing ever happens and peeking in the xgrid
>>>> results I see:
>>>>
>>>> $ xgrid -job results -id 8703
>>>> [brio.llnl.gov:38789] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>> connection failed: Operation timed out (60) - retrying
>>>> [brio.llnl.gov:38792] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>> connection failed: Operation timed out (60) - retrying
>>>>
>>>> Perhaps a firewall issue?
>>>>
>>>> Of course I'm more interested in getting the new openMPI1.5 working.
>>>> When I run this, again I get an entry in the queue, and the job runs on a
>>>> remote machine but I get a job failed message:
>>>>
>>>> $ /usr/local/openmpi5/bin/mpirun -n 2 /bin/hostname
>>>> $ xgrid -job results -id 8702
>>>> [brio.llnl.gov:38776] Error: unknown option "-mca"
>>>>
>>>> ----
>>>>
>>>> Note I have NOT installed openMPI on any of the other computers in the
>>>> grid. So perhaps that is the problem? If I did install it on other
>>>> computers how would I tell mpirun where to find the path to the install
>>>> point?
>>>>
>>>> ----
>>>>
>>>> Finally, in both cases, I don't see any way to pass xgrid-specific
>>>> arguments in on the mpi command line. An xgrid controller divides the
>>>> agents into sets of logical grids and you need to specify which logical
>>>> grid to submit the job to. In xgrid cli syntax one writes "xgrid -gid 2"
>>>> for grid 2. When I use openMPI all the jobs get sent to just the default
>>>> grid, which is the grid that xgrid uses if no gid is specified.
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> Charlie Strauss
>> Bioscience Division
>> c...@lanl.gov
>> 505 665 4838
>> Quidquid latine dictum sit, altum sonatur.