Thank you, Gus. I am encouraged. I will look into Torque in a day or two or three.
Regards,
Tena Sakai
tsa...@gallo.ucsf.edu

On 1/12/11 6:49 PM, "Gus Correa" <g...@ldeo.columbia.edu> wrote:
> Tena Sakai wrote:
>> Hi,
>>
>> I can execute the command below:
>> $ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
>> and I get:
>> vixen.egcrc.org
>> compute-0-0.local
>> compute-0-1.local
>> compute-0-2.local
>>
>> I have a file myhosts, which looks like:
>> compute-0-0 slots=1
>> compute-0-1 slots=1
>> compute-0-2 slots=1
>> but when I execute:
>> $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
>> I get:
>> There are no allocated resources for the application
>>   hostname
>> that match the requested mapping:
>>
>> Verify that you have mapped the allocated resources properly using the
>> --host or --hostfile specification.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> Interestingly, this works:
>> $ mpirun --hostfile myhosts -np 3 hostname
>> compute-0-0.local
>> compute-0-1.local
>> compute-0-2.local
>> $
>>
>> Am I correct in concluding that -H and --hostfile cannot be issued in the
>> same mpirun command which contains a colon (:)? Or is there any trick
>> or work-around to have both -H and --hostfile?
>>
>> Thank you.
>>
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>>
>
> Hi Tena
>
> I don't know if this is an option for you, but OpenMPI can be built
> integrated with a resource manager.
> This completely obviates the need to specify the host list
> on the mpirun command line, to use a hostfile,
> or to get involved with all this syntactical nitty-gritty.
> OpenMPI will use exactly those resources (nodes, cores, etc.) that are
> made available to it by the resource manager upon your request.
>
> We use Torque here, which is simple, effective, and even available
> through RPM-type packages on many Linux distributions.
> (Although it is also easy to build from source.)
> I think OpenMPI also builds with SGE,
> and maybe with other resource managers too.
> See the FAQ and the README file for more details on how to build
> OpenMPI with Torque (or SGE) support.
>
> Resource managers are also a no-nonsense way to manage jobs, either
> from one or from many users.
>
> My two cents,
> Gus Correa
>
> PS - Looking at your nodes' names, it looks to me like you have a Rocks
> cluster, right?
> Rocks has an SGE roll and a Torque roll.
> You could install one of them (only one!), if not already there, and enjoy!
> ('rocks list roll' will tell you what you have.)
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
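[Editor's note] One hedged workaround worth trying, short of installing a resource manager: in Open MPI a hostfile generally defines the allocation, while -H selects hosts from within it, so listing every host (head node included) in a single hostfile and keeping -H only for the per-context selection may sidestep the mapping error. This is a sketch, not a tested recipe; the file name `allhosts` is invented for illustration, and the `--default-hostfile` option and its interaction with colon-separated app contexts should be verified against your Open MPI version's mpirun man page.

```shell
# Sketch of a possible workaround (untested assumption): put the whole
# allocation -- head node plus compute nodes -- into one hostfile, then
# let each app context's -H pick its subset of that allocation.
# "allhosts" is a made-up name for this example.
cat > allhosts <<'EOF'
vixen slots=1
compute-0-0 slots=1
compute-0-1 slots=1
compute-0-2 slots=1
EOF

# Only attempt the launch on a machine where mpirun is installed.
if command -v mpirun >/dev/null 2>&1; then
    mpirun --default-hostfile allhosts \
           -H vixen -np 1 hostname : \
           -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
fi
```

Even if this works on a given version, the resource-manager route Gus describes is the more robust fix, since the host list then comes from the scheduler and neither -H nor a hostfile is needed at all.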