Thank you, Gus.  I am encouraged.  I will look into Torque
in a day or two or three.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/12/11 6:49 PM, "Gus Correa" <g...@ldeo.columbia.edu> wrote:

> Tena Sakai wrote:
>> Hi,
>> 
>> I can execute the command below:
>>    $ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
>> and I get:
>>    vixen.egcrc.org
>>    compute-0-0.local
>>    compute-0-1.local
>>    compute-0-2.local
>> 
>> I have a file myhosts, which looks like:
>>    compute-0-0 slots=1
>>    compute-0-1 slots=1
>>    compute-0-2 slots=1
>> but when I execute:
>>    $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
>> I get:
>>    There are no allocated resources for the application
>>      hostname
>>    that match the requested mapping:
>>      
>>    Verify that you have mapped the allocated resources properly using the
>>    --host or --hostfile specification.
>>    --------------------------------------------------------------------------
>>    --------------------------------------------------------------------------
>>    A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
>>    launch so we are aborting.
>>    
>>    There may be more information reported by the environment (see above).
>>    
>>    This may be because the daemon was unable to find all the needed shared
>>    libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>    location of the shared libraries on the remote nodes and this will
>>    automatically be forwarded to the remote nodes.
>>    --------------------------------------------------------------------------
>>    --------------------------------------------------------------------------
>>    mpirun noticed that the job aborted, but has no info as to the process
>>    that caused that situation.
>>    --------------------------------------------------------------------------
>>    mpirun: clean termination accomplished
>> 
>> Interestingly, this works:
>>    $ mpirun --hostfile myhosts -np 3 hostname
>>    compute-0-0.local
>>    compute-0-1.local
>>    compute-0-2.local
>>    $
>> 
>> Am I correct in concluding that -H and --hostfile cannot be issued in the
>> same mpirun command which contains a colon (:)?  Or is there any trick
>> or work-around to have both -H and --hostfile?
>> 
>> Thank you.
>> 
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>> 
> 
> Hi Tena
> 
> I don't know if this is an option for you, but OpenMPI can be built
> integrated with a resource manager.
> This obviates completely the need to specify the host list
> on the mpirun command line, or to use
> a hostfile, or to get involved with all this syntactical nitty-gritty.
> OpenMPI will use exactly those resources (nodes, cores, etc) that are
> made available to it by the resource manager upon your request.
> 
> We use Torque here, which is simple, effective, and even available
> through RPM-type packages on many Linux distributions.
> (Although it is also easy to build from source.)
> I think OpenMPI also builds with SGE,
> maybe with other resource managers too.
> See the FAQ and the README file for more details on how to build
> OpenMPI with Torque (or SGE) support.
> 
> Resource managers are also a no-nonsense way to manage jobs, either
> from one or from many users.
> 
> My two cents,
> Gus Correa
> 
> PS - Looking at your node names, it looks to me like you have a Rocks
> cluster, right?
> Rocks has an SGE and a Torque roll.
> You could install one of them (only one!), if not yet there, and enjoy!
> ('rocks list roll' will tell what you have.)
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
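
A possible stopgap until a resource manager is in place: the thread shows
that the colon form works when both halves use -H, so one option is to turn
the hostfile into a comma-separated list and pass it with -H. The sketch
below is only an illustration. hosts_from_file is a hypothetical helper
name, it ignores the slots= counts (fine for slots=1 as in myhosts, not for
larger values), and it has not been checked against any particular Open MPI
release.

```shell
#!/bin/sh
# hosts_from_file (hypothetical helper): print the first column of a
# hostfile as a comma-separated list, skipping blank and comment lines.
# Note: slots= values are ignored, so each host appears exactly once.
hosts_from_file() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | paste -sd, -
}

# Intended use, mirroring the -H form that worked earlier in the thread:
#   mpirun -H vixen -np 1 hostname : -H "$(hosts_from_file myhosts)" -np 3 hostname
```

With the myhosts file quoted above, the helper prints
compute-0-0,compute-0-1,compute-0-2, which is the same host list used in
the command that succeeded.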

