Re: [OMPI users] More newbie question: --hostfile option

2011-01-12 Thread Tena Sakai
Thank you, Gus.  I am encouraged.  I will look into Torque
in a day or two or three.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/12/11 6:49 PM, "Gus Correa"  wrote:

> Tena Sakai wrote:
>> Hi,
>> 
>> I can execute the command below:
>>$ mpirun -H vixen -np 1 hostname : -H
>> compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
>> and I get:
>>vixen.egcrc.org
>>compute-0-0.local
>>compute-0-1.local
>>compute-0-2.local
>> 
>> I have a file myhosts, which looks like:
>>compute-0-0 slots=1
>>compute-0-1 slots=1
>>compute-0-2 slots=1
>> but when I execute:
>>$ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
>> I get:
>>There are no allocated resources for the application
>>  hostname
>>that match the requested mapping:
>>  
>>Verify that you have mapped the allocated resources properly using the
>>--host or --hostfile specification.
>>--------------------------------------------------------------------------
>>A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
>>launch so we are aborting.
>>
>>There may be more information reported by the environment (see above).
>>
>>This may be because the daemon was unable to find all the needed shared
>>libraries on the remote node. You may set your LD_LIBRARY_PATH to
>> have the
>>location of the shared libraries on the remote nodes and this will
>>automatically be forwarded to the remote nodes.
>>--------------------------------------------------------------------------
>>mpirun noticed that the job aborted, but has no info as to the process
>>that caused that situation.
>>--------------------------------------------------------------------------
>>mpirun: clean termination accomplished
>> 
>> Interestingly, this works:
>>$ mpirun --hostfile myhosts -np 3 hostname
>>compute-0-0.local
>>compute-0-1.local
>>compute-0-2.local
>>$
>> 
>> Am I correct in concluding that -H and --hostfile cannot be issued in the
>> same mpirun command which contains a colon (:)?  Or is there any trick
>> or work-around to have both -H and --hostfile?
>> 
>> Thank you.
>> 
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>> 
> 
> Hi Tena
> 
> I don't know if this is an option for you, but OpenMPI can be built
> integrated with a resource manager.
> This obviates completely the need to specify the host list
> on the mpirun command line, or to use
> a hostfile, or to get involved with all this syntactical nitty-gritty.
> OpenMPI will use exactly those resources (nodes, cores, etc) that are
> made available to it by the resource manager upon your request.
> 
> We use Torque here, which is simple, effective, and even available
> through RPM-type packages on many Linux distributions.
> (Although it is also easy to build from source.)
> I think OpenMPI also builds with SGE,
> maybe with other resource managers too.
> See the FAQ and the README file for more details on how to build
> OpenMPI with Torque (or SGE) support.
> 
> Resource managers are also a no-nonsense way to manage jobs, either
> from one or from many users.
> 
> My two cents,
> Gus Correa
> 
> PS - Looking at your nodes' names, it looks to me like you have a Rocks
> cluster, right?
> Rocks has an SGE and a Torque roll.
> You could install one of them (only one!), if not yet there, and enjoy!
> ('rocks list roll' will tell what you have.)




Re: [OMPI users] More newbie question: --hostfile option

2011-01-12 Thread Tena Sakai
Thank you, David.  That did it!

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/12/11 6:40 PM, "David Zhang"  wrote:

I think you just make a hostfile with

vixen
compute-0-0
...

and load the file in the first -H before the colon.

On Wed, Jan 12, 2011 at 6:23 PM, Tena Sakai  wrote:
Hi,

I can execute the command below:
   $ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 
-np 3 hostname
and I get:
   vixen.egcrc.org 
   compute-0-0.local
   compute-0-1.local
   compute-0-2.local

I have a file myhosts, which looks like:
   compute-0-0 slots=1
   compute-0-1 slots=1
   compute-0-2 slots=1
but when I execute:
   $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
I get:
   There are no allocated resources for the application
 hostname
   that match the requested mapping:

   Verify that you have mapped the allocated resources properly using the
   --host or --hostfile specification.
   --------------------------------------------------------------------------
   A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
   launch so we are aborting.

   There may be more information reported by the environment (see above).

   This may be because the daemon was unable to find all the needed shared
   libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
   location of the shared libraries on the remote nodes and this will
   automatically be forwarded to the remote nodes.
   --------------------------------------------------------------------------
   mpirun noticed that the job aborted, but has no info as to the process
   that caused that situation.
   --------------------------------------------------------------------------
   mpirun: clean termination accomplished

Interestingly, this works:
   $ mpirun --hostfile myhosts -np 3 hostname
   compute-0-0.local
   compute-0-1.local
   compute-0-2.local
   $

Am I correct in concluding that -H and --hostfile cannot be issued in the
same mpirun command which contains a colon (:)?  Or is there any trick
or work-around to have both -H and --hostfile?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu 





Re: [OMPI users] More newbie question: LD_LIBRARY_PATH

2011-01-12 Thread Tena Sakai
Hi Gus,

> How do you intend to use Rmpi?

The problem I want to solve/attack is highly embarrassingly parallel in
nature.  According to a couple of brief tutorials I read, Rmpi favors
LAM/MPI and, as I understand it, even with the emergence of openMPI it
hasn't been modified to work smoothly with openMPI.  What I want to do,
in concept, is to spawn a master and a bunch of slaves, and have the
master issue appropriate R commands/functions to the slaves.
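
In Rmpi terms, what I have in mind is something like this untested
sketch (adapted, in concept, from the Rmpi documentation):

   library(Rmpi)
   # spawn 3 R slaves; with no hosts argument, placement is left to MPI
   mpi.spawn.Rslaves(nslaves=3)
   # run a command on every slave and collect the results at the master
   mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))
   # shut the slaves down cleanly and exit
   mpi.close.Rslaves()
   mpi.quit()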

I have read some R users (including the Rmpi author) say that it does
work with openMPI, though I cannot get really critical information.  In
some cases, I find completely erroneous information as well.  That's a
bit discouraging, but...

If I cannot get Rmpi to work with openMPI, I am planning to do MPMD with
openMPI via mpirun command.

There is an open-source program called r (pronounced "little r") which
provides hash-bang capability for R, and it is nice for command-line
and piping use.  I am hoping that I can use r with mpirun.
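
Conceptually, something like this (untested; the script name and the
path to r are made up):

   $ mpirun --hostfile myhosts -np 3 /usr/local/bin/r myAnalysis.r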

My long-term goal is to launch a bunch of instances in the cloud and do
the (embarrassingly) parallel computing.  I.e., the master issues R
directives to the slaves, and when all the slaves finish, it (the master)
collects the output from each slave run and puts it together.  The cost
of using Amazon's HPC cluster is much higher than that of non-HPC
machines, and since my problem is embarrassingly parallel, I don't really
need high-speed inter-node communications.

I have heard of tools like Torque and Slurm that I can use to do MPI
resource control, but I don't know enough about MPI and don't want to
bite off more than I can chew.  What I want to do at the moment is to
get to know openMPI as best I can.

Tena Sakai
tsa...@gallo.ucsf.edu

On 1/12/11 6:30 PM, "Gus Correa"  wrote:

> Tena Sakai wrote:
>> Thank you, Gus.
>> 
>> I agree with what you say about location of OpenMPI software.
>> Indeed, /usr/local is nfs-mounted to all cluster nodes, albeit
>> a bit of an unfortunate name, "local."  If/when I have a chance to
>> set up machines, I will make local really local to each node.
>> 
>> Regards,
>> 
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>> 
>> 
>> On 1/12/11 4:20 PM, "Gus Correa"  wrote:
>> 
>>> Tena Sakai wrote:
 Hi,
 
 On a FAQ page (
 http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
 (under 1. What pre-requisites are necessary for running an Open MPI
 job?), I see an example
 of LD_LIBRARY_PATH environment variable:
LD_LIBRARY_PATH: /opt/openmpi/lib
 
 When I compiled the source, a directory /usr/local/lib/openmpi was
 generated (with many
files in it).  Given that, do I set LD_LIBRARY_PATH to /usr/local/lib or
 do I set
 it to /usr/local/lib/openmpi?
 
 Thank you.
 
 Tena Sakai
 tsa...@gallo.ucsf.edu
 
>>> To /usr/local/lib.
>>> 
>>> I would suggest not using the default /usr/local to install OpenMPI,
>>> since it tends to be  really local to the machine where you built OpenMPI.
>>> This will require that you install OpenMPI on all nodes/machines if
>>> you want to run programs across a network.
>>> 
>>> Instead, a simpler way to get OpenMPI available to all nodes,
>>> although installing only on one of them (say the head node of your
>>> cluster) is to do it in a directory that is shared, typically via NFS.
>>> To do so, use the --prefix=/my/shared/OpenMPI/directory option of
>>> the configure script.
>>> 
>>> There are FAQs about this too.
>>> 
>>> Anyway, it may depend on your environment also, whether it is a cluster
>>> with a private subnet (where my suggestion is typically used),
>>> a bunch of separate computers on a LAN (where the suggestion won't work
>>> unless you have a shared NFS directory), or other.
>>> 
>>> Gus Correa
> Hi Tena
> 
> Just out of curiosity, this is not really an OpenMPI issue,
> and I don't know anything about Rmpi either:
> 
> How do you intend to use Rmpi?
> 
> Interactively without any resource manager control (Torque, SGE, etc.),
> as in a standalone PC, although utilizing several hosts via MPI?
> 
> Interactively but under resource manager control?
> (E.g., Torque has the -I directive for this type of thing,
> and I guess other resource managers have similar mechanisms.)
> 
> In batch mode, where a full R script runs and eventually exits?
> (Which can be with or without a resource manager.)
> 
> Maybe it is something else.
> 
> Matlab (parallel or not), IDL, etc., just like R[mpi],
> raise the same type of questions for us here.
> 
> Gus Correa




Re: [OMPI users] More newbie question: --hostfile option

2011-01-12 Thread Gus Correa

Tena Sakai wrote:

Hi,

I can execute the command below:
   $ mpirun -H vixen -np 1 hostname : -H 
compute-0-0,compute-0-1,compute-0-2 -np 3 hostname

and I get:
   vixen.egcrc.org
   compute-0-0.local
   compute-0-1.local
   compute-0-2.local

I have a file myhosts, which looks like:
   compute-0-0 slots=1
   compute-0-1 slots=1
   compute-0-2 slots=1
but when I execute:
   $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
I get:
   There are no allocated resources for the application
 hostname
   that match the requested mapping:
 
   Verify that you have mapped the allocated resources properly using the

   --host or --hostfile specification.
   --------------------------------------------------------------------------
   A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
   launch so we are aborting.
   
   There may be more information reported by the environment (see above).
   
   This may be because the daemon was unable to find all the needed shared
   libraries on the remote node. You may set your LD_LIBRARY_PATH to 
have the

   location of the shared libraries on the remote nodes and this will
   automatically be forwarded to the remote nodes.
   --------------------------------------------------------------------------
   mpirun noticed that the job aborted, but has no info as to the process
   that caused that situation.
   --------------------------------------------------------------------------
   mpirun: clean termination accomplished

Interestingly, this works:
   $ mpirun --hostfile myhosts -np 3 hostname
   compute-0-0.local
   compute-0-1.local
   compute-0-2.local
   $

Am I correct in concluding that -H and --hostfile cannot be issued in the
same mpirun command which contains a colon (:)?  Or is there any trick
or work-around to have both -H and --hostfile?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu



Hi Tena

I don't know if this is an option for you, but OpenMPI can be built
integrated with a resource manager.
This obviates completely the need to specify the host list
on the mpirun command line, or to use
a hostfile, or to get involved with all this syntactical nitty-gritty.
OpenMPI will use exactly those resources (nodes, cores, etc) that are
made available to it by the resource manager upon your request.

We use Torque here, which is simple, effective, and even available 
through RPM-type packages on many Linux distributions.

(Although it is also easy to build from source.)
I think OpenMPI also builds with SGE,
maybe with other resource managers too.
See the FAQ and the README file for more details on how to build
OpenMPI with Torque (or SGE) support.
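
Once that is in place, a job script can be as small as this sketch
(the resource request and job name are just examples):

   #!/bin/bash
   #PBS -N myjob
   #PBS -l nodes=3:ppn=1
   cd $PBS_O_WORKDIR
   # no -H or --hostfile needed: mpirun takes the node list from Torque
   mpirun -np 3 hostname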

Resource managers are also a no-nonsense way to manage jobs, either
from one or from many users.

My two cents,
Gus Correa

PS - Looking at your nodes' names, it looks to me like you have a Rocks
cluster, right?

Rocks has an SGE and a Torque roll.
You could install one of them (only one!), if not yet there, and enjoy!
('rocks list roll' will tell what you have.)


Re: [OMPI users] More newbie question: --hostfile option

2011-01-12 Thread David Zhang
I think you just make a hostfile with

vixen
compute-0-0
...

and load the file in the first -H before the colon.
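
That is, something along these lines (an untested sketch; as I recall,
the hostfile defines the pool of nodes and each -H picks from it):

   $ cat myhosts
   vixen slots=1
   compute-0-0 slots=1
   compute-0-1 slots=1
   compute-0-2 slots=1
   $ mpirun --hostfile myhosts -H vixen -np 1 hostname : \
         -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname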

On Wed, Jan 12, 2011 at 6:23 PM, Tena Sakai  wrote:

>  Hi,
>
> I can execute the command below:
>$ mpirun -H vixen -np 1 hostname : -H
> compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
> and I get:
>vixen.egcrc.org
>compute-0-0.local
>compute-0-1.local
>compute-0-2.local
>
> I have a file myhosts, which looks like:
>compute-0-0 slots=1
>compute-0-1 slots=1
>compute-0-2 slots=1
> but when I execute:
>$ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
> I get:
>There are no allocated resources for the application
>  hostname
>that match the requested mapping:
>
>Verify that you have mapped the allocated resources properly using the
>--host or --hostfile specification.
>
>--------------------------------------------------------------------------
>A daemon (pid unknown) died unexpectedly on signal 1  while attempting
> to
>launch so we are aborting.
>
>There may be more information reported by the environment (see above).
>
>This may be because the daemon was unable to find all the needed shared
>libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
>location of the shared libraries on the remote nodes and this will
>automatically be forwarded to the remote nodes.
>
>--------------------------------------------------------------------------
>mpirun noticed that the job aborted, but has no info as to the process
>that caused that situation.
>
>--------------------------------------------------------------------------
>mpirun: clean termination accomplished
>
> Interestingly, this works:
>$ mpirun --hostfile myhosts -np 3 hostname
>compute-0-0.local
>compute-0-1.local
>compute-0-2.local
>$
>
> Am I correct in concluding that -H and --hostfile cannot be issued in the
> same mpirun command which contains a colon (:)?  Or is there any trick
> or work-around to have both -H and --hostfile?
>
> Thank you.
>
> Tena Sakai
> tsa...@gallo.ucsf.edu
>



-- 
David Zhang
University of California, San Diego


Re: [OMPI users] More newbie question: --hostfile option

2011-01-12 Thread Ralph Castain

On Jan 12, 2011, at 7:23 PM, Tena Sakai wrote:

> Hi,
> 
> I can execute the command below:
>$ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 
> -np 3 hostname
> and I get:
>vixen.egcrc.org
>compute-0-0.local
>compute-0-1.local
>compute-0-2.local
> 
> I have a file myhosts, which looks like:
>compute-0-0 slots=1
>compute-0-1 slots=1
>compute-0-2 slots=1
> but when I execute:
>$ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
> I get:
>There are no allocated resources for the application 
>  hostname
>that match the requested mapping:
>  
>Verify that you have mapped the allocated resources properly using the 
>--host or --hostfile specification.
>--------------------------------------------------------------------------
>A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
>launch so we are aborting.
>
>There may be more information reported by the environment (see above).
>
>This may be because the daemon was unable to find all the needed shared
>libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>location of the shared libraries on the remote nodes and this will
>automatically be forwarded to the remote nodes.
>--------------------------------------------------------------------------
>mpirun noticed that the job aborted, but has no info as to the process
>that caused that situation.
>--------------------------------------------------------------------------
>mpirun: clean termination accomplished
> 
> Interestingly, this works:
>$ mpirun --hostfile myhosts -np 3 hostname
>compute-0-0.local
>compute-0-1.local
>compute-0-2.local
>$
> 
> Am I correct in concluding that -H and --hostfile cannot be issued in the
> same mpirun command which contains a colon (:)?

It may depend on what version of OMPI you are using. Given what you see, the 
answer is "correct".


>  Or is there any trick
> or work-around to have both -H and --hostfile?

See the wiki page for an explanation of how the options are used:

https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

> 
> Thank you.
> 
> Tena Sakai
> tsa...@gallo.ucsf.edu



Re: [OMPI users] More newbie question: LD_LIBRARY_PATH

2011-01-12 Thread Gus Correa

Tena Sakai wrote:

Thank you, Gus.

I agree with what you say about location of OpenMPI software.
Indeed, /usr/local is nfs-mounted to all cluster nodes, albeit
a bit of an unfortunate name, "local."  If/when I have a chance to
set up machines, I will make local really local to each node.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/12/11 4:20 PM, "Gus Correa"  wrote:


Tena Sakai wrote:

Hi,

On a FAQ page ( 
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),

(under 1. What pre-requisites are necessary for running an Open MPI
job?), I see an example
of LD_LIBRARY_PATH environment variable:
   LD_LIBRARY_PATH: /opt/openmpi/lib

When I compiled the source, a directory /usr/local/lib/openmpi was
generated (with many
files in it).  Given that, do I set LD_LIBRARY_PATH to /usr/local/lib or
do I set
it to /usr/local/lib/openmpi?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu


To /usr/local/lib.

I would suggest not using the default /usr/local to install OpenMPI,
since it tends to be  really local to the machine where you built OpenMPI.
This will require that you install OpenMPI on all nodes/machines if
you want to run programs across a network.

Instead, a simpler way to get OpenMPI available to all nodes,
although installing only on one of them (say the head node of your
cluster) is to do it in a directory that is shared, typically via NFS.
To do so, use the --prefix=/my/shared/OpenMPI/directory option of
the configure script.

There are FAQs about this too.

Anyway, it may depend on your environment also, whether it is a cluster
with a private subnet (where my suggestion is typically used),
a bunch of separate computers on a LAN (where the suggestion won't work
unless you have a shared NFS directory), or other.

Gus Correa

Hi Tena

Just out of curiosity, this is not really an OpenMPI issue,
and I don't know anything about Rmpi either:

How do you intend to use Rmpi?

Interactively without any resource manager control (Torque, SGE, etc.),
as in a standalone PC, although utilizing several hosts via MPI?

Interactively but under resource manager control?
(E.g., Torque has the -I directive for this type of thing,
and I guess other resource managers have similar mechanisms.)

In batch mode, where a full R script runs and eventually exits?
(Which can be with or without a resource manager.)

Maybe it is something else.

Matlab (parallel or not), IDL, etc., just like R[mpi],
raise the same type of questions for us here.

Gus Correa


[OMPI users] More newbie question: --hostfile option

2011-01-12 Thread Tena Sakai
Hi,

I can execute the command below:
   $ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 
-np 3 hostname
and I get:
   vixen.egcrc.org
   compute-0-0.local
   compute-0-1.local
   compute-0-2.local

I have a file myhosts, which looks like:
   compute-0-0 slots=1
   compute-0-1 slots=1
   compute-0-2 slots=1
but when I execute:
   $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
I get:
   There are no allocated resources for the application
 hostname
   that match the requested mapping:

   Verify that you have mapped the allocated resources properly using the
   --host or --hostfile specification.
   --------------------------------------------------------------------------
   A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
   launch so we are aborting.

   There may be more information reported by the environment (see above).

   This may be because the daemon was unable to find all the needed shared
   libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
   location of the shared libraries on the remote nodes and this will
   automatically be forwarded to the remote nodes.
   --------------------------------------------------------------------------
   mpirun noticed that the job aborted, but has no info as to the process
   that caused that situation.
   --------------------------------------------------------------------------
   mpirun: clean termination accomplished

Interestingly, this works:
   $ mpirun --hostfile myhosts -np 3 hostname
   compute-0-0.local
   compute-0-1.local
   compute-0-2.local
   $

Am I correct in concluding that -H and --hostfile cannot be issued in the
same mpirun command which contains a colon (:)?  Or is there any trick
or work-around to have both -H and --hostfile?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu


Re: [OMPI users] More newbie question: LD_LIBRARY_PATH

2011-01-12 Thread Tena Sakai
Thank you, Gus.

I agree with what you say about location of OpenMPI software.
Indeed, /usr/local is nfs-mounted to all cluster nodes, albeit
a bit of an unfortunate name, "local."  If/when I have a chance to
set up machines, I will make local really local to each node.

Regards,

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/12/11 4:20 PM, "Gus Correa"  wrote:

> Tena Sakai wrote:
>> Hi,
>> 
>> On a FAQ page ( 
>> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
>> (under 1. What pre-requisites are necessary for running an Open MPI
>> job?), I see an example
>> of LD_LIBRARY_PATH environment variable:
>>LD_LIBRARY_PATH: /opt/openmpi/lib
>> 
>> When I compiled the source, a directory /usr/local/lib/openmpi was
>> generated (with many
>> files in it).  Given that, do I set LD_LIBRARY_PATH to /usr/local/lib or
>> do I set
>> it to /usr/local/lib/openmpi?
>> 
>> Thank you.
>> 
>> Tena Sakai
>> tsa...@gallo.ucsf.edu
>> 
> 
> To /usr/local/lib.
> 
> I would suggest not using the default /usr/local to install OpenMPI,
> since it tends to be  really local to the machine where you built OpenMPI.
> This will require that you install OpenMPI on all nodes/machines if
> you want to run programs across a network.
> 
> Instead, a simpler way to get OpenMPI available to all nodes,
> although installing only on one of them (say the head node of your
> cluster) is to do it in a directory that is shared, typically via NFS.
> To do so, use the --prefix=/my/shared/OpenMPI/directory option of
> the configure script.
> 
> There are FAQs about this too.
> 
> Anyway, it may depend on your environment also, whether it is a cluster
> with a private subnet (where my suggestion is typically used),
> a bunch of separate computers on a LAN (where the suggestion won't work
> unless you have a shared NFS directory), or other.
> 
> Gus Correa




Re: [OMPI users] More newbie question: LD_LIBRARY_PATH

2011-01-12 Thread Gus Correa

Tena Sakai wrote:

Hi,

On a FAQ page ( 
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
(under 1. What pre-requisites are necessary for running an Open MPI 
job?), I see an example

of LD_LIBRARY_PATH environment variable:
   LD_LIBRARY_PATH: /opt/openmpi/lib

When I compiled the source, a directory /usr/local/lib/openmpi was 
generated (with many
files in it).  Given that, do I set LD_LIBRARY_PATH to /usr/local/lib or
do I set

it to /usr/local/lib/openmpi?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu



To /usr/local/lib.
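
E.g., for a bash shell:

   export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH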

I would suggest not using the default /usr/local to install OpenMPI,
since it tends to be  really local to the machine where you built OpenMPI.
This will require that you install OpenMPI on all nodes/machines if
you want to run programs across a network.

Instead, a simpler way to get OpenMPI available to all nodes,
although installing only on one of them (say the head node of your 
cluster) is to do it in a directory that is shared, typically via NFS.

To do so, use the --prefix=/my/shared/OpenMPI/directory option of
the configure script.
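
For example (the shared path below is only an illustration):

   ./configure --prefix=/share/apps/openmpi
   make all install

and then each user points PATH and LD_LIBRARY_PATH at it:

   export PATH=/share/apps/openmpi/bin:$PATH
   export LD_LIBRARY_PATH=/share/apps/openmpi/lib:$LD_LIBRARY_PATH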

There are FAQs about this too.

Anyway, it may depend on your environment also, whether it is a cluster
with a private subnet (where my suggestion is typically used),
a bunch of separate computers on a LAN (where the suggestion won't work 
unless you have a shared NFS directory), or other.


Gus Correa


[OMPI users] More newbie question: LD_LIBRARY_PATH

2011-01-12 Thread Tena Sakai
Hi,

On a FAQ page ( 
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path),
(under 1. What pre-requisites are necessary for running an Open MPI job?), I 
see an example
of LD_LIBRARY_PATH environment variable:
   LD_LIBRARY_PATH: /opt/openmpi/lib

When I compiled the source, a directory /usr/local/lib/openmpi was generated 
(with many
files in it).  Given that, do I set LD_LIBRARY_PATH to /usr/local/lib or do I set
it to /usr/local/lib/openmpi?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu


Re: [OMPI users] Error from mpirun command

2011-01-12 Thread Gus Correa

Tena Sakai wrote:

Hi,

I am trying to run simple mpirun commands (pretty much straight out of
mpirun man page) and getting a bit of error message.  Here’s what I mean:

   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ hostname
   vixen.egcrc.org
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ mpirun -H vixen -np 1 hostname
   vixen.egcrc.org
   [tsakai@vixen Rmpi]$ mpirun -H blitzen -np 1 hostname
   stty: standard input: Invalid argument
   blitzen.egcrc.org
   [tsakai@vixen Rmpi]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 
1 hostname

   stty: standard inputvixen.egcrc.org
   blitzen.egcrc.org
   [tsakai@vixen Rmpi]$ : Invalid argument
   
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 
1 hostname 2> stdErr

   vixen.egcrc.org
   blitzen.egcrc.org
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ cat stdErr
   stty: standard input: Invalid argument
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ ssh -l tsakai blitzen
   Last login: Wed Jan 12 15:41:59 2011 from vixen.egcrc.org
   Platform OCS Frontend Node - Blitzen Cluster
   Platform OCS 4.5.1 (Flintstone)
   Profile built 11:01 10-Jul-2008
   
   Kickstarted 11:02 10-Jul-2008

   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ hostname
   blitzen.egcrc.org
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ mpirun -H blitzen -np 1 hostname
   blitzen.egcrc.org
   [tsakai@blitzen ~]$ mpirun -H vixen -np 1 hostname
   stty: standard inputvixen.egcrc.org
   [tsakai@blitzen ~]$ : Invalid argument
   
   [tsakai@blitzen ~]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 1 
hostname

   stty: standard inputblitzen.egcrc.org
   vixen.egcrc.org
   [tsakai@blitzen ~]$ : Invalid argument
   
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 1 
hostname 2> stdErr

   blitzen.egcrc.org
   vixen.egcrc.org
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ cat stdErr
   stty: standard input: Invalid argument
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ exit
   logout
   [tsakai@vixen Rmpi]$

I am using two hosts: vixen and blitzen.  It appears that when a machine
other than the one I am on is specified via the -H flag, I get a
"stty: standard input: Invalid argument" message on stderr.  It doesn't
seem to impede the execution of the command (in my example, hostname),
though.

Can somebody please tell me what this means and what it takes to cure 
the problem?


Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu



Guessin' ...
Anything in your .bashrc/.tcshrc or in the
system-wide initialization files in /etc or /etc/profile.d
that may be causing the stty output to stderr?
I did a little googling and found some stuff about it.
Perhaps it is running stty without redirecting stderr (2>/dev/null).
The message may come from the ssh session opened when mpiexec connects
you to the remote machine.
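
If some startup file does run stty unconditionally, a common fix is to
guard it so it only runs for interactive shells, e.g. (a bash sketch;
the stty arguments are just a placeholder):

   # in ~/.bashrc: run stty only when stdin is a terminal
   if [ -t 0 ]; then
       stty erase '^?'
   fi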

My $0.02
Gus Correa


[OMPI users] Error from mpirun command

2011-01-12 Thread Tena Sakai
Hi,

I am trying to run simple mpirun commands (pretty much straight out of
mpirun man page) and getting a bit of error message.  Here’s what I mean:

   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ hostname
   vixen.egcrc.org
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ mpirun -H vixen -np 1 hostname
   vixen.egcrc.org
   [tsakai@vixen Rmpi]$ mpirun -H blitzen -np 1 hostname
   stty: standard input: Invalid argument
   blitzen.egcrc.org
   [tsakai@vixen Rmpi]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 1 
hostname
   stty: standard inputvixen.egcrc.org
   blitzen.egcrc.org
   [tsakai@vixen Rmpi]$ : Invalid argument

   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 1 
hostname 2> stdErr
   vixen.egcrc.org
   blitzen.egcrc.org
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ cat stdErr
   stty: standard input: Invalid argument
   [tsakai@vixen Rmpi]$
   [tsakai@vixen Rmpi]$ ssh -l tsakai blitzen
   Last login: Wed Jan 12 15:41:59 2011 from vixen.egcrc.org
   Platform OCS Frontend Node - Blitzen Cluster
   Platform OCS 4.5.1 (Flintstone)
   Profile built 11:01 10-Jul-2008

   Kickstarted 11:02 10-Jul-2008
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ hostname
   blitzen.egcrc.org
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ mpirun -H blitzen -np 1 hostname
   blitzen.egcrc.org
   [tsakai@blitzen ~]$ mpirun -H vixen -np 1 hostname
   stty: standard inputvixen.egcrc.org
   [tsakai@blitzen ~]$ : Invalid argument

   [tsakai@blitzen ~]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 1 
hostname
   stty: standard inputblitzen.egcrc.org
   vixen.egcrc.org
   [tsakai@blitzen ~]$ : Invalid argument

   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ mpirun -H vixen -np 1 hostname : -H blitzen -np 1 
hostname 2> stdErr
   blitzen.egcrc.org
   vixen.egcrc.org
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ cat stdErr
   stty: standard input: Invalid argument
   [tsakai@blitzen ~]$
   [tsakai@blitzen ~]$ exit
   logout
   [tsakai@vixen Rmpi]$

I am using two hosts: vixen and blitzen.  It appears that when a machine
other than the one I am on is specified via the -H flag, I get a
"stty: standard input: Invalid argument" message on stderr.  It doesn't
seem to impede the execution of the command (in my example, hostname),
though.

Can somebody please tell me what this means and what it takes to cure the 
problem?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu


Re: [OMPI users] Newbie question

2011-01-12 Thread Gus Correa

Ralph Castain wrote:

On Jan 12, 2011, at 12:54 PM, Tena Sakai wrote:


Hi Siegmar,

Many thanks for your reply.

I have tried man pages you mention, but one hurdle I am running into
is orte_hosts page.  I don't find the specification of fields for
the file.  I see an example:

  dummy1 slots=4
  dummy2 slots=4
  dummy3 slots=4
  dummy4 slots=4
  dummy5 slots=4

Is the first field (dummyX) machine/node name?  


Yes


What is the definition
of slots?  (Max number of processes to spawn?)


Yes


Here we don't let 'slots' exceed the number of physical cores.
(Actually, Torque does this for us.)
I suppose this prevents the cores from being oversubscribed,
at least by default, right?

Gus Correa





Am I missing a different man page?  Can you please shed some light?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu




On 1/10/11 11:38 PM, "Siegmar Gross" 
wrote:


Hi,


What I want is to spawn a bunch of R slaves to other machines on
the network. I can spawn R slaves, as many as I like, to the local
machine, but I don't know how to do this with machines on the
network.  That's what the hosts parameter of mpi.spawn.Rslaves()
enables me to do, I think.  If I can do that, then Rmpi has
function(s) to send command to each of the spawned slaves.

My question is how can I get open MPI to give me those hosts
parameters.

I am not quite sure if I understood your question, but when you
read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
which allows to specify where and how to start processes. "man
MPI_Info_create" shows you how to create an info object and "man
MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
you how you can build a hostfile. I do not know how to do these
things in your language R but hopefully the information of the
manual pages helps to solve your problem.

Kind regards

Siegmar





Re: [OMPI users] Newbie question

2011-01-12 Thread Ralph Castain

On Jan 12, 2011, at 12:54 PM, Tena Sakai wrote:

> Hi Siegmar,
> 
> Many thanks for your reply.
> 
> I have tried man pages you mention, but one hurdle I am running into
> is orte_hosts page.  I don't find the specification of fields for
> the file.  I see an example:
> 
>   dummy1 slots=4
>   dummy2 slots=4
>   dummy3 slots=4
>   dummy4 slots=4
>   dummy5 slots=4
> 
> Is the first field (dummyX) machine/node name?  

Yes

> What is the definition
> of slots?  (Max number of processes to spawn?)

Yes


> 
> Am I missing a different man page?  Can you please shed some light?
> 
> Thank you.
> 
> Tena Sakai
> tsa...@gallo.ucsf.edu
> 
> 
> 
> 
> On 1/10/11 11:38 PM, "Siegmar Gross" 
> wrote:
> 
>> Hi,
>> 
>>> What I want is to spawn a bunch of R slaves to other machines on
>>> the network. I can spawn R slaves, as many as I like, to the local
>>> machine, but I don't know how to do this with machines on the
>>> network.  That's what the hosts parameter of mpi.spawn.Rslaves()
>>> enables me to do, I think.  If I can do that, then Rmpi has
>>> function(s) to send command to each of the spawned slaves.
>>> 
>>> My question is how can I get open MPI to give me those hosts
>>> parameters.
>> 
>> I am not quite sure if I understood your question, but when you
>> read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
>> which allows to specify where and how to start processes. "man
>> MPI_Info_create" shows you how to create an info object and "man
>> MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
>> you how you can build a hostfile. I do not know how to do these
>> things in your language R but hopefully the information of the
>> manual pages helps to solve your problem.
>> 
>> Kind regards
>> 
>> Siegmar
>> 



Re: [OMPI users] Newbie question

2011-01-12 Thread Tena Sakai
Hi Siegmar,

Many thanks for your reply.

I have tried man pages you mention, but one hurdle I am running into
is orte_hosts page.  I don't find the specification of fields for
the file.  I see an example:

   dummy1 slots=4
   dummy2 slots=4
   dummy3 slots=4
   dummy4 slots=4
   dummy5 slots=4

Is the first field (dummyX) machine/node name?  What is the definition
of slots?  (Max number of processes to spawn?)

Am I missing a different man page?  Can you please shed some light?

Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu




On 1/10/11 11:38 PM, "Siegmar Gross" 
wrote:

> Hi,
> 
>> What I want is to spawn a bunch of R slaves to other machines on
>> the network. I can spawn R slaves, as many as I like, to the local
>> machine, but I don't know how to do this with machines on the
>> network.  That's what the hosts parameter of mpi.spawn.Rslaves()
>> enables me to do, I think.  If I can do that, then Rmpi has
>> function(s) to send command to each of the spawned slaves.
>> 
>> My question is how can I get open MPI to give me those hosts
>> parameters.
> 
> I am not quite sure if I understood your question, but when you
> read "man MPI_Comm_spawn" you can find the parameter "MPI_Info info"
> which allows to specify where and how to start processes. "man
> MPI_Info_create" shows you how to create an info object and "man
> MPI_Info_set" how to add a key/value pair. "man orte_hosts" shows
> you how you can build a hostfile. I do not know how to do these
> things in your language R but hopefully the information of the
> manual pages helps to solve your problem.
> 
> Kind regards
> 
> Siegmar
> 




Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2011-01-12 Thread Shamis, Pavel
You are running the OFED 1.4.1 release. If it does not work, I would
contact your IB vendor, or the ofa-general mailing list, to check what
combination of firmware / driver you have to use.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
Email: sham...@ornl.gov





On Jan 7, 2011, at 10:50 AM, Gilbert Grosdidier wrote:

Hello Pavel,

 Here is the output of the ofed_info command :

==
OFED-1.4.1
libibverbs:
git://git.openfabrics.org/ofed_1_4/libibverbs.git ofed_1_4
commit b00dc7d2f79e0660ac40160607c9c4937a895433
libmthca:
git://git.kernel.org/pub/scm/libs/infiniband/libmthca.git master
commit be5eef3895eb7864db6395b885a19f770fde7234
libmlx4:
git://git.openfabrics.org/ofed_1_4/libmlx4.git ofed_1_4
commit d5e5026e2bd3bbd7648199a48c4245daf313aa48
libehca:
git://git.openfabrics.org/ofed_1_4/libehca.git ofed_1_4
commit 0249815e9b6f134f33546da6fa2e84e1185eea6d
libipathverbs:
git://git.openfabrics.org/~ralphc/libipathverbs ofed_1_4
commit 337df3c1cbe43c3e9cb58e7f6e91f44603dd23fb
libcxgb3:
git://git.openfabrics.org/~swise/libcxgb3.git ofed_1_4
commit f685c8fe7e77e64614d825e563dd9f02a0b1ae16
libnes:
git://git.openfabrics.org/~glenn/libnes.git master
commit 379cccb4484f39b99c974eb6910d3a0407c0bbd1
libibcm:
git://git.openfabrics.org/~shefty/libibcm.git master
commit 7fb57e005b3eae2feb83b3fd369aeba700a5bcf8
librdmacm:
git://git.openfabrics.org/~shefty/librdmacm.git master
commit 62c2bddeaf5275425e1a7e3add59c3913ccdb4e9
libsdp:
git://git.openfabrics.org/ofed_1_4/libsdp.git ofed_1_4
commit b1eaecb7806d60922b2fe7a2592cea4ae56cc2ab
sdpnetstat:
git://git.openfabrics.org/~amirv/sdpnetstat.git ofed_1_4
commit 798e44f6d5ff8b15b2a86bc36768bd2ad473a6d7
srptools:
git://git.openfabrics.org/~ishai/srptools.git master
commit ce1f64c8dd63c93d56c1cc5fbcdaaadd4f74a1e3
perftest:
git://git.openfabrics.org/~orenmeron/perftest.git master
commit 1cd38e844dc50d670b48200bcda91937df5f5a92
qlvnictools:
git://git.openfabrics.org/~ramachandrak/qlvnictools.git ofed_1_4
commit 4ce9789273896d0e67430c330eb3703405b59951
tvflash:
git://git.openfabrics.org/ofed_1_4/tvflash.git ofed_1_4
commit e1b50b3b8af52b0bc55b2825bb4d6ce699d5c43b
mstflint:
git://git.openfabrics.org/~orenk/mstflint.git master
commit 3352f8997591c6955430b3e68adba33e80a974e3
qperf:
git://git.openfabrics.org/~johann/qperf.git/.git master
commit 18e1c1e8af96cd8bcacced3c4c2a4fd90f880792
ibutils:
git://git.openfabrics.org/~kliteyn/ibutils.git ofed_1_4
commit 9d4bfc3ba19875dfa4583dfaef6f0f579bb013bb
ibsim:
git://git.openfabrics.org/ofed_1_4/ibsim.git ofed_1_4
commit a76132ae36dde8302552d896e35bd29608ac9524

ofa_kernel-1.4.1:
Git:
git://git.openfabrics.org/ofed_1_4/linux-2.6.git ofed_kernel
commit 868661b127c355c64066a796460a7380a722dd84


 Does this mean the resize_cq function should be available, please?

 Thanks,   Regards, Gilbert.


On Jan 7, 2011, at 16:14, Shamis, Pavel wrote:

The FW version looks OK, but it may be a driver issue as well. I guess
that an OFED 1.4.x or 1.5.x driver should be OK.
To check driver version , you may run ofed_info command.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
Email: sham...@ornl.gov





On Dec 17, 2010, at 12:30 PM, Gilbert Grosdidier wrote:

John,

Thanks, more info below.


On 17/12/2010 17:32, John Hearns wrote:
On 17 December 2010 15:47, Gilbert Grosdidier wrote:
gg= I don't know, and firmware_revs does not seem to be available.
The only thing I got on a worker node was with lspci:
If you log into a compute node the command is /usr/sbin/ibstat
gg= Here it is :

/usr/sbin/ibstat
CA 'mlx4_0'
   CA type: MT26418
   Number of ports: 2
   Firmware version: 2.7.0
   Hardware version: a0
   Node GUID: 0x003048f036c4
   System image GUID: 0x003048f036c7
   Port 1:
   State: Active
   Physical state: LinkUp
   Rate: 20
   Base lid: 6611
   LMC: 0
   SM lid: 1
   Capability mask: 0x02510868
   Port GUID: 0x003048f036c5
   Port 2:
   State: Active
   Physical state: LinkUp
   Rate: 20
   Base lid: 6612
   LMC: 0
   SM lid: 1
   Capability mask: 0x02510868
   Port GUID: 0x003048f036c6

Does this mean resize_cq should be available, please?

Thanks,  Best,  G.


The firmware_revs command is on the cluster admin node, and is
provided by the sgi-admin-node RPM package.


