Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) lands on only one node even though multiple nodes/processors are specified

2013-01-24 Thread Ralph Castain
Sure - just add --with-openib=no --with-psm=no to your configure line and
we'll ignore it.
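
For example (the prefix here is hypothetical; adjust it to your install):

./configure --prefix=/sb/apps/openmpi/1.6.3/x86_64 --with-openib=no --with-psm=no

If you'd rather not rebuild at all, you should also be able to push an
individual run onto TCP from the command line, e.g.:

mpirun --mca mtl ^psm --mca btl tcp,sm,self -np 8 cat /dev/urandom > /dev/null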

On Jan 24, 2013, at 7:09 AM, Sabuj Pattanayek  wrote:

> Aha, with --display-allocation I'm getting:
> 
> mca: base: component_find: unable to open
> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
> libpsm_infinipath.so.1: cannot open shared object file: No such file
> or directory (ignored)
> 
> I think the system I compiled it on has different IB libs than the
> nodes. I'll need to recompile and then see if it runs, but is there
> any way to get it to ignore IB and just use gigE? Not all of our nodes
> have IB and I just want to use any node.




Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) lands on only one node even though multiple nodes/processors are specified

2013-01-24 Thread Brock Palen
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:

> Or do I just need to compile two versions, one with IB and one without?

You should not need to; we have OMPI compiled for openib/psm and run that
same install on psm/tcp and verbs (openib) based gear.

Do all the nodes assigned to your job have QLogic IB adapters? Do they all
have libpsm_infinipath installed? It will be required.
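
One quick way to check a node (assuming the library is registered with
ldconfig; otherwise look under /usr/lib64):

ldconfig -p | grep libpsm_infinipath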

Also, did you build your Open MPI with tm? --with-tm=/usr/local/torque/ (or
wherever the path to lib/libtorque.so is.)

With TM support, mpirun from OMPI will know how to find the CPUs assigned to
your job by torque. This is the better way; in a pinch you can also use
mpirun -machinefile $PBS_NODEFILE -np 8

But really, tm is better.
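
With tm built in, a job script as minimal as this sketch should fan out
across the whole allocation (the program name is just a placeholder):

#PBS -l nodes=2:ppn=4
cd $PBS_O_WORKDIR       # run from the submission directory
mpirun ./my_program     # no -np: tm supplies the full slot list

No -np and no -machinefile needed: mpirun asks torque for the slot list and
starts one process per allocated slot.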

Here is our build line for OMPI:

./configure --prefix=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1 \
  --mandir=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1/man \
  --with-tm=/usr/local/torque --with-openib --with-psm \
  --with-mxm=/home/software/rhel6/mxm/1.5 \
  --with-io-romio-flags=--with-file-system=testfs+ufs+lustre \
  --disable-dlopen --enable-shared \
  CC=icc CXX=icpc FC=ifort F77=ifort

We run torque with OMPI.
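
You can confirm tm actually made it into a build with something like:

ompi_info | grep tm

If the tm components (ras/plm) show up in the output, mpirun will read the
torque allocation on its own.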





Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) lands on only one node even though multiple nodes/processors are specified

2013-01-24 Thread Sabuj Pattanayek
Or do I just need to compile two versions, one with IB and one without?



Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) lands on only one node even though multiple nodes/processors are specified

2013-01-24 Thread Sabuj Pattanayek
Aha, with --display-allocation I'm getting:

mca: base: component_find: unable to open
/sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
libpsm_infinipath.so.1: cannot open shared object file: No such file
or directory (ignored)

I think the system I compiled it on has different IB libs than the
nodes. I'll need to recompile and then see if it runs, but is there
any way to get it to ignore IB and just use gigE? Not all of our nodes
have IB and I just want to use any node.

On Thu, Jan 24, 2013 at 8:52 AM, Ralph Castain  wrote:
> How did you configure OMPI? If you add --display-allocation to your command
> line, does it show all the nodes?


Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) lands on only one node even though multiple nodes/processors are specified

2013-01-24 Thread Ralph Castain
How did you configure OMPI? If you add --display-allocation to your command
line, does it show all the nodes?
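
For example:

mpirun --display-allocation -np 8 cat /dev/urandom > /dev/null

That should print the node list mpirun believes it was handed before any
processes start.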





[OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) lands on only one node even though multiple nodes/processors are specified

2013-01-24 Thread Sabuj Pattanayek
Hi,

I'm submitting a job through torque/PBS; the head node also runs the
Moab scheduler. The .pbs file has this in the resource line:

#PBS -l nodes=2:ppn=4

I've also tried something like:

#PBS -l procs=56

and at the end of the script I'm running:

mpirun -np 8 cat /dev/urandom > /dev/null

or

mpirun -np 56 cat /dev/urandom > /dev/null

...depending on how many processors I requested. The job starts, and
$PBS_NODEFILE lists the nodes the job was assigned, but all the cats
are piled onto the first node. Any idea how I can get this to launch
processes across multiple nodes? Note: I have OSU mpiexec working
without problems with mvapich and mpich2 on our cluster to launch jobs
across multiple nodes.

Thanks,
Sabuj