Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
Sure - just add --with-openib=no --with-psm=no to your config line and we'll ignore it.

On Jan 24, 2013, at 7:09 AM, Sabuj Pattanayek wrote:

> ahha, with --display-allocation I'm getting:
>
> mca: base: component_find: unable to open
> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
> libpsm_infinipath.so.1: cannot open shared object file: No such file
> or directory (ignored)
>
> I think the system I compiled it on has different ib libs than the
> nodes. I'll need to recompile and then see if it runs, but is there
> any way to get it to ignore IB and just use gigE? Not all of our nodes
> have IB and I just want to use any node.

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
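As a sketch of the rebuild this implies (the install prefix and torque path are illustrative placeholders, not taken from the thread), the suggested flags would be added to the configure line like so:

```shell
# Hypothetical rebuild of Open MPI 1.6.3 with InfiniBand (openib) and
# PSM support disabled at configure time, so the resulting install runs
# on gigE-only nodes. Paths are examples; substitute your own.
cd openmpi-1.6.3
./configure --prefix=/sb/apps/openmpi/1.6.3/x86_64 \
    --with-openib=no --with-psm=no \
    --with-tm=/usr/local/torque    # keep torque (tm) launch support
make -j4 && make install
```

This is a config fragment, not a drop-in recipe: whether `--with-tm` belongs on the line depends on where libtorque.so lives on the build host.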
Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:

> or do i just need to compile two versions, one with IB and one without?

You should not need to; we have OMPI compiled for openib/psm and run that same install on psm/tcp and verbs (openib) based gear.

Do all the nodes assigned to your job have QLogic IB adaptors? Do they also all have libpsm_infinipath installed? That will be required.

Also, did you build your openmpi with tm?

--with-tm=/usr/local/torque/ (or wherever the path to lib/libtorque.so is)

With tm support, mpirun from OMPI will know how to find the CPUs assigned to your job by torque. This is the better way. In a pinch you can also use

mpirun -machinefile $PBS_NODEFILE -np 8

but really tm is better. Here is our build line for OMPI:

./configure --prefix=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1 --mandir=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1/man --with-tm=/usr/local/torque --with-openib --with-psm --with-mxm=/home/software/rhel6/mxm/1.5 --with-io-romio-flags=--with-file-system=testfs+ufs+lustre --disable-dlopen --enable-shared CC=icc CXX=icpc FC=ifort F77=ifort

We run torque with OMPI.
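To illustrate what a tm-enabled build buys you, a submission script can omit the machinefile entirely (a sketch; the job name and script layout are assumed, only the resource line and the tm behavior come from the thread):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -N tm-launch-test

cd $PBS_O_WORKDIR
# With --with-tm compiled in, mpirun asks torque directly for the
# allocated slots, so neither -machinefile nor -np is strictly needed:
# one rank is started per assigned slot, spread across both nodes.
mpirun hostname
```

Without tm support the `-machinefile $PBS_NODEFILE` fallback above does the same job, but the tm path also lets torque track and clean up the remote processes.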
Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
or do i just need to compile two versions, one with IB and one without?

On Thu, Jan 24, 2013 at 9:09 AM, Sabuj Pattanayek wrote:

> ahha, with --display-allocation I'm getting:
>
> mca: base: component_find: unable to open
> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
> libpsm_infinipath.so.1: cannot open shared object file: No such file
> or directory (ignored)
>
> I think the system I compiled it on has different ib libs than the
> nodes. I'll need to recompile and then see if it runs, but is there
> any way to get it to ignore IB and just use gigE? Not all of our nodes
> have IB and I just want to use any node.
Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
ahha, with --display-allocation I'm getting:

mca: base: component_find: unable to open
/sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
libpsm_infinipath.so.1: cannot open shared object file: No such file
or directory (ignored)

I think the system I compiled it on has different ib libs than the
nodes. I'll need to recompile and then see if it runs, but is there
any way to get it to ignore IB and just use gigE? Not all of our nodes
have IB and I just want to use any node.

On Thu, Jan 24, 2013 at 8:52 AM, Ralph Castain wrote:

> How did you configure OMPI? If you add --display-allocation to your cmd
> line, does it show all the nodes?
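Short of recompiling, Open MPI can also be steered away from IB at run time by excluding the relevant MCA components (the application name is a placeholder; the flags are standard 1.6-series MCA selection syntax):

```shell
# Run-time sketch: skip the PSM MTL (whose plugin fails to dlopen on
# nodes without libpsm_infinipath) and restrict the byte-transfer
# layers to TCP (gigE), shared memory, and self loopback.
mpirun --mca mtl ^psm --mca btl tcp,sm,self -np 8 ./a.out
```

This only suppresses component selection; the "unable to open ... (ignored)" dlopen warning can still appear on nodes missing the library, since the plugin file itself is still present in the install tree.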
Re: [OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
How did you configure OMPI? If you add --display-allocation to your cmd line, does it show all the nodes?
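A sanity check that is independent of mpirun is to count the slots and distinct hosts torque handed to the job via $PBS_NODEFILE; the file contents below are fabricated to match a nodes=2:ppn=4 request:

```shell
# Simulated $PBS_NODEFILE for '#PBS -l nodes=2:ppn=4'
# (hostnames are made up; a real job reads "$PBS_NODEFILE" instead).
cat > nodefile <<'EOF'
node01
node01
node01
node01
node02
node02
node02
node02
EOF

wc -l < nodefile          # total slots assigned: prints 8
sort -u nodefile | wc -l  # distinct nodes: prints 2
```

If the distinct-node count is right but all ranks still land on one host, the allocation is fine and the problem is in how mpirun reads it (e.g. a build without tm support).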
[OMPI users] openmpi 1.6.3, job submitted through torque/PBS + Moab (scheduler) only land on one node even though multiple nodes/processors are specified
Hi,

I'm submitting a job through torque/PBS; the head node also runs the Moab scheduler. The .pbs file has this in the resources line:

#PBS -l nodes=2:ppn=4

I've also tried something like:

#PBS -l procs=56

and at the end of the script I'm running:

mpirun -np 8 cat /dev/urandom > /dev/null

or

mpirun -np 56 cat /dev/urandom > /dev/null

...depending on how many processors I requested. The job starts, and $PBS_NODEFILE lists the nodes the job was assigned, but all the cat's are piled onto the first node. Any idea how I can get this to submit jobs across multiple nodes? Note, I have OSU mpiexec working without problems with mvapich and mpich2 on our cluster to launch jobs across multiple nodes.

Thanks,
Sabuj
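Put together as a complete script, the intent above looks like this (a sketch: the resource line and mpirun command are from the message, the job name and explicit machinefile are assumptions for the case where mpirun was built without torque support):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -N urandom-test

cd $PBS_O_WORKDIR
# 2 nodes x 4 ppn = 8 slots. Pointing mpirun at the torque nodefile
# explicitly forces the 8 ranks to be spread over both nodes even when
# mpirun cannot query torque itself.
mpirun -machinefile $PBS_NODEFILE -np 8 cat /dev/urandom > /dev/null
```

With a tm-aware build of Open MPI the -machinefile argument becomes unnecessary, since mpirun discovers the allocation on its own.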