I tried the TCP interface, and here is what the error file contained.
[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file
../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 275
[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file
../../../../../orte/mca/pls/tm/pls
I used
mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 8 /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log
in the job submission script to run over the TCP interface.
Rangam
___
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of
Tushar
Ya sure, here is the list
Open MPI: 1.2.7
Open MPI SVN revision: r19401
Open RTE: 1.2.7
Open RTE SVN revision: r19401
OPAL: 1.2.7
OPAL SVN revision: r19401
Prefix: /opt/libraries/openmpi/openmpi-1.2.7-pgi
Configured a
Hello Tushar,
Can you send me the output of ompi_info?
Have you tried using just TCP instead of IB, to narrow the problem down?
Rangam
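(For context, one way to take IB out of the picture on Open MPI 1.2.x is to select only the loopback, shared-memory, and TCP BTLs, and keep TCP traffic off the IPoIB interface. A sketch only; the interface name ib0 and the executable name are assumptions, not from this thread:)

```shell
# Sketch: force Open MPI onto loopback, shared memory and TCP,
# excluding the assumed IPoIB interface ib0 from the TCP BTL.
mpirun --mca btl self,sm,tcp \
       --mca btl_tcp_if_exclude lo,ib0 \
       -np 8 ./SWMF.exe
```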
#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF
#PBS -l nodes=1:ppn=8
# change to the run directory
#cd $SWMF_v2.3/run
cat "${PBS_NODEFILE}" > list_of_nodes
mpirun --mca b
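(As an aside, dumping ${PBS_NODEFILE} as the script above does is a useful sanity check: the file contains one line per allocated slot, so you can confirm how many slots and distinct hosts PBS actually granted. A self-contained sketch with made-up node names:)

```shell
# Sketch: fabricate a PBS-style node file (one line per slot) and
# count slots vs. unique hosts, as a sanity check for a job script.
printf 'node01\nnode01\nnode02\nnode02\n' > /tmp/nodes_example
echo "slots: $(wc -l < /tmp/nodes_example)"
echo "hosts: $(sort -u /tmp/nodes_example | wc -l)"
```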
On 20 November 2010 16:31, Gilbert Grosdidier wrote:
> Bonjour,
Bonjour Gilbert.
I manage ICE clusters also.
Could you please have a look at /etc/init.d/pbs on the compute blades?
Do you have something like:
if [ "${PBS_START_MOM}" -gt 0 ] ; then
    if check_prog "mom" ; then
e
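(The point of that guard: if PBS_START_MOM is 0 on a blade, pbs_mom never starts there, and mpirun will time out trying to launch on it. A paraphrased, self-contained sketch of the logic; the value 1 is assumed for illustration, and the real script also runs check_prog "mom":)

```shell
# Paraphrased sketch of the init-script guard: pbs_mom is only
# started when PBS_START_MOM is greater than zero.
PBS_START_MOM=1   # assumed value; check the real setting on each blade
if [ "${PBS_START_MOM}" -gt 0 ]; then
    echo "pbs_mom would be started"
else
    echo "pbs_mom is disabled on this blade"
fi
```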
Rangam,
It does not want to run at all. Attached is the log file from the batch script
you sent.
On Sat, Nov 20, 2010 at 10:32 AM, Addepalli, Srirangam V <
srirangam.v.addepa...@ttu.edu> wrote:
> Hello Tushar,
> MPIRUN is not able to spawn processes on the node allocated. This should
> help
Hello Tushar,
MPIRUN is not able to spawn processes on the node allocated. This should help
#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF
#PBS -l nodes=2:ppn=8
# change to the run directory
#cd $SWMF_v2.3/run
cat "${PBS_NODEFILE}" > list_of_nodes
mpirun -np 8 /home/A00945081/SWMF_v2.3/run
Hi Rangam,
I ran the batch script you gave and have attached the error file. Also,
since the WASATCH cluster is fairly small, people usually run on UINTA. So,
if possible, could you look at the UINTA error files?
Tushar
On Fri, Nov 19, 2010 at 12:31 PM, Addepalli, Srirangam V <
srirangam.v.add
Bonjour,
I am afraid I have run into a weird issue when running an Open MPI job over OpenIB
on an SGI ICE cluster with 4096 cores (or more), and the FAQ does not help.
The OMPI version is 1.4.1, and it runs just fine with a smaller number of
cores (up to 512).
The error message is the following: