Maybe it's my pine mailer.

This is a NAMD run on 256 procs across 32 dual-socket quad-core AMD
Shanghai nodes running a standard benchmark called stmv.

The basic error message, which occurs 31 times, looks like:

[s0164:24296] [[64102,0],16] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/odls/base/odls_base_default_fns.c at line 595
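A minimal sketch, assuming the mpirun output was captured to a file, for counting these messages and listing the hosts that reported them. The function names and the log file name are hypothetical and not part of Open MPI; the host is taken from the bracketed [host:pid] prefix seen in the message above.

```shell
# Hypothetical helpers: both read captured mpirun output on stdin.

# Count the lines that contain an ORTE_ERROR_LOG message.
count_orte_errors() {
    grep -c 'ORTE_ERROR_LOG'
}

# List the distinct hosts that reported an ORTE_ERROR_LOG message.
# The prefix looks like [s0164:24296]; keep only the host part.
orte_error_hosts() {
    grep 'ORTE_ERROR_LOG' | sed -n 's/^\[\([^:]*\):[0-9]*\].*/\1/p' | sort -u
}
```

For example, with the output saved in a (hypothetical) file run.log: `count_orte_errors < run.log` and `orte_error_hosts < run.log`.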

The mpirun command has long paths in it, sorry. It invokes a special binding
script, which in turn launches the NAMD run. This works on an older SVN build
at level 1.4a1r20123 (for 16, 32, 64, 128 and 512 procs) but not for this
256-proc run, where the older SVN build hangs indefinitely polling for some
completion (sm or openib). So I was trying later SVN builds with this 256-proc
run, hoping the error would go away.

Here's some of the invocation again. Hope you can read it:

EAGER_SIZE=32767
export OMPI_MCA_btl_openib_use_eager_rdma=0
export OMPI_MCA_btl_openib_eager_limit=$EAGER_SIZE
export OMPI_MCA_btl_self_eager_limit=$EAGER_SIZE
export OMPI_MCA_btl_sm_eager_limit=$EAGER_SIZE

and, unexpanded

mpirun --prefix $PREFIX -np %PE% $MCA \
  -x OMPI_MCA_btl_openib_use_eager_rdma \
  -x OMPI_MCA_btl_openib_eager_limit \
  -x OMPI_MCA_btl_self_eager_limit \
  -x OMPI_MCA_btl_sm_eager_limit \
  -machinefile $HOSTS $MPI_BINDER $NAMD2 stmv.namd
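A small sketch of one way to keep the repeated -x flags readable: generate them from a list of variable names. The helper name export_flags and the MCA_VARS list are hypothetical, introduced only for illustration; the variable names themselves are the ones exported above.

```shell
# Names of the MCA variables exported earlier in this message.
MCA_VARS="OMPI_MCA_btl_openib_use_eager_rdma OMPI_MCA_btl_openib_eager_limit OMPI_MCA_btl_self_eager_limit OMPI_MCA_btl_sm_eager_limit"

# Hypothetical helper: print "-x NAME" for every variable in MCA_VARS.
export_flags() {
    flags=""
    for name in $MCA_VARS; do
        flags="$flags -x $name"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}
```

The result could then be spliced into the command line unquoted, as in `mpirun ... $(export_flags) -machinefile $HOSTS ...`, so that each flag and name becomes a separate argument.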

and, expanded

mpirun --prefix /tools/openmpi/1.4a1r20643_svn/connectx/intel64/10.1.015/openib/suse_sles_10/x86_64/opteron \
  -np 256 --mca btl sm,openib,self \
  -x OMPI_MCA_btl_openib_use_eager_rdma \
  -x OMPI_MCA_btl_openib_eager_limit \
  -x OMPI_MCA_btl_self_eager_limit \
  -x OMPI_MCA_btl_sm_eager_limit \
  -machinefile /tmp/48292.1.all.q/newhosts \
  /ctmp8/mostyn/IMSC/bench_intel_openmpi_I_shang2/mpi_binder.MRL \
  /ctmp8/mostyn/IMSC/bench_intel_openmpi_I_shang2/intel-10.1.015_ofed_1.3.1_openmpi_1.4a1r20643_svn/NAMD_2.6_Source/Linux-amd64-MPI/namd2 \
  stmv.namd

This is all via Sun Grid Engine.
The OS, as indicated above, is SuSE SLES 10 SP2.

DM
On Thu, 26 Feb 2009, Ralph Castain wrote:

I'm sorry, but I can't make any sense of this message. Could you provide a
little explanation of what you are doing, what the system looks like, what is
supposed to happen, etc? I can barely parse your cmd line...

Thanks
Ralph

On Feb 26, 2009, at 1:03 PM, Mostyn Lewis wrote:

Today's and yesterday's.

1.4a1r20643_svn

+ mpirun --prefix
/tools/openmpi/1.4a1r20643_svn/connectx/intel64/10.1.015/openib/suse_sles_10/x86_64/opteron
-np 256 --mca btl sm,openib,self -x OMPI_MCA_btl_openib_use_eager_rdma
-x OMPI_MCA_btl_openib_eager_limit -x OMPI_MCA_btl_self_eager_limit
-x OMPI_MCA_btl_sm_eager_limit -machinefile /tmp/48269.1.all.q/newhosts
/ctmp8/mostyn/IMSC/bench_intel_openmpi_I_shang2/mpi_binder.MRL
/ctmp8/mostyn/IMSC/bench_intel_openmpi_I_shang2/intel-10.1.015_ofed_1.3.1_openmpi_1.4a1r20643_svn/NAMD_2.6_Source/Linux-amd64-MPI/namd2
stmv.namd
[s0164:24296] [[64102,0],16] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/odls/base/odls_base_default_fns.c at line 595
[s0128:24439] [[64102,0],4] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/odls/base/odls_base_default_fns.c at line 595
[s0156:29300] [[64102,0],12] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/odls/base/odls_base_default_fns.c at line 595
[s0168:20585] [[64102,0],20] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/odls/base/odls_base_default_fns.c at line 595
[s0181:19554] [[64102,0],28] ORTE_ERROR_LOG: Not found in file ../../../.././orte/mca/odls/base/odls_base_default_fns.c at line 595

Made with INTEL compilers 10.1.015.


Regards,
Mostyn

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
