Hi Prentice,
After some tests I've concluded that this is not an environment problem;
below you can see the environment printed by a job, and it seems correct.
I've found that if the library /usr/local/lib/openmpi/mca_plm_lsf is in the
appropriate location, the job fails:
> mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so:
> undefined symbol: lsb_init

The problem disappears if I rename or remove the library
/usr/local/lib/openmpi/mca_plm_lsf.
So I think that the LSF support included in the latest version of Open MPI
doesn't interact well with the LSF process that runs Open MPI jobs (perhaps
TaskManager).


Do you have any ideas?

Bye
Alex

+ exec pam -g /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/openmpi_wrapper
/mnt/ewd/mpi/hello/hello
[grid01.ags.wan:11820] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_plm_lsf: file not found (ignored)
Hello World! from process 2 out of 4 on grid01.ags.wan
Hello World! from process 3 out of 4 on grid01.ags.wan
Hello World! from process 1 out of 4 on grid05.ags.wan
Hello World! from process 0 out of 4 on grid03.ags.wan

MANPATH=/opt/lsf/7.0/man:
EGO_CONFDIR=/opt/lsf/conf/ego/grid-cluster-01/kernel
LSB_EXEC_CLUSTER=grid-cluster-01
LSF_EAUTH_AUX_PASS=yes
HOSTNAME=grid01
EGO_TOP=/opt/lsf
LSF_LIM_API_NTRIES=1
LSF_LOGDIR=/opt/lsf/log
LSB_BATCH_JID=748
EGO_SERVERDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc
LSB_TRAPSIGS=trap # 15 10 12 2 1
LS_JOBPID=11809
LSB_JOBRES_CALLBACK=45290@grid01
LSB_JOB_EXECUSER=lsfadmin
LSB_JOBID=748
LSF_SERVERDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc
LSB_JOBRES_PID=11809
LSF_TS_OPTIONS=-p grid01:42740 -c /opt/lsf/conf -s
/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc -a LINUX86
LSB_JOBNAME=mpirun.lsf /mnt/ewd/mpi/hello/hello
PM_SOURCE=pam
LSF_PJL_TYPE=openmpi
LSF_LIBDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib
USER=lsfadmin
LSB_EEXEC_REAL_UID=
EGO_LIBDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib
HOSTTYPE=LINUX86
LSF_INVOKE_CMD=bsub
LS_EXEC_T=START
LSF_EAUTH_SERVER=mbatchd@grid-cluster-01
LS_SUBCWD=/mnt/ewd/mpi/hello
LSF_VERSION=7.0
LSB_DJOB_RU_INTERVAL=15
LSB_HOSTS=grid01 grid01 grid05 grid03
LSB_UNIXGROUP_INT=lsfadmin
LSB_DJOB_HB_INTERVAL=15
LSB_JOBFILENAME=/home/lsfadmin/.lsbatch/1239206877.748
LSB_JOBINDEX=0
PATH=/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin:/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc:/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/lsfadmin/bin
MAIL=/var/spool/mail/lsfadmin
LSB_EXIT_PRE_ABORT=99
LSB_JOBEXIT_STAT=0
LSF_TSOPT_NUM=0
PWD=/mnt/ewd/mpi/hello
LSB_CHKFILENAME=/home/lsfadmin/.lsbatch/1239206877.748
LSF_EAUTH_CLIENT=user
LSB_DJOB_HOSTFILE=/home/lsfadmin/.lsbatch/1239206877.748.hostfile
LSF_BINDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin
HOME=/home/lsfadmin
SHLVL=3
LSB_ACCT_FILE=/tmp/.1239206877.748.acct
BINARY_TYPE_HPC=
LSF_PM_MPIARGS=-p4pg /home/lsfadmin/pam_pg.11813
LSB_SUB_HOST=grid03
EGO_LOCAL_CONFDIR=/opt/lsf/conf/ego/grid-cluster-01/kernel
LSFUSER=lsfadmin
LSB_QUEUE=normal
LSB_MCPU_HOSTS=grid03 1 grid05 1 grid01 2
LOGNAME=lsfadmin
CVS_RSH=ssh
XLSF_UIDDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib/uid
LESSOPEN=|/usr/bin/lesspipe.sh %s
EGO_ESRVDIR=/opt/lsf/conf/ego/grid-cluster-01/eservice
LSB_EEXEC_REAL_GID=
LSF_ENVDIR=/opt/lsf/conf
LSF_EGO_ENVDIR=/opt/lsf/conf/ego/grid-cluster-01/kernel
G_BROKEN_FILENAMES=1
EGO_BINDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/bin
_=/bin/env
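
The comparison Prentice suggested (login environment versus job environment, LD_LIBRARY_PATH in particular) can be scripted. What follows is only a minimal sketch in Python: the function names are hypothetical, and the sample text is a short excerpt of the dump above, not the whole thing. It parses printenv-style output, re-joins lines wrapped by the mail client, and reports variables that are unset or empty. Note that the dump as pasted above contains no LD_LIBRARY_PATH line; whether that matters for the lsb_init error is not established here.

```python
import re

# A NAME=value line as printed by printenv.
_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")

# Excerpt copied from the job environment dump above (not the full dump).
JOB_ENV_EXCERPT = """\
LSF_TS_OPTIONS=-p grid01:42740 -c /opt/lsf/conf -s
/opt/lsf/7.0/linux2.6-glibc2.3-x86/etc -a LINUX86
LSF_LIBDIR=/opt/lsf/7.0/linux2.6-glibc2.3-x86/lib
LSB_EEXEC_REAL_UID=
"""


def parse_env_dump(text):
    """Parse printenv-style text into a dict of NAME -> value."""
    env = {}
    last_name = None
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            last_name, value = match.groups()
            env[last_name] = value
        elif last_name is not None and line.strip():
            # No NAME= prefix: treat it as a wrapped continuation line.
            env[last_name] += " " + line.strip()
    return env


def missing_variables(env, names):
    """Return the requested names that are unset or empty in env."""
    return [name for name in names if not env.get(name)]


def differing_variables(login_env, job_env):
    """Return NAME -> (login value, job value) for every mismatch."""
    names = sorted(set(login_env) | set(job_env))
    return {
        name: (login_env.get(name), job_env.get(name))
        for name in names
        if login_env.get(name) != job_env.get(name)
    }


if __name__ == "__main__":
    job_env = parse_env_dump(JOB_ENV_EXCERPT)
    print(missing_variables(job_env, ["LD_LIBRARY_PATH", "LSF_LIBDIR"]))
```

To compare a real pair of dumps, pass the saved output of printenv from a login shell and from a bsub job to parse_env_dump and feed both results to differing_variables.
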
ldd /mnt/ewd/mpi/fibonacci/fibonacci_mpi
        linux-gate.so.1 =>  (0x40000000)
        libmpi.so.0 => /usr/local/lib/libmpi.so.0 (0x40002000)
        libopen-rte.so.0 => /usr/local/lib/libopen-rte.so.0 (0x40090000)
        libopen-pal.so.0 => /usr/local/lib/libopen-pal.so.0 (0x400d2000)
        libdl.so.2 => /lib/libdl.so.2 (0x00c00000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x00cca000)
        libutil.so.1 => /lib/libutil.so.1 (0x03668000)
        libm.so.6 => /lib/i686/nosegneg/libm.so.6 (0x00c06000)
        libpthread.so.0 => /lib/i686/nosegneg/libpthread.so.0 (0x00c2f000)
        libc.so.6 => /lib/i686/nosegneg/libc.so.6 (0x00ab8000)
        /lib/ld-linux.so.2 (0x00a95000)
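
For reference, here is a small Python sketch that scans ldd output for unresolved libraries and for links against a given library name prefix. The function names are hypothetical, and "libbat" is used as the prefix only because that is the library where lsb_init was found with strings. The sample text is an excerpt of the output above, which is for the application binary; the same check would presumably be more telling when run on the output of ldd /usr/local/lib/openmpi/mca_plm_lsf.so, since that is the file the error message names.

```python
# Excerpt copied from the ldd output above (application binary).
LDD_OUTPUT = """\
        linux-gate.so.1 =>  (0x40000000)
        libmpi.so.0 => /usr/local/lib/libmpi.so.0 (0x40002000)
        libopen-rte.so.0 => /usr/local/lib/libopen-rte.so.0 (0x40090000)
        libc.so.6 => /lib/i686/nosegneg/libc.so.6 (0x00ab8000)
        /lib/ld-linux.so.2 (0x00a95000)
"""


def parse_ldd(text):
    """Return (library name, resolved target) pairs from ldd output.

    The target is a path, the literal string "not found", or an empty
    string for virtual objects such as linux-gate.so.1.
    """
    entries = []
    for line in text.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no arrow
        name, _, rest = line.partition("=>")
        rest = rest.strip()
        # Drop the trailing load address, e.g. "(0x40002000)".
        if rest.endswith(")") and "(" in rest:
            rest = rest[: rest.rindex("(")].strip()
        entries.append((name.strip(), rest))
    return entries


def not_found(entries):
    """Return the names of libraries the loader could not resolve."""
    return [name for name, target in entries if target == "not found"]


def links_against(entries, prefixes):
    """Return the names of listed libraries starting with any prefix."""
    return [name for name, _ in entries if name.startswith(tuple(prefixes))]


if __name__ == "__main__":
    entries = parse_ldd(LDD_OUTPUT)
    print("unresolved:", not_found(entries))
    print("LSF batch library:", links_against(entries, ("libbat",)))
```

If the plugin's ldd output lists no libbat entry at all, the loader has no library to take lsb_init from, which would be consistent with the symbol lookup error, though this is a guess and not something verified on this cluster.
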


On Mon, Apr 6, 2009 at 10:02 PM, Prentice Bisbal <prent...@ias.edu> wrote:

> Alessandro Surace wrote:
> > Hi guys, I'm trying to repost my question...
> > I have a problem with the latest stable build and the latest nightly snapshot.
> >
> > When I run a job directly with mpirun, there is no problem.
> > If I try to submit it with LSF:
> > bsub -a openmpi -m grid01 mpirun.lsf /mnt/ewd/mpi/fibonacci/fibonacci_mpi
> >
> > I get the following error:
> > mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so:
> > undefined symbol: lsb_init
> > Job  /opt/lsf/7.0/linux2.6-glibc2.3-x86/bin/openmpi_wrapper
> > /mnt/ewd/mpi/fibonacci/fibonacci_mpi
> >
> > I've verified that the lsb_init symbol is present in the library:
> > [root@grid01 lib]# strings libbat.* |grep lsb_init
> > lsb_init
> > sch_lsb_init
> > lsb_init()
> > lsb_init
> > sch_lsb_init
> > sch_lsb_init
> > sch_lsb_init
> > sch_lsb_init
> > lsb_init()
> > sch_lsb_init
> >
>
> Can you verify that LSF is passing your environment along correctly? It
> looks like your LD_LIBRARY_PATH is set in your login environment, but
> not in the environment that the LSF job runs in.
>
> You can check this by submitting a job that executes just the command
> 'printenv'. Compare the output to what you get when you type 'printenv'
> on the command line. Compare the values for LD_LIBRARY_PATH, in particular.
>
> If that looks okay, then try running a job that just executes
>
> ldd /mnt/ewd/mpi/fibonacci/fibonacci_mpi
>
> This will show you any libraries that ld can't find in the LSF run-time
> environment.
>
> --
> Prentice
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
