Sorry, the first line of the output below (copied manually) should read:
/usr/local/bin/mpirun -host deb64 -n 4 connectivity_c 2>&1 | tee connectivity.out
---------- Forwarded message ----------
From: Francesco Pietra <chiendar...@gmail.com>
Date: Fri, Apr 10, 2009 at 6:16 PM
Subject: Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022
To: Open MPI Users <us...@open-mpi.org>

Hi Gus:

If you feel that the observations below are not relevant to openmpi, please
disregard the message. You have already kindly devoted so much time to my
problems.

The "limits.h" issue is solved with the 10.1.022 intel compilers: as I
suspected, the problem was with the pre-10.1.021 version of the intel C++
and ifort compilers, a subtle bug also observed by the gentoo people (see
the intel web forum).

There remains an orted issue. The openmpi 1.3.1 installation was able to
compile connectivity_c.c and hello_c.c; however, running mpirun failed
(output below, between ===):

=================
/usr/local/bin/mpirun -host deb64 (see above) -n 4 connectivity_c 2>&1 | tee connectivity.out

/usr/local/bin/orted: error while loading shared libraries: libimf.so:
cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 8472) died unexpectedly with status 127 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
=============

At this point, Amber10 compiled nicely in serial (all intel, like openmpi),
but testing the parallel build, as expected, returned the same problem as
above:

=================
export TESTsander=/usr/local/amber10/exe/sander.MPI; make test.sander.BASIC
make[1]: Entering directory `/usr/local/amber10/test'
cd cytosine && ./Run.cytosine
orted: error while loading shared libraries: libimf.so: cannot open shared
object file: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 8371) died unexpectedly with status 127 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
./Run.cytosine: Program error
make[1]: *** [test.sander.BASIC] Error 1
make[1]: Leaving directory `/usr/local/amber10/test'
make: *** [test.sander.BASIC.MPI] Error 2
=====================

Relevant info: the daemon that died was not ssh (so my hypothesis that a
firewall on the router was killing ssh does not hold). During these
procedures only deb64 and deb32 were on the local network. On the
monoprocessor deb32 (i386) there is nothing of openmpi or amber, only ssh,
so my .bashrc on deb32 cannot match that of deb64 as far as libraries are
concerned.

echo $LD_LIBRARY_PATH
/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1..022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib

# dpkg --search libimf.so
intel-iforte101022: /opt/intel/fce/10.1.022/lib/libimf.so
intel-icce101022: /opt/intel/cce/10.1.022/lib/libimf.so

i.e., libimf.so is on the library path, yet it is still not found by mpirun.

Before compiling I tried to carefully check all env variables and paths. In
particular, as to mpi:

mpif90 -show
/opt/intel/fce/10.1.022//bin/ifort -I/usr/local/include -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil

thanks
francesco

On Thu, Apr 9, 2009 at 9:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Francesco
>
> Francesco Pietra wrote:
>>
>> Hi:
>> As failure to find "limits.h" in my attempted compilations of Amber over
>> the past few days (amd64 lenny, openmpi 1.3.1, intel compilers
>> 10.1.015) is probably (or so I hope) a bug in the compiler version used
>> (I made with debian the same observations reported for gentoo,
>> http://software.intel.com/en-us/forums/intel-c-compiler/topic/59886/),
>> I made a deb package of 10.1.022, icc and ifort.
>>
>> ./configure CC icc, CXX icp,
>
> The Intel C++ compiler is called icpc, not icp.
> Is this a typo in your message, or in the actual configure options?
>
>> F77 and FC ifort --with-libnuma=/usr (not
>> /usr/lib, so that the numa.h issue is not raised), "make clean",
>
> If you really did "make clean" you may have removed useful things.
> What did you do, "make" or "make clean"?
>
>> and "make install" gave no error signals. However, the compilation tests
>> in the examples did not pass and I really don't understand why.
>
> Which compilation tests are you talking about?
> From Amber or from the OpenMPI example programs (connectivity_c and
> hello_c), or both?
>
>> The error, with both connectivity_c and hello_c (I was operating on
>> the parallel computer deb64 directly and have access to everything
>> there), was failure to find a shared library, libimf.so
>
> To get the right Intel environment,
> you need to put these commands inside your login files
> (.bashrc or .cshrc), to set up the Intel environment variables correctly:
>
> source /path/to/your/intel/cce/bin/iccvars.sh
> source /path/to/your/intel/fce/bin/ifortvars.sh
>
> and perhaps a similar one for mkl
> (I don't use MKL, I don't know much about it).
>
> If your home directory is NFS mounted on all the computers you
> use to run parallel programs,
> then the same .bashrc/.cshrc will work on all computers.
> However, if you have a separate home directory on each computer,
> then you need to do this on each of them.
> I.e., you have to include the "source" commands above
> in the .bashrc/.cshrc files in your home directory on EACH computer.
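
For concreteness, a minimal sketch of what those .bashrc additions might
look like on deb64, assuming the iccvars.sh/ifortvars.sh scripts live under
the 10.1.022 prefixes that dpkg reported above (adjust to the actual install
locations). If the .bashrc starts with Debian's usual "if not running
interactively, return" guard, these lines have to come before it, because
the shell that the ssh launcher starts for orted is non-interactive:

   # ~/.bashrc on deb64 (assumed paths -- check where the scripts really are)
   source /opt/intel/cce/10.1.022/bin/iccvars.sh
   source /opt/intel/fce/10.1.022/bin/ifortvars.sh

A quick way to verify that a non-interactive remote shell actually ends up
able to resolve libimf.so is to run, from the launching machine:

   ssh deb64 'echo $LD_LIBRARY_PATH'
   ssh deb64 'ldd /usr/local/bin/orted | grep libimf'

If the second command still prints "not found", orted will keep failing no
matter what an interactive login shell shows.
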
>
> Also, I presume you use bash/sh, not tcsh/csh, right?
> Otherwise you need to source iccvars.csh instead of iccvars.sh.
>
>> # dpkg --search libimf.so
>> /opt/intel/fce/10.1.022/lib/libimf.so (the same for cce)
>>
>> which path seems to be correctly sourced by iccvars.sh and
>> ifortvars.sh (incidentally, both files are -rw-r--r-- root root;
>> is it correct that they are not executable?)
>
> The permissions here are not a problem.
> You are supposed to *source* the files, not to execute them.
> If you execute them instead of sourcing them,
> your Intel environment will *NOT* be set up.
>
> BTW, the easy way to check your environment is to type "env" at the
> shell command prompt.
>
>> echo $LD_LIBRARY_PATH
>> returned, inter alia,
>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib
>> (why is mkl listed twice?)
>
> Hard to tell on which computer you were when you did this,
> and hence what it should affect.
>
> You may have sourced the mkl shell script that sets up the MKL
> environment variables twice, which would write its library path more
> than once.
>
> When the environment variables get this confused,
> with duplicate paths and so on, you may want to log out
> and log in again, to start fresh.
>
> Do you need MKL for Amber?
> If you don't use it, keep things simple and don't bother about it.
>
>> I surely fail to understand something fundamental. I hope other eyes
>> see better.
>
> Jody helped you run the hello_c program successfully.
> Try to carefully repeat the same steps.
> You should get the same result:
> the OpenMPI test programs should run.
>
>> A kind person elsewhere suggested to me in passing: "The use of -rpath
>> during linking is highly recommended as opposed to setting
>> LD_LIBRARY_PATH at run time, not least because it hardcodes the paths
>> to the 'right' library files in the executables themselves."
>> Should this be relevant to the present issue, where can I learn about
>> -rpath linking?
>
> If you are talking about Amber,
> you would have to tweak the Makefiles to set the linker -rpath.
> And we don't know much about Amber's Makefiles,
> so this may be a very tricky approach.
>
> If you are talking about the OpenMPI test programs,
> I think it is just a matter of setting the Intel environment variables
> right, sourcing ifortvars.sh and iccvars.sh properly,
> to get the right runtime LD_LIBRARY_PATH.
>
>> thanks
>> francesco pietra
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> I hope this helps.
> Gus Correa
>
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
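
On the -rpath suggestion quoted above, a minimal sketch of what baking the
Intel runtime path into a test binary could look like, assuming the
10.1.022 library directories that dpkg reported (these flags only help
binaries you link yourself; orted ships with the Open MPI install, so it
still needs the environment fix, or an Open MPI rebuilt with the same
rpath flags):

   # hypothetical example -- adjust the paths to the actual Intel install
   mpicc hello_c.c -o hello_c \
       -Wl,-rpath,/opt/intel/cce/10.1.022/lib \
       -Wl,-rpath,/opt/intel/fce/10.1.022/lib

   # confirm the path was recorded in the executable
   readelf -d hello_c | grep -i rpath

   # and check which libimf.so the runtime loader will actually pick up
   ldd hello_c | grep libimf

If rebuilding Open MPI, the same idea can presumably be applied at
configure time via LDFLAGS, e.g.
./configure LDFLAGS="-Wl,-rpath,/opt/intel/fce/10.1.022/lib", so that orted
and the Open MPI libraries carry the path as well.
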