Hi Gus: Please see below, while I go to study what Jeff suggested.

On Fri, Apr 10, 2009 at 6:51 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Francesco
>
> Let's concentrate on the Intel shared libraries problem for now.
>
> The FAQ Jeff sent you summarizes what I told you before.
>
> You need to set up your Intel environment (on deb64) to work with mpirun.
> You need to insert these commands in your .bashrc (most likely you use bash)
> or .cshrc (if you use csh/tcsh) file.
> These files sit in your home directory.
> They are hidden files; to see them, do "ls -a".
> Edit this file and insert these commands there:
>
> source /path/to/your/intel/cce/bin/iccvars.sh
> source /path/to/your/intel/fce/bin/ifortvars.sh
>
> Did you do this?
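(A quick way to verify this is to check that the variables are visible to a
non-interactive shell, which is what mpirun uses when it starts orted over ssh,
and not only at an interactive prompt. A minimal sketch, reusing the host name
and library path reported later in this thread:

    ssh deb64 'echo $LD_LIBRARY_PATH'
    ssh deb64 ls -l /opt/intel/fce/10.1.022/lib/libimf.so

If the first command prints an empty or incomplete path while an interactive
"echo $LD_LIBRARY_PATH" on deb64 looks fine, the source lines are not being
reached in the non-interactive case; one common reason on Debian is that the
stock .bashrc returns early for non-interactive shells, so lines added below
that test never run.)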
my .bashrc contained

#For intel Fortran and C++ compilers
. /opt/intel/fce/10.1.022/bin/ifortvars.sh
. /opt/intel/cce/10.1.022/bin/iccvars.sh

echo $LD_LIBRARY_PATH
/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib

Because I understood I was messing something up, I saved a copy of the
original .bashrc and replaced the dot with "source". Of course, everything
came out as above. I sincerely apologize for bothering the list.

francesco

> This Intel environment **cannot** be set up at the
> shell command prompt **only**,
> otherwise it will **only work for your interactive session**,
> but **not** for mpirun.
>
> Edit your .bashrc file, and try to run connectivity_c again.
> We can talk about Amber after you get the Intel shared libraries problem
> behind you.
>
> (OK, I was about to say you forgot deb64 after -host,
> but you sent the fix below.)
>
> I hope this helps.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
> Francesco Pietra wrote:
>>
>> Sorry, the first line of the output below (copied manually) should read
>>
>> /usr/local/bin/mpirun -host deb64 -n 4 connectivity_c 2>&1 | tee
>> connectivity.out
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendar...@gmail.com>
>> Date: Fri, Apr 10, 2009 at 6:16 PM
>> Subject: Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022
>> To: Open MPI Users <us...@open-mpi.org>
>>
>>
>> Hi Gus:
>>
>> If you feel that the observations below are not relevant to openmpi,
>> please disregard the message. You have already kindly devoted so much
>> time to my problems.
>>
>> The "limits.h" issue is solved with the 10.1.022 Intel compilers: as I
>> suspected, the problem was with the pre-10.1.021 versions of the Intel C++
>> and ifort compilers, a subtle bug observed also by Gentoo people (web
>> intel). There remains an orted issue.
>>
>> The openmpi 1.3.1 installation was able to compile connectivity_c.c
>> and hello_c.c; running mpirun, though, failed (output below between ===):
>>
>> =================
>> /usr/local/bin/mpirun -host deb64 (see above) -n 4 connectivity_c 2>&1
>> | tee connectivity.out
>> /usr/local/bin/orted: error while loading shared libraries: libimf.so:
>> cannot open shared object file: No such file or directory
>> --------------------------------------------------------------------------
>> A daemon (pid 8472) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>> =============
>>
>> At this point, Amber10 serial compiled nicely (all Intel, like
>> openmpi), but the parallel compilation, as expected, returned the same
>> problem as above:
>>
>> =================
>> export TESTsander=/usr/local/amber10/exe/sander.MPI; make test.sander.BASIC
>> make[1]: Entering directory `/usr/local/amber10/test'
>> cd cytosine && ./Run.cytosine
>> orted: error while loading shared libraries: libimf.so: cannot open
>> shared object file: No such file or directory
>> --------------------------------------------------------------------------
>> A daemon (pid 8371) died unexpectedly with status 127 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> ./Run.cytosine: Program error
>> make[1]: *** [test.sander.BASIC] Error 1
>> make[1]: Leaving directory `/usr/local/amber10/test'
>> make: *** [test.sander.BASIC.MPI] Error 2
>> =====================
>>
>> Relevant info:
>>
>> The daemon was not ssh (thus my hypothesis that a firewall on the
>> router was killing ssh is not the case). During these procedures,
>> only deb64 and deb32 were on the local network. On the monoprocessor
>> deb32 (i386) there is nothing of openmpi or amber, only ssh. Thus, my
>> .bashrc on deb32 cannot correspond to that of deb64 as far as
>> libraries are concerned.
>>
>> echo $LD_LIBRARY_PATH
>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib:/usr/local/lib
>>
>> # dpkg --search libimf.so
>> intel-iforte101022: /opt/intel/fce/10.1.022/lib/libimf.so
>> intel-icce101022: /opt/intel/cce/10.1.022/lib/libimf.so
>>
>> i.e., libimf.so is on the Unix path, yet still not found by mpirun.
>>
>> Before compiling I tried to check carefully all env variables and
>> paths. In particular, as to MPI:
>>
>> mpif90 -show
>> /opt/intel/fce/10.1.022//bin/ifort -I/usr/local/include
>> -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi
>> -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil
>>
>> thanks
>> francesco
>>
>>
>> On Thu, Apr 9, 2009 at 9:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>> Hi Francesco
>>>
>>> Francesco Pietra wrote:
>>>>
>>>> Hi:
>>>> As the failure to find "limits.h" in my attempted compilations of Amber
>>>> over the past few days (amd64 lenny, openmpi 1.3.1, intel compilers
>>>> 10.1.015) is probably (or so I hope) a bug in that version of the
>>>> Intel compilers (I made with Debian the same observations reported
>>>> for Gentoo,
>>>> http://software.intel.com/en-us/forums/intel-c-compiler/topic/59886/),
>>>> I made a deb package of 10.1.022, icc and ifort.
>>>>
>>>> ./configure CC icc, CXX icp,
>>>
>>> The Intel C++ compiler is called icpc, not icp.
>>> Is this a typo in your message, or in the actual configure options?
>>>
>>>> F77 and FC ifort --with-libnuma=/usr (not
>>>> /usr/lib, so that the numa.h issue is not raised), "make clean",
>>>
>>> If you really did "make clean" you may have removed useful things.
>>> What did you do, "make" or "make clean"?
>>>
>>>> and "make install" gave no error signals. However, the compilation tests in
>>>> the examples did not pass, and I really don't understand why.
>>>
>>> Which compilation tests are you talking about?
>>> From Amber, or from the OpenMPI example programs (connectivity_c and
>>> hello_c), or both?
>>>
>>>> The error, with both connectivity_c and hello_c (I was operating on
>>>> the parallel computer deb64 directly and have access to everything
>>>> there), was failure to find a shared library, libimf.so.
>>>
>>> To get the right Intel environment,
>>> you need to put these commands inside your login files
>>> (.bashrc or .cshrc), to set up the Intel environment variables correctly:
>>>
>>> source /path/to/your/intel/cce/bin/iccvars.sh
>>> source /path/to/your/intel/fce/bin/ifortvars.sh
>>>
>>> and perhaps a similar one for MKL
>>> (I don't use MKL, so I don't know much about it).
>>>
>>> If your home directory is NFS-mounted on all the computers you
>>> use to run parallel programs,
>>> then the same .bashrc/.cshrc will work on all of them.
>>> However, if you have a separate home directory on each computer,
>>> then you need to do this on each of them.
>>> I.e., you have to include the "source" commands above
>>> in the .bashrc/.cshrc files in your home directory on EACH computer.
>>>
>>> Also, I presume you use bash/sh, not tcsh/csh, right?
>>> Otherwise you need to source iccvars.csh instead of iccvars.sh.
>>>
>>>> # dpkg --search libimf.so
>>>> /opt/intel/fce/10.1.022/lib/libimf.so (the same for cce)
>>>>
>>>> which path seems to be correctly sourced by iccvars.sh and
>>>> ifortvars.sh (incidentally, both files are -rw-r--r-- root root;
>>>> is it correct that they are not executable?)
>>>
>>> The permissions here are not a problem.
>>> You are supposed to *source* the files, not to execute them.
>>> If you execute them instead of sourcing them,
>>> your Intel environment will *NOT* be set up.
>>>
>>> BTW, the easy way to check your environment is to type "env" at the
>>> shell command prompt.
>>>
>>>> echo $LD_LIBRARY_PATH
>>>> returned, inter alia,
>>>>
>>>> /opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/mkl/10.1.2.024/lib/em64t:/opt/intel/cce/10.1.022/lib:/opt/intel/fce/10.1.022/lib
>>>> (why twice the mkl?)
>>>
>>> Hard to tell which computer you were on when you did this,
>>> and hence what it should affect.
>>>
>>> You may have sourced the MKL shell script that sets up the MKL environment
>>> variables twice, which would write its library path more than once.
>>>
>>> When the environment variables get this confused,
>>> with duplicate paths and so on, you may want to log out
>>> and log in again, to start fresh.
>>>
>>> Do you need MKL for Amber?
>>> If you don't use it, keep things simple and don't bother about it.
>>>
>>>> I surely fail to understand something fundamental. Hope other eyes see
>>>> better.
>>>
>>> Jody helped you run the hello_c program successfully.
>>> Try to repeat carefully the same steps.
>>> You should get the same result:
>>> the OpenMPI test programs should run.
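(A system-level alternative, not suggested by either poster but sketched here
for completeness: with root access on deb64, the Intel library directories can
be registered with the runtime loader once, so that orted finds libimf.so no
matter how its shell was started. The file name intel-10.1.022.conf is
arbitrary; the paths are the ones quoted in this thread:

    # run as root on deb64
    echo /opt/intel/cce/10.1.022/lib  >  /etc/ld.so.conf.d/intel-10.1.022.conf
    echo /opt/intel/fce/10.1.022/lib >> /etc/ld.so.conf.d/intel-10.1.022.conf
    ldconfig

This does not replace a correct .bashrc, but it takes LD_LIBRARY_PATH out of
the picture for the shared-library lookup.)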
>>>> A kind person elsewhere suggested to me in passing: "The use of -rpath
>>>> during linking is highly recommended as opposed to setting
>>>> LD_LIBRARY_PATH at run time, not least because it hardcodes the
>>>> paths to the 'right' library files in the executables themselves."
>>>> Should this be relevant to the present issue, where can I learn about
>>>> -rpath linking?
>>>
>>> If you are talking about Amber,
>>> you would have to tweak the Makefiles to adjust the linker -rpath.
>>> And we don't know much about Amber's Makefiles,
>>> so this may be a very tricky approach.
>>>
>>> If you are talking about the OpenMPI test programs,
>>> I think it is just a matter of setting the Intel environment variables
>>> right, sourcing ifortvars.sh and iccvars.sh properly,
>>> to get the right runtime LD_LIBRARY_PATH.
>>>
>>>> thanks
>>>> francesco pietra
>>>
>>> I hope this helps.
>>> Gus Correa
>>>
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
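(For reference, the -rpath idea quoted above would have to be applied to the
Open MPI build itself, since in the failing runs it is orted, not the test
program, that cannot load libimf.so. A minimal sketch, reusing the configure
options and install prefix mentioned in this thread; an illustration, not a
tested recipe:

    # hypothetical re-configure of Open MPI 1.3.1 so mpirun/orted carry the rpath
    ./configure CC=icc CXX=icpc F77=ifort FC=ifort \
        --with-libnuma=/usr --prefix=/usr/local \
        LDFLAGS="-Wl,-rpath,/opt/intel/cce/10.1.022/lib -Wl,-rpath,/opt/intel/fce/10.1.022/lib"
    make all install

With the rpath embedded, the binaries record where the Intel runtime libraries
live and no longer rely on LD_LIBRARY_PATH to locate libimf.so.)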