Re: [OMPI users] Making RPM from source that respects --prefix
Hi Bill,

you might want to have a look here if you need a working example of that:
https://savannah.fzk.de/cgi-bin/viewcvs.cgi/trunk/?root=openmpi-build

We used to generate both SRPMs and RPMs for Open MPI in a previous project, and we did in fact specify a separate installation directory (%define ompi_prefix /opt/i2g/openmpi in i2g-openmpi.spec.in).

Hope that helps.

Regards,
Kiril

On Fri, 2009-10-02 at 03:48 -0700, Bill Johnstone wrote:
> I'm trying to build an RPM of 1.3.3 from the SRPM. Despite typical RPM
> practice, I need to build ompi so that it installs to a different directory
> from /usr or /opt, i.e. what I would get if I just built from source myself
> with a --prefix argument to configure.
>
> When I invoke buildrpm with the --define 'configure_options --prefix=<path> ...'
> option, the options do get set when the building process gets kicked off.
> However, when I query the final RPM, only vampirtrace has paid attention to
> the specified --prefix and wants to place its files accordingly. How should
> I alter the .spec file (or in some other place?) to get the desired behavior
> for the final file locations in the RPM?
>
> Thank you for any help.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany
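For the archives, the pattern used in that spec file looks roughly like this (a sketch: only the %define line and the prefix value come from the i2g-openmpi.spec.in cited above; the surrounding sections are illustrative):

```spec
# Fix the installation prefix in the spec itself, rather than relying
# on a --define passed at build time, so that every sub-package
# (including vampirtrace) installs under it:
%define ompi_prefix /opt/i2g/openmpi

%build
./configure --prefix=%{ompi_prefix}
make

%install
make install DESTDIR=%{buildroot}

%files
%{ompi_prefix}/
```

Hard-coding the prefix this way sidesteps the problem that some sub-configures ignore options injected via configure_options.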
Re: [OMPI users] bug in MPI_Cart_create?
Hi David,

I believe this particular bug was fixed in the trunk some weeks ago, shortly before your post.

Regards,
Kiril

On Tue, 2009-10-13 at 17:54 +1100, David Singleton wrote:
> Looking back through the archives, a lot of people have hit error
> messages like
>
> > [bl302:26556] *** An error occurred in MPI_Cart_create
> > [bl302:26556] *** on communicator MPI_COMM_WORLD
> > [bl302:26556] *** MPI_ERR_ARG: invalid argument of some other kind
> > [bl302:26556] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>
> One of the reasons people *may* be hitting this is what I believe to
> be an incorrect test in MPI_Cart_create():
>
>     if (0 > reorder || 1 < reorder) {
>         return OMPI_ERRHANDLER_INVOKE (old_comm, MPI_ERR_ARG,
>                                        FUNC_NAME);
>     }
>
> reorder is a "logical" argument, and "2.5.2 C bindings" in the MPI 1.3
> standard says:
>
>     Logical flags are integers with value 0 meaning "false" and a
>     non-zero value meaning "true."
>
> So I'm not sure there should be any argument test.
>
> We hit this because we (sorta erroneously) were trying to use a GNU build
> of Open MPI with Intel compilers. gfortran has true=1 while ifort has
> true=-1. It seems to all work (by luck, I know) except this test. Are
> there any other tests like this in Open MPI?
>
> David
[OMPI users] How to enable vprotocol + pessimistic message logging?
Hi,

I'm doing some research on message logging protocols. It seems that vprotocol in Open MPI can wrap around communication calls and log messages, if enabled. Unfortunately, when I try to use it with Open MPI 4.0.0, I get an error:

    mpirun --mca vprotocol pessimist -mca vprotocol_pessimist_priority 10 -n 4 $HOME/NPB3.3-MPI/bin/cg.B.4
    ...
    vprotocol_pessimist: component_init: threads are enabled, and not supported by vprotocol pessimist fault tolerant layer, will not load
    ...

Unfortunately, it seems that actually disabling multi-threading is not possible in 4.0.0: MPI_THREAD_MULTIPLE is always used during compilation, and in contrast to the README file, --enable-mpi-thread-multiple and --disable-mpi-thread-multiple are not recognised as options.

I'm pretty much stuck. Should I give up on vprotocol as unusable at the moment, then?

Thanks,
Kiril

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
[OMPI users] Fwd: How to enable vprotocol + pessimistic message logging?
Thanks for the quick reply, Aurelien.

I tried initialising MPI from the benchmark via "call mpi_init_thread(MPI_THREAD_SINGLE, provided, ierror)" (the benchmark is in Fortran), but nothing changed there: I still get the same "threads are enabled" warning, and vprotocol doesn't seem to be used.

I also tried disabling that sanity check in ompi/mca/vprotocol/pessimist/vprotocol_pessimist_component.c, but that was a bad idea: the runtime crashes.

Regards,
Kiril
[OMPI users] Issue with PBS Pro
Hi,

I am trying to run with Open MPI 1.3 on a cluster using PBS Pro:

    pbs_version = PBSPro_9.2.0.81361

However, after compiling with these options:

    ../configure --prefix=/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads
    CC=/opt/intel/cce/10.1.015/bin/icc CXX=/opt/intel/cce/10.1.015/bin/icpc
    CPP="/opt/intel/cce/10.1.015/bin/icc -E" FC=/opt/intel/fce/10.1.015/bin/ifort
    F90=/opt/intel/fce/10.1.015/bin/ifort F77=/opt/intel/fce/10.1.015/bin/ifort
    --enable-mpi-f90 --with-tm=/usr/pbs/ --enable-mpi-threads=yes
    --enable-contrib-no-build=vt

I get runtime errors when running on more than one reserved node, even with /bin/hostname:

    /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/bin/mpirun -np 5 /bin/hostname
    /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/bin/mpirun: symbol lookup error: /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/lib/openmpi/mca_plm_tm.so: undefined symbol: tm_init

When running on one node only, I don't get this error.

Now, I see that I only have static PBS libraries, so I tried to compile this component statically. I added to the above configure:

    --enable-mca-static=ras-tm,pls-tm

However, nothing changed; the same errors occur.

But if I compile Open MPI only with static libraries ("--enable-static --disable-shared"), the MPI (or non-MPI) programs run OK.

Can you help me here?

Thanks,
Kiril
Re: [OMPI users] Issue with PBS Pro
Hi,

I did a few things wrong before:

1. The new name of the "pls" component is "plm".
2. It seems that a ":" separator is now used for the components, instead of a "-" separator.

Anyway, for me, specifying "--enable-mca-static=plm:tm" seems to fix the problem: I still have shared libraries for Open MPI, with statically compiled Torque support.

Cheers,
Kiril

On Thu, 2009-01-29 at 12:37 -0700, Ralph Castain wrote:
> On a Torque system, your job is typically started on a backend node.
> Thus, you need to have the Torque libraries installed on those nodes -
> or else build OMPI static, as you found.
>
> I have never tried --enable-mca-static, so I have no idea if this
> works or what it actually does. If I want static, I just build the
> entire tree that way.
>
> If you want to run dynamic, though, you'll have to make the Torque
> libs available on the backend nodes.
>
> Ralph
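Putting the fix together, the working build then looks roughly like this (a sketch: the --with-tm path is from the earlier message, the remaining compiler and prefix options are elided, and only --enable-mca-static=plm:tm is the actual fix):

```sh
# Build Open MPI with shared libraries, but link the Torque/PBS
# process-launch component (plm:tm) statically, so that backend nodes
# do not need the PBS shared libraries at runtime:
../configure --with-tm=/usr/pbs/ \
             --enable-mca-static=plm:tm
             # ... plus the prefix/compiler options shown earlier
make all install
```

The net effect is that mpirun no longer dlopens mca_plm_tm.so (which failed with "undefined symbol: tm_init" because only static PBS libraries were installed); the tm support is compiled into the Open MPI libraries instead.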
[OMPI users] Problems in 1.3 loading shared libs when using VampirServer
Hi guys,

sorry for the long e-mail. I have been trying for some time now to run VampirServer with shared libs for Open MPI 1.3. First of all: the "--enable-static --disable-shared" version works. Also, the 1.2 series worked fine with the shared libs. But here is the story for the shared libraries with OMPI 1.3.

Compilation of OMPI went fine, and the VampirServer guys compiled the MPI driver they need against OMPI. The driver just refers to the shared libraries of Open MPI. However, on launching the server, I got errors of the type "undefined symbol":

    error: /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_paffinity_linux.so: undefined symbol: mca_base_param_reg_int

It seemed to me that my LD_LIBRARY_PATH was probably not including /lib/openmpi, but I exported it and did "mpirun -x LD_LIBRARY_PATH ...", and nothing changed.

Then, I started building any component complaining with "undefined symbol" with "--enable-mca-static" - for example, the above message disappeared after I did "--enable-mca-static=paffinity". I don't know why this worked, but it seemed to help. However, it was always replaced by another error message from another component. After a few components, another error came:

    mca: base: component_find: unable to open /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob: file not found (ignored)

(full output attached)

Now, I was unsure what to do, but again, when compiling the complaining component statically, things went a step further. One thing that struck me is that there is such a file with an extra ".so" at the end in the directory - but maybe dlopen also accepts files without the ".so", I don't know. Anyway, now I have included some 20 components statically while still building shared objects for the OMPI libs, and things seem to work.

Does anyone have any idea why these dozens of errors happen when loading shared libs? Like I said, I never had this in the 1.2 series.
Thanks,
Kiril


[nv8:21349] mca: base: component_find: unable to open /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob: file not found (ignored)
[nv8:21349] [[8664,1],1] ORTE_ERROR_LOG: Error in file ../../../../orte/mca/ess/base/ess_base_std_app.c at line 72
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_rml_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[nv8:21349] [[8664,1],1] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/ess/env/ess_env_module.c at line 154
[nv8:21349] [[8664,1],1] ORTE_ERROR_LOG: Error in file ../../orte/runtime/orte_init.c at line 132
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[nv8:21348] mca: base: component_find: unable to open /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob: file not found (ignored)
[nv8:21348] [[8664,1],0] ORTE_ERROR_LOG: Error in file ../../../../orte/mca/ess/base/ess_base_std_app.c at line 72
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[nv8:21349] Abort before MPI_INIT completed successfully; not able to
guarantee that all other processes were killed!
--
It looks like orte_init failed
Re: [OMPI users] Problems in 1.3 loading shared libs when using VampirServer
I am happy to confirm that Jeff's suggestion worked. The problem was the following: in previous versions, VampirServer issued

    ComLib = dlopen( driverName, RTLD_LAZY );

Changing this to the following fixed the problem:

    ComLib = dlopen( driverName, RTLD_LAZY | RTLD_GLOBAL );

The VampirServer guys compiled the modified version of VampirServer, and now the shared-library Open MPI 1.3 launches VampirServer without issues. It seems that the previous dlopen call did not have global scope, and so the VampirServer plugin did not find the Open MPI 1.3 shared objects.

Thanks for the help!
Kiril

On Tue, 2009-02-24 at 11:02 -0500, Jeff Squyres wrote:
> On Feb 23, 2009, at 8:59 PM, Jeff Squyres wrote:
>
> > Err... I'm a little confused. We've been emailing about this exact
> > issue for a week or two (off list); you just re-started the
> > conversation from the beginning, moved it to the user's list, and
> > dropped all the CC's (which include several people who are not on
> > this list). Why did you do that?
>
> GAAH!! Mea maxima culpa. :-(
>
> My stupid mail program did something strange (exact details
> unimportant) that made me think you re-sent your message to the users
> list yesterday -- thereby re-starting the whole conversation, etc.
> Upon double checking, I see that this is *not* what you did at all --
> my mail program was showing me your original post from Feb 4 and
> making it look like you re-sent it yesterday. I just wasn't careful
> in my reading. Sorry about that; the fault and confusion were entirely
> mine. :-(
>
> (We're continuing the conversation off-list just because it's gnarly
> and full of details about Vampir that most people probably don't care
> about; they're working on a small example to send to me that
> replicates the problem -- will post back here when we have some kind
> of solution...)
>
> We now return you to your regularly scheduled programming...