Re: [OMPI users] Making RPM from source that respects --prefix

2009-10-02 Thread Kiril Dichev
Hi Bill,

You might want to have a look here if you need a working example of that:

https://savannah.fzk.de/cgi-bin/viewcvs.cgi/trunk/?root=openmpi-build

We used to generate both SRPMs and RPMs for Open MPI in a previous
project, and we did in fact specify a separate installation directory
(%define ompi_prefix /opt/i2g/openmpi in i2g-openmpi.spec.in).

Hope that helps.


Regards,
Kiril


On Fri, 2009-10-02 at 03:48 -0700, Bill Johnstone wrote:
> I'm trying to build an RPM of 1.3.3 from the SRPM.  Despite typical RPM 
> practice, I need to build ompi so that it installs to a different directory 
> from /usr or /opt, i.e. what I would get if I just built from source myself 
> with a --prefix argument to configure.
> 
> When I invoke buildrpm with the --define 'configure_options --prefix=<path> ...', the options do get set when the building process gets kicked off. 
>  However, when I query the final RPM, only vampirtrace has paid attention to 
> the specified --prefix and wants to place its files accordingly.  How should 
> I alter the .spec file (or in some other place?) to get the desired behavior 
> for the final file locations in the RPM?
> 
> Thank you for any help.
> 
-- 
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany




Re: [OMPI users] bug in MPI_Cart_create?

2009-10-26 Thread Kiril Dichev
Hi David, 

I believe this particular bug was fixed in the trunk some weeks ago
shortly before your post. 

Regards,
Kiril

On Tue, 2009-10-13 at 17:54 +1100, David Singleton wrote:
> Looking back through the archives, a lot of people have hit error
> messages like
> 
>  > [bl302:26556] *** An error occurred in MPI_Cart_create
>  > [bl302:26556] *** on communicator MPI_COMM_WORLD
>  > [bl302:26556] *** MPI_ERR_ARG: invalid argument of some other kind
>  > [bl302:26556] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> 
> One of the reasons people *may* be hitting this is what I believe to
> be an incorrect test in MPI_Cart_create():
> 
>      if (0 > reorder || 1 < reorder) {
>          return OMPI_ERRHANDLER_INVOKE (old_comm, MPI_ERR_ARG,
>                                         FUNC_NAME);
>      }
> 
> reorder is a "logical" argument and "2.5.2 C bindings" in the MPI 1.3
> standard says:
> 
>  Logical flags are integers with value 0 meaning “false” and a
>  non-zero value meaning “true.”
> 
> So I'm not sure there should be any argument test.
> 
> 
> We hit this because we (sorta erroneously) were trying to use a GNU build
> of Open MPI with Intel compilers.  gfortran has true=1 while ifort has
> true=-1.  It seems to all work (by luck, I know) except this test.  Are
> there any other tests like this in Open MPI?
> 
> David
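
For anyone who wants to check their own build against this, here is a
minimal C reproducer (a hypothetical standalone test, not part of Open
MPI) that passes reorder = -1, i.e. ifort's .true., which the standard's
definition of a logical flag says must be accepted:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Return error codes instead of aborting, so the result is visible. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dims[1]    = { size };
    int periods[1] = { 0 };
    int reorder    = -1;   /* non-zero, therefore "true" per the MPI standard */

    MPI_Comm cart;
    int err = MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, reorder, &cart);
    printf("MPI_Cart_create with reorder = -1 returned %d\n", err);

    if (err == MPI_SUCCESS) {
        MPI_Comm_free(&cart);
    }
    MPI_Finalize();
    return 0;
}

An affected build returns MPI_ERR_ARG here; a fixed one returns
MPI_SUCCESS and creates the Cartesian communicator.
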
-- 
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany




[OMPI users] How to enable vprotocol + pessimistic message logging?

2019-01-22 Thread Kiril Dichev
Hi,

I’m doing some research on message logging protocols. It seems that Vprotocol 
in Open MPI can wrap around communication calls and log messages, if enabled. 
Unfortunately, when I try to use it with Open MPI 4.0.0, I get an error:

mpirun --mca vprotocol pessimist --mca vprotocol_pessimist_priority 10 -n 4 $HOME/NPB3.3-MPI/bin/cg.B.4
…
vprotocol_pessimist: component_init: threads are enabled, and not supported by 
vprotocol pessimist fault tolerant layer, will not load
…

Unfortunately, it seems that actually disabling multi-threading is not possible
in 4.0.0 (MPI_THREAD_MULTIPLE is always used during compilation, and in
contrast to the README file, --enable-mpi-thread-multiple or
--disable-mpi-thread-multiple are not recognised as options).

I'm pretty much stuck. Should I give up on the vprotocol as unusable at the
moment, then?

Thanks,
Kiril

[OMPI users] Fwd: How to enable vprotocol + pessimistic message logging?

2019-01-24 Thread Kiril Dichev
Thanks for the quick reply, Aurelien.

I tried initialising MPI from the benchmark via "call
mpi_init_thread(MPI_THREAD_SINGLE, provided, ierror)" (it's in Fortran), but
nothing changed: I still get the same "threads are enabled" warning, and
vprotocol doesn't seem to be used. I also tried disabling that sanity check in
/ompi/mca/vprotocol/pessimist/vprotocol_pessimist_component.c, but that was a
bad idea: the runtime crashes.
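
For reference, the same request in C looks like the sketch below (a
hypothetical standalone program, not the benchmark itself); whatever
level you ask for, the library reports in 'provided' what it actually
gives you:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for the lowest thread level; 'provided' tells us what we really got. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_SINGLE, &provided);

    if (provided != MPI_THREAD_SINGLE) {
        printf("requested MPI_THREAD_SINGLE, but the library provided level %d\n",
               provided);
    }

    MPI_Finalize();
    return 0;
}

In my case the requested level does not seem to matter, presumably
because the vprotocol check looks at how the library itself was built
rather than at what the application asks for.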


Regards,
Kiril


[OMPI users] Issue with PBS Pro

2009-01-29 Thread Kiril Dichev
Hi,

I am trying to run with Open MPI 1.3 on a cluster using PBS Pro:

pbs_version = PBSPro_9.2.0.81361


However, after compiling with these options:

../configure
--prefix=/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads
 CC=/opt/intel/cce/10.1.015/bin/icc CXX=/opt/intel/cce/10.1.015/bin/icpc 
CPP="/opt/intel/cce/10.1.015/bin/icc -E" FC=/opt/intel/fce/10.1.015/bin/ifort 
F90=/opt/intel/fce/10.1.015/bin/ifort F77=/opt/intel/fce/10.1.015/bin/ifort 
--enable-mpi-f90 --with-tm=/usr/pbs/ --enable-mpi-threads=yes 
--enable-contrib-no-build=vt

I get runtime errors when running on more than one reserved node,
even with /bin/hostname:

/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/bin/mpirun
  -np 5  /bin/hostname 
/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/bin/mpirun:
 symbol lookup error: 
/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/lib/openmpi/mca_plm_tm.so:
 undefined symbol: tm_init

When running on one node only, I don't get this error.

Now, I see that I only have static PBS libraries, so I tried to compile
this component statically. I added to the above configure:
"--enable-mca-static=ras-tm,pls-tm"

However, nothing changed. The same errors occur.


But if I compile Open MPI only with static libraries ("--enable-static
--disable-shared"), the MPI (or non-MPI) programs run OK.

Can you help me here?

Thanks,
Kiril



-- 
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany




Re: [OMPI users] Issue with PBS Pro

2009-01-30 Thread Kiril Dichev
Hi,

I did a few things wrong before:

1. The new name of the component "pls" is "plm".
2. It seems that for the components, a ":" separator is now used instead of
a "-" separator.

Anyway, for me specifying "--enable-mca-static=plm:tm" seems to fix the
problem - I still have shared libraries for Open MPI with statically
compiled Torque support.

Cheers,
Kiril

On Thu, 2009-01-29 at 12:37 -0700, Ralph Castain wrote:
> On a Torque system, your job is typically started on a backend node.  
> Thus, you need to have the Torque libraries installed on those nodes -  
> or else build OMPI static, as you found.
> 
> I have never tried --enable-mca-static, so I have no idea if this  
> works or what it actually does. If I want static, I just build the  
> entire tree that way.
> 
> If you want to run dynamic, though, you'll have to make the Torque  
> libs available on the backend nodes.
> 
> Ralph
> 
> 
> On Jan 29, 2009, at 8:32 AM, Kiril Dichev wrote:
> 
> > Hi,
> >
> > I am trying to run with Open MPI 1.3 on a cluster using PBS Pro:
> >
> > pbs_version = PBSPro_9.2.0.81361
> >
> >
> > However, after compiling with these options:
> >
> > ../configure
> > --prefix=/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3- 
> > intel10.1-64bit-dynamic-threads CC=/opt/intel/cce/10.1.015/bin/icc  
> > CXX=/opt/intel/cce/10.1.015/bin/icpc CPP="/opt/intel/cce/10.1.015/ 
> > bin/icc -E" FC=/opt/intel/fce/10.1.015/bin/ifort F90=/opt/intel/fce/ 
> > 10.1.015/bin/ifort F77=/opt/intel/fce/10.1.015/bin/ifort --enable- 
> > mpi-f90 --with-tm=/usr/pbs/ --enable-mpi-threads=yes --enable- 
> > contrib-no-build=vt
> >
> > I get runtime errors when running on more than one reserved node
> > even /bin/hostname:
> >
> > /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit- 
> > dynamic-threads/bin/mpirun  -np 5  /bin/hostname
> > /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit- 
> > dynamic-threads/bin/mpirun: symbol lookup error: /home_nfs/parma/ 
> > x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-dynamic-threads/ 
> > lib/openmpi/mca_plm_tm.so: undefined symbol: tm_init
> >
> > When running on one node only, I don't get this error.
> >
> > Now, I see that I only have static PBS libraries so I tried to compile
> > this component statically. I added to the above configure:
> > "--enable-mca-static=ras-tm,pls-tm"
> >
> > However, nothing changed. The same errors occurr.
> >
> >
> > But if I compile Open MPI only with static libraries ("--enable-static
> > --disable-shared"), the MPI (or non-MPI) programs run OK.
> >
> > Can you help me here ?
> >
> > Thanks,
> > Kiril
> >
> >
> >
> > -- 
> > Dipl.-Inf. Kiril Dichev
> > Tel.: +49 711 685 60492
> > E-mail: dic...@hlrs.de
> > High Performance Computing Center Stuttgart (HLRS)
> > Universität Stuttgart
> > 70550 Stuttgart
> > Germany
> >
> >
> 
-- 
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany




[OMPI users] Problems in 1.3 loading shared libs when using VampirServer

2009-02-04 Thread Kiril Dichev
Hi guys,

sorry for the long e-mail.

I have been trying for some time now to run VampirServer with shared
libs for Open MPI 1.3.

First of all: The "--enable-static --disable-shared" version works.
Also, the 1.2 series worked fine with the shared libs.

But here is the story for the shared libraries with OMPI 1.3:
Compilation of OMPI went fine, and the VampirServer guys compiled the MPI
driver they need against it. The driver just refers to the shared
libraries of Open MPI.

However, on launching the server I got errors of the type "undefined
symbol":

error: 
/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_paffinity_linux.so:
 
undefined symbol: mca_base_param_reg_int

It seemed to me that my LD_LIBRARY_PATH was probably not including
/lib/openmpi, but I exported it and did "mpirun -x
LD_LIBRARY_PATH ..." and nothing changed.

Then I started building any component that complained with "undefined
symbol" statically via "--enable-mca-static": for example, the above message
disappeared after I did --enable-mca-static=paffinity. I don't know why
this worked, but it seemed to help. However, it was always replaced by
another error message from another component.

After a few more components, another error came up:

mca: base: component_find: unable to
open 
/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob:
 file not found (ignored) 

(full output attached)

Now I was unsure what to do, but again, when compiling the complaining
component statically, things went a step further. One thing that struck
me is that there is such a file, with an extra ".so" at the end, in that
directory; maybe dlopen also accepts file names without the ".so", I
don't know.
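
For debugging this kind of thing, a tiny standalone checker along these
lines (just an illustration; link it with -ldl) prints the exact
dlerror() for a given component file:

#include <dlfcn.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s /path/to/mca_component.so\n", argv[0]);
        return 1;
    }

    /* RTLD_NOW resolves all symbols immediately, so a missing symbol
     * shows up right here with its name instead of later at first use. */
    void *handle = dlopen(argv[1], RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    printf("%s loaded fine\n", argv[1]);
    dlclose(handle);
    return 0;
}

Note that a component may legitimately rely on symbols that are only
present in the process that normally loads it, so a failure here only
tells you which symbol is unresolved, not that the file itself is broken.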


Anyway, I have now included about 20 components statically and still
build shared objects for the OMPI libs, and things seem to work.

Does anyone have any idea why these dozens of errors happen when loading
shared libs? Like I said, I never had this in the 1.2 series.


Thanks,
Kiril


[nv8:21349] mca: base: component_find: unable to open 
/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob:
 file not found (ignored)
[nv8:21349] [[8664,1],1] ORTE_ERROR_LOG: Error in file 
../../../../orte/mca/ess/base/ess_base_std_app.c at line 72
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_rml_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[nv8:21349] [[8664,1],1] ORTE_ERROR_LOG: Error in file 
../../../../../orte/mca/ess/env/ess_env_module.c at line 154
[nv8:21349] [[8664,1],1] ORTE_ERROR_LOG: Error in file 
../../orte/runtime/orte_init.c at line 132
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
[nv8:21348] mca: base: component_find: unable to open 
/home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob:
 file not found (ignored)
[nv8:21348] [[8664,1],0] ORTE_ERROR_LOG: Error in file 
../../../../orte/mca/ess/base/ess_base_std_app.c at line 72
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[nv8:21349] Abort before MPI_INIT completed successfully; not able to guarantee 
that all other processes were killed!
--
It looks like orte_init failed 

Re: [OMPI users] Problems in 1.3 loading shared libs when using VampirServer

2009-02-26 Thread Kiril Dichev
I am happy to confirm that Jeff's suggestion worked. 

The problem was the following: in previous versions, VampirServer issued


 ComLib = dlopen( driverName, RTLD_LAZY );

Changing this to the following fixed the problem:

 ComLib = dlopen( driverName, RTLD_LAZY | RTLD_GLOBAL );


The VampirServer guys compiled the modified version of VampirServer, and
now the shared-library build of Open MPI 1.3 launches VampirServer
without issues.

It seems that the previous dlopen call did not put the loaded symbols into
the global scope, and so the VampirServer plugin could not resolve the
symbols from the Open MPI 1.3 shared objects.
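
For completeness, here is a minimal sketch of the pattern (the driver
path is a made-up placeholder; this is only an illustration, not the
actual VampirServer code; link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Placeholder path -- in reality this is the MPI driver VampirServer loads. */
    const char *driverName = "/path/to/mpi-driver.so";

    /* RTLD_GLOBAL puts the driver's symbols -- and those of the libraries
     * it pulls in, such as libmpi -- into the global scope, so anything
     * dlopen()ed later (e.g. Open MPI's own components) can resolve them. */
    void *ComLib = dlopen(driverName, RTLD_LAZY | RTLD_GLOBAL);
    if (ComLib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* ... look up the driver's entry points with dlsym() and use them ... */

    dlclose(ComLib);
    return 0;
}

Without RTLD_GLOBAL the symbols stay in the driver's local scope, which
is exactly the "undefined symbol" situation I described in my first
e-mail.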

Thanks for the help!

Kiril

On Tue, 2009-02-24 at 11:02 -0500, Jeff Squyres wrote:
> On Feb 23, 2009, at 8:59 PM, Jeff Squyres wrote:
> 
> > Err... I'm a little confused.  We've been emailing about this exact  
> > issue for a week or two (off list); you just re-started the  
> > conversation from the beginning, moved it to the user's list, and  
> > dropped all the CC's (which include several people who are not on  
> > this list).  Why did you do that?
> 
> 
> GAAH!!  Mea maxima culpa.  :-(
> 
> My stupid mail program did something strange (exact details  
> unimportant) that made me think you re-sent your message to the users  
> list yesterday -- thereby re-starting the whole conversation, etc.   
> Upon double checking, I see that this is *not* what you did at all --  
> my mail program was showing me your original post from Feb 4 and  
> making it look like you re-sent it yesterday.  I just wasn't careful  
> in my reading.  Sorry about that; the fault and confusion was entirely  
> mine.  :-(
> 
> (we're continuing the conversation off-list just because it's gnarly  
> and full of details about Vampir that most people probably don't care  
> about; they're working on a small example to send to me that  
> replicates the problem -- will post back here when we have some kind  
> of solution...)
> 
> We now return you to your regularly scheduled programming...
> 
-- 
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany