Re: [OMPI users] Open MPI State of the Union BOF at SC'16 next week

2016-11-15 Thread Sean Ahern
Makes sense. Thanks for making the slides available. Would you mind posting
to the list when the rest of us can get them?

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883

On Tue, Nov 15, 2016 at 10:53 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> On Nov 10, 2016, at 9:31 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
> >
> > The slides will definitely be available afterwards.  We'll see if we can
> make some flavor of recording available as well.
>
> After poking around a bit, it looks like the SC rules prohibit us from
> recording the BOF (which is not unreasonable, actually).
>
> The slides will definitely be available on the Open MPI web site.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>

Re: [OMPI users] Open MPI State of the Union BOF at SC'16 next week

2016-11-10 Thread Sean Ahern
Any chance this will be recorded for those who won't be able to be there
next week?

-Sean

On Thu, Nov 10, 2016 at 11:00 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> Be sure to come see us at "Open MPI State of the Union X" BOF (yes, that's
> right, we've been giving these BOFs for ***10 years***!) next week in Salt
> Lake City, UT at the SC'16 trade show:
>
> http://sc16.supercomputing.org/presentation/?id=bof103=sess322
>
> This year, the BOF is at 5:30pm on Wednesday (in previous years, it has
> been in the noon timeslot).
>
> Many of the Open MPI developers will be at SC; be sure to stop by any of
> our organizations' booths and ask for us.  I'll be spending lots of time at
> the Cisco booth, for example.
>
> We hope to see you there!
>
> --
> Jeff Squyres
> jsquy...@cisco.com

Re: [OMPI users] error on dlopen

2016-11-03 Thread Sean Ahern
Sounds to me like you're missing a -ldl linker flag.
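
For context, dlopen(), dlsym(), dlerror() and dlclose() are exactly the calls reported as undefined, and on glibc releases before 2.34 they live in libdl rather than in libc. Because libopen-pal.a is a static archive, it cannot pull libdl in by itself, so the final link has to name it. The generic sketch below (not Open MPI code; the helper name is made up for illustration) uses the same calls and, on such systems, links only when -ldl is given, e.g. `gcc demo.c -ldl`. For the ScaLAPACK build, the analogous change would presumably be appending -ldl to the mpifort link line; `mpifort --showme:link` shows which flags the wrapper already adds.

```c
#include <assert.h>
#include <dlfcn.h>  /* dlopen/dlsym/dlerror/dlclose; in libdl before glibc 2.34 */
#include <stdio.h>

/* Hypothetical helper: look up strlen() through the dynamic loader and call
   it. Returns (size_t)-1 if the lookup fails. */
size_t strlen_via_dlsym(const char *s)
{
    void *self;
    size_t (*fn)(const char *);
    const char *err;
    size_t n;

    assert(s != NULL);
    self = dlopen(NULL, RTLD_NOW);  /* handle for the main program's scope */
    if (self == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return (size_t)-1;
    }

    dlerror();  /* clear any stale error before calling dlsym */
    fn = (size_t (*)(const char *))dlsym(self, "strlen");
    err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(self);
        return (size_t)-1;
    }

    n = fn(s);
    dlclose(self);
    return n;
}
```

Calling strlen_via_dlsym("hello") goes through the loader rather than a direct call, so the program genuinely depends on the dl* symbols at link time.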

-Sean

On Thu, Nov 3, 2016 at 3:57 PM, Mahmood Naderan <mahmood...@gmail.com>
wrote:

> Hi
> I am building ScaLAPACK with mpicc and mpifort; however, this is the error
> I get:
>
> mpifort -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o
> ../../libscalapack.a
> /opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
> `dlopen_close':
> dl_dlopen_module.c:(.text+0x29d): undefined reference to `dlclose'
> /opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
> `dlopen_lookup':
> dl_dlopen_module.c:(.text+0x2d0): undefined reference to `dlsym'
> dl_dlopen_module.c:(.text+0x2fb): undefined reference to `dlerror'
> /opt/openmpi-2.0.1/lib/libopen-pal.a(dl_dlopen_module.o): In function
> `dlopen_open':
> dl_dlopen_module.c:(.text+0x3ca): undefined reference to `dlopen'
> dl_dlopen_module.c:(.text+0x431): undefined reference to `dlerror'
> dl_dlopen_module.c:(.text+0x456): undefined reference to `dlopen'
> dl_dlopen_module.c:(.text+0x4a9): undefined reference to `dlerror'
> dl_dlopen_module.c:(.text+0x501): undefined reference to `dlopen'
> /opt/openmpi-2.0.1/lib/libopen-pal.a(patcher_overwrite_module.o): In
> function `mca_patcher_overwrite_patch_symbol':
> patcher_overwrite_module.c:(.text+0x12e): undefined reference to `dlsym'
> patcher_overwrite_module.c:(.text+0x166): undefined reference to `dlsym'
> patcher_overwrite_module.c:(.text+0x173): undefined reference to `dlerror'
> collect2: error: ld returned 1 exit status
> Makefile:18: recipe for target 'xCbtest' failed
> make[2]: *** [xCbtest] Error 1
>
>
>
> When I grep for "dlopen", some OMPI binary files match. Any idea about that?
>
>
> Regards,
> Mahmood

Re: [OMPI users] MCA compilation later

2016-11-01 Thread Sean Ahern
That's useful. Thank you.

It sounds like, as long as the component exists for OpenMPI already, it's
just a matter of compiling OpenMPI on a machine that has the headers and
libraries (with appropriate configure flags), and grabbing the individual
component from there.
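
When picking plugins out of such a second build, note that each file in lib/openmpi is named after its framework and component, like the mca_shmem_mmap.so and mca_patcher_overwrite.so files that show up in an error log later in this archive. The hypothetical helper below (not part of Open MPI) splits such a name, assuming the framework name itself contains no underscore; an install script could use something like it to copy, say, only the btl plugins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "mca_<framework>_<component>.so" into its two
   parts. Assumes the framework name has no underscore, so everything after
   the first underscore following "mca_" is the component name.
   Returns 0 on success, -1 if the name does not match or a buffer is
   too small. */
int parse_mca_filename(const char *name,
                       char *framework, size_t fw_len,
                       char *component, size_t comp_len)
{
    static const char prefix[] = "mca_";
    static const char suffix[] = ".so";
    const size_t plen = sizeof prefix - 1;
    const size_t slen = sizeof suffix - 1;
    const char *fw, *sep, *end;
    size_t n, fw_n, comp_n;

    assert(name != NULL && framework != NULL && component != NULL);
    n = strlen(name);
    if (n <= plen + slen || strncmp(name, prefix, plen) != 0 ||
        strcmp(name + n - slen, suffix) != 0)
        return -1;

    fw = name + plen;        /* first character of the framework name */
    end = name + n - slen;   /* where ".so" begins */
    sep = strchr(fw, '_');   /* framework/component separator */
    if (sep == NULL || sep == fw || sep + 1 >= end)
        return -1;

    fw_n = (size_t)(sep - fw);
    comp_n = (size_t)(end - (sep + 1));
    if (fw_n >= fw_len || comp_n >= comp_len)
        return -1;

    memcpy(framework, fw, fw_n);
    framework[fw_n] = '\0';
    memcpy(component, sep + 1, comp_n);
    component[comp_n] = '\0';
    return 0;
}
```

Splitting on the first underscore keeps component names that contain underscores intact, at the cost of the stated assumption about framework names.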

-Sean

On Tue, Nov 1, 2016 at 12:45 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Here’s a link on how to create components:
>
> https://github.com/open-mpi/ompi/wiki/devel-CreateComponent
>
> and if you want to create a completely new framework:
>
> https://github.com/open-mpi/ompi/wiki/devel-CreateFramework
>
> If you want to distribute a proprietary plugin, you first develop and
> build it within the OMPI code base on your own machines. Then, just take
> the shared library (.so) for your plugin from the lib/openmpi directory of
> the installation and distribute that “blob”.
>
> I’ll correct my comment: you need the headers and the libraries. You just
> don’t need the hardware, though it means you cannot test those features.
>
>
> On Oct 31, 2016, at 6:19 AM, Sean Ahern <s...@ensight.com> wrote:
>
> Thanks. That's what I expected and hoped. But is there a pointer about how
> to get started? If I've got an existing OpenMPI build, what's the process
> to get a new MCA plugin built with a new set of header files?
>
> (I'm a bit surprised only header files are necessary. Shouldn't the plugin
> require at least runtime linking with a low-level transport library?)
>
> -Sean
>
> On Fri, Oct 28, 2016 at 3:40 PM, r...@open-mpi.org <r...@open-mpi.org>
> wrote:
>
>> You don’t need any of the hardware - you just need the headers. Things
>> like libfabric and libibverbs are all publicly available, and so you can
>> build all that support even if you cannot run it on your machine.
>>
>> Once your customer installs the binary, the various plugins will check
>> for their required library and hardware and disqualify themselves if it
>> isn’t found.
>>
>> On Oct 28, 2016, at 12:33 PM, Sean Ahern <s...@ensight.com> wrote:
>>
>> There's been discussion on the OpenMPI list recently about static linking
>> of OpenMPI with all of the desired MCAs in it. I've got the opposite
>> question. I'd like to add MCAs later on to an already-compiled version of
>> OpenMPI and am not quite sure how to do it.
>>
>> Let me summarize. We've got a commercial code that we deploy on customer
>> machines in binary form. We're working to integrate OpenMPI into the
>> installer, and things seem to be progressing well. (Note: because we're a
>> commercial code, making the customer compile something doesn't work for us
>> like it can for open source or research codes.)
>>
>> Now, we want to take advantage of OpenMPI's ability to find MCAs at
>> runtime, pointing to the various plugins that might apply to a deployed
>> system. I've configured and compiled OpenMPI on one of our build machines,
>> one that doesn't have any special interconnect hardware or software
>> installed. We take this compiled version of OpenMPI and use it on all of
>> our machines. (Yes, I've read Building FAQ #39
>> <https://www.open-mpi.org/faq/?category=building#installdirs> about
>> relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled
>> version of OpenMPI and add MCA libraries to it, giving OpenMPI the ability
>> to communicate via transport mechanisms that weren't available on the
>> original build machine. Things like InfiniBand, OmniPath, or one of Cray's
>> interconnects.
>>
>> How would I go about doing this? And what are the limitations?
>>
>> I'm guessing that I need to go configure and compile the same version of
>> OpenMPI on a machine that has the desired interconnect installation
>> (headers and libraries), then go grab the corresponding
>> lib/openmpi/mca_*{la,so} files. Take those files and drop them in our
>> pre-built OpenMPI from our build machine in the same relative plugin
>> location (lib/openmpi). If I stick with the same compiler (gcc, in this
>> case), I'm hoping that symbols will all resolve themselves at runtime. (I
>> probably will have to do some LD_LIBRARY_PATH games to be sure to find the
>> appropriate underlying libraries unless OpenMPI's process for building MCAs
>> links them in statically somehow.)
>>
>> Am I even on the right track here? (The various system-level FAQs (here
>> <https://www.open-mpi.org/faq/?category=supported-systems>, here
>> <https://www.open-mpi.org/faq/?category=developers>,

Re: [OMPI users] MCA compilation later

2016-10-31 Thread Sean Ahern
Thanks. That's what I expected and hoped. But is there a pointer about how
to get started? If I've got an existing OpenMPI build, what's the process
to get a new MCA plugin built with a new set of header files?

(I'm a bit surprised only header files are necessary. Shouldn't the plugin
require at least runtime linking with a low-level transport library?)

-Sean

On Fri, Oct 28, 2016 at 3:40 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> You don’t need any of the hardware - you just need the headers. Things
> like libfabric and libibverbs are all publicly available, and so you can
> build all that support even if you cannot run it on your machine.
>
> Once your customer installs the binary, the various plugins will check for
> their required library and hardware and disqualify themselves if it isn’t
> found.
>
> On Oct 28, 2016, at 12:33 PM, Sean Ahern <s...@ensight.com> wrote:
>
> There's been discussion on the OpenMPI list recently about static linking
> of OpenMPI with all of the desired MCAs in it. I've got the opposite
> question. I'd like to add MCAs later on to an already-compiled version of
> OpenMPI and am not quite sure how to do it.
>
> Let me summarize. We've got a commercial code that we deploy on customer
> machines in binary form. We're working to integrate OpenMPI into the
> installer, and things seem to be progressing well. (Note: because we're a
> commercial code, making the customer compile something doesn't work for us
> like it can for open source or research codes.)
>
> Now, we want to take advantage of OpenMPI's ability to find MCAs at
> runtime, pointing to the various plugins that might apply to a deployed
> system. I've configured and compiled OpenMPI on one of our build machines,
> one that doesn't have any special interconnect hardware or software
> installed. We take this compiled version of OpenMPI and use it on all of
> our machines. (Yes, I've read Building FAQ #39
> <https://www.open-mpi.org/faq/?category=building#installdirs> about
> relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled
> version of OpenMPI and add MCA libraries to it, giving OpenMPI the ability
> to communicate via transport mechanisms that weren't available on the
> original build machine. Things like InfiniBand, OmniPath, or one of Cray's
> interconnects.
>
> How would I go about doing this? And what are the limitations?
>
> I'm guessing that I need to go configure and compile the same version of
> OpenMPI on a machine that has the desired interconnect installation
> (headers and libraries), then go grab the corresponding
> lib/openmpi/mca_*{la,so} files. Take those files and drop them in our
> pre-built OpenMPI from our build machine in the same relative plugin
> location (lib/openmpi). If I stick with the same compiler (gcc, in this
> case), I'm hoping that symbols will all resolve themselves at runtime. (I
> probably will have to do some LD_LIBRARY_PATH games to be sure to find the
> appropriate underlying libraries unless OpenMPI's process for building MCAs
> links them in statically somehow.)
>
> Am I even on the right track here? (The various system-level FAQs (here
> <https://www.open-mpi.org/faq/?category=supported-systems>, here
> <https://www.open-mpi.org/faq/?category=developers>, and especially here
> <https://www.open-mpi.org/faq/?category=sysadmin>) seem to suggest that I
> am.)
>
> Our first test platform will be getting OpenMPI via IB working on our
> cluster, where we have IB (and TCP/IP) functional but not OpenMPI. This
> will be a great stand-in for a customer that has an IB cluster and wants to
> just run our binary installation.
>
> Thanks.
>
> -Sean

[OMPI users] MCA compilation later

2016-10-28 Thread Sean Ahern
There's been discussion on the OpenMPI list recently about static linking
of OpenMPI with all of the desired MCAs in it. I've got the opposite
question. I'd like to add MCAs later on to an already-compiled version of
OpenMPI and am not quite sure how to do it.

Let me summarize. We've got a commercial code that we deploy on customer
machines in binary form. We're working to integrate OpenMPI into the
installer, and things seem to be progressing well. (Note: because we're a
commercial code, making the customer compile something doesn't work for us
like it can for open source or research codes.)

Now, we want to take advantage of OpenMPI's ability to find MCAs at
runtime, pointing to the various plugins that might apply to a deployed
system. I've configured and compiled OpenMPI on one of our build machines,
one that doesn't have any special interconnect hardware or software
installed. We take this compiled version of OpenMPI and use it on all of
our machines. (Yes, I've read Building FAQ #39
<https://www.open-mpi.org/faq/?category=building#installdirs> about
relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled
version of OpenMPI and add MCA libraries to it, giving OpenMPI the ability
to communicate via transport mechanisms that weren't available on the
original build machine. Things like InfiniBand, OmniPath, or one of Cray's
interconnects.

How would I go about doing this? And what are the limitations?

I'm guessing that I need to go configure and compile the same version of
OpenMPI on a machine that has the desired interconnect installation
(headers and libraries), then go grab the corresponding
lib/openmpi/mca_*{la,so} files. Take those files and drop them in our
pre-built OpenMPI from our build machine in the same relative plugin
location (lib/openmpi). If I stick with the same compiler (gcc, in this
case), I'm hoping that symbols will all resolve themselves at runtime. (I
probably will have to do some LD_LIBRARY_PATH games to be sure to find the
appropriate underlying libraries unless OpenMPI's process for building MCAs
links them in statically somehow.)
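
One way to test the "symbols will resolve at runtime" hope before shipping is to load a copied plugin with RTLD_NOW, which makes the dynamic loader resolve every symbol at once and report the first failure (a missing library or an undefined symbol) through dlerror(). Below is a generic sketch with a hypothetical function name, not an Open MPI tool. As the "undefined symbol: opal_show_help" errors later in this archive suggest, these plugins expect Open MPI's own symbols to be present already, so run the check from a program linked against libmpi; otherwise those symbols will be reported as undefined as well.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical pre-shipping check: load a plugin with all symbols resolved
   immediately. Returns 0 on success; on failure returns -1 and copies the
   loader's message (missing library, undefined symbol, ...) into msg. */
int trial_load(const char *path, char *msg, size_t msg_len)
{
    void *handle;
    const char *err;

    assert(path != NULL && msg != NULL && msg_len > 0);
    msg[0] = '\0';
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        err = dlerror();
        snprintf(msg, msg_len, "%s", err != NULL ? err : "unknown dlopen error");
        return -1;
    }
    dlclose(handle);
    return 0;
}
```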

Am I even on the right track here? (The various system-level FAQs (here
<https://www.open-mpi.org/faq/?category=supported-systems>, here
<https://www.open-mpi.org/faq/?category=developers>, and especially here
<https://www.open-mpi.org/faq/?category=sysadmin>) seem to suggest that I
am.)

Our first test platform will be getting OpenMPI via IB working on our
cluster, where we have IB (and TCP/IP) functional but not OpenMPI. This
will be a great stand-in for a customer that has an IB cluster and wants to
just run our binary installation.

Thanks.

-Sean

Re: [OMPI users] Problem with double shared library

2016-10-28 Thread Sean Ahern
Gilles,

You described the problem exactly. I think we were able to nail down a
solution to this one through judicious use of the -rpath $MPI_DIR/lib
linker flag, allowing the runtime linker to properly find OpenMPI symbols
at runtime. We're operational. Thanks for your help.
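
For reference, the alternative Gilles suggests in the reply quoted below, loading the transport with RTLD_GLOBAL, might look like this sketch (the function names are hypothetical). On glibc, RTLD_GLOBAL makes the symbols of the loaded library and of the libraries it pulls in, such as libmpi and libopen-pal, available to shared objects loaded later, which is what the Open MPI plugins need in order to resolve symbols like opal_show_help.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Flags for loading a transport library that depends on Open MPI.
   RTLD_GLOBAL exposes its symbols (and, on glibc, those of libraries it
   pulls in) to objects loaded later; RTLD_NOW reports unresolved symbols
   at load time instead of at first use. */
int transport_dlopen_flags(void)
{
    return RTLD_NOW | RTLD_GLOBAL;
}

/* Hypothetical loader for a transport such as libtransport_mpi.so.
   Returns the handle, or NULL after printing the loader's message. */
void *open_transport(const char *path)
{
    void *handle;

    assert(path != NULL);
    handle = dlopen(path, transport_dlopen_flags());
    if (handle == NULL)
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
    return handle;
}
```

The -rpath approach above addresses where the runtime linker looks for Open MPI's libraries; RTLD_GLOBAL addresses whether their symbols are visible to plugins loaded afterwards.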

-Sean

On Mon, Oct 17, 2016 at 9:45 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Sean,
>
>
> If I understand correctly, you built a libtransport_mpi.so library that
> depends on Open MPI, and your main program dlopens libtransport_mpi.so.
>
> In this case, and at least for the time being, you need to use
> RTLD_GLOBAL in your dlopen flags.
>
>
> Cheers,
>
>
> Gilles
>
> On 10/18/2016 4:53 AM, Sean Ahern wrote:
>
> Folks,
>
> For our code, we have a communication layer that abstracts the code that
> does the actual transfer of data. We call these "transports", and we link
> them as shared libraries. We have created an MPI transport that
> compiles/links against OpenMPI 2.0.1 using the compiler wrappers. When I
> compile OpenMPI with the --disable-dlopen option (thus cramming all of
> OpenMPI's plugins into the MPI library directly), things work great with
> our transport shared library. But when I have a "normal" OpenMPI (without
> --disable-dlopen) and create the same transport shared library, things
> fail. Upon launch, it appears that OpenMPI is unable to find the
> appropriate plugins:
>
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_patcher_overwrite: /home/sean/work/ceisvn/apex/
> branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-
> 2.0.1/lib/openmpi/mca_patcher_overwrite.so: undefined symbol:
> *mca_patcher_base_patch_t_class* (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_shmem_mmap: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/
> machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so:
> undefined symbol: *opal_show_help* (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_shmem_posix: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/
> machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so:
> undefined symbol: *opal_show_help* (ignored)
> [hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
> open mca_shmem_sysv: /home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/
> machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so:
> undefined symbol: *opal_show_help* (ignored)
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_init failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
>
>
> If I skip our shared libraries and instead write a standard MPI-based
> "hello, world" program that links against MPI directly (without
> --disable-dlopen), everything is again fine.
>
> It seems that having the double dlopen is causing problems for OpenMPI

[OMPI users] Problem with double shared library

2016-10-17 Thread Sean Ahern
Folks,

For our code, we have a communication layer that abstracts the code that
does the actual transfer of data. We call these "transports", and we link
them as shared libraries. We have created an MPI transport that
compiles/links against OpenMPI 2.0.1 using the compiler wrappers. When I
compile OpenMPI with the --disable-dlopen option (thus cramming all of
OpenMPI's plugins into the MPI library directly), things work great with
our transport shared library. But when I have a "normal" OpenMPI (without
--disable-dlopen) and create the same transport shared library, things
fail. Upon launch, it appears that OpenMPI is unable to find the
appropriate plugins:

[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
open mca_patcher_overwrite:
/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_patcher_overwrite.so:
undefined symbol: *mca_patcher_base_patch_t_class* (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
open mca_shmem_mmap:
/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_mmap.so:
undefined symbol: *opal_show_help* (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
open mca_shmem_posix:
/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_posix.so:
undefined symbol: *opal_show_help* (ignored)
[hyperion.ceintl.com:25595] mca_base_component_repository_open: unable to
open mca_shmem_sysv:
/home/sean/work/ceisvn/apex/branches/OpenMPI/apex32/machines/linux_2.6_64/openmpi-2.0.1/lib/openmpi/mca_shmem_sysv.so:
undefined symbol: *opal_show_help* (ignored)
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)


If I skip our shared libraries and instead write a standard MPI-based
"hello, world" program that links against MPI directly (without
--disable-dlopen), everything is again fine.

It seems that the double dlopen is preventing OpenMPI from finding its own
shared libraries.

Note: I do have LD_LIBRARY_PATH pointing to …"openmpi-2.0.1/lib", as well
as OPAL_PREFIX pointing to …"openmpi-2.0.1".

Any thoughts about how I can try to tease out what's going wrong here?
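
One way to tease this apart: right after the transport library is loaded, check whether a symbol named in the errors, such as opal_show_help, is visible in the process's global symbol scope. The hypothetical helper below uses the handle returned by dlopen(NULL, ...), which searches the main program, the libraries loaded at startup, and anything later loaded with RTLD_GLOBAL. A result of 0 for opal_show_help would suggest that plugins opened afterwards cannot see it either, which is consistent with the RTLD_GLOBAL advice Gilles gives in the reply quoted earlier in this archive.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical diagnostic: returns 1 if `name` is visible in the global
   symbol scope (main program, startup libraries, and anything loaded with
   RTLD_GLOBAL), 0 otherwise. Symbols of libraries loaded with RTLD_LOCAL
   are not found this way. */
int symbol_is_global(const char *name)
{
    void *self;
    void *sym;
    int found;

    assert(name != NULL);
    self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        fprintf(stderr, "dlopen(NULL): %s\n", dlerror());
        return 0;
    }
    dlerror();  /* clear any stale error */
    sym = dlsym(self, name);
    found = (dlerror() == NULL && sym != NULL);
    dlclose(self);
    return found;
}
```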

-Sean

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Sean Ahern
On Thu, Sep 1, 2016 at 3:14 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:

>
> FWIW, we usually store the tarballs themselves in VCSs if we want to
> preserve specific third-party tarballs.  It's a little gross (i.e., storing
> a big binary tarball in a VCS), but it works.  Depends on your tolerance
> level for "ick" in a VCS.  :-)
>

Not a bad idea! I may end up implementing precisely this for our work.

-Sean

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Sean Ahern
Okay, I think I figured it out. The short answer is that version control
systems can mess up relative file system timestamps.

I described what I was doing as basically this:

tar xzf openmpi-2.0.0.tar.gz
cd openmpi-2.0.0
./configure …
make


In actuality, I stored off the source in our "third party" repo before I
built it:

svn add openmpi-2.0.0
svn commit


When I grabbed that source back on the machine I wanted to build on, the
relative timestamps weren't the same as what I would have gotten with a
simple untar.


machine 1: tar xvf openmpi-2.0.0.tar.gz
machine 1: svn add openmpi-2.0.0
machine 1: svn commit

machine 2: svn update openmpi-2.0.0
machine 2: cd openmpi-2.0.0
machine 2: ./configure …
machine 2: make


As a result, make's debug output showed dependency checks like these:

  Finished prerequisites of target file `aclocal.m4'.
  Prerequisite `config/c_get_alignment.m4' is older than target `aclocal.m4'.
  Prerequisite `config/c_weak_symbols.m4' is older than target `aclocal.m4'.
  Prerequisite `config/libtool.m4' is older than target `aclocal.m4'.

That is why it tried to rebuild portions of the autoconf configuration.

I think I can get things going again.
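
The check make performs here boils down to comparing modification times: a target is remade when it is missing or when a prerequisite is newer, so a checkout that stamps files in whatever order it writes them can make aclocal.m4 look stale. Below is a minimal sketch of that rule using stat(), plus a utime() helper for experimenting; both function names are hypothetical, and unlike GNU make, which may use sub-second timestamps, this compares whole seconds only.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Hypothetical sketch of make's basic out-of-date rule.
   Returns 1 if `target` is missing or older than `prereq`, 0 if it is up
   to date, and -1 if `prereq` cannot be stat()ed. Compares whole seconds. */
int needs_rebuild(const char *target, const char *prereq)
{
    struct stat t, p;

    assert(target != NULL && prereq != NULL);
    if (stat(prereq, &p) != 0)
        return -1;
    if (stat(target, &t) != 0)
        return 1;  /* a missing target always has to be made */
    return p.st_mtime > t.st_mtime;
}

/* Set the access and modification times of `path` to `when`, e.g. to mimic
   a checkout that wrote files in a different order than the tarball had. */
int set_mtime(const char *path, time_t when)
{
    struct utimbuf times;

    times.actime = when;
    times.modtime = when;
    return utime(path, &times);
}
```

With a prerequisite stamped later than aclocal.m4, needs_rebuild() returns 1, which is the situation that sends make off to run aclocal-1.15.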

Sorry for the noise about something completely unrelated to OpenMPI!


-Sean

On Thu, Sep 1, 2016 at 1:24 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:

> Ok, weird.  Try running the process again (untar, configure, make) but use
> "make -d" and capture the entire output so that you can see what file(s)
> is(are) triggering Automake to invoke aclocal during the build (it will be
> a *LOT* of output).
>
>
> > On Sep 1, 2016, at 1:20 PM, Sean Ahern <s...@ensight.com> wrote:
> >
> > Yep, that's it.
> > -Sean
> >
> > On Thu, Sep 1, 2016 at 1:04 PM, Jeff Squyres (jsquyres)
> > <jsquy...@cisco.com> wrote:
> >> That's odd -- I've never seen this kind of problem happen on a
> locally-mounted filesystem.
> >>
> >> Just to make sure: you're *not* running autogen.pl, right?  You're
> just basically doing this:
> >>
> >> -----
> >> $ tar xf openmpi-2.0.0.tar.bz2
> >> $ cd openmpi-2.0.0
> >> $ ./configure ...
> >> $ make ...
> >> -
> >>
> >> Right?
> >>
> >>
> >>> On Sep 1, 2016, at 12:51 PM, Sean Ahern <s...@ensight.com> wrote:
> >>>
> >>> Greetings, Jeff.
> >>>
> >>> Sure, I could see that. But I'm trying to run on a locally mounted
> filesystem in this case. I may need to run make in debug mode and see what
> it thinks is out of date. See if you guys can help me track down the
> dependency problem.
> >>>
> >>> -Sean
> >>>
> >>> On Thu, Sep 1, 2016 at 11:56 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> >>> Greetings Sean.
> >>>
> >>> Yes, you are correct - when you build from the tarball, you should not
> need the GNU autotools.
> >>>
> >>> When tarball builds fail like this, it *usually* means that you are
> building in a network filesystem, and the time is not well synchronized
> between the machine on which you are building and the network filesystem
> server.  Specifically: GNU Autotools-based builds are heavily dependent
> upon filesystem timestamps.  If the sync is off between a network
> filesystem client and server, all kinds of things go wrong (to include
> thinking that it needs to run the autotools as part of the build).
> >>>
> >>>
> >>>
> >>>> On Sep 1, 2016, at 11:47 AM, Sean Ahern <s...@ensight.com> wrote:
> >>>>
> >>>> I'm trying to compile OpenMPI 2.0.0 on a CentOS 6.7 system and am
> running into what appears to be a very basic problem. I'm hoping someone
> here can give me a pointer. (I'll have more involved questions later.) I've
> looked through the FAQ for an answer and didn't see anything related. And I
> don't see any messages in the archives from the last several years about
> this.
> >>>>
> >>>> I'm using the tarball 2.0.0 release, which presumably shouldn't
> require the GNU autotools. But, after running "configure", the subsequent
> "make" fails, trying to run aclocal-1.15.
> >>>>
> >>>> Here's running configure:
> >>>>
> >>

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Sean Ahern
Yep, that's it.
-Sean

On Thu, Sep 1, 2016 at 1:04 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> That's odd -- I've never seen this kind of problem happen on a 
> locally-mounted filesystem.
>
> Just to make sure: you're *not* running autogen.pl, right?  You're just 
> basically doing this:
>
> -
> $ tar xf openmpi-2.0.0.tar.bz2
> $ cd openmpi-2.0.0
> $ ./configure ...
> $ make ...
> -
>
> Right?
>
>
>> On Sep 1, 2016, at 12:51 PM, Sean Ahern <s...@ensight.com> wrote:
>>
>> Greetings, Jeff.
>>
>> Sure, I could see that. But I'm trying to run on a locally mounted 
>> filesystem in this case. I may need to run make in debug mode and see what 
>> it thinks is out of date. See if you guys can help me track down the 
>> dependency problem.
>>
>> -Sean
>>
>> On Thu, Sep 1, 2016 at 11:56 AM, Jeff Squyres (jsquyres) 
>> <jsquy...@cisco.com> wrote:
>> Greetings Sean.
>>
>> Yes, you are correct - when you build from the tarball, you should not need 
>> the GNU autotools.
>>
>> When tarball builds fail like this, it *usually* means that you are building 
>> in a network filesystem, and the time is not well synchronized between the 
>> machine on which you are building and the network filesystem server.  
>> Specifically: GNU Autotools-based builds are heavily dependent upon 
>> filesystem timestamps.  If the sync is off between a network filesystem 
>> client and server, all kinds of things go wrong (to include thinking that it 
>> needs to run the autotools as part of the build).
>>
>>
>>
>> > On Sep 1, 2016, at 11:47 AM, Sean Ahern <s...@ensight.com> wrote:
>> >
>> > I'm trying to compile OpenMPI 2.0.0 on a CentOS 6.7 system and am running 
>> > into what appears to be a very basic problem. I'm hoping someone here can 
>> > give me a pointer. (I'll have more involved questions later.) I've looked 
>> > through the FAQ for an answer and didn't see anything related. And I don't 
>> > see any messages in the archives from the last several years about this.
>> >
>> > I'm using the tarball 2.0.0 release, which presumably shouldn't require 
>> > the GNU autotools. But, after running "configure", the subsequent "make" 
>> > fails, trying to run aclocal-1.15.
>> >
>> > Here's running configure:
>> >
> >> > % ./configure --prefix=blah/blah/openmpi-2.0.0 --disable-java --disable-mpi-fortran
> >> > == Configuring Open MPI
>> >
>> > *** Startup tests
>> > checking build system type... x86_64-unknown-linux-gnu
>> > checking host system type... x86_64-unknown-linux-gnu
>> > checking target system type... x86_64-unknown-linux-gnu
>> > checking for gcc... gcc
>> > checking whether the C compiler works... yes
>> > checking for C compiler default output file name... a.out
>> > checking for suffix of executables...
>> > checking whether we are cross compiling... no
>> > … lots of output …
>> > config.status: creating ompi/mpiext/cuda/c/mpiext_cuda_c.h
>> > config.status: ompi/mpiext/cuda/c/mpiext_cuda_c.h is unchanged
>> > config.status: executing depfiles commands
> >> > config.status: executing opal/mca/event/libevent2022/libevent/include/event2/event-config.h commands
>> > config.status: executing libtool commands
>> >
>> > Seems fine. (If someone wants the full "configure" output, I'll send it.)
>> > But when I run "make", I immediately get this:
>> >
>> > % make all
> >> > CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /home/sean/work/thirdparty/trunk/OpenMPI/openmpi-2.0.0/config/missing aclocal-1.15 -I config
> >> > /home/sean/work/thirdparty/trunk/OpenMPI/openmpi-2.0.0/config/missing: line 81: aclocal-1.15: command not found
>> > WARNING: 'aclocal-1.15' is missing on your system.
>> >  You should only need it if you modified 'acinclude.m4' or
>> >  'configure.ac' or m4 files included by 'configure.ac'.
>> >  The 'aclocal' program is part of the GNU Automake package:
>>

Re: [OMPI users] Can't compile OpenMPI 2.0.0 on a CentOS 6 system

2016-09-01 Thread Sean Ahern
Greetings, Jeff.

Sure, I could see that. But I'm trying to run on a locally mounted
filesystem in this case. I may need to run make in debug mode and see what
it thinks is out of date. See if you guys can help me track down the
dependency problem.
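A minimal sketch of that "debug mode" check, assuming GNU make: its `-d` flag prints the full dependency reasoning, and filtering that output leaves just the lines naming a target make wants to rebuild and the prerequisite it thinks is newer. The wrapper name why_remake is hypothetical, and the grep patterns assume the wording GNU make uses in `-d` output ("Must remake target ..." and "Prerequisite ... is newer than target ..."). Note that `-n` is not a safe way to suppress the aclocal-1.15 call here: GNU make still updates out-of-date makefiles under `-n`, and the aclocal rule in this thread is reached through that makefile-remaking step.

```shell
# Hypothetical helper: run GNU make with -d and keep only the lines that
# explain why a target is considered out of date. Recipes still run, so the
# aclocal-1.15 failure will still occur; the filtered lines printed before it
# show which prerequisite make judged to be newer.
why_remake() {
  make -d "$@" 2>&1 | grep -E 'Must remake target|newer than target'
}
```

Running `why_remake all` in the top-level build directory should list the generated file make decided to rebuild (for example aclocal.m4) together with the input whose timestamp triggered it.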

-Sean

--
Sean Ahern
Computational Engineering International
919-363-0883

On Thu, Sep 1, 2016 at 11:56 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Greetings Sean.
>
> Yes, you are correct - when you build from the tarball, you should not
> need the GNU autotools.
>
> When tarball builds fail like this, it *usually* means that you are
> building in a network filesystem, and the time is not well synchronized
> between the machine on which you are building and the network filesystem
> server.  Specifically: GNU Autotools-based builds are heavily dependent
> upon filesystem timestamps.  If the sync is off between a network
> filesystem client and server, all kinds of things go wrong (to include
> thinking that it needs to run the autotools as part of the build).
>
>
>
> > On Sep 1, 2016, at 11:47 AM, Sean Ahern <s...@ensight.com> wrote:
> >
> > I'm trying to compile OpenMPI 2.0.0 on a CentOS 6.7 system and am
> > running into what appears to be a very basic problem. I'm hoping someone
> > here can give me a pointer. (I'll have more involved questions later.)
> > I've looked through the FAQ for an answer and didn't see anything related.
> > And I don't see any messages in the archives from the last several years
> > about this.
> >
> > I'm using the tarball 2.0.0 release, which presumably shouldn't require
> > the GNU autotools. But, after running "configure", the subsequent "make"
> > fails, trying to run aclocal-1.15.
> >
> > Here's running configure:
> >
> > % ./configure --prefix=blah/blah/openmpi-2.0.0 --disable-java --disable-mpi-fortran
> > == Configuring Open MPI
> >
> > *** Startup tests
> > checking build system type... x86_64-unknown-linux-gnu
> > checking host system type... x86_64-unknown-linux-gnu
> > checking target system type... x86_64-unknown-linux-gnu
> > checking for gcc... gcc
> > checking whether the C compiler works... yes
> > checking for C compiler default output file name... a.out
> > checking for suffix of executables...
> > checking whether we are cross compiling... no
> > … lots of output …
> > config.status: creating ompi/mpiext/cuda/c/mpiext_cuda_c.h
> > config.status: ompi/mpiext/cuda/c/mpiext_cuda_c.h is unchanged
> > config.status: executing depfiles commands
> > config.status: executing opal/mca/event/libevent2022/libevent/include/event2/event-config.h commands
> > config.status: executing libtool commands
> >
> > Seems fine. (If someone wants the full "configure" output, I'll send it.)
> > But when I run "make", I immediately get this:
> >
> > % make all
> > CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /home/sean/work/thirdparty/trunk/OpenMPI/openmpi-2.0.0/config/missing aclocal-1.15 -I config
> > /home/sean/work/thirdparty/trunk/OpenMPI/openmpi-2.0.0/config/missing: line 81: aclocal-1.15: command not found
> > WARNING: 'aclocal-1.15' is missing on your system.
> >  You should only need it if you modified 'acinclude.m4' or
> >  'configure.ac' or m4 files included by 'configure.ac'.
> >  The 'aclocal' program is part of the GNU Automake package:
> >  <http://www.gnu.org/software/automake>
> >  It also requires GNU Autoconf, GNU m4 and Perl in order to run:
> >  <http://www.gnu.org/software/autoconf>
> >  <http://www.gnu.org/software/m4/>
> >  <http://www.perl.org/>
> > make: *** [aclocal.m4] Error 127
> >
> > I haven't modified any m4 files. Indeed, I haven't modified anything. I
> > simply ran "configure" and then "make". What am I doing wrong? I feel like
> > there's something very basic failing here.
> >
> > (I have attached my bzip2ed config.log file here.)
> >
> > -Sean
> >
> > --
> > Sean Ahern
> > Computational Engineering International
> > 919-363-0883
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
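Jeff's diagnosis hinges on whether the build machine and the filesystem agree about the time. One rough way to check, assuming GNU coreutils (`stat -c %Y`, `mktemp`): create a scratch file in the build directory and compare its modification time with the local clock. On many network filesystems the server's clock stamps newly created files, so a large offset there is a hint of the skew Jeff describes; on a locally mounted filesystem the offset should be about zero. The function name fs_clock_offset is hypothetical.

```shell
# Hypothetical helper: estimate the clock offset (in seconds) between this
# machine and the filesystem holding DIR, by creating a scratch file there and
# comparing its mtime with the local clock. Positive means the filesystem's
# timestamps are ahead of local time. Assumes GNU stat and mktemp.
fs_clock_offset() {
  dir=${1:-.}
  probe=$(mktemp "$dir/.clockprobe.XXXXXX") || return 2
  now=$(date +%s)
  mtime=$(stat -c %Y "$probe")   # mtime in whole seconds since the epoch
  rm -f "$probe"
  echo $(( mtime - now ))
}
```

For example, `fs_clock_offset openmpi-2.0.0` prints the offset in seconds; a value of 0 or -1 is ordinary rounding, while an offset of many seconds or minutes suggests the timestamps make relies on cannot be trusted.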