[OMPI devel] OMPI-5.0 vs. gfortran in RHEL

2019-11-20 Thread Paul Hargrove via devel
All,

Following up on my question in the OMPI BoF today, I have confirmed that
RHEL8 is shipping GCC-8.2, *not* the problematic GCC-4.8.5 used in RHEL7.

While I don't believe that the numerous RHEL7-based systems are going to jump to
upgrade, at least you can have some confidence that the number of NEW systems
coming online with a too-old gfortran should be small.

So, while dropping support in OMPI 5.0 for gfortran < 4.9 will hurt folks
using RHEL7 (and the vendor-provided GCC), this decision will not impact
RHEL8 systems.
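
For anyone wanting to see where a particular system falls relative to that cutoff, a minimal sketch follows. It assumes only that `gfortran -dumpversion` prints a dotted version string; the helper name `version_ge` is made up for illustration, and 4.9 is the threshold mentioned above.

```shell
# version_ge A B: succeed if dotted version A >= B (missing fields count as 0).
# Hypothetical helper, not part of Open MPI or GCC.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0   # all fields equal
    }'
}

# Newer GCC releases may print only the major version here, which still compares correctly.
if command -v gfortran >/dev/null 2>&1; then
    v=$(gfortran -dumpversion)
    if version_ge "$v" 4.9; then
        echo "gfortran $v: at or above 4.9"
    else
        echo "gfortran $v: older than 4.9"
    fi
fi
```

The comparison is done field by field rather than as a string, so 4.10 would correctly rank above 4.9.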

-Paul

-- 
Paul H. Hargrove 
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory


Re: [OMPI devel] Mac OS X 10.4.x users?

2018-09-28 Thread Paul Hargrove
Even *I* don't attempt to support OS X that old.  ;-)

-Paul

On Fri, Sep 28, 2018 at 11:41 AM Jeff Squyres (jsquyres) via devel <
devel@lists.open-mpi.org> wrote:

> Fun fact: we cause configure to fail for OS X <= 10.4 anyway:
>
> https://github.com/open-mpi/ompi/blob/master/configure.ac#L328-L347
>
> According to the comment, we do this because of a known-bad implementation
> of pty in the OS X kernel that causes kernel panics.
>
> So I think we're definitely safe removing an OS X 10.4.x workaround.
>
>
> > On Sep 28, 2018, at 2:18 PM, Ralph H Castain  wrote:
> >
> > Good lord - break away!!
> >
> >> On Sep 28, 2018, at 11:11 AM, Barrett, Brian via devel <
> devel@lists.open-mpi.org> wrote:
> >>
> >> All -
> >>
> >> In trying to clean up some warnings, I noticed one (around pack/unpack
> in net/if.h) that is due to a workaround of a bug in MacOS X 10.4.x and
> earlier.  The simple way to remove the warning would be to remove the
> workaround, which would break the next major version of Open MPI on 10.4.x
> and earlier on 64 bit systems.  10.5.x was released 11 years ago and didn’t
> drop support for any 64 bit systems.  I posted a PR which removes support
> for 10.4.x and earlier (through the README) and removes the
> warning-generating workaround (https://github.com/open-mpi/ompi/pull/5803).
> >>
> >> Does anyone object to breaking 10.4.x and earlier?
> >>
> >> Brian
> >> ___
> >> devel mailing list
> >> devel@lists.open-mpi.org
> >> https://lists.open-mpi.org/mailman/listinfo/devel
> >
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel



-- 
Paul H. Hargrove 
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Open MPI 3.0.1rc2 available for testing

2018-02-06 Thread Paul Hargrove
All (that I have) looks good to me.

I don't have Intel compiler tests today due to planned maintenance at NERSC.
My Mac OSX High Sierra system has a failed SSD.
The UCX and PSM(1) test systems I was using appear to be retired.
I also don't have all the slow ARM and MIPS emulator results yet.

However, everything else I normally test has passed.

-Paul


On Tue, Jan 23, 2018 at 8:04 PM, Barrett, Brian via devel <
devel@lists.open-mpi.org> wrote:

> I’ve posted the first public release candidate of Open MPI 3.0.1 this
> evening.  It can be downloaded for testing from:
>
>   https://www.open-mpi.org/software/ompi/v3.0/
>
> We appreciate any testing you can do in preparation for a release in the
> next week or two.
>
>
> Thanks,
>
> Brian & Howard
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel




-- 
Paul H. Hargrove 
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory

[OMPI devel] Failure to download 3.0.1rc2 tarball

2018-02-06 Thread Paul Hargrove
Howard says that Jeff asked him to ask me to report the problem shown
below, which I am having downloading the recent RC with an old wget.  FWIW
I can use curl on the affected system, and so this is not a big issue to me
personally.

-Paul



phargrov@erpro8-fsf1:~$ wget
https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.1rc2.tar.bz2
--2018-02-06 14:52:16--
https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.1rc2.tar.bz2
Resolving www.open-mpi.org (www.open-mpi.org)... 192.185.39.252
Connecting to www.open-mpi.org (www.open-mpi.org)|192.185.39.252|:443...
connected.
GnuTLS: A TLS fatal alert has been received.
Unable to establish SSL connection.

phargrov@erpro8-fsf1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 7.10 (wheezy)
Release:        7.10
Codename:       wheezy
phargrov@erpro8-fsf1:~$ apt-cache policy wget libgnutls26
wget:
  Installed: 1.13.4-3+deb7u2
  Candidate: 1.13.4-3+deb7u2
  Version table:
 *** 1.13.4-3+deb7u2 0
        500 http://ftp.fr.debian.org/debian/ wheezy/main mips Packages
        100 /var/lib/dpkg/status
libgnutls26:
  Installed: 2.12.20-8+deb7u5
  Candidate: 2.12.20-8+deb7u5
  Version table:
 *** 2.12.20-8+deb7u5 0
        500 http://ftp.fr.debian.org/debian/ wheezy/main mips Packages
        100 /var/lib/dpkg/status
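
The workaround described above (fall back to curl when the old wget cannot complete the TLS handshake) could be scripted roughly as follows. This is only a sketch: the function name `fetch` is hypothetical, and the timeout values are arbitrary choices rather than anything from the report.

```shell
# fetch URL OUT: try wget first, then curl; succeed if either downloads the file.
# Hypothetical helper for illustration only.
fetch() {
    url=$1
    out=$2
    if command -v wget >/dev/null 2>&1 &&
       wget -q --tries=1 --timeout=10 -O "$out" "$url"; then
        return 0
    fi
    if command -v curl >/dev/null 2>&1 &&
       curl -fsSL --connect-timeout 10 -o "$out" "$url"; then
        return 0
    fi
    echo "fetch: unable to download $url" >&2
    return 1
}

# Example (not run here):
#   fetch https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.1rc2.tar.bz2 \
#         openmpi-3.0.1rc2.tar.bz2
```

This only papers over the client-side problem; it does not say anything about which TLS settings the server requires.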


-- 
Paul H. Hargrove 
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory

Re: [OMPI devel] Poor performance when compiling with --disable-dlopen

2018-01-23 Thread Paul Hargrove
Ah, this sounds familiar.

I believe that the issue Dave sees is that without patcher/overwrite the
"leave pinned" protocol is OFF by default.

Use of '-mca mpi_leave_pinned 1' may help if my guess is right.
HOWEVER, w/o the memory management hooks provided using patcher/overwrite,
leave pinned can give incorrect results.
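
A rough sketch of how one might try this suggestion. The helper names are hypothetical, `-np 2` and the benchmark command are placeholders, and the grep pattern assumes `ompi_info` lists components on lines containing "MCA patcher".

```shell
# Report whether any patcher component (e.g. "overwrite") is present in this build.
# Assumes ompi_info prints component lines containing "MCA patcher".
have_patcher() {
    ompi_info 2>/dev/null | grep -qi 'MCA patcher'
}

# Run a benchmark with leave-pinned forced on.
# "$@" is the benchmark command, e.g. ./IMB-MPI1 PingPong (placeholder).
run_leave_pinned() {
    mpirun -np 2 --mca mpi_leave_pinned 1 "$@"
}

if ! have_patcher; then
    echo "warning: no patcher component found; leave-pinned results may be incorrect" >&2
fi
```

The warning mirrors the caveat above: without the memory hooks, forcing leave-pinned is a diagnostic experiment, not a fix.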

-Paul

On Tue, Jan 23, 2018 at 9:17 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Dave,
>
> here is what I found
>
>  - MPI_THREAD_MULTIPLE is not part of the equation (I just found it is
> no longer required by IMB by default)
>  - patcher/overwrite is not built when Open MPI is configure'd with
> --disable-dlopen
>  - when configure'd without --disable-dlopen, performance is much
> worse for the IMB (PingPong) benchmark when run with
>    mpirun --mca patcher ^overwrite
>  - OSU (osu_bw) performance is not impacted by the patcher/overwrite
> component being blacklisted
>
> I am afraid that's all I can do ...
>
>
> Nathan,
>
> could you please shed some light ?
>
>
> Cheers,
>
> Gilles
>
> On Wed, Jan 24, 2018 at 1:29 PM, Gilles Gouaillardet
>  wrote:
> > Dave,
> >
> > i can reproduce the issue with btl/openib and the IMB benchmark, which
> > is known to call MPI_Init_thread(MPI_THREAD_MULTIPLE)
> >
> > note performance is ok with OSU benchmark that does not require
> > MPI_THREAD_MULTIPLE
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wed, Jan 24, 2018 at 1:16 PM, Gilles Gouaillardet 
> wrote:
> >> Dave,
> >>
> >>
> >> one more question, are you running the openib/btl ? or other libraries
> >> such as MXM or UCX ?
> >>
> >>
> >> Cheers,
> >>
> >>
> >> Gilles
> >>
> >>
> >> On 1/24/2018 12:55 PM, Dave Turner wrote:
> >>>
> >>>
> >>>We compiled OpenMPI 2.1.1 using the EasyBuild configuration
> >>> for CentOS as below and tested on Mellanox QDR hardware.
> >>>
> >>> ./configure --prefix=/homes/daveturner/libs/openmpi-2.1.1c
> >>>  --enable-shared
> >>>  --enable-mpi-thread-multiple
> >>>  --with-verbs
> >>>  --enable-mpirun-prefix-by-default
> >>>  --with-mpi-cxx
> >>>  --enable-mpi-cxx
> >>>  --with-hwloc=$EBROOTHWLOC
> >>>  --disable-dlopen
> >>>
> >>> The red curve in the attached NetPIPE graph shows the poor performance
> >>> above
> >>> 8 kB for the uni-directional tests with bi-directional and aggregate
> >>> tests also showing similar problems.  When I compile using the same
> >>> configuration but with the --disable-dlopen parameter removed then the
> >>> performance is very good as the green curve in the graph shows.
> >>>
> >>> We see the same problems with OpenMPI 2.0.2.
> >>> Replacing --disable-dlopen with --disable-mca-dso showed good
> >>> performance.
> >>> Replacing --disable-dlopen with --enable-static showed good
> >>> performance.
> >>> So it's only --disable-dlopen that leads to poor performance.
> >>>
> >>> http://netpipe.cs.ksu.edu
> >>>
> >>>Dave Turner
> >>>
> >>> --
> >>> Work: davetur...@ksu.edu  (785) 532-7791
> >>>  2219 Engineering Hall, Manhattan KS  66506
> >>> Home: drdavetur...@gmail.com 
> >>>   cell: (785) 770-5929
> >>>
> >>>
> >>> ___
> >>> devel mailing list
> >>> devel@lists.open-mpi.org
> >>> https://lists.open-mpi.org/mailman/listinfo/devel
> >>
> >>
> >> ___
> >> devel mailing list
> >> devel@lists.open-mpi.org
> >> https://lists.open-mpi.org/mailman/listinfo/devel
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove 
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory

Re: [OMPI devel] Open MPI 2.1.2rc3 available for testing

2017-08-31 Thread Paul Hargrove
Pretty much the same report as I gave for 3.0.0rc4 about 24 hours ago:

> I have nearly completed my normal suite of tests.
> Only slow emulated 32-bit ARM and MIPS remain.
>
> This time around I've dropped big-endian PPC (because Open MPI did).
> However, I've added Apple's public betas of Mac OSX High Sierra and Xcode
> 9.


I find no new issues and all those xlc bugs mysteriously vanished :-).
Seriously, the CMA compilation issue is confirmed fixed.

The only thing close to a "new issue" is almost certainly a clang bug on
FreeBSD-11.1/amd64.
That is probably worthy of a note in the README (similar, but not
identical, to the one added for 3.0.0).

-Paul


On Wed, Aug 30, 2017 at 1:48 PM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> Hi Folks,
>
> Open MPI 2.1.2rc3 tarballs are available for testing at the usual place:
>
> https://www.open-mpi.org/software/ompi/v2.1/
>
> Fixes since rc2:
>
> Issue #4122: CMA compilation error in SM BTL.  Thanks to Paul Hargrove
> for catching this.
> Issue #4034: NAG Fortran compiler -rpath configuration error.  Thanks to
> Neil Carlson for reporting.
>
> Also, removed support for big endian PPC and XL compilers older than 13.1.
>
> Thanks,
>
> Jeff and Howard
>
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
A gcc-based build is fine.

So, I think this is similar to issue #3992
<https://github.com/open-mpi/ompi/issues/3992> in which we seem to have
decided that /usr/bin/cc (clang) is not to be trusted on this platform.

-Paul

On Wed, Aug 30, 2017 at 4:49 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
>
> See my response to Larry.  The impossibly large value was a figment of
> gdb's imagination.
>
> This system has worked for Open MPI when it was still at 11.0.
> I cannot say if the current problem is w/ FreeBSD-11.1 (e.g. its compiler)
> or with Open MPI.
>
> I am trying a gcc-based build now.
>
> -Paul
>
>
> On Wed, Aug 30, 2017 at 4:22 PM, r...@open-mpi.org <r...@open-mpi.org>
> wrote:
>
>> Yeah, that caught my eye too as that is impossibly large. We only have a
>> handful of active queues - looks to me like there is some kind of alignment
>> issue.
>>
>> Paul - has this configuration worked with prior versions of OMPI? Or is
>> this something new?
>>
>> Ralph
>>
>> On Aug 30, 2017, at 4:17 PM, Larry Baker <ba...@usgs.gov> wrote:
>>
>> Paul,
>>
>> (gdb) print base->nactivequeues
>>
>>
>> seems like an extraordinarily large number to me.  I don't know what the
>> implications of the --enable-debug clang option are.  Any chance the
>> SEGFAULT is a debugging trap when an uninitialized value is encountered?
>>
>> The other thought I had is an alignment trap if, for example,
>> nactivequeues is a 64-bit int but is not 64-bit aligned.  As far as I can
>> tell, nactivequeues is a plain int.  But, what that is on FreeBSD/amd64, I
>> do not know.
>>
>> Should there be more information in dmesg or a system log file with the
>> trap code so you can identify whether it is an instruction fetch (VERY
>> unlikely), an operand fetch, or a store that caused the trap?
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608 <(650)%20329-5608>
>> ba...@usgs.gov
>>
>>
>>
>> On 30 Aug 2017, at 3:17:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
>>--prefix=[...] --enable-debug CC=clang CXX=clang++
>> --disable-mpi-fortran --with-hwloc=/usr/local
>>
>> The CC/CXX settings are to use the system default compilers (rather than
>> gcc/g++ in /usr/local/bin).
>> The --with-hwloc is to avoid issue #3992
>> <https://github.com/open-mpi/ompi/issues/3992> (though I have not
>> determined if that impacts this RC).
>>
>> When running ring_c I get a SEGV from orterun, for which a gdb backtrace
>> is given below.
>> The one surprising thing (highlighted) in the backtrace is that both the
>> RHS and LHS of the assignment appear to be valid memory locations.
>> So, if the backtrace is accurate then I am at a loss as to why a SEGV
>> occurs.
>>
>> -Paul
>>
>>
>> Program terminated with signal 11, Segmentation fault.
>> [...]
>> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value
>> optimized out>, fd=<value optimized out>,
>> events=2, callback=<value optimized out>, arg=0x0)
>> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi
>> -2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
>> 1779        ev->ev_pri = base->nactivequeues / 2;
>> (gdb) print base->nactivequeues
>> $3 = 106201992
>> (gdb) print ev->ev_pri
>> $4 = 0 '\0'
>> (gdb) where
>> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value
>> optimized out>, fd=<value optimized out>,
>> events=2, callback=<value optimized out>, arg=0x0)
>> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi
>> -2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
>> #1  0x0008062e1fd2 in pmix_start_progress_thread ()
>> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi
>> -2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
>> #2  0x0008063047e4 in PMIx_server_init (module=0x806545be8,
>> info=0x802e16a00, ninfo=2)
>> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi
>> -2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c:310
>> #3  0x0008062c12f6 in pmix1_server_init (module=0x800b106a0,
>> info=0x7fffe290)
>> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi
>> -2.1.2rc3/opal/mca/pmix/pmix112/pmix1_server_south.c:140
>> #4  0x000800889f43 in pmix_server_init ()
>> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi
>> -2.1.2rc3/orte/orted/pmix/pmix_server.c:261
>> 

Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
Ralph,

See my response to Larry.  The impossibly large value was a figment of
gdb's imagination.

This system has worked for Open MPI when it was still at 11.0.
I cannot say if the current problem is w/ FreeBSD-11.1 (e.g. its compiler)
or with Open MPI.

I am trying a gcc-based build now.

-Paul


On Wed, Aug 30, 2017 at 4:22 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Yeah, that caught my eye too as that is impossibly large. We only have a
> handful of active queues - looks to me like there is some kind of alignment
> issue.
>
> Paul - has this configuration worked with prior versions of OMPI? Or is
> this something new?
>
> Ralph
>
> On Aug 30, 2017, at 4:17 PM, Larry Baker <ba...@usgs.gov> wrote:
>
> Paul,
>
> (gdb) print base->nactivequeues
>
>
> seems like an extraordinarily large number to me.  I don't know what the
> implications of the --enable-debug clang option are.  Any chance the
> SEGFAULT is a debugging trap when an uninitialized value is encountered?
>
> The other thought I had is an alignment trap if, for example,
> nactivequeues is a 64-bit int but is not 64-bit aligned.  As far as I can
> tell, nactivequeues is a plain int.  But, what that is on FreeBSD/amd64, I
> do not know.
>
> Should there be more information in dmesg or a system log file with the
> trap code so you can identify whether it is an instruction fetch (VERY
> unlikely), an operand fetch, or a store that caused the trap?
>
> Larry Baker
> US Geological Survey
> 650-329-5608 <(650)%20329-5608>
> ba...@usgs.gov
>
>
>
> On 30 Aug 2017, at 3:17:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
>--prefix=[...] --enable-debug CC=clang CXX=clang++
> --disable-mpi-fortran --with-hwloc=/usr/local
>
> The CC/CXX settings are to use the system default compilers (rather than
> gcc/g++ in /usr/local/bin).
> The --with-hwloc is to avoid issue #3992
> <https://github.com/open-mpi/ompi/issues/3992> (though I have not
> determined if that impacts this RC).
>
> When running ring_c I get a SEGV from orterun, for which a gdb backtrace
> is given below.
> The one surprising thing (highlighted) in the backtrace is that both the
> RHS and LHS of the assignment appear to be valid memory locations.
> So, if the backtrace is accurate then I am at a loss as to why a SEGV
> occurs.
>
> -Paul
>
>
> Program terminated with signal 11, Segmentation fault.
> [...]
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized
> out>, fd=<value optimized out>,
> events=2, callback=<value optimized out>, arg=0x0)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> 1779        ev->ev_pri = base->nactivequeues / 2;
> (gdb) print base->nactivequeues
> $3 = 106201992
> (gdb) print ev->ev_pri
> $4 = 0 '\0'
> (gdb) where
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized
> out>, fd=<value optimized out>,
> events=2, callback=<value optimized out>, arg=0x0)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> #1  0x0008062e1fd2 in pmix_start_progress_thread ()
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
> #2  0x0008063047e4 in PMIx_server_init (module=0x806545be8,
> info=0x802e16a00, ninfo=2)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c:310
> #3  0x0008062c12f6 in pmix1_server_init (module=0x800b106a0,
> info=0x7fffe290)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix1_server_south.c:140
> #4  0x000800889f43 in pmix_server_init ()
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/orte/orted/pmix/pmix_server.c:261
> #5  0x000803e22d87 in rte_init ()
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:666
> #6  0x00080084a45e in orte_init (pargc=0x7fffe988,
> pargv=0x7fffe980, flags=4)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/orte/runtime/orte_init.c:226
> #7  0x004046a4 in orterun (argc=7, argv=0x7fffea18)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/orte/tools/orterun/orterun.c:831
> #8  0x00403bc2 in main (argc=7, argv=0x7fffea18)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/orte/tools/orterun/main.c:13
>
>
>
> --

Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
Larry,

Thanks for the suggestions.

The system logs show only
   Aug 30 14:16:06 freebsd-amd64 kernel: pid 95624 (orterun), uid 19214:
exited on signal 11 (core dumped)

However, while "nactivequeues" is a 4-byte integer it *does* appear to be
misaligned (suggesting to me that "base" is bogus).

(gdb) print sizeof(base->nactivequeues)
$1 = 4
(gdb) print &base->nactivequeues
$2 = (int *) 0x100f7
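
The misalignment can be checked with plain shell arithmetic. A small sketch, where `is_aligned` is a hypothetical helper and the address and size are the ones gdb printed above:

```shell
# is_aligned ADDR SIZE: succeed if ADDR is a multiple of SIZE.
# Hypothetical helper for illustration only.
is_aligned() {
    [ $(( $1 % $2 )) -eq 0 ]
}

addr=0x100f7   # address gdb reported for base->nactivequeues
size=4         # sizeof(base->nactivequeues) according to gdb
echo "remainder: $(( $addr % $size ))"   # prints "remainder: 3"

if ! is_aligned "$addr" "$size"; then
    echo "$addr is not $size-byte aligned"
fi
```

A nonzero remainder for a 4-byte int is what makes the address itself look suspicious.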

Digging deeper it looks like the "base" being used by gdb is bogus.
Going up one stack frame to the caller, we see a totally different base
being passed:

(gdb) up
#1  0x0008062e1fd2 in pmix_start_progress_thread ()
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
83  event_assign(&block_ev, ev_base, block_pipe[0],
(gdb) print ev_base
$6 = (pmix_event_base_t *) 0x2ec9500

I am distrusting gdb on this system.

Here is what lldb says:

(lldb) bt
* thread #1, name = 'orterun', stop reason = signal SIGSEGV
  * frame #0:
libopen-pal.so.19`opal_libevent2022_event_assign(ev=0x0008065482c0,
base=, fd=, events=2, callback=,
arg=0x) at event.c:1779
frame #1: mca_pmix_pmix112.so`pmix_start_progress_thread at
progress_threads.c:83
frame #2:
mca_pmix_pmix112.so`PMIx_server_init(module=0x000806545be8,
info=0x000802e16a00, ninfo=2) at pmix_server.c:310
frame #3:
mca_pmix_pmix112.so`pmix1_server_init(module=0x000800b106a0,
info=0x7fffe290) at pmix1_server_south.c:140
frame #4: libopen-rte.so.19`pmix_server_init at pmix_server.c:261
frame #5: mca_ess_hnp.so`rte_init at ess_hnp_module.c:666
frame #6: libopen-rte.so.19`orte_init(pargc=0x7fffe988,
pargv=0x7fffe980, flags=4) at orte_init.c:226
frame #7: orterun`orterun(argc=7, argv=0x7fffea18) at
orterun.c:831
frame #8: orterun`main(argc=7, argv=0x7fffea18) at main.c:13
frame #9: 0x00403a9f orterun`_start + 383
(lldb) up
frame #1: mca_pmix_pmix112.so`pmix_start_progress_thread at
progress_threads.c:83
   80   event_base_free(ev_base);
   81   return NULL;
   82   }
-> 83   event_assign(&block_ev, ev_base, block_pipe[0],
   84                EV_READ, wakeup, NULL);
   85   event_add(&block_ev, 0);
   86   evlib_active = true;
(lldb) print ev_base
(pmix_event_base_t *) $2 = 0x02ec9500
(lldb) print *ev_base
error: Couldn't apply expression side effects : Couldn't dematerialize a
result variable: couldn't read its memory

So, it looks like the SEGV is due to a bad 2nd argument to event_assign().

-Paul

On Wed, Aug 30, 2017 at 4:17 PM, Larry Baker <ba...@usgs.gov> wrote:

> Paul,
>
> (gdb) print base->nactivequeues
>
>
> seems like an extraordinarily large number to me.  I don't know what the
> implications of the --enable-debug clang option are.  Any chance the
> SEGFAULT is a debugging trap when an uninitialized value is encountered?
>
> The other thought I had is an alignment trap if, for example,
> nactivequeues is a 64-bit int but is not 64-bit aligned.  As far as I can
> tell, nactivequeues is a plain int.  But, what that is on FreeBSD/amd64, I
> do not know.
>
> Should there be more information in dmesg or a system log file with the
> trap code so you can identify whether it is an instruction fetch (VERY
> unlikely), an operand fetch, or a store that caused the trap?
>
> Larry Baker
> US Geological Survey
> 650-329-5608 <(650)%20329-5608>
> ba...@usgs.gov
>
>
>
> On 30 Aug 2017, at 3:17:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
>--prefix=[...] --enable-debug CC=clang CXX=clang++
> --disable-mpi-fortran --with-hwloc=/usr/local
>
> The CC/CXX settings are to use the system default compilers (rather than
> gcc/g++ in /usr/local/bin).
> The --with-hwloc is to avoid issue #3992
> <https://github.com/open-mpi/ompi/issues/3992> (though I have not
> determined if that impacts this RC).
>
> When running ring_c I get a SEGV from orterun, for which a gdb backtrace
> is given below.
> The one surprising thing (highlighted) in the backtrace is that both the
> RHS and LHS of the assignment appear to be valid memory locations.
> So, if the backtrace is accurate then I am at a loss as to why a SEGV
> occurs.
>
> -Paul
>
>
> Program terminated with signal 11, Segmentation fault.
> [...]
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized
> out>, fd=<value optimized out>,
> events=2, callback=<value optimized out>, arg=0x0)
> at /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/
> openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> 1779        ev->ev_pri = base->nactivequeues / 2;
> (gdb) print base->nactivequeues
> $3 = 106201992
>

[OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Paul Hargrove
I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
   --prefix=[...] --enable-debug CC=clang CXX=clang++ --disable-mpi-fortran
--with-hwloc=/usr/local

The CC/CXX settings are to use the system default compilers (rather than
gcc/g++ in /usr/local/bin).
The --with-hwloc is to avoid issue #3992
<https://github.com/open-mpi/ompi/issues/3992> (though I have not
determined if that impacts this RC).

When running ring_c I get a SEGV from orterun, for which a gdb backtrace is
given below.
The one surprising thing (highlighted) in the backtrace is that both the
RHS and LHS of the assignment appear to be valid memory locations.
So, if the backtrace is accurate then I am at a loss as to why a SEGV
occurs.

-Paul


Program terminated with signal 11, Segmentation fault.
[...]
#0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized out>, fd=<value optimized out>,
events=2, callback=<value optimized out>, arg=0x0)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
1779        ev->ev_pri = base->nactivequeues / 2;
(gdb) print base->nactivequeues
$3 = 106201992
(gdb) print ev->ev_pri
$4 = 0 '\0'
(gdb) where
#0  opal_libevent2022_event_assign (ev=0x8065482c0, base=<value optimized out>, fd=<value optimized out>,
events=2, callback=<value optimized out>, arg=0x0)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
#1  0x0008062e1fd2 in pmix_start_progress_thread ()
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
#2  0x0008063047e4 in PMIx_server_init (module=0x806545be8,
info=0x802e16a00, ninfo=2)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c:310
#3  0x0008062c12f6 in pmix1_server_init (module=0x800b106a0,
info=0x7fffe290)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix1_server_south.c:140
#4  0x000800889f43 in pmix_server_init ()
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/orted/pmix/pmix_server.c:261
#5  0x000803e22d87 in rte_init ()
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:666
#6  0x00080084a45e in orte_init (pargc=0x7fffe988,
pargv=0x7fffe980, flags=4)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/runtime/orte_init.c:226
#7  0x004046a4 in orterun (argc=7, argv=0x7fffea18)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/orterun.c:831
#8  0x00403bc2 in main (argc=7, argv=0x7fffea18)
at
/home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/main.c:13



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI 3.0.0rc4 available

2017-08-29 Thread Paul Hargrove
I have nearly completed my normal suite of tests.
Only slow emulated 32-bit ARM and MIPS remain.

This time around I've dropped big-endian PPC (because Open MPI did).
However, I've added Apple's public betas of Mac OSX High Sierra and Xcode 9.

I have no new issues to report, and the ones I raised in testing previous
3.0.0 RCs appear to have been resolved as their respective PRs intended.
Looks good.

-Paul

On Tue, Aug 29, 2017 at 10:55 AM, Barrett, Brian via devel <
devel@lists.open-mpi.org> wrote:

> The fourth release candidate for Open MPI 3.0.0 is now available for
> download.  Changes since rc2 include:
>
> * Better handling of OPAL_PREFIX for PMIx
> * Update to hwloc 1.11.7
> * Sync with PMIx 2.0.1
> * Revert atomics behavior to that of the 2.x series
> * usnic, openib, and portals4 bug fixes
> * Add README notes about older versions of the XL compiler
> * Use UCX by default if found in usual locations
>
> We will be releasing rc5 later this week, which we hope will be the basis
> of the 3.0.0 release early next week.  So the rc4 release is the last
> chance to get bugs addressed.  Please give it a go on your systems.  Open
> MPI 3.0.0rc4 can be downloaded from:
>
> https://www.open-mpi.org/software/ompi/v3.0/
>
>
> Thanks,
>
> The Open MPI Team
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.1.2rc2] CMA build failure on Linux/SPARC64

2017-08-21 Thread Paul Hargrove
Takahiro,

This is a Debian/Sid system w/ glibc-2.24.

The patch you pointed me at does appear to fix the problem!
I will note this in your PRs.

-Paul

On Mon, Aug 21, 2017 at 9:17 PM, Kawashima, Takahiro <
t-kawash...@jp.fujitsu.com> wrote:

> Paul,
>
> Did you upgrade glibc or something? I suspect newer glibc
> supports process_vm_readv and process_vm_writev and output
> of configure script changed. My Linux/SPARC64 with old glibc
> can compile Open MPI 2.1.2rc2 (CMA is disabled).
>
> To fix this, we need to cherry-pick d984b4b. Could you test the
> d984b4b patch? I cannot test it because I cannot update glibc.
> If it is fine, I'll create a PR for v2.x branch.
>
>   https://github.com/open-mpi/ompi/commit/d984b4b
>
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
>
> > Two things to note:
> >
> > 1) This is *NOT* present in 3.0.0rc2, though I don't know what has
> > changed.
> >
> > 2) Here are the magic numbers:
> > /usr/include/sparc64-linux-gnu/asm/unistd.h:#define __NR_process_vm_readv 338
> > /usr/include/sparc64-linux-gnu/asm/unistd.h:#define __NR_process_vm_writev 339
> >
> > -Paul
> >
> > On Mon, Aug 21, 2017 at 6:56 PM, Paul Hargrove <phhargr...@lbl.gov>
> wrote:
> >
> > > Both the v9 and v8+ ABIs on a Linux/SPARC64 system are failing "make
> all"
> > > with the error below.
> > >
> > > -Paul
> > >
> > > make[2]: Entering directory '/home/phargrov/OMPI/openmpi-
> > > 2.1.2rc2-linux-sparcv9/BLD/opal/mca/btl/sm'
> > >   CC   mca_btl_sm_la-btl_sm.lo
> > > In file included from /home/phargrov/OMPI/openmpi-2.
> > > 1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/mca/btl/sm/btl_sm.c:45:0:
> > > /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> > > 2.1.2rc2/opal/include/opal/sys/cma.h:101:2: error: #error "Unsupported
> > > architecture for process_vm_readv and process_vm_writev syscalls"
> > >  #error "Unsupported architecture for process_vm_readv and
> > > process_vm_writev syscalls"
> > >   ^
> > > /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> > > 2.1.2rc2/opal/include/opal/sys/cma.h: In function  process_vm_readv:
> > > /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> > > 2.1.2rc2/opal/include/opal/sys/cma.h:113:18: error:
> > > __NR_process_vm_readv undeclared (first use in this function); did you
> > > mean process_vm_readv?
> > >return syscall(__NR_process_vm_readv, pid, lvec, liovcnt, rvec,
> > > riovcnt, flags);
> > >   ^
> > >   process_vm_readv
> > > /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> > > 2.1.2rc2/opal/include/opal/sys/cma.h:113:18: note: each undeclared
> > > identifier is reported only once for each function it appears in
> > > /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> > > 2.1.2rc2/opal/include/opal/sys/cma.h: In function 'process_vm_writev':
> > > /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> > > 2.1.2rc2/opal/include/opal/sys/cma.h:124:18: error:
> > > '__NR_process_vm_writev' undeclared (first use in this function); did you
> > > mean 'process_vm_writev'?
> > >return syscall(__NR_process_vm_writev, pid, lvec, liovcnt, rvec,
> > > riovcnt, flags);
> > >   ^~
> > >   process_vm_writev
> > > Makefile:1838: recipe for target 'mca_btl_sm_la-btl_sm.lo' failed
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] [2.1.2rc2] CMA build failure on Linux/SPARC64

2017-08-21 Thread Paul Hargrove
Two things to note:

1) This is *NOT* present in 3.0.0rc2, though I don't know what has changed.

2) Here are the magic numbers:
/usr/include/sparc64-linux-gnu/asm/unistd.h:#define __NR_process_vm_readv
338
/usr/include/sparc64-linux-gnu/asm/unistd.h:#define __NR_process_vm_writev
 339
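For illustration only, a fallback for the missing syscall numbers could look like the sketch below. It is not the change made in any Open MPI commit. It assumes that the values 338 and 339 quoted above apply whenever `__sparc__` is defined, and the helper names `cma_readv_nr` and `cma_writev_nr` are hypothetical.

```c
#include <sys/syscall.h>   /* pulls in the kernel's __NR_* definitions, if any */

/* Assumption: 338/339 are the SPARC Linux values quoted from asm/unistd.h. */
#if !defined(__NR_process_vm_readv) && defined(__sparc__)
#define __NR_process_vm_readv  338
#define __NR_process_vm_writev 339
#endif

/* Hypothetical helper: syscall number for process_vm_readv, or -1 if unknown. */
long cma_readv_nr(void)
{
#ifdef __NR_process_vm_readv
    return __NR_process_vm_readv;
#else
    return -1;
#endif
}

/* Hypothetical helper: syscall number for process_vm_writev, or -1 if unknown. */
long cma_writev_nr(void)
{
#ifdef __NR_process_vm_writev
    return __NR_process_vm_writev;
#else
    return -1;
#endif
}
```

Returning -1 instead of hitting `#error` would let a caller disable CMA at run time when the numbers are unknown; whether that is preferable to a configure-time check is a separate design question.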

-Paul

On Mon, Aug 21, 2017 at 6:56 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Both the v9 and v8+ ABIs on a Linux/SPARC64 system are failing "make all"
> with the error below.
>
> -Paul
>
> make[2]: Entering directory '/home/phargrov/OMPI/openmpi-
> 2.1.2rc2-linux-sparcv9/BLD/opal/mca/btl/sm'
>   CC   mca_btl_sm_la-btl_sm.lo
> In file included from /home/phargrov/OMPI/openmpi-2.
> 1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/mca/btl/sm/btl_sm.c:45:0:
> /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> 2.1.2rc2/opal/include/opal/sys/cma.h:101:2: error: #error "Unsupported
> architecture for process_vm_readv and process_vm_writev syscalls"
>  #error "Unsupported architecture for process_vm_readv and
> process_vm_writev syscalls"
>   ^
> /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> 2.1.2rc2/opal/include/opal/sys/cma.h: In function 'process_vm_readv':
> /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> 2.1.2rc2/opal/include/opal/sys/cma.h:113:18: error:
> '__NR_process_vm_readv' undeclared (first use in this function); did you
> mean 'process_vm_readv'?
>return syscall(__NR_process_vm_readv, pid, lvec, liovcnt, rvec,
> riovcnt, flags);
>   ^
>   process_vm_readv
> /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> 2.1.2rc2/opal/include/opal/sys/cma.h:113:18: note: each undeclared
> identifier is reported only once for each function it appears in
> /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> 2.1.2rc2/opal/include/opal/sys/cma.h: In function 'process_vm_writev':
> /home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-
> 2.1.2rc2/opal/include/opal/sys/cma.h:124:18: error:
> '__NR_process_vm_writev' undeclared (first use in this function); did you
> mean 'process_vm_writev'?
>return syscall(__NR_process_vm_writev, pid, lvec, liovcnt, rvec,
> riovcnt, flags);
>   ^~
>   process_vm_writev
> Makefile:1838: recipe for target 'mca_btl_sm_la-btl_sm.lo' failed
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.1.2rc2] CMA build failure on Linux/SPARC64

2017-08-21 Thread Paul Hargrove
Both the v9 and v8+ ABIs on a Linux/SPARC64 system are failing "make all"
with the error below.

-Paul

make[2]: Entering directory
'/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/BLD/opal/mca/btl/sm'
  CC   mca_btl_sm_la-btl_sm.lo
In file included from
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/mca/btl/sm/btl_sm.c:45:0:
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/include/opal/sys/cma.h:101:2:
error: #error "Unsupported architecture for process_vm_readv and
process_vm_writev syscalls"
 #error "Unsupported architecture for process_vm_readv and
process_vm_writev syscalls"
  ^
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/include/opal/sys/cma.h:
In function 'process_vm_readv':
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/include/opal/sys/cma.h:113:18:
error: '__NR_process_vm_readv' undeclared (first use in this function); did
you mean 'process_vm_readv'?
   return syscall(__NR_process_vm_readv, pid, lvec, liovcnt, rvec, riovcnt,
flags);
  ^
  process_vm_readv
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/include/opal/sys/cma.h:113:18:
note: each undeclared identifier is reported only once for each function it
appears in
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/include/opal/sys/cma.h:
In function 'process_vm_writev':
/home/phargrov/OMPI/openmpi-2.1.2rc2-linux-sparcv9/openmpi-2.1.2rc2/opal/include/opal/sys/cma.h:124:18:
error: '__NR_process_vm_writev' undeclared (first use in this function);
did you mean 'process_vm_writev'?
   return syscall(__NR_process_vm_writev, pid, lvec, liovcnt, rvec,
riovcnt, flags);
  ^~
  process_vm_writev
Makefile:1838: recipe for target 'mca_btl_sm_la-btl_sm.lo' failed

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI v2.1.2rc1 available

2017-08-15 Thread Paul Hargrove
I have not yet had a chance to run this RC through all its paces.

However, I can say that I have successfully built and run this RC on a
system with Apple's latest public Betas of Mac OS High Sierra and Xcode 9.

-Paul

On Thu, Aug 10, 2017 at 11:47 AM, Howard Pritchard 
wrote:

> Hi Folks,
>
>
> Open MPI v2.1.2rc1 tarballs are available for testing at the usual
>
> place:
>
> https://www.open-mpi.org/software/ompi/v2.1/
>
>
> There is an outstanding issue which will be fixed before the final release:
>
>
> https://github.com/open-mpi/ompi/issues/4069
>
>
> but we wanted to get an rc1 out to see what else we may need
>
> to fix.
>
>
> Bug fixes/changes in this release include:
>
>
> - Remove IB XRC support from the OpenIB BTL due to loss of maintainer.
> - Fix a problem with MPI_IALLTOALLW when using zero-length messages.
>   Thanks to Dahai Guo for reporting.
> - Fix a problem with C11 generic type interface for SHMEM_G.  Thanks
>   to Nick Park for reporting.
> - Switch to using the lustreapi.h include file when building Open MPI
>   with Lustre support.
> - Fix a problem in the OB1 PML that led to hangs with OSU collective tests.
> - Fix a progression issue with MPI_WIN_FLUSH_LOCAL.  Thanks to
>   Joseph Schuchart for reporting.
> - Fix an issue with recent versions of PBSPro requiring libcrypto.
>   Thanks to Petr Hanousek for reporting.
> - Fix a problem when using MPI_ANY_SOURCE with MPI_SENDRECV.
> - Fix an issue that prevented signals from being propagated to ORTE
>   daemons.
> - Ensure that signals are forwarded from ORTE daemons to all processes
>   in the process group created by the daemons.  Thanks to Ted Sussman
>   for reporting.
> - Fix a problem with launching a job under a debugger. Thanks to
>   Greg Lee for reporting.
> - Fix a problem with Open MPI native I/O MPI_FILE_OPEN when using
>   a communicator having an associated topology.  Thanks to
>   Wei-keng Liao for reporting.
> - Fix an issue when using MPI_ACCUMULATE with derived datatypes.
> - Fix a problem with Fortran bindings that led to compilation errors
>   for user defined reduction operations.  Thanks to Nathan Weeks for
>   reporting.
> - Fix ROMIO issues with large writes/reads when using NFS file systems.
> - Fix definition of Fortran MPI_ARGV_NULL and MPI_ARGVS_NULL.
> - Enable use of the head node of a SLURM allocation on Cray XC systems.
> - Fix a problem with synchronous sends when using the UCX PML.
> - Use default socket buffer size to improve TCP BTL performance.
>
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] v3.0.0 blocker issues

2017-08-01 Thread Paul Hargrove
#3993 in particular has "xlc" in the description, but appears to be caused
by bogus .m4.

-Paul

On Tue, Aug 1, 2017 at 3:07 PM, Barrett, Brian via devel <
devel@lists.open-mpi.org> wrote:

> Here’s the full list: https://github.com/open-mpi/
> ompi/issues?q=is%3Aissue%20is%3Aopen%20label%3A%22Target%3A%
> 203.0.x%22%20label%3A%22blocker%22
>
> There’s obviously a bunch of XLC issues in there, and IBM’s working on the
> right documentation / configure checks so that we feel comfortable
> releasing with at least documented XLC support.
>
> However, there’s a number of issues outside of XLC that, if nothing else,
> could use some updates on the tickets.  We can’t release 3.0.0 with a bunch
> of blocker bugs, so any time you can spend knocking down the bug list would
> be appreciated.
>
> Brian




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [3.0.0rc2] FreeBSD: divide-by-zero in hwloc

2017-07-31 Thread Paul Hargrove
On FreeBSD-11/amd64:

$ mpirun -mca btl vader,self -np 2 examples/ring_c
[freebsd-amd64:25312] *** Process received signal ***
[freebsd-amd64:25312] Signal: Floating point exception (8)
[freebsd-amd64:25312] Signal code: Integer divide-by-zero (2)
[freebsd-amd64:25312] Failing at address: 0x800c13822
[freebsd-amd64:25312] [ 0] 0x8016e5934  at
/lib/libthr.so.3
[freebsd-amd64:25312] [ 1] 0x8016e4ecf  at
/lib/libthr.so.3
[freebsd-amd64:25312] *** End of error message ***


gdb's backtrace, however, looks different:

#0  0x000800c13822 in look_proc (backend=0x802439c20,
infos=0x802433640, highest_cpuid=13,
highest_ext_cpuid=2147483658, features=0x7fffdd20,
cpuid_type=57921536)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c:458
458 cache->cacheid = infos->apicid / cache->nbthreads_sharing;
#1  0x000800c12749 in look_procs (backend=0x802439c20,
infos=0x802433640, fulldiscovery=1,
highest_cpuid=13, highest_ext_cpuid=2147483658,
features=0x7fffdd20, cpuid_type=57921536,
get_cpubind=0x800c0e3e0 ,
set_cpubind=0x800c0e390 )
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c:905
#2  0x000800c123dc in hwloc_look_x86 (backend=0x802439c20,
fulldiscovery=1)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c:1056
#3  0x000800c11efe in hwloc_x86_discover (backend=0x802439c20)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/hwloc1113/hwloc/src/topology-x86.c:1127
#4  0x000800c21d16 in hwloc_discover (topology=0x8024e6000)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/hwloc1113/hwloc/src/topology.c:2499
#5  0x000800c21b9c in opal_hwloc1113_hwloc_topology_load
(topology=0x8024e6000)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/hwloc1113/hwloc/src/topology.c:2994
#6  0x000800bf9768 in opal_hwloc_base_get_topology ()
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/opal/mca/hwloc/base/hwloc_base_util.c:310
#7  0x000803219c3a in rte_init ()
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/orte/mca/ess/hnp/ess_hnp_module.c:213
#8  0x000800840502 in orte_init (pargc=0x7fffe888,
pargv=0x7fffe880, flags=4)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/orte/runtime/orte_init.c:247
#9  0x00080088bae7 in orte_submit_init (argc=7, argv=0x7fffea18,
opts=0x0)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/orte/orted/orted_submit.c:524
#10 0x004012c3 in orterun (argc=7, argv=0x7fffea18)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/orte/tools/orterun/orterun.c:137
#11 0x00401292 in main (argc=7, argv=0x7fffea18)
at
/home/phargrov/OMPI/openmpi-3.0.0rc2-freebsd11-amd64/openmpi-3.0.0rc2/orte/tools/orterun/main.c:13
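The faulting statement in frame #0 divides by `cache->nbthreads_sharing`, so a zero in that field would produce the integer divide-by-zero reported above. A minimal sketch of a guard follows. It is not hwloc code: the helper name `compute_cacheid` is hypothetical, and treating a zero count as 1 (a cache private to one thread) is an assumption made only for the example.

```c
/* Hypothetical helper, not part of hwloc: derive a cache id from an APIC id
 * without dividing by zero when the sharing count was never filled in. */
unsigned compute_cacheid(unsigned apicid, unsigned nbthreads_sharing)
{
    /* Assumption: a zero count means "unknown", so fall back to 1. */
    if (nbthreads_sharing == 0) {
        nbthreads_sharing = 1;
    }
    return apicid / nbthreads_sharing;
}
```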

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [3.0.0rc2] pmix2x/libz link failure on NetBSD7

2017-07-31 Thread Paul Hargrove
I am seeing the following on NetBSD-7.2/amd64:

/bin/sh ../../../libtool  --tag=CC --mode=link gcc -std=gnu99  -g
-finline-functions -fno-strict-aliasing -mcx16 -pthread -o opal_wrapper
opal_wrapper.o ../../../opal/libopen-pal.la -lrt -lexecinfo -lm -lutil
libtool: link: gcc -std=gnu99 -g -finline-functions -fno-strict-aliasing
-mcx16 -pthread -o opal_wrapper opal_wrapper.o
 ../../../opal/.libs/libopen-pal.a -lpthread -lrt -lexecinfo -lm -lutil
-pthread
../../../opal/.libs/libopen-pal.a(compress.o): In function
`pmix_util_compress_string':
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:41:
undefined reference to `deflateInit_'
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:44:
undefined reference to `deflateBound'
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:57:
undefined reference to `deflate'
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:58:
undefined reference to `deflateEnd'
../../../opal/.libs/libopen-pal.a(compress.o): In function
`pmix_util_uncompress_string':
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:119:
undefined reference to `inflateInit_'
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:128:
undefined reference to `inflate'
/home/phargrov/OMPI/openmpi-3.0.0rc2-netbsd7-amd64-static/openmpi-3.0.0rc2/opal/mca/pmix/pmix2x/pmix/src/util/compress.c:129:
undefined reference to `inflateEnd'

It looks like configure found libz:
  checking for library containing deflate... -lz
and yet -lz doesn't appear in the link command.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [3.0.0rc2] yoinks, indeed

2017-07-31 Thread Paul Hargrove
I have an x86 Linux system where configuring Open MPI 3.0.0rc2 yields:

configure: WARNING: lib fabric requires both libnl v1 and libnl v3 --
yoinks!
configure: WARNING: This is a configuration that is known to cause run-time
crashes
configure: error: Cannot continue

I am happy to disable libfabric support to test this system.
However, if there is info somebody wants/needs to collect from this system
let me know.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [3.0.0rc2] spurious xlc configure failures

2017-07-31 Thread Paul Hargrove
With xlc-12.1 I am seeing spurious failures of configure tests due to
multiple definitions of OPAL_ASM_SYNC_HAVE_64BIT:

"conftest.c", line 211.9: 1506-236 (W) Macro name OPAL_ASM_SYNC_HAVE_64BIT
has been redefined.
"conftest.c", line 211.9: 1506-358 (I) "OPAL_ASM_SYNC_HAVE_64BIT" is
defined on line 209 of conftest.c.

Indeed the generated boilerplate in each conftest.c past a certain point
contains two *different* definitions of this macro on lines 209 and 211:

 [...]
 #define OPAL_ASM_SYNC_HAVE_64BIT
 #define OPAL_HAVE_GCC_BUILTIN_CSWAP_INT128 0
 #define OPAL_ASM_SYNC_HAVE_64BIT 1
 [...]
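C only allows a macro to be redefined when the new replacement list is identical to the old one, which is why an empty definition followed by a definition of `1` draws diagnostic 1506-236. The sketch below shows one way to end up with a single effective definition in plain C. It only illustrates the language rule; the real correction presumably belongs in the m4 that emits the two definitions, which is not shown here, and the function `sync_have_64bit` is hypothetical.

```c
/* First definition, mirroring line 209 of the generated conftest.c. */
#define OPAL_ASM_SYNC_HAVE_64BIT

/* Drop the earlier definition before assigning a different value, so the
 * compiler never sees two conflicting replacement lists for one name. */
#undef OPAL_ASM_SYNC_HAVE_64BIT
#define OPAL_ASM_SYNC_HAVE_64BIT 1

/* Hypothetical probe: reports the value the macro finally carries. */
int sync_have_64bit(void)
{
    return OPAL_ASM_SYNC_HAVE_64BIT;
}
```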


-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [3.0.0rc1] PMIX ERROR: UNPACK-INADEQUATE-SPACE

2017-07-04 Thread Paul Hargrove
I don't have an external PMIx.

Configure options: --prefix=[...] --enable-debug CC=... CXX=... FC=...
One of the two also had --with-libfabric=/usr/local/pkg/libfabric-1.4.0

If it might matter, one of these systems has 14 interfaces listed by
/usr/sbin/ifconfig and the other has 8.

-Paul

On Tue, Jul 4, 2017 at 9:46 AM, Artem Polyakov <artpo...@gmail.com> wrote:

> Hello, Paul.
> How was OMPI configured? Were you by chance using an external PMIx?
>
> On Mon, Jul 3, 2017 at 18:43, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>>
>> On (at least) two different hosts (both Linux, one x86-64 and one
>> ppc64el) I am seeing a failure to launch ring_c with errors like those
>> shown below.
>>
>> -Paul
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> [pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
>> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/
>> openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line
>> 2477
>> [pcp-d-1:02255] PMIX ERROR: ERROR in file /home/phargrov/OMPI/openmpi-3.
>> 0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/
>> pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 2024
>> [pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
>> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/
>> openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line
>> 1150
>> [pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
>> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/
>> openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c at
>> line 112
>> [pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
>> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/
>> openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c at
>> line 392
>> [pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
>> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/
>> openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server.c at
>> line 518
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
> --
> - Best regards, Artem Polyakov (Mobile mail)
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [3.0.0rc1] PMIX ERROR: UNPACK-INADEQUATE-SPACE

2017-07-03 Thread Paul Hargrove
On (at least) two different hosts (both Linux, one x86-64 and one ppc64el)
I am seeing a failure to launch ring_c with errors like those shown below.

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
[pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c
at line 2477
[pcp-d-1:02255] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c
at line 2024
[pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c
at line 1150
[pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c
at line 112
[pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/common/pmix_jobdata.c
at line 392
[pcp-d-1:02255] PMIX ERROR: UNPACK-INADEQUATE-SPACE in file
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-pathcc-6/openmpi-3.0.0rc1/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server.c
at line 518

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [3.0.0rc1] ppc64/gcc-4.8.3 check failure (regression).

2017-07-03 Thread Paul Hargrove
On a PPC64LE host with gcc-7.1.0 I see opal_fifo hang instead of failing.

-Paul

On Mon, Jul 3, 2017 at 4:39 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On a PPC64 host with gcc-4.8.3 I have configured with
>
>  --prefix=[...] --enable-debug \
> CFLAGS=-m64 --with-wrapper-cflags=-m64 \
> CXXFLAGS=-m64 --with-wrapper-cxxflags=-m64 \
> FCFLAGS=-m64 --with-wrapper-fcflags=-m64
>
> I see "make check" report a failure from opal_fifo.
> Previous testing of Open MPI 2.1.1rc1 did not fail this test.
>
>
> I also noticed the following warnings from building opal_lifo and
> opal_fifo tests, but have found that adding the volatile qualifier in
> opal_fifo.c did *not* resolve the failure.
>
>   CC   opal_lifo.o
> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/
> openmpi-3.0.0rc1/test/class/opal_lifo.c: In function
> 'check_lifo_consistency':
> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/
> openmpi-3.0.0rc1/test/class/opal_lifo.c:72:26: warning: assignment
> discards 'volatile' qualifier from pointer target type [enabled by default]
>  for (count = 0, item = lifo->opal_lifo_head.data.item ; item !=
> &lifo->opal_lifo_ghost ;
>   ^
>   CCLD opal_lifo
>   CC   opal_fifo.o
> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/
> openmpi-3.0.0rc1/test/class/opal_fifo.c: In function
> 'check_fifo_consistency':
> /home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/
> openmpi-3.0.0rc1/test/class/opal_fifo.c:109:26: warning: assignment
> discards 'volatile' qualifier from pointer target type [enabled by default]
>  for (count = 0, item = fifo->opal_fifo_head.data.item ; item !=
> &fifo->opal_fifo_ghost ;
>
>
>
> -Paul
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [3.0.0rc1] ppc64/gcc-4.8.3 check failure (regression).

2017-07-03 Thread Paul Hargrove
On a PPC64 host with gcc-4.8.3 I have configured with

 --prefix=[...] --enable-debug \
CFLAGS=-m64 --with-wrapper-cflags=-m64 \
CXXFLAGS=-m64 --with-wrapper-cxxflags=-m64 \
FCFLAGS=-m64 --with-wrapper-fcflags=-m64

I see "make check" report a failure from opal_fifo.
Previous testing of Open MPI 2.1.1rc1 did not fail this test.


I also noticed the following warnings from building opal_lifo and opal_fifo
tests, but have found that adding the volatile qualifier in opal_fifo.c did
*not* resolve the failure.

  CC   opal_lifo.o
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/openmpi-3.0.0rc1/test/class/opal_lifo.c:
In function 'check_lifo_consistency':
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/openmpi-3.0.0rc1/test/class/opal_lifo.c:72:26:
warning: assignment discards 'volatile' qualifier from pointer target type
[enabled by default]
 for (count = 0, item = lifo->opal_lifo_head.data.item ; item !=
&lifo->opal_lifo_ghost ;
  ^
  CCLD opal_lifo
  CC   opal_fifo.o
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/openmpi-3.0.0rc1/test/class/opal_fifo.c:
In function 'check_fifo_consistency':
/home/phargrov/OMPI/openmpi-3.0.0rc1-linux-ppc64-gcc/openmpi-3.0.0rc1/test/class/opal_fifo.c:109:26:
warning: assignment discards 'volatile' qualifier from pointer target type
[enabled by default]
 for (count = 0, item = fifo->opal_fifo_head.data.item ; item !=
&fifo->opal_fifo_ghost ;
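The "discards 'volatile' qualifier" warning arises when a pointer to volatile data is assigned to a pointer whose target type is not volatile. The sketch below shows the pattern with the qualifier preserved on the loop variable. The types `item_t` and `head_t` and the function `count_items` are hypothetical stand-ins, not the Open MPI definitions, and as noted above silencing the warning this way did not cure the test failure.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list item. */
typedef struct item {
    struct item *next;
} item_t;

/* Hypothetical stand-in for a list head holding a pointer to volatile data. */
typedef struct {
    volatile item_t *item;
} head_t;

/* Walk from the head to the ghost sentinel and count the items in between. */
size_t count_items(const head_t *head, const item_t *ghost)
{
    size_t count = 0;
    /* The iterator keeps 'volatile' on its target type, so nothing is discarded. */
    for (volatile item_t *it = head->item; it != ghost; it = it->next) {
        count++;
    }
    return count;
}
```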



-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [3.0.0rc1] ILP32 build failures

2017-07-03 Thread Paul Hargrove
Below is the corresponding failure at build time (rather than link time)
for "clang -m32" on Linux/x86-64.

-Paul

libtool: compile:  clang -DHAVE_CONFIG_H -I.
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal
-I../opal/include -I../ompi/include -I../oshmem/include
-I../opal/mca/hwloc/hwloc1113/hwloc/include/private/autogen
-I../opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/autogen
-I../ompi/mpiext/cuda/c
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1
-I..
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/orte/include
-I../orte/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/ompi/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/oshmem/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/BLD/opal/mca/event/libevent2022/libevent/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/mca/event/libevent2022/libevent
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/mca/event/libevent2022/libevent/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/BLD/opal/mca/hwloc/hwloc1113/hwloc/include
-I/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/mca/hwloc/hwloc1113/hwloc/include
-DOPAL_CONFIGURE_HOST=\"pcp-f-5\" -m32 -g -finline-functions
-fno-strict-aliasing -pthread -MT class/opal_list.lo -MD -MP -MF
class/.deps/opal_list.Tpo -c
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/class/opal_list.c
 -fPIC -DPIC -o class/.libs/opal_list.o
In file included from
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/class/opal_list.c:22:
In file included from
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/class/opal_list.h:73:
In file included from
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/class/opal_object.h:126:
In file included from
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/threads/thread_usage.h:30:
In file included from
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/include/opal/sys/atomic.h:155:
/scratch/phargrov/OMPI/openmpi-3.0.0rc1-linux-x86_64-clang-m32/openmpi-3.0.0rc1/opal/include/opal/sys/gcc_builtin/atomic.h:150:12:
error: cannot compile this atomic library call yet
return __atomic_add_fetch (addr, delta, __ATOMIC_RELAXED);
   ^~
1 error generated.
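The rejected statement is a 64-bit `__atomic_add_fetch` compiled for a 32-bit target. As far as I know, when no inline instruction sequence is available GCC instead emits a call into libatomic (for example `__atomic_fetch_add_8`), which is what the undefined reference in the gcc builds points to. The sketch below isolates the operation and adds a compile-time probe. Both function names are hypothetical, and the builtins are the GCC/clang `__atomic` family rather than anything specific to Open MPI.

```c
#include <stdint.h>

/* Hypothetical wrapper around the same builtin the failing header uses. */
int64_t add_fetch_64(volatile int64_t *addr, int64_t delta)
{
    return __atomic_add_fetch(addr, delta, __ATOMIC_RELAXED);
}

/* Hypothetical probe: nonzero when 8-byte atomics are always lock-free on
 * this target, i.e. when no library fallback should be needed. */
int have_inline_atomic_64(void)
{
    return __atomic_always_lock_free(sizeof(int64_t), 0);
}
```

On targets where the probe returns 0, linking against libatomic or falling back to a different atomics implementation are the usual options; which one suits Open MPI is not settled by this sketch.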

On Mon, Jul 3, 2017 at 3:55 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On every ILP32 build I try w/ gcc, I get the following.
>
> ../../../opal/.libs/libopen-pal.so: undefined reference to
> `__atomic_fetch_add_8'
> collect2: error: ld returned 1 exit status
> make[2]: *** [opal_wrapper] Error 1
>
> This includes at least x86 and ppc32 (probably MIPS and ARM, but those are
> too slow to have finished configure yet).
>
> -Paul
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [3.0.0rc1] XLC build error

2017-07-03 Thread Paul Hargrove
With xlc-12.1 and 13.1 (big-endian PPC64, both LP64 and ILP32) there are
numerous errors compiling ompi_datatype_module.c (the first few of which
are shown below).

-Paul


libtool: compile:  xlc -DHAVE_CONFIG_H -I.
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype
-I../../opal/include -I../../ompi/include -I../../oshmem/include
-I../../opal/mca/hwloc/hwloc1113/hwloc/include/private/autogen
-I../../opal/mca/hwloc/hwloc1113/hwloc/include/hwloc/autogen
-I../../ompi/mpiext/cuda/c
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1
-I../..
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/opal/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/orte/include
-I../../orte/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/oshmem/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/BLD/opal/mca/event/libevent2022/libevent/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/opal/mca/event/libevent2022/libevent
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/opal/mca/event/libevent2022/libevent/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/BLD/opal/mca/hwloc/hwloc1113/hwloc/include
-I/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/opal/mca/hwloc/hwloc1113/hwloc/include
-D_REENTRANT -q32 -g -c
/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c
-Wp,-qmakedep=gcc,-MF.deps/ompi_datatype_module.TPlo  -qpic -DPIC -o
.libs/ompi_datatype_module.o
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 166.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 166.54: 1506-196 (W) Initialization between types "unsigned int*" and
"int" is not allowed.
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 166.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 166.54: 1506-026 (S) Number of initializers cannot be greater than the
number of aggregate members.
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 167.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 167.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 167.54: 1506-196 (W) Initialization between types "unsigned int*" and
"int" is not allowed.
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 167.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 167.54: 1506-026 (S) Number of initializers cannot be greater than the
number of aggregate members.
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 168.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 168.54: 1506-196 (W) Initialization between types "unsigned int*" and
"int" is not allowed.
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 168.54: 1506-514 (S) Array element designator cannot be applied to an
object of type "struct opal_datatype_t".
"/home/hargrove/SCRATCH/OMPI/openmpi-3.0.0rc1-linux-ppc32-xlc-12.1/openmpi-3.0.0rc1/ompi/datatype/ompi_datatype_module.c",
line 168.54: 1506-026 (S) Number of initializers cannot be greater than the
number of aggregate members.
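
For readers unfamiliar with the wording of diagnostic 1506-514: an "array
element designator" is the C99 `[index] = value` form of initializer, which
is only valid when the object being initialized is an array. The sketch
below is not the code at line 166 of ompi_datatype_module.c (which is not
reproduced in this report). It uses a hypothetical `demo_type_t` and
hypothetical `DEMO_*` names purely to show the accepted form and, in a
comment, the kind of misuse a compiler would reject.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a datatype descriptor; not Open MPI's struct. */
typedef struct {
    size_t size;
    const char *name;
} demo_type_t;

enum { DEMO_INT = 0, DEMO_DOUBLE = 1, DEMO_MAX = 2 };

/* Valid: array element designators applied to an array object, with
 * member designators (.size, .name) applied to each struct element. */
static const demo_type_t demo_types[DEMO_MAX] = {
    [DEMO_INT]    = { .size = sizeof(int),    .name = "int"    },
    [DEMO_DOUBLE] = { .size = sizeof(double), .name = "double" },
};

/* Invalid (left commented out): an array element designator applied to a
 * struct object, which is the situation diagnostic 1506-514 describes.
 *
 * static const demo_type_t bad = { [DEMO_INT] = sizeof(int) };
 */

/* Return the size recorded for a hypothetical type id, or 0 if the id is
 * out of range. */
size_t demo_type_size(int id)
{
    if (id < 0 || id >= DEMO_MAX) {
        return 0;
    }
    assert(demo_types[id].name != NULL);
    return demo_types[id].size;
}
```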

[OMPI devel] [3.0.0rc1] ILP32 build failures

2017-07-03 Thread Paul Hargrove
On every ILP32 build I try w/ gcc, I get the following.

../../../opal/.libs/libopen-pal.so: undefined reference to
`__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1

This includes at least x86 and ppc32 (probably MIPS and ARM, but those are
too slow to have finished configure yet).
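
For context on the missing symbol: `__atomic_fetch_add_8` is the
out-of-line helper that GCC calls for an 8-byte `__atomic_fetch_add` when it
does not inline the operation for the target; the helper is provided by
GCC's libatomic. The sketch below is a minimal illustration using the
GCC/Clang `__atomic` builtins, not Open MPI code: `demo_counter`,
`demo_fetch_add64` and `demo_atomic64_is_lock_free` are hypothetical names.
On a target where the call is not inlined, a file like this would need
`-latomic` at link time. Whether that is the appropriate fix for the
3.0.0rc1 build is not something this report establishes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit counter; on ILP32 this is wider than a pointer. */
static uint64_t demo_counter = 0;

/* Atomically add 'delta' and return the previous value.
 * With GCC, an 8-byte operand may compile to a call to
 * __atomic_fetch_add_8 when no inline sequence is used. */
uint64_t demo_fetch_add64(uint64_t delta)
{
    assert(sizeof(demo_counter) == 8);
    return __atomic_fetch_add(&demo_counter, delta, __ATOMIC_SEQ_CST);
}

/* Report whether the compiler treats 8-byte atomics as always lock-free
 * on this target (nonzero) or may fall back to a library call (zero). */
int demo_atomic64_is_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(uint64_t), 0);
}
```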

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI v2.0.3rc1 available for testing

2017-05-27 Thread Paul Hargrove
On Fri, May 26, 2017 at 2:06 PM, Howard Pritchard 
wrote:

> Hi Folks,
>
> Open MPI v2.0.3rc1 tarballs are available on the download site for testing:
>
[...]

I am pleased to report that my testing found nothing new.

-Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI 3.x branch naming

2017-05-05 Thread Paul Hargrove
As a maintainer of non-MTT scripts that need to know the layout of the
directories containing nightly and RC tarballs, I also think that all the
changes should be done soon (and all together, not spread over months).

-Paul

On Fri, May 5, 2017 at 2:16 PM, George Bosilca  wrote:

> If we rebranch from master for every "major" release it makes sense to
> rename the branch. In the long term renaming seems like the way to go, and
> thus the pain of altering everything that depends on the naming will exist
> at some point. I am in favor of doing it asap (but I have no stakes in the
> game as UTK does not have an MTT).
>
>   George.
>
>
>
> On Fri, May 5, 2017 at 1:53 PM, Barrett, Brian via devel <
> devel@lists.open-mpi.org> wrote:
>
>> Hi everyone -
>>
>> We’ve been having discussions among the release managers about the choice
>> of naming the branch for Open MPI 3.0.0 as v3.x (as opposed to v3.0.x).
>> Because the current plan is that each “major” release (in the sense of the
>> three release points from master per year, not necessarily in increasing
>> the major number of the release number) is to rebranch off of master,
>> there’s a feeling that we should have named the branch v3.0.x, and then
>> named the next one 3.1.x, and so on.  If that’s the case, we should
>> consider renaming the branch and all the things that depend on the branch
>> (web site, which Jeff has already half-done; MTT testing; etc.).  The
>> disadvantage is that renaming will require everyone who’s configured MTT to
>> update their test configs.
>>
>> The first question is should we rename the branch?  While there would be
>> some ugly, there’s nothing that really breaks long term if we don’t.  Jeff
>> has stronger feelings than I have here.
>>
>> If we are going to rename the branch from v3.x to v3.0.x, my proposal
>> would be that we do it next Saturday evening (May 13th).  I’d create a new
>> branch from the current state of v3.x and then delete the old branch.  We’d
>> try to push all the PRs Friday so that there were no outstanding PRs that
>> would have to be reopened.  We’d then bug everyone to update their nightly
>> testing to pull from a different URL and update their MTT configs.  After a
>> week or two, we’d stop having tarballs available at both v3.x and v3.0.x on
>> the Open MPI web page.
>>
>> Thoughts?
>>
>> Brian
>
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI 2.1.1rc1 is up

2017-05-02 Thread Paul Hargrove
My testing was delayed since I was on vacation last week.
I have run my normal range of tests and have no new issues to report.

-Paul

On Thu, Apr 27, 2017 at 12:14 PM, Howard Pritchard 
wrote:

> Hi Open MPI developers,
>
> Open MPI 2.1.1rc1 is available for testing at the usual place:
>
> https://www.open-mpi.org/software/ompi/v2.1/
>
> Bug fixes in this release:
>
> - Add missing MPI_AINT_ADD/MPI_AINT_DIFF function definitions to mpif.h.
>   Thanks to Aboorva Devarajan for reporting.
>
> - Fix the error return from MPI_WIN_LOCK when rank argument is invalid.
>   Thanks to Jeff Hammond for reporting and fixing this issue.
>
> - Fix a problem with mpirun/orterun when started under a debugger. Thanks
>   to Gregory Leff for reporting.
>
> - Add configury option to disable use of CMA by the vader BTL.  Thanks
>to Sascha Hunold for reporting.
>
> - Add configury check for MPI_DOUBLE_COMPLEX datatype support.
>Thanks to Alexander Klein for reporting.
>
> - Fix memory allocated by MPI_WIN_ALLOCATE_SHARED to
>be 64 byte aligned.  Thanks to Joseph Schuchart for reporting.
>
> - Update MPI_WTICK man page to reflect possibly higher
>   resolution than 10e-6.  Thanks to Mark Dixon for
>   reporting.
>
> - Add missing MPI_T_PVAR_SESSION_NULL definition to mpi.h
>   include file.  Thanks to Omri Mor for this contribution.
>
> - Enhance the Open MPI spec file to install modulefile in /opt
>   if installed in a non-default location.  Thanks to Kevin
>   Buckley for reporting and supplying a fix.
>
> - Fix a problem with conflicting PMI symbols when linking statically.
>Thanks to Kilian Cavalotti for reporting.
>
> Please try it out if you have time.
>
> Thanks,
>
> Howard and Jeff
>
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI v2.1.0rc3 released

2017-03-09 Thread Paul Hargrove
I am getting "make install" failures on NetBSD that are the same as seen
once before:
   https://www.mail-archive.com/devel@lists.open-mpi.org/msg19906.html

The fix that time was, I understand, to build release candidates on a
different system.

Note that due to problems fixed between rc2 and rc3, I was not able to
reach "make install" with rc2.

-Paul

On Thu, Mar 9, 2017 at 7:58 AM, Jeff Squyres (jsquyres) 
wrote:

> In the usual location:
>
> https://www.open-mpi.org/software/ompi/v2.1/
>
> Fixes since rc2:
>
> - Fix Hargrove-identified fallocate issue
> - Fixed Hargrove-identified missing header file
> - Fixed potential multi-threading issue with hooking madvise
> - Lots of README updates
> - Add ConnectX-5 part IDs to openib BTL
> - Fix name collision in shared memory MPI-IO implementation
> - Fix uninitialized UCX request field
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Paul Hargrove
Sorry to be so pedantic, but I try to put myself in the position of the
clueless user (which is actually not that hard early in the morning w/o
sufficient coffee).

-Paul

On Tue, Mar 7, 2017 at 10:00 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Good point.  I just updated the FAQ item to include the v2.1.x text.
>
> Thanks!
>
>
> > On Mar 7, 2017, at 10:52 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > I initially did a Google search on the error text and "Open MPI FAQ"
> > Since the error message issued by 2.1.x no longer matches the text in
> the FAQ entry, my search did not find the entry.
> >
> > Not only is the FAQ entry text (error message) specific to 2.0.x, but so
> is the entry's title "I am using Open MPI 2.0.x and getting an error at
> application startup. How do I work around this?"
> >
> > So, I still think a new FAQ entry is needed OR the existing one should
> be generalized.
> >
> > -Paul
> >
> >
> > On Tue, Mar 7, 2017 at 9:15 AM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
> > Hi Paul
> >
> > There is an entry 8 under OS-X FAQ which describes this problem.
> >
> > Adding max allowable len is a good idea.
> >
> > Howard
> >
> > Paul Hargrove <phhargr...@lbl.gov> wrote on Tue, Mar 7, 2017 at 08:04:
> > The following is fairly annoying (though I understand the problem is
> real):
> >
> > $ [full-path-to]/mpirun -mca btl sm,self -np 2 examples/ring_c
> > PMIx has detected a temporary directory name that results
> > in a path that is too long for the Unix domain socket:
> >
> > Temp dir: /var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/
> openmpi-sessions-502@anlextwls026-173_0/53422
> >
> > Try setting your TMPDIR environmental variable to point to
> > something shorter in length
> >
> > Of course this comes from the fact that something outside my control has
> set TMPDIR to a session-specific directory (same value as $XDG_RUNTIME_DIR)
> >TMPDIR=/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/
> >
> > I am just reporting this for three minor reasons
> > 1) Just in case nobody was aware of this problem
> > 2) To request that an FAQ entry related to this be added
> > 3) Yes, the message is clear, but it could be improved by indicating the
> allowable length of $TMPDIR
> >
> > -Paul
> >
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> >
> >
> >
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Paul Hargrove
I initially did a Google search on the error text and "Open MPI FAQ"
Since the error message issued by 2.1.x no longer matches the text in the
FAQ entry, my search did not find the entry.

Not only is the FAQ entry text (error message) specific to 2.0.x, but so is
the entry's title "I am using Open MPI 2.0.x and getting an error at
application startup. How do I work around this?"

So, I still think a new FAQ entry is needed OR the existing one should be
generalized.

-Paul


On Tue, Mar 7, 2017 at 9:15 AM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> Hi Paul
>
> There is an entry 8 under OS-X FAQ which describes this problem.
>
> Adding max allowable len is a good idea.
>
> Howard
>
> Paul Hargrove <phhargr...@lbl.gov> wrote on Tue, Mar 7, 2017 at 08:04:
>
>> The following is fairly annoying (though I understand the problem is
>> real):
>>
>> $ [full-path-to]/mpirun -mca btl sm,self -np 2 examples/ring_c
>> PMIx has detected a temporary directory name that results
>> in a path that is too long for the Unix domain socket:
>>
>> Temp dir: /var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/
>> openmpi-sessions-502@anlextwls026-173_0/53422
>>
>> Try setting your TMPDIR environmental variable to point to
>> something shorter in length
>>
>> Of course this comes from the fact that something outside my control has
>> set TMPDIR to a session-specific directory (same value as $XDG_RUNTIME_DIR)
>>TMPDIR=/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/
>>
>> I am just reporting this for three minor reasons
>> 1) Just in case nobody was aware of this problem
>> 2) To request that an FAQ entry related to this be added
>> 3) Yes, the message is clear, but it could be improved by indicating the
>> allowable length of $TMPDIR
>>
>> -Paul
>>
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.1.0rc2] stupid run failure on Mac OS X Sierra

2017-03-07 Thread Paul Hargrove
The following is fairly annoying (though I understand the problem is real):

$ [full-path-to]/mpirun -mca btl sm,self -np 2 examples/ring_c
PMIx has detected a temporary directory name that results
in a path that is too long for the Unix domain socket:

Temp dir:
/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/openmpi-sessions-502@anlextwls026-173_0
/53422

Try setting your TMPDIR environmental variable to point to
something shorter in length

Of course this comes from the fact that something outside my control has
set TMPDIR to a session-specific directory (same value as $XDG_RUNTIME_DIR)
   TMPDIR=/var/folders/mg/q0_5yv791yz65cdnbglcqjvcgp/T/

I am just reporting this for three minor reasons
1) Just in case nobody was aware of this problem
2) To request that an FAQ entry related to this be added
3) Yes, the message is clear, but it could be improved by indicating the
allowable length of $TMPDIR
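
The length limit behind this message comes from the fixed-size `sun_path`
field of `struct sockaddr_un`, whose size varies by platform. The sketch
below shows one way the allowable length could be computed and checked.
`demo_max_socket_path` and `demo_socket_path_fits` are hypothetical helpers,
not PMIx or Open MPI functions, and the rendezvous file name that PMIx
appends to the session directory is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Longest path (excluding the terminating NUL) that fits in sun_path. */
size_t demo_max_socket_path(void)
{
    return sizeof(((struct sockaddr_un *)0)->sun_path) - 1;
}

/* Return 1 if 'path' can be stored in a sockaddr_un, 0 otherwise. */
int demo_socket_path_fits(const char *path)
{
    assert(path != NULL);
    return strlen(path) <= demo_max_socket_path();
}
```

An error message that printed the value of `demo_max_socket_path()` next to
the offending directory would give the user a concrete target when choosing
a shorter TMPDIR.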

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.1.0rc2] ring_c SEGV on OpenBSD/i386

2017-03-07 Thread Paul Hargrove
Both 2.1.0rc2 and 2.0.2 appear to crash about 1 run in every 5.
This probabilistic nature is why I did not notice it in 2.0.x.

-Paul

On Mon, Mar 6, 2017 at 7:58 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> I am traveling all this week and so don't know when I can take a look, but
> will try.
> -Paul
>
> On Mon, Mar 6, 2017 at 7:40 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> I’m not sure what could be going on here. I take it you were able to run
>> this example for the 2.0 series under this environment, yes? This code
>> hasn’t changed since that release, so I’m not sure why it would be failing
>> to resolve symbols now.
>>
>>
>> On Mar 6, 2017, at 2:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> RC2 tarball for 2.1.0 configured with only --prefix=...
>> and --enable-mca-no-build=patcher
>> I don't have time to dig right now:
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> [openbsd-i386:95593] *** Process received signal ***
>> 
>> --
>> mpirun noticed that process rank 1 with PID 0 on node openbsd-i386 exited
>> on signal 11 (Segmentation fault).
>> 
>> --
>>
>> $ gdb examples/ring_c ring_c.core
>> [...]
>> (gdb) where
>> #0  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7d49a000, name=0xc7d96ab
>> "strsignal", hash=Variable "hash" is
>> not available.
>> )
>> at /usr/src/libexec/ld.so/resolve.c:540
>> #1  0x0ff27f8d in _dl_find_symbol (name=0xc7d96ab "strsignal",
>> this=0x830f1584, flags=Variable "flags" is not
>> available.
>> )
>> at /usr/src/libexec/ld.so/resolve.c:669
>> #2  0x0ff2a75f in _dl_bind (object=0x7d49a600, index=3704) at
>> /usr/src/libexec/ld.so/i386/rtld_machine.c:387
>> #3  0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ld
>> asm.S:155
>> #4  0x7d49a600 in ?? ()
>> #5  0x0e78 in ?? ()
>> #6  0x0d560033 in __fgetwc_unlock (fp=0x1) at
>> /usr/src/lib/libc/stdio/fgetwc.c:65
>> #7  
>> #8  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7dd41c00, name=0xd48042f
>> "recv", hash=Variable "hash" is not available.
>> )
>> at /usr/src/libexec/ld.so/resolve.c:540
>> #9  0x0ff27f8d in _dl_find_symbol (name=0xd48042f "recv",
>> this=0x830f1c34, flags=Variable "flags" is not available.
>> )
>> at /usr/src/libexec/ld.so/resolve.c:669
>> #10 0x0ff2a75f in _dl_bind (object=0x82980e00, index=32) at
>> /usr/src/libexec/ld.so/i386/rtld_machine.c:387
>> #11 0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ld
>> asm.S:155
>> #12 0x82980e00 in ?? ()
>> #13 0x0020 in ?? ()
>> #14 0x0c820033 in opal_getcwd ()
>>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
>> libopen-pal.so.30.0
>> #15 0x0d4856e2 in mca_oob_usock_peer_recv_connect_ack ()
>>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
>> openmpi/mca_oob_usock.so
>> #16 0x0d48789e in mca_oob_usock_recv_handler ()
>>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
>> openmpi/mca_oob_usock.so
>> #17 0x0c82f11a in opal_libevent2022_event_base_loop (base=0x805b9000,
>> flags=1)
>> at /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/openmpi-2
>> .1.0rc2/opal/mca/event/libevent2022/libevent/event.c:1321
>> #18 0x0c7f16b4 in progress_engine ()
>>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
>> libopen-pal.so.30.0
>> #19 0x0b3cc852 in _rthread_start (v=0x7dd42428) at
>> /usr/src/lib/librthread/rthread.c:115
>> #20 0x0d5c4f82 in __tfork_thread () at /usr/src/lib/libc/arch/i386/sy
>> s/tfork_thread.S:95
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>>
>>
>>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.1.0rc2] ring_c SEGV on OpenBSD/i386

2017-03-06 Thread Paul Hargrove
I am traveling all this week and so don't know when I can take a look, but
will try.
-Paul

On Mon, Mar 6, 2017 at 7:40 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> I’m not sure what could be going on here. I take it you were able to run
> this example for the 2.0 series under this environment, yes? This code
> hasn’t changed since that release, so I’m not sure why it would be failing
> to resolve symbols now.
>
>
> On Mar 6, 2017, at 2:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> RC2 tarball for 2.1.0 configured with only --prefix=...
> and --enable-mca-no-build=patcher
> I don't have time to dig right now:
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> [openbsd-i386:95593] *** Process received signal ***
> --
> mpirun noticed that process rank 1 with PID 0 on node openbsd-i386 exited
> on signal 11 (Segmentation fault).
> --
>
> $ gdb examples/ring_c ring_c.core
> [...]
> (gdb) where
> #0  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7d49a000, name=0xc7d96ab
> "strsignal", hash=Variable "hash" is
> not available.
> )
> at /usr/src/libexec/ld.so/resolve.c:540
> #1  0x0ff27f8d in _dl_find_symbol (name=0xc7d96ab "strsignal",
> this=0x830f1584, flags=Variable "flags" is not
> available.
> )
> at /usr/src/libexec/ld.so/resolve.c:669
> #2  0x0ff2a75f in _dl_bind (object=0x7d49a600, index=3704) at
> /usr/src/libexec/ld.so/i386/rtld_machine.c:387
> #3  0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/
> ldasm.S:155
> #4  0x7d49a600 in ?? ()
> #5  0x0e78 in ?? ()
> #6  0x0d560033 in __fgetwc_unlock (fp=0x1) at /usr/src/lib/libc/stdio/
> fgetwc.c:65
> #7  
> #8  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7dd41c00, name=0xd48042f
> "recv", hash=Variable "hash" is not available.
> )
> at /usr/src/libexec/ld.so/resolve.c:540
> #9  0x0ff27f8d in _dl_find_symbol (name=0xd48042f "recv", this=0x830f1c34,
> flags=Variable "flags" is not available.
> )
> at /usr/src/libexec/ld.so/resolve.c:669
> #10 0x0ff2a75f in _dl_bind (object=0x82980e00, index=32) at
> /usr/src/libexec/ld.so/i386/rtld_machine.c:387
> #11 0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/
> ldasm.S:155
> #12 0x82980e00 in ?? ()
> #13 0x0020 in ?? ()
> #14 0x0c820033 in opal_getcwd ()
>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
> libopen-pal.so.30.0
> #15 0x0d4856e2 in mca_oob_usock_peer_recv_connect_ack ()
>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
> openmpi/mca_oob_usock.so
> #16 0x0d48789e in mca_oob_usock_recv_handler ()
>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
> openmpi/mca_oob_usock.so
> #17 0x0c82f11a in opal_libevent2022_event_base_loop (base=0x805b9000,
> flags=1)
> at /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/openmpi-
> 2.1.0rc2/opal/mca/event/libevent2022/libevent/event.c:1321
> #18 0x0c7f16b4 in progress_engine ()
>from /home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/
> libopen-pal.so.30.0
> #19 0x0b3cc852 in _rthread_start (v=0x7dd42428) at /usr/src/lib/librthread/
> rthread.c:115
> #20 0x0d5c4f82 in __tfork_thread () at /usr/src/lib/libc/arch/i386/
> sys/tfork_thread.S:95
>
> -Paul
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.1.0rc2] PMIX failure running ring_c on NetBSD

2017-03-06 Thread Paul Hargrove
2.1.0rc2 tarball on NetBSD7/amd64.
Configured with only --prefix=... and --disable-mpi-fortran

To get past the lack of a struct timeval definition required a small source
change in a previous email.
Once past that, I can build Open MPI and compile the examples.
However, I cannot run them.

Output below.

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
[netbsd-amd64.kvm:20873] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 1651
[netbsd-amd64.kvm:20873] PMIX ERROR: OUT-OF-RESOURCE in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 820
[netbsd-amd64.kvm:20873] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 1468
[netbsd-amd64.kvm:20873] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c
at line 592
[netbsd-amd64.kvm:16632] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 349
[netbsd-amd64.kvm:16632] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 839
[netbsd-amd64.kvm:16632] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 1021
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
[netbsd-amd64.kvm:19230] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 349
[netbsd-amd64.kvm:19230] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 839
[netbsd-amd64.kvm:19230] PMIX ERROR: ERROR in file
/home/phargrov/OMPI/openmpi-2.1.0rc2-netbsd7-amd64/openmpi-2.1.0rc2/opal/mca/pmix/pmix112/pmix/src/dstore/pmix_esh.c
at line 1021
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[netbsd-amd64.kvm:19230] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
[netbsd-amd64.kvm:16632] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[31721,1],0]
  Exit code:1
--


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National 

[OMPI devel] [2.1.0rc2] PMIX build failures

2017-03-06 Thread Paul Hargrove
Ralph,

I found a couple of issues with PMIX in the 2.1.0rc2 tarball.
However, I am providing a proper fix for one and a sub-standard fix for the
other.

The following short patch resolves the build errors with "unknown type name
'pid_t'" on (so far at least) FreeBSD when compiling
opal/mca/pmix/pmix112/pmix/src/sm/pmix_sm.c

--- opal/mca/pmix/pmix112/pmix/include/pmix/pmix_common.h~  2017-03-06
13:58:18.625897000 -0800
+++ opal/mca/pmix/pmix112/pmix/include/pmix/pmix_common.h   2017-03-06
13:58:24.392739000 -0800
@@ -50,6 +50,9 @@
 #include 
 #include 
 #include 
+#ifdef HAVE_UNISTD_H
+#include <unistd.h> /* for pid_t */
+#endif
 #ifdef HAVE_SYS_TIME_H
 #include <sys/time.h> /* for struct timeval */
 #endif
Signed-off-by: Paul H. Hargrove 

The second issue is an incomplete definition for "struct timeval" on
FreeBSD and NetBSD (maybe others).
In pmix_common.h I see:

#ifdef HAVE_SYS_TIME_H
#include <sys/time.h> /* for struct timeval */
#endif

However, since at least one source file is including pmix_common.h *before*
pmix_config.h, HAVE_SYS_TIME_H is not getting defined.
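
To make the ordering dependence concrete, here is a minimal sketch (not the
PMIx sources). The #define stands in for what a generated config header such
as pmix_config.h would provide, and timeval_to_usec is a hypothetical helper
invented for illustration. If the define came after the guarded include, the
include would be skipped and "struct timeval" could be left incomplete on
systems where no other header happens to pull it in.

```c
/* Hypothetical stand-in for the generated config header.
 * It must appear BEFORE any header that tests HAVE_SYS_TIME_H. */
#define HAVE_SYS_TIME_H 1

/* Stand-in for the guarded include in the public header. */
#ifdef HAVE_SYS_TIME_H
#include <sys/time.h> /* for struct timeval */
#endif

/* Hypothetical helper: needs the complete definition of struct timeval
 * because it takes the struct by value and reads its members. */
long timeval_to_usec(struct timeval tv)
{
    return (long)tv.tv_sec * 1000000L + (long)tv.tv_usec;
}
```

Moving the #define below the #ifdef block reproduces the ordering described
above.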

The following was sufficient for the *one* file I saw failing to compile.
However, I seriously doubt this is the right way (or place) to fix this
(and am correspondingly omitting a sign-off).
--- opal/mca/pmix/pmix112/pmix/src/sm/pmix_sm.c~2017-03-06
14:03:42.0 -0800
+++ opal/mca/pmix/pmix112/pmix/src/sm/pmix_sm.c 2017-03-06
14:13:22.0 -0800
@@ -8,6 +8,7 @@
  * $HEADER$
  */

+#include "src/include/pmix_config.h"
 #include 
 #include "src/include/pmix_globals.h"
 #include "pmix_sm.h"

Unfortunately, on NetBSD that just gets me past compilation to fail at
runtime - the subject of my next email.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [2.1.0rc2] ring_c SEGV on OpenBSD/i386

2017-03-06 Thread Paul Hargrove
RC2 tarball for 2.1.0 configured with only --prefix=...
and --enable-mca-no-build=patcher
I don't have time to dig right now:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
[openbsd-i386:95593] *** Process received signal ***
--
mpirun noticed that process rank 1 with PID 0 on node openbsd-i386 exited
on signal 11 (Segmentation fault).
--

$ gdb examples/ring_c ring_c.core
[...]
(gdb) where
#0  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7d49a000, name=0xc7d96ab
"strsignal", hash=Variable "hash" is
not available.
)
at /usr/src/libexec/ld.so/resolve.c:540
#1  0x0ff27f8d in _dl_find_symbol (name=0xc7d96ab "strsignal",
this=0x830f1584, flags=Variable "flags" is not
available.
)
at /usr/src/libexec/ld.so/resolve.c:669
#2  0x0ff2a75f in _dl_bind (object=0x7d49a600, index=3704) at
/usr/src/libexec/ld.so/i386/rtld_machine.c:387
#3  0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ldasm.S:155
#4  0x7d49a600 in ?? ()
#5  0x0e78 in ?? ()
#6  0x0d560033 in __fgetwc_unlock (fp=0x1) at
/usr/src/lib/libc/stdio/fgetwc.c:65
#7  
#8  0x0ff27cf3 in _dl_find_symbol_obj (object=0x7dd41c00, name=0xd48042f
"recv", hash=Variable "hash" is not available.
)
at /usr/src/libexec/ld.so/resolve.c:540
#9  0x0ff27f8d in _dl_find_symbol (name=0xd48042f "recv", this=0x830f1c34,
flags=Variable "flags" is not available.
)
at /usr/src/libexec/ld.so/resolve.c:669
#10 0x0ff2a75f in _dl_bind (object=0x82980e00, index=32) at
/usr/src/libexec/ld.so/i386/rtld_machine.c:387
#11 0x0ff26637 in _dl_bind_start () at /usr/src/libexec/ld.so/i386/ldasm.S:155
#12 0x82980e00 in ?? ()
#13 0x0020 in ?? ()
#14 0x0c820033 in opal_getcwd ()
   from
/home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/libopen-pal.so.30.0
#15 0x0d4856e2 in mca_oob_usock_peer_recv_connect_ack ()
   from
/home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/openmpi/mca_oob_usock.so
#16 0x0d48789e in mca_oob_usock_recv_handler ()
   from
/home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/openmpi/mca_oob_usock.so
#17 0x0c82f11a in opal_libevent2022_event_base_loop (base=0x805b9000,
flags=1)
at
/home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/openmpi-2.1.0rc2/opal/mca/event/libevent2022/libevent/event.c:1321
#18 0x0c7f16b4 in progress_engine ()
   from
/home/phargrov/OMPI/openmpi-2.1.0rc2-openbsd6-i386/INST/lib/libopen-pal.so.30.0
#19 0x0b3cc852 in _rthread_start (v=0x7dd42428) at
/usr/src/lib/librthread/rthread.c:115
#20 0x0d5c4f82 in __tfork_thread () at
/usr/src/lib/libc/arch/i386/sys/tfork_thread.S:95

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] OMPI v1.10.6rc1 ready for test

2017-02-02 Thread Paul Hargrove
Sorry for the delayed response.
I have completed my normal RC testing and have nothing to report.

-Paul

On Mon, Jan 30, 2017 at 1:03 PM, r...@open-mpi.org  wrote:

> Usual place: https://www.open-mpi.org/software/ompi/v1.10/
>
> Scheduled release: Fri Feb 3rd
>
> 1.10.6
> --
> - Fix bug in timer code that caused problems at optimization settings
>   greater than 2
> - OSHMEM: make mmap allocator the default instead of sysv or verbs
> - Support MPI_Dims_create with dimension zero
> - Update USNIC support
> - Prevent 64-bit overflow on timer counter
> - Add support for forwarding signals
> - Fix bug that caused truncated messages on large sends over TCP BTL
> - Fix potential infinite loop when printing a stacktrace
>
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.0.2rc4] "make install" failure on NetBSD/i386 (libtool?)

2017-01-28 Thread Paul Hargrove
Howard,

Both 2.0.2rc2 and 2.0.2rc3 builds just now are also failing in the same
manner, though I have records of at least a successful 2.0.2rc2 build
previously.
So, I am suspicious that something has changed with the updates applied
recently to this system, such as to binutils.

I am travelling all of next week (and finishing slides for that trip this
weekend).
So, I don't have time to poke at this in depth.

Since all my other BSD systems, including NetBSD/amd64, are error-free
*and* the older tarballs are also failing, I don't see this as a release
blocker (even if the eventual fix belongs in Open MPI's configury).

-Paul

On Sat, Jan 28, 2017 at 12:44 PM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> HI Paul,
>
> This might be a result of building the tarball on a new system.
> Would you mind trying the rc3 tarball and see if that builds on the
> system?
>
>
> Howard
>
>
>
> 2017-01-27 15:12 GMT-07:00 Paul Hargrove <phhargr...@lbl.gov>:
>
>> I had no problem with 2.0.2rc3 on NetBSD, but with 2.0.2rc4 I am seeing a
>> "make install" failure (below).
>> This is seen on an x86 (32-bit) platform, but not x86_64.
>> I cannot say for certain that this is an Open MPI regression, since there
>> *have* been s/w updates on this system since I last tested.
>>
>> Configured with only --prefix and --disable-mpi-fortran (due to
>> https://github.com/open-mpi/ompi/issues/184)
>>
>> -Paul
>>
>> $ env LANG=C make install
>> [...]
>> Making install in mca/btl/sm
>>  
>> /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/openmpi-2.0.2rc4/config/install-sh
>> -c -d '/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/shar
>> e/openmpi' /usr/bin/install -c -m 644 /home/phargrov/OMPI/openmpi-2.
>> 0.2rc4-netbsd7-i386/openmpi-2.0.2rc4/opal/mca/btl/sm/help-mpi-btl-sm.txt
>> '/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/share/openmpi'
>>  
>> /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/openmpi-2.0.2rc4/config/install-sh
>> -c -d '/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/lib/
>> openmpi'
>>  /bin/sh ../../../../libtool   --mode=install /usr/bin/install -c
>> mca_btl_sm.la '/home/phargrov/OMPI/openmpi-2
>> .0.2rc4-netbsd7-i386/INST/lib/openmpi'
>> libtool: warning: relinking 'mca_btl_sm.la'
>> libtool: install: (cd /home/phargrov/OMPI/openmpi-2.
>> 0.2rc4-netbsd7-i386/BLD/opal/mca/btl/sm; /bin/sh "/home/ph
>> argrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/libtool"  --tag CC
>> --mode=relink gcc -std=gnu99 -O3 -DNDEBUG -finline-functions
>> -fno-strict-aliasing -pthread -module -avoid-version -o mca_btl_sm.la
>> -rpath /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/lib/openmpi
>> mca_btl_sm_la-btl_sm.lo mca_btl_sm_la-btl_sm_component.lo 
>> mca_btl_sm_la-btl_sm_frag.lo
>> /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/opal/m
>> ca/common/sm/libmca_common_sm.la -lrt -lexecinfo -lm -lutil )
>>
>> *** Warning: linker path does not have real file for library
>> -lmca_common_sm.
>> *** I have the capability to make that library automatically link in when
>> *** you link to this library.  But I can only do this if you have a
>> *** shared version of the library, which you do not appear to have
>> *** because I did check the linker path looking for a file starting
>> *** with libmca_common_sm and none of the candidates passed a file format
>> test
>> *** using a regex pattern. Last file checked:
>> /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/opal/mca/c
>> ommon/sm/.libs/libmca_common_sm.so.20.0
>>
>> *** Warning: libtool could not satisfy all declared inter-library
>> *** dependencies of module mca_btl_sm.  Therefore, libtool will create
>> *** a static module, that should work as long as the dlopening
>> *** application is linked with the -dlopen flag.
>> libtool: relink: ar cru .libs/mca_btl_sm.a .libs/mca_btl_sm_la-btl_sm.o
>> .libs/mca_btl_sm_la-btl_sm_component.o
>>  .libs/mca_btl_sm_la-btl_sm_frag.o
>> libtool: relink: ranlib .libs/mca_btl_sm.a
>> libtool: relink: ( cd ".libs" && rm -f "mca_btl_sm.la" && ln -s "../
>> mca_btl_sm.la" "mca_btl_sm.la" )
>> libtool: install: /usr/bin/install -c .libs/mca_btl_sm.soT
>> /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/I
>> NST/lib/openmpi/mca_btl_sm.so
>> install: .libs/mca_btl_sm.soT: stat: No such file or directory
>> *** Error code 1
>>
>> Stop.
>>
>>
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Co

Re: [OMPI devel] Reminder: assign as well as request review

2017-01-27 Thread Paul Hargrove
I am so often the guy complaining about what is busted.
So, it feels nice to have contributed something *positive* on this list.

-Paul

On Fri, Jan 27, 2017 at 5:42 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Thanks Paul - that does indeed help!
>
> On Jan 27, 2017, at 12:26 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Ralph,
>
> It looks like GitHub *might* have rolled out the solution to your problem
> just this week:
>https://github.com/blog/2306-filter-pull-request-reviews-
> and-review-requests
>
> This appears to include an "Awaiting review from you" filter.
> Not quite a dashboard or notification, but at least a way to make the
> query.
>
> -Paul
>
>
> On Fri, Jan 27, 2017 at 7:46 AM, r...@open-mpi.org <r...@open-mpi.org>
> wrote:
>
>> Hey folks
>>
>> Just a reminder. If you request a review from someone, GitHub doesn’t
>> show that person’s icon when looking at the list of PRs. It only shows
>> their icon and marks the PR with their ID if you actually “assign” it to
>> that person. Thus, just requesting a review without assigning the PR to
>> someone makes it impossible for them to see which PRs are awaiting their
>> attention.
>>
>> Speaking personally, I have no idea which PRs are awaiting my attention
>> unless you assign them to me. So please remember to do so.
>>
>> Thanks
>> Ralph
>>
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.2rc4] "make install" failure on NetBSD/i386 (libtool?)

2017-01-27 Thread Paul Hargrove
I had no problem with 2.0.2rc3 on NetBSD, but with 2.0.2rc4 I am seeing a
"make install" failure (below).
This is seen on an x86 (32-bit) platform, but not x86_64.
I cannot say for certain that this is an Open MPI regression, since there
*have* been s/w updates on this system since I last tested.

Configured with only --prefix and --disable-mpi-fortran (due to
https://github.com/open-mpi/ompi/issues/184)

-Paul

$ env LANG=C make install
[...]
Making install in mca/btl/sm
 
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/openmpi-2.0.2rc4/config/install-sh
-c -d '/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/share/openmpi'
/usr/bin/install
-c -m 644
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/openmpi-2.0.2rc4/opal/mca/btl/sm/help-mpi-btl-sm.txt
'/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/share/openmpi'
 
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/openmpi-2.0.2rc4/config/install-sh
-c -d '/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/lib/openmpi'
 /bin/sh ../../../../libtool   --mode=install /usr/bin/install -c
mca_btl_sm.la '/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/lib/openmpi'
libtool: warning: relinking 'mca_btl_sm.la'
libtool: install: (cd
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/opal/mca/btl/sm;
/bin/sh "/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/libtool"
 --tag CC --mode=relink gcc -std=gnu99 -O3 -DNDEBUG -finline-functions
-fno-strict-aliasing -pthread -module -avoid-version -o mca_btl_sm.la
-rpath /home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/lib/openmpi
mca_btl_sm_la-btl_sm.lo mca_btl_sm_la-btl_sm_component.lo
mca_btl_sm_la-btl_sm_frag.lo
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/opal/mca/common/sm/libmca_common_sm.la
-lrt -lexecinfo -lm -lutil )

*** Warning: linker path does not have real file for library
-lmca_common_sm.
*** I have the capability to make that library automatically link in when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have
*** because I did check the linker path looking for a file starting
*** with libmca_common_sm and none of the candidates passed a file format
test
*** using a regex pattern. Last file checked:
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/BLD/opal/mca/common/sm/.libs/libmca_common_sm.so.20.0

*** Warning: libtool could not satisfy all declared inter-library
*** dependencies of module mca_btl_sm.  Therefore, libtool will create
*** a static module, that should work as long as the dlopening
*** application is linked with the -dlopen flag.
libtool: relink: ar cru .libs/mca_btl_sm.a .libs/mca_btl_sm_la-btl_sm.o
.libs/mca_btl_sm_la-btl_sm_component.o
 .libs/mca_btl_sm_la-btl_sm_frag.o
libtool: relink: ranlib .libs/mca_btl_sm.a
libtool: relink: ( cd ".libs" && rm -f "mca_btl_sm.la" && ln -s "../mca_btl_sm.la" "mca_btl_sm.la" )
libtool: install: /usr/bin/install -c .libs/mca_btl_sm.soT
/home/phargrov/OMPI/openmpi-2.0.2rc4-netbsd7-i386/INST/lib/openmpi/mca_btl_sm.so
install: .libs/mca_btl_sm.soT: stat: No such file or directory
*** Error code 1

Stop.



-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Reminder: assign as well as request review

2017-01-27 Thread Paul Hargrove
Ralph,

It looks like GitHub *might* have rolled out the solution to your problem
just this week:

https://github.com/blog/2306-filter-pull-request-reviews-and-review-requests

This appears to include an "Awaiting review from you" filter.
Not quite a dashboard or notification, but at least a way to make the query.

-Paul


On Fri, Jan 27, 2017 at 7:46 AM, r...@open-mpi.org  wrote:

> Hey folks
>
> Just a reminder. If you request a review from someone, GitHub doesn’t show
> that person’s icon when looking at the list of PRs. It only shows their
> icon and marks the PR with their ID if you actually “assign” it to that
> person. Thus, just requesting a review without assigning the PR to someone
> makes it impossible for them to see which PRs are awaiting their attention.
>
> Speaking personally, I have no idea which PRs are awaiting my attention
> unless you assign them to me. So please remember to do so.
>
> Thanks
> Ralph
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.2rc3] build failure ppc64/-m32 and builtin-atomics

2017-01-05 Thread Paul Hargrove
I have a standard Linux/ppc64 system with gcc-4.8.3
I have configured the 2.0.2rc3 tarball with

--prefix=... --enable-builtin-atomics \
CFLAGS=-m32 --with-wrapper-cflags=-m32 \
CXXFLAGS=-m32 --with-wrapper-cxxflags=-m32 \
FCFLAGS=-m32 --with-wrapper-fcflags=-m32 --disable-mpi-fortran

(Yes, I know the FCFLAGS are unnecessary).

I get a "make check" failure:

make[3]: Entering directory
`/home/phargrov/OMPI/openmpi-2.0.2rc3-linux-ppc32-gcc/BLD/test/asm'
  CC   atomic_barrier.o
  CCLD atomic_barrier
  CC   atomic_barrier_noinline-atomic_barrier_noinline.o
  CCLD atomic_barrier_noinline
  CC   atomic_spinlock.o
  CCLD atomic_spinlock
  CC   atomic_spinlock_noinline-atomic_spinlock_noinline.o
  CCLD atomic_spinlock_noinline
  CC   atomic_math.o
  CCLD atomic_math
atomic_math.o: In function `atomic_math_test':
atomic_math.c:(.text+0x78): undefined reference to `__sync_add_and_fetch_8'
collect2: error: ld returned 1 exit status
make[3]: *** [atomic_math] Error 1


It looks like there is an (incorrect) assumption that 8-byte atomics are
available.
Removing --enable-builtin-atomics resolves this issue.
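
A minimal sketch of the kind of guard that would avoid this assumption, not
taken from the Open MPI sources. It relies on the GCC-predefined macro
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8, which GCC defines only when the target
supports 8-byte compare-and-swap natively. The function name
add_and_fetch_64 and the lock-based fallback are hypothetical, chosen only
to show the idea.

```c
#include <stdint.h>

/* Adds delta to *addr and returns the new value.
 * Hypothetical wrapper: uses the 8-byte builtin only when the compiler
 * reports native support, so no __sync_add_and_fetch_8 library call is
 * ever required at link time. */
int64_t add_and_fetch_64(volatile int64_t *addr, int64_t delta)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
    /* Native 8-byte support: the builtin expands inline. */
    return __sync_add_and_fetch(addr, delta);
#else
    /* No native 8-byte support (e.g. some 32-bit targets): serialize
     * updates with a spinlock built from an int-sized builtin. */
    static volatile int fallback_lock = 0;
    int64_t result;

    while (__sync_lock_test_and_set(&fallback_lock, 1)) {
        /* spin until the lock is released */
    }
    *addr += delta;
    result = *addr;
    __sync_lock_release(&fallback_lock);
    return result;
#endif
}
```

The fallback is only safe if every access to the 8-byte value goes through
the same lock, so a real fix would need to cover loads and stores as well.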

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.2rc2] opal_fifo hang w/ --enable-osx-builtin-atomics

2017-01-04 Thread Paul Hargrove
On Macs running Yosemite (OS X 10.10 w/ Xcode 7.1) and El Capitan (OS X
10.11 w/ Xcode 8.1) I have configured with
CC=cc CXX=c++ FC=/sw/bin/gfortran --prefix=...
--enable-osx-builtin-atomics

Upon running "make check", the test "opal_fifo" hangs on both systems.
Without the --enable-osx-builtin-atomics things are fine.

I don't have data for Sierra (10.12).

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.2rc2] FreeBSD-11 run failure

2017-01-04 Thread Paul Hargrove
With the 2.0.2rc2 tarball on FreeBSD-11 (i386 or amd64) I am configuring
with:
 --prefix=... CC=clang CXX=clang++ --disable-mpi-fortran

I get a failure running ring_c:

mpirun -mca btl sm,self -np 2 examples/ring_c
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
+ exit 1

When I configure with either "--disable-dlopen" OR "--enable-static
--disable-shared" the problem vanishes.
So, I suspect a dlopen-related issue.

I will

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI v2.0.2rc1 is up

2016-12-19 Thread Paul Hargrove
Resent on Friday afternoon:
https://www.mail-archive.com/devel@lists.open-mpi.org//msg19836.html

-Paul

On Mon, Dec 19, 2016 at 10:21 AM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> HI Paul,
>
> Would you mind resending the  "runtime error w/ PGI usempif08 on
> OpenPOWER"
> email without the config.log attached?
>
> Thanks,
>
> Howard
>
>
> 2016-12-16 12:17 GMT-07:00 Howard Pritchard <hpprit...@gmail.com>:
>
>> HI Paul,
>>
>> Thanks for checking the rc out.  And for noting the grammar
>> mistake.
>>
>> Howard
>>
>>
>> 2016-12-16 1:00 GMT-07:00 Paul Hargrove <phhargr...@lbl.gov>:
>>
>>> My testing is complete.
>>>
>>> The only problems not already known are related to PGI's recent
>>> "Community Edition" compilers and have been reported in three separate
>>> emails:
>>>
>>> [2.0.2rc1] Fortran link failure with PGI fortran on MacOSX
>>> <https://mail-archive.com/devel@lists.open-mpi.org/msg19823.html>
>>> [2.0.2rc1] Build failure w/ PGI compilers on Mac OS X
>>> <https://mail-archive.com/devel@lists.open-mpi.org/msg19824.html>
>>> [2.0.2rc1] runtime error w/ PGI usempif08 on OpenPOWER
>>>
>>> For some reason the last one does not appear in the archive!
>>> Perhaps the config.log.bz2 I attached was too large?
>>> Let me know if I should resend it.
>>>
>>> BTW:  a typo in the ChangeLog of the announcement email:
>>>
>>> - Fix a problem with early exit of a MPI process without calling
>>> MPI_FINALIZE
>>>   of MPI_ABORT that could lead to job hangs.  Thanks to Christof
>>> Koehler for
>>>   reporting.
>>>
>>> The "of" that begins the second line was almost certainly intended to be
>>> "or".
>>>
>>> -Paul
>>>
>>> On Wed, Dec 14, 2016 at 6:58 PM, Jeff Squyres (jsquyres) <
>>> jsquy...@cisco.com> wrote:
>>>
>>>> Please test!
>>>>
>>>> https://www.open-mpi.org/software/ompi/v2.0/
>>>>
>>>> Changes since v2.0.1:
>>>>
>>>> - Remove use of DATE in the message queue version string reported to
>>>> debuggers to
>>>>   insure bit-wise reproducibility of binaries.  Thanks to Alastair
>>>> McKinstry
>>>>   for help in fixing this problem.
>>>> - Fix a problem with early exit of a MPI process without calling
>>>> MPI_FINALIZE
>>>>   of MPI_ABORT that could lead to job hangs.  Thanks to Christof
>>>> Koehler for
>>>>   reporting.
>>>> - Fix a problem with forward of SIGTERM signal from mpirun to MPI
>>>> processes
>>>>   in a job.  Thanks to Noel Rycroft for reporting this problem
>>>> - Plug some memory leaks in MPI_WIN_FREE discovered using Valgrind.
>>>> Thanks
>>>>   to Joseph Schuchart for reporting.
>>>> - Fix a problem with MPI_NEIGHBOR_ALLTOALL when using a communicator with
>>>> an empty topology
>>>>   graph.  Thanks to Daniel Ibanez for reporting.
>>>> - Fix a typo in a PMIx component help file.  Thanks to @njoly for
>>>> reporting this.
>>>> - Fix a problem with Valgrind false positives when using Open MPI's
>>>> internal memchecker.
>>>>   Thanks to Yvan Fournier for reporting.
>>>> - Fix a problem with MPI_FILE_DELETE returning MPI_SUCCESS when
>>>>   deleting a non-existent file. Thanks to Wei-keng Liao for reporting.
>>>> - Fix a problem with MPI_IMPROBE that could lead to hangs in subsequent
>>>> MPI
>>>>   point to point or collective calls.  Thanks to Chris Pattison for
>>>> reporting.
>>>> - Fix a problem when configure Open MPI for powerpc with
>>>> --enable-mpi-cxx
>>>>   enabled.  Thanks to amckinstry for reporting.
>>>> - Fix a problem using MPI_IALLTOALL with MPI_IN_PLACE argument.  Thanks
>>>> to
>>>>   Chris Ward for reporting.
>>>> - Fix a problem using MPI_RACCUMULATE with the Portals4 transport.
>>>> Thanks to
>>>>   @PDeveze for reporting.
>>>> - Fix an issue with static linking and duplicate symbols arising from
>>>> PMIx
>>>>   Slurm components.  Thanks to Limin Gu for reporting.
>>>> - Fix a problem when using MPI dynamic memory windows.  Thanks to
>>>>   Christoph Niethamm

[OMPI devel] Errors with CXX=pgc++ (but CXX=pgCC OK)

2016-12-16 Thread Paul Hargrove
With the 1.10.5rc1 tarball on linux/x86-64 and various versions of the PGI
compilers I have configured with

--prefix=[...] --enable-debug CC=pgcc CXX=pgc++ FC=pgfortran


I see the following with version 14.3 of the PGI compilers:

/bin/bash ../../../libtool  --tag=CXX   --mode=link pgc++  -g
 -version-info 2:3:1  -o libmpi_cxx.la -rpath
/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/INST/lib
mpicxx.lo intercepts.lo comm.lo datatype.lo win.lo file.lo ../../../ompi/
libmpi.la -lrt -lutil
libtool: link: pgc++  -fPIC -DPIC -shared -nostdlib
/usr/lib/x86_64-linux-gnu/crti.o
/nfs/software/linux-ubuntu_precise_amd64/com/packages/pgi/143/linux86-64/14.3/libso/trace_init.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtbeginS.o
/nfs/software/linux-ubuntu_precise_amd64/com/packages/pgi/143/linux86-64/14.3/libso/initmp.o
 .libs/mpicxx.o .libs/intercepts.o .libs/comm.o .libs/datatype.o
.libs/win.o .libs/file.o   -Wl,-rpath
-Wl,/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/ompi/.libs
-Wl,-rpath
-Wl,/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/orte/.libs
-Wl,-rpath
-Wl,/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/opal/.libs
-Wl,-rpath
-Wl,/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/INST/lib
-L/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/orte/.libs
-L/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/opal/.libs
../../../ompi/.libs/libmpi.so
/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/orte/.libs/libopen-rte.so
/sandbox/hargrove/OMPI/openmpi-1.10.5rc1-linux-x86_64-pgi-14/BLD/opal/.libs/libopen-pal.so
-ldl -lrt -lutil
-L/nfs/software/linux-ubuntu_precise_amd64/com/packages/pgi/143/linux86-64/14.3/libso
-L/soft/com/packages/pgi/143/14.3/share_objects/lib64
-L/nfs/software/linux-ubuntu_precise_amd64/com/packages/pgi/143/linux86-64/14.3/lib
-L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -lpgatm -lgcc_s -lstdc++
-lpgmp -lnuma -lpthread -lnspgc -lpgc -lm -lc -lgcc
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtendS.o
/usr/lib/x86_64-linux-gnu/crtn.o  -g   -Wl,-soname -Wl,libmpi_cxx.so.1 -o
.libs/libmpi_cxx.so.1.1.3
pgc++-Error-Unknown switch: -nostdlib
make[2]: *** [libmpi_cxx.la] Error 1

Switching from CXX=pgc++ to CXX=pgCC eliminates the problem.

This is *possibly* related to the following configure output in which pgc++
has been misidentified as the GNU compiler:

$ grep 'compiler vendor' configure.log
checking for the C compiler vendor... portland group
checking for the C++ compiler vendor... gnu
checking for the C++ compiler vendor... (cached) gnu
checking for the C compiler vendor... portland group

I suspect that pgc++ is intentionally masquerading as a GNU compiler, but
falling short of the mark as of their 14.3 release.
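
For illustration, a vendor probe based on predefined macros might look like
the sketch below. This is not Open MPI's configure logic; compiler_vendor is
a hypothetical name, and the sketch assumes pgc++ defines __PGI in addition
to __GNUC__, which is unverified for 14.3. The point is the ordering: a
compiler that also defines __GNUC__ has to be tested for before the GNU
branch is reached.

```c
/* Hypothetical probe: returns a short vendor name based on predefined
 * macros. Compilers that imitate GCC are checked first, because they
 * may define __GNUC__ as well as their own identifying macro. */
const char *compiler_vendor(void)
{
#if defined(__PGI)
    return "portland group";
#elif defined(__INTEL_COMPILER)
    return "intel";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    /* Only reached when none of the GCC look-alikes matched. */
    return "gnu";
#else
    return "unknown";
#endif
}
```

Checking __GNUC__ first would report "gnu" for any compiler that emulates
GCC, which matches the misidentification in the configure output above.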

I didn't see this in last night's testing of 2.0.2rc1 because I was
configuring with CXX=pgCC on my Linux/x86-64 systems.
However testing pgc++ today with 2.0.2rc1 I see pretty much the same
results with both release candidates:

With PGI 12.10 and 13.9, configure decides that CC and CXX are not link
compatible (though not when CXX=pgCC).
This is true of both the 1.10 and 2.0 RCs.

With PGI 14.3, v1.10.5rc1 fails as described above, while 2.0.2rc1 (w/ c++
bindings disabled by default) was OK.
However, enabling the c++ bindings in 2.0.2rc1 leads to the same error
shown above.

With PGI 15.9, pgc++ unfortunately gets an unrelated ICE compiling VT
(which also vanishes with CXX=pgCC) for 1.10, and is fine for 2.0.

With PGI 16.10, pgcc gets an unrelated SEGV compiling mtl_ofi_component.c,
but has no problems with pgc++ on either branch once I configure using
--without-libfabric.


Based on my experiences listed above, I would recommend (in the Open MPI
README):
If building C++ bindings or VT, I advise against use of CXX=pgc++ prior to
16.10.
Otherwise, it appears usable from 14.3 forward.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.0.2rc1] runtime error w/ PGI usempif08 on OpenPOWER

2016-12-16 Thread Paul Hargrove
Since the message below has still not appeared in the archive after more
than 15 hrs, I am resending w/o the attachment.
-Paul

On Thu, Dec 15, 2016 at 10:23 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On a little-endian Power8 I have the free edition of PGI16.10 for
> OpenPOWER.
> I am configuring the 2.0.2rc1 tarball with
>
> --prefix=... --enable-debug CC=pgcc CXX=pgc++ FC=pgfortran
>
>
> I can build Open MPI and can run ring_c, ring_mpih and ring_usempi.
> However, ring_usempif08 dies with an invalid datatype message:
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_usempif08'
> Process 0 sending 10 to  1 tag 201 ( 2 processes in ring)
> [openpower-6:20752] *** An error occurred in MPI_Recv
> [openpower-6:20752] *** reported by process [1078198273,1]
> [openpower-6:20752] *** on communicator MPI_COMM_WORLD
> [openpower-6:20752] *** MPI_ERR_TYPE: invalid datatype
> [openpower-6:20752] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> [openpower-6:20752] ***and potentially your MPI job)
> [openpower-6:20746] 1 more process has sent help message
> help-mpi-errors.txt / mpi_errors_are_fatal
> [openpower-6:20746] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
>
>
>
> The (bzip2-compressed) config.log is attached.
> Let me know of anything else required.
>
> FWIW this is the same OpenPOWER VM that Nathan has been using for other
> issues with PGI on OpenPOWER.
> So, he already has access and I can grant access to others if necessary.
>
> -Paul
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Open MPI v2.0.2rc1 is up

2016-12-16 Thread Paul Hargrove
My testing is complete.

The only problems not already known are related to PGI's recent "Community
Edition" compilers and have been reported in three separate emails:

[2.0.2rc1] Fortran link failure with PGI fortran on MacOSX
<https://mail-archive.com/devel@lists.open-mpi.org/msg19823.html>
[2.0.2rc1] Build failure w/ PGI compilers on Mac OS X
<https://mail-archive.com/devel@lists.open-mpi.org/msg19824.html>
[2.0.2rc1] runtime error w/ PGI usempif08 on OpenPOWER

For some reason the last one does not appear in the archive!
Perhaps the config.log.bz2 I attached was too large?
Let me know if I should resend it.

BTW:  a typo in the ChangeLog of the announcement email:

- Fix a problem with early exit of a MPI process without calling
MPI_FINALIZE
  of MPI_ABORT that could lead to job hangs.  Thanks to Christof Koehler for
  reporting.

The "of" that begins the second line was almost certainly intended to be
"or".

-Paul

On Wed, Dec 14, 2016 at 6:58 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com
> wrote:

> Please test!
>
> https://www.open-mpi.org/software/ompi/v2.0/
>
> Changes since v2.0.1:
>
> - Remove use of DATE in the message queue version string reported to
> debuggers to
>   insure bit-wise reproducibility of binaries.  Thanks to Alastair
> McKinstry
>   for help in fixing this problem.
> - Fix a problem with early exit of a MPI process without calling
> MPI_FINALIZE
>   of MPI_ABORT that could lead to job hangs.  Thanks to Christof Koehler
> for
>   reporting.
> - Fix a problem with forward of SIGTERM signal from mpirun to MPI processes
>   in a job.  Thanks to Noel Rycroft for reporting this problem
> - Plug some memory leaks in MPI_WIN_FREE discovered using Valgrind.  Thanks
>   to Joseph Schuchart for reporting.
> - Fix a problems  MPI_NEIGHOR_ALLTOALL when using a communicator with an
> empty topology
>   graph.  Thanks to Daniel Ibanez for reporting.
> - Fix a typo in a PMIx component help file.  Thanks to @njoly for
> reporting this.
> - Fix a problem with Valgrind false positives when using Open MPI's
> internal memchecker.
>   Thanks to Yvan Fournier for reporting.
> - Fix a problem with MPI_FILE_DELETE returning MPI_SUCCESS when
>   deleting a non-existent file. Thanks to Wei-keng Liao for reporting.
> - Fix a problem with MPI_IMPROBE that could lead to hangs in subsequent MPI
>   point to point or collective calls.  Thanks to Chris Pattison for
> reporting.
> - Fix a problem when configure Open MPI for powerpc with --enable-mpi-cxx
>   enabled.  Thanks to amckinstry for reporting.
> - Fix a problem using MPI_IALLTOALL with MPI_IN_PLACE argument.  Thanks to
>   Chris Ward for reporting.
> - Fix a problem using MPI_RACCUMULATE with the Portals4 transport.  Thanks
> to
>   @PDeveze for reporting.
> - Fix an issue with static linking and duplicate symbols arising from PMIx
>   Slurm components.  Thanks to Limin Gu for reporting.
> - Fix a problem when using MPI dynamic memory windows.  Thanks to
>   Christoph Niethammer for reporting.
> - Fix a problem with Open MPI's pkgconfig files.  Thanks to Alastair
> McKinstry
>   for reporting.
> - Fix a problem with MPI_IREDUCE when the same buffer is supplied for the
>   send and recv buffer arguments.  Thanks to Valentin Petrov for reporting.
> - Fix a problem with atomic operations on PowerPC.  Thanks to Paul
>   Hargrove for reporting.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.2rc1] Build failure w/ PGI compilers on Mac OS X

2016-12-15 Thread Paul Hargrove
I have Mac OS X 10.10 (Yosemite) and 10.12 (Sierra) systems with PGI
compilers installed.
I have configured the 2.0.2rc1 tarball with

 --prefix=[...] --enable-debug CC=pgcc CXX=pgc++ FC=pgfortran

I see a build failure in the libevent code:


PGC-F-0249-#error --  "No way to define ev_uint64_t"
(/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-pgi/openmpi-2.0.2rc1/opal/mca/event/libevent2022/libevent/include/event2/util.h:
126)
PGC/x86-64 OSX 16.10-0: compilation aborted


I've tracked this down to libevent's configure script failing to find
uint64_t.
From opal/mca/event/libevent2022/libevent/config.log:

configure:14682: checking for uint64_t
configure:14682: pgcc -c -g -Wno-deprecated-declarations
-I/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-pgi/openmpi-2.0.2rc1
-I/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-pgi/BLD
-I/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-pgi/openmpi-2.0.2rc1/opal/include

-I/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-pgi/openmpi-2.0.2rc1/opal/mca/hwloc/hwloc1112/hwloc/include
-I/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-pgi/BLD/opal/mca/hwloc/hwloc1112/hwloc/include
-Drandom=opal_random conftest.c >&5
pgcc-Error-Unknown switch: -Wno-deprecated-declarations
configure:14682: $? = 1


In fact, nearly *all* of libevent's configure tests fail as a result of
that unsupported compiler flag.
Use of that flag, in turn, is due to the following
in opal/mca/event/libevent2022/libevent/configure.ac:


# OS X Lion started deprecating the system openssl. Let's just disable
# all deprecation warnings on OS X.
case "$host_os" in

 darwin*)
CFLAGS="$CFLAGS -Wno-deprecated-declarations"
;;
esac


It seems this needs to change to acknowledge that not every compiler for
Mac OS X is gcc-compatible.
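
One way to do that, sketched here as plain shell rather than as a patch to
libevent's configure.ac, is to probe the compiler before appending the flag.
The helper names cc_accepts_flag and darwin_cflags are hypothetical, and the
probe assumes the compiler is a single command name that exits non-zero on an
unknown switch (as pgcc does in the config.log excerpt above).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if compiler "$1" accepts flag "$2".
# Assumes the compiler fails (non-zero exit) on an unknown switch.
cc_accepts_flag() {
    _cc=$1
    _flag=$2
    _dir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$_dir/conftest.c"
    "$_cc" "$_flag" -c "$_dir/conftest.c" -o "$_dir/conftest.o" >/dev/null 2>&1
    _rc=$?
    rm -rf "$_dir"
    return "$_rc"
}

# Hypothetical helper mirroring the darwin* case quoted above:
# print CFLAGS ("$3") for host "$1" and compiler "$2", adding the
# deprecation switch only when the compiler actually accepts it.
darwin_cflags() {
    _host_os=$1
    _cc=$2
    _cflags=$3
    case "$_host_os" in
        darwin*)
            if cc_accepts_flag "$_cc" -Wno-deprecated-declarations; then
                _cflags="${_cflags:+$_cflags }-Wno-deprecated-declarations"
            fi
            ;;
    esac
    printf '%s\n' "$_cflags"
}
```

With something along these lines, CFLAGS=$(darwin_cflags "$host_os" "$CC" "$CFLAGS")
would leave CFLAGS untouched for pgcc while still silencing the warnings for
compilers that understand the switch.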

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.2rc1] Fortran link failure with PGI fortran on MacOSX

2016-12-15 Thread Paul Hargrove
I have Mac OS X 10.10 (Yosemite) and 10.12 (Sierra) systems with PGI
compilers installed.
I have configured the 2.0.2rc1 tarball with
 --prefix=[...]  --enable-debug CC=cc CXX=c++
Where cc and c++ are Clang from Apple Xcode 7.1 (Build 7B91b) and 8.2
(Build 8C38).
Meanwhile Open MPI's configure finds pgfortran-16.10-0 (note: PGI, not GNU)
in my $PATH.

I can build Open MPI fine, but when building the examples, ring_mpifh fails
to link.
This is particularly odd to me since hello_mpifh linked OK.

$ make -k
mpicc -g hello_c.c -o hello_c
mpicc -g ring_c.c -o ring_c
mpicc -g connectivity_c.c -o connectivity_c
mpifort -g hello_mpifh.f -o hello_mpifh
mpifort -g ring_mpifh.f -o ring_mpifh
ld: illegal text-relocation to '_mpi_fortran_status_ignore_' in
/Users/phhargrove/OMPI/openmpi-2.0.2rc1-macos10.10-x86-clang/INST/lib/libmpi.dylib
from '_MAIN_' in ring_mpifh.o for inferred architecture x86_64

I have no problems with gfortran (GNU).

The (bzip2-compressed) output of configure on the Sierra system is attached.
Let me know which additional files to send to whom.

Note that PGI 16.10 is available with a free "Community Edition" license.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Paul Hargrove
On Thu, Dec 1, 2016 at 4:25 PM, Gilles Gouaillardet 
wrote:
[...]

> git checkout master
>
> git merge --ff-only topic/misc_fixes
>
> git push origin master
>
[...]


Gilles,

You characterized the merge commit as having "close to zero added value"
to you - but in this instance it would have saved you and others a
non-trivial amount of time in email.

Additionally, in projects I work on we value that merge commit as a "cut
line" if we ever need to revert an entire PR for some reason.  Using
git-bisect such that one includes or excludes the entire PR is also a
justification for keeping the merge commit.  So my opinion is that you
should have omitted "--ff-only" and entered a commit message that at least
identified the PR number.
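
As a rough sketch of that alternative workflow (the helper names merge_pr and
revert_pr are hypothetical, and the PR number is whatever the pull request
actually is; none is given in this thread), the merge and a later whole-PR
revert might look like:

```shell
#!/bin/sh
# Hypothetical helper: merge PR branch "$1" into the current branch, always
# recording a merge commit (no fast-forward) whose message names PR "$2".
merge_pr() {
    _branch=$1
    _pr=$2
    git merge --no-ff -m "Merge pull request #$_pr from $_branch" "$_branch"
}

# Hypothetical helper: back out an entire PR by reverting its merge commit "$1".
# "-m 1" selects the first parent (the mainline) as the side to keep.
revert_pr() {
    git revert --no-edit -m 1 "$1"
}
```

Usage would be "git checkout master", then merge_pr with the topic branch and
PR number, then "git push origin master". The merge commit is then the single
"cut line" that revert_pr (or git-bisect) can act on.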


> though this does not generate a git commit, github.com is smart enough to
> figure this out and marks the PR as merged.
>

FWIW: "smart enough" is simply a detection that the last commit in the PR
has become an ancestor of the current HEAD.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] master broken on (at least) OpenBSD-6

2016-09-22 Thread Paul Hargrove
When trying to test PR2107 on OpenBSD-6 I was blocked by the following
error, which is also present in 'master'.

../../../ompi/opal/util/if.c: In function 'opal_ifisloopback':
../../../ompi/opal/util/if.c:710: error: 'IFF_LOOPBACK' undeclared (first
use in this function)
../../../ompi/opal/util/if.c:710: error: (Each undeclared identifier is
reported only once
../../../ompi/opal/util/if.c:710: error: for each function it appears in.)
../../../ompi/opal/util/if.c: In function 'opal_ifgetaliases':
../../../ompi/opal/util/if.c:791: error: 'IFF_LOOPBACK' undeclared (first
use in this function)

The constant IFF_LOOPBACK is defined in net/if.h, as expected.
However, opal_config.h says:

/* #undef HAVE_NET_IF_H */


Related to that is the following output from configure:

checking net/if.h usability... no
checking net/if.h presence... yes
configure: WARNING: net/if.h: present but cannot be compiled
configure: WARNING: net/if.h: check for missing prerequisite headers?
configure: WARNING: net/if.h: see the Autoconf documentation
configure: WARNING: net/if.h: section "Present But Cannot Be Compiled"
configure: WARNING: net/if.h: proceeding with the compiler's result
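configure: WARNING: ## ------------------------------------------------------ ##
configure: WARNING: ## Report this to http://www.open-mpi.org/community/help/ ##
configure: WARNING: ## ------------------------------------------------------ ##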
checking for net/if.h... no


Details from config.log:

configure:62402: checking net/if.h usability
configure:62402: gcc -std=gnu99 -c -O3 -DNDEBUG -finline-functions
-fno-strict-aliasing  conftest.c >&5
In file included from conftest.c:433:
/usr/include/net/if.h:358: error: field 'ifru_addr' has incomplete type
/usr/include/net/if.h:359: error: field 'ifru_dstaddr' has incomplete type
/usr/include/net/if.h:360: error: field 'ifru_broadaddr' has incomplete type
/usr/include/net/if.h:387: error: field 'ifrau_addr' has incomplete type
/usr/include/net/if.h:393: error: field 'ifra_dstaddr' has incomplete type
/usr/include/net/if.h:395: error: field 'ifra_mask' has incomplete type
/usr/include/net/if.h:438: error: field 'addr' has incomplete type
/usr/include/net/if.h:439: error: field 'dstaddr' has incomplete type
/usr/include/net/if.h:445: error: expected specifier-qualifier-list before
'sa_family_t'
In file included from /usr/include/net/if.h:454,
 from conftest.c:433:
/usr/include/net/if_arp.h:79: error: field 'arp_pa' has incomplete type
/usr/include/net/if_arp.h:80: error: field 'arp_ha' has incomplete type


The man page for if_nametoindex() and friends shows:

 #include <sys/types.h>
 #include <sys/socket.h>
 #include <net/if.h>


It looks like configure.ac is *trying* to deal with the sys/socket.h
dependency:

# Needed to work around Darwin requiring sys/socket.h for
# net/if.h
AC_CHECK_HEADERS([net/if.h], [], [],
[#include <stdio.h>
#if STDC_HEADERS
# include <stdlib.h>
# include <stddef.h>
#else
# if HAVE_STDLIB_H
#  include <stdlib.h>
# endif
#endif
#if HAVE_SYS_SOCKET_H
# include <sys/socket.h>
#endif
])


However, config.log suggests that the previous failure has been cached,
making this (second) test for net/if.h a no-op:

configure:62453: checking for net/if.h
configure:62453: result: no
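
The suspected mechanism can be illustrated with a toy stand-in for a cached
check. This is not Autoconf's actual code: cached_check and the probe
functions in the usage note are hypothetical, and the cache variable name
ac_cv_header_net_if_h is assumed from Autoconf's usual naming convention.

```shell
#!/bin/sh
# Toy model of a cached configure check (NOT Autoconf's real implementation).
# $1 = name of the cache variable, $2 = command that performs the real probe.
# The probe only runs when the cache variable is unset or empty.
cached_check() {
    eval "_val=\${$1-}"
    if [ -z "$_val" ]; then
        if "$2"; then _val=yes; else _val=no; fi
        eval "$1=\$_val"
    fi
    [ "$_val" = yes ]
}
```

If a first probe without the prerequisite headers stores "no" in the cache
variable, a second call with a better probe still reports "no" until the
variable is cleared. In configure.ac terms that would suggest either running
the prerequisite-aware check first or clearing the cached result before
repeating the check.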



-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] RFC: Reenabling the TCP BTL over local interfaces (when specifically requested)

2016-09-21 Thread Paul Hargrove
On Wed, Sep 21, 2016 at 9:36 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

>
> if i want to exclude ib0, i might want to
> mpirun --mca btl_tcp_if_exclude ib0 ...
>
> to me, this is an honest mistake, but with your proposal, i would be
> screwed when
> running on more than one node because i should have
> mpirun --mca btl_tcp_if_exclude ib0,lo ...



My view on this particular honest mistake is that it feels a lot like
failing to include "self" in the btl list.
To the best of my knowledge there is no "safety net" for that user mistake.
Instead, there is documentation in README:

  - If specified, the "btl" parameter must include the "self"
component, or Open MPI will not be able to deliver messages to the
same rank as the sender.  For example: "mpirun --mca btl tcp,self
..."

So, one could/should do the same for btl_tcp_if_exclude.
BUT IT IS ALREADY IN THE README TODAY!
Immediately following the warning above regarding "self" is the following
text:

  - If specified, the "btl_tcp_if_exclude" paramater must include the
loopback device ("lo" on many Linux platforms), or Open MPI will
not be able to route MPI messages using the TCP BTL.  For example:
"mpirun --mca btl_tcp_if_exclude lo,eth1 ..."

So, in short, there is *already* documentation that tells the user *not* to
do what Gilles is worried about.
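
For users who would rather script this than remember it, a small wrapper
could append the loopback device whenever it is missing. This is only a
sketch: ensure_loopback_excluded is a hypothetical helper, and the default
name "lo" is the Linux convention mentioned in the README text above, so
other platforms would need a different second argument.

```shell
#!/bin/sh
# Hypothetical helper: print the comma-separated exclude list "$1" with the
# loopback device "$2" (default: lo) appended if it is not already present.
ensure_loopback_excluded() {
    _list=$1
    _lo=${2:-lo}
    case ",$_list," in
        *",$_lo,"*) printf '%s\n' "$_list" ;;
        *)          printf '%s\n' "${_list:+$_list,}$_lo" ;;
    esac
}
```

For example, mpirun --mca btl_tcp_if_exclude "$(ensure_loopback_excluded ib0)" ...
would pass "ib0,lo" rather than "ib0" alone.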


-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] 2.0.1rc3 posted

2016-09-02 Thread Paul Hargrove
All of my testing on 2.0.1rc3 is complete except for SPARC.
The alignment issue on SPARC *has* been tested via 2.0.1rc2 + patch (so
there is very low probability that 2.0.1rc3 would fail).

At this point I am aware of only two platforms that fail that we didn't
already know about:
+ OpenBSD-6.0 disallows the "patcher" call to mprotect() unless you make
the required animal sacrifice (or something close to it!)
+ Geoffroy's reported problems w/ atomics on PPC64 with PGI-16.x (yes, PGI
for PPC), which is issue open-mpi/ompi#2044

I mention these only for completeness, and don't advocate holding up a
2.0.1 release for either.

-Paul

On Thu, Sep 1, 2016 at 1:47 PM, Jeff Squyres (jsquyres) 
wrote:

> We're getting close.  Unless any showstoppers show up in the immediate
> future, we will likely be releasing this as 2.0.1:
>
> rc3 tarballs here:
>
> https://www.open-mpi.org/software/ompi/v2.0/
>
> Changes since rc2:
>
> - Fix COMM_SPAWN segv
> - Fix yalla bandwidth issue
> - Fix an OMPIO-related crash when using built-in datatypes
> - Fix a Solaris alignment issue
> - Fix a stdin problem
> - Fix a bunch of typos and make other updates in README
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] 2.0.1rc3 posted

2016-09-02 Thread Paul Hargrove
I can confirm that 2.0.1rc2+patch *did* run correctly on Linux/SPARC.
I am running 2.0.1rc3 now, for completeness.

-Paul

On Fri, Sep 2, 2016 at 3:24 AM, Jeff Squyres (jsquyres) 
wrote:

> > On Sep 1, 2016, at 8:42 PM, Gilles Gouaillardet 
> wrote:
> >
> > Paul,
> >
> >
> > I guess this was a typo, and you should either read
> >
> > - Fix a SPARC alignment issue
> >
> > or
> >
> > - Fix an alignment issue on alignment sensitive processors such as SPARC
>
> I did not copy and paste those bullets from NEWS -- that was just
> shorthand for us to know what was done since rc2; sorry for the confusion.
>
> I fixed the bullet this morning to be:
>
> - Fix alignment issues on SPARC platforms.
>
> Good enough.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: http://www.cisco.com/web/
> about/doing_business/legal/cri/
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.1rc3] OpenBSD 6.0 patcher failure

2016-09-01 Thread Paul Hargrove
The OpenBSD 6.0 release was announced slightly under 12 hours before Jeff
announced the Open MPI 2.0.1rc3 tarball.
So, I just *had* to try them out together.

First, let me say that I have no expectation that the following issue be
fixed for 2.0.1, but hopefully before 2.2.

It appears that abort() is being called from mprotect() when called from
the patcher code.
This is very likely to be because the call asks for RWX, but by default
OpenBSD-6.0 *prohibits* W+X.
To allow both Write and eXec requires *both* a linker flag
(-Wl,-z,wxneeded) and that the executable be on an f/s mounted with the
"wxallowed" option.
Manually working to satisfy those conditions allows "make check" and ring_c
to run w/o failures.
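
The two conditions can be sketched as shell helpers. These are illustrations
only, not the steps actually used for the test above: wx_ldflags and
mount_is_wxallowed are hypothetical names, and the second helper assumes
"mount" prints lines of the form "device on mountpoint type fstype (options)".

```shell
#!/bin/sh
# Hypothetical helper: print LDFLAGS ("$2") for host "$1", adding the
# W+X opt-in linker flag on OpenBSD hosts only.
wx_ldflags() {
    _host_os=$1
    _ldflags=$2
    case "$_host_os" in
        openbsd*) _ldflags="${_ldflags:+$_ldflags }-Wl,-z,wxneeded" ;;
    esac
    printf '%s\n' "$_ldflags"
}

# Hypothetical helper: read `mount` output on stdin and succeed only if
# mount point "$1" lists the wxallowed option.
mount_is_wxallowed() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp && /[(, ]wxallowed[,)]/ { found = 1 }
        END { exit !found }
    '
}
```

A build script might then pass LDFLAGS="$(wx_ldflags "$host_os" "$LDFLAGS")"
to configure and run "mount | mount_is_wxallowed" with the install prefix's
mount point before attempting "make check".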

-Paul


$ gdb .libs/dlopen_test dlopen_test.core
GNU gdb 6.3
[...]
This GDB was configured as "amd64-unknown-openbsd6.0"...
Core was generated by `dlopen_test'.
Program terminated with signal 6, Aborted.
[...]
#0  0x0b726a6cf44a in mprotect () at :2
2   : No such file or directory.
in 
(gdb) where
#0  0x0b726a6cf44a in mprotect () at :2
#1  0x0b72813d5943 in ModifyMemoryProtection (addr=12586039598736,
length=4096, prot=7)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/patcher/base/patcher_base_patch.c:127
#2  0x0b72813d59aa in apply_patch (patch_data=0xb728e72fe60
"I▒▒-=\201r\v", address=12586039598736,
data_size=13)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/patcher/base/patcher_base_patch.c:135
#3  0x0b72813d5a49 in mca_base_patcher_patch_apply_binary
(patch=0xb728e72fe00)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/patcher/base/patcher_base_patch.c:152
#4  0x0b7340474f73 in mca_patcher_overwrite_apply_patch
(patch=0xb728e72fe00)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/patcher/overwrite/patcher_overwrite_module.c:151
#5  0x0b7340474ff3 in mca_patcher_overwrite_patch_address
(sys_addr=12586039598736,
hook_addr=12586422447608)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/patcher/overwrite/patcher_overwrite_module.c:255
#6  0x0b73404753f3 in mca_patcher_overwrite_patch_symbol
(func_symbol_name=0xb728150038f "munmap",
func_new_addr=12586422447608, func_old_addr=0xb7281827ed8)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/patcher/overwrite/patcher_overwrite_module.c:301
#7  0x0b72813d301c in patcher_open ()
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/memory/patcher/memory_patcher_component.c:442
#8  0x0b728136f87d in open_components (framework=0xb72818240c0)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/base/mca_base_components_open.c:117
#9  0x0b728136f7a0 in mca_base_framework_components_open
(framework=0xb72818240c0,
flags=MCA_BASE_OPEN_DEFAULT)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/base/mca_base_components_open.c:65
#10 0x0b72813d2bab in opal_memory_base_open
(flags=MCA_BASE_OPEN_DEFAULT)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/memory/base/memory_base_open.c:137
#11 0x0b7281370aad in mca_base_framework_open (framework=0xb72818240c0,
flags=MCA_BASE_OPEN_DEFAULT)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/mca/base/mca_base_framework.c:174
#12 0x0b728133e39d in opal_init (pargc=0x7f7c9ffc,
pargv=0x7f7c9ff0)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/opal/runtime/opal_init.c:459
#13 0x0b70441012d3 in main (argc=1, argv=0x7f7ca068)
at
/home/phargrov/OMPI/openmpi-2.0.1rc3-openbsd6-amd64/openmpi-2.0.1rc3/ompi/debuggers/dlopen_test.c:134




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-01 Thread Paul Hargrove
I failed to get PGI 16.x working at all (licence issue, I think).
So, I can neither confirm nor refute Geoffroy's reported problems.

-Paul

On Thu, Sep 1, 2016 at 6:15 PM, Vallee, Geoffroy R. <valle...@ornl.gov>
wrote:

> Interesting, I am having the problem with both 16.5 and 16.7.
>
> My 2 cents,
>
> > On Sep 1, 2016, at 8:25 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > FWIW I have not seen problems when testing the 2.0.1rc2 w/ PGI versions
> 12.10, 13.9, 14.3 or 15.9.
> >
> > I am going to test 2.0.2.rc3 ASAP and try to get PGI 16.4 coverage added
> in
> >
> > -Paul
> >
> > On Thu, Sep 1, 2016 at 12:48 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > Please send all the information on the build support page and open an
> issue at github.  Thanks.
> >
> >
> > > On Sep 1, 2016, at 3:41 PM, Vallee, Geoffroy R. <valle...@ornl.gov>
> wrote:
> > >
> > > This is indeed a little better but still creating a problem:
> > >
> > >  CCLD opal_wrapper
> > > ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function
> `_opal_progress_unregister':
> > > /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:459:
> undefined reference to `opal_atomic_swap_64'
> > > ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function
> `_opal_progress_register':
> > > /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:398:
> undefined reference to `opal_atomic_swap_64'
> > > make[2]: *** [opal_wrapper] Error 2
> > > make[2]: Leaving directory `/autofs/nccs-svm1_sw/gvh/src/
> openmpi-2.0.1rc2/opal/tools/wrappers'
> > > make[1]: *** [all-recursive] Error 1
> > > make[1]: Leaving directory `/autofs/nccs-svm1_sw/gvh/src/
> openmpi-2.0.1rc2/opal'
> > > make: *** [all-recursive] Error 1
> > >
> > > $ nm libopen-pal.a  | grep atomic
> > > U opal_atomic_cmpset_64
> > > 0ab0 t opal_atomic_cmpset_ptr
> > > U opal_atomic_wmb
> > > 0950 t opal_lifo_push_atomic
> > > U opal_atomic_cmpset_acq_32
> > > 03d0 t opal_atomic_lock
> > > 0450 t opal_atomic_unlock
> > > U opal_atomic_wmb
> > > U opal_atomic_ll_64
> > > U opal_atomic_sc_64
> > > U opal_atomic_wmb
> > > 1010 t opal_lifo_pop_atomic
> > > U opal_atomic_cmpset_acq_32
> > > 04b0 t opal_atomic_init
> > > 04e0 t opal_atomic_lock
> > > U opal_atomic_mb
> > > 0560 t opal_atomic_unlock
> > > U opal_atomic_wmb
> > > U opal_atomic_add_32
> > > U opal_atomic_cmpset_acq_32
> > > 0820 t opal_atomic_init
> > > 0850 t opal_atomic_lock
> > > U opal_atomic_sub_32
> > > U opal_atomic_swap_64
> > > 08d0 t opal_atomic_unlock
> > > U opal_atomic_wmb
> > > 0130 t opal_atomic_init
> > > atomic-asm.o:
> > > 0138 T opal_atomic_add_32
> > > 0018 T opal_atomic_cmpset_32
> > > 00c4 T opal_atomic_cmpset_64
> > > 003c T opal_atomic_cmpset_acq_32
> > > 00e8 T opal_atomic_cmpset_acq_64
> > > 0070 T opal_atomic_cmpset_rel_32
> > > 0110 T opal_atomic_cmpset_rel_64
> > >  T opal_atomic_mb
> > > 0008 T opal_atomic_rmb
> > > 0150 T opal_atomic_sub_32
> > > 0010 T opal_atomic_wmb
> > > 2280 t mca_base_pvar_is_atomic
> > > U opal_atomic_ll_64
> > > U opal_atomic_sc_64
> > > U opal_atomic_wmb
> > > 0900 t opal_lifo_pop_atomic
> > >
> > >> On Sep 1, 2016, at 3:16 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > >>
> > >> Can you try the latest v2.0.1 nightly snapshot tarball?
> > >>
> > >>
> > >>> On Sep 1, 2016, at 2:56 PM, Vallee, Geoffroy R. <valle...@ornl.gov>
> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> I get the following problem when we compile OpenMPI-2.0.0 (it seems
> to be specific to 2.x; the problem did n

Re: [OMPI devel] 2.0.1rc3 posted

2016-09-01 Thread Paul Hargrove
Gilles,

Thanks for the clarification.
I have 2.0.1rc2 + patch running on the Linux/SPARC system, but it is still
many hours from finishing.
I hope to run 2.0.1rc3 there tomorrow.

-Paul

On Thu, Sep 1, 2016 at 5:42 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Paul,
>
>
> I guess this was a typo, and you should either read
>
> - Fix a SPARC alignment issue
>
> or
>
> - Fix an alignment issue on alignment sensitive processors such as SPARC
>
>
> the patch i submitted to fix the issue you reported is definitely included
> in 2.0.1rc3
>
>
> Cheers,
>
>
> Gilles
>
> On 9/2/2016 9:31 AM, Paul Hargrove wrote:
>
>
> On Thu, Sep 1, 2016 at 1:47 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> - Fix a Solaris alignment issue
>>
>
> If you mean the SIGBUS I reported that was Linux on SPARC h/w
>
> -Paul
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] 2.0.1rc3 posted

2016-09-01 Thread Paul Hargrove
On Thu, Sep 1, 2016 at 1:47 PM, Jeff Squyres (jsquyres) 
wrote:

> - Fix a Solaris alignment issue
>

If you mean the SIGBUS I reported, that was Linux on SPARC h/w.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-01 Thread Paul Hargrove
FWIW I have not seen problems when testing the 2.0.1rc2 w/ PGI versions
12.10, 13.9, 14.3 or 15.9.

I am going to test 2.0.1rc3 ASAP and try to get PGI 16.4 coverage added.

-Paul

On Thu, Sep 1, 2016 at 12:48 PM, Jeff Squyres (jsquyres)  wrote:

> Please send all the information on the build support page and open an
> issue at github.  Thanks.
>
>
> > On Sep 1, 2016, at 3:41 PM, Vallee, Geoffroy R. 
> wrote:
> >
> > This is indeed a little better but still creating a problem:
> >
> >  CCLD opal_wrapper
> > ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function
> `_opal_progress_unregister':
> > /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:459:
> undefined reference to `opal_atomic_swap_64'
> > ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function
> `_opal_progress_register':
> > /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:398:
> undefined reference to `opal_atomic_swap_64'
> > make[2]: *** [opal_wrapper] Error 2
> > make[2]: Leaving directory `/autofs/nccs-svm1_sw/gvh/src/
> openmpi-2.0.1rc2/opal/tools/wrappers'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/autofs/nccs-svm1_sw/gvh/src/
> openmpi-2.0.1rc2/opal'
> > make: *** [all-recursive] Error 1
> >
> > $ nm libopen-pal.a  | grep atomic
> > U opal_atomic_cmpset_64
> > 0ab0 t opal_atomic_cmpset_ptr
> > U opal_atomic_wmb
> > 0950 t opal_lifo_push_atomic
> > U opal_atomic_cmpset_acq_32
> > 03d0 t opal_atomic_lock
> > 0450 t opal_atomic_unlock
> > U opal_atomic_wmb
> > U opal_atomic_ll_64
> > U opal_atomic_sc_64
> > U opal_atomic_wmb
> > 1010 t opal_lifo_pop_atomic
> > U opal_atomic_cmpset_acq_32
> > 04b0 t opal_atomic_init
> > 04e0 t opal_atomic_lock
> > U opal_atomic_mb
> > 0560 t opal_atomic_unlock
> > U opal_atomic_wmb
> > U opal_atomic_add_32
> > U opal_atomic_cmpset_acq_32
> > 0820 t opal_atomic_init
> > 0850 t opal_atomic_lock
> > U opal_atomic_sub_32
> > U opal_atomic_swap_64
> > 08d0 t opal_atomic_unlock
> > U opal_atomic_wmb
> > 0130 t opal_atomic_init
> > atomic-asm.o:
> > 0138 T opal_atomic_add_32
> > 0018 T opal_atomic_cmpset_32
> > 00c4 T opal_atomic_cmpset_64
> > 003c T opal_atomic_cmpset_acq_32
> > 00e8 T opal_atomic_cmpset_acq_64
> > 0070 T opal_atomic_cmpset_rel_32
> > 0110 T opal_atomic_cmpset_rel_64
> >  T opal_atomic_mb
> > 0008 T opal_atomic_rmb
> > 0150 T opal_atomic_sub_32
> > 0010 T opal_atomic_wmb
> > 2280 t mca_base_pvar_is_atomic
> > U opal_atomic_ll_64
> > U opal_atomic_sc_64
> > U opal_atomic_wmb
> > 0900 t opal_lifo_pop_atomic
> >
> >> On Sep 1, 2016, at 3:16 PM, Jeff Squyres (jsquyres) 
> wrote:
> >>
> >> Can you try the latest v2.0.1 nightly snapshot tarball?
> >>
> >>
> >>> On Sep 1, 2016, at 2:56 PM, Vallee, Geoffroy R. 
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I get the following problem when we compile OpenMPI-2.0.0 (it seems to
> be specific to 2.x; the problem did not appear with 1.10.x) with PGI:
> >>>
> >>> CCLD opal_wrapper
> >>> ../../../opal/.libs/libopen-pal.so: undefined reference to
> `opal_atomic_sc_64'
> >>> ../../../opal/.libs/libopen-pal.so: undefined reference to
> `opal_atomic_ll_64'
> >>> ../../../opal/.libs/libopen-pal.so: undefined reference to
> `opal_atomic_swap_64'
> >>> make[1]: *** [opal_wrapper] Error 2
> >>>
> >>> It is a little for me to pin point the exact problem but i can see the
> following:
> >>>
> >>> $ nm ./.libs/libopen-pal.so | grep atomic
> >>> 00026320 t 0017.plt_call.opal_atomic_add_32
> >>> 00026250 t 0017.plt_call.opal_atomic_cmpset_32
> >>> 00026780 t 0017.plt_call.opal_atomic_cmpset_64
> >>> 000280c0 t 0017.plt_call.opal_atomic_cmpset_acq_32
> >>> 00028ae0 t 0017.plt_call.opal_atomic_ll_64
> >>> 00027fe0 t 0017.plt_call.opal_atomic_mb
> >>> 00027d50 t 0017.plt_call.opal_atomic_rmb
> >>> 00028500 t 0017.plt_call.opal_atomic_sc_64
> >>> 00027670 t 0017.plt_call.opal_atomic_sub_32
> >>> 00026da0 t 0017.plt_call.opal_atomic_swap_64
> >>> 00027050 t 0017.plt_call.opal_atomic_wmb
> >>> 0005e6a0 t mca_base_pvar_is_atomic
> >>> 0004715c T opal_atomic_add_32
> >>> 0004703c T opal_atomic_cmpset_32
> >>> 000470e8 T opal_atomic_cmpset_64
> >>> 

Re: [OMPI devel] [2.0.1rc2] SIGBUS on Linux/SPARC

2016-09-01 Thread Paul Hargrove
Gilles,

I will try the patch and let you know.
However, it will likely not be until tomorrow (Friday AM California time)
that I have results.

-Paul

On Wed, Aug 31, 2016 at 9:26 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Thanks Paul,
>
>
> can you please try the patch available at
> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1357.patch ?
>
>
> Cheers,
>
>
> Gilles
>
> On 9/1/2016 2:12 AM, Paul Hargrove wrote:
>
> On an emulated UltraSPARC system running Linux (and using V9 ABI) I was
> able to build the RC, but get a SIGBUS when running ring_c.
> The problem is an unaligned 64-bit access, as shown by the gdb session
> below.
>
> I have not tried, but it *might* be possible to reproduce on PPC64 via
> "prctl --unaligned=signal".
>
> -Paul
>
>
> Core was generated by `examples/ring_c'.
> Program terminated with signal 10, Bus error.
> #0  0xf630ed64 in component_set_addr (peer=0xf6bb7114, uris=0x90ec8)
>     at /home/phargrov/OMPI/openmpi-2.0.1rc2-linux-sparcv9/openmpi-2.0.1rc2/orte/mca/oob/usock/oob_usock_component.c:318
> 318 if (OPAL_SUCCESS != opal_hash_table_get_value_uint64(&mca_oob_usock_module.peers,
>
> (gdb) l
> 313 if (ORTE_PROC_IS_APP) {
> 314     /* if this is my daemon, then take it - otherwise, ignore */
> 315     if (ORTE_PROC_MY_DAEMON->jobid == peer->jobid &&
> 316         ORTE_PROC_MY_DAEMON->vpid == peer->vpid) {
> 317         ui64 = (uint64_t*)peer;
> 318         if (OPAL_SUCCESS != opal_hash_table_get_value_uint64(&mca_oob_usock_module.peers,
> 319                 (*ui64), (void**)&pr) || NULL == pr) {
> 320             pr = OBJ_NEW(mca_oob_usock_peer_t);
> 321             pr->name = *peer;
> 322             opal_hash_table_set_value_uint64(&mca_oob_usock_module.peers, (*ui64), pr);
>
> (gdb) print ui64
> $1 = (uint64_t *) 0xf6bb7114
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [2.0.1rc2] SIGBUS on Linux/SPARC

2016-08-31 Thread Paul Hargrove
On an emulated UltraSPARC system running Linux (and using V9 ABI) I was
able to build the RC, but get a SIGBUS when running ring_c.
The problem is an unaligned 64-bit access, as shown by the gdb session
below.

I have not tried, but it *might* be possible to reproduce on PPC64 via
"prctl --unaligned=signal".

-Paul


Core was generated by `examples/ring_c'.
Program terminated with signal 10, Bus error.
#0  0xf630ed64 in component_set_addr (peer=0xf6bb7114, uris=0x90ec8)
    at /home/phargrov/OMPI/openmpi-2.0.1rc2-linux-sparcv9/openmpi-2.0.1rc2/orte/mca/oob/usock/oob_usock_component.c:318
318 if (OPAL_SUCCESS != opal_hash_table_get_value_uint64(&mca_oob_usock_module.peers,

(gdb) l
313 if (ORTE_PROC_IS_APP) {
314     /* if this is my daemon, then take it - otherwise, ignore */
315     if (ORTE_PROC_MY_DAEMON->jobid == peer->jobid &&
316         ORTE_PROC_MY_DAEMON->vpid == peer->vpid) {
317         ui64 = (uint64_t*)peer;
318         if (OPAL_SUCCESS != opal_hash_table_get_value_uint64(&mca_oob_usock_module.peers,
319                 (*ui64), (void**)&pr) || NULL == pr) {
320             pr = OBJ_NEW(mca_oob_usock_peer_t);
321             pr->name = *peer;
322             opal_hash_table_set_value_uint64(&mca_oob_usock_module.peers, (*ui64), pr);

(gdb) print ui64
$1 = (uint64_t *) 0xf6bb7114
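
The address printed above, 0xf6bb7114, is a multiple of 4 but not of 8, so
reading it through a uint64_t pointer traps on strict-alignment CPUs such as
SPARC. A minimal sketch of one common way to avoid this kind of fault is to
copy the bytes into an aligned local with memcpy(). It assumes the name is a
struct of two 32-bit fields; example_name_t and example_name_to_u64 are
hypothetical names for illustration, not Open MPI's, and this is not
necessarily what the actual fix does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a process name: two 32-bit fields, so the
 * struct is only guaranteed 4-byte alignment. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

/* Build a 64-bit hash key from the name without ever dereferencing a
 * (uint64_t *) cast of the struct pointer.  memcpy() makes no alignment
 * assumption about its source, and 'key' itself is properly aligned. */
uint64_t example_name_to_u64(const example_name_t *name)
{
    uint64_t key;

    assert(sizeof(key) == sizeof(*name));
    memcpy(&key, name, sizeof(key));
    return key;
}
```

Compilers typically turn a fixed-size memcpy() like this into plain loads
and stores, so the portable version should cost little or nothing on
platforms that tolerate unaligned access.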

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] 2.0.1rc2 released

2016-08-31 Thread Paul Hargrove
On Tue, Aug 30, 2016 at 1:38 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Aug 30, 2016, at 4:06 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > I will report my findings as they come in from my testers.
> > However, NERSC is down for quarterly maintenance which means I am w/o
> Intel compilers today.
> >
> > I am proud to have been verb-ified, but could I get some clarification
> on which "Hargroved" items are fixed?
> >
> > I *am* expecting that the following are included:
> > + sec/native on Solaris (PR 1336)
> > + pmix use of strnlen() [requires unsupported Mac OS X 10.6 to verify]
> > + README updates [which I will not be proof reading]
> >
> > I am currently assuming these are *not* fixed for this rc (all have
> 2.0.2 milestone):
> > + Support for NAG Fortran (PR 1215)
> > + xlc-12.1 inline atomics (PR 1344)
> > + The memory/patcher issues with xlc (PR 1347)
>
> Correct on all counts.
>
> We talked today on the webex about releasing today/tomorrow, but Ralph
> just identified a second part to the stdin wireup issue, and we're still
> testing a COMM_SPAWN issue.  So we might push back a little further... :-\



With NERSC back in operation late last night I was able to get the Intel
compiler tests taken care of.
Also overnight my slow ARM and MIPS emulators finished.

Overall everything I had tested previously is as expected - things listed
as fixed are fixed.
No previously-unknown problems were seen on the platforms on which I had
tested 2.0.0rc1.

HOWEVER, I was able to get a full run on an emulated SPARCv9 platform for
the first time.
It FAILED with a SIGBUS, which I will report shortly in a new thread.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-30 Thread Paul Hargrove
Christopher,

I mentioned the redirect only because you said something about "shouldn't
be using SSL anyway".

My testing script always disables the cert check because it has been needed
often enough to become my preferred default.


phargrov@sparc64:~$ sudo apt-get update
[...]
W: Failed to fetch http://security.debian.org/dists/wheezy/updates/Release
 Unable to find expected entry 'main/binary-sparc/Packages' in Release file
(Wrong sources.list entry or malformed file)

E: Some index files failed to download. They have been ignored, or old ones
used instead.

phargrov@sparc64:~$ apt-cache policy wget
wget:
  Installed: 1.13.4-3+deb7u2
  Candidate: 1.13.4-3+deb7u2
  Version table:
 *** 1.13.4-3+deb7u2 0
500 http://ftp.us.debian.org/debian/ wheezy/main sparc Packages
500 http://security.debian.org/ wheezy/updates/main sparc Packages
100 /var/lib/dpkg/status

I looked into that warning and error from "apt-get update" and found
that the sparc files that were supposed to move from security.debian.org to
archive.debian.org never made it.
So, the sparc platform is a bit more orphaned than it already was when
support stopped at Wheezy.


-Paul

On Tue, Aug 30, 2016 at 7:16 PM, Christopher Samuel <sam...@unimelb.edu.au>
wrote:

> On 31/08/16 12:05, Paul Hargrove wrote:
>
> > As Gilles mentions, the http: redirects to https: before anything is
> > fetched.
> > Replacing "-nv" in the wget command with "-v" shows that redirect
> clearly.
>
> Agreed, but it still just works on Debian Wheezy for me. :-)
>
> What does "apt-cache policy wget" say for you?
>
> root@db3:/tmp# apt-cache policy wget
> wget:
>   Installed: 1.13.4-3+deb7u3
>   Candidate: 1.13.4-3+deb7u3
> [...]
>
> Here's the plain wget, with redirect, don't even need to disable the
> certificate check here on Debian Wheezy (though it still works if you do).
>
> root@db3:/tmp# wget http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
> --2016-08-31 12:11:59--  http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
> Resolving www.open-mpi.org (www.open-mpi.org)... 192.185.39.252
> Connecting to www.open-mpi.org (www.open-mpi.org)|192.185.39.252|:80... connected.
> HTTP request sent, awaiting response... 302 Found
> Location: https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2 [following]
> --2016-08-31 12:11:59--  https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
> Connecting to www.open-mpi.org (www.open-mpi.org)|192.185.39.252|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 8192091 (7.8M) [application/x-tar]
> Saving to: `openmpi-2.0.1rc2.tar.bz2'
>
> 100%[====>] 8,192,091   1.75M/s   in 7.3s
>
> 2016-08-31 12:12:08 (1.07 MB/s) - `openmpi-2.0.1rc2.tar.bz2' saved [8192091/8192091]
>
>
> All the best,
> Chris
> --
>  Christopher SamuelSenior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.1rc2] CRITICAL error in README

2016-08-30 Thread Paul Hargrove
I believe that both the addresses and subscription URLs for the mailing
lists are out-of-date in the README excerpt below (they were highlighted in
red in the original message).
I don't know if the list addresses might be forwarding, but those
subscription URLs are definitely 404.

-Paul

The best way to report bugs, send comments, or ask questions is to
sign up on the user's and/or developer's mailing list (for user-level
and developer-level questions; when in doubt, send to the user's
list):

us...@open-mpi.org
de...@open-mpi.org

Because of spam, only subscribers are allowed to post to these lists
(ensure that you subscribe with and post from exactly the same e-mail
address -- j...@example.com is considered different than
j...@mycomputer.example.com!).  Visit these pages to subscribe to the
lists:

 http://www.open-mpi.org/mailman/listinfo.cgi/users
 http://www.open-mpi.org/mailman/listinfo.cgi/devel

Thanks for your time.





-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-30 Thread Paul Hargrove
Gilles,

I am not using a proxy at the http level, but am behind a NAT box (which
should not make any difference).
Both http: and https: URLs fail in the same manner with wget, while curl is
fine with either.

Christopher,

As Gilles mentions, the http: redirects to https: before anything is fetched.
Replacing "-nv" in the wget command with "-v" shows that redirect clearly.

All,

I am on a big-endian (SPARC) host, which I did not mention previously.
So, perhaps there is a wget or GnuTLS bug on Wheezy that is endian or arch
specific.

-Paul

On Tue, Aug 30, 2016 at 6:31 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Paul,
>
>
> http://www.open-mpi.org redirects to https://www.open-mpi.org
>
> (note http vs https)
>
> if you are using a manual proxy, both http_proxy and https_proxy should be
> configured in your /etc/wgetrc
>
>
> if you are behind a transparent proxy, you have to make sure it allows
> https to www.open-mpi.org
>
> (some transparent proxies might simply close the connection to a
> non-whitelisted https site, hence the confusing error message)
>
>
> btw, you might want to directly wget https ...
>
> $ wget -nv --no-check-certificate https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
>
> Cheers,
>
> Gilles
>
> On 8/31/2016 9:44 AM, Christopher Samuel wrote:
>
> On 31/08/16 06:22, Paul Hargrove wrote:
>
>
> It seems that a stock Debian Wheezy system cannot even *download* Open
> MPI any more:
>
> Works for me, both http (which shouldn't be using SSL anyway) and https.
>
> Are you behind some weird intercepting proxy?
>
> root@db3:/tmp# wget -nv --no-check-certificate 
> http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
> 2016-08-31 10:42:34 
> URL:https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
>  [8192091/8192091] -> "openmpi-2.0.1rc2.tar.bz2" [1]
>
> root@db3:/tmp# wget -nv --no-check-certificate 
> https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
> 2016-08-31 10:43:10 
> URL:https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
>  [8192091/8192091] -> "openmpi-2.0.1rc2.tar.bz2.1" [1]
>
> root@db3:/tmp# cat /etc/issue
> Debian GNU/Linux 7 \n \l
>
> cheers,
> Chris
>
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] pmix3x broken on Solaris/x86 (no flock)

2016-08-30 Thread Paul Hargrove
Just tried to build last night's master tarball on Solaris-11.3 on x86-64
hardware (but keep in mind the default ABI is ILP32).

I see the following fatal error

"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 305: warning: implicit function declaration: flock
(E_NO_IMPLICIT_DECL_ALLOWED)
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 305: undefined symbol: LOCK_EX
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 321: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 335: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 348: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 369: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 410: undefined symbol: LOCK_SH
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 435: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 444: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 454: undefined symbol: LOCK_UN
"/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c",
line 561: undefined symbol: LOCK_UN
cc: acomp failed for
/shared/OMPI/openmpi-master-solaris11-x86-ib-ss12u5/openmpi-dev-4716-g99b2664/opal/mca/pmix/pmix3x/pmix/src/dstore/pmix_esh.c

This is, of course, because flock() is a BSDism not defined by POSIX and
therefore not present on SysV systems.
For information on the more portable fcntl()-based alternative, see for
instance https://www.perkin.org.uk/posts/solaris-portability-flock.html
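
A minimal sketch of the fcntl()-based alternative mentioned above, assuming
whole-file advisory locks are sufficient. example_lock_file is a hypothetical
helper for illustration, not the PMIx code. F_WRLCK stands in for LOCK_EX,
F_RDLCK for LOCK_SH, and F_UNLCK for LOCK_UN.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: apply a whole-file advisory lock with fcntl().
 * lock_type is F_WRLCK (exclusive), F_RDLCK (shared), or F_UNLCK (release).
 * Returns 0 on success and -1 on error, with errno set by fcntl(). */
int example_lock_file(int fd, short lock_type)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = lock_type;
    fl.l_whence = SEEK_SET;   /* offsets are relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;             /* a length of 0 means "through end of file" */

    /* F_SETLKW blocks until the lock can be granted, like flock()
     * without LOCK_NB. */
    return fcntl(fd, F_SETLKW, &fl);
}
```

The semantics are not identical to flock(): fcntl() locks belong to the
process rather than to the open file description, and they are released when
the process closes any descriptor for the file. F_WRLCK also requires a
descriptor opened for writing, and F_RDLCK one opened for reading.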

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] FYI: soon to lose IA64 access

2016-08-30 Thread Paul Hargrove
FWIW, I can confirm that 'master' currently works fine on IA64 using
builtin atomic support (which is the default on 'master').
-Paul

On Tue, Aug 30, 2016 at 4:56 PM, Nathan Hjelm <hje...@me.com> wrote:

> This might be the last straw in IA64 support. If we can’t even test it
> anymore it might *finally* be time to kill the asm. If someone wants to use
> IA64 they can use the builtin atomic support.
>
> -Nathan
>
> > On Aug 30, 2016, at 4:42 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > I don't recall the details of the last discussion over which CPU
> architectures would be dropped effective when.
> > However, apparently IA64 support is still present in both 2.0.1rc2 and
> master
> >
> > I suspect that I am currently the only member of this community with the
> ability to test IA64.
> > So, I thought I should let you know that the owners of the 4-CPU
> IA64-based Altix I've been using are planning to retire it.
> > No date has been set that I know of, but "soon".
> >
> > If any of you are interested in receiving a donation of a 2U rack-mount
> Altix system, I can make the necessary introductions.
> > I have already declined the offer.
> >
> > -Paul
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] FYI: soon to lose IA64 access

2016-08-30 Thread Paul Hargrove
I don't recall the details of the last discussion over which CPU
architectures would be dropped effective when.
However, apparently IA64 support is still present in both 2.0.1rc2 and
master

I suspect that I am currently the only member of this community with the
ability to test IA64.
So, I thought I should let you know that the owners of the 4-CPU IA64-based
Altix I've been using are planning to retire it.
No date has been set that I know of, but "soon".

If any of you are interested in receiving a donation of a 2U rack-mount
Altix system, I can make the necessary introductions.
I have already declined the offer.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] pmix3x passing -Wall to *all* compilers.

2016-08-30 Thread Paul Hargrove
On a recent nightly tarball of 'master' (but not v2.0) I am seeing "-Wall"
among the options passed to the compiler without any configure test to
ensure this option is safe.  The root cause seems to be the following
(final) line in opal/mca/pmix/pmix3x/pmix/src/Makefile.am:
AM_CFLAGS = -Wall

I find that xlc-13.1 warns:

libtool: compile:  xlc -DHAVE_CONFIG_H -I.
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/pmix/pmix3x/pmix/src
  -I../src/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/BLD/opal/mca/pmix/pmix3x/pmix
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/pmix/pmix3x/pmix
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/pmix/pmix3x/pmix/src
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/BLD/opal/mca/pmix/pmix3x/pmix/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/pmix/pmix3x/pmix/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/BLD
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/BLD/opal/include
  -D_REENTRANT
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/hwloc/hwloc1113/hwloc/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/BLD/opal/mca/hwloc/hwloc1113/hwloc/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/event/libevent2022/libevent
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/event/libevent2022/libevent/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/BLD/opal/mca/event/libevent2022/libevent/include
  -Wall -q64 -g -c
  /gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-13.1/openmpi-dev-4711-gd33204b/opal/mca/pmix/pmix3x/pmix/src/class/pmix_object.c
  -Wp,-qmakedep=gcc,-MFclass/.deps/pmix_object.TPlo  -qpic -DPIC -o class/.libs/pmix_object.o
/opt/ibm/xlC/13.1.0/bin/.orig/xlc: 1501-289 (W) Option -Wall was incorrectly specified. The option will be ignored.

However, xlc-11.1 (and presumably many other compilers) will choke on this
option:

libtool: compile:  xlc -DHAVE_CONFIG_H -I.
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/pmix/pmix3x/pmix/src
  -I../src/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/BLD/opal/mca/pmix/pmix3x/pmix
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/pmix/pmix3x/pmix
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/pmix/pmix3x/pmix/src
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/BLD/opal/mca/pmix/pmix3x/pmix/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/pmix/pmix3x/pmix/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/BLD
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/BLD/opal/include
  -D_REENTRANT
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/hwloc/hwloc1113/hwloc/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/BLD/opal/mca/hwloc/hwloc1113/hwloc/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/event/libevent2022/libevent
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/event/libevent2022/libevent/include
  -I/gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/BLD/opal/mca/event/libevent2022/libevent/include
  -Wall -DNDEBUG -O -DNDEBUG -q64 -qsuppress=1501-274 -c
  /gpfs-biou/phh1/OMPI/openmpi-master-linux-ppc64-xlc-11.1/openmpi-dev-4712-ga6d515b/opal/mca/pmix/pmix3x/pmix/src/class/pmix_object.c
  -Wp,-qmakedep=gcc,-MFclass/.deps/pmix_object.TPlo  -qpic -DPIC -o class/.libs/pmix_object.o
/opt/apps/ibm/vac/11.1/bin/.orig/xlc: 1501-210 (S) command option Wall contains an incorrect subargument

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Off-topic re: supporting old systems

2016-08-30 Thread Paul Hargrove
On Tue, Aug 30, 2016 at 1:42 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Aug 30, 2016, at 4:22 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > It seems that a stock Debian Wheezy system cannot even *download* Open
> MPI any more:
> >
> > $ wget -nv --no-check-certificate http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
> > GnuTLS: A TLS packet with unexpected length was received.
> > Unable to establish SSL connection.
> >
> > I assume this is because modern web servers have dropped support for old
> ciphers and hashes that are insecure, and the Wheezy-era GnuTLS
> implementation lacks support for the newer ones.
> > So, this is just an observation, not a support request.
>
> Woof!
>
> If someone who is knowledgeable in these areas can confirm that Paul's
> supposition is correct, that would be great...
>
>

FWIW:
The stock "curl" in that distro worked where "wget" did not.
So, my conclusion that "a stock Debian Wheezy system cannot even *download*
Open MPI" was premature.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] Off-topic re: supporting old systems

2016-08-30 Thread Paul Hargrove
On Tue, Aug 30, 2016 at 10:49 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
[...]
>
> I still have systems running Red Hat Linux 8 (that would be something like
> Fedora "negative 3").
> I had to accept that Open MPI moved forward while I did not - I use Open
> MPI 1.6.5 on that system.
> When I reported a build issue with 2.0.1rc1 on Mac OS X 10.6, this
> community responded by dropping support for not just OS X 10.6, but also
> 10.7.
>
[...]


It seems that a stock Debian Wheezy system cannot even *download* Open MPI
any more:

$ wget -nv --no-check-certificate
http://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1rc2.tar.bz2
GnuTLS: A TLS packet with unexpected length was received.
Unable to establish SSL connection.

I assume this is because modern web servers have dropped support for old
ciphers and hashes that are insecure, and the Wheezy-era GnuTLS
implementation lacks support for the newer ones.
So, this is just an observation, not a support request.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] 2.0.1rc2 released

2016-08-30 Thread Paul Hargrove
I will report my findings as they come in from my testers.
However, NERSC is down for quarterly maintenance which means I am w/o Intel
compilers today.

I am proud to have been verb-ified, but could I get some clarification on
which "Hargroved" items are fixed?

I *am* expecting that the following are included:
+ sec/native on Solaris (PR 1336)
+ pmix use of strnlen() [requires unsupported Mac OS X 10.6 to verify]
+ README updates [which I will not be proof reading]

I am currently assuming these are *not* fixed for this rc (all have 2.0.2
milestone):
+ Support for NAG Fortran (PR 1215)
+ xlc-12.1 inline atomics (PR 1344)
+ The memory/patcher issues with xlc (PR 1347)

Let me know if either of those lists is incorrect, or if there is anything
else recent for which you are expecting my tests to validate.

-Paul

On Mon, Aug 29, 2016 at 2:00 PM, Jeff Squyres (jsquyres)  wrote:

> In the usual place:
>
> https://www.open-mpi.org/software/ompi/v2.0/
>
> Please test!
>
> We fixed a few things since rc1:
>
> - several Hargroved items
> - stdin propagation
>
> There may still be some OSHMEM issues.  We'll discuss these on the Webex
> tomorrow.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] C89 support

2016-08-30 Thread Paul Hargrove
Nathan,

Unless I have misunderstood both Chris and the Clang bug report, the
problematic functions are in glibc.
So, addition of the gnu_inline attribute would probably require either
modifying system headers or interposing ahead of them.

-Paul

On Tue, Aug 30, 2016 at 8:30 AM, Nathan Hjelm  wrote:

> The best way to put this is that his compiler defaults to --std=gnu89. That
> gives him about 90% of what we require from C99 but has weirdness like
> __restrict. The real solution is to take the list of functions that are
> called out at link time and spot-fix them with the gnu_inline attribute if
> -fgnu89-inline does not work.
>
> -Nathan
>
> On Aug 30, 2016, at 09:23 AM, "r...@open-mpi.org"  wrote:
>
> Chris
>
> At the risk of being annoying, it would really help me if you could answer
> my question: is Gilles correct in his feeling that we are looking at a
> scenario where you can support 90% of C99 (e.g., C99-style comments, named
> structure fields), and only the things modified in this PR are required?
>
> I’m asking because these changes are minor and okay, but going back thru
> the code to revise all the comments and other C99isms would be a rather
> large task.
>
>
> On Aug 30, 2016, at 7:06 AM, C Bergström wrote:
>
> On Tue, Aug 30, 2016 at 9:20 PM, Jeff Squyres (jsquyres) wrote:
>
> On Aug 29, 2016, at 11:42 PM, C Bergström wrote:
>
> Paul - Is this your typical post? I can't tell if you're trying to be
> rude or it's accidental.
>
> I believe that multiple people on this thread are reacting to the
> passive-aggressive tones and negative connotations in charged replies.
>
> Total bullshit - If any of my replies were "charged", passive
> aggressive or otherwise that was not my intention. Anyone who I
> thought has replied rudely, I have called out directly and I don't
> mince words.
>
> I'm not interested to spend 50 replies on 3 small patches. If you guys
> don't care about platform X, Foo compiler or older standards I respect
> that. My 1st email started with what I consider a humble tone. My
> patches are public and I've given all the details I have time for.
>
> Last try
>
> I'd like to see:
>
> 1. The specific link error that we're talking about.
>
> As posted before - the error is *exactly* the same as in the public
> clang bug report.
>
> (Thanks to Nathan)
> https://webcache.googleusercontent.com/search?q=cache:p2WZm7Vlt2gJ:https://llvm.org/bugs/show_bug.cgi%3Fid%3D5960+=1=en=clnk=us
>
> 2. All the information listed on https://www.open-mpi.org/community/help/
> for compile/build problems.
>
> I'm not going to sift through this wall of text to try to figure out
> what you feel is missing. (Now my tone is "curt" if I have to be
> clear)
>
> 3. More complete patches for fixing the issues. Specifically, the 3
> provided patches fix certain issues in some parts of the code base, but the
> same issues occur in other places in the code base. As such, the provided
> patches are not complete.
>
> The patches against 1.x are complete. If you want to test and fix 2.x
> branch or git master feel free to pull my patches when I push them to
> our github.
>
> You can verify the patches with clang and SLES10. In the near future
> it's likely I'll even post prebuilt binaries of clang which could be
> used for easier validation. There's also of course the nightly EKOPath
> builds that are available.. etc etc
>
> In parting - I will test LDFLAGS|CFLAGS="-fgnu89-inline" and if it
> does indeed fix the issue without side effects I'll let you guys know.
>
> ___
>
> devel mailing list
>
> devel@lists.open-mpi.org
>
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
>
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] C89 support

2016-08-30 Thread Paul Hargrove
On Tue, Aug 30, 2016 at 7:06 AM, C Bergström 
wrote:

>
> >
> > 3. More complete patches for fixing the issues.  Specifically, the 3
> provided patches fix certain issues in some parts of the code base, but the
> same issues occur in other places in the code base.  As such, the provided
> patches are not complete.
>
> The patches against 1.x are complete. If you want to test and fix 2.x
> branch or git master feel free to pull my patches when I push them to
> our github.



I think the "completeness" issue might be misunderstood.
There are numerous components in Open MPI, some of which will not be
compiled if configure fails to find the necessary dependencies.
So, there is a concern that your patches may be complete for your
customer's site but not for another.
For instance, a SLES10 install with headers and libs for some additional
supported network will compile files you have not patched.

Take, for instance, the case of declaring the loop control variable in the
initialization clause of a for-statement.
In the 1.10.3rc4 source I have on hand, I find (at least) 112 instances in
51 files:

$  grep -r 'for *(int' . | wc -l
112
$ grep -rl 'for *(int' . | wc -l
51


So, the fact that your patches touch only a small fraction of those is
enough to raise the concern over their completeness.
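
As an illustration only (not taken from the Open MPI sources), the construct
counted by the grep above and its C89-compatible rewrite look roughly like
the sketch below. The function names sum_c99 and sum_c89 are made up for
this example.

```c
#include <stddef.h>

/* C99 style: the loop control variable is declared in the initialization
 * clause of the for-statement. A strict C89 compiler rejects this. */
int sum_c99(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

/* C89 style: the same loop with the variable declared at the top of the
 * enclosing block. Behavior is identical. */
int sum_c89(const int *v, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

Each of the instances found by grep would need the second form for a
C89-only compiler, regardless of which components a given site builds.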


On another, tangentially related, point:
When using clang on some systems with older glibc, I have found CC='clang
-U__GNUC__' to be helpful (and similarly for CXX, when used).
Unfortunately, that is likely to also disable inline asm that might
otherwise be used.
This is also not likely to fix the sort of problem described in Clang Bug
5960.
So, I mention it only on the off-chance it proves useful to somebody.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] C89 support

2016-08-30 Thread Paul Hargrove
On Tue, Aug 30, 2016 at 7:06 AM, C Bergström 
wrote:

> On Tue, Aug 30, 2016 at 9:20 PM, Jeff Squyres (jsquyres)
>  wrote:
> > On Aug 29, 2016, at 11:42 PM, C Bergström 
> wrote:
> >>
> >> Paul - Is this your typical post? I can't tell if you're trying to be
> >> rude or it's accidental.
> >
> > I believe that multiple people on this thread are reacting to the
> passive-aggressive tones and negative connotations in charged replies.
>
> Total bullshit - If any of my replies were "charged", passive
> aggressive or otherwise that was not my intention.



If opening your post with the phrase "Total bullshit" is not "charged" then
I don't know how to define that term.
If you were not guilty of the charge Jeff leveled against you before, then
you are now.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] C89 support

2016-08-30 Thread Paul Hargrove
Responses inline, below.


On Mon, Aug 29, 2016 at 8:42 PM, C Bergström <cbergst...@pathscale.com>
wrote:

> On Tue, Aug 30, 2016 at 5:49 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > On Mon, Aug 29, 2016 at 8:32 AM, C Bergström <cbergst...@pathscale.com>
> > wrote:
> > [...snip...]
> >>
> >> Based on the latest response - it seems that we'll just fork OMPI and
> >> maintain those patches on top. I'll advise our customers not to use
> >> OMPI and document why.
> >>
> >> Thanks again
> >> ___
> >> devel mailing list
> >> devel@lists.open-mpi.org
> >> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> >
> >
> >
> > Though I participate on this list, I am not one of the Open MPI
> developers,
> > and do not pretend to speak for them.
> >
> > So, speaking only for myself, I already recommend that users of any
> recent
> > Open MPI avoid compiling it using the PathScale compilers.
> > My own testing shows that both ekopath-5.0.5 and ekopath-6.0.527
> experience
> > Internal Compiler Errors or SEGVs when building Open MPI, and at least
> one
> > other package I care about (GASNet).
> > So I think you can understand why I find it ironic that PathScale should
> > request that the Open MPI sources revert to C89 to support PathScale
> > compilers for an EOL distro.
>
> Paul - Is this your typical post? I can't tell if you're trying to be
> rude or it's accidental.
>


I am well known on this list and I think others will agree this was not my
typical post.
It is my response to your prior post: "I'll advise our customers not to use
OMPI and document why" which I found to be rude and inappropriate.

When a colleague and I reported problems back in 2011 with the Open Sourced
PathScale4, you were unhelpful at the time.
We could not get the release to work on Linux, FreeBSD or Solaris and then
you apparently abruptly stopped responding.
Perhaps that previous experience (unintentionally/unconsciously) put a bit
of additional "bite" into my posting this time.



> Moving your complaint to more technical points
> #0 As stated before this issue is not exclusive to PathScale, but
> inherited from clang and root caused by glibc.
>
> A forum post with a similar complaint/question
> http://clang-developers.42468.n3.nabble.com/minimum-glibc-on
> -Linux-needed-to-work-with-clang-in-c99-mode-td2093917.html
>
> clang bugzilla is currently limited access, but when back to public
> you can get more details here
> https://llvm.org/bugs/show_bug.cgi?id=5960
>
>
> Again thanks for hijacking the thread, but in regards to your issue
> #1 Have you tested a newer version? (You appear to be more than a year
> off in versions and not on anything officially supported)


I have no control over the versions installed on the systems where I have
access to PathScale compilers.
As a guest by courtesy at that particular institution I have very little
influence regarding installations of new software.
It appears they no longer have any user demand for the eko compiler suite.

When I visited www.pathscale.com prior to my previous posting, I could not
find any information about what versions are current.
So I had no passive means by which to determine that the releases I had
tested were so out-of-date as to be unsupported.
Is such information available somewhere?  Should I email "sales" for this
sort of information in the future?



> #2 Have you ever filed a support request with us?
>

I reported the issues twice here on this list and Jeff Squyres subsequently
indicated the problem had been reported to PathScale.
For reference:
https://mail-archive.com/devel%40lists.open-mpi.org/msg18945.html
https://mail-archive.com/devel%40lists.open-mpi.org/msg19205.html (and
Jeff's reply stating the issue had been reported)




>
> #3 You should realize that we're in the process of trying to setup
> versions of OpenMPI that are validated and 100% tested. (Thus trying
> to avoid problems like this going forward)
>

I think that is a good thing.
I work in (for?) this community as the "oddball" that tests on all the
compilers and OSes that I can possibly find.
So, if I  know that PathScale tests new releases of Open MPI with their own
compilers then I can stop trying to use the installs you have noted are
out-of-date.



>
> I have no problem taking a hit on a bug or some issue, but I would
> hope that anyone with an ironic sense of humor would fact check before
> complaining publicly.
>

If I have made any factual errors, even by omission, then I apologize.
It is not my intention to win some imagined debate by deception.



>
> My motivation isn't 

Re: [OMPI devel] C89 support

2016-08-29 Thread Paul Hargrove
On Mon, Aug 29, 2016 at 8:32 AM, C Bergström 
wrote:
[...snip...]

> Based on the latest response - it seems that we'll just fork OMPI and
> maintain those patches on top. I'll advise our customers not to use
> OMPI and document why.
>
> Thanks again
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>


Though I participate on this list, I am not one of the Open MPI developers,
and do not pretend to speak for them.

So, speaking only for myself, I already recommend that users of any recent
Open MPI avoid compiling it using the PathScale compilers.
My own testing shows that both ekopath-5.0.5 and ekopath-6.0.527 experience
Internal Compiler Errors or SEGVs when building Open MPI, and at least one
other package I care about (GASNet).
So I think you can understand why I find it ironic that PathScale should
request that the Open MPI sources revert to C89 to support PathScale
compilers for an EOL distro.

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0-latest] unreachable code

2016-08-27 Thread Paul Hargrove
Oracle Studio 12.5 C compiler on a recent v2.0 nightly tarball points out
the following unreachable code:

"/shared/OMPI/openmpi-2.0-latest-solaris11-x86-ib-ss12u5/openmpi-v2.0.0-227-g917d293/opal/mca/base/mca_base_component_repository.c",
line 265: warning: statement not reached
"/shared/OMPI/openmpi-2.0-latest-solaris11-x86-ib-ss12u5/openmpi-v2.0.0-227-g917d293/opal/mca/btl/openib/btl_openib.c",
line 498: warning: statement not reached
"/shared/OMPI/openmpi-2.0-latest-solaris11-x86-ib-ss12u5/openmpi-v2.0.0-227-g917d293/opal/mca/btl/openib/connect/btl_openib_connect_udcm.c",
line 2735: warning: statement not reached
"/shared/OMPI/openmpi-2.0-latest-solaris11-x86-ib-ss12u5/openmpi-v2.0.0-227-g917d293/ompi/mca/io/romio314/romio/adio/common/ad_fstype.c",
line 306: warning: statement not reached
"/shared/OMPI/openmpi-2.0-latest-solaris11-x86-ib-ss12u5/openmpi-v2.0.0-227-g917d293/ompi/mca/io/romio314/romio/adio/common/ad_threaded_io.c",
line 31: warning: statement not reached

I will ignore the ROMIO ones, leaving 3 to diagnose.

1) opal/mca/base/mca_base_component_repository.c, line 265:
   260  *framework_components = NULL;
   261  #if OPAL_HAVE_DL_SUPPORT
   262  return opal_hash_table_get_value_ptr (&mca_base_component_repository, framework->framework_name,
   263                                         strlen (framework->framework_name), (void **) framework_components);
   264  #endif
 > 265  return OPAL_ERR_NOT_FOUND;
   266  }
   267
   268  #if OPAL_HAVE_DL_SUPPORT
   269  static void mca_base_component_repository_release_internal (mca_base_component_repository_item_t *ri) {
   270  int group_id;

The unreachable warning is easily fixed with an '#else' clause as follows:

--- opal/mca/base/mca_base_component_repository.c~  Sat Aug 27 02:03:23 2016
+++ opal/mca/base/mca_base_component_repository.c   Sat Aug 27 02:03:41 2016
@@ -261,8 +261,9 @@
 #if OPAL_HAVE_DL_SUPPORT
 return opal_hash_table_get_value_ptr (&mca_base_component_repository, framework->framework_name,
   strlen (framework->framework_name), (void **) framework_components);
-#endif
+#else
 return OPAL_ERR_NOT_FOUND;
+#endif
 }

 #if OPAL_HAVE_DL_SUPPORT
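
The same pattern in isolation, as a minimal self-contained sketch. All names
here (HAVE_LOOKUP, ERR_NOT_FOUND, lookup, find_value) are hypothetical
stand-ins and not Open MPI identifiers. The point is only that the fallback
return lives in the '#else' branch, so no statement ever follows an
unconditional return.

```c
#include <stddef.h>

/* HAVE_LOOKUP is a hypothetical feature macro standing in for
 * OPAL_HAVE_DL_SUPPORT; define it to 0 to build the fallback branch. */
#ifndef HAVE_LOOKUP
#define HAVE_LOOKUP 1
#endif

#define ERR_NOT_FOUND (-1)

#if HAVE_LOOKUP
/* Hypothetical helper: index of key in table, or ERR_NOT_FOUND. */
int lookup(const int *table, size_t n, int key)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (table[i] == key) {
            return (int) i;
        }
    }
    return ERR_NOT_FOUND;
}
#endif

int find_value(const int *table, size_t n, int key)
{
#if HAVE_LOOKUP
    /* Feature present: this return is the only exit from the function. */
    return lookup(table, n, key);
#else
    /* Feature absent: compiled only in this configuration, so it can
     * never be flagged as "statement not reached". */
    (void) table; (void) n; (void) key;
    return ERR_NOT_FOUND;
#endif
}
```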



2) opal/mca/btl/openib/btl_openib.c line 498 is also due to a return right
after an #endif, and can also be fixed with an #else.

   493  case IBV_LINK_LAYER_UNSPECIFIED:
   494  default:
   495  return MCA_BTL_OPENIB_TRANSPORT_UNKNOWN;
   496  }
   497  #endif
 > 498  return MCA_BTL_OPENIB_TRANSPORT_IB;
   499
   500  case IBV_TRANSPORT_IWARP:
   501  return MCA_BTL_OPENIB_TRANSPORT_IWARP;
   502
   503  case IBV_TRANSPORT_UNKNOWN:

3) opal/mca/btl/openib/connect/btl_openib_connect_udcm.c line 2735 looks
like debugging code (the "while(1)" on line 2734) that maybe was not
intended to be committed:

  2730  IBV_QP_PORT | IBV_QP_ACCESS_FLAGS);
  2731  if (ret) {
  2732  BTL_ERROR(("Error modifying XRC recv QP[%x] to IBV_QPS_INIT, errno says: %s [%d]",
  2733 lcl_ep->xrc_recv_qp_num, strerror(ret), ret));
  2734  while(1);
> 2735  return OPAL_ERROR;
  2736  }
  2737  #endif
  2738
  2739  memset(&attr, 0, sizeof(struct ibv_qp_attr));
  2740  attr.qp_state   = IBV_QPS_RTR;


-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.1rc1] type warnings from clang

2016-08-27 Thread Paul Hargrove
Building w/ clang-3.4.2 on Linux/x86-64:

/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang/openmpi-2.0.1rc1/opal/mca/btl/openib/btl_openib_component.c:2158:21:
warning: implicit conversion from enumeration type
'btl_openib_receive_queues_source_t' to different enumeration type
'mca_base_var_source_t' [-Wenum-conversion]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix1_client.c:408:19:
warning: implicit conversion from enumeration type 'opal_pmix_scope_t' to
different enumeration type 'pmix_scope_t' [-Wenum-conversion]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/adio/common/utils.c:97:3:
warning: passing 'const MPI_Aint *' (aka 'const long *') to parameter of
type 'MPI_Aint *' (aka 'long *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang/openmpi-2.0.1rc1/oshmem/mca/spml/yoda/spml_yoda.c:644:33:
warning: incompatible pointer types passing 'oshmem_proc_t **' (aka 'struct
oshmem_proc_t **') to parameter of type 'struct ompi_proc_t **'
[-Wincompatible-pointer-types]

Adding -m32 yields the following, which all seem to show ROMIO's ADIO_Offset
being 64 bits wide, while MPI_Offset and MPI_Count are only 32 bits wide:

/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/get_bytoff.c:66:44:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/get_posn.c:55:33:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/get_posn_sh.c:56:36:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/seek.c:76:30:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/seek.c:97:32:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/seek_sh.c:104:37:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/mpi-io/seek_sh.c:133:36:
warning: incompatible pointer types passing 'MPI_Offset *' (aka 'long *')
to parameter of type 'ADIO_Offset *' (aka 'long long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/adio/common/ad_coll_exch_new.c:160:35:
warning: incompatible pointer types passing 'ADIO_Offset *' (aka 'long long
*') to parameter of type 'MPI_Count *' (aka 'long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/adio/common/ad_read_coll.c:857:5:
warning: incompatible pointer types passing 'ADIO_Offset *' (aka 'long long
*') to parameter of type 'const MPI_Count *' (aka 'const long *')
[-Wincompatible-pointer-types]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/io/romio314/romio/adio/common/ad_write_coll.c:615:8:
warning: incompatible pointer types passing 'ADIO_Offset *' (aka 'long long
*') to parameter of type 'const MPI_Count *' (aka 'const long *')
[-Wincompatible-pointer-types]
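
To show why these pointer mismatches matter when the two types differ in
width, here is a small sketch. It is not a proposed ROMIO fix; offset64_t,
get_position and get_position_as_long are hypothetical names, with
offset64_t playing the role of ADIO_Offset and plain long the role of a
32-bit MPI_Offset.

```c
/* Stand-in for ADIO_Offset: always at least 64 bits wide. */
typedef long long offset64_t;

/* Hypothetical callee that writes a full 64-bit value through its
 * out-parameter, as the ROMIO routines in the warnings do. */
void get_position(offset64_t *out)
{
    *out = 123456789LL;
}

/* Passing a 'long *' directly to get_position() would let the callee
 * write 8 bytes into a 4-byte object when long is 32 bits wide.
 * Going through a correctly typed temporary avoids that; the narrowing
 * is then explicit and happens by value. */
long get_position_as_long(void)
{
    offset64_t tmp = 0;
    get_position(&tmp);
    return (long) tmp;   /* may truncate where long is 32 bits */
}
```

Whether truncation is acceptable is a separate question; the sketch only
removes the out-of-bounds write that the warning hints at.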



Use of -m32 also generates the following printf format warnings (after removing
the ones for types of equal size but different signedness):

/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/fcoll/two_phase/fcoll_two_phase_support_fns.c:175:36:
warning: format specifies type 'long long' but the argument has type 'long'
[-Wformat]
/scratch/phargrov/OMPI/openmpi-2.0.1rc1-linux-x86_64-clang-m32/openmpi-2.0.1rc1/ompi/mca/fcoll/two_phase/fcoll_two_phase_support_fns.c:175:44:
warning: format specifies type 'long long' but the argument has type 'long'
[-Wformat]
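
One common way to silence this class of warning, sketched under the
assumption that the argument really is a 'long' as clang reports. The
function name format_offset is made up; this is not the actual code at
fcoll_two_phase_support_fns.c:175.

```c
#include <stdio.h>

/* Formats a 'long' with a "%lld" conversion. The cast makes the argument
 * type match the format on both ILP32 and LP64 targets, so -Wformat has
 * nothing to complain about. */
int format_offset(char *buf, size_t len, long value)
{
    return snprintf(buf, len, "%lld", (long long) value);
}
```

The alternative is to change the conversion to "%ld" and leave the argument
alone; which is preferable depends on the intended type of the variable.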

[OMPI devel] [2.0.1rc1] ppc64 atomics (still) broken w/ xlc-12.1

2016-08-27 Thread Paul Hargrove
I didn't get to test 2.0.1rc1 with xlc-12.1 until just now because I need a
CRYPTOCard for access (== not fully automated like my other tests).

It appears that the problem I reported in 2.0.0rc2, and thought to have been
fixed by pr1140, was never /fully/ fixed.
The commit in that PR includes only ONE of the TWO patch hunks in my
original email (URL in the PR's initial comment).
So, opal_atomic_ll_32() was fixed but opal_atomic_ll_64() was not.

The same half-fixed state exists on master as well, but is masked by the
default use of "__sync builtin atomics".
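
For readers unfamiliar with the "__sync builtin atomics" mentioned above,
the following is a rough sketch of a 64-bit atomic add built on the GCC and
clang builtin __sync_bool_compare_and_swap. It is not the OPAL
implementation, and atomic_add_64 is a hypothetical name. It shows why a
build using the builtins never touches the hand-written load-linked path
that is still broken.

```c
#include <stdint.h>

/* Atomically adds delta to *addr and returns the new value.
 * Retries until the compare-and-swap observes an unchanged value. */
int64_t atomic_add_64(volatile int64_t *addr, int64_t delta)
{
    int64_t old;
    do {
        old = *addr;
    } while (!__sync_bool_compare_and_swap(addr, old, old + delta));
    return old + delta;
}
```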

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

[OMPI devel] [2.0.1rc1] minor nits in README

2016-08-24 Thread Paul Hargrove
A run of "spell README" produces (after manual filtering) the following
misspelled words:

appliation
applicaions
availble
compatibile
libeveny
Memopy
paramater
relavant
specfic


It appears that us middleware authors don't know how to write
"applications"  ;-)


README says:
- Open MPI's run-time behavior can be customized via MCA ("MPI
  Component Architecture") [...]

I hope I don't need to tell you what is wrong in that text.
A bit of digging w/ git shows that Jeff Squyres, of all people, committed
that error in SVN r20004.


README says:
  - "ob1" supports a variety of networks that can be used in
combination with each other (per OS constraints; e.g., there are
reports that the GM and OpenFabrics kernel drivers do not operate
well together):

I am not disputing whether the statement in parentheses is correct or not,
but since GM support is long gone I question its relevance.


-Paul [who as a child confirmed that "unbreakable combs" aren't]

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
My previous response was composed too quickly.
I should have said "successfully built and RUN".

-Paul


On Wed, Aug 24, 2016 at 9:04 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Thanks Paul !
>
>
> yes, this snapshot does include the patch i posted earlier.
>
> btw, the issue was a runtime error, not a build error.
>
>
> Cheers,
>
>
> Gilles
>
> On 8/25/2016 12:00 PM, Paul Hargrove wrote:
>
> Gilles,
>
> I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly
> tarball) on Solaris 11.3 with both the Gnu and Studio compilers.  Based on
> Ralph's previous email, I assume that included the patch you had directed
> me to (though I did not attempt to verify that myself).
>
> -Paul
>
> On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove <phhargr...@lbl.gov>
> wrote:
>
>> Ralph,
>>
>> That will allow me to test much sooner.
>>
>> -Paul
>>
>> On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org <r...@open-mpi.org>
>> wrote:
>>
>>> When you do, that PR has already been committed, so you can just pull
>>> the next nightly 2.x tarball and test from there
>>>
>>> On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>> I am afraid it might take a day or two before I can get to testing that
>>> patch.
>>>
>>> -Paul
>>>
>>> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp
>>> > wrote:
>>>
>>>> Paul,
>>>>
>>>>
>>>> you can download a patch at https://patch-diff.githubuserc
>>>> ontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>>>
>>>> (note you need recent autotools in order to use it)
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> Gilles
>>>>
>>>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>>>
>>>> Looks like Solaris has a “getpeerucred” - can you take a look at it,
>>>> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
>>>> sec component.
>>>>
>>>>
>>>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>>>
>>>> I took a quick glance at this one, and the only way I can see to get
>>>> that error is from this block of code:
>>>>
>>>> #if defined(HAVE_STRUCT_UCRED_UID)
>>>> euid = ucred.uid;
>>>> gid = ucred.gid;
>>>> #else
>>>> euid = ucred.cr_uid;
>>>> gid = ucred.cr_gid;
>>>> #endif
>>>>
>>>> #elif defined(HAVE_GETPEEREID)
>>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>>> "sec:native checking getpeereid for peer
>>>> credentials");
>>>> if (0 != getpeereid(peer->sd, &euid, &gid)) {
>>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>>> "sec: getsockopt getpeereid failed: %s",
>>>> strerror (pmix_socket_errno));
>>>> return PMIX_ERR_INVALID_CRED;
>>>> }
>>>> #else
>>>> return PMIX_ERR_NOT_SUPPORTED;
>>>> #endif
>>>>
>>>>
>>>> I can only surmise, therefore, that Solaris doesn’t pass either of the
>>>> two #if define’d tests. Is there a Solaris alternative?
>>>>
>>>>
>>>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>>>
>>>> Thanks Gilles!
>>>>
>>>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>> Thanks Paul,
>>>>
>>>> at first glance, something is going wrong in the sec module under
>>>> solaris.
>>>> I will keep digging tomorrow
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>
>>>>> On Solaris 11.3 on x86-64:
>>>>>
>>>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
>>>>> examples/ring_c'
>>>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_s

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
Gilles,

I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly
tarball) on Solaris 11.3 with both the Gnu and Studio compilers.  Based on
Ralph's previous email, I assume that included the patch you had directed
me to (though I did not attempt to verify that myself).

-Paul

On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
>
> That will allow me to test much sooner.
>
> -Paul
>
> On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org <r...@open-mpi.org>
> wrote:
>
>> When you do, that PR has already been committed, so you can just pull the
>> next nightly 2.x tarball and test from there
>>
>> On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> I am afraid it might take a day or two before I can get to testing that
>> patch.
>>
>> -Paul
>>
>> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>> wrote:
>>
>>> Paul,
>>>
>>>
>>> you can download a patch at https://patch-diff.githubuserc
>>> ontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>>
>>> (note you need recent autotools in order to use it)
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>>
>>> Looks like Solaris has a “getpeerucred” - can you take a look at it,
>>> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
>>> sec component.
>>>
>>>
>>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>>
>>> I took a quick glance at this one, and the only way I can see to get
>>> that error is from this block of code:
>>>
>>> #if defined(HAVE_STRUCT_UCRED_UID)
>>> euid = ucred.uid;
>>> gid = ucred.gid;
>>> #else
>>> euid = ucred.cr_uid;
>>> gid = ucred.cr_gid;
>>> #endif
>>>
>>> #elif defined(HAVE_GETPEEREID)
>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>> "sec:native checking getpeereid for peer
>>> credentials");
>>> if (0 != getpeereid(peer->sd, &euid, &gid)) {
>>> pmix_output_verbose(2, pmix_globals.debug_output,
>>> "sec: getsockopt getpeereid failed: %s",
>>> strerror (pmix_socket_errno));
>>> return PMIX_ERR_INVALID_CRED;
>>> }
>>> #else
>>> return PMIX_ERR_NOT_SUPPORTED;
>>> #endif
>>>
>>>
>>> I can only surmise, therefore, that Solaris doesn’t pass either of the
>>> two #if define’d tests. Is there a Solaris alternative?
>>>
>>>
>>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>>
>>> Thanks Gilles!
>>>
>>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Thanks Paul,
>>>
>>> at first glance, something is going wrong in the sec module under
>>> solaris.
>>> I will keep digging tomorrow
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> On Solaris 11.3 on x86-64:
>>>>
>>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
>>>> examples/ring_c'
>>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>>>> at line 529
>>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
>>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
>>>> 
>>>> --
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort.  There are many reasons that a parallel process can
>

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
Ralph,

That will allow me to test much sooner.

-Paul

On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> When you do, that PR has already been committed, so you can just pull the
> next nightly 2.x tarball and test from there
>
> On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> I am afraid it might take a day or two before I can get to testing that
> patch.
>
> -Paul
>
> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Paul,
>>
>>
>> you can download a patch at https://patch-diff.githubuserc
>> ontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>
>> (note you need recent autotools in order to use it)
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>
>> Looks like Solaris has a “getpeerucred” - can you take a look at it,
>> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
>> sec component.
>>
>>
>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>
>> I took a quick glance at this one, and the only way I can see to get that
>> error is from this block of code:
>>
>> #if defined(HAVE_STRUCT_UCRED_UID)
>> euid = ucred.uid;
>> gid = ucred.gid;
>> #else
>> euid = ucred.cr_uid;
>> gid = ucred.cr_gid;
>> #endif
>>
>> #elif defined(HAVE_GETPEEREID)
>> pmix_output_verbose(2, pmix_globals.debug_output,
>> "sec:native checking getpeereid for peer
>> credentials");
>> if (0 != getpeereid(peer->sd, &euid, &gid)) {
>> pmix_output_verbose(2, pmix_globals.debug_output,
>> "sec: getsockopt getpeereid failed: %s",
>> strerror (pmix_socket_errno));
>> return PMIX_ERR_INVALID_CRED;
>> }
>> #else
>> return PMIX_ERR_NOT_SUPPORTED;
>> #endif
>>
>>
>> I can only surmise, therefore, that Solaris doesn’t pass either of the
>> two #if define’d tests. Is there a Solaris alternative?
>>
>>
>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>
>> Thanks Gilles!
>>
>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>> Thanks Paul,
>>
>> at first glance, something is going wrong in the sec module under solaris.
>> I will keep digging tomorrow
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>> On Solaris 11.3 on x86-64:
>>>
>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
>>> examples/ring_c'
>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at
>>> line 529
>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
>>> 
>>> --
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>>   ompi_mpi_init: ompi_rte_init failed
>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>> 
>>> --
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***and potentially your MPI job)
>>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
>>> successfully, but am not able to aggregate error messages, and not able to
>>> guarantee that all other

Re: [OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-24 Thread Paul Hargrove
I am afraid it might take a day or two before I can get to testing that
patch.
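
For reference while that patch is pending, here is a minimal sketch of the Solaris alternative raised in the quoted discussion below. It is not the contents of the patch and not PMIx code: the helper name peer_credentials() and the platform tests are assumptions made for illustration, and the Linux getsockopt() path from the quoted block is left out. It assumes getpeerucred() from <ucred.h> on Solaris and getpeereid() on the BSDs and macOS, and otherwise reports "not supported", mirroring the final #else in the quoted code.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__sun)
#include <ucred.h>
#endif

/* Hypothetical helper (not PMIx code): fetch the effective uid/gid of the
 * process at the other end of a connected Unix-domain socket.
 * Returns 0 on success, -1 with errno set on failure. */
int peer_credentials(int sd, uid_t *euid, gid_t *egid)
{
#if defined(__sun)
    /* Solaris: getpeerucred() allocates the ucred_t when handed NULL. */
    ucred_t *uc = NULL;
    if (0 != getpeerucred(sd, &uc)) {
        return -1;
    }
    *euid = ucred_geteuid(uc);
    *egid = ucred_getegid(uc);
    ucred_free(uc);
    return 0;
#elif defined(__APPLE__) || defined(__FreeBSD__) || \
      defined(__NetBSD__) || defined(__OpenBSD__)
    /* BSD family: getpeereid() fills in both ids directly. */
    return getpeereid(sd, euid, egid);
#else
    /* No known mechanism in this sketch: report "not supported". */
    (void)sd;
    (void)euid;
    (void)egid;
    errno = ENOTSUP;
    return -1;
#endif
}
```

A real integration would presumably key off a configure result (for example an AC_CHECK_FUNCS test for getpeerucred) rather than the __sun macro used here.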

-Paul

On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Paul,
>
>
> you can download a patch at
> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>
> (note you need recent autotools in order to use it)
>
>
> Cheers,
>
>
> Gilles
>
> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>
> Looks like Solaris has a “getpeerucred” - can you take a look at it,
> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
> sec component.
>
>
> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>
> I took a quick glance at this one, and the only way I can see to get that
> error is from this block of code:
>
> #if defined(HAVE_STRUCT_UCRED_UID)
> euid = ucred.uid;
> gid = ucred.gid;
> #else
> euid = ucred.cr_uid;
> gid = ucred.cr_gid;
> #endif
>
> #elif defined(HAVE_GETPEEREID)
> pmix_output_verbose(2, pmix_globals.debug_output,
> "sec:native checking getpeereid for peer credentials");
> if (0 != getpeereid(peer->sd, &euid, &gid)) {
> pmix_output_verbose(2, pmix_globals.debug_output,
> "sec: getsockopt getpeereid failed: %s",
> strerror (pmix_socket_errno));
> return PMIX_ERR_INVALID_CRED;
> }
> #else
> return PMIX_ERR_NOT_SUPPORTED;
> #endif
>
>
> I can only surmise, therefore, that Solaris doesn’t pass either of the two
> #if define’d tests. Is there a Solaris alternative?
>
>
> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>
> Thanks Gilles!
>
> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
> Thanks Paul,
>
> at first glance, something is going wrong in the sec module under solaris.
> I will keep digging tomorrow
>
> Cheers,
>
> Gilles
>
> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> On Solaris 11.3 on x86-64:
>>
>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
>> examples/ring_c
>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at
>> line 529
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2
>> .0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
>> 
>> --
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>>   ompi_mpi_init: ompi_rte_init failed
>>   --> Returned "(null)" (-43) instead of "Success" (0)
>> 
>> --
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***and potentially your MPI job)
>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
>> successfully, but am not able to aggregate error messages, and not able to
>> guarantee that all other processes were killed!
>> ---
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> ---
>> 
>> --
>> mpirun detected that one or more processes exited with non-zero status,
>> thus causing
>> the job to be terminated. The first process to do so was:
>>
>>   Process name: [[25599,1],1]
>>   Exit code:1
>> 
>> --
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove   

Re: [OMPI devel] [2.0.1.rc1] dlopen_test crashes with xlc and studio compilers

2016-08-23 Thread Paul Hargrove
Jeff,
Correct, the latest GA release is 12.5 and it correctly handles -m32 where
12.4 failed.
-Paul [Sent from my phone]

On Tuesday, August 23, 2016, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:

> Paul --
>
> 12.5 is the latest (last?) version of the Oracle compilers, right?
>
> If that's correct: are you saying that 12.5 (GA/not beta) *does* handle
> the dlopen test with -m32 properly?
>
> If so, it sounds like we should amend README to say that we support 12.5
> and nothing earlier (given that 12.5 is pretty old in itself).
>
>
>
>
> > On Aug 23, 2016, at 12:49 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > As with 2.0.0 I am still seeing "dlopen_test" crashing on two types of system:
> >
> > Linux/X86-64 with Oracle Studio Compilers and -m32
> > Linux/PPC with XLC
> >
> > In the Studio compilers case, the problem is only when using -m32, and ONLY
> > with versions prior to 12.5 (no longer in beta, FWIW).
> > So, this may be worth a note in the README if there is not already one.
> >
> > The Linux + XLC problem is in all versions I have access to.
> > This is known already as issue #1854 with a milestone of v2.0.1.
> >
> > -Paul
> >
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>


-- 
-Paul [Sent from my phone]
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [2.0.1.rc1] dlopen_test crashes with xlc and studio compilers

2016-08-22 Thread Paul Hargrove
As with 2.0.0 I am still seeing "dlopen_test" crashing on two types of
system:

Linux/X86-64 with Oracle Studio Compilers and -m32
Linux/PPC with XLC

In the Studio compilers case, the problem is only when using -m32, and ONLY
with versions prior to 12.5 (no longer in beta, FWIW).
So, this may be worth a note in the README if there is not already one.

The Linux + XLC problem is in all versions I have access to.
This is known already as issue #1854 with a milestone of v2.0.1.

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] [2.0.1.rc1] runtime failure on MacOS 10.6

2016-08-22 Thread Paul Hargrove
I was using /usr/bin/gcc:
   i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646)

A quick look suggests that there is no strnlen() on this system.
It doesn't appear in any header below /usr/include/

FWIW: strnlen() first appears in IEEE Std 1003.1-2008 (aka POSIX.1-2008)
and so could be absent on any system claiming conformance to an earlier
revision.
It was in glibc prior to appearing in POSIX.1 and so might be on most any
Linux system regardless of age.
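
As an illustration of how code could cope with a missing strnlen(), here is a minimal sketch of a fallback built on memchr(), which is plain C89. The name compat_strnlen() is hypothetical and is not taken from the Open MPI or PMIx sources; this is only one possible shape for a fix, not the one that was committed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strnlen() on systems that predate POSIX.1-2008:
 * returns the length of s, but never examines more than maxlen bytes. */
size_t compat_strnlen(const char *s, size_t maxlen)
{
    /* memchr() stops at the first NUL within the first maxlen bytes. */
    const char *nul = memchr(s, '\0', maxlen);
    return (nul != NULL) ? (size_t)(nul - s) : maxlen;
}
```

A build could select such a fallback when a configure-time check (for example AC_CHECK_FUNCS for strnlen) reports the function missing; whether the tree has such a check is not something this thread confirms.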

-Paul

On Mon, Aug 22, 2016 at 9:17 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Hey Paul
>
> I just checked on my Mac and had no problem. However, I’m at 10.11, and so
> I’m wondering if the old 10.6 just doesn’t have strnlen on it?
>
> What compiler were you using?
>
> On Aug 22, 2016, at 9:14 PM, r...@open-mpi.org wrote:
>
> Huh - I’ll take a look. Thanks!
>
> On Aug 22, 2016, at 9:11 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> On a Mac OSX 10.6 system:
>
> $ mpirun -mca btl sm,self -np 2 examples/ring_c
> dyld: lazy symbol binding failed: Symbol not found: _strnlen
>   Referenced from: /Users/paul/OMPI/openmpi-2.0.
> 1rc1-macos10.6-x86-m32/INST/lib/openmpi/mca_pmix_pmix112.so
>   Expected in: flat namespace
>
> dyld: Symbol not found: _strnlen
>   Referenced from: /Users/paul/OMPI/openmpi-2.0.
> 1rc1-macos10.6-x86-m32/INST/lib/openmpi/mca_pmix_pmix112.so
>   Expected in: flat namespace
>
> Let me know what additional information is desired and to whom to send.
>
> -Paul
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [2.0.1.rc1] Solaris MPIX failure

2016-08-22 Thread Paul Hargrove
On Solaris 11.3 on x86-64:

$ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
examples/ring_c
[pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
at line 529
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
at line 983
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
/shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
at line 199
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[pcp-d-4:25078] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[25599,1],1]
  Exit code:1
--

-Paul

-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

[OMPI devel] [2.0.1.rc1] runtime failure on MacOS 10.6

2016-08-22 Thread Paul Hargrove
On a Mac OSX 10.6 system:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
dyld: lazy symbol binding failed: Symbol not found: _strnlen
  Referenced from:
/Users/paul/OMPI/openmpi-2.0.1rc1-macos10.6-x86-m32/INST/lib/openmpi/mca_pmix_pmix112.so
  Expected in: flat namespace

dyld: Symbol not found: _strnlen
  Referenced from:
/Users/paul/OMPI/openmpi-2.0.1rc1-macos10.6-x86-m32/INST/lib/openmpi/mca_pmix_pmix112.so
  Expected in: flat namespace

Let me know what additional information is desired and to whom to send.

-Paul



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] PGI built Open MPI vs GNU built slurm

2016-07-26 Thread Paul Hargrove
Gilles,

With the additional information you provided about "dependency_libs", I
agree that either of the fixes you propose sounds safe.

-Paul

On Mon, Jul 25, 2016 at 6:26 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Paul,
>
> in my environment, libslurm.la contains
>
> # Linker flags that can not go in dependency_libs.
> inherited_linker_flags=' -pthread'
>
> # Libraries that this one depends upon.
> dependency_libs=' -ldl -lpthread'
>
>
> so bottom line, it invokes the compiler with both -pthread and -lpthread
>
>
> iirc, -pthread does two things :
>
> - invoke the compiler with -D_REENTRANT (so it uses the thread-safe errno
> and so on)
>
> - invoke the linker with -lpthread
>
> OpenMPI has its own way to pass -D_REENTRANT or similar anyway, and
> libslurm.la is used only for linking.
>
> since -lpthread is pulled anyway from libslurm.la (or it was already set
> by OpenMPI), then yes, discarding -pthread should do the trick.
>
>
> Cheers,
>
>
> Gilles
>
> On 7/26/2016 10:11 AM, Paul Hargrove wrote:
>
> Gilles,
>
> My initial thought is that libslurm probably does require linking
> libpthread, either for linking pthread_* symbols, or for proper
> *operation* (such as thread-safe versions of functions which override weak
> definitions in libc).
>
> If so, then neither omitting "-pthread" nor telling pgcc not to complain
> about "-pthread" is going to be a good solution.
> Instead the "-pthread" needs to be replaced by "-lpthread", or similar.
>
> -Paul
>
> On Mon, Jul 25, 2016 at 6:03 PM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Folks,
>>
>>
>> This is a followup of a thread that initially started at
>> http://www.open-mpi.org/community/lists/users/2016/07/29635.php
>>
>>
>> The user is trying to build Open MPI with PGI compiler and
>> libslurm.la/libpmi.la support, and slurm was built with gcc compiler.
>>
>>
>> At first, it fails because the "-pthread" flag is pulled from
>> libslurm.la/libpmi.la, but this flag is not supported by PGI compilers.
>>
>> A workaround is to pass the -noswitcherror flag to the PGI compiler (so
>> the -pthread flag is discarded and a warning message is issued, but PGI
>> compiler does not fail). Unfortunately, that does not work because libtool
>> does not pass this flag to the PGI compiler.
>>
>>
>> Of course, one option is to tell the user to rebuild slurm with PGI, so
>> libslurm.la/libpmi.la do not have the "-pthread" flag.
>>
>> A nicer, though arguable, option is to hack libtool to silently drop the
>> "-pthread" flag when the PGI compiler is used (I made a proof of concept, and
>> this is a two-line patch).
>>
>> Another, cleaner option is to hack libtool so it passes -noswitcherror to
>> the PGI compiler, but I do not know how to achieve this.
>>
>>
>> Any thoughts ?
>>
>>
>> Cheers
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/07/19278.php
>>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/07/19279.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19280.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] PGI built Open MPI vs GNU built slurm

2016-07-25 Thread Paul Hargrove
Gilles,

My initial thought is that libslurm probably does require linking
libpthread, either for linking pthread_* symbols, or for proper
*operation* (such as thread-safe versions of functions which override weak
definitions in libc).

If so, then neither omitting "-pthread" nor telling pgcc not to complain
about "-pthread" is going to be a good solution.
Instead the "-pthread" needs to be replaced by "-lpthread", or similar.

-Paul

On Mon, Jul 25, 2016 at 6:03 PM, Gilles Gouaillardet 
wrote:

> Folks,
>
>
> This is a followup of a thread that initially started at
> http://www.open-mpi.org/community/lists/users/2016/07/29635.php
>
>
> The user is trying to build Open MPI with PGI compiler and
> libslurm.la/libpmi.la support, and slurm was built with gcc compiler.
>
>
> At first, it fails because the "-pthread" flag is pulled from
> libslurm.la/libpmi.la, but this flag is not supported by PGI compilers.
>
> A workaround is to pass the -noswitcherror flag to the PGI compiler (so
> the -pthread flag is discarded and a warning message is issued, but PGI
> compiler does not fail). Unfortunately, that does not work because libtool
> does not pass this flag to the PGI compiler.
>
>
> Of course, one option is to tell the user to rebuild slurm with PGI, so
> libslurm.la/libpmi.la do not have the "-pthread" flag.
>
> A nicer, though arguable, option is to hack libtool to silently drop the
> "-pthread" flag when the PGI compiler is used (I made a proof of concept, and
> this is a two-line patch).
>
> Another, cleaner option is to hack libtool so it passes -noswitcherror to
> the PGI compiler, but I do not know how to achieve this.
>
>
> Any thoughts ?
>
>
> Cheers
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19278.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] Change compiler

2016-07-18 Thread Paul Hargrove
Murali,

I typically configure with
   CC=clang CXX=clang++
on the configure command line.
Editing of the files generated by configure (such as the Makefile) is not
advisable.
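
To confirm which compiler a build actually picked up after re-running configure, one option is a small probe compiled with the resulting wrapper (mpicc). The sketch below is only an illustration: the function name compiler_family() is hypothetical, and it relies on the vendors' predefined macros (__PGI, __SUNPRO_C, __xlC__/__ibmxl__, __clang__, __GNUC__) rather than on anything in Open MPI.

```c
/* Hypothetical helper: report which compiler family built this file,
 * judged from the vendor's predefined macros. */
const char *compiler_family(void)
{
    /* Vendor compilers are tested first because several of them also
     * define __clang__ or __GNUC__ for compatibility. */
#if defined(__PGI)
    return "pgi";
#elif defined(__SUNPRO_C)
    return "oracle-studio";
#elif defined(__xlC__) || defined(__ibmxl__)
    return "ibm-xl";
#elif defined(__clang__)
    /* Must precede the __GNUC__ test: clang defines __GNUC__ too. */
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}
```

Printing the returned string from a one-line main() built with the installed wrapper shows whether the CC setting given to configure took effect.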

-Paul

On Mon, Jul 18, 2016 at 1:06 PM, Emani, Murali  wrote:

> Hi all,
>
> I would like to know if there is Clang support for OpenMPI codebase.
>
> I am trying to change the underlying compiler from gcc to clang in
> ‘configure' and ‘make all install’, I changed these values in Makefile in
> root dir and another one in config directory. The steps during ‘configure’
> reflect gcc again instead of clang. Is this the right way or am I missing
> something here ?
>
> Is the wrapper compiler environment variable ‘OMPI_CC’ intended to replace
> the underlying compiler when compiling an MPI application.
>
>
> —
> Murali
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19235.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [2.0.0rc4] non-critical faulres report

2016-07-12 Thread Paul Hargrove
Ok, PGI 15.10 w/ -m32 failed in the same way as the earlier versions.

-Paul

On Tue, Jul 12, 2016 at 11:10 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> I have a lead on a 15.10 installation with -m32 support.
> I will report results later.
>
> -Paul
>
>
> On Tue, Jul 12, 2016 at 10:29 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> Got it; thanks.
>>
>> > On Jul 12, 2016, at 1:00 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> >
>> > I only have access to two instances of PGI which were installed with
>> -m32 support.
>> > They are both old:  12.10-0 and 13.9-0.
>> > Sorry, I know that's not much.
>> >
>> > -Paul
>> >
>> > On Tue, Jul 12, 2016 at 8:06 AM, Howard Pritchard <hpprit...@gmail.com>
>> wrote:
>> > Paul,
>> >
>> > Could you narrow down the versions of the PGCC where you get the ICE
>> when
>> > using the -m32 option?
>> >
>> > Thanks,
>> >
>> > Howard
>> >
>> >
>> > 2016-07-06 15:29 GMT-06:00 Paul Hargrove <phhargr...@lbl.gov>:
>> > The following are previously reported issues that I am *not* expecting
>> to be resolved in 2.0.0.
>> > However, I am listing them here for completeness.
>> >
>> > Known, but with later target:
>> >
>> > OpenBSD fails to build ROMIO - PR1178 exists with v2.0.1 target
>> > NAG Fortran support - PR1215 exists with v2.0.1 target
>> >
>> > Known, but *not* suspected to be the fault of Open MPI or its embedded
>> components:
>> >
>> > Pathcc gets ICE - versions 5.0.5 and 6.0.527 get compiler crashes
>> building Open MPI
>> > Pgcc -m32 gets ICE - versions 12.x and 13.x (the only ones I can test
>> w/ -m32) crash compiling hwloc
>> >
>> > -Paul
>> >
>> > --
>> > Paul H. Hargrove  phhargr...@lbl.gov
>> > Computer Languages & Systems Software (CLaSS) Group
>> > Computer Science Department   Tel: +1-510-495-2352
>> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> >
>> > ___
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/07/19155.php
>> >
>> >
>> > ___
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/07/19181.php
>> >
>> >
>> >
>> > --
>> > Paul H. Hargrove  phhargr...@lbl.gov
>> > Computer Languages & Systems Software (CLaSS) Group
>> > Computer Science Department   Tel: +1-510-495-2352
>> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> > ___
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/07/19182.php
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/07/19183.php
>>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [2.0.0rc4] non-critical faulres report

2016-07-12 Thread Paul Hargrove
I have a lead on a 15.10 installation with -m32 support.
I will report results later.

-Paul


On Tue, Jul 12, 2016 at 10:29 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> Got it; thanks.
>
> > On Jul 12, 2016, at 1:00 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >
> > I only have access to two instances of PGI which were installed with
> -m32 support.
> > They are both old:  12.10-0 and 13.9-0.
> > Sorry, I know that's not much.
> >
> > -Paul
> >
> > On Tue, Jul 12, 2016 at 8:06 AM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
> > Paul,
> >
> > Could you narrow down the versions of the PGCC where you get the ICE when
> > using the -m32 option?
> >
> > Thanks,
> >
> > Howard
> >
> >
> > 2016-07-06 15:29 GMT-06:00 Paul Hargrove <phhargr...@lbl.gov>:
> > The following are previously reported issues that I am *not* expecting
> to be resolved in 2.0.0.
> > However, I am listing them here for completeness.
> >
> > Known, but with later target:
> >
> > OpenBSD fails to build ROMIO - PR1178 exists with v2.0.1 target
> > NAG Fortran support - PR1215 exists with v2.0.1 target
> >
> > Known, but *not* suspected to be the fault of Open MPI or its embedded
> components:
> >
> > Pathcc gets ICE - versions 5.0.5 and 6.0.527 get compiler crashes
> building Open MPI
> > Pgcc -m32 gets ICE - versions 12.x and 13.x (the only ones I can test w/
> -m32) crash compiling hwloc
> >
> > -Paul
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19155.php
> >
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19181.php
> >
> >
> >
> > --
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19182.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19183.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [2.0.0rc4] non-critical faulres report

2016-07-12 Thread Paul Hargrove
I only have access to two instances of PGI which were installed with -m32
support.
They are both old:  12.10-0 and 13.9-0.
Sorry, I know that's not much.

-Paul

On Tue, Jul 12, 2016 at 8:06 AM, Howard Pritchard <hpprit...@gmail.com>
wrote:

> Paul,
>
> Could you narrow down the versions of the PGCC where you get the ICE when
> using the -m32 option?
>
> Thanks,
>
> Howard
>
>
> 2016-07-06 15:29 GMT-06:00 Paul Hargrove <phhargr...@lbl.gov>:
>
>> The following are previously reported issues that I am *not* expecting to
>> be resolved in 2.0.0.
>> However, I am listing them here for completeness.
>>
>> Known, but with later target:
>>
>> OpenBSD fails to build ROMIO - PR1178 exists with v2.0.1 target
>> NAG Fortran support - PR1215 exists with v2.0.1 target
>>
>> Known, but *not* suspected to be the fault of Open MPI or its embedded
>> components:
>>
>> Pathcc gets ICE - versions 5.0.5 and 6.0.527 get compiler crashes
>> building Open MPI
>> Pgcc -m32 gets ICE - versions 12.x and 13.x (the only ones I can test w/
>> -m32) crash compiling hwloc
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department   Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/07/19155.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19181.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

