Re: [OMPI devel] When libltdl is not your friend

2015-02-02 Thread Paul Hargrove
job is now going to abort; sorry. On Mon, Feb 2, 2015 at 7:01 PM, Paul Hargrove wrote: > Howard, > > This was seen on NERSC's Carver. > > -Paul > > On Mon, Feb 2, 2015 at 6:49 PM, Howard Pritchard > wrote: > >> Hi Paul, >> >> Thanks for check

[OMPI devel] Master assert failure on Linux/PPC64

2015-02-02 Thread Paul Hargrove
On a Linux/PPC64 system I see the failure below from a build of the current master tarball. This build was configured with --prefix=... --enable-debug \ CFLAGS=-m64 --with-wrapper-cflags=-m64 \ CXXFLAGS=-m64 --with-wrapper-cxxflags=-m64 \ FCFLAGS=-m64 --with-wrapper-fcflags=-m64 I am not

[OMPI devel] Master build failure w/ Solaris Studio 12.3 on Linux/x86-64

2015-02-02 Thread Paul Hargrove
On a Linux/x86-64 system I am using the Solaris Studio 12.3 compilers. I have configured the current master tarball as follows: --prefix=... --enable-debug \ CC=cc CXX=CC FC=f90 \ CXXFLAGS='-L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -library=stlport4' \ --with-wrapper-cxxflags='-L/

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-03 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 5:47 PM, Paul Hargrove wrote: > I'll report my test results more completely later, but all 4 PGI-based > builds I have results for so far have failed with libtool replacing > "-lltdl" in link command line with "/usr/lib/libltdl.so" rat

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-03 Thread Paul Hargrove
On Mon, Feb 2, 2015 at 9:26 PM, Paul Hargrove wrote: > I am now going to see about a PGI compiler on a system at another center > (or two?) in order to see how universal the problem is. That was a dead-end. Of the many non-NERSC non-Cray institutions where I have accounts, I could onl

[OMPI devel] failed to open libltdl.so

2015-02-03 Thread Paul Hargrove
I found another failure mode for non-embedded libltdl. On a system with libltdl.so on the login node but NOT the compute nodes I encountered the following, once per rank, at job launch: /home/phhargrove/OMPI/openmpi-libltdl-linux-x86_64 psm/INST/bin/orted: error while loading shared libraries: li

[OMPI devel] Master hangs in opal_fifo test

2015-02-03 Thread Paul Hargrove
I have seen opal_fifo hang on 2 distinct systems + Linux/ppc32 with xlc-11.1 + Linux/x86-64 with icc-14.0.1.106 I have no explanation to offer for either hang. No "weird" configure options were passed to either. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Computer La

[OMPI devel] Master build broken libfabrics + PGI

2015-02-03 Thread Paul Hargrove
On a Linux/x86_64 system with PGI-14.3 I have configured a current master tarball with the following: --prefix=... --enable-debug CC=pgcc CXX=pgCC FC=pgfortran I see "make V=1" fail as shown below. This does NOT occur with GNU or Intel compilers on the same system. Initial guess is mis-ordered

Re: [OMPI devel] Master hangs in opal_LIFO test

2015-02-03 Thread Paul Hargrove
CORRECTION: It is the opal_lifo (not fifo) test which hung on both systems. -Paul On Mon, Feb 2, 2015 at 11:03 PM, Paul Hargrove wrote: > I have seen opal_fifo hang on 2 distinct systems > + Linux/ppc32 with xlc-11.1 > + Linux/x86-64 with icc-14.0.1.106 > > I have no expla

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-04 Thread Paul Hargrove
2/04 4:17, Nathan Hjelm wrote: > > That's the second report involving icc 14. I will dig into this later > this week. > > -Nathan > > On Mon, Feb 02, 2015 at 11:03:41PM -0800, Paul Hargrove wrote: > > I have seen opal_fifo hang on 2 distinct systems > + Linux/ppc32 with xlc

Re: [OMPI devel] Master build broken libfabrics + PGI

2015-02-06 Thread Paul Hargrove
bric/libfabric/src/enosys.c: 50) PGC/x86-64 Linux 10.9-0: compilation completed with severe errors make[2]: *** [libfabric/src/libmca_common_libfabric_la-enosys.lo] Error 1 make[2]: Leaving directory `/global/scratch2/sd/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-10.9/BLD/opal/mca/common/lib

[OMPI devel] opal_fifo SEGV from master

2015-02-06 Thread Paul Hargrove
Yes, this time I really mean "fifo", not "lifo". ;-) With last night's master tarball (Open MPI dev-845-ga3275aa) configured with only --prefix and --enable-debug A Linux/x86-64 system running Debian Wheezy and compiler = "gcc (Debian 4.7.2-5) 4.7.2" Failure from "make check": /home/phargrov/OMP

[OMPI devel] ess:alps build failure with PGI

2015-02-06 Thread Paul Hargrove
The following in orte/mca/ess/alps/Makefile.am assumes a GNU (or GNU-like) compiler: mca_ess_alps_la_CPPFLAGS = $(ess_alps_CPPFLAGS) -fno-ident If building with PGI, the result is pgcc-Error-Unknown switch: -fno-ident when compiling orte/mca/ess/alps/ess_alps_component.c This is last night's

[OMPI devel] Shutdown-time crash via oob:ud

2015-02-06 Thread Paul Hargrove
With last night's master tarball (openmpi-dev-845-ga3275aa) on a Linux/x86-64 system, I am seeing a crash (below) from ring_c run on a login node. Other than CC/CXX/FC settings I've configured with only --prefix=... --enable-debug --with-tm=... This is occurring with at least the Gnu, Intel, Path

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Paul Hargrove
(LWP 19685)] 0x00401417 in opal_fifo_pop_atomic (fifo=0x7fffe130) at /home/phargrov/OMPI/openmpi-master-linux-x86_64-sl7x/openmpi-dev-904-g08dceda/opal/class/opal_fifo.h:127 127 next = (opal_list_item_t *) item->opal_list_next; -Paul On Fri, Feb 6, 2015 at 4:22 PM, Paul Hargrove

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Paul Hargrove
jelm > wrote: > > > > I think I see the issue. Looks like there is a missing memory > barrier > > after the head consistency code. I will add one and see if that > fixes > > your problem. > > > > BTW, I can't repro

Re: [OMPI devel] RFC: Remove embedded libltdl

2015-02-13 Thread Paul Hargrove
but in both cases only with Portland Group compilers (Gnu, Intel, PathScale and Open64 compilers worked fine). So, this is not a SUSE-specific issue. -Paul On Thu, Feb 12, 2015 at 11:49 PM, Philipp Thomas wrote: > Hi Paul, > sorry for chiming in so late, but this list is on low priority for m

Re: [OMPI devel] Labels, milestones, assignees on ompi-release PRs

2015-02-14 Thread Paul Hargrove
Jeff, It might be helpful to define "PR" before first use on this page and others. While PR = "Pull Request" is common usage among folks familiar with Git, it stands for "Problem Report" in other s/w development contexts (e.g. the gcc-bugs mailing list and associated bugzilla). So, the phrase "f

Re: [OMPI devel] Fortran issue

2015-02-20 Thread Paul Hargrove
INTEGER JEFF(3) DATA JEFF/4HMR. ,4HFORT,3HRAN/ If you can understand that, you should probably pretend you can't :-) -Paul [who has actually used Computed GO TO and Arithmetic IF] On Fri, Feb 20, 2015 at 11:28 AM, Larry Baker wrote: > Excellent, Mr. Fortran. Thank you. > > By th

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-24 Thread Paul Hargrove
Jeff, +0.95 Read the new PR yesterday and agree it makes sense to bypass libltdl where it would add little or nothing to a "dlopen-lovin' platform". Forgive me for asking a question I am sure I could answer by reading the .m4: How are you planning to distinguish which platforms support dlopen()?

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-24 Thread Paul Hargrove
See two responses inline below. On Tue, Feb 24, 2015 at 1:08 PM, Jeff Squyres (jsquyres) wrote: > On Feb 24, 2015, at 1:55 PM, Paul Hargrove wrote: > > > > Forgive me for asking a question I am sure I could answer by reading the > .m4: > > How are you planning to dis

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-24 Thread Paul Hargrove
On Tue, Feb 24, 2015 at 1:45 PM, Paul Hargrove wrote: [...] > > Smoke testing will begin momentarily... > [...] I am choking on all the smoke. Somebody call the fire marshal! It looks like with Jeff's tarball all the BSDs are failing i

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-24 Thread Paul Hargrove
platforms when configured with "--enable-static --disable-shared" All my x86, x86-64, ppc and ppc64 Linux platforms I did not test Linux on IA64, MIPS or ARM -Paul On Tue, Feb 24, 2015 at 2:44 PM, Paul Hargrove wrote: > > > On Tue, Feb 24, 2015 at 1:45 PM, Paul Hargrove

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
On Wed, Feb 25, 2015 at 8:45 AM, Jeff Squyres (jsquyres) wrote: > > SECOND: > > On {Free,Net,Open}BSD dlopen() appears in libc, not in libdl. > > So, I suspect one *should* be able to compile dl:dlopen on all these > systems with the proper configure tests. > > Cool; I'll fix this. ...done. L

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
On Wed, Feb 25, 2015 at 9:56 AM, Jeff Squyres (jsquyres) wrote: > On Feb 25, 2015, at 11:51 AM, Dave Goodell (dgoodell) > wrote: > > > >> This is a good question: what should we do here? > >> > >> 1. Abort the configure (e.g., insist that the user install libltdl or > --disable-dlopen) > > > > I

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
On Wed, Feb 25, 2015 at 10:17 AM, Paul Hargrove wrote: > I did that and just shipped a tarball to get Hargroved. >> > > Tests have been dispatched... I will report complete results later today. > The first of the BSD results should be in soon, and I'll plan to report

[OMPI devel] verbs and oob_ud breakage?

2015-02-25 Thread Paul Hargrove
FYI: On several systems where Jeff's tarball for pr410 ran fine yesterday, I am seeing errors in today's tarball related to either libverbs or mca_oob_ud. Issue #1: On Solaris verbs support is now rejected at configure time. Configure output appears below as "1)" Issue #2: On Linux I get undefi

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
, 2015, at 3:01 PM, Jeff Squyres (jsquyres) > wrote: > > > > Per my prior mail, m4 typo fixed -- could you release the hounds again? > > > >> On Feb 25, 2015, at 2:13 PM, Paul Hargrove wrote: > >> > >> > >> On Wed, Feb 25, 2015 at 10:1

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
On Wed, Feb 25, 2015 at 12:29 PM, Jeff Squyres (jsquyres) < jsquy...@cisco.com> wrote: > /me wishes yet again that shell scripting had a "strict" mode that would > yell at you when you use "$foop" instead of "$foo" (and $foop doesn't > exist/was never set). See http://redsymbol.net/articles/unof

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
" (and $foop doesn't > exist/was never set). > > > > > On Feb 25, 2015, at 3:04 PM, Paul Hargrove wrote: > > > > I've queued new tests for the platforms w/ verbs-related failures. > > Is there any point retesting the BSDs as well? > > > >

Re: [OMPI devel] RFC: DL / DSO functionality

2015-02-25 Thread Paul Hargrove
On Wed, Feb 25, 2015 at 4:14 PM, Jeff Squyres (jsquyres) wrote: > On Feb 25, 2015, at 4:17 PM, Paul Hargrove wrote: > > > > The Linux and Solaris verbs issues are resolved. > > Good. > > > The BSD results are unchanged. > > That means this, right: >

[OMPI devel] printf format warnings on master

2015-02-26 Thread Paul Hargrove
Clang noted the following on FreeBSD-10/amd64 using the current master tarball: Making check in threads make opal_thread opal_condition CC opal_thread.o CCLD opal_thread CC opal_condition.o /home/phargrov/OMPI/openmpi-master-freebsd10-amd64/openmpi-dev-1118-gdc80863/test/thr

[OMPI devel] mtl:psm configury build broken in master

2015-02-26 Thread Paul Hargrove
I have been testing mtl:psm on a very old system. Sometime pretty recently (this week I think), this started to break at configure time: --- MCA component mtl:psm (m4 configuration macro) checking for MCA component mtl:psm compile mode... dso checking --with-psm value... sanity check ok (/usr/loca

[OMPI devel] CORRECTION: mtl:psm configury broken (but NOT on master)

2015-02-26 Thread Paul Hargrove
your lap, Jeff. -Paul On Thu, Feb 26, 2015 at 4:12 PM, Paul Hargrove wrote: > I have been testing mtl:psm on a very old system. > Sometime pretty recently (this week I think), this started to break at > configure time: > > --- MCA component mtl:psm (m4 configuration macro) > checki

[OMPI devel] Master warning on oob:ud w/ PGI

2015-02-26 Thread Paul Hargrove
The warning below comes from pgi-14.7 on the latest master tarball (output from "make V=1"). -Paul libtool: compile: pgcc -DHAVE_CONFIG_H -I. -I/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-1118-gdc80863/orte/mca/oob/ud -I../../../../opal/include -I../../../

[OMPI devel] nearly-irreproducible failure of master on Mac OS X 10.8

2015-02-26 Thread Paul Hargrove
Initially I was testing Jeff's tarball for PR 410, on Mac OS X 10.8 where cc is clang, I have configured with --prefix=[...] --enable-debug --enable-osx-builtin-atomics CC=cc CXX=c++ I passed "make check", but when I try to run ring_c I get the first failure shown (far) below. HOWEVER, I tried

[OMPI devel] Odd master build failure with Studio 12.4 on Linux w/ -m32

2015-02-26 Thread Paul Hargrove
I am using Oracle's Studio 12.4 compilers for Linux/x86-64 to build the current master tarball. However, I am passing "-m32" to generate x86 (ILP32 ABI) executables. The full configure mess is: --prefix=[...] --enable-debug \ CC=cc CFLAGS="-m32" --with-wrapper-cflags="-m32" \ CXX=CC CXXFLAGS="

[OMPI devel] master tarball broken

2015-03-04 Thread Paul Hargrove
I assume I am not the only one seeing the following with openmpi-dev-1184-gbb22d26.tar.bz2 make[2]: *** No rule to make target `/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-icc-12.1/BLD/opal/mca/common/verbs/ libmca_common_verbs.la', needed by `mca_oob_ud.la'. Stop. make[2]: Lea

[OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-04 Thread Paul Hargrove
I have a system with InfiniPath HCAs, where I've historically tested mtl:psm. For some reason, that appears to have ceased working some time in the past 4 months. However, this report is about something else. I am using the current master tarball: openmpi-dev-1203-g171d674.tar.bz2 When I ran confi

Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-04 Thread Paul Hargrove
> fail and no one cares. > > Can you tell us why ibv_fork_init() would fail? > > > > > On Mar 4, 2015, at 9:56 AM, Paul Hargrove wrote: > > > > I have a system with InfiniPath HCAs, where I've historically tested > mtl:psm. > > For some reason, that ap

Re: [OMPI devel] master tarball broken

2015-03-04 Thread Paul Hargrove
The problem reported below is still present in openmpi-dev-1203-g171d674 -Paul On Wed, Mar 4, 2015 at 4:47 AM, Jeff Squyres (jsquyres) wrote: > I think Mike just recently committed the fix. I'll go kick off a master > tarball creation manually. > > > > On Mar 3,

Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-04 Thread Paul Hargrove
On Wed, Mar 4, 2015 at 1:04 PM, Dave Goodell (dgoodell) wrote: [...] > > libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'. > > libibverbs: Warning: no userspace device-specific driver found for > /sys/class/infiniband_verbs/uverbs0 > > I think that warning is printed by lib

[OMPI devel] libfabric code does not build with pgi-{10,11}

2015-03-04 Thread Paul Hargrove
I don't know if anybody *cares*, but I find that the common:libfabric code compiles with pgi-{9,12,13,14} but not with pgi-{10,11}. These are versions 10.9 and 11.9 of pgi that built Open MPI 1.8.4 just fine. If somebody does care, let me know who and I'll send logs off-list. However, this can be

Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-05 Thread Paul Hargrove
Mike, Mike, I think in this case "version does not match" really means "not installed". I am pretty sure, as Dave G. and I discussed earlier on this thread, those two lines are because the head node has an HCA and kernel modules, but only the "devel" libraries have been installed (because the HCA

Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-05 Thread Paul Hargrove
On Thu, Mar 5, 2015 at 6:51 AM, Jeff Squyres (jsquyres) wrote: > These are the things that need to be figured out. If somebody has a set of mca params to recommend, I am prepared to collect as much debug/trace output as you need to understand in what order components are initializing. Also, I

Re: [OMPI devel] libfabric code does not build with pgi-{10,11}

2015-03-05 Thread Paul Hargrove
rity since the libfabric embedding > within > open mpi should hopefully soon be a thing of the past. > > Howard > > > 2015-03-04 14:28 GMT-07:00 Paul Hargrove : > >> I don't know if anybody *cares*, but I find that the common:libfabric >> code compiles with pgi-{9,12

Re: [OMPI devel] Unwanted ibv_fork_init() mess(ages) and complaint for non-IB login node

2015-03-05 Thread Paul Hargrove
In "work for you" does "you" == @PHHargrove? If YES: 1) are the changes to be tested reflected in the master tarball yet? 2) other than presence/absence of the warning how am I testing if the support "works" when the param is set to 1? -Paul On Thu, Mar 5, 2015 at 6:29 PM, Jeff Squyres (jsquyres

[OMPI devel] Master failure of oob_tcp on Solaris

2015-03-20 Thread Paul Hargrove
Seen earlier today with last night's master tarball: $ mpirun -mca btl sm,self,verbs -np 2 -host pcp-j-31,pcp-j-35 examples/ring_c' [pcp-j-35:01400] [/shared/OMPI/openmpi-master-solaris11-x64-ib-ss12u3/openmpi-dev-1351-gccba8ce/orte/mca/oob/tcp/oob_tcp_common.c:103] setsockopt(TCP_KEEPALIVE) faile

Re: [OMPI devel] Master failure of oob_tcp on Solaris

2015-03-20 Thread Paul Hargrove
twork routing requirements). -- On Fri, Mar 20, 2015 at 7:13 AM, Ralph Castain wrote: > Hi Paul > > It should have kept running, albeit with that warning - did the program > actually fail? > > > On Mar 1

Re: [OMPI devel] Master failure of oob_tcp on Solaris

2015-03-20 Thread Paul Hargrove
-- > > > > > On Fri, Mar 20, 2015 at 7:13 AM, Ralph Castain wrote: > >> Hi Paul >> >> It should have kept running, albeit with that warning - did the program >> actually fail? >> >> >> On Mar 19, 2015, at

Re: [OMPI devel] Opal atomics question

2015-03-26 Thread Paul Hargrove
In case anybody cares: In GASNet we have atomics for "add", "sub", "incr", "decr", and "decr-and-test". On some platforms all five are implemented in terms of a single inline-atomic for "add". There are platforms on which one or more of "incr", "decr" and "decr-and-test" have custom implementatio
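A minimal sketch of the "all five in terms of one add" arrangement described above, written with C11 <stdatomic.h>. Every demo_* name here is hypothetical, chosen only for illustration; this is not GASNet's or OPAL's actual code, and a real implementation would more likely use per-platform inline assembly for the primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not GASNet's or OPAL's identifiers. */
typedef atomic_int demo_atomic_t;

/* The one primitive: atomically add v to *p and return the new value.
 * (Callers are assumed to stay clear of signed overflow.) */
static inline int demo_atomic_add(demo_atomic_t *p, int v)
{
    return atomic_fetch_add(p, v) + v;
}

/* The other four operations are thin wrappers around the primitive. */
static inline int demo_atomic_sub(demo_atomic_t *p, int v)
{
    return demo_atomic_add(p, -v);
}

static inline int demo_atomic_incr(demo_atomic_t *p)
{
    return demo_atomic_add(p, 1);
}

static inline int demo_atomic_decr(demo_atomic_t *p)
{
    return demo_atomic_add(p, -1);
}

/* True when the decrement brought the counter to zero. */
static inline bool demo_atomic_decr_and_test(demo_atomic_t *p)
{
    return demo_atomic_add(p, -1) == 0;
}

/* Walks a counter through all five operations; returns its final value. */
static int demo_atomics_selfcheck(void)
{
    demo_atomic_t a;
    int ok = 1;

    atomic_init(&a, 2);
    ok &= (demo_atomic_incr(&a) == 3);
    ok &= (demo_atomic_add(&a, 4) == 7);
    ok &= (demo_atomic_sub(&a, 5) == 2);
    ok &= (demo_atomic_decr(&a) == 1);
    ok &= demo_atomic_decr_and_test(&a) ? 1 : 0;
    assert(ok);
    return atomic_load(&a);
}
```

A platform with a cheaper native increment or decrement could swap out just that one wrapper and leave the rest alone, which is presumably the appeal of layering things this way.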

Re: [OMPI devel] Opal atomics question

2015-03-26 Thread Paul Hargrove
Nathan, I test sparcv8+, sparcv9, ia64 and mips in release candidates. That isn't the same as *using* any of those platforms in production. I just mean to say that the implementations are known to pass "make check". -Paul On Thu, Mar 26, 2015 at 8:48 AM, Nathan Hjelm wrote: > > As a follow-on.

Re: [OMPI devel] Opal atomics question

2015-03-26 Thread Paul Hargrove
> On Thu, Mar 26, 2015 at 09:40:06AM -0700, Paul Hargrove wrote: > >Nathan, > >I test sparcv8+, sparcv9, ia64 and mips in release candidates. > >That isn't the same as *using* any of those platforms in production. > >I just mean to say that the impleme

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-06 Thread Paul Hargrove
Two comments on the proposed NEWS: - Don't use inline functions with Clang compiler > Without motivation/explanation that sounds like a bad thing. > - Improved support for Cray > Cray's compilers, networks or the programming environment in general? -Paul -- Paul H. Hargrove

[OMPI devel] 1.8.5rc1 VT-related build failure

2015-04-06 Thread Paul Hargrove
I am trying to build Open MPI 1.8.5rc1 with AMD's Open64 4.5.2.1. I see the following failure (from "make V=1") while building VT: openCC -DHAVE_CONFIG_H -I. -I/scratch/phargrov/OMPI/openmpi-1.8.5rc1-linux-x86_64-open64/openmpi-1.8.5rc1/ompi/contrib/vt/vt/tools/vtfilter -I../.. -I/scratch/phargro

[OMPI devel] 1.8.5rc1 MacOS X 10.8 build failure (libltdl?)

2015-04-06 Thread Paul Hargrove
On MacOS X 10.8 (where cc and c++ are clang and clang++, and the default ABI is LP64) I've configured the release candidate with --prefix= --enable-debug CC=cc CXX=c++ The build fails linking opal_wrapper as shown in the "make V=1" output below. Based on the undefined symbol (_lt_libltdlc_

Re: [OMPI devel] 1.8.5rc1 MacOS X 10.8 build failure (libltdl?)

2015-04-06 Thread Paul Hargrove
On Mon, Apr 6, 2015 at 5:13 PM, Ralph Castain wrote: [...] > I believe we have seen this before, and it was an issue caused by a change > in libtool itself. The Mac automatically updated to the new version, which > triggers the problem. > > This is why we embedded ltdl directly into the OMPI mast

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-06 Thread Paul Hargrove
My testing of real hardware is almost complete and I've reported the only two issues[*] I encountered: http://www.open-mpi.org/community/lists/devel/2015/04/17183.php http://www.open-mpi.org/community/lists/devel/2015/04/17184.php There were roughly 5 failing configurations out of about 70.

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-08 Thread Paul Hargrove
" ABIs on a 64-bit (MIPS 5Kc) system. I believe that completely covers the atomics implementations for those architectures. -Paul On Mon, Apr 6, 2015 at 6:38 PM, Paul Hargrove wrote: > My testing of real hardware is almost complete and I've reported the only > two issues[*] I encountered

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-08 Thread Paul Hargrove
wrote: > Thanks Paul!! > > The VT fix is in the queue, if/when you can check it. I'm going to look at > the other issue today. > > > On Apr 8, 2015, at 8:10 AM, Paul Hargrove wrote: > > My 5 ARM and 3 MIPS testers completed without any problems. > > The ARM test

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-08 Thread Paul Hargrove
Ralph and Bert, I found a couple issues with the VT fix. I will document them in the PR when I've finished collecting the outputs.. -Paul On Wed, Apr 8, 2015 at 8:40 AM, Paul Hargrove wrote: > Ralph, > > According to his comments in the PR, Bert already tested with the same >

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-09 Thread Paul Hargrove
Paul Hargrove wrote: > Ralph and Bert, > > I found a couple issues with the VT fix. > I will document them in the PR when I've finished collecting the outputs.. > > -Paul > > On Wed, Apr 8, 2015 at 8:40 AM, Paul Hargrove wrote: > >> Ralph, >> >> Accord

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-09 Thread Paul Hargrove
ou don't enable CUDA, then you should be able to test. > > On Thu, Apr 9, 2015 at 10:03 AM, Paul Hargrove wrote: > >> Ralph, >> >> It appears that Bert and I have iterated to a correct fix for the >> VT+Open64 issue. >> >> Did you make any progre

Re: [OMPI devel] 1.8.5rc1 is ready for testing

2015-04-09 Thread Paul Hargrove
Ok, looks good now on MacOS 10.8 (at v1.8.4-189-ga6169b4) -Paul On Thu, Apr 9, 2015 at 10:15 AM, Paul Hargrove wrote: > Ok, Ralph. > I will test the v1.8 nightly and report. > > -Paul > > On Thu, Apr 9, 2015 at 10:06 AM, Ralph Castain wrote: > >> I committed

Re: [OMPI devel] v1.8.5 NEWS and README

2015-04-17 Thread Paul Hargrove
On Fri, Apr 17, 2015 at 5:57 AM, Jeff Squyres (jsquyres) wrote: > - OS X (10.6, 10.7, 10.8, 10.9), 32 and 64 bit (x86_64), with gcc and > Absoft compilers (*) > Since about 10.7 (depending on which Xcode you installed), cc and c++ have been Clang and Clang++ on Mac OS X. The "gcc" is optional

Re: [OMPI devel] v1.8.5 NEWS and README

2015-04-17 Thread Paul Hargrove
On Fri, Apr 17, 2015 at 1:02 PM, Ralph Castain wrote: [...regarding Cray XC...] > If you don't want to support it, I understand - but we should be clear > that it may well work anyway. Ralph, Do you really want to enumerate all of the "it may well work anyway" platforms? If so, I have quite a

Re: [OMPI devel] interaction with slurm 14.11

2015-04-17 Thread Paul Hargrove
Ralph, I think David's concern is that because Slurm has changed their default behavior, Open MPI's default behavior has changed as well. The request (on which I have no opinion) appears to be that ORTE make an explicit request for the behavior that was the previous default in Slurm. That would en

Re: [OMPI devel] 1.8.5rc2 released

2015-04-21 Thread Paul Hargrove
ng on Intel Xeon Phi coprocessors. > - Opportunistically switch away from using GNU Libtool's libltdl > library when possible (by default). > - Fix some VampirTrace errors. Thanks to Paul Hargrove for reporting > the issues. > - Correct default binding patterns when --use-hw

Re: [OMPI devel] 1.8.5rc2 released

2015-04-21 Thread Paul Hargrove
ctly/completely from master to v1.8. -Paul [a.k.a. bot:hargrove] On Tue, Apr 21, 2015 at 4:22 PM, Paul Hargrove wrote: > Is the following configure-fails-by-default behavior really the desired > one in 1.8.5? > I thought this was more of a 1.9 change than a mid-series change. >

Re: [OMPI devel] 1.8.5rc2 released

2015-04-21 Thread Paul Hargrove
, 2015, at 7:38 PM, Paul Hargrove wrote: > > Sorry the output in the previous email left out some relevant detail. > See here that BOTH dl components were unable to compile with the 1.8.5rc2 > tarball: > > +++ Configuring MCA framework dl > checking for no configure componen

Re: [OMPI devel] 1.8.5rc2 released

2015-04-22 Thread Paul Hargrove
rocessors. > - Opportunistically switch away from using GNU Libtool's libltdl > library when possible (by default). > - Fix some VampirTrace errors. Thanks to Paul Hargrove for reporting > the issues. > - Correct default binding patterns when --use-hwthread-cpus was >

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Paul Hargrove
processes, but perhaps mpirun propagates the rlimits. So, on NERSC's Carver I can reproduce the problem (in my build of 1.8.5rc2) quite easily (below). I have configured with --enable-debug, which probably explains why I see an assertion failure rather than the reported SEGV. -Paul {hargrove@

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Paul Hargrove
, 2015 at 9:41 AM, Paul Hargrove wrote: > Howard, > > Unless there is some reason the settings must be global, you should be > able to set the limits w/o root privs: > > Bourne shells: > $ ulimit -l 64 > C shells: > % limit -h memorylocked 64 > > I would have
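For anyone who would rather reproduce the low-limit condition from inside a test program than from the shell, here is a rough C counterpart to the "ulimit -l 64" suggestion above. It is only a sketch: the helper names are made up for this example, 64 is taken to mean 64 KiB (the shell's usual 1024-byte units), and RLIMIT_MEMLOCK is assumed to be available, as it is on Linux and the BSDs. Only the soft limit is lowered, which never needs root.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: make sure the soft locked-memory limit is no
 * larger than max_bytes.  The hard limit is left untouched. */
static int cap_memlock_soft_limit(rlim_t max_bytes)
{
    struct rlimit rl;

    /* This helper only ever lowers the limit. */
    assert(max_bytes != RLIM_INFINITY);

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > max_bytes) {
        rl.rlim_cur = max_bytes;
        if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Hypothetical helper: report the current soft limit in bytes
 * (RLIM_INFINITY if it cannot be queried). */
static rlim_t current_memlock_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return RLIM_INFINITY;
    }
    return rl.rlim_cur;
}
```

Calling cap_memlock_soft_limit(64 * 1024) early in main() should put the process, and any children it spawns afterwards, under the same constraint as a shell that ran "ulimit -l 64" first.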

[OMPI devel] Broken flex-required error message

2015-04-22 Thread Paul Hargrove
When building from a git clone of master I encountered the following: checking for flex... no checking for lex... no configure: WARNING: *** Could not find GNU Flex on your system. configure: WARNING: *** GNU Flex required for developer builds of Open MPI. configure: WARNING: *** Other versions of

[OMPI devel] Incorrect timer frequency [w/ patch]

2015-04-22 Thread Paul Hargrove
I had reason to look at the linux timer code today and noticed what I believe to be a subtle error. This is in both 'master' and v1.8.5rc2 Since casts bind tighter than multiplication in C, I believe that the 1-line patch below is required to produce the desired result of conversion to an integer
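The precedence point above can be seen in a few lines of C. This is a sketch with hypothetical helper names and a made-up MHz value, not the actual Open MPI timer code or the patch itself: it only shows that an unparenthesized cast truncates the fractional part before the multiplication happens.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not the Open MPI timer code. */

/* The cast binds first: (long long) mhz drops the fraction, and only
 * then is the whole-MHz value scaled to Hz. */
static long long hz_cast_first(double mhz)
{
    return (long long) mhz * 1000000;
}

/* Parenthesized: scale in floating point, then convert once. */
static long long hz_scale_first(double mhz)
{
    return (long long) (mhz * 1000000);
}

/* 2394.5 MHz is an arbitrary example value. */
static void hz_demo(void)
{
    assert(hz_cast_first(2394.5) == 2394000000LL);  /* lost 0.5 MHz */
    assert(hz_scale_first(2394.5) == 2394500000LL);
}
```

The two spellings agree only when the input has no fractional part, which is why this sort of bug can go unnoticed for a long time.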

Re: [OMPI devel] Broken flex-required error message

2015-04-22 Thread Paul Hargrove
On Wed, Apr 22, 2015 at 2:02 PM, Jeff Squyres (jsquyres) wrote: > FWIW, it *did* print a list of files for me on my Mac when I faked it out > and forced it to *not* find flex. > A quick look and the commit shows for lfile in `find . -name \*.l -print`; do Notice the find is rooted at ".".

[OMPI devel] powerpc64le support [1-line patch]

2015-04-22 Thread Paul Hargrove
I had an opportunity to try the 1.8.5rc2 tarball on a little-endian POWER8 (aka ppc64el or powerpc64le). The good news is that things "just worked" as they did when I tried ARMv8 (aka aarch64). However, I see a little room for improvement with almost no work at all. I noticed: checking for __syn

[OMPI devel] 1.8.5rc2 testing report

2015-04-22 Thread Paul Hargrove
Well, I tried rc2 on just about everything except my phone and my linksys. For me the configure failure (dlopen() not found) on {Free,Net,Open}BSD is the only problem. Since it works on 'master' I am confident Jeff will sort this out. -Paul -- Paul H. Hargrove phhargr..

Re: [OMPI devel] powerpc64le support [1-line patch]

2015-04-22 Thread Paul Hargrove
On Wed, Apr 22, 2015 at 2:43 PM, Jeff Squyres (jsquyres) wrote: > > In addition to the one-line patch below, I needed to run autogen.pl > with a new enough config/config.{guess,sub}. > > Along the way I noticed > > opal/mca/common/libfabric/libfabric/config/config.guess > > opal/m

Re: [OMPI devel] 1.8.5rc2 released

2015-04-22 Thread Paul Hargrove
past that point and making updates to the PR instead of this email thread. -Paul > > > > On Apr 21, 2015, at 8:40 PM, Paul Hargrove wrote: > > > > > > > > On Tue, Apr 21, 2015 at 5:33 PM, Jeff Squyres (jsquyres) < > jsquy...@cisco.com> wrote:

[OMPI devel] Minor error in distscript.csh

2015-04-23 Thread Paul Hargrove
When running "make dist" (several times) today, I noticed the following: - WARNING: Got bad config.guess from ftp.gnu.org (non-existent or empty) - WARNING: using included versions for both config.sub and config.guess Never mind for the moment that the wget is NOT from ftp.gnu.org any longer. Th

[OMPI devel] Suggested README changes

2015-04-23 Thread Paul Hargrove
I have attached a patch (against master) that fixes some typos and makes an update. It applies *almost* cleanly to v1.8, requiring "-C2" if applying with "git apply" due to context changes. I also noted the following which I believe is just plain false, but don't have an alternative for. Portals

Re: [OMPI devel] Suggested README changes

2015-04-23 Thread Paul Hargrove
Applied -- thank you! > > -Jeff [accepting patches from Paul since 2002] > > ;-p > > > On Apr 23, 2015, at 2:29 PM, Paul Hargrove wrote: > > > > I have attached a patch (against master) that fixes some typos and makes > an update. > > It applies almost clea

[OMPI devel] "maybe" issue in 1.8.5rc[23]

2015-04-23 Thread Paul Hargrove
I have a system w/ xlc-11.1. It has essentially always failed "make check" in an LP64 build due to xlc botching the atomics. So, when it failed with 1.8.5rc2 I didn't look closely. Today it has failed with rc3 and I *did* look closely and here is what I see: PASS: predefined_gap_test /bin/sh: lin

Re: [OMPI devel] "maybe" issue in 1.8.5rc[23]

2015-04-23 Thread Paul Hargrove
/atomic.h:158 So, this is a new symptom of the known inability of this compiler to get the inline asm right. Sorry for the false alarm, -Paul On Thu, Apr 23, 2015 at 4:09 PM, Paul Hargrove wrote: > I have a system w/ xlc-11.1. > It has essentially always failed "make check" in a LP64

[OMPI devel] 1.8.5rc3 preliminary testing report

2015-04-23 Thread Paul Hargrove
Sorry my ARMv6, ARMv8 and PowerPC64LE systems were dedicated to other purposes today. So, I was only able to test 1.8.5rc3 against 71 distinct configurations. ;-) There was 1 known-failure (xlc-11.1 LP64) that scared me by changing failure modes between rc1 and rc2. Several slow tests (Linux on M

[OMPI devel] Unsolicited code review of new distscript.sh

2015-04-23 Thread Paul Hargrove
I gave "make dist" a try on NetBSD (with its own /bin/sh) and Ubuntu Trusty (w/ /bin/sh symlinked to dash). Both generated the tarballs, but dash spewed some warnings on the unalias commands. So here is my code review (roughly as long as the script itself). 1) #!/usr/bin/env sh The presence of

[OMPI devel] Dead code in opal_config_asm.4

2015-04-24 Thread Paul Hargrove
There is a block of code near the start of the OPAL_CONF_ASM which begins: # OS X Leopard ld bus errors if ... However, Leopard is OS X 10.5 and the minimum supported by Open MPI is 10.6. So, that code should be unreachable at this time (and since Jan 2014 http://www.open-mpi.org/community/lists

Re: [OMPI devel] 1.8.5....going once...going twice...

2015-04-24 Thread Paul Hargrove
5 of the 6 MIPS and ARM testers that were still running last night have completed successfully. No reason to think the last one won't pass on rc3 as it did on rc2, if given another 2 or 3 hours to complete. -Paul On Fri, Apr 24, 2015 at 9:52 AM, Ralph Castain wrote: > Any last minute issues peo

[OMPI devel] My 1.8.5rc3 testing report

2015-04-24 Thread Paul Hargrove
All done! All good! Summary: 3 Unavailable 1 Known bad[*] configuration 70 PASS -Paul [*] Compiler bug confirmed by Nysal On Thu, Apr 23, 2015 at 7:29 PM, Paul Hargrove wrote: > Sorry my ARMv6, ARMv8 and PowerPC64LE systems were dedicated to other > purposes today.

Re: [OMPI devel] powerpc64le support [1-line patch]

2015-04-24 Thread Paul Hargrove
reloading of firmware is involved. -Paul On Fri, Apr 24, 2015 at 3:04 PM, Troy Benjegerdes wrote: > On Wed, Apr 22, 2015 at 02:19:07PM -0700, Paul Hargrove wrote: > > I had an opportunity to try the 1.8.5rc2 tarball on a little-endian > POWER8 > > (aka ppc64el or powerpc64le).

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Paul Hargrove
AIX, Solaris and {Free,Open,Net}BSD results are also not consistent with regards to units used for reporting: AIX$ no -o tcp_keepidle -o tcp_keepintvl tcp_keepidle = 14400 tcp_keepintvl = 150 {phargrov@solaris11-amd64 ~}$ ndd -get /dev/tcp tcp_keepalive_interval 720 [phargrov@freebsd10-amd64

Re: [OMPI devel] Proposal: update Open MPI's version number and release process

2015-05-20 Thread Paul Hargrove
On Tue, May 19, 2015 at 9:37 PM, Howard Pritchard wrote: > Pretty soon the developer will get trained to use the PR process, unless > they are that engineer I've yet to meet who always writes flawless code. I've never met that developer, either. However, I have met one (and thankfully only one)

Re: [OMPI devel] 1.8.6rc1 ready for test

2015-05-24 Thread Paul Hargrove
I see a failure on OpenBSD when building ROMIO: libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I/home/phargrov/OMPI/openmpi-1.8.6rc1-openbsd5-amd64/openmpi-1.8.6rc1/ompi/mca/io/romio/romio -I./adio/include -DOMPI_BUILDING=1 -I/home/phargrov/OMPI/openmpi-1.8.6rc1-openbsd5-amd64/openmpi-1.8.

Re: [OMPI devel] 1.8.6rc1 ready for test

2015-05-24 Thread Paul Hargrove
ng code assumes that if that configure test passed, then struct statfs contains a f_type field. I do notice that Open MPI's configure actually checks rather than assuming this: checking for struct statfs.f_type... no -Paul On Sun, May 24, 2015 at 5:57 PM, Paul Hargrove wrote: > I see a fa

Re: [OMPI devel] 1.8.6rc1 ready for test

2015-05-25 Thread Paul Hargrove
I still have some slow emulated ARM and MIPS testers running (though some positive ARM results are in). I will report the MIPS and remaining ARM results when they are all ready (probably Tue AM). However, with 70 configurations tested the only issue I've seen is the ROMIO build failure on OpenBSD-

Re: [OMPI devel] 1.8.6rc1 ready for test

2015-05-26 Thread Paul Hargrove
On Sun, May 24, 2015 at 10:24 PM, Paul Hargrove wrote: > I will report the MIPS and remaining ARM results when they are all ready > (probably Tue AM). > They completed normally. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Computer Languages & Sys

Re: [OMPI devel] 1.8.6rc2 ready for test

2015-06-16 Thread Paul Hargrove
My testing report for 1.8.6rc2 is nearly unchanged relative to rc1: Around 70 configurations have completed, but slow qemu-emulated ARM and MIPS systems will continue overnight. At this point everything is successful except for some known issues (also present in 1.8.5 and thus not regressions):

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Paul Hargrove
I can see cloning of existing component's source as a starting point for a new one as a common occurrence (at least relative to creating new components from zero). So, this is probably not the last time this will ever occur. Would a build with --disable-dlopen have detected this problem (by failin

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Paul Hargrove
On Thu, Jun 25, 2015 at 4:59 PM, Gilles Gouaillardet wrote: > In this case, mca_coll_hcoll module is linked with the proprietary > libhcoll.so. > the ml symbols are defined in both mca_coll_ml.so and libhcoll.so > i am not sure (i blame my poor understanding of linkers) this is an error > if > Op
