, 2014, at 7:52 PM, Jed Brown wrote:
> >
> >> I don't have experience with GerritHub, but Bitbucket supports this
> >> feature (permissions on branch names/globs) and we use it in PETSc.
> > Thanks for the info. Paul Hargrove said pretty much the same thing to
Bitbucket today. It may be a workable
> model that the main OMPI repo (and wiki and tickets) is at Bitbucket, and
> most other repos (and wikis and tickets) are at Github.
> > 2. I just sent a mail to Github support asking them if they plan to
> support per-branch push ACLs. I don
On Thu, Sep 25, 2014 at 2:28 PM, Jed Brown wrote:
> Paul Hargrove writes:
> > The GUIs for things like browsing commits, viewing diffs, etc are pretty
> > similar in capability and each is sufficiently intuitive (after a brief
> > learning curve) that I don't find I ne
I agree with George that zeroing memory only in the debug builds could hide
bugs, and thus would want to see the debug and non-debug builds have the
same behavior (both malloc or both calloc). So, I also agree this looks
initially like a hard choice.
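To make the trade-off concrete, here is a minimal sketch of an allocation wrapper whose zeroing behavior is chosen by a single compile-time switch, independent of debug vs. non-debug. The macro DEMO_ZERO_ALLOC and the function demo_alloc() are hypothetical names for illustration only; they are not Open MPI macros, functions, or configure options.

```c
#include <stdlib.h>

/* DEMO_ZERO_ALLOC is a hypothetical compile-time switch for this sketch;
 * it is not an actual Open MPI configure option or macro. */
static void *demo_alloc(size_t nbytes)
{
#ifdef DEMO_ZERO_ALLOC
    /* Zeroed memory: quiets memory checkers, but can mask reads of
     * uninitialized data. */
    return calloc(1, nbytes);
#else
    /* Uninitialized memory: what a plain malloc() build would see. */
    return malloc(nbytes);
#endif
}
```

With a switch like this, debug and non-debug builds see the same allocator unless the switch is given explicitly, which is the property argued for above.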
What about using malloc() in non-debug builds
) when --with-valgrind
> is specified on the command line?
>
> I.e., don't tie it to debug builds, but to valgrind-enabled builds?
>
>
> On Oct 3, 2014, at 6:11 PM, Paul Hargrove wrote:
>
> > I agree with George that zeroing memory only in the debug builds could
> hid
I know of two possibilities:
1) I cannot be certain but since the message concerns a PC-relative
addressing mode, it is possible that something needs to be compiled with
-fPIC to fix the issue. See if adding that option to any of the mpicc
commands helps.
2) Try adding ONE of "-ll", "-lfl" or "-
> D-52062 Aachen
>
> Phone +49 (0)241 80 99932
> fri...@cats.rwth-aachen.de
> http://www.cats.rwth-aachen.de
>
> On 18.10.2014, at 02:24, Paul Hargrove wrote:
>
> I know of two possibilities:
>
> 1) I cannot be certain but since the message concerns a PC-re
I can shed some light on these warnings.
sem_init() and sem_destroy() are POSIX-defined interfaces for UNNAMED
semaphores.
There are also POSIX interfaces, sem_{open,close,unlink}(), that operate on
NAMED semaphores.
See for more info:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sema
On Mon, Oct 27, 2014 at 2:42 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
[...]
> Paul, since you have access to many platforms, could you please run this
> test with and without -D_REENTRANT / -D_THREAD_SAFE
> and tell me where the program produces incorrect behaviour (output i
l
On Mon, Oct 27, 2014 at 2:48 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:
> Thanks Paul !
>
> Gilles
>
> On 2014/10/27 18:47, Paul Hargrove wrote:
>
> On Mon, Oct 27, 2014 at 2:42 AM, Gilles Gouaillardet
> wrote:
> [...]
>
>
On Tue, Oct 28, 2014 at 11:53 AM, Howard Pritchard
wrote:
>
>> We may no longer require those as you have separated the Cray check out,
>> but the original problem is that we would pickup the Slurm components on
>> the Cray because we would find pmi.h
>>
>> Oh, I forgot about that .
>
In GASNet
edison08 ~}$ rpm -qf /opt/cray/pmi/default/include/pmi_cray.h
cray-libpmi-devel-5.0.5-1..10300.134.8.ari
-Paul
On Tue, Oct 28, 2014 at 12:02 PM, Ralph Castain wrote:
>
> On Oct 28, 2014, at 11:59 AM, Paul Hargrove wrote:
>
>
> On Tue, Oct 28, 2014 at 11:53 AM, Howard Pritchard
On Tue, Oct 28, 2014 at 12:20 PM, Ralph Castain wrote:
> On Oct 28, 2014, at 12:17 PM, Paul Hargrove wrote:
>
> Ralph,
>
> The Crays at NERSC have *both* pmi_cray.h and pmi.h (and pmi2.h as well).
>
>
> I understand that - I was questioning if that is univers
kages are supposed to include
> all dependencies on header files, libs, etc. from other Cray packages.
>
> Howard
>
>
>
>
> 2014-10-28 13:20 GMT-06:00 Ralph Castain :
>
>>
>> On Oct 28, 2014, at 12:17 PM, Paul Hargrove wrote:
>>
>> Ralph,
>>
Amit,
You appear to be mixing PGI and GNU compilers, as shown by the "g++" in the
final portion of your output.
You must configure Open MPI with all compilers (C, C++ and Fortran) from
the same "family".
-Paul
On Wed, Oct 29, 2014 at 1:11 PM, Kumar, Amit wrote:
> Dear Developers,
>
> I have r
On Mon, Nov 3, 2014 at 8:29 AM, Dave Goodell (dgoodell)
wrote:
> > btw, is there a push option to abort if that would make github history
> non linear ?
>
> No, not really. There are some options to "pull" to prevent you from
> creating a merge commit, but the fix when you encounter that situati
IIRC it was not possible to merge with a dirty tree with git 1.7.
So, Dave, you may have been bitten in those dark days.
-Paul
On Mon, Nov 3, 2014 at 8:49 AM, Dave Goodell (dgoodell)
wrote:
> On Nov 3, 2014, at 10:41 AM, Jed Brown wrote:
>
> > "Dave Goodell (dgoodell)" writes:
> >> Most of the
Not clear if the following failure is Solaris-specific, but it *IS* a
regression relative to 1.8.3.
The system has 2 IPV4 interfaces:
Ethernet on 172.16.0.119/16
IPoIB on 172.18.0.119/16
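As a quick check that these two addresses really are on different /16 networks, here is a small sketch that compares the network prefixes. The helper name same_subnet() is made up for this illustration and is not part of Open MPI.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Returns 1 if both dotted-quad IPv4 addresses share the same
 * prefix_len-bit network prefix, 0 if they differ, and -1 on a
 * parse error or an out-of-range prefix length. */
static int same_subnet(const char *a, const char *b, int prefix_len)
{
    struct in_addr ia, ib;
    uint32_t mask;

    if (prefix_len < 0 || prefix_len > 32)
        return -1;
    if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
        return -1;

    /* Build the netmask in host byte order; a shift by 32 is undefined,
     * so prefix length 0 is handled separately. */
    mask = (prefix_len == 0) ? 0u : (0xFFFFFFFFu << (32 - prefix_len));

    return ((ntohl(ia.s_addr) & mask) == (ntohl(ib.s_addr) & mask)) ? 1 : 0;
}
```

Calling same_subnet("172.16.0.119", "172.18.0.119", 16) returns 0: the two interfaces sit on separate /16 networks (172.16.0.0 and 172.18.0.0).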
$ ifconfig bge0
bge0: flags=1004843 mtu 1500
index 2
inet 172.16.0.119 netmask broadcas
I'm not sure why the
> connection is failing.
>
> Thanks
> Ralph
>
> On Nov 3, 2014, at 5:56 PM, Paul Hargrove wrote:
>
> Not clear if the following failure is Solaris-specific, but it *IS* a
> regression relative to 1.8.3.
>
> The system has 2 IPV4 interfaces:
>
think of Gilles's recent issues w/ errno on Solaris unless
_REENTRANT was defined.
So, I tried building again after configuring with CFLAGS=-D_REENTRANT
AND THAT DID THE TRICK.
-Paul
On Mon, Nov 3, 2014 at 7:23 PM, Paul Hargrove wrote:
> Ralph,
>
> Requested output is attached.
>
the latest trunk tarball?
> This looks familiar to me, and I wonder if we are just missing a changeset
> from the trunk that fixed the handshake issues we had with failing over
> from one transport to another.
>
> Ralph
>
> On Nov 3, 2014, at 7:23 PM, Paul Hargrove wrote:
Jeff wrote:
MPI_THREAD_MULTIPLE support barely works in v1.8. Why have it on by
default, especially when there's a performance penalty?
I think the "barely works" state of threading support is a stronger
argument for return to the 1.6.x behavior than PSM performance. Who knows
what subtle bugs h
All atomics must be done through not just "the same btl" but the same btl
MODULE, since atomics from two IB HCAs, for instance, are not necessarily
coherent. So, how is the "best" one to be selected?
-Paul [Sent from my phone]
On Nov 5, 2014 7:15 AM, "Nathan Hjelm" wrote:
>
> In the new osc com
Ralph,
I downloaded the attachment and found it to be a gzipped tar file
containing a single text file "log".
I have attached the bzipped (not tarred) log file.
-Paul
On Tue, Nov 25, 2014 at 7:29 AM, Ralph Castain wrote:
> I don't know what you put in that log file, but it was an executable an
Allan,
A likely possibility is that some important kernel feature (that Open MPI
assumes is present) is missing.
That includes not only "kernel modules" as you mention, but also features
configured in (or out) of the base kernel.
For instance, some embedded kernels omit UNIX-domain sockets and SysV
IX-domain sockets and Sys V
> IPC are both enabled in the build. Are there any other possibilities I can
> check?
>
> Thanks,
> Di
>
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory <http://vast.cs.ucla.edu/>,
> Department of Computer Science, UC Los Ang
>> Regards,
>> Di
>>
>> On Tue, Nov 25, 2014 at 2:25 PM, Ralph Castain wrote:
>>
>>>
>>> This is all running on a single node, correct? If so, did you configure
>>> OMPI with --enable-debug?
>>>
>>> If you can do that, or
> I'll have to look - there isn't supposed to be such a requirement, and I
> certainly haven't seen it before.
>
>
> On Nov 25, 2014, at 3:26 PM, Paul Hargrove wrote:
>
> Allan,
>
> I am glad things are working for you now.
> I can confirm (on a QE
On Tue, Nov 25, 2014 at 5:37 PM, Ralph Castain wrote:
> So it looks like the issue isn't so much with our code as it is with the
> OS stack, yes? We aren't requiring that the loopback be "up", but the stack
> is in order to establish the connection, even when we are trying a non-lo
> interface.
ause it
> generally isn't necessary on a cluster.
>
>
> is a backport (since this is already available in the trunk/master) simply
> out of the question ?
>
>
> It would be against our normal procedures, but I can raise it at next
> week's meeting.
>
>
>
Howard,
I regularly test release candidates against the PGI installations on
NERSC's systems (and sometimes elsewhere). In fact, I have a test of
1.8.4rc2 against pgi-14.4 "in the pipe" right now.
I believe Larry Baker of USGS is also a PGI user (in production, rather
than just testing as I do).
Testing the 1.8.4rc2 tarball on my x86-64 Solaris-11 systems I am getting
the following crash for both "-m32" and "-m64" builds:
$ mpirun -mca btl sm,self,openib -np 2 -host pcp-j-19,pcp-j-20 examples/ring_c
[pcp-j-19:18762] *** Process received signal ***
[pcp-j-19:18762] Signal: Segmentation Fa
I think I've reported this earlier in the 1.8 series.
If I compile on an SGI UV (e.g. blacklight at PSC) configure picks up the
presence of xpmem headers and enables the vader BTL.
However, the port of vader to SGI's "flavor" of xpmem is incomplete and the
following build failure results:
make[2]:
+0x5c
[niagara1:29881] *** End of error message ***
Segmentation Fault - core dumped
On Thu, Dec 11, 2014 at 3:29 PM, Ralph Castain wrote:
> Ah crud - incomplete commit means we didn't send the topo string. Will
> roll rc3 in a few minutes.
>
> Thanks, Paul
> Ralph
>
> On
ifferent - it's failing in mpirun itself. Can you get a
> line number on it?
>
> Sorry for delay - I'm generating rc3 now
>
>
> On Dec 11, 2014, at 6:59 PM, Paul Hargrove wrote:
>
> Don't see an rc3 yet.
>
> My Solaris-10/SPARC runs fail slightly differently (
Ralph,
Sorry to be the bearer of more bad news.
The "good" news is I've seen the new warning regarding the lack of a
loopback interface.
The BAD news is that it is occurring on a Linux cluster that I've verified
DOES have 'lo' configured on the front-end and compute nodes (UP and
RUNNING accordin
for not having reviewed in a timely manner) seems
> to check
> there is a *selected* loopback interface.
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/12 13:15, Paul Hargrove wrote:
>
> Ralph,
>
> Sorry to be the bearer of more bad news.
> The "good" n
ngs : mpirun + 2 orted
> + 2 mpi tasks
>
> do you have any oob_tcp_if_include or oob_tcp_if_exclude settings in your
> openmpi-mca-params.conf ?
>
> here is attached a patch to fix this issue.
> what we really want is test there is a loopback interface, period.
> the current c
On a Linux system configured without java support I see the following two
dangling symlinks installed in ${prefix}/bin:
lrwxrwxrwx 1 phhargrove phhargrove 8 Dec 11 23:52 oshjavac -> mpijavac
lrwxrwxrwx 1 phhargrove phhargrove 8 Dec 11 23:52 shmemjavac -> mpijavac
It seems there is some logic mi
First, I want to ask what became of the issue discussed in this thread?
http://www.open-mpi.org/community/lists/devel/2014/11/16160.php
I thought we had concluded that one just needed -D_REENTRANT.
I mention that only for completeness, because I think my current problem is
different.
The followi
stain wrote:
>
> Thanks Paul - I will post a fix for this tomorrow. Looks like Sparc isn't
> returning an architecture type for some reason, and I didn't protect
> against it.
>
>
> On Dec 11, 2014, at 7:39 PM, Paul Hargrove wrote:
>
> Backtrace for the Solaris-10/SPARC S
> Afraid I'm drawing a blank, Paul - I can't see how we got to a bad address
> down there. This is at the beginning of orte_init, so there are no threads
> running nor has anything much happened.
>
> Do you have any suggestions?
>
>
> On Dec 12, 2014,
NOTE:
The existing code for "%l." in guess_strlen() is garbage.
The va_arg() macro calls all have "int" for the type!!
I am *only* testing a fix for the missing "%u" at the moment.
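For readers less familiar with the pitfall: va_arg() must be given the type the argument was actually passed as (after default promotions), so a "long" argument has to be fetched with va_arg(ap, long), not va_arg(ap, int). The sketch below is a made-up demo, not the Open MPI code; sum_args() and its one-letter spec string are assumptions for illustration.

```c
#include <stdarg.h>

/* Hypothetical demo: 'spec' holds one character per variadic argument,
 * 'i' for int and 'l' for long. Each value is fetched with the type it
 * was passed as; any other character is skipped without consuming an
 * argument. */
static long sum_args(const char *spec, ...)
{
    va_list ap;
    long total = 0;
    const char *p;

    va_start(ap, spec);
    for (p = spec; *p != '\0'; ++p) {
        if (*p == 'l') {
            total += va_arg(ap, long);   /* long argument: read as long */
        } else if (*p == 'i') {
            total += va_arg(ap, int);    /* int argument: read as int */
        }
    }
    va_end(ap);
    return total;
}
```

Reading a long with va_arg(ap, int) is undefined behavior; on an LP64 target it is especially likely to misbehave because int and long differ in size there.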
-Paul
On Fri, Dec 12, 2014 at 3:14 PM, Paul Hargrove wrote:
> Thanks, Gille
imeout 300", and have also attached the resulting stderr.
No joy for either timeout value.
-Paul
>
> On Dec 12, 2014, at 8:53 AM, Paul Hargrove wrote:
>
>
>
> First, I want to ask what became of the issue discussed in this thread?
>http://www.open-mpi.org/community
On Fri, Dec 12, 2014 at 4:29 PM, Ralph Castain wrote:
> All right - I'll surrender and remove the timeout. Will release rc4 later
> tonight.
>
> Sorry for putting you thru this Paul - for some reason, these problems
> aren't showing up elsewhere.
>
Even at a 300s timeout I don't get a connection
ph Castain wrote:
>
> I'm hoping it will fix it. The timeout code was the only change from 1.8.3
> besides the loopback warning, so it should restore the prior behavior.
>
>
> On Dec 12, 2014, at 4:32 PM, Paul Hargrove wrote:
>
>
> On Fri, Dec 12, 2014 at 4:2
have been written to the
* final string if enough space had been available.
*/
static int guess_strlen(const char *fmt, va_list ap)
{
char dummy[1];
return 1 + vsnprintf(dummy, 1, fmt, ap);
}
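A slightly more defensive variant of the same idea is sketched below, assuming C99 vsnprintf() semantics (the return value is the length that would have been written). The names needed_len_v() and needed_len() are hypothetical and are not Open MPI functions. Some pre-C99 implementations return -1 on truncation, so the sketch reports an error rather than returning 0 in that case.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper, not the Open MPI function: returns the number of
 * bytes needed for the formatted string, including the terminating NUL,
 * or -1 if vsnprintf() reports an error. */
static int needed_len_v(const char *fmt, va_list ap)
{
    va_list copy;
    char dummy[1];
    int n;

    va_copy(copy, ap);   /* leave the caller's va_list untouched */
    n = vsnprintf(dummy, sizeof dummy, fmt, copy);
    va_end(copy);

    return (n < 0) ? -1 : n + 1;
}

/* Variadic convenience wrapper around needed_len_v(). */
static int needed_len(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = needed_len_v(fmt, ap);
    va_end(ap);
    return n;
}
```

Because the C library does the format parsing, conversions such as "%u" and "%ld" are handled without any hand-written va_arg() calls.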
BTW: I do see some messages like "select: Interrupted system call" which I
assume are
ve the problem. We can then address
> the broader question (e.g., do we even need this stuff any more at all?) in
> a more leisurely way.
>
>
> On Dec 12, 2014, at 5:42 PM, Larry Baker wrote:
>
> On 12 Dec 2014, at 5:22 PM, Paul Hargrove wrote:
>
> HOWEVER, while the patch
n Fri, Dec 12, 2014 at 5:17 PM, Ralph Castain wrote:
> No need for autogen - simple change to a couple of files
>
>
>
> On Dec 12, 2014, at 4:38 PM, Paul Hargrove wrote:
>
> Ralph,
>
> Patches to *code* are fine, but I am not equipped to autogen.
>
> -Paul
>
> On
It appears that with Ralph's oob_tcp patches (paul.diff) everything is now
OK on Solaris-11/x86-64.
On Solaris-10/SPARC I needed to fix guess_strlen() (or change "%u" to "%d"
to avoid the issue) or else I didn't get very far at all (SEGV in orterun).
However, with that issue resolved things are st
My testing on 1.8.4rc4 is not quite done, but is getting close.
With two exceptions, so far all looks good to me on almost 60 different
platforms.
I've retested on my Solaris systems and saw none of the issues I had with
rc3.
The x86-64/Linux system with mtl:psm is no longer giving a SEGV at exit.
On Sun, Dec 14, 2014 at 10:52 PM, Paul Hargrove wrote:
>
> Solaris-10/SPARC and "--enable-static --disable-shared" appears broken for
> C++ apps (but OK for C).
> I will report in more details when I have more information.
>
First the good news:
The problem I was exper
On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain wrote:
>
> 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
> link. Need someone to investigate.
The lack of multi-thread libra
nd.
-Paul
On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove wrote:
>
>
> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain wrote:
>>
>> 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>> multi-threaded C libraries, apparently need "-mt=yes" i
.
I am getting less certain that my speculation about thread-safe libs is
correct.
-Paul
On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove wrote:
>
> A little more reading finds that...
>
> Docs says that one needs "-mt"
the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
> error message
> e.g. failed: Error 0 (0)
>
> FWIW, master is also affected.
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/16 10:47, Paul Hargrove wrote:
>
> I have tried with a oob_tcp_if_include setting so
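The "Error 0 (0)" symptom Gilles describes above is the classic result of reading errno after cleanup code has already overwritten it. Below is a minimal sketch of the usual remedy, which is to save errno immediately after the failing call. The function report_open_failure() is hypothetical, and the explicit "errno = 0" merely stands in for a cleanup step (such as a macro that closes a socket) that clobbers errno; none of this is the actual Open MPI code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: try to open 'path' and, on failure, format a message
 * from the errno value captured right after the failing call.
 * Returns the saved errno value, or 0 if the open succeeded. */
static int report_open_failure(const char *path, char *msg, size_t msglen)
{
    int saved;
    int fd = open(path, O_RDONLY);

    if (fd >= 0) {
        close(fd);
        if (msglen > 0)
            msg[0] = '\0';
        return 0;
    }

    saved = errno;   /* capture before anything else can change it */
    errno = 0;       /* stand-in for cleanup code that resets errno */

    snprintf(msg, msglen, "failed: %s (%d)", strerror(saved), saved);
    return saved;
}
```

Had the message been built from errno after the cleanup step, it would have reported error 0, which matches the confusing output quoted above.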
is 11 (at least with gcc compilers) do not
> need any flags
> (except the -D_REENTRANT that is added automatically)
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/16 12:10, Paul Hargrove wrote:
>
> Gilles,
>
> I will try the patch when I can.
> However, our network is u
sabled and
try again.
Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
this.
-Paul
On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove wrote:
>
> Gilles,
>
> I am NOT seeing the problem with gcc.
> It is only occurring with the S
des)
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/16 16:00, Paul Hargrove wrote:
>
> Gilles,
>
> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
> compilations.
> It appears to be used for building libevent and vt, but nothing else.
ts pcp-j-20
> says ?
>
> BTW, did you try without -m64 ?
>
> Does the following work
> ping/ssh 172.18.0.120
>
> Honestly, this output makes very little sense to me, so i am asking way
> too much info hoping i can reproduce this issue or get a hint on what can
> possibly
S ? (or is -D_REENTRANT enough ?)
> LDFLAGS ? (that might be solaris and/or solarisstudio (12.4) specific and
> i simply ignore it)
>
> Bottom line, i do invite you to test 1.8.4rc4 again and with
> CFLAGS="-mt"
> or
> CFLAGS="-mt -m64"
> if you previ
My 1.8.3 build has not completed.
HOWEVER, I can already see a key difference in the configure step.
In 1.8.3 "-mt" was added AUTOMATICALLY to CFLAGS by configure:
checking if C compiler and POSIX threads work as is... no - Solaris, not
checked
checking if C++ compiler and POSIX threads work as i
The resulting run worked!
So, I very strongly suspect that the problem will be resolved if one
restores the configure logic that my previous email shows has vanished
(since that would restore "-mt" to CFLAGS and wrapper cflags).
-Paul
On Tue, Dec 16, 2014 at 8:10 PM, Paul Hargrove wrote:
to the tarball.
>
> Ralph
>
>
> On Tue, Dec 16, 2014 at 9:59 PM, Paul Hargrove wrote:
>
>> Gilles,
>>
>> The 1.8.3 test works where the 1.8.4rc4 one fails with identical
>> configure arguments.
>>
>> While it may be overkill, I conf
W. i was unable to reproduce the problem on solaris 11 with sunstudio
> 12.4 even if i do not use -D_REENTRANT *nor* -mt (!)
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/17 15:01, Ralph Castain wrote:
>
> Hi Paul
>
> Can you try the attached patch? It would require run
> it is worth giving it a try (to be 100.0% sure ...)
>
> can you please do that tomorrow ?
>
> in the mean time, if we (well Ralph indeed) want to release 1.8.4, then
> simply restore
> the two config files i mentioned.
>
> Cheers,
>
> Gilles
>
>
> On 201
it that
> sets the -D_REENTRANT CFLAGS on solaris/solarisstudio
>
> https://github.com/open-mpi/ompi-release/commit/ac8b84ce674b958dbf8c9481b300beeef0548b83
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/17 15:56, Paul Hargrove wrote:
>
> I've queued 3 tests:
>
>
Results of tests described below:
1) SEGV in hwloc - will report later
2) PASS
3) PASS
So, either -D_REENTRANT or -mt works for me IF added to both the CFLAGS
and wrapper-cflags.
-Paul
On Tue, Dec 16, 2014 at 10:56 PM, Paul Hargrove wrote:
>
> I've queued 3 tests:
>
> 1) o
I tried last night's v1.8 tarball (openmpi-v1.8.3-272-g4e4f997.tar.bz2) with
the Studio Compilers (v12.3) on a Solaris/x86-64 system.
Configure args (other than prefix) were:
--enable-debug --with-verbs \
CC=cc CXX=CC FC=f90 \
CFLAGS=-m64 --with-wrapper-cflags=-m64 \
FCFLAGS=-m64 --with-wrapper-fcf
Wednesday, December 17, 2014 3:53 PM
> *To:* de...@open-mpi.org
> *Subject:* Re: [OMPI devel] Solaris/x86-64 SEGV with 1.8-latest
>
>
>
> On 17/12/2014 21:43, Paul Hargrove wrote:
>
>
>
> Dbx gives me
>
> t@1 (l@1) terminated by signal SEGV (no mapping
.8 tree, and is in the latest
> nightly tarball.
>
> If I'm following this thread right -- and I might not be! -- I think
> Gilles is saying that now that the __sun check is in, it should fix this
> -mt/-D_REENTRANT/whatever problem.
>
> Can you confirm?
>
>
> On Dec 16,
Tests queued on 61 distinct configurations... will share results when I've
got them.
-Paul
On Wed, Dec 17, 2014 at 9:15 PM, Ralph Castain wrote:
>
> Hi folks
>
> Trying to bring this to closure, so hopefully this is the last one. Please
> give it a smoke test:
>
> http://www.open-mpi.org/softwar
On Wed, Dec 17, 2014 at 7:17 PM, Paul Hargrove wrote:
>
> I am going to run the nightly on other configs on both my
> Solaris-11/x86-64 and Solaris-10/SPARC systems.
> I just want to be sure some other compile/abi/arch combination didn't get
> broken by accident.
> I will
With results from about 50 out of 61 platforms:
+ KNOWN: SGI UV is still "broken-by-default" (fails compiling vader unless
configured with --without-xpmem)
+ NEW: I see Fortran bindings failing to compile w/ gfortran
+ NEW: I see Fortran bindings fail to link with Open64
I also have unexplained e
argrove/GSCRATCH/OMPI/openmpi-1.8.4rc5-linux-x86_64-gcc-atomics/openmpi-1.8.4rc5/ompi/mpi/fortran/mpif-h/sizeof-mpif08-pre-1.8.4_f.F90:104
[...about 180 more lines of similar output...]
On Thu, Dec 18, 2014 at 9:30 AM, Jeff Squyres (jsquyres) wrote:
>
> On Dec 18, 2014, at 11:55 AM, Paul Ha
On Thu, Dec 18, 2014 at 8:55 AM, Paul Hargrove wrote:
>
> I also have unexplained errors on my Solaris-10/SPARC system.
> It looks like there may have been a loss of network connectivity during
> the tests.
> I need to check these deeper, but I expect them to pass when I get a
>
"deficient" fortran support. If there is a desire/need to follow up
on this, let me know. However, all those "deficient" Fortran compilers have
been reported by me on this list at least once in testing prior releases
(just never in one place).
-Paul
On Thu, Dec 18, 2014 at
iler has detected errors in module
> "MPI_F08_SIZEOF". No module information file will be created for this
> module.
>
> if (present(ierror)) ierror = 0
> ^
> "../../../../../../src/openmpi-1.8.4rc5/ompi/mpi/fortran/mpif-h/sizeof-mpif08-pre-1.8.4_f.F90",
>
On Thu, Dec 18, 2014 at 5:50 PM, Paul Hargrove wrote:
>
> Unless something turns up on the MIPS systems my "smoke test" of rc5 is
> complete.
In case anybody was holding their breath:
The MIPS testers completed just fine.
-Paul
--
Paul H. Hargrove
ghtly
> tarball for you.
>
> http://www.open-mpi.org/nightly/v1.8/
>
> Could you test it in the 2 cases where you had fortran failures?
>
>
>
> On Dec 18, 2014, at 8:50 PM, Paul Hargrove wrote:
>
> > Update:
> >
> > I now have 59 of 61 results, with
my
"thumbs up" with respect to "Fortran Sadness".
-Paul
On Fri, Dec 19, 2014 at 12:51 PM, Paul Hargrove wrote:
> Jeff,
>
> Less typing to launch 50+ testers than pick out just those two.
> Starting them now...
>
> -Paul
>
> On Fri, Dec 19, 2014 at
Sorry to rain on the parade, but SGI UV is still broken by default.
I reported this as present in 1.8.4rc5 and Nathan had claimed to be working
on it.
A reminder that all it takes is a 1-line change in
ompi/mca/btl/vader/configure.m4 to not search for sn/xpmem.h
-Paul
On Fri, Dec 19, 2014 at 7:2
Jeff,
If I understand correctly, one is (or soon will be) expected to have libtool-dev(el)
installed on the build system, even if one is not an OMPI developer.
How does this plan to cease embedding libltdl align with the fact that
autogen.pl currently applies patches to the parts of the generated
configure f
enough autotools to autogen on this old system then I wouldn't
have asked about libltdl from libtool-1.4. So, please *do* generate a
tarball and I will test (on *all* of my systems).
-Paul
On Fri, Jan 30, 2015 at 3:49 AM, Jeff Squyres (jsquyres) wrote:
> On Jan 29, 2015, at 9:11 PM, Paul H
i, Jan 30, 2015 at 1:29 PM, Jeff Squyres (jsquyres) wrote:
> On Jan 30, 2015, at 2:46 PM, Paul Hargrove wrote:
> >
> > If I had new enough autotools to autogen on this old system then I
> wouldn't have asked about libltdl from libtool-1.4. So, please *do*
> generate a tarball
featuring 100% fewer "make check"
> failures.
>
> http://www.open-mpi.org/~jsquyres/unofficial/
>
>
> > On Jan 30, 2015, at 5:14 PM, Jeff Squyres (jsquyres)
> wrote:
> >
> > Shame on me for not running "make check".
> >
> > Fixing...
>
at 5:14 PM, Jeff Squyres (jsquyres)
> wrote:
> >
> > Shame on me for not running "make check".
> >
> > Fixing...
> >
> >
> >> On Jan 30, 2015, at 4:58 PM, Paul Hargrove wrote:
> >>
> >> Jeff,
> >>
> >> I ra
The output below occurred testing Jeff's no-embedded-libltdl tarball, but I
am assuming it is quite likely the same is true on the trunk.
The "issue" is that I am told by configure that "C and C++ compilers are
not link compatible".
However, it appears I just don't have a C++ compiler at all!!
I am
the Hopper system at NERSC.
>
> Do you have any Cray insight here? (see below for the exact issue)
>
>
> > On Feb 1, 2015, at 3:52 AM, Paul Hargrove wrote:
> >
> > Jeff (off-list),
> >
> > Original make was with V=1, so I skipped the "make clean"
t 4:44 AM, Jeff Squyres (jsquyres) wrote:
> Looks like the lt_interface.c code didn't properly use the lt_dladvise
> #if. How did that ever work, I wonder?
>
> Fixed now. On to your second finding...
>
>
> > On Jan 30, 2015, at 7:42 PM, Paul Hargrove wrote:
> >
>
On Mon, Feb 2, 2015 at 1:58 PM, Paul Hargrove wrote:
> 2b. I am retrying now with all of Cray's environment modules unloaded
> except the one for the PGI compiler. Nathan had suggested something like
> this to me in the past, but I've never had issues with the default
>
had fixed it in my local tree but not yet pushed to my github branch; I
> was waiting to see what happened w.r.t. your failure on the NERSC machine.
>
> I pushed the fix up to my branch now; do you want a new tarball?
>
>
> > On Feb 2, 2015, at 5:56 PM, Paul Hargrove wrote:
>
res (jsquyres)
wrote:
> Paul --
>
> If you've got the cycles and it's easy, release the hounds on the tarball
> that I just uploaded to:
>
> http://www.open-mpi.org/~jsquyres/unofficial/
>
> Thanks!
>
>
> > On Feb 2, 2015, at 7:19 PM, Paul Hargrove
On Mon, Feb 2, 2015 at 4:13 PM, Paul Hargrove wrote:
> HOWEVER - switching from PGI to GNU compilers made the problem go away.
> So, I suspect it may be an issue with the installation/configuration of
> the PGI compilers.
>
I've reproduced the problem on a non-Cray system wi
The following comes from testing Jeff's no-embedded-libltdl work, but I
suspect the same is true on tru^H^H^Hmaster.
The output below, from "make V=1" shows a link failure from trying to use
arc4random_addrandom(), which was removed on OpenBSD in late 2013.
The part that bugs me is that I thought
On Mon, Feb 2, 2015 at 5:22 PM, Paul Hargrove wrote:
> So, the overhead for me is pretty small as long as the number of failures
> is kept low.
I jinxed it!!!
I have, I believe, about 7 different failures now on various systems.
All of those appear UNRELATED to the libltdl changes.
Below is one example of what happens when you assume that you can trust the
libltdl installed at an otherwise very well maintained national center. I
think this is another "vote" for continuing to embed (a working) libltdl.
-Paul
$ mpirun -mca btl sm,self -np 2 examples/ring_c
libibverbs: Warning:
Howard,
This was seen on NERSC's Carver.
-Paul
On Mon, Feb 2, 2015 at 6:49 PM, Howard Pritchard
wrote:
> Hi Paul,
>
> Thanks for checking in depth into this. Just to help in determining how
> to proceed, which national center is this?
>
> Howard
>
>
> 2015-02-
I have a Mac OSX 10.8 system, where cc is clang.
I have no problems with a default build from the current master tarball.
However, a static-only build leads to a link failure on opal_wrapper.
Configured with
--prefix=... --enable-debug CC=cc CXX=c++ --enable-static --disable-shared
Failing port
On a system on which 1.8.4rc5 passed all my tests, I see the following
running "make" in the examples directory:
[...]
make[2]: Leaving directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64-java/BLD/examples'
make[2]: Entering directory
`/brashear/hargrove/OMPI/openmpi-master-linux-x86_64