[OMPI devel] 1.8.4rc2 now available for testing

2014-12-11 Thread Ralph Castain
In the usual place - this is an early rc as it doesn’t yet contain the thread multiple fix that is impacting performance. However, I wanted to give people a chance to run all their non-threaded functional validation tests. The release candidate includes a wide range of bug fixes as reported by

[OMPI devel] still supporting pgi?

2014-12-11 Thread Howard Pritchard
Hi Folks, I'm trying to use mtt on a cluster where it looks like the only functional compiler that 1) can build open mpi master 2) can also build the ibm test suite may be pgi. Can't compile right now, so I'm trying to fix it. But I'm now wondering whether we are still supporting building

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Ralph Castain
I’m unaware of any conscious decision to cut pgi off - I think it has been more a case of nobody having a license to use for testing. > On Dec 11, 2014, at 7:37 AM, Howard Pritchard wrote: > > Hi Folks, > > I'm trying to use mtt on a cluster where it looks like the only

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Jeff Squyres (jsquyres)
On Dec 11, 2014, at 7:40 AM, Ralph Castain wrote: > I’m unaware of any conscious decision to cut pgi off - I think it has been > more a case of nobody having a license to use for testing. +1 -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to:

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Nathan Hjelm
On Thu, Dec 11, 2014 at 07:37:17AM -0800, Howard Pritchard wrote: >Hi Folks, >I'm trying to use mtt on a cluster where it looks like the only functional >compiler that >1) can build open mpi master >2) can also build the ibm test suite >may be pgi. Can't compile right now,

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Howard Pritchard
Okay, I'll try to fix things. There's a problem in opal_datatype_internal.h, then a meltdown with libfabric owing to the fact that it's probably only been used in a gnu env. I'll open an issue on that one and assign it to Jeff. I think we should be turning this libfabric build off unless one asks for it.

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Jeff Squyres (jsquyres)
On Dec 11, 2014, at 9:58 AM, Howard Pritchard wrote: > Okay, I'll try to fix things. problem in opal_datatype_internal.h, then a > meltdown with libfabric owing to the fact that it's probably > only been used in a gnu env. I'll open an issue on that one and assign it to >

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Paul Kapinos
Jeff, PGI compiler(s) are available on our cluster: $ module avail pgi There are a lot of older versions, too: $ module load DEPRECATED $ module avail pgi Best, Paul P.S. In our standard environment, the Intel compiler and Open MPI are active, so $ module unload openmpi intel $ module load pgi

Re: [OMPI devel] Introducing memkind + Adding component in mpool framework

2014-12-11 Thread Vishwanath Venkatesan
Hi Jeff & Ralph, Thanks for the response, and sorry for the delay in my reply. Attending the developers meeting sounds like a good idea, but I will be back from my vacation only on the 15th. So I will not be able to close in on my possibilities to attend the developers meeting before that. I will

Re: [OMPI devel] Introducing memkind + Adding component in mpool framework

2014-12-11 Thread Jeff Squyres (jsquyres)
Ok. Howard asked me about this in person this week at the MPI Forum. I think we all agree that this sounds like an interesting prospect; we just need to make some adjustments in the OMPI infrastructure to make it happen. That will take some discussion. On Dec 11, 2014, at 11:58 AM,

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Jeff Squyres (jsquyres)
Howard -- One thing I neglected to say -- if libfabric/usnic support on master is causing problems for you, you can configure without libfabric: ./configure --without-libfabric ... (which will, of course, also disable anything that requires libfabric) The intent is that we build things by

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Paul Hargrove
Howard, I regularly test release candidates against the PGI installations on NERSC's systems (and sometimes elsewhere). In fact, I have a test of 1.8.4rc2 against pgi-14.4 "in the pipe" right now. I believe Larry Baker of USGS is also a PGI user (in production, rather than just testing as I do).

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Larry Baker
On 11 Dec 2014, at 2:12 PM, Paul Hargrove wrote: > I believe Larry Baker of USGS is also a PGI user (in production, rather than > just testing as I do). That is correct. Although we are running a rather old Rocks cluster kit (CentOS based) which is so old that we cannot run the latest PGI

[OMPI devel] [1.8.4rc2] orted SEGVs on Solaris-11/x86-64

2014-12-11 Thread Paul Hargrove
Testing the 1.8.4rc2 tarball on my x86-64 Solaris-11 systems I am getting the following crash for both "-m32" and "-m64" builds: $ mpirun -mca btl sm,self,openib -np 2 -host pcp-j-19,pcp-j-20 examples/ring_c [pcp-j-19:18762] *** Process received signal *** [pcp-j-19:18762] Signal: Segmentation

Re: [OMPI devel] [1.8.4rc2] orted SEGVs on Solaris-11/x86-64

2014-12-11 Thread Ralph Castain
Ah crud - incomplete commit means we didn’t send the topo string. Will roll rc3 in a few minutes. Thanks, Paul Ralph > On Dec 11, 2014, at 3:08 PM, Paul Hargrove wrote: > > Testing the 1.8.4rc2 tarball on my x86-64 Solaris-11 systems I am getting the > following crash for

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-11 Thread George Bosilca
The overall design in OMPI was that no OMPI module should be allowed to decide if threads are on (thus it should not rely on the value returned by opal_using_threads during its initialization stage). Instead, they should respect the level of thread support requested as an argument during the

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-11 Thread Ralph Castain
Just to help me understand: I don’t think this change actually changed any behavior. However, it certainly *allows* a different behavior. Isn’t that true? If so, I guess the real question is for Pascal at Bull: why do you feel this earlier setting is required? > On Dec 11, 2014, at 4:21 PM,

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-11 Thread Gilles Gouaillardet
George, please allow me to jump in with naive comments ... Currently (master), both the openib and usnic BTLs invoke opal_using_threads in component_init(): btl_openib_component_init(int *num_btl_modules, bool enable_progress_threads, bool

[OMPI devel] [1.8.4rc2] build broken by default on SGI UV

2014-12-11 Thread Paul Hargrove
I think I've reported this earlier in the 1.8 series. If I compile on an SGI UV (e.g. blacklight at PSC) configure picks up the presence of xpmem headers and enables the vader BTL. However, the port of vader to SGI's "flavor" of xpmem is incomplete and the following build failure results:

Re: [OMPI devel] [1.8.4rc2] orted SEGVs on Solaris-11/x86-64

2014-12-11 Thread Paul Hargrove
Don't see an rc3 yet. My Solaris-10/SPARC runs fail slightly differently (see below). It looks sufficiently similar that it MIGHT be the same root cause. However, lacking an rc3 to test I figured it would be better to report this than to ignore it. The problem is present with both V8+ and V9

Re: [OMPI devel] [1.8.4rc2] orted SEGVs on Solaris-11/x86-64

2014-12-11 Thread Ralph Castain
No, that looks different - it’s failing in mpirun itself. Can you get a line number on it? Sorry for the delay - I’m generating rc3 now > On Dec 11, 2014, at 6:59 PM, Paul Hargrove wrote: > > Don't see an rc3 yet. > > My Solaris-10/SPARC runs fail slightly differently (see

[OMPI devel] [1.8.4rc2] orterun SEGVs on Solaris-10/SPARC

2014-12-11 Thread Paul Hargrove
Backtrace for the Solaris-10/SPARC SEGV appears below. I've changed the subject line to distinguish this from the earlier report. -Paul program terminated by signal SEGV (no mapping at the fault address) 0x7d93b634: strlen+0x0014: lduh [%o2], %o1 Current function is guess_strlen

[OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-11 Thread Paul Hargrove
Ralph, Sorry to be the bearer of more bad news. The "good" news is I've seen the new warning regarding the lack of a loopback interface. The BAD news is that it is occurring on a Linux cluster that I've verified DOES have 'lo' configured on the front-end and compute nodes (UP and RUNNING

Re: [OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-11 Thread Gilles Gouaillardet
Paul, about the five warnings: can you confirm you are *not* running mpirun on n15 or n16? If my guess is correct, then you can get up to 5 warnings: mpirun + 2 orted + 2 mpi tasks. Do you have any oob_tcp_if_include or oob_tcp_if_exclude settings in your openmpi-mca-params.conf? here is
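
The count in the message above (mpirun + 2 orted + 2 MPI tasks = 5) can be written out as a small sketch. It assumes that every process in the job emits the warning at most once, and that when mpirun runs on one of the compute nodes no separate orted is launched there; both are assumptions made for illustration, and the function name max_loopback_warnings is hypothetical, not part of Open MPI.

```python
def max_loopback_warnings(num_nodes: int, num_tasks: int,
                          mpirun_on_compute_node: bool = False) -> int:
    """Upper bound on warnings if each process warns at most once (assumption)."""
    # One orted daemon per compute node, unless mpirun itself runs on one
    # of them and plays the daemon role there (assumption for this sketch).
    daemons = num_nodes - 1 if mpirun_on_compute_node else num_nodes
    # mpirun + daemons + MPI tasks
    return 1 + daemons + num_tasks


# The case from the thread: mpirun on a front-end, tasks on n15 and n16.
print(max_loopback_warnings(num_nodes=2, num_tasks=2))
```

With mpirun on the front-end this gives 5, matching the guess in the message; running mpirun on n15 or n16 would, under the same assumptions, give 4.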

Re: [OMPI devel] [1.8.4rc3] false report of no loopback interface + segv at exit

2014-12-11 Thread Ralph Castain
I honestly think it has to be a selected interface, Gilles, else we will fail to connect. > On Dec 11, 2014, at 8:26 PM, Gilles Gouaillardet > wrote: > > Paul, > > about the five warnings : > can you confirm you are running mpirun *not* on n15 nor n16 ? > if my

Re: [OMPI devel] [1.8.4rc2] orterun SEGVs on Solaris-10/SPARC

2014-12-11 Thread Ralph Castain
Thanks Paul - I will post a fix for this tomorrow. Looks like Sparc isn’t returning an architecture type for some reason, and I didn’t protect against it. > On Dec 11, 2014, at 7:39 PM, Paul Hargrove wrote: > > Backtrace for the Solaris-10/SPARC SEGV appears below. > I've