Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
The proof of the pudding is that all of the MPI layer has been adapted to the new async behavior -except- for the openib cpc's. The issue of what to do with these has been raised several times, especially once the ofacm code was committed. Unfortunately, lack of time and priorities left this

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 1:16 PM, Shamis, Pavel wrote: >> 1. Ralph made the OOB asynchronous. I pondered this for a while today, and I just want to correct any misimpression this statement might leave, especially with folks who haven't been around the project that much over the

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
Thanks Pasha!! On Nov 14, 2013, at 4:34 PM, Shamis, Pavel wrote: > For Iboffload this should not be an issue since our connection manager is > blocking (I have to double-check) > > For openib, this should not be such a huge change. The code is pretty much > standalone, we

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
For Iboffload this should not be an issue since our connection manager is blocking (I have to double-check). For openib, this should not be such a huge change. The code is pretty much standalone; we only have to move it to the main thread and add a signaling mechanism. I will take a look. Best,

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 4:22 PM, Shamis, Pavel wrote: > Well, this is a major change in behavior. > > Since openib makes communication calls from the callback, > it pretty much requires enabling thread safety at the openib BTL level. Ah, yes - could well be true. Or else separate

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
Well, this is a major change in behavior. Since openib makes communication calls from the callback, it pretty much requires enabling thread safety at the openib BTL level. But we may move the queue flush operation from the callback to the main thread, so the progress engine will wait on a signal from
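The hand-off suggested above (the callback only signals, and the main thread does the flush) could look roughly like the following C sketch. Every name here (`cpc_signal_t`, `event_thread_callback`, `main_thread_progress`) is hypothetical and is not taken from the openib BTL or ORTE; a C11 atomic counter stands in for whatever signaling mechanism the real code would use.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical state shared by the event thread and the main thread.
 * Not an Open MPI structure; illustration only. */
typedef struct {
    atomic_int ready;   /* connections completed but not yet flushed */
    int flushed;        /* touched only by the main thread */
} cpc_signal_t;

void cpc_signal_init(cpc_signal_t *s)
{
    atomic_init(&s->ready, 0);
    s->flushed = 0;
}

/* Runs in the event thread. It only records that a connection completed
 * and makes no communication calls, so the BTL itself does not need to
 * be thread safe for this step. */
void event_thread_callback(cpc_signal_t *s)
{
    atomic_fetch_add_explicit(&s->ready, 1, memory_order_release);
}

/* Runs in the main thread's progress loop. It claims everything signaled
 * so far and "flushes" it; returns how many completions were handled. */
int main_thread_progress(cpc_signal_t *s)
{
    int n = atomic_exchange_explicit(&s->ready, 0, memory_order_acquire);
    assert(n >= 0);
    s->flushed += n;    /* stand-in for flushing the pending-send queues */
    return n;
}
```

The point of the design is that only the main thread ever touches BTL state; the event thread's sole job is to bump the counter.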

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 3:33 PM, Shamis, Pavel wrote: > >> The only change is that the receive callback is now occurring in the ORTE >> event thread, and so perhaps someone needs to look at a way to pass that >> back into the OMPI event base (which I guess is the OPAL event

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> The only change is that the receive callback is now occurring in the ORTE > event thread, and so perhaps someone needs to look at a way to pass that back > into the OMPI event base (which I guess is the OPAL event base)? Just > glancing at the code, it looks like that could be the issue -

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 3:07 PM, Shamis, Pavel wrote: > >> So far as I can tell, the issue is one of blocking. The OOB handshake is now >> async - i.e., you post a non-blocking recv at the beginning of time, and >> then do a non-blocking send to the other side when you want to

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> So far as I can tell, the issue is one of blocking. The OOB handshake is now > async - i.e., you post a non-blocking recv at the beginning of time, and then > do a non-blocking send to the other side when you want to create a > connection. The question is: how do you know when that

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
So far as I can tell, the issue is one of blocking. The OOB handshake is now async - i.e., you post a non-blocking recv at the beginning of time, and then do a non-blocking send to the other side when you want to create a connection. The question is: how do you know when that connection is
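As a rough illustration of the async pattern described above (recv posted up front, non-blocking send to start a connection, completion learned only through the receive callback), here is a minimal C sketch. The types and functions (`endpoint_t`, `start_connect`, `on_handshake_recv`, `try_send`) are made up for this example and are not OOB or openib APIs; the actual send and recv calls are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states; not taken from the openib BTL. */
typedef enum { EP_CLOSED, EP_CONNECTING, EP_CONNECTED } ep_state_t;

typedef struct {
    ep_state_t state;
    int pending;        /* sends queued while the handshake is in flight */
} endpoint_t;

/* Start a connection. Real code would issue a non-blocking send of the
 * local connection info here and return immediately. */
void start_connect(endpoint_t *ep)
{
    if (ep->state == EP_CLOSED) {
        ep->state = EP_CONNECTING;
    }
}

/* Callback for the recv that was posted at startup. It fires when the
 * peer's reply arrives, and is the only place the endpoint becomes
 * connected. Returns the number of queued sends that can now go out. */
int on_handshake_recv(endpoint_t *ep)
{
    assert(ep != NULL);
    if (ep->state != EP_CONNECTING) {
        return 0;
    }
    ep->state = EP_CONNECTED;
    int to_flush = ep->pending;
    ep->pending = 0;
    return to_flush;
}

/* Returns true if the data can be sent now; otherwise queues it and
 * kicks off the handshake without blocking. */
bool try_send(endpoint_t *ep)
{
    if (ep->state == EP_CONNECTED) {
        return true;
    }
    start_connect(ep);
    ep->pending++;
    return false;
}
```

Nothing in the send path waits: the caller finds out the connection is up only because the callback changed the state.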

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> > 1. Ralph made the OOB asynchronous. > Ralph, I'm not familiar with the details of the change. If out-of-band communication is supported, it should not be that huge a change for XOOB/OOB.

Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Ralph Castain
Ah, thanks! Yes indeed On Nov 14, 2013, at 1:05 PM, Thomas Naughton wrote: > Hi Ralph, > > Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be > increased? It currently shows 1.11. > > Thanks, > --tjn > >

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
Comments inline. > > 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in > that move. Never changed openib to use ofacm/common. Pasha: This is not entirely true. I changed the openib btl ~3 years ago, before my departure from Mellanox. (I sent a link to the code earlier.)

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 12:42 PM, Joshua Ladd wrote: > We are happy to provide access to our set of small test clusters and > engineering resources, but, honestly, Nathan/LANL guys probably have better > access to a big IB system. > > I’m sure your boss could care less,

Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Thomas Naughton
Hi Ralph, Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be increased? It currently shows 1.11. Thanks, --tjn

Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Ralph Castain
Ha! Jeff points out that our web site says we are at AM 1.12.2 - yet our HACKING file says 1.11.1. Sadness. I'll leave the romio update alone and update the HACKING file to avoid future confusion. On Nov 14, 2013, at 12:41 PM, Ralph Castain wrote: > Just in case others are

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
We are happy to provide access to our set of small test clusters and engineering resources, but, honestly, Nathan/LANL guys probably have better access to a big IB system. I'm sure your boss could care less, but this is not Intel's code base. Sorry to be so blunt about it, Ralph. We've

[OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Ralph Castain
Just in case others are encountering this: the recent ROMIO update contains a line in its configure.ac that breaks the trunk for automake versions less than 1.12: "I've looked a bit around online for this, and the consensus generally seems to be that AM_PROG_AR should be added in libtool, not

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 12:21 PM, Barrett, Brian W wrote: > On 11/14/13 1:13 PM, "Joshua Ladd" wrote: > >> Let me try to summarize my understanding of the situation: >> >> 1. Ralph made the OOB asynchronous. >> >> 2. OOB cpcs don't work as a result of

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Barrett, Brian W
On 11/14/13 1:13 PM, "Joshua Ladd" wrote: >Let me try to summarize my understanding of the situation: > >1. Ralph made the OOB asynchronous. > >2. OOB cpcs don't work as a result of 1, and are thereby "deprecated", >meaning: won't fix. > >3. Pasha moved the openib/connect

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Let me try to summarize my understanding of the situation: 1. Ralph made the OOB asynchronous. 2. OOB cpcs don't work as a result of 1, and are thereby "deprecated", meaning: won't fix. 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in that move. Never changed

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
I'm a bit outdated. What is the problem with oob/xoob? -Pasha On Nov 14, 2013, at 3:07 PM, "Hjelm, Nathan T" wrote: > I don't think so. From what I understand the iboffload component may not live > much longer because of > Mellanox's fork of Cheetah. So, it might not

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Hjelm, Nathan T
I don't think so. From what I understand the iboffload component may not live much longer because of Mellanox's fork of Cheetah. So, it might not matter. -Nathan Excuse the *&(#$y Outlook posting-style. OWA sucks. From: devel [devel-boun...@open-mpi.org]

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
The key question, though, is: has anyone checked to see if the ofacm code even works any more?? Only oob and xoob components appear to be present - so unless someone fixed those since they were originally copied from openib, I doubt ofacm works. On Nov 14, 2013, at 11:08 AM, Shamis, Pavel

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
There is some confusion in the thread. UDCM is just another CPC, like XOOB, OOB, and RDMACM (I think IBCM is officially dead). XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication. OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM. OFACM supports (at least last time when we

Re: [OMPI devel] [EXTERNAL] What to do about openib/ofacm/cpc (was: r29703 - in trunk: contrib/p...)

2013-11-14 Thread Ralph Castain
Here's the core problem: it isn't a question of "if" some of these things should be resolved, but "who". They've been around for a very long time, but nobody has the time or will to fix them. I have no access to such machines, so all I can do is verify that it sorta compiles and is consistent

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Unless someone went in and "fixed" the code in common (judging by the comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't been done at all in the context of xoob and is incompletely patched and remains unusable as a replacement for oob in 1.7.4), there is no reason to

[OMPI devel] What to do about openib/ofacm/cpc (was: r29703 - in trunk: contrib/p...)

2013-11-14 Thread Jeff Squyres (jsquyres)
On Nov 14, 2013, at 1:03 PM, Ralph Castain wrote: >> 1) What the status of UDCM is (does it work reliably, does it support >> XRC, etc.) > > Seems to be working okay on the IB systems at LANL and IU. Don't know about > XRC - I seem to recall the answer is "no" FWIW, I

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
On Nov 14, 2013, at 9:33 AM, Barrett, Brian W wrote: > On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" wrote: > >> Does XRC work with the UDCM CPC? >> >> >> On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote: >> >>> I think the

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Barrett, Brian W
On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" wrote: >Does XRC work with the UDCM CPC? > > >On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote: > >> I think the problems in udcm were fixed by Nathan quite some time ago, >>but never moved to 1.7 as everyone

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
Yeah, I believe that is true as well. However, we have been bugging people to fix this for a long time, and nobody appears to have the cycles to do so. As a reminder: we have to remove all OOB dependence on connections in the BTL as we are moving the BTLs to OPAL. Hence, there is no interest in

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
I see comments alluding to the fact that it does not, but with the intentions to add it: Hopefully, xoob in common will work.

#if HAVE_XRC
    if (mca_btl_openib_component.num_xrc_qps > 0) {
        BTL_VERBOSE(("UD CPC does not support XRC QPs (yet)"));
        break;
    }
#endif
    for (i = 0

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
When I looked at the code last time - no. (The connection state machine is very different) Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Nov 14, 2013, at 11:51 AM, Jeff Squyres (jsquyres)

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Jeff Squyres (jsquyres)
Does XRC work with the UDCM CPC? On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote: > I think the problems in udcm were fixed by Nathan quite some time ago, but > never moved to 1.7 as everyone was told that the connect code in openib was > already deprecated pending merge

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
I think the problems in udcm were fixed by Nathan quite some time ago, but never moved to 1.7 as everyone was told that the connect code in openib was already deprecated pending merge with the new ofacm common code. Looking over at that area, I see only oob and xoob - so if the users of the

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in 1.7. Per Ralph's comment to me last night: "... you cannot use the oob connection manager. It doesn't work and was deprecated. You must use udcm, which is why things are supposed to be set to do so by default.

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Jeff Squyres (jsquyres)
Does the openib *only* work with RDMACM now? That's surprising (and bad!). Did someone ask Mellanox about fixing the OOB and XOOB CPCs? On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote: > Author: rhc (Ralph Castain) > Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013) > New