Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd

> The proof of the pudding is that all of the MPI layer has been adapted to the 
> new async behavior -except- for the openib cpc's. The issue of what to do with 
> these has been raised several times, especially once the ofacm code was 
> committed. Unfortunately, lack of time and priorities left this code to bitrot.

[Josh] Not completely true, UDCM is supposed to be the alternative, at least 
for RC. It's easy to say - "well, everything works now except OpenIB". If we're 
working under the assumption that these were community decisions wholeheartedly 
agreed upon and fully endorsed by all members, well then we have to also 
believe that we agreed as a community to the following list of tasks and 
nobody's done anything. The only party explicitly committed to technical work 
is Mellanox. Jeff's phrase, "the next dominos to fall", implies at least a 
partial ordering. We need a functioning UDCM before we can study it and figure 
out how to adapt it to XRC - maybe it is functioning perfectly, who knows? 
Nobody, apparently - it seems like it should've been released into the wild in 
1.7.3. Are there any slides from the RFC that we can look at? If so, I've 
been unable to locate them. Unfortunately, this is just one piece of what's 
missing and we are relying on the rest of the community that agreed to these 
changes to make good on their promises. My biggest issue this morning is that 
UDCM is not in 1.7 but the OOB change is - that's a problem. You skipped steps 
1, 2, 3, and 4 and went right to 5 - that's a problem. That's not what we as a 
community agreed upon.

Josh


Subject: [OMPI devel] Openib + common/verbs CPC consolidation
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
List-Post: devel@lists.open-mpi.org
Date: 2013-05-14 15:29:15



FYI: On the teleconf today, we talked about the next dominos to fall in the 
quest to move the BTLs down to OPAL:

1. Nathan will make the openib "udcm" CPC the default in the immediate future
   --> This paves the way to ditch the problematic "oob" openib CPC
   --> This also will give udcm more widespread testing
2. Mellanox will add XRC support to udcm
   --> This paves the way to ditch the problematic "xoob" openib CPC
   --> Josh thought they could do this within a month, but that's a SWAG and
       subject to change
3. I will ping Chelsio about getting them to add proper iWARP support into
   common/ofacm
   --> This paves the way to eliminate btl/openib/cpc
   --> No idea on timeframe yet
4. Once #3 is done, make openib use common/ofacm
5. Once #2, #3, and #4 are done, delete btl/openib/cpc

#1-3 have people assigned to them. #4 does not (#5 is pretty trivial -- an svn 
rm plus some Makefile.am mods).






Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 1:16 PM, Shamis, Pavel  wrote:

>> 1. Ralph made the OOB asynchronous.


I pondered this for a while today, and I just want to correct any misimpression 
this statement might leave, especially with folks who haven't been around the 
project that much over the last couple of years. Just to clarify: this wasn't a 
case of Ralph waking up one day and saying "hey, let's make the OOB async!". 
Quite the contrary.

This whole conversion process started nearly two years ago when we, as a 
community, decided to move towards an async progress model. We laid out all the 
things that we thought would need to be done to make that happen...and then we 
started down that path. First, we updated the event library to the 2.x series 
so we could separate the event bases for the different layers, and so we could 
have event priority levels. Some folks started hardening the BTLs for thread 
safety and adding progress threads inside them. Etc.

One step on that path was to make ORTE operate asynchronously as a purely 
event-driven library. First, we rewrote the state machine so all ORTE 
operations ran in an event, except for the OOB as that can of worms was just 
too hard. Frankly, nobody wanted to touch it, so we left it alone and made 
everything else work.

Finally, I took on the OOB rewrite. One of our continual problems was 
deadlocking somewhere because someone would call a blocking send/recv while in 
an OOB callback - usually way down in the stack somewhere that wasn't 
immediately obvious to the user. After spending time fiddling with things, it 
became clear that the only simple solution was to make the OOB totally 
non-blocking. This also made a much cleaner integration to the rest of the ORTE 
state machine.

So we brought it up at a couple of developer meetings, talked a number of times 
on the weekly telecon, went thru several email threads, RFCs, etc. - with me 
emphasizing repeatedly that the OOB was going to lose its blocking interfaces. 
The fact that OOB callbacks would be occurring in the ORTE event base thread 
was also discussed, and was one of the reasons why we locked libevent thread 
protection "on" earlier this year. This fact may have escaped some people, but 
it was discussed on several occasions.

The proof of the pudding is that all of the MPI layer has been adapted to the 
new async behavior -except- for the openib cpc's. The issue of what to do with 
these has been raised several times, especially once the ofacm code was 
committed. Unfortunately, lack of time and priorities left this code to bitrot.

I'm not pointing fingers at anyone, nor am I saying this was all perfect. Just 
trying to point out that this was a community move that is part of our 
community roadmap, and we perhaps need to be better at finding a way to keep 
everyone/everything a little more connected to the convoy. This is going to get 
even more rocky in the next year as we push towards full thread safety and 
async progress, and re-implement checkpoint/restart support.

So heads-up...!
Ralph



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
Thanks Pasha!!

On Nov 14, 2013, at 4:34 PM, Shamis, Pavel  wrote:

> For Iboffload this should not be an issue since our connection manager is 
> blocking (I have to double-check )
> 
> For openib, this should not be such huge change. The code is pretty much 
> standalone, we only have to move it to 
> main thread and add signaling mechanism.
> 
> I will take a look.
> 
> Best,
> -Pasha
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
 



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
For Iboffload this should not be an issue, since our connection manager is 
blocking (I have to double-check).

For openib, this should not be such a huge change. The code is pretty much 
standalone; we only have to move it to the main thread and add a signaling 
mechanism.

I will take a look.

Best,
-Pasha







Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 4:22 PM, Shamis, Pavel  wrote:

> Well, this is major change in a behavior.
> 
> Since openib calls communication calls from the callback
> it pretty much requires to enable thread safety on openib btl level.

Ah, yes - could well be true. Or else separate the two like we do elsewhere - 
transfer the recv callback to the openib thread and let it do the rest.

> 
> But we may move the queue flush operation from the callback to main thread, 
> so 
> the progress engine will wait on a signal from callback. 

Yep - that's what we do elsewhere

> 
> How does it work for other parts of OMPI (sm, communicator) ? 
> I guess they don't do anything in the callbacks ? 

Correct - they immediately transfer the info to their local progress engine (in 
whatever form).
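
A minimal sketch of the hand-off described above, assuming a plain pthread mutex and a singly linked list. All names here (pending_msg_t, handoff_recv_cb, handoff_progress, finish_connect) are hypothetical and are not Open MPI or ORTE APIs. The receive callback, which runs in the event thread, only records the message; the main thread's progress function later detaches the list and does the real work outside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical message record; real code would carry the peer's QP info. */
typedef struct pending_msg {
    int peer;
    struct pending_msg *next;
} pending_msg_t;

static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static pending_msg_t *pending_head = NULL;
static pending_msg_t *pending_tail = NULL;

/* Illustrative handler: counts the peers whose messages were processed. */
int peers_connected = 0;
void finish_connect(int peer)
{
    (void)peer;
    ++peers_connected;
}

/* Runs in the event thread: record the message, do no communication here. */
int handoff_recv_cb(int peer)
{
    pending_msg_t *msg = malloc(sizeof(*msg));
    if (NULL == msg) {
        return -1;
    }
    msg->peer = peer;
    msg->next = NULL;

    pthread_mutex_lock(&pending_lock);
    if (NULL == pending_tail) {
        pending_head = msg;
    } else {
        pending_tail->next = msg;
    }
    pending_tail = msg;
    pthread_mutex_unlock(&pending_lock);
    return 0;
}

/* Runs in the main thread's progress loop: detach the whole list under the
 * lock, then handle each entry without holding the lock.
 * Returns the number of messages handled. */
int handoff_progress(void (*handle)(int peer))
{
    pending_msg_t *list;
    int count = 0;

    assert(NULL != handle);

    pthread_mutex_lock(&pending_lock);
    list = pending_head;
    pending_head = pending_tail = NULL;
    pthread_mutex_unlock(&pending_lock);

    while (NULL != list) {
        pending_msg_t *next = list->next;
        handle(list->peer);
        free(list);
        list = next;
        ++count;
    }
    return count;
}
```

Because the handler runs only from handoff_progress, any communication calls it makes stay in the main thread, which is the property the openib callback would need if thread safety is not enabled at the BTL level.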




Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
Well, this is a major change in behavior.

Since openib makes communication calls from the callback, 
it pretty much requires enabling thread safety at the openib BTL level.

But we may move the queue flush operation from the callback to main thread, so 
the progress engine will wait on a signal from callback. 
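
The signal-and-wait idea above could be sketched as follows, assuming a pthread mutex and condition variable. The type flush_signal_t and the three functions are hypothetical names for illustration, not existing openib BTL code. The callback only posts a request; the main thread either blocks until a request arrives or polls for one from its progress loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag-plus-condition pair shared by the two threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            flush_needed;
} flush_signal_t;

#define FLUSH_SIGNAL_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Called from the callback (event thread): request a flush, do not do it. */
void flush_signal_post(flush_signal_t *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    s->flush_needed = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Called from the main thread: block until a flush has been requested,
 * then consume the request. */
void flush_signal_wait(flush_signal_t *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    while (!s->flush_needed) {      /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->flush_needed = false;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant for a polling progress loop: returns true and
 * consumes the request if one was pending. */
bool flush_signal_poll(flush_signal_t *s)
{
    bool pending;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    pending = s->flush_needed;
    s->flush_needed = false;
    pthread_mutex_unlock(&s->lock);
    return pending;
}
```

A progress engine that must not stall would call flush_signal_poll on each pass and run the queue flush when it returns true; flush_signal_wait is only appropriate where blocking the main thread is acceptable.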

How does it work for other parts of OMPI (sm, communicator) ? 
I guess they don't do anything in the callbacks ? 

Best,
Pasha




Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 3:33 PM, Shamis, Pavel  wrote:

> 
>> The only change is that the receive callback is now occurring in the ORTE 
>> event thread, and so perhaps someone needs to look at a way to pass that 
>> back into the OMPI event base (which I guess is the OPAL event base)? Just 
>> glancing at the code, it looks like that could be the issue - but I honestly 
>> have no idea what event base someone wants to switch to, or if they want to 
>> resolve it some other way. There are clearly some things happening in the 
>> ofacm oob code that involve thread locking etc., but I don't know what those 
>> areas are trying to do.
> 
> I see. In this mode do you enable thread safety support  in all library (mpi)?

Only if the user configures to do so - ORTE doesn't require it as we use the 
event library's thread safety and do everything inside events.




Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel

> The only change is that the receive callback is now occurring in the ORTE 
> event thread, and so perhaps someone needs to look at a way to pass that back 
> into the OMPI event base (which I guess is the OPAL event base)? Just 
> glancing at the code, it looks like that could be the issue - but I honestly 
> have no idea what event base someone wants to switch to, or if they want to 
> resolve it some other way. There are clearly some things happening in the 
> ofacm oob code that involve thread locking etc., but I don't know what those 
> areas are trying to do.

I see. In this mode, do you enable thread-safety support in the whole library (MPI)?



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 3:07 PM, Shamis, Pavel  wrote:

> 
>> So far as I can tell, the issue is one of blocking. The OOB handshake is now 
>> async - i.e., you post a non-blocking recv at the beginning of time, and 
>> then do a non-blocking send to the other side when you want to create a 
>> connection. The question is: how do you know when that connection is ready?
> 
> As you describe, the new behavior is identical to original one. We post 
> non-blocking (persistent) receive during initialization. Later OMPI has 
> barrier in the flow to ensure that all processes reached the point.
> On first send, we use a non-blocking oob-send to initialize the connection 
> (QPs). The receive triggers callback that handles the connection setup. OOB / 
> XOOB communication semantics is a fully non-blocking.
> 
> We don't really block anywhere. 
> We use   ompi_rte_recv_buffer_nb and ompi_rte_send_buffer_nb functions only.

The only change is that the receive callback is now occurring in the ORTE event 
thread, and so perhaps someone needs to look at a way to pass that back into 
the OMPI event base (which I guess is the OPAL event base)? Just glancing at 
the code, it looks like that could be the issue - but I honestly have no idea 
what event base someone wants to switch to, or if they want to resolve it some 
other way. There are clearly some things happening in the ofacm oob code that 
involve thread locking etc., but I don't know what those areas are trying to do.





Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel

> So far as I can tell, the issue is one of blocking. The OOB handshake is now 
> async - i.e., you post a non-blocking recv at the beginning of time, and then 
> do a non-blocking send to the other side when you want to create a 
> connection. The question is: how do you know when that connection is ready?

As you describe it, the new behavior is identical to the original one. We post 
a non-blocking (persistent) receive during initialization. Later, OMPI has a 
barrier in the flow to ensure that all processes have reached that point.
On the first send, we use a non-blocking OOB send to initialize the connection 
(QPs). The receive triggers a callback that handles the connection setup. 
OOB/XOOB communication semantics are fully non-blocking.

We don't really block anywhere. 
We use only the ompi_rte_recv_buffer_nb and ompi_rte_send_buffer_nb functions.

Best,
Pasha

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
So far as I can tell, the issue is one of blocking. The OOB handshake is now 
async - i.e., you post a non-blocking recv at the beginning of time, and then 
do a non-blocking send to the other side when you want to create a connection. 
The question is: how do you know when that connection is ready?

I don't know enough about the cpc handshake to debug the problem. I can only 
verify that the data is being sent and received by both sides. Someone who 
understands the cpc state machine needs to look at the code and figure out 
where and how to block at the appropriate point.
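
One possible answer to "how do you know when that connection is ready" is to avoid blocking altogether: track a per-endpoint state and hold outgoing fragments until the handshake completes. The sketch below illustrates that pattern only; endpoint_t, ep_send, and ep_connected are hypothetical names, the fixed-size queue is an assumption made to keep the example short, and the counter stands in for actually posting a fragment. It is single-threaded and does not address which thread runs the callback.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EP_CLOSED, EP_CONNECTING, EP_CONNECTED } ep_state_t;

#define EP_MAX_PENDING 8

/* Hypothetical endpoint: connection state plus fragments waiting on it. */
typedef struct {
    ep_state_t state;
    int        pending[EP_MAX_PENDING];
    size_t     num_pending;
    int        sent;    /* number of fragments actually posted */
} endpoint_t;

/* Send a fragment. If the connection is not ready, queue the fragment and
 * start the handshake on first use. Returns 0 on success, -1 if the pending
 * queue is full. */
int ep_send(endpoint_t *ep, int frag)
{
    assert(ep != NULL);

    if (EP_CONNECTED == ep->state) {
        ++ep->sent;                 /* stand-in for posting the fragment */
        return 0;
    }
    if (EP_MAX_PENDING == ep->num_pending) {
        return -1;
    }
    ep->pending[ep->num_pending++] = frag;
    if (EP_CLOSED == ep->state) {
        /* Real code would issue the non-blocking out-of-band send here. */
        ep->state = EP_CONNECTING;
    }
    return 0;
}

/* Called once the peer's handshake reply has been processed: mark the
 * endpoint ready and post everything that was held back. */
void ep_connected(endpoint_t *ep)
{
    assert(ep != NULL);

    ep->state = EP_CONNECTED;
    ep->sent += (int)ep->num_pending;
    ep->num_pending = 0;
}
```

With this shape, nothing ever waits on the handshake; a send issued too early is simply deferred until ep_connected runs.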


On Nov 14, 2013, at 1:16 PM, Shamis, Pavel  wrote:

>> 
>> 1. Ralph made the OOB asynchronous.
>> 
> 
> Ralph,
> 
> I'm not familiar with details of the change. If out-of-band communication is 
> supported, it should not be
> that huge change for XOOB/OOB.



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> 
> 1. Ralph made the OOB asynchronous.
> 

Ralph,

I'm not familiar with the details of the change. If out-of-band communication 
is supported, it should not be that huge a change for XOOB/OOB.



Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Ralph Castain
Ah, thanks! Yes indeed

On Nov 14, 2013, at 1:05 PM, Thomas Naughton  wrote:

> Hi Ralph,
> 
> Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be
> increased?  It currently shows 1.11.
> 
> Thanks,
> --tjn
> 
> _
>  Thomas Naughton  naught...@ornl.gov
>  Research Associate   (865) 576-4184
> 
> 



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
Comments inline.

> 
> 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in 
> that move.  Never changed openib to use ofacm/common.

Pasha: This is not entirely true. I changed the openib BTL ~3 years ago, before 
my departure from Mellanox (I sent a link to the code earlier).
We (the community) were not able to integrate the code because of the "first 
message" issue in iWARP.

> 
> Given Nathan's comments a second ago about ORNL not supporting the IB Offload 
> component, it barely makes sense to keep common/ofacm. 

Pasha: We have no intention of removing iboffload support. Obviously, if 
Mellanox stops supporting CORE-Direct technology, it would make sense to remove it.

Best,
-Pasha



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 12:42 PM, Joshua Ladd  wrote:

> We are happy to provide access to our set of small test clusters and 
> engineering resources, but, honestly, Nathan/LANL guys probably have better 
> access to a big IB system.
>  
> I’m sure your boss could care less, but this is not Intel’s code base. Sorry 
> to be so blunt about it, Ralph

I agree - nobody said it was. However, this community works by committee. In 
this case, the OOB update was discussed for more than a year, the RFC was out 
for nearly 6 months, the branch was made available for testing and review for 
nearly 3 months, and it sat in the trunk for another 3+ months before moving to 
the 1.7 branch.

At some point, the IB users in this community have to take responsibility for 
testing and helping debug their code areas, not just letting them bitrot for 
months and then saying "hey, something broke - somebody fix it".

As I said, I'm happy to help - but ultimately, IB support is the responsibility 
of the IB members of this community...and I'm not one of them.


> . We’ve expended an enormous amount of effort *trying* to make OSHMEM 
> something that works for the community and not just Mellanox customers. 
> Believe me, we would rather focus our efforts elsewhere too.   
>  
> Josh
>  
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Thursday, November 14, 2013 3:32 PM
> To: Open MPI Developers
> Cc: Yiftah Shahar; Gilad Shainer
> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 
> - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
> ompi/mca/btl/openib/connect
>  
>  
> On Nov 14, 2013, at 12:21 PM, Barrett, Brian W  wrote:
> 
> 
> On 11/14/13 1:13 PM, "Joshua Ladd"  wrote:
> 
> 
> Let me try to summarize my understanding of the situation:
> 
> 1. Ralph made the OOB asynchronous.
> 
> 2. OOB cpcs don't work as a result of 1, and are thereby "deprecated",
> meaning: won't fix.
> 
> 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm
> in that move.  Never changed openib to use ofacm/common.
> 
> 4. UDCM is "functional" in the trunk, still sitting in openib/connect.
> But no one is entirely sure if it really works which is why it was
> disabled in 1.7. Nathan - is there a design doc you can share on this
> beyond the comments in the code?
> 
> 5. In order to satisfy the "grand plan":
>    a. UDCM still needs to be moved to common/ofacm.
>    b. OpenIB still needs to be changed to use common/ofacm.
>    c. RDMACM still needs to migrate to common/ofacm.
>    d. XRC support needs to be added to UDCM and put into common/ofacm.
> 
> 6. The "grand plan" being:  move the BTLs into Opal - hence the need to
> scuttle the OOB cpcs thereby justifying the deprecation and not fixing
> cpcs after #1.
> 
> So, that's a quick roundup of how we ended up here (as I understand it.)
> What needs to be done is:
> 
> That's my understanding as well.
> 
> 
> 1. Somebody needs to certify/review that what Nathan has done is sound.
> From my perspective, this is a BIG change and needs a comprehensive
> architecture review. We've been using it in the trunk, and we've been
> testing it under MTT for some time - but have not deployed or tested at
> large-scale out in the field. Would be nice to see something on paper in
> terms of a design doc.
> 
> 2. Somebody then needs to move UDCM into common/ofacm.
> 
> 3. Somebody needs to change openib to use common/ofacm cpcs instead of
> openib/connect cpcs.
> 
> 4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE
> works.
> 
> 5. Somebody needs to add XRC support to UDCM - whatever that might mean.
> Given Nathan added UDCM back in 2011 and nobody is really sure it's ready
> for prime-time, and given Pasha's comments regarding the difference in
> state machine requirements  between the two connection schemes, this
> doesn't seem like a trivial task.
> 
> Given Nathan's comments a second ago about ORNL not supporting the IB
> Offload component, it barely makes sense to keep common/ofacm. And it
> sounds like the two cpcs presently contained therein are now unusable.
> 
> All of this work is a result of the Grand Plan to move the BTLs into the
> Opal layer - which I have no idea what the motive is (I was not involved
> with OMPI when this was decided or debated.)
> 
> Basically, without these five changes OpenIB is dead in 1.7.4 and beyond
> for RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe
> that the onus falls squarely on Mellanox to fix these. These were
> community decisions and, as such, it must be a community effort to
> resolve. We are happy to lend a hand, but we are not fixing all of this
> mess.
> 
> I think that the 5 steps above sound correct and I agree that 1) this
> means 1.7.4 is on hold until we fix this and 2) that Mellanox shouldn't be
> the only one to 

Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Thomas Naughton

Hi Ralph,

Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be
increased?  It currently shows 1.11.

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Thu, 14 Nov 2013, Ralph Castain wrote:


Ha! Jeff points out that our web site says we are at AM 1.12.2 - yet our
HACKING file says 1.11.1  Sadness
I'll leave the romio update alone and update the HACKING file to avoid
future confusion


On Nov 14, 2013, at 12:41 PM, Ralph Castain  wrote:

  Just in case others are encountering this: the recent ROMIO
  update contains a line in its configure.ac that breaks the trunk
  for automake versions less than 1.12:

"I've looked a bit around online for this, and the consensus generally seems
to be that AM_PROG_AR should be added in libtool, not in every configure.ac
script out there. It's especially problematic as AM_PROG_AR doesn't exist in
automake before 1.12, which means it breaks, among others, with the automake
we use to build our distribution tarballs :-)

See e.g. http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11401 for a discussion."

I'm going to comment that line out in ompi/mca/io/romio/romio/configure.ac so
the trunk can build until someone figures out (a) if it is really needed,
and (b) how to correctly add it


Ralph







Re: [OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Ralph Castain
Ha! Jeff points out that our web site says we are at AM 1.12.2 - yet our 
HACKING file says 1.11.1  Sadness

I'll leave the romio update alone and update the HACKING file to avoid future 
confusion


On Nov 14, 2013, at 12:41 PM, Ralph Castain  wrote:

> Just in case others are encountering this: the recent ROMIO update contains a 
> line in its configure.ac that breaks the trunk for automake versions less 
> than 1.12:
> 
> "I've looked a bit around online for this, and the consensus generally seems 
> to be that AM_PROG_AR should be added in libtool, not in every configure.ac 
> script out there. It's especially problematic as AM_PROG_AR doesn't exist in 
> automake before 1.12, which means it breaks, among others, with the automake 
> we use to build our distribution tarballs :-)
> 
> See e.g. http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11401 for a discussion."
> 
> I'm going to comment that line out in ompi/mca/io/romio/romio/configure.ac so 
> the trunk can build until someone figures out (a) if it is really needed, and 
> (b) how to correctly add it
> 
> Ralph
> 
> 
> 
> 
> 



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
We are happy to provide access to our set of small test clusters and 
engineering resources, but, honestly, Nathan/LANL guys probably have better 
access to a big IB system.

I'm sure your boss couldn't care less, but this is not Intel's code base. Sorry to 
be so blunt about it, Ralph. We've expended an enormous amount of effort 
*trying* to make OSHMEM something that works for the community and not just 
Mellanox customers. Believe me, we would rather focus our efforts elsewhere too.

Josh

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 14, 2013 3:32 PM
To: Open MPI Developers
Cc: Yiftah Shahar; Gilad Shainer
Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - 
in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
ompi/mca/btl/openib/connect


On Nov 14, 2013, at 12:21 PM, Barrett, Brian W wrote:

On 11/14/13 1:13 PM, "Joshua Ladd" wrote:


Let me try to summarize my understanding of the situation:

1. Ralph made the OOB asynchronous.

2. OOB cpcs don't work as a result of 1, and are thereby "deprecated",
meaning: won't fix.

3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm
in that move.  Never changed openib to use ofacm/common.

4. UDCM is "functional" in the trunk, still sitting in openib/connect.
But no one is entirely sure if it really works which is why it was
disabled in 1.7. Nathan - is there a design doc you can share on this
beyond the comments in the code?

5. In order to satisfy the "grand plan":
   a. UDCM still needs to be moved to common/ofacm.
   b. OpenIB still needs to be changed to use common/ofacm.
   c. RDMACM still needs to migrate to common/ofacm.
   d. XRC support needs to be added to UDCM and put into
      common/ofacm.

6. The "grand plan" being:  move the BTLs into Opal - hence the need to
scuttle the OOB cpcs thereby justifying the deprecation and not fixing
cpcs after #1.

So, that's a quick roundup of how we ended up here (as I understand it.)
What needs to be done is:

That's my understanding as well.


1. Somebody needs to certify/review that what Nathan has done is sound.
From my perspective, this is a BIG change and needs a comprehensive
architecture review. We've been using it in the trunk, and we've been
testing it under MTT for some time - but have not deployed or tested at
large-scale out in the field. Would be nice to see something on paper in
terms of a design doc.

2. Somebody then needs to move UDCM into common/ofacm.

3. Somebody needs to change openib to use common/ofacm cpcs instead of
openib/connect cpcs.

4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE
works.

5. Somebody needs to add XRC support to UDCM - whatever that might mean.
Given Nathan added UDCM back in 2011 and nobody is really sure it's ready
for prime-time, and given Pasha's comments regarding the difference in
state machine requirements  between the two connection schemes, this
doesn't seem like a trivial task.

Given Nathan's comments a second ago about ORNL not supporting the IB
Offload component, it barely makes sense to keep common/ofacm. And it
sounds like the two cpcs presently contained therein are now unusable.

All of this work is a result of the Grand Plan to move the BTLs into the
Opal layer - which I have no idea what the motive is (I was not involved
with OMPI when this was decided or debated.)

Basically, without these five changes OpenIB is dead in 1.7.4 and beyond
for RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe
that the onus falls squarely on Mellanox to fix these. These were
community decisions and, as such, it must be a community effort to
resolve. We are happy to lend a hand, but we are not fixing all of this
mess.

I think that the 5 steps above sound correct and I agree that 1) this
means 1.7.4 is on hold until we fix this and 2) that Mellanox shouldn't be
the only one to fix this for 1.7.4, given the amount of work involved.

Ralph, what, specifically, broke about the oob/xoob cpc mechanisms by
making the oob asynchronous?

Hard for me to say as I don't really have access to an IB machine any more. 
Odin is my sole reference point, and someone has had that fully locked up for 
more than a week (and I can't complain as I am totally a guest there). Even 
then, I can only test on a few nodes.

I have no objection to helping, but we need someone who cares about IB and has 
access to such a machine to take the lead. Otherwise, we're just spinning our 
wheels.

As for the work issue: note that this has been "under development" now for more 
than a year. We've talked at length about how "somebody" needs to fix the 
openib/ofacm issue, but everyone keeps pushing it down the road as "not mine". 
Like I said, I can help - but (a) my boss couldn't care less about this issue, 
and (b) I have 

[OMPI devel] ROMIO update breaks trunk

2013-11-14 Thread Ralph Castain
Just in case others are encountering this: the recent ROMIO update contains a 
line in its configure.ac that breaks the trunk for automake versions less than 
1.12:

"I've looked a bit around online for this, and the consensus generally seems to 
be that AM_PROG_AR should be added in libtool, not in every configure.ac script 
out there. It's especially problematic as AM_PROG_AR doesn't exist in automake 
before 1.12, which means it breaks, among others, with the automake we use to 
build our distribution tarballs :-)

See e.g. http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11401 for a discussion."

I'm going to comment that line out in ompi/mca/io/romio/romio/configure.ac so 
the trunk can build until someone figures out (a) if it is really needed, and 
(b) how to correctly add it

Ralph
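
One way the guard asked for in (b) is often written, offered here as a sketch
and not as what was committed to the trunk: wrap the call in Autoconf's
m4_ifdef so that it expands only when the running automake defines the macro.
Whether the ROMIO configure.ac needs AM_PROG_AR at all, point (a), is left
open.

```m4
dnl Sketch only: call AM_PROG_AR when the running automake defines it
dnl (1.12 and later) and expand to nothing on older versions such as 1.11.
m4_ifdef([AM_PROG_AR], [AM_PROG_AR])
```

With this form the same configure.ac can be processed by both older and newer
automake, at the cost of silently skipping the archiver check on the older
ones.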








Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 12:21 PM, Barrett, Brian W  wrote:

> On 11/14/13 1:13 PM, "Joshua Ladd"  wrote:
> 
>> Let me try to summarize my understanding of the situation:
>> 
>> 1. Ralph made the OOB asynchronous.
>> 
>> 2. OOB cpcs don't work as a result of 1, and are thereby "deprecated",
>> meaning: won't fix.
>> 
>> 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm
>> in that move.  Never changed openib to use ofacm/common.
>> 
>> 4. UDCM is "functional" in the trunk, still sitting in openib/connect.
>> But no one is entirely sure if it really works which is why it was
>> disabled in 1.7. Nathan - is there a design doc you can share on this
>> beyond the comments in the code?
>> 
>> 5. In order to satisfy the "grand plan":
>>   a. UDCM still needs to be moved to common/ofacm.
>>   b. OpenIB still needs to be changed to use common/ofacm.
>>   c. RDMACM still needs to migrate to common/ofacm.
>>   d. XRC support needs to be added to UDCM and put into
>>      common/ofacm.
>> 
>> 6. The "grand plan" being:  move the BTLs into Opal - hence the need to
>> scuttle the OOB cpcs thereby justifying the deprecation and not fixing
>> cpcs after #1.
>> 
>> So, that's a quick roundup of how we ended up here (as I understand it.)
>> What needs to be done is:
> 
> That's my understanding as well.
> 
>> 1. Somebody needs to certify/review that what Nathan has done is sound.
>> From my perspective, this is a BIG change and needs a comprehensive
>> architecture review. We've been using it in the trunk, and we've been
>> testing it under MTT for some time - but have not deployed or tested at
>> large-scale out in the field. Would be nice to see something on paper in
>> terms of a design doc.
>> 
>> 2. Somebody then needs to move UDCM into common/ofacm.
>> 
>> 3. Somebody needs to change openib to use common/ofacm cpcs instead of
>> openib/connect cpcs.
>> 
>> 4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE
>> works.
>> 
>> 5. Somebody needs to add XRC support to UDCM - whatever that might mean.
>> Given Nathan added UDCM back in 2011 and nobody is really sure it's ready
>> for prime-time, and given Pasha's comments regarding the difference in
>> state machine requirements  between the two connection schemes, this
>> doesn't seem like a trivial task.
>> 
>> Given Nathan's comments a second ago about ORNL not supporting the IB
>> Offload component, it barely makes sense to keep common/ofacm. And it
>> sounds like the two cpcs presently contained therein are now unusable.
>> 
>> All of this work is a result of the Grand Plan to move the BTLs into the
>> Opal layer - which I have no idea what the motive is (I was not involved
>> with OMPI when this was decided or debated.)
>> 
>> Basically, without these five changes OpenIB is dead in 1.7.4 and beyond
>> for RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe
>> that the onus falls squarely on Mellanox to fix these. These were
>> community decisions and, as such, it must be a community effort to
>> resolve. We are happy to lend a hand, but we are not fixing all of this
>> mess.
> 
> I think that the 5 steps above sound correct and I agree that 1) this
> means 1.7.4 is on hold until we fix this and 2) that Mellanox shouldn't be
> the only one to fix this for 1.7.4, given the amount of work involved.
> 
> Ralph, what, specifically, broke about the oob/xoob cpc mechanisms by
> making the oob asynchronous?

Hard for me to say as I don't really have access to an IB machine any more. 
Odin is my sole reference point, and someone has had that fully locked up for 
more than a week (and I can't complain as I am totally a guest there). Even 
then, I can only test on a few nodes.

I have no objection to helping, but we need someone who cares about IB and has 
access to such a machine to take the lead. Otherwise, we're just spinning our 
wheels.

As for the work issue: note that this has been "under development" now for more 
than a year. We've talked at length about how "somebody" needs to fix the 
openib/ofacm issue, but everyone keeps pushing it down the road as "not mine". 
Like I said, I can help - but (a) my boss couldn't care less about this issue, 
and (b) I have no way to test the results.



>  That is, 1-5 are a huge amount of work; have
> we done the analysis to say that updating the oob / xoob cpc to work with
> the new oob is actually more work than doing 1-5?  Obviously, there are
> long-term plans that make oob/xoob problematic.  But those aren't 1.7 / 1.8
> plans.  Unfortunately, the cpcs were always out of my area of interest, so
> I'm flying a bit more blind than I'd like here.
> 
> Brian
> 
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Barrett, Brian W
On 11/14/13 1:13 PM, "Joshua Ladd"  wrote:

>Let me try to summarize my understanding of the situation:
>
>1. Ralph made the OOB asynchronous.
>
>2. OOB cpcs don't work as a result of 1, and are thereby "deprecated",
>meaning: won't fix.
>
>3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm
>in that move.  Never changed openib to use ofacm/common.
>
>4. UDCM is "functional" in the trunk, still sitting in openib/connect.
>But no one is entirely sure if it really works which is why it was
>disabled in 1.7. Nathan - is there a design doc you can share on this
>beyond the comments in the code?
>
>5. In order to satisfy the "grand plan":
>   a. UDCM still needs to be moved to common/ofacm.
>   b. OpenIB still needs to be changed to use common/ofacm.
>   c. RDMACM still needs to migrate to common/ofacm.
>   d. XRC support needs to be added to UDCM and put into
>      common/ofacm.
>
>6. The "grand plan" being:  move the BTLs into Opal - hence the need to
>scuttle the OOB cpcs thereby justifying the deprecation and not fixing
>cpcs after #1.
>
>So, that's a quick roundup of how we ended up here (as I understand it.)
>What needs to be done is:

That's my understanding as well.

>1. Somebody needs to certify/review that what Nathan has done is sound.
>From my perspective, this is a BIG change and needs a comprehensive
>architecture review. We've been using it in the trunk, and we've been
>testing it under MTT for some time - but have not deployed or tested at
>large-scale out in the field. Would be nice to see something on paper in
>terms of a design doc.
>
>2. Somebody then needs to move UDCM into common/ofacm.
>
>3. Somebody needs to change openib to use common/ofacm cpcs instead of
>openib/connect cpcs.
>
>4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE
>works.
>
>5. Somebody needs to add XRC support to UDCM - whatever that might mean.
>Given Nathan added UDCM back in 2011 and nobody is really sure it's ready
>for prime-time, and given Pasha's comments regarding the difference in
>state machine requirements  between the two connection schemes, this
>doesn't seem like a trivial task.
>
>Given Nathan's comments a second ago about ORNL not supporting the IB
>Offload component, it barely makes sense to keep common/ofacm. And it
>sounds like the two cpcs presently contained therein are now unusable.
> 
>All of this work is a result of the Grand Plan to move the BTLs into the
>Opal layer - which I have no idea what the motive is (I was not involved
>with OMPI when this was decided or debated.)
>
>Basically, without these five changes OpenIB is dead in 1.7.4 and beyond
>for RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe
>that the onus falls squarely on Mellanox to fix these. These were
>community decisions and, as such, it must be a community effort to
>resolve. We are happy to lend a hand, but we are not fixing all of this
>mess.

I think that the 5 steps above sound correct and I agree that 1) this
means 1.7.4 is on hold until we fix this and 2) that Mellanox shouldn't be
the only one to fix this for 1.7.4, given the amount of work involved.

Ralph, what, specifically, broke about the oob/xoob cpc mechanisms by
making the oob asynchronous?  That is, 1-5 are a huge amount of work; have
we done the analysis to say that updating the oob / xoob cpc to work with
the new oob is actually more work than doing 1-5?  Obviously, there are
long-term plans that make oob/xoob problematic.  But those aren't 1.7 / 1.8
plans.  Unfortunately, the cpcs were always out of my area of interest, so
I'm flying a bit more blind than I'd like here.

Brian

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories






Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Let me try to summarize my understanding of the situation:

1. Ralph made the OOB asynchronous.

2. OOB cpcs don't work as a result of 1, and are thereby "deprecated", meaning: 
won't fix.  

3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in 
that move.  Never changed openib to use ofacm/common. 

4. UDCM is "functional" in the trunk, still sitting in openib/connect. But no 
one is entirely sure if it really works which is why it was disabled in 1.7. 
Nathan - is there a design doc you can share on this beyond the comments in the 
code?

5. In order to satisfy the "grand plan":
   a. UDCM still needs to be moved to common/ofacm.
   b. OpenIB still needs to be changed to use common/ofacm.
   c. RDMACM still needs to migrate to common/ofacm.
   d. XRC support needs to be added to UDCM and put into 
      common/ofacm.

6. The "grand plan" being:  move the BTLs into Opal - hence the need to scuttle 
the OOB cpcs thereby justifying the deprecation and not fixing cpcs after #1.

So, that's a quick roundup of how we ended up here (as I understand it.)  What 
needs to be done is:

1. Somebody needs to certify/review that what Nathan has done is sound. From 
my perspective, this is a BIG change and needs a comprehensive architecture 
review. We've been using it in the trunk, and we've been testing it under MTT 
for some time - but have not deployed or tested at large-scale out in the 
field. Would be nice to see something on paper in terms of a design doc. 

2. Somebody then needs to move UDCM into common/ofacm.

3. Somebody needs to change openib to use common/ofacm cpcs instead of 
openib/connect cpcs.

4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE works.

5. Somebody needs to add XRC support to UDCM - whatever that might mean. Given 
Nathan added UDCM back in 2011 and nobody is really sure it's ready for 
prime-time, and given Pasha's comments regarding the difference in state 
machine requirements  between the two connection schemes, this doesn't seem 
like a trivial task.

Given Nathan's comments a second ago about ORNL not supporting the IB Offload 
component, it barely makes sense to keep common/ofacm. And it sounds like the 
two cpcs presently contained therein are now unusable.
 
All of this work is a result of the Grand Plan to move the BTLs into the Opal 
layer - which I have no idea what the motive is (I was not involved with OMPI 
when this was decided or debated.) 

Basically, without these five changes OpenIB is dead in 1.7.4 and beyond for 
RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe that the 
onus falls squarely on Mellanox to fix these. These were community decisions 
and, as such, it must be a community effort to resolve. We are happy to lend a 
hand, but we are not fixing all of this mess.

Josh 

 

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Shamis, Pavel
Sent: Thursday, November 14, 2013 2:08 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - 
in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
ompi/mca/btl/openib/connect

There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
OOB, and RDMACM (I think IBCM is officially dead).
XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.

OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
OFACM supports (at least the last time we checked) OOB and XOOB.

RDMACM was not moved to OFACM because of iWarp's "first message" requirement, 
which used to break the abstraction.
Moreover, RDMACM scalability used to be terrible; as a result, no one in the IB 
community really used it.
The situation is a bit different today, since ROCEE relies on RDMACM. It is 
worth noting that you may set up ROCEE connections with a regular OOB with some 
restrictions (we did it for mvapich-1).

The code in ofacm and openib is similar, but NOT the same. We changed the 
API so that XRC QP management (there is a hash table that manages the QP to EP 
mapping) is hidden in OFACM instead of OPENIB.
This made the openib initialization code a bit cleaner. Here is my old tree with 
openib btl changes: https://bitbucket.org/pasha/ofacm

I hope it helps,

Best,
Pasha

On Nov 14, 2013, at 1:17 PM, Joshua Ladd  wrote:

> Unless someone went in and "fixed" the code in common (judging by the 
> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't been 
> done at all in the context of xoob and is incompletely patched and remains 
> unusable as a replacement for oob in 1.7.4), there is no reason to believe it 
> would work any different than the cpcs under btl/openib/connect. IIRC, it's 
> the same code - copy/pasted - just moved to a common location so Cheetah 
> collectives can do their wireup. So, if oob cpc doesn't work, ofacm oob won't 
> work either and, I guess, by extension, Cheetah 

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
I'm a bit out of date. What is the problem with oob / xoob?
-Pasha

On Nov 14, 2013, at 3:07 PM, "Hjelm, Nathan T"  wrote:

> I don't think so. From what I understand the iboffload component may not live 
> much longer because of
> Mellanox's fork of Cheetah. So, it might not matter.
> 
> -Nathan
> 
> Excuse the *&(#$y Outlook posting-style. OWA sucks.
> 
> From: devel [devel-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: Thursday, November 14, 2013 12:58 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 
> - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
> ompi/mca/btl/openib/connect
> 
> The key question, though, is: has anyone checked to see if the ofacm code 
> even works any more??
> 
> Only oob and xoob components appear to be present - so unless someone fixed 
> those since they were originally copied from openib, I doubt ofacm works.
> 
> 
> On Nov 14, 2013, at 11:08 AM, Shamis, Pavel  wrote:
> 
>> There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
>> OOB, and RDMACM (I think IBCM is officially dead).
>> XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.
>> 
>> OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
>> OFACM supports (at least the last time we checked) OOB and XOOB.
>> 
>> RDMACM was not moved to OFACM because of iWarp's "first message" 
>> requirement, which used to break the abstraction.
>> Moreover, RDMACM scalability used to be terrible; as a result, no one in the 
>> IB community really used it.
>> The situation is a bit different today, since ROCEE relies on RDMACM. It is 
>> worth noting that you may set up
>> ROCEE connections with a regular OOB with some restrictions (we did it for 
>> mvapich-1).
>> 
>> The code in ofacm and openib is similar, but NOT the same. We changed 
>> the API so that XRC QP management
>> (there is a hash table that manages the QP to EP mapping) is hidden 
>> in OFACM instead of OPENIB.
>> This made the openib initialization code a bit cleaner. Here is my old tree 
>> with openib btl changes: https://bitbucket.org/pasha/ofacm
>> 
>> I hope it helps,
>> 
>> Best,
>> Pasha
>> 
>> On Nov 14, 2013, at 1:17 PM, Joshua Ladd  wrote:
>> 
>>> Unless someone went in and "fixed" the code in common (judging by the 
>>> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't 
>>> been done at all in the context of xoob and is incompletely patched and 
>>> remains unusable as a replacement for oob in 1.7.4), there is no reason to 
>>> believe it would work any different than the cpcs under btl/openib/connect. 
>>> IIRC, it's the same code - copy/pasted - just moved to a common location so 
>>> Cheetah collectives can do their wireup. So, if oob cpc doesn't work, ofacm 
>>> oob won't work either and, I guess, by extension, Cheetah IBoffload won't 
>>> work. Pasha, correct me if you know different.
>>> 
>>> 
>>> Josh
>>> 
>>> 
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>>> Sent: Thursday, November 14, 2013 1:05 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi 
>>> r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
>>> ompi/mca/btl/openib/connect
>>> 
>>> 
>>> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W  wrote:
>>> 
 On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:
 
> Does XRC work with the UDCM CPC?
> 
> 
> On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:
> 
>> I think the problems in udcm were fixed by Nathan quite some time
>> ago, but never moved to 1.7 as everyone was told that the connect
>> code in openib was already deprecated pending merge with the new
>> ofacm common code. Looking over at that area, I see only oob and
>> xoob - so if the users of the common ofacm code are finding that it
>> works, the simple answer may just be to finally complete the switchover.
>> 
>> Meantime, perhaps someone can CMR and review a copying of the udcm
>> cpc to the 1.7 branch?
>> 
>> 
>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
>> 
>>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>>> enabled in 1.7.
>>> 
>>> Per Ralph's comment to me last night:
>>> 
>>> "... you cannot use the oob connection manager. It doesn't work and
>>> was deprecated. You must use udcm, which is why things are supposed
>>> to be set to do so by default. Please check the openib connect
>>> priorities and correct them if necessary."
>>> 
>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>> means, and from what Devendar tells 

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Hjelm, Nathan T
I don't think so. From what I understand the iboffload component may not live 
much longer because of
Mellanox's fork of Cheetah. So, it might not matter.

-Nathan

Excuse the *&(#$y Outlook posting-style. OWA sucks.

From: devel [devel-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: Thursday, November 14, 2013 12:58 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 
- in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
ompi/mca/btl/openib/connect

The key question, though, is: has anyone checked to see if the ofacm code even 
works any more??

Only oob and xoob components appear to be present - so unless someone fixed 
those since they were originally copied from openib, I doubt ofacm works.


On Nov 14, 2013, at 11:08 AM, Shamis, Pavel  wrote:

> There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
> OOB, and RDMACM (I think IBCM is officially dead).
> XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.
>
> OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
> OFACM supports (at least the last time we checked) OOB and XOOB.
>
> RDMACM was not moved to OFACM because of iWarp's "first message" requirement, 
> which used to break the abstraction.
> Moreover, RDMACM scalability used to be terrible; as a result, no one in the 
> IB community really used it.
> The situation is a bit different today, since ROCEE relies on RDMACM. It is 
> worth noting that you may set up
> ROCEE connections with a regular OOB with some restrictions (we did it for 
> mvapich-1).
>
> The code in ofacm and openib is similar, but NOT the same. We changed the 
> API so that XRC QP management
> (there is a hash table that manages the QP to EP mapping) is hidden 
> in OFACM instead of OPENIB.
> This made the openib initialization code a bit cleaner. Here is my old tree 
> with openib btl changes: https://bitbucket.org/pasha/ofacm
>
> I hope it helps,
>
> Best,
> Pasha
>
> On Nov 14, 2013, at 1:17 PM, Joshua Ladd  wrote:
>
>> Unless someone went in and "fixed" the code in common (judging by the 
>> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't been 
>> done at all in the context of xoob and is incompletely patched and remains 
>> unusable as a replacement for oob in 1.7.4), there is no reason to believe 
>> it would work any different than the cpcs under btl/openib/connect. IIRC, 
>> it's the same code - copy/pasted - just moved to a common location so 
>> Cheetah collectives can do their wireup. So, if oob cpc doesn't work, ofacm 
>> oob won't work either and, I guess, by extension, Cheetah IBoffload won't 
>> work. Pasha, correct me if you know different.
>>
>>
>> Josh
>>
>>
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Thursday, November 14, 2013 1:05 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 
>> - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
>> ompi/mca/btl/openib/connect
>>
>>
>> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W  wrote:
>>
>>> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:
>>>
 Does XRC work with the UDCM CPC?


 On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:

> I think the problems in udcm were fixed by Nathan quite some time
> ago, but never moved to 1.7 as everyone was told that the connect
> code in openib was already deprecated pending merge with the new
> ofacm common code. Looking over at that area, I see only oob and
> xoob - so if the users of the common ofacm code are finding that it
> works, the simple answer may just be to finally complete the switchover.
>
> Meantime, perhaps someone can CMR and review a copying of the udcm
> cpc to the 1.7 branch?
>
>
> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
>
>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>> enabled in 1.7.
>>
>> Per Ralph's comment to me last night:
>>
>> "... you cannot use the oob connection manager. It doesn't work and
>> was deprecated. You must use udcm, which is why things are supposed
>> to be set to do so by default. Please check the openib connect
>> priorities and correct them if necessary."
>>
>> However, it's never been enabled in 1.7 - don't know what "borked"
>> means, and from what Devendar tells me, several UDCM commits that
>> are in the trunk have not been pushed over to 1.7:
>>
>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water
>> in 1.7.
>>
>>
>>
>>>
>>> I'm going to start by admitting that I haven't been paying attention
>>> to IB the last 

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
The key question, though, is: has anyone checked to see if the ofacm code even 
works any more??

Only oob and xoob components appear to be present - so unless someone fixed 
those since they were originally copied from openib, I doubt ofacm works.


On Nov 14, 2013, at 11:08 AM, Shamis, Pavel  wrote:

> There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
> OOB, and RDMACM (I think IBCM is officially dead).
> XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.
> 
> OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
> OFACM supports (at least the last time we checked) OOB and XOOB.
> 
> RDMACM was not moved to OFACM because of iWarp's "first message" requirement, 
> which used to break the abstraction.
> Moreover, RDMACM scalability used to be terrible; as a result, no one in the 
> IB community really used it.
> The situation is a bit different today, since ROCEE relies on RDMACM. It is 
> worth noting that you may set up
> ROCEE connections with a regular OOB with some restrictions (we did it for 
> mvapich-1).
> 
> The code in ofacm and openib is similar, but NOT the same. We changed the 
> API so that XRC QP management
> (there is a hash table that manages the QP to EP mapping) is hidden 
> in OFACM instead of OPENIB.
> This made the openib initialization code a bit cleaner. Here is my old tree 
> with openib btl changes: https://bitbucket.org/pasha/ofacm
> 
> I hope it helps,
> 
> Best,
> Pasha
> 
> On Nov 14, 2013, at 1:17 PM, Joshua Ladd  wrote:
> 
>> Unless someone went in and "fixed" the code in common (judging by the 
>> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't been 
>> done at all in the context of xoob and is incompletely patched and remains 
>> unusable as a replacement for oob in 1.7.4), there is no reason to believe 
>> it would work any different than the cpcs under btl/openib/connect. IIRC, 
>> it's the same code - copy/pasted - just moved to a common location so 
>> Cheetah collectives can do their wireup. So, if oob cpc doesn't work, ofacm 
>> oob won't work either and, I guess, by extension, Cheetah IBoffload won't 
>> work. Pasha, correct me if you know different. 
>> 
>> 
>> Josh
>> 
>> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Thursday, November 14, 2013 1:05 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 
>> - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
>> ompi/mca/btl/openib/connect
>> 
>> 
>> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W  wrote:
>> 
>>> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:
>>> 
>>>> Does XRC work with the UDCM CPC?
>>>> 
>>>> 
>>>> On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote:
>>>> 
>>>>> I think the problems in udcm were fixed by Nathan quite some time 
>>>>> ago, but never moved to 1.7 as everyone was told that the connect 
>>>>> code in openib was already deprecated pending merge with the new 
>>>>> ofacm common code. Looking over at that area, I see only oob and 
>>>>> xoob - so if the users of the common ofacm code are finding that it 
>>>>> works, the simple answer may just be to finally complete the switchover.
>>>>> 
>>>>> Meantime, perhaps someone can CMR and review a copying of the udcm 
>>>>> cpc to the 1.7 branch?
>>>>> 
>>>>> 
>>>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd wrote:
>>>>> 
>>>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be 
>>>>>> enabled in 1.7.
>>>>>> 
>>>>>> Per Ralph's comment to me last night:
>>>>>> 
>>>>>> "... you cannot use the oob connection manager. It doesn't work and 
>>>>>> was deprecated. You must use udcm, which is why things are supposed 
>>>>>> to be set to do so by default. Please check the openib connect 
>>>>>> priorities and correct them if necessary."
>>>>>> 
>>>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>>>> means, and from what Devendar tells me, several UDCM commits that 
>>>>>> are in the trunk have not been pushed over to 1.7:
>>>>>> 
>>>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water 
>>>>>> in 1.7.
>>>>>> 
>>>>>> 
>> 
>>> 
>>> I'm going to start by admitting that I haven't been paying attention 
>>> to IB the last couple of months, so I'm out of my league a little bit 
>>> here.  I remember discussions of UDCM replacing OOB both because the 
>>> OOB CPC had some issues and because it would make it easier to move 
>>> the BTLs to the OPAL layer (ie, below the OOB).  But I also thought 
>>> that was more future work than it clearly was.  So can someone let me know:
>>> 
>>> 1) What the status of UDCM is (does it work reliably, does it support 
>>> XRC, etc.)
>> 
>> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
>> XRC - I seem to recall 

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
OOB, and RDMACM (I think IBCM is officially dead).
XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.

OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
OFACM supports (at least the last time we checked) OOB and XOOB.
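
To make the split above concrete, here is a minimal sketch in C that records which CPC is available in which location as a bitmask. It only transcribes the two lines above (the state before the oob and xoob files were deleted from openib/connect in r29703); the enum, the masks, and the function `cpc_supported` are hypothetical names for illustration and are not taken from the Open MPI source tree.

```c
#include <assert.h>
#include <string.h>

/* Connection managers (CPCs) named in this thread. Hypothetical enum. */
enum cpc_id { CPC_OOB, CPC_XOOB, CPC_UDCM, CPC_RDMACM, CPC_COUNT };

static const char *cpc_names[CPC_COUNT] = { "oob", "xoob", "udcm", "rdmacm" };

#define CPC_BIT(c) (1u << (c))

/* Transcribed from the message above, not read from the code base. */
static const unsigned openib_connect_cpcs =
    CPC_BIT(CPC_OOB) | CPC_BIT(CPC_XOOB) | CPC_BIT(CPC_UDCM) | CPC_BIT(CPC_RDMACM);
static const unsigned ofacm_cpcs =
    CPC_BIT(CPC_OOB) | CPC_BIT(CPC_XOOB);

/* Returns 1 if the named CPC is present in the given mask, 0 otherwise
 * (including for names that are not known at all). */
int cpc_supported(unsigned mask, const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < CPC_COUNT; ++i) {
        if (strcmp(cpc_names[i], name) == 0) {
            return (mask & CPC_BIT(i)) != 0;
        }
    }
    return 0;
}
```

With these masks, `cpc_supported(ofacm_cpcs, "udcm")` is 0 while `cpc_supported(openib_connect_cpcs, "udcm")` is 1, which is the gap being discussed in this thread.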

RDMACM was not moved to OFACM because of iWarp's "first message" requirement, 
which used to break the abstraction.
Moreover, RDMACM scalability used to be terrible; as a result, no one in the IB 
community really used it.
The situation is a bit different today, since ROCEE relies on RDMACM. It is worth 
noting that you may set up
ROCEE connections with regular OOB, with some restrictions (we did it for 
mvapich-1).

The code in ofacm and openib is similar, but NOT the same. We changed the 
API in a way that allows us
to hide XRC QP management (there is a hash table that manages the QP-to-EP 
mapping) in OFACM instead of OPENIB.
This made the openib initialization code a bit cleaner. Here is my old tree with 
the openib btl changes: https://bitbucket.org/pasha/ofacm
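
As an illustration of the QP-to-EP mapping mentioned above, the following is a minimal sketch of a chained hash table keyed by QP number. It is not the OFACM code: `struct endpoint`, `struct qp_table`, the bucket count, and all function names are hypothetical stand-ins, and the real implementation may be organized quite differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define QP_TABLE_BUCKETS 64   /* arbitrary size chosen for this sketch */

/* Hypothetical stand-in for a BTL endpoint. */
struct endpoint { int peer_rank; };

struct qp_entry {
    uint32_t qp_num;          /* key: queue pair number */
    struct endpoint *ep;      /* value: endpoint that owns the QP */
    struct qp_entry *next;    /* chain for colliding keys */
};

struct qp_table { struct qp_entry *buckets[QP_TABLE_BUCKETS]; };

static unsigned qp_hash(uint32_t qp_num) { return qp_num % QP_TABLE_BUCKETS; }

/* Adds a mapping; returns 0 on success, -1 if allocation fails. */
int qp_table_insert(struct qp_table *t, uint32_t qp_num, struct endpoint *ep)
{
    assert(t != NULL);
    struct qp_entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return -1;
    }
    unsigned b = qp_hash(qp_num);
    e->qp_num = qp_num;
    e->ep = ep;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Returns the endpoint registered for qp_num, or NULL if there is none. */
struct endpoint *qp_table_lookup(const struct qp_table *t, uint32_t qp_num)
{
    assert(t != NULL);
    for (const struct qp_entry *e = t->buckets[qp_hash(qp_num)]; e != NULL; e = e->next) {
        if (e->qp_num == qp_num) {
            return e->ep;
        }
    }
    return NULL;
}

/* Frees every entry; the endpoints themselves are not owned by the table. */
void qp_table_clear(struct qp_table *t)
{
    assert(t != NULL);
    for (size_t b = 0; b < QP_TABLE_BUCKETS; ++b) {
        struct qp_entry *e = t->buckets[b];
        while (e != NULL) {
            struct qp_entry *next = e->next;
            free(e);
            e = next;
        }
        t->buckets[b] = NULL;
    }
}
```

Keeping a table like this inside the connection-manager layer means the caller only asks "which endpoint owns this QP number?" and never has to track shared XRC QPs itself, which seems to be the cleanup described above.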

I hope it helps,

Best,
Pasha

On Nov 14, 2013, at 1:17 PM, Joshua Ladd  wrote:

> Unless someone went in and "fixed" the code in common (judging by the 
> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't been 
> done at all in the context of xoob and is incompletely patched and remains 
> unusable as a replacement for oob in 1.7.4), there is no reason to believe it 
> would work any different than the cpcs under btl/openib/connect. IIRC, it's 
> the same code - copy/pasted - just moved to a common location so Cheetah 
> collectives can do their wireup. So, if oob cpc doesn't work, ofacm oob won't 
> work either and, I guess, by extension, Cheetah IBoffload won't work. Pasha, 
> correct me if you know different. 
> 
> 
> Josh
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Thursday, November 14, 2013 1:05 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 
> - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
> ompi/mca/btl/openib/connect
> 
> 
> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W  wrote:
> 
>> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:
>> 
>>> Does XRC work with the UDCM CPC?
>>> 
>>> 
>>> On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:
>>> 
>>>> I think the problems in udcm were fixed by Nathan quite some time 
>>>> ago, but never moved to 1.7 as everyone was told that the connect 
>>>> code in openib was already deprecated pending merge with the new 
>>>> ofacm common code. Looking over at that area, I see only oob and 
>>>> xoob - so if the users of the common ofacm code are finding that it 
>>>> works, the simple answer may just be to finally complete the switchover.
>>>> 
>>>> Meantime, perhaps someone can CMR and review a copying of the udcm 
>>>> cpc to the 1.7 branch?
>>>> 
>>>> 
>>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd wrote:
>>>> 
>>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be 
>>>>> enabled in 1.7.
>>>>> 
>>>>> Per Ralph's comment to me last night:
>>>>> 
>>>>> "... you cannot use the oob connection manager. It doesn't work and 
>>>>> was deprecated. You must use udcm, which is why things are supposed 
>>>>> to be set to do so by default. Please check the openib connect 
>>>>> priorities and correct them if necessary."
>>>>> 
>>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>>> means, and from what Devendar tells me, several UDCM commits that 
>>>>> are in the trunk have not been pushed over to 1.7:
>>>>> 
>>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water 
>>>>> in 1.7.
>>>>> 
>>>>> 
> 
>> 
>> I'm going to start by admitting that I haven't been paying attention 
>> to IB the last couple of months, so I'm out of my league a little bit 
>> here.  I remember discussions of UDCM replacing OOB both because the 
>> OOB CPC had some issues and because it would make it easier to move 
>> the BTLs to the OPAL layer (ie, below the OOB).  But I also thought 
>> that was more future work than it clearly was.  So can someone let me know:
>> 
>> 1) What the status of UDCM is (does it work reliably, does it support 
>> XRC, etc.)
> 
> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
> XRC - I seem to recall the answer is "no"
> 
>> 2) What's the difference between CPCs and OFACM and what's our plans 
>> w.r.t 1.7 there?
> 
> Pasha created ofacm because some of the collective components now need to 
> forge connections. So he created the common/ofacm code to meet those needs, 
> with the intention of someday replacing the openib cpc's with the new common 
> code. However, this was stalled by the iWarp issue, and so it fell off the 
> table.
> 
> We now have 

Re: [OMPI devel] [EXTERNAL] What to do about openib/ofacm/cpc (was: r29703 - in trunk: contrib/p...)

2013-11-14 Thread Ralph Castain
Here's the core problem: it isn't a question of "if" some of these things 
should be resolved, but "who". They've been around for a very long time, but 
nobody has the time or will to fix them. I have no access to such machines, so 
all I can do is verify that it sorta compiles and is consistent with the code 
base. I can't verify that it works, nor debug it.

Guess my point is that someone who cares needs to clean up the cpc vs ofacm 
problem and get whatever connection managers we want to support working. I 
removed the oob and xoob ones because (a) they don't work, and (b) I'm tired of 
repeatedly having to explain that to people.


On Nov 14, 2013, at 10:23 AM, Barrett, Brian W  wrote:

> On 11/14/13 11:16 AM, "Jeff Squyres (jsquyres)"  wrote:
> 
>> On Nov 14, 2013, at 1:03 PM, Ralph Castain  wrote:
>> 
>>>> 1) What the status of UDCM is (does it work reliably, does it support
>>>> XRC, etc.)
>>> 
>>> Seems to be working okay on the IB systems at LANL and IU. Don't know
>>> about XRC - I seem to recall the answer is "no"
>> 
>> FWIW, I recall that when Cisco was testing UDCM (a long time ago --
>> before we threw away our IB gear...), we found bugs in UDCM that only
>> showed up with really large numbers of MTT tests running UDCM (i.e., 10K+
>> tests a night, especially with lots of UDCM-based jobs running
>> concurrently on the same cluster).  These types of bugs didn't show up in
>> casual testing.
>> 
>> Has that happened with the new/fixed UDCM?  Cisco is no longer in a
>> position to test this.
> 
> Neither are we at Sandia, unfortunately.  I only have 16 nodes for nightly
> testing, and only 8 of those are always running Linux, so that doesn't
> help much on the stress test.
> 
>>>> 2) What's the difference between CPCs and OFACM and what's our plans
>>>> w.r.t 1.7 there?
>>> 
>>> Pasha created ofacm because some of the collective components now need
>>> to forge connections. So he created the common/ofacm code to meet those
>>> needs, with the intention of someday replacing the openib cpc's with the
>>> new common code. However, this was stalled by the iWarp issue, and so it
>>> fell off the table.
>>> 
>>> We now have two duplicate ways of doing the same thing, but with code
>>> in two different places. :-(
>> 
>> FWIW, the iWARP vendors have repeatedly been warned that ofacm is going
>> to take over, and unless they supply patches, iWarp will stop working in
>> Open MPI.  I know for a fact that they are very aware of this.
>> 
>> So my $0.02 is that ofacm should take over -- let's get rid of CPC and
>> have openib use the ofacm.  The iWarp folks can play catch up if/when
>> they want to.  
>> 
>> Of course, I'm not in this part of the code base any more, so it's not
>> really my call -- just my $0.02...
> 
> Of course, that doesn't help with the core issue; we can't have a
> regression w.r.t XRC support between 1.7.3 and 1.7.4.  But I agree, I'm
> fine with only fixing this in one place.
> 
> Brian
> 
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Unless someone went in and "fixed" the code in common (judging by the comments, 
fixed seems to imply porting (x)oob to use UDCM, which hasn't been done at all 
in the context of xoob and is incompletely patched and remains unusable as a 
replacement for oob in 1.7.4), there is no reason to believe it would work any 
different than the cpcs under btl/openib/connect. IIRC, it's the same code - 
copy/pasted - just moved to a common location so Cheetah collectives can do 
their wireup. So, if oob cpc doesn't work, ofacm oob won't work either and, I 
guess, by extension, Cheetah IBoffload won't work. Pasha, correct me if you 
know different. 


Josh


-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 14, 2013 1:05 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - 
in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
ompi/mca/btl/openib/connect


On Nov 14, 2013, at 9:33 AM, Barrett, Brian W  wrote:

> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:
> 
>> Does XRC work with the UDCM CPC?
>> 
>> 
>> On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:
>> 
>>> I think the problems in udcm were fixed by Nathan quite some time 
>>> ago, but never moved to 1.7 as everyone was told that the connect 
>>> code in openib was already deprecated pending merge with the new 
>>> ofacm common code. Looking over at that area, I see only oob and 
>>> xoob - so if the users of the common ofacm code are finding that it 
>>> works, the simple answer may just be to finally complete the switchover.
>>> 
>>> Meantime, perhaps someone can CMR and review a copying of the udcm 
>>> cpc to the 1.7 branch?
>>> 
>>> 
>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
>>> 
>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be 
>>>> enabled in 1.7.
>>>> 
>>>> Per Ralph's comment to me last night:
>>>> 
>>>> "... you cannot use the oob connection manager. It doesn't work and 
>>>> was deprecated. You must use udcm, which is why things are supposed 
>>>> to be set to do so by default. Please check the openib connect 
>>>> priorities and correct them if necessary."
>>>> 
>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>> means, and from what Devendar tells me, several UDCM commits that 
>>>> are in the trunk have not been pushed over to 1.7:
>>>> 
>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water 
>>>> in 1.7.
>>>> 
>>>> 
>>>> 
> 
> I'm going to start by admitting that I haven't been paying attention 
> to IB the last couple of months, so I'm out of my league a little bit 
> here.  I remember discussions of UDCM replacing OOB both because the 
> OOB CPC had some issues and because it would make it easier to move 
> the BTLs to the OPAL layer (ie, below the OOB).  But I also thought 
> that was more future work than it clearly was.  So can someone let me know:
> 
>  1) What the status of UDCM is (does it work reliably, does it support 
> XRC, etc.)

Seems to be working okay on the IB systems at LANL and IU. Don't know about XRC 
- I seem to recall the answer is "no"

>  2) What's the difference between CPCs and OFACM and what's our plans 
> w.r.t 1.7 there?

Pasha created ofacm because some of the collective components now need to forge 
connections. So he created the common/ofacm code to meet those needs, with the 
intention of someday replacing the openib cpc's with the new common code. 
However, this was stalled by the iWarp issue, and so it fell off the table.

We now have two duplicate ways of doing the same thing, but with code in two 
different places. :-(

>  3) Someone mentioned that ofacm oob worked, but cpc oob didn't.  Can 
> someone explain why?

I'm not sure that is actually true as there is no indication that anyone is 
using or testing the collective components that use ofacm code.


> 
> Again, sorry for being dense; I've been spending too much time in 
> Portals land lately.
> 
> Brian
> 
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> 
> 
> 
> 
> 


[OMPI devel] What to do about openib/ofacm/cpc (was: r29703 - in trunk: contrib/p...)

2013-11-14 Thread Jeff Squyres (jsquyres)
On Nov 14, 2013, at 1:03 PM, Ralph Castain  wrote:

>> 1) What the status of UDCM is (does it work reliably, does it support
>> XRC, etc.)
> 
> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
> XRC - I seem to recall the answer is "no"

FWIW, I recall that when Cisco was testing UDCM (a long time ago -- before we 
threw away our IB gear...), we found bugs in UDCM that only showed up with 
really large numbers of MTT tests running UDCM (i.e., 10K+ tests a night, 
especially with lots of UDCM-based jobs running concurrently on the same 
cluster).  These types of bugs didn't show up in casual testing.

Has that happened with the new/fixed UDCM?  Cisco is no longer in a position to 
test this.

>> 2) What's the difference between CPCs and OFACM and what's our plans
>> w.r.t 1.7 there?
> 
> Pasha created ofacm because some of the collective components now need to 
> forge connections. So he created the common/ofacm code to meet those needs, 
> with the intention of someday replacing the openib cpc's with the new common 
> code. However, this was stalled by the iWarp issue, and so it fell off the 
> table.
> 
> We now have two duplicate ways of doing the same thing, but with code in two 
> different places. :-(

FWIW, the iWARP vendors have repeatedly been warned that ofacm is going to take 
over, and unless they supply patches, iWarp will stop working in Open MPI.  I 
know for a fact that they are very aware of this.

So my $0.02 is that ofacm should take over -- let's get rid of CPC and have 
openib use the ofacm.  The iWarp folks can play catch up if/when they want to.  

Of course, I'm not in this part of the code base any more, so it's not really 
my call -- just my $0.02...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain

On Nov 14, 2013, at 9:33 AM, Barrett, Brian W  wrote:

> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:
> 
>> Does XRC work with the UDCM CPC?
>> 
>> 
>> On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:
>> 
>>> I think the problems in udcm were fixed by Nathan quite some time ago,
>>> but never moved to 1.7 as everyone was told that the connect code in
>>> openib was already deprecated pending merge with the new ofacm common
>>> code. Looking over at that area, I see only oob and xoob - so if the
>>> users of the common ofacm code are finding that it works, the simple
>>> answer may just be to finally complete the switchover.
>>> 
>>> Meantime, perhaps someone can CMR and review a copying of the udcm cpc
>>> to the 1.7 branch?
>>> 
>>> 
>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
>>> 
>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>>>> enabled in 1.7.
>>>> 
>>>> Per Ralph's comment to me last night:
>>>> 
>>>> "... you cannot use the oob connection manager. It doesn't work and
>>>> was deprecated. You must use udcm, which is why things are supposed to
>>>> be set to do so by default. Please check the openib connect priorities
>>>> and correct them if necessary."
>>>> 
>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>> means, and from what Devendar tells me, several UDCM commits that are
>>>> in the trunk have not been pushed over to 1.7:
>>>> 
>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in
>>>> 1.7.
>>>> 
>>>> 
>>>> 
> 
> I'm going to start by admitting that I haven't been paying attention to IB
> the last couple of months, so I'm out of my league a little bit here.  I
> remember discussions of UDCM replacing OOB both because the OOB CPC had
> some issues and because it would make it easier to move the BTLs to the
> OPAL layer (ie, below the OOB).  But I also thought that was more future
> work than it clearly was.  So can someone let me know:
> 
>  1) What the status of UDCM is (does it work reliably, does it support
> XRC, etc.)

Seems to be working okay on the IB systems at LANL and IU. Don't know about XRC 
- I seem to recall the answer is "no"

>  2) What's the difference between CPCs and OFACM and what's our plans
> w.r.t 1.7 there?

Pasha created ofacm because some of the collective components now need to forge 
connections. So he created the common/ofacm code to meet those needs, with the 
intention of someday replacing the openib cpc's with the new common code. 
However, this was stalled by the iWarp issue, and so it fell off the table.

We now have two duplicate ways of doing the same thing, but with code in two 
different places. :-(

>  3) Someone mentioned that ofacm oob worked, but cpc oob didn't.  Can
> someone explain why?

I'm not sure that is actually true as there is no indication that anyone is 
using or testing the collective components that use ofacm code.


> 
> Again, sorry for being dense; I've been spending too much time in Portals
> land lately.
> 
> Brian
> 
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> 
> 
> 
> 
> 



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Barrett, Brian W
On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)"  wrote:

>Does XRC work with the UDCM CPC?
>
>
>On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:
>
>> I think the problems in udcm were fixed by Nathan quite some time ago,
>>but never moved to 1.7 as everyone was told that the connect code in
>>openib was already deprecated pending merge with the new ofacm common
>>code. Looking over at that area, I see only oob and xoob - so if the
>>users of the common ofacm code are finding that it works, the simple
>>answer may just be to finally complete the switchover.
>> 
>> Meantime, perhaps someone can CMR and review a copying of the udcm cpc
>>to the 1.7 branch?
>> 
>> 
>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
>> 
>>> Um, no. It's supposed to work with UDCM which doesn't appear to be
>>>enabled in 1.7.
>>> 
>>> Per Ralph's comment to me last night:
>>> 
>>> "... you cannot use the oob connection manager. It doesn't work and
>>>was deprecated. You must use udcm, which is why things are supposed to
>>>be set to do so by default. Please check the openib connect priorities
>>>and correct them if necessary."
>>> 
>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>means, and from what Devendar tells me, several UDCM commits that are
>>>in the trunk have not been pushed over to 1.7:
>>> 
>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in
>>>1.7.
>>> 
>>> 
>>> 

I'm going to start by admitting that I haven't been paying attention to IB
the last couple of months, so I'm out of my league a little bit here.  I
remember discussions of UDCM replacing OOB both because the OOB CPC had
some issues and because it would make it easier to move the BTLs to the
OPAL layer (ie, below the OOB).  But I also thought that was more future
work than it clearly was.  So can someone let me know:

  1) What the status of UDCM is (does it work reliably, does it support
XRC, etc.)
  2) What's the difference between CPCs and OFACM and what's our plans
w.r.t 1.7 there?
  3) Someone mentioned that ofacm oob worked, but cpc oob didn't.  Can
someone explain why?

Again, sorry for being dense; I've been spending too much time in Portals
land lately.

Brian

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories







Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
Yeah, I believe that is true as well. However, we have been bugging people to 
fix this for a long time, and nobody appears to have the cycles to do so.

As a reminder: we have to remove all OOB dependence on connections in the BTL 
as we are moving the BTLs to OPAL. Hence, there is no interest in fixing the 
OOB cpc's as we are about to blow them away.


On Nov 14, 2013, at 9:01 AM, Shamis, Pavel  wrote:

> When I looked at the code last time - no.
> (The connection state machine is very different)
> 
> Pavel (Pasha) Shamis
> ---
> Computer Science Research Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> 
> 
> 
> 
> 
> 
> On Nov 14, 2013, at 11:51 AM, Jeff Squyres (jsquyres) wrote:
> 
> Does XRC work with the UDCM CPC?
> 
> 
> On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote:
> 
> I think the problems in udcm were fixed by Nathan quite some time ago, but 
> never moved to 1.7 as everyone was told that the connect code in openib was 
> already deprecated pending merge with the new ofacm common code. Looking over 
> at that area, I see only oob and xoob - so if the users of the common ofacm 
> code are finding that it works, the simple answer may just be to finally 
> complete the switchover.
> 
> Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the 
> 1.7 branch?
> 
> 
> On Nov 14, 2013, at 5:14 AM, Joshua Ladd wrote:
> 
> Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in 
> 1.7.
> 
> Per Ralph's comment to me last night:
> 
> "... you cannot use the oob connection manager. It doesn't work and was 
> deprecated. You must use udcm, which is why things are supposed to be set to 
> do so by default. Please check the openib connect priorities and correct them 
> if necessary."
> 
> However, it's never been enabled in 1.7 - don't know what "borked" means, and 
> from what Devendar tells me, several UDCM commits that are in the trunk have 
> not been pushed over to 1.7:
> 
> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in 1.7.
> 
> 
> 
> [enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"])
> #
> # Unconnect Datagram (UD) based connection manager
> #
> #AC_ARG_ENABLE([openib-udcm],
> #[AC_HELP_STRING([--enable-openib-udcm],
> #[Enable datagram connection support in openib BTL (default: enabled)])],
> #[enable_openib_udcm="$enableval"], [enable_openib_udcm="yes"])
> # Per discussion with Ralph and Nathan, disable UDCM for now.
> # It's borked and needs some surgery to get back on its feet.
> enable_openib_udcm=no
> 
> 
> Josh
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] 
> On Behalf Of Jeff Squyres (jsquyres)
> Sent: Thursday, November 14, 2013 6:44 AM
> To: >
> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
> contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect
> 
> Does the openib *only* work with RDMACM now?
> 
> That's surprising (and bad!).
> 
> Did someone ask Mellanox about fixing the OOB and XOOB CPCs?
> 
> 
> On Nov 13, 2013, at 11:16 PM, 
> svn-commit-mai...@open-mpi.org wrote:
> 
> Author: rhc (Ralph Castain)
> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
> New Revision: 29703
> URL: https://svn.open-mpi.org/trac/ompi/changeset/29703
> 
> Log:
> Given that the oob and xoob cpc's are no longer operable and haven't been 
> since the OOB update, remove them to avoid confusion
> 
> cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib
> 
> Deleted:
> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
> Text files modified:
> trunk/contrib/platform/iu/odin/optimized.conf               |    1
> trunk/contrib/platform/iu/odin/static.conf                  |    1
> trunk/ompi/mca/btl/openib/Makefile.am                       |   10
> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |   14
> /dev/null                                                   |  975
> /dev/null                                                   |   18
> /dev/null                                                   | 1150
> /dev/null                                                   |   19
> 8 files changed, 5 insertions(+), 2183 deletions(-)
> 
> Modified: trunk/contrib/platform/iu/odin/optimized.conf
> 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
I see comments in the UDCM code indicating that it does not, though with the 
intention of adding it. Hopefully, xoob in common will work. 


#if HAVE_XRC
    if (mca_btl_openib_component.num_xrc_qps > 0) {
        BTL_VERBOSE(("UD CPC does not support XRC QPs (yet)"));
        break;
    }
#endif

    for (i = 0 ; i < mca_btl_openib_component.num_qps ; ++i) {
        qps[i].psn    = lcl_ep->qps[i].qp->lcl_psn;
        qps[i].qp_num = lcl_ep->qps[i].qp->lcl_qp->qp_num;
        /* NTH: TODO -- add XRC support */
    }
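
The practical effect of a guard like the first excerpt can be sketched as follows: a connection manager that declines when XRC QPs are configured simply drops out of selection, and if nothing else is available the BTL has no CPC at all. This is a simplified illustration in C, assuming a "highest priority among those whose query succeeds" selection rule; `struct component_cfg`, `struct cpc`, `ud_query`, `always_ok`, and `select_cpc` are hypothetical names, and the priorities in the usage note are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the component-wide QP configuration. */
struct component_cfg { int num_qps; int num_xrc_qps; };

typedef int (*cpc_query_fn)(const struct component_cfg *cfg);

/* Hypothetical CPC descriptor: a name, a priority, and a query hook. */
struct cpc { const char *name; int priority; cpc_query_fn query; };

/* Mirrors the guard in the excerpt: decline when XRC QPs are configured. */
static int ud_query(const struct component_cfg *cfg)
{
    return cfg->num_xrc_qps > 0 ? -1 : 0;
}

/* A CPC with no restrictions, used only as a fallback in this sketch. */
static int always_ok(const struct component_cfg *cfg)
{
    (void)cfg;
    return 0;
}

/* Returns the highest-priority CPC whose query succeeds, or NULL if none does. */
const struct cpc *select_cpc(const struct cpc *cpcs, size_t n,
                             const struct component_cfg *cfg)
{
    assert(cpcs != NULL && cfg != NULL);
    const struct cpc *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (cpcs[i].query(cfg) != 0) {
            continue;   /* this CPC cannot serve the configuration */
        }
        if (best == NULL || cpcs[i].priority > best->priority) {
            best = &cpcs[i];
        }
    }
    return best;
}
```

For example, with a list containing only `{"udcm", 60, ud_query}` and a configuration where `num_xrc_qps` is nonzero, `select_cpc` returns NULL. That corresponds to the concern in this thread: with the xoob CPC removed and UDCM declining XRC, an XRC configuration is left with no usable connection manager.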


-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Thursday, November 14, 2013 11:51 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

Does XRC work with the UDCM CPC?


On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:

> I think the problems in udcm were fixed by Nathan quite some time ago, but 
> never moved to 1.7 as everyone was told that the connect code in openib was 
> already deprecated pending merge with the new ofacm common code. Looking over 
> at that area, I see only oob and xoob - so if the users of the common ofacm 
> code are finding that it works, the simple answer may just be to finally 
> complete the switchover.
> 
> Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the 
> 1.7 branch?
> 
> 
> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
> 
>> Um, no. It's supposed to work with UDCM which doesn't appear to be enabled 
>> in 1.7.
>> 
>> Per Ralph's comment to me last night:
>> 
>> "... you cannot use the oob connection manager. It doesn't work and was 
>> deprecated. You must use udcm, which is why things are supposed to be set to 
>> do so by default. Please check the openib connect priorities and correct 
>> them if necessary."
>> 
>> However, it's never been enabled in 1.7 - don't know what "borked" means, 
>> and from what Devendar tells me, several UDCM commits that are in the trunk 
>> have not been pushed over to 1.7:
>> 
>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in 1.7.
>> 
>> 
>> 
>>  [enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"])
>>   #
>>   # Unconnect Datagram (UD) based connection manager
>>   #
>> #AC_ARG_ENABLE([openib-udcm],
>> #[AC_HELP_STRING([--enable-openib-udcm],
>> #[Enable datagram connection support in openib BTL (default: enabled)])],
>> #[enable_openib_udcm="$enableval"], [enable_openib_udcm="yes"])
>>   # Per discussion with Ralph and Nathan, disable UDCM for now.
>>   # It's borked and needs some surgery to get back on its feet.
>>   enable_openib_udcm=no
>> 
>> 
>> Josh
>> 
>> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
>> (jsquyres)
>> Sent: Thursday, November 14, 2013 6:44 AM
>> To: 
>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
>> contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect
>> 
>> Does the openib *only* work with RDMACM now?
>> 
>> That's surprising (and bad!).
>> 
>> Did someone ask Mellanox about fixing the OOB and XOOB CPCs?
>> 
>> 
>> On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: rhc (Ralph Castain)
>>> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
>>> New Revision: 29703
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29703
>>> 
>>> Log:
>>> Given that the oob and xoob cpc's are no longer operable and haven't been 
>>> since the OOB update, remove them to avoid confusion
>>> 
>>> cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib
>>> 
>>> Deleted:
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
>>> Text files modified: 
>>> trunk/contrib/platform/iu/odin/optimized.conf   | 1 
>>> 
>>> trunk/contrib/platform/iu/odin/static.conf  | 1 
>>> 
>>> trunk/ompi/mca/btl/openib/Makefile.am   |10 
>>> 
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |14 
>>> 
>>> /dev/null   |   975 
>>> -   
>>> /dev/null   |18 
>>> 
>>> /dev/null   |  1150 
>>> 
>>> /dev/null 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
When I looked at the code last time - no.
(The connection state machine is very different)

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Nov 14, 2013, at 11:51 AM, Jeff Squyres (jsquyres) wrote:

Does XRC work with the UDCM CPC?


On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote:

I think the problems in udcm were fixed by Nathan quite some time ago, but 
never moved to 1.7 as everyone was told that the connect code in openib was 
already deprecated pending merge with the new ofacm common code. Looking over 
at that area, I see only oob and xoob - so if the users of the common ofacm 
code are finding that it works, the simple answer may just be to finally 
complete the switchover.

Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the 
1.7 branch?


On Nov 14, 2013, at 5:14 AM, Joshua Ladd wrote:

Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in 
1.7.

Per Ralph's comment to me last night:

"... you cannot use the oob connection manager. It doesn't work and was 
deprecated. You must use udcm, which is why things are supposed to be set to do 
so by default. Please check the openib connect priorities and correct them if 
necessary."

However, it's never been enabled in 1.7 - don't know what "borked" means, and 
from what Devendar tells me, several UDCM commits that are in the trunk have 
not been pushed over to 1.7:

So, as of this moment, OpenIB BTL is essentially dead-in-the-water in 1.7.



[enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"])
 #
 # Unconnect Datagram (UD) based connection manager
 #
#AC_ARG_ENABLE([openib-udcm],
#[AC_HELP_STRING([--enable-openib-udcm],
#[Enable datagram connection support in openib BTL (default: enabled)])],
#[enable_openib_udcm="$enableval"], [enable_openib_udcm="yes"])
 # Per discussion with Ralph and Nathan, disable UDCM for now.
 # It's borked and needs some surgery to get back on its feet.
 enable_openib_udcm=no


Josh


-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On 
Behalf Of Jeff Squyres (jsquyres)
Sent: Thursday, November 14, 2013 6:44 AM
To:
Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

Does the openib *only* work with RDMACM now?

That's surprising (and bad!).

Did someone ask Mellanox about fixing the OOB and XOOB CPCs?


On Nov 13, 2013, at 11:16 PM, 
svn-commit-mai...@open-mpi.org wrote:

Author: rhc (Ralph Castain)
Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
New Revision: 29703
URL: https://svn.open-mpi.org/trac/ompi/changeset/29703

Log:
Given that the oob and xoob cpc's are no longer operable and haven't been since 
the OOB update, remove them to avoid confusion

cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib

Deleted:
trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
Text files modified:
trunk/contrib/platform/iu/odin/optimized.conf               |     1
trunk/contrib/platform/iu/odin/static.conf                  |     1
trunk/ompi/mca/btl/openib/Makefile.am                       |    10
trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |    14
/dev/null                                                   |   975
/dev/null                                                   |    18
/dev/null                                                   |  1150
/dev/null                                                   |    19
8 files changed, 5 insertions(+), 2183 deletions(-)

Modified: trunk/contrib/platform/iu/odin/optimized.conf
==
--- trunk/contrib/platform/iu/odin/optimized.conf   Wed Nov 13 19:34:15 2013
(r29702)
+++ trunk/contrib/platform/iu/odin/optimized.conf   2013-11-13 23:16:53 EST 
(Wed, 13 Nov 2013)  (r29703)
@@ -80,7 +80,6 @@

## Setup OpenIB
btl_openib_want_fork_support = 0
-btl_openib_cpc_include = oob
#btl_openib_receive_queues = 
P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32

## Setup TCP

Modified: trunk/contrib/platform/iu/odin/static.conf
==
--- trunk/contrib/platform/iu/odin/static.conf  Wed Nov 13 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Jeff Squyres (jsquyres)
Does XRC work with the UDCM CPC?


On Nov 14, 2013, at 9:35 AM, Ralph Castain  wrote:

> I think the problems in udcm were fixed by Nathan quite some time ago, but 
> never moved to 1.7 as everyone was told that the connect code in openib was 
> already deprecated pending merge with the new ofacm common code. Looking over 
> at that area, I see only oob and xoob - so if the users of the common ofacm 
> code are finding that it works, the simple answer may just be to finally 
> complete the switchover.
> 
> Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the 
> 1.7 branch?
> 
> 
> On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:
> 
>> Um, no. It's supposed to work with UDCM which doesn't appear to be enabled 
>> in 1.7.
>> 
>> Per Ralph's comment to me last night:
>> 
>> "... you cannot use the oob connection manager. It doesn't work and was 
>> deprecated. You must use udcm, which is why things are supposed to be set to 
>> do so by default. Please check the openib connect priorities and correct 
>> them if necessary."
>> 
>> However, it's never been enabled in 1.7 - don't know what "borked" means, 
>> and from what Devendar tells me, several UDCM commits that are in the trunk 
>> have not been pushed over to 1.7:
>> 
>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in 1.7.
>> 
>> 
>> 
>>  [enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"])
>>   #
>>   # Unconnect Datagram (UD) based connection manager
>>   #
>> #AC_ARG_ENABLE([openib-udcm],
>> #[AC_HELP_STRING([--enable-openib-udcm],
>> #[Enable datagram connection support in openib BTL (default: enabled)])],
>> #[enable_openib_udcm="$enableval"], [enable_openib_udcm="yes"])
>>   # Per discussion with Ralph and Nathan, disable UDCM for now.
>>   # It's borked and needs some surgery to get back on its feet.
>>   enable_openib_udcm=no
>> 
>> 
>> Josh
>> 
>> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
>> (jsquyres)
>> Sent: Thursday, November 14, 2013 6:44 AM
>> To: 
>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
>> contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect
>> 
>> Does the openib *only* work with RDMACM now?
>> 
>> That's surprising (and bad!).
>> 
>> Did someone ask Mellanox about fixing the OOB and XOOB CPCs?
>> 
>> 
>> On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: rhc (Ralph Castain)
>>> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
>>> New Revision: 29703
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29703
>>> 
>>> Log:
>>> Given that the oob and xoob cpc's are no longer operable and haven't been 
>>> since the OOB update, remove them to avoid confusion
>>> 
>>> cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib
>>> 
>>> Deleted:
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
>>> Text files modified:
>>> trunk/contrib/platform/iu/odin/optimized.conf               |     1
>>> trunk/contrib/platform/iu/odin/static.conf                  |     1
>>> trunk/ompi/mca/btl/openib/Makefile.am                       |    10
>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |    14
>>> /dev/null                                                   |   975
>>> /dev/null                                                   |    18
>>> /dev/null                                                   |  1150
>>> /dev/null                                                   |    19
>>> 8 files changed, 5 insertions(+), 2183 deletions(-)
>>> 
>>> Modified: trunk/contrib/platform/iu/odin/optimized.conf
>>> ==
>>> --- trunk/contrib/platform/iu/odin/optimized.conf   Wed Nov 13 19:34:15 
>>> 2013(r29702)
>>> +++ trunk/contrib/platform/iu/odin/optimized.conf   2013-11-13 23:16:53 EST 
>>> (Wed, 13 Nov 2013)  (r29703)
>>> @@ -80,7 +80,6 @@
>>> 
>>> ## Setup OpenIB
>>> btl_openib_want_fork_support = 0 
>>> -btl_openib_cpc_include = oob 
>>> #btl_openib_receive_queues = 
>>> P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32
>>>  
>>> 
>>> ## Setup TCP
>>> 
>>> Modified: 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Ralph Castain
I think the problems in udcm were fixed by Nathan quite some time ago, but 
never moved to 1.7 as everyone was told that the connect code in openib was 
already deprecated pending merge with the new ofacm common code. Looking over 
at that area, I see only oob and xoob - so if the users of the common ofacm 
code are finding that it works, the simple answer may just be to finally 
complete the switchover.

Meantime, perhaps someone can CMR and review a copying of the udcm cpc to the 
1.7 branch?


On Nov 14, 2013, at 5:14 AM, Joshua Ladd  wrote:

> Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in 
> 1.7.
> 
> Per Ralph's comment to me last night:
> 
> "... you cannot use the oob connection manager. It doesn't work and was 
> deprecated. You must use udcm, which is why things are supposed to be set to 
> do so by default. Please check the openib connect priorities and correct them 
> if necessary."
> 
> However, it's never been enabled in 1.7 - don't know what "borked" means, and 
> from what Devendar tells me, several UDCM commits that are in the trunk have 
> not been pushed over to 1.7:
> 
> So, as of this moment, OpenIB BTL is essentially dead-in-the-water in 1.7.
> 
> 
> 
>   [enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"])
>    #
>    # Unconnect Datagram (UD) based connection manager
>    #
> #AC_ARG_ENABLE([openib-udcm],
> #[AC_HELP_STRING([--enable-openib-udcm],
> #[Enable datagram connection support in openib BTL (default: enabled)])],
> #[enable_openib_udcm="$enableval"], [enable_openib_udcm="yes"])
>    # Per discussion with Ralph and Nathan, disable UDCM for now.
>    # It's borked and needs some surgery to get back on its feet.
>    enable_openib_udcm=no
> 
> 
> Josh
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Thursday, November 14, 2013 6:44 AM
> To: 
> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
> contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect
> 
> Does the openib *only* work with RDMACM now?
> 
> That's surprising (and bad!).
> 
> Did someone ask Mellanox about fixing the OOB and XOOB CPCs?
> 
> 
> On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: rhc (Ralph Castain)
>> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
>> New Revision: 29703
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29703
>> 
>> Log:
>> Given that the oob and xoob cpc's are no longer operable and haven't been 
>> since the OOB update, remove them to avoid confusion
>> 
>> cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib
>> 
>> Deleted:
>>  trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>>  trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
>>  trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
>>  trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
>> Text files modified:
>>  trunk/contrib/platform/iu/odin/optimized.conf               |     1
>>  trunk/contrib/platform/iu/odin/static.conf                  |     1
>>  trunk/ompi/mca/btl/openib/Makefile.am                       |    10
>>  trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |    14
>>  /dev/null                                                   |   975
>>  /dev/null                                                   |    18
>>  /dev/null                                                   |  1150
>>  /dev/null                                                   |    19
>>  8 files changed, 5 insertions(+), 2183 deletions(-)
>> 
>> Modified: trunk/contrib/platform/iu/odin/optimized.conf
>> ==
>> --- trunk/contrib/platform/iu/odin/optimized.confWed Nov 13 19:34:15 
>> 2013(r29702)
>> +++ trunk/contrib/platform/iu/odin/optimized.conf2013-11-13 23:16:53 EST 
>> (Wed, 13 Nov 2013)  (r29703)
>> @@ -80,7 +80,6 @@
>> 
>> ## Setup OpenIB
>> btl_openib_want_fork_support = 0 
>> -btl_openib_cpc_include = oob 
>> #btl_openib_receive_queues = 
>> P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32
>>  
>> 
>> ## Setup TCP
>> 
>> Modified: trunk/contrib/platform/iu/odin/static.conf
>> ==
>> --- trunk/contrib/platform/iu/odin/static.conf   Wed Nov 13 19:34:15 
>> 2013(r29702)
>> +++ 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Joshua Ladd
Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in 
1.7.

Per Ralph's comment to me last night:

"... you cannot use the oob connection manager. It doesn't work and was 
deprecated. You must use udcm, which is why things are supposed to be set to do 
so by default. Please check the openib connect priorities and correct them if 
necessary."
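
One way to act on the "check the openib connect priorities" advice is to scan MCA parameter files for a btl_openib_cpc_include setting that still names a CPC removed in r29703. The following is only a minimal sketch: stale_cpcs and REMOVED_CPCS are hypothetical names, not part of Open MPI, and the parsing assumes simple "key = value" lines with "#" comments and a comma-separated CPC list.

```python
# Hypothetical helper (not part of Open MPI): flag MCA parameter files that
# still request a connection manager removed in r29703.
REMOVED_CPCS = {"oob", "xoob"}


def stale_cpcs(conf_text):
    """Return removed CPC names requested via btl_openib_cpc_include."""
    stale = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "btl_openib_cpc_include":
            # Assumption: the value is a comma-separated list of CPC names.
            stale += [name.strip() for name in value.split(",")
                      if name.strip() in REMOVED_CPCS]
    return stale


sample = """\
## Setup OpenIB
btl_openib_want_fork_support = 0
btl_openib_cpc_include = oob
"""
print(stale_cpcs(sample))  # prints ['oob']
```

A platform file that yields a non-empty list would need that line dropped or pointed at a CPC that is actually built.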

However, it's never been enabled in 1.7 - don't know what "borked" means, and 
from what Devendar tells me, several UDCM commits that are in the trunk have 
not been pushed over to 1.7:

So, as of this moment, OpenIB BTL is essentially dead-in-the-water in 1.7.



[enable_connectx_xrc="$enableval"], [enable_connectx_xrc="yes"])
#
# Unconnect Datagram (UD) based connection manager
#
#AC_ARG_ENABLE([openib-udcm],
#[AC_HELP_STRING([--enable-openib-udcm],
#[Enable datagram connection support in openib BTL (default: enabled)])],
#[enable_openib_udcm="$enableval"], [enable_openib_udcm="yes"])
# Per discussion with Ralph and Nathan, disable UDCM for now.
# It's borked and needs some surgery to get back on its feet.
enable_openib_udcm=no


Josh


-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Thursday, November 14, 2013 6:44 AM
To: 
Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: 
contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

Does the openib *only* work with RDMACM now?

That's surprising (and bad!).

Did someone ask Mellanox about fixing the OOB and XOOB CPCs?


On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
> New Revision: 29703
> URL: https://svn.open-mpi.org/trac/ompi/changeset/29703
> 
> Log:
> Given that the oob and xoob cpc's are no longer operable and haven't been 
> since the OOB update, remove them to avoid confusion
> 
> cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib
> 
> Deleted:
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
> Text files modified:
>   trunk/contrib/platform/iu/odin/optimized.conf               |     1
>   trunk/contrib/platform/iu/odin/static.conf                  |     1
>   trunk/ompi/mca/btl/openib/Makefile.am                       |    10
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |    14
>   /dev/null                                                   |   975
>   /dev/null                                                   |    18
>   /dev/null                                                   |  1150
>   /dev/null                                                   |    19
>   8 files changed, 5 insertions(+), 2183 deletions(-)
> 
> Modified: trunk/contrib/platform/iu/odin/optimized.conf
> ==
> --- trunk/contrib/platform/iu/odin/optimized.conf Wed Nov 13 19:34:15 
> 2013(r29702)
> +++ trunk/contrib/platform/iu/odin/optimized.conf 2013-11-13 23:16:53 EST 
> (Wed, 13 Nov 2013)  (r29703)
> @@ -80,7 +80,6 @@
> 
> ## Setup OpenIB
> btl_openib_want_fork_support = 0 
> -btl_openib_cpc_include = oob 
> #btl_openib_receive_queues = 
> P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32 
> 
> ## Setup TCP
> 
> Modified: trunk/contrib/platform/iu/odin/static.conf
> ==
> --- trunk/contrib/platform/iu/odin/static.confWed Nov 13 19:34:15 
> 2013(r29702)
> +++ trunk/contrib/platform/iu/odin/static.conf2013-11-13 23:16:53 EST 
> (Wed, 13 Nov 2013)  (r29703)
> @@ -80,7 +80,6 @@
> 
> ## Setup OpenIB
> btl_openib_want_fork_support = 0 
> -btl_openib_cpc_include = oob 
> #btl_openib_receive_queues = 
> P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32 
> 
> ## Setup TCP
> 
> Modified: trunk/ompi/mca/btl/openib/Makefile.am
> ==
> --- trunk/ompi/mca/btl/openib/Makefile.am Wed Nov 13 19:34:15 2013
> (r29702)
> +++ trunk/ompi/mca/btl/openib/Makefile.am 2013-11-13 23:16:53 EST (Wed, 
> 13 Nov 2013)  (r29703)
> @@ -14,6 +14,7 @@
> # Copyright (c) 2011  

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Jeff Squyres (jsquyres)
Does the openib *only* work with RDMACM now?

That's surprising (and bad!).

Did someone ask Mellanox about fixing the OOB and XOOB CPCs?


On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
> New Revision: 29703
> URL: https://svn.open-mpi.org/trac/ompi/changeset/29703
> 
> Log:
> Given that the oob and xoob cpc's are no longer operable and haven't been 
> since the OOB update, remove them to avoid confusion
> 
> cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib
> 
> Deleted:
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_oob.h
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.h
> Text files modified:
>   trunk/contrib/platform/iu/odin/optimized.conf               |     1
>   trunk/contrib/platform/iu/odin/static.conf                  |     1
>   trunk/ompi/mca/btl/openib/Makefile.am                       |    10
>   trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c |    14
>   /dev/null                                                   |   975
>   /dev/null                                                   |    18
>   /dev/null                                                   |  1150
>   /dev/null                                                   |    19
>   8 files changed, 5 insertions(+), 2183 deletions(-)
> 
> Modified: trunk/contrib/platform/iu/odin/optimized.conf
> ==
> --- trunk/contrib/platform/iu/odin/optimized.conf Wed Nov 13 19:34:15 
> 2013(r29702)
> +++ trunk/contrib/platform/iu/odin/optimized.conf 2013-11-13 23:16:53 EST 
> (Wed, 13 Nov 2013)  (r29703)
> @@ -80,7 +80,6 @@
> 
> ## Setup OpenIB
> btl_openib_want_fork_support = 0 
> -btl_openib_cpc_include = oob 
> #btl_openib_receive_queues = 
> P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32 
> 
> ## Setup TCP
> 
> Modified: trunk/contrib/platform/iu/odin/static.conf
> ==
> --- trunk/contrib/platform/iu/odin/static.confWed Nov 13 19:34:15 
> 2013(r29702)
> +++ trunk/contrib/platform/iu/odin/static.conf2013-11-13 23:16:53 EST 
> (Wed, 13 Nov 2013)  (r29703)
> @@ -80,7 +80,6 @@
> 
> ## Setup OpenIB
> btl_openib_want_fork_support = 0 
> -btl_openib_cpc_include = oob 
> #btl_openib_receive_queues = 
> P,128,256,64,32,32:S,2048,1024,128,32:S,12288,1024,128,32:S,65536,1024,128,32 
> 
> ## Setup TCP
> 
> Modified: trunk/ompi/mca/btl/openib/Makefile.am
> ==
> --- trunk/ompi/mca/btl/openib/Makefile.am Wed Nov 13 19:34:15 2013
> (r29702)
> +++ trunk/ompi/mca/btl/openib/Makefile.am 2013-11-13 23:16:53 EST (Wed, 
> 13 Nov 2013)  (r29703)
> @@ -14,6 +14,7 @@
> # Copyright (c) 2011  NVIDIA Corporation.  All rights reserved.
> # Copyright (c) 2011  Mellanox Technologies.  All rights reserved.
> # Copyright (c) 2012  Oak Ridge National Laboratory.  All rights reserved
> +# Copyright (c) 2013  Intel, Inc. All rights reserved.
> # $COPYRIGHT$
> #
> # Additional copyrights may follow
> @@ -60,8 +61,6 @@
> btl_openib_ip.c \
> connect/base.h \
> connect/btl_openib_connect_base.c \
> -connect/btl_openib_connect_oob.c \
> -connect/btl_openib_connect_oob.h \
> connect/btl_openib_connect_empty.c \
> connect/btl_openib_connect_empty.h \
> connect/connect.h
> @@ -73,13 +72,6 @@
> btl_openib_failover.h
> endif
> 
> -# If we have XRC support, build that CPC
> -if MCA_btl_openib_have_xrc
> -sources += \
> -connect/btl_openib_connect_xoob.c \
> -connect/btl_openib_connect_xoob.h
> -endif
> -
> # If we have rdmacm support, build that CPC
> if MCA_btl_openib_have_rdmacm
> sources += \
> 
> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c
> ==
> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c   Wed Nov 
> 13 19:34:15 2013(r29702)
> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_base.c   
> 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)  (r29703)
> @@ -17,11 +17,7 @@
> #include "btl_openib.h"
> #include "btl_openib_proc.h"
> #include "connect/base.h"
> -#include "connect/btl_openib_connect_oob.h"
> #include "connect/btl_openib_connect_empty.h"
>