The proof of the pudding is that all of the MPI layer has been adapted to the
new async behavior -except- for the openib cpc's. The issue of what to do with
these has been raised several times, especially once the ofacm code was
committed. Unfortunately, lack of time and priorities left this
On Nov 14, 2013, at 1:16 PM, Shamis, Pavel wrote:
>> 1. Ralph made the OOB asynchronous.
I pondered this for a while today, and I just want to correct any misimpression
this statement might leave, especially with folks who haven't been around the
project that much over the
Thanks Pasha!!
On Nov 14, 2013, at 4:34 PM, Shamis, Pavel wrote:
> For Iboffload this should not be an issue since our connection manager is
> blocking (I have to double-check)
>
> For openib, this should not be such a huge change. The code is pretty much
> standalone, we
For Iboffload this should not be an issue since our connection manager is
blocking (I have to double-check)
For openib, this should not be such a huge change. The code is pretty much
standalone; we only have to move it to the main thread and add a signaling
mechanism.
I will take a look.
Best,
On Nov 14, 2013, at 4:22 PM, Shamis, Pavel wrote:
> Well, this is a major change in behavior.
>
> Since openib makes communication calls from the callback,
> it pretty much requires enabling thread safety at the openib BTL level.
Ah, yes - could well be true. Or else separate
Well, this is a major change in behavior.
Since openib makes communication calls from the callback,
it pretty much requires enabling thread safety at the openib BTL level.
But we may move the queue flush operation from the callback to the main
thread, so the progress engine will wait on a signal from
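A minimal sketch of the handoff idea described above, assuming exactly one event thread and one main thread: the callback only records which peer's message arrived, and the main thread's progress loop later picks it up and does the actual flush. All names here (handoff_ring_t, handoff_push, handoff_pop) are hypothetical and are not Open MPI or openib BTL API. A real fix would also have to decide how the progress engine is woken up, which this sketch leaves out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is Open MPI API. */
#define HANDOFF_SLOTS 8   /* ring size; one slot is always kept empty */

typedef struct {
    int peer;             /* rank whose connection message arrived */
} handoff_msg_t;

typedef struct {
    handoff_msg_t slots[HANDOFF_SLOTS];
    atomic_size_t head;   /* next slot to read; written only by the main thread */
    atomic_size_t tail;   /* next slot to write; written only by the event thread */
} handoff_ring_t;

/* Event-thread side: record the message and return immediately.
 * No communication calls are made here.
 * Returns 0 on success, -1 if the ring is full. */
static int handoff_push(handoff_ring_t *r, handoff_msg_t msg)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t next = (tail + 1) % HANDOFF_SLOTS;

    if (next == atomic_load_explicit(&r->head, memory_order_acquire)) {
        return -1;
    }
    r->slots[tail] = msg;
    /* release: the slot contents become visible before the new tail */
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return 0;
}

/* Main-thread side: called from the progress loop.
 * Returns 1 and fills *out if a message was pending, 0 otherwise.
 * The caller would do the queue flush for out->peer. */
static int handoff_pop(handoff_ring_t *r, handoff_msg_t *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

    if (head == atomic_load_explicit(&r->tail, memory_order_acquire)) {
        return 0;
    }
    *out = r->slots[head];
    atomic_store_explicit(&r->head, (head + 1) % HANDOFF_SLOTS,
                          memory_order_release);
    return 1;
}
```

With this split, the communication calls stay on the main thread, so the handoff itself does not require making the whole BTL thread safe. The ring is only correct for a single producer and a single consumer.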
On Nov 14, 2013, at 3:33 PM, Shamis, Pavel wrote:
>
>> The only change is that the receive callback is now occurring in the ORTE
>> event thread, and so perhaps someone needs to look at a way to pass that
>> back into the OMPI event base (which I guess is the OPAL event
> The only change is that the receive callback is now occurring in the ORTE
> event thread, and so perhaps someone needs to look at a way to pass that back
> into the OMPI event base (which I guess is the OPAL event base)? Just
> glancing at the code, it looks like that could be the issue -
On Nov 14, 2013, at 3:07 PM, Shamis, Pavel wrote:
>
>> So far as I can tell, the issue is one of blocking. The OOB handshake is now
>> async - i.e., you post a non-blocking recv at the beginning of time, and
>> then do a non-blocking send to the other side when you want to
> So far as I can tell, the issue is one of blocking. The OOB handshake is now
> async - i.e., you post a non-blocking recv at the beginning of time, and then
> do a non-blocking send to the other side when you want to create a
> connection. The question is: how do you know when that
So far as I can tell, the issue is one of blocking. The OOB handshake is now
async - i.e., you post a non-blocking recv at the beginning of time, and then
do a non-blocking send to the other side when you want to create a connection.
The question is: how do you know when that connection is
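One way to picture the question above, as a simplified sketch only: treat the arrival of the peer's connection data (delivered by the pre-posted non-blocking recv) as the completion event, and queue outgoing fragments until then. The names below (endpoint_t, ep_send, ep_on_peer_info) are hypothetical, the "wire" is just an array, and the real openib handshake has more steps than this single transition.

```c
#include <stddef.h>

/* Hypothetical sketch; names are illustrative, not Open MPI API. */
typedef enum { EP_CLOSED, EP_CONNECTING, EP_CONNECTED } ep_state_t;

#define EP_MAX_PENDING 16
#define EP_MAX_SENT    64

typedef struct {
    ep_state_t state;
    int pending[EP_MAX_PENDING]; /* fragments queued while connecting */
    size_t num_pending;
    int sent[EP_MAX_SENT];       /* stand-in for the wire */
    size_t num_sent;
} endpoint_t;

static void ep_init(endpoint_t *ep)
{
    ep->state = EP_CLOSED;
    ep->num_pending = 0;
    ep->num_sent = 0;
}

/* Send path: never blocks. The first send starts the handshake, and
 * fragments are queued until the connection is known to be up.
 * Returns 0 on success, -1 if the relevant queue is full. */
static int ep_send(endpoint_t *ep, int frag)
{
    if (ep->state == EP_CONNECTED) {
        if (ep->num_sent == EP_MAX_SENT) {
            return -1;
        }
        ep->sent[ep->num_sent++] = frag;
        return 0;
    }
    if (ep->num_pending == EP_MAX_PENDING) {
        return -1;
    }
    if (ep->state == EP_CLOSED) {
        /* The non-blocking OOB send of the local connection data
         * would be issued here. */
        ep->state = EP_CONNECTING;
    }
    ep->pending[ep->num_pending++] = frag;
    return 0;
}

/* Callback for the pre-posted non-blocking recv: the peer's connection
 * data arrived, so mark the endpoint connected and flush the queue. */
static void ep_on_peer_info(endpoint_t *ep)
{
    size_t i;

    ep->state = EP_CONNECTED;
    for (i = 0; i < ep->num_pending && ep->num_sent < EP_MAX_SENT; i++) {
        ep->sent[ep->num_sent++] = ep->pending[i];
    }
    ep->num_pending = 0;
}
```

In this sketch nothing ever waits: the sender learns that the connection is complete only because the callback fires. Which thread that callback runs on is left open here.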
>
> 1. Ralph made the OOB asynchronous.
>
Ralph,
I'm not familiar with the details of the change. If out-of-band communication
is supported, it should not be that huge a change for XOOB/OOB.
Ah, thanks! Yes indeed
On Nov 14, 2013, at 1:05 PM, Thomas Naughton wrote:
> Hi Ralph,
>
> Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be
> increased? It currently shows 1.11.
>
> Thanks,
> --tjn
>
>
Comments inline.
>
> 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in
> that move. Never changed openib to use ofacm/common.
Pasha: This is not entirely true. I changed the openib btl ~3 years ago, before
my departure from Mellanox. (I sent a link to the code earlier.)
On Nov 14, 2013, at 12:42 PM, Joshua Ladd wrote:
> We are happy to provide access to our set of small test clusters and
> engineering resources, but, honestly, Nathan/LANL guys probably have better
> access to a big IB system.
>
> I’m sure your boss could care less,
Hi Ralph,
Does the version in AM_INIT_AUTOMAKE in configure.ac also need to be
increased? It currently shows 1.11.
Thanks,
--tjn
Thomas Naughton naught...@ornl.gov
Research
Ha! Jeff points out that our web site says we are at AM 1.12.2 - yet our
HACKING file says 1.11.1. Sadness
I'll leave the romio update alone and update the HACKING file to avoid future
confusion
On Nov 14, 2013, at 12:41 PM, Ralph Castain wrote:
> Just in case others are
We are happy to provide access to our set of small test clusters and
engineering resources, but, honestly, Nathan/LANL guys probably have better
access to a big IB system.
I'm sure your boss could care less, but this is not Intel's code base. Sorry to
be so blunt about it, Ralph. We've
Just in case others are encountering this: the recent ROMIO update contains a
line in its configure.ac that breaks the trunk for automake versions less than
1.12:
"I've looked a bit around online for this, and the consensus generally seems to
be that AM_PROG_AR should be added in libtool, not
On Nov 14, 2013, at 12:21 PM, Barrett, Brian W wrote:
> On 11/14/13 1:13 PM, "Joshua Ladd" wrote:
>
>> Let me try to summarize my understanding of the situation:
>>
>> 1. Ralph made the OOB asynchronous.
>>
>> 2. OOB cpcs don't work as a result of
On 11/14/13 1:13 PM, "Joshua Ladd" wrote:
>Let me try to summarize my understanding of the situation:
>
>1. Ralph made the OOB asynchronous.
>
>2. OOB cpcs don't work as a result of 1, and are thereby "deprecated",
>meaning: won't fix.
>
>3. Pasha moved the openib/connect
Let me try to summarize my understanding of the situation:
1. Ralph made the OOB asynchronous.
2. OOB cpcs don't work as a result of 1, and are thereby "deprecated", meaning:
won't fix.
3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in
that move. Never changed
I'm a bit outdated. What is the problem with oob / xoob?
-Pasha
On Nov 14, 2013, at 3:07 PM, "Hjelm, Nathan T" wrote:
> I don't think so. From what I understand the iboffload component may not live
> much longer because of
> Mellanox's fork of Cheetah. So, it might not
I don't think so. From what I understand the iboffload component may not live
much longer because of Mellanox's fork of Cheetah. So, it might not matter.
-Nathan
Excuse the *&(#$y Outlook posting-style. OWA sucks.
From: devel [devel-boun...@open-mpi.org]
The key question, though, is: has anyone checked to see if the ofacm code even
works any more??
Only oob and xoob components appear to be present - so unless someone fixed
those since they were originally copied from openib, I doubt ofacm works.
On Nov 14, 2013, at 11:08 AM, Shamis, Pavel
There is some confusion in the thread. UDCM is just another CPC, like XOOB,
OOB, and RDMACM (I think IBCM is officially dead).
XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.
OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
OFACM supports (at least last time when we
Here's the core problem: it isn't a question of "if" some of these things
should be resolved, but "who". They've been around for a very long time, but
nobody has the time or will to fix them. I have no access to such machines, so
all I can do is verify that it sorta compiles and is consistent
Unless someone went in and "fixed" the code in common (judging by the comments,
fixed seems to imply porting (x)oob to use UDCM, which hasn't been done at all
in the context of xoob and is incompletely patched and remains unusable as a
replacement for oob in 1.7.4), there is no reason to
On Nov 14, 2013, at 1:03 PM, Ralph Castain wrote:
>> 1) What the status of UDCM is (does it work reliably, does it support
>> XRC, etc.)
>
> Seems to be working okay on the IB systems at LANL and IU. Don't know about
> XRC - I seem to recall the answer is "no"
FWIW, I
On Nov 14, 2013, at 9:33 AM, Barrett, Brian W wrote:
> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" wrote:
>
>> Does XRC work with the UDCM CPC?
>>
>>
>> On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote:
>>
>>> I think the
On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" wrote:
>Does XRC work with the UDCM CPC?
>
>
>On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote:
>
>> I think the problems in udcm were fixed by Nathan quite some time ago,
>>but never moved to 1.7 as everyone
Yeah, I believe that is true as well. However, we have been bugging people to
fix this for a long time, and nobody appears to have the cycles to do so.
As a reminder: we have to remove all OOB dependence on connections in the BTL
as we are moving the BTLs to OPAL. Hence, there is no interest in
I see comments alluding to the fact that it does not, but with the intention
of adding it. Hopefully, xoob in common will work.
#if HAVE_XRC
    if (mca_btl_openib_component.num_xrc_qps > 0) {
        BTL_VERBOSE(("UD CPC does not support XRC QPs (yet)"));
        break;
    }
#endif
    for (i = 0
When I looked at the code last time - no.
(The connection state machine is very different)
Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Nov 14, 2013, at 11:51 AM, Jeff Squyres (jsquyres)
Does XRC work with the UDCM CPC?
On Nov 14, 2013, at 9:35 AM, Ralph Castain wrote:
> I think the problems in udcm were fixed by Nathan quite some time ago, but
> never moved to 1.7 as everyone was told that the connect code in openib was
> already deprecated pending merge
I think the problems in udcm were fixed by Nathan quite some time ago, but
never moved to 1.7 as everyone was told that the connect code in openib was
already deprecated pending merge with the new ofacm common code. Looking over
at that area, I see only oob and xoob - so if the users of the
Um, no. It's supposed to work with UDCM which doesn't appear to be enabled in
1.7.
Per Ralph's comment to me last night:
"... you cannot use the oob connection manager. It doesn't work and was
deprecated. You must use udcm, which is why things are supposed to be set to do
so by default.
Does the openib *only* work with RDMACM now?
That's surprising (and bad!).
Did someone ask Mellanox about fixing the OOB and XOOB CPCs?
On Nov 13, 2013, at 11:16 PM, svn-commit-mai...@open-mpi.org wrote:
> Author: rhc (Ralph Castain)
> Date: 2013-11-13 23:16:53 EST (Wed, 13 Nov 2013)
> New