Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
On Thu, Jun 07, 2007 at 02:38:51PM -0400, George Bosilca wrote:
> 
> On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote:
> 
> >On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:
> >>) I expect you to revise the patch in order to propose a generic
> >>solution or I'll trigger a vote against the patch. I vote to be
> >>backed out of the trunk as it export way to much knowledge from the
> >>Open IB BTL into the PML layer.
> >The patch solves real problem. If we want to back it out we need to  
> >find
> >another solution. I also didn't like this change too much, but I  
> >thought
> >about other solutions and haven't found something better that what
> >Galen did. If you have something in mind lets discuss it.
> 
> Well, I didn't really investigate this matter, as for all devices  
> where I work this never happens. What really bother me, and which is  
> not as Galen describe it is the following line:
> 
> >frag->base.order = order
> 
> As the value of "order" come from the PML level and the Open IB BTL  
The intention is that most of the time PML will use NO_ORDER, but
sometimes order is important.

> use it without any change, to me it means that some knowledge  
> belonging to the BTL (BTL_OPENIB_LP_QP is clearly Open IB related  
> isn't it?) has ben exported to the PML. You can turn it any way you  
BTL_OPENIB_LP_QP is never used in PML code. It just like some kind of
cookie that is transparent to PML. You don't claim that we expose BTL
internals if BTL setups callback for PML to use, don't you?

> want, this clearly means that the PML is able to give direct orders  
> to the BTL which allow it to put the fragment in the high or low  
> priority queue (which is a VERY Open IB feature).
OpenIB and UDAPL are only two BTLs that can reorder packets currently, so
we have two choses here. We can prohibit from BTL to reorder packets or we
have to provide a way to insure ordering between certain packets. The
former will limit BTL interface IMHO.

> 
> Now, if we create a constant MCA_BTL_HP_QUEUE it might look better.  
> But, again as all others BTLs will completely ignore this, it look  
> like a Open IB feature exported to the PML to me.
> 
Galen currently works on adding arbitrary number of different sized
queues in openib BTL. There will be much more then two queues. This kind
of thing is really internal to BTL.

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman


On Jun 7, 2007, at 12:49 PM, Don Kerr wrote:

It would be difficult for me to attend this afternoon.  Tomorrow is  
much

better for me.



Brian and I are both out tomorrow. I think what we will do is have a  
call today, report back to the group and then if necessary have  
another call on Monday/Tuesday.


- Galen



-DON

George Bosilca wrote:


I'm available this afternoon.

   george.

On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote:



Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:


) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from  
the

Open IB BTL into the PML layer.


The patch solves real problem. If we want to back it out we need to
find
another solution. I also didn't like this change too much, but I
thought
about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to  
send

significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following  
snippet

reflect exactly the same behavior as the original patch.


I didn't try to change the semantic. Just make the code to match  
the

semantic that Galen described.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



- 
---


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Brian Barrett

I'm available this afternoon...

Brian

On Jun 7, 2007, at 12:39 PM, George Bosilca wrote:


I'm available this afternoon.

  george.

On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote:



Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:

) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from the
Open IB BTL into the PML layer.

The patch solves real problem. If we want to back it out we need to
find
another solution. I also didn't like this change too much, but I
thought
about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following  
snippet

reflect exactly the same behavior as the original patch.

I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Don Kerr
It would be difficult for me to attend this afternoon.  Tomorrow is much 
better for me.


-DON

George Bosilca wrote:


I'm available this afternoon.

   george.

On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote:



Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:


) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from the
Open IB BTL into the PML layer.


The patch solves real problem. If we want to back it out we need to
find
another solution. I also didn't like this change too much, but I
thought
about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following snippet
reflect exactly the same behavior as the original patch.


I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
 



Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread George Bosilca

I'm available this afternoon.

  george.

On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote:



Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:

) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from the
Open IB BTL into the PML layer.

The patch solves real problem. If we want to back it out we need to
find
another solution. I also didn't like this change too much, but I
thought
about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following snippet
reflect exactly the same behavior as the original patch.

I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




smime.p7s
Description: S/MIME cryptographic signature


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread George Bosilca


On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:

) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from the
Open IB BTL into the PML layer.
The patch solves real problem. If we want to back it out we need to  
find
another solution. I also didn't like this change too much, but I  
thought

about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.


Well, I didn't really investigate this matter, as for all devices  
where I work this never happens. What really bother me, and which is  
not as Galen describe it is the following line:



frag->base.order = order


As the value of "order" come from the PML level and the Open IB BTL  
use it without any change, to me it means that some knowledge  
belonging to the BTL (BTL_OPENIB_LP_QP is clearly Open IB related  
isn't it?) has ben exported to the PML. You can turn it any way you  
want, this clearly means that the PML is able to give direct orders  
to the BTL which allow it to put the fragment in the high or low  
priority queue (which is a VERY Open IB feature).


Now, if we create a constant MCA_BTL_HP_QUEUE it might look better.  
But, again as all others BTLs will completely ignore this, it look  
like a Open IB feature exported to the PML to me.





As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following snippet
reflect exactly the same behavior as the original patch.

I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.


You didn't change compared with Galen patch. But the same knowledge  
export happens with your patch (for the same reasons as described  
above).




--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




smime.p7s
Description: S/MIME cryptographic signature


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman


Are people available today to discuss this over the phone?

- Galen



On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote:


On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:

) I expect you to revise the patch in order to propose a generic
solution or I'll trigger a vote against the patch. I vote to be
backed out of the trunk as it export way to much knowledge from the
Open IB BTL into the PML layer.
The patch solves real problem. If we want to back it out we need to  
find
another solution. I also didn't like this change too much, but I  
thought

about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.



  george.

PS: With Gleb changes the problem is the same. The following snippet
reflect exactly the same behavior as the original patch.

I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote:
> ) I expect you to revise the patch in order to propose a generic  
> solution or I'll trigger a vote against the patch. I vote to be  
> backed out of the trunk as it export way to much knowledge from the  
> Open IB BTL into the PML layer.
The patch solves real problem. If we want to back it out we need to find
another solution. I also didn't like this change too much, but I thought
about other solutions and haven't found something better that what
Galen did. If you have something in mind lets discuss it.

As a general comment this kind of discussion is why I prefer to send
significant changes as a patch to the list for discussion before
committing.

> 
>   george.
> 
> PS: With Gleb changes the problem is the same. The following snippet  
> reflect exactly the same behavior as the original patch.
I didn't try to change the semantic. Just make the code to match the
semantic that Galen described.

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Galen Shipman


On Jun 7, 2007, at 9:11 AM, George Bosilca wrote:

There is something weird with this change, and the patch reflect  
it. The new argument "order" come from the PML level and might be  
MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or  
BTL_OPENIB_HP_QP (which are definitively Open IB related). Do you  
really intend to let the PML knows about Open IB internal constants ?


No, the PML knows only one thing about the order tag, it is either  
MCA_BTL_NO_ORDER or it is something that the BTL assigns.
The PML has no idea about BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP, to  
the PML it is just an order tag assigned to a fragment by the BTL.


So the semantics are that after a btl_send/put/get an order tag may  
be assigned by the BTL to the descriptor,
This order tag can then be specified to subsequent calls to btl_alloc  
or btl_prepare. The PML has no idea what the value means, other than  
he is requesting a descriptor that will be ordered w.r.t. a  
previously transmitted descriptor.




If it's the case (which seems to be true from the following snippet
if(MCA_BTL_NO_ORDER == order) {
frag->base.order = BTL_OPENIB_LP_QP;
} else {
frag->base.order = order;
}
So I am choosing some ordering to use here because the PML told me he  
doesn't care, what is wrong with this?




) I expect you to revise the patch in order to propose a generic  
solution or I'll trigger a vote against the patch.
This exports no knowledge of the Open IB BTL to the PML layer, the  
PML doesn't know that this is a QP index, he doesn't care! The PML  
simply uses this value (if it wants to) to request ordering with  
subsequent fragments. We use the QP index only as a BTL optimization,  
it could have been anything. So the only new knowledge that the PML  
has is how to request that ordering of fragments be enforced, and the  
BTL doesn't even have to provide this if it doesn't want, that is the  
reason for MCA_BTL_NO_ORDER.



Please describe a use case where this is not a generic solution. Keep  
in mind that MX, TCP, GM all can provide ordering guarantees if they  
wish, in fact for MX you can simply always assign an order tag, say  
the value is 1. MX can then guarantee ordering of all fragments sent  
over the same BTL.



I vote to be backed out of the trunk as it export way to much  
knowledge from the Open IB BTL into the PML layer.


The only other option that I have identified that doesn't push PML  
level protocol into the BTL is to require that BTLs always guarantee  
ordering of fragments sent/put/get over the same BTL.





  george.

PS: With Gleb changes the problem is the same. The following  
snippet reflect exactly the same behavior as the original patch.


Gleb's changes don't change the semantic guarantees that I have  
described above.






frag->base.order = order;
assert(frag->base.order != BTL_OPENIB_HP_QP);

On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote:


Hi Galen,

On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:



With current code this is not the case. Order tag is set during a
fragment
allocation. It seems wrong according to your description. Attached
patch fixes
this. If no specific ordering tag is provided to allocation
function order of
the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/
get order
is set to whatever QP was used for communication. If order is set
before send call
it is used to choose QP.



I do set the order tag during allocation/prepare, but the defined
semantics are that the tag is only valid after send/put/get. We can
set them up any where we wish in the BTL, the PML however cannot  
rely

on anything until after the send/put/get call. So really this is an
issue of semantics versus implementation. The implementation I
believe does conform to the semantics as the upper layer (PML)
doesn't use the tag value until after a call to send/put/get.

I will look over the patch however, might make more sense to delay
setting the value until the actual send/put/get call.


Have you had a chance to look over the patch?

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread George Bosilca
There is something weird with this change, and the patch reflect it.  
The new argument "order" come from the PML level and might be  
MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or  
BTL_OPENIB_HP_QP (which are definitively Open IB related). Do you  
really intend to let the PML knows about Open IB internal constants ?  
If it's the case (which seems to be true from the following snippet

if(MCA_BTL_NO_ORDER == order) {
frag->base.order = BTL_OPENIB_LP_QP;
} else {
frag->base.order = order;
}
) I expect you to revise the patch in order to propose a generic  
solution or I'll trigger a vote against the patch. I vote to be  
backed out of the trunk as it export way to much knowledge from the  
Open IB BTL into the PML layer.


  george.

PS: With Gleb changes the problem is the same. The following snippet  
reflect exactly the same behavior as the original patch.


frag->base.order = order;
assert(frag->base.order != BTL_OPENIB_HP_QP);

On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote:


Hi Galen,

On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:



With current code this is not the case. Order tag is set during a
fragment
allocation. It seems wrong according to your description. Attached
patch fixes
this. If no specific ordering tag is provided to allocation
function order of
the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/
get order
is set to whatever QP was used for communication. If order is set
before send call
it is used to choose QP.



I do set the order tag during allocation/prepare, but the defined
semantics are that the tag is only valid after send/put/get. We can
set them up any where we wish in the BTL, the PML however cannot rely
on anything until after the send/put/get call. So really this is an
issue of semantics versus implementation. The implementation I
believe does conform to the semantics as the upper layer (PML)
doesn't use the tag value until after a call to send/put/get.

I will look over the patch however, might make more sense to delay
setting the value until the actual send/put/get call.


Have you had a chance to look over the patch?

--
Gleb.
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




smime.p7s
Description: S/MIME cryptographic signature


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
Hi Galen,

On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:
> 
> > With current code this is not the case. Order tag is set during a  
> > fragment
> > allocation. It seems wrong according to your description. Attached  
> > patch fixes
> > this. If no specific ordering tag is provided to allocation  
> > function order of
> > the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/ 
> > get order
> > is set to whatever QP was used for communication. If order is set  
> > before send call
> > it is used to choose QP.
> >
> 
> I do set the order tag during allocation/prepare, but the defined  
> semantics are that the tag is only valid after send/put/get. We can  
> set them up any where we wish in the BTL, the PML however cannot rely  
> on anything until after the send/put/get call. So really this is an  
> issue of semantics versus implementation. The implementation I  
> believe does conform to the semantics as the upper layer (PML)  
> doesn't use the tag value until after a call to send/put/get.
> 
> I will look over the patch however, might make more sense to delay  
> setting the value until the actual send/put/get call.
> 
Have you had a chance to look over the patch?

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Gleb Natapov
On Sun, May 27, 2007 at 10:23:23AM -0600, Galen Shipman wrote:
> >>
> >>
> >> The problem is that MCA_BTL_DES_FLAGS_PRIORITY  was meant to indicate
> >> that the fragment was higher priority, but the fragment isn't higher
> >> priority. It simply needs to be ordered w.r.t. a previous fragment,
> >> an RDMA in this case.
> > But after the change priority flags is totally ignored.
> >
> 
> 
> So the priority flag was broken I think, in OpenIB we used the  
> priority flag to choose a QP on which to send the fragment, but there  
> was no checking for if the fragment could be sent on the specified  
> QP. So a max send size fragment could be set as priority and the BTL  
> would use the high priority QP and we would go bang. This is how I  
> read the code, I might have missed something.
You a right. We kinda rely on PML here. PML always send "max send" sized
packets with low priority.

> 
> If the priority flag is simply a "hint" and has hard requirements  
> than we can still use it in the OpenIB BTL but it won't have any  
> effect as only eager size fragments can be marked high priority and  
> we already send these over the high priority QP.

Sometimes prepare_src may return eager sized fragment and it will be
sent on low priority QP.

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Gleb Natapov
On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:
> 
> > With current code this is not the case. Order tag is set during a  
> > fragment
> > allocation. It seems wrong according to your description. Attached  
> > patch fixes
> > this. If no specific ordering tag is provided to allocation  
> > function order of
> > the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/ 
> > get order
> > is set to whatever QP was used for communication. If order is set  
> > before send call
> > it is used to choose QP.
> >
> 
> I do set the order tag during allocation/prepare, but the defined  
> semantics are that the tag is only valid after send/put/get. We can  
> set them up any where we wish in the BTL, the PML however cannot rely  
> on anything until after the send/put/get call. So really this is an  
> issue of semantics versus implementation. The implementation I  
> believe does conform to the semantics as the upper layer (PML)  
> doesn't use the tag value until after a call to send/put/get.
>
With current approach send function doesn't know the difference between
fragment that should be ordered because PML requested it and regular
fragment without ordering requirements.

--
Gleb.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Galen Shipman

doh,, correction..

On May 27, 2007, at 10:23 AM, Galen Shipman wrote:




The problem is that MCA_BTL_DES_FLAGS_PRIORITY  was meant to  
indicate

that the fragment was higher priority, but the fragment isn't higher
priority. It simply needs to be ordered w.r.t. a previous fragment,
an RDMA in this case.

But after the change priority flags is totally ignored.




So the priority flag was broken I think, in OpenIB we used the
priority flag to choose a QP on which to send the fragment, but there
was no checking for if the fragment could be sent on the specified
QP. So a max send size fragment could be set as priority and the BTL
would use the high priority QP and we would go bang. This is how I
read the code, I might have missed something.

If the priority flag is simply a "hint" and has

*NO*

hard requirements
than we can still use it in the OpenIB BTL but it won't have any
effect as only eager size fragments can be marked high priority and
we already send these over the high priority QP.

- Galen






--
Gleb.

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Galen Shipman



The problem is that MCA_BTL_DES_FLAGS_PRIORITY  was meant to indicate
that the fragment was higher priority, but the fragment isn't higher
priority. It simply needs to be ordered w.r.t. a previous fragment,
an RDMA in this case.

But after the change priority flags is totally ignored.




So the priority flag was broken I think, in OpenIB we used the  
priority flag to choose a QP on which to send the fragment, but there  
was no checking for if the fragment could be sent on the specified  
QP. So a max send size fragment could be set as priority and the BTL  
would use the high priority QP and we would go bang. This is how I  
read the code, I might have missed something.


If the priority flag is simply a "hint" and has hard requirements  
than we can still use it in the OpenIB BTL but it won't have any  
effect as only eager size fragments can be marked high priority and  
we already send these over the high priority QP.


- Galen






--
Gleb.

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Gleb Natapov
On Fri, May 25, 2007 at 09:31:33PM -0600, Galen Shipman wrote:
> 
> On May 24, 2007, at 2:48 PM, George Bosilca wrote:
> 
> > I see the problem this patch try to solve, but I fail to correctly  
> > understand the implementation. The patch affect all PML and BTL in  
> > the code base by adding one more argument to some of the most often  
> > called functions. And there is only one BTL (openib) who seems to  
> > use it while all others completely ignore it. Moreover, there seems  
> > to be already a very similar mechanism based on the  
> > MCA_BTL_DES_FLAGS_PRIORITY flag, which can be set by the PML level  
> > into the btl_descriptor.
> >
> > So what's the difference between the additional argument and a  
> > correct usage of the MCA_BTL_DES_FLAGS_PRIORITY flag ?
> 
> 
> The problem is that MCA_BTL_DES_FLAGS_PRIORITY  was meant to indicate  
> that the fragment was higher priority, but the fragment isn't higher  
> priority. It simply needs to be ordered w.r.t. a previous fragment,  
> an RDMA in this case.
But after the change priority flags is totally ignored.

> This being said, we could have just added an rdma fin flag, but this  
> would mix protocol a bit too much between the BTL and the PML in my  
> opinion.
> What we have with this fix is that the BTL can assign an order tag to  
> any descriptor if it wishes, this order tag is only valid after a  
> call to btl_send or btl_put/get. This order tag can then be used to  
With current code this is not the case. Order tag is set during a fragment
allocation. It seems wrong according to your description. Attached patch fixes
this. If no specific ordering tag is provided to allocation function order of
the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/get order
is set to whatever QP was used for communication. If order is set before send 
call
it is used to choose QP.

--
Gleb.
diff --git a/ompi/mca/btl/openib/btl_openib.c b/ompi/mca/btl/openib/btl_openib.c
index 2330dc0..fadb2bd 100644
--- a/ompi/mca/btl/openib/btl_openib.c
+++ b/ompi/mca/btl/openib/btl_openib.c
@@ -377,16 +377,8 @@ mca_btl_base_descriptor_t* mca_btl_openib_alloc(

 if(size <= mca_btl_openib_component.eager_limit){ 
 MCA_BTL_IB_FRAG_ALLOC_EAGER(btl, frag, rc);
-if(order == MCA_BTL_NO_ORDER) {
-order = BTL_OPENIB_HP_QP;
-} 
 } else if(size <= mca_btl_openib_component.max_send_size) { 
-if(order == MCA_BTL_NO_ORDER) { 
-order = BTL_OPENIB_LP_QP; 
-} else if(order != BTL_OPENIB_LP_QP) { 
-return NULL;
-}
-
+assert(order != BTL_OPENIB_HP_QP);
 MCA_BTL_IB_FRAG_ALLOC_MAX(btl, frag, rc); 
 }

@@ -502,6 +494,7 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src(
 frag->base.des_dst = NULL;
 frag->base.des_dst_cnt = 0;
 frag->base.des_flags = 0;
+frag->base.order = order;

 frag->sg_entry.length = max_data;
 frag->sg_entry.lkey = openib_reg->mr->lkey;
@@ -511,12 +504,6 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src(
 frag->segment.seg_addr.pval = iov.iov_base;
 frag->segment.seg_key.key32[0] = (uint32_t)frag->sg_entry.lkey;

-if(MCA_BTL_NO_ORDER == order) { 
-frag->base.order = BTL_OPENIB_LP_QP;
-} else { 
-frag->base.order = order;
-}
-
 BTL_VERBOSE(("frag->sg_entry.lkey = %lu .addr = %llu "
 "frag->segment.seg_key.key32[0] = %lu",
 frag->sg_entry.lkey, frag->sg_entry.addr,
@@ -529,21 +516,13 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src(
 if(max_data + reserve <= btl->btl_eager_limit) {
 /* the data is small enough to fit in the eager frag and
  * memory is not prepinned */
-
 MCA_BTL_IB_FRAG_ALLOC_EAGER(btl, frag, rc);
-if(MCA_BTL_NO_ORDER == order) { 
-order = BTL_OPENIB_LP_QP;
-}
 }

 if(NULL == frag) {
 /* the data doesn't fit into eager frag or eager frag is
  * not available */
-if(MCA_BTL_NO_ORDER == order) { 
-order = BTL_OPENIB_LP_QP;
-} else if(BTL_OPENIB_HP_QP == order){ 
-return NULL;
-} 
+assert(order != BTL_OPENIB_HP_QP);
 MCA_BTL_IB_FRAG_ALLOC_MAX(btl, frag, rc);

 if(NULL == frag) {
@@ -571,8 +550,6 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src(
 frag->base.des_dst_cnt = 0;
 frag->base.des_flags = 0;

-
-
 return >base;
 }

@@ -641,12 +618,7 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_dst(
 frag->base.des_src_cnt = 0;
 frag->base.des_flags = 0;

-if(MCA_BTL_NO_ORDER == order) { 
-frag->base.order = BTL_OPENIB_LP_QP;
-} else { 
-frag->base.order = order;
-}
-
+frag->base.order = order;

 BTL_VERBOSE(("frag->sg_entry.lkey 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-24 Thread George Bosilca
I see the problem this patch try to solve, but I fail to correctly  
understand the implementation. The patch affect all PML and BTL in  
the code base by adding one more argument to some of the most often  
called functions. And there is only one BTL (openib) who seems to use  
it while all others completely ignore it. Moreover, there seems to be  
already a very similar mechanism based on the  
MCA_BTL_DES_FLAGS_PRIORITY flag, which can be set by the PML level  
into the btl_descriptor.


So what's the difference between the additional argument and a  
correct usage of the MCA_BTL_DES_FLAGS_PRIORITY flag ?


  george.

On May 24, 2007, at 3:51 PM, gship...@osl.iu.edu wrote:


Author: gshipman
Date: 2007-05-24 15:51:26 EDT (Thu, 24 May 2007)
New Revision: 14768
URL: https://svn.open-mpi.org/trac/ompi/changeset/14768

Log:
Add optional ordering to the BTL interface.
This is required to tighten up the BTL semantics. Ordering is not  
guaranteed,

but, if the BTL returns a order tag in a descriptor (other than
MCA_BTL_NO_ORDER) then we may request another descriptor that will  
obey

ordering w.r.t. to the other descriptor.


This will allow sane behavior for RDMA networks, where local  
completion of an
RDMA operation on the active side does not imply remote completion  
on the
passive side. If we send a FIN message after local completion and  
the FIN is
not ordered w.r.t. the RDMA operation then badness may occur as the  
passive
side may now try to deregister the memory and the RDMA operation  
may still be

pending on the passive side.

Note that this has no impact on networks that don't suffer from this
limitation as the ORDER tag can simply always be specified as
MCA_BTL_NO_ORDER.





Text files modified:
   trunk/ompi/mca/bml/bml.h |29  
+++
   trunk/ompi/mca/btl/btl.h |10  


   trunk/ompi/mca/btl/gm/btl_gm.c   | 8 ++
   trunk/ompi/mca/btl/gm/btl_gm.h   | 3 ++
   trunk/ompi/mca/btl/mx/btl_mx.c   | 8 ++
   trunk/ompi/mca/btl/mx/btl_mx.h   | 3 ++
   trunk/ompi/mca/btl/openib/btl_openib.c   |49  
++-

   trunk/ompi/mca/btl/openib/btl_openib.h   | 3 ++
   trunk/ompi/mca/btl/openib/btl_openib_endpoint.c  | 7 +++--
   trunk/ompi/mca/btl/openib/btl_openib_frag.c  | 7 +
   trunk/ompi/mca/btl/portals/btl_portals.c | 8 -
   trunk/ompi/mca/btl/portals/btl_portals.h | 3 ++
   trunk/ompi/mca/btl/self/btl_self.c   | 3 ++
   trunk/ompi/mca/btl/self/btl_self.h   | 3 ++
   trunk/ompi/mca/btl/sm/btl_sm.c   | 2 +
   trunk/ompi/mca/btl/sm/btl_sm.h   | 2 +
   trunk/ompi/mca/btl/tcp/btl_tcp.c | 6 
   trunk/ompi/mca/btl/tcp/btl_tcp.h | 3 ++
   trunk/ompi/mca/btl/template/btl_template.c   | 8 -
   trunk/ompi/mca/btl/template/btl_template.h   | 3 ++
   trunk/ompi/mca/btl/template/btl_template_component.c |10 +++ 
+---
   trunk/ompi/mca/btl/udapl/btl_udapl.c |11  
++--

   trunk/ompi/mca/btl/udapl/btl_udapl.h | 3 ++
   trunk/ompi/mca/btl/udapl/btl_udapl_component.c   |17  


   trunk/ompi/mca/osc/rdma/osc_rdma_data_move.c | 3 ++
   trunk/ompi/mca/pml/dr/pml_dr.h   | 6 ++--
   trunk/ompi/mca/pml/dr/pml_dr_sendreq.c   |12  
-

   trunk/ompi/mca/pml/dr/pml_dr_sendreq.h   | 3 +
   trunk/ompi/mca/pml/ob1/pml_ob1.c |17  
-
   trunk/ompi/mca/pml/ob1/pml_ob1.h |44  
+--
   trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c |14  
+++
   trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c |28  
--

   32 files changed, 241 insertions(+), 95 deletions(-)


Diff not shown due to size (53504 bytes).
To see the diff, run the following command:

svn diff -r 14767:14768 --no-diff-deleted

___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn




smime.p7s
Description: S/MIME cryptographic signature