Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Thu, Jun 07, 2007 at 02:38:51PM -0400, George Bosilca wrote: > > On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote: > > >On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: > >>) I expect you to revise the patch in order to propose a generic > >>solution or I'll trigger a vote against the patch. I vote to be > >>backed out of the trunk as it export way to much knowledge from the > >>Open IB BTL into the PML layer. > >The patch solves real problem. If we want to back it out we need to > >find > >another solution. I also didn't like this change too much, but I > >thought > >about other solutions and haven't found something better that what > >Galen did. If you have something in mind lets discuss it. > > Well, I didn't really investigate this matter, as for all devices > where I work this never happens. What really bother me, and which is > not as Galen describe it is the following line: > > >frag->base.order = order > > As the value of "order" come from the PML level and the Open IB BTL The intention is that most of the time PML will use NO_ORDER, but sometimes order is important. > use it without any change, to me it means that some knowledge > belonging to the BTL (BTL_OPENIB_LP_QP is clearly Open IB related > isn't it?) has ben exported to the PML. You can turn it any way you BTL_OPENIB_LP_QP is never used in PML code. It just like some kind of cookie that is transparent to PML. You don't claim that we expose BTL internals if BTL setups callback for PML to use, don't you? > want, this clearly means that the PML is able to give direct orders > to the BTL which allow it to put the fragment in the high or low > priority queue (which is a VERY Open IB feature). OpenIB and UDAPL are only two BTLs that can reorder packets currently, so we have two choses here. We can prohibit from BTL to reorder packets or we have to provide a way to insure ordering between certain packets. The former will limit BTL interface IMHO. > > Now, if we create a constant MCA_BTL_HP_QUEUE it might look better. > But, again as all others BTLs will completely ignore this, it look > like a Open IB feature exported to the PML to me. > Galen currently works on adding arbitrary number of different sized queues in openib BTL. There will be much more then two queues. This kind of thing is really internal to BTL. -- Gleb.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Jun 7, 2007, at 12:49 PM, Don Kerr wrote: It would be difficult for me to attend this afternoon. Tomorrow is much better for me. Brian and I are both out tomorrow. I think what we will do is have a call today, report back to the group and then if necessary have another call on Monday/Tuesday. - Galen -DON George Bosilca wrote: I'm available this afternoon. george. On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote: Are people available today to discuss this over the phone? - Galen On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote: On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel - --- ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
I'm available this afternoon... Brian On Jun 7, 2007, at 12:39 PM, George Bosilca wrote: I'm available this afternoon. george. On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote: Are people available today to discuss this over the phone? - Galen On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote: On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
It would be difficult for me to attend this afternoon. Tomorrow is much better for me. -DON George Bosilca wrote: I'm available this afternoon. george. On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote: Are people available today to discuss this over the phone? - Galen On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote: On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
I'm available this afternoon. george. On Jun 7, 2007, at 2:35 PM, Galen Shipman wrote: Are people available today to discuss this over the phone? - Galen On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote: On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel smime.p7s Description: S/MIME cryptographic signature
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote: On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. Well, I didn't really investigate this matter, as for all devices where I work this never happens. What really bother me, and which is not as Galen describe it is the following line: frag->base.order = order As the value of "order" come from the PML level and the Open IB BTL use it without any change, to me it means that some knowledge belonging to the BTL (BTL_OPENIB_LP_QP is clearly Open IB related isn't it?) has ben exported to the PML. You can turn it any way you want, this clearly means that the PML is able to give direct orders to the BTL which allow it to put the fragment in the high or low priority queue (which is a VERY Open IB feature). Now, if we create a constant MCA_BTL_HP_QUEUE it might look better. But, again as all others BTLs will completely ignore this, it look like a Open IB feature exported to the PML to me. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. You didn't change compared with Galen patch. But the same knowledge export happens with your patch (for the same reasons as described above). -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel smime.p7s Description: S/MIME cryptographic signature
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
Are people available today to discuss this over the phone? - Galen On Jun 7, 2007, at 11:28 AM, Gleb Natapov wrote: On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: > ) I expect you to revise the patch in order to propose a generic > solution or I'll trigger a vote against the patch. I vote to be > backed out of the trunk as it export way to much knowledge from the > Open IB BTL into the PML layer. The patch solves real problem. If we want to back it out we need to find another solution. I also didn't like this change too much, but I thought about other solutions and haven't found something better that what Galen did. If you have something in mind lets discuss it. As a general comment this kind of discussion is why I prefer to send significant changes as a patch to the list for discussion before committing. > > george. > > PS: With Gleb changes the problem is the same. The following snippet > reflect exactly the same behavior as the original patch. I didn't try to change the semantic. Just make the code to match the semantic that Galen described. -- Gleb.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Jun 7, 2007, at 9:11 AM, George Bosilca wrote: There is something weird with this change, and the patch reflect it. The new argument "order" come from the PML level and might be MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP (which are definitively Open IB related). Do you really intend to let the PML knows about Open IB internal constants ? No, the PML knows only one thing about the order tag, it is either MCA_BTL_NO_ORDER or it is something that the BTL assigns. The PML has no idea about BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP, to the PML it is just an order tag assigned to a fragment by the BTL. So the semantics are that after a btl_send/put/get an order tag may be assigned by the BTL to the descriptor, This order tag can then be specified to subsequent calls to btl_alloc or btl_prepare. The PML has no idea what the value means, other than he is requesting a descriptor that will be ordered w.r.t. a previously transmitted descriptor. If it's the case (which seems to be true from the following snippet if(MCA_BTL_NO_ORDER == order) { frag->base.order = BTL_OPENIB_LP_QP; } else { frag->base.order = order; } So I am choosing some ordering to use here because the PML told me he doesn't care, what is wrong with this? ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. This exports no knowledge of the Open IB BTL to the PML layer, the PML doesn't know that this is a QP index, he doesn't care! The PML simply uses this value (if it wants to) to request ordering with subsequent fragments. We use the QP index only as a BTL optimization, it could have been anything. So the only new knowledge that the PML has is how to request that ordering of fragments be enforced, and the BTL doesn't even have to provide this if it doesn't want, that is the reason for MCA_BTL_NO_ORDER. Please describe a use case where this is not a generic solution. Keep in mind that MX, TCP, GM all can provide ordering guarantees if they wish, in fact for MX you can simply always assign an order tag, say the value is 1. MX can then guarantee ordering of all fragments sent over the same BTL. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. The only other option that I have identified that doesn't push PML level protocol into the BTL is to require that BTLs always guarantee ordering of fragments sent/put/get over the same BTL. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. Gleb's changes don't change the semantic guarantees that I have described above. frag->base.order = order; assert(frag->base.order != BTL_OPENIB_HP_QP); On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote: Hi Galen, On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote: With current code this is not the case. Order tag is set during a fragment allocation. It seems wrong according to your description. Attached patch fixes this. If no specific ordering tag is provided to allocation function order of the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/ get order is set to whatever QP was used for communication. If order is set before send call it is used to choose QP. I do set the order tag during allocation/prepare, but the defined semantics are that the tag is only valid after send/put/get. We can set them up any where we wish in the BTL, the PML however cannot rely on anything until after the send/put/get call. So really this is an issue of semantics versus implementation. The implementation I believe does conform to the semantics as the upper layer (PML) doesn't use the tag value until after a call to send/put/get. I will look over the patch however, might make more sense to delay setting the value until the actual send/put/get call. Have you had a chance to look over the patch? -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
There is something weird with this change, and the patch reflect it. The new argument "order" come from the PML level and might be MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP (which are definitively Open IB related). Do you really intend to let the PML knows about Open IB internal constants ? If it's the case (which seems to be true from the following snippet if(MCA_BTL_NO_ORDER == order) { frag->base.order = BTL_OPENIB_LP_QP; } else { frag->base.order = order; } ) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch. I vote to be backed out of the trunk as it export way to much knowledge from the Open IB BTL into the PML layer. george. PS: With Gleb changes the problem is the same. The following snippet reflect exactly the same behavior as the original patch. frag->base.order = order; assert(frag->base.order != BTL_OPENIB_HP_QP); On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote: Hi Galen, On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote: With current code this is not the case. Order tag is set during a fragment allocation. It seems wrong according to your description. Attached patch fixes this. If no specific ordering tag is provided to allocation function order of the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/ get order is set to whatever QP was used for communication. If order is set before send call it is used to choose QP. I do set the order tag during allocation/prepare, but the defined semantics are that the tag is only valid after send/put/get. We can set them up any where we wish in the BTL, the PML however cannot rely on anything until after the send/put/get call. So really this is an issue of semantics versus implementation. The implementation I believe does conform to the semantics as the upper layer (PML) doesn't use the tag value until after a call to send/put/get. I will look over the patch however, might make more sense to delay setting the value until the actual send/put/get call. Have you had a chance to look over the patch? -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel smime.p7s Description: S/MIME cryptographic signature
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
Hi Galen, On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote: > > > With current code this is not the case. Order tag is set during a > > fragment > > allocation. It seems wrong according to your description. Attached > > patch fixes > > this. If no specific ordering tag is provided to allocation > > function order of > > the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/ > > get order > > is set to whatever QP was used for communication. If order is set > > before send call > > it is used to choose QP. > > > > I do set the order tag during allocation/prepare, but the defined > semantics are that the tag is only valid after send/put/get. We can > set them up any where we wish in the BTL, the PML however cannot rely > on anything until after the send/put/get call. So really this is an > issue of semantics versus implementation. The implementation I > believe does conform to the semantics as the upper layer (PML) > doesn't use the tag value until after a call to send/put/get. > > I will look over the patch however, might make more sense to delay > setting the value until the actual send/put/get call. > Have you had a chance to look over the patch? -- Gleb.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Sun, May 27, 2007 at 10:23:23AM -0600, Galen Shipman wrote: > >> > >> > >> The problem is that MCA_BTL_DES_FLAGS_PRIORITY was meant to indicate > >> that the fragment was higher priority, but the fragment isn't higher > >> priority. It simply needs to be ordered w.r.t. a previous fragment, > >> an RDMA in this case. > > But after the change priority flags is totally ignored. > > > > > So the priority flag was broken I think, in OpenIB we used the > priority flag to choose a QP on which to send the fragment, but there > was no checking for if the fragment could be sent on the specified > QP. So a max send size fragment could be set as priority and the BTL > would use the high priority QP and we would go bang. This is how I > read the code, I might have missed something. You a right. We kinda rely on PML here. PML always send "max send" sized packets with low priority. > > If the priority flag is simply a "hint" and has hard requirements > than we can still use it in the OpenIB BTL but it won't have any > effect as only eager size fragments can be marked high priority and > we already send these over the high priority QP. Sometimes prepare_src may return eager sized fragment and it will be sent on low priority QP. -- Gleb.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote: > > > With current code this is not the case. Order tag is set during a > > fragment > > allocation. It seems wrong according to your description. Attached > > patch fixes > > this. If no specific ordering tag is provided to allocation > > function order of > > the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/ > > get order > > is set to whatever QP was used for communication. If order is set > > before send call > > it is used to choose QP. > > > > I do set the order tag during allocation/prepare, but the defined > semantics are that the tag is only valid after send/put/get. We can > set them up any where we wish in the BTL, the PML however cannot rely > on anything until after the send/put/get call. So really this is an > issue of semantics versus implementation. The implementation I > believe does conform to the semantics as the upper layer (PML) > doesn't use the tag value until after a call to send/put/get. > With current approach send function doesn't know the difference between fragment that should be ordered because PML requested it and regular fragment without ordering requirements. -- Gleb.
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
doh,, correction.. On May 27, 2007, at 10:23 AM, Galen Shipman wrote: The problem is that MCA_BTL_DES_FLAGS_PRIORITY was meant to indicate that the fragment was higher priority, but the fragment isn't higher priority. It simply needs to be ordered w.r.t. a previous fragment, an RDMA in this case. But after the change priority flags is totally ignored. So the priority flag was broken I think, in OpenIB we used the priority flag to choose a QP on which to send the fragment, but there was no checking for if the fragment could be sent on the specified QP. So a max send size fragment could be set as priority and the BTL would use the high priority QP and we would go bang. This is how I read the code, I might have missed something. If the priority flag is simply a "hint" and has *NO* hard requirements than we can still use it in the OpenIB BTL but it won't have any effect as only eager size fragments can be marked high priority and we already send these over the high priority QP. - Galen -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
The problem is that MCA_BTL_DES_FLAGS_PRIORITY was meant to indicate that the fragment was higher priority, but the fragment isn't higher priority. It simply needs to be ordered w.r.t. a previous fragment, an RDMA in this case. But after the change priority flags is totally ignored. So the priority flag was broken I think, in OpenIB we used the priority flag to choose a QP on which to send the fragment, but there was no checking for if the fragment could be sent on the specified QP. So a max send size fragment could be set as priority and the BTL would use the high priority QP and we would go bang. This is how I read the code, I might have missed something. If the priority flag is simply a "hint" and has hard requirements than we can still use it in the OpenIB BTL but it won't have any effect as only eager size fragments can be marked high priority and we already send these over the high priority QP. - Galen -- Gleb. ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
On Fri, May 25, 2007 at 09:31:33PM -0600, Galen Shipman wrote: > > On May 24, 2007, at 2:48 PM, George Bosilca wrote: > > > I see the problem this patch try to solve, but I fail to correctly > > understand the implementation. The patch affect all PML and BTL in > > the code base by adding one more argument to some of the most often > > called functions. And there is only one BTL (openib) who seems to > > use it while all others completely ignore it. Moreover, there seems > > to be already a very similar mechanism based on the > > MCA_BTL_DES_FLAGS_PRIORITY flag, which can be set by the PML level > > into the btl_descriptor. > > > > So what's the difference between the additional argument and a > > correct usage of the MCA_BTL_DES_FLAGS_PRIORITY flag ? > > > The problem is that MCA_BTL_DES_FLAGS_PRIORITY was meant to indicate > that the fragment was higher priority, but the fragment isn't higher > priority. It simply needs to be ordered w.r.t. a previous fragment, > an RDMA in this case. But after the change priority flags is totally ignored. > This being said, we could have just added an rdma fin flag, but this > would mix protocol a bit too much between the BTL and the PML in my > opinion. > What we have with this fix is that the BTL can assign an order tag to > any descriptor if it wishes, this order tag is only valid after a > call to btl_send or btl_put/get. This order tag can then be used to With current code this is not the case. Order tag is set during a fragment allocation. It seems wrong according to your description. Attached patch fixes this. If no specific ordering tag is provided to allocation function order of the fragment is set to be MCA_BTL_NO_ORDER. After call to send/put/get order is set to whatever QP was used for communication. If order is set before send call it is used to choose QP. -- Gleb. diff --git a/ompi/mca/btl/openib/btl_openib.c b/ompi/mca/btl/openib/btl_openib.c index 2330dc0..fadb2bd 100644 --- a/ompi/mca/btl/openib/btl_openib.c +++ b/ompi/mca/btl/openib/btl_openib.c @@ -377,16 +377,8 @@ mca_btl_base_descriptor_t* mca_btl_openib_alloc( if(size <= mca_btl_openib_component.eager_limit){ MCA_BTL_IB_FRAG_ALLOC_EAGER(btl, frag, rc); -if(order == MCA_BTL_NO_ORDER) { -order = BTL_OPENIB_HP_QP; -} } else if(size <= mca_btl_openib_component.max_send_size) { -if(order == MCA_BTL_NO_ORDER) { -order = BTL_OPENIB_LP_QP; -} else if(order != BTL_OPENIB_LP_QP) { -return NULL; -} - +assert(order != BTL_OPENIB_HP_QP); MCA_BTL_IB_FRAG_ALLOC_MAX(btl, frag, rc); } @@ -502,6 +494,7 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src( frag->base.des_dst = NULL; frag->base.des_dst_cnt = 0; frag->base.des_flags = 0; +frag->base.order = order; frag->sg_entry.length = max_data; frag->sg_entry.lkey = openib_reg->mr->lkey; @@ -511,12 +504,6 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src( frag->segment.seg_addr.pval = iov.iov_base; frag->segment.seg_key.key32[0] = (uint32_t)frag->sg_entry.lkey; -if(MCA_BTL_NO_ORDER == order) { -frag->base.order = BTL_OPENIB_LP_QP; -} else { -frag->base.order = order; -} - BTL_VERBOSE(("frag->sg_entry.lkey = %lu .addr = %llu " "frag->segment.seg_key.key32[0] = %lu", frag->sg_entry.lkey, frag->sg_entry.addr, @@ -529,21 +516,13 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src( if(max_data + reserve <= btl->btl_eager_limit) { /* the data is small enough to fit in the eager frag and * memory is not prepinned */ - MCA_BTL_IB_FRAG_ALLOC_EAGER(btl, frag, rc); -if(MCA_BTL_NO_ORDER == order) { -order = BTL_OPENIB_LP_QP; -} } if(NULL == frag) { /* the data doesn't fit into eager frag or eager frag is * not available */ -if(MCA_BTL_NO_ORDER == order) { -order = BTL_OPENIB_LP_QP; -} else if(BTL_OPENIB_HP_QP == order){ -return NULL; -} +assert(order != BTL_OPENIB_HP_QP); MCA_BTL_IB_FRAG_ALLOC_MAX(btl, frag, rc); if(NULL == frag) { @@ -571,8 +550,6 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_src( frag->base.des_dst_cnt = 0; frag->base.des_flags = 0; - - return >base; } @@ -641,12 +618,7 @@ mca_btl_base_descriptor_t* mca_btl_openib_prepare_dst( frag->base.des_src_cnt = 0; frag->base.des_flags = 0; -if(MCA_BTL_NO_ORDER == order) { -frag->base.order = BTL_OPENIB_LP_QP; -} else { -frag->base.order = order; -} - +frag->base.order = order; BTL_VERBOSE(("frag->sg_entry.lkey
Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768
I see the problem this patch try to solve, but I fail to correctly understand the implementation. The patch affect all PML and BTL in the code base by adding one more argument to some of the most often called functions. And there is only one BTL (openib) who seems to use it while all others completely ignore it. Moreover, there seems to be already a very similar mechanism based on the MCA_BTL_DES_FLAGS_PRIORITY flag, which can be set by the PML level into the btl_descriptor. So what's the difference between the additional argument and a correct usage of the MCA_BTL_DES_FLAGS_PRIORITY flag ? george. On May 24, 2007, at 3:51 PM, gship...@osl.iu.edu wrote: Author: gshipman Date: 2007-05-24 15:51:26 EDT (Thu, 24 May 2007) New Revision: 14768 URL: https://svn.open-mpi.org/trac/ompi/changeset/14768 Log: Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns a order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. to the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. Text files modified: trunk/ompi/mca/bml/bml.h |29 +++ trunk/ompi/mca/btl/btl.h |10 trunk/ompi/mca/btl/gm/btl_gm.c | 8 ++ trunk/ompi/mca/btl/gm/btl_gm.h | 3 ++ trunk/ompi/mca/btl/mx/btl_mx.c | 8 ++ trunk/ompi/mca/btl/mx/btl_mx.h | 3 ++ trunk/ompi/mca/btl/openib/btl_openib.c |49 ++- trunk/ompi/mca/btl/openib/btl_openib.h | 3 ++ trunk/ompi/mca/btl/openib/btl_openib_endpoint.c | 7 +++-- trunk/ompi/mca/btl/openib/btl_openib_frag.c | 7 + trunk/ompi/mca/btl/portals/btl_portals.c | 8 - trunk/ompi/mca/btl/portals/btl_portals.h | 3 ++ trunk/ompi/mca/btl/self/btl_self.c | 3 ++ trunk/ompi/mca/btl/self/btl_self.h | 3 ++ trunk/ompi/mca/btl/sm/btl_sm.c | 2 + trunk/ompi/mca/btl/sm/btl_sm.h | 2 + trunk/ompi/mca/btl/tcp/btl_tcp.c | 6 trunk/ompi/mca/btl/tcp/btl_tcp.h | 3 ++ trunk/ompi/mca/btl/template/btl_template.c | 8 - trunk/ompi/mca/btl/template/btl_template.h | 3 ++ trunk/ompi/mca/btl/template/btl_template_component.c |10 +++ +--- trunk/ompi/mca/btl/udapl/btl_udapl.c |11 ++-- trunk/ompi/mca/btl/udapl/btl_udapl.h | 3 ++ trunk/ompi/mca/btl/udapl/btl_udapl_component.c |17 trunk/ompi/mca/osc/rdma/osc_rdma_data_move.c | 3 ++ trunk/ompi/mca/pml/dr/pml_dr.h | 6 ++-- trunk/ompi/mca/pml/dr/pml_dr_sendreq.c |12 - trunk/ompi/mca/pml/dr/pml_dr_sendreq.h | 3 + trunk/ompi/mca/pml/ob1/pml_ob1.c |17 - trunk/ompi/mca/pml/ob1/pml_ob1.h |44 +-- trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c |14 +++ trunk/ompi/mca/pml/ob1/pml_ob1_sendreq.c |28 -- 32 files changed, 241 insertions(+), 95 deletions(-) Diff not shown due to size (53504 bytes). To see the diff, run the following command: svn diff -r 14767:14768 --no-diff-deleted ___ svn mailing list s...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn smime.p7s Description: S/MIME cryptographic signature