Re: [OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-09 Thread Don Kerr

On 10/08/09 17:14, Don Kerr wrote:

George,

This is an interesting approach, although I am guessing the changes
would be widespread and have many performance implications. Am I wrong
in this belief?

My point here is that if this is going to have as many performance
implications as I think it will, it probably makes sense to
investigate the potentially bigger dual-rail issue and consider the
"never share" approach in that larger context.


-DON

On 10/08/09 11:45, George Bosilca wrote:

Don,

I think we can do something slightly different that will satisfy
everybody.

How about a solution where each BTL defines a limit below which a
message will never be shared with another BTL? We could have two such
limits, one for the send protocol and one for RMA (the latter applying
either to PUT or GET operations, based on BTL support and the PML's
decision).


  george.

On Oct 8, 2009, at 11:01, Don Kerr wrote:




On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have knowledge of the
other selected BTLs, so allowing the BTLs to set this limit is not as
easy as it sounds. However, in the case where two identical BTLs are
selected and they are the only ones, this clearly is the better
approach.

If this parameter is set at the PML level, I can't imagine how we
would figure out the correct value depending on the BTLs.

I see this as a pretty strong restriction. How do we know we have set
a value that makes sense?

OK, I now see why setting this at the BTL level is difficult. And for
the case of multiple BTLs which are also different component types,
however unlikely that is, a PML setting will not be optimal for both.


-DON




 george.

On Oct 7, 2009, at 10:19, Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter
"max_rdma_single_rget" be set by the individual BTLs, similar to
"btl_eager_limit"? It seems to me that is the better approach if I am
to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is
somewhat specific. But whereas OB1 appears to have multiple protocols
depending on the capabilities of the BTLs, I would not characterize
this as an IB-centric problem; maybe an OB1 RDMA problem. There is a
clear benefit from modifying this specific case. Do you think it's not
worth making incremental improvements while also attacking a
potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the
PML. Moreover, I have noticed that, independent of the BTL, we have
some problems with multi-rail performance. As an example, on a cluster
with 3 GB cards we get the same performance whether I enable 2 or 3. I
haven't had time to look into the details, but this might be a more
general problem.


george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON


Re: [OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-08 Thread Don Kerr

George,

This is an interesting approach, although I am guessing the changes
would be widespread and have many performance implications. Am I wrong
in this belief?


-DON

On 10/08/09 11:45, George Bosilca wrote:

Don,

I think we can do something slightly different that will satisfy
everybody.

How about a solution where each BTL defines a limit below which a
message will never be shared with another BTL? We could have two such
limits, one for the send protocol and one for RMA (the latter applying
either to PUT or GET operations, based on BTL support and the PML's
decision).


  george.

On Oct 8, 2009, at 11:01, Don Kerr wrote:




On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have knowledge of the
other selected BTLs, so allowing the BTLs to set this limit is not as
easy as it sounds. However, in the case where two identical BTLs are
selected and they are the only ones, this clearly is the better
approach.

If this parameter is set at the PML level, I can't imagine how we
would figure out the correct value depending on the BTLs.

I see this as a pretty strong restriction. How do we know we have set
a value that makes sense?

OK, I now see why setting this at the BTL level is difficult. And for
the case of multiple BTLs which are also different component types,
however unlikely that is, a PML setting will not be optimal for both.


-DON




 george.

On Oct 7, 2009, at 10:19, Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter
"max_rdma_single_rget" be set by the individual BTLs, similar to
"btl_eager_limit"? It seems to me that is the better approach if I am
to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is
somewhat specific. But whereas OB1 appears to have multiple protocols
depending on the capabilities of the BTLs, I would not characterize
this as an IB-centric problem; maybe an OB1 RDMA problem. There is a
clear benefit from modifying this specific case. Do you think it's not
worth making incremental improvements while also attacking a
potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the
PML. Moreover, I have noticed that, independent of the BTL, we have
some problems with multi-rail performance. As an example, on a cluster
with 3 GB cards we get the same performance whether I enable 2 or 3. I
haven't had time to look into the details, but this might be a more
general problem.


george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON


Re: [OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-08 Thread George Bosilca

Don,

I think we can do something slightly different that will satisfy
everybody.

How about a solution where each BTL defines a limit below which a
message will never be shared with another BTL? We could have two such
limits, one for the send protocol and one for RMA (the latter applying
either to PUT or GET operations, based on BTL support and the PML's
decision).


  george.
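
To make the proposal concrete, here is a rough, self-contained C
sketch of what such per-BTL "never share" limits could look like. The
type, field, and function names are illustrative assumptions, not the
actual Open MPI BTL interface:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-BTL thresholds: messages at or below these sizes
 * stay on a single rail instead of being striped across BTLs. */
typedef struct {
    size_t btl_max_send_share;  /* limit for the send protocol        */
    size_t btl_max_rdma_share;  /* limit for RMA (PUT or GET) traffic */
} btl_share_limits_t;

/* PML-side decision: stripe a message across rails only when it is
 * larger than the limit advertised by the BTL in question. */
static bool keep_on_single_rail(size_t msg_size,
                                const btl_share_limits_t *btl,
                                bool is_rdma)
{
    size_t limit = is_rdma ? btl->btl_max_rdma_share
                           : btl->btl_max_send_share;
    return msg_size <= limit;
}

The appeal of this shape is that each BTL picks defaults matching its
own hardware, while the PML keeps the final scheduling decision.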

On Oct 8, 2009, at 11:01, Don Kerr wrote:




On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have knowledge of the
other selected BTLs, so allowing the BTLs to set this limit is not as
easy as it sounds. However, in the case where two identical BTLs are
selected and they are the only ones, this clearly is the better
approach.

If this parameter is set at the PML level, I can't imagine how we
would figure out the correct value depending on the BTLs.

I see this as a pretty strong restriction. How do we know we have set
a value that makes sense?

OK, I now see why setting this at the BTL level is difficult. And for
the case of multiple BTLs which are also different component types,
however unlikely that is, a PML setting will not be optimal for both.


-DON




 george.

On Oct 7, 2009, at 10:19, Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter
"max_rdma_single_rget" be set by the individual BTLs, similar to
"btl_eager_limit"? It seems to me that is the better approach if I am
to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is
somewhat specific. But whereas OB1 appears to have multiple protocols
depending on the capabilities of the BTLs, I would not characterize
this as an IB-centric problem; maybe an OB1 RDMA problem. There is a
clear benefit from modifying this specific case. Do you think it's not
worth making incremental improvements while also attacking a
potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the
PML. Moreover, I have noticed that, independent of the BTL, we have
some problems with multi-rail performance. As an example, on a cluster
with 3 GB cards we get the same performance whether I enable 2 or 3. I
haven't had time to look into the details, but this might be a more
general problem.


george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON




Re: [OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-08 Thread Don Kerr



On 10/07/09 13:52, George Bosilca wrote:

Don,

The problem is that a particular BTL doesn't have knowledge of the
other selected BTLs, so allowing the BTLs to set this limit is not as
easy as it sounds. However, in the case where two identical BTLs are
selected and they are the only ones, this clearly is the better
approach.

If this parameter is set at the PML level, I can't imagine how we
would figure out the correct value depending on the BTLs.

I see this as a pretty strong restriction. How do we know we have set
a value that makes sense?

OK, I now see why setting this at the BTL level is difficult. And for
the case of multiple BTLs which are also different component types,
however unlikely that is, a PML setting will not be optimal for both.


-DON
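
For illustration, the two placements would surface differently on the
mpirun command line. btl_openib_eager_limit is a real openib
parameter; pml_ob1_max_rdma_single_rget is only a guess at how the
proposed OB1 parameter would be spelled under the usual
<framework>_<component>_<name> MCA naming convention:

# One PML-level value that every BTL underneath must live with
# (hypothetical name):
mpirun --mca pml_ob1_max_rdma_single_rget 131072 ./app

# Per-BTL style, as with the eager limit, where each component
# (openib, tcp, ...) carries its own tuned default:
mpirun --mca btl_openib_eager_limit 12288 ./app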




  george.

On Oct 7, 2009, at 10:19, Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter
"max_rdma_single_rget" be set by the individual BTLs, similar to
"btl_eager_limit"? It seems to me that is the better approach if I am
to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is
somewhat specific. But whereas OB1 appears to have multiple protocols
depending on the capabilities of the BTLs, I would not characterize
this as an IB-centric problem; maybe an OB1 RDMA problem. There is a
clear benefit from modifying this specific case. Do you think it's not
worth making incremental improvements while also attacking a
potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the
PML. Moreover, I have noticed that, independent of the BTL, we have
some problems with multi-rail performance. As an example, on a cluster
with 3 GB cards we get the same performance whether I enable 2 or 3. I
haven't had time to look into the details, but this might be a more
general problem.


 george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON


Re: [OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-07 Thread George Bosilca

Don,

The problem is that a particular BTL doesn't have knowledge of the
other selected BTLs, so allowing the BTLs to set this limit is not as
easy as it sounds. However, in the case where two identical BTLs are
selected and they are the only ones, this clearly is the better
approach.

If this parameter is set at the PML level, I can't imagine how we
would figure out the correct value depending on the BTLs.

I see this as a pretty strong restriction. How do we know we have set
a value that makes sense?


  george.
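
One way to picture the difficulty: the PML is the only layer that sees
all of the selected BTLs at once, so any reconciliation of per-BTL
values would have to happen there. A minimal sketch, with entirely
hypothetical names, of the PML deriving one effective limit from
per-BTL suggestions:

#include <stddef.h>

/* Hypothetical: each selected BTL advertises a suggested limit. */
typedef struct {
    const char *name;
    size_t      suggested_rget_limit;
} btl_info_t;

/* Once the full BTL set is known, the PML could reconcile the
 * suggestions, e.g. by taking the most conservative (smallest) one. */
static size_t reconcile_limit(const btl_info_t *btls, size_t num_btls)
{
    size_t limit = (size_t)-1;  /* start at "no limit" */
    for (size_t i = 0; i < num_btls; ++i) {
        if (btls[i].suggested_rget_limit < limit) {
            limit = btls[i].suggested_rget_limit;
        }
    }
    return limit;
}

Whether "take the smallest suggestion" is the right policy is exactly
the open question raised above.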

On Oct 7, 2009, at 10:19, Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter
"max_rdma_single_rget" be set by the individual BTLs, similar to
"btl_eager_limit"? It seems to me that is the better approach if I am
to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is
somewhat specific. But whereas OB1 appears to have multiple protocols
depending on the capabilities of the BTLs, I would not characterize
this as an IB-centric problem; maybe an OB1 RDMA problem. There is a
clear benefit from modifying this specific case. Do you think it's not
worth making incremental improvements while also attacking a
potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the
PML. Moreover, I have noticed that, independent of the BTL, we have
some problems with multi-rail performance. As an example, on a cluster
with 3 GB cards we get the same performance whether I enable 2 or 3. I
haven't had time to look into the details, but this might be a more
general problem.


 george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON




Re: [OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-07 Thread Don Kerr

George,

Were you suggesting that the proposed new parameter
"max_rdma_single_rget" be set by the individual BTLs, similar to
"btl_eager_limit"? It seems to me that is the better approach if I am
to move forward with this.


-DON
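
As a toy model of what "similar to btl_eager_limit" means: each BTL
component owns one instance of the knob, registered under the name
btl_<component>_<param>. The reg_size_param() helper below is a
stand-in for Open MPI's MCA registration machinery, and the default
values are made up:

#include <stdio.h>
#include <stddef.h>

/* Stand-in for MCA parameter registration: print the fully-qualified
 * parameter name and return the component's chosen default. */
static size_t reg_size_param(const char *btl, const char *param,
                             size_t default_val)
{
    printf("btl_%s_%s = %zu\n", btl, param, default_val);
    return default_val;
}

int main(void)
{
    /* Each component picks a default suited to its own hardware: */
    size_t openib = reg_size_param("openib", "max_rdma_single_rget",
                                   128 * 1024);
    size_t tcp    = reg_size_param("tcp", "max_rdma_single_rget",
                                   64 * 1024);
    (void)openib;
    (void)tcp;
    return 0;
}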

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is
somewhat specific. But whereas OB1 appears to have multiple protocols
depending on the capabilities of the BTLs, I would not characterize
this as an IB-centric problem; maybe an OB1 RDMA problem. There is a
clear benefit from modifying this specific case. Do you think it's not
worth making incremental improvements while also attacking a
potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the
PML. Moreover, I have noticed that, independent of the BTL, we have
some problems with multi-rail performance. As an example, on a cluster
with 3 GB cards we get the same performance whether I enable 2 or 3. I
haven't had time to look into the details, but this might be a more
general problem.


  george.

On Oct 6, 2009, at 09:51, Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON


[OMPI devel] trac #2034: single rail openib btl shows better bandwidth than dual rail (12k < x < 128k)

2009-10-06 Thread Don Kerr


I intend to make the change suggested in this ticket to the trunk.
The change does not impact the single-rail case (tested with the
openib BTL) and does improve the dual-rail case. Since it does involve
performance, and I am adding an OB1 MCA parameter, I just wanted to
check whether anyone was interested or had an issue with it before I
committed the change.


-DON
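
In other words, the change being proposed, as a self-contained C
sketch: the parameter name comes from this thread, but the default
value and function shape here are assumptions, not the committed code.

#include <stdbool.h>
#include <stddef.h>

/* The new OB1 MCA parameter: largest message that should still be
 * fetched with a single RGET over one rail (default is a guess). */
static size_t max_rdma_single_rget = 128 * 1024;

/* Per the ticket, a single rail beats dual rail for 12k < x < 128k
 * RGETs, so transfers at or below the threshold stay on one rail. */
static bool use_single_rail_rget(size_t msg_size, int num_rdma_btls)
{
    return num_rdma_btls > 1 && msg_size <= max_rdma_single_rget;
}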