Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review
> What really should happen is that the field Local Ack Timeout in REQ
> should be (2 * PacketLifeTime + Local CA’s ACK delay) (see 12.7.34)
> and then the responder should use this for its QP.

Just to clarify, the value is _based_ on (2 * PacketLifeTime + local CA ack delay). For example, if local CA ack delay is 0, then local ack timeout = PacketLifeTime + 1.

> This does not sound too hard - why can't we just fix CM to do this, then?

The work-arounds were only suggestions to use until a fix is in place and to verify that this really is the problem. I do plan on submitting a fix.

- Sean
___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
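A minimal sketch of the arithmetic Sean describes, assuming the usual exponent encoding in which a value v stands for 4.096 usec * 2^v. The function names, the round-up rule, and the clamp to 31 are assumptions made for illustration; this is not the CM's actual code, and the eventual fix may round differently.

```python
IB_TIME_UNIT_NS = 4096  # 4.096 usec expressed in nanoseconds


def encoded_to_ns(value: int) -> int:
    """Decode an exponent-encoded time: 4.096 usec * 2^value, in ns."""
    return IB_TIME_UNIT_NS << value


def local_ack_timeout(packet_life_time: int, ca_ack_delay_ns: int) -> int:
    """Return the smallest exponent whose decoded time covers
    2 * PacketLifeTime + local CA ack delay.

    packet_life_time is the encoded exponent from the path record;
    ca_ack_delay_ns is the CA ack delay as a plain duration in ns.
    Rounding up and clamping to 31 are assumptions of this sketch.
    """
    needed_ns = 2 * encoded_to_ns(packet_life_time) + ca_ack_delay_ns
    units = -(-needed_ns // IB_TIME_UNIT_NS)  # ceiling division
    exponent = (units - 1).bit_length()       # ceil(log2(units)), units >= 2 here
    return min(exponent, 31)


# With no CA ack delay the result is PacketLifeTime + 1, as in the example above.
print(local_ack_timeout(14, 0))                  # -> 15
# A CA that reports an ack delay value of 15 pushes the timeout one step higher.
print(local_ack_timeout(14, encoded_to_ns(15)))  # -> 16
```

Doubling the packet lifetime adds one to the exponent, which is why a zero ack delay gives PacketLifeTime + 1; any nonzero delay can only raise the result.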
Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review
> As previously stated, IBM HCA will address these issues. However, my
> understanding is that mthca/Topspin adapters also have a problem (too
> high a value for the Local CA Delay Ack). Both HCAs need to be fixed
> for good interoperability.

I think you're misunderstanding what local CA ack delay means. This is a property of an HCA that is not (necessarily) subject to tuning -- it is just a property of the HCA, namely the maximum amount of time it may take to generate an ACK. So if a certain HCA reports a value of 15, then that means that any remote HCA talking to it must be prepared for a delay of 4.096 * 2^15 usecs before receiving an ACK.

If the ACK delays on both sides are not being taken into account properly when establishing a connection, then I guess that is a bug in our CM.

- R.
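To put a number on Roland's example, here is a small sketch that decodes a reported local CA ack delay value into a duration. The function name is hypothetical; the only fact used is the 4.096 * 2^value usec encoding stated above.

```python
def ca_ack_delay_ns(reported_value: int) -> int:
    """Worst-case ACK generation delay in nanoseconds for a reported
    local CA ack delay value, using 4.096 usec * 2^value."""
    return 4096 << reported_value


delay_ns = ca_ack_delay_ns(15)
print(delay_ns)  # -> 134217728
```

That is about 134 ms, so a peer whose QP timeout is derived from the packet lifetime alone may time out and retry well before such an HCA is required to have sent its ACK.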
Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review
Hello Roland,

> If the ACK delays on both sides are not being taken into account
> properly when establishing a connection, then I guess that is a bug in
> our CM.
>
> - R.

So for each IPoIB connection, the ACK delay could be different depending on the remote side. How, then, would the TCP retransmission timeout have a corresponding value?

Thanks
Shirley Ma
IBM Linux Technology Center
Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review
Thanks for the clarifications, Roland. There is something that I am still missing: I presume the Local CA Ack Delay is common across all QPs in the HCA and the Local Ack Timeout is specific to each QP. Is that correct?

I tried to change the ib_qp_attr .timeout value (this is the Local Ack Timeout, right?) to 0xf as the QP transitions from RTR to RTS (page 569 of the IB Spec). A subsequent ib_query_qp() tells me that timeout = 0. This happens on both ehca and mthca.

There may be a CM bug, but I am guessing something else is incorrect too. I have not yet narrowed that down.

Pradeep
[EMAIL PROTECTED]

Roland Dreier [EMAIL PROTECTED] wrote on 04/24/2007 11:33:25 AM:

> > As previously stated, IBM HCA will address these issues. However, my
> > understanding is that mthca/Topspin adapters also have a problem (too
> > high a value for the Local CA Delay Ack). Both HCAs need to be fixed
> > for good interoperability.
>
> I think you're misunderstanding what local CA ack delay means. This
> is a property of an HCA that is not (necessarily) subject to tuning --
> it is just a property of the HCA, namely the maximum amount of time it
> may take to generate an ACK. So if a certain HCA reports a value of
> 15, then that means that any remote HCA talking to it must be prepared
> for a delay of 4.096 * 2^15 usecs before receiving an ACK.
>
> If the ACK delays on both sides are not being taken into account
> properly when establishing a connection, then I guess that is a bug in
> our CM.
>
> - R.
Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review
> If the ACK delays on both sides are not being taken into account
> properly when establishing a connection, then I guess that is a bug in
> our CM.

I looked, and the cm does not take into account the ca ack delay. This can be worked around by bumping up the qp timeout value between calling ib_cm_init_qp_attr() and ib_modify_qp(), or by increasing the path record packet_life_time.

- Sean
Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review
Quoting Sean Hefty [EMAIL PROTECTED]:
Subject: Re: [ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V2] patch for review

> > If the ACK delays on both sides are not being taken into account
> > properly when establishing a connection, then I guess that is a bug in
> > our CM.
>
> I looked, and the cm does not take into account the ca ack delay. This
> can be worked around by bumping up the qp timeout value between calling
> ib_cm_init_qp_attr() and ib_modify_qp(), or by increasing the path
> record packet_life_time.

What really should happen is that the field Local Ack Timeout in REQ should be (2 * PacketLifeTime + Local CA’s ACK delay) (see 12.7.34) and then the responder should use this for its QP.

This does not sound too hard - why can't we just fix CM to do this, then?

-- 
MST