Ali Ayoub wrote:
1. If I change the local and the remote timeout for ib_cm_req_param to 40 (instead of 20, the default value), it causes a kernel oops.

The timeout is calculated as 4.096 usec x 2^timeout. In highly technical terms, going from 20 to 40 increases the timeout by a factor of a lot (2^20, which takes it from about 4 seconds to about 7 weeks).
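To put numbers on that, here is a minimal sketch of the arithmetic in Python. The function name is made up for illustration and is not part of ib_cm; it only assumes the 4.096 usec x 2^timeout encoding described above.

```python
def cm_timeout_seconds(exponent: int) -> float:
    """Convert an encoded timeout exponent to seconds.

    Assumes the encoding 4.096 usec * 2^exponent.
    Hypothetical helper, not an ib_cm function.
    """
    return 4.096e-6 * (2 ** exponent)


SECONDS_PER_WEEK = 7 * 24 * 60 * 60

default_timeout = cm_timeout_seconds(20)   # a little over 4 seconds
changed_timeout = cm_timeout_seconds(40)   # several weeks

print(f"timeout=20: {default_timeout:.2f} s")
print(f"timeout=40: {changed_timeout / SECONDS_PER_WEEK:.1f} weeks")
print(f"ratio: 2^{40 - 20}")
```

Each step of 1 in the encoded value doubles the timeout, so a change of 20 multiplies it by 2^20.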

Since the oops occurred in cmpost, I'm not overly concerned with trying to debug this at the moment. (I will happily take a patch that fixes the issue, or will look at it more if it definitely looks like an ib_cm bug. Cmpost just isn't meant to be a robust test program.)

2. With the following parameters:

            connections = 3000
            message_size = 200
            message_count = 10
            qp_type = RC

The test fails inconsistently; in some cases it causes a kernel oops.

This setup will result in allocating a fair amount of memory, which could explain the failures. The oops may be related, but I can't tell just from the backtrace. I've never run into this myself though. Can you reproduce this issue using a smaller number of connections?

Note that when simultaneously establishing a large number of connections, you will end up overrunning QP 1 on the remote side. This will result in a lot of dropped MADs, timeouts, and retries, which can make the results of the test unpredictable.

3. In other cases the server fails because it receives some IB_CM_DREQ_ERROR events even though the client receives all of the IB_CM_DREQ_RECEIVED events.

This can occur, and is easier to reproduce for a large number of connections. A DREQ is retried until a DREP is received. However, since a DREP is not acked, once it has been sent, the disconnect is done from the client's perspective. If the DREP is lost, the server will see a DREQ timeout.

There is code in the ib_cm to resend a DREP in response to a repeated DREQ, but the state needed to generate the DREP is only maintained while the old connection is in timewait.
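The sequence above can be modeled with a toy simulation in Python. This is only a sketch of the behavior described here, not the ib_cm code: the function and parameter names are hypothetical, and it assumes that every DREQ reaches the peer and that only DREPs can be lost.

```python
def dreq_sender_event(drep_lost, max_dreq_sends, answerable_dreqs):
    """Return the event seen by the side that sent the DREQ.

    drep_lost:        one bool per DREP transmission, True if it is lost
    max_dreq_sends:   initial DREQ plus retries before giving up
    answerable_dreqs: how many DREQs the peer can still answer, i.e.
                      how long it keeps the state needed to build a DREP
    All names are hypothetical; this is a model, not the ib_cm API.
    """
    dreps_sent = 0
    for attempt in range(max_dreq_sends):
        if attempt >= answerable_dreqs:
            # The peer no longer has the state to generate a DREP,
            # so this repeated DREQ goes unanswered.
            continue
        lost = drep_lost[dreps_sent] if dreps_sent < len(drep_lost) else False
        dreps_sent += 1
        if not lost:
            return "IB_CM_DREP_RECEIVED"
    # Every DREQ went unanswered or its DREP was dropped.
    return "IB_CM_DREQ_ERROR"


# DREP arrives on the first try: clean disconnect.
print(dreq_sender_event([False], max_dreq_sends=3, answerable_dreqs=1))
# The only DREP is lost and the peer cannot resend: DREQ times out.
print(dreq_sender_event([True], max_dreq_sends=3, answerable_dreqs=1))
# First DREP is lost, but the peer can still answer the retried DREQ.
print(dreq_sender_event([True, False], max_dreq_sends=3, answerable_dreqs=2))
```

In the second case the peer has already seen the DREQ and considers the disconnect complete, which matches one side reporting every IB_CM_DREQ_RECEIVED while the other side still sees IB_CM_DREQ_ERROR.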

- Sean
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general