I've had a report of rdma_connect() failing with a callback event type of
RDMA_CM_EVENT_UNREACHABLE and status -ETIMEDOUT although the peer node was
up and running at the time.

It appears this can be reproduced as follows:

1. Establish a connection between nodes A and B

2. Reboot node A

3. Start establishing a new connection from node A to node B

4. After a timeout, the CM callback fires with the event type and status
   described above.

Could this be caused by a buggy SM (subnet manager)?  Are there good places
in the OpenFabrics stack to add printks that would help pin down the culprit,
or is there existing debug/trace code that can be enabled?
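As a starting point for that kind of instrumentation, here is a minimal sketch of a logging helper one might call from the connection event handler. It assumes only what the report states: a failure such as RDMA_CM_EVENT_UNREACHABLE carries a negative errno (here -ETIMEDOUT) in the event's status field. The helper name format_cm_failure is hypothetical and is not part of librdmacm or the kernel RDMA CM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical debug helper (not an OpenFabrics API): format one line
 * describing a failed CM event. The status is assumed to be a negative
 * errno such as -ETIMEDOUT, so its sign is flipped before asking
 * strerror() for a human-readable description.
 * Returns the snprintf() result, i.e. the length the full line needs.
 */
static int format_cm_failure(char *buf, size_t len,
                             const char *event_name, int status)
{
    assert(buf != NULL && event_name != NULL);

    /* Accept either sign so a positive errno does not print garbage. */
    int err = status < 0 ? -status : status;

    return snprintf(buf, len, "%s: status %d (%s)",
                    event_name, status, strerror(err));
}
```

In user space, librdmacm's rdma_event_str() could supply the event name; in a kernel event handler the same information would go into a printk() instead. Logging the timestamp of each such line on both nodes would show whether the timeout lines up with node A's reboot.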

-- 
Cheers,

             Eric



_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general