I've had a report of rdma_connect() failing with a callback event type of
RDMA_CM_EVENT_UNREACHABLE and status -ETIMEDOUT, even though the peer node
was up and running at the time.
It seems this can be reproduced as follows:
1. Establish a connection between nodes A and B.
2. Reboot node A.
3. Start establishing a new connection from node A to node B.
4. After a timeout, the CM callback on node A reports the event type and
   status described above.
Could this happen with a buggy SM? Are there some good places in the
OpenFabrics stack to add printks to help point the finger (or can some
existing debug/trace code be enabled)?
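
As a rough illustration of the kind of per-event debug output that would
help narrow this down, here is a minimal, self-contained sketch. Everything
prefixed demo_ is hypothetical and only stands in for the real CM event
type and status fields mentioned above; it is not the rdma_cm API, and a
kernel version would use printk rather than snprintf.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the CM event type and status in the report;
 * these are NOT the real rdma_cm definitions. */
enum demo_cm_event_type {
    DEMO_EVENT_ESTABLISHED,
    DEMO_EVENT_UNREACHABLE,
    DEMO_EVENT_REJECTED
};

struct demo_cm_event {
    enum demo_cm_event_type type;
    int status; /* 0 on success, otherwise a negative errno such as -ETIMEDOUT */
};

/* Map an event type to a printable name for the log line. */
static const char *demo_event_name(enum demo_cm_event_type type)
{
    switch (type) {
    case DEMO_EVENT_ESTABLISHED: return "ESTABLISHED";
    case DEMO_EVENT_UNREACHABLE: return "UNREACHABLE";
    case DEMO_EVENT_REJECTED:    return "REJECTED";
    default:                     return "UNKNOWN";
    }
}

/* Format one debug line with the event name, raw status, and errno text.
 * Returns the snprintf() result. */
static int demo_format_event(const struct demo_cm_event *ev,
                             char *buf, size_t len)
{
    int err;

    assert(ev != NULL && buf != NULL);
    err = ev->status < 0 ? -ev->status : ev->status;
    return snprintf(buf, len, "cm event=%s status=%d (%s)",
                    demo_event_name(ev->type), ev->status,
                    err ? strerror(err) : "ok");
}

/* True for the exact combination reported: unreachable with a timeout,
 * as opposed to an explicit reject from the peer. */
static int demo_is_connect_timeout(const struct demo_cm_event *ev)
{
    return ev->type == DEMO_EVENT_UNREACHABLE && ev->status == -ETIMEDOUT;
}
```

Logging both the event type and the status on each callback would at least
separate a request that timed out without any reply from one that the peer
explicitly rejected.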
--
Cheers,
Eric