Sean Hefty wrote:
> Michael S. Tsirkin wrote:
>> Sean, did we decide what to do for upstream yet?
>> I would say we need something like the below for 2.6.19 too
>> (probably just need to update node type check).
>> And, I like it that this approach leaves all matters of policy
>> to users (such as whether move QP to RTS after asynchronous event
>> or after completion event).
> I will go with a patch similar to this one. It seems the most flexible.

Just to make sure: do you mean that you would merge this patch instead of
the one that had the CM track local QP numbers and install a callback for
the consumer QP to catch the async event, etc.?

Also, I'd like to make sure I follow what would happen:

T1) the consumer gets an RX completion on a QP associated with a
    non-established CMA ID [at some point the async handler is also
    called with a COMM_EST async event for this QP]
T2) the consumer calls rdma_establish()
T3) the consumer's CMA callback is called with the ESTABLISHED event, and
    the consumer is now able to post sends to the QP

Indeed, the patch itself is somewhat simpler, but the consumer must get the
ESTABLISHED event before posting sends to the QP, so it needs to either
queue RXes or modify the QP to RTS before sending the REP. As I said
before, this is fine with our iSER target, as we queue the sole possible RX
(the login request) until we get the ESTABLISHED event.

Is rdma_establish() --> cm_establish() callable from a context that cannot
sleep? Our target does a context jump once the CQ handler is called, so it
does the actual processing at thread level, but there may be other
consumers attempting to call rdma_establish() from the hard-irq CQ callback
context.

Also, does the patch ensure that only one ESTABLISHED event is delivered
for the ID, no matter whether rdma_establish() and an RTU reception happen
in parallel?

>> As a side note, reasons for frequent loss of RTU must be investigated.

> A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU
> never showing up? I will look into the ib_cm and see if there's an issue
> that would cause an RTU not to be retried.

Indeed, my initial suspicion was that heavy CPU load on the server node
prevents the MAD/CM threads from being scheduled in, but as REQ messages do
appear, I also thought we should see whether a "retried" REP causes the RTU
to be resent.
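
To make the two consumer-side questions above concrete (queueing RXes until
ESTABLISHED, and getting exactly one ESTABLISHED event when an explicit
establish request races with an RTU), here is a minimal userspace sketch.
It is not the patch and does not use the ib_cm/rdma_cm API: struct conn, the
two states, and the conn_*() helpers are all hypothetical names made up for
illustration. It assumes a single compare-and-swap on the connection state
decides which path delivers the event; the pending-RX queue is deliberately
left without locking, which real code would need.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; NOT the real ib_cm / rdma_cm state machine. */
enum conn_state { CONN_REP_SENT, CONN_ESTABLISHED };

#define MAX_PENDING_RX 4

struct conn {
    _Atomic int state;
    int established_events;          /* times the ESTABLISHED event fired */
    int pending_rx[MAX_PENDING_RX];  /* RX work-request ids held back */
    size_t n_pending;
    int processed_rx;                /* RXes handed to the upper layer */
};

static void conn_init(struct conn *c)
{
    atomic_init(&c->state, CONN_REP_SENT);
    c->established_events = 0;
    c->n_pending = 0;
    c->processed_rx = 0;
}

/*
 * Called from either path: RTU arrival or an explicit establish request.
 * Only the caller that wins the compare-and-swap delivers the event, so
 * the event fires at most once even if both paths run in parallel.
 */
static bool conn_try_establish(struct conn *c)
{
    int expected = CONN_REP_SENT;

    if (!atomic_compare_exchange_strong(&c->state, &expected,
                                        CONN_ESTABLISHED))
        return false;                /* the other path already won */

    c->established_events++;         /* stands in for the consumer callback */
    c->processed_rx += (int)c->n_pending;  /* drain the queued RXes */
    c->n_pending = 0;
    return true;
}

/*
 * RX completion handler. Returns true if the RX was processed right away,
 * false if it was queued because the connection is not established yet.
 * An RX beyond MAX_PENDING_RX is dropped in this sketch.
 */
static bool conn_on_rx(struct conn *c, int wr_id)
{
    if (atomic_load(&c->state) == CONN_ESTABLISHED) {
        c->processed_rx++;
        return true;
    }
    if (c->n_pending < MAX_PENDING_RX)
        c->pending_rx[c->n_pending++] = wr_id;
    return false;
}
```

In this sketch, an RX that arrives before the transition is queued by
conn_on_rx(); the first conn_try_establish() call, from whichever path,
returns true and drains the queue, and any later call returns false without
firing the event a second time.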
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
