I have made the changes to steps 6, 9.2 and 11. In step 9.2, ib_cm_init_qp_attr() failed with -22, and the RC QP then failed with IB_WC_RETRY_EXC_ERR.
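For reference, -22 is -EINVAL ("Invalid argument") on Linux. A minimal user-space sketch for decoding kernel-style negative return codes follows; kerr_to_str() is a hypothetical helper for illustration and is not part of the IB stack:

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not an IB or kernel API): turn a kernel-style
 * return code (0 on success, -errno on failure) into readable text.
 */
static const char *kerr_to_str(int ret)
{
    return ret < 0 ? strerror(-ret) : "success";
}
```

Calling kerr_to_str(-22) gives the text the C library associates with EINVAL.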
VBabu

Sean Hefty wrote:

>>Let me make the steps clear -
>
>This helps - thanks.
>
>> 1. On Passive node register for remote port UP/DOWN event by
>>registering with ib_sa_serv_notice_hdlr()
>
>FYI - patches for this are being worked separately.
>
>> 2. On Passive node start the listener by calling ib_cm_listen().
>> 3. On Active node create the RC QP and establish the connection by
>>calling ib_send_cm_req(). In struct ib_cm_req_param specify both primary
>>path (say, through Port1) and alternate path (say, through Port2).
>>NOTE:- Assume Port1 of Active node is connected to Port1 of Passive node;
>>and Port2 of Active node is connected to Port2 of Passive node.
>>NOTE:- After this step QP's path_mig_state will be IB_MIG_ARMED.
>> 4. Let us say, Port1 on Active node fails.
>> 5. IB_EVENT_PORT_ERR event is generated on Active node; and remote
>>port error event is generated on Passive node.
>> 6. In those event handlers call ib_qp_modify() to set the
>>path_mig_state to IB_MIG_MIGRATED. This will let the HCA's firmware know
>>to switch to the alternate path.
>
>At least the active side in your scenario should call ib_cm_notify() after this
>step. Otherwise, the LAP will go out the primary path, which is down. This
>isn't a big deal in your test case, since you wait for the primary path to
>return (step 7) before calling ib_send_cm_lap().
>
>> 7. After a while, Port1 comes back again.
>> 8. IB_EVENT_PORT_ACTIVE event is generated on Active node; and remote
>>port active event is generated on Passive node.
>> 9. On the Active node from IB_EVENT_PORT_ACTIVE event handler call
>>the ib_send_cm_lap() to send the alternate path (through Port1) to the
>>Passive node.
>> 9.1 Passive node receives the LAP message
>
>The proposed patch will record the alternate path when the LAP is sent or
>received. (Again, these patches are untested, so there can be some bugs here.
>I'm still working on writing a test program to use these interfaces.)
>
>> 9.2 Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>
>This should now call ib_cm_init_qp_attr().
>
>> 9.3 Calls ib_qp_modify() to update path_mig_state to IB_MIG_REARM
>> 9.4 Sends APR message back to the Active node.
>>10. Active node receives the APR message
>>11. Calls ib_cm_init_rearm_attr() to initialize the alternate path info
>
>This should now call ib_cm_init_qp_attr().
>
>>12. Calls ib_qp_modify() to update path_mig_state to IB_MIG_REARM
>>13. Now when the first packet is passed between the Active and Passive
>>node the ib_core changes the path_mig_state to IB_MIG_ARMED.
>>14. Now it is all set for another failover.
>
>Using the proposed patches, where did you see a failure?
>
>- Sean
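
For reference, the path_mig_state sequence in the quoted steps (ARMED after connection setup, MIGRATED in step 6, REARM in steps 9.3 and 12, ARMED again in step 13) can be sketched as a small user-space state machine. This is only a toy model of the order of steps as described above, not the HCA's or ib_core's behaviour; the enum values and mig_step() are hypothetical stand-ins for IB_MIG_MIGRATED, IB_MIG_REARM and IB_MIG_ARMED, and rejecting out-of-order steps is an assumption of the model:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for IB_MIG_MIGRATED, IB_MIG_REARM, IB_MIG_ARMED. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

/* Hypothetical events, one per state change in the numbered steps. */
enum mig_event {
    EV_FORCE_MIGRATE,  /* step 6: set MIGRATED after the port error      */
    EV_REARM,          /* steps 9.3 and 12: set REARM after LAP/APR      */
    EV_FIRST_PACKET    /* step 13: first packet once both sides rearmed  */
};

/*
 * Apply one event to the modelled state.
 * Returns true and updates *s if the event is the next expected step;
 * returns false and leaves *s unchanged if it is out of order.
 */
static bool mig_step(enum mig_state *s, enum mig_event ev)
{
    switch (ev) {
    case EV_FORCE_MIGRATE:
        if (*s != MIG_ARMED)
            return false;
        *s = MIG_MIGRATED;
        return true;
    case EV_REARM:
        if (*s != MIG_MIGRATED)
            return false;
        *s = MIG_REARM;
        return true;
    case EV_FIRST_PACKET:
        if (*s != MIG_REARM)
            return false;
        *s = MIG_ARMED;
        return true;
    }
    return false;
}
```

Starting from MIG_ARMED and feeding EV_FORCE_MIGRATE, EV_REARM, EV_FIRST_PACKET in that order returns the model to MIG_ARMED, which corresponds to step 14 (ready for another failover).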
