On Tuesday 01 August 2006 05:19, Venkatesh Babu wrote:
> Configuration2: Node1 and Node 2 connected through two switches for
> each port.
>   Node1, port1 -> switch1 -> Node2, port1
>   Node1, port2 -> switch2 -> Node2, port2
>
> Node 1:
>   1. Call ib_cm_listen() to wait for connection requests
>   2. When a REQ message arrives create a RC QP and establish a connection
>   3. Setup callback handlers to receive packets.
>   4. Receive packets and verify it and drop it.
>   5. Event IB_MIG_MIGRATED received
>   6. Stopped receiving packets.
>
> Node 2:
>   1. Create RC QP
>   2. Send REQ message to Node 1 to establish the connection (Load both
>      primary and alternate paths)
>   3. Continuously send some packets
>   4. Simulate the port failure by unplugging the IB cable
>   5. Event IB_MIG_MIGRATED received
>
> But with Configuration2, IB_EVENT_PORT_ERR event occurs on a node1,
> failover to the alternate path doesn't work. The traffic stops. Because
> node1 doesn't know when the IB_EVENT_PORT_ERR event occurred on Node2.

We have not seen these problems here. We have regression tests which check
APM, and they have run without problems. These tests have scripts which
bring the HCA port down (equivalent to pulling the cable) to check that the
migration occurs automatically. (You should NOT need to do ib_modify_qp for
the migration to work in the case of a port error).

Note, though, that these tests use the ibv_verbs layer directly. We have not
checked out APM over the CM. There may be a bug here regarding setting up
the alternate path properly when creating the connection (although this does
seem strange, since you indicate that the MIGRATED event is received on both
sides!).

Please send us your test code so that we may reproduce the problem here.

- Jack
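
For readers following the thread, the failover sequence discussed above can be sketched as a small state model. This is only an illustration, assuming the usual APM path-migration states (migrated, rearm, armed); it does not call the verbs layer or the CM, and every name in it (qp_model, qp_load_alt_path, qp_pair_arm, qp_port_error, qp_peer_follow) is hypothetical. The handshake that moves both ends from rearm to armed, and the way the peer notices that the other side has migrated, are each collapsed into a single function call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of APM path-migration states; not a real verbs API. */
enum mig_state { MIG_MIGRATED, MIG_REARM, MIG_ARMED };

struct qp_model {
    enum mig_state state;
    int active_port; /* port currently carrying traffic */
    int alt_port;    /* port loaded as the alternate path, 0 if none */
};

/* A freshly connected QP uses its primary path and has no alternate. */
void qp_init(struct qp_model *qp, int primary_port)
{
    qp->state = MIG_MIGRATED;
    qp->active_port = primary_port;
    qp->alt_port = 0;
}

/* Load an alternate path and request re-arming. */
void qp_load_alt_path(struct qp_model *qp, int alt_port)
{
    qp->alt_port = alt_port;
    qp->state = MIG_REARM;
}

/* Both ends must be in rearm before either becomes armed. */
bool qp_pair_arm(struct qp_model *a, struct qp_model *b)
{
    if (a->state != MIG_REARM || b->state != MIG_REARM)
        return false;
    a->state = MIG_ARMED;
    b->state = MIG_ARMED;
    return true;
}

/* Switch to the alternate path; the old path becomes the stale alternate. */
static void migrate(struct qp_model *qp)
{
    int old = qp->active_port;
    qp->active_port = qp->alt_port;
    qp->alt_port = old;
    qp->state = MIG_MIGRATED;
}

/* Local port failure: migration happens only if the QP is armed and the
 * failed port is the active one. Returns true if a "migrated" event
 * would be reported. No explicit modify-QP call is modeled here. */
bool qp_port_error(struct qp_model *qp, int failed_port)
{
    if (qp->state != MIG_ARMED || qp->active_port != failed_port)
        return false;
    migrate(qp);
    return true;
}

/* Peer side: an armed QP follows once the other end has migrated. */
bool qp_peer_follow(struct qp_model *qp)
{
    if (qp->state != MIG_ARMED)
        return false;
    migrate(qp);
    return true;
}
```

In this model, a port error on Node 2 moves Node 2 to the alternate port, and Node 1 follows only if it was armed with a valid alternate path. If the alternate path was never loaded on one side, qp_pair_arm() fails and no failover is possible, which is the kind of setup problem suspected above. After one migration the QP is no longer armed, so a second failure is not covered until a new alternate path is loaded.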
