1. Yes, I can rearm the alternate path by sending LAP and APR messages.
2. I was sending some network traffic (netperf) while doing these failovers.
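
A minimal sketch of the rearm cycle from answer 1, assuming a simplified three-state model of path migration (migrated, rearm, armed). All names here (apm_qp, apm_rearm, apm_peer_ready, apm_path_failed) are hypothetical and only model the state changes. A real consumer would exchange LAP/APR through the CM and then modify the QP's path migration state, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, loosely mirroring the three path migration states. */
enum apm_state { APM_MIGRATED, APM_REARM, APM_ARMED };

struct apm_qp {
    int primary_port;     /* port currently carrying traffic */
    int alt_port;         /* 0 means no alternate path is loaded */
    enum apm_state state;
};

/*
 * Software step after a successful LAP/APR exchange: load a new
 * alternate path and request rearm. Only valid from MIGRATED.
 */
int apm_rearm(struct apm_qp *qp, int new_alt_port)
{
    assert(qp != NULL);
    if (qp->state != APM_MIGRATED || new_alt_port == qp->primary_port)
        return -1;
    qp->alt_port = new_alt_port;
    qp->state = APM_REARM;
    return 0;
}

/* Models the move from REARM to ARMED once the peer is ready as well. */
void apm_peer_ready(struct apm_qp *qp)
{
    assert(qp != NULL);
    if (qp->state == APM_REARM)
        qp->state = APM_ARMED;
}

/*
 * Primary path failure. If an alternate path is armed, it becomes the
 * new primary and 0 is returned; otherwise -1 (nothing to migrate to).
 */
int apm_path_failed(struct apm_qp *qp)
{
    assert(qp != NULL);
    if (qp->state != APM_ARMED)
        return -1;
    qp->primary_port = qp->alt_port;
    qp->alt_port = 0;
    qp->state = APM_MIGRATED;
    return 0;
}
```

Walking Test1 from the quoted mail through this model: start armed with primary port 1 and alternate port 2, call apm_path_failed() to move the primary to port 2, then apm_rearm(&qp, 1) followed by apm_peer_ready() to arm port 1 again. A second failure before the rearm returns -1, which in the model means no alternate path is loaded.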

VBabu

somenath wrote:

> hi Venkatesh:
>
> Two questions:
>
> 1. does re-enabling Migration (as defined in vol1 of ib spec in
>    17.2.8.1.4) work for you?
>    (I mean after the 1st path failure, you do lap/apr packet transfer)
>
> 2. What applications are you testing with?
>
> thanks, som.
>
> Venkatesh Babu wrote:
>
>> I have added a couple of patches to the OFED stack as described in
>> bug#160, bug#172, and bug#159 and with this successfully tested the
>> APM functionality, except one issue.
>>
>> Configuration:
>> 2 Nodes
>> CPU: AMD Opteron(tm) Processor 252 Dual processor
>> CA type: MT25208
>> Firmware version: 5.1.4
>> OS: CentOS release 4.2
>> IB: OFED 1.0
>>
>> 2 Flextronics 24 port switches
>>
>> Node1 Port1 connected to Switch1
>> Node1 Port2 connected to Switch2
>> Node2 Port1 connected to Switch1
>> Node2 Port2 connected to Switch2
>>
>> Node1: Active side of the RC QP
>> Node2: Passive side of the RC QP
>>
>> Test1:
>> Failover simulation on Node1
>> 1. Simulate the port1 failure, RC QP migrates the path to port2
>> 2. Simulate the port1 UP to rearm the alternate path from port1
>> 3. Simulate the port2 failure, RC QP migrates the path to port1
>> 4. Simulate the port2 UP to rearm the alternate path from port2
>>
>> Test2:
>> Real failover by manually pulling the cable
>> 1. Simulate the failover/failback by pulling cable of Node1 port1
>> 2. Simulate the failover/failback by pulling cable of Node1 port2
>> 3. Simulate the failover/failback by pulling cable of Node2 port1
>> 4. Simulate the failover/failback by pulling cable of Node2 port2
>>
>>
>> ISSUE:
>> If I pull both the cables then there are no paths to the
>> destination, so the RC QP connection is supposed to tear down. But it
>> is not working.
>>
>> 1. Create a RC QP and load both primary and alternate path
>>    (I was setting rnr_retry_count = 6, retry_count = 6,
>>    packet_life_time field of struct ib_sa_path_rec to 15 and also tried
>>    with 12)
>> 2. Send some traffic over RC QP
>> 3. Disconnect the cable belonging to the primary path
>> 4. It smoothly fails over to alternate path and it becomes primary path.
>>    No effect on the traffic on that RC QP
>> 5. Remove the second cable belonging to the new primary path.
>> 6. Obviously traffic stops since there are no paths to the
>>    destination. But for the outstanding WRs in the RC QP I don't get any
>>    callback from the verbs layer describing whether it succeeded or
>>    failed due to some error like IB_WC_RETRY_EXC_ERR.
>>    When I query the RC QP properties it still shows that it is in
>>    IB_QPS_RTS state.
>>
>>
>> Without APM functionality it behaves correctly -
>> 1. Create a RC QP and load only primary path
>>    (I was setting rnr_retry_count = 6, retry_count = 6,
>>    packet_life_time field of struct ib_sa_path_rec to 15 and also tried
>>    with 12)
>> 2. Send some traffic over RC QP
>> 3. Disconnect the cable belonging to the primary path
>> 4. Obviously traffic stops since there are no paths to the
>>    destination. For the outstanding WRs in the RC QP I do get a callback
>>    from the verbs layer describing the first WR that it failed due to
>>    error IB_WC_RETRY_EXC_ERR and for all other WRs I get
>>    IB_WC_WR_FLUSH_ERR.
>>    I will close this RC QP.
>>
>> VBabu
>>
>> Date: Mon, 16 Oct 2006 14:03:50 -0700
>> From: "Sean Hefty" <[EMAIL PROTECTED]>
>> Subject: Re: [openib-general] APM support in openib stack
>> To: [EMAIL PROTECTED]
>> Cc: [email protected]
>> Message-ID: <[EMAIL PROTECTED]>
>> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>>
>> somenath wrote:
>>
>>>>>> Doesn't ib_cm_init_qp_attr() set this for you?
>>>>
>>>> No, it doesn't. it returns me
>>>> attr_mask= 0x12d181
>>>> port=0x0 alt_port=0x0
>>
>> Okay - there was a fix to the cm.c file (svn rev 8267) that added
>> setting the alternate port number when initializing the QP
>> attributes. Apparently that fix did not make it into the release
>> that you're using.
>>
>> - Sean

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
