On Fri, 2006-06-23 at 18:20 +0530, Pradipta Kumar Banerjee wrote:
> Steve Wise wrote:
> > The goal of adding the return codes was so that the rping program could
> > exit with a status indicating success or failure. Every rping run
> > results in a DISCONNECT event, so I don't think we want to treat that
> > case as an error.
> A DISCONNECT event will be generated when the connection is closed or in
> case of some error (like CCAE_LLP_CONNECTION_LOST, CCAE_BAD_CLOSE in case
> of the Ammasso driver etc).
>
You'll also get the DISCONNECT event when one side finishes the rping loops and calls rdma_disconnect(). So receiving that event isn't necessarily an error...

> >
> > Also, can you explain why this fixes Amith's problem, which sounded like
> > a process was hanging?
> >
> On debugging I found that the main thread was blocked in ibv_destroy_cq(),
> cm_thread was blocked in rdma_get_cm_event->write() and cq_thread was blocked
> in ibv_get_cq_event->read().
> Taking the return value of the DISCONNECT event into consideration forcefully
> killed the process.
> On delving deeper into this problem, I think that there is more to this rping
> hang. Let me work on this further.
>

I think rping needs some coordination among these threads about when they should be killed.

> On a related note, I noticed another rping hang in the following case:
> - Start rping as a client without first starting an rping server.
> - If you are lucky, the first run itself will result in the 'lt-rping'
> process in 'D' state. If not, repeating the procedure will result in the hang.
>
> This is the output:
>
> cq completion failed status 5
> wait for CONNECTED state 10
> connect error -1
>
> Thanks,
> Pradipta.
>
> >
> > Thanks,
> >
> > Steve.
> >
> > On Fri, 2006-06-23 at 00:53 +0530, Pradipta Kumar Banerjee wrote:
> >> Hi,
> >> Please ignore the earlier mail. There were some problems with the mailer.
> >> Here is the new one.
> >>
> >> This patch fixes the problem as reported by Amith.
> >>
> >> Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>
> >>
> >> ---
> >>
> >> Index: rping.c
> >> =============================================================================
> >> --- rping.c.org 2006-06-23 00:22:17.000000000 +0530
> >> +++ rping.c 2006-06-23 00:39:06.000000000 +0530
> >> @@ -215,6 +215,7 @@ static int rping_cma_event_handler(struc
> >> case RDMA_CM_EVENT_DISCONNECTED:
> >> fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ?
> >> "server" : "client");
> >> sem_post(&cb->sem);
> >> + ret = -1;
> >> break;
> >>
> >> case RDMA_CM_EVENT_DEVICE_REMOVAL:
> >
> >
> > _______________________________________________
> > openib-general mailing list
> > [email protected]
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
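The thread makes two points. First, an orderly DISCONNECT ends every rping run and should not count as a failure. Second, helper threads need to be stopped before the resources they block on are torn down. Both can be sketched in a small standalone C program.

This is a hypothetical illustration, not rping code:
- The demo_* names and the demo_cm_event enum are invented stand-ins for librdmacm's event types.
- Classifying connect errors and device removal as failures is an assumption.
- A thread blocked in read() on a pipe stands in for cq_thread blocked in ibv_get_cq_event().

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-ins for librdmacm event types; real code would
 * use enum rdma_cm_event_type from <rdma/rdma_cma.h>. */
enum demo_cm_event {
	DEMO_EVENT_ESTABLISHED,
	DEMO_EVENT_DISCONNECTED,
	DEMO_EVENT_CONNECT_ERROR,
	DEMO_EVENT_DEVICE_REMOVAL,
};

/* An orderly disconnect ends every run, so it is not a failure.
 * Only (assumed) genuine error events yield a nonzero status. */
static int demo_event_status(enum demo_cm_event ev)
{
	switch (ev) {
	case DEMO_EVENT_ESTABLISHED:
	case DEMO_EVENT_DISCONNECTED:
		return 0;
	case DEMO_EVENT_CONNECT_ERROR:
	case DEMO_EVENT_DEVICE_REMOVAL:
	default:
		return -1;
	}
}

/* Stand-in for cq_thread: blocks in read() on a pipe that never gets
 * data. read() is a POSIX cancellation point, so pthread_cancel()
 * can wake it. */
static void *demo_blocking_thread(void *arg)
{
	int fd = *(int *)arg;
	char c;

	for (;;) {
		if (read(fd, &c, 1) <= 0)
			break;
	}
	return NULL;
}

/* Ordered shutdown: cancel the helper, join it, and only then destroy
 * the resource it was blocked on (here, the pipe). Returns 0 if the
 * helper was actually cancelled before teardown. */
static int demo_shutdown(void)
{
	int fds[2];
	pthread_t tid;
	void *res;

	if (pipe(fds))
		return -1;
	if (pthread_create(&tid, NULL, demo_blocking_thread, &fds[0])) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	pthread_cancel(tid);               /* wake the thread out of read() */
	if (pthread_join(tid, &res))       /* wait until it has really exited */
		return -1;

	close(fds[0]);                     /* safe: no thread uses it anymore */
	close(fds[1]);
	return res == PTHREAD_CANCELED ? 0 : -1;
}
```

Build with -pthread. With this sketch, demo_event_status(DEMO_EVENT_DISCONNECTED) returns 0, so a clean run can still exit with a success status. demo_shutdown() returns 0 once the helper has been cancelled and joined.

The design choice is the ordering: cancel, join, then destroy. That way no thread is still inside a blocking call on a resource the main thread is tearing down. This appears to resemble the hang described above, where the main thread sat in ibv_destroy_cq() while cq_thread was still in ibv_get_cq_event().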
