On Fri, 2006-06-23 at 18:20 +0530, Pradipta Kumar Banerjee wrote:
> Steve Wise wrote:
> > The goal of adding the return codes was so that the rping program could
> > exit with a status indicating success or failure.  Every rping run
> > results in a DISCONNECT event, so I don't think we want to treat that
> > case as an error.
> A DISCONNECT event will be generated when the connection is closed or in
> case of some error (like CCAE_LLP_CONNECTION_LOST or CCAE_BAD_CLOSE in the
> case of the Ammasso driver, etc.).
> > 

You'll also get the DISCONNECT event when one side finishes the rping
loops and calls rdma_disconnect().  So receiving that event isn't
necessarily an error...
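
As a rough sketch of the distinction (this is not the actual rping code; the
enum values, struct and function names below are made up for illustration,
and the real handler switches on the librdmacm event type), the handler
could consult the current state before deciding whether a DISCONNECT is a
failure:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the rping state and the CM event types. */
enum sketch_state { SK_CONNECTED, SK_RDMA_DONE, SK_ERROR };
enum sketch_event { SK_EV_ESTABLISHED, SK_EV_DISCONNECTED };

struct sketch_cb {
	enum sketch_state state;
	int server;
};

/* Return 0 for an expected event, -1 for one that should fail the run. */
int sketch_event_handler(struct sketch_cb *cb, enum sketch_event ev)
{
	assert(cb != NULL);

	switch (ev) {
	case SK_EV_ESTABLISHED:
		cb->state = SK_CONNECTED;
		return 0;
	case SK_EV_DISCONNECTED:
		fprintf(stderr, "%s DISCONNECT EVENT...\n",
			cb->server ? "server" : "client");
		/* A disconnect after the loops completed is the normal exit path. */
		if (cb->state == SK_RDMA_DONE)
			return 0;
		/* Otherwise the peer went away mid-test: report failure. */
		cb->state = SK_ERROR;
		return -1;
	}
	return -1;
}
```

With something like this, the exit status would only be non-zero when the
disconnect arrives before the test has finished.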


> > 
> > Also, can you explain why this fixes Amith's problem, which sounded like
> > a process was hanging?
> > 
> On debugging I found that the main thread was blocked in ibv_destroy_cq(),
> cm_thread was blocked in rdma_get_cm_event->write() and cq_thread was
> blocked in ibv_get_cq_event->read().
> Taking the return value of the DISCONNECT event into consideration forcefully
> killed the process.
> On delving deeper into this problem, I think that there is more to this rping 
> hang. Let me work on this further.
> 

I think rping needs some coordination on these threads and when they
should be killed. 
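
One possible shape for that coordination, sketched with plain pthreads
rather than the verbs calls: the main thread cancels and joins the worker
before it tears down whatever the worker blocks on. Everything here is an
assumption for illustration. The sketch_* names are invented, and the pipe
merely stands in for the completion channel that cq_thread reads from.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker context; the pipe plays the role of the fd the
 * CQ thread blocks on. */
struct sketch_worker {
	pthread_t thread;
	int pipefd[2];
};

static void *sketch_cq_thread(void *arg)
{
	struct sketch_worker *w = arg;
	char byte;

	/* read() is a cancellation point, so pthread_cancel() can interrupt it. */
	while (read(w->pipefd[0], &byte, 1) > 0)
		;
	return NULL;
}

int sketch_worker_start(struct sketch_worker *w)
{
	assert(w != NULL);
	if (pipe(w->pipefd))
		return -1;
	if (pthread_create(&w->thread, NULL, sketch_cq_thread, w)) {
		close(w->pipefd[0]);
		close(w->pipefd[1]);
		return -1;
	}
	return 0;
}

/* Cancel and join the worker, then release the resource it was blocked on.
 * Returns 0 if the thread ended by cancellation, 1 if it had already
 * returned on its own, and -1 on error. */
int sketch_worker_stop(struct sketch_worker *w)
{
	void *res;

	assert(w != NULL);
	if (pthread_cancel(w->thread))
		return -1;
	if (pthread_join(w->thread, &res))
		return -1;
	close(w->pipefd[0]);
	close(w->pipefd[1]);
	return res == PTHREAD_CANCELED ? 0 : 1;
}
```

The point of the ordering is that teardown only starts once the join has
returned, so nothing is destroyed while another thread is still blocked on
it.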

> On a related note - I noticed another rping hang in the following case
> - Start the rping as a client without first starting an rping server
> - If you are lucky the first run itself will result in the 'lt-rping' process
> in 'D' state. If not, repeating the procedure will result in the hang.
> 
> This is the output.
> 
> cq completion failed status 5
> wait for CONNECTED state 10
> connect error -1
> 
> Thanks,
> Pradipta.
> 
> 
> > 
> 
> > Thanks,
> > 
> > Steve.
> > 
> > 
> > 
> > On Fri, 2006-06-23 at 00:53 +0530, Pradipta Kumar Banerjee wrote:
> >> Hi,
> >>  Please ignore the earlier mail. There were some problems with the mailer.
> >> Here is the new one.
> >>
> >> This patch fixes the problem as reported by Amith.
> >>
> >> Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>
> >>
> >> ---
> >>
> >> Index: rping.c
> >> =============================================================================
> >> --- rping.c.org    2006-06-23 00:22:17.000000000 +0530
> >> +++ rping.c        2006-06-23 00:39:06.000000000 +0530
> >> @@ -215,6 +215,7 @@ static int rping_cma_event_handler(struc
> >>    case RDMA_CM_EVENT_DISCONNECTED:
> >>            fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client");
> >>            sem_post(&cb->sem);
> >> +          ret = -1;
> >>            break;
> >>  
> >>    case RDMA_CM_EVENT_DEVICE_REMOVAL:
> > 
> > 
> > _______________________________________________
> > openib-general mailing list
> > [email protected]
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > To unsubscribe, please visit 
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > 


