Steve Wise wrote:
> The goal of adding the return codes was so that the rping program could
> exit with a status indicating success or failure.  Every rping run
> results in a DISCONNECT event, so I don't think we want to treat that
> case as an error.
A DISCONNECT event is generated when the connection is closed normally, and also
when an error occurs (for example CCAE_LLP_CONNECTION_LOST or CCAE_BAD_CLOSE with
the Ammasso driver).
> 
> 
> Also, can you explain why this fixes Amith's problem, which sounded like
> a process was hanging?
> 
While debugging, I found that the main thread was blocked in ibv_destroy_cq(),
cm_thread was blocked in rdma_get_cm_event() -> write(), and cq_thread was
blocked in ibv_get_cq_event() -> read().
Acting on the -1 now returned for the DISCONNECT event forcibly kills the
process, so the hang is no longer seen.
Digging deeper, I think there is more to this rping hang. Let me work on it
further.

On a related note, I noticed another rping hang in the following case:
- Start rping as a client without first starting an rping server.
- If you are lucky, the first run itself leaves the 'lt-rping' process in 'D'
  state (uninterruptible sleep). If not, repeating the procedure will result
  in the hang.

This is the output:

cq completion failed status 5
wait for CONNECTED state 10
connect error -1

Thanks,
Pradipta.


> 

> Thanks,
> 
> Steve.
> 
> 
> 
> On Fri, 2006-06-23 at 00:53 +0530, Pradipta Kumar Banerjee wrote:
>> Hi,
>>  Please ignore the earlier mail. There were some problems with the mailer.
>> Here is the new one.
>>
>> This patch fixes the problem as reported by Amith.
>>
>> Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>
>>
>> ---
>>
>> Index: rping.c
>> =============================================================================
>> --- rping.c.org      2006-06-23 00:22:17.000000000 +0530
>> +++ rping.c  2006-06-23 00:39:06.000000000 +0530
>> @@ -215,6 +215,7 @@ static int rping_cma_event_handler(struc
>>      case RDMA_CM_EVENT_DISCONNECTED:
>>              fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client");
>>              sem_post(&cb->sem);
>> +            ret = -1;
>>              break;
>>  
>>      case RDMA_CM_EVENT_DEVICE_REMOVAL:
> 
> 
> _______________________________________________
> openib-general mailing list
> [email protected]
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 
> 

