> As you can see the crash occurred  because pDevice == NULL.
> 
> It looks like hCa and hQp are in CL_INITIALIZED state, so
> WmRegRemoveHandler or WmRegInit (error flow) probably did not put NULL in
> the pDevice yet.

Are we hitting an error flow?

> Is it possible that in WmRegInit the line:
> 
> pRegistration->pDevice = dev;
> 
> was not called yet, but the line:
> 
> ib_status = dev->IbInterface.reg_mad_svc(pRegistration->hQp, &svc,
> &pRegistration->hService);
> 
> was already executed and a mad was received?

This does look like a problem.  Moving the pDevice assignment up should fix any 
potential issue there, and I believe it's safe to do so.
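
To illustrate the ordering, here is a rough, single-threaded sketch. The types
and the mock_reg_mad_svc() helper are made-up stand-ins, not the real winmad or
ibal definitions, and the mock simply assumes that a MAD is delivered before
registration returns:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the winmad structures; not the real layout. */
typedef struct device {
    int put_mad_calls;          /* counts MADs handed back to the device */
} device_t;

typedef struct registration {
    device_t *pDevice;
    void     *hService;
} registration_t;

/* Mimics the receive path: with no service handle yet, return the MAD. */
static int recv_handler(registration_t *reg)
{
    if (reg->hService == NULL) {
        if (reg->pDevice == NULL)
            return -1;          /* the real code would dereference NULL here */
        reg->pDevice->put_mad_calls++;
    }
    return 0;
}

/* Mock registration that reports a MAD before it stores the handle. */
static int mock_reg_mad_svc(registration_t *reg, void **ph_svc)
{
    static int dummy_service;
    int rc = recv_handler(reg); /* assumed early callback */

    *ph_svc = &dummy_service;
    return rc;
}

/* Ordering under discussion: pDevice is assigned after registration. */
static int reg_init_original(registration_t *reg, device_t *dev)
{
    assert(dev != NULL);
    int rc = mock_reg_mad_svc(reg, &reg->hService);

    reg->pDevice = dev;
    return rc;
}

/* Proposed ordering: pDevice is valid before any callback can run. */
static int reg_init_reordered(registration_t *reg, device_t *dev)
{
    assert(dev != NULL);
    reg->pDevice = dev;
    return mock_reg_mad_svc(reg, &reg->hService);
}
```

In the first variant the early callback finds pDevice == NULL; in the second it
can return the MAD through the device. The sketch only shows the ordering and
does not model the threading of the real driver.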

I do have a concern that the code crashed here:

        if (reg->hService == NULL) {
                reg->pDevice->IbInterface.put_mad(pMad);        <---

This means that hService was NULL, but winmad still received a callback.  I can 
understand receiving a callback before reg_mad_svc() returns, but from this 
call:

        ib_status = dev->IbInterface.reg_mad_svc(pRegistration->hQp, &svc,
                                                 &pRegistration->hService);

hService should be set before any callback is invoked.  Looking at the code for 
reg_mad_svc, the service parameter is set at the end of the function.  So, it 
appears that ibal can begin reporting mads to the user before it has finished 
initializing the mad service, and may do so even in the case where reg_mad_svc 
fails.

We may need to move the call to __mad_disp_reg() to after the assignment of 
*ph_mad_svc in reg_mad_svc.  I don't know if it's safe to make that change, 
however.
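
As a sketch of what that reordering would look like, under the assumption that
the dispatcher may deliver a MAD as soon as it is registered: the names below
(client_t, mock_disp_reg, the two reg_mad_svc_* variants) are hypothetical
stand-ins, not the ibal code, and this says nothing about whether the change is
safe in ibal itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the MAD service object. */
typedef struct mad_svc { int id; } mad_svc_t;

/* Caller-side state: the handle slot that the receive callback inspects. */
typedef struct client {
    mad_svc_t *hService;
    int        callbacks_without_handle;
} client_t;

static mad_svc_t g_svc = { 1 };

/* Receive callback: records whether it ran before the handle was set. */
static void client_recv_cb(client_t *c)
{
    if (c->hService == NULL)
        c->callbacks_without_handle++;
}

/* Stand-in for __mad_disp_reg(): assumed to deliver a MAD immediately. */
static int mock_disp_reg(client_t *c)
{
    client_recv_cb(c);
    return 0;
}

/* Ordering as described above: the handle is written at the very end. */
static int reg_mad_svc_current(client_t *c, mad_svc_t **ph_mad_svc)
{
    assert(ph_mad_svc != NULL);
    int rc = mock_disp_reg(c);

    if (rc != 0)
        return rc;
    *ph_mad_svc = &g_svc;
    return 0;
}

/* Suggested ordering: publish the handle, then register with the dispatcher. */
static int reg_mad_svc_reordered(client_t *c, mad_svc_t **ph_mad_svc)
{
    assert(ph_mad_svc != NULL);
    *ph_mad_svc = &g_svc;

    int rc = mock_disp_reg(c);
    if (rc != 0)
        *ph_mad_svc = NULL;     /* undo so a failed call leaves no handle */
    return rc;
}
```

With the handle written first, the callback in this mock never sees a NULL
hService. The real code is concurrent, so the actual fix would also have to
consider locking and what the error path needs to clean up.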

- Sean