> We got a BSOD in Opensm - 10D, {b, 76157f00, 0, 8811d008}.
>
> Could you take a look ?
I looked into this, and I can't say that I see anything wrong in the code. :(
> Seems like BSOD has been caused by a race between the main and MAD reading
> threads of Opensm.
>
> The main thread has already closed the port and is now found in
> osm_subn_destroy():
>
> opensm_main
> ...
> osm_mad_pool_destroy(&p_osm->mad_pool);
> osm_vendor_delete(&p_osm->p_vendor); // port release
> osm_subn_destroy(&p_osm->subn); // the thread is found here now
>
> The reading thread is still in action:
>
> opensm!umad_receiver
> libibumad!umad_recv
> ...
> winmad!WmIoRead
> winmad!WmProviderRead
> WdfObjectAcquireLock(pProvider->ReadQueue); // BSOD
>
> An attempt to dump ReadQueue with !wdfqueue fails.
>
> Seems like pProvider is already released. But there are no checks of its
> validity in WmProviderRead().
The pProvider->Ref is set to 0, which strongly suggests that the provider has
been released.
> Possible solution:
>
> Maybe WmIoRead() should check that the Provider is not being released and
> take a reference, while WmProviderRemoveHandler() should wait for this
> reference to be removed?
The provider object is (supposed to be) bound to the lifetime of the open
ControlDevice file. It is initialized in the EvtFileCreate callback and
released in the EvtFileCleanup callback. According to the MS documentation,
the EvtFileCleanup is called after the last handle to the file has been closed.
My assumption was that this meant that the file is no longer accessible for
any other access (ioctls, reads, or writes).
There is a vague note in the documentation that states: "(Because of
outstanding I/O requests, this handle might not have been released.)" I have
no idea what exactly this means. If it means that Windows may invoke calls on
a file during or after calling the EvtFileCleanup, then Windows is seriously
stupid.
As a simple test, we can *try* adding checks in wm_driver.c in
WmIoDeviceControl(), WmIoRead(), and WmIoWrite() that do something like:
    if (prov->Ref == 0) {
        WdfRequestComplete(Request, STATUS_WINDOWS_IS_STUPID);
        return;
    }
(A better solution may be to call WmProviderGet() / WmProviderPut(), with
WmProviderGet() returning whether or not we actually obtained the provider.)
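To make the Get/Put idea a bit more concrete, here is a rough user-space sketch of a "get unless already zero" reference count. It is not winmad code: provider_t, provider_try_get() and provider_put() are made-up names, and it uses C11 atomics rather than the kernel interlocked routines. It also still reads the provider memory, so it only helps if that memory is guaranteed to outlive the check.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the provider; only the reference count matters. */
typedef struct provider {
    atomic_long ref;
} provider_t;

/* Take a reference unless the count has already dropped to zero.
 * Returns true if the caller now holds a reference. */
static bool provider_try_get(provider_t *prov)
{
    long old = atomic_load(&prov->ref);

    while (old > 0) {
        /* On failure 'old' is refreshed with the current count and we retry. */
        if (atomic_compare_exchange_weak(&prov->ref, &old, old + 1))
            return true;
    }
    return false; /* provider is being (or has been) released */
}

/* Drop a reference. Returns true if this was the last one, in which case
 * the caller would be responsible for the final teardown. */
static bool provider_put(provider_t *prov)
{
    return atomic_fetch_sub(&prov->ref, 1) == 1;
}
```

With something like this, the I/O callbacks would complete the request with an error when provider_try_get() fails, and call provider_put() when they are done with the provider.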
What we really need to determine is whether Windows will invoke calls on a file
during or after calling the cleanup event callback, but I have no idea how we
can know that. And if it does, is it a 'feature' or a bug? If Windows does
not do this, then the check above isn't a safe fix, since it depends on the
prov memory being accessible.
- Sean