A couple of replies in one email:
Garrett D'Amore wrote: > I requested that the team not create a new DDI_QUIESCE command. The > expansion of detach(9e) and attach(9e) with new interfaces was, IMO, a > big mistake. The biggest problem is that it is impossible to tell (at > least easily) if a driver supports the new interfaces or not. By adding > a new devops entry point, it becomes nothing more than a matter of > routine inspection. (The current design is one of the major limiting > factors in suspend-to-ram. There is no predictability about whether > suspend will work or not -- the only way to know is to actually try it.) I guess inspection is a good reason to create a new entry in dev_ops, but it's still not a guarantee that quiesce(9e) will succeed. You could require the driver to export a property if it supports quiesce(9e) and get the same effect you are looking for without expanding dev_ops if you wanted to. There's nothing wrong with expanding dev_ops, but we don't have to expand it if we don't want to. Sherry Moore wrote: > Hi David, > >> Specifically, can open(9e) or any other entry point be called >> while the driver is in quiesce(9e)? I would think we would want >> to guard against that, and if not, the driver needs to know that. >> If the intent is that nothing else will be called while the >> driver is in quiesce(9e), please state that guarantee in the >> man page. > > Technically, yes; architecturally no. Since we're basically at the > same place as we would be when the reset() entry points are called, the > same sort of assumption hold here also. > > I will add the following to the man page: > > "open(9E) and other entry points should not be called while the driver > is in quiesce(9E)." I think the man page needs to provide a guarantee that the driver does not have to guard against open(9e) and close(9e) being called once the framework calls quiesce(9e), so that has to be worded appropriately. (Not a "should" for the implementation, but written as a normative requirement that the driver can rely on.) Something like: open(9e), close(9e), attach(9e), detach(9e) and other driver entry points shall not be called when quiesce(9e) is called. I'm not sure if that's enough. What if open, close, etc are in progress when the framework decides to call quiesce. Is that possible? So maybe it has to be worded even stronger such that the guarantee is that open, close, attach, detach and other driver entry points will not be active or called when the framework calls quiesce(9e). That still leaves the question of interrupt handlers, timers, callbacks, etc that could be called when the driver is in quiesce(9e) or might be active when the quiesce(9e) is called. This needs a bit of work, IMO. > >> Second, are there open instances of the driver when quiesce(9e) >> is called, or does the framework guarantee to close all open >> instances of the device first? > > Yes, as long as the driver can guarantee that no more asynchronous > writes occur to system memory as operation to the open instances. > > I will add that to the man page as well. Bear with me on this. A driver has scheduled the device to do a bunch of io by handing off a list of requests to the device. The driver probably uses completion interrupts or doorbells to the application directly to signal completions or errors from the device. Nothing has been done to tell the driver to stop doing that when we call quiesce(9e), so the device may still be writing to memory and sending interrupts. Will those interrupts be taken when the driver is in quiesce(9e)? If so, that's why the driver needs some sort of lock to guard against race conditions with it's interrupt handler. (Also, timers, callbacks, etc. Basically any interface that can asynchronously call into the driver while any other entry point is active.) There's always the possibility of a race inside the driver if any interrupt can be taken when we start to call quiesce(9e). Also, there's timer interrupts, soft interrupts, etc. Basically any sort of async event that the driver setup before quiesce(9e) was called. This needs some work also. >> Also, the phrase "guarantee on disk data integrety" is not >> well-defined from a driver perspective. If I put my driver hat >> on, I really have no idea what that requirement is. Do you expect >> the driver writer to do something specific in that case, or just >> reset the controller, dropping any scheduled or in-flight >> transactions? How does it know if the disk data integrety requirement >> can be met. > > Fair enough. I will remove the data integrity clause. From Fast > Reboot's perspective, it's preferred that the driver drop any scheduled > or in-flight transactions. Removing it is ok, but I'm not sure that's sufficient. We really have to be very explicit about what we expect the device to do and what it can expect in quiesce(9e). > >> Finally, I'm not sure if you want to say anything about DMA >> mappings, etc. I guess they can be thrown away because the >> driver for any iommu in the path (yes, x86 has them too) will >> end up being reset by its driver? > > OK. How about "asynchronous writes to system memory" instead? Not sure. Basically, I think what you are trying to create is an interface to the driver that gets the device into a quiet reset state. But doing that in the driver may be more complicated than you think if it has to worry about taking interrupts, soft ints, timers, callbacks or any other async event while it's in quiesce(9e). > >> How will console output work during this process? > > Console output will continue to work until the chain of drivers for the > console are quiesced, which will be performed last. What happens after that and before we're ready to do console output again? What if the process fails? There's no way to display a message on the console? What is the behavior of this stuff if the process fails? I suppose the fallback could be to do a normal reset, as long as we end up booting the same thing that would have been booted had this process succeeded. Is that part of the interface? >> Also, which ddi services is the driver permitted to call and not >> permitted to call as part of it's quiesce(9e) implementation? > > All DDI interfaces can be called except those prevented by > devtree_freeze(). So I can create timers, add interrupts, wait for locks, etc? > >> I guess what you are doing is loading the new ramdisk or kernel >> into memory and then calling quiesce(9e)? How would that work >> in a panic situation? (Yes, I saw that's part of phase 2, but >> it still needs to fit these interfaces.) > > We will probably require drivers to implement quiesce() in a lock-free, > no-memory-allocation, do-the-bare-minimum manner to poke only the few > registers to stop all asynchronous writes to system memory and > interrupt generate. That's why the quisce() interface has an "arg" > field... The quiesce(9e) man page is one of the interfaces that PSARC is being asked to approve with this case, right? The man page doesn't say that. I think this project needs a bit of work before it can just time out as a fast-track. It should be possible for you to do this, but the interfaces, especially quiesce(9e) and what the driver can do and is supposed to do needs some more thought, IMO. It seems like what you want to do is stop interrupt dispatch, stop callback threads, timer callbacks, soft ints, etc, and then just call into the driver to tell it to stop whatever it might be doing and get the device into a quiet/reset state. You have to guarantee in the interface all of the above conditions in order to implement something like that and show the driver writer that they don't need to use locks or worry about any other part of the driver becoming active when it's in quiesce(9e). Once the driver returns successfully from quiesce(9e), it basically guarantees that the device will no longer access system memory (we need more there.) I think we also have to guarantee that once quiesce(9e) is called, we won't call into the driver in any e.p. or callback, interrupt, timer, etc anymore. -David
