Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 26 Apr 2007 10:58:45 -0400 (EDT), Alan Stern <[EMAIL PROTECTED]> wrote: > > > > The remove() method must also unset driver_data. > > > > > > It doesn't really have to. The driver core could do it. > > > > I think it is more consistent if the driver takes care of the fields > > specifically designed for its usage. > > Yes. However if the driver forgets to clear the field it shouldn't cause > a warning. After all, there won't be any harm; the next driver to bind > to the device will just overwrite the driver_data anyway. Agreed, but it is still a good practice and should be recommended. > > Yes. Especially since the "gone"-field may be contained in that > > embedding structure if the subsystem controls it. > > No, no! The "gone" flag must be in the private data structure. If it > were in a container of the device structure, then it could be overwritten > when a different driver binds to the device. Argl, thinko again. You're right, of course. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 26 Apr 2007, Cornelia Huck wrote: > On Wed, 25 Apr 2007 16:13:12 -0400 (EDT), > Alan Stern <[EMAIL PROTECTED]> wrote: > > > > Documentation/driver-model/lifetime-rules.txt? > > > > When (if) such a file is added, it should contain more than just these few > > paragraphs. > > This file may be a good idea in general. I'll see if I can come up with > something useful. Good. > > > The remove() method must also unset driver_data. > > > > It doesn't really have to. The driver core could do it. > > I think it is more consistent if the driver takes care of the fields > specifically designed for its usage. Yes. However if the driver forgets to clear the field it shouldn't cause a warning. After all, there won't be any harm; the next driver to bind to the device will just overwrite the driver_data anyway. > > If every driver follows this rule, we may not need the kobject->owner > > stuff. Although it wouldn't hurt to keep it for safety's sake. > > I'm still concerned about that problem. It is that there is simply no > guarantee that everybody released their reference before the module's > exit function returned (the driver itself can and must assure that it > drops its last reference before it finishes exit, but it just can't > control who else holds a reference). I'm much in favour of adding a bit > of code than to risk oopses about which the driver can't do anything > itself (plus, the race exists now, but the immediate disconnect will > involve auditing all drivers). I agree. > > > > The driver must restrict itself to reading (not > > > > writing!) the fields in the device structure. The > > > > only exception is that the driver may lock/unlock > > > > dev->sem. > > > > > > And it may decrease the reference count, of course :) > > > > :-) Actually this should be a little more general, since the device will > > generally be embedded in a larger structure created by the subsystem. The > > driver should also be able to acquire and release locks in that larger > > structure. > > Yes. Especially since the "gone"-field may be contained in that > embedding structure if the subsystem controls it. No, no! The "gone" flag must be in the private data structure. If it were in a container of the device structure, then it could be overwritten when a different driver binds to the device. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 25 Apr 2007 16:13:12 -0400 (EDT), Alan Stern <[EMAIL PROTECTED]> wrote: > > Documentation/driver-model/lifetime-rules.txt? > > When (if) such a file is added, it should contain more than just these few > paragraphs. This file may be a good idea in general. I'll see if I can come up with something useful. > > The remove() method must also unset driver_data. > > It doesn't really have to. The driver core could do it. I think it is more consistent if the driver takes care of the fields specifically designed for its usage. > It should never need to make that transition. The only routines in the > driver which get passed a pointer to the device (as opposed to a pointer > to the private data) should be the methods called by the driver core or > sysfs. Neither will make any calls to the driver after remove() has > returned, unless the driver does something wrong. OK, thinko on my side. > > Uhm. This would imply that a driver may never create a device itself. > > A trivial omission on my part -- "The driver must also retain a module > reference to the owner of the device (unless the driver is itself the > device's owner)." > > > However, the kobject->owner and module refcounting stuff should remove > > this restriction. > > If every driver follows this rule, we may not need the kobject->owner > stuff. Although it wouldn't hurt to keep it for safety's sake. I'm still concerned about that problem. It is that there is simply no guarantee that everybody released their reference before the module's exit function returned (the driver itself can and must assure that it drops its last reference before it finishes exit, but it just can't control who else holds a reference). I'm much in favour of adding a bit of code than to risk oopses about which the driver can't do anything itself (plus, the race exists now, but the immediate disconnect will involve auditing all drivers). > > > The driver must restrict itself to reading (not > > > writing!) the fields in the device structure. The > > > only exception is that the driver may lock/unlock > > > dev->sem. > > > > And it may decrease the reference count, of course :) > > :-) Actually this should be a little more general, since the device will > generally be embedded in a larger structure created by the subsystem. The > driver should also be able to acquire and release locks in that larger > structure. Yes. Especially since the "gone"-field may be contained in that embedding structure if the subsystem controls it. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 25 Apr 2007, Cornelia Huck wrote: > On Tue, 24 Apr 2007 15:38:12 -0400 (EDT), > Alan Stern <[EMAIL PROTECTED]> wrote: > > > We ought to make it explicitly clear that _all_ subsystems should behave > > this way. Maybe it isn't necessary to go as far as having device_del() > > call itself recursively; doing that would open up lots of possible races. > > But I think it would be a good idea to add a WARN_ON in device_del, right > > after the call to bus_remove_device(), that would be triggered if the > > device still had any children. > > If we decide that this should be policy, WARN_ON may be the least > invasive option. Yes. I have changed my mind regarding your earlier proposal; I now think it's best to make it "mandatory" for remove() to unregister all children. "Mandatory" in the sense that not doing so will provoke a big fat WARN_ON complaint. Having device_del() call itself recursively is not a good idea. It would destroy that guarantee that no one but the creator of a device will unregister it. > Should it be a possible option to move children to a different parent, > so that the requirement wouldn't be "unregister all children", but "no > children remain after remove() returns"? That might confuse udev, but otherwise it could be okay. > > It would also be good to document (but where?) some lifetime rules for > > device drivers. > > Documentation/driver-model/lifetime-rules.txt? When (if) such a file is added, it should contain more than just these few paragraphs. > > Something like this: > > > > When a driver's remove() method returns, the driver must no > > longer try to use the device it was just unbound from. The > > device may be physically gone, or a different driver may be > > bound to it. Most importantly, remove() should unregister > > all child devices created by the driver. > > s/should/must/, if we agree on the policy above. Yes. > The remove() method must also unset driver_data. It doesn't really have to. The driver core could do it. > > To accomplish all this safely, the driver should allocate a > > private data structure containing at least a "gone" flag and > > a mutex or spinlock for synchronization. Each time the driver > > needs to use the device, it should first lock the mutex or > > spinlock and check the "gone" flag. > > How should a driver make the device -> private data transition if it > may no longer have private data attached to the device? It should never need to make that transition. The only routines in the driver which get passed a pointer to the device (as opposed to a pointer to the private data) should be the methods called by the driver core or sysfs. Neither will make any calls to the driver after remove() has returned, unless the driver does something wrong. > > Ideally remove() should release all of the driver's references > > to the device, in accord with the "Immediate Detach" principle. > > However it is acceptable for the driver to retain a reference, > > provided it meets the following conditions: > > > > The reference must be dropped in a timely manner, > > such as when the release() methods for all child > > devices have run. > > > > The driver must also retain a module reference to > > the owner of the device. In practice this means the > > driver must contain static code references to the > > subsystem which created the device, since struct > > device doesn't have an "owner" field. > > Uhm. This would imply that a driver may never create a device itself. A trivial omission on my part -- "The driver must also retain a module reference to the owner of the device (unless the driver is itself the device's owner)." > However, the kobject->owner and module refcounting stuff should remove > this restriction. If every driver follows this rule, we may not need the kobject->owner stuff. Although it wouldn't hurt to keep it for safety's sake. > > The driver must restrict itself to reading (not > > writing!) the fields in the device structure. The > > only exception is that the driver may lock/unlock > > dev->sem. > > And it may decrease the reference count, of course :) :-) Actually this should be a little more general, since the device will generally be embedded in a larger structure created by the subsystem. The driver should also be able to acquire and release locks in that larger structure. On a related topic, I complained before about the problem with char device unregistration. Below is an example patch showing how the USB mechanism can be made safe. It should be easy to understand, even for people who aren't familiar with how the USB core works. Basically it amounts to replacing a spinlock with an rwsem, which can then be held throughout an entire call to an f_op->open() method. Without somethi
Re: [PATCH RFD] alternative kobject release wait mechanism
On Tue, 24 Apr 2007 15:38:12 -0400 (EDT), Alan Stern <[EMAIL PROTECTED]> wrote: > We ought to make it explicitly clear that _all_ subsystems should behave > this way. Maybe it isn't necessary to go as far as having device_del() > call itself recursively; doing that would open up lots of possible races. > But I think it would be a good idea to add a WARN_ON in device_del, right > after the call to bus_remove_device(), that would be triggered if the > device still had any children. If we decide that this should be policy, WARN_ON may be the least invasive option. Should it be a possible option to move children to a different parent, so that the requirement wouldn't be "unregister all children", but "no children remain after remove() returns"? > It would also be good to document (but where?) some lifetime rules for > device drivers. Documentation/driver-model/lifetime-rules.txt? > Something like this: > > When a driver's remove() method returns, the driver must no > longer try to use the device it was just unbound from. The > device may be physically gone, or a different driver may be > bound to it. Most importantly, remove() should unregister > all child devices created by the driver. s/should/must/, if we agree on the policy above. The remove() method must also unset driver_data. > To accomplish all this safely, the driver should allocate a > private data structure containing at least a "gone" flag and > a mutex or spinlock for synchronization. Each time the driver > needs to use the device, it should first lock the mutex or > spinlock and check the "gone" flag. How should a driver make the device -> private data transition if it may no longer have private data attached to the device? > Ideally remove() should release all of the driver's references > to the device, in accord with the "Immediate Detach" principle. > However it is acceptable for the driver to retain a reference, > provided it meets the following conditions: > > The reference must be dropped in a timely manner, > such as when the release() methods for all child > devices have run. > > The driver must also retain a module reference to > the owner of the device. In practice this means the > driver must contain static code references to the > subsystem which created the device, since struct > device doesn't have an "owner" field. Uhm. This would imply that a driver may never create a device itself. However, the kobject->owner and module refcounting stuff should remove this restriction. > The driver must restrict itself to reading (not > writing!) the fields in the device structure. The > only exception is that the driver may lock/unlock > dev->sem. And it may decrease the reference count, of course :) > How does that sound? Yes, we should have some documentation like that. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Sun, 22 Apr 2007, Greg KH wrote: > > Looking some more, kobject_get_path() is used for kobject renaming, > > uevent handling, and a little bit in the input core. None of these things > > should try to access a kobject after it has been del()ed. After all, it's > > no longer present in the filesystem so it doesn't _have_ a path. > > But we _have_ to have a full path at that time to tell userspace what > just went away. That is the main reason we enforce this (there were > tons of issues with scsi devices and this in the past which is what > caused us to enforce this.) The SCSI subsystem has undergone many, many changes since 2.6.0. In particular, it has implemented a full-fledged device-state model, complete with spinlock-protected state transitions. I'll have to check with James Bottomley, but I bet that SCSI now always unregisters all the children of a device before unregistering the device itself. For everyone: We ought to make it explicitly clear that _all_ subsystems should behave this way. Maybe it isn't necessary to go as far as having device_del() call itself recursively; doing that would open up lots of possible races. But I think it would be a good idea to add a WARN_ON in device_del, right after the call to bus_remove_device(), that would be triggered if the device still had any children. It would also be good to document (but where?) some lifetime rules for device drivers. Something like this: When a driver's remove() method returns, the driver must no longer try to use the device it was just unbound from. The device may be physically gone, or a different driver may be bound to it. Most importantly, remove() should unregister all child devices created by the driver. To accomplish all this safely, the driver should allocate a private data structure containing at least a "gone" flag and a mutex or spinlock for synchronization. Each time the driver needs to use the device, it should first lock the mutex or spinlock and check the "gone" flag. Ideally remove() should release all of the driver's references to the device, in accord with the "Immediate Detach" principle. However it is acceptable for the driver to retain a reference, provided it meets the following conditions: The reference must be dropped in a timely manner, such as when the release() methods for all child devices have run. The driver must also retain a module reference to the owner of the device. In practice this means the driver must contain static code references to the subsystem which created the device, since struct device doesn't have an "owner" field. The driver must restrict itself to reading (not writing!) the fields in the device structure. The only exception is that the driver may lock/unlock dev->sem. How does that sound? Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Mon, 23 Apr 2007, Cornelia Huck wrote: > On Sun, 22 Apr 2007 10:40:51 -0700, > Greg KH <[EMAIL PROTECTED]> wrote: > > > > Looking some more, kobject_get_path() is used for kobject renaming, > > > uevent handling, and a little bit in the input core. None of these > > > things > > > should try to access a kobject after it has been del()ed. After all, > > > it's > > > no longer present in the filesystem so it doesn't _have_ a path. > > > > But we _have_ to have a full path at that time to tell userspace what > > just went away. That is the main reason we enforce this (there were > > tons of issues with scsi devices and this in the past which is what > > caused us to enforce this.) > > What we need to ensure is that the parent device is kept at least until > all children, grandchildren and so on are done with their uevent needs. > This would imply it needed to stay as long as those children, > grandchildren, ... are still registered. Would it be save to suggest > that a ->remove callback would always need to unregister the children? > Then putting the parent reference at the end of kobject_del() (which is > after kobject_uevent() in kobject_unregister()) should be safe. Yes, this is the weak spot. For some reason I had assumed that it was illegal to unregister a device while it had registered children (just as it is illegal to rmdir a non-empty directory). If it isn't illegal, then perhaps we should arrange things so that device_del() will recursively call itself for all the device's children. > Question: What now? > > 1. Make it mandatory that all children must be unregistered when > device_del() returns. > > 2. Don't demand an empty directory in sysfs_drop_dentry(). I'm in favor of (1). But instead of making it mandatory, simply force it to be true by having device_del() call itself for all remaining children. For this to be safe, we also have to allow device_del() to be called multiple times (since the device's owner might not be aware that the core had already unregistered it). That's no problem; just add: if (!device_is_registered(dev)) return; to the start of the routine. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Sun, 22 Apr 2007 10:40:51 -0700, Greg KH <[EMAIL PROTECTED]> wrote: > > Looking some more, kobject_get_path() is used for kobject renaming, > > uevent handling, and a little bit in the input core. None of these things > > should try to access a kobject after it has been del()ed. After all, it's > > no longer present in the filesystem so it doesn't _have_ a path. > > But we _have_ to have a full path at that time to tell userspace what > just went away. That is the main reason we enforce this (there were > tons of issues with scsi devices and this in the past which is what > caused us to enforce this.) What we need to ensure is that the parent device is kept at least until all children, grandchildren and so on are done with their uevent needs. This would imply it needed to stay as long as those children, grandchildren, ... are still registered. Would it be save to suggest that a ->remove callback would always need to unregister the children? Then putting the parent reference at the end of kobject_del() (which is after kobject_uevent() in kobject_unregister()) should be safe. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Mon, Apr 23, 2007 at 03:40:21PM +0900, Tejun Heo wrote: > Hello, Dmitry. > > Dmitry Torokhov wrote: > > Isn't think a good thing? By decoupling the 2 layers we insulate them > > from changes in each other. This allows bug subsystems to concentrate > > on topics that important to them instead of worying about refcounting > > objects that are not directly interesting for the subsystem in > > question. > > I think the best thing would be make struct device's lifetime rules > simple enough such that it doesn't really matter to driver subsystems > and drivers can just do what they wanna do. I agree. > Also, separate struct device from the actual implementation has problem > in that struct device is widely used to refer to the device by many > layers drivers register devices to. Basically, you'll have to implement > immediate-disconnect between struct device and the actual > implementation. So, it just shifts the problem from struct device to > the place between struct device and actual implementation and I think > struct device itself is better place to deal with that than somewhere > inbetween it and driver private data. I also agree. > > Now for smaller subsystems it may make sense to embed stuct devices > > into subsystem objects and manage it all together. In fact input > > system does this but I think it is much simlpier than SCSI or IDE. > > Well, both SCSI and IDE heavily depend on struct device acting as 'base > class'. It's all over the place and almost a basic assumption about the > driver model. And that's how it should be. And your sysfs patches now make it a lot easier than before, and I can't thank you enough for doing that work. thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hello, Dmitry. Dmitry Torokhov wrote: > Isn't think a good thing? By decoupling the 2 layers we insulate them > from changes in each other. This allows bug subsystems to concentrate > on topics that important to them instead of worying about refcounting > objects that are not directly interesting for the subsystem in > question. I think the best thing would be make struct device's lifetime rules simple enough such that it doesn't really matter to driver subsystems and drivers can just do what they wanna do. Also, separate struct device from the actual implementation has problem in that struct device is widely used to refer to the device by many layers drivers register devices to. Basically, you'll have to implement immediate-disconnect between struct device and the actual implementation. So, it just shifts the problem from struct device to the place between struct device and actual implementation and I think struct device itself is better place to deal with that than somewhere inbetween it and driver private data. > Now for smaller subsystems it may make sense to embed stuct devices > into subsystem objects and manage it all together. In fact input > system does this but I think it is much simlpier than SCSI or IDE. Well, both SCSI and IDE heavily depend on struct device acting as 'base class'. It's all over the place and almost a basic assumption about the driver model. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Sat, Apr 21, 2007 at 05:36:51PM -0400, Alan Stern wrote: > On Fri, 20 Apr 2007, Greg KH wrote: > > > > Greg, do you know of anything in particular that depends on a kobjects > > > not > > > being released before their children are released? > > > > Yes, the whole driver model :) > > But anything in particular? Looking through the source code, I see > kobj->parent gets used mainly by kobject_get_path() and not by much else. Which is essencial for when a kobject is removed from the system and we need to tell userspace which kobject it was. > Looking some more, kobject_get_path() is used for kobject renaming, > uevent handling, and a little bit in the input core. None of these things > should try to access a kobject after it has been del()ed. After all, it's > no longer present in the filesystem so it doesn't _have_ a path. But we _have_ to have a full path at that time to tell userspace what just went away. That is the main reason we enforce this (there were tons of issues with scsi devices and this in the past which is what caused us to enforce this.) thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Fri, 20 Apr 2007, Greg KH wrote: > > Greg, do you know of anything in particular that depends on a kobjects not > > being released before their children are released? > > Yes, the whole driver model :) But anything in particular? Looking through the source code, I see kobj->parent gets used mainly by kobject_get_path() and not by much else. Looking some more, kobject_get_path() is used for kobject renaming, uevent handling, and a little bit in the input core. None of these things should try to access a kobject after it has been del()ed. After all, it's no longer present in the filesystem so it doesn't _have_ a path. So I don't see any immediate problem. A quick boot with my patch applied, during which I installed and removed various modules and hot-pluggable devices, didn't cause anything strange to happen. > When adding a new device, we always grab a reference to the parent > device so it can not go away before we do. > > Look at the last kobject_put(parent); in kobject_cleanup() which ensures > this. Yes, I know. The question is: What (if anything) is wrong with the parent going away first? So long as the parent remains present while the child is _registered_, who will care if the parent is deallocated after the child is unregistered but before it is released? And if there is any code which does care, don't you think we should be able to change it so that it doesn't? > Ick, no, I think this used to be the way things worked, but bad things > would end up happening, so we fixed it up to be the way things are > today. Read the comments for the changelog for this file for details. > > Specifically, look at commit 10921a8f1305b8ec97794941db78b825db5839bc > in the history.git repo which is almost exactly what you are proposing > to be reverted... Yes, it is. I had a little trouble finding it; the search facility in the gitweb system at git.kernel.org doesn't seem to work right. Who should I complain to about that? Anyway, the patch itself is available at http://marc.info/?l=linux-kernel&m=107116644617624&w=2 Here's what the changelog comment says: It fixes a kobject bug where the parent could be deleted before the child object, causing all sorts of badness later when we clean up the child object. It's been acked by Pat. Not terribly explicit. As far as I can tell, cleaning up the child object doesn't do much except to kfree() a few items and call the ktype's release() routine. For a struct device, the release() routine merely calls dev->release(), or dev->type->release(), or dev->class->dev_release() as the case may be. None of these should try to access the device's parent unless they made special arrangements to acquire a reference to it beforehand. Which is what we're trying to eliminate -- that's what immediate detach means. The change was made back in December 2003, for 2.6.0-test11. Since then the driver-model core and its users have evolved an awful lot. Perhaps reverting it now won't hurt anything. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Fri, 20 Apr 2007, Dmitry Torokhov wrote: > On 4/19/07, Alan Stern <[EMAIL PROTECTED]> wrote: > > > > Among the worst offenders are character devices. None of the subsystem > > providers offering char device registration performs immediate detach -- > > they are a lot like sysfs used to be. (In fact, they probably _can't_ > > provide it since read() or write() calls can block indefinitely in the > > lower-level driver; the subsystem may have no way to force them to > > fail-fast.) > > > That shoudl be doable - the read/write routines should check if device > is still alive and return immideately when device is dead. When > disconnecting device just wake up all readers and writers and they > should eventually fall off. Read and write don't matter so much; they can be handled easily enough in the driver. The real problem is the race between open() and unregister(). We rely on the low-level drivers to make them mutually exclusive, but it really should be done by the subsystem. More precisely, the subsystem should guarantee that all outstanding calls to open() have completed before unregister() returns. > Hmm, I guess am starting to think that using refcounting everywhere is > not a good idea. We are trying to have "immediate disconnect" behavior > and refcounting is an antithesis of it. Refcounting works well when it > is contained - register grabs reference; unregister puts it back; but > there is no passing references down between the layers. When device is > being removed it needs to signal downstream that it is gone and should > not be accessed anymore. And we need to do that anyway because if > device is really gone but its users ignore it they will get endless > stream of errors when trying to access it. "Should not be accessed" is correct, but there's nothing wrong in principle with acquiring a lock in the device structure, provided the driver does nothing else. And provided that the driver pins the subsystem module, so the device's release() routine will be called _before_ the subsystem can be unloaded. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Fri, 20 Apr 2007, Dmitry Torokhov wrote: > On 4/20/07, Alan Stern <[EMAIL PROTECTED]> wrote: > > > > Dmitry, in thinking things over some more I realized there's going to be a > > problem with the autosuspend support in USB. It has to do with the way a > > driver needs to prevent (or block) suspends from occurring while it is > > actively using a device. > > > > To understand this, you need to know that USB adds a pm_mutex to the > > device structure. It gets used for synchronizing all suspend- or > > resume-related activities. In particular, the driver's suspend() method > > is called with usbdev->pm_mutex held. > > > > The next idea is that a driver needs to synchronize its remove() method > > with other methods such as read() -- we mustn't allow read() to try and > > refer to the device after the driver has been unbound. Let's say the > > driver has unbind_mutex embedded in its private data structure for this > > purpose. > > > > Now consider what read() has to do. It needs to block suspends from > > occurring while it runs, and it needs to do an autoresume if the device > > was already suspended. So read() will look like this: > > > >mutex_lock(&private->unbind_mutex); > >if (private->gone) { > >ret = -ENODEV; > >goto done; > >} > >if (private->suspended) > >autoresume(private->usbdev); > >... > > done: > >mutex_unlock(&private->unbind_mutex); > >return ret; > > > > Meanwhile, the suspend() method needs to block while read() is running. > > So it will look like this: > > > >private->suspended = 1; > >mutex_lock(&private->unbind_mutex); > >/* Now the driver has quiesced */ > >mutex_unlock(&private->unbind_mutex); > > > > Here's the problem: The autoresume() call inside read() tries to acquire > > usbdev->pm_mutex while holding private->unbind_mutex. The suspend() > > method does the reverse, a locking-order violation. > > > > So far I haven't figured out any way to make this work. Do you have a > > suggestion? (Don't say read() should hold pm_mutex as well as > > unbind_mutex; that's no good -- although the reason is rather obscure.) > > > > First of all I think that I would merge pm and unbind mutex into one - > you also need to synchronize resume and suspend with bind and unbind > so you pretty much need to always acquire both of them. The USB core already insures that suspend and resume are mutually exclusive with bind and unbind, so that part doesn't matter. Besides, there's a more important reason for not merging the pm and unbind mutexes: The pm mutex lies in the usb_device structure but the unbind mutex lies in the private data structure. It has to be this way. The pm mutex is used by the USB core, which doesn't know anything about the private data. Conversely, the driver has to acquire the unbind mutex at times when it doesn't know whether or not it has already been unbound, so that mutex must go in the private data. Otherwise the driver might try to acquire the mutex -- thereby touching the usb_device -- after it was unbound, violating immediate detachment. > Second (and I admit I have not followed USB autoresume discussions, my > usb-devle backlog is over 5000 messages ;( ) is I am not sure why > woudl a read block autoresume. I can see write doing that but passive > reads should not affect device state. I was just using read() as an example. And it wouldn't block autoresume; it would _do_ an autoresume, thereby preventing autosuspend. Another possible arrangement would be to have open() do an autoresume (preventing autosuspend during the entire time the device file is held open) and have close() re-enable autosuspend. But the principle remains the same. I don't know, perhaps adding another mutex would work. I need to think about it some more. There's always the easy solution: Don't do a complete immediate detach. The driver could keep a reference to the device structure for an extended period, provided it always checks private->gone whenever it locks usbdev->sem. This would mean that the USB core might not be able to release usbdev until the lower-level driver was unloaded. But that's okay: Since the lower-level driver contains code references to routines in the USB core, the core would be pinned anyway. This example does show that implementing immediate detach comes with a price. It makes things more complicated than they would be otherwise. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Fri, Apr 20, 2007 at 11:40:39AM -0400, Alan Stern wrote: > Here's a patch to do what I mentioned earlier. Not tested -- it may > expose some existing bugs. It may even break something, but I'm not aware > of anything that depends on it explicitly. > > Greg, do you know of anything in particular that depends on a kobjects not > being released before their children are released? Yes, the whole driver model :) When adding a new device, we always grab a reference to the parent device so it can not go away before we do. Look at the last kobject_put(parent); in kobject_cleanup() which ensures this. > Index: usb-2.6/lib/kobject.c > === > --- usb-2.6.orig/lib/kobject.c > +++ usb-2.6/lib/kobject.c > @@ -192,12 +192,15 @@ void kobject_init(struct kobject * kobj) > > static void unlink(struct kobject * kobj) > { > + struct kobject *parent = kobj->parent; > + > if (kobj->kset) { > spin_lock(&kobj->kset->list_lock); > list_del_init(&kobj->entry); > spin_unlock(&kobj->kset->list_lock); > } > kobject_put(kobj); > + kobject_put(parent); > } > > /** > @@ -241,7 +244,6 @@ int kobject_shadow_add(struct kobject * > if (error) { > /* unlink does the kobject_put() for us */ > unlink(kobj); > - kobject_put(parent); > > /* be noisy on error issues */ > if (error == -EEXIST) > @@ -489,7 +491,6 @@ void kobject_cleanup(struct kobject * ko > { > struct kobj_type * t = get_ktype(kobj); > struct kset * s = kobj->kset; > - struct kobject * parent = kobj->parent; > > pr_debug("kobject %s: cleaning up\n",kobject_name(kobj)); > if (kobj->k_name != kobj->name) > @@ -505,7 +506,6 @@ void kobject_cleanup(struct kobject * ko > > if (s) > kset_put(s); > - kobject_put(parent); > } Ick, no, I think this used to be the way things worked, but bad things would end up happening, so we fixed it up to be the way things are today. Read the comments for the changelog for this file for details. Specifically, look at commit 10921a8f1305b8ec97794941db78b825db5839bc in the history.git repo which is almost exactly what you are proposing to be reverted... thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/20/07, Tejun Heo <[EMAIL PROTECTED]> wrote: Hello, Dmitry. Dmitry Torokhov wrote: >> Many drivers (at least all the SCSI/IDE ones) consider struct device as >> the base class of the devices those drivers implement. I don't think we >> can just consider those drivers to be wrong. > > I am not saying they are wrong I am just saying that driver core is > not where most work is done. Every subsystem has its own locking rules > and lifetime rules and they are what is important. Whether subsystem > uses embedding or referencing of struct devices is implementation > detail. > > What is the goal of driver core? I thought the main goal was to have > an uniform interface for device power management and presentation of > device tree to userspace. It has nothing to do with main purposes of > every individual subsystem - making some set of devices/services work. I think we're running in circle here. 1. The driver's lifetime rules matters but currently the driver model imposes reference counted model to each struct device. 2. You can decouple struct device completely from actual driver implementation. I agree with you that #2 is possible but just don't think it's the right thing to do. By making struct device independent from driver implementation doesn't help solving lifetime problems in drivers. It just insulates driver model from those. Isn't think a good thing? By decoupling the 2 layers we insulate them from changes in each other. This allows bug subsystems to concentrate on topics that important to them instead of worying about refcounting objects that are not directly interesting for the subsystem in question. Now for smaller subsystems it may make sense to embed stuct devices into subsystem objects and manage it all together. In fact input system does this but I think it is much simlpier than SCSI or IDE. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hello, Dmitry. Dmitry Torokhov wrote: >> Many drivers (at least all the SCSI/IDE ones) consider struct device as >> the base class of the devices those drivers implement. I don't think we >> can just consider those drivers to be wrong. > > I am not saying they are wrong I am just saying that driver core is > not where most work is done. Every subsystem has its own locking rules > and lifetime rules and they are what is important. Whether subsystem > uses embedding or referencing of struct devices is implementation > detail. > > What is the goal of driver core? I thought the main goal was to have > an uniform interface for device power management and presentation of > device tree to userspace. It has nothing to do with main purposes of > every individual subsystem - making some set of devices/services work. I think we're running in circle here. 1. The driver's lifetime rules matters but currently the driver model imposes reference counted model to each struct device. 2. You can decouple struct device completely from actual driver implementation. I agree with you that #2 is possible but just don't think it's the right thing to do. By making struct device independent from driver implementation doesn't help solving lifetime problems in drivers. It just insulates driver model from those. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/19/07, Alan Stern <[EMAIL PROTECTED]> wrote: Among the worst offenders are character devices. None of the subsystem providers offering char device registration performs immediate detach -- they are a lot like sysfs used to be. (In fact, they probably _can't_ provide it since read() or write() calls can block indefinitely in the lower-level driver; the subsystem may have no way to force them to fail-fast.) That shoudl be doable - the read/write routines should check if device is still alive and return immideately when device is dead. When disconnecting device just wake up all readers and writers and they should eventually fall off. Hmm, I guess am starting to think that using refcounting everywhere is not a good idea. We are trying to have "immediate disconnect" behavior and refcounting is an antithesis of it. Refcounting works well when it is contained - register grabs reference; unregister puts it back; but there is no passing references down between the layers. When device is being removed it needs to signal downstream that it is gone and should not be accessed anymore. And we need to do that anyway because if device is really gone but its users ignore it they will get endless stream of errors when trying to access it. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hi Tejun, On 4/20/07, Tejun Heo <[EMAIL PROTECTED]> wrote: Hello, Dmitry. Dmitry Torokhov wrote: > On 4/19/07, Cornelia Huck <[EMAIL PROTECTED]> wrote: >> On Thu, 19 Apr 2007 09:13:43 -0400, >> "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote: >> >> > Because they are managed by 2 different entities. the struct device >> > objects are managed by device core and driver-specific objects are >> > managed by their respective driver. >> >> Not sure if I understand you here. My view of this was always that the >> embedding object was kind of an extended device and that the relevant >> driver/subsystem managed it through the driver core infrastructure. >> > > I am not sure if I agree with this point of view. Driver (or > subsystem) provides an instance of struct device for the rest of the > system to iteract uniformly with (suspend/resume/tree > visualization/etc) i.e. struct device implement an interface for > subsystems. However most of the system use their own mechanisms to > manage their devices. They can rely on the driver core to a certain > degree but driver core is mostly a carries out helper functions, not > the meat. Many drivers (at least all the SCSI/IDE ones) consider struct device as the base class of the devices those drivers implement. I don't think we can just consider those drivers to be wrong. I am not saying they are wrong I am just saying that driver core is not where most work is done. Every subsystem has its own locking rules and lifetime rules and they are what is important. Whether subsystem uses embedding or referencing of struct devices is implementation detail. What is the goal of driver core? I thought the main goal was to have an uniform interface for device power management and presentation of device tree to userspace. It has nothing to do with main purposes of every individual subsystem - making some set of devices/services work. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/20/07, Alan Stern <[EMAIL PROTECTED]> wrote: Dmitry, in thinking things over some more I realized there's going to be a problem with the autosuspend support in USB. It has to do with the way a driver needs to prevent (or block) suspends from occurring while it is actively using a device. To understand this, you need to know that USB adds a pm_mutex to the device structure. It gets used for synchronizing all suspend- or resume-related activities. In particular, the driver's suspend() method is called with usbdev->pm_mutex held. The next idea is that a driver needs to synchronize its remove() method with other methods such as read() -- we mustn't allow read() to try and refer to the device after the driver has been unbound. Let's say the driver has unbind_mutex embedded in its private data structure for this purpose. Now consider what read() has to do. It needs to block suspends from occurring while it runs, and it needs to do an autoresume if the device was already suspended. So read() will look like this: mutex_lock(&private->unbind_mutex); if (private->gone) { ret = -ENODEV; goto done; } if (private->suspended) autoresume(private->usbdev); ... done: mutex_unlock(&private->unbind_mutex); return ret; Meanwhile, the suspend() method needs to block while read() is running. So it will look like this: private->suspended = 1; mutex_lock(&private->unbind_mutex); /* Now the driver has quiesced */ mutex_unlock(&private->unbind_mutex); Here's the problem: The autoresume() call inside read() tries to acquire usbdev->pm_mutex while holding private->unbind_mutex. The suspend() method does the reverse, a locking-order violation. So far I haven't figured out any way to make this work. Do you have a suggestion? (Don't say read() should hold pm_mutex as well as unbind_mutex; that's no good -- although the reason is rather obscure.) First of all I think that I would merge pm and unbind mutex into one - you also need to synchronize resume and suspend with bind and unbind so you pretty much need to always acquire both of them. Second (and I admit I have not followed USB autoresume discussions, my usb-devle backlog is over 5000 messages ;( ) is I am not sure why woudl a read block autoresume. I can see write doing that but passive reads should not affect device state. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Here's a patch to do what I mentioned earlier. Not tested -- it may expose some existing bugs. It may even break something, but I'm not aware of anything that depends on it explicitly. Greg, do you know of anything in particular that depends on a kobjects not being released before their children are released? This is meant to be used without any of the other changes Cornelia and I have posted. Although it might not hurt to have them all present... Alan Stern Index: usb-2.6/lib/kobject.c === --- usb-2.6.orig/lib/kobject.c +++ usb-2.6/lib/kobject.c @@ -192,12 +192,15 @@ void kobject_init(struct kobject * kobj) static void unlink(struct kobject * kobj) { + struct kobject *parent = kobj->parent; + if (kobj->kset) { spin_lock(&kobj->kset->list_lock); list_del_init(&kobj->entry); spin_unlock(&kobj->kset->list_lock); } kobject_put(kobj); + kobject_put(parent); } /** @@ -241,7 +244,6 @@ int kobject_shadow_add(struct kobject * if (error) { /* unlink does the kobject_put() for us */ unlink(kobj); - kobject_put(parent); /* be noisy on error issues */ if (error == -EEXIST) @@ -489,7 +491,6 @@ void kobject_cleanup(struct kobject * ko { struct kobj_type * t = get_ktype(kobj); struct kset * s = kobj->kset; - struct kobject * parent = kobj->parent; pr_debug("kobject %s: cleaning up\n",kobject_name(kobj)); if (kobj->k_name != kobj->name) @@ -505,7 +506,6 @@ void kobject_cleanup(struct kobject * ko if (s) kset_put(s); - kobject_put(parent); } static void kobject_release(struct kref *kref) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Fri, 20 Apr 2007, Tejun Heo wrote: > Hello, Alan. > > Alan Stern wrote: > > This doesn't solve a related problem: a subsystem wants to register > > devices and to provide a set of mutually-exclusive services to the > > devices' drivers. The mutual exclusion has to be provided by a mutex or > > something similar, and the drivers need a way to unbind even while waiting > > to acquire the mutex. > > I don't really follow why the drivers need a way to unbind even while > waiting to acquire the mutex. Care to enlighten me? Forget I mentioned it. It isn't a problem if the subsystem uses its own mutex to provide the mutual exclusion. Things become difficult only if the subsystem tries to use dev->sem. > > I thought of something else that could also be done: There should be a way > > to cancel an outstanding workqueue request. At the moment all you can do > > is call flush_workqueue(), which will hang if you are already executing in > > a workqueue routine. You should be able to delete a particular entry from > > the workqueue (or wait for it to complete if it has already started > > running). This could be implemented right away. > > It all depends on how a particular subsystem is shaped but having such > thing would definitely help. I saw something recently suggesting such a thing is already present in Andrew Morton's queue. Great minds think alike... :-) > Yeah, exactly. My argument is that that impedance matching between > lifetime rules must happen at some place and it's better if we can do in > the higher layer where we can afford more effort and thus complexity. > We're currently pushing that down to each drivers and not too many are > getting it right. I think it's just unrealistic to expect every and > each driver subsystems to get it right, so some overhead at higher layer > is acceptable and we can definitely afford much more optimization at > higher layer. Agreed. You know, things may not be quite as bad as I had thought. I was under the impression that the driver core did put_device(dev->parent) during device_release(). But that's not the case; the call is made during device_del(). This makes things very different. Failure to drop references to a device during remove() won't cause any lingering references to the device's parent. The effects of one badly-behaved driver won't propagate indefinitely far up the device tree. On the other hand, although devices behave this way, kobjects do not. The call to kobject_put(kobj->parent) is in kobject_cleanup(), not in kobject_del() or unlink(). So this still needs to be fixed. It may be related to the sysfs <-> kobject link you have been trying to break. Dmitry, in thinking things over some more I realized there's going to be a problem with the autosuspend support in USB. It has to do with the way a driver needs to prevent (or block) suspends from occurring while it is actively using a device. To understand this, you need to know that USB adds a pm_mutex to the device structure. It gets used for synchronizing all suspend- or resume-related activities. In particular, the driver's suspend() method is called with usbdev->pm_mutex held. The next idea is that a driver needs to synchronize its remove() method with other methods such as read() -- we mustn't allow read() to try and refer to the device after the driver has been unbound. Let's say the driver has unbind_mutex embedded in its private data structure for this purpose. Now consider what read() has to do. It needs to block suspends from occurring while it runs, and it needs to do an autoresume if the device was already suspended. So read() will look like this: mutex_lock(&private->unbind_mutex); if (private->gone) { ret = -ENODEV; goto done; } if (private->suspended) autoresume(private->usbdev); ... done: mutex_unlock(&private->unbind_mutex); return ret; Meanwhile, the suspend() method needs to block while read() is running. So it will look like this: private->suspended = 1; mutex_lock(&private->unbind_mutex); /* Now the driver has quiesced */ mutex_unlock(&private->unbind_mutex); Here's the problem: The autoresume() call inside read() tries to acquire usbdev->pm_mutex while holding private->unbind_mutex. The suspend() method does the reverse, a locking-order violation. So far I haven't figured out any way to make this work. Do you have a suggestion? (Don't say read() should hold pm_mutex as well as unbind_mutex; that's no good -- although the reason is rather obscure.) Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Fri, 20 Apr 2007 14:27:06 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: > Hello, Alan. > > Alan Stern wrote: > > This doesn't solve a related problem: a subsystem wants to register > > devices and to provide a set of mutually-exclusive services to the > > devices' drivers. The mutual exclusion has to be provided by a mutex or > > something similar, and the drivers need a way to unbind even while waiting > > to acquire the mutex. > > I don't really follow why the drivers need a way to unbind even while > waiting to acquire the mutex. Care to enlighten me? I guess when the driver is just being ripped out or an unbind has been triggered or similar. > Yeah, exactly. My argument is that that impedance matching between > lifetime rules must happen at some place and it's better if we can do in > the higher layer where we can afford more effort and thus complexity. > We're currently pushing that down to each drivers and not too many are > getting it right. I think it's just unrealistic to expect every and > each driver subsystems to get it right, so some overhead at higher layer > is acceptable and we can definitely afford much more optimization at > higher layer. I agree. It isn't easy to get this right, so just we should just solve that once for all drivers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007 16:49:11 +0200, Cornelia Huck <[EMAIL PROTECTED]> wrote: > On Thu, 19 Apr 2007 10:20:51 -0400 (EDT), > Alan Stern <[EMAIL PROTECTED]> wrote: > > > The patch below, applied on top of Cornelia's changes plus the > > kobject_init() patch I posted earlier, actually seems to work. And it > > prevents an oops which I was able to trigger without it. > > Looks nice at first glance. I'll try to test it a bit. OK, I've tested this a bit (with a silly dummy driver and zfcp with hacked-back-in unload support) and I haven't encountered any problems so far. So I second the seems-to-work statement :) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hello, Dmitry. Dmitry Torokhov wrote: > On 4/19/07, Cornelia Huck <[EMAIL PROTECTED]> wrote: >> On Thu, 19 Apr 2007 09:13:43 -0400, >> "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote: >> >> > Because they are managed by 2 different entities. the struct device >> > objects are managed by device core and driver-specific objects are >> > managed by their respective driver. >> >> Not sure if I understand you here. My view of this was always that the >> embedding object was kind of an extended device and that the relevant >> driver/subsystem managed it through the driver core infrastructure. >> > > I am not sure if I agree with this point of view. Driver (or > subsystem) provides an instance of struct device for the rest of the > system to iteract uniformly with (suspend/resume/tree > visualization/etc) i.e. struct device implement an interface for > subsystems. However most of the system use their own mechanisms to > manage their devices. They can rely on the driver core to a certain > degree but driver core is mostly a carries out helper functions, not > the meat. Many drivers (at least all the SCSI/IDE ones) consider struct device as the base class of the devices those drivers implement. I don't think we can just consider those drivers to be wrong. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hello, Alan. Alan Stern wrote: > This doesn't solve a related problem: a subsystem wants to register > devices and to provide a set of mutually-exclusive services to the > devices' drivers. The mutual exclusion has to be provided by a mutex or > something similar, and the drivers need a way to unbind even while waiting > to acquire the mutex. I don't really follow why the drivers need a way to unbind even while waiting to acquire the mutex. Care to enlighten me? > The obvious answer is to introduce a different sort of synchronization > primitive: a mutex (or semaphore or rwsem) which can be invalidated. > > The semantics would be straightforward. When mutex_invalidate() is > called, it marks the mutex so that all future lock attempts will fail with > -ENODEV. It also wakes up all threads that are blocked trying to lock the > mutex and causes them to fail with the same error. Once all that is done > mutex_invalidate() returns. In particular, it doesn't wait for the > current lock to be released -- in fact, you would call it while holding > the lock. > > This would solve a lot of your problems. But it would also mean making > extensive changes to the kernel. For one thing, mutex_lock() would return > int instead of void, and you would want to mark it __must_check. Every > place where a mutex is locked, the code would have to be changed to add an > error pathway. That's the sort of thing I was talking about when I said > it was going to be a tremendous job. I think we both agree that's not a good idea. :-) > I thought of something else that could also be done: There should be a way > to cancel an outstanding workqueue request. At the moment all you can do > is call flush_workqueue(), which will hang if you are already executing in > a workqueue routine. You should be able to delete a particular entry from > the workqueue (or wait for it to complete if it has already started > running). This could be implemented right away. It all depends on how a particular subsystem is shaped but having such thing would definitely help. > More problems with immediate detach -- it would have to apply to char > devices. When a char device is unregistered you can't force user > processes to close their open file handles. Instead something like your > change to sysfs is needed -- wait for outstanding calls to complete and > fail any future calls. This means that registering a device will use up > more than just a pointer in a table of minor device numbers. Each entry > would require at least an rwsem, and device I/O would be slowed down by > the need to get a read-lock each time before entering the device driver. > > The same idea applies to block devices, although here the problems center > more around the block core and request queues. Yeah, exactly. My argument is that that impedance matching between lifetime rules must happen at some place and it's better if we can do in the higher layer where we can afford more effort and thus complexity. We're currently pushing that down to each drivers and not too many are getting it right. I think it's just unrealistic to expect every and each driver subsystems to get it right, so some overhead at higher layer is acceptable and we can definitely afford much more optimization at higher layer. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007, Dmitry Torokhov wrote: > On 4/19/07, Alan Stern <[EMAIL PROTECTED]> wrote: > > On Wed, 18 Apr 2007, Dmitry Torokhov wrote: > > > > > I am still do not understand why this is needed. Would it not be > > > simplier just to use a reference to struct device instead of embedding > > > it in a larger structure if their lifetimes are different and one does > > > not have a subsystem that takes care of releasing logic. > > > > > > > > > Pretty much drivers have 2 options: > > > > > > struct my_device { > > > void *private_data; > > > struct device dev; > > > }; > > > > Actually people use dev_[gs]et_drvdata() instead of a separate > > private_data pointer. That way there's no need for the my_device > > container. > > > > No, drvdata belongs to driver that is bound to a device. We are > talking about private data created and managed by driver that provides > device. Oh, sorry, I misunderstood. > Again, if we are talking about driver bound to a device then devices > ->release() is irrelevant. If we are talking about driver that created > device then driver's ->remove() is irrelevant. The problem is that device structures carry references to their parents. So even if a driver does use this option and is able to give up its own references immediately, it can't make any guarantees about drivers of child devices. If every driver followed your scheme we'd be okay; until then any ancestor of a device with a non-immediate-detach driver would not be able to do immediate detach. There's also another problem. Even though all the references to my_dev->dev may be gone, there will probably still be outstanding references to private_data structure. The driver's exit() routine will have to wait until all those references are gone. How can it do that safely? Presumably the private_data will include its own refcount and some release routine will run when the count drops to 0. That routine will be part of the driver's module -- so how can it enable itself to be unloaded? > > > With current sysfs orphaning attributes upon removal request there is > > > no issue of accessing driver-private data through references obtained > > > via ether embedded or referenced dev structure so everything is fine. > > > > Not so. There are other pathways besides sysfs which can cause a driver > > to access its data structures. > > > > Which ones? These needs to be identified and treated with "immediate > disconnect" that you advocated earlier. Once active users of device's > services are gone you can zap it. Among the worst offenders are character devices. None of the subsystem providers offering char device registration performs immediate detach -- they are a lot like sysfs used to be. (In fact, they probably _can't_ provide it since read() or write() calls can block indefinitely in the lower-level driver; the subsystem may have no way to force them to fail-fast.) So every lower-level driver which registers a char device needs to provide its own synchronization and immediate detach, and at the moment I doubt very many of them do. And of those which do, a large percentage rely on the BKL for synchronization! Here's another awkward possibility. Suppose a workqueue routine tries to unregister a device. Suppose the device's driver needs to do something in a process context, so it has a work item pending in the same workqueue. The driver's remove() method would need to flush the workqueue (in order to perform immediate detach), but doing so would cause a deadlock. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/19/07, Alan Stern <[EMAIL PROTECTED]> wrote: On Wed, 18 Apr 2007, Dmitry Torokhov wrote: > I am still do not understand why this is needed. Would it not be > simplier just to use a reference to struct device instead of embedding > it in a larger structure if their lifetimes are different and one does > not have a subsystem that takes care of releasing logic. > > > Pretty much drivers have 2 options: > > struct my_device { > void *private_data; > struct device dev; > }; Actually people use dev_[gs]et_drvdata() instead of a separate private_data pointer. That way there's no need for the my_device container. No, drvdata belongs to driver that is bound to a device. We are talking about private data created and managed by driver that provides device. > In this case ->release must live in a subsystem code; individual > drivers kfree(my_dev->private) and do any additional cleanup after > calling device_unregister(&my_dev->dev); That doesn't sound right. Generally the call to device_unregister() and the release method live in the same module. Maybe you meant to say individual drivers kfree(my_dev->private_data) and do any additional cleanup in their remove() routine. Again, if we are talking about driver bound to a device then devices ->release() is irrelevant. If we are talking about driver that created device then driver's ->remove() is irrelevant. This approach seems dangerous. Suppose there's mutex embedded in my_dev->private_data, and suppose some other thread is blocked waiting on that mutex when remove() is called. That other thread will then oops when my_dev->private_data is deallocated. What other thread? I suppose it is module-local thread or subsystem-local thread. Let's that particular subsystem take care of it's own data and use its own ->release() when it is ready. What I mean is there is no need to perform clean-up at once; every layer can do its own cleanup. > Second option: > > struct my_device { > type member1; > type member2; > >struct device *dev; > }; > > dev is coming from _device_create(). Driver core takes care of > releasing dev structure; driver does cleanup of my_device. Lots of drivers create devices dynamically without using device_create(). More to the point, how does the driver clean up my_device? It probably has a reference count somewhere in my_device, especially if my_device is shared with other threads or other drivers. We then face exactly the same problem: What happens if the driver's module is unloaded before all the references to my_device are gone? This is up to subsystem to ensure that it does not access dead devices. > With current sysfs orphaning attributes upon removal request there is > no issue of accessing driver-private data through references obtained > via ether embedded or referenced dev structure so everything is fine. Not so. There are other pathways besides sysfs which can cause a driver to access its data structures. Which ones? These needs to be identified and treated with "immediate disconnect" that you advocated earlier. Once active users of device's services are gone you can zap it. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007, Dmitry Torokhov wrote: > I am still do not understand why this is needed. Would it not be > simplier just to use a reference to struct device instead of embedding > it in a larger structure if their lifetimes are different and one does > not have a subsystem that takes care of releasing logic. > > > Pretty much drivers have 2 options: > > struct my_device { > void *private_data; > struct device dev; > }; Actually people use dev_[gs]et_drvdata() instead of a separate private_data pointer. That way there's no need for the my_device container. > In this case ->release must live in a subsystem code; individual > drivers kfree(my_dev->private) and do any additional cleanup after > calling device_unregister(&my_dev->dev); That doesn't sound right. Generally the call to device_unregister() and the release method live in the same module. Maybe you meant to say individual drivers kfree(my_dev->private_data) and do any additional cleanup in their remove() routine. This approach seems dangerous. Suppose there's mutex embedded in my_dev->private_data, and suppose some other thread is blocked waiting on that mutex when remove() is called. That other thread will then oops when my_dev->private_data is deallocated. > Second option: > > struct my_device { > type member1; > type member2; > >struct device *dev; > }; > > dev is coming from _device_create(). Driver core takes care of > releasing dev structure; driver does cleanup of my_device. Lots of drivers create devices dynamically without using device_create(). More to the point, how does the driver clean up my_device? It probably has a reference count somewhere in my_device, especially if my_device is shared with other threads or other drivers. We then face exactly the same problem: What happens if the driver's module is unloaded before all the references to my_device are gone? > With current sysfs orphaning attributes upon removal request there is > no issue of accessing driver-private data through references obtained > via ether embedded or referenced dev structure so everything is fine. Not so. There are other pathways besides sysfs which can cause a driver to access its data structures. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007 10:20:51 -0400 (EDT), Alan Stern <[EMAIL PROTECTED]> wrote: > The patch below, applied on top of Cornelia's changes plus the > kobject_init() patch I posted earlier, actually seems to work. And it > prevents an oops which I was able to trigger without it. Looks nice at first glance. I'll try to test it a bit. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007, Cornelia Huck wrote: > On Thu, 19 Apr 2007 01:06:25 +0900, > Tejun Heo <[EMAIL PROTECTED]> wrote: > > > Heh heh, I'm amazed it actually works. Agreed that the list walking > > isn't pretty but adding completion to each kobject still feels like too > > much of overhead just for release waiting. Any better ideas? > > Not really. But I'm more in favour on adding a new field than on adding > complicated code. (Although, the whole device infrastructure wouldn't > fare bad with some diet...) The patch below, applied on top of Cornelia's changes plus the kobject_init() patch I posted earlier, actually seems to work. And it prevents an oops which I was able to trigger without it. Alan Stern Index: usb-2.6/drivers/base/core.c === --- usb-2.6.orig/drivers/base/core.c +++ usb-2.6/drivers/base/core.c @@ -479,8 +479,9 @@ static void klist_children_put(struct kl /** - * device_initialize - init device structure. + * device_initialize_owner - init device structure. * @dev: device. + * @owner: module owing the release routine. * * This prepares the device for use by other layers, * including adding it to the device hierarchy. @@ -489,10 +490,10 @@ static void klist_children_put(struct kl * may use @dev's fields (e.g. the refcount). */ -void device_initialize(struct device *dev) +void device_initialize_owner(struct device *dev, struct module *owner) { kobj_set_kset_s(dev, devices_subsys); - kobject_init(&dev->kobj); + kobject_init_owner(&dev->kobj, owner); klist_init(&dev->klist_children, klist_children_get, klist_children_put); INIT_LIST_HEAD(&dev->dma_pools); @@ -756,8 +757,9 @@ int device_add(struct device *dev) /** - * device_register - register a device with the system. + * device_register_owner - register a device with the system. * @dev: pointer to the device structure + * @owner: module owning the release routine * * This happens in two clean steps - initialize the device * and add it to the system. The two steps can be called @@ -767,9 +769,9 @@ int device_add(struct device *dev) * before it is added to the hierarchy. */ -int device_register(struct device *dev) +int device_register_owner(struct device *dev, struct module *owner) { - device_initialize(dev); + device_initialize_owner(dev, owner); return device_add(dev); } @@ -1013,9 +1015,9 @@ int __init devices_init(void) EXPORT_SYMBOL_GPL(device_for_each_child); EXPORT_SYMBOL_GPL(device_find_child); -EXPORT_SYMBOL_GPL(device_initialize); +EXPORT_SYMBOL_GPL(device_initialize_owner); EXPORT_SYMBOL_GPL(device_add); -EXPORT_SYMBOL_GPL(device_register); +EXPORT_SYMBOL_GPL(device_register_owner); EXPORT_SYMBOL_GPL(device_del); EXPORT_SYMBOL_GPL(device_unregister); Index: usb-2.6/include/linux/device.h === --- usb-2.6.orig/include/linux/device.h +++ usb-2.6/include/linux/device.h @@ -509,9 +509,12 @@ void driver_init(void); /* * High level routines for use by the bus drivers */ -extern int __must_check device_register(struct device * dev); +extern int __must_check device_register_owner(struct device * dev, + struct module *owner); +#define device_register(dev) device_register_owner(dev, THIS_MODULE) extern void device_unregister(struct device * dev); -extern void device_initialize(struct device * dev); +extern void device_initialize_owner(struct device * dev, struct module *owner); +#define device_initialize(dev) device_initialize_owner(dev, THIS_MODULE) extern int __must_check device_add(struct device * dev); extern void device_del(struct device * dev); extern int device_for_each_child(struct device *, void *, Index: usb-2.6/drivers/base/platform.c === --- usb-2.6.orig/drivers/base/platform.c +++ usb-2.6/drivers/base/platform.c @@ -106,13 +106,15 @@ EXPORT_SYMBOL_GPL(platform_get_irq_bynam * platform_add_devices - add a numbers of platform devices * @devs: array of platform devices to add * @num: number of platform devices in array + * @owner: module owning the devices in @devs */ -int platform_add_devices(struct platform_device **devs, int num) +int platform_add_devices_owner(struct platform_device **devs, int num, + struct module *owner) { int i, ret = 0; for (i = 0; i < num; i++) { - ret = platform_device_register(devs[i]); + ret = platform_device_register_owner(devs[i], owner); if (ret) { while (--i >= 0) platform_device_unregister(devs[i]); @@ -122,7 +124,7 @@ int platform_add_devices(struct platform return ret; } -EXPORT_SYMBOL_GPL(platform_add_devices); +EXPORT_SYMBOL_
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/19/07, Cornelia Huck <[EMAIL PROTECTED]> wrote: On Thu, 19 Apr 2007 09:13:43 -0400, "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote: > Because they are managed by 2 different entities. the struct device > objects are managed by device core and driver-specific objects are > managed by their respective driver. Not sure if I understand you here. My view of this was always that the embedding object was kind of an extended device and that the relevant driver/subsystem managed it through the driver core infrastructure. I am not sure if I agree with this point of view. Driver (or subsystem) provides an instance of struct device for the rest of the system to iteract uniformly with (suspend/resume/tree visualization/etc) i.e. struct device implement an interface for subsystems. However most of the system use their own mechanisms to manage their devices. They can rely on the driver core to a certain degree but driver core is mostly a carries out helper functions, not the meat. > > > > Pretty much drivers have 2 options: > > > > > > struct my_device { > > > void *private_data; > > > struct device dev; > > > }; > > > > > > In this case ->release must live in a subsystem code; individual > > > drivers kfree(my_dev->private) and do any additional cleanup after > > > calling device_unregister(&my_dev->dev); > > > > They must do this in the ->remove callback. > > Why? If the driver truly stops hardware then any driver-specific data > is not needed. With sysfs severing access to removed attributes there > is no need to gave "global release", cleanup can be done in stages. I think I meant the same thing :) Freeing the data in the ->release callback is obviously too late. Freeing it in the ->remove callback means that the device is no longer really used (and can't be looked up any more); only some further refrences may linger (and those are of no consequence with the sysfs disconnect). Ah, right, I confused ->remove() with ->release() in your post. Sorry about that. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007 09:13:43 -0400, "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote: > Because they are managed by 2 different entities. the struct device > objects are managed by device core and driver-specific objects are > managed by their respective driver. Not sure if I understand you here. My view of this was always that the embedding object was kind of an extended device and that the relevant driver/subsystem managed it through the driver core infrastructure. > > > > Pretty much drivers have 2 options: > > > > > > struct my_device { > > > void *private_data; > > > struct device dev; > > > }; > > > > > > In this case ->release must live in a subsystem code; individual > > > drivers kfree(my_dev->private) and do any additional cleanup after > > > calling device_unregister(&my_dev->dev); > > > > They must do this in the ->remove callback. > > Why? If the driver truly stops hardware then any driver-specific data > is not needed. With sysfs severing access to removed attributes there > is no need to gave "global release", cleanup can be done in stages. I think I meant the same thing :) Freeing the data in the ->release callback is obviously too late. Freeing it in the ->remove callback means that the device is no longer really used (and can't be looked up any more); only some further refrences may linger (and those are of no consequence with the sysfs disconnect). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007 01:06:25 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: > Heh heh, I'm amazed it actually works. Agreed that the list walking > isn't pretty but adding completion to each kobject still feels like too > much of overhead just for release waiting. Any better ideas? Not really. But I'm more in favour on adding a new field than on adding complicated code. (Although, the whole device infrastructure wouldn't fare bad with some diet...) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/19/07, Cornelia Huck <[EMAIL PROTECTED]> wrote: On Wed, 18 Apr 2007 12:41:36 -0400, "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote: > I am still do not understand why this is needed. Would it not be > simplier just to use a reference to struct device instead of embedding > it in a larger structure if their lifetimes are different and one does > not have a subsystem that takes care of releasing logic. Why are their lifetimes different? Usually, if I hold on to the device, I also want to be able to use the structure that embeds the device. Because they are managed by 2 different entities. the struct device objects are managed by device core and driver-specific objects are managed by their respective driver. > Pretty much drivers have 2 options: > > struct my_device { > void *private_data; > struct device dev; > }; > > In this case ->release must live in a subsystem code; individual > drivers kfree(my_dev->private) and do any additional cleanup after > calling device_unregister(&my_dev->dev); They must do this in the ->remove callback. Why? If the driver truly stops hardware then any driver-specific data is not needed. With sysfs severing access to removed attributes there is no need to gave "global release", cleanup can be done in stages. > > Second option: > > struct my_device { > type member1; > type member2; > >struct device *dev; > }; > > dev is coming from _device_create(). Driver core takes care of > releasing dev structure; driver does cleanup of my_device. device_create() would need to not expect a class then, or it's not universally usable. Also, the driver would need a method to get back from the device to my_device. We're practically back at the first option again, only that now the ->release function is sitting in the driver core instead of the subsystem. To a degree. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007 12:41:36 -0400, "Dmitry Torokhov" <[EMAIL PROTECTED]> wrote: > I am still do not understand why this is needed. Would it not be > simplier just to use a reference to struct device instead of embedding > it in a larger structure if their lifetimes are different and one does > not have a subsystem that takes care of releasing logic. Why are their lifetimes different? Usually, if I hold on to the device, I also want to be able to use the structure that embeds the device. > Pretty much drivers have 2 options: > > struct my_device { > void *private_data; > struct device dev; > }; > > In this case ->release must live in a subsystem code; individual > drivers kfree(my_dev->private) and do any additional cleanup after > calling device_unregister(&my_dev->dev); They must do this in the ->remove callback. > > Second option: > > struct my_device { > type member1; > type member2; > >struct device *dev; > }; > > dev is coming from _device_create(). Driver core takes care of > releasing dev structure; driver does cleanup of my_device. device_create() would need to not expect a class then, or it's not universally usable. Also, the driver would need a method to get back from the device to my_device. We're practically back at the first option again, only that now the ->release function is sitting in the driver core instead of the subsystem. > With current sysfs orphaning attributes upon removal request there is > no issue of accessing driver-private data through references obtained > via ether embedded or referenced dev structure so everything is fine. Discoupling of sysfs and kobject lifetime rules definetly eliminates a lot of issues. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007, Tejun Heo wrote: > More afterthoughts. If a mutex is used to protect access against > removal. There is no reason to hold reference to it. > > kernel_thread() > { > /* wanna dereference my_obj */ > mutex_lock(); > verify my_obj is there and use it if so. > mutex_unlock(); > } > > remove() > { > mutex_lock(); > kill_it(); > mutex_unlock(); > } > > I probably have over simplified it but using both mutex and reference > counts doesn't make much sense. IOW, you get an active reference when > you grab the mutex excluding its removal and verified it's still there. > > There probably are other reasons why things are done that way and we can > and probably will have to resort to mixed solutions in foreseeable > future but I don't think there is any inherent problem in applying > immediate-disconnect in the described situation. > > Feel free to scream at me if I'm getting it totally wrong. :-) This doesn't solve a related problem: a subsystem wants to register devices and to provide a set of mutually-exclusive services to the devices' drivers. The mutual exclusion has to be provided by a mutex or something similar, and the drivers need a way to unbind even while waiting to acquire the mutex. The obvious answer is to introduce a different sort of synchronization primitive: a mutex (or semaphore or rwsem) which can be invalidated. The semantics would be straightforward. When mutex_invalidate() is called, it marks the mutex so that all future lock attempts will fail with -ENODEV. It also wakes up all threads that are blocked trying to lock the mutex and causes them to fail with the same error. Once all that is done mutex_invalidate() returns. In particular, it doesn't wait for the current lock to be released -- in fact, you would call it while holding the lock. This would solve a lot of your problems. But it would also mean making extensive changes to the kernel. For one thing, mutex_lock() would return int instead of void, and you would want to mark it __must_check. Every place where a mutex is locked, the code would have to be changed to add an error pathway. That's the sort of thing I was talking about when I said it was going to be a tremendous job. I thought of something else that could also be done: There should be a way to cancel an outstanding workqueue request. At the moment all you can do is call flush_workqueue(), which will hang if you are already executing in a workqueue routine. You should be able to delete a particular entry from the workqueue (or wait for it to complete if it has already started running). This could be implemented right away. More problems with immediate detach -- it would have to apply to char devices. When a char device is unregistered you can't force user processes to close their open file handles. Instead something like your change to sysfs is needed -- wait for outstanding calls to complete and fail any future calls. This means that registering a device will use up more than just a pointer in a table of minor device numbers. Each entry would require at least an rwsem, and device I/O would be slowed down by the need to get a read-lock each time before entering the device driver. The same idea applies to block devices, although here the problems center more around the block core and request queues. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On 4/18/07, Tejun Heo <[EMAIL PROTECTED]> wrote: Hello, Alan Stern wrote: > On Wed, 18 Apr 2007, Tejun Heo wrote: > >> Hello, all. >> >> Agreed with the problem but I'm not very enthusiastic for adding >> kobj->owner. How about the following? exit() routines will have to >> do device_unregister_wait() instead of device_unregister(). On return >> from it, it's guaranteed that all references to it are dropped and >> ->release is finished. The caller is responsible for avoiding >> deadlock, of course. > > There's a problem with this approach. > > Many drivers, especially those for hot-pluggable buses, register and > unregister devices dynamically. These events can occur in time-critical > situations, where the driver cannot afford to wait for all the references > to be dropped when unregistering a device. It's okay to wait in a module > exit routine, but to make things work the routine would have to wait for > references to _all_ unregistered objects to go away, not just the > references for the objects it unregisters at exit time. > > So let's see what changes are needed to make the approach workable. We > will have to maintain a count of objects whose release methods haven't > been called yet. The count has to be incremented every time an object is > unregistered (or registered, it doesn't matter which) and decremented > _after_ the release method returns -- meaning somewhere in the driver > core. When the count goes to zero, the exit routine is then allowed to > terminate. > > Hmmm, this is beginning to sound like a module-wide refcount which serves > to block mod->exit(). In fact, it sounds almost identical to what > Cornelia wrote, except that the refcount refers only to devices rather > than arbitrary kobjects (and except that the blockage just before > mod->exit returns instead of just after). You can see where I'm > leading... The goal of immediate-disconnect is to remove such lingering reference counts so that device_unregister() or driver detach puts the last reference count. You tell a higher layer that a device is going away, on return from the function, that layer isn't gonna access the device anymore. ie. On return from the unregistration function, the upper layer shouldn't have any reference to the device. If you unregister from all layers a device is registered to, you are left with only 1 reference which you put with device_unregister(). After all are converted, reference count doesn't mean much. struct device isn't a reference counted object anymore. I don't think this is gonna be too difficult to do. I think I can convert block layer and IDE/SCSI drivers without too much problem. Dunno much about other layers tho. I am still do not understand why this is needed. Would it not be simplier just to use a reference to struct device instead of embedding it in a larger structure if their lifetimes are different and one does not have a subsystem that takes care of releasing logic. Pretty much drivers have 2 options: struct my_device { void *private_data; struct device dev; }; In this case ->release must live in a subsystem code; individual drivers kfree(my_dev->private) and do any additional cleanup after calling device_unregister(&my_dev->dev); Second option: struct my_device { type member1; type member2; struct device *dev; }; dev is coming from _device_create(). Driver core takes care of releasing dev structure; driver does cleanup of my_device. With current sysfs orphaning attributes upon removal request there is no issue of accessing driver-private data through references obtained via ether embedded or referenced dev structure so everything is fine. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hello, Alan Stern wrote: > On Thu, 19 Apr 2007, Tejun Heo wrote: > >> The goal of immediate-disconnect is to remove such lingering reference >> counts so that device_unregister() or driver detach puts the last >> reference count. > > Yes, I understand. If you had immediate-disconnect then you wouldn't need > device_unregister_wait(). In fact, you wouldn't need any reference counts > at all. It would be guaranteed that when the unregister call returned, > all references would be gone. > >> You tell a higher layer that a device is going away, on return from the >> function, that layer isn't gonna access the device anymore. > > No, no. You tell somebody (it might be a higher layer, it might be a > lower layer, or it might be a same-height layer -- doesn't matter) that a > device is going away. Yeap, right. I higher, lower, same, whatever. I was using the term as drivers usually register to upper layers. > On return from the function, that layer isn't going > to access the device any more, _nor_ will anyone else who has obtained a > reference from that layer. This last clause is very important. Agreed. That layer is responsible for managing lingering objects and telling its users that the device is a zombie now. >> I don't think this is gonna be too difficult to do. I think I can >> convert block layer and IDE/SCSI drivers without too much problem. >> Dunno much about other layers tho. > > You have to convert more than layers (or core subsystems). You also have > to audit and convert drivers. It will be tremendously difficult to do. I definitely can be mis-assessing the problem. I'll first give a shot at the block/SCSI layer. How about that? > You did misunderstand. Here's what I was talking about: > > Driver A: > - > unregister_device(dev); > > /* inside the driver core */ > down(&dev->sem); > if (dev->driver) > dev->driver->remove(dev); > up(&dev->sem); > device_put(dev);/* or device_put_wait */ > > > Driver B: > - > void remove(struct device *dev) > { > struct my_device *mydev = dev_get_drvdata(dev); > > mydev->gone = 1; > kref_put(&mydev->kref, my_device_release); > } > > > Driver B's kernel thread: > - > kref_get(&mydev->kref); > down(&mydev->dev.sem); > if (mydev->gone) > goto finished; > ... > finished: > up(&mydev->dev.sem); > kref_put(&mydev->kref, my_device_release); > > Consider what happens if the kernel thread blocks on its down() while the > remove() method is running. It will be impossible for Driver B to > eliminate the reference to dev held by mydev and by the down() routine. > > In short, Driver B _can't_ provide an immediate detach. Not unless > someone figures out a way to cancel a blocked down(). And do the same > thing for other blocking primitives. Ah.. I see. You're right in that driver B cannot wait for disconnect in its remove routine in the above code but using a separate mutex to protect ->gone should do the trick, so I don't think the above case is a big problem. It's a pretty specific case which is easy to spot and update. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Thu, 19 Apr 2007, Tejun Heo wrote: > The goal of immediate-disconnect is to remove such lingering reference > counts so that device_unregister() or driver detach puts the last > reference count. Yes, I understand. If you had immediate-disconnect then you wouldn't need device_unregister_wait(). In fact, you wouldn't need any reference counts at all. It would be guaranteed that when the unregister call returned, all references would be gone. > You tell a higher layer that a device is going away, on return from the > function, that layer isn't gonna access the device anymore. No, no. You tell somebody (it might be a higher layer, it might be a lower layer, or it might be a same-height layer -- doesn't matter) that a device is going away. On return from the function, that layer isn't going to access the device any more, _nor_ will anyone else who has obtained a reference from that layer. This last clause is very important. > I don't think this is gonna be too difficult to do. I think I can > convert block layer and IDE/SCSI drivers without too much problem. > Dunno much about other layers tho. You have to convert more than layers (or core subsystems). You also have to audit and convert drivers. It will be tremendously difficult to do. > Dunno if I understood the problem right but can't we do the following? > > remove() > { > acquire sem; > device_del(); > release sem; > device_put_wait(); > } You did misunderstand. Here's what I was talking about: Driver A: - unregister_device(dev); /* inside the driver core */ down(&dev->sem); if (dev->driver) dev->driver->remove(dev); up(&dev->sem); device_put(dev);/* or device_put_wait */ Driver B: - void remove(struct device *dev) { struct my_device *mydev = dev_get_drvdata(dev); mydev->gone = 1; kref_put(&mydev->kref, my_device_release); } Driver B's kernel thread: - kref_get(&mydev->kref); down(&mydev->dev.sem); if (mydev->gone) goto finished; ... finished: up(&mydev->dev.sem); kref_put(&mydev->kref, my_device_release); Consider what happens if the kernel thread blocks on its down() while the remove() method is running. It will be impossible for Driver B to eliminate the reference to dev held by mydev and by the down() routine. In short, Driver B _can't_ provide an immediate detach. Not unless someone figures out a way to cancel a blocked down(). And do the same thing for other blocking primitives. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Cornelia Huck wrote: > On Wed, 18 Apr 2007 03:41:10 +0900, > Tejun Heo <[EMAIL PROTECTED]> wrote: > > > > OK, I hit a bit on the code. Once I saved a reference to the completion > in kobject_cleanup, it seemed to survive a load/unload testloop for a > module registering a device. However, I still dislike this "list of > waiters" approach, it looks too complex... Heh heh, I'm amazed it actually works. Agreed that the list walking isn't pretty but adding completion to each kobject still feels like too much of overhead just for release waiting. Any better ideas? Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Tejun Heo wrote: >> Incidentally, Tejun, I'm all in favor of a immediate-detach driver model >> approach. Unfortunately it's impossible to realize fully, although we >> could come much closer than we are now. >> >> Here's an example where immediate-detach cannot be implemented. A driver >> binds to a device and uses that device is a kernel thread. The thread >> carries out certain operations which require it to hold the device >> semaphore (because, for example, they need to be mutually exclusive with >> unbind). >> >> The driver's remove() method is called with the semaphore held. If the >> thread tries to lock the semaphore at the same time and blocks, there is >> no way at all for the remove() method to force the thread to drop its >> reference. >> >> This isn't merely a theoretical example. The USB hub driver works in >> exactly this way. > > Dunno if I understood the problem right but can't we do the following? > > remove() > { > acquire sem; > device_del(); > release sem; > device_put_wait(); > } More afterthoughts. If a mutex is used to protect access against removal. There is no reason to hold reference to it. kernel_thread() { /* wanna dereference my_obj */ mutex_lock(); verify my_obj is there and use it if so. mutex_unlock(); } remove() { mutex_lock(); kill_it(); mutex_unlock(); } I probably have over simplified it but using both mutex and reference counts doesn't make much sense. IOW, you get an active reference when you grab the mutex excluding its removal and verified it's still there. There probably are other reasons why things are done that way and we can and probably will have to resort to mixed solutions in foreseeable future but I don't think there is any inherent problem in applying immediate-disconnect in the described situation. Feel free to scream at me if I'm getting it totally wrong. :-) Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Hello, Alan Stern wrote: > On Wed, 18 Apr 2007, Tejun Heo wrote: > >> Hello, all. >> >> Agreed with the problem but I'm not very enthusiastic for adding >> kobj->owner. How about the following? exit() routines will have to >> do device_unregister_wait() instead of device_unregister(). On return >> from it, it's guaranteed that all references to it are dropped and >> ->release is finished. The caller is responsible for avoiding >> deadlock, of course. > > There's a problem with this approach. > > Many drivers, especially those for hot-pluggable buses, register and > unregister devices dynamically. These events can occur in time-critical > situations, where the driver cannot afford to wait for all the references > to be dropped when unregistering a device. It's okay to wait in a module > exit routine, but to make things work the routine would have to wait for > references to _all_ unregistered objects to go away, not just the > references for the objects it unregisters at exit time. > > So let's see what changes are needed to make the approach workable. We > will have to maintain a count of objects whose release methods haven't > been called yet. The count has to be incremented every time an object is > unregistered (or registered, it doesn't matter which) and decremented > _after_ the release method returns -- meaning somewhere in the driver > core. When the count goes to zero, the exit routine is then allowed to > terminate. > > Hmmm, this is beginning to sound like a module-wide refcount which serves > to block mod->exit(). In fact, it sounds almost identical to what > Cornelia wrote, except that the refcount refers only to devices rather > than arbitrary kobjects (and except that the blockage just before > mod->exit returns instead of just after). You can see where I'm > leading... The goal of immediate-disconnect is to remove such lingering reference counts so that device_unregister() or driver detach puts the last reference count. You tell a higher layer that a device is going away, on return from the function, that layer isn't gonna access the device anymore. ie. On return from the unregistration function, the upper layer shouldn't have any reference to the device. If you unregister from all layers a device is registered to, you are left with only 1 reference which you put with device_unregister(). After all are converted, reference count doesn't mean much. struct device isn't a reference counted object anymore. I don't think this is gonna be too difficult to do. I think I can convert block layer and IDE/SCSI drivers without too much problem. Dunno much about other layers tho. > Incidentally, Tejun, I'm all in favor of a immediate-detach driver model > approach. Unfortunately it's impossible to realize fully, although we > could come much closer than we are now. > > Here's an example where immediate-detach cannot be implemented. A driver > binds to a device and uses that device is a kernel thread. The thread > carries out certain operations which require it to hold the device > semaphore (because, for example, they need to be mutually exclusive with > unbind). > > The driver's remove() method is called with the semaphore held. If the > thread tries to lock the semaphore at the same time and blocks, there is > no way at all for the remove() method to force the thread to drop its > reference. > > This isn't merely a theoretical example. The USB hub driver works in > exactly this way. Dunno if I understood the problem right but can't we do the following? remove() { acquire sem; device_del(); release sem; device_put_wait(); } -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007 10:53:55 -0400 (EDT), Alan Stern <[EMAIL PROTECTED]> wrote: > Many drivers, especially those for hot-pluggable buses, register and > unregister devices dynamically. These events can occur in time-critical > situations, where the driver cannot afford to wait for all the references > to be dropped when unregistering a device. It's okay to wait in a module > exit routine, but to make things work the routine would have to wait for > references to _all_ unregistered objects to go away, not just the > references for the objects it unregisters at exit time. Uh, yes, didn't think of that. I'd even say that's the majority of devices allocated in a driver. > BTW, Cornelia, there's a small problem with your patch set. You have > kobject_init() try to grab a reference to kobj->owner, but it's quite > possible for kobj->owner to contain uninitialized garbage since the rest > of the kernel is still unaware of its existence. There's a patch below to > fix things up. I've got a second patch which makes device_initialize() > pass an owner to the embedded kobject, but I want to try it out a little > before posting it. :-) I was relying on the object being zeroed after allocation (and the caller of kobject_initialize already setting ->owner), but passing an owner via the initializing routine sounds saner. Thanks for looking at that. > Index: usb-2.6/include/linux/kobject.h > === > --- usb-2.6.orig/include/linux/kobject.h > +++ usb-2.6/include/linux/kobject.h > @@ -71,7 +71,8 @@ static inline const char * kobject_name( > return kobj->k_name; > } > > -extern void kobject_init(struct kobject *); > +extern void kobject_init_owner(struct kobject *, struct module *owner); > +#define kobject_init(kobj) kobject_init_owner(kobj, THIS_MODULE) > extern void kobject_cleanup(struct kobject *); > > extern int __must_check kobject_add(struct kobject *); > @@ -84,7 +85,9 @@ extern int __must_check kobject_shadow_r > const char *new_name); > extern int __must_check kobject_move(struct kobject *, struct kobject *); > > -extern int __must_check kobject_register(struct kobject *); > +extern int __must_check kobject_register_owner(struct kobject *, > + struct module *owner); > +#define kobject_register(kobj) kobject_register_owner(kobj, > THIS_MODULE) > extern void kobject_unregister(struct kobject *); > > extern struct kobject * kobject_get(struct kobject *); > Index: usb-2.6/lib/kobject.c > === > --- usb-2.6.orig/lib/kobject.c > +++ usb-2.6/lib/kobject.c > @@ -164,10 +164,11 @@ static void verify_dynamic_kobject_alloc > #endif > > /** > - * kobject_init - initialize object. > + * kobject_init_onwer - initialize object. > * @kobj: object in question. > + * @owner: module owning @kobj. > */ > -void kobject_init(struct kobject * kobj) > +void kobject_init_owner(struct kobject * kobj, struct module *owner) > { > if (!kobj) > return; > @@ -178,6 +179,7 @@ void kobject_init(struct kobject * kobj) > init_waitqueue_head(&kobj->poll); > kobj->kset = kset_get(kobj->kset); > /* Attempt to grab reference of owning module's kobject. */ > + kobj->owner = owner; > mod_kobject_get(kobj->owner); > } > > @@ -272,15 +274,16 @@ int kobject_add(struct kobject * kobj) > > > /** > - * kobject_register - initialize and add an object. > + * kobject_register_owner - initialize and add an object. > * @kobj: object in question. > + * @owner: module owning @kobj. > */ > > -int kobject_register(struct kobject * kobj) > +int kobject_register_owner(struct kobject * kobj, struct module *owner) > { > int error = -EINVAL; > if (kobj) { > - kobject_init(kobj); > + kobject_init_owner(kobj, owner); > error = kobject_add(kobj); > if (!error) > kobject_uevent(kobj, KOBJ_ADD); > @@ -750,8 +753,8 @@ void subsys_remove_file(struct subsystem > } > #endif /* 0 */ > > -EXPORT_SYMBOL(kobject_init); > -EXPORT_SYMBOL(kobject_register); > +EXPORT_SYMBOL(kobject_init_owner); > +EXPORT_SYMBOL(kobject_register_owner); > EXPORT_SYMBOL(kobject_unregister); > EXPORT_SYMBOL(kobject_get); > EXPORT_SYMBOL(kobject_put); > Hm, this means we always have an owner (until we explicitely call kobject_{init,register}_owner(kobj, NULL)) - which may be unneeded in some cases. But better to err on the safe side, I suppose. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007 03:41:10 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: OK, I hit a bit on the code. Once I saved a reference to the completion in kobject_cleanup, it seemed to survive a load/unload testloop for a module registering a device. However, I still dislike this "list of waiters" approach, it looks too complex... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007, Tejun Heo wrote: > Hello, all. > > Agreed with the problem but I'm not very enthusiastic for adding > kobj->owner. How about the following? exit() routines will have to > do device_unregister_wait() instead of device_unregister(). On return > from it, it's guaranteed that all references to it are dropped and > ->release is finished. The caller is responsible for avoiding > deadlock, of course. There's a problem with this approach. Many drivers, especially those for hot-pluggable buses, register and unregister devices dynamically. These events can occur in time-critical situations, where the driver cannot afford to wait for all the references to be dropped when unregistering a device. It's okay to wait in a module exit routine, but to make things work the routine would have to wait for references to _all_ unregistered objects to go away, not just the references for the objects it unregisters at exit time. So let's see what changes are needed to make the approach workable. We will have to maintain a count of objects whose release methods haven't been called yet. The count has to be incremented every time an object is unregistered (or registered, it doesn't matter which) and decremented _after_ the release method returns -- meaning somewhere in the driver core. When the count goes to zero, the exit routine is then allowed to terminate. Hmmm, this is beginning to sound like a module-wide refcount which serves to block mod->exit(). In fact, it sounds almost identical to what Cornelia wrote, except that the refcount refers only to devices rather than arbitrary kobjects (and except that the blockage just before mod->exit returns instead of just after). You can see where I'm leading... BTW, Cornelia, there's a small problem with your patch set. You have kobject_init() try to grab a reference to kobj->owner, but it's quite possible for kobj->owner to contain uninitialized garbage since the rest of the kernel is still unaware of its existence. There's a patch below to fix things up. I've got a second patch which makes device_initialize() pass an owner to the embedded kobject, but I want to try it out a little before posting it. :-) Incidentally, Tejun, I'm all in favor of a immediate-detach driver model approach. Unfortunately it's impossible to realize fully, although we could come much closer than we are now. Here's an example where immediate-detach cannot be implemented. A driver binds to a device and uses that device is a kernel thread. The thread carries out certain operations which require it to hold the device semaphore (because, for example, they need to be mutually exclusive with unbind). The driver's remove() method is called with the semaphore held. If the thread tries to lock the semaphore at the same time and blocks, there is no way at all for the remove() method to force the thread to drop its reference. This isn't merely a theoretical example. The USB hub driver works in exactly this way. Alan Stern Index: usb-2.6/include/linux/kobject.h === --- usb-2.6.orig/include/linux/kobject.h +++ usb-2.6/include/linux/kobject.h @@ -71,7 +71,8 @@ static inline const char * kobject_name( return kobj->k_name; } -extern void kobject_init(struct kobject *); +extern void kobject_init_owner(struct kobject *, struct module *owner); +#define kobject_init(kobj) kobject_init_owner(kobj, THIS_MODULE) extern void kobject_cleanup(struct kobject *); extern int __must_check kobject_add(struct kobject *); @@ -84,7 +85,9 @@ extern int __must_check kobject_shadow_r const char *new_name); extern int __must_check kobject_move(struct kobject *, struct kobject *); -extern int __must_check kobject_register(struct kobject *); +extern int __must_check kobject_register_owner(struct kobject *, + struct module *owner); +#define kobject_register(kobj) kobject_register_owner(kobj, THIS_MODULE) extern void kobject_unregister(struct kobject *); extern struct kobject * kobject_get(struct kobject *); Index: usb-2.6/lib/kobject.c === --- usb-2.6.orig/lib/kobject.c +++ usb-2.6/lib/kobject.c @@ -164,10 +164,11 @@ static void verify_dynamic_kobject_alloc #endif /** - * kobject_init - initialize object. + * kobject_init_onwer - initialize object. * @kobj: object in question. + * @owner: module owning @kobj. */ -void kobject_init(struct kobject * kobj) +void kobject_init_owner(struct kobject * kobj, struct module *owner) { if (!kobj) return; @@ -178,6 +179,7 @@ void kobject_init(struct kobject * kobj) init_waitqueue_head(&kobj->poll); kobj->kset = kset_get(kobj->kset); /* Attempt to grab reference of owning module's kobject. */ + kobj->owner = owner; mod_kobject_get(kobj->owner); } @@ -2
Re: [PATCH RFD] alternative kobject release wait mechanism
Cornelia Huck wrote: On Wed, 18 Apr 2007 17:46:09 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: It's debatable but I think things will be safer this way. If we wait by default, we are forced to check that all references are dropped and will have a stack dump indicating which object is causing problem when something goes wrong, which is better than silent object leaking and/or jumping to non-existent address way later. I agree that oopsing is bad. However, lingering references are not always coding errors. What if it will just take long for a reference to be given up? You'd have a hanging device_unregister(), with no particular gain. It's more like future plan than immediately applicable. I think most high-level driver related interfaces can be converted as sysfs was converted such that they disconnect immediately from the device - resolving conflicts between higher layer using reference counts and device driver layer which expects immediate disconnect is responsibility of those interfaces - just as sysfs does it. If you have lingering reference to struct device after driver is detached, you're already screwed. If there's outstanding reference to it from the previous driver, how are you gonna load the next one? You're gonna have to wait somewhere for all the references to go away. Actually, your patch series is doing exactly this during module unloading. Problem is that you'll need to do the same thing before attaching the next driver for the same device. Immediate-disconnect from all higher interface for device drivers is my goal for driver model as I wrote in the RFD about lifetime rules. I think it's doable and should result in easier model to get right, but I might be missing something big time, so please point out if you can spot holes or don't agree. I personally think all driver interface should be made this way such that completion of unregister function guarantees no further access to the object or module. IMHO, it's more intuitive and easier to force correctness. If we really did this, we should also provide a non-waiting alternative. For transitional purpose, sure. In the long term, I think it's better if we can do without it. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007 17:46:09 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: > It's debatable but I think things will be safer this way. If we wait by > default, we are forced to check that all references are dropped and will > have a stack dump indicating which object is causing problem when > something goes wrong, which is better than silent object leaking and/or > jumping to non-existent address way later. I agree that oopsing is bad. However, lingering references are not always coding errors. What if it will just take long for a reference to be given up? You'd have a hanging device_unregister(), with no particular gain. > > I personally think all driver interface should be made this way such > that completion of unregister function guarantees no further access to > the object or module. IMHO, it's more intuitive and easier to force > correctness. If we really did this, we should also provide a non-waiting alternative. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Cornelia Huck wrote: > On Wed, 18 Apr 2007 03:49:01 +0900, > Tejun Heo <[EMAIL PROTECTED]> wrote: > >> Oh, one more thing, with proper code audit, we can just make >> device_unregister() do the waiting instead of adding separate >> device_unregister_wait(). I think that will be a good step toward >> immediate-detach driver model. > > Do we really want that? If the release function doesn't sit in the > module, or if the device_unregister() is done for other reasons in a > non-module, we don't care about lingering references. There's no need > to wait (and possibly lock up) there. It's debatable but I think things will be safer this way. If we wait by default, we are forced to check that all references are dropped and will have a stack dump indicating which object is causing problem when something goes wrong, which is better than silent object leaking and/or jumping to non-existent address way later. I personally think all driver interface should be made this way such that completion of unregister function guarantees no further access to the object or module. IMHO, it's more intuitive and easier to force correctness. I'm not sure about kobject_put() but device_unregister() waiting for release doesn't sound bad to me. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
Cornelia Huck wrote: > On Wed, 18 Apr 2007 03:41:10 +0900, > Tejun Heo <[EMAIL PROTECTED]> wrote: > >> Hello, all. >> >> Agreed with the problem but I'm not very enthusiastic for adding >> kobj->owner. How about the following? exit() routines will have to >> do device_unregister_wait() instead of device_unregister(). On return >> from it, it's guaranteed that all references to it are dropped and >> ->release is finished. > > Sounds interesting. Kind of like the completion approach, but with the > dangerous bits outside the module. > >> The caller is responsible for avoiding >> deadlock, of course. > > I think that wording is a bit too strong. Of course, if the device is > unregistered in the exit routine, the module must make sure it gave up > all references it itself obtained. However, it doesn't have any control > about who obtained a reference during the object's lifetime. If an > outsider holds on to a reference, we'll lock up until this reference > has been given up (but I guess this is what we want). Right, the better wording would be "the caller is responsible for not causing deadlocks by dropping all references it owns on entry". >> +void kobject_put_wait(struct kobject * kobj) >> +{ >> +struct kobj_wait kwait; >> +unsigned long flags; >> + >> +if (!kobj) >> +return; >> + >> +BUG_ON(!list_empty(&kobj->entry)); >> + >> +init_completion(&kwait.cmpl); >> +kwait.kobj = kobj; >> + >> +spin_lock_irqsave(&kobj_wait_lock, flags); >> +list_add(&kwait.list, &kobj_wait_list); >> +spin_unlock_irqrestore(&kobj_wait_lock, flags); >> + >> +kobject_put(kobj); >> + >> +if (!wait_for_completion_timeout(&kwait.cmpl, 30 * HZ)) { >> +printk(KERN_WARNING "kobject_put_wait: kobject %p is still " >> + "alive after 30s, possible reference count bug\n", kobj); >> +dump_stack(); >> +wait_for_completion(&kwait.cmpl); >> +} >> +} >> > > Couldn't this waiting be made simpler? > - add completion to kobject in kobject_unregister_wait() > - call kobject_put(), then wait_for_completion() > - in kobject_cleanup(), save completion from kobject, call release > function, then complete() on saved completion I was trying to avoid adding a completion to all kobjects. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007 03:49:01 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: > Oh, one more thing, with proper code audit, we can just make > device_unregister() do the waiting instead of adding separate > device_unregister_wait(). I think that will be a good step toward > immediate-detach driver model. Do we really want that? If the release function doesn't sit in the module, or if the device_unregister() is done for other reasons in a non-module, we don't care about lingering references. There's no need to wait (and possibly lock up) there. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFD] alternative kobject release wait mechanism
On Wed, 18 Apr 2007 03:41:10 +0900, Tejun Heo <[EMAIL PROTECTED]> wrote: > Hello, all. > > Agreed with the problem but I'm not very enthusiastic for adding > kobj->owner. How about the following? exit() routines will have to > do device_unregister_wait() instead of device_unregister(). On return > from it, it's guaranteed that all references to it are dropped and > ->release is finished. Sounds interesting. Kind of like the completion approach, but with the dangerous bits outside the module. > The caller is responsible for avoiding > deadlock, of course. I think that wording is a bit too strong. Of course, if the device is unregistered in the exit routine, the module must make sure it gave up all references it itself obtained. However, it doesn't have any control about who obtained a reference during the object's lifetime. If an outsider holds on to a reference, we'll lock up until this reference has been given up (but I guess this is what we want). > > The code is only compile-tested and is more of proof-of-concept than > working code. > > DO NOT APPLY. > diff --git a/lib/kobject.c b/lib/kobject.c > index 057921c..22e5148 100644 > --- a/lib/kobject.c > +++ b/lib/kobject.c > @@ -16,6 +16,15 @@ > #include > #include > > +struct kobj_wait { > + struct list_headlist; > + struct kobject *kobj; > + struct completion cmpl; > +}; > + > +static spinlock_t kobj_wait_lock = SPIN_LOCK_UNLOCKED; > +static LIST_HEAD(kobj_wait_list); > + > /** > * populate_dir - populate directory with attributes. > * @kobj: object we're working on. > @@ -425,6 +434,23 @@ void kobject_unregister(struct kobject * kobj) > } > > /** > + * kobject_unregister_wait - unregister, put and wait > + * @kobj: object going away > + * > + * Remove @kobj from hierarchy, decrement refcount and wait for > + * it to die. > + */ > +void kobject_unregister_wait(struct kobject * kobj) > +{ > + if (!kobj) > + return; > + pr_debug("kobject %s: unregistering\n",kobject_name(kobj)); > + kobject_uevent(kobj, KOBJ_REMOVE); > + kobject_del(kobj); > + kobject_put_wait(kobj); > +} > + > +/** > * kobject_get - increment refcount for object. > * @kobj: object. > */ > @@ -446,8 +472,22 @@ void kobject_cleanup(struct kobject * kobj) > struct kobj_type * t = get_ktype(kobj); > struct kset * s = kobj->kset; > struct kobject * parent = kobj->parent; > + struct kobj_wait *kwait; > + unsigned long flags; > > pr_debug("kobject %s: cleaning up\n",kobject_name(kobj)); > + > + /* is somebody waiting for @kobj to die? */ > + spin_lock_irqsave(&kobj_wait_lock, flags); > + list_for_each_entry(kwait, &kobj_wait_list, list) { > + if (kwait->kobj == kobj) { > + list_del_init(&kwait->list); > + break; > + } > + } > + spin_unlock_irqrestore(&kobj_wait_lock, flags); > + > + /* clean up */ > if (kobj->k_name != kobj->name) > kfree(kobj->k_name); > kobj->k_name = NULL; > @@ -456,6 +496,10 @@ void kobject_cleanup(struct kobject * kobj) > if (s) > kset_put(s); > kobject_put(parent); > + > + /* notify waiter */ > + if (kwait) > + complete(&kwait->cmpl); > } > > static void kobject_release(struct kref *kref) > @@ -475,6 +519,42 @@ void kobject_put(struct kobject * kobj) > kref_put(&kobj->kref, kobject_release); > } > > +/** > + * kobject_put_wait - put and wait for all references to go away > + * @kobj: object > + * > + * This function is used by @kobj owner to kill it - it puts a > + * reference to @kobj and waits till all the references are gone > + * and ->release is finished. @kobj must have been deleted and > + * the caller is responsible for guaranteeing that all references > + * will be dropped in foreseeable future. > + */ > +void kobject_put_wait(struct kobject * kobj) > +{ > + struct kobj_wait kwait; > + unsigned long flags; > + > + if (!kobj) > + return; > + > + BUG_ON(!list_empty(&kobj->entry)); > + > + init_completion(&kwait.cmpl); > + kwait.kobj = kobj; > + > + spin_lock_irqsave(&kobj_wait_lock, flags); > + list_add(&kwait.list, &kobj_wait_list); > + spin_unlock_irqrestore(&kobj_wait_lock, flags); > + > + kobject_put(kobj); > + > + if (!wait_for_completion_timeout(&kwait.cmpl, 30 * HZ)) { > + printk(KERN_WARNING "kobject_put_wait: kobject %p is still " > +"alive after 30s, possible reference count bug\n", kobj); > + dump_stack(); > + wait_for_completion(&kwait.cmpl); > + } > +} > Couldn't this waiting be made simpler? - add completion to kobject in kobject_unregister_wait() - call kobject_put(), then wait_for_completion() - in kobject_cleanup(), save completion from kobject, call release function, then complete() on saved
Re: [PATCH RFD] alternative kobject release wait mechanism
Tejun Heo wrote: > Hello, all. > > Agreed with the problem but I'm not very enthusiastic for adding > kobj->owner. How about the following? exit() routines will have to > do device_unregister_wait() instead of device_unregister(). On return > from it, it's guaranteed that all references to it are dropped and > ->release is finished. The caller is responsible for avoiding > deadlock, of course. Oh, one more thing, with proper code audit, we can just make device_unregister() do the waiting instead of adding separate device_unregister_wait(). I think that will be a good step toward immediate-detach driver model. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/