On Fri, Jan 13, 2017 at 12:02:36PM -0800, James Bottomley wrote:
> > > Actually, no, the devrm is a completely lifetime managed device as
> > > part of the chip structure. once you've done a device_del on it,
> > > it can be kfreed because it's no longer visible to anything else.
> >
> > No, that isn't enough. Anything else could have obtained a kref on
> > devrm outside of the sphere the device_del manages.
> >
> > For instance, the cdev does exactly that, via this:
> >
> > >   chip->cdev.kobj.parent = &chip->dev.kobj;
> > > + chip->cdevrm.kobj.parent = &chip->devrm.kobj;
> >
> > In the worst case the kref the cdev grabs is not released until after
> > tpm_chip_unregister() returns.
>
> chip_unregister doesn't tear down either device. It's the final
> release of the chip->dev that does that.

I don't think you are seeing the problem.

I think you are assuming cdev_del waits for userspace to close any
open fds. It does not.

Instead, the cdev independently holds on to a module lock and a kref
on the chip. Once userspace has done close(), the kref is dropped and
the module lock is let go. This can happen after chip_unregister, after
devm has done the final put_device, and after the driver core has
completed driver detach.

This is why it is necessary for this:

+ chip->cdevrm.kobj.parent = &chip->devrm.kobj;

to point to a working kref, such as chip->dev.kobj.

My point is that this patch has two very subtle bugs caused by devrm
having a broken kref. Yes, we can fix those two cases, but this entire
class of bugs is prevented if devrm has a release function that does
put_device(&chip->dev).

We have no idea what will happen down the road. Most people assume
krefs *work*; having a struct with 4 krefs where only 3 work is pretty
wild, IMHO.

> Now there is a related problem that the owner is actually the *wrong*
> module: it holds the tpm module in place not the actual driver module,
> so I can happily attach tcsd to the TPM device then rmmod tpm_tis,
> which causes some interesting issues. I can fix this, but it's not a
> problem of the current patch.

No, it is correct as is. The cdev fops rely only on the tpm module.
When tpm_chip_unregister returns to the driver, chip->ops has been set
to NULL with proper locking - the driver code becomes uncallable at
that point.

Jason

_______________________________________________
tpmdd-devel mailing list
tpmdd-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tpmdd-devel