On Fri, Apr 20, 2012 at 10:03:09AM +1000, matthew green wrote: > > > > scsipi depends upon kernel lock. thus, callers should arrange for it > > > to be held. since these are drivers calling, it is upto each driver > > > to do this currently. this is what we've done with other problems we > > > have hit as they've arrived. > > > > if the caller is MPSAFE, I agree. Non-MPSAFE driver should not have to care > > about KERNEL_LOCK (precisely because they are not MPSAFE). > > The basic assumption about kernel locking was that that driver and > > subsystems > > using spl locking would continue to work as it, without changes. If we > > change this assumption a lot of code needs to be changed (that is, almost > > all our drivers). > > we've never had autoconfig run with the kernel lock AFAICT, so this > assumption has never been true.
So this is a bug. The contract was really that spl-locked drivers would continue to work as is when fine-grained locking was introduced. I don't think it's reasonable to require every (but the few already MP-safe) drivers to take KERNEL_LOCK in their attach routine. > the thing that does work is that > IPL_VM using drivers will automatically get kernel locked in their > interrupt handlers. It's not only IPL_VM drivers, it'a all driver wich don't register and interrupt handler explicitely marked MPSAFE (for example, with PCI_INTR_MPSAFE for PCI drivers). > > > > these KASSERT()s have been in place for a long time now and, until > > > your change, we've fixed it at the caller level not worked around it > > > at the scsipi level. > > > > AFAIK the only drivers that has been changed are ncr53c9x and USB, and they > > are MP-safe (they already use mutex instead of spl locking). > > usb is not MP safe in -current, only on the usbmp branch. all those > drivers take care of taking the kernel lock when calling into the > scsipi code since they almost *always* run when not cold anyway. So things were fixed at the wrong place. Any non-MPSAFE driver is at risk right now > infact, the asserts were added in -current due to the usbmp work > and the understanding that they would advance the rest of the system. > > > > your change can't work with the drivers that are broken without the > > > major regression of holding kernel lock over autoconfig, or, never > > > ever having them be modules or detached/reattached. > > > > Maybe non-MPSAFE drivers can't be modules or detached/reattached, indeed. > > that's not true. eg, radeondrm works just fine being MP safe or not. > (it actually is MP safe, though it doesn't announce this to any of the > rest of the system currently.) Yes, of course, unless there's a KASSERT, things will run fine most of the time. That's the problem with race conditions: it doesn't reliably fail. > > > But that's a problem with autoconf not dealing with non-MPSAFE drivers, > > not the driver themselve (they were working before the MP changes, they > > should continue to work as-is). If you want autoconf to not run > > under the KERNEL_LOCK when not needed, then is has to be extended so that > > MPSAFE drivers can register themselves as MPSAFE, so that non-MPSAFE > > drivers are still called with KERNEL_LOCK. > > it seems the wrong answer to me. instead of fixing the one or > two drivers you've found problems in, you want to add more > infrastructure (IMO clutter) to the system and patch all the > drivers that *are* ok. considering that this problem is > very specific to scsipi, i don't think that is a viable > solution. No, it's not specific to scsipi. It happens that scsipi asserts that KERNEL_LOCK is held, but any non-MPSAFE drivers needs to be called with KERNEL_LOCK held. > > > > please make "drvctl -d" and rescan work for your drivers. this will > > > make the checks against "cold" go away. > > > > I don't want to touch each driver in our tree for drvctl -d/rescan to work. > > as in aside, you should make "drvctl -d" work for any drivers > you maintain. everyone (tm) should. irrespective of modules > or otherwise, the end result of it working tends to be bug > fixes. It's hard to drvctl -d the root device ... > > > If we want this to work for non-MPSAFE driver, then this has to be fixed > > in autoconf (if the problem really exists, which I've not looked at yet). > > i don't believe there is a problem in autoconfig. there is a > problem with some (probably most) scsi drivers here, but that > is about it. it is true that the contracts have changed, but > this is progress after all. the assert exists solely to find > places that are wrong and fix them. your change hides this. the contract has changed in a way that exposes all non-MPSAFE driver to a race condition. The problem shows up in scsipi, but it's not the only place where the problem exists. ata is not MP-safe, so the problem is also in IDE and SATA drivers (including some USB drivers). mii(4) and ifmedia is not MP-safe, so ethernet drivers have this problem too. I've not audited the whole code but there's probably other examples. > > this isn't about SMPifing a whole driver, but adding a call to > taken/release the kernel lock around a few (often just one) > calls in a driver. it has never been a difficult change to > make. if you have drivers that needs it, i am happy to send > patches for testing. *ALL* non-MPSAFE drivers needs it. -- Manuel Bouyer <bou...@antioche.eu.org> NetBSD: 26 ans d'experience feront toujours la difference --