Re: Proper -current if_attach locking?
M. Warner Losh wrote: In message: [EMAIL PROTECTED] Nate Lawson [EMAIL PROTECTED] writes: : I was looking into some could sleep messages and found some bogus : locking in the attach routine of many drivers. Several init a mtx in : their softc and then lock/unlock it in their attach routine. This, of : course, does nothing to provide exclusive access to a device. I assume : there is going to be a global IF_LOCK or something to be used in attach : routines. Can someone fill me in on the intended design? The locking in the attach routines is generally bogus. Locking is only needed when you have more than one thread of execution. You don't have more than one thread of execution until after you establish the ISR and turn on interrupts. We should likely not be enabling interrupts until very late in the attach routine so that we don't need any locking in them. I looked at this. It seems to me that it's not quite that simple (sorry). I think that there are issues with locking because you don't know if this is a driver that's getting loaded well after the system has booted, or if this is a PCCARD or other hot plug device that has just arrived in the system. It also seems to me that the reversal problems that would result by simply inserting locking have to do with the fact that the code is relatively schitzophrenic on deciding whether it's locking data, or it's locking a critical path. The entire idea of lock order reversals (please, don't use the abbreviation LOR: my mind keeps telling me that Legolas just got his commit bit) is predicated, I think, on the idea that there is an order of operation that has to be adhered to because we are talking not about data object locking, we are in fact talking about (theoretically independent) code path locking, and what happens on a reentry during a sleep. Basically, this means that for code that can't result in a sleep, there's no such thing as a lock order reversal -- there's only locking at the wrong functional layer. In this particular case, we're talking about a interface list, and we're talking about a device instance, and the events that get a device on or off the list are *not* independent and they *are* mutually exclusive, based on state. Given all that, it seems to me that this would be an ideal place for a critical section lock -- or, at least, a lock that's asserted on the list of interfaces, and which doesn't have any witness registration (per Bruce's earlier suggestion). My preference (if it's not obvious) would be a critical section lock, but a data lock on a list held over the same operations is functionally the same thing, anyway. Perhaps it's time to step back, and look over the overall locking philosophy that everyone is supposed to be implementing to, and the implementation assumptions that that philosophy makes necessary for things to work? I think that some of the assumptions are themselves mutually exclusive with current implementation, and it would be nice to know, up front, what needs to be rewritten, and what *should* be rewritten, to conform to the new philosophy. I can't be the only one who finds FreeBSD 5.x to be in such a state of flux that it's almost impossible to know what a correct implementation is supposed to look like, for a given subsystem and/or device driver, list, etc.. If someone were to lay down the law, we could at least discuss it, and come to some conclusions that would let people know what to do -- even if there's no change in the law as a result, it's a useful exercise, since people can then just code to the spec., and not have to have in depth knowledge of the overall design (if there is one) to deal with all the related issues... they can just copy the changes into every other driver/whatever that's in the same niche, ecologically. Parallel progress is a Good Thing(tm). -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
In message: [EMAIL PROTECTED] Terry Lambert [EMAIL PROTECTED] writes: : M. Warner Losh wrote: : In message: [EMAIL PROTECTED] : Nate Lawson [EMAIL PROTECTED] writes: : : I was looking into some could sleep messages and found some bogus : : locking in the attach routine of many drivers. Several init a mtx in : : their softc and then lock/unlock it in their attach routine. This, of : : course, does nothing to provide exclusive access to a device. I assume : : there is going to be a global IF_LOCK or something to be used in attach : : routines. Can someone fill me in on the intended design? : : The locking in the attach routines is generally bogus. Locking is : only needed when you have more than one thread of execution. You : don't have more than one thread of execution until after you establish : the ISR and turn on interrupts. We should likely not be enabling : interrupts until very late in the attach routine so that we don't need : any locking in them. : : I looked at this. It seems to me that it's not quite that : simple (sorry). I think that there are issues with locking : because you don't know if this is a driver that's getting : loaded well after the system has booted, or if this is a : PCCARD or other hot plug device that has just arrived in : the system. That doesn't mattar at all. If it is a new device that's just arrived, the attach still won't be interrupted *by other code in the driver* until after it has setup its ISR and told the hardware to start generating interrupts. No device locking is needed in the attach routine until after interrupts are enabled in the hardware. : It also seems to me that the reversal problems that would : result by simply inserting locking have to do with the fact : that the code is relatively schitzophrenic on deciding whether : it's locking data, or it's locking a critical path. The reversal is because of the bogus locking. The first time through it locks the device then the interface. However, after that it locks the interface and then the device, which can be bad. It does point to a problem, however, in that sometimes we'll take out the locks in one order and other times other orders depending on the code path if we aren't careful. I should go look at the new code more closely. I worry that in the non interrupt case we get things in the IF, DEV order (because the IF locks, then calls the callback routines, which then call the DEV lock). But in the interrupt case we get the DEV lock first, then try to queue data and that somehow causes the IF locks to be grabbed. But you are right, I do need to go look at the code to see what, exactly, is happening and how the new interface locking code is interacting with the semi-bogus locking than many of the wpaul drivers have in them now. : I can't be the only one who finds FreeBSD 5.x to be in such a state : of flux that it's almost impossible to know what a correct : implementation is supposed to look like, for a given subsystem : and/or device driver, list, etc.. There we agree. It takes a lot to keep up, and even then I fall behind when something new happens behind my back. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
I was right (and I think you are too). We do have lock issues. dc_attach does approximately: DC_LOCK ether_attach() (which does a IFNET_WLOCK/UNLOCK pair) DC_UNLOCK (this sets the lock order to be DC_LOCK, IFNET_WLOCK). However in if_slowtimo we have: if_slowtimo(arg) { ... IFNET_RLOCK(); ... if (ifp-if_watchdog) (*ifp-if_watchdog)(ifp); ... IFNET_RUNLOCK(); } and dc_watchdog does a DC_LOCK/UNLOCK pair). This is a Lock Order Reversal, and not a LotR :-) What's worse is that dc_intr does: DC_LOCK ...dc_start (which calls IF_PREPEND which does the IFNET_LOCK/UNLOCK thing) DC_UNLOCK So even if we remove the one from attach, it looks like we have others lurking in the code. Either that, or it is too late for me to be looking at code like this :-( Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
M. Warner Losh writes: .. However in if_slowtimo we have: if_slowtimo(arg) { ... IFNET_RLOCK(); ... if (ifp-if_watchdog) (*ifp-if_watchdog)(ifp); ... IFNET_RUNLOCK(); } and dc_watchdog does a DC_LOCK/UNLOCK pair). This is a Lock Order Reversal, and not a LotR :-) What's worse is that dc_intr does: DC_LOCK ...dc_start (which calls IF_PREPEND which does the IFNET_LOCK/UNLOCK thing) DC_UNLOCK So even if we remove the one from attach, it looks like we have others lurking in the code. Either that, or it is too late for me to be looking at code like this :-( I think its too late at night ;) The IFNET_RLOCK() called in if_slowtimo() is a global lock for the list of ifnet structs to ensure that no devices are removed or added while something may be using it. There is one ifnet list in the system. The lock in IF_PREPEND() (and more commonly used in drivers, IF_DEQUE()) is per-ifq, to protect against multiple accesses to a single ifq. There are many ifqs in the system. FWIW, I've been running my 3rd party Myrinet driver Giant-free and have had no problems, and no lock order reversals. I don't do bogus locking in my attach routine, though :) FWIW2: Running Giant-free brings -current TCP performance up to nearly 63% of -stable performance (from 39%), and udp xmit perf up to 87% of -stable. (testing w/o WITNESS). Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
In message: [EMAIL PROTECTED] Andrew Gallatin [EMAIL PROTECTED] writes: : The IFNET_RLOCK() called in if_slowtimo() is a global lock for the : list of ifnet structs to ensure that no devices are removed or added : while something may be using it. There is one ifnet list in the system. So this means that only the locking in attach is bogus, and similar locking in detach is also bogus because they produce lock order reversals as the global lock is held to insert/remove if interfaces. : The lock in IF_PREPEND() (and more commonly used in drivers, : IF_DEQUE()) is per-ifq, to protect against multiple accesses : to a single ifq. There are many ifqs in the system. I knew I must have been missing something really fundamental last night. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
M. Warner Losh writes: In message: [EMAIL PROTECTED] Andrew Gallatin [EMAIL PROTECTED] writes: : The IFNET_RLOCK() called in if_slowtimo() is a global lock for the : list of ifnet structs to ensure that no devices are removed or added : while something may be using it. There is one ifnet list in the system. So this means that only the locking in attach is bogus, and similar locking in detach is also bogus because they produce lock order reversals as the global lock is held to insert/remove if interfaces. Yes. Though I haven't looked at if_dc myself, there may be other locking problems. I've only been commenting on the ones that you brought up. But back to an earlier point. Somebody (you?) validly pointed out that the driver should not be callable and should not generate interrupts until its finished attaching. The lock in its attach was probably a somewhat misguided attempt at that. The first point can be accomplished by doing the ether_ifattach() last, but the second may be harder. I do that by poking a bit on my card which prevents it from generating interrupts while the device is being setup. Not sure if a similar bit exists on tulip cards. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
In message: [EMAIL PROTECTED] Andrew Gallatin [EMAIL PROTECTED] writes: : : M. Warner Losh writes: : In message: [EMAIL PROTECTED] : Andrew Gallatin [EMAIL PROTECTED] writes: : : The IFNET_RLOCK() called in if_slowtimo() is a global lock for the : : list of ifnet structs to ensure that no devices are removed or added : : while something may be using it. There is one ifnet list in the system. : : So this means that only the locking in attach is bogus, and similar : locking in detach is also bogus because they produce lock order : reversals as the global lock is held to insert/remove if interfaces. : : Yes. Though I haven't looked at if_dc myself, there may be other : locking problems. I've only been commenting on the ones that you : brought up. : : But back to an earlier point. Somebody (you?) validly pointed out : that the driver should not be callable and should not generate : interrupts until its finished attaching. The lock in its attach was : probably a somewhat misguided attempt at that. Yes. That was me. There are some drivers that have separated front/back ends that makes this harder, but most of them don't. : The first point can be accomplished by doing the ether_ifattach() : last, but the second may be harder. I do that by poking a bit on my : card which prevents it from generating interrupts while the device is : being setup. Not sure if a similar bit exists on tulip cards. All PCI cards have to be able to turn off their interrupt sources, otherwise interrupt storms result. At least that's my understanding. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
On Fri, 3 Jan 2003, Nate Lawson wrote: NLI was looking into some could sleep messages and found some bogus NLlocking in the attach routine of many drivers. Several init a mtx in NLtheir softc and then lock/unlock it in their attach routine. This, of NLcourse, does nothing to provide exclusive access to a device. I assume NLthere is going to be a global IF_LOCK or something to be used in attach NLroutines. Can someone fill me in on the intended design? Probably not. I asked the same question a couple of month ago and got 0 answers. I think, there is no way, the driver itself can assure exclusive access to the device it is attaching. It *must* assume, that there is some kind of locking around the call to the attach routine. Getting the lock in the softc inside the attach routine may be neccessary, because the routine may call other functions that assume they have the lock. harti -- harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private [EMAIL PROTECTED], [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
On Mon, Jan 06, 2003 at 10:58:25AM +0100, Harti Brandt wrote: On Fri, 3 Jan 2003, Nate Lawson wrote: NLI was looking into some could sleep messages and found some bogus NLlocking in the attach routine of many drivers. Several init a mtx in NLtheir softc and then lock/unlock it in their attach routine. This, of NLcourse, does nothing to provide exclusive access to a device. I assume NLthere is going to be a global IF_LOCK or something to be used in attach NLroutines. Can someone fill me in on the intended design? Probably not. I asked the same question a couple of month ago and got 0 answers. I think, there is no way, the driver itself can assure exclusive access to the device it is attaching. It *must* assume, that there is some kind of locking around the call to the attach routine. Getting the lock in the softc inside the attach routine may be neccessary, because the routine may call other functions that assume they have the lock. Only using lock in softc can't assure its exclusive access because there are some cases of changing some values in ifnet struct outside of device driver routines. Most of the NIC drivers don't have its own locks for now, and using both IFNET_*LOCK() and its own softc lock can't make everything in sync. There should be two use of locks IMO: one or more per-device locks in driver softc for manipulating per-device private data protection, and ifnet lock for each ifnet struct protection (such as ifnet.if_mtx). Maybe these locks should be adaptive or spin And IFNET_*LOCK() should remain for adding/removing ifnet struct to the global ifnet whose type is ifnethead. In case of ifqueue, it should not need to acquire its ifnet lock because ifqueue has its own mutex. -- Kyunghwan Kim [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
On Mon, 6 Jan 2003, Kyunghwan Kim wrote: On Mon, Jan 06, 2003 at 10:58:25AM +0100, Harti Brandt wrote: On Fri, 3 Jan 2003, Nate Lawson wrote: NLI was looking into some could sleep messages and found some bogus NLlocking in the attach routine of many drivers. Several init a mtx in NLtheir softc and then lock/unlock it in their attach routine. This, of NLcourse, does nothing to provide exclusive access to a device. I assume NLthere is going to be a global IF_LOCK or something to be used in attach NLroutines. Can someone fill me in on the intended design? Probably not. I asked the same question a couple of month ago and got 0 answers. I think, there is no way, the driver itself can assure exclusive access to the device it is attaching. It *must* assume, that there is some kind of locking around the call to the attach routine. Getting the lock in the softc inside the attach routine may be neccessary, because the routine may call other functions that assume they have the lock. Only using lock in softc can't assure its exclusive access because there are some cases of changing some values in ifnet struct outside of device driver routines. Most of the NIC drivers don't have its own locks for now, and using both IFNET_*LOCK() and its own softc lock can't make everything in sync. My point. There should be two use of locks IMO: one or more per-device locks in driver softc for manipulating per-device private data protection, and ifnet lock for each ifnet struct protection (such as ifnet.if_mtx). Looking further into sys/net/if.c, it appears that the list of interfaces is protected by IFNET_WLOCK in if_attach(). I think it's safe to work under the following assumptions: 1. newbus will not call an attach routine twice for the same hw 2. ifnet routines take care of themselves 3. per-device locking is only necessary to provide exclusive access within a given driver instance. -Nate To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Proper -current if_attach locking?
In message: [EMAIL PROTECTED] Nate Lawson [EMAIL PROTECTED] writes: : I was looking into some could sleep messages and found some bogus : locking in the attach routine of many drivers. Several init a mtx in : their softc and then lock/unlock it in their attach routine. This, of : course, does nothing to provide exclusive access to a device. I assume : there is going to be a global IF_LOCK or something to be used in attach : routines. Can someone fill me in on the intended design? The locking in the attach routines is generally bogus. Locking is only needed when you have more than one thread of execution. You don't have more than one thread of execution until after you establish the ISR and turn on interrupts. We should likely not be enabling interrupts until very late in the attach routine so that we don't need any locking in them. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Proper -current if_attach locking?
I was looking into some could sleep messages and found some bogus locking in the attach routine of many drivers. Several init a mtx in their softc and then lock/unlock it in their attach routine. This, of course, does nothing to provide exclusive access to a device. I assume there is going to be a global IF_LOCK or something to be used in attach routines. Can someone fill me in on the intended design? -Nate To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message