Re: Proper -current if_attach locking?

2003-01-07 Thread Terry Lambert
M. Warner Losh wrote:
 In message: [EMAIL PROTECTED]
 Nate Lawson [EMAIL PROTECTED] writes:
 : I was looking into some could sleep messages and found some bogus
 : locking in the attach routine of many drivers.  Several init a mtx in
 : their softc and then lock/unlock it in their attach routine.  This, of
 : course, does nothing to provide exclusive access to a device.  I assume
 : there is going to be a global IF_LOCK or something to be used in attach
 : routines.  Can someone fill me in on the intended design?
 
 The locking in the attach routines is generally bogus.  Locking is
 only needed when you have more than one thread of execution.  You
 don't have more than one thread of execution until after you establish
 the ISR and turn on interrupts.  We should likely not be enabling
 interrupts until very late in the attach routine so that we don't need
 any locking in them.

I looked at this.  It seems to me that it's not quite that
simple (sorry).  I think that there are issues with locking
because you don't know if this is a driver that's getting
loaded well after the system has booted, or if this is a
PCCARD or other hot plug device that has just arrived in
the system.

It also seems to me that the reversal problems that would
result by simply inserting locking have to do with the fact
that the code is relatively schitzophrenic on deciding whether
it's locking data, or it's locking a critical path.

The entire idea of lock order reversals (please, don't use the
abbreviation LOR: my mind keeps telling me that Legolas just
got his commit bit) is predicated, I think, on the idea that
there is an order of operation that has to be adhered to because
we are talking not about data object locking, we are in fact
talking about (theoretically independent) code path locking, and
what happens on a reentry during a sleep.  Basically, this means
that for code that can't result in a sleep, there's no such thing
as a lock order reversal -- there's only locking at the wrong
functional layer.

In this particular case, we're talking about a interface list,
and we're talking about a device instance, and the events that
get a device on or off the list are *not* independent and they
*are* mutually exclusive, based on state.

Given all that, it seems to me that this would be an ideal
place for a critical section lock -- or, at least, a lock
that's asserted on the list of interfaces, and which doesn't
have any witness registration (per Bruce's earlier suggestion).
My preference (if it's not obvious) would be a critical section
lock, but a data lock on a list held over the same operations is
functionally the same thing, anyway.

Perhaps it's time to step back, and look over the overall locking
philosophy that everyone is supposed to be implementing to, and
the implementation assumptions that that philosophy makes necessary
for things to work?  I think that some of the assumptions are
themselves mutually exclusive with current implementation, and it
would be nice to know, up front, what needs to be rewritten, and
what *should* be rewritten, to conform to the new philosophy.

I can't be the only one who finds FreeBSD 5.x to be in such a state
of flux that it's almost impossible to know what a correct
implementation is supposed to look like, for a given subsystem
and/or device driver, list, etc..

If someone were to lay down the law, we could at least discuss it,
and come to some conclusions that would let people know what to do --
even if there's no change in the law as a result, it's a useful
exercise, since people can then just code to the spec., and not
have to have in depth knowledge of the overall design (if there is
one) to deal with all the related issues... they can just copy the
changes into every other driver/whatever that's in the same niche,
ecologically.  Parallel progress is a Good Thing(tm).

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-07 Thread M. Warner Losh
In message: [EMAIL PROTECTED]
Terry Lambert [EMAIL PROTECTED] writes:
: M. Warner Losh wrote:
:  In message: [EMAIL PROTECTED]
:  Nate Lawson [EMAIL PROTECTED] writes:
:  : I was looking into some could sleep messages and found some bogus
:  : locking in the attach routine of many drivers.  Several init a mtx in
:  : their softc and then lock/unlock it in their attach routine.  This, of
:  : course, does nothing to provide exclusive access to a device.  I assume
:  : there is going to be a global IF_LOCK or something to be used in attach
:  : routines.  Can someone fill me in on the intended design?
:  
:  The locking in the attach routines is generally bogus.  Locking is
:  only needed when you have more than one thread of execution.  You
:  don't have more than one thread of execution until after you establish
:  the ISR and turn on interrupts.  We should likely not be enabling
:  interrupts until very late in the attach routine so that we don't need
:  any locking in them.
: 
: I looked at this.  It seems to me that it's not quite that
: simple (sorry).  I think that there are issues with locking
: because you don't know if this is a driver that's getting
: loaded well after the system has booted, or if this is a
: PCCARD or other hot plug device that has just arrived in
: the system.

That doesn't mattar at all.  If it is a new device that's just
arrived, the attach still won't be interrupted *by other code in the
driver* until after it has setup its ISR and told the hardware to
start generating interrupts.  No device locking is needed in the
attach routine until after interrupts are enabled in the hardware.

: It also seems to me that the reversal problems that would
: result by simply inserting locking have to do with the fact
: that the code is relatively schitzophrenic on deciding whether
: it's locking data, or it's locking a critical path.

The reversal is because of the bogus locking.  The first time through
it locks the device then the interface.  However, after that it locks
the interface and then the device, which can be bad.  It does point to
a problem, however, in that sometimes we'll take out the locks in one
order and other times other orders depending on the code path if we
aren't careful.  I should go look at the new code more closely.

I worry that in the non interrupt case we get things in the IF, DEV
order (because the IF locks, then calls the callback routines, which
then call the DEV lock).  But in the interrupt case we get the DEV
lock first, then try to queue data and that somehow causes the IF
locks to be grabbed.

But you are right, I do need to go look at the code to see what,
exactly, is happening and how the new interface locking code is
interacting with the semi-bogus locking than many of the wpaul drivers
have in them now.

: I can't be the only one who finds FreeBSD 5.x to be in such a state
: of flux that it's almost impossible to know what a correct
: implementation is supposed to look like, for a given subsystem
: and/or device driver, list, etc..

There we agree.  It takes a lot to keep up, and even then I fall
behind when something new happens behind my back.

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-07 Thread M. Warner Losh
I was right (and I think you are too).  We do have lock issues.

dc_attach does approximately:

DC_LOCK
ether_attach()  (which does a IFNET_WLOCK/UNLOCK pair)
DC_UNLOCK

(this sets the lock order to be DC_LOCK, IFNET_WLOCK).

However in if_slowtimo we have:

if_slowtimo(arg)
{
... IFNET_RLOCK();
... if (ifp-if_watchdog)
(*ifp-if_watchdog)(ifp);
... IFNET_RUNLOCK();
}

and dc_watchdog does a DC_LOCK/UNLOCK pair).  This is a Lock Order
Reversal, and not a LotR :-)

What's worse is that dc_intr does:

DC_LOCK
...dc_start (which calls IF_PREPEND which does the IFNET_LOCK/UNLOCK thing)
DC_UNLOCK

So even if we remove the one from attach, it looks like we have others
lurking in the code.

Either that, or it is too late for me to be looking at code like this
:-(

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-07 Thread Andrew Gallatin

M. Warner Losh writes:
..
  However in if_slowtimo we have:
  
  if_slowtimo(arg)
  {
  ...  IFNET_RLOCK();
  ...  if (ifp-if_watchdog)
   (*ifp-if_watchdog)(ifp);
  ...  IFNET_RUNLOCK();
  }
  
  and dc_watchdog does a DC_LOCK/UNLOCK pair).  This is a Lock Order
  Reversal, and not a LotR :-)
  
  What's worse is that dc_intr does:
  
  DC_LOCK
  ...dc_start (which calls IF_PREPEND which does the IFNET_LOCK/UNLOCK thing)
  DC_UNLOCK
  
  So even if we remove the one from attach, it looks like we have others
  lurking in the code.
  
  Either that, or it is too late for me to be looking at code like this
  :-(
  

I think its too late at night ;)

The IFNET_RLOCK() called in if_slowtimo() is a global lock for the
list of ifnet structs to ensure that no devices are removed or added
while something may be using it.  There is one ifnet list in the system.

The lock in IF_PREPEND() (and more commonly used in drivers,
IF_DEQUE()) is per-ifq, to protect against multiple accesses 
to a single  ifq.  There are many ifqs in the system.

FWIW, I've been running my 3rd party Myrinet driver Giant-free and
have had no problems, and no lock order reversals.  I don't do bogus
locking in my attach routine, though :)

FWIW2: Running Giant-free brings -current TCP performance up to nearly
63% of -stable performance (from 39%), and udp xmit perf up to 87% of
-stable.  (testing w/o WITNESS).

Drew



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-07 Thread M. Warner Losh
In message: [EMAIL PROTECTED]
Andrew Gallatin [EMAIL PROTECTED] writes:
: The IFNET_RLOCK() called in if_slowtimo() is a global lock for the
: list of ifnet structs to ensure that no devices are removed or added
: while something may be using it.  There is one ifnet list in the system.

So this means that only the locking in attach is bogus, and similar
locking in detach is also bogus because they produce lock order
reversals as the global lock is held to insert/remove if interfaces.

: The lock in IF_PREPEND() (and more commonly used in drivers,
: IF_DEQUE()) is per-ifq, to protect against multiple accesses 
: to a single  ifq.  There are many ifqs in the system.

I knew I must have been missing something really fundamental last
night.

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-07 Thread Andrew Gallatin

M. Warner Losh writes:
  In message: [EMAIL PROTECTED]
  Andrew Gallatin [EMAIL PROTECTED] writes:
  : The IFNET_RLOCK() called in if_slowtimo() is a global lock for the
  : list of ifnet structs to ensure that no devices are removed or added
  : while something may be using it.  There is one ifnet list in the system.
  
  So this means that only the locking in attach is bogus, and similar
  locking in detach is also bogus because they produce lock order
  reversals as the global lock is held to insert/remove if interfaces.

Yes.  Though I haven't looked at if_dc myself, there may be other
locking problems.  I've only been commenting on the ones that you
brought up.

But back to an earlier point.  Somebody (you?) validly pointed out
that the driver should not be callable and should not generate
interrupts until its finished attaching.  The lock in its attach was
probably a somewhat misguided attempt at that.  

The first point can be accomplished by doing the ether_ifattach()
last, but the second may be harder.  I do that by poking a bit on my
card which prevents it from generating interrupts while the device is
being setup.  Not sure if a similar bit exists on tulip cards.

Drew

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-07 Thread M. Warner Losh
In message: [EMAIL PROTECTED]
Andrew Gallatin [EMAIL PROTECTED] writes:
: 
: M. Warner Losh writes:
:   In message: [EMAIL PROTECTED]
:   Andrew Gallatin [EMAIL PROTECTED] writes:
:   : The IFNET_RLOCK() called in if_slowtimo() is a global lock for the
:   : list of ifnet structs to ensure that no devices are removed or added
:   : while something may be using it.  There is one ifnet list in the system.
:   
:   So this means that only the locking in attach is bogus, and similar
:   locking in detach is also bogus because they produce lock order
:   reversals as the global lock is held to insert/remove if interfaces.
: 
: Yes.  Though I haven't looked at if_dc myself, there may be other
: locking problems.  I've only been commenting on the ones that you
: brought up.
: 
: But back to an earlier point.  Somebody (you?) validly pointed out
: that the driver should not be callable and should not generate
: interrupts until its finished attaching.  The lock in its attach was
: probably a somewhat misguided attempt at that.  

Yes.  That was me.  There are some drivers that have separated
front/back ends that makes this harder, but most of them don't.

: The first point can be accomplished by doing the ether_ifattach()
: last, but the second may be harder.  I do that by poking a bit on my
: card which prevents it from generating interrupts while the device is
: being setup.  Not sure if a similar bit exists on tulip cards.

All PCI cards have to be able to turn off their interrupt sources,
otherwise interrupt storms result.  At least that's my understanding.

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-06 Thread Harti Brandt
On Fri, 3 Jan 2003, Nate Lawson wrote:

NLI was looking into some could sleep messages and found some bogus
NLlocking in the attach routine of many drivers.  Several init a mtx in
NLtheir softc and then lock/unlock it in their attach routine.  This, of
NLcourse, does nothing to provide exclusive access to a device.  I assume
NLthere is going to be a global IF_LOCK or something to be used in attach
NLroutines.  Can someone fill me in on the intended design?

Probably not. I asked the same question a couple of month ago and got 0
answers. I think, there is no way, the driver itself can assure exclusive
access to the device it is attaching. It *must* assume, that there is some
kind of locking around the call to the attach routine. Getting the lock in
the softc inside the attach routine may be neccessary, because the routine
may call other functions that assume they have the lock.

harti
-- 
harti brandt, http://www.fokus.gmd.de/research/cc/cats/employees/hartmut.brandt/private
  [EMAIL PROTECTED], [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-06 Thread Kyunghwan Kim
On Mon, Jan 06, 2003 at 10:58:25AM +0100, Harti Brandt wrote:
 On Fri, 3 Jan 2003, Nate Lawson wrote:
 NLI was looking into some could sleep messages and found some bogus
 NLlocking in the attach routine of many drivers.  Several init a mtx in
 NLtheir softc and then lock/unlock it in their attach routine.  This, of
 NLcourse, does nothing to provide exclusive access to a device.  I assume
 NLthere is going to be a global IF_LOCK or something to be used in attach
 NLroutines.  Can someone fill me in on the intended design?
 
 Probably not. I asked the same question a couple of month ago and got 0
 answers. I think, there is no way, the driver itself can assure exclusive
 access to the device it is attaching. It *must* assume, that there is some
 kind of locking around the call to the attach routine. Getting the lock in
 the softc inside the attach routine may be neccessary, because the routine
 may call other functions that assume they have the lock.

Only using lock in softc can't assure its exclusive access
because there are some cases of changing some values in ifnet struct
outside of device driver routines.

Most of the NIC drivers don't have its own locks for now, and using
both IFNET_*LOCK() and its own softc lock can't make everything in sync.

There should be two use of locks IMO: one or more per-device locks
in driver softc for manipulating per-device private data protection,
and ifnet lock for each ifnet struct protection (such as ifnet.if_mtx).
Maybe these locks should be adaptive or spin
And IFNET_*LOCK() should remain for adding/removing ifnet struct to
the global ifnet whose type is ifnethead.

In case of ifqueue, it should not need to acquire its ifnet lock
because ifqueue has its own mutex.
-- 
Kyunghwan Kim
[EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-06 Thread Nate Lawson
On Mon, 6 Jan 2003, Kyunghwan Kim wrote:
 On Mon, Jan 06, 2003 at 10:58:25AM +0100, Harti Brandt wrote:
  On Fri, 3 Jan 2003, Nate Lawson wrote:
  NLI was looking into some could sleep messages and found some bogus
  NLlocking in the attach routine of many drivers.  Several init a mtx in
  NLtheir softc and then lock/unlock it in their attach routine.  This, of
  NLcourse, does nothing to provide exclusive access to a device.  I assume
  NLthere is going to be a global IF_LOCK or something to be used in attach
  NLroutines.  Can someone fill me in on the intended design?
  
  Probably not. I asked the same question a couple of month ago and got 0
  answers. I think, there is no way, the driver itself can assure exclusive
  access to the device it is attaching. It *must* assume, that there is some
  kind of locking around the call to the attach routine. Getting the lock in
  the softc inside the attach routine may be neccessary, because the routine
  may call other functions that assume they have the lock.
 
 Only using lock in softc can't assure its exclusive access
 because there are some cases of changing some values in ifnet struct
 outside of device driver routines.
 
 Most of the NIC drivers don't have its own locks for now, and using
 both IFNET_*LOCK() and its own softc lock can't make everything in sync.

My point.
 
 There should be two use of locks IMO: one or more per-device locks
 in driver softc for manipulating per-device private data protection,
 and ifnet lock for each ifnet struct protection (such as ifnet.if_mtx).

Looking further into sys/net/if.c, it appears that the list of interfaces
is protected by IFNET_WLOCK in if_attach().

I think it's safe to work under the following assumptions:
1. newbus will not call an attach routine twice for the same hw
2. ifnet routines take care of themselves
3. per-device locking is only necessary to provide exclusive access within
a given driver instance.

-Nate


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Proper -current if_attach locking?

2003-01-06 Thread M. Warner Losh
In message: [EMAIL PROTECTED]
Nate Lawson [EMAIL PROTECTED] writes:
: I was looking into some could sleep messages and found some bogus
: locking in the attach routine of many drivers.  Several init a mtx in
: their softc and then lock/unlock it in their attach routine.  This, of
: course, does nothing to provide exclusive access to a device.  I assume
: there is going to be a global IF_LOCK or something to be used in attach
: routines.  Can someone fill me in on the intended design?

The locking in the attach routines is generally bogus.  Locking is
only needed when you have more than one thread of execution.  You
don't have more than one thread of execution until after you establish
the ISR and turn on interrupts.  We should likely not be enabling
interrupts until very late in the attach routine so that we don't need
any locking in them.

Warner

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Proper -current if_attach locking?

2003-01-03 Thread Nate Lawson
I was looking into some could sleep messages and found some bogus
locking in the attach routine of many drivers.  Several init a mtx in
their softc and then lock/unlock it in their attach routine.  This, of
course, does nothing to provide exclusive access to a device.  I assume
there is going to be a global IF_LOCK or something to be used in attach
routines.  Can someone fill me in on the intended design?

-Nate


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message