Re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread Warner Losh

On Apr 19, 2012, at 6:03 PM, matthew green wrote:
>> But that's a problem with autoconf not dealing with non-MPSAFE drivers,
>> not the driver themselve (they were working before the MP changes, they
>> should continue to work as-is). If you want autoconf to not run
>> under the KERNEL_LOCK when not needed, then is has to be extended so that
>> MPSAFE drivers can register themselves as MPSAFE, so that non-MPSAFE
>> drivers are still called with KERNEL_LOCK.
> 
> it seems the wrong answer to me.  instead of fixing the one or
> two drivers you've found problems in, you want to add more
> infrastructure (IMO clutter) to the system and patch all the
> drivers that *are* ok.  considering that this problem is
> very specific to scsipi, i don't think that is a viable
> solution.

FreeBSD started out with MPSAFE and then went to NEEDS_GIANT as flags for the 
drivers.  I'm with Matthew: patch the drivers that are broken (or not known to 
be safe) and then you have a convenient thing to grep for when you want to 
expand the drivers that are safe.  Much better that way, and it turned out to 
be a big win in FreeBSD when we went from opt-in MPSAFE to out-out...

Warner




re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread matthew green

> > scsipi depends upon kernel lock.  thus, callers should arrange for it
> > to be held.  since these are drivers calling, it is upto each driver
> > to do this currently.  this is what we've done with other problems we
> > have hit as they've arrived.
> 
> if the caller is MPSAFE, I agree. Non-MPSAFE driver should not have to care
> about KERNEL_LOCK (precisely because they are not MPSAFE).
> The basic assumption about kernel locking was that that driver and subsystems
> using spl locking would continue to work as it, without changes. If we
> change this assumption a lot of code needs to be changed (that is, almost
> all our drivers).

we've never had autoconfig run with the kernel lock AFAICT, so this
assumption has never been true.  the thing that does work is that
IPL_VM using drivers will automatically get kernel locked in their
interrupt handlers.

> > these KASSERT()s have been in place for a long time now and, until
> > your change, we've fixed it at the caller level not worked around it
> > at the scsipi level.
> 
> AFAIK the only drivers that has been changed are ncr53c9x and USB, and they
> are MP-safe (they already use mutex instead of spl locking).

usb is not MP safe in -current, only on the usbmp branch.  all those
drivers take care of taking the kernel lock when calling into the
scsipi code since they almost *always* run when not cold anyway.

infact, the asserts were added in -current due to the usbmp work
and the understanding that they would advance the rest of the system.

> > your change can't work with the drivers that are broken without the
> > major regression of holding kernel lock over autoconfig, or, never
> > ever having them be modules or detached/reattached.
> 
> Maybe non-MPSAFE drivers can't be modules or detached/reattached, indeed.

that's not true.  eg, radeondrm works just fine being MP safe or not.
(it actually is MP safe, though it doesn't announce this to any of the
rest of the system currently.)

> But that's a problem with autoconf not dealing with non-MPSAFE drivers,
> not the driver themselve (they were working before the MP changes, they
> should continue to work as-is). If you want autoconf to not run
> under the KERNEL_LOCK when not needed, then is has to be extended so that
> MPSAFE drivers can register themselves as MPSAFE, so that non-MPSAFE
> drivers are still called with KERNEL_LOCK.

it seems the wrong answer to me.  instead of fixing the one or
two drivers you've found problems in, you want to add more
infrastructure (IMO clutter) to the system and patch all the
drivers that *are* ok.  considering that this problem is
very specific to scsipi, i don't think that is a viable
solution.

> > please make "drvctl -d" and rescan work for your drivers.  this will
> > make the checks against "cold" go away.
> 
> I don't want to touch each driver in our tree for drvctl -d/rescan to work.

as in aside, you should make "drvctl -d" work for any drivers
you maintain.  everyone (tm) should.  irrespective of modules
or otherwise, the end result of it working tends to be bug
fixes.

> If we want this to work for non-MPSAFE driver, then this has to be fixed
> in autoconf (if the problem really exists, which I've not looked at yet).

i don't believe there is a problem in autoconfig.  there is a
problem with some (probably most) scsi drivers here, but that
is about it.  it is true that the contracts have changed, but
this is progress after all.  the assert exists solely to find
places that are wrong and fix them.  your change hides this.

this isn't about SMPifing a whole driver, but adding a call to
taken/release the kernel lock around a few (often just one)
calls in a driver.  it has never been a difficult change to
make.  if you have drivers that needs it, i am happy to send
patches for testing.


.mrg.


Re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread Manuel Bouyer
On Thu, Apr 19, 2012 at 07:00:56PM +1000, matthew green wrote:
> 
> > > > If the driver's attach is called after cold (e.g. after a detach/rescan
> > > > of the pci bus), the driver's attach should be called with KERNEL_LOCK
> > > > held, or bad things may happen when interrupts are enabled for this 
> > > > driver.
> > > 
> > > there should be no reliance on "cold" being set for normal driver
> > > attach.  it might be a module loaded after boot.  how ever the
> > > driver is loaded, it will need to work without cold being set.
> > 
> > If it's a module laoded after boot, and the driver is not SMP-safe,
> > its attach function has to be called with KERNEL_LOCK held. Or you may
> > have problems when this driver starts receiving interrupts before its
> > attach function has returned (which is typically the case, interrupt enable
> > occurs somewhere in _attach()).
> 
> i don't see how this matters.  drivers should never enable
> interrupts if they're not ready to service them.
> 
> if this is happening, then it is a driver bug.

It looks like we're talking about different things.
I'm talking about this senario, with a non-smp-safe driver.

foo_attach()
{
do software and hardware initialisation();

s = splbio();
enable_interrupt(foo_intr());
finish_init_and_attach_children();
splx(s);
return;
}

foo_intr()
{

}

In this context, the driver enables interrupt, but don't expect to have its
interrupt handler called before the splx(), because
finish_init_and_attach_children() is running at IPL_BIO. But if foo_attach()
is not called with KERNEL_LOCK held, another CPU may call foo_intr(),
breaking the driver's spl protection.

And I don't think it's the driver's business to make sure KERNEL_LOCK is
held in it's attach routine: it's a non-MPSAFE driver and it should not
even have to know about KERNEL_LOCK.

If attach routines can be called without KERNEL_LOCK held after cold,
then all non-MPSAFE drivers (i.e. almost all of them) needs to be fixed.


> 
> > > in my mind, the scsi code should try to run regardless of the value
> > > of "cold" and that's why i replied above.
> > > 
> > > > What kind of senario do you have in mind ?
> > > 
> > > modules, as above.  or simply drvctl -d / -r.  IMO, only platform
> > > specific code really should depend upon cold.  "scsipi", as a
> > > relatively high level subsystem, should not.
> > 
> > But I don't think it's the job of non-SMP-safe drivers to make sure scsipi
> > is called with KERNEL_LOCK held (and scsipi may be the only subsystem
> > to check for KERNEL_LOCK(), but others do rely on it as well - e.g.
> > sys/dev/ata). Either attach() functions should always be called with
> > KERNEL_LOCK(), or for checks as above we have to accept that when
> > cold, locking is not fully up yet and lock checks should be bypassed.
> 
> the lack of checks elsewhere is not the point here.  we should add
> them to dev/ata (and more) as well, if they depend upon these.
> 
> currently, autoconf isn't run with the kernel lock held at any point
> and i don't think that saying that we should hold it during autoconf
> is a good change.  we should fix the cases that trigger problems when
> they appear, and it sounds like you've hit one in your new driver.
> holding the kernel lock here would be a major step backwards.
> 
> scsipi depends upon kernel lock.  thus, callers should arrange for it
> to be held.  since these are drivers calling, it is upto each driver
> to do this currently.  this is what we've done with other problems we
> have hit as they've arrived.

if the caller is MPSAFE, I agree. Non-MPSAFE driver should not have to care
about KERNEL_LOCK (precisely because they are not MPSAFE).
The basic assumption about kernel locking was that that driver and subsystems
using spl locking would continue to work as it, without changes. If we
change this assumption a lot of code needs to be changed (that is, almost
all our drivers).

> 
> it may be that we have this problem in a large number of drivers but
> these KASSERT()s have been in place for a long time now and, until
> your change, we've fixed it at the caller level not worked around it
> at the scsipi level.

AFAIK the only drivers that has been changed are ncr53c9x and USB, and they
are MP-safe (they already use mutex instead of spl locking).

> 
> your change can't work with the drivers that are broken without the
> major regression of holding kernel lock over autoconfig, or, never
> ever having them be modules or detached/reattached.

Maybe non-MPSAFE drivers can't be modules or detached/reattached, indeed.
But that's a problem with autoconf not dealing with non-MPSAFE drivers,
not the driver themselve (they were working before the MP changes, they
should continue to work as-is). If you want autoconf to not run
under the KERNEL_LOCK when not needed, then is has to be extended so that
MPSAFE drivers can register themselves as MPSAFE, so that non-MPSAFE
drivers are still called with KERNEL_LOCK.

> 
>

re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread matthew green

> > > If the driver's attach is called after cold (e.g. after a detach/rescan
> > > of the pci bus), the driver's attach should be called with KERNEL_LOCK
> > > held, or bad things may happen when interrupts are enabled for this 
> > > driver.
> > 
> > there should be no reliance on "cold" being set for normal driver
> > attach.  it might be a module loaded after boot.  how ever the
> > driver is loaded, it will need to work without cold being set.
> 
> If it's a module laoded after boot, and the driver is not SMP-safe,
> its attach function has to be called with KERNEL_LOCK held. Or you may
> have problems when this driver starts receiving interrupts before its
> attach function has returned (which is typically the case, interrupt enable
> occurs somewhere in _attach()).

i don't see how this matters.  drivers should never enable
interrupts if they're not ready to service them.

if this is happening, then it is a driver bug.

> > in my mind, the scsi code should try to run regardless of the value
> > of "cold" and that's why i replied above.
> > 
> > > What kind of senario do you have in mind ?
> > 
> > modules, as above.  or simply drvctl -d / -r.  IMO, only platform
> > specific code really should depend upon cold.  "scsipi", as a
> > relatively high level subsystem, should not.
> 
> But I don't think it's the job of non-SMP-safe drivers to make sure scsipi
> is called with KERNEL_LOCK held (and scsipi may be the only subsystem
> to check for KERNEL_LOCK(), but others do rely on it as well - e.g.
> sys/dev/ata). Either attach() functions should always be called with
> KERNEL_LOCK(), or for checks as above we have to accept that when
> cold, locking is not fully up yet and lock checks should be bypassed.

the lack of checks elsewhere is not the point here.  we should add
them to dev/ata (and more) as well, if they depend upon these.

currently, autoconf isn't run with the kernel lock held at any point
and i don't think that saying that we should hold it during autoconf
is a good change.  we should fix the cases that trigger problems when
they appear, and it sounds like you've hit one in your new driver.
holding the kernel lock here would be a major step backwards.

scsipi depends upon kernel lock.  thus, callers should arrange for it
to be held.  since these are drivers calling, it is upto each driver
to do this currently.  this is what we've done with other problems we
have hit as they've arrived.

it may be that we have this problem in a large number of drivers but
these KASSERT()s have been in place for a long time now and, until
your change, we've fixed it at the caller level not worked around it
at the scsipi level.

your change can't work with the drivers that are broken without the
major regression of holding kernel lock over autoconfig, or, never
ever having them be modules or detached/reattached.

please make "drvctl -d" and rescan work for your drivers.  this will
make the checks against "cold" go away.


.mrg.


Re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread Manuel Bouyer
On Thu, Apr 19, 2012 at 06:25:54PM +1000, matthew green wrote:
> [...]
> 
> > If the driver's attach is called after cold (e.g. after a detach/rescan
> > of the pci bus), the driver's attach should be called with KERNEL_LOCK
> > held, or bad things may happen when interrupts are enabled for this driver.
> 
> there should be no reliance on "cold" being set for normal driver
> attach.  it might be a module loaded after boot.  how ever the
> driver is loaded, it will need to work without cold being set.

If it's a module laoded after boot, and the driver is not SMP-safe,
its attach function has to be called with KERNEL_LOCK held. Or you may
have problems when this driver starts receiving interrupts before its
attach function has returned (which is typically the case, interrupt enable
occurs somewhere in _attach()).

> 
> in my mind, the scsi code should try to run regardless of the value
> of "cold" and that's why i replied above.
> 
> > What kind of senario do you have in mind ?
> 
> modules, as above.  or simply drvctl -d / -r.  IMO, only platform
> specific code really should depend upon cold.  "scsipi", as a
> relatively high level subsystem, should not.

But I don't think it's the job of non-SMP-safe drivers to make sure scsipi
is called with KERNEL_LOCK held (and scsipi may be the only subsystem
to check for KERNEL_LOCK(), but others do rely on it as well - e.g.
sys/dev/ata). Either attach() functions should always be called with
KERNEL_LOCK(), or for checks as above we have to accept that when
cold, locking is not fully up yet and lock checks should be bypassed.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread matthew green

> > > Module Name:  src
> > > Committed By: bouyer
> > > Date: Wed Apr 18 20:37:49 UTC 2012
> > > 
> > > Modified Files:
> > >   src/sys/dev/scsipi: scsipi_base.c
> > > 
> > > Log Message:
> > > Fix KASSERT(): autoconf doesn't run under the KERNEL_LOCK
> > 
> > this is true, but can you please fix it differently?  ie autoconf
> > isn't "cold".  most of scsi autoconf runs after cold is gone, so
> > i'd rather make whatever callers deal with it so that they work
> > cold or not.
> 
> But if we're not cold, interrupts are enabled so I hope whatever calls
> into scsi autoconf runs under the big lock if the underlying driver
> isn't SMP-safe (and if it is, it's its job to take the KERNEL_LOCK).

the point of the KASSERT()'s it to make sure that, until scsipi is
made smp safe, calls into scsi are all done with the kernel lock.

> The problem I've seen is with a driver I'm working on, similar to mfi(4)
> (and I suspect mfi(4) would have had the same problem):
> the driver's attach calls config_found_sm_loc() to attach the scsibus
> while cold, so the big lock isn't helt at this point.

yeah - i'm pretty sure i've seen this and fixed it in a few other
places rather than the above change.

> If the driver's attach is called after cold (e.g. after a detach/rescan
> of the pci bus), the driver's attach should be called with KERNEL_LOCK
> held, or bad things may happen when interrupts are enabled for this driver.

there should be no reliance on "cold" being set for normal driver
attach.  it might be a module loaded after boot.  how ever the
driver is loaded, it will need to work without cold being set.

in my mind, the scsi code should try to run regardless of the value
of "cold" and that's why i replied above.

> What kind of senario do you have in mind ?

modules, as above.  or simply drvctl -d / -r.  IMO, only platform
specific code really should depend upon cold.  "scsipi", as a
relatively high level subsystem, should not.

thanks.


.mrg.


Re: CVS commit: src/sys/dev/scsipi

2012-04-19 Thread Manuel Bouyer
On Thu, Apr 19, 2012 at 04:41:04PM +1000, matthew green wrote:
> 
> > Module Name:src
> > Committed By:   bouyer
> > Date:   Wed Apr 18 20:37:49 UTC 2012
> > 
> > Modified Files:
> > src/sys/dev/scsipi: scsipi_base.c
> > 
> > Log Message:
> > Fix KASSERT(): autoconf doesn't run under the KERNEL_LOCK
> 
> this is true, but can you please fix it differently?  ie autoconf
> isn't "cold".  most of scsi autoconf runs after cold is gone, so
> i'd rather make whatever callers deal with it so that they work
> cold or not.

But if we're not cold, interrupts are enabled so I hope whatever calls
into scsi autoconf runs under the big lock if the underlying driver
isn't SMP-safe (and if it is, it's its job to take the KERNEL_LOCK).

The problem I've seen is with a driver I'm working on, similar to mfi(4)
(and I suspect mfi(4) would have had the same problem):
the driver's attach calls config_found_sm_loc() to attach the scsibus
while cold, so the big lock isn't helt at this point.

If the driver's attach is called after cold (e.g. after a detach/rescan
of the pci bus), the driver's attach should be called with KERNEL_LOCK
held, or bad things may happen when interrupts are enabled for this driver.

What kind of senario do you have in mind ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--