Re: CVS commit: src/sys/dev/scsipi

2012-04-20 Thread Manuel Bouyer
On Thu, Apr 19, 2012 at 06:39:29PM -0600, Warner Losh wrote:
 FreeBSD started out with MPSAFE and then went to NEEDS_GIANT as flags for the 
 drivers.  I'm with Matthew: patch the drivers that are broken (or not known 
 to be safe) and then you have a convenient thing to grep for when you want to 
 expand the drivers that are safe.  Much better that way, and it turned out to 
 be a big win in FreeBSD when we went from opt-in MPSAFE to out-out...

But we're still in the MPSAFE world, not yet at NEEDS_GIANT. Very few drivers
are MPSAFE in NetBSD, so almost all of them need to be patched.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/dev/scsipi

2012-04-20 Thread David Holland
On Fri, Apr 20, 2012 at 09:37:00AM +0200, Manuel Bouyer wrote:
   we've never had autoconfig run with the kernel lock AFAICT, so this
   assumption has never been true.
  
  So this is a bug. The contract was really that spl-locked drivers would
  continue to work as is when fine-grained locking was introduced.
  I don't think it's reasonable to require every (but the few
  already MP-safe) drivers to take KERNEL_LOCK in their attach
  routine.

So autoconf should check and take the kernel lock before calling the
attach routine of a non-MPSAFE driver. If it doesn't already do this,
it's wrong.

Meanwhile, since scsipi is not MPSAFE, whoever is calling into scsipi
without taking the kernel lock first is wrong, so the change at the
beginning of this thread should be reverted.

(Furthermore, scsipi is itself wrong. It is a core component; it
should be MPSAFE.)

-- 
David A. Holland
dholl...@netbsd.org


Re: CVS commit: src/sys/dev/scsipi

2012-04-20 Thread Manuel Bouyer
On Fri, Apr 20, 2012 at 07:46:35AM +, David Holland wrote:
 On Fri, Apr 20, 2012 at 09:37:00AM +0200, Manuel Bouyer wrote:
we've never had autoconfig run with the kernel lock AFAICT, so this
assumption has never been true.
   
   So this is a bug. The contract was really that spl-locked drivers would
   continue to work as is when fine-grained locking was introduced.
   I don't think it's reasonable to require every (but the few
   already MP-safe) drivers to take KERNEL_LOCK in their attach
   routine.
 
 So autoconf should check and take the kernel lock before calling the
 attach routine of a non-MPSAFE driver. If it doesn't already do this,
 it's wrong.

I agree :)

 
 Meanwhile, since scsipi is not MPSAFE, whoever is calling into scsipi
 without taking the kernel lock first is wrong, so the change at the
 beginning of this thread should be reverted.

I also agree, in light of this discussion (I though that KERNEL_LOCK
was not held while cold on purpose, but it seems that the issue is at
another level, and more annoying).

But I find it acceptable as an interim solution, it at last
allows attaching scsi drivers that are probed at boot.

 
 (Furthermore, scsipi is itself wrong. It is a core component; it
 should be MPSAFE.)

I also agree. So should be ata, if_ethersubr, etc ...
that's a lot of work.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: src/sys/dev/scsipi

2012-04-20 Thread Manuel Bouyer
On Fri, Apr 20, 2012 at 06:06:35PM +1000, matthew green wrote:
 
   we've never had autoconfig run with the kernel lock AFAICT, so this
   assumption has never been true.
  
  So this is a bug. The contract was really that spl-locked drivers would
  continue to work as is when fine-grained locking was introduced.
 
 there are no actual bugs or races here.  this is about cleanup
 and ensuring code runs how it expects.

And that's the point: all code that use spl-style locking expects
to be called with KERNEL_LOCK held. And this includes the attach
routine.

 
  I don't think it's reasonable to require every (but the few
  already MP-safe) drivers to take KERNEL_LOCK in their attach
  routine.
 
 i'm not saying that they should.  i'm saying that scsipi drivers,
 regardless of whether they are fully MPSAFE or not, should now
 arrange to have the kernel lock held when calling into scsipi at
 all times.  this really has nothing to do with attach, except that
 attach is problem to fix currently in some drivers.

Almost all scsipi drivers ...

 
  It's not only IPL_VM drivers, it'a all driver wich don't register
  and interrupt handler explicitely marked MPSAFE (for example,
  with PCI_INTR_MPSAFE for PCI drivers).
 
 interesting; i hadn't seen PCI_INTR_MPSAFE before.  however, it
 seems ignored / unused.  the x86 pci_intr_setattr() does do
 something but it's never called.  none of the rest do anything.

this just means there's no PCI MP-SAFE driver yet ?

 
 my point is that drivers get interrupt protection from asking for
 an IPL_VM interrupt.  that's what tells the MD code to make sure
 the kernel lock is held for these interrupt handlers.

Or explicitely declared MPSAFE. at last x86 allows this.

 
 these KASSERT()s have been in place for a long time now and, until
 your change, we've fixed it at the caller level not worked around it
 at the scsipi level.

AFAIK the only drivers that has been changed are ncr53c9x and USB, and 
they
are MP-safe (they already use mutex instead of spl locking).
   
   usb is not MP safe in -current, only on the usbmp branch.  all those
   drivers take care of taking the kernel lock when calling into the
   scsipi code since they almost *always* run when not cold anyway.
  
  So things were fixed at the wrong place. Any non-MPSAFE driver is at
  risk right now
 
 i don't understand.  eg, the usbmp code never calls into scsipi
 during autoconfig.

usb is special in this regard.

 these asserts aren't specific to autoconfig,
 and IIRC, most drivers do not call into scsipi during attach.

Most if not all regular scsi drivers do: they do a config_found()
in their attach routine to attach the scsibus.
ATA does it from a kernel thread.

 
 your change can't work with the drivers that are broken without the
 major regression of holding kernel lock over autoconfig, or, never
 ever having them be modules or detached/reattached.

Maybe non-MPSAFE drivers can't be modules or detached/reattached, 
indeed.
   
   that's not true.  eg, radeondrm works just fine being MP safe or not.
   (it actually is MP safe, though it doesn't announce this to any of the
   rest of the system currently.)
  
  Yes, of course, unless there's a KASSERT, things will run fine most of
  the time. That's the problem with race conditions: it doesn't reliably
  fail. 
 
 that's not what i said.  what i said is that it is very trivial to have
 a non MPSAFE-driver handle being a module, being detached, and being
 reattached.

The discussion is about KERNEL_LOCK, and possible race of running
non-MPSAFE code without it.


  
  No, it's not specific to scsipi. It happens that scsipi asserts that
  KERNEL_LOCK is held, but any non-MPSAFE drivers needs to be called with
  KERNEL_LOCK held.
 
 why?  we've never done this for several years now and there's no reason
 why we think there is a problem with it currently.

why did you add this KASSERT to scsipi then ?

 we've actually run
 the interrupt part of autoconfig with multiple threads since sometime
 between netbsd 4.0 and 5.0.

this doesn't mean the races don't exists, just that we don't run into
them most of the time. Maybe that in some common case there's no possible
race, but that would be by accident, not by design.

 
   as in aside, you should make drvctl -d work for any drivers
   you maintain.  everyone (tm) should.  irrespective of modules
   or otherwise, the end result of it working tends to be bug
   fixes.
  
  It's hard to drvctl -d the root device ...
 
 eh, pxe is your friend.  :-)  

I can pxe boot the kernel; having root on NFS is another things.

 
If we want this to work for non-MPSAFE driver, then this has to be fixed
in autoconf (if the problem really exists, which I've not looked at 
yet).
   
   i don't believe there is a problem in autoconfig.  there is a
   problem with some (probably most) scsi drivers here, but that
   is about it.  it is true that the contracts have changed, but
 

Re: CVS commit: src/tests/lib/libc/gen (address alignment)

2012-04-20 Thread Paul Goyette
XXX I would appreciate if someone could test it under a real amd64 
host with an up-to-date kernel, so I can reasonably assume that the 
culprit is Virtual Box and not our amd64 port (my test machine being 
off line I cannot do it myself). Results from other arches would be a 
plus too.


I just updated one of my machines (running a bulldozer FX-8150) to 
-current from an hour ago (with the most up-to-date rev of psl.h) and 
ran atf-run on the libc/gen tests.  The machine hung solid at this 
point:


...
t_siginfo (28/32): 8 test cases
sigalarm: [1.006950s] Passed.
sigbus_adraln:

and I am no longer able to telnet into the box from elsewhere.  I am not 
physically at the machine, so I cannot tell if it has dropped into ddb.



-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |  | pgoyette at netbsd.org  |
-


Re: CVS commit: src/tests/lib/libc/gen (address alignment)

2012-04-20 Thread Paul Goyette

On Fri, 20 Apr 2012, Paul Goyette wrote:

The machine did not drop into ddb, it simply rebooted.  Unfortunately it did 
not leave a core dump behind, so I don't have much to look at just yet.  When 
I get home later today, I will try to get more info.


BTW, this occurred while running the ATF test from a non-privileged user, so 
if there's a bug lurking in these recent changes, it could be considered to 
be a security vulnerability - non-priv user should not be able to crash the 
box...


The machine does not drop into ddb, and it does not panic.  It simply 
reboots.


This is not good...   :)


-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |  | pgoyette at netbsd.org  |
-