Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Greg Lehey

On Tuesday, 17 April 2001 at  1:19:57 -0700, Alfred Perlstein wrote:
 * Matt Dillon [EMAIL PROTECTED] [010415 23:16] wrote:

   For example, all this work on a preemptive
 kernel is just insane.  Our entire kernel is built on the concept of
 not being preemptable except by interrupts.  We virtually guarantee
 years of instability and bugs leaking out of the woodwork by trying to
 make it preemptable, and the performance gain we get for that pain
 is going to be zilch.  Nada.  Nothing.

 Pre-emption is merely a side effect of a mutex'd kernel.

 The actual gains are in terms of parallel execution internally.
 Meaning if we happen to copyin() a 4 meg buffer we can allow more
 than one process to be completing some sort of work inside the
 kernel other than spinning on the giant lock.

*sigh* Couldn't you have changed the subject line when discussing
something of this importance?

Greg
--
Finger [EMAIL PROTECTED] for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Alfred Perlstein

* Greg Lehey [EMAIL PROTECTED] [010417 17:02] wrote:
 On Tuesday, 17 April 2001 at  1:19:57 -0700, Alfred Perlstein wrote:
  * Matt Dillon [EMAIL PROTECTED] [010415 23:16] wrote:
 
For example, all this work on a preemptive
  kernel is just insane.  Our entire kernel is built on the concept of
  not being preemptable except by interrupts.  We virtually guarantee
  years of instability and bugs leaking out of the woodwork by trying to
  make it preemptable, and the performance gain we get for that pain
  is going to be zilch.  Nada.  Nothing.
 
  Pre-emption is merely a side effect of a mutex'd kernel.
 
  The actual gains are in terms of parallel execution internally.
  Meaning if we happen to copyin() a 4 meg buffer we can allow more
  than one process to be completing some sort of work inside the
  kernel other than spinning on the giant lock.
 
 *sigh* Couldn't you have changed the subject line when discussing
 something of this importance?

I wasn't discussing, I was explaining.

-- 
-Alfred Perlstein - [EMAIL PROTECTED]
Represent yourself, show up at BABUG http://www.babug.org/




Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Bosko Milekic


On Tue, Apr 17, 2001 at 05:47:23PM -0700, Matt Dillon wrote:
 Proposed:
 
   mainline kernel {
   get_spin_mutex(somemutex);
   .
   .
   masked interrupt occurs, interrupt structure contains array
   of mutexes the interrupt will need.  Check said mutexes, one
   found to be held by current cpu.  Set interrupt-pending bit
   in mutex and iret immediately.

You cannot be pre-empted by an interrupt if you are holding a spin
 mutex, AFAIK, even under present implementation.

   .
   .
   release_spin_mutex(somemutex)
   (bit found to be set in mutex, triggers interrupt reissuing code)
   }
 
 And there you have it.  The mutex/array test takes very little time
 being a read-only test that requires no bus locking, and the collision
 case is cheap also because the current cpu already owns the mutex, allowing
 us to set the interrupt-pending bit in that mutex without any bus
 locking.  The check during the release of the mutex is two instructions,
 no bus locking required.  The whole thing can be implemented without any
 additional bus locking and virtually no contention.
 
 The case could be further optimized by requiring that interrupts only
 use a single mutex, period.  This would allow the mainline interrupt
 routine to obtain the mutex on entry to the interrupt and allow the 
 reissuing code to reissue the interrupt without freeing the mutex that
 caused the reissue, so the mutex is held throughout and then freed by
 the interrupt itself.

 Holy shit.  I think that's it!  I don't think it can get much better than
 that.  It solves all of BDE's issues, solves the interrupt-as-thread
 issue (by not using threads for interrupts at all), and removes a huge
 amount of unnecessary complexity from the system.  We could even get rid
 of the idle processes if we wanted to.
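
 [Editor's sketch] The proposal quoted above can be modelled in a few
 lines of userspace C.  This is only an illustration, not code from the
 thread or from FreeBSD: the structure layouts, the `cpu` arguments,
 `deliver_interrupt()`, and the `run_count` counter are all hypothetical,
 and the model is single-threaded, so it shows the bookkeeping (check the
 interrupt's mutex list, set a pending marker, reissue on release) rather
 than any real atomicity or bus-locking behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OWNER  (-1)
#define MAX_NEEDS 3          /* hypothetical cap on mutexes per interrupt */

struct intr;

struct spin_mutex {
    int          owner_cpu;  /* NO_OWNER when free */
    struct intr *pending;    /* interrupt deferred against this mutex, or NULL */
};

struct intr {
    struct spin_mutex *needs[MAX_NEEDS]; /* mutexes the handler may take */
    size_t             nneeds;
    int                run_count;        /* times the handler body ran */
};

static void get_spin_mutex(struct spin_mutex *m, int cpu)
{
    assert(m->owner_cpu == NO_OWNER);    /* this model has no contention */
    m->owner_cpu = cpu;
}

/* Returns true if the handler ran now, false if it was deferred. */
static bool deliver_interrupt(struct intr *it, int cpu)
{
    for (size_t i = 0; i < it->nneeds; i++) {
        struct spin_mutex *m = it->needs[i];
        if (m->owner_cpu == cpu) {
            /* The interrupted code on this cpu holds the mutex:
             * record the interrupt and return immediately ("iret"). */
            m->pending = it;
            return false;
        }
    }
    /* A real handler would acquire its mutexes here, spinning if
     * another cpu holds one; the model only counts the run. */
    it->run_count++;
    return true;
}

static void release_spin_mutex(struct spin_mutex *m, int cpu)
{
    assert(m->owner_cpu == cpu);
    m->owner_cpu = NO_OWNER;
    if (m->pending != NULL) {            /* the cheap check on release */
        struct intr *it = m->pending;
        m->pending = NULL;
        (void)deliver_interrupt(it, cpu); /* reissue the interrupt */
    }
}
```

 In this model an interrupt that arrives while its mutex is held on the
 same cpu does not run until release_spin_mutex() is called.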

What happens if we get an interrupt, we're thinking about servicing
it, about to check whether we're already holding a mutex that may potentially
be used inside the mainline int routine, and another CPU becomes idle?
In this particular case, let's say that we decide that we have to set
ipending and iret immediately, because we're already holding a potential
lock when we got interrupted. Isn't the result that we have a second
CPU idling while we just set ipending? (I could be missing something, really).
Also, some mainline interrupt code may need to acquire a really
large number of locks, but only in some cases. Let's say we have to first
check if we have a free cached buffer sitting somewhere, and if not, malloc()
a new one. Well, the malloc() will eventually trigger a chain of mutex
lock operations, but only in the case where we lack the cached buffer to
allocate it. There is no practical way of telling up front whether or not
we'll have to malloc(), so I'm wondering how efficiently we would be able
to predict in cases such as these?
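
[Editor's sketch] The cached-buffer case described above looks roughly
like the following.  The names (`buf_cache`, `get_buffer`,
`took_slow_path`) are hypothetical, and userspace malloc() stands in for
the kernel allocator.  The point is only that the branch taken, and
therefore the set of locks touched, is not known until run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf_cache {
    void  *free_buf;   /* one cached buffer, or NULL if the cache is empty */
    size_t buf_size;
};

/* Returns a buffer; *took_slow_path reports whether malloc() was needed. */
static void *get_buffer(struct buf_cache *c, bool *took_slow_path)
{
    assert(c != NULL && took_slow_path != NULL);
    if (c->free_buf != NULL) {
        /* Fast path: only the cache itself is touched. */
        void *b = c->free_buf;
        c->free_buf = NULL;
        *took_slow_path = false;
        return b;
    }
    /* Slow path: in a kernel this is where the allocator's chain of
     * mutex operations would begin. */
    *took_slow_path = true;
    return malloc(c->buf_size);
}
```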

   -Matt

Cheers,
-- 
 Bosko Milekic
 [EMAIL PROTECTED]





Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Alfred Perlstein

* Matt Dillon [EMAIL PROTECTED] [010417 17:47] wrote:
...
 
 Interrupts by definition know precisely what they are going to do, so by
 definition they know precisely which mutexes (if any) they may need
 to get.  This means that, in fact, it is possible to implement a check
 to determine if any of the mutexes an interrupt might want to get are
 already being held by the SAME cpu or not, and if they are to do the
 equivalent of what our delayed-interrupt stuff does in the stable's
 spl/splx code, but instead do the check when a mutex is released.
 
 The result is:  No need for an idle process to support interrupt
 contexts, no need to implement interrupts as threads, and no need
 to implement fancy scheduling tricks or Giant handling.
 
...
 And there you have it.  The mutex/array test takes very little time
 being a read-only test that requires no bus locking, and the collision
 case is cheap also because the current cpu already owns the mutex, allowing
 us to set the interrupt-pending bit in that mutex without any bus
 locking.  The check during the release of the mutex is two instructions,
 no bus locking required.  The whole thing can be implemented without any
 additional bus locking and virtually no contention.
 
 The case could be further optimized by requiring that interrupts only
 use a single mutex, period.  This would allow the mainline interrupt
 routine to obtain the mutex on entry to the interrupt and allow the 
 reissuing code to reissue the interrupt without freeing the mutex that
 caused the reissue, so the mutex is held throughout and then freed by
 the interrupt itself.
 
 Holy shit.  I think that's it!  I don't think it can get much better than
 that.  It solves all of BDE's issues, solves the interrupt-as-thread
 issue (by not using threads for interrupts at all), and removes a huge
 amount of unnecessary complexity from the system.  We could even get rid
 of the idle processes if we wanted to.

We can switch to this mechanism at a later date.

There are issues here, though:

  Mutex creation can be expensive, as it seems like each interrupt
  needs to register what sort of mutex it's interested in; when a
  mutex is created, the list must be scanned and each interrupt
  updated.

  Interrupts do not know "exactly" which mutexes they will need; they
  know about a subset of the mutexes they may need.  This scheme causes
  several problems:
    1) interrupts are again fan-in, meaning if you block an interrupt
       class on one cpu you block them on all cpus
    2) when we may have things like per-socket mutexes, we are blocking
       interrupts that may not need twiddling by the interrupt handler,
       yet we need to block the interrupt anyway because it _may_ want
       the same mutex that we have.

  Windriver has a full time developer working on the existing
  implementation, as far as I know we can only count on you for
  weekends and spare time.

  I'm starting to feel that I'm wasting time trying to get you to
  see the bigger picture; the fact that neither system means diddly
  unless we get to work on locking the rest of the kernel.

With that said, I'd really like to see the better of the two schemes
implemented when the dust settles.  The problem is that right now
neither scheme is buying us much other than overhead without
significant parts of the kernel being converted over to a mutexed
system.

Your proposal is valuable and might be something that we switch
to, however for the time being it's far more important to work on
locking down subsystems than working on the locking subsystem.

In fact if you proposed a new macro wrapper for mtx_* that would
make it easier at a later date to implement _your_ version of the
locking subsystem I would back it just to get you interested in
participating in locking down the other subsystems.

-Alfred




Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Matt Dillon

:
:   You cannot be pre-empted by an interrupt if you are holding a spin
: mutex, AFAIK, even under present implementation.

Since spin mutexes are going to be held all over the place, this 
type of restriction would seem to be detrimental.  If you can do all
the checking up-front it is also unnecessary.

:   What happens if we get an interrupt, we're thinking about servicing
:it, about to check whether we're already holding a mutex that may potentially
:be used inside the mainline int routine, and another CPU becomes idle?
:In this particular case, let's say that we decide that we have to set
:ipending and iret immediately, because we're already holding a potential

You could, but you are talking about a very small window of opportunity
assuming that we are running similar code to what we have now in regards
to assigning the interrupts to a cpu.  In 4.x we assign interrupts to
whichever cpu is holding Giant.  With mutexes we would want to assign
interrupts to whichever cpu is idle and if no cpu is idle we round-robin
the interrupt assignments (e.g. cpu takes interrupt, assigns all 
interrupts to the next cpu if no cpus are idle).  With that in place,
the best course is almost certainly going to be to do nothing ... that is,
take the interrupt even though it might not be optimal.
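
[Editor's sketch] The assignment policy described above (prefer an idle
cpu, otherwise hand interrupts to the next cpu in round-robin order) can
be written as a small function.  `NCPU`, the `idle` array, and
`next_intr_cpu()` are hypothetical names for illustration, not an
existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4   /* hypothetical cpu count for the sketch */

/* Returns the cpu that should be assigned the next interrupts, given
 * which cpus are idle and which cpu took the current interrupt. */
static int next_intr_cpu(const bool idle[NCPU], int current)
{
    assert(current >= 0 && current < NCPU);
    for (int i = 0; i < NCPU; i++) {
        if (idle[i])
            return i;              /* an idle cpu always wins */
    }
    return (current + 1) % NCPU;   /* nobody idle: round-robin */
}
```

With cpu 2 idle the function returns 2 regardless of `current`; with no
cpu idle and `current` equal to 3 it wraps around to 0.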

If once every X thousand interrupts we happen to hit a case where a cpu
remains idle when it doesn't have to be, who gives a flying f**k if
that one interrupt is non-optimal?  I'm not kidding... that sort of
case is not a problem that needs to be solved.

:   Also, some mainline interrupt code may need to acquire a really
:large number of locks, but only in some cases. Let's say we have to first
:check if we have a free cached buffer sitting somewhere, and if not, malloc()

None of our current interrupt code needs to acquire a huge number of
locks, and if some piece of interrupt code is so complex that it does,
it should be relegated to a software interrupt (e.g. like the TCP stack).
Let's not create problems where they don't exist.  If one of our subsystems
happened to require more complexity - for example, the I/O completion
handling (biodone() code) - it's a solvable problem.

Simplification is what is needed here.  Creating a complex solution to
a complex problem only results in a mess.  Simplifying the problem so
that it covers most of your codebase and then focusing on the one or two
cases it doesn't cover would seem to be a better way of dealing with
the issue.

:lock operations, but only in the case where we lack the cached buffer to
:allocate it. There is no practical way of telling up front whether or not
:we'll have to malloc(), so I'm wondering how efficiently we would be able
:to predict in cases such as these?
:
:Cheers,
:-- 
: Bosko Milekic

It could very well be that for an interrupt we might need to list 
two mutexes --- one for the memory subsystem and one for the interrupt's
subsystem.  It would be nice if we could get away with one, but having
to list two or even three mutexes would not be much of a burden on
the interrupt code.

-Matt





Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Alfred Perlstein

* E.B. Dreger [EMAIL PROTECTED] [010417 18:36] wrote:
 
 In this case, why not have a memory allocator similar to Hoard?

It doesn't work, but it's close:

http://people.freebsd.org/~alfred/memcache/

-- 
-Alfred Perlstein - [EMAIL PROTECTED]
Represent yourself, show up at BABUG http://www.babug.org/




Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Alfred Perlstein

* E.B. Dreger [EMAIL PROTECTED] [010417 18:48] wrote:
  Date: Tue, 17 Apr 2001 18:28:40 -0700
  From: Alfred Perlstein [EMAIL PROTECTED]
  
  1) interrupts are again fan-in, meaning if you block an interrupt
  class on one cpu you block them on all cpus
 
 When would this be a bad case?  i.e., if an interrupt [class] must be
 blocked, would we not want it blocked across the board?

If you had something like 16 NICs working independently of each
other, it'd be nice for them not to be in the same collision domain
if they don't have to be.

-- 
-Alfred Perlstein - [EMAIL PROTECTED]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/




Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Matt Dillon


:  Mutex creation can be expensive as it seems like each interrupt
:  needs to register what sort of mutex it's interested in, when a
:  mutex is created the list must be scanned and each interrupt
:  updated.

The list is based in the interrupt structure.  The cost is, what,
four or five instructions in a loop for which the vast majority will
only have to iterate once.  All the operations are read-only unless
you get a hit.  Very cheap.  It would be nice if the list could be
fixed to one or two items... same number of instructions, but no loop,
fewer memory accesses, and cheaper to execute.
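
[Editor's sketch] The fixed one-or-two-item variant mentioned above could
look like this.  The types `mtx_sketch` and `intr_sketch` and the
function `intr_blocker()` are hypothetical; the sketch only shows that
the check is a pair of read-only comparisons with no loop.

```c
#include <assert.h>
#include <stddef.h>

struct mtx_sketch {
    int owner_cpu;                 /* -1 when free */
};

struct intr_sketch {
    const struct mtx_sketch *m0;   /* always set */
    const struct mtx_sketch *m1;   /* optional second mutex, may be NULL */
};

/* Returns the mutex held by `cpu` that blocks this interrupt, or NULL
 * if the interrupt can be taken right away.  Nothing is written. */
static const struct mtx_sketch *
intr_blocker(const struct intr_sketch *it, int cpu)
{
    assert(it != NULL && it->m0 != NULL);
    if (it->m0->owner_cpu == cpu)
        return it->m0;
    if (it->m1 != NULL && it->m1->owner_cpu == cpu)
        return it->m1;
    return NULL;
}
```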

The only interrupts we care about in regards to the efficiency of this 
design are:  network interrupts and I/O interrupts, yes?  Network
interrupts can get away with one or two mutexes (memory and queue, or
perhaps even just memory).  I/O interrupts are a stickier issue but
up until softupdates the only thing biodone() did was release a lock
already held, so it wouldn't be an issue.  I think softupdates relegates
nearly all of its work to a software interrupt or process so softupdates
would not represent a problem either.  I'd have to review it.  I made one
change to the VM system in 4.x which was to free swap indirectly from
biodone which I would have to rip out, but that would pretty much be it.

:  Interrupts do not know "exactly" which mutexes they will need, they
:  know about a subset of the mutexes they may need, this scheme causes
:  several problems:
:1) interrupts are again fan-in, meaning if you block an interrupt
:class on one cpu you block them on all cpus

They don't have to be.  If you have four NICs each one can be its own
interrupt, each with its own mutex.  Thus all four can be taken in
parallel.  I was under the impression that BSDI had achieved that
in their scheme.

If you have one NIC then obviously you can't take multiple interrupts
for that one NIC on different cpu's.  No great loss, you generally don't
want to do that anyway.

:2) when we may have things like per-socket mutexes we are blocking
:interrupts that may not need twiddling by the interrupt handler,
:yet we need to block the interrupt anyway because it _may_ want
:the same mutex that we have.

Network interrupts do not mess around with sockets.  The packets are
passed to a software interrupt level which is certainly a more heavyweight
entity.  It can be argued very easily that the TCP stack should operate
as a thread -- actually, one thread for each cpu, so if you have a lot
of TCP activity you can activate several threads and process TCP
operations in parallel.  (IRIX did this to good effect).

Nobody should ever do complex processing in an interrupt, period.  If
you need to do complex processing, you do it in a software interrupt
(in -stable), or a thread in the new design.

:  Windriver has a full time developer working on the existing
:  implementation, as far as I know we can only count on you for
:  weekends and spare time.

Doesn't affect the discussion, really.  It's nice that people are dedicated
to the project.  I wish someone were in charge of it, like Linus is
in charge of Linux.  When my time frees up (A year from now?  Less?  More?
I don't know).. when my time frees up I am going to start working from
whatever platform is the most stable.  If 5.x isn't stable by that time
it's probably hopeless and I'll have to start work from the 4.x base.  If
5.x is stable then I'll be able to start from 5.x.

I know that sounds harsh, but it's a realistic view.  I truly do not
believe that SMPifying things needs to be this difficult, if only
people would focus on the things that matter and stop trying to throw
the kitchen sink into -current (especially without adequate testing).
That's my beef with current.  I find it ironic that I was shot down for
not following the BSDI mutex model in the name of compatibility when I
did that first push, but when other people started messing with the 
system compatibility went flying right out the window.  Very ironic.

:neither scheme is buying us much other than overhead without
:significant parts of the kernel being converted over to a mutexed
:system.
:
:Your proposal is valuable and might be something that we switch
:to, however for the time being it's far more important to work on
:locking down subsystems than working on the locking subsystem.
:
:In fact if you proposed a new macro wrapper for mtx_* that would
:make it easier at a later date to implement _your_ version of the
:locking subsystem I would back it just to get you interested in
:participating in locking down the other subsystems.
:
:-Alfred

I wasn't really proposing a new macro wrapper, it was just pseudo
code.  If I were doing mutexes from scratch I would scrap all the
fancy features and just have spin