Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-18 Thread E.B. Dreger

 Date: Wed, 18 Apr 2001 01:38:14 -0300 (BRST)
 From: Rik van Riel [EMAIL PROTECTED]
 
  Hence, my philosophy is that task switching and preemption are
  necessary evils because hardware does not perfectly accommodate
  software.  If we must, we must... otherwise, use co-op switching as
  the next best thing to straight run-to-completion.
 
 Except that for the [extremely performance critical] interrupt
 handlers the "software" is under control of the folks who write
 the OS.
 
 You need preemption for userspace because it's possibly "hostile"
 software, but things like the interrupt handlers and the kernel
 code are under your control ... this means you can code it to be
 as efficient as possible without impacting latency too much.

Right.  This is why I think that messing with pre-emption inside interrupt
handlers is a bad thing.  If kernel code doesn't cooperatively time-share,
then we likely have bigger problems than task switching. :-)

Hence I'm curious about replacing Giant with a token-passing mechanism.
If the token equals your CPU number, you have "Giant"... do what's needed.
Then set the token to the next CPU, and do what doesn't require "Giant".

Matt pointed out (to me off-list IIRC) that the mutex usually shouldn't
have to spin.  However, passing a token would involve changing the value
of some known memory location... that should be even faster and simpler
than a mutex.  No bus locking, no spinning...

AFAIK, there isn't any "good" support specifically for token passing.  But
memory reads and writes that don't even require the lock prefix... how
much faster and simpler can you get?
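
A minimal sketch of the token idea in C, assuming C11 atomics. The names giant_token, giant_held, giant_pass and cpu_step are made up for illustration, and NCPUS is an assumed constant. The acquire load and release store used here compile to plain moves on x86, with no lock prefix, which is the property argued for above. The demo walks the CPUs in a single thread only to show the token circulating.

```c
#include <assert.h>
#include <stdatomic.h>

#define NCPUS 4  /* assumed CPU count, for illustration only */

/* The token holds the number of the CPU that currently owns "Giant". */
static atomic_int giant_token = 0;

/* Nonzero if `cpu` owns the token.  A plain acquire load is enough,
 * because only the current owner ever writes the token. */
static int giant_held(int cpu)
{
    return atomic_load_explicit(&giant_token, memory_order_acquire) == cpu;
}

/* Hand the token to the next CPU.  Only the current owner may call this. */
static void giant_pass(int cpu)
{
    assert(giant_held(cpu));
    atomic_store_explicit(&giant_token, (cpu + 1) % NCPUS,
                          memory_order_release);
}

/* One step on `cpu`: do Giant-protected work only when the token is
 * ours, then pass it on.  Returns 1 if protected work ran. */
static int cpu_step(int cpu, int *protected_counter)
{
    if (!giant_held(cpu))
        return 0;            /* not our turn: go run ordinary work */
    (*protected_counter)++;  /* stand-in for work that needs "Giant" */
    giant_pass(cpu);
    return 1;
}

/* Single-threaded walk over the CPUs to show the token circulating. */
static int demo_token_rounds(int rounds)
{
    int counter = 0;
    for (int r = 0; r < rounds; r++)
        for (int cpu = 0; cpu < NCPUS; cpu++)
            cpu_step(cpu, &counter);
    return counter;
}
```

Each CPU gets exactly one turn per round, so three rounds on four CPUs run the protected section twelve times and leave the token back at CPU 0.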

Want finer-grained control than "Giant"?  Any time you have "Giant"/token,
you can poll (and claim, if available) any more-specific mutex.  Nobody
else has G/tk, so there would be no races.

By using fine-grained co-op mutexes, there is very little that must be
done when we have G/tk, thus minimizing wait for G/tk.  Note, too, that we
run our standard scheduler when we don't yet have G/tk, so we're not even
blocking unless the CPU is totally idle... and then, the degenerate case
is spinning.
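
A rough sketch of claiming a finer-grained mutex while holding G/tk, again assuming C11 atomics. struct coop_mutex, coop_try_claim and coop_release are hypothetical names, and the token itself is not modeled: the caller is assumed to hold G/tk whenever it calls coop_try_claim. Under that assumption nobody else can be claiming at the same moment, so a load followed by a store is sufficient. Release does not need the token, since only the owner clears the field.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* A cooperative fine-grained lock: nothing but an owner field. */
struct coop_mutex {
    atomic_int owner;
};

/* Try to claim `m` for `cpu`.  ASSUMPTION: the caller holds G/tk, so no
 * other CPU can be inside this function at the same time.  Returns 1 if
 * the mutex was free and is now ours, 0 if someone else still holds it. */
static int coop_try_claim(struct coop_mutex *m, int cpu)
{
    if (atomic_load_explicit(&m->owner, memory_order_acquire) != NO_OWNER)
        return 0;
    atomic_store_explicit(&m->owner, cpu, memory_order_relaxed);
    return 1;
}

/* Release `m`.  Only the owner may call this; G/tk is not required. */
static void coop_release(struct coop_mutex *m, int cpu)
{
    assert(atomic_load_explicit(&m->owner, memory_order_relaxed) == cpu);
    atomic_store_explicit(&m->owner, NO_OWNER, memory_order_release);
}

/* Walk through claim, failed claim, release, successful claim. */
static int demo_coop(void)
{
    struct coop_mutex vm_lock;
    atomic_init(&vm_lock.owner, NO_OWNER);

    int got0 = coop_try_claim(&vm_lock, 0);        /* CPU 0 has G/tk */
    int got1 = coop_try_claim(&vm_lock, 1);        /* later, CPU 1 has it */
    coop_release(&vm_lock, 0);                     /* CPU 0 is done */
    int got1_again = coop_try_claim(&vm_lock, 1);  /* CPU 1's next turn */

    return got0 * 100 + got1 * 10 + got1_again;
}
```

A failed claim simply means "poll again the next time the token comes around", which is where the wait for G/tk shows up.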


Eddy

---

Brotsman  Dreger, Inc.
EverQuick Internet / EternalCommerce Division

Phone: (316) 794-8922

---


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread E.B. Dreger

 Date: Tue, 17 Apr 2001 21:20:45 -0400
 From: Bosko Milekic [EMAIL PROTECTED]
 
 What happens if we get an interrupt, we're thinking about servicing
 it, about to check whether we're already holding a mutex that may
 potentially be used inside the mainline int routine, and another CPU
 becomes idle? In this particular case, let's say that we decide that we
 have to set ipending and iret immediately, because we're already holding
 a potential lock when we got interrupted. Isn't the result that we have
 a second CPU idling while we just set ipending? (I could be missing
 something, really).

(Thinking hard... this is fun stuff...)

 Also, some mainline interrupt code may need to acquire a really large
 number of locks, but only in some cases. Let's say we have to first
 check if we have a free cached buffer sitting somewhere, and if not,
 malloc() a new one. Well, the malloc() will eventually trigger a chain
 of mutex lock operations, but only in the case where we lack the cached
 buffer to allocate it. There is no practical way of telling up front
 whether or not we'll have to malloc(), so I'm wondering how efficiently
 we would be able to predict in cases such as these?

In this case, why not have a memory allocator similar to Hoard?

Let's say that I have a four-way system with 256 MB.  First CPU gets first
64 MB, second gets the next 64 MB, and so on.  Now we needn't lock before
malloc(), because each CPU knows ahead of time what is "off limits".

When one reaches a high water mark, it steals half the available space
from the CPU with the least memory utilization.  This _would_ require a
lock, but should only happen in rare instances.
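
A bookkeeping-only sketch of the per-CPU split in C. It tracks byte counts rather than real addresses, and every name in it (struct cpu_arena, arena_alloc, arena_steal, HIGH_WATER_PCT) is an assumption made for illustration. "Least utilization" is simplified to "fewest bytes in use". The sketch is single-threaded; in a real kernel the steal path would take the lock mentioned above and would also have to be made safe against the victim's lockless fast path.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS          4                 /* four-way system, as in the text */
#define MB             (1024UL * 1024UL)
#define HIGH_WATER_PCT 90                /* assumed high water mark */

struct cpu_arena {
    size_t capacity;  /* bytes this CPU may hand out */
    size_t used;      /* bytes already handed out */
};

static struct cpu_arena arenas[NCPUS];

/* Split `total` bytes evenly across the CPUs. */
static void arenas_init(size_t total)
{
    for (int i = 0; i < NCPUS; i++) {
        arenas[i].capacity = total / NCPUS;
        arenas[i].used = 0;
    }
}

/* Slow path: move half of the free space of the least-used other CPU
 * into `cpu`'s arena.  This is the step that would need a lock. */
static void arena_steal(int cpu)
{
    int victim = -1;
    for (int i = 0; i < NCPUS; i++) {
        if (i == cpu)
            continue;
        if (victim < 0 || arenas[i].used < arenas[victim].used)
            victim = i;
    }
    size_t grab = (arenas[victim].capacity - arenas[victim].used) / 2;
    arenas[victim].capacity -= grab;
    arenas[cpu].capacity += grab;
}

/* Fast path: no lock, touches only this CPU's arena unless the request
 * would cross the high water mark.  Returns 1 on success, 0 on failure. */
static int arena_alloc(int cpu, size_t n)
{
    struct cpu_arena *a = &arenas[cpu];
    if (a->used + n > a->capacity / 100 * HIGH_WATER_PCT)
        arena_steal(cpu);
    if (a->used + n > a->capacity)
        return 0;
    a->used += n;
    return 1;
}

/* 256 MB over four CPUs; CPU 0 asks for 60 MB and has to steal. */
static size_t demo_arena(void)
{
    arenas_init(256 * MB);
    int ok = arena_alloc(0, 60 * MB);
    assert(ok);
    (void)ok;
    return arenas[0].capacity / MB;
}
```

In the demo, the 60 MB request crosses CPU 0's high water mark, so it takes 32 MB (half of CPU 1's free 64 MB) and ends up with a 96 MB arena.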

I know that memory could become fragmented over time, but as long as we
don't screw up caching (which shouldn't be a problem considering that
pages are much larger than cache lines), who cares?


Eddy






Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread E.B. Dreger

 Date: Tue, 17 Apr 2001 19:06:08 -0700 (PDT)
 From: Matt Dillon [EMAIL PROTECTED]
 
 They don't have to be.  If you have four NICs each one can be its own
 interrupt, each with its own mutex.  Thus all four can be taken in
 parallel.  I was under the impression that BSDI had achieved that
 in their scheme.

IIRC, didn't the NT driver for some NIC (Intel?) switch to polling,
anyway, under heavy load?  The reasoning being that you _know_ that you're
going to get something... why bother with an IRQ hit?

That said, IRQ distribution sounds like a good thing for the general case.

 If you have one NIC then obviously you can't take multiple interrupts
 for that one NIC on different cpu's.  No great loss, you generally don't
 want to do that anyway.

Actually, I should think that one would _want_ to serialize traffic for a
given NIC.  (I'm ignoring when one trunks NICs... speaking of which,
anyone have info on 802.3ad? ;-)  Otherwise, one ends up with a race that
[potentially] screws up packet sequence.


Eddy






Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Matt Dillon


:IIRC, didn't the NT driver for some NIC (Intel?) switch to polling,
:anyway, under heavy load?  The reasoning being that you _know_ that you're
:going to get something... why bother with an IRQ hit?
:
:That said, IRQ distribution sounds like a good thing for the general case.

Under a full load polling would work just as well as an interrupt.
With NT for the network tests they hardwired each NIC to a particular
CPU.  I don't know if they did any polling or not.

: If you have one NIC then obviously you can't take multiple interrupts
: for that one NIC on different cpu's.  No great loss, you generally don't
: want to do that anyway.
:
:Actually, I should think that one would _want_ to serialize traffic for a
:given NIC.  (I'm ignoring when one trunks NICs... speaking of which,
:anyone have info on 802.3ad? ;-)  Otherwise, one ends up with a race that
:[potentially] screws up packet sequence.
:
:Eddy

Yes.  Also NICs usually have circular buffers for packets so, really,
only one cpu can be processing a particular NIC's packets at any given
moment.
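
To illustrate the point, here is a toy receive ring in C. It is a sketch only: struct rx_ring, ring_put and ring_get are invented names, the slots hold plain integers instead of packet descriptors, and RING_SIZE is an assumed value. The ring has a single drain cursor (tail), so packets come out in arrival order as long as exactly one consumer advances it.

```c
#include <assert.h>

#define RING_SIZE 8   /* assumed descriptor count */

struct rx_ring {
    int      slots[RING_SIZE]; /* stand-in for packet descriptors */
    unsigned head;             /* next slot the NIC fills */
    unsigned tail;             /* next slot the driver drains */
};

/* "NIC" side: store a packet.  Returns 0 if the ring is full. */
static int ring_put(struct rx_ring *r, int pkt)
{
    if (r->head - r->tail == RING_SIZE)
        return 0;
    r->slots[r->head % RING_SIZE] = pkt;
    r->head++;
    return 1;
}

/* Driver side: take the oldest packet.  Returns 0 if the ring is empty.
 * There is one `tail`, so only one CPU can sensibly run this at a time. */
static int ring_get(struct rx_ring *r, int *pkt)
{
    if (r->head == r->tail)
        return 0;
    *pkt = r->slots[r->tail % RING_SIZE];
    r->tail++;
    return 1;
}

/* Put packets 1, 2, 3 and drain them; the digits record the order. */
static int demo_ring(void)
{
    struct rx_ring r = { {0}, 0, 0 };
    int pkt, order = 0;

    for (int i = 1; i <= 3; i++)
        ring_put(&r, i);
    while (ring_get(&r, &pkt))
        order = order * 10 + pkt;
    return order;
}
```

Two CPUs draining the same ring would have to coordinate on tail anyway, which amounts to serializing them.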

-Matt




Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread E.B. Dreger

 Date: Tue, 17 Apr 2001 19:34:56 -0700 (PDT)
 From: Matt Dillon [EMAIL PROTECTED]
 
 Yes.  Also NICs usually have circular buffers for packets so, really,
 only one cpu can be processing a particular NIC's packets at any given
 moment.

We could always have a mutex for each NIC's ring buffer...

*ducking and running*

Sorry... couldn't resist. :-)


Eddy






Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread janb

 IIRC, didn't the NT driver for some NIC (Intel?) switch to polling,
 anyway, under heavy load?  The reasoning being that you _know_ that you're
 going to get something... why bother with an IRQ hit?

This is very interesting. How does this affect performance?

Jan





Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Rik van Riel

On Tue, 17 Apr 2001, Matt Dillon wrote:

 Under a full load polling would work just as well as an interrupt.
 With NT for the network tests they hardwired each NIC to a particular
 CPU.  I don't know if they did any polling or not.

Not true. Interrupts work worse than polling because the interrupt
top halves can keep the CPU busy, whereas with polling you only
peek at the card when you have time.

This means pure interrupts can possibly DoS a CPU (think about a
gigabit ping flood) while polling leaves the box alive and still
allows it to process as much as it can (while not wasting CPU on
taking in packets it cannot process higher up the stack).
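
A small simulation of that argument in C. It is a sketch under stated assumptions: struct nic, nic_receive and nic_poll are hypothetical names, RING_CAP is an assumed ring size, and "processing" a packet is reduced to decrementing a counter. The point it shows is that with polling the CPU cost per pass is capped by the budget, and the overflow is dropped by the card without costing any CPU time.

```c
#include <assert.h>

#define RING_CAP 64   /* assumed size of the card's receive ring */

struct nic {
    unsigned pending;  /* packets waiting in the card's ring */
    unsigned dropped;  /* packets the card discarded because the ring was full */
};

/* The wire delivers `count` packets.  No CPU work happens here; anything
 * that does not fit in the ring is dropped on the card. */
static void nic_receive(struct nic *n, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if (n->pending < RING_CAP)
            n->pending++;
        else
            n->dropped++;
    }
}

/* Called when the CPU has time: take at most `budget` packets up the
 * stack.  Returns how many were taken, which bounds the CPU cost. */
static unsigned nic_poll(struct nic *n, unsigned budget)
{
    unsigned take = n->pending < budget ? n->pending : budget;
    n->pending -= take;
    return take;
}

/* A flood of 1000 packets between two polls with a budget of 16. */
static unsigned demo_flood(struct nic *n)
{
    n->pending = 0;
    n->dropped = 0;
    nic_receive(n, 1000);
    return nic_poll(n, 16);
}
```

In the demo the poll does 16 packets' worth of work no matter how hard the flood is; 936 packets never leave the card and 48 stay queued for the next pass.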

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/





Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread E.B. Dreger

 Date: Wed, 18 Apr 2001 00:04:12 -0300 (BRST)
 From: Rik van Riel [EMAIL PROTECTED]
 
 Not true. Interrupts work worse than polling because the interrupt
 top halves can keep the CPU busy, whereas with polling you only

Top halves and _task switching_.  Again, in a well-written handler with a
tight loop, task switching becomes expensive.

 peek at the card when you have time.

Think aio_ versus kernel queues. :-)

 This means pure interrupts can possibly DoS a CPU (think about a
 gigabit ping flood) while polling leaves the box alive and still
 allows it to process as much as it can (while not wasting CPU on
 taking in packets it cannot process higher up the stack).

I should hope that the card would be smart enough to combine consecutive
packets into a single DMA transfer, but I know what you mean.


Eddy






Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread E.B. Dreger

Going back to basic principles:

For minimal CPU utilization, it would be nice to skip task switching, period.
Run something to completion, then go on to the next task.  Poll without
ever using an interrupt.

The problem is that latency becomes totally unacceptable.

So now let's go to the other extreme:  Create a Transputer-like array with
hundreds of 65xx-complexity CPUs.  Each atomic task runs on its own
private CPU.

The problem is that the electronics become a pain, and are often idle.
When too many tasks are launched, we run out of CPU power.

The compromise is to switch tasks on whatever CPU power is available...
balancing switching overhead with latency.  *Let the latency be as high as
is acceptable to reduce overhead as low as is practical.*
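
The trade-off can be put into rough numbers. The following C sketch assumes simple round-robin scheduling with a fixed time slice and a fixed switch cost; both functions and all figures are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Switching overhead in parts per thousand, for a time slice of
 * `slice_us` microseconds and a switch cost of `switch_us`. */
static unsigned overhead_permille(unsigned slice_us, unsigned switch_us)
{
    assert(slice_us + switch_us > 0);
    return 1000u * switch_us / (slice_us + switch_us);
}

/* Worst-case wait before a task runs again with `ntasks` runnable
 * tasks: every other task gets one slice plus one switch. */
static unsigned worst_wait_us(unsigned ntasks, unsigned slice_us,
                              unsigned switch_us)
{
    assert(ntasks > 0);
    return (ntasks - 1) * (slice_us + switch_us);
}
```

With an assumed 10 us switch cost and five runnable tasks, a 990 us slice costs 1% overhead and up to 4 ms of waiting; a 90 us slice cuts the wait to 0.4 ms but raises the overhead to 10%.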

Hence, my philosophy is that task switching and preemption are necessary
evils because hardware does not perfectly accommodate software.  If we
must, we must... otherwise, use co-op switching as the next best thing to
straight run-to-completion.


Eddy






Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)

2001-04-17 Thread Rik van Riel

On Wed, 18 Apr 2001, E.B. Dreger wrote:

 For minimal CPU utilization, it would be nice to skip task switching,
 period. Run something to completion, then go on to the next task.  
 Poll without ever using an interrupt.

[snip]

 Hence, my philosophy is that task switching and preemption are
 necessary evils because hardware does not perfectly accommodate
 software.  If we must, we must... otherwise, use co-op switching as
 the next best thing to straight run-to-completion.

Except that for the [extremely performance critical] interrupt
handlers the "software" is under control of the folks who write
the OS.

You need preemption for userspace because it's possibly "hostile"
software, but things like the interrupt handlers and the kernel
code are under your control ... this means you can code it to be
as efficient as possible without impacting latency too much.

regards,

Rik

