Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
Date: Wed, 18 Apr 2001 01:38:14 -0300 (BRST)
From: Rik van Riel [EMAIL PROTECTED]

> > Hence, my philosophy is that task switching and preemption are
> > necessary evils because hardware does not perfectly accommodate
> > software. If we must, we must... otherwise, use co-op switching as
> > the next best thing to straight run-to-completion.
>
> Except that for the [extremely performance critical] interrupt
> handlers the "software" is under control of the folks who write the
> OS. You need preemption for userspace because it's possibly "hostile"
> software, but things like the interrupt handlers and the kernel code
> are under your control... this means you can code it to be as
> efficient as possible without impacting latency too much.

Right. This is why I think that messing with preemption inside
interrupt handlers is a bad thing. If kernel code doesn't cooperatively
time-share, then we likely have bigger problems than task switching. :-)

Hence I'm curious about replacing Giant with a token-passing mechanism.
If the token equals your CPU number, you have "Giant"... do what's
needed. Then set the token to the next CPU, and do what doesn't require
"Giant".

Matt pointed out (to me off-list, IIRC) that the mutex usually
shouldn't have to spin. However, passing a token would involve only
changing the value of some known memory location... that should be even
faster and simpler than a mutex. No bus locking, no spinning... AFAIK,
there isn't any "good" support specifically for token passing, but
memory reads and writes that don't even require the lock prefix... how
much faster and simpler can you get?

Want finer-grained control than "Giant"? Any time you have
"Giant"/token, you can poll (and claim, if available) any more-specific
mutex. Nobody else has G/tk, so there would be no races. By using
fine-grained co-op mutexes, there is very little that must be done when
we have G/tk, thus minimizing the wait for G/tk.
Note, too, that we run our standard scheduler when we don't yet have
G/tk, so we're not even blocking unless the CPU is totally idle... and
then, the degenerate case is spinning.

Eddy

---
Brotsman Dreger, Inc.
EverQuick Internet / EternalCommerce Division
Phone: (316) 794-8922

---
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
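The token-passing idea above can be sketched in a few lines of C. This
is a minimal user-space simulation of the concept, not FreeBSD code;
the names (giant_token, have_giant, pass_giant, NCPUS) are all
hypothetical. The point it illustrates is Eddy's claim: ownership tests
and hand-off are plain loads and stores on one word, with no lock
prefix and no spinning, because only the current owner ever writes it.

```c
#include <assert.h>

#define NCPUS 4

/* The CPU number that currently "holds Giant". */
static volatile int giant_token = 0;

/* A CPU checks whether it owns Giant: a plain read, no locked cycle. */
static int have_giant(int cpu)
{
    return giant_token == cpu;
}

/* Done with Giant-protected work: pass the token to the next CPU.
   Only the owner writes the word, so no atomic operation is needed. */
static void pass_giant(int cpu)
{
    assert(have_giant(cpu));
    giant_token = (cpu + 1) % NCPUS;
}
```

A CPU that does not hold the token would run its ordinary scheduler
rather than spin, as the message above describes; claiming any
finer-grained mutex while holding the token is race-free because no
other CPU can hold it at the same time.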
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
Date: Tue, 17 Apr 2001 21:20:45 -0400
From: Bosko Milekic [EMAIL PROTECTED]

> What happens if we get an interrupt, we're thinking about servicing
> it, about to check whether we're already holding a mutex that may
> potentially be used inside the mainline int routine, and another CPU
> becomes idle? In this particular case, let's say that we decide that
> we have to set ipending and iret immediately, because we're already
> holding a potential lock when we got interrupted. Isn't the result
> that we have a second CPU idling while we just set ipending? (I could
> be missing something, really.) (Thinking hard... this is fun stuff...)
>
> Also, some mainline interrupt code may need to acquire a really large
> number of locks, but only in some cases. Let's say we have to first
> check if we have a free cached buffer sitting somewhere, and if not,
> malloc() a new one. Well, the malloc() will eventually trigger a chain
> of mutex lock operations, but only in the case where we lack the
> cached buffer. There is no practical way of telling up front whether
> or not we'll have to malloc(), so I'm wondering how efficiently we
> would be able to predict in cases such as these.

In this case, why not have a memory allocator similar to Hoard? Let's
say that I have a four-way system with 256 MB. The first CPU gets the
first 64 MB, the second gets the next 64 MB, and so on. Now we needn't
lock before malloc(), because each CPU knows ahead of time what is "off
limits". When one CPU reaches a high-water mark, it steals half the
available space from the CPU with the least memory utilization. This
_would_ require a lock, but should only happen in rare instances.

I know that memory could become fragmented over time, but as long as we
don't screw up caching (which shouldn't be a problem, considering that
pages are much larger than cache lines), who cares?

Eddy
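The per-CPU carve-up Eddy proposes can be sketched as follows. This is
a hedged illustration, not Hoard or any real kernel allocator: the
structures and names (cpu_pool, cpu_alloc, REGION_SZ) are invented, and
real allocators track free lists rather than a single bump pointer. It
shows the lock-free common path — each CPU allocates only from its own
region — with the rare "steal from another CPU" case left as the one
place a lock would be required.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS     4
#define REGION_SZ ((size_t)64 * 1024 * 1024)  /* 64 MB per CPU, as in the example */

struct cpu_pool {
    size_t base;   /* start of this CPU's private region */
    size_t brk;    /* next free offset within the region  */
};

static struct cpu_pool pools[NCPUS];

static void pools_init(void)
{
    for (int i = 0; i < NCPUS; i++) {
        pools[i].base = (size_t)i * REGION_SZ;
        pools[i].brk  = pools[i].base;
    }
}

/* Lock-free bump allocation: each CPU touches only its own pool, so no
   other CPU's writes can race with this one.  Returns an offset, or
   (size_t)-1 when the region is exhausted -- the point at which the
   real design would take a lock and steal space from another CPU. */
static size_t cpu_alloc(int cpu, size_t len)
{
    struct cpu_pool *p = &pools[cpu];
    if (p->brk + len > p->base + REGION_SZ)
        return (size_t)-1;  /* high-water mark: locked steal path goes here */
    size_t addr = p->brk;
    p->brk += len;
    return addr;
}
```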
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
Date: Tue, 17 Apr 2001 19:06:08 -0700 (PDT)
From: Matt Dillon [EMAIL PROTECTED]

> They don't have to be. If you have four NICs each one can be its own
> interrupt, each with its own mutex. Thus all four can be taken in
> parallel.

I was under the impression that BSDI had achieved that in their scheme.

IIRC, didn't the NT driver for some NIC (Intel?) switch to polling,
anyway, under heavy load? The reasoning being that you _know_ that
you're going to get something... why bother with an IRQ hit?

That said, IRQ distribution sounds like a good thing for the general
case.

> If you have one NIC then obviously you can't take multiple interrupts
> for that one NIC on different cpu's. No great loss, you generally
> don't want to do that anyway.

Actually, I should think that one would _want_ to serialize traffic for
a given NIC. (I'm ignoring the case where one trunks NICs... speaking
of which, anyone have info on 802.3ad? ;-) Otherwise, one ends up with
a race that [potentially] screws up packet sequence.

Eddy
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
:IIRC, didn't the NT driver for some NIC (Intel?) switch to polling,
:anyway, under heavy load? The reasoning being that you _know_ that
:you're going to get something... why bother an IRQ hit?
:
:That said, IRQ distribution sounds like a good thing for the general
:case.

Under a full load polling would work just as well as an interrupt.
With NT, for the network tests they hardwired each NIC to a particular
CPU. I don't know if they did any polling or not.

:> If you have one NIC then obviously you can't take multiple
:> interrupts for that one NIC on different cpu's. No great loss, you
:> generally don't want to do that anyway.
:
:Actually, I should think that one would _want_ to serialize traffic
:for a given NIC. (I'm ignoring the case where one trunks NICs...
:speaking of which, anyone have info on 802.3ad? ;-) Otherwise, one
:ends up with a race that [potentially] screws up packet sequence.
:
:Eddy

Yes. Also, NICs usually have circular buffers for packets, so, really,
only one CPU can be processing a particular NIC's packets at any given
moment.

-Matt
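Matt's point about circular buffers can be made concrete with a tiny
sketch. This is not any real driver's ring code; RING_SZ, rx_ring, and
the push/pop helpers are hypothetical. The detail to notice is that the
ring has exactly one consumer index (`head`), updated with an
unsynchronized read-modify-write — which is exactly why two CPUs cannot
safely drain the same NIC's ring at once.

```c
#include <assert.h>

#define RING_SZ 8  /* slots; power of two so masking works as the index wrap */

struct rx_ring {
    int slots[RING_SZ];
    unsigned head;  /* consumer index: must be owned by exactly one CPU */
    unsigned tail;  /* producer index: advanced by the NIC (simulated)  */
};

/* Producer side: the NIC deposits a packet descriptor (simulated). */
static void ring_push(struct rx_ring *r, int v)
{
    r->slots[r->tail & (RING_SZ - 1)] = v;
    r->tail++;
}

/* Consumer side: pop the oldest packet, preserving arrival order.
   The head++ below is not atomic; a second consuming CPU would race
   on it and could pop the same slot twice or skip one. */
static int ring_pop(struct rx_ring *r, int *out)
{
    if (r->head == r->tail)
        return 0;  /* ring empty */
    *out = r->slots[r->head & (RING_SZ - 1)];
    r->head++;
    return 1;
}
```

Serializing consumption per NIC, as Eddy suggests above, also keeps
packet sequence intact for free.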
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
Date: Tue, 17 Apr 2001 19:34:56 -0700 (PDT)
From: Matt Dillon [EMAIL PROTECTED]

> Yes. Also NICs usually have circular buffers for packets so, really,
> only one cpu can be processing a particular NIC's packets at any
> given moment.

We could always have a mutex for each NIC's ring buffer...

*ducking and running*

Sorry... couldn't resist. :-)

Eddy
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
> IIRC, didn't the NT driver for some NIC (Intel?) switch to polling,
> anyway, under heavy load? The reasoning being that you _know_ that
> you're going to get something... why bother an IRQ hit?

This is very interesting. How does this affect performance?

Jan
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
On Tue, 17 Apr 2001, Matt Dillon wrote:

> Under a full load polling would work just as well as an interrupt.
> With NT for the network tests they hardwired each NIC to a particular
> CPU. I don't know if they did any polling or not.

Not true. Interrupts work worse than polling because the interrupt top
halves can keep the CPU busy, whereas with polling you only peek at the
card when you have time.

This means pure interrupts can possibly DoS a CPU (think about a
gigabit ping flood), while polling leaves the box alive and still
allows it to process as much as it can (while not wasting CPU on taking
in packets it cannot process higher up the stack).

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/
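Rik's argument — that polling bounds the CPU the receive path can
consume while pure interrupts do not — is the idea behind budgeted poll
loops. Below is a minimal simulation of that budgeting, with invented
names (nic_rx_one, nic_poll, rx_pending); it is not any OS's actual
driver interface. Under flood, each pass processes at most `budget`
packets and then yields, so the rest of the system keeps running.

```c
#include <assert.h>

/* Packets waiting in the NIC's receive ring (simulated). */
static int rx_pending;

/* Consume one pending packet; returns 1 if there was one, else 0. */
static int nic_rx_one(void)
{
    if (rx_pending > 0) {
        rx_pending--;
        return 1;
    }
    return 0;
}

/* Process at most `budget` packets per pass, then return control.
   Leftover packets wait in the ring for the next pass instead of
   monopolizing the CPU -- this is what keeps a flood from becoming a
   DoS of the whole box.  Returns the number of packets processed. */
static int nic_poll(int budget)
{
    int done = 0;
    while (done < budget && nic_rx_one())
        done++;
    return done;
}
```

A pass that returns fewer packets than its budget means the ring is
drained, at which point a real driver would re-enable the interrupt and
go back to sleep.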
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
Date: Wed, 18 Apr 2001 00:04:12 -0300 (BRST)
From: Rik van Riel [EMAIL PROTECTED]

> Not true. Interrupts work worse than polling because the interrupt
> top halves can keep the CPU busy, whereas with polling you only

Top halves and _task switching_. Again, in a well-written handler with
a tight loop, task switching becomes expensive.

> peek at the card when you have time.

Think aio_ versus kernel queues. :-)

> This means pure interrupts can possibly DoS a CPU (think about a
> gigabit ping flood) while polling leaves the box alive and still
> allows it to process as much as it can (while not wasting CPU on
> taking in packets it cannot process higher up the stack).

I should hope that the card would be smart enough to combine
consecutive packets into a single DMA transfer, but I know what you
mean.

Eddy
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
Going back to basic principles:

For minimal CPU utilization, it would be nice to skip task switching,
period. Run something to completion, then go on to the next task. Poll
without ever using an interrupt. The problem is that latency becomes
totally unacceptable.

So now let's go to the other extreme: create a Transputer-like array
with hundreds of 65xx-complexity CPUs. Each atomic task runs on its own
private CPU. The problem is that the electronics become a pain, and are
often idle. When too many tasks are launched, we run out of CPU power.

The compromise is to switch tasks on whatever CPU power is available...
balancing switching overhead against latency. *Let the latency be as
high as is acceptable, to bring the overhead as low as is practical.*

Hence, my philosophy is that task switching and preemption are
necessary evils because hardware does not perfectly accommodate
software. If we must, we must... otherwise, use co-op switching as the
next best thing to straight run-to-completion.

Eddy
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
On Wed, 18 Apr 2001, E.B. Dreger wrote:

> For minimal CPU utilization, it would be nice to skip task switching,
> period. Run something to completion, then go on to the next task.
> Poll without ever using an interrupt.

[snip]

> Hence, my philosophy is that task switching and preemption are
> necessary evils because hardware does not perfectly accommodate
> software. If we must, we must... otherwise, use co-op switching as
> the next best thing to straight run-to-completion.

Except that for the [extremely performance critical] interrupt handlers
the "software" is under control of the folks who write the OS. You need
preemption for userspace because it's possibly "hostile" software, but
things like the interrupt handlers and the kernel code are under your
control... this means you can code it to be as efficient as possible
without impacting latency too much.

regards,

Rik