Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
: Mutex creation can be expensive as it seems like each interrupt
: needs to register what sort of mutex it's interested in, when a
: mutex is created the list must be scanned and each interrupt
: updated.

The list is based in the interrupt structure. The cost is, what, four or five instructions in a loop for which the vast majority will only have to iterate once. All the operations are read-only unless you get a hit. Very cheap. It would be nice if the list could be fixed to one or two items... same number of instructions, but no loop, fewer memory accesses, and cheaper to execute.

The only interrupts we care about in regard to the efficiency of this design are network interrupts and I/O interrupts, yes? Network interrupts can get away with one or two mutexes (memory and queue, or perhaps even just memory). I/O interrupts are a stickier issue, but up until softupdates the only thing biodone() did was release a lock already held, so it wouldn't be an issue. I think softupdates relegates nearly all of its work to a software interrupt or process, so softupdates would not represent a problem either. I'd have to review it. I made one change to the VM system in 4.x, which was to free swap indirectly from biodone(), which I would have to rip out, but that would pretty much be it.

: Interrupts do not know "exactly" which mutexes they will need, they
: know about a subset of the mutexes they may need, this scheme causes
: several problems:
:
: 1) interrupts are again fan-in, meaning if you block an interrupt
:    class on one cpu you block them on all cpus

They don't have to be. If you have four NICs, each one can be its own interrupt, each with its own mutex. Thus all four can be taken in parallel. I was under the impression that BSDI had achieved that in their scheme. If you have one NIC then obviously you can't take multiple interrupts for that one NIC on different cpus. No great loss; you generally don't want to do that anyway.
: 2) when we may have things like per-socket mutexes we are blocking
:    interrupts that may not need twiddling by the interrupt handler,
:    yet we need to block the interrupt anyway because it _may_ want
:    the same mutex that we have.

Network interrupts do not mess around with sockets. The packets are passed to a software interrupt level, which is certainly a more heavyweight entity. It can be argued very easily that the TCP stack should operate as a thread -- actually, one thread for each cpu, so if you have a lot of TCP activity you can activate several threads and process TCP operations in parallel. (IRIX did this to good effect.) Nobody should ever do complex processing in an interrupt, period. If you need to do complex processing, you do it in a software interrupt (in -stable), or a thread in the new design.

: Windriver has a full time developer working on the existing
: implementation, as far as I know we can only count on you for
: weekends and spare time.

Doesn't affect the discussion, really. It's nice that people are dedicated to the project. I wish someone were in charge of it, like Linus is in charge of Linux.

When my time frees up (a year from now? Less? More? I don't know)... when my time frees up I am going to start working from whatever platform is the most stable. If 5.x isn't stable by that time it's probably hopeless and I'll have to start work from the 4.x base. If 5.x is stable then I'll be able to start from 5.x. I know that sounds harsh, but it's a realistic view. I truly do not believe that SMPifying things needs to be this difficult, if only people would focus on the things that matter and stop trying to throw the kitchen sink into -current (especially without adequate testing). That's my beef with -current.

I find it ironic that I was shot down for not following the BSDI mutex model in the name of compatibility when I did that first push, but when other people started messing with the system, compatibility went flying right out the window.
Very ironic.

: neither scheme is buying us much other than overhead without
: significant parts of the kernel being converted over to a mutexed
: system.
:
: Your proposal is valuable and might be something that we switch
: to, however for the time being it's far more important to work on
: locking down subsystems than working on the locking subsystem.
:
: In fact if you proposed a new macro wrapper for mtx_* that would
: make it easier at a later date to implement _your_ version of the
: locking subsystem I would back it just to get you interested in
: participating in locking down the other subsystems.
:
: -Alfred

I wasn't really proposing a new macro wrapper, it was just pseudo code. If I were doing mutexes from scratch I would scrap all the fancy features and just have spin mutexes.
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
* E.B. Dreger <[EMAIL PROTECTED]> [010417 18:48] wrote:
> > Date: Tue, 17 Apr 2001 18:28:40 -0700
> > From: Alfred Perlstein <[EMAIL PROTECTED]>
> >
> > 1) interrupts are again fan-in, meaning if you block an interrupt
> > class on one cpu you block them on all cpus
>
> When would this be a bad case? i.e., if an interrupt [class] must be
> blocked, would we not want it blocked across the board?

It'd be nice if you had something like 16 NIC cards working independently of each other, so they aren't forced into the same collision domain if they don't have to be.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
* E.B. Dreger <[EMAIL PROTECTED]> [010417 18:36] wrote:
>
> In this case, why not have a memory allocator similar to Hoard?

It doesn't work, but it's close: http://people.freebsd.org/~alfred/memcache/

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
: You cannot be pre-empted by an interrupt if you are holding a spin
: mutex, AFAIK, even under present implementation.

Since spin mutexes are going to be held all over the place, this type of restriction would seem to be detrimental. If you can do all the checking up-front it is also unnecessary.

: What happens if we get an interrupt, we're thinking about servicing
: it, about to check whether we're already holding a mutex that may potentially
: be used inside the mainline int routine, and another CPU becomes idle?
: In this particular case, let's say that we decide that we have to set
: ipending and iret immediately, because we're already holding a potential

You could, but you are talking about a very small window of opportunity, assuming that we are running similar code to what we have now in regard to assigning the interrupts to a cpu. In 4.x we assign interrupts to whichever cpu is holding Giant. With mutexes we would want to assign interrupts to whichever cpu is idle, and if no cpu is idle we round-robin the interrupt assignments (e.g. a cpu takes an interrupt, then assigns all interrupts to the next cpu if no cpus are idle).

With that in place, the best course is almost certainly going to be to do nothing ... that is, take the interrupt even though it might not be optimal. If once every X thousand interrupts we happen to hit a case where a cpu remains idle when it doesn't have to be, who gives a flying f**k if that one interrupt is non-optimal? I'm not kidding... that sort of case is not a problem that needs to be solved.

: Also, some mainline interrupt code may need to acquire a really
: large number of locks, but only in some cases. Let's say we have to first
: check if we have a free cached buffer sitting somewhere, and if not, malloc()

None of our current interrupt code needs to acquire a huge number of locks, and if some piece of interrupt code is so complex that it does, it should be relegated to a software interrupt (e.g. like the TCP stack).
Let's not create problems where they don't exist. If one of our subsystems happened to require more complexity -- for example, the I/O completion handling (the biodone() code) -- it's a solvable problem. Simplification is what is needed here. Creating a complex solution to a complex problem only results in a mess. Simplifying the problem so that the simple solution covers most of your codebase, and then focusing on the one or two cases it doesn't cover, would seem to be a better way of dealing with the issue.

: lock operations, but only in the case where we lack the cached buffer to
: allocate it. There is no practical way of telling up front whether or not
: we'll have to malloc(), so I'm wondering how efficiently we would be able
: to predict in cases such as these?
:
: Cheers,
: --
: Bosko Milekic

It could very well be that for an interrupt we might need to list two mutexes -- one for the memory subsystem and one for the interrupt's subsystem. It would be nice if we could get away with one, but having to list two or even three mutexes would not be much of a burden on the interrupt code.

-Matt
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
* Matt Dillon <[EMAIL PROTECTED]> [010417 17:47] wrote:
...
>
> Interrupts by definition know precisely what they are going to do, so by
> definition they know precisely which mutexes (if any) they may need
> to get. This means that, in fact, it is possible to implement a check
> to determine if any of the mutexes an interrupt might want to get are
> already being held by the SAME cpu or not, and if they are, to do the
> equivalent of what our delayed-interrupt stuff does in -stable's
> spl/splx code, but instead do the check when a mutex is released.
>
> The result is: no need for an idle process to support interrupt
> contexts, no need to implement interrupts as threads, and no need
> to implement fancy scheduling tricks or Giant handling.
>
> ...
>
> And there you have it. The mutex/array test takes very little time,
> being a read-only test that requires no bus locking, and the collision
> case is cheap also because the current cpu already owns the mutex, allowing
> us to set the interrupt-pending bit in that mutex without any bus
> locking. The check during the release of the mutex is two instructions,
> no bus locking required. The whole thing can be implemented without any
> additional bus locking and virtually no contention.
>
> The case could be further optimized by requiring that interrupts only
> use a single mutex, period. This would allow the mainline interrupt
> routine to obtain the mutex on entry to the interrupt and allow the
> reissuing code to reissue the interrupt without freeing the mutex that
> caused the reissue, so the mutex is held throughout and then freed by
> the interrupt itself.
>
> Holy shit. I think that's it! I don't think it can get much better than
> that. It solves all of BDE's issues, solves the interrupt-as-thread
> issue (by not using threads for interrupts at all), and removes a huge
> amount of unnecessary complexity from the system. We could even get rid
> of the idle processes if we wanted to.
We can switch to this mechanism at a later date. There are issues here, though:

Mutex creation can be expensive, as it seems like each interrupt needs to register what sort of mutex it's interested in; when a mutex is created the list must be scanned and each interrupt updated.

Interrupts do not know "exactly" which mutexes they will need, they know about a subset of the mutexes they may need. This scheme causes several problems:

1) interrupts are again fan-in, meaning if you block an interrupt
   class on one cpu you block them on all cpus

2) when we may have things like per-socket mutexes we are blocking
   interrupts that may not need twiddling by the interrupt handler,
   yet we need to block the interrupt anyway because it _may_ want
   the same mutex that we have.

Windriver has a full-time developer working on the existing implementation; as far as I know we can only count on you for weekends and spare time.

I'm starting to feel that I'm wasting time trying to get you to see the bigger picture: the fact is that neither system means diddly unless we get to work on locking the rest of the kernel. With that said, I'd really like to see the better of the two schemes implemented when the dust settles. The problem is that right now neither scheme is buying us much other than overhead without significant parts of the kernel being converted over to a mutexed system.

Your proposal is valuable and might be something that we switch to; however, for the time being it's far more important to work on locking down subsystems than to work on the locking subsystem.

In fact, if you proposed a new macro wrapper for mtx_* that would make it easier at a later date to implement _your_ version of the locking subsystem, I would back it just to get you interested in participating in locking down the other subsystems.

-Alfred
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
On Tue, Apr 17, 2001 at 05:47:23PM -0700, Matt Dillon wrote:
> Proposed:
>
>     mainline kernel {
>         get_spin_mutex(&somemutex);
>         .
>         .
>         masked interrupt occurs, interrupt structure contains array
>         of mutexes the interrupt will need.  Check said mutexes, one
>         found to be held by current cpu.  Set interrupt-pending bit
>         in mutex and iret immediately.

You cannot be pre-empted by an interrupt if you are holding a spin mutex, AFAIK, even under the present implementation.

>         .
>         .
>         release_spin_mutex(&somemutex)
>         (bit found to be set in mutex, triggers interrupt reissuing code)
>     }
>
> And there you have it. The mutex/array test takes very little time,
> being a read-only test that requires no bus locking, and the collision
> case is cheap also because the current cpu already owns the mutex, allowing
> us to set the interrupt-pending bit in that mutex without any bus
> locking. The check during the release of the mutex is two instructions,
> no bus locking required. The whole thing can be implemented without any
> additional bus locking and virtually no contention.
>
> The case could be further optimized by requiring that interrupts only
> use a single mutex, period. This would allow the mainline interrupt
> routine to obtain the mutex on entry to the interrupt and allow the
> reissuing code to reissue the interrupt without freeing the mutex that
> caused the reissue, so the mutex is held throughout and then freed by
> the interrupt itself.
>
> Holy shit. I think that's it! I don't think it can get much better than
> that. It solves all of BDE's issues, solves the interrupt-as-thread
> issue (by not using threads for interrupts at all), and removes a huge
> amount of unnecessary complexity from the system. We could even get rid
> of the idle processes if we wanted to.
What happens if we get an interrupt, we're thinking about servicing it, about to check whether we're already holding a mutex that may potentially be used inside the mainline int routine, and another CPU becomes idle? In this particular case, let's say that we decide that we have to set ipending and iret immediately, because we're already holding a potential lock when we got interrupted. Isn't the result that we have a second CPU idling while we just set ipending? (I could be missing something, really.)

Also, some mainline interrupt code may need to acquire a really large number of locks, but only in some cases. Let's say we have to first check if we have a free cached buffer sitting somewhere, and if not, malloc() a new one. Well, the malloc() will eventually trigger a chain of mutex lock operations, but only in the case where we lack the cached buffer to allocate it. There is no practical way of telling up front whether or not we'll have to malloc(), so I'm wondering how efficiently we would be able to predict in cases such as these?

> -Matt

Cheers,
--
Bosko Milekic
[EMAIL PROTECTED]
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
* Greg Lehey <[EMAIL PROTECTED]> [010417 17:02] wrote:
> On Tuesday, 17 April 2001 at 1:19:57 -0700, Alfred Perlstein wrote:
> > * Matt Dillon <[EMAIL PROTECTED]> [010415 23:16] wrote:
> >>
> >> For example, all this work on a preemptive
> >> kernel is just insane. Our entire kernel is built on the concept of
> >> not being preemptable except by interrupts. We virtually guarantee
> >> years of instability and bugs leaking out of the woodwork by trying to
> >> make it preemptable, and the performance gain we get for that pain
> >> is going to be zilch. Nada. Nothing.
> >
> > Pre-emption is merely a side effect of a mutexed kernel.
> >
> > The actual gains are in terms of parallel execution internally.
> > Meaning if we happen to copyin() a 4 meg buffer we can allow more
> > than one process to be completing some sort of work inside the
> > kernel other than spinning on the giant lock.
>
> *sigh* Couldn't you have changed the subject line when discussing
> something of this importance?

I wasn't discussing, I was explaining.

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
Represent yourself, show up at BABUG http://www.babug.org/
Re: Kernel preemption, yes or no? (was: Filesystem gets a huge performance boost)
: *sigh* Couldn't you have changed the subject line when discussing
: something of this importance?
:
: Greg

Sorry. Now I am so much in a huff I'm thinking about how all this could be implemented from scratch with the 4.x base. I know, I know... good luck, Matt...

For example, this business about interrupts as threads. BSDI had an interesting solution, but what I liked most about it was that if an interrupt could be completed right then and there, it was just like a normal interrupt. It's the bit about switching stacks, detecting a mutex interlock, and blocking as a thread which was the complex part. But I think I just now came up with a better one (and to be fair, I just came up with this now).

Interrupts by definition know precisely what they are going to do, so by definition they know precisely which mutexes (if any) they may need to get. This means that, in fact, it is possible to implement a check to determine if any of the mutexes an interrupt might want to get are already being held by the SAME cpu or not, and if they are, to do the equivalent of what our delayed-interrupt stuff does in -stable's spl/splx code, but instead do the check when a mutex is released.

The result is: no need for an idle process to support interrupt contexts, no need to implement interrupts as threads, and no need to implement fancy scheduling tricks or Giant handling.

4.x:

    mainline kernel {
        s = splblah();
        .
        masked interrupt occurs, sets bit and immediately irets
        .
        .
        splx(s);
        (bit found to be set and delayed interrupt is issued here)
    }

Proposed:

    mainline kernel {
        get_spin_mutex(&somemutex);
        .
        .
        masked interrupt occurs, interrupt structure contains array
        of mutexes the interrupt will need.  Check said mutexes, one
        found to be held by current cpu.  Set interrupt-pending bit
        in mutex and iret immediately.
        .
        .
        release_spin_mutex(&somemutex)
        (bit found to be set in mutex, triggers interrupt reissuing code)
    }

And there you have it.
The mutex/array test takes very little time, being a read-only test that requires no bus locking, and the collision case is cheap also because the current cpu already owns the mutex, allowing us to set the interrupt-pending bit in that mutex without any bus locking. The check during the release of the mutex is two instructions, no bus locking required. The whole thing can be implemented without any additional bus locking and virtually no contention.

The case could be further optimized by requiring that interrupts only use a single mutex, period. This would allow the mainline interrupt routine to obtain the mutex on entry to the interrupt and allow the reissuing code to reissue the interrupt without freeing the mutex that caused the reissue, so the mutex is held throughout and then freed by the interrupt itself.

Holy shit. I think that's it! I don't think it can get much better than that. It solves all of BDE's issues, solves the interrupt-as-thread issue (by not using threads for interrupts at all), and removes a huge amount of unnecessary complexity from the system. We could even get rid of the idle processes if we wanted to.

-Matt