Re: [Libevent-users] sensible thread-safe signal handling proposal
On Sun, Nov 04, 2007 at 12:15:56PM -0800, Steven Grimm wrote:
> On Nov 4, 2007, at 8:13 AM, Marc Lehmann wrote:
> > This would create additional loops (event_bases). The difference is that these cannot handle signals (or child watchers) at all, with the default loop being the only one to do signal handling.
>
> This seems like a totally sane approach to me. Having multiple loops is a big performance win for some applications (e.g., memcached in multithreaded mode), so making the behavior a bit more consistent is a good thing.

It's only a performance win when the context switches and cache stomping that result from multiple threads cycling within their own contexts do not outweigh the latency of a model using fewer threads, or even one.

Consider a room with 20 people in it and a single door. The goal is to hand each of them a football as it drops off the assembly line and have them exit through the door. You could throw them all a new football right as it comes off the line and have them immediately rush for the door - resulting in a logjam that you have to stop tending the assembly line to clear. You then head back to the line and resume the patterned task of throwing footballs to workers as fast as you can - only to have the logjam repeat itself. The only way to solve this efficiently is to have fewer people try to exit the door at once, or to add more doors (CPUs).

> Now if only there were a way to wake just one thread up when input arrives on a descriptor being monitored by multiple threads... But that isn't supported by any of the underlying poll mechanisms as far as I can tell.
>
> -Steve

It isn't typically supported because it's not a particularly useful or efficient path to head down in the first place. Thread pools, incredibly useful and pretty much the de facto standard in threaded code, have their own abstraction limits as well.
Setting up a thread pool, an inherently asynchronous and unordered collection of contexts, to asynchronously process an ordered stream of data (unless your protocol has no sequence, which I doubt), presumably in the name of performance, is a far more complex and troublesome design than it needs to be. It's anchored somewhat to the "every thread can do anything" school of thought, which has many hidden costs.

The issue in itself is having multiple threads monitor the *same* fd via any kind of wait mechanism. It short-circuits application layers so that a thread (*any* thread in that pool) can immediately process new data. I think it would be much more structured, less complex (i.e. better performance in the long run anyway), and a cleaner design to have a set number of threads (or even one) handle the controller task of tending to new network events: push them onto a per-connection PDU queue, or pre-process them in some form or fashion, condsig, and let the previously mentioned thread pool handle them in an ordered fashion.

Having a group of threads listening to the same fd has now just thrown our football manager out entirely and turned this into a smash-and-grab for new footballs. There's still the door to get through.

-cl

___
Libevent-users mailing list
Libevent-users@monkey.org
http://monkey.org/mailman/listinfo/libevent-users
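The controller-plus-pool design proposed in this message might be sketched as follows in Python. This is only an illustration under stated assumptions: `Dispatcher`, `push`, `worker`, and `demo` are hypothetical names, "condsig" is taken to mean signalling a condition variable, and the rule that a connection is drained by one worker at a time is one possible way to get the "ordered fashion" asked for, not something specified in the thread.

```python
import threading
from collections import defaultdict, deque


class Dispatcher:
    """Hypothetical sketch: one controller thread feeds per-connection queues."""

    def __init__(self):
        self.cond = threading.Condition()
        self.pdus = defaultdict(deque)  # conn_id -> PDUs in arrival order
        self.ready = deque()            # connections waiting for a worker
        self.scheduled = set()          # connections queued or being drained
        self.closed = False

    def push(self, conn_id, pdu):
        # Called by the controller thread that tends network events.
        with self.cond:
            self.pdus[conn_id].append(pdu)
            if conn_id not in self.scheduled:
                self.scheduled.add(conn_id)
                self.ready.append(conn_id)
                self.cond.notify()      # assumed meaning of "condsig"

    def close(self):
        # Call only after the last push(); workers drain what is left and exit.
        with self.cond:
            self.closed = True
            self.cond.notify_all()

    def worker(self, handle):
        while True:
            with self.cond:
                while not self.ready and not self.closed:
                    self.cond.wait()
                if not self.ready:
                    return              # closed and nothing left to claim
                conn_id = self.ready.popleft()
            # Drain this connection; no other worker touches it meanwhile.
            while True:
                with self.cond:
                    if not self.pdus[conn_id]:
                        self.scheduled.discard(conn_id)
                        break
                    pdu = self.pdus[conn_id].popleft()
                handle(conn_id, pdu)    # runs outside the lock


def demo(n_workers=4, n_conns=3, n_pdus=50):
    d = Dispatcher()
    seen = {c: [] for c in range(n_conns)}
    workers = [
        threading.Thread(target=d.worker,
                         args=(lambda c, p: seen[c].append(p),))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for seq in range(n_pdus):           # the "controller" interleaves connections
        for c in range(n_conns):
            d.push(c, seq)
    d.close()
    for w in workers:
        w.join()
    return seen
```

Calling `demo()` returns the PDUs each connection saw, in handling order. The sketch shows the structure only; it says nothing about whether the hand-off costs more or less than letting every thread poll the descriptor directly, which is the point under debate.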
Re: [Libevent-users] sensible thread-safe signal handling proposal
On Sun, Nov 04, 2007, Steven Grimm wrote:
> > Would this be for listen sockets, or for general read/write IO on an FD?
>
> Specifically for a mixed TCP- and UDP-based protocol where any thread is equally able to handle an incoming request on the UDP socket, but TCP sockets are bound to particular threads.

Makes sense. Doesn't the Solaris event ports system handle this? I haven't checked in depth. It sounds like something that kqueue could be extended to do relatively easily.

What about multiple threads blocking on the same UDP socket? Do multiple threads wake up when IO arrives? Or just one?

Adrian
Re: [Libevent-users] sensible thread-safe signal handling proposal
On Sun, Nov 04, 2007 at 03:18:42PM -0800, Steven Grimm wrote:
> You've just pretty accurately described my initial implementation of thread support in memcached. It worked, but it was both more CPU-intensive and had higher response latency (yes, I actually measured it) than the model I'm using now.
>
> The only practical downside of my current implementation is that when there is only one UDP packet waiting to be processed, some CPU time is wasted on the threads that don't end up winning the race to read it. But those threads were idle at that instant anyway (or they wouldn't have been in a position to wake up) so, according to my benchmarking, there doesn't turn out to be an impact on latency. And though I am wasting CPU cycles, my total CPU consumption still ends up being lower than passing messages around between threads.

Is this on Linux? They addressed the stampeding herd problem years ago. If you dig deep down in the kernel you'll see their waitq implementation for non-blocking socket work (and lots of other stuff). Only one thread is ever woken per event.
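The model quoted above, in which several threads each watch the same UDP socket and race to read from it, can be sketched in Python roughly as below. This is not memcached's code: `serve` and `demo` are hypothetical names, the socket is a loopback socket created for the demonstration, and the sketch deliberately makes no claim about how many threads the kernel wakes per datagram. It only counts the reads that lost the race.

```python
import selectors
import socket
import threading
import time


def serve(sock, stop, handled, wasted):
    """One worker with its own poll loop over the shared non-blocking socket."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        while not stop.is_set():
            if not sel.select(timeout=0.05):
                continue                # timed out; re-check the stop flag
            try:
                data, _ = sock.recvfrom(2048)
            except BlockingIOError:
                wasted.append(1)        # woke up, but another thread won the race
                continue
            handled.append(data)        # each datagram is read by exactly one thread
    finally:
        sel.close()


def demo(n_threads=4, n_packets=20):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.setblocking(False)
    stop = threading.Event()
    handled, wasted = [], []
    threads = [
        threading.Thread(target=serve, args=(sock, stop, handled, wasted))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        for i in range(n_packets):
            client.sendto(str(i).encode(), sock.getsockname())
    deadline = time.monotonic() + 5.0
    while len(handled) < n_packets and time.monotonic() < deadline:
        time.sleep(0.01)
    stop.set()
    for t in threads:
        t.join()
    sock.close()
    return handled, len(wasted)
```

`demo()` returns the datagrams that were handled and the number of wasted wakeups. The second number varies with the operating system, the poll mechanism behind `selectors.DefaultSelector`, and timing, which is exactly the question being asked in this part of the thread.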
Re: [Libevent-users] sensible thread-safe signal handling proposal
Christopher Layne wrote:
> On Sun, Nov 04, 2007 at 04:23:01PM -0800, Scott Lamb wrote:
> > > It wasn't what I expected; I was fully confident at first that the thread-pool, work-queue model would be the way to go, since it's one I've implemented in many applications in the past. But the numbers said otherwise.
> >
> > Thanks for the case study. To rephrase (hopefully correctly), you tried these two models:
> >
> > 1) one thread polls and puts events on a queue; a bunch of other threads pull from the queue. (resulted in high latency, and I'm not too surprised...an extra context switch before handling any events.)
>
> So back to this..
>
> > 2) a bunch of threads read and handle events independently. (your current model.)
>
> BTW: How does this model somehow exempt itself from said context switching issue of the former?

Hmm, William Ahern says that at least on Linux, they only wake one thread per event. That would explain it.

> > Did you also try the so-called leader/follower model, in which the thread which does the polling handles the first event and puts the rest on a queue; another thread takes over polling if otherwise idle while the first thread is still working? My impression is that this was a widely favored model, though I don't know the details of where each performs best.
>
> Something about this just seems like smoke and mirrors to me. At the end of the day we still only have a finite number of CPU cores available to us, and no amount of playing with the order of things is going to extract any magical *more* throughput out of a given box. Yes, some of these methods influence recv/send buffers and have a cascading effect on overall throughput, but efficient code and algorithms are going to make the real difference - not goofy thread games. (and this is coming from someone who *likes* comp.programming.threads)

Oh, I don't know, there is something to be said for not making a handoff between threads if you can avoid it.
You're not going to get more throughput than n_cores times what you got with one processor, but I'd expect avoiding context switches and cache bouncing to help you get closer to that.
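For readers unfamiliar with the leader/follower model mentioned in this message, here is a minimal Python sketch of it as described: whichever thread holds leadership polls, keeps the first event, queues the rest, and gives up leadership before doing the work. `LeaderFollower`, `poll`, `handle`, and `demo` are hypothetical names, and `poll` is assumed to block and return a list of events, or `None` at shutdown. Python threads will not show the performance effects discussed in the thread; the sketch illustrates the hand-off structure only.

```python
import queue
import threading
from collections import deque


class LeaderFollower:
    """Hypothetical sketch: holding `leader` means being the polling thread."""

    def __init__(self, poll, handle):
        self.poll = poll                # blocks; returns a list of events or None
        self.handle = handle
        self.leader = threading.Lock()
        self.backlog = deque()          # events the last leader did not keep
        self.done = False

    def run(self):
        while True:
            with self.leader:           # become the leader
                if not self.backlog:
                    if self.done:
                        return
                    batch = self.poll() # only the leader ever polls
                    if batch is None:
                        self.done = True
                        return
                    self.backlog.extend(batch)
                if not self.backlog:
                    continue            # empty batch; poll again
                event = self.backlog.popleft()
            # Leadership is released here, so a follower can take over
            # polling while this thread does the work itself (no hand-off).
            self.handle(event)


def demo(n_threads=3):
    source = queue.Queue()
    for batch in ([1, 2, 3], [4], [5, 6], None):
        source.put(batch)
    results = []
    lf = LeaderFollower(source.get, results.append)
    threads = [threading.Thread(target=lf.run) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

`demo()` returns every event exactly once, in whatever order the threads happened to finish. The design choice worth noting is that the event a thread picks up is processed by that same thread, which is the "no handoff between threads" property argued for above.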