[Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Marc Lehmann
 On Sat, Nov 03, 2007 at 03:45:39PM -0700, William Ahern [EMAIL PROTECTED] 
 wrote:
  Curious how you managed to do this. Are you checking the process PID on each
  loop?
 
 I considered that, but I think it's too slow (one also needs to be careful
 that watchers don't change e.g. epoll state until the getpid check is
 done), or at least I think I don't want that speed hit, no matter what.

After giving signal handling and threads a lot of thought, I came to these
conclusions:

- requiring pthreads or Windows mutexes by default is not acceptable,
  but that's the only way to distribute signal events among event loops
  properly, or globally among many threads if signal handling were global.
- the only way to do it without locking is to only allow a single
  loop to handle events.

This is the interface I came up with to manage multiple loops (which I
think makes more sense than the interface currently in libevent):

   struct ev_loop *ev_default_loop (int methods);
   void ev_default_destroy (void);
   void ev_default_fork (void);

This would create the default loop (event_base). ev_default_loop
would always create the same loop, and it would be the one to use for
third-party libraries in general, too. The fork method can be called in
the parent or child (or even in both, or without forking), and it would
destroy and recreate the kernel state but keep all the watchers for the
default loop.

   struct ev_loop *ev_loop_new (int methods);
   void ev_loop_destroy (EV_P);
   void ev_loop_fork (EV_P);

This would create additional loops (event_bases). The difference is that
these cannot handle signals (or child watchers) at all, with the default loop
being the only one to do signal handling.

This would be consistent with how signals are usually handled in a pthreads
environment: block signals in all threads and in one thread handle them all
(sigwait, or using the default mainloop).

No locking inside libevent would be required this way.

I'll implement this in my libev replacement code, unless somebody else comes
up with a better idea.

One such idea that isn't better, but different, would be to require the
user to provide mutex support, such as in ev_init_locking (size, init_cb,
lock_cb, unlock_cb, free_cb) or similar, then use locking and let any
event loop handle the signals and distribute signal events to the relevant
loops. But I am not sure how much locking would be required and I assume
it would be a lot, as one would need to handle the case where one thread
handles a signal for an event_base currently in use by another thread.
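
A rough sketch of what such user-provided locking could look like, assuming
pthreads on the user side. Everything here is hypothetical: sketch_init_locking
only mirrors the shape of the ev_init_locking idea above, and the lock_ops
table, the pt_* callbacks and run_locking_demo are invented for illustration.
The library allocates size bytes per lock and touches the lock only through
the callbacks:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback table; not part of libevent or libev. */
struct lock_ops {
    size_t size;                    /* bytes to allocate per lock */
    void (*init_cb)(void *lock);
    void (*lock_cb)(void *lock);
    void (*unlock_cb)(void *lock);
    void (*free_cb)(void *lock);
};

static struct lock_ops ops;

/* Library side: remember the callbacks the user handed in. */
void sketch_init_locking(size_t size, void (*init_cb)(void *),
                         void (*lock_cb)(void *), void (*unlock_cb)(void *),
                         void (*free_cb)(void *))
{
    ops.size = size;
    ops.init_cb = init_cb;
    ops.lock_cb = lock_cb;
    ops.unlock_cb = unlock_cb;
    ops.free_cb = free_cb;
}

/* User side: callbacks implemented with pthread mutexes. */
static void pt_init(void *l)   { pthread_mutex_init(l, NULL); }
static void pt_lock(void *l)   { pthread_mutex_lock(l); }
static void pt_unlock(void *l) { pthread_mutex_unlock(l); }
static void pt_free(void *l)   { pthread_mutex_destroy(l); }

static long pending_signals;    /* stands in for shared signal state */
static void *sig_lock;

/* Two of these run concurrently and update the shared state under the lock. */
static void *loop_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        ops.lock_cb(sig_lock);
        pending_signals++;
        ops.unlock_cb(sig_lock);
    }
    return NULL;
}

/* Returns the final counter value; 200000 if the locking works. */
long run_locking_demo(void)
{
    pthread_t a, b;

    sketch_init_locking(sizeof(pthread_mutex_t),
                        pt_init, pt_lock, pt_unlock, pt_free);
    sig_lock = malloc(ops.size);
    assert(sig_lock != NULL);
    ops.init_cb(sig_lock);

    pending_signals = 0;
    pthread_create(&a, NULL, loop_thread, NULL);
    pthread_create(&b, NULL, loop_thread, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    ops.free_cb(sig_lock);
    free(sig_lock);
    return pending_signals;
}
```

This only shows the plumbing. It says nothing about how many such locks a
real event loop would need, which is the open question raised above.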

Looking at the code in libevent, it seems that signals get handled by
whatever loop was started last, so signal handling is not reliable at all
unless one registers the signal handlers in all threads, which is hard to
do in a thread-safe manner (for the user code).

Having a deterministic model where one loop handles all that would definitely
be an improvement over this.

-- 
The choice of a   Deliantra, the free code+content MORPG
  -==- _GNU_  http://www.deliantra.net
  ==-- _   generation
  ---==---(_)__  __   __  Marc Lehmann
  --==---/ / _ \/ // /\ \/ /  [EMAIL PROTECTED]
  -=/_/_//_/\_,_/ /_/\_\
___
Libevent-users mailing list
Libevent-users@monkey.org
http://monkey.org/mailman/listinfo/libevent-users


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Christopher Layne
On Sun, Nov 04, 2007 at 12:15:56PM -0800, Steven Grimm wrote:
 On Nov 4, 2007, at 8:13 AM, Marc Lehmann wrote:
 This would create additional loops (event_bases). The difference is  
 that
 these cannot handle signals (or child watchers) at all, with the  
 default loop
 being the only one to do signal handling.
 
 This seems like a totally sane approach to me. Having multiple loops  
 is a big performance win for some applications (e.g., memcached in  
 multithreaded mode), so making the behavior a bit more consistent is a  
 good thing.

It's only a performance win when the cost of context switches and cache
stomping, as a result of multiple threads cycling within their own
contexts, does not outweigh the latency of a model using fewer threads,
or even a single thread.

Consider a room with 20 people in it and a single door. The goal is to
hand them a football as a new football is dropped off the assembly
line and have them exit the door. You could throw them all a new football
right as it comes off the line and have them immediately rush for the door -
resulting in a log jam that one has to stop tending the assembly line to
handle. You then head back to the line and begin the patterned task of
throwing footballs to workers as fast as you can - only to have the log jam
repeat itself.

The only way to solve this efficiently is to have fewer people try to exit
the door at once, or add more doors (CPUs).

 Now if only there were a way to wake just one thread up when input  
 arrives on a descriptor being monitored by multiple threads... But  
 that isn't supported by any of the underlying poll mechanisms as far  
 as I can tell.
 
 -Steve

It isn't typically supported because it's not a particularly useful or
efficient path to head down in the first place.

Thread pools being what they are, incredibly useful and pretty much the de
facto standard in threaded code, do have their own abstraction limits as well.

Setting up a thread pool, an inherently asynchronous and unordered collection
of contexts, to asynchronously process an ordered stream of data (unless
your protocol has no sequence, which I doubt), which I presume to somehow
be in the name of performance, is a far more complex and troublesome design
than it needs to be. It's anchored somewhat to the "every thread can do
anything" school of thought, which has many hidden costs.

The issue in itself is having multiple threads monitor the *same* fd via any
kind of wait mechanism. It's short-circuiting application layers, so that a
thread (*any* thread in that pool) can immediately process new data. I think
it would be much more structured, less complex (i.e. better performance in
the long run anyway), and a cleaner design to have a set number of threads
(or even just one) handle the controller task of tending to new network
events: push them onto a per-connection PDU queue, or pre-process them in
some form or fashion, signal a condition variable, and let the previously
mentioned thread pool handle them in an ordered fashion. Having a group of
threads listening to the same fd has now just thrown our football manager
out entirely and become a smash-and-grab for new footballs. There's still
the door to get through.
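
A minimal sketch of that controller/worker split, assuming pthreads. All
names (work_queue, queue_push, queue_pop, run_queue_demo) are invented for
illustration. To stay short it uses one shared bounded queue and plain
integers as stand-ins for events; the per-connection queues described above,
which are what would preserve ordering within a connection, are left out:

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 64
#define N_WORKERS 4
#define N_EVENTS  1000

struct work_queue {
    int items[QUEUE_CAP];
    int head, tail, count;
    int closed;                     /* set once no more events will come */
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
};

static struct work_queue q = {
    .mu = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
};
static long processed_sum;          /* protected by q.mu */

/* Controller side: enqueue one event and wake one waiting worker. */
static void queue_push(int item)
{
    pthread_mutex_lock(&q.mu);
    while (q.count == QUEUE_CAP)
        pthread_cond_wait(&q.not_full, &q.mu);
    q.items[q.tail] = item;
    q.tail = (q.tail + 1) % QUEUE_CAP;
    q.count++;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.mu);
}

/* Worker side: returns 1 with an event, 0 once closed and drained. */
static int queue_pop(int *item)
{
    pthread_mutex_lock(&q.mu);
    while (q.count == 0 && !q.closed)
        pthread_cond_wait(&q.not_empty, &q.mu);
    if (q.count == 0) {
        pthread_mutex_unlock(&q.mu);
        return 0;
    }
    *item = q.items[q.head];
    q.head = (q.head + 1) % QUEUE_CAP;
    q.count--;
    pthread_cond_signal(&q.not_full);
    pthread_mutex_unlock(&q.mu);
    return 1;
}

static void *pool_worker(void *arg)
{
    int item;

    (void)arg;
    while (queue_pop(&item)) {
        pthread_mutex_lock(&q.mu);
        processed_sum += item;      /* stand-in for real event handling */
        pthread_mutex_unlock(&q.mu);
    }
    return NULL;
}

/* The calling thread plays the controller. Returns the sum of all
 * handled events, which is N_EVENTS * (N_EVENTS + 1) / 2 if none is lost. */
long run_queue_demo(void)
{
    pthread_t workers[N_WORKERS];

    q.head = q.tail = q.count = q.closed = 0;
    processed_sum = 0;
    for (int i = 0; i < N_WORKERS; i++) {
        int rc = pthread_create(&workers[i], NULL, pool_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 1; i <= N_EVENTS; i++)
        queue_push(i);

    pthread_mutex_lock(&q.mu);
    q.closed = 1;
    pthread_cond_broadcast(&q.not_empty);   /* release idle workers */
    pthread_mutex_unlock(&q.mu);

    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return processed_sum;
}
```

Every event crosses a mutex and a condition variable on its way from the
controller to a worker; that handoff is the cost this design accepts in
exchange for a single thread owning the fd.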

-cl


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Adrian Chadd
On Sun, Nov 04, 2007, Steven Grimm wrote:

 Would this be for listen sockets, or for general read/write IO on an  
 FD?
 
 Specifically for a mixed TCP- and UDP-based protocol where any thread  
 is equally able to handle an incoming request on the UDP socket, but  
 TCP sockets are bound to particular threads.

Makes sense. Doesn't the Solaris event ports system handle this? I haven't
checked in depth.

It sounds like something that kqueue could be extended to do relatively
easily.

What about multiple threads blocking on the same UDP socket? Do multiple
threads wake up when IO arrives? Or just one?




Adrian



Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread William Ahern
On Sun, Nov 04, 2007 at 03:18:42PM -0800, Steven Grimm wrote:
 You've just pretty accurately described my initial implementation of  
 thread support in memcached. It worked, but it was both more CPU- 
 intensive and had higher response latency (yes, I actually measured  
 it) than the model I'm using now. The only practical downside of my  
 current implementation is that when there is only one UDP packet  
 waiting to be processed, some CPU time is wasted on the threads that  
 don't end up winning the race to read it. But those threads were idle  
 at that instant anyway (or they wouldn't have been in a position to  
 wake up) so, according to my benchmarking, there doesn't turn out to  
 be an impact on latency. And though I am wasting CPU cycles, my total  
 CPU consumption still ends up being lower than passing messages around  
 between threads.
 

Is this on Linux? They addressed the stampeding herd problem years ago. If
you dig deep down in the kernel you'll see their waitq implementation for
non-blocking socket work (and lots of other stuff). Only one thread is ever
woken per event.
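
For anyone who wants to check this on their own system, here is a small
experiment sketch, assuming POSIX sockets and pthreads; poller and
run_udp_race are made-up names. Several threads poll() the same
non-blocking UDP socket, one datagram is sent over loopback, and the
program counts how many threads returned from poll() and how many actually
read the datagram. The second number is 1 whenever the datagram is
delivered; the first depends on the kernel and on timing, so no value is
claimed for it here:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define N_THREADS 4

static int udp_fd;
static int woken, winners;          /* protected by mu */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;

/* Each thread waits up to one second for the shared socket to become
 * readable, then tries a non-blocking read. */
static void *poller(void *arg)
{
    struct pollfd p = { .fd = udp_fd, .events = POLLIN };
    char buf[64];

    (void)arg;
    if (poll(&p, 1, 1000) > 0) {
        ssize_t n = recv(udp_fd, buf, sizeof buf, 0);

        pthread_mutex_lock(&mu);
        woken++;
        if (n >= 0)
            winners++;              /* this thread won the race */
        pthread_mutex_unlock(&mu);
    }
    return NULL;
}

/* Returns the number of threads that read the datagram and stores the
 * number of threads that woke up in *woken_out. */
int run_udp_race(int *woken_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    pthread_t t[N_THREADS];
    int tx, rc;

    udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
    assert(udp_fd >= 0);
    fcntl(udp_fd, F_SETFL, fcntl(udp_fd, F_GETFL) | O_NONBLOCK);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */
    rc = bind(udp_fd, (struct sockaddr *)&addr, sizeof addr);
    assert(rc == 0);
    rc = getsockname(udp_fd, (struct sockaddr *)&addr, &len);
    assert(rc == 0);
    (void)rc;

    woken = winners = 0;
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&t[i], NULL, poller, NULL);
    poll(NULL, 0, 100);             /* give the threads time to block */

    tx = socket(AF_INET, SOCK_DGRAM, 0);
    assert(tx >= 0);
    sendto(tx, "x", 1, 0, (struct sockaddr *)&addr, sizeof addr);

    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    close(tx);
    close(udp_fd);

    *woken_out = woken;
    return winners;
}
```

Threads that woke up but lost the race see recv() fail with EAGAIN; that
wasted wakeup is the CPU cost described in the quoted message.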


Re: [Libevent-users] sensible thread-safe signal handling proposal

2007-11-04 Thread Scott Lamb
Christopher Layne wrote:
 On Sun, Nov 04, 2007 at 04:23:01PM -0800, Scott Lamb wrote:
 It wasn't what I expected; I was fully confident at first that the
 thread-pool, work-queue model would be the way to go, since it's one
 I've implemented in many applications in the past. But the numbers said
 otherwise.
 Thanks for the case study. To rephrase (hopefully correctly), you tried
 these two models:

 1) one thread polls and puts events on a queue; a bunch of other threads
 pull from the queue. (resulted in high latency, and I'm not too
 surprised...an extra context switch before handling any events.)
 
 So back to this..
 
 2) a bunch of threads read and handle events independently. (your
 current model.)
 
 BTW: How does this model somehow exempt itself from said context switching
 issue of the former?

Hmm, William Ahern says that at least on Linux, they only wake one
thread per event. That would explain it.

 Did you also try the so-called leader/follower model, in which the
 thread which does the polling handles the first event and puts the rest
 on a queue; another thread takes over polling if otherwise idle while
 the first thread is still working? My impression is that this was a widely
 favored model, though I don't know the details of where each performs best.
 
 Something about this just seems like smoke and mirrors to me. At the end of
 the day we still only have a finite number of CPU cores available to us and
 any amount of playing with the order of things is not going to extract any
 magical *more* throughput out of a given box. Yes, some of these methods
 influence recv/send buffers and have a cascading effect on overall throughput,
 but efficient code and algorithms are going to make the real difference - not
 goofy thread games.
 
 (and this is coming from someone who *likes* comp.programming.threads)

Oh, I don't know, there is something to be said for not making a handoff
between threads if you can avoid it. You're not going to get more
throughput than n_cores times what you got with one processor, but I'd
expect avoiding context switches and cache bouncing to help you get
closer to that.