On Thu, 10.10.13 13:12, David Strauss (da...@davidstrauss.net) wrote:

> I was actually planning to rewrite on top of libuv today, but I'm
> happy to port to the new, native event library.
>
> Is there any best-practice for using it with multiple threads?
We are pretty conservative with threads so far, but I guess in this case it makes some sense to distribute work across CPUs. Here's how I would do it:

You start with one thread (the main thread, that is). You run an event loop, and add all listening sockets to it. When a connection comes in you process it as usual. As soon as you notice you are processing more than, let's say, 5 connections at the same time, you spawn a new thread and disable the watches on the listening sockets (use sd_event_source_set_enabled(source, SD_EVENT_OFF) for this). That new thread then runs an event loop of its own, completely independent of the original one, and also adds the listening sockets to it; it basically takes over from the original main thread. Eventually this second thread will also reach its limit of 5 connections.

Now, we could just fork off yet another thread, again pass control of the listening sockets to it, and so on, but we cannot do this unbounded, and we should try to give work back to older threads that have become idle again. To do this, we keep a (mutex-protected) global list of thread information structs; each struct contains two things: a counter of how many connections that thread currently processes, and an fd referring to a per-thread eventfd(). The eventfd() is hooked into the thread's event loop, and we use it to pass control of the listening sockets from one thread to another.

With this in place we can now alter our thread allocation scheme: instead of stupidly forking off a new thread whenever a thread reaches its connection limit, we simply sweep through the array of thread info structs, look for the thread with the lowest number of connections, and trigger its eventfd. When that thread sees the event in its event loop it re-enables the watches on the listening fds and carries on, until it reaches the limit itself, at which point it again tries to find another thread to take control of the listening sockets. If during the sweep a thread notices that all threads are at their limits, it forks off a new one, as described above. If the maximum number of threads is reached (which we should put at 2x or 3x the number of CPUs in the CPU affinity set of the process), the thread in control of the listening sockets simply turns off the poll flags for them, stops processing them for one event loop iteration, and then tries to pass them on to somebody else on the next iteration. (A rough sketch of this handoff is below.)

With this scheme you should get a pretty good distribution of work if a large number of long-running TCP connections are made. It will not be as good if a lot of short ones are made.

That all said, I am not convinced this is really something to necessarily implement in the service itself. Instead, we could also beef up support for the new SO_REUSEPORT socket option in systemd. For example, we could add a new option to .socket files: Distribute=$NUMBER. If set to some number, systemd would create that many socket fds and bind them all to the same configured address with SO_REUSEPORT. Then, when a connection comes in on any of these, we'd instantiate a new service instance for it and pass that one listening socket to it, which that daemon instance would then process. The daemon would invoke accept() on the fd a couple of times, and process everything it finds there. After being idle for a while it would exit.

With the SO_REUSEPORT scheme your daemon can stay single-threaded (making things much simpler), and you'd get much better performance too...
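Something like this, to make the bookkeeping concrete. This is just a quick, untested sketch against the sd-event API; the struct and helper names (thread_info, on_takeover, hand_over_listeners, ...) are made up for illustration:

/* Untested sketch of the per-thread handoff described above. All
 * names here are made up; only the sd-event and eventfd() calls are
 * real API. Build with: gcc sketch.c $(pkg-config --cflags --libs libsystemd) */
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <systemd/sd-event.h>

#define MAX_CONNECTIONS 5
#define MAX_THREADS 16

typedef struct thread_info {
    int event_fd;                    /* per-thread eventfd() for the handoff */
    unsigned n_connections;          /* connections this thread processes */
    sd_event_source *listen_source;  /* this thread's watch on the listening socket */
} thread_info;

static thread_info threads[MAX_THREADS];
static unsigned n_threads;
static pthread_mutex_t threads_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in a thread's event loop when its eventfd is triggered:
 * somebody passed us control of the listening socket, so re-enable
 * our watch on it. */
static int on_takeover(sd_event_source *s, int fd, uint32_t revents, void *userdata) {
    thread_info *t = userdata;
    uint64_t v;

    (void) read(fd, &v, sizeof(v));  /* reset the eventfd counter */
    return sd_event_source_set_enabled(t->listen_source, SD_EVENT_ON);
}

/* Per-thread setup: create the eventfd and hook it into this
 * thread's own event loop. */
static int setup_takeover_source(thread_info *t, sd_event *e) {
    sd_event_source *s;

    t->event_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (t->event_fd < 0)
        return -1;

    return sd_event_add_io(e, &s, t->event_fd, EPOLLIN, on_takeover, t);
}

/* Called when a thread hits MAX_CONNECTIONS: disable our own watch,
 * sweep for the least loaded thread, and wake it via its eventfd. */
static void hand_over_listeners(thread_info *self) {
    thread_info *least = NULL;
    uint64_t one = 1;
    unsigned i;

    sd_event_source_set_enabled(self->listen_source, SD_EVENT_OFF);

    pthread_mutex_lock(&threads_lock);
    for (i = 0; i < n_threads; i++)
        if (&threads[i] != self &&
            (!least || threads[i].n_connections < least->n_connections))
            least = &threads[i];
    pthread_mutex_unlock(&threads_lock);

    if (least && least->n_connections < MAX_CONNECTIONS)
        (void) write(least->event_fd, &one, sizeof(one));
    /* else: spawn a new thread, or if we are at MAX_THREADS already,
     * retry the sweep on the next event loop iteration */
}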
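The systemd side of the SO_REUSEPORT variant is plain sockets code; it would do something like this once for each of the $NUMBER fds (SO_REUSEPORT needs Linux 3.9 or newer; the IPv4/TCP choice here is just for the example):

/* Create one of several listening sockets bound to the same port;
 * the kernel load-balances incoming connections across all sockets
 * that bind the port with SO_REUSEPORT set. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int make_reuseport_listener(uint16_t port) {
    struct sockaddr_in sa;
    int fd, one = 1;

    fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;

    /* The crucial bit: allow several sockets to bind the same port. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr*) &sa, sizeof(sa)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}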
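And the daemon side of the proposed scheme could stay as dumb as this; sd_listen_fds() is the existing sd-daemon API, while the 30s idle timeout and the single-fd assumption are just arbitrary choices for the sketch:

/* Untested sketch of a daemon instance for the proposed Distribute=
 * scheme: take the one listening fd systemd passed in, accept() in a
 * loop, and exit after being idle for a while. */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <systemd/sd-daemon.h>

int main(void) {
    struct pollfd p;

    if (sd_listen_fds(1) != 1)  /* we expect exactly one fd from systemd */
        return 1;

    p.fd = SD_LISTEN_FDS_START;  /* i.e. fd 3 */
    p.events = POLLIN;

    for (;;) {
        int c, r;

        /* Wait up to 30s for the next connection... */
        r = poll(&p, 1, 30 * 1000);
        if (r <= 0)
            break;  /* ...and exit when idle; systemd starts a fresh
                     * instance when traffic comes back */

        c = accept(p.fd, NULL, NULL);
        if (c < 0)
            continue;

        /* process the connection here, then: */
        close(c);
    }

    return 0;
}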
(Oh, and of course, with that work we'd have something powerful for other use cases too.) All load balancing would be done by the kernel, and that's kinda cool, because the kernel folks actually are good at these things...

So, if you ask me, I vote for the SO_REUSEPORT logic.

For more information on SO_REUSEPORT: https://lwn.net/Articles/542629/

Lennart

--
Lennart Poettering - Red Hat, Inc.