On Wed, Apr 4, 2018 at 8:42 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
> IMO, async has proved useful for handling certain types of IO bound
> workloads with lower overheads[1] than traditional multi-threaded or
> multi-process designs. Whether it's a good fit for any particular
> application is something you'd have to test, as with anything else.
This I would agree with. There are certain types of tasks that lend themselves spectacularly well to async I/O models - mainly those that are fundamentally reactive and can have inordinate numbers of connected users. Take a chat room, for example. The client establishes a connection to a server (maybe IRC, maybe a WebSocket, whatever), and the server just sits there doing nothing until some client sends a message. Then the server processes the message, does whatever it thinks is right, and sends messages out to one or more clients. It's essential that multiple concurrent clients be supported (otherwise you're chatting with yourself), so how do the different models hold up?

1) Multiple independent processes. Abysmal; they have to communicate with each other, so this would require a measure of persistence (e.g. a database) and some means of signalling other processes. A pain to write, and horribly inefficient. You would probably need two threads per process (one to read, one to write).

2) The multiprocessing module. Better than the above because you can notify other processes through a convenient API, but you still need an entire process for every connected client. You'll probably top out at a few hundred clients, even if they're all quiet. Still two threads per process.

3) Native OS threads using the threading module. Vast improvement; the coding work would be much the same as for multiprocessing, but instead of a process per client, you need only a thread. Probably tops out at a few thousand clients, maybe a few tens of thousands, as long as they're all fairly quiet. Since all state is now shared, each client needs only one thread (its reading thread), and writing is done in the thread that triggered it. Everything that's global is now stored in just one place.

4) Asynchronous I/O with an event loop. Small improvement over the above; now that no OS threads are involved, you're limited only by available memory.
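To make model 4 concrete, here's a minimal sketch of what the event-loop version of that chat room could look like with asyncio (all names here - handle_client, clients, and so on - are illustrative, not from any particular implementation):

```python
import asyncio

clients = set()  # one StreamWriter per connected client

async def handle_client(reader, writer):
    # One coroutine per connection - a small heap object, not an OS thread.
    clients.add(writer)
    try:
        while True:
            line = await reader.readline()  # explicit yield point
            if not line:                    # empty bytes = client disconnected
                break
            # Broadcast to everyone else. Iterate over a snapshot, because
            # the set can change while we're suspended in drain().
            for w in list(clients):
                if w is not writer:
                    w.write(line)
                    await w.drain()
    finally:
        clients.discard(writer)
        writer.close()

async def main(host="127.0.0.1", port=8888):
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()

# To run the server: asyncio.run(main())
```

Each connection costs a coroutine and a couple of buffers rather than a whole process or OS thread, and the only places another client's message can interleave with yours are the explicit await points.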
You could have tens of millions of connected clients as long as they're all quiet. Concurrency is now 100% in the programmer's hands (with explicit yield points), instead of having automatic yield points any time a thread blocks for any reason; this restricts parallelism to the points where actual I/O is happening. One blocking DNS lookup can stall the entire loop, but 100K connected sockets would be no problem at all.

Async I/O certainly has its place, but the performance advantages don't really kick in until you're scaling to ridiculous numbers of dormant clients. (If you have a large number of *active* clients, your actual work is going to be more significant, and you'll need to spend more time in the CPU, making multiple processes look more attractive.) Its biggest advantages are in _code simplicity_, not performance. (And only for people who can't wrap their heads around threads; if you're fluent in threads, the simplicity is comparable, so there's less advantage.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list