Re: Benefits of asyncio
Marko Rauhamaa writes:

> Mostly asyncio is a way to deal with anything you throw at it. What do
> you do if you need to exit the application immediately and your threads
> are stuck in a 2-minute timeout?

Eh? When the main thread exits, all the child threads go with it. Sometimes there is some crap on stderr because of resource cleanups happening in unexpected order as the various threads exit, but it all shuts down.

The new Tulip i/o stuff based on "yield" coroutines should combine the advantages of async and threads.

--
https://mail.python.org/mailman/listinfo/python-list
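[Editor's note: a minimal sketch of the behaviour being described. In CPython this applies to threads created with daemon=True; non-daemon threads are joined at interpreter exit, so they would in fact hold the process hostage for the full timeout.]

```python
import threading
import time

def stuck():
    # Stand-in for a thread stuck in a 2-minute timeout.
    time.sleep(120)

# daemon=True: the interpreter will not wait for this thread at exit.
t = threading.Thread(target=stuck, daemon=True)
t.start()
print("main thread exiting; the sleeping thread is abandoned")
```

When the script ends here, the 120-second sleep is simply cut short along with the process.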
Re: Benefits of asyncio
Marko Rauhamaa writes:

> That's a good reason to avoid threads. Once you realize you would have
> been better off with an async approach, you'll have to start over.

That just hasn't happened to me yet, at least in terms of program organization. Python threads get too slow once there are too many tasks, but that's just an implementation artifact of Python threads, and goes along with Python being slow in general. Write threaded code in GHC or Erlang or maybe Go, and you can handle millions of connections, as the threads are in userspace and are very lightweight and fast.

http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf

--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On Wed, Jun 4, 2014 at 3:10 PM, Burak Arslan wrote:
> Ah ok. Well, a couple of years of writing async code, my not-so-objective
> opinion about it is that it forces you to split your code into functions,
> just like Python forces you to indent your code properly. This in turn
> generally helps the quality of the codebase.

That's entirely possible, but it depends hugely on your library/framework, then - see earlier comments in this thread about Node.js and the nightmare of callbacks.

One thing I'm seeing, though, the more different styles of programming I work with, is that since it's possible to write good code in pretty much anything (even PHP, and my last boss used that as a counter-argument to "PHP sucks"), and since a good programmer will write good code in anything, neither of these is really a good argument in favour of (or against) a feature/library/framework/style.

Python forces you to indent your code. Fine! But a good programmer will already indent, and a sloppy programmer isn't forced to be consistent. (At worst, you just add "if True:" every time you unexpectedly indent.) To judge the quality of a framework based on code style, you need to look at a *bad* programmer and what s/he produces. A bad programmer, with just GOTO and line numbers, will often produce convoluted code that's completely unreadable; a bad programmer with a good suite of structured control flow will more generally stumble to something that's at least mostly clear.

So how does async vs threaded stack up there? A competent programmer won't have a problem with either model. A mediocre programmer probably will think about one thing at a time, and will then run into problems. Threading produces these problems in one set of ways, asyncio produces problems in another set of ways. Which one would you, as an expert, prefer to deal with in a junior programmer's code?

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On 03/06/14 14:57, Chris Angelico wrote:
> On Tue, Jun 3, 2014 at 9:05 PM, Burak Arslan wrote:
>> On 06/03/14 12:30, Chris Angelico wrote:
>>> Write me a purely nonblocking web site concept that can handle a
>>> million concurrent connections, where each one requires one query
>>> against the database, and one in a hundred of them require five
>>> queries which happen atomically.
>>
>> I don't see why that can't be done. Twisted has everything I can think
>> of except database bits (adb runs on threads), and I got txpostgres[1]
>> running in production, it seems quite robust so far. What else are we
>> missing?
>>
>> [1]: https://pypi.python.org/pypi/txpostgres
>
> I never said it can't be done. My objection was to Marko's reiterated
> statement that asynchronous coding is somehow massively cleaner than
> threading; my argument is that threading is often significantly cleaner
> than async, and that at worst, they're about the same (because they're
> dealing with exactly the same problems).

Ah ok. Well, a couple of years of writing async code, my not-so-objective opinion about it is that it forces you to split your code into functions, just like Python forces you to indent your code properly. This in turn generally helps the quality of the codebase.

If you manage to keep yourself out of the closure hell by not writing more and more functions inside one another, I say async code and (non-sloppy) blocking code looks almost the same. (which means, I guess, that we mostly agree :))

Burak
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
Chris Angelico :

> Okay. How do you do basic logging? (Also - rolling your own logging
> facilities, instead of using what Python provides, is the simpler
> solution? This does not aid your case.)

Asyncio is fresh out of the oven. It's going to take years before the standard libraries catch up with it.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
In article <87ha42uos2@elektro.pacujo.net>, Marko Rauhamaa wrote:

> Chris Angelico :
>
>> I don't see how event-driven asynchronous programming is, as Marko
>> asserts, a breath of fresh air compared with multithreading. The
>> only way multithreading can possibly be more complicated is that
>> preemption can occur anywhere - and that's exactly one of the big
>> flaws in async work, if you don't do your job properly.
>
> Say you have a thread blocking on socket.accept(). Another thread
> receives the management command to shut the server down. How do you tell
> the socket.accept() thread to abort and exit?

You do the accept() in a daemon thread?

--
https://mail.python.org/mailman/listinfo/python-list
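[Editor's note: a sketch of that suggestion. A daemon thread blocked in accept() no longer prevents the process from exiting; the port and handler here are illustrative.]

```python
import socket
import threading

srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
srv.listen()

def acceptor():
    try:
        conn, addr = srv.accept()   # blocks until a client connects
        conn.close()
    except OSError:
        pass                        # listening socket was closed

# daemon=True: a blocked accept() no longer holds up process exit.
t = threading.Thread(target=acceptor, daemon=True)
t.start()
```

The main thread can now exit (or close srv) without waiting on the accept().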
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 11:42 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> https://docs.python.org/3.4/library/logging.html#logging.Logger.debug
>>
>> What happens if that blocks? How can you make sure it won't?
>
> I haven't used that class. Generally, Python standard libraries are not
> readily usable for nonblocking I/O.
>
> For myself, I have solved that particular problem my own way.

Okay. How do you do basic logging? (Also - rolling your own logging facilities, instead of using what Python provides, is the simpler solution? This does not aid your case.)

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
Chris Angelico :

> https://docs.python.org/3.4/library/logging.html#logging.Logger.debug
>
> What happens if that blocks? How can you make sure it won't?

I haven't used that class. Generally, Python standard libraries are not readily usable for nonblocking I/O.

For myself, I have solved that particular problem my own way.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 11:05 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> I don't see how event-driven asynchronous programming is, as Marko
>> asserts, a breath of fresh air compared with multithreading. The
>> only way multithreading can possibly be more complicated is that
>> preemption can occur anywhere - and that's exactly one of the big
>> flaws in async work, if you don't do your job properly.
>
> Say you have a thread blocking on socket.accept(). Another thread
> receives the management command to shut the server down. How do you tell
> the socket.accept() thread to abort and exit?
>
> The classic hack is to close the socket, which causes the blocking
> thread to raise an exception.

How's that a hack? If you're shutting the server down, you need to close the listening socket anyway, because otherwise clients will think they can get in. Yes, I would close the socket. Or just send the process a signal like SIGINT, which will break the accept() call. (I don't know about Python specifically here; the underlying Linux API works this way, returning EINTR, as does OS/2, which is where I learned. Generally I'd have the accept() loop as the process's main loop, and spin off threads for clients.)

In fact, the most likely case I'd have would be that the receipt of that signal *is* the management command to shut the server down; it might be SIGINT or SIGQUIT or SIGTERM, or maybe some other signal, but one of the easiest ways to notify a Unix process to shut down is to send it a signal. Coping with broken proprietary platforms is an exercise for the reader, but I know it's possible to terminate a console-based socket accept loop in Windows with Ctrl-C, so there ought to be an equivalent API method.

> The blocking thread might be also stuck in socket.recv(). Closing the
> socket from the outside is dangerous now because of race conditions. So
> you will have to carefully add locking to block an unwanted closing of
> the connection.

Maybe. More likely, the same situation applies - you're shutting down, so you need to close the socket anyway. I've generally found - although this may not work on all platforms - that it's perfectly safe for one thread to be blocked in recv() while another thread calls send() on the same socket, and then closes that socket. On the other hand, if your notion of shutting down does NOT include closing the socket, then you have to deal with things some other way - maybe handing the connection on to some other process, or something - so a generic approach isn't appropriate here.

> But what do you do if the blocking thread is stuck in the middle of a
> black box API that doesn't expose a file you could close?
>
> So you hope all blocking APIs have a timeout parameter.

No! I never put timeouts on blocking calls to solve shutdown problems. That is a hack, and a bad one. Timeouts should be used only when the timeout is itself significant (eg if you decide that your socket connections should time out if there's no activity in X minutes, so you put a timeout on socket reads of X*60 and close the connection cleanly if it times out).

> Well, ok,
>
>     os.kill(os.getpid(), signal.SIGKILL)
>
> is always an option.

Yeah, that's one way. More likely, you'll find that a lesser signal also aborts the blocking API call. And even if you have to hope for an alternate API to solve this problem, how is that different from hoping that all blocking APIs have corresponding non-blocking APIs?

I reiterate the example I've used a few times already: https://docs.python.org/3.4/library/logging.html#logging.Logger.debug

What happens if that blocks? How can you make sure it won't?

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
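[Editor's note: a sketch of the "close the socket to unblock a thread" pattern discussed above, using a socketpair as a stand-in for a real connection. Closing the peer end delivers EOF, so the blocked recv() returns b"" and the thread exits cleanly.]

```python
import socket
import threading

a, b = socket.socketpair()
finished = threading.Event()

def reader():
    while True:
        data = a.recv(4096)   # blocks until data arrives or EOF
        if not data:          # peer closed: recv() returns b""
            break
    finished.set()

t = threading.Thread(target=reader)
t.start()

# "Shutdown": closing the peer end unblocks the reader via EOF.
b.close()
t.join(timeout=5)
```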
Re: Benefits of asyncio
Chris Angelico :

> I don't see how event-driven asynchronous programming is, as Marko
> asserts, a breath of fresh air compared with multithreading. The
> only way multithreading can possibly be more complicated is that
> preemption can occur anywhere - and that's exactly one of the big
> flaws in async work, if you don't do your job properly.

Say you have a thread blocking on socket.accept(). Another thread receives the management command to shut the server down. How do you tell the socket.accept() thread to abort and exit?

The classic hack is to close the socket, which causes the blocking thread to raise an exception.

The blocking thread might be also stuck in socket.recv(). Closing the socket from the outside is dangerous now because of race conditions. So you will have to carefully add locking to block an unwanted closing of the connection.

But what do you do if the blocking thread is stuck in the middle of a black box API that doesn't expose a file you could close?

So you hope all blocking APIs have a timeout parameter. You then replace all blocking calls with polling loops. You make the timeout value long enough not to burden the CPU too much and short enough not to annoy the human operator too much.

Well, ok,

    os.kill(os.getpid(), signal.SIGKILL)

is always an option.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
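[Editor's note: the timeout-and-poll pattern Marko describes can be sketched like this; the 0.1-second timeout is an illustrative compromise between CPU burden and shutdown latency.]

```python
import queue
import threading

stop = threading.Event()
inbox = queue.Queue()
handled = []

def worker():
    while not stop.is_set():
        try:
            # A short timeout turns the blocking get() into a polling
            # loop that re-checks the shutdown flag every 0.1 s.
            item = inbox.get(timeout=0.1)
        except queue.Empty:
            continue
        handled.append(item)
        inbox.task_done()

t = threading.Thread(target=worker)
t.start()
inbox.put("job")
inbox.join()     # wait until the job has actually been processed
stop.set()       # shutdown is noticed within one timeout period
t.join(timeout=5)
```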
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 9:09 PM, Frank Millman wrote:
> So why not keep a 'connection pool', and for every potentially blocking
> request, grab a connection, set up a callback or a 'yield from' to wait for
> the response, and unblock.

Compare against a thread pool, where each thread simply does blocking requests. With threads, you use blocking database, blocking logging, blocking I/O, etc, and everything *just happens*; with a connection pool, like this, you need to do every single one of them separately. (How many of you have ever written non-blocking error logging? Or have you written a non-blocking system with blocking calls to write to your error log? The latter is far FAR more common, but all files, even stdout/stderr, can block.)

I don't see how event-driven asynchronous programming is, as Marko asserts, a breath of fresh air compared with multithreading. The only way multithreading can possibly be more complicated is that preemption can occur anywhere - and that's exactly one of the big flaws in async work, if you don't do your job properly.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
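[Editor's note: a sketch of the thread-pool side of that comparison. Every call in the handler may block - logging, I/O, a database round-trip - and the pool simply absorbs it with no special treatment; the handler and its workload are illustrative.]

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("worker")

def handle(request_id):
    # Blocking logging, blocking I/O, blocking DB calls: in the
    # threaded model they all "just happen" with no special casing.
    log.info("handling request %d", request_id)
    time.sleep(0.01)          # stand-in for a blocking database query
    return request_id * 2

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle, range(10)))
```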
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 9:05 PM, Burak Arslan wrote:
> On 06/03/14 12:30, Chris Angelico wrote:
>> Write me a purely nonblocking
>> web site concept that can handle a million concurrent connections,
>> where each one requires one query against the database, and one in a
>> hundred of them require five queries which happen atomically.
>
> I don't see why that can't be done. Twisted has everything I can think of
> except database bits (adb runs on threads), and I got txpostgres[1]
> running in production, it seems quite robust so far. What else are we
> missing?
>
> [1]: https://pypi.python.org/pypi/txpostgres

I never said it can't be done. My objection was to Marko's reiterated statement that asynchronous coding is somehow massively cleaner than threading; my argument is that threading is often significantly cleaner than async, and that at worst, they're about the same (because they're dealing with exactly the same problems).

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
Chris Angelico :

> your throughput is defined by your database.

Asyncio is not (primarily) a throughput-optimization method. Sometimes it is a resource consumption optimization method as the context objects are lighter-weight than full-blown threads.

Mostly asyncio is a way to deal with anything you throw at it. What do you do if you need to exit the application immediately and your threads are stuck in a 2-minute timeout? With asyncio, you have full control of the situation.

> It's the same with all sorts of other resources. What happens if your
> error logging blocks? Do you code everything, *absolutely everything*,
> around callbacks? Because ultimately, it adds piles and piles of
> complexity and inefficiency, and it still comes back to the same
> thing: stuff can make other stuff wait.

It would be interesting to have an OS or a programming language where no function returns a value. Linux, in particular, suffers from the deeply-ingrained system assumption that all file access is synchronous.

However, your protestations seem like a straw man to me. I have really been practicing event-driven programming for decades. It is fraught with frustrating complications but they feel like fresh air compared with the what-now moments I've had to deal with doing multithreaded programming.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
"Chris Angelico" wrote in message news:captjjmqwkestvrsrg30qjo+4ttlqfk9q4gabygovew8nsdx...@mail.gmail.com...
>
> This works as long as your database is reasonably fast and close
> (common case for a lot of web servers: DB runs on same computer as web
> and application and etc servers). It's nice and simple, lets you use a
> single database connection (although you should probably wrap it in a
> try/finally to ensure that you roll back on any exception), and won't
> materially damage throughput as long as you don't run into problems.
> For a database driven web site, most of the I/O time will be waiting
> for clients, not waiting for your database.
>
> Getting rid of those blocking database calls means having multiple
> concurrent transactions on the database. Whether you go async or
> threaded, this is going to happen. Unless your database lets you run
> multiple simultaneous transactions on a single connection (I don't
> think the Python DB API allows that, and I can't think of any DB
> backends that support it, off hand), that means that every single
> concurrency point needs its own database connection. With threads, you
> could have a pool of (say) a dozen or so, one per thread, with each
> one working synchronously; with asyncio, you'd have to have one for
> every single incoming client request, or else faff around with
> semaphores and resource pools and such manually. The throughput you
> gain by making those asynchronous with callbacks is quite probably
> destroyed by the throughput you lose in having too many simultaneous
> connections to the database. I can't prove that, obviously, but I do
> know that PostgreSQL requires up-front RAM allocation based on the
> max_connections setting, and trying to support 5000 connections
> started to get kinda stupid.

I am following this with interest. I still struggle to get my head around the concepts, but it is slowly coming clearer.

Focusing on PostgreSQL, couldn't you do the following?

PostgreSQL runs client/server (they call it front-end/back-end) over TCP/IP. psycopg2 appears to have some support for async communication with the back-end. I only skimmed the docs, and it looks a bit complicated, but it is there.

So why not keep a 'connection pool', and for every potentially blocking request, grab a connection, set up a callback or a 'yield from' to wait for the response, and unblock.

Provided the requests return quickly, I would have thought a hundred database connections could support thousands of users.

Frank Millman
--
https://mail.python.org/mailman/listinfo/python-list
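[Editor's note: Frank's connection-pool idea can be sketched with an asyncio.Queue as the pool. This uses modern async/await syntax rather than the 3.4-era 'yield from', and FakeConnection is a hypothetical stand-in for a real async driver such as txpostgres or psycopg2's async mode.]

```python
import asyncio

class FakeConnection:
    """Stand-in for an async DB connection; query times are simulated."""
    async def query(self, sql):
        await asyncio.sleep(0.001)    # pretend the round-trip takes time
        return "row"

async def main(n_requests=50, pool_size=5):
    # A small fixed pool of connections serves many more requests.
    pool = asyncio.Queue()
    for _ in range(pool_size):
        pool.put_nowait(FakeConnection())

    async def handle(i):
        conn = await pool.get()       # wait without blocking the event loop
        try:
            return await conn.query("SELECT 1")
        finally:
            pool.put_nowait(conn)     # hand the connection back to the pool

    return await asyncio.gather(*(handle(i) for i in range(n_requests)))

results = asyncio.run(main())
```

Requests beyond the pool size simply queue up on pool.get() until a connection is returned, which is the non-blocking "wait and unblock" Frank describes.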
Re: Benefits of asyncio
On 06/03/14 12:30, Chris Angelico wrote:
> Write me a purely nonblocking
> web site concept that can handle a million concurrent connections,
> where each one requires one query against the database, and one in a
> hundred of them require five queries which happen atomically.

I don't see why that can't be done. Twisted has everything I can think of except database bits (adb runs on threads), and I got txpostgres[1] running in production, it seems quite robust so far. What else are we missing?

[1]: https://pypi.python.org/pypi/txpostgres

--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 8:08 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> Okay, but how do you handle two simultaneous requests going through
>> the processing that you see above? You *MUST* separate them onto two
>> transactions, otherwise one will commit half of the other's work.
>
> I will do whatever I have to. Pooling transaction contexts
> ("connections") is probably necessary. Point is, no task should ever
> block.
>
> I deal with analogous situations all the time, in fact, I'm dealing with
> one as we speak.

Rule 1: No task should ever block.
Rule 2: Every task will require the database at least once.
Rule 3: No task's actions on the database should damage another task's state. (Separate transactions.)
Rule 4: Maximum of N concurrent database connections, for any given value of N.

The only solution I can think of is to have a task wait (without blocking) for a database connection to be available. That's a lot of complexity, and you know what? It's going to come to exactly the same thing as blocking database queries will - your throughput is defined by your database.

It's the same with all sorts of other resources. What happens if your error logging blocks? Do you code everything, *absolutely everything*, around callbacks? Because ultimately, it adds piles and piles of complexity and inefficiency, and it still comes back to the same thing: stuff can make other stuff wait.

That's where threads are simpler. You do blocking I/O everywhere, and the system deals with the rest. Has its limitations, but sure is simpler.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
Chris Angelico :

> Okay, but how do you handle two simultaneous requests going through
> the processing that you see above? You *MUST* separate them onto two
> transactions, otherwise one will commit half of the other's work. (Or
> are you forgetting Databasing 101 - a transaction should be a logical
> unit of work?) And since you can't, with most databases, have two
> transactions on one connection, that means you need a separate
> connection for each request. Given that the advantages of asyncio
> include the ability to scale to arbitrary numbers of connections, it's
> not really a good idea to then say "oh but you need that many
> concurrent database connections". Most systems can probably handle a
> few thousand threads without a problem, but a few million is going to
> cause major issues; but most databases start getting inefficient at a
> few thousand concurrent sessions.

I will do whatever I have to. Pooling transaction contexts ("connections") is probably necessary. Point is, no task should ever block.

I deal with analogous situations all the time, in fact, I'm dealing with one as we speak.

> Alright. I'm throwing down the gauntlet. Write me a purely nonblocking
> web site concept that can handle a million concurrent connections,
> where each one requires one query against the database, and one in a
> hundred of them require five queries which happen atomically. I can do
> it with a thread pool and blocking database queries, and by matching
> the thread pool size and the database concurrent connection limit, I
> can manage memory usage fairly easily; how do you do it efficiently
> with pure async I/O?

Sorry, I'm going to pass. That doesn't look like a 5-liner.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 7:10 PM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> def request.process(self):  # I know this isn't valid syntax
>>     db.act(whatever)        # may block but shouldn't for long
>>     db.commit()             # ditto
>>     write(self, response)   # won't block
>>
>> This works as long as your database is reasonably fast and close
>
> I find that assumption unacceptable. It is a dangerous assumption.
> The DB APIs desperately need asynchronous variants. As it stands, you
> are forced to delegate your DB access to threads/processes.
>
>> So how do you deal with the possibility that the database will block?
>
> You separate the request and response parts of the DB methods. That's
> how it is implemented internally anyway.
>
> Say no to blocking APIs.

Okay, but how do you handle two simultaneous requests going through the processing that you see above? You *MUST* separate them onto two transactions, otherwise one will commit half of the other's work. (Or are you forgetting Databasing 101 - a transaction should be a logical unit of work?) And since you can't, with most databases, have two transactions on one connection, that means you need a separate connection for each request. Given that the advantages of asyncio include the ability to scale to arbitrary numbers of connections, it's not really a good idea to then say "oh but you need that many concurrent database connections". Most systems can probably handle a few thousand threads without a problem, but a few million is going to cause major issues; but most databases start getting inefficient at a few thousand concurrent sessions.

>> but otherwise, you would need to completely rewrite the main code.
>
> That's a good reason to avoid threads. Once you realize you would have
> been better off with an async approach, you'll have to start over. You
> can easily turn a nonblocking solution into a blocking one but not the
> other way around.

Alright. I'm throwing down the gauntlet. Write me a purely nonblocking web site concept that can handle a million concurrent connections, where each one requires one query against the database, and one in a hundred of them require five queries which happen atomically. I can do it with a thread pool and blocking database queries, and by matching the thread pool size and the database concurrent connection limit, I can manage memory usage fairly easily; how do you do it efficiently with pure async I/O?

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
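[Editor's note: the thread-pool arrangement Chris describes - pool size matched to the database connection limit, one connection per thread - might look like this sketch. sqlite3 stands in for a real database server, and the thread-local trick gives each pool thread exactly one connection.]

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4          # == maximum concurrent DB connections
tls = threading.local()

def get_conn():
    # Lazily open one connection per pool thread, so the number of
    # DB connections can never exceed the thread pool size.
    if not hasattr(tls, "conn"):
        tls.conn = sqlite3.connect(":memory:")
    return tls.conn

def handle(request_id):
    # A plain blocking query; the pool caps concurrency for us.
    cur = get_conn().execute("SELECT ?", (request_id,))
    return cur.fetchone()[0]

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(handle, range(20)))
```

Memory usage is bounded by POOL_SIZE, which is the point of matching the pool to the connection limit.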
Re: Benefits of asyncio
Chris Angelico :

> def request.process(self):  # I know this isn't valid syntax
>     db.act(whatever)        # may block but shouldn't for long
>     db.commit()             # ditto
>     write(self, response)   # won't block
>
> This works as long as your database is reasonably fast and close

I find that assumption unacceptable. It is a dangerous assumption. The DB APIs desperately need asynchronous variants. As it stands, you are forced to delegate your DB access to threads/processes.

> So how do you deal with the possibility that the database will block?

You separate the request and response parts of the DB methods. That's how it is implemented internally anyway.

Say no to blocking APIs.

> but otherwise, you would need to completely rewrite the main code.

That's a good reason to avoid threads. Once you realize you would have been better off with an async approach, you'll have to start over. You can easily turn a nonblocking solution into a blocking one but not the other way around.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 4:36 PM, Marko Rauhamaa wrote:
> I have yet to see that in practice. The "typical" thread works as
> follows:
>
>     while True:
>         while request.incomplete():
>             request.read()          # block
>         sql_stmt = request.process()
>         db.act(sql_stmt)            # block
>         db.commit()                 # block
>         response = request.ok_response()
>         while response.incomplete():
>             response.write()        # block
>
> The places marked with the "block" comment are states with only one
> valid input stimulus.
> ...
> Yes, a "nest of callbacks" can get messy very quickly. That is why you
> need to be very explicit with your states. Your class needs to have a
> state field named "state" with clearly named state values.

Simple/naive way to translate this into a callback system is like this:

    def request_read_callback(request, data):
        request.read(data)  # however that part works
        if not request.incomplete():
            request.process()

    def write(request, data):
        request.write_buffer += data
        request.attempt_write()  # sets up callbacks for async writing

    def request.process(self):  # I know this isn't valid syntax
        db.act(whatever)        # may block but shouldn't for long
        db.commit()             # ditto
        write(self, response)   # won't block

This works as long as your database is reasonably fast and close (common case for a lot of web servers: DB runs on same computer as web and application and etc servers). It's nice and simple, lets you use a single database connection (although you should probably wrap it in a try/finally to ensure that you roll back on any exception), and won't materially damage throughput as long as you don't run into problems. For a database driven web site, most of the I/O time will be waiting for clients, not waiting for your database.

Getting rid of those blocking database calls means having multiple concurrent transactions on the database. Whether you go async or threaded, this is going to happen.
Unless your database lets you run multiple simultaneous transactions on a single connection (I don't think the Python DB API allows that, and I can't think of any DB backends that support it, off hand), that means that every single concurrency point needs its own database connection. With threads, you could have a pool of (say) a dozen or so, one per thread, with each one working synchronously; with asyncio, you'd have to have one for every single incoming client request, or else faff around with semaphores and resource pools and such manually. The throughput you gain by making those asynchronous with callbacks is quite probably destroyed by the throughput you lose in having too many simultaneous connections to the database. I can't prove that, obviously, but I do know that PostgreSQL requires up-front RAM allocation based on the max_connections setting, and trying to support 5000 connections started to get kinda stupid.

So how do you deal with the possibility that the database will block? "Pure" threading (one thread listens for clients, spin off a thread for each client, end the thread when the client disconnects) copes poorly; async I/O copes poorly. The thread pool copes well (you know exactly how many connections you'll need - one per thread in the pool), but doesn't necessarily solve the problem (you can get all threads waiting on the database and none handling other requests). Frankly, I think the only solution is to beef up the database so it won't block for too long (and, duh, to solve any stupid locking problems, because they WILL kill you :) ).

> If threads simplify an asynchronous application, that is generally done
> by oversimplifying and reducing functionality.

I disagree with this statement.
In my opinion, both simple models (pure threading and asyncio) can express the same functionality; the hybrid thread-pool model may simplify things a bit in the interests of resource usage; but threading does let you think about code the same way for one client as for fifty, without any change of functionality. Compare:

    # Console I/O:
    def print_menu():
        print("1: Spam")
        print("2: Ham")
        print("3: Quit")

    def spam():
        print("Spam, spam, spam, spam,")
        while input("Continue? ") != "NO!":
            print("spam, spam, spam...")

    def mainloop():
        print("Welcome!")
        while True:
            print_menu()
            x = int(input("What would you like? "))
            if x == 1:
                spam()
            elif x == 2:
                ham()
            elif x == 3:
                break
            else:
                print("I don't know numbers like %d." % x)
        print("Goodbye!")

I could translate this into a pure-threading system very easily:

    # Socket I/O:
    import consoleio

    class TerminateRequest(Exception):
        pass

    tls = threading.local()

    def print(s):
        tls.socket.write(s + "\r\n")  # Don't forget, most of the internet uses \r\n!

    def input(prompt):
        tls.socket.write(prompt)
        while '\n' not in tls.readbuffer:
            tls.readb
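[Editor's note: the post is cut off in the archive at this point, but the thread-per-client idea it sketches can be shown end to end. This is a hypothetical, minimal skeleton; the menu protocol and names are illustrative, not Chris's actual code.]

```python
import socket
import threading

def handle_client(conn):
    # One thread per client: plain blocking reads, straight-line code,
    # looking just like the console version of the program.
    f = conn.makefile("rwb")
    f.write(b"Welcome!\r\n")
    f.flush()
    for line in f:
        if line.strip() == b"3":
            f.write(b"Goodbye!\r\n")
            f.flush()
            break
        f.write(b"You chose: " + line.strip() + b"\r\n")
        f.flush()
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # port 0: the OS picks a free port
srv.listen()

def acceptor():
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:
            break               # listening socket closed: shut down
        threading.Thread(target=handle_client, args=(conn,),
                         daemon=True).start()

threading.Thread(target=acceptor, daemon=True).start()
```

The code for one client and for fifty is identical; the operating system's scheduler does the multiplexing.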
Re: Benefits of asyncio
Hello,

On Mon, 02 Jun 2014 21:51:35 -0400 Terry Reedy wrote:
> To all the great responders. If anyone thinks the async intro is
> inadequate and has a paragraph to contribute, open a tracker issue.

Not sure about intro (where's that?), but docs (https://docs.python.org/3/library/asyncio.html) are pretty confusing and bugs are reported, with no response: http://bugs.python.org/issue21365

--
Best regards,
Paul mailto:pmis...@gmail.com
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
Paul Rubin :

> Marko Rauhamaa writes:
>> - Thread programming assumes each thread is waiting for precisely
>>   one external stimulus in any given state -- in practice, each
>>   state must be prepared to handle quite a few possible stimuli.
>
> Eh? Threads typically have their own event loop dispatching various
> kinds of stimuli.

I have yet to see that in practice. The "typical" thread works as follows:

    while True:
        while request.incomplete():
            request.read()          # block
        sql_stmt = request.process()
        db.act(sql_stmt)            # block
        db.commit()                 # block
        response = request.ok_response()
        while response.incomplete():
            response.write()        # block

The places marked with the "block" comment are states with only one valid input stimulus.

> Have threads communicate by message passing with immutable data in the
> messages, and things tend to work pretty straightforwardly.

Again, I have yet to see that in practice. It is more common, and naturally enforced, with multiprocessing.

> Having dealt with some node.js programs and the nest of callbacks they
> morph into as the application gets more complicated, threads have
> their advantages.

If threads simplify an asynchronous application, that is generally done by oversimplifying and reducing functionality.

Yes, a "nest of callbacks" can get messy very quickly. That is why you need to be very explicit with your states. Your class needs to have a state field named "state" with clearly named state values.

Marko
--
https://mail.python.org/mailman/listinfo/python-list
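[Editor's note: Paul's "message passing with immutable data" suggestion can be sketched with queue.Queue, which handles all the locking internally. The message format and shutdown sentinel are illustrative.]

```python
import queue
import threading

jobs = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        msg = jobs.get()            # blocks until a message arrives
        if msg is None:             # sentinel: shut down cleanly
            break
        # Messages are immutable tuples, so there is no shared
        # mutable state to protect with locks.
        kind, payload = msg
        results.put((kind, payload * 2))

t = threading.Thread(target=worker)
t.start()
jobs.put(("double", 21))
jobs.put(None)
t.join()
```

The worker's loop is exactly the "thread with its own event loop dispatching stimuli" that Paul describes.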
Re: Benefits of asyncio
I haven't worked with asynchronous tasks or concurrent programming so far. I used VB2010 and some jQuery in a recent project, but nothing low-level.

From the explanation it seems that programming with asyncio would require identifying blocks of code which are not dependent on the I/O. Wouldn't that get confusing?

@Terry When I said that there would be waiting time, I meant as compared to sequential programming. I was not comparing to threads.

From all the explanations, what I got is that asyncio is the way of doing event-driven programming, like threads are for concurrent programming. It would have been great if the library reference had mentioned the term event-driven programming. It would have been a great starting point for understanding.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
To all the great responders. If anyone thinks the async intro is inadequate and has a paragraph to contribute, open a tracker issue. -- Terry Jan Reedy -- https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On Tue, Jun 3, 2014 at 6:45 AM, Paul Rubin wrote:
>> - Thread-safe programming is easy to explain but devilishly
>>   difficult to get right.
>
> I keep hearing that but not encountering it. Yes there are classic
> hazards from sharing mutable state between threads. However, it's
> generally not too difficult to program in a style that avoids such
> sharing. Have threads communicate by message passing with immutable
> data in the messages, and things tend to work pretty straightforwardly.

It's more true on some systems than others. The issues of maintaining "safe" state are very similar in callback systems and threads; the main difference is that a single-threaded asyncio system becomes cooperative, where threading systems are (usually) preemptive.

Preemption means you could get a context switch *anywhere*. (In Python, I think the rule is that thread switches can happen only between Python bytecodes, but that's still "anywhere" as far as your code's concerned.) That means you have to *keep* everything safe, rather than simply get it safe again.

Cooperative multitasking means your function will run to completion before any other callback happens (or, at least, will get to a clearly defined yield point). That means you can muck state up all you like, and then fix it afterwards. In some ways, that's easier; but it has a couple of risks: firstly, if your code jumps out early somewhere, you might forget to fix the shared state, and only find out much later; and secondly, if your function takes a long time to execute, everything else stalls.

So whichever way you do it, you still have to be careful - just careful of slightly different things. For instance, you might keep track of network activity as a potentially slow operation, and make sure you never block a callback waiting for a socket - but you might do a quick and simple system call, not realizing that it involves a directory that's mounted from a remote server. With threads, someone else will get priority as soon as you block, but with asyncio, you have to be explicit about everything that's done asynchronously.

Threads are massively simpler if you have a top-down execution model for a relatively small number of clients. Works really nicely for a sequence of prompts - you just code it exactly as if you were using print() and input() and stuff, and then turn print() into a blocking socket write (or whatever your I/O is done over) and your input() into a blocking socket read with line splitting, and that's all the changes you need. (You could even replace the actual print and input functions, and use a whole block of code untouched.)

Async I/O is massively simpler if you have very little state, and simply react to stimuli. Every client connects, authenticates, executes commands, and terminates its connection. If all you need to know is whether the client's authenticated or not (restricted commandset before login), asyncio will be really *really* easy, and threads are overkill. This is even more true if most of your clients are going to be massively idle most of the time, with just tiny queries coming in occasionally and getting responded to quickly.

Both have their advantages and disadvantages. Learning both models is, IMO, worth doing; get to know them, then decide which one suits your project.

>> Asyncio makes the prototype somewhat cumbersome to write. However,
>> once it is done, adding features, stimuli and states is a routine
>> matter.
>
> Having dealt with some node.js programs and the nest of callbacks they
> morph into as the application gets more complicated, threads have their
> advantages.

I wrote an uberlite async I/O framework for my last job. Most of the work was done by the lower-level facilities (actual non-blocking I/O, etc), but basically, what I had was a single callback for each connection type and a dictionary of state for each connection (with a few exceptions - incoming UDP has no state, ergo no dict).

Worked out beautifully simple; each run through the callback processed one logical action (eg a line of text arriving on a socket, terminated by newline), updated state if required, and returned, back to the main loop. Not all asyncio will fit into that sort of structure, but if it does fit, this keeps everything from getting out of hand. (Plus, keeping state in a separate dict rather than using closures and local variables meant I could update code while maintaining state. Not important for most Python projects, but it was for us.)

Both have their merits.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
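A rough sketch of that "one callback, one state dict per connection" structure (all names invented for illustration; no real framework implied). One call handles one logical action, and because state lives in a plain module-level dict rather than in closures, it survives code reloads:

```python
# connection id -> per-connection state dict, kept outside any
# closure so handler code can be swapped without losing state.
connections = {}

def on_line(conn_id, line):
    # Called by the event loop each time a complete line arrives.
    # Processes exactly one logical action, then returns to the loop.
    state = connections.setdefault(conn_id, {'authenticated': False})
    if not state['authenticated']:
        if line.startswith('LOGIN '):
            state['authenticated'] = True
            return 'OK'
        return 'ERR please log in first'   # restricted before login
    return 'ECHO ' + line

print(on_line(1, 'HELLO'))      # ERR please log in first
print(on_line(1, 'LOGIN bob'))  # OK
print(on_line(1, 'HELLO'))      # ECHO HELLO
```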
Re: Benefits of asyncio
On 06/02/14 20:40, Aseem Bansal wrote:
> I read in these groups that asyncio is a great addition to Python 3. I
> have looked around and saw the related PEP which is quite big BTW but
> couldn't find a simple explanation for why this is such a great
> addition. Any simple example where it can be used?

AFAIR, Guido's US Pycon 2013 keynote is where he introduced asyncio (or tulip, which is the "internal codename" of the project), so you can watch it to get a good idea about his motivations.

So what is Asyncio? In a nutshell, Asyncio is Python's standard event loop. Next time you're going to build an async framework, you should build on it instead of reimplementing it using the system calls available on the platform(s) that you're targeting, like select() or epoll(). It's great because:

1) Creating an abstraction over the Windows and Unix ways of event-driven programming is not trivial.

2) It makes use of "yield from", a feature available in Python 3.3 and up. Using "yield from" is arguably the cleanest way of doing async, as it makes async code look like blocking code, which seemingly makes it easier to reason about the flow of your logic. The idea is very similar to twisted's @inlineCallbacks, if you're familiar with it.

If doing lower level programming with Python is not your cup of tea, you don't really care about asyncio. You should instead wait until your favourite async framework switches to it.

> It can be used to have a queue of tasks? Like threads? Maybe light
> weight threads? Those were my thoughts but the library reference
> clearly stated that this is single-threaded. So there should be some
> waiting time in between the tasks. Then what is good?

You can use it to implement a queue of (mostly i/o bound) tasks. You are not supposed to use it in cases where you'd use threads or lightweight threads (or green threads, as in gevent or stackless).

Gevent is also technically async, but gevent and asyncio differ in a very subtle way: Gevent does cooperative multitasking whereas Asyncio (and twisted) does event-driven programming. The difference is that with asyncio, you know exactly when you're switching to another task -- only when you use "yield from". This is not always explicit with gevent, as a function that you're calling can switch to another task without letting your code know. So with gevent, you still need to take the usual precautions of multithreaded programming.

Gevent actually simulates threads by doing task switching (or thread scheduling, if you will) in userspace. Here's its secret sauce: https://github.com/python-greenlet/greenlet/tree/master/platform There's some scary platform-dependent assembly code in there! I'd think twice before seriously relying on it. Event driven programming does not need such dark magic.

You also don't need to be so careful in a purely event-driven setting, as you know that at any point in time only one task context can be active. It's like you have an implicit, zero-overhead LOCK ALL for all nonlocal state. Of course the tradeoff is that you should carefully avoid blocking the event loop. It's not that hard once you get the hang of it :)

So, I hope this answers your questions. Let me know if I missed something.

Best regards,
Burak
--
https://mail.python.org/mailman/listinfo/python-list
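The explicit switch points described above look roughly like this (a minimal sketch; in the Python 3.3/3.4-era code under discussion the suspension points are spelled "yield from" on `@asyncio.coroutine` generators, which current Python spells `async`/`await` -- the property is the same: a task can only be suspended at those marked points):

```python
import asyncio

async def fetch(delay, value):
    # The await (originally "yield from") is the only place this task
    # can be suspended; between two awaits, no other task runs, so
    # nonlocal state cannot change under our feet.
    await asyncio.sleep(delay)
    return value

async def main():
    # Two tasks share a single thread and a single event loop;
    # gather() returns results in argument order.
    return await asyncio.gather(fetch(0.02, 'a'), fetch(0.01, 'b'))

print(asyncio.run(main()))  # ['a', 'b']
```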
Re: Benefits of asyncio
Marko Rauhamaa writes:
> - Thread programming assumes each thread is waiting for precisely
>   one external stimulus in any given state -- in practice, each
>   state must be prepared to handle quite a few possible stimuli.

Eh? Threads typically have their own event loop dispatching various kinds of stimuli.

> - Thread-safe programming is easy to explain but devilishly
>   difficult to get right.

I keep hearing that but not encountering it. Yes there are classic hazards from sharing mutable state between threads. However, it's generally not too difficult to program in a style that avoids such sharing. Have threads communicate by message passing with immutable data in the messages, and things tend to work pretty straightforwardly.

> Asyncio makes the prototype somewhat cumbersome to write. However,
> once it is done, adding features, stimuli and states is a routine
> matter.

Having dealt with some node.js programs and the nest of callbacks they morph into as the application gets more complicated, threads have their advantages.
--
https://mail.python.org/mailman/listinfo/python-list
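A minimal sketch of the message-passing style described above: the threads share nothing but queues, and the messages are immutable tuples, so there is no shared mutable state to lock (the worker protocol here, including the None shutdown sentinel, is just an illustrative convention):

```python
import queue
import threading

def worker(inbox, outbox):
    # Communicate only through the queues; messages are immutable.
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel: shut down
            break
        outbox.put(('result', msg * 2))

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
for n in (1, 2, 3):
    inbox.put(n)
inbox.put(None)
t.join()

results = [outbox.get() for _ in range(3)]
print(results)  # [('result', 2), ('result', 4), ('result', 6)]
```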
Re: Benefits of asyncio
Terry Reedy :

> I do not understand this. asyncio should switch between tasks faster
> than the OS switches between threads, thus reducing waiting time.

I don't know if thread switching is slower than task switching. However, there are two main reasons to prefer asyncio over threads:

 * Scalability. Asyncio can easily manage, say, a million contexts. Most operating systems will have a hard time managing more than about a thousand threads. Such scalability needs may arise in very busy network servers with tens of thousands of simultaneous connections, or in computer games that simulate thousands of "monsters."

 * Conceptual simplicity. Toy servers are far easier to implement using threads. However, before long, the seeming simplicity turns out to be a complication:

   - Thread programming assumes each thread is waiting for precisely one external stimulus in any given state -- in practice, each state must be prepared to handle quite a few possible stimuli.

   - Thread-safe programming is easy to explain but devilishly difficult to get right.

Asyncio makes the prototype somewhat cumbersome to write. However, once it is done, adding features, stimuli and states is a routine matter.

Threads have one major advantage: they can naturally take advantage of multiple CPU cores. Generally, I would stay away from threads and use multiple processes instead. However, threads may sometimes be the optimal solution. The key is to keep the number of threads small (maybe twice the number of CPUs).

Marko
--
https://mail.python.org/mailman/listinfo/python-list
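The scalability point is easy to demonstrate (a small-scale sketch: 10,000 concurrent asyncio tasks are cheap to create, where 10,000 OS threads would strain most systems; the same code works with a million tasks, just more slowly):

```python
import asyncio

async def idle(i):
    # Each task yields to the loop once, then finishes. The per-task
    # cost is a small Python object, not an OS thread with its stack.
    await asyncio.sleep(0)
    return i

async def main():
    return await asyncio.gather(*(idle(i) for i in range(10000)))

results = asyncio.run(main())
print(len(results))  # 10000
```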
Re: Benefits of asyncio
In article , Terry Reedy wrote:

> asyncio lets you write platform independent code while it makes good
> use of the asynchronous i/o available on each particular system.
> Async-i/o is one area where Windows has made advances over posix. But
> the models are different, and if one uses Windows' i/o completion as
> if it were posix poll/select, it works poorly. Running well on both
> types of systems was a major challenge.

How would you compare using the new asyncio module to using gevent? It seems like they do pretty much the same thing. Assume, for the moment, that gevent runs on Python 3.x (which I assume it will, eventually).
--
https://mail.python.org/mailman/listinfo/python-list
Re: Benefits of asyncio
On 6/2/2014 1:40 PM, Aseem Bansal wrote:

The following supplements Ian's answer.

> I read in these groups that asyncio is a great addition to Python 3.
> I have looked around and saw the related PEP which is quite big BTW
> but couldn't find a simple explanation for why this is such a great
> addition. Any simple example where it can be used?

asyncio replaces the very old asyncore, which has problems, is beyond fixing due to its design, and is now deprecated. So look up uses of asyncore. You could think of asyncio as a lightweight version or core of other async packages, such as Twisted or Tornado. What are they good for? I admit that you should not have to answer the question so indirectly. One generic answer: carry on 'simultaneous' conversations with multiple external systems.

asyncio lets you write platform independent code while it makes good use of the asynchronous i/o available on each particular system. Async-i/o is one area where Windows has made advances over posix. But the models are different, and if one uses Windows' i/o completion as if it were posix poll/select, it works poorly. Running well on both types of systems was a major challenge.

> It can be used to have a queue of tasks?

Try set of tasks, as the sequencing may depend on external response times.

> Like threads? Maybe light weight threads?

Try light-weight threads, managed by Python instead of the OS. I believe greenlets are a somewhat similar example.

> Those were my thoughts but the library reference clearly stated that
> this is single-threaded.

Meaning, asyncio itself only uses one os thread. The application, or individual tasks, can still spin off other os threads, perhaps for a long computation.

> So there should be some waiting time in between the tasks.

I do not understand this. asyncio should switch between tasks faster than the OS switches between threads, thus reducing waiting time.

--
Terry Jan Reedy
--
https://mail.python.org/mailman/listinfo/python-list
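The point about spinning off an OS thread for a long computation can be sketched with asyncio's own run_in_executor, which hands a blocking call to a thread pool while the single-threaded event loop keeps servicing other tasks (the computation here is an arbitrary stand-in):

```python
import asyncio

def long_computation(n):
    # A CPU-bound stand-in that would stall the loop if awaited inline.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the await
    # suspends this task, not the loop, until the thread finishes.
    return await loop.run_in_executor(None, long_computation, 1000)

print(asyncio.run(main()))  # 332833500
```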
Re: Benefits of asyncio
On Mon, Jun 2, 2014 at 11:40 AM, Aseem Bansal wrote:
> I read in these groups that asyncio is a great addition to Python 3. I
> have looked around and saw the related PEP which is quite big BTW but
> couldn't find a simple explanation for why this is such a great
> addition. Any simple example where it can be used?
>
> It can be used to have a queue of tasks? Like threads? Maybe light
> weight threads? Those were my thoughts but the library reference
> clearly stated that this is single-threaded. So there should be some
> waiting time in between the tasks. Then what is good?
>
> These are just jumbled thoughts that came into my mind while trying to
> make sense of usefulness of asyncio. Anyone can give a better idea?

You're right, neither the PEP nor the docs do much to motivate the module's existence. I suggest you start here:

http://en.wikipedia.org/wiki/Asynchronous_I/O

The asynchronous model lets you initiate a task (typically an I/O task) that would normally block, and then go on to do other things (like initiating more tasks) while waiting on that task, without having to resort to multiple threads or processes (which have the disadvantages of consuming more system resources as well as introducing the risk of race conditions and deadlocks). It does this by using callbacks; when a task is complete, a callback is called that handles its completion.

Often in asynchronous code you end up with large networks of callbacks that can be confusing to follow and debug because nothing ever gets called directly. One of the significant features of the asyncio module is that it allows asynchronous programming using coroutines, where the callbacks are abstracted away and essentially have the effect of resuming the coroutine when the task completes. Thus you end up writing code that looks a lot like threaded, sequential code with none of the pitfalls.
--
https://mail.python.org/mailman/listinfo/python-list
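The contrast between the two styles can be sketched with asyncio's own primitives (a minimal illustration; the step functions and log are invented for the sketch). First the callback style, where each step schedules the next and control flow is scattered across functions; then the coroutine style, where the same two steps read top to bottom like sequential code:

```python
import asyncio

log = []

def step1(loop):
    log.append('callback step 1')
    loop.call_soon(step2, loop)  # completion handled elsewhere

def step2(loop):
    log.append('callback step 2')
    loop.stop()

loop = asyncio.new_event_loop()
loop.call_soon(step1, loop)
loop.run_forever()  # runs until step2 calls loop.stop()
loop.close()

async def steps():
    log.append('coroutine step 1')
    await asyncio.sleep(0)  # the "callback" is hidden behind await
    log.append('coroutine step 2')

asyncio.run(steps())
print(log)
```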
Benefits of asyncio
I read in these groups that asyncio is a great addition to Python 3. I have looked around and saw the related PEP, which is quite big BTW, but I couldn't find a simple explanation for why this is such a great addition. Any simple example where it can be used?

It can be used to have a queue of tasks? Like threads? Maybe lightweight threads? Those were my thoughts, but the library reference clearly states that this is single-threaded. So there should be some waiting time in between the tasks. Then what good is it?

These are just jumbled thoughts that came into my mind while trying to make sense of the usefulness of asyncio. Can anyone give a better idea?
--
https://mail.python.org/mailman/listinfo/python-list