Hello,

On Fri, 25 Apr 2014 21:05:48 +0300
Paul Sokolovsky <[email protected]> wrote:
[]

> > I suspect your expectations are tainted by the previous knowledge
> > of the threading API, which has a separate Thread.start() method. I
>
> My expectations are "tainted" by: 1) basic programming rule of thumb
> that you first initialize things properly, and then execute them; 2)
> intuitive feeling, and even explicit knowledge, of Python's "explicit
> is better than implicit" principle; 3) acquaintance (cursory, I have
> to admit) with many-year history of using generators/coroutines for
> async cooperative multitasking, and desire to use that using
> standardized API asyncio promotes.
>
> > think it makes _some_ sense that Thread objects do not start the
> > actual thread automatically, since threads are preemptive and prone
> > to race conditions, and you may want to store the Thread object in
> > some data structure _before_ the thread actually begins executing.
> > With asyncio.Task, even if the task is scheduled to be executed, it
> > is guaranteed not to be executed until you reach "yield from"
> > statement, so you have plenty of opportunity to any setup prior to
> > the task executing.
>
> Let's sum up what you're saying here: asyncio Task implementation, by
> relying on internal asyncio implementation details (so, naive users
> who will get fixation on such behavior will fail miserably in other
> contexts), violates "Explicit is better than implicit" principle *just
> because it can* ?

Ok, I did some (re)reading on the topic and had some time to think
about it based on the arguments provided, so here are some additional
thoughts and arguments:

Point #1

First of all, I probably should have mentioned that my expectations for
a coroutine scheduler are set by the wonderful series on generators and
coroutines by David Beazley. This specific slide gives the essence of
it:
http://www.slideshare.net/dabeaz/a-curious-course-on-coroutines-and-concurrency-5286140/137 .
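As a rough illustration of the kind of scheduler I have in mind, here
is a toy sketch. It is my own, not code from the slides, and the names
Scheduler, Sleep and worker are made up for the example. The coroutine
never touches the scheduler object; it only yields a request, and
nothing runs until it is explicitly scheduled and the scheduler is
started.

```python
from collections import deque


class Sleep:
    """Hypothetical request object: 'skip me for `ticks` scheduler passes'."""

    def __init__(self, ticks):
        self.ticks = ticks


class Scheduler:
    """Toy round-robin scheduler; tasks talk to it only by yielding."""

    def __init__(self):
        self.ready = deque()

    def schedule(self, coro):
        # Scheduling is an explicit step, separate from creating the generator.
        self.ready.append((coro, 0))

    def run(self):
        while self.ready:
            coro, wait = self.ready.popleft()
            if wait > 0:
                # Still "sleeping": count down and put it back in the queue.
                self.ready.append((coro, wait - 1))
                continue
            try:
                request = next(coro)  # run the task up to its next yield
            except StopIteration:
                continue  # task finished, drop it
            # The yielded value plays the role of a syscall request.
            ticks = request.ticks if isinstance(request, Sleep) else 0
            self.ready.append((coro, ticks))


log = []


def worker(name, count, pause):
    # Note: no reference to the scheduler anywhere in the coroutine.
    for i in range(count):
        log.append((name, i))
        yield Sleep(pause)


sched = Scheduler()
a = worker("a", 2, 0)
b = worker("b", 2, 1)
assert log == []  # created, but nothing has executed yet

sched.schedule(a)
sched.schedule(b)
sched.run()
print(log)
```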
So, it's possible to write a *coroutine* scheduler in such a way that
coroutines do not (and cannot, if needed) access the main loop
directly. They communicate with it using yield/yield from, which serve
the same purpose as a syscall in an OS design. So, knowing that Python
offers such level separation, it added to my cognitive dissonance to
see that asyncio not only does not separate object access, it even
tightly couples the behavior of Task to a loop.

Point #2

The latest of David's series was presented just at the recent PyCon
2014: http://www.dabeaz.com/finalgenerator/ . From slide 43 on, he
presents a step-by-step walkthrough of building a concurrent execution
framework, which (un)surprisingly shapes up as having almost the same
API and architecture as asyncio. So, it should be fair to say that
those slides are a good tutorial on asyncio design for dummies.

His framework is very similar to asyncio: it starts with callbacks,
then switches to coroutines as a more adequate representation; they get
wrapped in Tasks for bookkeeping, results are represented by Futures,
then it's shown that Task and Future share many traits, so it makes
sense to make one a subclass of the other, etc.

They are very similar except for one implementation detail: David's
framework doesn't use cooperative multitasking for execution, but
rather a thread pool. You can easily imagine what that means: a started
Task really does start immediately, so if it were started behind the
user's back, there would be no time to add callbacks to it later.
That's why David's framework doesn't start Tasks behind the user's
back, which is the natural solution (as in, you don't need to know that
it doesn't start them; it's just the default choice). During the
initial stages of the design, Tasks are kickstarted using a .step()
method; later, explicit scheduling functions are introduced:
start_inline_future(), run_inline_future().

So, let's step back and overview the situation.
https://docs.python.org/3.4/library/asyncio-task.html#future explicitly
says that asyncio.Future is "almost" compatible with
concurrent.futures.Future. Why "almost"? Apparently because
concurrent.futures.Future has some features that depend on the
concurrent execution model, and specifically on the underlying
thread/process implementations, which don't map well to the
cooperative/event loop execution model. PEP-3156 explicitly mentions
that it would be nice to unify both Futures in the future. Certainly,
asyncio would learn from such experience and try to provide an API
model that doesn't rely on particular underlying details which would
hamper compatibility and reuse, yes? No, because what we are talking
about is that asyncio (ab)uses the fact that the underlying event loop
doesn't start execution immediately, so it forcibly schedules a Task
and makes the user apply important changes to it after it is already in
the active state, which is backwards from a general point of view.

Point #3

Yet another perspective. Ok, there's nothing wrong with being able to
schedule a coroutine using a global function; after all, Point #1 above
praises complete separation between coroutines and the loop using a
yield. As yield cannot be used outside a function, it's not such a bad
idea to provide a global function to schedule a coroutine. One problem
here is that "Task" and "async" are not very suggestive names for a
function which performs scheduling. Actually, I have a hypothesis why
it's hard to even imagine such a purpose for them. It's grounded in a
dichotomy in the asyncio API:

1. Some operations are expressed as methods of the event loop object,
e.g.

loop.run_forever()
loop.call_soon()

2. While others are expressed as global functions taking an optional
loop parameter:

asyncio.wait(..., loop=None, ...)
asyncio.sleep(..., loop=None)

This API asymmetry is not particularly obvious at first look.
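To make the two styles concrete, here is a small sketch. It is written
against the current asyncio API rather than the 3.4 one quoted above,
which is an assumption on my side about what the reader runs:
asyncio.async() was later renamed asyncio.ensure_future(), and the
loop= parameter has since been dropped from asyncio.wait() and
asyncio.sleep(). The names events, job and main are made up for the
example.

```python
import asyncio

events = []


async def job():
    events.append("job ran")
    return 42


async def main():
    loop = asyncio.get_running_loop()

    # Style 1: an operation spelled as a method of the loop object.
    loop.call_soon(events.append, "call_soon callback ran")

    # Style 2: a module-level function. Nothing in the name says
    # "schedule", yet the coroutine is queued on the loop as a side effect.
    task = asyncio.ensure_future(job())
    events.append("task created")

    # Another module-level function. The task only gets a chance to run
    # once we yield control back to the loop here.
    await asyncio.wait([task])
    return task.result()


result = asyncio.run(main())
print(events, result)
```

Note that "task created" is recorded before "job ran": the Task is
already scheduled when ensure_future() returns, but it does not execute
until main() reaches its first await.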
The docs start with a description of the loop methods, which kind of
sets the expectation that all important functions should be available
as such, and that the rest are just objects/factory functions, not
normal functions with side effects, to which category both

asyncio.Task(..., loop=None)
asyncio.async(..., loop=None)

should be assigned (regardless of the actual implementation details,
like the fact that "Task" is implemented as a class).

How can this issue be solved (besides being clearly described in the
docs)? Well, it would help if the module offered just one particular
variety of API. For example, my problem is that I expected all
operations to be available as methods of the loop. But dropping that
and having stuff like:

asyncio.run_forever(loop=None)

would work just as well, and would probably allow for an even more
efficient implementation (no need for a dummy loop object when we have
an "embedded loop", for example). Finally, having both models, but
offering more complete coverage of operations in both (with
easy-to-understand names), would be good too.

-- 
Best regards,
 Paul                          mailto:[email protected]
