Paul,

Where were you when PEP 3156 was being discussed?
There's probably a very good reason that explains why the current API is "right", but the point is moot -- we have selected an API, we have implemented it, we have released it, and now we should live with it and start using it.

--Guido

On Thu, May 1, 2014 at 7:14 PM, Paul Sokolovsky <[email protected]> wrote:
> Hello,
>
> On Fri, 25 Apr 2014 21:05:48 +0300
> Paul Sokolovsky <[email protected]> wrote:
>
> []
>
> > > I suspect your expectations are tainted by the previous knowledge
> > > of the threading API, which has a separate Thread.start() method. I
> >
> > My expectations are "tainted" by: 1) basic programming rule of thumb
> > that you first initialize things properly, and then execute them; 2)
> > intuitive feeling, and even explicit knowledge, of Python's "explicit
> > is better than implicit" principle; 3) acquaintance (cursory, I have
> > to admit) with many-year history of using generators/coroutines for
> > async cooperative multitasking, and desire to use that using
> > standardized API asyncio promotes.
> >
> > > think it makes _some_ sense that Thread objects do not start the
> > > actual thread automatically, since threads are preemptive and prone
> > > to race conditions, and you may want to store the Thread object in
> > > some data structure _before_ the thread actually begins executing.
> > > With asyncio.Task, even if the task is scheduled to be executed, it
> > > is guaranteed not to be executed until you reach "yield from"
> > > statement, so you have plenty of opportunity to do any setup prior to
> > > the task executing.
> >
> > Let's sum up what you're saying here: asyncio Task implementation, by
> > relying on internal asyncio implementation details (so, naive users
> > who will get fixation on such behavior will fail miserably in other
> > contexts), violates "Explicit is better than implicit" principle *just
> > because it can* ?
>
> Ok, I did some (re)reading on the topic, and had some time to think
> about it, based on the arguments provided, and here are some additional
> thoughts and arguments:
>
> Point #1
>
> First of all I probably should have mentioned that my expectations for
> coroutine scheduler are set forth by wonderful series on generators and
> coroutines by David Beazley. This specific slide gives the essence of
> it:
>
> http://www.slideshare.net/dabeaz/a-curious-course-on-coroutines-and-concurrency-5286140/137
>
> So, it's possible to write *coroutine* scheduler in such a way that
> coroutines do not (and cannot if needed) access the main loop directly.
> They communicate with it using yield/yield from, which serve the same
> purpose as syscall in an OS design. So, knowing that Python offers such
> level separation, it added to cognitive dissonance to see that asyncio
> not only does not separate object access, it tightly couples even
> behavior of Task to a loop.
>
> Point #2
>
> The latest of David's series was presented just at the recent PyCon
> 2014: http://www.dabeaz.com/finalgenerator/ . And from slide 43 he
> presents step-by-step walkthrough on building a concurrent execution
> framework, which (un)surprisingly shapes up as having almost the same
> API and architecture as asyncio. So, it should be fair to say that those
> slides are good tutorial on asyncio design for dummies. So, his
> framework is very similar to asyncio: it starts with
> callbacks, then switches to coroutines as more adequate representation,
> they got wrapped in Task's for bookkeeping, results are represented by
> Future's, then it's shown that Task and Future share many traits, so it
> makes sense to make one a subclass of another, etc.
>
> They are very similar except for one implementation detail: David's
> framework doesn't use cooperative multitasking for execution, but
> rather a thread pool.
> You can easily imagine what that means: a started
> Task really does start immediately, so if it suddenly starts behind
> user's back, there's no time to add callbacks to it later. That's why
> David's framework doesn't start Tasks behind user's back, which is
> natural solution (like, you don't need to know that it doesn't start
> them - it's just default choice). During initial stages of design,
> Tasks are kickstarted using a .step() method, later explicit scheduling
> functions are introduced: start_inline_future(), run_inline_future().
>
> So, let's step back and overview the situation.
> https://docs.python.org/3.4/library/asyncio-task.html#future explicitly
> says that asyncio.Future is "almost" compatible with
> concurrent.futures.Future. Why "almost"? Apparently because
> concurrent.futures.Future has some features depending on concurrent
> execution model and specifically underlying thread/process
> implementations, which don't map well to cooperative/event loop
> execution model. PEP-3156 explicitly mentions that it would be nice to
> unify both Futures in the future.
>
> Certainly, asyncio would learn from such experience and try to provide
> API model not relying on particular underlying details which would
> hamper compatibility and reuse, yes? No, because what we talk about is
> that asyncio (ab)uses the fact that underlying event loop doesn't start
> execution immediately, so forcefully schedules a Task and makes user add
> important changes to it after it is in active state, which is
> backwards from general point of view.
>
> Point #3
>
> Yet another perspective. Ok, after all there's nothing wrong with being
> able to schedule a coroutine using a global function - after all,
> Point #1 above praises complete separation between coroutines and loop
> using a yield. As yield cannot be used outside a function, it's not
> so bad idea to provide global function to schedule a coroutine.
> One problem here is that "Task" or "async" are not too suggestive names for
> a function which performs scheduling. Actually, I have hypothesis why
> it's not too plausible to imagine such purpose for them at all. It's
> grounded in dichotomization of asyncio API:
>
> 1. Some operations are expressed as methods of event loop object, e.g.
>
>     loop.run_forever()
>     loop.call_soon()
>
> 2. While others are expressed as global functions taking optional loop
> parameter:
>
>     asyncio.wait(..., loop=None, ...)
>     asyncio.sleep(..., loop=None)
>
> This API asymmetry is not particularly obvious from first look. The docs
> start with description of loop methods, which kind of sets expectations
> that all important functions should be available as such, and the rest
> are just objects/factory functions, and not normal functions with side
> effects, to which category both
>
>     asyncio.Task(..., loop=None)
>     asyncio.async(..., loop=None)
>
> should be related (regardless of the actual implementation details, like
> the fact that "Task" is implemented as a class).
>
> How can this issue be solved (besides being clearly described in docs)?
> Well, it would help if the module offered just a particular variety of
> API. For example, my problem is that I expected all operations to be
> available as methods of loop.
>
> But dropping that and having stuff like:
>
>     asyncio.run_forever(loop=None)
>
> would work just as well, and probably would just allow for even more
> efficient implementation (no need for dummy loop object when we have
> "embedded loop" for example).
>
> Finally, having both models, but offering more complete coverage of
> operations in both (with easy-to-understand names) would be good too.
>
> --
> Best regards,
>  Paul                          mailto:[email protected]
>

--
--Guido van Rossum (python.org/~guido)
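
For readers following the thread, the behaviour under discussion can be sketched roughly as below. This is only an illustration under stated assumptions, not code from either poster: the names worker, log and run_demo are hypothetical, it uses "async def" rather than the "yield from" generators of the 3.4 era, and it uses asyncio.ensure_future(), the later name of asyncio.async() (which can no longer be spelled that way now that "async" is a keyword). It creates one task in each API style mentioned above, a loop method and a module-level function taking a loop, and records that neither task executes before the loop is actually run, so callbacks can still be attached after creation.

```python
import asyncio

log = []  # hypothetical event log, used only to record ordering


async def worker(name):
    # Runs only once the event loop gives this task a turn.
    log.append(name + " ran")
    return name


def run_demo():
    log.clear()
    loop = asyncio.new_event_loop()
    try:
        # Method style: scheduling expressed as a method of the loop object.
        t1 = loop.create_task(worker("t1"))
        # Function style: module-level function taking an optional loop.
        t2 = asyncio.ensure_future(worker("t2"), loop=loop)
        # Both tasks are scheduled already (there is no separate start()),
        # but the loop is not running yet, so neither has executed.
        log.append("tasks created")
        for task in (t1, t2):
            task.add_done_callback(
                lambda fut: log.append(fut.result() + " callback"))
        log.append("callbacks added")
        # Only now do the coroutines get a chance to run.
        results = [loop.run_until_complete(t1),
                   loop.run_until_complete(t2)]
    finally:
        loop.close()
    return results, list(log)
```

Calling run_demo() returns the two task results together with the recorded log; in the log, "callbacks added" appears before either worker has run, which is the property the quoted reply relies on and which Paul argues is an implementation detail of the event loop.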
