Re: [stdlib-sig] futures - a new package for asynchronous execution

Brian Quinlan Sun, 08 Nov 2009 23:41:38 -0800


On Nov 8, 2009, at 7:01 PM, Jeffrey Yasskin wrote:

Did you mean to drop the list? Feel free to cc them back in when you reply.

No, that was a brain malfunction. Redirecting the discussion to the list.

On Sat, Nov 7, 2009 at 3:31 PM, Brian Quinlan <br...@sweetapp.com> wrote:
On 8 Nov 2009, at 06:37, Jeffrey Yasskin wrote:
On Sat, Nov 7, 2009 at 7:32 AM, Jesse Noller <jnol...@gmail.com> wrote:
On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou <solip...@pitrou.net>
wrote:
Which API? My comment wasn't aimed at the API of the package - in the
time I got to scan it last night nothing jumped out at me as overly
offensive API-wise.
Not offensive, but probably too complicated if it's meant to be a simple
helper. Anyway, let's wait for the PEP.
The PEP is right here:

http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt

I'm interested in hearing specific complaints about the API in the
context of what it's trying to *do*. The only thing which jumped out
at me was the number of methods on FutureList; but then again, each
one of those makes conceptual sense, even if they are verbose -
they're explicit on what's being done.
Overall, I like the idea of having futures in the standard library,
and I like the idea of pulling common bits of multiprocessing and
threading into a concurrent.* package. Here's my
stream-of-consciousness review of the PEP. I'll try to ** things that
really affect the API.
The "Interface" section should start with a conceptual description of
what Executor, Future, and FutureList are. Something like "An Executor
is an object you can hand tasks to, which will run them for you,
usually in another thread or process. A Future represents a task that
may or may not have completed yet, and which can be waited for and its
value or exception queried. A FutureList is ... <haven't read that
far>."

** The Executor interface is pretty redundant, and it's missing the
most basic call. Fundamentally, all you need is an
Executor.execute(callable) method returning None,
How do you extract the results?
To implement submit in terms of execute, you write something like:

def submit(executor, callable):
 future = Future()
 def worker():
   try:
     result = callable()
   except:
     future.set_exception(sys.exc_info())
   else:
     future.set_value(result)
 executor.execute(worker)
 return future
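
A runnable variant of the snippet above, as a minimal sketch. It assumes
the Future class from the concurrent.futures module as it later shipped
in the standard library, whose setter is named set_result() rather than
the set_value() used in this thread. ThreadPerTaskExecutor is a
hypothetical executor invented for the illustration; its only primitive
is execute().

```python
import threading
from concurrent.futures import Future


class ThreadPerTaskExecutor:
    """Hypothetical executor whose only primitive is execute(fn) -> None."""

    def execute(self, fn):
        threading.Thread(target=fn).start()


def submit(executor, fn):
    """Build submit() on top of execute() by completing a Future in the worker."""
    future = Future()

    def worker():
        try:
            result = fn()
        except Exception as exc:
            # Hand the exception to whoever asks the future for its outcome.
            future.set_exception(exc)
        else:
            future.set_result(result)

    executor.execute(worker)
    return future


ok = submit(ThreadPerTaskExecutor(), lambda: 6 * 7)
bad = submit(ThreadPerTaskExecutor(), lambda: 1 / 0)
print(ok.result())                      # blocks until the worker has finished
print(type(bad.exception()).__name__)   # the exception is stored, not raised here
```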


I see. I'm not sure if that abstraction is useful but I get it now.

and all the future-oriented methods can be built on top of that. I'd
support using Executor.submit(callable) for the simplest method instead,
which returns a Future, but there should be some way for implementers to
only implement execute() and get submit either for free or with a
1-line definition. (I'm using method names from
http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
in case I'm unclear about the semantics here.) run_to_futures,
run_to_results, and map should be implementable on top of the Future
interface, and shouldn't need to be methods on Executor. I'd recommend
they be demoted to helper functions in the concurrent.futures module
unless there's a reason they need to be methods, and that reason
should be documented in the PEP.

** run_to_futures() shouldn't take a return_when argument. It should
be possible to wait for those conditions on any list of Futures.
(_not_ just a FutureList)

I packaged up Futures into FutureLists to fix an annoyance that I have
with the Java implementation - you have all of these Future objects but
no convenient way of operating over them.


Yep, I totally agree with that annoyance. Note, though, that Java has
the CompletionService to support nearly the same use cases as
run_to_futures.

CompletionService's use case is handling results as they finish (just
like the callbacks do in Deferreds).

The FutureList use case is querying e.g. which callables raised, which
returned, which are still running?

I made the FutureList the unit of waiting because:
1. I couldn't think of a use case where this wasn't sufficient


Take your webcrawl example. In a couple years, when Futures are widely
accepted, it's quite possible that urllib.request.urlopen() will
return a Future instead of a file. Then I'd like to request a bunch of
URLs and process each as they come back. With the run_to_futures (or
CompletionService) API, urllib would instead have to take a set of
requests to open at once, which makes its API much harder to design.
With a wait-for-any function, urllib could continue to return a single
Future and let its users combine several results.
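
A minimal sketch of "request a bunch of URLs and process each as it comes
back", written against the concurrent.futures module as it later shipped
(ThreadPoolExecutor and as_completed). fake_fetch and the example.com
URLs are hypothetical stand-ins so the sketch runs without network
access; each call produces a single Future and the caller combines them.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url, delay):
    """Hypothetical stand-in for a urlopen() call that takes `delay` seconds."""
    time.sleep(delay)
    return url


# URL -> simulated latency in seconds (made-up values).
requests = {
    "http://a.example.com/": 0.30,
    "http://b.example.com/": 0.05,
    "http://c.example.com/": 0.15,
}

arrival_order = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_fetch, url, delay) for url, delay in requests.items()]
    # as_completed() yields each future as soon as it finishes,
    # regardless of the order in which the requests were submitted.
    for future in as_completed(futures):
        arrival_order.append(future.result())

print(arrival_order)
```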


If we go down this road then we should just switch to Twisted :-)

Seriously, the idea is that no one would ever change their API to
accommodate futures - they are a way of making a library with no
notion of concurrency concurrent.

But I am starting to be convinced that individual futures are a good
idea because it makes the run/submit method easier to use.

Alternately, say you have an RPC system returning Futures. You've sent
off RPCs A, B, and C. Now you need two separate subsystems D and E to
do something with the results, except that D can continue when either
A or B finishes, but E can continue when either B or C finishes. Can D
and E express just what they need to express, or do they have to deal
with futures they don't really care about?
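
The D/E scenario can be sketched with a wait function that accepts any
collection of futures. This assumes concurrent.futures.wait as it later
shipped; the hand-built Future objects and the timers that complete them
are hypothetical stand-ins for an RPC system. RPC B never completes in
this sketch, which shows that neither subsystem is blocked by a future
it can do without.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, Future, wait

# Stand-ins for three outstanding RPCs; a real RPC layer would create these.
rpc_a, rpc_b, rpc_c = Future(), Future(), Future()
threading.Timer(0.05, rpc_c.set_result, args=["reply C"]).start()
threading.Timer(0.10, rpc_a.set_result, args=["reply A"]).start()

# Subsystem E only cares about B and C: it resumes as soon as C replies.
e_done, _ = wait([rpc_b, rpc_c], return_when=FIRST_COMPLETED)

# Subsystem D only cares about A and B: it resumes as soon as A replies.
d_done, _ = wait([rpc_a, rpc_b], return_when=FIRST_COMPLETED)

print([f.result() for f in e_done], [f.result() for f in d_done])
```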

2. It makes the internal locking semantics a bit easier and faster (if
you can wait on any future then the wait has to acquire a lock for
every future [in a consistent order to prevent deadlocks when other
threads are doing the same thing with an intersecting set of futures],
add a result listener for each and then create some sort of object to
aggregate their state)


Yep. I suspect the extra overhead isn't significant compared to the
cost of scheduling threads.

But I am thinking that maybe FutureLists aren't the right abstraction.

The code sample looks like Executor is a context manager. What does
its __exit__ do? shutdown()? shutdown&awaitTermination? I prefer
waiting in Executor.__exit__, since that makes it easier for users to
avoid having tasks run after they've cleaned up data those tasks
depend on. But that could be my C++ bias, where we have to be sure to
free memory in the right places. Frank, does Java run into any
problems with people cleaning things up that an Executor's tasks
depend on without first waiting for the Executor?
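
A minimal sketch of the "wait in __exit__" behaviour, assuming the
ThreadPoolExecutor from concurrent.futures as it later shipped, whose
__exit__ calls shutdown(wait=True). The names scratch and observed are
made up for the illustration; the point is that the cleanup after the
with block cannot race with the tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

scratch = {"ready": True}   # data the tasks depend on
observed = []


def task():
    time.sleep(0.05)
    observed.append(scratch["ready"])


with ThreadPoolExecutor(max_workers=2) as pool:
    for _ in range(4):
        pool.submit(task)

# Leaving the with block waited for all four tasks,
# so it is now safe to clean up what they were reading.
scratch.clear()
print(observed)
```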

shutdown should explain why it's important. Specifically, since the
Executor controls threads, and those threads hold a reference to the
Executor, nothing will get garbage collected without the explicit
call.

Actually, the threads hold a weakref to the Executor so they can exit
(when the work queue is empty) if the Executor is collected. Here is the
code from futures/thread.py:

 while True:
   try:
       work_item = work_queue.get(block=True, timeout=0.1)
   except queue.Empty:
       executor = executor_reference()
       # Exit if:
       #   - The interpreter is shutting down OR
       #   - The executor that owns the worker has been collected OR
       #   - The executor that owns the worker has been shutdown.
       if _shutdown or executor is None or executor._shutdown:
           return


Oh, yeah, that sounds like it might work. So why does shutdown exist?


It does work - there are tests and everything :-)

.shutdown exists for the same reason that .close exists on files:
- Python does not guarantee any particular GC strategy

- tracebacks and other objects may retain a reference in an unexpected way

- sometimes you want to free your resources before the function exits

** What happens when FutureList.wait(FIRST_COMPLETED) is called twice?
Does it return immediately the second time? Does it wait for the
second task to finish? I'm inclined to think that FutureList should go
away and be replaced by functions that just take lists of Futures.
It waits until a new future is completed.
That seems confusing, since it's no longer the "FIRST" completed.


Maybe "NEXT_COMPLETED" would be better.
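
For comparison, a sketch of how the two readings look when waiting is a
plain function over a collection of futures. It assumes
concurrent.futures.wait as it later shipped; the hand-built futures and
timers are hypothetical stand-ins for real tasks. Waiting on the full
collection again returns immediately with the already finished future,
while waiting on the returned not-done set gives the "next completed"
behaviour.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, Future, wait

first, second = Future(), Future()

threading.Timer(0.05, first.set_result, args=["first"]).start()
done, pending = wait([first, second], return_when=FIRST_COMPLETED)

# Same collection again: returns at once, because `first` is still completed.
again, _ = wait([first, second], return_when=FIRST_COMPLETED)

# Only the remaining futures: blocks until the next one completes.
threading.Timer(0.05, second.set_result, args=["second"]).start()
next_done, pending = wait(pending, return_when=FIRST_COMPLETED)

print([f.result() for f in done], [f.result() for f in next_done])
```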

Cheers,
Brian

In general, I think the has_done_futures(), exception_futures(), etc.
are fine even though their results may be out of date by the time you
inspect them. That's because any individual Future goes monotonically
from not-started->running->(exception|value), so users can take
advantage of even an out-of-date done_futures() result. However, it's
dangerous to have several query functions, since users may think that
running_futures() `union` done_futures() `union` cancelled_futures()
covers the whole FutureList, but instead a Future can move between two
of the sets between two of those calls. Instead, perhaps an atomic
partition() function would be better, which returns a collection of
sub-lists that cover the whole original set.
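
A rough sketch of such a partition() over a plain list of futures,
assuming the Future class from concurrent.futures as it later shipped.
The function name and bucket names are made up here. It is not atomic
across the whole list; it only guarantees that every future lands in
exactly one bucket, which works because done() is a final state and is
checked first.

```python
from concurrent.futures import Future


def partition(futures):
    """Put each future into exactly one bucket, classifying it once."""
    buckets = {"pending": [], "running": [], "cancelled": [],
               "raised": [], "returned": []}
    for future in futures:
        if future.done():
            # A done future never changes again, so these checks are stable.
            if future.cancelled():
                buckets["cancelled"].append(future)
            elif future.exception() is not None:
                buckets["raised"].append(future)
            else:
                buckets["returned"].append(future)
        elif future.running():
            buckets["running"].append(future)
        else:
            # May already be stale, but the future is still counted once.
            buckets["pending"].append(future)
    return buckets


# Hand-built futures in each state, for illustration only.
pending, running, cancelled, raised, returned = (Future() for _ in range(5))
running.set_running_or_notify_cancel()
cancelled.cancel()
raised.set_exception(ValueError("boom"))
returned.set_result(1)

all_futures = [pending, running, cancelled, raised, returned]
result = partition(all_futures)
print({name: len(members) for name, members in result.items()})
```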

I would rename result() to get() (or maybe Antoine's suggestion of
__call__) to match Java. I'm not sure exception() needs to exist.

--- More general points ---
** Java's Futures made a mistake in not supporting work stealing, and
this has caused deadlocks at Google. Specifically, in a bounded-size
thread or process pool, when a task in the pool can wait for work
running in the same pool, you can fill up the pool with tasks that are
waiting for tasks that haven't started running yet. To avoid this,
Future.get() should be able to steal the task it's waiting on out of
the pool's queue and run it immediately.
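
A toy sketch of the idea, not the PEP's or Java's implementation: Task
and StealingPool are hypothetical classes written for this illustration.
Instead of physically removing the task from the queue, get() claims it
and runs it inline; a worker that later dequeues a claimed task skips
it. With a single worker, the nested wait in outer() would hang without
the steal.

```python
import queue
import threading


class Task:
    """Result holder that lets exactly one thread run the underlying call."""

    def __init__(self, fn):
        self._fn = fn
        self._claim_lock = threading.Lock()
        self._claimed = False
        self._finished = threading.Event()
        self._result = None
        self._error = None

    def run_if_unclaimed(self):
        # Only the first caller (pool worker or stealing waiter) runs the task.
        with self._claim_lock:
            if self._claimed:
                return
            self._claimed = True
        try:
            self._result = self._fn()
        except Exception as exc:
            self._error = exc
        finally:
            self._finished.set()

    def get(self, steal=True):
        if steal:
            # Work stealing: run the task here if no worker has started it.
            self.run_if_unclaimed()
        self._finished.wait()
        if self._error is not None:
            raise self._error
        return self._result


class StealingPool:
    """Toy bounded pool with a fixed number of worker threads."""

    def __init__(self, workers):
        self._queue = queue.Queue()
        self._threads = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(workers)]
        for thread in self._threads:
            thread.start()

    def _work(self):
        while True:
            task = self._queue.get()
            if task is None:          # shutdown sentinel
                return
            task.run_if_unclaimed()   # no-op if a waiter already stole it

    def submit(self, fn):
        task = Task(fn)
        self._queue.put(task)
        return task

    def shutdown(self):
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()


pool = StealingPool(workers=1)


def outer():
    inner = pool.submit(lambda: 21)
    # The only worker is busy running outer(); get() runs inner right here.
    return inner.get() * 2


answer = pool.submit(outer).get(steal=False)   # the main thread just waits
pool.shutdown()
print(answer)
```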

** I think both the Future-oriented blocking model and the
callback-based model Deferreds support are important for different
situations. Futures tend to be easier to program with, while Deferreds
use fewer threads and can have more predictable latencies. It should
be possible to create a Future from a Deferred or a Deferred from a
Future without taking up a thread.
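
A sketch of both conversions done purely with callbacks, so no thread is
parked waiting. MiniDeferred is a hypothetical, heavily simplified
stand-in and is not Twisted's Deferred API; the Future side assumes
concurrent.futures as it later shipped (set_result and
add_done_callback). Error propagation is left out to keep the sketch
short.

```python
from concurrent.futures import Future


class MiniDeferred:
    """Hypothetical callback holder: fire(value) runs the registered callbacks."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._value = None

    def add_callback(self, fn):
        if self._fired:
            fn(self._value)
        else:
            self._callbacks.append(fn)

    def fire(self, value):
        self._fired = True
        self._value = value
        for fn in self._callbacks:
            fn(value)
        self._callbacks.clear()


def future_from_deferred(deferred):
    """The deferred's callback completes the future; nothing blocks."""
    future = Future()
    deferred.add_callback(future.set_result)
    return future


def deferred_from_future(future):
    """The future's done-callback fires the deferred; nothing blocks."""
    deferred = MiniDeferred()
    future.add_done_callback(lambda done: deferred.fire(done.result()))
    return deferred


source_deferred = MiniDeferred()
bridged_future = future_from_deferred(source_deferred)
was_done_before_fire = bridged_future.done()
source_deferred.fire(5)

source_future = Future()
seen = []
deferred_from_future(source_future).add_callback(seen.append)
source_future.set_result(7)

print(bridged_future.result(), seen)
```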

