On Nov 8, 2009, at 7:01 PM, Jeffrey Yasskin wrote:

Did you mean to drop the list? Feel free to cc them back in when you reply.

No, that was a brain malfunction. Redirecting the discussion to the list.

On Sat, Nov 7, 2009 at 3:31 PM, Brian Quinlan <br...@sweetapp.com> wrote:

On 8 Nov 2009, at 06:37, Jeffrey Yasskin wrote:

On Sat, Nov 7, 2009 at 7:32 AM, Jesse Noller <jnol...@gmail.com> wrote:

On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou <solip...@pitrou.net >
wrote:

Which API? My comment wasn't aimed at the API of the package - in the time I got to scan it last night nothing jumped out at me as overly
offensive API-wise.

Not offensive, but probably too complicated if it's meant to be a simple
helper. Anyway, let's wait for the PEP.


The PEP is right here:

http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt

I'm interested in hearing specific complaints about the API in the
context of what it's trying to *do*. The only thing which jumped out
at me was the number of methods on FutureList; but then again, each
one of those makes conceptual sense, even if they are verbose -
they're explicit on what's being done.

Overall, I like the idea of having futures in the standard library,
and I like the idea of pulling common bits of multiprocessing and
threading into a concurrent.* package. Here's my
stream-of-consciousness review of the PEP. I'll try to ** things that
really affect the API.

The "Interface" section should start with a conceptual description of what Executor, Future, and FutureList are. Something like "An Executor
is an object you can hand tasks to, which will run them for you,
usually in another thread or process. A Future represents a task that may or may not have completed yet, and which can be waited for and its
value or exception queries. A FutureList is ... <haven't read that
far>."

** The Executor interface is pretty redundant, and it's missing the
most basic call. Fundamentally, all you need is an
Executor.execute(callable) method returning None,

How do you extract the results?

To implement submit in terms of execute, you write something like:

def submit(executor, callable):
 future = Future()
 def worker():
   try:
     result = callable()
   except:
     future.set_exception(sys.exc_info())
   else:
     future.set_value(result)
 executor.execute(worker)
 return future

I see. I'm not sure if that abstraction is useful but I get it now.

and all the
future-oriented methods can be built on top of that. I'd support using
Executor.submit(callable) for the simplest method instead, which
returns a Future, but there should be some way for implementers to
only implement execute() and get submit either for free or with a
1-line definition. (I'm using method names from

http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
in case I'm unclear about the semantics here.) run_to_futures,
run_to_results, and map should be implementable on top of the Future
interface, and shouldn't need to be methods on Executor. I'd recommend
they be demoted to helper functions in the concurrent.futures module
unless there's a reason they need to be methods, and that reason
should be documented in the PEP.

** run_to_futures() shouldn't take a return_when argument. It should
be possible to wait for those conditions on any list of Futures.
(_not_ just a FutureList)

I packaged up Futures into FutureLists to fix an annoyance that I have with
the Java implementation - you have all of these Future objects but no
convenient way of operating over them.

Yep, I totally agree with that annoyance. Note, though, that Java has
the CompletionService to support nearly same use cases as
run_to_futures.

CompletionService's use case is handling results as they finish (just like the callbacks do in Deferreds).

The FutureList use case is querying e.g. which callables raised, which returned, which are still running?

I made the FutureList the unit of waiting because:
1. I couldn't think of a use case where this wasn't sufficient

Take your webcrawl example. In a couple years, when Futures are widely
accepted, it's quite possible that urllib.request.urlopen() will
return a Future instead of a file. Then I'd like to request a bunch of
URLs and process each as they come back. With the run_to_futures (or
CompletionService) API, urllib would instead have to take a set of
requests to open at once, which makes its API much harder to design.
With a wait-for-any function, urllib could continue to return a single
Future and let its users combine several results.

If we go down this road then we should just switch to Twisted :-)

Seriously, the idea is that no one would ever change their API to accommodate futures - they are a way of making a library with no notion of concurrency concurrent.

But I am starting to be convinced that individual futures are a good idea because it makes the run/submit method easier to use.

Alternately, say you have an RPC system returning Futures. You've sent
off RPCs A, B, and C. Now you need two separate subsystems D and E to
do something with the results, except that D can continue when either
A or B finishes, but E can continue when either B or C finishes. Can D
and E express just what they need to express, or do they have to deal
with futures they don't really care about?

2. It makes the internal locking semantics a bit easier and faster (if you can wait on any future then the wait has to acquire a lock for every future [in a consistent order to prevent deadlocks when other threads are doing the same thing with an intersecting set of futures], add a result listener for
each and then great some sort of object to aggregate their state)

Yep. I suspect the extra overhead isn't significant compared to the
cost of scheduling threads.

But I am thinking that maybe FutureLists aren't the right abstraction.

The code sample looks like Executor is a context manager. What does
its __exit__ do? shutdown()? shutdown&awaitTermination? I prefer
waiting in Executor.__exit__, since that makes it easier for users to
avoid having tasks run after they've cleaned up data those tasks
depend on. But that could be my C++ bias, where we have to be sure to
free memory in the right places. Frank, does Java run into any
problems with people cleaning things up that an Executor's tasks
depend on without awaiting for the Executor first?

shutdown should explain why it's important. Specifically, since the
Executor controls threads, and those threads hold a reference to the
Executor, nothing will get garbage collected without the explicit
call.

Actually, the threads hold a weakref to the Executor so they can exit (when the work queue is empty) if the Executor is collected. Here is the code from
futures/thread.py:

 while True:
   try:
       work_item = work_queue.get(block=True, timeout=0.1)
   except queue.Empty:
       executor = executor_reference()
       # Exit if:
       #   - The interpreter is shutting down OR
       #   - The executor that owns the worker has been collected OR
       #   - The executor that owns the worker has been shutdown.
       if _shutdown or executor is None or executor._shutdown:
           return

Oh, yeah, that sounds like it might work. So why does shutdown exist?

It does work - there are tests and everything :-)

.shutdown exists for the same reason that .close exists on files:
- Python does not guarantee any particular GC strategy
- tracebacks and other objects may retain a reference in an unexpected way
- sometimes you want to free your resources before the function exits


** What happens when FutureList.wait(FIRST_COMPLETED) is called twice?
Does it return immediately the second time? Does it wait for the
second task to finish? I'm inclined to think that FutureList should go
away and be replaced by functions that just take lists of Futures.

It waits until a new future is completed.

That seems confusing, since it's no longer the "FIRST" completed.

Maybe "NEXT_COMPLETED" would be better.

Cheers,
Brian


In general, I think the has_done_futures(), exception_futures(), etc. are fine even though their results may be out of date by the time you inspect them. That's because any individual Future goes monotonically
from not-started->running->(exception|value), so users can take
advantage of even an out-of-date done_futures() result. However, it's dangerous to have several query functions, since users may think that
running_futures() `union` done_futures() `union` cancelled_futures()
covers the whole FutureList, but instead a Future can move between two
of the sets between two of those calls. Instead, perhaps an atomic
partition() function would be better, which returns a collection of
sub-lists that cover the whole original set.

I would rename result() to get() (or maybe Antoine's suggestion of
__call__) to match Java. I'm not sure exception() needs to exist.

--- More general points ---

** Java's Futures made a mistake in not supporting work stealing, and
this has caused deadlocks at Google. Specifically, in a bounded-size
thread or process pool, when a task in the pool can wait for work
running in the same pool, you can fill up the pool with tasks that are
waiting for tasks that haven't started running yet. To avoid this,
Future.get() should be able to steal the task it's waiting on out of
the pool's queue and run it immediately.

** I think both the Future-oriented blocking model and the
callback-based model Deferreds support are important for different
situations. Futures tend to be easier to program with, while Deferreds
use fewer threads and can have more predictable latencies. It should
be possible to create a Future from a Deferred or a Deferred from a
Future without taking up a thread.

_______________________________________________
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig

Reply via email to