On Sat, Nov 7, 2009 at 7:32 AM, Jesse Noller <jnol...@gmail.com> wrote: > On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou <solip...@pitrou.net> wrote: >> >>> Which API? My comment wasn't aimed at the API of the package - in the >>> time I got to scan it last night nothing jumped out at me as overly >>> offensive API-wise. >> >> Not offensive, but probably too complicated if it's meant to be a simple >> helper. Anyway, let's wait for the PEP. > > > The PEP is right here: > > http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt > > I'm interested in hearing specific complaints about the API in the > context of what it's trying to *do*. The only thing which jumped out > at me was the number of methods on FutureList; but then again, each > one of those makes conceptual sense, even if they are verbose - > they're explicit on what's being done.
Overall, I like the idea of having futures in the standard library, and I like the idea of pulling common bits of multiprocessing and threading into a concurrent.* package. Here's my stream-of-consciousness review of the PEP. I'll try to ** things that really affect the API. The "Interface" section should start with a conceptual description of what Executor, Future, and FutureList are. Something like "An Executor is an object you can hand tasks to, which will run them for you, usually in another thread or process. A Future represents a task that may or may not have completed yet, and which can be waited for and its value or exception queries. A FutureList is ... <haven't read that far>." ** The Executor interface is pretty redundant, and it's missing the most basic call. Fundamentally, all you need is an Executor.execute(callable) method returning None, and all the future-oriented methods can be built on top of that. I'd support using Executor.submit(callable) for the simplest method instead, which returns a Future, but there should be some way for implementers to only implement execute() and get submit either for free or with a 1-line definition. (I'm using method names from http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html in case I'm unclear about the semantics here.) run_to_futures, run_to_results, and map should be implementable on top of the Future interface, and shouldn't need to be methods on Executor. I'd recommend they be demoted to helper functions in the concurrent.futures module unless there's a reason they need to be methods, and that reason should be documented in the PEP. ** run_to_futures() shouldn't take a return_when argument. It should be possible to wait for those conditions on any list of Futures. (_not_ just a FutureList) The code sample looks like Executor is a context manager. What does its __exit__ do? shutdown()? shutdown&awaitTermination? I prefer waiting in Executor.__exit__, since that makes it easier for users to avoid having tasks run after they've cleaned up data those tasks depend on. But that could be my C++ bias, where we have to be sure to free memory in the right places. Frank, does Java run into any problems with people cleaning things up that an Executor's tasks depend on without awaiting for the Executor first? shutdown should explain why it's important. Specifically, since the Executor controls threads, and those threads hold a reference to the Executor, nothing will get garbage collected without the explicit call. ** What happens when FutureList.wait(FIRST_COMPLETED) is called twice? Does it return immediately the second time? Does it wait for the second task to finish? I'm inclined to think that FutureList should go away and be replaced by functions that just take lists of Futures. In general, I think the has_done_futures(), exception_futures(), etc. are fine even though their results may be out of date by the time you inspect them. That's because any individual Future goes monotonically from not-started->running->(exception|value), so users can take advantage of even an out-of-date done_futures() result. However, it's dangerous to have several query functions, since users may think that running_futures() `union` done_futures() `union` cancelled_futures() covers the whole FutureList, but instead a Future can move between two of the sets between two of those calls. Instead, perhaps an atomic partition() function would be better, which returns a collection of sub-lists that cover the whole original set. I would rename result() to get() (or maybe Antoine's suggestion of __call__) to match Java. I'm not sure exception() needs to exist. --- More general points --- ** Java's Futures made a mistake in not supporting work stealing, and this has caused deadlocks at Google. Specifically, in a bounded-size thread or process pool, when a task in the pool can wait for work running in the same pool, you can fill up the pool with tasks that are waiting for tasks that haven't started running yet. To avoid this, Future.get() should be able to steal the task it's waiting on out of the pool's queue and run it immediately. ** I think both the Future-oriented blocking model and the callback-based model Deferreds support are important for different situations. Futures tend to be easier to program with, while Deferreds use fewer threads and can have more predictable latencies. It should be possible to create a Future from a Deferred or a Deferred from a Future without taking up a thread. _______________________________________________ stdlib-sig mailing list stdlib-sig@python.org http://mail.python.org/mailman/listinfo/stdlib-sig