On 16 Jan 2010, at 00:56, Anh Hai Trinh wrote:
>> I'm not sure that I'd agree with the simpler API part though :-)
> I was referring to your old API. Still, we are both obviously very
> biased here :-p
For sure. I'm definitely used to looking at Future-style code, so I
find the model intuitive.
>> Does ThreadPool use some sort of balancing strategy if poolsize were
>> set to < len(URLs)?
> Yes, of course! Otherwise it wouldn't really qualify as a pool.
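That makes sense. For anyone following along, the usual way to get that
balancing is to have every worker pull from one shared queue; here is a
toy sketch of the idea (my code, certainly not the stream library's
actual internals):

import threading
from Queue import Queue

def pool_map(func, items, poolsize):
    # Every worker pulls from one shared queue, so poolsize <
    # len(items) balances automatically: whichever thread is free
    # takes the next item.
    q, results, lock = Queue(), [], threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more work
                return
            out = func(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(poolsize)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for t in threads:
        q.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return results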
"retrieve" seems to take multiple url arguments.
> Correct. `retrieve` is simply a generator that retrieves URLs
> sequentially; the ThreadPool distributes the input stream so that each
> worker gets an iterator over its workload.
That's a neat idea - it saves you the overhead of a function call per
item.
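If I've understood, each worker invokes the generator once over its own
share of the input, so you pay one call per worker rather than one per
item. Something in this shape, I imagine (the urllib2 use is my guess
at what retrieve does):

import urllib2

def retrieve(urls):
    # Called once per worker with an iterator over that worker's
    # share of the URLs; from then on it is plain sequential iteration.
    for url in urls:
        yield url, urllib2.urlopen(url).read()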
> If delicate job control is necessary, an Executor can be used. It is
> implemented on top of the pool, and offers submit(*items), which
> returns job ids to be used for cancel() and status(). Jobs can be
> submitted and cancelled concurrently.
>> What type is each "item" supposed to be?
> Whatever your iterator-processing function is supposed to process.
> The URLs example can be written using an Executor as:
>
>   e = Executor(ThreadPool, retrieve)
>   e.submit(*URLs)
>   e.close()
>   print list(e.result)
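Just to check my reading of the job-control surface, usage would be
roughly this (the return shapes and status values are my guesses):

e = Executor(ThreadPool, retrieve)
job_ids = e.submit(*URLs)    # one id per submitted item?
e.cancel(job_ids[0])         # drop the first URL before it runs
print e.status(job_ids[1])   # some pending/running/completed state
e.close()                    # no more submissions
print list(e.result)         # results, possibly out of order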
There are two common scenarios where I have seen Future-like things
used:
1. Do the same operation on different data, e.g. copy some local files
to several remote servers
2. Do several different operations on different data, e.g. parallelizing
code like this:
db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)
I'm trying to get a handle on how streams accommodate the second
case. With futures, I would write something like this:
db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, request.body)
# Maybe do something else here.
wait(
    [db_future, data_future],
    timeout=10,
    # If either function raises then we can't complete the operation so
    # there is no reason to make the user wait.
    return_when=FIRST_EXCEPTION)
db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)
Cheers,
Brian
> Can I wait on several items?
Do you mean wait for several particular input values to be completed?
As of this moment, yes, but rather inefficiently. I have not considered
it a useful feature, especially when taking a wholesale,
list-processing view: a worker pool processes its input stream
_out of order_. If you just want to wait for several particular items,
it means you need their outputs _in order_, so why would you want to
use a worker pool in the first place?
However, I'd be happy to implement something like
Executor.submit(*items, wait=True).
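Until then, a caller could get most of the way there by polling with
what already exists; a rough sketch (the status() values here are made
up for illustration):

import time

def wait_for(executor, job_ids, poll_interval=0.1):
    # Block until every one of the given jobs has finished.
    pending = set(job_ids)
    while pending:
        pending = set(j for j in pending
                      if executor.status(j) not in ('completed', 'cancelled'))
        if pending:
            time.sleep(poll_interval)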
Cheers,
aht
_______________________________________________
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig