On 16 Jan 2010, at 00:56, Anh Hai Trinh wrote:

I'm not sure that I'd agree with the simpler API part though :-)

I was referring to your old API. Still, we are both obviously very
biased here :-p

For sure. I'm definitely used to looking at Future-style code so I find the model intuitive.

Does ThreadPool use some
sort of balancing strategy if poolsize were set to < len(URLs)?

Yes, of course! Otherwise it wouldn't really qualify as a pool.

"retrieve" seems to take multiple url arguments.

Correct. `retrieve` is simply a generator that retrieves URLs
sequentially; the ThreadPool distributes the input stream so that each
worker gets an iterator over its workload.
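
Roughly, it works like this sketch using nothing but the threading
module (this is not the package's actual code; pool_map and this toy
retrieve are stand-ins): each worker drives the processing function
over its own view of one shared, locked input iterator, so a faster
worker simply comes back for the next item sooner; that is all the
balancing there is.

import threading

def retrieve(urls):
    # Stand-in for the real retrieve(): process URLs one by one.
    for url in urls:
        yield url, len(url)   # pretend len(url) is the fetched content

def pool_map(func, iterable, poolsize):
    lock = threading.Lock()
    source = iter(iterable)
    results = []

    def shared():
        # A thread-safe view of the one shared input iterator.
        while True:
            with lock:
                try:
                    item = next(source)
                except StopIteration:
                    return
            yield item

    def worker():
        for result in func(shared()):
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(poolsize)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results   # in completion order, i.e. out of order

print(pool_map(retrieve, ['http://a', 'http://bb', 'http://ccc'], 2))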

That's a neat idea - it saves you the overhead of a function call.

If finer-grained job control is necessary, an Executor can be used. It
is implemented on top of the pool and offers submit(*items), which
returns job ids to be used for cancel() and status().  Jobs can be
submitted and canceled concurrently.

What type is each "item" supposed to be?

Whatever your iterator-processing function is supposed to process.
The URLs example can be written using an Executor as:

e = Executor(ThreadPool, retrieve)
e.submit(*URLs)
e.close()
print list(e.result)
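
And job control might look like this (hypothetical: the exact
signatures of cancel() and status() may differ, I am only assuming
here that each takes one of the ids returned by submit()):

e = Executor(ThreadPool, retrieve)
ids = e.submit(*URLs)    # one job id per submitted item
e.cancel(ids[0])         # assuming cancel() takes a single job id
e.status(ids[1])         # assuming status() does too
e.close()
print list(e.result)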

There are two common scenarios where I have seen Future-like things used:

1. Do the same operation on different data, e.g. copy some local files
   to several remote servers.
2. Do several different operations on different data, e.g.
   parallelizing code like this:

db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)

I'm trying to get a handle on how streams accommodates the second case. With futures, I would write something like this:

db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, request.body)
# Maybe do something else here.
wait(
    [db_future, data_future],
    timeout=10,
    # If either function raises then we can't complete the operation so
    # there is no reason to make the user wait.
    return_when=FIRST_EXCEPTION)

db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)
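
The first case needs no coordination between the operations; with the
same API it is just a loop over submit() (copy_to, local_files and
remote_servers are hypothetical names):

copy_futures = [executor.submit(copy_to, f, server)
                for f in local_files
                for server in remote_servers]
# Block until every copy has finished (or raised).
wait(copy_futures)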

Cheers,
Brian


Can I wait on several items?

Do you mean wait for several particular input values to be completed?
As of this moment, yes, but rather inefficiently. I have not considered
it a useful feature, especially when taking a wholesale,
list-processing view: a worker pool processes its input stream
_out of order_.  If you just want to wait for several particular
items, it means you need their outputs _in order_, so why would you
want to use a worker pool in the first place?

However, I'd be happy to implement something like
Executor.submit(*items, wait=True).
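
Usage would mirror the example above (hypothetical, since wait=True
is not implemented; url1 and url2 are whatever items the processing
function expects):

e = Executor(ThreadPool, retrieve)
e.submit(url1, url2, wait=True)   # would block until both complete
e.close()
print list(e.result)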

Cheers,
aht
_______________________________________________
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig
