On 16 Jan 2010, at 00:56, Anh Hai Trinh wrote:
>> I'm not sure that I'd agree with the simpler API part though :-)
> I was referring to your old API. Still, we are both obviously very
> biased here :-p
For sure. I'm definitely used to looking at Future-style code, so I
find the model intuitive.
>> Does ThreadPool use some sort of balancing strategy if poolsize were
>> set to < len(URLs)?
> Yes, of course! Otherwise it wouldn't really qualify as a pool.
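That makes sense. For anyone following along, the usual way to get that
balancing is to have every worker pull from one shared queue; here is a
toy sketch of the idea (my code, certainly not the stream library's
actual internals):

import threading
from Queue import Queue

def pool_map(func, items, poolsize):
    # Every worker pulls from one shared queue, so poolsize <
    # len(items) balances automatically: whichever thread is free
    # takes the next item.
    q, results, lock = Queue(), [], threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:      # sentinel: no more work
                return
            out = func(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(poolsize)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for t in threads:
        q.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return results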
"retrieve" seems to take multiple url arguments.
> Correct. `retrieve` is simply a generator that retrieves URLs
> sequentially; the ThreadPool distributes the input stream so that each
> worker gets an iterator over its workload.
That's a neat idea - it saves you the overhead of a function call per
item.
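If I've understood, each worker invokes the generator once over its own
share of the input, so you pay one call per worker rather than one per
item. Something in this shape, I imagine (the urllib2 use is my guess
at what retrieve does):

import urllib2

def retrieve(urls):
    # Called once per worker with an iterator over that worker's
    # share of the URLs; from then on it is plain sequential iteration.
    for url in urls:
        yield url, urllib2.urlopen(url).read()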
> If delicate job control is necessary, an Executor can be used. It is
> implemented on top of the pool, and offers submit(*items), which
> returns job ids to be used for cancel() and status(). Jobs can be
> submitted and cancelled concurrently.
>> What type is each "item" supposed to be?
> Whatever your iterator-processing function is supposed to process.
> The URLs example can be written using an Executor as:
>
>   e = Executor(ThreadPool, retrieve)
>   e.submit(*URLs)
>   e.close()
>   print list(e.result)
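Just to check my reading of the job-control surface, usage would be
roughly this (the return shapes and status values are my guesses):

e = Executor(ThreadPool, retrieve)
job_ids = e.submit(*URLs)    # one id per submitted item?
e.cancel(job_ids[0])         # drop the first URL before it runs
print e.status(job_ids[1])   # some pending/running/completed state
e.close()                    # no more submissions
print list(e.result)         # results, possibly out of order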
There are two common scenarios where I have seen Future-like things
used:
1. Do the same operation on different data, e.g. copy some local files
to several remote servers
2. Do several different operations on different data, e.g. parallelizing
code like this:
db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)
I'm trying to get a handle on how streams accommodate the second
case. With futures, I would write something like this:
db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, request.body)
# Maybe do something else here.
wait(
    [db_future, data_future],
    timeout=10,
    # If either function raises then we can't complete the operation so
    # there is no reason to make the user wait.
    return_when=FIRST_EXCEPTION)
db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)
Cheers,
Brian
> Can I wait on several items?
Do you mean wait for several particular input values to be completed?
As of this moment, yes, but rather inefficiently. I have not considered
it a useful feature, especially when taking a wholesale,
list-processing view: a worker pool processes its input stream
_out of order_. If you just want to wait for several particular items,
it means you need their outputs _in order_, so why would you want to
use a worker pool in the first place?
However, I'd be happy to implement something like
Executor.submit(*items, wait=True).
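Until then, a caller could get most of the way there by polling with
what already exists; a rough sketch (the status() values here are made
up for illustration):

import time

def wait_for(executor, job_ids, poll_interval=0.1):
    # Block until every one of the given jobs has finished.
    pending = set(job_ids)
    while pending:
        pending = set(j for j in pending
                      if executor.status(j) not in ('completed', 'cancelled'))
        if pending:
            time.sleep(poll_interval)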
Cheers,
aht
_______________________________________________
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig