Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-05 Thread Jesse Noller
On Thu, Mar 4, 2010 at 11:18 PM, Guido van Rossum  wrote:
> And yes, go ahead and bring it up on python-dev. Don't bother with
> c.l.py unless you are particularly masochistic.
>
> --Guido

He's proposing a concurrency thingie for python. I think that implies
a certain level of masochism already. :)
___
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-04 Thread Jeffrey Yasskin
I'm not going to be the one jerk holding back this proposal, so go
ahead and submit it to python-dev. I'm not around again until
Saturday, so I won't get a chance to comment until then.

On Thu, Mar 4, 2010 at 8:02 PM, Brian Quinlan  wrote:
>
> On Feb 26, 2010, at 2:26 PM, Jeffrey Yasskin wrote:
>
>> On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan  wrote:
>>>
>>> On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:

 Heh. If you're going to put that in the pep, at least make it correct
 (sleeping is not synchronization):
>>>
>>> I can't tell if you are joking or not. Was my demonstration of a possible
>>> deadlock scenario really unclear?
>>
>> It's clear; it's just wrong code, even if the futures weren't a cycle.
>> Waiting using sleep in any decently-sized system is guaranteed to
>> cause problems. Yes, this example will work nearly every time
>> (although if you get your load high enough, you'll still see
>> NameErrors), but it's not the kind of thing we should be showing
>> users. (For that matter, communicating between futures using globals
>> is also a bad use of them, but it's not outright broken.)
>
> Hey Jeff,
>
> I'm trying to demonstrate a pattern of executor usage that is likely to lead
> to deadlock.
>
> If, looking at the example, people are clear that this may lead to deadlock
> then I don't think it is necessary to write an example that provably
> always leads to deadlock.
>
> In fact, I think that all of the extra locking code required really
> distracts from the core of the problem being demonstrated.
>
>
 Thanks. I still disagree, and think users are much more likely to be
 surprised by occasional deadlocks due to cycles of executors than they are
 about guaranteed deadlocks from cycles of futures, but I don't want to be
 the only one holding up the PEP by insisting on this.
>>>
>>> Cycles of futures are not guaranteed to deadlock. Remove the sleeps from
>>> my example and it will deadlock a small percentage of the time.
>>
>> It only fails to deadlock when it fails to create a cycle of futures.
>>
>> It sounds like Antoine also wants you to either have the threaded
>> futures steal work or detect executor cycles and raise an exception.
>
>
> I really don't like the idea of work stealing.
>
> Do you have a concrete proposal on how to detect cycles?
>
> Cheers,
> Brian
>



-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-04 Thread Guido van Rossum
And yes, go ahead and bring it up on python-dev. Don't bother with
c.l.py unless you are particularly masochistic.

--Guido

On Thu, Mar 4, 2010 at 7:09 PM, Brian Quinlan  wrote:
> Wow, timing is everything - I sent Guido an e-mail asking the same thing <
> 30 seconds ago :-)
>
> Cheers,
> Brian
>
> On Mar 5, 2010, at 2:08 PM, Jesse Noller wrote:
>
>> *mega snip*
>>
>> Jeffrey/Brian/all - Do you think we are ready to move this to the
>> grist mill of python-dev? Or should we hold off until I get off my
>> rump and do the concurrent.* namespace PEP?

-- 
--Guido van Rossum (python.org/~guido)


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-04 Thread Brian Quinlan


On Feb 26, 2010, at 2:26 PM, Jeffrey Yasskin wrote:

On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan wrote:

On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:

Heh. If you're going to put that in the pep, at least make it correct
(sleeping is not synchronization):

I can't tell if you are joking or not. Was my demonstration of a
possible deadlock scenario really unclear?


It's clear; it's just wrong code, even if the futures weren't a cycle.
Waiting using sleep in any decently-sized system is guaranteed to
cause problems. Yes, this example will work nearly every time
(although if you get your load high enough, you'll still see
NameErrors), but it's not the kind of thing we should be showing
users. (For that matter, communicating between futures using globals
is also a bad use of them, but it's not outright broken.)


Hey Jeff,

I'm trying to demonstrate a pattern of executor usage that is likely
to lead to deadlock.

If, looking at the example, people are clear that this may lead to
deadlock then I don't think it is necessary to write an example that
provably always leads to deadlock.

In fact, I think that all of the extra locking code required really
distracts from the core of the problem being demonstrated.




Thanks. I still disagree, and think users are much more likely to be
surprised by occasional deadlocks due to cycles of executors than they
are about guaranteed deadlocks from cycles of futures, but I don't
want to be the only one holding up the PEP by insisting on this.

Cycles of futures are not guaranteed to deadlock. Remove the sleeps
from my example and it will deadlock a small percentage of the time.


It only fails to deadlock when it fails to create a cycle of futures.

It sounds like Antoine also wants you to either have the threaded
futures steal work or detect executor cycles and raise an exception.



I really don't like the idea of work stealing.

Do you have a concrete proposal on how to detect cycles?

Cheers,
Brian


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-04 Thread Jesse Noller
On Thu, Mar 4, 2010 at 10:09 PM, Brian Quinlan  wrote:
> Wow, timing is everything - I sent Guido an e-mail asking the same thing <
> 30 seconds ago :-)
>
> Cheers,
> Brian
>

Well, I'd like to make sure Jeffrey's concerns have been addressed.
Once he's happy, I'm OK with pushing it towards its inevitable end. I
think the namespacing is secondary to the futures PEP, though.

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-04 Thread Brian Quinlan
Wow, timing is everything - I sent Guido an e-mail asking the same  
thing < 30 seconds ago :-)


Cheers,
Brian

On Mar 5, 2010, at 2:08 PM, Jesse Noller wrote:


*mega snip*

Jeffrey/Brian/all - Do you think we are ready to move this to the
grist mill of python-dev? Or should we hold off until I get off my
rump and do the concurrent.* namespace PEP?

jesse




Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-03-04 Thread Jesse Noller
*mega snip*

Jeffrey/Brian/all - Do you think we are ready to move this to the
grist mill of python-dev? Or should we hold off until I get off my
rump and do the concurrent.* namespace PEP?

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Jeffrey Yasskin
On Thu, Feb 25, 2010 at 7:26 PM, Jeffrey Yasskin  wrote:
> On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan  wrote:
>> On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
>>> Heh. If you're going to put that in the pep, at least make it correct
>>> (sleeping is not synchronization):
>>
>> I can't tell if you are joking or not. Was my demonstration of a possible
>> deadlock scenario really unclear?
>
> It's clear; it's just wrong code, even if the futures weren't a cycle.
> Waiting using sleep in any decently-sized system is guaranteed to
> cause problems. Yes, this example will work nearly every time
> (although if you get your load high enough, you'll still see
> NameErrors), but it's not the kind of thing we should be showing
> users. (For that matter, communicating between futures using globals
> is also a bad use of them, but it's not outright broken.)
>
>>> Thanks. I still disagree, and think users are much more likely to be
>>> surprised by occasional deadlocks due to cycles of executors than they are
>>> about guaranteed deadlocks from cycles of futures, but I don't want to be
>>> the only one holding up the PEP by insisting on this.
>>
>> Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my
>> example and it will deadlock a small percentage of the time.
>
> It only fails to deadlock when it fails to create a cycle of futures.
>
>
>
> It sounds like Antoine also wants you to either have the threaded
> futures steal work or detect executor cycles and raise an exception.

FWIW, the other way to fix these deadlocks is to write a smarter
thread pool. If the thread pool can notice that it's not using as many
CPUs as it's been told to use, it can start a new thread, which runs
the queued task and resolves the deadlock. It's actually a better
solution in the long run since it also solves the problem with
wait-for-one deadlocking or behaving badly. The problem is that this
is surprisingly hard to get right. Measuring current CPU use is tricky
and non-portable; if you start new threads too aggressively, you can
run out of memory or start thrashing; and if you don't start threads
aggressively enough you hurt performance.
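A toy version of the idea Jeffrey sketches, with none of the CPU-measuring heuristics he warns about: spawn a fresh thread whenever a task is submitted and no worker is idle, so a queued task can never be stuck behind workers that are blocked waiting on futures. (ElasticPool is an invented name, not the PEP's implementation.)

```python
import queue
import threading

class ElasticPool:
    """Toy pool that grows instead of queueing behind blocked workers.

    A real implementation would cap growth and retire idle threads to
    avoid the memory/thrashing problems described above."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._idle = 0  # workers currently waiting for a task

    def submit(self, fn, *args):
        with self._lock:
            spawn = self._idle == 0
            if not spawn:
                self._idle -= 1  # reserve an idle worker for this task
        self._tasks.put((fn, args))
        if spawn:
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            fn, args = self._tasks.get()
            fn(*args)
            with self._lock:
                self._idle += 1
```

A task that submits a nested task and then waits for it, which would deadlock a fixed one-thread pool, simply causes a second thread to be started here.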


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Jeffrey Yasskin
On Thu, Feb 25, 2010 at 7:54 PM, Jesse Noller  wrote:
> I'm on the fence. I took a few minutes to think about this today, and
> my gut says concurrent with a single logical namespace - so:
>
> from concurrent import futures
> futures.ThreadPoolExecutor
>
> And so on. Others might balk at a deeper namespace, but then say we add:
>
> concurrent/
>    futures/
>    pool.py (allows for a process pool, or threadpool)
>    managers.py
>
> And so on. I'm trying to mentally organize things to "be like"
> java.util.concurrent [1] - ideally we could move/consolidate the
> common sugar into this package, and remove the other "stuff" from
> multiprocessing as well. That way multiprocessing can become "just"
> Process and the locking stuff, ala threading, and the rest of the
> other nice-things can be made to work with threads *and* processes ala
> what you've done with futures.

My gut agrees, FWIW.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Jesse Noller
On Thu, Feb 25, 2010 at 10:28 PM, Brian Quinlan  wrote:
>
> On Feb 26, 2010, at 5:49 AM, Jesse Noller wrote:
...
>> Yes; I think this needs to be part of a new "concurrent" package in
>> the stdlib - e.g. concurrent.futures, understanding things within
>> multiprocessing will be put in there shortly, and possibly other
>> things such as a threadpool and other common sugary abstractions.
>
>
> Are you imagining that futures would be a subpackage of concurrent with a
> single logical namespace i.e.
>
> concurrent/
>  __init__.py
>  futures/
>    __init__.py
>   threads.py
>   processes.py
>   ...
>
> from concurrent.futures import wait
> from concurrent.futures import ThreadPoolExecutor
>
> Or should the futures package be merged into the concurrent package i.e.
>
> concurrent/
>  __init__.py
>  futures.py
>  threadpoolexecutor.py (was threads.py)
>  processpoolexecutor.py (as processes.py)
>
> from concurrent.futures import wait
> from concurrent.threadpoolexecutor import ThreadPoolExecutor
>

I'm on the fence. I took a few minutes to think about this today, and
my gut says concurrent with a single logical namespace - so:

from concurrent import futures
futures.ThreadPoolExecutor

And so on. Others might balk at a deeper namespace, but then say we add:

concurrent/
    futures/
    pool.py (allows for a process pool, or threadpool)
    managers.py

And so on. I'm trying to mentally organize things to "be like"
java.util.concurrent [1] - ideally we could move/consolidate the
common sugar into this package, and remove the other "stuff" from
multiprocessing as well. That way multiprocessing can become "just"
Process and the locking stuff, ala threading, and the rest of the
other nice-things can be made to work with threads *and* processes ala
what you've done with futures.

This is just a thought; I've been thinking about it a lot, but I admit
not having sat down and itemized the things that would live in this
new home. The futures discussion just spurred me to propose the idea
sooner rather than later.

Jesse

[1] 
http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Brian Quinlan


On Feb 26, 2010, at 5:49 AM, Jesse Noller wrote:

On Thu, Feb 25, 2010 at 12:27 PM, Jeffrey Yasskin wrote:

... snip


In any case, I've updated the docs and PEP to indicate that deadlocks
are possible.


Thanks. I still disagree, and think users are much more likely to be
surprised by occasional deadlocks due to cycles of executors than they
are about guaranteed deadlocks from cycles of futures, but I don't
want to be the only one holding up the PEP by insisting on this.

I think there are places the names could be improved, and Jesse
probably has an opinion on exactly where this should go in the package
hierarchy, but I think it will make a good addition to the standard
library. Thanks for working on it!

Jeffrey


Yes; I think this needs to be part of a new "concurrent" package in
the stdlib - e.g. concurrent.futures, understanding things within
multiprocessing will be put in there shortly, and possibly other
things such as a threadpool and other common sugary abstractions.



Are you imagining that futures would be a subpackage of concurrent
with a single logical namespace i.e.


concurrent/
  __init__.py
  futures/
    __init__.py
    threads.py
    processes.py
    ...

from concurrent.futures import wait
from concurrent.futures import ThreadPoolExecutor

Or should the futures package be merged into the concurrent package i.e.

concurrent/
  __init__.py
  futures.py
  threadpoolexecutor.py (was threads.py)
  processpoolexecutor.py (was processes.py)

from concurrent.futures import wait
from concurrent.threadpoolexecutor import ThreadPoolExecutor

?


Cheers,
Brian


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Jeffrey Yasskin
On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan  wrote:
> On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:
>> Heh. If you're going to put that in the pep, at least make it correct
>> (sleeping is not synchronization):
>
> I can't tell if you are joking or not. Was my demonstration of a possible
> deadlock scenario really unclear?

It's clear; it's just wrong code, even if the futures weren't a cycle.
Waiting using sleep in any decently-sized system is guaranteed to
cause problems. Yes, this example will work nearly every time
(although if you get your load high enough, you'll still see
NameErrors), but it's not the kind of thing we should be showing
users. (For that matter, communicating between futures using globals
is also a bad use of them, but it's not outright broken.)

>> Thanks. I still disagree, and think users are much more likely to be
>> surprised by occasional deadlocks due to cycles of executors than they are
>> about guaranteed deadlocks from cycles of futures, but I don't want to be
>> the only one holding up the PEP by insisting on this.
>
> Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my
> example and it will deadlock a small percentage of the time.

It only fails to deadlock when it fails to create a cycle of futures.



It sounds like Antoine also wants you to either have the threaded
futures steal work or detect executor cycles and raise an exception.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Brian Quinlan

On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote:

On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan wrote:

Your process pool still relies on future._condition, but I think you
can just delete that line and everything will still work. This seems
fine to me. Thanks!


Oops. Fixed. Thanks.

Heh. If you're going to put that in the pep, at least make it correct
(sleeping is not synchronization):

I can't tell if you are joking or not. Was my demonstration of a
possible deadlock scenario really unclear?



import threading
# Assuming the proposed package is importable as "futures":
from futures import ThreadPoolExecutor

condition = threading.Condition(threading.Lock())
a = None
b = None

def wait_on_b():
    with condition:
        while b is None:
            condition.wait()
    print(b.result())
    return 5

def wait_on_a():
    with condition:
        while a is None:
            condition.wait()
    print(a.result())
    return 6

f = ThreadPoolExecutor(max_workers=2)
with condition:
    a = f.submit(wait_on_b)
    b = f.submit(wait_on_a)
    condition.notify_all()

In any case, I've updated the docs and PEP to indicate that deadlocks
are possible.


Thanks. I still disagree, and think users are much more likely to be
surprised by occasional deadlocks due to cycles of executors than they
are about guaranteed deadlocks from cycles of futures, but I don't
want to be the only one holding up the PEP by insisting on this.


Cycles of futures are not guaranteed to deadlock. Remove the sleeps
from my example and it will deadlock a small percentage of the time.


Cheers,
Brian

I think there are places the names could be improved, and Jesse
probably has an opinion on exactly where this should go in the package
hierarchy, but I think it will make a good addition to the standard
library. Thanks for working on it!


Jeffrey
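The executor-cycle half of this disagreement is easy to reproduce deterministically against the API as it eventually shipped in the stdlib (concurrent.futures, rather than the 2010 futures package): with a single worker, a task that waits on a task it submitted to the same pool can never make progress. The sketch below uses timeouts only so that it terminates instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor

def demo():
    ex = ThreadPoolExecutor(max_workers=1)

    def outer():
        inner = ex.submit(lambda: 42)
        # The pool's only worker is the one running outer(), so inner
        # cannot start until outer returns: a guaranteed deadlock,
        # surfaced here as a timeout instead of a hang.
        return inner.result(timeout=1)

    f = ex.submit(outer)
    try:
        return f.result(timeout=5)
    except Exception as exc:
        return type(exc).__name__
```

Once outer() gives up, the worker is freed and the inner task runs normally, so the pool still shuts down cleanly.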




Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Antoine Pitrou
Le Thu, 25 Feb 2010 20:33:09 +1100,
Brian Quinlan  a écrit :
> 
> In any case, I've updated the docs and PEP to indicate that
> deadlocks are possible.

For the record, I think that potential deadlocks simply by using a
library function (other than locks themselves) are a bad thing. It would
be better if the library either avoided deadlocks, or detected
them and raised an exception instead.

(admittedly, we already have such an issue with the import lock)

Regards

Antoine.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Antoine Pitrou

Hey people, could you strip some quoting when you are replying to each
other's e-mails? It would make following the discussion much easier :)

Regards

Antoine.


Le Thu, 25 Feb 2010 09:27:14 -0800,
Jeffrey Yasskin  a écrit :

> On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan 
> wrote:
> 
[snip]


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Jesse Noller
On Thu, Feb 25, 2010 at 12:27 PM, Jeffrey Yasskin  wrote:
... snip
>>
>> In any case, I've updated the docs and PEP to indicate that deadlocks are
>> possible.
>
> Thanks. I still disagree, and think users are much more likely to be
> surprised by occasional deadlocks due to cycles of executors than they are
> about guaranteed deadlocks from cycles of futures, but I don't want to be
> the only one holding up the PEP by insisting on this.
> I think there are places the names could be improved, and Jesse probably has
> an opinion on exactly where this should go in the package hierarchy, but I
> think it will make a good addition to the standard library. Thanks for
> working on it!
> Jeffrey

Yes; I think this needs to be part of a new "concurrent" package in
the stdlib - e.g. concurrent.futures, understanding things within
multiprocessing will be put in there shortly, and possibly other
things such as a threadpool and other common sugary abstractions.

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-25 Thread Brian Quinlan

The PEP officially lives at:
http://python.org/dev/peps/pep-3148

but this version is the most up-to-date:
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt


On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote:

On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan wrote:


On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:


Where's the current version of the PEP?


http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt

On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan wrote:


On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:




* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include?


I can do that.


I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.

Would it be reasonable to make Future an ABC, make a _Future that
subclasses it for internal usage, and let other Executor subclasses
define their own Futures?


What interface are you proposing for the Future ABC? It'll need to
support wait() and as_completed() from non-library Futures. I wouldn't
mind making the type just a duck-type (it probably wouldn't even need
an ABC), although I'd like to give people trying to implement their
own Executors as much help as possible. I'd assumed that giving Future
some public hooks would be easier than fixing the wait() interface,
but I could be wrong.


See below.


* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.

* Do you want to make calling Executor.shutdown(wait=True) from within
the same Executor 1) detect the problem and raise an exception, 2)
deadlock, 3) unspecified behavior, or 4) wait for all other threads
and then let the current one continue?

What about a note saying that using any futures functions or methods
from inside a scheduled call is likely to lead to deadlock unless care
is taken?


Jesse pointed out that one of the first things people try to do when
using concurrency libraries is to try to use them inside themselves.
I've also tried to use a futures library that forbade nested use
('cause I wrote it), and it was a real pain.


You can use the API from within Executor-invoked functions - you just
have to be careful.


It's the job of the PEP (and later the docs) to explain exactly what
care is needed. Or were you asking if I was ok with adding that
explanation to the PEP? I think that explanation is the minimum
requirement (that's what I meant by "Could you specify in what
circumstances a pure computational Future-based program may
deadlock?"), but it would be better if it could never deadlock, which
is achievable by stealing work.
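A minimal illustration of what "stealing work" means here (hypothetical code, not from the proposed package): if result() is called before any worker has claimed the underlying task, the waiting thread runs the task itself instead of blocking, so a pure computation never waits on an unstarted task.

```python
import threading

class StealableFuture:
    """Invented sketch: the task runs in whichever thread claims it
    first, either a pool worker calling run() or a waiter calling
    result()."""

    def __init__(self, fn):
        self._fn = fn
        self._lock = threading.Lock()
        self._claimed = False
        self._done = threading.Event()
        self._result = None

    def run(self):
        with self._lock:
            if self._claimed:
                return  # someone else is (or was) running it
            self._claimed = True
        self._result = self._fn()
        self._done.set()

    def result(self):
        self.run()  # steal the work if no worker has started it yet
        self._done.wait()
        return self._result
```

Note the remaining gap: stealing only helps when the task has not started, so tasks that are already running and blocked on each other (a true cycle of futures) still deadlock.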


I don't think so, see below.


It should be easy enough to detect that the caller of
Executor.shutdown is one of the Executor's threads or processes, but I
wouldn't mind making the obviously incorrect "wait for my own
completion" deadlock or throw an exception, and it would make sense to
give Executor implementors their choice of which to do.


* This is a nit, but I think that the parameter names for
ThreadPoolExecutor and ProcessPoolExecutor should be the same so
people can parametrize their code on those constructors. Right now
they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".

I'm not sure that I like that. In general consolidating the
constructors for executors is not going to be possible.


In general, yes, but in this case they're the same, and we should try
to avoid gratuitous differences.

num_threads and num_processes are more explicit than num_workers, but
I don't really care so I changed it.


Thanks.


* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include? I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.


One of the difficulties here is:
1. I don't want to commit to the internal implementation of Futures

Yep, that's why to avoid requiring them to have direct access to the
internal variables.

2. it might be hard to make it clear which methods are public to
users and which methods are public to executor implementors

One way to do it would be to create another type for implementors and
pass it to the Future constructor.


If we change the future interface like so:

class Future(object):
    # Existing public methods
    ...
    # For executors only
    def set_result(self
Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-23 Thread Guido van Rossum
On Tue, Feb 23, 2010 at 2:36 PM, Brett Cannon  wrote:
>
>
> On Tue, Feb 23, 2010 at 14:13, sstein...@gmail.com 
> wrote:
>>
>> On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote:
>>
>> > On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin 
>> > wrote:
>> >> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan 
>> >> wrote:
>> >>>
>> >>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>> >>>
>> >>> Where's the current version of the PEP?
>> >>>
>> >>>
>> >>>
>> >>> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
>> >
>> > Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/
>>
>> I get a 404 on that URL.
>
> It's because one of the PEPs has become improperly encoded; you can run
> 'make' in a PEPs checkout to trigger the error.

Eh, sorry! Fixed now.

-- 
--Guido van Rossum (python.org/~guido)


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-23 Thread Brett Cannon
On Tue, Feb 23, 2010 at 14:13, sstein...@gmail.com wrote:

>
> On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote:
>
> > On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin 
> wrote:
> >> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan 
> wrote:
> >>>
> >>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
> >>>
> >>> Where's the current version of the PEP?
> >>>
> >>>
> >>>
> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
> >
> > Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/
>
> I get a 404 on that URL.
>

It's because one of the PEPs has become improperly encoded; you can run
'make' in a PEPs checkout to trigger the error.

-Brett



>
> S
>


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-23 Thread sstein...@gmail.com

On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote:

> On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin  wrote:
>> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan  wrote:
>>> 
>>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>>> 
>>> Where's the current version of the PEP?
>>> 
>>> 
>>> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
> 
> Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/

I get a 404 on that URL.

S



Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-23 Thread Guido van Rossum
On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin  wrote:
> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan  wrote:
>>
>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>>
>> Where's the current version of the PEP?
>>
>>
>> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt

Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/

-- 
--Guido van Rossum (python.org/~guido)


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-23 Thread Jeffrey Yasskin
On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan  wrote:

>
> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
>
> Where's the current version of the PEP?
>
>
>
> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
>
> On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan  wrote:
>
>
> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>
>
> * I'd like users to be able to write Executors besides the simple
>
> ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
>
> that, could you document what the subclassing interface for Executor
>
> looks like? that is, what code do user-written Executors need to
>
> include?
>
>
> I can do that.
>
>
> I don't think it should include direct access to
>
> future._state like ThreadPoolExecutor uses, if at all possible.
>
>
> Would it be reasonable to make Future an ABC, make a _Future that
> subclasses it for internal usage, and let other Executor subclasses
> define their own Futures?
>
>
> What interface are you proposing for the Future ABC? It'll need to
> support wait() and as_completed() from non-library Futures. I wouldn't
> mind making the type just a duck-type (it probably wouldn't even need
> an ABC), although I'd like to give people trying to implement their
> own Executors as much help as possible. I'd assumed that giving Future
> some public hooks would be easier than fixing the wait() interface,
> but I could be wrong.
>
>
> See below.
>
> * Could you specify in what circumstances a pure computational
>
> Future-based program may deadlock? (Ideally, that would be "never".)
>
> Your current implementation includes two such deadlocks, for which
>
> I've attached a test.
>
>
> * Do you want to make calling Executor.shutdown(wait=True) from within
>
> the same Executor 1) detect the problem and raise an exception, 2)
>
> deadlock, 3) unspecified behavior, or 4) wait for all other threads
>
> and then let the current one continue?
>
>
> What about a note saying that using any futures functions or methods from
>
> inside a scheduled call is likely to lead to deadlock unless care is taken?
>
>
> Jesse pointed out that one of the first things people try to do when
> using concurrency libraries is to try to use them inside themselves.
> I've also tried to use a futures library that forbade nested use
> ('cause I wrote it), and it was a real pain.
>
>
> You can use the API from within Executor-invoked functions - you just have
> to be careful.
>

It's the job of the PEP (and later the docs) to explain exactly what care is
needed. Or were you asking if I was ok with adding that explanation to the
PEP? I think that explanation is the minimum requirement (that's what I
meant by "Could you specify in what circumstances a pure computational
Future-based program may deadlock?"), but it would be better if it could
never deadlock, which is achievable by stealing work.

> It should be easy enough to detect that the caller of
> Executor.shutdown is one of the Executor's threads or processes, but I
> wouldn't mind making the obviously incorrect "wait for my own
> completion" deadlock or throw an exception, and it would make sense to
> give Executor implementors their choice of which to do.
>
> * This is a nit, but I think that the parameter names for
>
> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>
> people can parametrize their code on those constructors. Right now
>
> they're "max_threads" and "max_processes", respectively. I might
>
> suggest "max_workers".
>
>
> I'm not sure that I like that. In general consolidating the constructors
> for
>
> executors is not going to be possible.
>
>
> In general, yes, but in this case they're the same, and we should try
> to avoid gratuitous differences.
>
>
> num_threads and num_processes are more explicit than num_workers but I don't
> really care so I changed it.
>
> Thanks.

> * I'd like users to be able to write Executors besides the simple
>
> ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
>
> that, could you document what the subclassing interface for Executor
>
> looks like? that is, what code do user-written Executors need to
>
> include? I don't think it should include direct access to
>
> future._state like ThreadPoolExecutor uses, if at all possible.
>
>
> One of the difficulties here is:
>
> 1. i don't want to commit to the internal implementation of Futures
>
>
> Yep, that's why to avoid requiring them to have direct access to the
> internal variables.
>
> 2. it might be hard to make it clear which methods are public to users and
>
> which methods are public to executor implementors
>
>
> One way to do it would be to create another type for implementors and
> pass it to the Future constructor.
>
>
> If we change the future interface like so:
>
> class Future(object):
>   # Existing public methods
>   ...
>   # For executors only
>   def set_result(self):
> ...
>   def set_exception(self):
> ...
>   def check_cancel_and_notify(self):

Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-23 Thread Brian Quinlan


On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:


Where's the current version of the PEP?


http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt

On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan   
wrote:


On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:


Several comments:

* I see you using the Executors as context managers, but no mention in
the specification about what that does.


I can't see such documentation for built-in Python objects. To be
symmetrical with the built-in file object, I've documented the context
manager behavior as part of the Executor.shutdown method.


For locks, it has its own section:
http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
But I don't care too much about the formatting as long as the PEP
specifies it clearly.


Added.


You need to specify it. (Your
current implementation doesn't wait in __exit__, which I think is the
opposite of what you agreed with Antoine, but you can fix that after
we get general agreement on the interface.)


Fixed.


* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include?


I can do that.


I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.


Would it be reasonable to make Future an ABC, make a _Future that
subclasses it for internal usage and let other Executor subclasses define
their own Futures.


What interface are you proposing for the Future ABC? It'll need to
support wait() and as_completed() from non-library Futures. I wouldn't
mind making the type just a duck-type (it probably wouldn't even need
an ABC), although I'd like to give people trying to implement their
own Executors as much help as possible. I'd assumed that giving Future
some public hooks would be easier than fixing the wait() interface,
but I could be wrong.


See below.


* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.


* Do you want to make calling Executor.shutdown(wait=True) from within
the same Executor 1) detect the problem and raise an exception, 2)
deadlock, 3) unspecified behavior, or 4) wait for all other threads
and then let the current one continue?


What about a note saying that using any futures functions or methods from
inside a scheduled call is likely to lead to deadlock unless care is taken?


Jesse pointed out that one of the first things people try to do when
using concurrency libraries is to try to use them inside themselves.
I've also tried to use a futures library that forbade nested use
('cause I wrote it), and it was a real pain.


You can use the API from within Executor-invoked functions - you just  
have to be careful.



It should be easy enough to detect that the caller of
Executor.shutdown is one of the Executor's threads or processes, but I
wouldn't mind making the obviously incorrect "wait for my own
completion" deadlock or throw an exception, and it would make sense to
give Executor implementors their choice of which to do.


* This is a nit, but I think that the parameter names for
ThreadPoolExecutor and ProcessPoolExecutor should be the same so
people can parametrize their code on those constructors. Right now
they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".


I'm not sure that I like that. In general consolidating the constructors
for executors is not going to be possible.


In general, yes, but in this case they're the same, and we should try
to avoid gratuitous differences.


num_threads and num_processes are more explicit than num_workers but I
don't really care so I changed it.


* You should document the exception that happens when you try to pass
a ProcessPoolExecutor as an argument to a task executing inside
another ProcessPoolExecutor, or make it not throw an exception and
document that.


The ProcessPoolExecutor limitations are the same as the multiprocessing
limitations. I can provide a note about that and a link to that module's
documentation.


And multiprocessing doesn't document that its Pool requires
picklability and isn't picklable itself. Saying that the
ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be
enough for your PEP.


Done.


* If it's intentional, you should probably document that if one
element of a map() times out, there's no way to come back and wait
longer to retrieve it or later elements.


That's not obvious?


Maybe.


* You still mention run_to_futures, run_to_results, and FutureList,
even though they're no longer proposed.


Done.



* wait() should probably return a named

Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-21 Thread Jeffrey Yasskin
Where's the current version of the PEP?

On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan  wrote:
>
> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>
>> Several comments:
>>
>> * I see you using the Executors as context managers, but no mention in
>> the specification about what that does.
>
> I can't see such documentation for built-in Python objects. To be
> symmetrical with the built-in file object, I've documented the context
> manager behavior as part of the Executor.shutdown method.

For locks, it has its own section:
http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
But I don't care too much about the formatting as long as the PEP
specifies it clearly.

>> You need to specify it. (Your
>> current implementation doesn't wait in __exit__, which I think is the
>> opposite of what you agreed with Antoine, but you can fix that after
>> we get general agreement on the interface.)
>
> Fixed.
>
>> * I'd like users to be able to write Executors besides the simple
>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
>> that, could you document what the subclassing interface for Executor
>> looks like? that is, what code do user-written Executors need to
>> include?
>
> I can do that.
>
>> I don't think it should include direct access to
>> future._state like ThreadPoolExecutor uses, if at all possible.
>
> Would it be reasonable to make Future an ABC, make a _Future that subclasses
> it for internal usage and let other Executor subclasses define their own
> Futures.

What interface are you proposing for the Future ABC? It'll need to
support wait() and as_completed() from non-library Futures. I wouldn't
mind making the type just a duck-type (it probably wouldn't even need
an ABC), although I'd like to give people trying to implement their
own Executors as much help as possible. I'd assumed that giving Future
some public hooks would be easier than fixing the wait() interface,
but I could be wrong.

>> * Could you specify in what circumstances a pure computational
>> Future-based program may deadlock? (Ideally, that would be "never".)
>> Your current implementation includes two such deadlocks, for which
>> I've attached a test.
>
>> * Do you want to make calling Executor.shutdown(wait=True) from within
>> the same Executor 1) detect the problem and raise an exception, 2)
>> deadlock, 3) unspecified behavior, or 4) wait for all other threads
>> and then let the current one continue?
>
> What about a note saying that using any futures functions or methods from
> inside a scheduled call is likely to lead to deadlock unless care is taken?

Jesse pointed out that one of the first things people try to do when
using concurrency libraries is to try to use them inside themselves.
I've also tried to use a futures library that forbade nested use
('cause I wrote it), and it was a real pain.

It should be easy enough to detect that the caller of
Executor.shutdown is one of the Executor's threads or processes, but I
wouldn't mind making the obviously incorrect "wait for my own
completion" deadlock or throw an exception, and it would make sense to
give Executor implementors their choice of which to do.

>> * This is a nit, but I think that the parameter names for
>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>> people can parametrize their code on those constructors. Right now
>> they're "max_threads" and "max_processes", respectively. I might
>> suggest "max_workers".
>
> I'm not sure that I like that. In general consolidating the constructors for
> executors is not going to be possible.

In general, yes, but in this case they're the same, and we should try
to avoid gratuitous differences.

>> * You should document the exception that happens when you try to pass
>> a ProcessPoolExecutor as an argument to a task executing inside
>> another ProcessPoolExecutor, or make it not throw an exception and
>> document that.
>
> The ProcessPoolExecutor limitations are the same as the multiprocessing
> limitations. I can provide a note about that and a link to that module's
> documentation.

And multiprocessing doesn't document that its Pool requires
picklability and isn't picklable itself. Saying that the
ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be
enough for your PEP.

>> * If it's intentional, you should probably document that if one
>> element of a map() times out, there's no way to come back and wait
>> longer to retrieve it or later elements.
>
> That's not obvious?

Maybe.

>> * You still mention run_to_futures, run_to_results, and FutureList,
>> even though they're no longer proposed.
>
> Done.
>
>>
>> * wait() should probably return a named_tuple or an object so we don't
>> have people writing the unreadable "wait(fs)[0]".
>
> Done.
>
>>
>> * Instead of "call finishes" in the description of the return_when
>> parameter, you might describe the behavior in terms of futures
becoming done since that's the accessor function you're using.

Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-21 Thread Brian Quinlan

A few extra points.

On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:

* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include? I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.


One of the difficulties here is:
1. i don't want to commit to the internal implementation of Futures
2. it might be hard to make it clear which methods are public to users  
and which methods are public to executor implementors



* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.


Thanks for the tests but I wasn't planning on changing this behavior.  
I don't really like the idea of using the calling thread to perform  
the wait because:

1. not all executors will be able to implement that behavior
2. it can only be made to work if no wait time is specified

Cheers,
Brian
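
[The kind of computational deadlock the attached tests exercise can be shown in a few lines. This sketch uses the module and parameter names the package eventually shipped with in the stdlib (concurrent.futures, max_workers), and bounds the inner wait so the demo terminates instead of hanging:]

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

executor = ThreadPoolExecutor(max_workers=1)

def task():
    # The pool's only worker is busy running task() itself, so the
    # inner future can never start; waiting on it is a deadlock.
    inner = executor.submit(lambda: 42)
    return inner.result(timeout=1)  # bounded so the demo terminates

deadlocked = False
try:
    executor.submit(task).result()
except TimeoutError:
    deadlocked = True  # with no timeout, this would hang forever
```

Work stealing (running the inner call on the thread that is waiting for it) would remove this class of deadlock, which is what the review above is asking about; the cost is the two caveats Brian lists.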




Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-02-20 Thread Jeffrey Yasskin
Several comments:

* I see you using the Executors as context managers, but no mention in
the specification about what that does. You need to specify it. (Your
current implementation doesn't wait in __exit__, which I think is the
opposite of what you agreed with Antoine, but you can fix that after
we get general agreement on the interface.)

* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? that is, what code do user-written Executors need to
include? I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.

* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.

* This is a nit, but I think that the parameter names for
ThreadPoolExecutor and ProcessPoolExecutor should be the same so
people can parametrize their code on those constructors. Right now
they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".

* You should document the exception that happens when you try to pass
a ProcessPoolExecutor as an argument to a task executing inside
another ProcessPoolExecutor, or make it not throw an exception and
document that.

* If it's intentional, you should probably document that if one
element of a map() times out, there's no way to come back and wait
longer to retrieve it or later elements.

* Do you want to make calling Executor.shutdown(wait=True) from within
the same Executor 1) detect the problem and raise an exception, 2)
deadlock, 3) unspecified behavior, or 4) wait for all other threads
and then let the current one continue?

* You still mention run_to_futures, run_to_results, and FutureList,
even though they're no longer proposed.

* wait() should probably return a named_tuple or an object so we don't
have people writing the unreadable "wait(fs)[0]".

* Instead of "call finishes" in the description of the return_when
parameter, you might describe the behavior in terms of futures
becoming done since that's the accessor function you're using.

* Is RETURN_IMMEDIATELY just a way to categorize futures into done and
not? Is that useful over [f for f in fs if f.done()]?

* After shutdown, is RuntimeError the right exception, or should there
be a more specific exception?

Otherwise, looks good. Thanks!
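
[The named-tuple suggestion for wait() above could look like this sketch. The field names are illustrative, and the wait logic is reduced to an immediate partition rather than a real blocking wait:]

```python
import collections

DoneAndNotDone = collections.namedtuple('DoneAndNotDone', ['done', 'not_done'])

def wait(fs):
    # Sketch only: partition futures by current state instead of blocking
    # on a condition variable until return_when is satisfied.
    done = {f for f in fs if f.done()}
    return DoneAndNotDone(done=done, not_done=set(fs) - done)
```

Callers can then write `wait(fs).done`, or `done, not_done = wait(fs)`, instead of the opaque `wait(fs)[0]`.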

On Fri, Jan 29, 2010 at 2:22 AM, Brian Quinlan  wrote:
> I've updated the PEP and included it inline. The interesting changes start
> in the "Specification" section.
>
> Cheers,
> Brian
>
> PEP:               XXX
> Title:             futures - execute computations asynchronously
> Version:           $Revision$
> Last-Modified:     $Date$
> Author:            Brian Quinlan 
> Status:            Draft
> Type:              Standards Track
> Content-Type:      text/x-rst
> Created:           16-Oct-2009
> Python-Version:    3.2
> Post-History:
>
> ========
> Abstract
> ========
>
> This PEP proposes a design for a package that facilitates the evaluation of
> callables using threads and processes.
>
> ==========
> Motivation
> ==========
>
> Python currently has powerful primitives to construct multi-threaded and
> multi-process applications but parallelizing simple operations requires a
> lot of
> work i.e. explicitly launching processes/threads, constructing a
> work/results
> queue, and waiting for completion or some other termination condition (e.g.
> failure, timeout). It is also difficult to design an application with a
> global
> process/thread limit when each component invents its own parallel execution
> strategy.
>
> =============
> Specification
> =============
>
> Check Prime Example
> -------------------
>
> ::
>
>    import futures
>    import math
>
>    PRIMES = [
>        112272535095293,
>        112582705942171,
>        112272535095293,
>        115280095190773,
>        115797848077099,
>        1099726899285419]
>
>    def is_prime(n):
>        if n % 2 == 0:
>            return False
>
>        sqrt_n = int(math.floor(math.sqrt(n)))
>        for i in range(3, sqrt_n + 1, 2):
>            if n % i == 0:
>                return False
>        return True
>
>    with futures.ProcessPoolExecutor() as executor:
>        for number, is_prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
>            print('%d is prime: %s' % (number, is_prime))
>
> Web Crawl Example
> -----------------
>
> ::
>
>    import futures
>    import urllib.request
>
>    URLS = ['http://www.foxnews.com/',
>            'http://www.cnn.com/',
>            'http://europe.wsj.com/',
>            'http://www.bbc.co.uk/',
>            'http://some-made-up-domain.com/']
>
>    def load_url(url, timeout):
>        return urllib.request.urlopen(url, timeout=timeout).read()
>
>    with futures.ThreadPoolExecutor(max

Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-28 Thread Brian Quinlan
I've updated the PEP and included it inline. The interesting changes  
start in the "Specification" section.


Cheers,
Brian

PEP:   XXX
Title: futures - execute computations asynchronously
Version:   $Revision$
Last-Modified: $Date$
Author:Brian Quinlan 
Status:Draft
Type:  Standards Track
Content-Type:  text/x-rst
Created:   16-Oct-2009
Python-Version:3.2
Post-History:


========
Abstract
========

This PEP proposes a design for a package that facilitates the evaluation
of callables using threads and processes.

==========
Motivation
==========

Python currently has powerful primitives to construct multi-threaded and
multi-process applications but parallelizing simple operations requires a
lot of work i.e. explicitly launching processes/threads, constructing a
work/results queue, and waiting for completion or some other termination
condition (e.g. failure, timeout). It is also difficult to design an
application with a global process/thread limit when each component invents
its own parallel execution strategy.

=============
Specification
=============

Check Prime Example
-------------------

::

    import futures
    import math

    PRIMES = [
        112272535095293,
        112582705942171,
        112272535095293,
        115280095190773,
        115797848077099,
        1099726899285419]

    def is_prime(n):
        if n % 2 == 0:
            return False

        sqrt_n = int(math.floor(math.sqrt(n)))
        for i in range(3, sqrt_n + 1, 2):
            if n % i == 0:
                return False
        return True

    with futures.ProcessPoolExecutor() as executor:
        for number, is_prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, is_prime))

Web Crawl Example
-----------------

::

    import futures
    import urllib.request

    URLS = ['http://www.foxnews.com/',
            'http://www.cnn.com/',
            'http://europe.wsj.com/',
            'http://www.bbc.co.uk/',
            'http://some-made-up-domain.com/']

    def load_url(url, timeout):
        return urllib.request.urlopen(url, timeout=timeout).read()

    with futures.ThreadPoolExecutor(max_threads=5) as executor:
        future_to_url = dict((executor.submit(load_url, url, 60), url)
                             for url in URLS)

        for future in futures.as_completed(future_to_url):
            url = future_to_url[future]
            if future.exception() is not None:
                print('%r generated an exception: %s' % (url, future.exception()))
            else:
                print('%r page is %d bytes' % (url, len(future.result())))

Interface
---------

The proposed package provides two core classes: `Executor` and `Future`.
An `Executor` receives asynchronous work requests (in terms of a callable
and its arguments) and returns a `Future` to represent the execution of
that work request.

Executor
''''''''

`Executor` is an abstract class that provides methods to execute calls
asynchronously.

`submit(fn, *args, **kwargs)`

Schedules the callable to be executed as fn(*\*args*, *\*\*kwargs*) and
returns a `Future` instance representing the execution of the function.

`map(func, *iterables, timeout=None)`

Equivalent to map(*func*, *\*iterables*) but executed asynchronously and
possibly out-of-order. The returned iterator raises a `TimeoutError` if
`__next__()` is called and the result isn't available after *timeout*
seconds from the original call to `map()`. If *timeout* is not specified or
``None`` then there is no limit to the wait time. If a call raises an
exception then that exception will be raised when its value is retrieved
from the iterator.
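
[As noted earlier in the thread, if one element of a map() times out there is no way to come back later for it or for the elements after it; a caller that wants partial results has to catch the error around iteration. A sketch using the names the package later shipped under in the stdlib (concurrent.futures, max_workers) so it runs:]

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def work(delay):
    time.sleep(delay)
    return delay

collected = []
with ThreadPoolExecutor(max_workers=2) as executor:
    try:
        # The third task sleeps past the budget, so iteration raises
        # TimeoutError after yielding the fast results; the remaining
        # elements are unrecoverable through this iterator.
        for value in executor.map(work, [0, 0, 3], timeout=1):
            collected.append(value)
    except TimeoutError:
        pass
```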

`Executor.shutdown(wait=False)`

Signal the executor that it should free any resources that it is using when
the currently pending futures are done executing. Calls to
`Executor.run_to_futures`, `Executor.run_to_results` and
`Executor.map` made after shutdown will raise `RuntimeError`.

If wait is `True` then the executor will not return until all the pending
futures are done executing and the resources associated with the executor
have been freed.
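
[Once __exit__ waits, as agreed earlier in the thread, the context-manager form is sugar for a try/finally around shutdown(wait=True). A sketch of the equivalence using the eventual stdlib names so it runs:]

```python
from concurrent.futures import ThreadPoolExecutor

# Context-manager form:
with ThreadPoolExecutor(max_workers=2) as executor:
    f1 = executor.submit(pow, 2, 10)

# ...is intended to behave like:
executor = ThreadPoolExecutor(max_workers=2)
try:
    f2 = executor.submit(pow, 2, 10)
finally:
    executor.shutdown(wait=True)  # block until pending futures finish
```

Results remain retrievable after shutdown; only new submissions are refused.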

ProcessPoolExecutor
'''''''''''''''''''

The `ProcessPoolExecutor` class is an `Executor` subclass that uses a pool
of processes to execute calls asynchronously.

`__init__(max_processes)`

Executes calls asynchronously using a pool of at most *max_processes*
processes. If *max_processes* is ``None`` or not given then as many worker
processes will be created as the machine has processors.

ThreadPoolExecutor
''''''''''''''''''

The `ThreadPoolExecutor` class is an `Executor` subclass that uses a pool
of threads to execute calls asynchronously.

`__init__(max_threads)`

Executes calls asynchronously using a pool of at most *max_threads*
threads.


Future Objects
''''''''''''''

The `Future` class encapsulates the asynchronous execution of a fu

Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-16 Thread Brett Cannon
Do you guys mind taking this discussion off-list? As of right now
neither of your projects are old enough or well known enough to be
considered for inclusion in the stdlib at this time so this is really
not relevant to the stdlib SIG to continue here.

-Brett

On Sat, Jan 16, 2010 at 20:53, Anh Hai Trinh  wrote:
> On Sun, Jan 17, 2010 at 5:22 AM, Brian Quinlan  wrote:
>
>>>> db_future = executor.submit(setup_database, host, port)
>>>> data_future = executor.submit(parse_big_xml_file, data)
>>>> # Maybe do something else here.
>>>> wait(
>>>>   [db_future, data_future],
>>>>   timeout=10,
>>>>   # If either function raises then we can't complete the operation so
>>>>   # there is no reason to make the user wait.
>>>>   return_when=FIRST_EXCEPTION)
>>>>
>>>> db = db_future.result(timeout=0)
>>>> data = data_future.result(timeout=0)
>>>> save_data_in_db(data, db)
>>
>> It is definitely true that you can roll your own implementation using
>> threads but the purpose of the futures library is to make that unnecessary.
>>
>> I don't understand your doubts. To me the example that I gave is simple and
>> useful.
>
> What I mean is that your example is simple enough to do with threads. Here:
>
> [...]
>
> def setup_db():
>  nonlocal db;
>  db = setup_database(host, port)
>
> def parse_xml():
>  nonlocal data;
>  data = parse_big_xml(file)
>
> db_thread = threading.Thread(target=setup_db)
> db_thread.start()
>
> parse_thread = threading.Thread(target=parse_xml)
> parse_thread.start()
>
> [...] # Do something else here.
>
> db_thread.join()
> parse_thread.join()
> save_data_in_db(data, db)
>
> I used "nonlocal" here but you usually do this within a method and
> refer to self.db, self.data.
>
>
>> I don't understand your doubts. To me the example that I gave is simple and 
>> useful.
>
> My doubt is about the usefulness of futures' constructs for the kind
> of code that "Do several different operations on different data". I
> think ThreadPool/ProcessPool is really useful when you do
>
> 1. Same operation on different data
> 2. Different operations on same datum
>
> But
>
> 3. Different operations on different data
>
> is perhaps misusing it. It is a too general use case because
> dependency comes into play. What if the different operations depend on
> each other? A useful thread pool construct for this would be at a more
> fundamental level, e.g. Grand Central Dispatch.
>
> Perhaps you would give another example?
>
> Cheers,
> --
> // aht
> http://blog.onideas.ws


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-16 Thread Anh Hai Trinh
On Sun, Jan 17, 2010 at 5:22 AM, Brian Quinlan  wrote:

>>> db_future = executor.submit(setup_database, host, port)
>>> data_future = executor.submit(parse_big_xml_file, data)
>>> # Maybe do something else here.
>>> wait(
>>>   [db_future, data_future],
>>>   timeout=10,
>>>   # If either function raises then we can't complete the operation so
>>>   # there is no reason to make the user wait.
>>>   return_when=FIRST_EXCEPTION)
>>>
>>> db = db_future.result(timeout=0)
>>> data = data_future.result(timeout=0)
>>> save_data_in_db(data, db)
>
> It is definitely true that you can roll your own implementation using
> threads but the purpose of the futures library is to make that unnecessary.
>
> I don't understand your doubts. To me the example that I gave is simple and
> useful.

What I mean is that your example is simple enough to do with threads. Here:

[...]

def setup_db():
  nonlocal db;
  db = setup_database(host, port)

def parse_xml():
  nonlocal data;
  data = parse_big_xml(file)

db_thread = threading.Thread(target=setup_db)
db_thread.start()

parse_thread = threading.Thread(target=parse_xml)
parse_thread.start()

[...] # Do something else here.

db_thread.join()
parse_thread.join()
save_data_in_db(data, db)

I used "nonlocal" here but you usually do this within a method and
refer to self.db, self.data.


> I don't understand your doubts. To me the example that I gave is simple and 
> useful.

My doubt is about the usefulness of futures' constructs for the kind
of code that "Do several different operations on different data". I
think ThreadPool/ProcessPool is really useful when you do

1. Same operation on different data
2. Different operations on same datum

But

3. Different operations on different data

is perhaps misusing it. It is a too general use case because
dependency comes into play. What if the different operations depend on
each other? A useful thread pool construct for this would be at a more
fundamental level, e.g. Grand Central Dispatch.
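
[Dependent operations can still be expressed with the proposed API by waiting on the prerequisites before submitting the dependent call. An editorial sketch, not from the thread, with the module name modernized to the eventual stdlib location so it runs:]

```python
from concurrent.futures import ThreadPoolExecutor, wait

with ThreadPoolExecutor(max_workers=2) as executor:
    a = executor.submit(lambda: 2)   # independent operation 1
    b = executor.submit(lambda: 3)   # independent operation 2
    wait([a, b])                     # explicit dependency point
    # Submitting the dependent call only after its inputs are done keeps
    # the wait out of the workers, avoiding nested-wait deadlocks.
    total = executor.submit(lambda: a.result() + b.result())
```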

Perhaps you would give another example?

Cheers,
-- 
// aht
http://blog.onideas.ws


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-16 Thread Jesse Noller
On Sat, Jan 16, 2010 at 5:22 PM, Brian Quinlan  wrote:

> It is definitely true that you can roll your own implementation using
> threads but the purpose of the futures library is to make that unnecessary.

I'd like to stress this; futures/pools/etc are common enough patterns
(and I get requests to add more to multiprocessing) that it makes
sense as an add-on to the stdlib. This is sugar; not magic.

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-16 Thread Brian Quinlan

On 17 Jan 2010, at 01:44, Anh Hai Trinh wrote:

2. Do several different operations on different data e.g. parallelizing
code like this:

db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)

I'm trying to get a handle on how streams accommodates the second case.
With futures, I would write something like this:

db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, data)
# Maybe do something else here.
wait(
    [db_future, data_future],
    timeout=10,
    # If either function raises then we can't complete the operation so
    # there is no reason to make the user wait.
    return_when=FIRST_EXCEPTION)

db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)


For this kind of scenario, I feel `futures` and friends are not
needed. My solution is to explicit use different threads for different
operations then use join() thread to wait for a particular operation.
Threading concurrency means memory is shared and thread.join() can be
used to synchronize events.


It is definitely true that you can roll your own implementation using  
threads but the purpose of the futures library is to make that  
unnecessary.



Generally, I would be doubtful about any library that supports
parallelization of code that "do several different operations on
different data". One could have put it as "write concurrent programs",
to which the answer must be a complete concurrency model: threading,
multiprocessing, Erlang, Goroutines and CSP channels, etc.


I don't understand your doubts. To me the example that I gave is  
simple and useful.


Cheers,
Brian



Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-16 Thread Anh Hai Trinh
> 2. Do several different operations on different data e.g. parallelizing code
> like this:
>
> db = setup_database(host, port)
> data = parse_big_xml_file(request.body)
> save_data_in_db(data, db)
>
> I'm trying to get a handle on how streams accommodates the second case. With
> futures, I would write something like this:
>
> db_future = executor.submit(setup_database, host, port)
> data_future = executor.submit(parse_big_xml_file, data)
> # Maybe do something else here.
> wait(
>    [db_future, data_future],
>    timeout=10,
>    # If either function raises then we can't complete the operation so
>    # there is no reason to make the user wait.
>    return_when=FIRST_EXCEPTION)
>
> db = db_future.result(timeout=0)
> data = data_future.result(timeout=0)
> save_data_in_db(data, db)

For this kind of scenario, I feel `futures` and friends are not
needed. My solution is to explicitly use different threads for different
operations, then use thread.join() to wait for a particular operation.
Threading concurrency means memory is shared and thread.join() can be
used to synchronize events.
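The thread-plus-join() approach described above can be sketched as follows. This is a minimal illustration, not code from the thread; setup_database and parse_big_xml_file are hypothetical stand-ins for the real operations:

```python
import threading

# Hypothetical stand-ins for the two operations in the example.
def setup_database(host, port):
    return ('db-connection', host, port)

def parse_big_xml_file(body):
    return ('parsed', body)

results = {}

def run(name, fn, *args):
    results[name] = fn(*args)  # shared memory: just write into a dict

# One explicit thread per operation; join() synchronizes on completion.
t1 = threading.Thread(target=run,
                      args=('db', setup_database, 'localhost', 5432))
t2 = threading.Thread(target=run,
                      args=('data', parse_big_xml_file, '<xml/>'))
t1.start(); t2.start()
t1.join(); t2.join()

db, data = results['db'], results['data']
```

This works, but the boilerplate (a shared dict, a wrapper function, explicit start/join) is exactly what a futures-style API folds away.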

Generally, I would be doubtful about any library that supports
parallelization of code that "do several different operations on
different data". One could have put it as "write concurrent programs",
to which the answer must be a complete concurrency model: threading,
multiprocessing, Erlang, Goroutines and CSP channels, etc.

Cheers,
-- 
// aht
http://blog.onideas.ws


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-15 Thread Brian Quinlan


On 16 Jan 2010, at 00:56, Anh Hai Trinh wrote:


I'm not sure that I'd agree with the simpler API part though :-)


I was referring to your old API. Still, we are both obviously very
biased here :-p


For sure. I'm definitely used to looking at Future-style code so I  
find the model intuitive.



Does ThreadPool use some
sort of balancing strategy if poolsize were set to < len(URLs)?


Yes, of course! Otherwise it wouldn't really qualify as a pool.


"retrieve" seems to take multiple url arguments.


Correct. `retrieve` is simply a generator that retrieves URLs
sequentially; the ThreadPool distributes the input stream so that each
worker gets an iterator over its work load.


That's a neat idea - it saves you the overhead of a function call.


If delicate job control is necessary, an Executor can be used. It is
implemented on top of the pool, and offers submit(*items) which
returns job ids to be used for cancel() and status().  Jobs can be
submitted and canceled concurrently.


What type is each "item" supposed to be?


Whatever your iterator-processing function is supposed to process.
The URLs example can be written using an Executor as:

e = Executor(ThreadPool, retrieve)
e.submit(*URLs)
e.close()
print list(e.result)


There are two common scenarios where I have seen Future-like things  
used:
1. Do the same operation on different data e.g. copy some local files  
to several remote servers
2. Do several different operations on different data e.g.  
parallelizing code like this:


db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)

I'm trying to get a handle on how streams accommodates the second  
case. With futures, I would write something like this:


db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, data)
# Maybe do something else here.
wait(
    [db_future, data_future],
    timeout=10,
    # If either function raises then we can't complete the operation so
    # there is no reason to make the user wait.
    return_when=FIRST_EXCEPTION)

db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)

Cheers,
Brian




Can I wait on several items?


Do you mean wait for several particular input values to be completed?
As of this moment, yes, but rather inefficiently. I have not considered
it a useful feature, especially when taking a wholesale,
list-processing view: that a worker pool processes its input stream
_out_of_order_.  If you just want to wait for several particular
items, it means you need their outputs _in_order_, so why do you want
to use a worker pool in the first place?

However, I'd be happy to implement something like
Executor.submit(*items, wait=True).

Cheers,
aht


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-15 Thread Brett Cannon
On Fri, Jan 15, 2010 at 02:50, Anh Hai Trinh  wrote:
> Hello all,
>
> I'd like to point out an alternative module with respect to
> asynchronous computation: `stream` (which I wrote) supports
> ThreadPool, ProcessPool and Executor with a simpler API and
> implementation.
>
> My module takes a list-processing oriented view in which a
> ThreadPool/ProcessPool is simply a way of working with each stream
> element concurrently and outputting results, possibly out of order.
>
> A trivial example is:
>
>  from stream import map
>  range(10) >> ThreadPool(map(lambda x: x*x)) >> sum
>  # returns 285

I have not looked at the code at all, but the overloading of binary
shift is not going to be viewed as a good thing. I realize there is an
analogy to C++ streams, but typically Python's stdlib frowns upon
overloading operators like this beyond what a newbie would think an
operator is meant to do.

-Brett
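For readers wondering how `iterable >> ThreadPool(...)` can work at all: Python resolves it through the reflected-shift hook. A minimal sketch of the mechanism (illustrative only, not the actual `stream` implementation):

```python
class Pipe:
    """Right-hand operand for `iterable >> Pipe(fn)` (illustrative only)."""
    def __init__(self, fn):
        self.fn = fn

    def __rrshift__(self, iterable):
        # Invoked because list has no __rshift__ accepting a Pipe,
        # so Python falls back to the right operand's reflected hook.
        return self.fn(iterable)

doubled = [1, 2, 3] >> Pipe(lambda xs: [x * 2 for x in xs])
```

This is the kind of operator repurposing Brett is objecting to: it is mechanically simple, but it gives `>>` a meaning a newcomer would not expect.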

>
>
> The URLs retrieving example is:
>
>  import urllib2
>  from stream import ThreadPool
>
>  URLs = [
>     'http://www.cnn.com/',
>     'http://www.bbc.co.uk/',
>     'http://www.economist.com/',
>     'http://nonexistant.website.at.baddomain/',
>     'http://slashdot.org/',
>     'http://reddit.com/',
>     'http://news.ycombinator.com/',
>  ]
>
>  def retrieve(urls, timeout=10):
>     for url in urls:
>        yield url, urllib2.urlopen(url, timeout=timeout).read()
>
>  if __name__ == '__main__':
>     retrieved = URLs >> ThreadPool(retrieve, poolsize=len(URLs))
>     for url, content in retrieved:
>        print '%r is %d bytes' % (url, len(content))
>     for url, exception in retrieved.failure:
>        print '%r failed: %s' % (url, exception)
>
>
> Note that the main argument to ThreadPool is an iterator-processing
> function: one that takes an iterator and returns an iterator. A
> ThreadPool/ProcessPool simply distributes the input to workers running
> such function and gathers their output as a single stream.
>
> One important difference between `stream` and `futures` is the order of
> returned results.  The pool object itself is an iterable and the
> returned iterator's `next()` calls unblock as soon as there is an
> output value.  The order of output is the order of job completion,
> whereas for `futures.run_to_results()`, the order of the returned
> iterator is based on the submitted FutureList --- this means if the
> first item takes a long time to complete, subsequent processing of the
> output cannot benefit from other results already available.
>
> The other difference is that there is absolutely no abstraction but
> two bare iterables for client code to deal with: one iterable over the
> results, and one iterable over the failure; both are thread-safe.
>
> If delicate job control is necessary, an Executor can be used. It is
> implemented on top of the pool, and offers submit(*items) which
> returns job ids to be used for cancel() and status().  Jobs can be
> submitted and canceled concurrently.
>
> The documentation is available at .
>
> The code repository is located at .
> The implementation of ThreadPool, ProcessPool and Executor is little
> more than 300 lines of code.
>
>
> Peace,
>
> --
> // aht
> http://blog.onideas.ws


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-15 Thread Anh Hai Trinh
> I'm not sure that I'd agree with the simpler API part though :-)

I was referring to your old API. Still, we are both obviously very
biased here :-p

> Does ThreadPool use some
> sort of balancing strategy if poolsize were set to < len(URLs)?

Yes, of course! Otherwise it wouldn't really qualify as a pool.

> "retrieve" seems to take multiple url arguments.

Correct. `retrieve` is simply a generator that retrieves URLs
sequentially; the ThreadPool distributes the input stream so that each
worker gets an iterator over its work load.

>> If delicate job control is necessary, an Executor can be used. It is
>> implemented on top of the pool, and offers submit(*items) which
>> returns job ids to be used for cancel() and status().  Jobs can be
>> submitted and canceled concurrently.
>
> What type is each "item" supposed to be?

Whatever your iterator-processing function is supposed to process.
The URLs example can be written using an Executor as:

 e = Executor(ThreadPool, retrieve)
 e.submit(*URLs)
 e.close()
 print list(e.result)


> Can I wait on several items?

Do you mean wait for several particular input values to be completed?
As of this moment, yes, but rather inefficiently. I have not considered
it a useful feature, especially when taking a wholesale,
list-processing view: that a worker pool processes its input stream
_out_of_order_.  If you just want to wait for several particular
items, it means you need their outputs _in_order_, so why do you want
to use a worker pool in the first place?

However, I'd be happy to implement something like
Executor.submit(*items, wait=True).

Cheers,
aht


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-15 Thread Brian Quinlan


On 15 Jan 2010, at 21:50, Anh Hai Trinh wrote:


Hello all,

I'd like to point out an alternative module with respect to
asynchronous computation: `stream` (which I wrote) supports
ThreadPool, ProcessPool and Executor with a simpler API and
implementation.


Neat!

I'm not sure that I'd agree with the simpler API part though :-)


My module takes a list-processing oriented view in which a
ThreadPool/ProcessPool is simply a way of working with each stream
element concurrently and outputting results, possibly out of order.

A trivial example is:

 from stream import map
 range(10) >> ThreadPool(map(lambda x: x*x)) >> sum
 # returns 285


I think that you are probably missing an import. The equivalent using
futures would be:

from futures import ThreadPoolExecutor
sum(ThreadPoolExecutor(10).map(lambda x: x*x, range(10)))




The URLs retrieving example is:

 import urllib2
 from stream import ThreadPool

 URLs = [
'http://www.cnn.com/',
'http://www.bbc.co.uk/',
'http://www.economist.com/',
'http://nonexistant.website.at.baddomain/',
'http://slashdot.org/',
'http://reddit.com/',
'http://news.ycombinator.com/',
 ]

  def retrieve(urls, timeout=10):
     for url in urls:
        yield url, urllib2.urlopen(url, timeout=timeout).read()

  if __name__ == '__main__':
     retrieved = URLs >> ThreadPool(retrieve, poolsize=len(URLs))
     for url, content in retrieved:
        print '%r is %d bytes' % (url, len(content))
     for url, exception in retrieved.failure:
        print '%r failed: %s' % (url, exception)


Note that the main argument to ThreadPool is an iterator-processing
function: one that takes an iterator and returns an iterator. A
ThreadPool/ProcessPool simply distributes the input to workers running
such function and gathers their output as a single stream.


"retrieve" seems to take multiple url arguments. Does ThreadPool use
some sort of balancing strategy if poolsize were set to < len(URLs)?



One important difference between `stream` and `futures` is the order of
returned results.  The pool object itself is an iterable and the
returned iterator's `next()` calls unblock as soon as there is an
output value.  The order of output is the order of job completion,
whereas for `futures.run_to_results()`, the order of the returned
iterator is based on the submitted FutureList --- this means if the
first item takes a long time to complete, subsequent processing of the
output cannot benefit from other results already available.


Right, which is why futures has an as_completed() function. One
difference between the two implementations is that `stream` remembers
the arguments that it is processing while futures discards them when it
doesn't need them. This was done for memory consumption reasons, but
the `stream` approach seems to lead to simpler code.
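The as_completed() function Brian refers to yields futures in completion order, mirroring the out-of-order iteration that `stream` provides. A sketch of the usage pattern, using the names the package eventually shipped with in the stdlib as concurrent.futures (the API was still in flux at the time of this thread):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=3) as executor:
    futs = [executor.submit(square, n) for n in range(5)]
    # as_completed() yields each future as soon as it finishes,
    # not in submission order, so slow items don't block fast ones.
    results = sorted(f.result() for f in as_completed(futs))
```

The sort is only there to make the final value deterministic; the point is that iteration unblocks per completion, like `stream`'s pool iterator.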




The other difference is that there is absolutely no abstraction but
two bare iterables for client code to deal with: one iterable over the
results, and one iterable over the failure; both are thread-safe.

If delicate job control is necessary, an Executor can be used. It is
implemented on top of the pool, and offers submit(*items) which
returns job ids to be used for cancel() and status().  Jobs can be
submitted and canceled concurrently.


What type is each "item" supposed to be?

Can I wait on several items? What if they are created by different  
executors?


Cheers,
Brian

The documentation is available at .


The code repository is located at .
The implementation of ThreadPool, ProcessPool and Executor is little
more than 300 lines of code.


Peace,

--
// aht
http://blog.onideas.ws




Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-15 Thread Anh Hai Trinh
Hello all,

I'd like to point out an alternative module with respect to
asynchronous computation: `stream` (which I wrote) supports
ThreadPool, ProcessPool and Executor with a simpler API and
implementation.

My module takes a list-processing oriented view in which a
ThreadPool/ProcessPool is simply a way of working with each stream
element concurrently and outputting results, possibly out of order.

A trivial example is:

  from stream import map
  range(10) >> ThreadPool(map(lambda x: x*x)) >> sum
  # returns 285


The URLs retrieving example is:

  import urllib2
  from stream import ThreadPool

  URLs = [
 'http://www.cnn.com/',
 'http://www.bbc.co.uk/',
 'http://www.economist.com/',
 'http://nonexistant.website.at.baddomain/',
 'http://slashdot.org/',
 'http://reddit.com/',
 'http://news.ycombinator.com/',
  ]

  def retrieve(urls, timeout=10):
     for url in urls:
        yield url, urllib2.urlopen(url, timeout=timeout).read()

  if __name__ == '__main__':
     retrieved = URLs >> ThreadPool(retrieve, poolsize=len(URLs))
     for url, content in retrieved:
        print '%r is %d bytes' % (url, len(content))
     for url, exception in retrieved.failure:
        print '%r failed: %s' % (url, exception)


Note that the main argument to ThreadPool is an iterator-processing
function: one that takes an iterator and returns an iterator. A
ThreadPool/ProcessPool simply distributes the input to workers running
such function and gathers their output as a single stream.

One important difference between `stream` and `futures` is the order of
returned results.  The pool object itself is an iterable and the
returned iterator's `next()` calls unblock as soon as there is an
output value.  The order of output is the order of job completion,
whereas for `futures.run_to_results()`, the order of the returned
iterator is based on the submitted FutureList --- this means if the
first item takes a long time to complete, subsequent processing of the
output cannot benefit from other results already available.

The other difference is that there is absolutely no abstraction but
two bare iterables for client code to deal with: one iterable over the
results, and one iterable over the failure; both are thread-safe.

If delicate job control is necessary, an Executor can be used. It is
implemented on top of the pool, and offers submit(*items) which
returns job ids to be used for cancel() and status().  Jobs can be
submitted and canceled concurrently.

The documentation is available at .

The code repository is located at .
The implementation of ThreadPool, ProcessPool and Executor is little
more than 300 lines of code.


Peace,

-- 
// aht
http://blog.onideas.ws


Re: [stdlib-sig] futures - a new package for asynchronous execution

2010-01-14 Thread Brian Quinlan

Hi all,

I've updated the implementation based on the feedback that I've  
received and the updated documentation of the API is here:

http://sweetapp.com/futures-pep/

If you are still interested, please take a look and let me know what  
you think..


Cheers,
Brian


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-29 Thread Antoine Pitrou

> It feels hacky. Getting the result doesn't feel so special that it  
> deserves to be a call rather than a simple getter.

Well, it is special since the whole point of a future is to get that
result. Like the whole point of a weakref is to get the underlying
object.
Of course this is pretty much bikeshedding...

> It would be great if other people tested the API. I'm not sure what  
> you mean by "mature" though.

What I mean is that it would be nice if it got reviewed, tested and
criticized by actual users. I have not looked at the implementation
though.

> No there isn't. That's a good point though. I wonder if futures will
> tend to be long-lived after their results are available?

It's hard to tell without anyone actually using them, but for advanced
uses I suspect that futures may become more or less long-lived objects
(like Deferreds are :-)).
In a Web spider example, you could have a bunch of futures representing
pending or completed HTTP fetches, and a worker thread processing the
results on the fly when each of those futures gets ready. If the
processing is non-trivial (or if it involves say a busy database) the
worker thread could get quite a bit behind the completion of HTTP
requests.

Twisted has a whole machinery for that in its specialized "Failure"
class, so as to keep the traceback information in string representation
and at the same time relinquish all references to the frames involved in
the traceback. I'm not sure we need the same degree of sophistication
but we should keep in mind that it's a potential problem.

(actually, perhaps this would deserve built-in support in the
interpreter)
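The stdlib already offers a lightweight version of the trick Antoine describes: format the traceback into a string at catch time, so the frame objects it references can be released. A minimal sketch:

```python
import traceback

def capture_failure():
    try:
        1 / 0
    except ZeroDivisionError:
        # Stringify the traceback now; the frame objects it references
        # are released once we leave the except block, instead of being
        # kept alive for as long as the future holds the exception.
        return traceback.format_exc()

text = capture_failure()
```

A future could store such a string alongside (or instead of) the live exception to avoid pinning frames for the lifetime of a long-lived future.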

> > The use of "with" here still is counter-intuitive, because it does not
> > clean up resources immediately as it would seem to do. "with" is always
> > synchronous in other situations.
> 
> Maybe waiting until all pending futures are done executing would be
> better.

I think it would be better indeed. At least it would be more in line
with the other uses of context managers.
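This is in fact the behaviour the package eventually adopted in the stdlib's concurrent.futures: leaving the with-block calls shutdown(wait=True), so the exit is synchronous with respect to all pending futures. A sketch demonstrating the semantics (the 0.05s sleep is just an arbitrary delay for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

results = []

def slow_task(n):
    time.sleep(0.05)
    results.append(n)

with ThreadPoolExecutor(max_workers=2) as executor:
    for n in range(4):
        executor.submit(slow_task, n)

# __exit__ called shutdown(wait=True), so every submitted task has
# finished by the time control reaches this line.
finished = len(results)
```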

Regards

Antoine.




Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-27 Thread Brian Quinlan

Hey Antoine,

Sorry for not getting back to you sooner - I actually thought that I  
did reply but I see now that I didn't.


On 14 Nov 2009, at 01:22, Antoine Pitrou wrote:



Hey,


Future remains unchanged - I disagree that Deferreds would be better,
that .exception() is not useful, and that .result() should be
renamed .get() or .__call__().


On what grounds do you disagree with the latter?


It feels hacky. Getting the result doesn't feel so special that it  
deserves to be a call rather than a simple getter.



Another question: is the caught exception an attribute of the future?


Yes


If
so, is there any mechanism to clean it up (and its traceback) once the
future has been "consumed"?


No there isn't. That's a good point though. I wonder if futures will
tend to be long-lived after their results are available?





map becomes a utility function:

def map(executor, *iterables, timeout=None)


Why? map() can be defined on the ABC, so that subclasses don't have to
provide their own implementation.

A utility function which looks like a method and shadows the name of a
built-in looks like a bad choice to me.


Good point.


wait becomes a utility function that can wait on any iterable of
Futures:

def wait(futures, return_when=ALL_COMPLETED)


Does it work if the futures are executed by different executors?
If not, it should be an Executor method.


Yes, it does.



return_when indicates when the method should return. It must be one of
the following constants:

NEXT_COMPLETED
NEXT_EXCEPTION
ALL_COMPLETED


Can you outline the difference between NEXT_COMPLETED and
NEXT_EXCEPTION? What happens if I ask for NEXT_COMPLETED but the next
future to complete raises an exception? etc.


NEXT_COMPLETED includes futures that raise. Completed in this sense  
means "done running".





def itercompleted(futures, timeout=None):

Returns an iterator that returns a completed Future from the given
list when __next__() is called. If no Futures are completed when
__next__() is called then __next__() waits until one does complete.


What about futures which complete with an exception?


They are included.



with futures.ThreadPoolExecutor(50) as executor:
  fs = [executor.submit(load_url, url, timeout=30) for url in URLS]


The use of "with" here still is counter-intuitive, because it does not
clean up resources immediately as it would seem to do. "with" is always
synchronous in other situations.


Maybe waiting until all pending futures are done executing would be  
better.





What do you think? Are we moving in the right direction?


Perhaps, yes, but there are still lots of dark areas.

Besides, it's obvious that the package has to mature, and should be
tested by other people.



It would be great if other people tested the API. I'm not sure what  
you mean by "mature" though.


Cheers,
Brian



Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-13 Thread Antoine Pitrou

Hey,

> Future remains unchanged - I disagree that Deferreds would be better,  
> that .exception() is not useful, and that .result() should be  
> renamed .get() or .__call__().

On what grounds do you disagree with the latter?

Another question: is the caught exception an attribute of the future? If
so, is there any mechanism to clean it up (and its traceback) once the
future has been "consumed"?

> map becomes a utility function:
> 
> def map(executor, *iterables, timeout=None)

Why? map() can be defined on the ABC, so that subclasses don't have to
provide their own implementation.

A utility function which looks like a method and shadows the name of a
built-in looks like a bad choice to me.

> wait becomes a utility function that can wait on any iterable of  
> Futures:
> 
> def wait(futures, return_when=ALL_COMPLETED)

Does it work if the futures are executed by different executors?
If not, it should be an Executor method.

> return_when indicates when the method should return. It must be one of  
> the following constants:
> 
>  NEXT_COMPLETED
>  NEXT_EXCEPTION
>  ALL_COMPLETED

Can you outline the difference between NEXT_COMPLETED and
NEXT_EXCEPTION? What happens if I ask for NEXT_COMPLETED but the next
future to complete raises an exception? etc.

> def itercompleted(futures, timeout=None):
> 
> Returns an iterator that returns a completed Future from the given
> list when __next__() is called. If no Futures are completed when
> __next__() is called then __next__() waits until one does complete.

What about futures which complete with an exception?

> with futures.ThreadPoolExecutor(50) as executor:
>fs = [executor.submit(load_url, url, timeout=30) for url in URLS]

The use of "with" here still is counter-intuitive, because it does not
clean up resources immediately as it would seem to do. "with" is always
synchronous in other situations.

> What do you think? Are we moving in the right direction?

Perhaps, yes, but there are still lots of dark areas.

Besides, it's obvious that the package has to mature, and should be
tested by other people.




Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-12 Thread Jeffrey Yasskin
On Thu, Nov 12, 2009 at 10:13 PM, Brian Quinlan  wrote:
>
> On Nov 13, 2009, at 4:27 PM, Jeffrey Yasskin wrote:
>
>> On Thu, Nov 12, 2009 at 9:19 PM, Brian Quinlan  wrote:
>>>
>>> On Nov 8, 2009, at 6:37 AM, Jeffrey Yasskin wrote:

 --- More general points ---

 ** Java's Futures made a mistake in not supporting work stealing, and
 this has caused deadlocks at Google. Specifically, in a bounded-size
 thread or process pool, when a task in the pool can wait for work
 running in the same pool, you can fill up the pool with tasks that are
 waiting for tasks that haven't started running yet. To avoid this,
 Future.get() should be able to steal the task it's waiting on out of
 the pool's queue and run it immediately.
>>>
>>> Hey Jeff,
>>>
>>> I understand the deadlock possibilities of the executor model, could you
>>> explain how your proposal would work?
>>>
>>> Would it be some sort of flag on the Future.get method e.g.
>>> Future.get(timeout=None, immediate_execution=False)?
>>
>> I don't think a flag is the way to go at first glance, although there
>> could be upsides I haven't thought of. Here's what I had in mind:
>>
>> After I call "fut = executor.submit(task)", the task can be in 3
>> states: queued, running, and finished. The simplest deadlock happens
>> in a 1-thread pool when the running thread calls fut.result(), and the
>> task is queued on the same pool. So instead of just waiting for the
>> task to finish running, the current thread atomically (checks what
>> state it's in, and if it's queued, marks it as stolen instead) and
>> calls it in the current thread. When a stolen task gets to the front
>> of its queue and starts running, it just acts like a no-op.
>>
>> This can't introduce any new lock-order deadlocks, but it can be
>> observable if the task looks at thread-local variables.
>
> So you have something like this:
>
> def Future.result(self, timeout=None):
>   with some_lock:  # would have to think about locking here
>     do_work_locally = (threading.current_thread in self._my_executor.threads and
>         self._my_executor.free_threads == 0 and
>         timeout is None)

You can deadlock from a cycle between multiple pools, too, so it's
probably a bad idea to limit it to only steal if self is one of the
pool's threads, and there's no real reason to limit the stealing to
when there are exactly 0 waiting threads. Depending on the internal
implementation, Future.result() might look something like (untested,
sorry if there are obvious bugs):

class Future:
  def __init__(self, f, args, kwargs):
    self.f, self.args, self.kwargs = f, args, kwargs
    self.state, self.lock = QUEUED, Executor.Lock()

  def run(self):
    with self.lock:
      if self.state != QUEUED: return
      self.state = RUNNING
    self._result = self.f(*self.args, **self.kwargs)
    with self.lock:
      self.state = DONE
      self.notify()

  def result(self, timeout=None):
    if timeout is None:  # Good catch.
      self.run()
    self.wait(timeout)
    return self._result

> That's pretty clever. Some things that I don't like:
> 1. it might only be applicable to executors using a thread pool so people
>    shouldn't count on it (but maybe only thread pool executors have this
>    deadlock problem so it doesn't matter?)

Process pools have the same deadlock problem, unless it's impossible
for a task in a process pool to hold a reference to the same pool, or
another pool whose tasks have a reference to the first one?

> 2. it makes the implementation of Future dependent on the executor that
>     created it - but maybe that's OK too, Future can be an ABC and
>     executor implementations that need custom Futures can subclass it

It makes some piece dependent on the executor, although not
necessarily the whole Future. For example, the run() method above
could be wrapped into a Task class that only knows how to mark work
stolen and run stuff locally.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-12 Thread Brian Quinlan


On Nov 13, 2009, at 4:27 PM, Jeffrey Yasskin wrote:

On Thu, Nov 12, 2009 at 9:19 PM, Brian Quinlan   
wrote:

On Nov 8, 2009, at 6:37 AM, Jeffrey Yasskin wrote:


--- More general points ---

** Java's Futures made a mistake in not supporting work stealing, and
this has caused deadlocks at Google. Specifically, in a bounded-size
thread or process pool, when a task in the pool can wait for work
running in the same pool, you can fill up the pool with tasks that are
waiting for tasks that haven't started running yet. To avoid this,
Future.get() should be able to steal the task it's waiting on out of
the pool's queue and run it immediately.


Hey Jeff,

I understand the deadlock possibilities of the executor model, could
you explain how your proposal would work?

Would it be some sort of flag on the Future.get method e.g.
Future.get(timeout=None, immediate_execution=False)?


I don't think a flag is the way to go at first glance, although there
could be upsides I haven't thought of. Here's what I had in mind:

After I call "fut = executor.submit(task)", the task can be in 3
states: queued, running, and finished. The simplest deadlock happens
in a 1-thread pool when the running thread calls fut.result(), and the
task is queued on the same pool. So instead of just waiting for the
task to finish running, the current thread atomically (checks what
state it's in, and if it's queued, marks it as stolen instead) and
calls it in the current thread. When a stolen task gets to the front
of its queue and starts running, it just acts like a no-op.

This can't introduce any new lock-order deadlocks, but it can be
observable if the task looks at thread-local variables.


So you have something like this:

def Future.result(self, timeout=None):
  with some_lock:  # would have to think about locking here
    do_work_locally = (threading.current_thread in self._my_executor.threads and
        self._my_executor.free_threads == 0 and
        timeout is None)

That's pretty clever. Some things that I don't like:
1. it might only be applicable to executors using a thread pool so people
   shouldn't count on it (but maybe only thread pool executors have this
   deadlock problem so it doesn't matter?)
2. it makes the implementation of Future dependent on the executor that
   created it - but maybe that's OK too, Future can be an ABC and
   executor implementations that need custom Futures can subclass it


Cheers,
Brian


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-12 Thread Jeffrey Yasskin
On Thu, Nov 12, 2009 at 9:19 PM, Brian Quinlan  wrote:
> On Nov 8, 2009, at 6:37 AM, Jeffrey Yasskin wrote:
>>
>> --- More general points ---
>>
>> ** Java's Futures made a mistake in not supporting work stealing, and
>> this has caused deadlocks at Google. Specifically, in a bounded-size
>> thread or process pool, when a task in the pool can wait for work
>> running in the same pool, you can fill up the pool with tasks that are
>> waiting for tasks that haven't started running yet. To avoid this,
>> Future.get() should be able to steal the task it's waiting on out of
>> the pool's queue and run it immediately.
>
> Hey Jeff,
>
> I understand the deadlock possibilities of the executor model, could you
> explain how your proposal would work?
>
> Would it be some sort of flag on the Future.get method e.g.
> Future.get(timeout=None, immediate_execution=False)?

I don't think a flag is the way to go at first glance, although there
could be upsides I haven't thought of. Here's what I had in mind:

After I call "fut = executor.submit(task)", the task can be in 3
states: queued, running, and finished. The simplest deadlock happens
in a 1-thread pool when the running thread calls fut.result(), and the
task is queued on the same pool. So instead of just waiting for the
task to finish running, the current thread atomically (checks what
state it's in, and if it's queued, marks it as stolen instead) and
calls it in the current thread. When a stolen task gets to the front
of its queue and starts running, it just acts like a no-op.

This can't introduce any new lock-order deadlocks, but it can be
observable if the task looks at thread-local variables.
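[Editor's note: a minimal model of the queued/stolen transition described above. Toy code for illustration, not the eventual concurrent.futures implementation.]

```python
import threading

QUEUED, STOLEN, RUNNING, FINISHED = range(4)

class StealableTask:
    def __init__(self, fn):
        self.fn = fn
        self.state = QUEUED
        self.result = None
        self._lock = threading.Lock()
        self._done = threading.Event()

    def run_from_pool(self):
        # Worker-side entry point: a stolen task acts like a no-op.
        with self._lock:
            if self.state != QUEUED:
                return
            self.state = RUNNING
        self._finish()

    def wait_for_result(self):
        # Waiter-side: atomically check the state and, if still queued,
        # mark it stolen and run it on the current thread instead of
        # blocking on a worker that may never become free.
        with self._lock:
            stolen = self.state == QUEUED
            if stolen:
                self.state = STOLEN
        if stolen:
            self._finish()
        self._done.wait()
        return self.result

    def _finish(self):
        self.result = self.fn()
        self.state = FINISHED
        self._done.set()

task = StealableTask(lambda: 'ran')
print(task.wait_for_result())  # 'ran' -- executed by the waiter (stolen)
task.run_from_pool()           # no-op: the pool sees it was already taken
```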


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-12 Thread Brian Quinlan

On Nov 8, 2009, at 6:37 AM, Jeffrey Yasskin wrote:

--- More general points ---

** Java's Futures made a mistake in not supporting work stealing, and
this has caused deadlocks at Google. Specifically, in a bounded-size
thread or process pool, when a task in the pool can wait for work
running in the same pool, you can fill up the pool with tasks that are
waiting for tasks that haven't started running yet. To avoid this,
Future.get() should be able to steal the task it's waiting on out of
the pool's queue and run it immediately.


Hey Jeff,

I understand the deadlock possibilities of the executor model, could
you explain how your proposal would work?


Would it be some sort of flag on the Future.get method e.g.  
Future.get(timeout=None, immediate_execution=False)?


Cheers,
Brian


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-12 Thread Jeffrey Yasskin
I am very happy with those changes.

I think deadlock should be addressed before the first release as it
changes the detailed semantics of some of the operations, but you've
promised to do that, so cool. :)

I think it's fine to leave the embedding of Deferred-like things into
futures and the embedding of futures into Deferred-like things until a
later release. I expect it to be requested, but I don't think it'll be
hard to add later.

On Thu, Nov 12, 2009 at 6:38 PM, Brian Quinlan  wrote:
> Hey all,
>
> I compiled a summary of people's feedback (about technical issues - I agree
> that the docs could be better but agreeing on the API seems like the first
> step) and have some API change proposals.
>
> Here is a summary of the feedback:
> - Use Twisted Deferreds rather than Futures
> - The API is too complex
> - Make Future a callable and drop the .result()/.exception() methods
> - Remove .wait() from Executor
> - Make it easy to process results in the order of completion rather than in
> the order that the futures were generated
> - Executor context managers should wait until their workers complete before
> exiting
> - Extract Executor.map, etc. into separate functions/modules
> - FutureList has too many methods or is not necessary
> - Executor should have an easy way to produce a single future
> - Should be able to wait on an arbitrary list of futures
> - Should have a way of avoiding deadlock (will follow-up on this separately)
>
> Here is what I suggest as far as API changes (the docs suck, I'll polish
> them when we reach consensus):
>
> FutureList is eliminated completely.
>
> Future remains unchanged - I disagree that Deferreds would be better, that
> .exception() is not useful, and that .result() should be renamed .get() or
> .__call__(). But I am easily persuadable :-)
>
> The Executor ABC is simplified to only contain a single method:
>
> def Executor.submit(self, fn, *args, **kwargs):
>
> Submits a call for execution and returns a Future representing the pending
> results of fn(*args, **kwargs)
>
> map becomes a utility function:
>
> def map(executor, *iterables, timeout=None)
>
> Equivalent to map(func, *iterables) but executed asynchronously and possibly
> out-of-order. The returned iterator raises a TimeoutError if __next__() is
> called and the result isn’t available after timeout seconds from the
> original call to map(). If timeout is not specified or None then
> there is no limit to the wait time. If a call raises an exception then that
> exception will be raised when its value is retrieved from the iterator.
>
> wait becomes a utility function that can wait on any iterable of Futures:
>
> def wait(futures, timeout=None, return_when=ALL_COMPLETED)
>
> Wait until the given condition is met for the given futures. This method
> should always be called using keyword arguments, which are:
>
> timeout can be used to control the maximum number of seconds to wait before
> returning. If timeout is not specified or None then there is no limit to the
> wait time.
>
> return_when indicates when the method should return. It must be one of the
> following constants:
>
>    NEXT_COMPLETED
>    NEXT_EXCEPTION
>    ALL_COMPLETED
>
> a new utility function is added that iterates over the given Futures and
> returns them as they are completed:
>
> def itercompleted(futures, timeout=None):
>
> Returns an iterator that returns a completed Future from the given list when
> __next__() is called. If no Futures are completed when __next__() is called
> then it waits until one does complete. Raises a TimeoutError if
> __next__() is called and no completed future is available after timeout
> seconds from the original call.
>
> The URL loading example becomes:
>
> import functools
> import urllib.request
> import futures
>
> URLS = ['http://www.foxnews.com/',
>        'http://www.cnn.com/',
>        'http://europe.wsj.com/',
>        'http://www.bbc.co.uk/',
>        'http://some-made-up-domain.com/']
>
> def load_url(url, timeout):
>    return urllib.request.urlopen(url, timeout=timeout).read()
>
> with futures.ThreadPoolExecutor(50) as executor:
>     fs = dict((executor.submit(load_url, url, timeout=30), url)
>               for url in URLS)
>
> for future in futures.itercompleted(fs):
>     url = fs[future]
>     if future.exception() is not None:
>         print('%r generated an exception: %s' % (url, future.exception()))
>     else:
>         print('%r page is %d bytes' % (url, len(future.result())))
>
> What do you think? Are we moving in the right direction?
>
> Cheers,
> Brian
>
>



-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-12 Thread Brian Quinlan

Hey all,

I compiled a summary of people's feedback (about technical issues - I  
agree that the docs could be better but agreeing on the API seems like  
the first step) and have some API change proposals.


Here is a summary of the feedback:
- Use Twisted Deferreds rather than Futures
- The API is too complex
- Make Future a callable and drop the .result()/.exception() methods
- Remove .wait() from Executor
- Make it easy to process results in the order of completion rather  
than in the order that the futures were generated
- Executor context managers should wait until their workers complete  
before exiting

- Extract Executor.map, etc. into separate functions/modules
- FutureList has too many methods or is not necessary
- Executor should have an easy way to produce a single future
- Should be able to wait on an arbitrary list of futures
- Should have a way of avoiding deadlock (will follow-up on this  
separately)


Here is what I suggest as far as API changes (the docs suck, I'll  
polish them when we reach consensus):


FutureList is eliminated completely.

Future remains unchanged - I disagree that Deferreds would be better,  
that .exception() is not useful, and that .result() should be  
renamed .get() or .__call__(). But I am easily persuadable :-)


The Executor ABC is simplified to only contain a single method:

def Executor.submit(self, fn, *args, **kwargs):

Submits a call for execution and returns a Future representing the  
pending results of fn(*args, **kwargs)


map becomes a utility function:

def map(executor, *iterables, timeout=None)

Equivalent to map(func, *iterables) but executed asynchronously and  
possibly out-of-order. The returned iterator raises a TimeoutError if  
__next__() is called and the result isn’t available after timeout  
seconds from the original call to map(). If timeout is not
specified or None then there is no limit to the wait time. If a call  
raises an exception then that exception will be raised when its value  
is retrieved from the iterator.
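[Editor's note: a sketch of map-as-a-utility-function, expressed over today's concurrent.futures module. One simplification: this version applies the timeout per result rather than from the original call, and the name map_ is invented to avoid shadowing the builtin.]

```python
import concurrent.futures

def map_(executor, func, *iterables, timeout=None):
    # Submit every call up front, then yield results in argument order:
    # out-of-order *execution*, in-order *results*.
    fs = [executor.submit(func, *args) for args in zip(*iterables)]
    for f in fs:
        yield f.result(timeout)

with concurrent.futures.ThreadPoolExecutor(2) as executor:
    squares = list(map_(executor, pow, [2, 3, 4], [2, 2, 2]))
print(squares)  # [4, 9, 16]
```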


wait becomes a utility function that can wait on any iterable of  
Futures:


def wait(futures, timeout=None, return_when=ALL_COMPLETED)

Wait until the given condition is met for the given futures. This  
method should always be called using keyword arguments, which are:


timeout can be used to control the maximum number of seconds to wait  
before returning. If timeout is not specified or None then there is no  
limit to the wait time.


return_when indicates when the method should return. It must be one of  
the following constants:


    NEXT_COMPLETED
    NEXT_EXCEPTION
    ALL_COMPLETED
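[Editor's note: a sketch of how wait() can cover an arbitrary iterable of futures without polling or sleeping -- each future notifies a shared Event when it finishes. The toy Future here is invented for the sketch, not the proposed API.]

```python
import threading

NEXT_COMPLETED, NEXT_EXCEPTION, ALL_COMPLETED = range(3)

class Future:
    """Toy future: just enough state for wait() below."""
    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self._exception = None
        self._waiters = []          # Events to set when this future finishes

    def finish(self, exception=None):
        with self._lock:
            self._done, self._exception = True, exception
            waiters, self._waiters = self._waiters, []
        for event in waiters:
            event.set()

def wait(fs, timeout=None, return_when=ALL_COMPLETED):
    """Return the list of completed futures once the condition holds."""
    event = threading.Event()
    for f in fs:
        with f._lock:
            if f._done:
                event.set()
            else:
                f._waiters.append(event)

    def satisfied():
        done = [f for f in fs if f._done]
        if return_when == NEXT_COMPLETED:
            return bool(done)
        if return_when == NEXT_EXCEPTION:
            return any(f._exception for f in done) or len(done) == len(fs)
        return len(done) == len(fs)

    while not satisfied():
        if not event.wait(timeout):
            break                   # timed out; return what we have
        event.clear()
    return [f for f in fs if f._done]
```

Because completion notifies the Event, the waiter blocks on real synchronization rather than a sleep loop.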

a new utility function is added that iterates over the given Futures
and returns them as they are completed:


def itercompleted(futures, timeout=None):

Returns an iterator that returns a completed Future from the given
list when __next__() is called. If no Futures are completed when
__next__() is called then it waits until one does complete. Raises a
TimeoutError if __next__() is called and no completed future is
available after timeout seconds from the original call.
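[Editor's note: the proposed itercompleted() eventually shipped in the stdlib as concurrent.futures.as_completed(); a short sketch showing the equivalence. The name itercompleted below is from the proposal, not the shipped module.]

```python
import concurrent.futures

def itercompleted(fs, timeout=None):
    # as_completed() matches the proposed semantics: yield each future
    # as it finishes, raising TimeoutError if the next one is not ready
    # within `timeout` seconds of the original call.
    return concurrent.futures.as_completed(fs, timeout=timeout)

with concurrent.futures.ThreadPoolExecutor(4) as executor:
    fs = [executor.submit(pow, 2, n) for n in range(5)]
    results = sorted(f.result() for f in itercompleted(fs))
print(results)  # [1, 2, 4, 8, 16]
```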


The URL loading example becomes:

import functools
import urllib.request
import futures

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

with futures.ThreadPoolExecutor(50) as executor:
    fs = dict((executor.submit(load_url, url, timeout=30), url)
              for url in URLS)

for future in futures.itercompleted(fs):
    url = fs[future]
    if future.exception() is not None:
        print('%r generated an exception: %s' % (url, future.exception()))
    else:
        print('%r page is %d bytes' % (url, len(future.result())))

What do you think? Are we moving in the right direction?

Cheers,
Brian



Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-08 Thread Brian Quinlan

Hey everyone,

Thanks for all the great feedback!

I'm going to compile everyone's feedback and then send out a list of  
proposed changes. In the meantime, more discussion is welcome :-)


Cheers,
Brian



Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-08 Thread Brian Quinlan


On Nov 8, 2009, at 7:01 PM, Jeffrey Yasskin wrote:

Did you mean to drop the list? Feel free to cc them back in when you  
reply.


No, that was a brain malfunction. Redirecting the discussion to the  
list.


On Sat, Nov 7, 2009 at 3:31 PM, Brian Quinlan   
wrote:


On 8 Nov 2009, at 06:37, Jeffrey Yasskin wrote:

On Sat, Nov 7, 2009 at 7:32 AM, Jesse Noller   
wrote:


On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou wrote:


Which API? My comment wasn't aimed at the API of the package -  
in the
time I got to scan it last night nothing jumped out at me as  
overly

offensive API-wise.


Not offensive, but probably too complicated if it's meant to be  
a simple

helper. Anyway, let's wait for the PEP.



The PEP is right here:

http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt

I'm interested in hearing specific complaints about the API in the
context of what it's trying to *do*. The only thing which jumped  
out

at me was the number of methods on FutureList; but then again, each
one of those makes conceptual sense, even if they are verbose -
they're explicit on what's being done.


Overall, I like the idea of having futures in the standard library,
and I like the idea of pulling common bits of multiprocessing and
threading into a concurrent.* package. Here's my
stream-of-consciousness review of the PEP. I'll try to ** things  
that

really affect the API.

The "Interface" section should start with a conceptual description  
of
what Executor, Future, and FutureList are. Something like "An  
Executor

is an object you can hand tasks to, which will run them for you,
usually in another thread or process. A Future represents a task  
that
may or may not have completed yet, and which can be waited for and  
its

value or exception queried. A FutureList is ... ."

** The Executor interface is pretty redundant, and it's missing the
most basic call. Fundamentally, all you need is an
Executor.execute(callable) method returning None,


How do you extract the results?


To implement submit in terms of execute, you write something like:

def submit(executor, callable):
    future = Future()
    def worker():
        try:
            result = callable()
        except:
            future.set_exception(sys.exc_info())
        else:
            future.set_value(result)
    executor.execute(worker)
    return future


I see. I'm not sure if that abstraction is useful but I get it now.
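[Editor's note: filled out into a self-contained, runnable sketch. The ExecuteOnlyExecutor and Future classes here are invented stand-ins showing that submit() really can be built on nothing but execute().]

```python
import sys
import threading

class ExecuteOnlyExecutor:
    """Implements nothing but execute(callable) -> None."""
    def execute(self, fn):
        threading.Thread(target=fn).start()

class Future:
    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._exc_info = None

    def set_value(self, value):
        self._value = value
        self._done.set()

    def set_exception(self, exc_info):
        self._exc_info = exc_info
        self._done.set()

    def result(self):
        self._done.wait()
        if self._exc_info is not None:
            raise self._exc_info[1]
        return self._value

def submit(executor, fn):
    # The construction above: submit() built purely on top of execute().
    future = Future()
    def worker():
        try:
            result = fn()
        except:
            future.set_exception(sys.exc_info())
        else:
            future.set_value(result)
    executor.execute(worker)
    return future

print(submit(ExecuteOnlyExecutor(), lambda: 21 * 2).result())  # 42
```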


and all the
future-oriented methods can be built on top of that. I'd support  
using

Executor.submit(callable) for the simplest method instead, which
returns a Future, but there should be some way for implementers to
only implement execute() and get submit either for free or with a
1-line definition. (I'm using method names from

http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
in case I'm unclear about the semantics here.) run_to_futures,
run_to_results, and map should be implementable on top of the Future
interface, and shouldn't need to be methods on Executor. I'd  
recommend

they be demoted to helper functions in the concurrent.futures module
unless there's a reason they need to be methods, and that reason
should be documented in the PEP.

** run_to_futures() shouldn't take a return_when argument. It should
be possible to wait for those conditions on any list of Futures.
(_not_ just a FutureList)


I packaged up Futures into FutureLists to fix an annoyance that I  
have with

the Java implementation - you have all of these Future objects but no
convenient way of operating over them.


Yep, I totally agree with that annoyance. Note, though, that Java has
the CompletionService to support nearly same use cases as
run_to_futures.


CompletionService's use case is handling results as they finish (just  
like the callbacks do in Deferreds).


The FutureList use case is querying e.g. which callables raised, which  
returned, which are still running?



I made the FutureList the unit of waiting because:
1. I couldn't think of a use case where this wasn't sufficient


Take your webcrawl example. In a couple years, when Futures are widely
accepted, it's quite possible that urllib.request.urlopen() will
return a Future instead of a file. Then I'd like to request a bunch of
URLs and process each as they come back. With the run_to_futures (or
CompletionService) API, urllib would instead have to take a set of
requests to open at once, which makes its API much harder to design.
With a wait-for-any function, urllib could continue to return a single
Future and let its users combine several results.


If we go down this road then we should just switch to Twisted :-)

Seriously, the idea is that no one would ever change their API to  
accommodate futures - they are a way of making a library with no  
notion of concurrency concurrent.


But I am starting to be convinced that individual futures are a good  
idea because it makes the run/submit method easier to use.



Alternately, say you have an RPC system returning Futures. You've sent
off RPCs 

Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Brett Cannon
On Sat, Nov 7, 2009 at 15:02, Antoine Pitrou  wrote:
>
>> They also have not stepped forward for inclusion. A point needs to be
>> made that Brian has working code and has come forward to contribute
>> it.
>
> I do hope that the code and its API get exercised before being included,
> though. Looking at the Google Code page, the project has only existed
> since May 2009, and there isn't even a mailing-list.
>

That's why we are discussing it now. But as Jesse mentioned, this is
simple stuff that he was already planning to add to multiprocessing,
so it isn't that radical or controversial of an approach.

> Remember, once we release a Python version with that code in it, we
> can't change the API very easily.

As the author of PEP 3108, I am acutely aware of that fact. =)

-Brett


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Antoine Pitrou

> They also have not stepped forward for inclusion. A point needs to be
> made that Brian has working code and has come forward to contribute
> it.

I do hope that the code and its API get exercised before being included,
though. Looking at the Google Code page, the project has only existed
since May 2009, and there isn't even a mailing-list.

Remember, once we release a Python version with that code in it, we
can't change the API very easily.




Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Brett Cannon
On Sat, Nov 7, 2009 at 11:36, Jesse Noller  wrote:
>
>
> On Nov 7, 2009, at 2:19 PM, Laura Creighton  wrote:
>
>> Anybody here looked at Kamaelia? http://www.kamaelia.org/
>> I haven't, but wonder if we are considering re-inventing the
>> wheel.
>>
>> Laura
>
> I have, and no - it's not what we're discussing. Kamaelia is a framework,
> really. It's also GPL.

They also have not stepped forward for inclusion. A point needs to be
made that Brian has working code and has come forward to contribute
it.

It also should be noted that the concepts are extremely simple here, as
managing pools of threads and processes is not hard to grasp (a
definite knock against Twisted, as their Deferred approach definitely
hurts some people's heads no matter how much Antoine loves them =).
And especially important, the proposed solution is in pure Python and
does not rely exclusively on multiprocessing, thus avoids alienating
other VMs beyond CPython.

>>
>
> Inclusion of kamaelia, or twisted wholesale won't occur any time soon.

I would replace "any time soon" with "ever". Both projects are massive
and probably don't want to be locked into our development cycle. Both
also have their own development teams and culture. Adding a single
module or package by a single author is enough challenge as it is, but
bringing in another team would be more than challenging.

-Brett


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jeffrey Yasskin
On Sat, Nov 7, 2009 at 7:32 AM, Jesse Noller  wrote:
> On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou  wrote:
>>
>>> Which API? My comment wasn't aimed at the API of the package - in the
>>> time I got to scan it last night nothing jumped out at me as overly
>>> offensive API-wise.
>>
>> Not offensive, but probably too complicated if it's meant to be a simple
>> helper. Anyway, let's wait for the PEP.
>
>
> The PEP is right here:
>
> http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt
>
> I'm interested in hearing specific complaints about the API in the
> context of what it's trying to *do*. The only thing which jumped out
> at me was the number of methods on FutureList; but then again, each
> one of those makes conceptual sense, even if they are verbose -
> they're explicit on what's being done.

Overall, I like the idea of having futures in the standard library,
and I like the idea of pulling common bits of multiprocessing and
threading into a concurrent.* package. Here's my
stream-of-consciousness review of the PEP. I'll try to ** things that
really affect the API.

The "Interface" section should start with a conceptual description of
what Executor, Future, and FutureList are. Something like "An Executor
is an object you can hand tasks to, which will run them for you,
usually in another thread or process. A Future represents a task that
may or may not have completed yet, and which can be waited for and its
value or exception queried. A FutureList is ... ."

** The Executor interface is pretty redundant, and it's missing the
most basic call. Fundamentally, all you need is an
Executor.execute(callable) method returning None, and all the
future-oriented methods can be built on top of that. I'd support using
Executor.submit(callable) for the simplest method instead, which
returns a Future, but there should be some way for implementers to
only implement execute() and get submit either for free or with a
1-line definition. (I'm using method names from
http://java.sun.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
in case I'm unclear about the semantics here.) run_to_futures,
run_to_results, and map should be implementable on top of the Future
interface, and shouldn't need to be methods on Executor. I'd recommend
they be demoted to helper functions in the concurrent.futures module
unless there's a reason they need to be methods, and that reason
should be documented in the PEP.

** run_to_futures() shouldn't take a return_when argument. It should
be possible to wait for those conditions on any list of Futures.
(_not_ just a FutureList)

The code sample looks like Executor is a context manager. What does
its __exit__ do? shutdown()? shutdown&awaitTermination? I prefer
waiting in Executor.__exit__, since that makes it easier for users to
avoid having tasks run after they've cleaned up data those tasks
depend on. But that could be my C++ bias, where we have to be sure to
free memory in the right places. Frank, does Java run into any
problems with people cleaning things up that an Executor's tasks
depend on without awaiting for the Executor first?

shutdown should explain why it's important. Specifically, since the
Executor controls threads, and those threads hold a reference to the
Executor, nothing will get garbage collected without the explicit
call.
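[Editor's note: a toy sketch of the wait-in-__exit__ behaviour argued for here. The sentinel-based shutdown is an invented implementation detail, but it shows why joining in __exit__ guarantees no task outlives the data it depends on.]

```python
import queue
import threading

class WaitingExecutor:
    """Toy executor whose __exit__ blocks until all submitted work is done."""
    def __init__(self, num_threads=2):
        self._tasks = queue.Queue()
        self._threads = [threading.Thread(target=self._worker)
                         for _ in range(num_threads)]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            task = self._tasks.get()
            if task is None:       # sentinel: queue drained, shut down
                return
            task()

    def execute(self, fn):
        self._tasks.put(fn)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Sentinels queue *behind* all real tasks, so joining the workers
        # guarantees every submitted task finished before we return.
        for _ in self._threads:
            self._tasks.put(None)
        for t in self._threads:
            t.join()

results = []
with WaitingExecutor() as executor:
    for i in range(10):
        executor.execute(lambda i=i: results.append(i))
print(len(results))  # 10 -- nothing runs after the with-block exits
```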

** What happens when FutureList.wait(FIRST_COMPLETED) is called twice?
Does it return immediately the second time? Does it wait for the
second task to finish? I'm inclined to think that FutureList should go
away and be replaced by functions that just take lists of Futures.

In general, I think the has_done_futures(), exception_futures(), etc.
are fine even though their results may be out of date by the time you
inspect them. That's because any individual Future goes monotonically
from not-started->running->(exception|value), so users can take
advantage of even an out-of-date done_futures() result. However, it's
dangerous to have several query functions, since users may think that
running_futures() `union` done_futures() `union` cancelled_futures()
covers the whole FutureList, but instead a Future can move between two
of the sets between two of those calls. Instead, perhaps an atomic
partition() function would be better, which returns a collection of
sub-lists that cover the whole original set.
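[Editor's note: a sketch of that atomic partition(). The state names and the shared lock guarding state transitions are assumptions for illustration.]

```python
import threading

PENDING, RUNNING, FINISHED, CANCELLED = 'pending', 'running', 'finished', 'cancelled'

class ToyFuture:
    def __init__(self, state):
        self.state = state

def partition(futures, lock):
    # One snapshot under one lock: every future lands in exactly one
    # bucket, so the buckets always cover the whole input -- unlike
    # separate running_futures()/done_futures() calls, between which
    # a future can change state.
    with lock:
        buckets = {s: [] for s in (PENDING, RUNNING, FINISHED, CANCELLED)}
        for f in futures:
            buckets[f.state].append(f)
        return buckets

lock = threading.Lock()
fs = [ToyFuture(PENDING), ToyFuture(FINISHED), ToyFuture(RUNNING)]
parts = partition(fs, lock)
print(sum(len(v) for v in parts.values()))  # 3 -- complete coverage
```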

I would rename result() to get() (or maybe Antoine's suggestion of
__call__) to match Java. I'm not sure exception() needs to exist.

--- More general points ---

** Java's Futures made a mistake in not supporting work stealing, and
this has caused deadlocks at Google. Specifically, in a bounded-size
thread or process pool, when a task in the pool can wait for work
running in the same pool, you can fill up the pool with tasks that are
waiting for tasks that haven't started running yet. To avoid this,
Future.get() should be able to steal the task it's waiting on out of
the pool's queue and run it immediately.

** I think both the Future-orien

Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller



On Nov 7, 2009, at 2:19 PM, Laura Creighton  wrote:


Anybody here looked at Kamaelia? http://www.kamaelia.org/
I haven't, but wonder if we are considering re-inventing the
wheel.

Laura


I have, and no - it's not what we're discussing. Kamaelia is a  
framework, really. It's also GPL.




Inclusion of kamaelia, or twisted wholesale won't occur any time soon.

Jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Laura Creighton
Anybody here looked at Kamaelia? http://www.kamaelia.org/
I haven't, but wonder if we are considering re-inventing the
wheel.

Laura



Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller
On Sat, Nov 7, 2009 at 11:47 AM, Paul Moore  wrote:
> 2009/11/7 Antoine Pitrou :
>> I'm not sure this has anything to do with the discussion about futures
>> anyway.
>
> It's not - unless the suggestion that futures get added into
> multiprocessing was serious.
>
> Personally, I like the idea of a "concurrent" namespace -
> concurrent.futures seems like an ideal place for the module.

My point in saying that was to note that I've wanted to add something
like this into multiprocessing for awhile. More expansive use of
context managers to control pools of processes, possibly decorators to
indicate a function should be run in a process, etc.

That all being said; I'm more closely aligned with the concept of
building out/starting a python.concurrent package (starting with the
futures package) and then refactoring some of the multiprocessing API
into that package than I am adding futures right into multiprocessing.

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Paul Moore
2009/11/7 Antoine Pitrou :
> I'm not sure this has anything to do with the discussion about futures
> anyway.

It's not - unless the suggestion that futures get added into
multiprocessing was serious.

Personally, I like the idea of a "concurrent" namespace -
concurrent.futures seems like an ideal place for the module.

Paul.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Paul Moore
2009/11/7 Jesse Noller :
> Also, multiprocessing has never been "just" the threading
> API on top of processes. Part of the PEP for it's inclusion was that
> it had other items in it of value.

I guess it's my fault then for not paying enough attention to the
multithreading PEP. I did think it was "just" a multiprocess version
of threading, and if I'd have realised, I'd have lobbied for parity of
implementation at the time. Ah well, none of that is a criticism of
multiprocessing itself, and it's certainly not too late to add this,
as you said.

Paul.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Frank Wierzbicki
On Sat, Nov 7, 2009 at 10:05 AM, Jesse Noller  wrote:
> On the other hand, possibly pushing futures into a concurrent.*
> namespace within the standard library means that you could have
> concurrent.futures, concurrent.map, concurrent.apply and so on, and
> pull the things which multiprocessing does and threading can as well
> into that concurrent package.
I really like the idea of a concurrent package over adding these
things to multiprocessing - I might even be able to get it implemented
before  CPython since I have some obvious implementation advantages ;)

-Frank


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Frank Wierzbicki
On Sat, Nov 7, 2009 at 9:49 AM, Antoine Pitrou  wrote:
>
>> This is the area where I am most worried. Though multiprocessing is a
>> drop in replacement for threading, threading is not currently a drop
>> in replacement for multiprocessing. If multiprocessing doesn't make
>> sense for Jython and we need to tell our users that they should just
>> use threading, threading needs to do everything that multiprocessing
>> does...
>
> Well, feel free to propose a patch for threading.py.
> I'm not sure this has anything to do with the discussion about futures
> anyway.
If it can be done in pure Python I'd certainly be up for taking a a
crack at such a patch.  If it involves significant work with C and
threading it might be a little out of my scope.  If pure python is
out, I may end up implementing those parts missing in threading.py in
Java for Jython, and then circling back to see if doing it in C for
CPython makes sense.

-Frank


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Georg Brandl
Brett Cannon wrote:
> [I am going to be lazy and mass reply to people with a top-post; you
> can burn an effigy of me later]
> 
> In response to Guido, yes, I sent Brian here with my PEP editor hat
> on. Unfortunately the hat was on rather firmly and I totally forgot to
> check to see how old the code is. Yet another reason I need to get the
> Hg conversion done so I can start writing a "Adding to the Stdlib"
> PEP.

I think it isn't entirely wrong to post to stdlib-sig about an interesting
area that could use a battery, and to present code that may become that
battery given enough time.  That way, we who need to accept the code later
can suggest API changes or point out problems now, instead of just before
inclusion when incompatible changes will only upset the (by then hopefully
many) users of the existing package.

Now if I could remember where I put the matches...

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller
On Sat, Nov 7, 2009 at 10:21 AM, Antoine Pitrou  wrote:
>
>> Which API? My comment wasn't aimed at the API of the package - in the
>> time I got to scan it last night nothing jumped out at me as overly
>> offensive API-wise.
>
> Not offensive, but probably too complicated if it's meant to be a simple
> helper. Anyway, let's wait for the PEP.


The PEP is right here:

http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt

I'm interested in hearing specific complaints about the API in the
context of what it's trying to *do*. The only thing which jumped out
at me was the number of methods on FutureList; but then again, each
one of those makes conceptual sense, even if they are verbose -
they're explicit on what's being done.

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Antoine Pitrou

> Which API? My comment wasn't aimed at the API of the package - in the
> time I got to scan it last night nothing jumped out at me as overly
> offensive API-wise.

Not offensive, but probably too complicated if it's meant to be a simple
helper. Anyway, let's wait for the PEP.




Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller
On Sat, Nov 7, 2009 at 9:53 AM, Antoine Pitrou  wrote:
> Le vendredi 06 novembre 2009 à 21:20 -0500, Jesse Noller a écrit :
>> > But I really do like the idea. With java.util.concurrent and Grand
>> > Central Dispatch out there, I think it shows some demand for a way to
>> > easily abstract out concurrency management stuff and leave it up to a
>> > library.
>>
>> Making it eas(y)(ier), safer and simple would be nice.
>
> I agree with that. From a quick look at the API it seems it deserves
> simplifying and polishing.
>
> Regards
>
> Antoine.

Which API? My comment wasn't aimed at the API of the package - in the
time I got to scan it last night nothing jumped out at me as overly
offensive API-wise.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller
On Sat, Nov 7, 2009 at 9:49 AM, Antoine Pitrou  wrote:
>
>> This is the area where I am most worried. Though multiprocessing is a
>> drop in replacement for threading, threading is not currently a drop
>> in replacement for multiprocessing. If multiprocessing doesn't make
>> sense for Jython and we need to tell our users that they should just
>> use threading, threading needs to do everything that multiprocessing
>> does...
>
> Well, feel free to propose a patch for threading.py.
> I'm not sure this has anything to do with the discussion about futures
> anyway.
>
> Regards
>
> Antoine.
>

It may smell off topic Antoine - but it fundamentally isn't.
Multiprocessing exposes a lot more "goodies" than the threading
module. Threading lacks parity (and actually can't have it for some
things) with multiprocessing.

The futures package actually adds a nice API for a given set of tasks
on top of both threading and multiprocessing, and so raises the
question of how alternative implementations that don't have
multiprocessing should deal with a new package, part of whose API
builds on multiprocessing.

I don't think that question is going to be solved in the context of
this particular discussion, given that any solution to that question
lacks something Brian's futures package has - working code.

On the other hand, possibly pushing futures into a concurrent.*
namespace within the standard library means that you could have
concurrent.futures, concurrent.map, concurrent.apply and so on, and
pull the things which multiprocessing does and threading can as well
into that concurrent package.
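
A concurrent.map of the kind suggested above could be hand-rolled on
top of threading alone; the sketch below is purely illustrative (the
function name and signature are invented here, not part of the
proposed package):

```python
# Toy sketch of a thread-backed "concurrent.map": run fn over the inputs
# using a small pool of worker threads, preserving input order.
import queue
import threading

def concurrent_map(fn, iterable, max_threads=5):
    items = list(enumerate(iterable))
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)
    results = [None] * len(items)

    def worker():
        # Pull (index, value) pairs until the queue is drained.
        while True:
            try:
                index, value = tasks.get_nowait()
            except queue.Empty:
                return
            results[index] = fn(value)

    threads = [threading.Thread(target=worker)
               for _ in range(min(max_threads, len(items) or 1))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

print(concurrent_map(lambda x: x * x, range(5)))  # [0, 1, 4, 9, 16]
```

A real stdlib version would also need to propagate worker exceptions,
which this toy simply ignores.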

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller
On Sat, Nov 7, 2009 at 9:48 AM, Paul Moore  wrote:
> 2009/11/7 Jesse Noller :
>>> Actually, it's a pity that things like the Pool class only exist in
>>> multiprocessing. A threaded version of that would be very useful to me
>>> as well.
>>
>> It's an easily rectified pity. Also, you're not the only use case addressed by
>> multiprocessing, which is why stuff you wouldn't use is in there.
>
> I'm not quite sure what you mean there. Are you suggesting that there
> could be a threading.Pool which mirrors multiprocessing.Pool? If so,
> then yes I agree - but of course, it wouldn't be available till
> 2.7/3.2.
>
> What I suppose I was thinking of as a "pity" was that it wasn't
> already added to threading. I thought multiprocessing was "just" the
> threading API using multiple processes - but it looks like it's more
> than that.
>

See my response to frank: There's nothing blocking this except for:

1> A patch
2> Tests
3> Docs

It's been on my wish list for ~2 years now, I might get it done in the
next decade. Also, multiprocessing has never been "just" the threading
API on top of processes. Part of the PEP for its inclusion was that
it had other items in it of value.

jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller
On Sat, Nov 7, 2009 at 9:42 AM, Frank Wierzbicki  wrote:
> On Sat, Nov 7, 2009 at 12:29 AM, Brian Quinlan  wrote:
>
>> Right now multiprocessing is ahead of threading in terms of features.
>> Pool.map() in particular is a pretty powerful idiom that has no equivalent
>> in threading.
> This is the area where I am most worried. Though multiprocessing is a
> drop in replacement for threading, threading is not currently a drop
> in replacement for multiprocessing. If multiprocessing doesn't make
> sense for Jython and we need to tell our users that they should just
> use threading, threading needs to do everything that multiprocessing
> does... or maybe there needs to be a higher level package?

The only reason in my view that this is not the case is because no one
has submitted a patch, myself included, it's been on my wish list for
some time.

There is nothing blocking that, AFAIK.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Antoine Pitrou
Le vendredi 06 novembre 2009 à 21:20 -0500, Jesse Noller a écrit :
> > But I really do like the idea. With java.util.concurrent and Grand
> > Central Dispatch out there, I think it shows some demand for a way to
> > easily abstract out concurrency management stuff and leave it up to a
> > library.
> 
> Making it eas(y)(ier), safer and simple would be nice.

I agree with that. From a quick look at the API it seems it deserves
simplifying and polishing.

Regards

Antoine.




Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Antoine Pitrou

> This is the area where I am most worried. Though multiprocessing is a
> drop in replacement for threading, threading is not currently a drop
> in replacement for multiprocessing. If multiprocessing doesn't make
> sense for Jython and we need to tell our users that they should just
> use threading, threading needs to do everything that multiprocessing
> does...

Well, feel free to propose a patch for threading.py.
I'm not sure this has anything to do with the discussion about futures
anyway.

Regards

Antoine.




Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Paul Moore
2009/11/7 Jesse Noller :
>> Actually, it's a pity that things like the Pool class only exist in
>> multiprocessing. A threaded version of that would be very useful to me
>> as well.
>
> It's an easily rectified pity. Also, you're not the only use case addressed by
> multiprocessing, which is why stuff you wouldn't use is in there.

I'm not quite sure what you mean there. Are you suggesting that there
could be a threading.Pool which mirrors multiprocessing.Pool? If so,
then yes I agree - but of course, it wouldn't be available till
2.7/3.2.

What I suppose I was thinking of as a "pity" was that it wasn't
already added to threading. I thought multiprocessing was "just" the
threading API using multiple processes - but it looks like it's more
than that.

2009/11/7 Frank Wierzbicki :
> On Sat, Nov 7, 2009 at 12:29 AM, Brian Quinlan  wrote:
>
>> Right now multiprocessing is ahead of threading in terms of features.
>> Pool.map() in particular is a pretty powerful idiom that has no equivalent
>> in threading.
> This is the area where I am most worried. Though multiprocessing is a
> drop in replacement for threading, threading is not currently a drop
> in replacement for multiprocessing. If multiprocessing doesn't make
> sense for Jython and we need to tell our users that they should just
> use threading, threading needs to do everything that multiprocessing
> does... or maybe there needs to be a higher level package?

Yes, *that's* my point.

Paul.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Brian Quinlan


On 7 Nov 2009, at 12:40, Antoine Pitrou wrote:



To Antoine's Twisted comment, I don't see a direct comparison. From my
understanding Twisted's Deferred objects are ways to have callbacks
executed once an async event occurs, not to help execute code
concurrently.


Well, waiting for concurrently executing code to terminate *is* a case
of waiting for an async event to happen.

Deferred objects are a generic mechanism to chain reactions to
termination or failure of code. Whether the event your Deferred reacts
to is "async" or not is really a matter of how you use it (and of how
you define "async" -- perhaps you meant "I/O" but Deferreds are not
specialized for I/O).


They do seem specialized for continuation-passing style programming
though. As far as I can tell from the docs
(http://python.net/crew/mwh/apidocs/twisted.internet.defer.Deferred.html),
the only way to process the results of a Deferred is by installing a
callback.
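
To make the callback-chaining model concrete, here is a toy version of
the idea (a deliberate simplification, nothing like Twisted's real
implementation):

```python
# Toy sketch of callback chaining: each registered handler's return
# value becomes the input to the next handler in the chain.
class ToyDeferred:
    def __init__(self):
        self._callbacks = []   # list of (on_success, on_failure) pairs
        self._fired = False
        self._result = None
        self._is_error = False

    def addCallbacks(self, on_success, on_failure):
        self._callbacks.append((on_success, on_failure))
        if self._fired:        # late registration still runs
            self._run_callbacks()
        return self

    def callback(self, result):    # fire with a successful result
        self._result, self._is_error, self._fired = result, False, True
        self._run_callbacks()

    def errback(self, error):      # fire with a failure
        self._result, self._is_error, self._fired = error, True, True
        self._run_callbacks()

    def _run_callbacks(self):
        while self._callbacks:
            on_success, on_failure = self._callbacks.pop(0)
            handler = on_failure if self._is_error else on_success
            self._result = handler(self._result)

d = ToyDeferred()
d.addCallbacks(lambda r: r + 1, lambda e: e)
d.addCallbacks(lambda r: r * 2, lambda e: e)
d.callback(3)   # runs both links: (3 + 1) * 2 -> 8
```

Each handler's return value feeds the next link in the chain, which is
what gives Deferred code its continuation-passing flavour.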


Maybe you could outline (at a super-high-level) how you would
implement my URL-downloading example using a Deferred-based API? Maybe
something like:

def print_success(result, url):
    print('%r page is %d bytes' % (url, len(result)))

def print_failure(exception, url):
    print('%r generated an exception: %s' % (url, exception))

with ThreadedDeferredMaker(max_threads=5) as dm:
    deferreds = []
    for url in URLS:
        deferred = dm.defer(load_url, url)
        deferred.addCallbacks(print_success, print_failure, url=url)
        deferred.unpause()
        deferreds.append(deferred)
    dm.wait_for_all_to_complete(deferreds)

The semantics aren't quite the same because the order of the output
would be non-deterministic in this case. OTOH, you are going to get
intermediate results as they become available, which is cool.



Cheers,
Brian



Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Frank Wierzbicki
On Sat, Nov 7, 2009 at 12:29 AM, Brian Quinlan  wrote:

> Right now multiprocessing is ahead of threading in terms of features.
> Pool.map() in particular is a pretty powerful idiom that has no equivalent
> in threading.
This is the area where I am most worried. Though multiprocessing is a
drop in replacement for threading, threading is not currently a drop
in replacement for multiprocessing. If multiprocessing doesn't make
sense for Jython and we need to tell our users that they should just
use threading, threading needs to do everything that multiprocessing
does... or maybe there needs to be a higher level package?
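
For reference, the Pool.map() idiom being discussed looks like this
with the existing multiprocessing module (a minimal sketch):

```python
# multiprocessing.Pool.map partitions the input across worker processes
# and returns results in input order -- the idiom with no threading
# equivalent in the stdlib today.
from multiprocessing import Pool

def square(x):
    return x * x

def parallel_squares(numbers, processes=2):
    pool = Pool(processes=processes)
    try:
        return pool.map(square, numbers)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(parallel_squares(range(10)))
```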

-Frank


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Jesse Noller



On Nov 7, 2009, at 9:03 AM, Paul Moore  wrote:


2009/11/7 Brian Quinlan :


On 7 Nov 2009, at 22:06, Paul Moore wrote:
I'm not convinced it should go in multiprocessing, though. After all,
it uses threading rather than multiple processes.



Actually, you can choose whether to use threads or processes. The
current implementation includes a ThreadPoolExecutor and a
ProcessPoolExecutor (which is an argument to making it a separate
package) and should be abstract enough to accommodate other strategies
in the future.


That's my point. Multiprocessing is about just that - multiprocessing.
I wouldn't use it (or even think of looking in it) if I wanted to
write a single-process multithreaded program (which is what I usually
do on Windows). I was responding to the suggestion that your futures
module would work as a component of the multiprocessing package.

Actually, it's a pity that things like the Pool class only exist in
multiprocessing. A threaded version of that would be very useful to me
as well.

Paul.


It's an easily rectified pity. Also, you're not the only use case
addressed by multiprocessing, which is why stuff you wouldn't use is
in there.


Jesse


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Paul Moore
2009/11/7 Brian Quinlan :
>
> On 7 Nov 2009, at 22:06, Paul Moore wrote:
>> I'm not convinced it should go in multiprocessing, though. After all,
>> it uses threading rather than multiple processes.
>
>
> Actually, you can choose whether to use threads or processes. The current
> implementation includes a ThreadPoolExecutor and a ProcessPoolExecutor
> (which is an argument to making it a separate package) and should be
> abstract enough to accommodate other strategies in the future.

That's my point. Multiprocessing is about just that - multiprocessing.
I wouldn't use it (or even think of looking in it) if I wanted to
write a single-process multithreaded program (which is what I usually
do on Windows). I was responding to the suggestion that your futures
module would work as a component of the multiprocessing package.

Actually, it's a pity that things like the Pool class only exist in
multiprocessing. A threaded version of that would be very useful to me
as well.
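
One partial answer already in the stdlib: multiprocessing.dummy
replicates the multiprocessing API on top of the threading module, so
its Pool is effectively the "threading.Pool" being wished for here (a
quick sketch; behaviour details are worth checking against the docs):

```python
# multiprocessing.dummy mirrors the multiprocessing API using threads,
# so dummy.Pool gives Pool.map semantics without worker processes.
from multiprocessing.dummy import Pool as ThreadPool

def double(x):
    return x * 2

pool = ThreadPool(4)          # 4 worker threads, not processes
results = pool.map(double, range(5))
pool.close()
pool.join()
print(results)                # [0, 2, 4, 6, 8]
```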

Paul.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Brian Quinlan


On 7 Nov 2009, at 22:06, Paul Moore wrote:


2009/11/7 Jesse Noller :

I obviously tend to agree with Brian; I know I've personally had to
implement things like this plenty of times, it's relatively simple
once you do it once or twice. This is a nice bit of syntactic sugar on
top of the threading/multiprocessing modules.


I agree. I've implemented futures a few times, and I'd be very glad
not to have to again. I'll certainly check out the package, but I'd
like to see the functionality in the stdlib.

I'm not convinced it should go in multiprocessing, though. After all,
it uses threading rather than multiple processes.



Actually, you can choose whether to use threads or processes. The
current implementation includes a ThreadPoolExecutor and a
ProcessPoolExecutor (which is an argument to making it a separate
package) and should be abstract enough to accommodate other strategies
in the future.
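
That swap can be sketched as follows, using executor classes of the
shape Brian describes (the max_workers and executor.map signatures
here are assumptions for illustration, not necessarily the package's
current API):

```python
# The compute loop stays identical; only the executor class changes
# between thread-based and process-based execution.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def work(x):
    return x * x

def run(executor_class):
    with executor_class(max_workers=5) as executor:
        return list(executor.map(work, range(5)))

print(run(ThreadPoolExecutor))    # [0, 1, 4, 9, 16]
# run(ProcessPoolExecutor) yields the same results via worker processes.
```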


Java, for example,
Cheers,
Brian


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-07 Thread Paul Moore
2009/11/7 Jesse Noller :
> I obviously tend to agree with Brian; I know I've personally had to
> implement things like this plenty of times, it's relatively simple
> once you do it once or twice. This is a nice bit of syntactic sugar on
> top of the threading/multiprocessing modules.

I agree. I've implemented futures a few times, and I'd be very glad
not to have to again. I'll certainly check out the package, but I'd
like to see the functionality in the stdlib.

I'm not convinced it should go in multiprocessing, though. After all,
it uses threading rather than multiple processes.

Paul.


Re: [stdlib-sig] futures - a new package for asynchronous execution

2009-11-06 Thread Brian Quinlan

On 7 Nov 2009, at 18:37, inhahe wrote:


i don't understand this at all
i hope you will provide a basic explanation of what we're doing for us
simpletons :P




Sure, but next time could you ask a more precise question? ;-)

# Create a pool of threads to execute calls.
with futures.ThreadPoolExecutor(max_threads=5) as executor:
  # Schedule the given calls to run using the thread pool created above
  # and return a FutureList (a list of Futures + some convenience
  # methods). Called without the "return_when" argument, waits until all
  # calls are complete.
  future_list = executor.run_to_futures(...)

# Iterate through every Future. A Future represents one of the
# asynchronous calls.
for url, future in zip(URLS, future_list):
  # Check if the call raised an exception.
  if future.exception() is not None:
    # Print the exception.
    print('%r generated an exception: %s' % (url, future.exception()))
  else:
    # The call returned successfully, so future.result() contains the
    # return value.
    print('%r page is %d bytes' % (url, len(future.result())))

Cheers,
Brian


import futures
import functools
import urllib.request

URLS = [
  'http://www.foxnews.com/',
  'http://www.cnn.com/',
  'http://europe.wsj.com/',
  'http://www.bbc.co.uk/',
  'http://some-made-up-domain.com/']

def load_url(url, timeout):
  return urllib.request.urlopen(url, timeout=timeout).read()

# Use a thread pool with 5 threads to download the URLs. Using a pool
# of processes would involve changing the initialization to:
#   with futures.ProcessPoolExecutor(max_processes=5) as executor
with futures.ThreadPoolExecutor(max_threads=5) as executor:
  future_list = executor.run_to_futures(
      [functools.partial(load_url, url, 30) for url in URLS])

# Check the results of each future.
for url, future in zip(URLS, future_list):
  if future.exception() is not None:
    print('%r generated an exception: %s' % (url, future.exception()))
  else:
    print('%r page is %d bytes' % (url, len(future.result())))

In this example, executor.run_to_futures() returns only when every url
has been retrieved, but it is possible to return immediately, on the
first completion or on the first failure, depending on the desired
work pattern.


The complete docs are here:
http://sweetapp.com/futures/


