[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-04-10 Thread Tom Augspurger
A protocol that other Future implementations could hook into would be great. The Dask
distributed library has an API compatible with concurrent.futures, but would 
never be appropriate for inclusion in the standard library. It'd be perfect if 
Dask's Future objects would work well with concurrent.futures.as_completed.

https://github.com/dask/distributed/issues/3695 has a few more details.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DEPZNH3TU5PRBRLKALKLD7LNZBCCDL4J/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Kyle Stanley
> Anyway, I think the spelled-out “Synchronous” may be a better name, to
> avoid the (very likely) case of people mistakenly reading “Sync” as short
> for “Synchronized”. It’s no longer than “ProcessPool”, and, although it is
> easy to typo, tab-completion or copy-paste helps, and how many times do you
> need to type it anyway? And there will always be more readers than writers,
> and it’s more likely the writers will be familiar with the futures module
> contents than the readers. And IIRC, this is the name Scala uses.
>
> Maybe “Serial” is ok too, but to me that implies serialized on a queue,
> probably using a single background thread. That’s the naming used in the
> third-party C++ and ObjC libs I’ve used most recently, and it may be more
> common than that—but it may not, in which case my reading may be
> idiosyncratic and not worth worrying about.

FWIW, I'm also in favor of SynchronousExecutor. I find that the term
"Serial" has a few too many definitions depending on the context, whereas
"Synchronous" is very clear as to the behavior and purpose of the executor.
I'd rather the class name be excessively verbose and immediately obvious
as to what it does than shorter to type and a bit ambiguous.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/4HKKXTLKTHZBPMF4J57UVX7S3NAU77A3/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Andrew Barnert via Python-ideas
On Feb 17, 2020, at 15:41, Jonathan Crall  wrote:
> 
> FWIW I found the term "SyncExecutor" really confusing when I was reading this
> thread. I thought it was short for Synchronized, but I just realized it's
> actually short for Synchronous, which makes much more sense. While
> SynchronousExecutor makes more sense to me, it is also more verbose and
> difficult to spell.

I think that’s my fault—I switched from “serial” to “sync” in the middle of a
message without even realizing it, probably borrowed from an ObjC library I
used recently.

Anyway, I think the spelled-out “Synchronous” may be a better name, to avoid 
the (very likely) case of people mistakenly reading “Sync” as short for 
“Synchronized”. It’s no longer than “ProcessPool”, and, although it is easy to 
typo, tab-completion or copy-paste helps, and how many times do you need to 
type it anyway? And there will always be more readers than writers, and it’s 
more likely the writers will be familiar with the futures module contents than 
the readers. And IIRC, this is the name Scala uses.

Maybe “Serial” is ok too, but to me that implies serialized on a queue, 
probably using a single background thread. That’s the naming used in the 
third-party C++ and ObjC libs I’ve used most recently, and it may be more 
common than that—but it may not, in which case my reading may be idiosyncratic 
and not worth worrying about.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5KVFCF7S2EBRL2RMZMDMQLA562CM5IG7/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Jonathan Crall
Based on the conversation so far, I agree with @Kyle Stanley's breakdown of
the proposal. I think shelving the "*Add a new way to create and specify
executor*" and focusing on "*Add a SerialExecutor, which does not use
threads or processes*" is the best way forward.

For context, I'm a machine learning researcher and developer. I've made
extensive use of both thread and process based parallelism (and I'm very
much looking forward to subinterpreters).  I use threads for tasks like
downloading files, running background tasks when my GPU computations are
the bottleneck, and other IO related tasks. I use processes for image
processing and other CPU bound tasks.

@Andrew Barnert's analysis of the use case is spot on.
Andrew states:

> I’m pretty sure what he meant is that the developer _usually_ wants the
> task to run in parallel, but in some specific situation he wants it to
> _not_ run in parallel.
>
> The concrete use case I’ve run into is this: I’ve got some parallel code
> that has a bug. I’m pretty sure the bug isn’t actually related to the
> shared data or the parallelism itself, but I want to be sure. I replace the
> ThreadPoolExecutor with a SyncExecutor and change nothing else about the
> code, and the bug still happens. Now I’ve proven that the bug isn’t related
> to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved
> into a big mess, so it’s easier to track down the problem.


This is exactly the use case that I run into, but this isn't the only use
case for SerialExecutor. @Antoine Pitrou put it nicely:

> Being able to swap a ThreadPoolExecutor or ProcessPoolExecutor with a
> serial version using the same API can have benefits in various
> situations.  One is easier debugging (in case the problem you have to
> debug isn't a race condition, of course :-)).  Another is writing a
> command-line tool or library where the final decision of
> whether to parallelize execution (e.g. through a command-line option for
> a CLI tool) is up to the user, not the library developer.


Antoine's second point is important in certain multiuser or limited
hardware environments. On my personal machine I use all the compute
available, but on a shared system I need to constrain the resources I'm
using. Disabling parallelism can also be useful on hardware like the
raspberry pi.


1) Debugging parallel code: this is the use case stated by @Andrew Barnert.
Serial code is easier to debug, and currently the executor API requires
restructuring of the code if you want to rule out parallelism as the source
of a bug.
2) Some programs run better on one CPU in certain hardware / multiuser
environments: depending on the hardware you may want to disable
parallelism in your code. Many times I check for a `--serial` flag in the
command line to disable parallelism.

This proposal isn't so much about faking parallelism as it is disabling it
when you need to. If you set `max_workers` to 0 in ThreadPoolExecutor or
ProcessPoolExecutor you get an error. I don't think that disabling
parallelism is an uncommon use case. As previously mentioned it has uses in
debugging and allowing the user to control the flow of execution. This
second case is useful when your parallel code has a race condition that
doesn't appear on your machine, but it does on your customer's machine. The
current futures API does not work if you need to fall back on
single-threaded execution, which means that if the developer wants the
option to disable parallelism they have to maintain two different
implementations of the same functionality. A serial executor would allow
duck-typing to solve that problem.
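
To make the duck-typing point concrete, here is a minimal sketch of what such an executor could look like. It is only an illustration, not the implementation proposed in this thread: the names `SerialExecutor`, `make_executor`, and `total_of_squares` are hypothetical, and this version runs each task eagerly inside `submit()` while reusing the stdlib `Future` and its public setters.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class SerialExecutor(Executor):
    """Hypothetical executor that runs each task in the calling thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        # Move the future from pending to running, as the pool executors do.
        future.set_running_or_notify_cancel()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            # Store the error so it is re-raised by future.result().
            future.set_exception(exc)
        return future


def make_executor(max_workers):
    """Hypothetical factory: 0 workers is taken to mean 'no parallelism'."""
    if max_workers == 0:
        return SerialExecutor()
    return ThreadPoolExecutor(max_workers=max_workers)


def total_of_squares(max_workers):
    # The calling code is identical for the serial and the threaded case.
    with make_executor(max_workers) as executor:
        futures = [executor.submit(pow, n, 2) for n in range(5)]
        return sum(f.result() for f in futures)
```

With something like this, a `--serial` flag only has to choose which executor gets constructed, and the rest of the program stays the same. `map()`, `shutdown()`, and the context-manager methods are inherited from the `Executor` base class.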


> Also, as a sidenote, I much prefer the term "SyncExecutor" rather than
> "SerialExecutor". I think the former is a bit more clear at defining its
> actual purpose.


FWIW I found the term "SyncExecutor" really confusing when I was reading
this thread. I thought it was short for Synchronized, but I just realized
it's actually short for Synchronous, which makes much more sense. While
SynchronousExecutor makes more sense to me, it is also more verbose and
difficult to spell.

> It seems there are two possible design decisions for a serial executor:
> - one is to execute the task immediately on `submit()`
> - another is to execute the task lazily on `result()`
>
> This could for example be controlled by a constructor argument to
> SerialExecutor.


This is a great idea. I think I like the default being lazy execution, but
giving the user control over that would increase the usefulness.
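
A rough sketch of how the eager/lazy choice could be exposed through a constructor argument follows. Everything here is an assumption made for illustration: the `lazy` parameter, the `_LazyFuture` helper, and its `_run()` method are hypothetical names, and only `result()` triggers the deferred call (a fuller version would also need to handle `exception()`).

```python
from concurrent.futures import Executor, Future


class _LazyFuture(Future):
    """Hypothetical Future that runs its task on the first result() call."""

    def __init__(self, fn, args, kwargs):
        super().__init__()
        self._task = (fn, args, kwargs)

    def _run(self):
        # Take the task exactly once so repeated calls are harmless.
        task, self._task = self._task, None
        if task is None or not self.set_running_or_notify_cancel():
            return  # already ran, or was cancelled before running
        fn, args, kwargs = task
        try:
            self.set_result(fn(*args, **kwargs))
        except Exception as exc:
            self.set_exception(exc)

    def result(self, timeout=None):
        self._run()
        return super().result(timeout)


class SerialExecutor(Executor):
    """Hypothetical serial executor with a switch between lazy and eager."""

    def __init__(self, lazy=True):
        self._lazy = lazy

    def submit(self, fn, /, *args, **kwargs):
        future = _LazyFuture(fn, args, kwargs)
        if not self._lazy:
            future._run()  # eager mode: execute immediately on submit()
        return future
```

One trade-off worth noting: in lazy mode nothing completes until somebody calls `result()`, so waiting on such futures with `concurrent.futures.wait()` or `as_completed()` would block. That may be an argument for eager execution as the default.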

I also see some conversation about a public API to query and get the state
of a process. That's likely because my implementation abuses a private
member variable, but I think it might be possible to implement
"SerialExecutor" without exposing state setter / getters. I think @Kyle
Stanley's idea makes sense:

> ``submit()`` could potentially "fake" the process of scheduling the
> execution of the function, but without directly executing it; perhaps with
> 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Kyle Stanley
> I'm much more lukewarm on set_state().  How hard is it to reimplement
> one's own Future if someone wants a different implementation?  By
> allowing people to change the future's internal state, we're also
> giving them a (small) gun to shoot themselves with.

Yeah, I don't feel quite as strongly about future.set_state(). My primary
motivation was to work on a complete means of extending the Future class
through a public API, starting with the future's state. But it might be too
(potentially) detrimental for the average user to be worth the more niche
case of being able to extend Future without needing to use the private
members.

Upon further consideration, I think it would be better to stick with
future.state() for now, since it has more of a general-purpose use case. It
could be specifically documented in a similar manner to queue.qsize(), stating
something along the lines of "Return the approximate state of the future.
Note that this state is only advisory, and is not guaranteed."

> No strong opinion on this, but it sounds ok.  That means
> `future.state()` would return an enum value, not a bare string?

Yeah, presumably with each using auto() for the value. For simplified
review purposes though, I'll likely do these in separate PRs, but attached
to the same bpo issue: https://bugs.python.org/issue39645. I'll also update
the issue to reduce the scope a bit (mainly removing future.set_state()).
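
To illustrate the idea (and not as a preview of the actual patch), an advisory state query returning an enum can already be approximated with the existing public methods of `Future`. The `FutureState` enum and the `advisory_state()` helper below are hypothetical names used only for this sketch.

```python
import enum
from concurrent.futures import Future


class FutureState(enum.Enum):
    """Hypothetical public enum for the future states, using auto() values."""
    PENDING = enum.auto()
    RUNNING = enum.auto()
    CANCELLED = enum.auto()
    FINISHED = enum.auto()


def advisory_state(future):
    """Best-effort snapshot; the future may change state right afterwards."""
    # Only public Future methods are used, no private members.
    if future.cancelled():
        return FutureState.CANCELLED
    if future.running():
        return FutureState.RUNNING
    if future.done():
        return FutureState.FINISHED
    return FutureState.PENDING
```

Like `Queue.qsize()`, the returned value can be stale by the time the caller looks at it, so it is mainly useful for logging and debugging rather than for control flow.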

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/WI6NWWOT4BCXNZNNGX53KM64LSC2PPKJ/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Antoine Pitrou
On Mon, 17 Feb 2020 12:19:59 -0800
Guido van Rossum  wrote:
> It's actually really hard to implement your own Future class that works
> well with concurrent.futures.as_completed() -- this is basically what
> complicated the OP's implementation. Maybe it would be useful to look into
> a protocol to allow alternative Future implementations to hook into that?

Ah, I understand the reasons then.  Ok, it does sound useful to explore
the space of solutions.  But let's decouple it from simply querying the
current Future state.

Regards

Antoine.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5UJSZP47TA3ULWNFAG33NFL4KL75QC2Y/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Guido van Rossum
It's actually really hard to implement your own Future class that works
well with concurrent.futures.as_completed() -- this is basically what
complicated the OP's implementation. Maybe it would be useful to look into
a protocol to allow alternative Future implementations to hook into that?



-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/PJOZYWBZS5ZMDMHAVEBCXUJSSKP4STLF/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-17 Thread Antoine Pitrou
On Sun, 16 Feb 2020 19:46:13 -0500
Kyle Stanley  wrote:
> 
> Based on the proposal in the OP, I had considered that it might also be
> needed to be able to manually set the state of the future through something
> like a `Future.set_state()`, which would have a parameter for accessing it
> safely through the condition's RLock, and another without it (in case they
> want to specify their own, such as in the OP's example code).

I'm much more lukewarm on set_state().  How hard is it to reimplement
one's own Future if someone wants a different implementation?  By
allowing people to change the future's internal state, we're also
giving them a (small) gun to shoot themselves with.

> Lastly, it seemed also useful to be able to publicly use the future state
> constants. This isn't necessary for extending them, but IMO it would look
> better from an API design perspective to use `future.set_state(cf.RUNNING)`
> instead of `future.set_state(cf._base.RUNNING)` or
> `future.set_state("running")` [1].

No strong opinion on this, but it sounds ok.  That means
`future.state()` would return an enum value, not a bare string?

Regards

Antoine.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/APZRZPV5YPBQ7TOERXFODISMUII2MOUL/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Kyle Stanley
> Hm, but doesn't the OP's example require *synchronously* reading and
> writing the state?

Correct. But in the OP's example, they wanted to use their own
"FakeCondition" for reading and writing the state, rather than the
executor's internal condition (which is bypassed when you directly access
or modify the state through future._state instead of the public methods).
That's why I proposed to add something like future.state(). In the case of
the OP's example, they would presumably access future.state() through
"FakeCondition".

Or am I misunderstanding something?

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/JQWCUIV5VLKAI54TVGNZQTR72R4LVK4E/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Guido van Rossum
Hm, but doesn't the OP's example require *synchronously* reading and
writing the state?



-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/7LPVJ6OPAFG7IZDGYU5FDXD4SX54KDN4/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Kyle Stanley
> That sounds useful to me indeed.  I assume you mean something like a
> state() method?  We already have Queue.qsize() which works a bit like
> this (unlocked and advisory).

Yep, a `Future.state()` method is exactly what I had in mind! I hadn't
considered that `Queue.qsize()` was analogous, but that's a perfect example.

Based on the proposal in the OP, I had considered that it might also be
needed to be able to manually set the state of the future through something
like a `Future.set_state()`, which would have a parameter for accessing it
safely through the condition's RLock, and another without it (in case they
want to specify their own, such as in the OP's example code).

Lastly, it seemed also useful to be able to publicly use the future state
constants. This isn't necessary for extending them, but IMO it would look
better from an API design perspective to use `future.set_state(cf.RUNNING)`
instead of `future.set_state(cf._base.RUNNING)` or
`future.set_state("running")` [1].

Combining the above, this would look something like
`future.set_state(cf.FINISHED)`, instead of the current private means of
modifying them with `future._state = cf._base.FINISHED` or `future._state =
"finished"`.

Personally, I'm most strongly in favor of adding Future.state(), as it
would be personally useful for me (for reasons previously mentioned); but I
think that the other two would be useful for properly extending the Future
class without having to access private members. This was more formally
proposed in https://bugs.python.org/issue39645.


[1] - Setting running was just an example, although normally that would be
just done in the executor through `Future.set_running_or_notify_cancel()`.

Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BT5AKV4AOACU6TFYI6MIQXC6RK7BQFEK/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Antoine Pitrou
On Sun, 16 Feb 2020 17:41:36 -0500
Kyle Stanley  wrote:
> 
> As a side note, are we still interested in expanding the public API for the
> Future class? Particularly for a public means of accessing the state. The
> primary motivation for it was this topic, but I could easily see the same
> issues coming up with custom Future and Executor classes; not to mention
> the general debugging usefulness for being able to log the current state of
> the future (without relying on private members).

That sounds useful to me indeed.  I assume you mean something like a
state() method?  We already have Queue.qsize() which works a bit like
this (unlocked and advisory).

Regards

Antoine.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7ZQE3IB4NR7ZPLLKWIY54PW3X5K6YWUF/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Kyle Stanley
> I don't think we need to be dogmatic here.  If someone wants to provide
> it on PyPI, then so be it.  But if they'd rather contribute it to the
> stdlib, we should examine the relevant PR at face value.

> Asking it to be exercised first on PyPI is worthwhile if the domain
> space is complex or there are multiple possible APIs.  It's not really
> the case here: the API is basically constrained (it must be an
> Executor) and the main unknown seems to be whether execution is lazy
> or immediate (which may as well be governed by a constructor
> parameter).  And the implementation shouldn't be very hairy either :-)

Alright, fair enough. I suppose that I hadn't adequately considered how
constrained the API and straightforward the implementation would likely be.
If you think it would very likely receive widespread enough usage to
justify adding it to the stdlib and maintaining it there, I fully trust
your judgement on that. (:

As a side note, are we still interested in expanding the public API for the
Future class? Particularly for a public means of accessing the state. The
primary motivation for it was this topic, but I could easily see the same
issues coming up with custom Future and Executor classes; not to mention
the general debugging usefulness for being able to log the current state of
the future (without relying on private members).

On Sun, Feb 16, 2020 at 9:49 AM Antoine Pitrou  wrote:

> On Sun, 16 Feb 2020 09:29:36 -0500
> Kyle Stanley  wrote:
> >
> > After Andrew explained his own use case for it with isolating bugs to
> > ensure that the issue wasn't occurring as a result of parallelism,
> threads,
> > processes, etc; I certainly can see how it would be useful. I could also
> > see a use case in a CLI tool for a conveniently similar parallel and
> > non-parallel version, although I'd likely prefer just having an entirely
> > separate implementation. Particularly if the parallel version includes
> > dividing a large, computationally intensive task into many sub-tasks (more
> > common for PPE), that seems like it could result in significant
> additional
> > unneeded overhead for the non-parallel version.
> >
> > I think at this point, its potential usefulness is clear though. But,
> IMO,
> > the main question is now the following: would it be better *initially*
> > placed in the standard library or on PyPI (which could eventually
> > transition into stdlib if it sees widespread usage)?
>
> I don't think we need to be dogmatic here.  If someone wants to provide
> it on PyPI, then so be it.  But if they'd rather contribute it to the
> stdlib, we should examine the relevant PR at face value.
>
> Asking it to be exercised first on PyPI is worthwhile if the domain
> space is complex or there are multiple possible APIs.  It's not really
> the case here: the API is basically constrained (it must be an
> Executor) and the main unknown seems to be whether execution is lazy
> or immediate (which may as well be governed by a constructor
> parameter).  And the implementation shouldn't be very hairy either :-)
>
> Regards
>
> Antoine.
>
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TPT67BK6XWUMYBXVAYVJYL46HXECNCPI/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Guido van Rossum
I'm happy to defer to Antoine, who is the subject expert here (and Brian
Quinlan, the original author).

On Sun, Feb 16, 2020 at 6:48 AM Antoine Pitrou  wrote:

> On Sun, 16 Feb 2020 09:29:36 -0500
> Kyle Stanley  wrote:
> >
> > After Andrew explained his own use case for it with isolating bugs to
> > ensure that the issue wasn't occurring as a result of parallelism,
> threads,
> > processes, etc; I certainly can see how it would be useful. I could also
> > see a use case in a CLI tool for a conveniently similar parallel and
> > non-parallel version, although I'd likely prefer just having an entirely
> > separate implementation. Particularly if the parallel version includes
> > dividing a large, computationally intensive task into many sub-tasks (more
> > common for PPE), that seems like it could result in significant
> additional
> > unneeded overhead for the non-parallel version.
> >
> > I think at this point, its potential usefulness is clear though. But,
> IMO,
> > the main question is now the following: would it be better *initially*
> > placed in the standard library or on PyPI (which could eventually
> > transition into stdlib if it sees widespread usage)?
>
> I don't think we need to be dogmatic here.  If someone wants to provide
> it on PyPI, then so be it.  But if they'd rather contribute it to the
> stdlib, we should examine the relevant PR at face value.
>
> Asking it to be exercised first on PyPI is worthwhile if the domain
> space is complex or there are multiple possible APIs.  It's not really
> the case here: the API is basically constrained (it must be an
> Executor) and the main unknown seems to be whether execution is lazy
> or immediate (which may as well be governed by a constructor
> parameter).  And the implementation shouldn't be very hairy either :-)
>
> Regards
>
> Antoine.
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EHI2IR4QLKVY4CNZJKRT7JVUMK7T2SSJ/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Antoine Pitrou
On Sun, 16 Feb 2020 09:29:36 -0500
Kyle Stanley  wrote:
> 
> After Andrew explained his own use case for it with isolating bugs to
> ensure that the issue wasn't occurring as a result of parallelism, threads,
> processes, etc; I certainly can see how it would be useful. I could also
> see a use case in a CLI tool for a conveniently similar parallel and
> non-parallel version, although I'd likely prefer just having an entirely
> separate implementation. Particularly if the parallel version includes
> dividing a large, computationally intensive task into many sub-tasks (more
> common for PPE), that seems like it could result in significant additional
> unneeded overhead for the non-parallel version.
> 
> I think at this point, its potential usefulness is clear though. But, IMO,
> the main question is now the following: would it be better *initially*
> placed in the standard library or on PyPI (which could eventually
> transition into stdlib if it sees widespread usage)?

I don't think we need to be dogmatic here.  If someone wants to provide
it on PyPI, then so be it.  But if they'd rather contribute it to the
stdlib, we should examine the relevant PR at face value.

Asking it to be exercised first on PyPI is worthwhile if the domain
space is complex or there are multiple possible APIs.  It's not really
the case here: the API is basically constrained (it must be an
Executor) and the main unknown seems to be whether execution is lazy
or immediate (which may as well be governed by a constructor
parameter).  And the implementation shouldn't be very hairy either :-)

Regards

Antoine.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/U5AOBMMGIANXFJEKMZYMWNQSY7D6RPL5/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Kyle Stanley
> FWIW, I agree with Andrew here.  Being able to swap a
> ThreadPoolExecutor or ProcessPoolExecutor with a serial version using
> the same API can have benefits in various situations. One is
> easier debugging (in case the problem you have to debug isn't a race
> condition, of course :-)).  Another is writing a command-line
> tool or library where the final decision of whether to parallelize
> execution (e.g. through a command-line option for a CLI tool) is up
> to the user, not the library developer.

After Andrew explained his own use case for it with isolating bugs to
ensure that the issue wasn't occurring as a result of parallelism, threads,
processes, etc; I certainly can see how it would be useful. I could also
see a use case in a CLI tool for a conveniently similar parallel and
non-parallel version, although I'd likely prefer just having an entirely
separate implementation. Particularly if the parallel version includes
dividing a large, computationally intensive task into many sub-tasks (more
common for PPE), that seems like it could result in significant additional
unneeded overhead for the non-parallel version.

I think at this point, its potential usefulness is clear though. But, IMO,
the main question is now the following: would it be better *initially*
placed in the standard library or on PyPI (which could eventually
transition into stdlib if it sees widespread usage)?

> It seems there are two possible design decisions for a serial executor:
> - one is to execute the task immediately on `submit()`
> - another is to execute the task lazily on `result()`

To me, it seems like the latter would be more useful for debugging
purposes, since that would be more similar to how the submitted
task/function would actually be executed. ``submit()`` could potentially
"fake" the process of scheduling the execution of the function, but without
directly executing it; perhaps with something like this:
``executor.submit()`` => create a pending item => add pending item to dict
=> add callable to call queue => fut.result() => check if in pending items
=> get from top of call queue  => run work item => pop from pending items
=> set result/exception => return result (skip last three if fut is not
in/associated with a pending item). IMO, that would be similar enough to
the general workflow followed in the executors without any of the
parallelization.
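
To make that workflow a bit more concrete, here is a minimal sketch of one way
it could be read. Everything in it is an assumption for illustration: the
names `QueuedSerialExecutor`, `_QueuedFuture`, and `_run_until` are
hypothetical, and "get from top of call queue" is interpreted as draining the
queue in submission order until the requested future has been run.

```python
import collections
from concurrent.futures import Executor, Future


class _QueuedFuture(Future):
    """Hypothetical future that asks its executor to run it on result()."""

    def __init__(self, executor):
        super().__init__()
        self._executor = executor

    def result(self, timeout=None):
        # Nothing runs in the background, so do the work now if still pending.
        self._executor._run_until(self)
        return super().result(timeout)


class QueuedSerialExecutor(Executor):
    """Hypothetical executor that defers work until a result is requested."""

    def __init__(self):
        self._pending = {}                      # future -> (fn, args, kwargs)
        self._call_queue = collections.deque()  # futures in submission order

    def submit(self, fn, *args, **kwargs):
        fut = _QueuedFuture(self)
        self._pending[fut] = (fn, args, kwargs)
        self._call_queue.append(fut)
        return fut

    def _run_until(self, target):
        # Run queued work items in FIFO order until `target` has been handled.
        while target in self._pending:
            fut = self._call_queue.popleft()
            fn, args, kwargs = self._pending.pop(fut)
            if not fut.set_running_or_notify_cancel():
                continue  # the future was cancelled before it could run
            try:
                fut.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                fut.set_exception(exc)


executor = QueuedSerialExecutor()
first = executor.submit(pow, 2, 3)
second = executor.submit(pow, 2, 4)
print(second.result())  # runs `first` and then `second`, in submission order
print(first.done())
```

Only `result()` is overridden here, so anything that waits without calling it
(for example `as_completed`) would never see these futures finish.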

On Sun, Feb 16, 2020 at 6:29 AM Antoine Pitrou  wrote:

> On Sat, 15 Feb 2020 14:16:39 -0800
> Andrew Barnert via Python-ideas
>  wrote:
> > > On Feb 15, 2020, at 13:36, Jonathan Crall  wrote:
> > >
> > > Also, there is no duck-typed class that behaves like an executor, but
> does its processing in serial. Oftentimes a developer will want to run a
> task in parallel, but depending on the environment they may want to disable
> threading or process execution. To address this I use a utility called a
> `SerialExecutor` which shares an API with
> ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially
> in the same python thread:
> >
> > This makes sense. I think most futures-and-executors frameworks in other
> languages have a serial/synchronous/immediate/blocking executor just like
> this. (And the ones that don’t, it’s usually because they have a different
> way to specify the same functionality—e.g., in C++, you only use executors
> via the std::async function, and you can just pass a launch option instead
> of an executor to run synchronously.)
>
> FWIW, I agree with Andrew here.  Being able to swap a
> ThreadPoolExecutor or ProcessPoolExecutor with a serial version using
> the same API can have benefits in various situations.  One is
> easier debugging (in case the problem you have to debug isn't a race
> condition, of course :-)).  Another is writing a command-line
> tool or library where the final decision of whether to parallelize
> execution (e.g. through a command-line option for a CLI tool) is up
> to the user, not the library developer.
>
> It seems there are two possible design decisions for a serial executor:
> - one is to execute the task immediately on `submit()`
> - another is to execute the task lazily on `result()`
>
> This could for example be controlled by a constructor argument to
> SerialExecutor.
>
> Regards
>
> Antoine.
>

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-16 Thread Antoine Pitrou
On Sat, 15 Feb 2020 14:16:39 -0800
Andrew Barnert via Python-ideas
 wrote:
> > On Feb 15, 2020, at 13:36, Jonathan Crall  wrote:
> > 
> > Also, there is no duck-typed class that behaves like an executor, but does 
> > its processing in serial. Oftentimes a developer will want to run a task in 
> > parallel, but depending on the environment they may want to disable 
> > threading or process execution. To address this I use a utility called a 
> > `SerialExecutor` which shares an API with 
> > ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially 
> > in the same python thread:  
> 
> This makes sense. I think most futures-and-executors frameworks in other 
> languages have a serial/synchronous/immediate/blocking executor just like 
> this. (And the ones that don’t, it’s usually because they have a different 
> way to specify the same functionality—e.g., in C++, you only use executors 
> via the std::async function, and you can just pass a launch option instead of 
> an executor to run synchronously.)

FWIW, I agree with Andrew here.  Being able to swap a
ThreadPoolExecutor or ProcessPoolExecutor with a serial version using
the same API can have benefits in various situations.  One is
easier debugging (in case the problem you have to debug isn't a race
condition, of course :-)).  Another is writing a command-line
tool or library where the final decision of whether to parallelize
execution (e.g. through a command-line option for a CLI tool) is up
to the user, not the library developer.

It seems there are two possible design decisions for a serial executor:
- one is to execute the task immediately on `submit()`
- another is to execute the task lazily on `result()`

This could for example be controlled by a constructor argument to
SerialExecutor.
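
A minimal sketch of what that could look like, assuming a hypothetical `lazy`
constructor argument. The class names and the `_run` helper are made up for
illustration; only public Future methods are used to set the outcome.

```python
from concurrent.futures import Executor, Future


class _SerialFuture(Future):
    """Hypothetical future that can run its own work item in the caller's thread."""

    def __init__(self, fn, args, kwargs):
        super().__init__()
        self._work = (fn, args, kwargs)

    def _run(self):
        work, self._work = self._work, None
        if work is None or not self.set_running_or_notify_cancel():
            return  # already ran, or was cancelled before it could run
        fn, args, kwargs = work
        try:
            self.set_result(fn(*args, **kwargs))
        except BaseException as exc:
            self.set_exception(exc)

    def result(self, timeout=None):
        self._run()  # no-op unless the work is still pending (lazy mode)
        return super().result(timeout)

    def exception(self, timeout=None):
        self._run()
        return super().exception(timeout)


class SerialExecutor(Executor):
    """Hypothetical executor: runs tasks on submit(), or on result() if lazy."""

    def __init__(self, lazy=False):
        self._lazy = lazy

    def submit(self, fn, *args, **kwargs):
        fut = _SerialFuture(fn, args, kwargs)
        if not self._lazy:
            fut._run()  # immediate mode: finished before submit() returns
        return fut


with SerialExecutor(lazy=True) as executor:
    fut = executor.submit(sum, [1, 2, 3])
    print(fut.done())    # nothing has run yet in lazy mode
    print(fut.result())  # the call happens here
```

One consequence of the lazy variant is that code which waits on futures
without calling `result()` or `exception()` would block, since nothing else
ever runs the task.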

Regards

Antoine.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PCDN4JMKR7VCWXTEZSMWWIY55NTT3JOM/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Kyle Stanley
I opened a bpo issue to expand upon the public API for cf.Future:
https://bugs.python.org/issue39645. Any feedback would be greatly
appreciated. (:

On Sun, Feb 16, 2020 at 12:22 AM Guido van Rossum  wrote:

>
>
> On Sat, Feb 15, 2020 at 21:00 Kyle Stanley  wrote:
>
>> > I've never felt the need for either of these myself, nor have I
>> observed it in others I worked with. In general I feel the difference
>> between processes and threads is so large that I can't believe a realistic
>> application would work with either.
>>
>> Also, ThreadPoolExecutor and ProcessPoolExecutor both have their specific
>> purposes in concurrent.futures: TPE for IO-bound parallelism, and PPE for
>> CPU-bound parallelism, what niche would the proposed SerialExecutor fall
>> under? Fake/dummy parallelism? If so, I personally don't see that as being
>> worth the cost of adding it and then maintaining it in the standard
>> library. But, that's not to say that it wouldn't have a place on PyPI.
>>
>> > (Then again I've never had much use for ProcessExecutor period.)
>>
>> I've also made use of TPE far more times than PPE, but I've definitely
>> seen several interesting and useful real-world applications of PPE.
>> Particularly with image processing. I can also imagine it being quite
>> useful for scientific computing, although I've not personally used it for
>> that purpose.
>>
>> > IOW I'm rather lukewarm about this -- even if you (Jonathan) have found
>> use for it, I'm not sure how many other people would use it, so I doubt
>> it's worth adding it to the stdlib. (The only thing the stdlib might grow
>> could be a public API that makes implementing this feasible without
>> overriding private methods.)
>>
>> Expanding a bit upon the public API for the cf.Future class would likely
>> allow something like this to be possible without accessing any private
>> members. In particular, I believe there would have to be a public means of
>> accessing the state of the future without having to go through the
>> condition (currently, this can only be done with ``future._state``), and
>> accessing a constant for each of the possible states: PENDING, RUNNING,
>> CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.
>>
>> Since that would actually be quite useful for debugging purposes (I had
>> to access ``future._state`` several times while testing the new
>> *cancel_futures*), I'd be willing to work on implementing something like
>> this.
>>
>
> Excellent!
>
>
>>
>> On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum 
>> wrote:
>>
>>> Having tried my hand at a simpler version for about 15 minutes, I see
>>> the reason for the fiddly subclass of Future -- it seems over-engineered
>>> because concurrent.futures is complicated.
>>>
>>> I've never felt the need for either of these myself, nor have I observed
>>> it in others I worked with. In general I feel the difference between
>>> processes and threads is so large that I can't believe a realistic
>>> application would work with either. (Then again I've never had much use for
>>> ProcessExecutor period.)
>>>
>>> The "Serial" variants somehow remind me of the "dummy_thread.py" module
>>> we had in Python 2. It was removed in Python 3, mostly because we ran out
>>> of cases where real threads weren't an option.
>>>
>>> IOW I'm rather lukewarm about this -- even if you (Jonathan) have found
>>> use for it, I'm not sure how many other people would use it, so I doubt
>>> it's worth adding it to the stdlib. (The only thing the stdlib might grow
>>> could be a public API that makes implementing this feasible without
>>> overriding private methods.)
>>>
>>> On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall 
>>> wrote:
>>>
>>>> This implementation is a proof-of-concept that I've been using for a
>>>> while. It's certain that any version that made it into the stdlib would
>>>> have to be more carefully designed than the implementation I threw
>>>> together. However, my implementation demonstrates the concept and there
>>>> are reasons for the choices I made.
>>>>
>>>> First, the choice to create a SerialFuture object that inherits from
>>>> the base Future was because I only wanted a process to run if the
>>>> SerialFuture.result method was called. The most obvious way to do that
>>>> was to overload the `result` method to execute the function when called.
>>>> Perhaps there is a better way, but in an effort to KISS I just went with
>>>> the <100 line version that seemed to work well enough.
>>>>
>>>> The `set_result` is overloaded because in Python 3.8, the base
>>>> Future.set_result function asserts that the _state is not FINISHED when
>>>> it is called. In my proof-of-concept implementation I had to set
>>>> SerialFuture._state to FINISHED in order for `as_completed` to yield it.
>>>> Again, there may be a better way to do this, but I don't claim to know
>>>> what that is yet.
>>>>
>>>> I was 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Kyle Stanley
> No; the point of launch policies is that you can (without needing an
executor object[1]) tell the task to run “async” (on its own thread[2]),
“deferred” (serially[3] on first demand), or “immediate” (serially right
now)[4]. You can even or together multiple policies to let the
implementation choose, and IIRC the default is async|deferred.

> [2] Actually “as if on its own thread”. But AFAIK, every implementation
handles this by spawning a thread. I think the distinction is for future
expansions, either so they can do something like Java’s ForkJoinPool, or so
they can use fibers or coroutines that don’t care what thread they’re on.

Ah, so the fact that std::async spawns a separate thread in many
implementations is more of an internal detail that could be changed, rather
than a guaranteed behavior. Thanks for the clarification and detailed
explanation, libstdc++ is definitely not an area of expertise for me. (:

> The concrete use case I’ve run into is this: I’ve got some parallel code
that has a bug. I’m pretty sure the bug isn’t actually related to the
shared data or the parallelism itself, but I want to be sure. I replace the
ThreadPoolExecutor with a SyncExecutor and change nothing else about the
code, and the bug still happens. Now I’ve proven that the bug isn’t related
to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved
into a big mess, so it’s easier to track down the problem.

That sounds like it would be quite a useful utility class for general
executor debugging purposes. But, I'm just not convinced that it would see
wide enough usage to justify adding it to concurrent.futures. IMO, this
makes it a perfect candidate for a decent PyPI package. If that package
ends up being significantly popular, it might be worth re-examining its
membership in the stdlib once it becomes mature. This would reduce the risk
of burdening CPython development time with an underused feature, and gives
it far more room for growth/improvement [1].

Also, as a sidenote, I much prefer the term "SyncExecutor" to
"SerialExecutor". I think the former is a bit clearer at defining its
actual purpose.


[1] - Once something gets added to the standard library, it has to adhere
as much as reasonably possible to backwards compatibility, making any
changes in behavior and API drastically more difficult. Also, its
development time becomes limited by CPython's release cycle rather than
having its own.

On Sun, Feb 16, 2020 at 12:43 AM Andrew Barnert  wrote:

> On Feb 15, 2020, at 20:29, Kyle Stanley  wrote:
>
>
> *Add a SerialExecutor, which does not use threads or processes*
>
> Andrew Barnert wrote:
> > e.g., in C++, you only use executors via the std::async function, and
> you can just pass a launch option instead of an executor to run
> synchronously
>
> In the case of C++'s std::async though, it still launches a thread to run
> the function within, no?
>
>
> No; the point of launch policies is that you can (without needing an
> executor object[1]) tell the task to run “async” (on its own thread[2]),
> “deferred” (serially[3] on first demand), or “immediate” (serially right
> now)[4]. You can even or together multiple policies to let the
> implementation choose, and IIRC the default is async|deferred.
>
> At any rate, I’m not suggesting that C++ is a design worth looking at,
> just parenthetically noting it as an example of how when libraries don’t
> have a serial executor, it’s often because they already have a different
> way to specify the same thing.
>
> This doesn't require the user to explicitly create or interact with the
> thread in any way, but that seems to go against what OP was looking for:
>
> Jonathan Crall wrote:
> > Oftentimes a developer will want to run a task in parallel, but depending
> on the environment they may want to disable threading or process execution.
>
> The *concrete* purpose of what that accomplishes (in the context of
> CPython) isn't clear to me. How exactly are you running the task in
> parallel without using a thread, process, or coroutine [1]?
>
>
> I’m pretty sure what he meant is that the developer _usually_ wants the
> task to run in parallel, but in some specific situation he wants it to
> _not_ run in parallel.
>
> The concrete use case I’ve run into is this: I’ve got some parallel code
> that has a bug. I’m pretty sure the bug isn’t actually related to the
> shared data or the parallelism itself, but I want to be sure. I replace the
> ThreadPoolExecutor with a SyncExecutor and change nothing else about the
> code, and the bug still happens. Now I’ve proven that the bug isn’t related
> to parallelism. And, as a bonus, I’ve got nice logs that aren’t interleaved
> into a big mess, so it’s easier to track down the problem.
>
> I have no idea if this is Jonathan’s use, but it is the reason I’ve built
> something similar myself.
>
> —-
>
> [1] Actually, the version that got into C++11 doesn’t even have executors,
> only launch policies. It also doesn’t 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Andrew Barnert via Python-ideas
On Feb 15, 2020, at 20:29, Kyle Stanley  wrote:
> Add a SerialExecutor, which does not use threads or processes
> 
> Andrew Barnert wrote:
> > e.g., in C++, you only use executors via the std::async function, and you 
> > can just pass a launch option instead of an executor to run synchronously
> 
> In the case of C++'s std::async though, it still launches a thread to run the 
> function within, no?

No; the point of launch policies is that you can (without needing an executor 
object[1]) tell the task to run “async” (on its own thread[2]), “deferred” 
(serially[3] on first demand), or “immediate” (serially right now)[4]. You can 
even or together multiple policies to let the implementation choose, and IIRC 
the default is async|deferred.

At any rate, I’m not suggesting that C++ is a design worth looking at, just 
parenthetically noting it as an example of how when libraries don’t have a 
serial executor, it’s often because they already have a different way to 
specify the same thing.

> This doesn't require the user to explicitly create or interact with the 
> thread in any way, but that seems to go against what OP was looking for:
> 
> Jonathan Crall wrote:
> > Oftentimes a developer will want to run a task in parallel, but depending on 
> > the environment they may want to disable threading or process execution.
> 
> The *concrete* purpose of what that accomplishes (in the context of CPython) 
> isn't clear to me. How exactly are you running the task in parallel without 
> using a thread, process, or coroutine [1]?

I’m pretty sure what he meant is that the developer _usually_ wants the task to 
run in parallel, but in some specific situation he wants it to _not_ run in 
parallel.

The concrete use case I’ve run into is this: I’ve got some parallel code that 
has a bug. I’m pretty sure the bug isn’t actually related to the shared data or 
the parallelism itself, but I want to be sure. I replace the ThreadPoolExecutor 
with a SyncExecutor and change nothing else about the code, and the bug still 
happens. Now I’ve proven that the bug isn’t related to parallelism. And, as a 
bonus, I’ve got nice logs that aren’t interleaved into a big mess, so it’s 
easier to track down the problem.
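
A minimal sketch of that swap, under stated assumptions: `SyncExecutor` is a
hypothetical stand-in written here for illustration (it is not in
concurrent.futures), and `make_executor` and `run` are made-up helper names.
The calling code is identical in both modes; only the executor changes.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class SyncExecutor(Executor):
    """Hypothetical executor that runs each task immediately in the caller's thread."""

    def submit(self, fn, *args, **kwargs):
        fut = Future()
        fut.set_running_or_notify_cancel()
        try:
            fut.set_result(fn(*args, **kwargs))
        except BaseException as exc:
            fut.set_exception(exc)
        return fut


def make_executor(serial):
    # The only line that differs between the parallel and the serial run.
    return SyncExecutor() if serial else ThreadPoolExecutor(max_workers=4)


def run(items, serial=False):
    with make_executor(serial) as executor:
        futures = [executor.submit(pow, x, 2) for x in items]
        return [f.result() for f in futures]


print(run([1, 2, 3], serial=True))
print(run([1, 2, 3], serial=False))
```

If a bug reproduces with `serial=True`, the threads are unlikely to be the
cause, and any log output from the tasks comes out in submission order.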

I have no idea if this is Jonathan’s use, but it is the reason I’ve built 
something similar myself.

—-

[1] Actually, the version that got into C++11 doesn’t even have executors, only 
launch policies. It also doesn’t have then continuation methods, composing 
functions like all and as_completed, … It’s basically useless. All of those 
other features got deferred to a tech specification that was supposed to be 
before C++14 but got pushed back repeatedly until it came out after C++17, and 
then got withdrawn, and now they’re awaiting proposals for a second TS to come. 
Which will probably be after the language has first-class coroutines and maybe 
fibers, and async/await, so they may well have to redesign the whole futures 
model yet again to make futures awaitable…

[2] Actually “as if on its own thread”. But AFAIK, every implementation handles 
this by spawning a thread. I think the distinction is for future expansions, 
either so they can do something like Java’s ForkJoinPool, or so they can use 
fibers or coroutines that don’t care what thread they’re on.

[3] In C++ futures lingo, “serial” actually means an executor that runs all 
tasks on a single background thread, with a queue that’s guaranteed to be 
mutex-locked rather than lock-free. But I mean “serial” in Jonathan’s sense 
here.

[4] Checking the docs, it looks like the immediate policy didn’t make it into 
C++11 either. But anyway, the deferred policy did, and that’s serial in 
Jonathan’s sense.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FYYHAAJS7LY4OP4AI6WWJVH3RHQ5AMW2/


[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Guido van Rossum
On Sat, Feb 15, 2020 at 21:00 Kyle Stanley  wrote:

> > I've never felt the need for either of these myself, nor have I observed
> it in others I worked with. In general I feel the difference between
> processes and threads is so large that I can't believe a realistic
> application would work with either.
>
> Also, ThreadPoolExecutor and ProcessPoolExecutor both have their specific
> purposes in concurrent.futures: TPE for IO-bound parallelism, and PPE for
> CPU-bound parallelism, what niche would the proposed SerialExecutor fall
> under? Fake/dummy parallelism? If so, I personally don't see that as being
> worth the cost of adding it and then maintaining it in the standard
> library. But, that's not to say that it wouldn't have a place on PyPI.
>
> > (Then again I've never had much use for ProcessExecutor period.)
>
> I've also made use of TPE far more times than PPE, but I've definitely
> seen several interesting and useful real-world applications of PPE.
> Particularly with image processing. I can also imagine it also being quite
> useful for scientific computing, although I've not personally used it for
> that purpose.
>
> > IOW I'm rather lukewarm about this -- even if you (Jonathan) have found
> use for it, I'm not sure how many other people would use it, so I doubt
> it's worth adding it to the stdlib. (The only thing the stdlib might grow
> could be a public API that makes implementing this feasible without
> overriding private methods.)
>
> Expanding a bit upon the public API for the cf.Future class would likely
> allow something like this to be possible without accessing any private
> members. In particular, I believe there would have to be a public means of
> accessing the state of the future without having to go through the
> condition (currently, this can only be done with ``future._state``), and
> accessing a constant for each of the possible states: PENDING, RUNNING,
> CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.
>
> Since that would actually be quite useful for debugging purposes (I had to
> access ``future._state`` several times while testing the new
> *cancel_futures*), I'd be willing to work on implementing something like
> this.
>

Excellent!


>
> On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum 
> wrote:
>
>> Having tried my hand at a simpler version for about 15 minutes, I see the
>> reason for the fiddly subclass of Future -- it seems over-engineered
>> because concurrent.futures is complicated.
>>
>> I've never felt the need for either of these myself, nor have I observed
>> it in others I worked with. In general I feel the difference between
>> processes and threads is so large that I can't believe a realistic
>> application would work with either. (Then again I've never had much use for
>> ProcessExecutor period.)
>>
>> The "Serial" variants somehow remind me of the "dummy_thread.py" module
>> we had in Python 2. It was removed in Python 3, mostly because we ran out
>> of cases where real threads weren't an option.
>>
>> IOW I'm rather lukewarm about this -- even if you (Jonathan) have found
>> use for it, I'm not sure how many other people would use it, so I doubt
>> it's worth adding it to the stdlib. (The only thing the stdlib might grow
>> could be a public API that makes implementing this feasible without
>> overriding private methods.)
>>
>> On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall 
>> wrote:
>>
>>> This implementation is a proof-of-concept that I've been using for
>>> a while.
>>> It's certain that any version that made it into the stdlib would have to be
>>> more carefully designed than the implementation I threw together. However,
>>> my implementation demonstrates the concept and there are reasons for the
>>> choices I made.
>>>
>>> First, the choice to create a SerialFuture object that inherits from the
>>> base Future was because I only wanted a process to run if the
>>> SerialFuture.result method was called. The most obvious way to do that was
>>> to overload the `result` method to execute the function when called.
>>> Perhaps there is a better way, but in an effort to KISS I just went with
>>> the <100 line version that seemed to work well enough.
>>>
>>> The `set_result` is overloaded because in Python 3.8, the base
>>> Future.set_result function asserts that the _state is not FINISHED when it
>>> is called. In my proof-of-concept implementation I had to set state of the
>>> SerialFuture._state to FINISHED in order for `as_completed` to yield it.
>>> Again, there may be a better way to do this, but I don't claim to know what
>>> that is yet.
>>>
>>> I was thinking that a factory function might be a good idea, but if I
>>> was designing the system I would have put that in the abstract Executor
>>> class. Maybe something like
>>>
>>>
>>> ```
>>> @classmethod
>>> def create(cls, mode, max_workers=0):
>>>     """ Create an instance of a serial, thread, or 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Kyle Stanley
> I've never felt the need for either of these myself, nor have I observed
it in others I worked with. In general I feel the difference between
processes and threads is so large that I can't believe a realistic
application would work with either.

Also, ThreadPoolExecutor and ProcessPoolExecutor both have their specific
purposes in concurrent.futures: TPE for IO-bound parallelism, and PPE for
CPU-bound parallelism. What niche would the proposed SerialExecutor fall
under? Fake/dummy parallelism? If so, I personally don't see that as being
worth the cost of adding it and then maintaining it in the standard
library. But, that's not to say that it wouldn't have a place on PyPI.

> (Then again I've never had much use for ProcessExecutor period.)

I've also made use of TPE far more times than PPE, but I've definitely seen
several interesting and useful real-world applications of PPE, particularly
with image processing. I can also imagine it being quite useful for
scientific computing, although I've not personally used it for that purpose.

> IOW I'm rather lukewarm about this -- even if you (Jonathan) have found
use for it, I'm not sure how many other people would use it, so I doubt
it's worth adding it to the stdlib. (The only thing the stdlib might grow
could be a public API that makes implementing this feasible without
overriding private methods.)

Expanding a bit upon the public API for the cf.Future class would likely
allow something like this to be possible without accessing any private
members. In particular, I believe there would have to be a public means of
accessing the state of the future without having to go through the
condition (currently, this can only be done with ``future._state``), and
accessing a constant for each of the possible states: PENDING, RUNNING,
CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED.

Since that would actually be quite useful for debugging purposes (I had to
access ``future._state`` several times while testing the new
*cancel_futures*), I'd be willing to work on implementing something like
this.
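
To illustrate the gap, here is a minimal sketch of what can be recovered
today using only the public Future methods (``cancelled()``, ``running()``,
``done()``). The helper name ``public_state`` is hypothetical and not part of
concurrent.futures. It cannot tell CANCELLED apart from
CANCELLED_AND_NOTIFIED, and because it makes several separate calls the
answer can be stale in a multi-threaded program, which is part of why a
public accessor might help:

```python
import concurrent.futures as cf


def public_state(fut):
    """Approximate a future's state using only public methods.

    Hypothetical helper for illustration; not a stdlib API.
    """
    if fut.cancelled():
        return "CANCELLED"  # also covers CANCELLED_AND_NOTIFIED
    if fut.running():
        return "RUNNING"
    if fut.done():
        return "FINISHED"
    return "PENDING"


fut = cf.Future()  # constructed directly only for demonstration
print(public_state(fut))  # PENDING
fut.set_running_or_notify_cancel()
print(public_state(fut))  # RUNNING
fut.set_result(42)
print(public_state(fut))  # FINISHED
```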


On Sat, Feb 15, 2020 at 10:16 PM Guido van Rossum  wrote:

> Having tried my hand at a simpler version for about 15 minutes, I see the
> reason for the fiddly subclass of Future -- it seems over-engineered
> because concurrent.futures is complicated.
>
> I've never felt the need for either of these myself, nor have I observed
> it in others I worked with. In general I feel the difference between
> processes and threads is so large that I can't believe a realistic
> application would work with either. (Then again I've never had much use for
> ProcessExecutor period.)
>
> The "Serial" variants somehow remind me of the "dummy_thread.py" module we
> had in Python 2. It was removed in Python 3, mostly because we ran out of
> cases where real threads weren't an option.
>
> IOW I'm rather lukewarm about this -- even if you (Jonathan) have found
> use for it, I'm not sure how many other people would use it, so I doubt
> it's worth adding it to the stdlib. (The only thing the stdlib might grow
> could be a public API that makes implementing this feasible without
> overriding private methods.)
>
> On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall  wrote:
>
>> This implementation is a proof-of-concept that I've been using for a while.
>> It's certain that any version that made it into the stdlib would have to be
>> more carefully designed than the implementation I threw together. However,
>> my implementation demonstrates the concept and there are reasons for the
>> choices I made.
>>
>> First, the choice to create a SerialFuture object that inherits from the
>> base Future was because I only wanted a process to run if the
>> SerialFuture.result method was called. The most obvious way to do that was
>> to overload the `result` method to execute the function when called.
>> Perhaps there is a better way, but in an effort to KISS I just went with
>> the <100 line version that seemed to work well enough.
>>
>> The `set_result` is overloaded because in Python 3.8, the base
>> Future.set_result function asserts that the _state is not FINISHED when it
>> is called. In my proof-of-concept implementation I had to set state of the
>> SerialFuture._state to FINISHED in order for `as_completed` to yield it.
>> Again, there may be a better way to do this, but I don't claim to know what
>> that is yet.
>>
>> I was thinking that a factory function might be a good idea, but if I was
>> designing the system I would have put that in the abstract Executor class.
>> Maybe something like
>>
>>
>> ```
>> @classmethod
>> def create(cls, mode, max_workers=0):
>>     """ Create an instance of a serial, thread, or process-based executor
>>     """
>>     from concurrent import futures
>>     if mode == 'serial' or max_workers == 0:
>>         return futures.SerialExecutor()
>>     elif mode == 'thread':
>>         return 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Kyle Stanley
This seems to be two separate proposals:

1) Add a new way to create and specify executor
2) Add a SerialExecutor, which does not use threads or processes

So, I'll respond to each one separately.


*Add a new way to create and specify executor*

Jonathan Crall wrote:
> The library's ThreadPoolExecutor and ProcessPoolExecutor are excellent
tools, but there is currently no mechanism for configuring which type of
executor you want.

The mechanism of configuring the executor type is by instantiating the type
of executor you want to use. For IO-bound parallelism you use
``cf.ThreadPoolExecutor()`` or for CPU-bound parallelism you use
``cf.ProcessPoolExecutor()``. So I'm not sure that it would be practically
beneficial to provide multiple ways to configure the type of executor to
use. That seems to go against the philosophy of preferring "one obvious way
to do it" [1].

I think there's a very reasonable argument for using a
``cf.Executor.create()`` or ``cf.create_executor()`` that works as a
factory to initialize and return an executor class based on parameters that
are passed to it, but to me, that seems better suited for a different
library/alternative interface. I guess that I just don't see a practical
benefit in having both means of specifying the type of executor for
concurrent.futures in the standard library, in terms of both maintenance
burden and feature bloat. If a user wants to be able to
specify the executor used in this manner, it's rather trivial to implement
it in a few lines of code without having to access any private members;
which to me seems to indicate that there's not a whole lot of value in
adding it to the standard library.
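
As a rough sketch of how little code that takes, assuming only the two
executor types that exist in concurrent.futures today (the function name
``create_executor`` and the ``mode`` strings are illustrative, not an
existing API):

```python
import concurrent.futures as cf

# Map of mode name -> executor class; only public stdlib classes are used.
_EXECUTOR_TYPES = {
    "thread": cf.ThreadPoolExecutor,
    "process": cf.ProcessPoolExecutor,
}


def create_executor(mode, max_workers=None):
    """Return a new executor for the given mode (hypothetical helper)."""
    try:
        executor_cls = _EXECUTOR_TYPES[mode]
    except KeyError:
        raise ValueError(f"unknown executor mode: {mode!r}") from None
    return executor_cls(max_workers=max_workers)


with create_executor("thread", max_workers=2) as executor:
    print(executor.submit(pow, 2, 5).result())  # 32
```

A serial mode could be added to the mapping by anyone who has their own
serial executor class, without touching any private members.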

That being said, if there are others that would like to use an alternative
interface for concurrent.futures, it could very well be uploaded as a small
package on PyPI. I just personally don't think it has a place in the
existing concurrent.futures module.


[1] - One could say that context managers provide an alternative means of
creating and using the executors, but context managers provide a
significant added value in the form of resource cleanup. To me, there
doesn't seem to be much real added value in being able to both use the
existing ``executor = cf.ThreadPoolExecutor()`` and a new ``executor =
cf.create_executor(mode="thread")`` / ``executor =
cf.Executor.create(mode="thread")``.


*Add a SerialExecutor, which does not use threads or processes*

Andrew Barnert wrote:
> e.g., in C++, you only use executors via the std::async function, and you
can just pass a launch option instead of an executor to run synchronously

In the case of C++'s std::async though, it still launches a thread to run
the function within, no? This doesn't require the user to explicitly create
or interact with the thread in any way, but that seems to go against what
OP was looking for:

Jonathan Crall wrote:
> Often times a developer will want to run a task in parallel, but depending
on the environment they may want to disable threading or process execution.

The *concrete* purpose of what that accomplishes (in the context of
CPython) isn't clear to me. How exactly are you running the task in
parallel without using a thread, process, or coroutine [1]? Without using
one of those constructs (directly or indirectly), you're really just
executing the tasks one-by-one, not with any form of parallelism, no? That
seems to go against the primary practical purpose of using
concurrent.futures in the first place. Am I misunderstanding something
here? Perhaps it would help to have some form of real-world example where
this might be useful, and how it would benefit from using something like
SerialExecutor over other alternatives.

Jonathan Crall wrote:
> The `set_result` is overloaded because in Python 3.8, the base
Future.set_result function asserts that the _state is not FINISHED when it
is called. In my proof-of-concept implementation I had to set state of the
SerialFuture._state to FINISHED in order for `as_completed` to yield it.
Again, there may be a better way to do this, but I don't claim to know what
that is yet.

The main purpose of `cf.as_completed()` is to yield the results
asynchronously as they're completed (FINISHED or CANCELLED), which is
inherently *not* going to be serial. If you want to instead yield each
result in the same order they're submitted, but as each one is completed
[2], you could do something like this:

```
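executor = cf.ThreadPoolExecutor()
futs = []
for item in to_do:
    # Submit everything first so the tasks can run in parallel.
    fut = executor.submit(do_something, item)
    futs.append(fut)
for fut in futs:
    # Block on each future in submission order.
    yield fut.result()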
```

(The above would be presumably part of some generator function/method where
you could pass a function *do_something* and an iterable of IO-bound tasks
*to_do*)

This would allow you to execute tasks in parallel, while ensuring the
results yielded are serial/synchronous.
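
For completeness, a self-contained version of that idea might look like the
following. The names ``ordered_results`` and ``slow_square`` are made up for
this example; the sleep just makes later items finish first, to show that the
output order still follows submission order:

```python
import concurrent.futures as cf
import time


def ordered_results(do_something, to_do, max_workers=None):
    """Run do_something on each item in threads; yield in submission order."""
    with cf.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit everything up front so the work overlaps.
        futs = [executor.submit(do_something, item) for item in to_do]
        # Then block on each future in the order it was submitted.
        for fut in futs:
            yield fut.result()


def slow_square(x):
    time.sleep(0.01 * (3 - x))  # earlier items take longer
    return x * x


print(list(ordered_results(slow_square, [0, 1, 2])))  # [0, 1, 4]
```

Note that ``Executor.map()`` already returns results in the order of its
input iterable, so for simple cases it gives the same ordering without the
explicit list of futures.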


[1] - You could also create subinterpreters to run tasks in parallel
through the C-API, or through the upcoming 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Guido van Rossum
Having tried my hand at a simpler version for about 15 minutes, I see the
reason for the fiddly subclass of Future -- it seems over-engineered
because concurrent.futures is complicated.

I've never felt the need for either of these myself, nor have I observed it
in others I worked with. In general I feel the difference between processes
and threads is so large that I can't believe a realistic application would
work with either. (Then again I've never had much use for ProcessExecutor
period.)

The "Serial" variants somehow remind me of the "dummy_thread.py" module we
had in Python 2. It was removed in Python 3, mostly because we ran out of
cases where real threads weren't an option.

IOW I'm rather lukewarm about this -- even if you (Jonathan) have found use
for it, I'm not sure how many other people would use it, so I doubt it's
worth adding it to the stdlib. (The only thing the stdlib might grow could
be a public API that makes implementing this feasible without overriding
private methods.)

On Sat, Feb 15, 2020 at 3:16 PM Jonathan Crall  wrote:

> This implementation is a proof-of-concept that I've been using for a while.
> It's certain that any version that made it into the stdlib would have to be
> more carefully designed than the implementation I threw together. However,
> my implementation demonstrates the concept and there are reasons for the
> choices I made.
>
> First, the choice to create a SerialFuture object that inherits from the
> base Future was because I only wanted a process to run if the
> SerialFuture.result method was called. The most obvious way to do that was
> to overload the `result` method to execute the function when called.
> Perhaps there is a better way, but in an effort to KISS I just went with
> the <100 line version that seemed to work well enough.
>
> The `set_result` is overloaded because in Python 3.8, the base
> Future.set_result function asserts that the _state is not FINISHED when it
> is called. In my proof-of-concept implementation I had to set state of the
> SerialFuture._state to FINISHED in order for `as_completed` to yield it.
> Again, there may be a better way to do this, but I don't claim to know what
> that is yet.
>
> I was thinking that a factory function might be a good idea, but if I was
> designing the system I would have put that in the abstract Executor class.
> Maybe something like
>
>
> ```
> @classmethod
> def create(cls, mode, max_workers=0):
>     """ Create an instance of a serial, thread, or process-based executor
>     """
>     from concurrent import futures
>     if mode == 'serial' or max_workers == 0:
>         return futures.SerialExecutor()
>     elif mode == 'thread':
>         return futures.ThreadPoolExecutor(max_workers=max_workers)
>     elif mode == 'process':
>         return futures.ProcessPoolExecutor(max_workers=max_workers)
>     else:
>         raise KeyError(mode)
> ```
>
> I do think that it would improve the standard lib to have something like
> this --- again perhaps not this exact version (it does seem a bit weird to
> give this method to an abstract class), but some common API that makes it
> easy for the user to swap between the backend Executor implementation. Even
> though the implementation is "trivial", lots of things in the standard lib
> are, but they reduce boilerplate that developers would otherwise need,
> provide examples of good practices to new developers, and provide a de facto
> way to do something that might otherwise be implemented differently by
> different people, so it adds value to the stdlib.
>
> That being said, while I will advocate for the inclusion of such a factory
> method or wrapper class, it would only be a minor annoyance to not have it.
> On the other hand I think a SerialExecutor is something that is sorely
> missing from the standard library.
>
> On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert  wrote:
>
>> > On Feb 15, 2020, at 13:36, Jonathan Crall  wrote:
>> >
>> > Also, there is no duck-typed class that behaves like an executor, but
>> does its processing in serial. Often times a developer will want to run a
>> task in parallel, but depending on the environment they may want to disable
>> threading or process execution. To address this I use a utility called a
>> `SerialExecutor` which shares an API with
>> ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially
>> in the same python thread:
>>
>> This makes sense. I think most futures-and-executors frameworks in other
>> languages have a serial/synchronous/immediate/blocking executor just like
>> this. (And the ones that don’t, it’s usually because they have a different
>> way to specify the same functionality—e.g., in C++, you only use executors
>> via the std::async function, and you can just pass a launch option instead
>> of an executor to run synchronously.)
>>
>> And I’ve wanted this, and even built it myself at least 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Jonathan Crall
This implementation is a proof-of-concept that I've been using for a while.
It's certain that any version that made it into the stdlib would have to be
more carefully designed than the implementation I threw together. However,
my implementation demonstrates the concept and there are reasons for the
choices I made.

First, the choice to create a SerialFuture object that inherits from the
base Future was because I only wanted a process to run if the
SerialFuture.result method was called. The most obvious way to do that was
to overload the `result` method to execute the function when called.
Perhaps there is a better way, but in an effort to KISS I just went with
the <100 line version that seemed to work well enough.

The `set_result` is overloaded because in Python 3.8, the base
Future.set_result function asserts that the _state is not FINISHED when it
is called. In my proof-of-concept implementation I had to set state of the
SerialFuture._state to FINISHED in order for `as_completed` to yield it.
Again, there may be a better way to do this, but I don't claim to know what
that is yet.

I was thinking that a factory function might be a good idea, but if I was
designing the system I would have put that in the abstract Executor class.
Maybe something like


```
@classmethod
def create(cls, mode, max_workers=0):
    """ Create an instance of a serial, thread, or process-based executor
    """
    from concurrent import futures
    if mode == 'serial' or max_workers == 0:
        return futures.SerialExecutor()
    elif mode == 'thread':
        return futures.ThreadPoolExecutor(max_workers=max_workers)
    elif mode == 'process':
        return futures.ProcessPoolExecutor(max_workers=max_workers)
    else:
        raise KeyError(mode)
```

I do think that it would improve the standard lib to have something like
this --- again perhaps not this exact version (it does seem a bit weird to
give this method to an abstract class), but some common API that makes it
easy for the user to swap between the backend Executor implementations. Even
though the implementation is "trivial", lots of things in the standard lib
are, but they reduce boilerplate that developers would otherwise need,
provide examples of good practices to new developers, and provide a de facto
way to do something that might otherwise be implemented differently by
different people, so it adds value to the stdlib.

That being said, while I will advocate for the inclusion of such a factory
method or wrapper class, it would only be a minor annoyance to not have it.
On the other hand I think a SerialExecutor is something that is sorely
missing from the standard library.

On Sat, Feb 15, 2020 at 5:16 PM Andrew Barnert  wrote:

> > On Feb 15, 2020, at 13:36, Jonathan Crall  wrote:
> >
> > Also, there is no duck-typed class that behaves like an executor, but
> does its processing in serial. Often times a developer will want to run a
> task in parallel, but depending on the environment they may want to disable
> threading or process execution. To address this I use a utility called a
> `SerialExecutor` which shares an API with
> ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially
> in the same python thread:
>
> This makes sense. I think most futures-and-executors frameworks in other
> languages have a serial/synchronous/immediate/blocking executor just like
> this. (And the ones that don’t, it’s usually because they have a different
> way to specify the same functionality—e.g., in C++, you only use executors
> via the std::async function, and you can just pass a launch option instead
> of an executor to run synchronously.)
>
> And I’ve wanted this, and even built it myself at least once—it’s a great
> way to get all of the logging in order to make things easier to debug, for
> example.
>
> However, I think you may have overengineered this.
>
> Why can’t you use the existing Future type as-is? Yes, there’s a bit of
> unnecessary overhead, but your reimplementation seems to add almost the
> same unnecessary overhead. And does it make enough difference in practice
> to be worth worrying about anyway? (It doesn’t for my uses, but maybe
> yours are different.)
>
> Also, why are you overriding set_result to restore pre-3.8 behavior? The
> relevant change here seems to be the one where 3.8 prevents executors from
> finishing already-finished (or canceled) futures; why does your executor
> need that?
>
> Finally, why do you need a wrapper class that constructs one of the three
> types at initialization and then just delegates all methods to it? Why not
> just use a factory function that constructs and returns an instance of one
> of the three types directly? And, given how trivial that factory function
> is, does it even need to be in the stdlib?
>
> I may well be missing something that makes some of these choices necessary
> or desirable. But otherwise, I 

[Python-ideas] Re: SerialExecutor for concurrent.futures + Convenience constructor

2020-02-15 Thread Andrew Barnert via Python-ideas
> On Feb 15, 2020, at 13:36, Jonathan Crall  wrote:
> 
> Also, there is no duck-typed class that behaves like an executor, but does 
> its processing in serial. Often times a developer will want to run a task in 
> parallel, but depending on the environment they may want to disable threading 
> or process execution. To address this I use a utility called a 
> `SerialExecutor` which shares an API with 
> ThreadPoolExecutor/ProcessPoolExecutor but executes processes sequentially in 
> the same python thread:

This makes sense. I think most futures-and-executors frameworks in other 
languages have a serial/synchronous/immediate/blocking executor just like this. 
(And the ones that don’t, it’s usually because they have a different way to 
specify the same functionality—e.g., in C++, you only use executors via the 
std::async function, and you can just pass a launch option instead of an 
executor to run synchronously.)

And I’ve wanted this, and even built it myself at least once—it’s a great way 
to get all of the logging in order to make things easier to debug, for example.

However, I think you may have overengineered this. 

Why can’t you use the existing Future type as-is? Yes, there’s a bit of 
unnecessary overhead, but your reimplementation seems to add almost the same 
unnecessary overhead. And does it make enough difference in practice to be 
worth worrying about anyway? (It doesn’t for my uses, but maybe yours are 
different.)

Also, why are you overriding set_result to restore pre-3.8 behavior? The 
relevant change here seems to be the one where 3.8 prevents executors from 
finishing already-finished (or canceled) futures; why does your executor need 
that?

Finally, why do you need a wrapper class that constructs one of the three types 
at initialization and then just delegates all methods to it? Why not just use a 
factory function that constructs and returns an instance of one of the three 
types directly? And, given how trivial that factory function is, does it even 
need to be in the stdlib?

I may well be missing something that makes some of these choices necessary or 
desirable. But otherwise, I think we’d be better off adding a SerialExecutor 
(that works with the existing Future type as-is) but not adding or changing 
anything else.
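
To make that concrete, here is a minimal sketch of such an executor, built on 
the stock Future and only its documented methods. This is my own illustration 
under that assumption, not the implementation from the original post: it runs 
each call eagerly inside submit(), whereas the proof-of-concept defers the 
call until result() is requested.

```python
import concurrent.futures as cf


class SerialExecutor(cf.Executor):
    """Sketch: run each submitted call immediately in the calling thread."""

    def submit(self, fn, *args, **kwargs):
        fut = cf.Future()
        # Move the future from PENDING to RUNNING, as a real executor would.
        fut.set_running_or_notify_cancel()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            # Store the error so it is re-raised from fut.result().
            # KeyboardInterrupt/SystemExit are deliberately left to propagate.
            fut.set_exception(exc)
        else:
            fut.set_result(result)
        return fut


with SerialExecutor() as executor:
    futs = [executor.submit(pow, 2, n) for n in range(4)]
    # as_completed() accepts already-finished futures; order is unspecified.
    print(sorted(f.result() for f in cf.as_completed(futs)))
    # The inherited map() goes through submit(), so it works unchanged.
    print(list(executor.map(abs, [-1, -2, -3])))  # [1, 2, 3]
```

Because every future is already finished when submit() returns, neither 
set_result nor any private state needs to be overridden for as_completed() to 
yield it.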

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M5EQAU5HN2REEJ6LP5MF55JPDRZESTUX/