On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote:


On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:

On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:


On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:

Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.

So fire away!

As you wish!

I retract my request ;-)

May you get what you wish for, may you find what you are seeking :).

The PEP should be consistent in its usage of terminology about callables. It alternately calls them "callables", "functions", and "functions or methods". It would be nice to clean this up and be consistent about what can be called where. I personally like "callables".

Did you find the terminology confusing? If not, then I propose not changing it.

Yes, actually. Whenever I see references to the multiprocessing module, I picture a giant "HERE BE (serialization) DRAGONS" sign. When I saw that some things were documented as being "functions", I thought that maybe there was an intended restriction like "these can only be top-level functions, so they're easy for different executors to locate and serialize". I didn't realize that the intent was "arbitrary callables" until I carefully re-read the document and noticed that the terminology was inconsistent.

ProcessPoolExecutor has the same serialization perils that multiprocessing does. My original plan was to link to the multiprocessing docs to explain them, but I couldn't find them documented there.
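
For example (an untested sketch using the released package; the square helper is just for illustration), the callable and its arguments have to survive a round-trip through pickle:

    from futures import ProcessPoolExecutor

    def square(x):  # a module-level function pickles fine
        return x * x

    executor = ProcessPoolExecutor()
    print(list(executor.map(square, [1, 2, 3])))  # [1, 4, 9]
    # executor.map(lambda x: x * x, [1, 2, 3]) would fail:
    # lambdas (and other non-module-level callables) can't be pickled.
    executor.shutdown()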

But changing it in the user docs is probably a good idea. I like "callables" too.

Great. Still, users will inevitably find the PEP and use it as documentation too.

The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly.

Any response to this bit?  Did I miss something in the PEP?

Yes, the execution context is Executor-dependent. The sections on ProcessPoolExecutor and ThreadPoolExecutor spell this out, I think.

More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>, it looks like in the multiprocessing implementation, the done callbacks are invoked in an arbitrary local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine).

The callbacks will always be called in a thread other than the main thread in the process that created the executor. Is that a strong enough contract?

Sure. Really, almost any contract would work; it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose that's not strictly necessary.

Your concern is that the thread will be killed when the interpreter exits? It won't be.

This is really minor, I know, but why does it say "NOTE: This method can be used to create adapters from Futures to Twisted Deferreds"? First of all, what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write?

I'm a bit surprised that this doesn't allow for better interoperability with Deferreds given this discussion:

<discussion snipped>

I did not communicate that well. As implemented, it's quite possible to write a translation layer that turns a Future into a Deferred. What I meant by that comment was that the specification in the PEP was too loose to be sure that such a layer would work with arbitrary executors.

For what it's worth, the Deferred translator would look like this, if you want to include it in the PEP (untested though, you may want to run it first):

    from twisted.internet.defer import Deferred
    from twisted.internet import reactor

    def future2deferred(future):
        d = Deferred()
        def invoke_deferred():
            # Runs in the reactor thread; the future is already done, so
            # result() returns (or raises) without blocking.
            try:
                result = future.result()
            except:
                # errback() with no argument wraps the currently active
                # exception in a Failure.
                d.errback()
            else:
                d.callback(result)
        def done_callback(same_future):
            # Called in the executor's callback thread; Deferreds may only
            # be fired in the reactor thread, so bounce over to it.
            reactor.callFromThread(invoke_deferred)
        future.add_done_callback(done_callback)
        return d

This does beg the question of what the traceback will look like in that except: block, though. I guess the multi-threaded executor will use Python 3 exception chaining, so Deferred should be able to show a sane traceback in case of an error, but what about exceptions in other processes?
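
(For reference, implicit chaining in Python 3 keeps the original exception around across a re-raise; a generic illustration, nothing executor-specific:)

    try:
        try:
            1 / 0
        except ZeroDivisionError:
            raise RuntimeError("callable failed")  # implicitly chains
    except RuntimeError as e:
        print(repr(e.__context__))  # the original ZeroDivisionError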

I suggest keeping add_done_callback, implementing it with a list so that callbacks are always invoked in the order they're added, and getting rid of remove_done_callback.
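
Something like this hypothetical bookkeeping (the names are made up) is all I mean:

    class Future(object):
        def __init__(self):
            self._done_callbacks = []  # a list, not a set

        def add_done_callback(self, fn):
            self._done_callbacks.append(fn)

        def _invoke_callbacks(self):
            # Called once by the executor when the future completes;
            # callbacks run strictly in the order they were added.
            for fn in self._done_callbacks:
                fn(self)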

Sounds good to me!

Great! :-)

futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is ("Executor is an abstract class..."). Plus, a third-party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map.
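
For example, with Executor exposed publicly, a trivial third-party executor could be written without any copying (an untested sketch; SynchronousExecutor is a made-up name):

    from futures import Executor, Future

    class SynchronousExecutor(Executor):
        """Runs each callable immediately in the calling thread."""
        def submit(self, fn, *args, **kwargs):
            future = Future()
            if future.set_running_or_notify_cancel():
                try:
                    future.set_result(fn(*args, **kwargs))
                except BaseException as e:
                    future.set_exception(e)
            return future
        # map() is inherited from Executor instead of copy-and-pasted.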

That was a bug that I've fixed. Thanks!

Double-great!

One minor suggestion on the "internal future methods" bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. That seems to be the case with these internal methods - they are not so much "internal" as they are for the producer of the Future (whether a unit test or executor), so you might want to put them on a different object that's easy for the thing creating a Future() to get at, but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it "Invoker()". A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute.

No reaction to this part? I think in a couple of years you'll wish you had done this, when you start bumping into application code that calls "set_result" :).

My reactions are mixed ;-)

Your proposal is to add a level of indirection to make it harder for people to call implementation methods. The downside is that it makes it a bit harder to write tests and executors. I also can't see a big problem with letting people call set_result in client code, though it is documented as being only for Executor implementations and tests.

On the implementation side, I don't see why an Invoker needs a reference to the future. Each Invoker could own one Future. A reference to the Invoker is kept by the Executor, and its future is returned to the client, i.e.:

# Future here is the reference implementation's class; under this
# proposal its set_* mutators would move onto Invoker.
from futures import Future

class Invoker(object):
  def __init__(self):
    """Should only be called by Executor implementations."""
    self.future = Future()

  def set_running_or_notify_cancel(self):
    # Messes with self.future's internals
    ...

  def set_result(self, result):
    # Messes with self.future's internals
    ...

  def set_exception(self, exception):
    # Messes with self.future's internals
    ...

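e.g. inside a hypothetical executor (SomeExecutor and _work_queue are made-up names):

class SomeExecutor(Executor):
  def submit(self, fn, *args, **kwargs):
    invoker = Invoker()
    # A worker thread later calls invoker's set_* methods.
    self._work_queue.put((invoker, fn, args, kwargs))
    return invoker.future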

Cheers,
Brian


Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).

We've already had this discussion before. Could you explain why this module should *not* be in the stdlib, e.g. does it have significantly less utility than other stdlib modules? Is it significantly higher risk? etc.

You've convinced me, mainly because I noticed later on in the discussion that it *has* been released on PyPI for several months and does have a fair number of downloads. It doesn't have quite the popularity I'd personally like to see for stdlib modules, but it's not like you didn't try, and you do (sort of) have a point about small modules having a hard time gaining adoption. I'm sorry that this, in my opinion the least interesting of my points, is what has seen the most discussion so far.

I'd appreciate it if you could do a release to PyPI with the bugfixes you mentioned here, to make sure that the released version is consistent with what eventually gets into Python.

Oh, and one final nitpick: <http://www.rfc-editor.org/rfc/rfc2606.txt> says you really should not put real domain names into your "web crawl example", especially not "some-made-up-domain.com"; RFC 2606 reserves example.com/.net/.org for exactly this purpose.
