[Python-ideas] Re: Generalized deferred computation in Python

Brendan Barnwell Sat, 25 Jun 2022 11:41:04 -0700

On 2022-06-21 13:53, David Mertz, Ph.D. wrote:

Here is a very rough draft of an idea I've floated often, but not with
much specification.  Take this as "ideas" with little firm commitment to
details from me. PRs, or issues, or whatever, can go to
https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as
mentioning them in this thread.

After looking at this a bit more (with the newer revisions) and following the discussion I think this proposal doesn't really achieve what I would want from deferred evaluation. That may be because what I want is unreasonable, but, well, such is life. :-)

First, it is not clear to me what the real point of this type of deferred evaluation is. The PEP has a "motivation" section that makes a link to Haskell and Dask, but as far as I can see it doesn't explicitly say what is gained by introducing this new form of lazy evaluation into Python.

In particular (as I think someone else mentioned on this thread), Dask-style deferred computations are based on explicitly evaluating the thunk, whereas this proposal would automatically evaluate it on reference. I think that in practice this would make many Dask-style usages unwieldy because you would have to keep repeating the `later` keyword in order to gradually build up a complex deferred computation over multiple statements. For such cases it is more natural to explicitly evaluate the whole thing at the end, rather than explicitly not evaluate it until then.
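To make the "explicit evaluation" style concrete, here is a minimal sketch in plain Python. It does not use Dask itself; the `Delayed` class, its `compute()` method, and the `load` helper are hypothetical names invented for illustration. The point is only that a computation can be built up over several statements with no marker on each reference, and evaluated once at the end.

```python
class Delayed:
    """Hypothetical explicit thunk: records a call, runs it only on compute()."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Evaluate any delayed arguments first, then apply the function.
        args = [a.compute() if isinstance(a, Delayed) else a for a in self.args]
        return self.func(*args)


calls = []  # records when the "expensive" step actually runs


def load(n):
    calls.append("load")
    return list(range(n))


# Build the computation over several statements; nothing runs yet.
data = Delayed(load, 5)
doubled = Delayed(lambda xs: [2 * x for x in xs], data)
total = Delayed(sum, doubled)
assert calls == []

# One explicit evaluation step at the end.
result = total.compute()
```

Referring to `data` or `doubled` along the way never triggers evaluation, so nothing like a repeated `later` keyword is needed to keep the graph deferred.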

In theory there could be performance gains, as mentioned in the PEP. But again I don't see a huge advantage to this in Python. It might make sense in Haskell where laziness is built into the language at a fundamental level. But in Python, where eager evaluation is the norm, it again seems more natural to me to use "explicit laziness" (i.e., explicit rather than automatic evaluation). It seems rather unusual to have cases where some variable or function argument might contain either a computationally cheap expression or an expensive one; usually for those types of applications you know where you might do something expensive. And even if you don't, I see little downside to requiring an explicit "eval this thunk" step at the end.

In contrast, what I would want out of deferred evaluation is precisely the ability to evaluate the deferred expression in the *evaluating* scope (not the definition scope), or in a custom provided namespace. Whether this evaluation is implicit or explicit is less important to me than the ability to control the scope in which it occurs. As others mentioned in early posts on this thread, this could complicate things too much to be feasible, but without it I don't really see the point.
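The difference between the two scopes can be sketched with existing tools. In the sketch below a lambda stands for "locked to the definition scope", and a code object built with `compile()` stands in for a thunk that the consumer evaluates in a namespace of its own choosing. The function names are hypothetical, and compiling from a string is only a stand-in: the whole point of the idea is to get this behavior from ordinary expression syntax instead.

```python
def make_lambda():
    limit = 10
    # A lambda closes over the scope where it was defined.
    return lambda: limit * 2


def make_code():
    # Stand-in for a thunk: an unevaluated expression with no scope attached.
    return compile("limit * 2", "<thunk>", "eval")


def consumer(thunk_fn, thunk_code):
    limit = 1
    # The lambda ignores the consumer's `limit` and sees the definer's.
    closed = thunk_fn()
    # The code object is evaluated in a namespace the consumer provides.
    rescoped = eval(thunk_code, {"__builtins__": {}}, {"limit": limit})
    return closed, rescoped


results = consumer(make_lambda(), make_code())
```

Here `results` is `(20, 2)`: the same expression yields different values depending on which scope gets to resolve `limit`.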

The reason this is key for me is that I'm focused on a different set of motivating use cases. What I'm interested in is "query-type" situations where you want to pass an expression to some sort of "query engine", which will evaluate the expression in a namespace representing the dataset to be queried. One example would be SQL queries, where it would be nice to be able to do things like:


    my_sql_table.select(where=thunk (column1 == 2 and column2 > 5))

Likewise this would make pandas indexing less verbose, turning it from:

    df[(df.column1 == 2) & (df.column2 > 5)]

to:

    df[(column1 == 2) & (column2 > 5)]

or even potentially:

    df[column1 == 2 and column2 > 5]

. . . because the evaluator would have control over the evaluation and could provide a namespace in which `column1` and `column2` do not evaluate directly to numpy-like arrays (for which `and` doesn't work), but to some kind of combinable query object which converts the `and` into something that will work with numpy-like elementwise comparison.
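One way such an evaluator could work today is sketched below, under loud assumptions: the expression arrives as a string (standing in for the thunk syntax, which does not exist), and `Column`, `Mask`, `AndToBitAnd`, and `select` are hypothetical names, with plain lists in place of numpy-like arrays. Since `and` cannot be overloaded, this sketch rewrites it to `&` at the AST level before evaluating in a namespace built from the table's columns.

```python
import ast


class Mask:
    """Hypothetical elementwise boolean result that can be combined with &."""

    def __init__(self, flags):
        self.flags = flags

    def __and__(self, other):
        return Mask([a and b for a, b in zip(self.flags, other.flags)])


class Column:
    """Hypothetical query object: comparisons are elementwise, not scalar."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return Mask([v == other for v in self.values])

    def __gt__(self, other):
        return Mask([v > other for v in self.values])


class AndToBitAnd(ast.NodeTransformer):
    """Rewrite `a and b` into `a & b` so the query objects can combine."""

    def visit_BoolOp(self, node):
        self.generic_visit(node)
        if not isinstance(node.op, ast.And):
            return node
        result = node.values[0]
        for value in node.values[1:]:
            result = ast.BinOp(left=result, op=ast.BitAnd(), right=value)
        return ast.copy_location(result, node)


def select(table, expression):
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(AndToBitAnd().visit(tree))
    code = compile(tree, "<query>", "eval")
    # The evaluator, not the caller, decides what the column names mean.
    namespace = {name: Column(values) for name, values in table.items()}
    mask = eval(code, {"__builtins__": {}}, namespace)
    return [
        {name: values[i] for name, values in table.items()}
        for i, keep in enumerate(mask.flags)
        if keep
    ]


table = {"column1": [2, 2, 3, 2], "column2": [9, 1, 7, 6]}
rows = select(table, "column1 == 2 and column2 > 5")
```

The rewrite happens on the parsed tree, so the `&` precedence problem that forces the extra parentheses in hand-written pandas code does not arise.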

In other words, the point here is not performance gains or even laziness, but simply the ability to use ordinary Python expression syntax (not, say, a string) to create an unevaluated chunk which can be passed to some other code which then gets to control its evaluation scope, rather than having that scope locked to where it was defined. Because of this, it is probably okay with me if explicit unwrapping of the thunk is required. You know when you are writing a query handler and so you know that what you want is an unevaluated query expression; you don't need to have an argument whose value might either be an unevaluated expression or a fully-evaluated result.

This would also mean that such deferred objects could handle the late-bound default case, but the function would have to "commit" to explicit evaluation of such defaults. Probably there could be a no-op "unwrapping" operation that would work on non-deferred objects (so that `unwrap([])` or whatever would just evaluate to the same regular list you passed in), so you could still pass in a plain list to an argument whose default was `deferred []`, but the function would still have to explicitly evaluate it in its body. Again, I think I'm okay with this, partly because (as I mentioned in the other thread) I don't see PEP 671-style late-bound defaults as a particularly pressing need.
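A rough sketch of that arrangement with today's Python follows. `Deferred` and `unwrap` are hypothetical names, and `Deferred(list)` stands in for the imagined `deferred []` syntax. `unwrap` evaluates a deferred object and passes anything else through unchanged.

```python
class Deferred:
    """Hypothetical wrapper holding a zero-argument callable to run later."""

    def __init__(self, func):
        self.func = func


def unwrap(value):
    # Evaluate deferred objects; return everything else as-is (a no-op).
    if isinstance(value, Deferred):
        return value.func()
    return value


def append_item(item, target=Deferred(list)):
    # The function commits to explicit evaluation of its default.
    target = unwrap(target)
    target.append(item)
    return target


first = append_item(1)        # default evaluated now: a fresh list
second = append_item(2)       # another fresh list, not shared with `first`
mine = [0]
third = append_item(3, mine)  # a plain list passes through unwrap untouched
```

The default is evaluated on each call, so the usual shared-mutable-default problem does not occur, and callers who pass a plain list see no difference.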

There are definitely some holes in my idea. For one thing, with explicit evaluation required, it is much closer to a regular lambda. The only real difference is that it would involve more flexible scope control (rather than unalterably closing over the defining scope). For another, because it is not lazy, it is closer to being achievable with existing mechanisms, like requiring all "field" references in the query to be specified as attributes on some base object (which is indeed how most SQL ORMs and pandas-like data structures do it currently). Other people might not be as annoyed with these existing solutions as I am. :-) There is also the question of whether it would unacceptably slow down name references because functions would no longer know which variables were local; I think I would be okay with saying that the thunk could not mutate the enclosing namespace (so, e.g., walruses inside the thunk would only affect an internal thunk namespace). The point here is for the consumer to *evaluate* the thunk and get the result, not inline it into the surrounding code.
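The "internal thunk namespace" restriction can be approximated with `eval()` and a private copy of the caller's names, as in the sketch below. The `evaluate` helper is a hypothetical name and the compiled string again stands in for a real thunk. A walrus inside the expression binds into the private namespace only.

```python
def evaluate(thunk_code, namespace):
    # Work on a private copy so bindings made by the thunk stay internal.
    local_ns = dict(namespace)
    result = eval(thunk_code, {"__builtins__": {}}, local_ns)
    return result, local_ns


# Stand-in for a thunk containing an assignment expression.
code = compile("(doubled := x * 2) + 1", "<thunk>", "eval")

caller_ns = {"x": 10}
result, thunk_ns = evaluate(code, caller_ns)
```

After the call, `result` is 21 and `thunk_ns` contains `doubled`, while `caller_ns` is unchanged: the consumer gets the value without the thunk leaking names into the surrounding code.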

My idea is much more half-baked than David's proto-PEP so this isn't really worthy of being called an alternative proposal right now. But I wanted to mention these ideas here to at least handwave about what to me the gain would be from deferred evaluation, as I'm coming at it from a somewhat different angle than the proto-PEP. I have a suspicion that the response will be a combination of disgust and deafening silence but that's life.


--
Brendan Barnwell

"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."

   --author unknown
