[Python-ideas] Re: Generalized deferred computation in Python

Chris Angelico Sat, 25 Jun 2022 13:42:17 -0700

On Sun, 26 Jun 2022 at 04:41, Brendan Barnwell <[email protected]> wrote:
>         In contrast, what I would want out of deferred evaluation is precisely
> the ability to evaluate the deferred expression in the *evaluating*
> scope (not the definition scope) --- or in a custom provided namespace.
>   Whether this evaluation is implicit or explicit is less important to
> me than the ability to control the scope in which it occurs.  As others
> mentioned in early posts on this thread, this could complicate things
> too much to be feasible, but without it I don't really see the point.


A custom-provided namespace can already be partly achieved, but
working in the evaluating scope is currently impossible and would
require some major deoptimizations to become possible.

>>> expr = lambda: x + y
>>> expr.__code__.co_code
b't\x00t\x01\x17\x00S\x00'
>>> ns = {"x": 3, "y": 7}
>>> eval(expr.__code__, ns)
10

This works because the code object doesn't have any locals, so the
name references are encoded as global lookups, and eval() is happy to
use arbitrary globals. I say "partly achieved" because this won't work
if there are any accidental closure variables - you can't isolate the
lambda function from its original context and force everything to be a
global:

>>> def f(x):
...     return lambda: x + y
...
>>> expr = f(42)
>>> eval(expr.__code__, ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: code object passed to eval() may not contain free variables

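The mere fact that there's a local variable 'x' in the enclosing
function means that the 'x' in 'x + y' gets compiled as a closure
(free) variable rather than a global, and eval() refuses code objects
that have free variables. So maybe there'd need to be some weird
trick with class namespaces, but I'm really not sure what would be
worth doing.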
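To make the difference visible, here is a small sketch that inspects
the two code objects, followed by one possible workaround: write the
expression as a string and run it through compile(), which never sees
the enclosing function's locals. The helper name make_expr is made up
for illustration, and the cost is that the expression is now a string
rather than real syntax.

```python
def f(x):
    # 'x' is a local of f, so the lambda captures it as a closure variable
    return lambda: x + y

closed = f(42).__code__
plain = (lambda: x + y).__code__

closed_free = closed.co_freevars   # names resolved through closure cells
plain_free = plain.co_freevars     # nothing captured: every name is a global

def make_expr(x):
    # compile() only sees the string, so the local 'x' here is not captured
    return compile("x + y", "<deferred>", "eval")

ns = {"x": 3, "y": 7}
result = eval(make_expr(42), ns)   # uses ns["x"], not the argument 42
```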

But evaluating in the caller's namespace is not going to work without
some fairly major reworking. At the very least, you'd have to forbid
any form of assignment (including assignment expressions), and it
would force every surrounding variable to become a nonlocal (no fast
locals any more). I don't know what other costs there'd be, and
whether it'd even be possible, but if it is, it would certainly be a
massive deoptimization to all code, just to permit the possibility
that something gets evaluated in this context.

>         The reason this is key for me is that I'm focused on a different
> set of motivating use cases.  What I'm interested in is "query-type"
> situations where you want to pass an expression to some sort of "query
> engine", which will evaluate the expression in a namespace representing
> the dataset to be queried.  One example would be SQL queries, where it
> would be nice to be able to do things like:
>
> my_sql_table.select(where=thunk (column1 == 2 and column2 > 5))
>
>         Likewise this would make pandas indexing less verbose, turning
> it from:
>
> df[(df.column1 == 2) & (df.column2 > 5)]
>
>         to:
>
> df[(column1 == 2) & (column2 > 3)]

So far, so good. In fact, aside from the "accidental closure variable"
problem, these could currently be done with a lambda function.
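A rough sketch of what I mean, using a toy table rather than pandas or
any real SQL library. Column, Mask, Table and select are all made-up
names for illustration; the only real mechanism is eval() on the
lambda's code object with the columns as globals.

```python
import operator

class Mask:
    """Element-wise booleans produced by comparing a Column."""
    def __init__(self, flags):
        self.flags = list(flags)
    def __and__(self, other):
        return Mask(a and b for a, b in zip(self.flags, other.flags))

class Column:
    def __init__(self, values):
        self.values = list(values)
    def _compare(self, op, other):
        return Mask(op(v, other) for v in self.values)
    def __eq__(self, other):
        return self._compare(operator.eq, other)
    def __gt__(self, other):
        return self._compare(operator.gt, other)

class Table:
    def __init__(self, **columns):
        self.columns = {name: Column(vals) for name, vals in columns.items()}
    def select(self, where):
        # Evaluate the lambda's body with the columns as its globals.
        mask = eval(where.__code__, dict(self.columns))
        return [i for i, keep in enumerate(mask.flags) if keep]

table = Table(column1=[2, 2, 4], column2=[9, 1, 8])
matching = table.select(where=lambda: (column1 == 2) & (column2 > 5))
```

This only works as long as the lambda is written somewhere that has no
locals called column1 or column2.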

>         or even potentially:
>
> df[column1 == 2 and column2 > 3]
>
>         . . . because the evaluator would have control over the evaluation and
> could provide a namespace in which `column1` and `column2` do not
> evaluate directly to numpy-like arrays (for which `and` doesn't work),
> but to some kind of combinable query object which converts the `and`
> into something that will work with numpy-like elementwise comparison.

Converting "and" isn't possible, nor should it ever be. But depending
on how the lookup is done, it might be possible to actually reevaluate
for every row (or maybe that'd be just hopelessly inefficient on
numpy's end).
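To illustrate both halves of that: a sketch showing that "and" only
ever asks for a truth value (so a query object can hook "&" but not
"and"), and then a per-row evaluation where "and" works fine because
each name is a plain scalar. Cond and select_rows are hypothetical
names, and the rows are made-up data.

```python
class Cond:
    """Toy query object: '&' can be overloaded, 'and' cannot."""
    def __init__(self, text):
        self.text = text
    def __and__(self, other):
        return Cond(f"({self.text} AND {other.text})")
    def __bool__(self):
        # 'and' can only ask for a truth value; it never calls __and__
        raise TypeError("truth value of a Cond is ambiguous")

a, b = Cond("column1 = 2"), Cond("column2 > 5")
combined = (a & b).text

try:
    a and b
except TypeError as exc:
    and_failure = str(exc)
else:
    and_failure = None

# Re-evaluating the expression once per row, with that row as the namespace.
rows = [
    {"column1": 2, "column2": 9},
    {"column1": 2, "column2": 1},
    {"column1": 4, "column2": 8},
]
expr = lambda: column1 == 2 and column2 > 5

def select_rows(rows, expr):
    code = expr.__code__
    # dict(row) is a copy, since eval() adds __builtins__ to its globals
    return [row for row in rows if eval(code, dict(row))]

selected = select_rows(rows, expr)
```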

>         This would also mean that such deferred objects could handle the
> late-bound default case, but the function would have to "commit" to
> explicit evaluation of such defaults.  Probably there could be a no-op
> "unwrapping" operation that would work on non-deferred objects (so that
> `unwrap([])` or whatever would just evaluate to the same regular list
> you passed in), so you could still pass in a plain list to an argument
> whose default was `deferred []`, but the function would still have to
> explicitly evaluate it in its body.  Again, I think I'm okay with this,
> partly because (as I mentioned in the other thread) I don't see PEP
> 671-style late-bound defaults as a particularly pressing need.

That seems all very well, but it does incur a fairly huge cost for a
relatively simple benefit. Consider:

def f(x=defer [], n=defer len(x)):
    unwrap(x); unwrap(n)
    print("You gave me", n, "elements to work with")

f(defer (print := lambda *x: None))

Is it correct for every late-bound argument default to also be a code
injection opportunity? And if so, then why should other functions
*not* have such an opportunity afforded to them? I mean, if we're
going to have spooky action at a distance, we may as well commit to
it. Okay, I jest, but still - giving callers the ability to put
arbitrary code into the function is going to be FAR harder to reason
about than simply having the code in the function header.

>         There are definitely some holes in my idea.  For one thing, with
> explicit evaluation required, it is much closer to a regular lambda.
> The only real difference is that it would involve more flexible scope
> control (rather than unalterably closing over the defining scope).

TBH I think that that's quite useful, just not for PEP 671. For query
languages, it'd be very handy to be able to have a keyword that says
"isolate the parsing of this". I could imagine this being useful for
function annotations too, although they've been special-cased
somewhat, so that might be less of a concern.

> There is also the question of whether it would unacceptably slow down
> name references because functions would no longer know which variables
> were local; I think I would be okay with saying that the thunk could not
> mutate the enclosing namespace (so, e.g., walruses inside the thunk
> would only affect an internal thunk namespace).  The point here is for
> the consumer to *evaluate* the thunk and get the result, not inline it
> into the surrounding code.

Yep; but the trouble is that referring to a name can also incur a
cost, especially when it comes to closures. So I think the explicit
namespace is going to be far safer than "evaluate in the caller's
context".

That said: you can and should be able to prepopulate the evaluation
namespace with whatever you like, so using locals() as a "seed"
dictionary would basically give you what you want - a non-assignable
namespace that has all of these locals available for reference.
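Something along these lines, as a sketch with today's tools (the
function name consumer is just for illustration):

```python
expr = lambda: x + y      # no enclosing locals, so x and y are global lookups

def consumer():
    x = 3
    y = 7
    ns = dict(locals())   # snapshot of this function's locals
    result = eval(expr.__code__, ns)
    ns["x"] = 100         # only changes the copy, not the real local
    return result, x

outcome = consumer()
```

The expression sees the consumer's x and y, but nothing it or the
namespace does can rebind the real locals.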

ChrisA
