[Python-ideas] Re: dataclass field argument to allow converting value on init

2022-06-25 Thread Steve Jorgensen
Dexter Hill wrote:
> Do you mind providing a little example of what you mean? I'm not sure I 100% 
> understand what your use of `__post_init__` is. In my mind, it would be 
> something like:
> ```py
> @dataclass
> class Foo:
>     x: str = field(init=int, converter=chr)
>
> # which converts to
> class Foo:
>     def __init__(self, x: int):
>         self.x = chr(x)
> ```
> without any use of `__post_init__`. If it were to be something like:
> ```py
> class Foo:
>     def __init__(self, x: int):
>         self.__post_init__(x)
>
>     def __post_init__(self, x: int):
>         self.x = chr(x)
> ```
> which, I think is what you are suggesting (please correct me if I'm wrong), 
> then I feel that may be confusing if you were to override `__post_init__`, 
> which is often much easier than overriding `__init__`.
> For example, in a situation like:
> ```py
> @dataclass
> class Foo:
>     x: str = field(init=int, converter=chr)
>     y: InitVar[str]
> ```
> if the user were to override `__post_init__`, would they know that they need 
> to include `x` as the first argument? It's not typed with `InitVar` so it 
> might not be clear that it's passed to `__post_init__`.
That's close to what I mean. I'm actually suggesting not to have `converter`
though, and instead to use an explicit `__post_init__` for that, so
```py
@dataclass
class Foo:
    x: str = field(init=int)

    def __post_init__(self, x: int):
        self.x = chr(x)

# converts to
class Foo:
    def __init__(self, x: int):
        self.__post_init__(x)

    def __post_init__(self, x: int):
        self.x = chr(x)
```
Writing that out is helpful because now I see that the argument type can 
possibly be taken from the `__post_init__` signature, meaning there is no need 
to use the type as the value for the `init` argument to `field`. In that case, 
instead of `init=int`, it could maybe be something like `post_init=True`.
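For comparison, something close to this can already be spelled today with `InitVar` plus an `init=False` field; a rough sketch (the `x_code` name is invented):
```py
from dataclasses import InitVar, dataclass, field

@dataclass
class Foo:
    x: str = field(init=False)  # real field, but not set by __init__
    x_code: InitVar[int] = 0    # constructor argument, forwarded to __post_init__

    def __post_init__(self, x_code: int):
        self.x = chr(x_code)    # explicit conversion happens here

assert Foo(65).x == "A"
```
The proposal would collapse the two names into one, with the annotation giving the field type and the `init` argument (or `post_init=True`) giving the constructor-argument type.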


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-25 Thread Chris Angelico
On Sun, 26 Jun 2022 at 04:41, Brendan Barnwell wrote:
> In contrast, what I would want out of deferred evaluation is precisely
> the ability to evaluate the deferred expression in the *evaluating*
> scope (not the definition scope) --- or in a custom provided namespace.
>   Whether this evaluation is implicit or explicit is less important to
> me than the ability to control the scope in which it occurs.  As others
> mentioned in early posts on this thread, this could complicate things
> too much to be feasible, but without it I don't really see the point.

A custom-provided namespace can already be partly achieved, but
working in the evaluating scope is currently impossible and would
require some major deoptimizations to become possible.

>>> expr = lambda: x + y
>>> expr.__code__.co_code
b't\x00t\x01\x17\x00S\x00'
>>> ns = {"x": 3, "y": 7}
>>> eval(expr.__code__, ns)
10

This works because the code object doesn't have any locals, so the
name references are encoded as global lookups, and eval() is happy to
use arbitrary globals. I say "partly achieved" because this won't work
if there are any accidental closure variables - you can't isolate the
lambda function from its original context and force everything to be a
global:

>>> def f(x):
...     return lambda: x + y
...
>>> expr = f(42)
>>> eval(expr.__code__, ns)
Traceback (most recent call last):
  File "", line 1, in 
TypeError: code object passed to eval() may not contain free variables

The mere fact that there's a local variable 'x' means that you can't
compile the expression 'x + y'. So maybe there'd need to be some weird
trick with class namespaces, but I'm really not sure what would be
worth doing.
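
One partial trick, for what it's worth, is to rebuild the function around
the same code object with freshly-made closure cells (a sketch; fragile,
and the cell values must be supplied in co_freevars order):

import types

def f(x):
    return lambda: x + y

expr = f(42)
ns = {"y": 7}
# One new cell per free variable - here just 'x'; 'y' stays a global lookup.
cells = tuple(types.CellType(v) for v in (3,))
clone = types.FunctionType(expr.__code__, ns, "clone", None, cells)
print(clone())  # 10, not 49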

But evaluating in the caller's namespace is not going to work without
some fairly major reworking. At the very least, you'd have to forbid
any form of assignment (including assignment expressions), and it
would force every surrounding variable to become a nonlocal (no fast
locals any more). I don't know what other costs there'd be, and
whether it'd even be possible, but if it is, it would certainly be a
massive deoptimization to all code, just to permit the possibility
that something gets evaluated in this context.
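
The closest you can get today is string-based, by explicitly grabbing the
caller's frame (CPython-specific, and `evaluate` here is a made-up helper);
even then it is read-only in effect, since assignments into f_locals don't
reliably propagate back:

import sys

def evaluate(source):
    # Evaluate in the caller's namespaces. f_locals is essentially a
    # snapshot, so writes would not flow back into the caller's fast locals.
    frame = sys._getframe(1)
    return eval(source, frame.f_globals, frame.f_locals)

def caller():
    x, y = 3, 7
    return evaluate("x + y")

print(caller())  # 10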

> The reason this is key for me is that I'm focused on a different set 
> of
> motivating use cases.  What I'm interested in is "query-type" situations
> where you want to pass an expression to some sort of "query engine",
> which will evaluate the expression in a namespace representing the
> dataset to be queried.  One example would be SQL queries, where it would
> be nice to be able to do things like:
>
> my_sql_table.select(where=thunk (column1 == 2 and column2 > 5))
>
> Likewise this would make pandas indexing less verbose, turning it 
> from:
>
> df[(df.column1 == 2) & (df.column2 > 5)]
>
> to:
>
> df[(column1 == 2) & (column2 > 5)]

So far, so good. In fact, aside from the "accidental closure variable"
problem, these could currently be done with a lambda function.

> or even potentially:
>
> df[column1 == 2 and column2 > 5]
>
> . . . because the evaluator would have control over the evaluation and
> could provide a namespace in which `column1` and `column2` do not
> evaluate directly to numpy-like arrays (for which `and` doesn't work),
> but to some kind of combinable query object which converts the `and`
> into something that will work with numpy-like elementwise comparison.

Converting "and" isn't possible, nor should it ever be. But depending
on how the lookup is done, it might be possible to actually reevaluate
for every row (or maybe that'd be just hopelessly inefficient on
numpy's end).
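
The reason `&` can work where `and` can't: `and` only consults `__bool__`
and short-circuits, so there is no hook that could return a combinable
object, whereas `&` goes through `__and__`. A toy sketch of the
query-object pattern (all names invented):

class Column:
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return Expr(f"({self.name} = {other!r})")
    def __gt__(self, other):
        return Expr(f"({self.name} > {other!r})")

class Expr:
    def __init__(self, sql):
        self.sql = sql
    def __and__(self, other):  # reached via &, never via `and`
        return Expr(f"({self.sql} AND {other.sql})")

column1, column2 = Column("column1"), Column("column2")
print(((column1 == 2) & (column2 > 5)).sql)
# (column1 = 2) AND (column2 > 5)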

> This would also mean that such deferred objects could handle the
> late-bound default case, but the function would have to "commit" to
> explicit evaluation of such defaults.  Probably there could be a no-op
> "unwrapping" operation that would work on non-deferred objects (so that
> `unwrap([])` or whatever would just evaluate to the same regular list
> you passed in), so you could still pass in a plain list to an argument
> whose default was `deferred []`, but the function would still have to
> explicitly evaluate it in its body.  Again, I think I'm okay with this,
> partly because (as I mentioned in the other thread) I don't see PEP
> 671-style late-bound defaults as a particularly pressing need.

That seems all very well, but it does incur a fairly huge cost for a
relatively simple benefit. Consider:

def f(x=defer [], n=defer len(x)):
    unwrap(x); unwrap(n)
    print("You gave me", n, "elements to work with")

f(defer (print := lambda *x: None))

Is it correct for every late-bound argument default to also be a code
injection opportunity? And if so, then why should other functions
*not* have such an opportunity afforded to them? I mean, if we're
going to have spooky action at a distance, we may as well commit to
it. Okay, I jest, but still - giving callers the ability to put

[Python-ideas] Re: Generalized deferred computation in Python

2022-06-25 Thread Brendan Barnwell

On 2022-06-21 13:53, David Mertz, Ph.D. wrote:

> Here is a very rough draft of an idea I've floated often, but not with
> much specification.  Take this as "ideas" with little firm commitment to
> details from me. PRs, or issues, or whatever, can go to
> https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as
> mentioning them in this thread.


	After looking at this a bit more (with the newer revisions) and 
following the discussion I think this proposal doesn't really achieve 
what I would want from deferred evaluation.  That may be because what I 
want is unreasonable, but, well, such is life.  :-)


	First, it is not clear to me what the real point of this type of 
deferred evaluation is.  The PEP has a "motivation" section that makes a 
link to Haskell and Dask, but as far as I can see it doesn't explicitly 
say what is gained by introducing this new form of lazy evaluation into 
Python.


	In particular (as I think someone else mentioned on this thread), 
Dask-style deferred computations are based on explicitly evaluating the 
thunk, whereas this proposal would automatically evaluate it on 
reference.  I think that in practice this would make many Dask-style 
usages unwieldy because you would have to keep repeating the `later` 
keyword in order to gradually build up a complex deferred computation 
over multiple statements.  For such cases it is more natural to 
explicitly evaluate the whole thing at the end, rather than explicitly 
not evaluate it until then.
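
	To make that contrast concrete, here is a rough sketch with plain 
callables standing in for thunks (names invented):

def expensive():
    print("computing...")
    return 10

# Explicit laziness: the pipeline is built up across several
# statements, and nothing runs until the single call at the end.
a = lambda: expensive()
b = lambda: a() + 1
c = lambda: b() * 2
print(c())  # prints "computing..." then 22

# With auto-evaluation-on-reference, every intermediate step would
# need its own `later`, because merely naming `a` would evaluate it:
#     a = later expensive()
#     b = later (a + 1)
#     c = later (b * 2)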


	In theory there could be performance gains, as mentioned in the PEP. 
But again I don't see a huge advantage to this in Python.  It might make 
sense in Haskell where laziness is built into the language at a 
fundamental level.  But in Python, where eager evaluation is the norm, 
it again seems more natural to me to use "explicit laziness" (i.e., 
explicit rather than automatic evaluation).  It seems rather unusual to 
have cases where some variable or function argument might contain either 
a computationally cheap expression or an expensive one; usually for 
those types of applications you know where you might do something 
expensive.  And even if you don't, I see little downside to requiring an 
explicit "eval this thunk" step at the end.


	In contrast, what I would want out of deferred evaluation is precisely 
the ability to evaluate the deferred expression in the *evaluating* 
scope (not the definition scope) --- or in a custom provided namespace. 
 Whether this evaluation is implicit or explicit is less important to 
me than the ability to control the scope in which it occurs.  As others 
mentioned in early posts on this thread, this could complicate things 
too much to be feasible, but without it I don't really see the point.


	The reason this is key for me is that I'm focused on a different set of 
motivating use cases.  What I'm interested in is "query-type" situations 
where you want to pass an expression to some sort of "query engine", 
which will evaluate the expression in a namespace representing the 
dataset to be queried.  One example would be SQL queries, where it would 
be nice to be able to do things like:


my_sql_table.select(where=thunk (column1 == 2 and column2 > 5))

Likewise this would make pandas indexing less verbose, turning it from:

df[(df.column1 == 2) & (df.column2 > 5)]

to:

df[(column1 == 2) & (column2 > 5)]

or even potentially:

df[column1 == 2 and column2 > 5]

	. . . because the evaluator would have control over the evaluation and 
could provide a namespace in which `column1` and `column2` do not 
evaluate directly to numpy-like arrays (for which `and` doesn't work), 
but to some kind of combinable query object which converts the `and` 
into something that will work with numpy-like elementwise comparison.


	In other words, the point here is not performance gains or even 
laziness, but simply the ability to use ordinary Python expression 
syntax (not, say, a string) to create an unevaluated chunk which can be 
passed to some other code which then gets to control its evaluation 
scope, rather than having that scope locked to where it was defined. 
Because of this, it is probably okay with me if explicit unwrapping of 
the thunk is required.  You know when you are writing a query handler 
and so you know that what you want is an unevaluated query expression; 
you don't need to have an argument whose value might either be an 
unevaluated expression or a fully-evaluated result.


	This would also mean that such deferred objects could handle the 
late-bound default case, but the function would have to "commit" to 
explicit evaluation of such defaults.  Probably there could be a no-op 
"unwrapping" operation that would work on non-deferred objects (so that 
`unwrap([])` or whatever would just evaluate to the same regular list 
you passed in), so you could still pass in a plain list to an argument 
whose default was `deferred []`, but the function would still have to 

[Python-ideas] Re: dataclass field argument to allow converting value on init

2022-06-25 Thread Dexter Hill
Do you mind providing a little example of what you mean? I'm not sure I 100% 
understand what your use of `__post_init__` is. In my mind, it would be 
something like:
```py
@dataclass
class Foo:
    x: str = field(init=int, converter=chr)

# which converts to
class Foo:
    def __init__(self, x: int):
        self.x = chr(x)
```
without any use of `__post_init__`. If it were to be something like:
```py
class Foo:
    def __init__(self, x: int):
        self.__post_init__(x)

    def __post_init__(self, x: int):
        self.x = chr(x)
```
which, I think is what you are suggesting (please correct me if I'm wrong), 
then I feel that may be confusing if you were to override `__post_init__`, 
which is often much easier than overriding `__init__`.
For example, in a situation like:
```py
@dataclass
class Foo:
    x: str = field(init=int, converter=chr)
    y: InitVar[str]
```
if the user were to override `__post_init__`, would they know that they need to 
include `x` as the first argument? It's not typed with `InitVar` so it might 
not be clear that it's passed to `__post_init__`.


[Python-ideas] Re: dataclass field argument to allow converting value on init

2022-06-25 Thread Steve Jorgensen
Dexter Hill wrote:
> I don't mind that solution although my concern is whether it would be 
> confusing to have `init` have two different purposes depending on the 
> argument. And, if `__post_init__` was overrided, which I would say it 
> commonly is, that would mean the user would have to manually do the 
> conversion, as well as remembering to add an extra argument for the 
> conversion function (assuming I'm understanding what you're saying).
> If no type was provided to `init` but a conversion function was, it would be 
> a case of getting the type from the function signature, right?

The reason I am saying to use the 'init' argument is that it seems to me to be 
a variation on what that argument already does. It controls whether the 
argument is passed to the generated `__init__` method. Passing a type as the 
value for 'init' would now behave like sort of a cross between `init=False` and 
`InitVar`. The field would still be created (unlike `InitVar`) but would not be 
automatically assigned the value passed as its corresponding argument, leaving 
that responsibility to `__post_init__`. Like with `InitVar`, the argument would 
be passed to `__post_init__` since it was not processed by `__init__`.

The type annotation would continue to specify the type of the field, and the 
type passed to the 'init' argument would specify the type of its constructor 
argument.


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-25 Thread Stephen J. Turnbull
Chris Angelico writes:

 > So the only way around it would be to make the defer keyword somehow
 > magical when used in a function signature, which kinda defeats the
 > whole point about being able to reuse another mechanic to achieve
 > this.

The defer keyword is already magical.  Overloading it with more magic
doesn't bother me.  The question of internal consistency of the
various magics does bother me.

 > Also, it would create some other oddity, depending on which way
 > this is handled:
 > 
 > _default = defer []
 > def foo(cookiejar=_default):
 > 
 > Does this also get the magic, or doesn't it? Either way, there'd be a
 > really weird inconsistency here.

Don't know, need to think about the definition and implementation of
the magic first.


