Martin Di Paola wrote:
> Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> implement deferred expressions but all of them "compute" them in very
> specific ways (aka, they plan and execute the computation differently).


So, I've been hit by the "transparent execution of deferred code"
dilemma before.

What happens is this: at some point Python will have to "use" an
object, and that use goes through a call to one of the dunder methods.
Until then, merely writing the object's name on a line of its own does
nothing (unless the line is entered in a REPL, which will then call the
object's __repr__ method).

Long ago I wrote a toy project that implements "all possible" dunder
methods and proxies them to the underlying object, for a "future type"
that was calculated off-process and did not need any ".value()" or
".result()" method to be called.

Any such object, with slots for all the dunder methods, any of which
triggers the resolve when called, could work today, without any
modification, to implement the proposed behavior.
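
To make that concrete, here is a minimal sketch of such a proxy. It is
not the lelo code: the names Deferred, _thunk and _resolve are made up
for illustration, and only a handful of dunder methods are forwarded.

```python
_UNSET = object()  # sentinel: the thunk has not run yet


class Deferred:
    """Illustrative proxy (hypothetical name): runs `thunk` on first use."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = _UNSET

    def _resolve(self):
        # Run the thunk at most once and cache its result.
        if self._value is _UNSET:
            self._value = self._thunk()
        return self._value

    # Special methods are looked up on the type, so __getattr__ alone
    # is not enough: each dunder has to be defined explicitly.
    def __repr__(self):
        return repr(self._resolve())

    def __bool__(self):
        return bool(self._resolve())

    def __eq__(self, other):
        return self._resolve() == other

    def __hash__(self):
        return hash(self._resolve())

    def __add__(self, other):
        return self._resolve() + other

    def __radd__(self, other):
        return other + self._resolve()

    def __mul__(self, other):
        return self._resolve() * other

    def __rmul__(self, other):
        return other * self._resolve()

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for attributes
        # that belong to the underlying value.
        if name in ("_thunk", "_value"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)


calls = []


def expensive():
    calls.append("ran")
    return 21


x = Deferred(expensive)
x                      # a bare name: no dunder is called, nothing runs
pending = list(calls)  # snapshot taken before any "use"
doubled = x * 2        # __mul__ forces the computation
```

In this sketch "pending" is still empty, and the thunk runs exactly once
no matter how many operations follow.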

And all that is needed to make it possible to manipulate the object
before the evaluation takes place is a single reserved name, within the
object namespace, that is not a proxy to evaluation. It could be
special-cased within the object's class __getattribute__ itself, so not
even a new reserved dunder slot would be needed. That is:
"getattr(myobj, "_is_deferred", False)" would not trigger the evaluation.
(Although a special slot for it in object would allow plain checking using
"myobj.__is_deferred__" without the need to use getattr or hasattr.)

So, all that would be needed for such a feature would be
keyword support to build this special proxy type.
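
The reserved-name check could look roughly like the sketch below. The
class name DeferredProxy and its private attributes are hypothetical,
and having the flag turn False once the value is resolved is my own
assumption, not something the proposal specifies.

```python
class DeferredProxy:
    """Illustrative only: every attribute access but one forces evaluation."""

    def __init__(self, thunk):
        # Plain assignment uses the default __setattr__, which is not
        # overridden, so nothing is evaluated here.
        self._thunk = thunk
        self._done = False
        self._value = None

    def __getattribute__(self, name):
        raw = object.__getattribute__  # bypasses this override
        if name == "_is_deferred":
            # Reserved name: answered without running the thunk.
            return not raw(self, "_done")
        if not raw(self, "_done"):
            self._value = raw(self, "_thunk")()
            self._done = True
        # Any other name is looked up on the resolved value.
        return getattr(raw(self, "_value"), name)


log = []


def compute():
    log.append("ran")
    return "hello"


p = DeferredProxy(compute)
before = getattr(p, "_is_deferred", False)        # inspect without resolving
ran_before = list(log)                            # thunk has not run yet
plain = getattr("plain str", "_is_deferred", False)  # ordinary object
shouted = p.upper()                               # forces evaluation
after = getattr(p, "_is_deferred", False)
```

The same getattr call works on ordinary objects too, because of the
default value, so calling code does not need to know in advance what it
was handed.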

That said, the usefulness (or not) of this proposal can be weighed
better knowing that this "special attribute" mechanism can also be used
to add further inspection/modification mechanisms to the delayed
objects.



The act of "filling in all possible dunder methods" itself is
quite hacky, but even if done in C, I don't think it could be avoided.
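
One way to make the filling-in less manual, at least in pure Python, is
to generate the methods in a loop. The sketch below does that for the
binary operators only, using the operator module; LazyValue and its
helpers are hypothetical names, and it is still the same hack, just
written once.

```python
import operator

# Dunder stem -> operator-module function ("and"/"or" have a trailing
# underscore in the operator module).
BINARY_OPS = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "truediv": operator.truediv, "floordiv": operator.floordiv,
    "mod": operator.mod, "pow": operator.pow,
    "and": operator.and_, "or": operator.or_, "xor": operator.xor,
    "lshift": operator.lshift, "rshift": operator.rshift,
}


class LazyValue:
    """Illustrative proxy: resolves its thunk on the first operation."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._resolved = False
        self._value = None

    def _resolve(self):
        if not self._resolved:
            self._value = self._thunk()
            self._resolved = True
        return self._value


def _make_pair(op):
    # Build the forward (__add__) and reflected (__radd__) variants.
    def forward(self, other):
        return op(self._resolve(), other)

    def reverse(self, other):
        return op(other, self._resolve())

    return forward, reverse


for _name, _op in BINARY_OPS.items():
    _fwd, _rev = _make_pair(_op)
    setattr(LazyValue, f"__{_name}__", _fwd)
    setattr(LazyValue, f"__r{_name}__", _rev)

seven = LazyValue(lambda: 7)
difference = 10 - seven   # handled by the generated __rsub__
```

Unary operators, comparisons, conversions and the container protocol
would need the same treatment, which is where the "all possible" part
gets long.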

Here is the code I referred to, which implements the same proxy
type that would be needed for this feature
(IIRC it is even pip installable):

https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py


On Wed, Jun 22, 2022 at 11:46 AM Martin Di Paola <martinp.dipa...@gmail.com>
wrote:

> Hi David, I read the PEP and I think it would be useful to expand the
> Motivation and Examples sections.
>
> While indeed Dask uses lazy evaluation to build a complex computation
> without executing it, I don't think that it is the whole story.
>
> Dask takes this deferred complex computation and *plans* how to execute it
> and then it *executes* it in non-obvious/direct ways.
>
> For example, the computation of the min() of a dataframe can be done
> computing the min() of each partition of the dataframe and then
> computing the min() of them. Here is where the plan and the execution
> stages play.
>
> All of this is hidden from the developer. From his/her perspective the
> min() is called once over the whole dataframe.
>
> Dask's deferred computations are "useless" without the
> planning/execution plan.
>
> PySpark, like Dask, does exactly the same.
>
> But what about Django's ORM? Indeed Django allows you to build a SQL
> query without executing it. You can then perform more subqueries,
> joins and group by without executing them.
>
> Only when you need the real data the query is executed.
>
> This is another example of deferred execution similar to Dask/PySpark;
> however, when we consider the planning/execution stages the similarities
> end there.
>
> Django's ORM writes a SQL query and sends it to a SQL database.
>
> Another example of deferred execution would be my library to interact
> with web pages programmatically: selectq.
>
> Very much like an ORM, you can select elements from a web page, perform
> subselections and unions without really interacting with the web page.
>
> Only when you want to get the data from the page are the deferred
> computations executed and, like an ORM, the plan done by selectq is
> to build a single xpath and then execute it using Selenium.
>
> So...
>
> Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> implement deferred expressions but all of them "compute" them in very
> specific ways (aka, they plan and execute the computation differently).
>
> Would those libs (and probably others) benefit from the PEP? How?
>
> Thanks,
> Martin.
>
> On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:
> >Here is a very rough draft of an idea I've floated often, but not with much
> >specification.  Take this as "ideas" with little firm commitment to details
> >from me. PRs, or issues, or whatever, can go to
> >https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as
> >mentioning them in this thread.
> >
> >PEP: 9999
> >Title: Generalized deferred computation
> >Author: David Mertz <dme...@gnosis.cx>
> >Discussions-To:
> >https://mail.python.org/archives/list/python-ideas@python.org/thread/
> >Status: Draft
> >Type: Standards Track
> >Content-Type: text/x-rst
> >Created: 21-Jun-2022
> >Python-Version: 3.12
> >Post-History:
> >
> >Abstract
> >========
> >
> >This PEP proposes introducing the soft keyword ``later`` to express the concept
> >of deferred computation.  When an expression is preceded by the keyword, the
> >expression is not evaluated but rather creates a "thunk" or "deferred object."
> >Reference to the deferred object later in program flow causes the expression to
> >be executed at that point, and for both the value and type of the object to
> >become the result of the evaluated expression.
> >
> >
> >Motivation
> >==========
> >
> >"Lazy" or "deferred" evaluation is a useful paradigm for expressing relationships
> >among potentially expensive operations prior to their actual computation.  Many
> >functional programming languages, such as Haskell, build laziness into the
> >heart of their language.  Within the Python ecosystem, the popular scientific
> >library `dask-delayed <dask-delayed>`_ provides a framework for lazy evaluation
> >that is very similar to that proposed in this PEP.
> >
> >.. _dask-delayed:
> >   https://docs.dask.org/en/stable/delayed.html
> >
> >
> >Examples of Use
> >===============
> >
> >While the use of deferred computation is principally useful when computations
> >are likely to be expensive, the simple examples shown do not necessarily use
> >such especially spendy computations.  Most of these are directly inspired by
> >examples used in the documentation of dask-delayed.
> >
> >In dask-delayed, ``Delayed`` objects are created by functions, and operations
> >create a *directed acyclic graph* rather than perform actual computations.  For
> >example::
> >
> >    >>> import dask
> >    >>> @dask.delayed
> >    ... def later(x):
> >    ...     return x
> >    ...
> >    >>> output = []
> >    >>> data = [23, 45, 62]
> >    >>> for x in data:
> >    ...     x = later(x)
> >    ...     a = x * 3
> >    ...     b = 2**x
> >    ...     c = a + b
> >    ...     output.append(c)
> >    ...
> >    >>> total = sum(output)
> >    >>> total
> >    Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
> >    >>> total.compute()
> >    4611721202807865734
> >    >>> total.visualize()
> >
> >.. figure:: pep-9999-dag.png
> >   :align: center
> >   :width: 50%
> >   :class: invert-in-dark-mode
> >
> >   Figure 1.  Dask DAG created from simple operations.
> >
> >Under this PEP, the soft keyword ``later`` would work in a similar manner to
> >this dask.delayed code.  But rather than requiring calling ``.compute()`` on a
> >``Delayed`` object to arrive at the result of a computation, every reference to
> >a binding would perform the "compute" *unless* it was itself a deferred
> >expression.  So the equivalent code under this PEP would be::
> >
> >    >>> output = []
> >    >>> data = [23, 45, 62]
> >    >>> for later x in data:
> >    ...     a = later (x * 3)
> >    ...     b = later (2**x)
> >    ...     c = later (a + b)
> >    ...     output.append(later c)
> >    ...
> >    >>> total = later sum(output)
> >    >>> type(total)  # type() does not un-thunk
> >    <class 'DeferredObject'>
> >    >>> if value_needed:
> >    ...     print(total)  # Actual computation occurs here
> >    4611721202807865734
> >
> >In the example, we assume that the built-in function `type()` is special in not
> >counting as a reference to the binding for purpose of realizing a computation.
> >Alternately, some new special function like `isdeferred()` might be used to
> >check for ``Deferred`` objects.
> >
> >In general, however, every regular reference to a bound object will force a
> >computation and re-binding on a ``Deferred``.  This includes access to simple
> >names, but also similarly to instance attributes, index positions in lists or
> >tuples, or any other means by which an object may be referenced.
> >
> >
> >Rejected Spellings
> >==================
> >
> >A number of alternate spellings for creating a ``Deferred`` object are
> >possible.  This PEP-author has little preference among them.  The words
> >``defer`` or ``delay``, or their past participles ``deferred`` and ``delayed``
> >are commonly used in discussions of lazy evaluation.  All of these would work
> >equally well as the suggested soft keyword ``later``.  The keyword ``lazy`` is
> >not completely implausible, but does not seem to read as well.
> >
> >No punctuation is immediately obvious for this purpose, although surrounding
> >expressions with backticks is somewhat suggestive of quoting in Lisp, and
> >perhaps slightly reminiscent of the ancient use of backtick for shell commands
> >in Python 1.x.  E.g.::
> >
> >    might_use = `math.gcd(a, math.factorial(b))`
> >
> >
> >Relationship to PEP-0671
> >========================
> >
> >The concept of "late-bound function argument defaults" is introduced in
> >:pep:`671`.  Under that proposal, a special syntactic marker would be permitted
> >in function signatures with default arguments to allow the expressions
> >indicated as defaults to be evaluated at call time rather than at definition
> >time.  In current Python, we might write a toy function such as::
> >
> >    def func(items=[], n=None):
> >        if n is None:
> >            n = len(items)
> >        items.append("Hello")
> >        print(n)
> >
> >    func([1, 2, 3])  # prints: 3
> >
> >Using the :pep:`671` approach this could be simplified somewhat as::
> >
> >    def func(items=[], n=>len(items)):
> >        # late-bound defaults act as if bound here
> >        items.append("Hello")
> >        print(n)
> >
> >    func([1, 2, 3])  # prints: 3
> >
> >Under the current PEP, evaluation of a ``Deferred`` object only occurs upon
> >reference.  That is, for the current toy function, the evaluation would not
> >occur until the ``print(n)`` line::
> >
> >    def func(items=[], n=later len(items)):
> >        items.append("Hello")
> >        print(n)
> >
> >    func([1, 2, 3])  # prints: 4
> >
> >To completely replicate the behavior of PEP-0671, an extra line at the start of
> >the function body would be required::
> >
> >    def func(items=[], n=later len(items)):
> >        n = n  # Evaluate the Deferred and re-bind the name n
> >        items.append("Hello")
> >        print(n)
> >
> >    func([1, 2, 3])  # prints: 3
> >
> >
> >References
> >==========
> >
> >https://github.com/DavidMertz/peps/blob/master/pep-9999.rst
> >
> >Copyright
> >=========
> >
> >This document is placed in the public domain or under the
> >CC0-1.0-Universal license, whichever is more permissive.
> >
> >
> >
> >--
> >Keeping medicines from the bloodstreams of the sick; food
> >from the bellies of the hungry; books from the hands of the
> >uneducated; technology from the underdeveloped; and putting
> >advocates of freedom in prisons.  Intellectual property is
> >to the 21st century what the slave trade was to the 16th.
>
>
> >_______________________________________________
> >Python-ideas mailing list -- python-ideas@python.org
> >To unsubscribe send an email to python-ideas-le...@python.org
> >https://mail.python.org/mailman3/lists/python-ideas.python.org/
> >Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/DQTR3CYWMLSRRKR6MBLZNTGCG762QNDF/
> >Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WQXJERJOQXWYAATKDRTSQJDAPM7W2N3U/
Code of Conduct: http://python.org/psf/codeofconduct/
