Hi Martin, Short answer: yes, I agree. Slightly longer: I would be eternally grateful if you wish to contribute to the PEP with any such expansion of the Motivation and Expansion.
On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola <martinp.dipa...@gmail.com> wrote: > Hi David, I read the PEP and I think it would be useful to expand the > Motivation and Examples sections. > > While indeed Dask uses lazy evaluation to build a complex computation > without executing it, I don't think that it is the whole story. > > Dask takes this deferred complex computation and *plans* how to execute it > and then it *executes* it in non-obvious/direct ways. > > For example, the computation of the min() of a dataframe can be done > computing the min() of each partition of the dataframe and then > computing the min() of them. Here is where the plan and the execution > stages play. > > All of this is hidden from the developer. From his/her perspective the > min() is called once over the whole dataframe. > > Dask's deferred computations are "useless" without the > planning/execution plan. > > PySpark, like Dask, does exactly the same. > > But what about Django's ORM? Indeed Django allows you the build a SQL > query without executing it. You can then perform more subqueries, > joins and group by without executing them. > > Only when you need the real data the query is executed. > > This is another example of deferred execution similar to Dask/PySpark > however when we consider the planning/execution stages the similarities > ends there. > > Django's ORM writes a SQL query and send it to a SQL database. > > Another example of deferred execution would be my library to interact > with web pages programmatically: selectq. > > Very much like an ORM, you can select elements from a web page, perform > subselections and unions without really interacting with the web page. > > Only when you want to get the data from the page is when the deferred > computations are executed and like an ORM, the plan done by selectq is > to build a single xpath and then execute it using Selenium. > > So... > > Three cases: Dask/PySpark, Django's ORM and selectq. All of them > implement deferred expressions but all of them "compute" them in very > specific ways (aka, they plan and execute the computation differently). > > Would those libs (and probably others) do benefit from the PEP? How? > > Thanks, > Martin. > > On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote: > >Here is a very rough draft of an idea I've floated often, but not with > much > >specification. Take this as "ideas" with little firm commitment to > details > >from me. PRs, or issues, or whatever, can go to > >https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as > >mentioning them in this thread. > > > >PEP: 9999 > >Title: Generalized deferred computation > >Author: David Mertz <dme...@gnosis.cx> > >Discussions-To: > >https://mail.python.org/archives/list/python-ideas@python.org/thread/ > >Status: Draft > >Type: Standards Track > >Content-Type: text/x-rst > >Created: 21-Jun-2022 > >Python-Version: 3.12 > >Post-History: > > > >Abstract > >======== > > > >This PEP proposes introducing the soft keyword ``later`` to express the > >concept > >of deferred computation. When an expression is preceded by the keyword, > the > >expression is not evaluated but rather creates a "thunk" or "deferred > >object." > >Reference to the deferred object later in program flow causes the > >expression to > >be executed at that point, and for both the value and type of the object > to > >become the result of the evaluated expression. > > > > > >Motivation > >========== > > > >"Lazy" or "deferred" evaluation is useful paradigm for expressing > >relationships > >among potentially expensive operations prior their actual computation. > Many > >functional programming languages, such as Haskell, build laziness into the > >heart of their language. Within the Python ecosystem, the popular > >scientific > >library `dask-delayed <dask-delayed>`_ provides a framework for lazy > >evaluation > >that is very similar to that proposed in this PEP. > > > >.. _dask-delayed: > > https://docs.dask.org/en/stable/delayed.html > > > > > >Examples of Use > >=============== > > > >While the use of deferred computation is principally useful when > >computations > >are likely to be expensive, the simple examples shown do not necessarily > use > >such expecially spendy computations. Most of these are directly inspired > by > >examples used in the documentation of dask-delayed. > > > >In dask-delayed, ``Delayed`` objects are create by functions, and > operations > >create a *directed acyclic graph* rather than perform actual computations. > >For > >example:: > > > > >>> import dask > > >>> @dask.delayed > > ... def later(x): > > ... return x > > ... > > >>> output = [] > > >>> data = [23, 45, 62] > > >>> for x in data: > > ... x = later(x) > > ... a = x * 3 > > ... b = 2**x > > ... c = a + b > > ... output.append(c) > > ... > > >>> total = sum(output) > > >>> total > > Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0') > > >>> total.compute() > > 4611721202807865734 > > >>> total.visualize() > > > >.. figure:: pep-9999-dag.png > > :align: center > > :width: 50% > > :class: invert-in-dark-mode > > > > Figure 1. Dask DAG created from simple operations. > > > >Under this PEP, the soft keyword ``later`` would work in a similar manner > to > >this dask.delayed code. But rather than requiring calling ``.compute()`` > >on a > >``Delayed`` object to arrive at the result of a computation, every > >reference to > >a binding would perform the "compute" *unless* it was itself a deferred > >expression. So the equivalent code under this PEP would be:: > > > > >>> output = [] > > >>> data = [23, 45, 62] > > >>> for later x in data: > > ... a = later (x * 3) > > ... b = later (2**x) > > ... c = later (a + b) > > ... output.append(later c) > > ... > > >>> total = later sum(output) > > >>> type(total) # type() does not un-thunk > > <class 'DeferredObject'> > > >>> if value_needed: > > ... print(total) # Actual computation occurs here > > 4611721202807865734 > > > >In the example, we assume that the built-in function `type()` is special > in > >not > >counting as a reference to the binding for purpose of realizing a > >computation. > >Alternately, some new special function like `isdeferred()` might be used > to > >check for ``Deferred`` objects. > > > >In general, however, every regular reference to a bound object will force > a > >computation and re-binding on a ``Deferred``. This includes access to > >simple > >names, but also similarly to instance attributes, index positions in lists > >or > >tuples, or any other means by which an object may be referenced. > > > > > >Rejected Spellings > >================== > > > >A number of alternate spellings for creating a ``Deferred`` object are > >possible. This PEP-author has little preference among them. The words > >``defer`` or ``delay``, or their past participles ``deferred`` and > >``delayed`` > >are commonly used in discussions of lazy evaluation. All of these would > >work > >equally well as the suggested soft keyword ``later``. The keyword > ``lazy`` > >is > >not completely implausible, but does not seem to read as well. > > > >No punctuation is immediately obvious for this purpose, although > surrounding > >expressions with backticks is somewhat suggestive of quoting in Lisp, and > >perhaps slightly reminiscent of the ancient use of backtick for shell > >commands > >in Python 1.x. E.g.:: > > > > might_use = `math.gcd(a, math.factorial(b))` > > > > > >Relationship to PEP-0671 > >======================== > > > >The concept of "late-bound function argument defaults" is introduced in > >:pep:`671`. Under that proposal, a special syntactic marker would be > >permitted > >in function signatures with default arguments to allow the expressions > >indicated as defaults to be evaluated at call time rather than at runtime. > >In > >current Python, we might write a toy function such as:: > > > > def func(items=[], n=None): > > if n is None: > > n = len(items) > > items.append("Hello") > > print(n) > > > > func([1, 2, 3]) # prints: 3 > > > >Using the :pep:`671` approach this could be simplified somewhat as:: > > > > def func(items=[], n=>len(items)): > > # late-bound defaults act as if bound here > > items.append("Hello") > > print(n) > > > > func([1, 2, 3]) # prints: 3 > > > >Under the current PEP, evaluation of a ``Deferred`` object only occurs > upon > >reference. That is, for the current toy function, the evaluation would > not > >occur until the ``print(n)`` line.:: > > > > def func(items=[], n=later len(items)): > > items.append("Hello") > > print(n) > > > > func([1, 2, 3]) # prints: 4 > > > >To completely replicate the behavior of PEP-0671, an extra line at the > >start of > >the function body would be required:: > > > > def func(items=[], n=later len(items)): > > n = n # Evaluate the Deferred and re-bind the name n > > items.append("Hello") > > print(n) > > > > func([1, 2, 3]) # prints: 3 > > > > > >References > >========== > > > >https://github.com/DavidMertz/peps/blob/master/pep-9999.rst > > > >Copyright > >========= > > > >This document is placed in the public domain or under the > >CC0-1.0-Universal license, whichever is more permissive. > > > > > > > >-- > >Keeping medicines from the bloodstreams of the sick; food > >from the bellies of the hungry; books from the hands of the > >uneducated; technology from the underdeveloped; and putting > >advocates of freedom in prisons. Intellectual property is > >to the 21st century what the slave trade was to the 16th. > > > >_______________________________________________ > >Python-ideas mailing list -- python-ideas@python.org > >To unsubscribe send an email to python-ideas-le...@python.org > >https://mail.python.org/mailman3/lists/python-ideas.python.org/ > >Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/DQTR3CYWMLSRRKR6MBLZNTGCG762QNDF/ > >Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-le...@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/SK267MWXWE3CBMUY46NVD5OBZNT7KSY4/ > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/PCKAUMW4DOA3ZN6AQ6GXGWYVVIGVJ5XI/ Code of Conduct: http://python.org/psf/codeofconduct/