Hi,

For a PEP to succeed it needs to show two things.

1. Exactly what problem is being solved, or need is being fulfilled, and that the problem or need is sufficiently large to merit the proposed change.

2. That the proposed change is the best known solution for the problem being addressed.

IMO, PEP 622 fails on both counts.

This email addresses point 1.

Given the positive response to the PEP, it may well be that it does address a need. However, the PEP itself fails to show that.


Abstract
--------

This PEP proposes adding pattern matching statements [1] to Python in order to 
create more expressive ways of handling structured heterogeneous data. The 
authors take a holistic approach, providing both static and runtime 
specifications.

What does "static and runtime specifications" mean? Surely, there are just specifications. Python does not have a static checking phase, so static analysis tools need to understand the dynamic behaviour of the program, not have their own alternate semantics. There is no "static specification" of `isinstance()`, yet static analysis tools understand it.
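
To illustrate the point: a checker such as mypy narrows types purely from the runtime behaviour of `isinstance()`; no separate static semantics exists for it (a minimal illustration of mine, not taken from the PEP):

```python
from typing import Union

def describe(x: Union[int, str]) -> str:
    if isinstance(x, int):
        # A static checker narrows x to int here, purely from the runtime check.
        return f"int: {x + 1}"
    # ...and narrows x to str here, with no separate "static specification".
    return f"str: {x.upper()}"
```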


PEP 275 and PEP 3103 previously proposed similar constructs, and were rejected. 
Instead of targeting the optimization of if ... elif ... else statements (as 
those PEPs did), this design focuses on generalizing sequence, mapping, and 
object destructuring. It uses syntactic features made possible by PEP 617, 
which introduced a more powerful method of parsing Python source code.

Why couple the choice part (a sort of enhanced elif) with destructuring (a sort of enhanced unpacking)? We could have a "switch" statement that chooses according to value, and we could have "destructuring" that pulls values apart. Why do they need to be coupled?
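
After all, both halves already exist separately in today's Python: choice by value (via if/elif or a dict dispatch) and destructuring (via unpacking). A contrived sketch of my own, just to show they compose without new syntax:

```python
def handle(command):
    # Destructuring part: pull the pieces apart with ordinary unpacking.
    op, *args = command.split()
    # Choice part: select purely by value, here with a dict dispatch.
    handlers = {
        "add": lambda a, b: int(a) + int(b),
        "neg": lambda a: -int(a),
    }
    if op not in handlers:
        raise ValueError(f"unknown command: {op}")
    return handlers[op](*args)
```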

Rationale and Goals
-------------------

Let us start from some anecdotal evidence: isinstance() is one of the most 
called functions in large scale Python code-bases (by static call count). In 
particular, when analyzing some multi-million line production code base, it was 
discovered that isinstance() is the second most called builtin function (after 
len()). Even taking into account builtin classes, it is still in the top ten. 
Most of such calls are followed by specific attribute access.

Why use anecdotal evidence? I don't doubt the numbers, but it would be better to use the standard library, or the top N most popular packages from GitHub.


There are two possible conclusions that can be drawn from this information:

    Handling of heterogeneous data (i.e. situations where a variable can take 
values of multiple types) is common in real world code.
    Python doesn't have expressive ways of destructuring object data (i.e. 
separating the content of an object into multiple variables).

I don't see how the second conclusion can be drawn.
How does the prevalence of `isinstance()` suggest that Python doesn't have expressive ways of destructuring object data?

That `len()` is also common does suggest that some more expressive unpacking syntax might be useful. However, since `len()` only applies to sequences, it suggests to me that unpacking of non-sequences isn't generally useful.


This is in contrast with the opposite sides of both aspects:

This sentence makes no sense. What is "this"? Both aspects of what?


    Its success in the numeric world indicates that Python is good when working 
with homogeneous data. It also has builtin support for homogeneous data 
structures such as e.g. lists and arrays, and semantic constructs such as 
iterators and generators.
    Python is expressive and flexible at constructing objects. It has syntactic 
support for collection literals and comprehensions. Custom objects can be 
created using positional and keyword calls that are customized by special 
__init__() method.

This PEP aims at improving the support for destructuring heterogeneous data by 
adding a dedicated syntactic support for it in the form of pattern matching. On 
a very high level it is similar to regular expressions, but instead of matching 
strings, it will be possible to match arbitrary Python objects.

An explanation is needed of why "destructuring" needs to be so tightly coupled with matching by class or value.


We believe this will improve both readability and reliability of relevant code. 
To illustrate the readability improvement, let us consider an actual example 
from the Python standard library:

def is_tuple(node):
    if isinstance(node, Node) and node.children == [LParen(), RParen()]:
        return True
    return (isinstance(node, Node)
            and len(node.children) == 3
            and isinstance(node.children[0], Leaf)
            and isinstance(node.children[1], Node)
            and isinstance(node.children[2], Leaf)
            and node.children[0].value == "("
            and node.children[2].value == ")")


Just one example?
The PEP needs to show that this sort of pattern is widespread.

With the syntax proposed in this PEP it can be rewritten as below. Note that 
the proposed code will work without any modifications to the definition of Node 
and other classes here:

Without modifying Node or Leaf, the matching code will need to access attributes. You should at least mention side effects and exceptions.
E.g. matching on ORM objects might be problematic.


def is_tuple(node: Node) -> bool:
    match node:
        case Node(children=[LParen(), RParen()]):
            return True
        case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
            return True
        case _:
            return False


Python's support for OOP provides an alternative to ADTs.
For example, by adding a simple "matches" method to Node and Leaf, `is_tuple` can be rewritten as something like:

def is_tuple(node):
    if not isinstance(node, Node):
        return False
    return node.matches("(", ")") or node.matches("(", ..., ")")
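
To make that concrete, here is a minimal sketch of such a `matches` method. The `Node`/`Leaf` classes below are stand-ins for the lib2to3 originals, and the convention that `...` matches any single `Node` is my invention, not anything in the stdlib or the PEP:

```python
class Leaf:
    """Stand-in for lib2to3's Leaf: a terminal node carrying a value."""
    def __init__(self, value):
        self.value = value

class Node:
    """Stand-in for lib2to3's Node: an interior node with children."""
    def __init__(self, children):
        self.children = children

    def matches(self, *values):
        """True if the children's values equal *values in order,
        where Ellipsis matches any single Node."""
        if len(self.children) != len(values):
            return False
        for child, expected in zip(self.children, values):
            if expected is ...:
                if not isinstance(child, Node):
                    return False
            elif not (isinstance(child, Leaf) and child.value == expected):
                return False
        return True

def is_tuple(node):
    if not isinstance(node, Node):
        return False
    return node.matches("(", ")") or node.matches("(", ..., ")")
```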

See the syntax sections below for a more detailed specification.

Similarly to how constructing objects can be customized by a user-defined 
__init__() method, we propose that destructuring objects can be customized by a 
new special __match__() method. As part of this PEP we specify the general 
__match__() API, its implementation for object.__match__(), and for some 
standard library classes (including PEP 557 dataclasses). See runtime section 
below.

You should mention that we already have the ability to "destructure", aka unpack, objects using __iter__.

t = 1, 2 # Creation
a, b = t # "Destructuring"
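
Any class can opt into this today by defining `__iter__` (the `Point` class here is a made-up example of mine):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iter__(self):
        # Yielding the fields is all it takes to make `x, y = p` work.
        yield self.x
        yield self.y

p = Point(3, 4)
x, y = p  # "Destructuring" a custom object via __iter__
```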


Finally, we aim to provide a comprehensive support for static type checkers and 
similar tools. For this purpose we propose to introduce a @typing.sealed class 
decorator that will be a no-op at runtime, but will indicate to static tools 
that all subclasses of this class must be defined in the same module. This will 
allow effective static exhaustiveness checks, and together with dataclasses, 
will provide a nice support for algebraic data types [2]. See the static 
checkers section for more details.

Shouldn't this be in a separate PEP? It seems only loosely related, and would have some value regardless of whether the rest of the PEP is accepted.


In general, we believe that pattern matching has been proved to be a useful and 
expressive tool in various modern languages. In particular, many aspects of 
this PEP were inspired by how pattern matching works in Rust [3] and Scala [4].

Both those languages are statically typed, which allows the compiler to perform much of the pattern matching at compile time.

You should give examples from dynamically typed languages instead, e.g. Clojure.



Cheers,
Mark.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IHO56EK7R77TGVNSDJXUFQARHZGLVYNE/
Code of Conduct: http://python.org/psf/codeofconduct/