Hi,
For a PEP to succeed it needs to show two things.
1. Exactly what problem is being solved, or need fulfilled, and that it
is a sufficiently large problem, or need, to merit the proposed change.
2. That the proposed change is the best known solution for the problem
being addressed.
IMO, PEP 622 fails on both counts.
This email addresses point 1.
Given the positive response to the PEP, it may well be that it does
address a need. However, the PEP itself fails to show that.
Abstract
--------
This PEP proposes adding pattern matching statements [1] to Python in order to
create more expressive ways of handling structured heterogeneous data. The
authors take a holistic approach, providing both static and runtime
specifications.
What does "static and runtime specifications" mean? Surely, there are
just specifications.
Python does not have a static checking phase, so static analysis tools
need to understand the dynamic behaviour of the program, not have their
own alternate semantics. There is no "static specification" of
`isinstance()`, yet static analysis tools understand it.
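To make that point concrete, here is a small illustrative sketch (the
function and its names are mine, not from the PEP): a type checker
narrows on `isinstance()` using the same semantics the interpreter runs,
with no separate static-only specification involved.

```python
from typing import Union

def describe(x: Union[int, str]) -> str:
    # A static checker narrows x to int inside this branch by modelling
    # the runtime behaviour of isinstance() -- there is no alternate
    # "static specification" of the builtin.
    if isinstance(x, int):
        return str(x + 1)
    return x.upper()
```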
PEP 275 and PEP 3103 previously proposed similar constructs, and were rejected.
Instead of targeting the optimization of if ... elif ... else statements (as
those PEPs did), this design focuses on generalizing sequence, mapping, and
object destructuring. It uses syntactic features made possible by PEP 617,
which introduced a more powerful method of parsing Python source code.
Why couple the choice part (a sort of enhanced elif) with destructuring
(a sort of enhanced unpacking)?
We could have a "switch" statement that chooses according to value, and
we could have "destructuring" that pulls values apart. Why do they need
to be coupled?
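To illustrate the distinction, the two halves can already be
approximated independently with existing Python. This is a hypothetical
toy (the "verb args..." command format and `dispatch` are my invention),
not a proposal:

```python
def dispatch(command):
    # "Destructuring": plain star unpacking pulls the string apart.
    verb, *args = command.split()
    # "Choice by value": a dict dispatch, no destructuring involved.
    handlers = {
        "quit": lambda args: "bye",
        "go": lambda args: "going " + args[0],
    }
    return handlers[verb](args)
```

Neither half needs the other; the PEP should justify fusing them into
one statement.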
Rationale and Goals
-------------------
Let us start from some anecdotal evidence: isinstance() is one of the most
called functions in large scale Python code-bases (by static call count). In
particular, when analyzing some multi-million line production code base, it was
discovered that isinstance() is the second most called builtin function (after
len()). Even taking into account builtin classes, it is still in the top ten.
Most of such calls are followed by specific attribute access.
Why use anecdotal evidence? I don't doubt the numbers, but it would be
better to use the standard library, or the top N most popular packages
from GitHub.
There are two possible conclusions that can be drawn from this information:
Handling of heterogeneous data (i.e. situations where a variable can take
values of multiple types) is common in real world code.
Python doesn't have expressive ways of destructuring object data (i.e.
separating the content of an object into multiple variables).
I don't see how the second conclusion can be drawn.
How does the prevalence of `isinstance()` suggest that Python doesn't
have expressive ways of destructuring object data?
That `len()` is also common does suggest that some more expressive
unpacking syntax might be useful. However, since `len()` only applies to
sequences, it suggests to me that unpacking of non-sequences isn't
generally useful.
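For context, the destructuring Python already offers for sequences is
fairly expressive (the variable names below are just for illustration):

```python
# Star unpacking: split a sequence into head items and the rest.
record = ["alice", 30, "admin", "ops"]
name, age, *roles = record

# Nested unpacking mirrors the structure of the data.
(a, b), (c, d) = (1, 2), (3, 4)
```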
This is in contrast with the opposite sides of both aspects:
This sentence makes no sense. What is "this"? Both aspects of what?
Its success in the numeric world indicates that Python is good when working
with homogeneous data. It also has builtin support for homogeneous data
structures such as e.g. lists and arrays, and semantic constructs such as
iterators and generators.
Python is expressive and flexible at constructing objects. It has syntactic
support for collection literals and comprehensions. Custom objects can be
created using positional and keyword calls that are customized by special
__init__() method.
This PEP aims at improving the support for destructuring heterogeneous data by
adding a dedicated syntactic support for it in the form of pattern matching. On
a very high level it is similar to regular expressions, but instead of matching
strings, it will be possible to match arbitrary Python objects.
An explanation is needed of why "destructuring" needs to be so tightly
coupled with matching by class or value.
We believe this will improve both readability and reliability of relevant code.
To illustrate the readability improvement, let us consider an actual example
from the Python standard library:
def is_tuple(node):
    if isinstance(node, Node) and node.children == [LParen(), RParen()]:
        return True
    return (isinstance(node, Node)
            and len(node.children) == 3
            and isinstance(node.children[0], Leaf)
            and isinstance(node.children[1], Node)
            and isinstance(node.children[2], Leaf)
            and node.children[0].value == "("
            and node.children[2].value == ")")
Just one example?
The PEP needs to show that this sort of pattern is widespread.
With the syntax proposed in this PEP it can be rewritten as below. Note that
the proposed code will work without any modifications to the definition of Node
and other classes here:
Without modifying Node or Leaf, the matching code will need to access
attributes. You should at least mention side effects and exceptions.
E.g. matching on ORM objects might be problematic.
def is_tuple(node: Node) -> bool:
    match node:
        case Node(children=[LParen(), RParen()]):
            return True
        case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
            return True
        case _:
            return False
Python's support for OOP provides an alternative to ADTs.
For example, by adding a simple "matches" method to Node and Leaf,
`is_tuple` can be rewritten as something like:
def is_tuple(node):
    if not isinstance(node, Node):
        return False
    return node.matches("(", ")") or node.matches("(", ..., ")")
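One hypothetical sketch of such a `matches` method, with minimal
stand-ins for Node and Leaf (the real standard-library classes differ);
here `...` matches any single child:

```python
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, children):
        self.children = children

    def matches(self, *pattern):
        # Each pattern item is either a literal value compared against a
        # Leaf's value, or Ellipsis, which matches any single child.
        if len(self.children) != len(pattern):
            return False
        for child, expected in zip(self.children, pattern):
            if expected is ...:
                continue
            if not (isinstance(child, Leaf) and child.value == expected):
                return False
        return True

def is_tuple(node):
    if not isinstance(node, Node):
        return False
    return node.matches("(", ")") or node.matches("(", ..., ")")
```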
See the syntax sections below for a more detailed specification.
Similarly to how constructing objects can be customized by a user-defined
__init__() method, we propose that destructuring objects can be customized by a
new special __match__() method. As part of this PEP we specify the general
__match__() API, its implementation for object.__match__(), and for some
standard library classes (including PEP 557 dataclasses). See runtime section
below.
You should mention that we already have the ability to "destructure",
aka unpack, objects using `__iter__`.
t = 1, 2 # Creation
a, b = t # "Destructuring"
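The same works for arbitrary user classes today, no new protocol
required (the `Point` class is just an illustration):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iter__(self):
        # Defining __iter__ is all it takes to make instances unpackable.
        yield self.x
        yield self.y

x, y = Point(3, 4)   # "destructuring" an object, with existing syntax
```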
Finally, we aim to provide a comprehensive support for static type checkers and
similar tools. For this purpose we propose to introduce a @typing.sealed class
decorator that will be a no-op at runtime, but will indicate to static tools
that all subclasses of this class must be defined in the same module. This will
allow effective static exhaustiveness checks, and together with dataclasses,
will provide a nice support for algebraic data types [2]. See the static
checkers section for more details.
Shouldn't this be in a separate PEP? It seems only loosely related, and
would have some value regardless of whether the rest of the PEP is accepted.
In general, we believe that pattern matching has been proved to be a useful and
expressive tool in various modern languages. In particular, many aspects of
this PEP were inspired by how pattern matching works in Rust [3] and Scala [4].
Both those languages are statically typed, which allows the compiler to
perform much of the pattern matching analysis at compile time.
You should give examples from dynamically typed languages instead, e.g. Clojure.
Cheers,
Mark.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IHO56EK7R77TGVNSDJXUFQARHZGLVYNE/
Code of Conduct: http://python.org/psf/codeofconduct/