[Python-Dev] Re: Request for comments on final version of PEP 653 (Precise Semantics for Pattern Matching)

Mark Shannon Tue, 30 Mar 2021 10:41:49 -0700


Hi Brandt,


On 30/03/2021 5:25 pm, Brandt Bucher wrote:

Overall, I am still uncomfortable with PEP 653, and would probably not support 
its acceptance.

Although it has thankfully become a much less radical proposal than it was a 
few weeks ago (thanks, Mark, for your attention to our feedback), I feel that 
the rules it binds implementations to are *very* premature, and that the new 
mechanisms it introduces to do so only modestly improve potential performance 
at great expense to the ease of learning, using, and maintaining code using 
structural pattern matching.

A few notes follow:

For example, using `sympy`, we might want to write:

```
# a*a == a**2
case Mul(args=[a, b]) if a == b:
     return Pow(a, 2)
```

Which requires the sympy class `Symbol` to "self" match. For `sympy` to support 
this pattern with PEP 634 is possible, but a bit tricky. With this PEP it can be 
implemented very easily.


Maybe I'm missing something, but I don't understand at all how the provided 
code snippet relies on the self-matching behavior.


No I'm missing something.
That should read
 case Mul(args=[Symbol(a), Symbol(b)) if a == b:
     ...

I'll fix that in the PEP thanks.


Have the maintainers of SymPy (or any large library supposedly benefitting here) come out 
in support of the PEP? Are they at least aware of it? Have they indicated that the 
proposed idiom for implementing self-matching behavior using a property is truly too 
"tricky" for them?

Have you identified any stdlib classes that would benefit greatly from this?

For me, `__match_class__` feels like a feature without demonstrated need. Even 
if there is a great demand for this, I certainly think that there are far 
better options than the proposed flagging system:

The distinction between those classes that have the default behavior andthose that match "self" is from PEP 634. I didn't introduce it.

I'm just proposing a more principled way to make that distinction.


- A `@match_self` class decorator (someone's bound to put one on PyPI, at any 
rate).
- Allowing `__match_args__ = None` to signal this case (an option we previously 
considered, and my personal preference).

...both of which can be added later, if needed.

Further, PEP 634 makes it very easy for libraries to support Python versions 
with *and* without pattern matching (something I consider to be an important 
requirement). The following class works with both 3.9 and 3.10:

```
class C(collections.abc.Sequence):
     ...
```

While something like this is required for PEP 653:

```
class C:
     if sys.version_info >= (3, 10):
         from somewhere import MATCH_SEQUENCE
         __match_container__ = MATCH_SEQUENCE
     ...
```


Or

class C:
    __match_container__ = 1 # MATCH_SEQUENCE

Which is one reason the PEP states that the values of MATCH_SEQUENCE,etc. will never change.

PEP 634 relies on the `collections.abc` module when determining which patterns 
a value can match, implicitly importing it if necessary. This PEP will 
eliminate surprising import errors and misleading audit events from those 
imports.


I think that a broken `_collections_abc` module *should* be surprising. Is 
there any reasonable scenario where it's expected to not exist, or be not be 
fit for this purpose?


No reasonable scenario, but unreasonable scenarios happen all too often.


And I'm not sure how an audit event for an import that is happening could be considered 
"misleading"... I certainly wouldn't want it suppressed.

It's misleading because a match statement doesn't include any explicitimports.

Looking up a special attribute is much faster than performing a subclass test 
on an abstract base class.


How much faster? A quick benchmark on my machine suggests less than half a 
microsecond. PEP 634 (like PEP 653) already allows us to cache this information 
for the subject of a match statement, so I doubt that this is actually a real 
issue in practice. An indeed, with the current implementation, this test isn't 
even performed on the most common types, such as lists, tuples, and 
dictionaries.


Half a microsecond is thousands of instructions on a modern CPU.
That is a long time for a single VM operation.


At the very least, PEP 653's confusing new flag system seems to be a *very* 
premature optimization, seriously hurting usability for a modest performance 
increase. (Using them wrongly also seems to introduce a fair amount of 
undefined behavior, which seems to go against the PEP's own motivation.)


Why do you say it is a premature optimization?
It's primary purpose is reliability and precise semantics.
It is more optimizable, I agree, but that is hardly premature.

You also say it is confusing, but I think it is simpler than theworkarounds to match "self" that you propose.

This is very subjective though. Evidently we think differently.

If the value of `__match_args__` is not as specified, then the implementation 
may raise any exception, or match the wrong pattern.


I think there's a name for this sort of behavior... ;)

Indeed, but there is only undefined behavior if a class violates clearlyspecified rules. The undefined behavior in PEP 634 is much broader.


We already tolerate some amount of undefined behavior. For example,
dictionary lookup is also undefined for classes which do not hash properly.


A couple of other, more technical notes:

- PEP 653 requires mappings to have a `keys()` method that returns an object 
supporting set inequality operations. It is not really that common to find this 
sort of support in user code (in my experience, it is more likely that 
user-defined `keys()` methods will return iterables). It's not even clear to me 
if this is an interface requirement for mappings in general. For example, 
`weakref.WeakKeyDictionary` and `weakref.WeakValueDictionary` presently do not 
work with PEP 653's requirements for mapping patterns, since their `keys()` 
methods return iterators.

- Treating `__getitem__` as pure is problematic for some common classes (such 
as `defaultdict`). That's why we use two-argument `get()` instead.

Thanks for pointing that out.

I'd noted that PEP 634 used `get()`, which is why I inserted the guardon keys beforehand. Clearly that is insufficient.

I'll update the semantics to use the two-argument `get()`. It does seemmore robust.


As well-fleshed out as the pseudocode for the matching operations in this PEP 
may be, examples like this suggest that perhaps we should wait until 3.11 or 
later to figure out what actually works in practice and what doesn't. PEP 634 
took a full year of work, and the ideas it proposed changed substantially 
during that time (in no small part because we had many people experimenting 
with how an actual implementation interacted with real code).


I fully understand that a lot of work went into PEP 634.

Which is why I am using the syntax and much of the semantics of PEP 634as I can.


The test suite is proving very useful, thanks.

The problem with waiting for 3.11 is that code will start to rely onsome of the implementation details of pattern matching as it is now,

and that our ability to optimize it will be delayed by a year.


Cheers,
Mark.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2IANQRLB4JR7RBNOSI2PLBQZP4HTMU54/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Request for comments on final version of PEP 653 (Precise Semantics for Pattern Matching)

Reply via email to