[Python-Dev] Re: The semantics of pattern matching for Python

Mark Shannon Fri, 20 Nov 2020 06:34:17 -0800

Hi Daniel,

On 20/11/2020 10:50 am, Daniel Moisset wrote:

Hello again Mark, I took some time looking in more detail at yourproposal, and these are my thoughts. I'm writing with the assumptionthat this proposal describes some "internal" representation of matchstatements which is never exposed to the users (so I'd mostly steer awayfrom lexical/syntactic preferences).
My first general thought is that this is a useful way to describe anddiscuss implementation, although I wouldn't wait on refinement of thisto choose whether to accept/reject PEP 634 (or 642, or your favouritealternative), work can be done on parallel. It may be a good idea towait for your proposal to be refined before landing a specificimplementation into CPython (including how the chosen implementation,assuming it's accepted, desugars into your semantics).
Going into more details, here's a list of thoughts on more specific points:

 1. You mention a goal about "erroneous patterns" (which I'm still not
    sure I agree on), and your proposal addresses that by forcing
    classes to be explicit (via __atributes__ and __deconstruct__) about
    what attributes are accepted as matches. This is against one design
    principle that's not in your list but it was (at least implicitly)
    in PEP622 and 634: "pattern matching must allow matching objects on
    types not specifically designed for it"; this will allow to apply
    this feature to classes that you can not modify (like instances
    created by a 3rd party library ). That's why PEP634 relies on
    getattr() ; that could be extended in the feature (providing some
    override using attributes like yours) but that wouldn't be required
    by default

Why force pattern matching onto library code that was not designed forpattern matching? It seems risky.

Fishing arbitrary attributes out of an object and assuming that thevalues returned by attribute lookup are equivalent to the internalstructure breaks abstraction and data-hiding.

An object's API may consist of methods only. Pulling arbitraryattributes out of that object may have all sorts of unintended side-effects.

PEP 634 and the DLS paper assert that deconstruction, by accessingattributes of an object, is the opposite of construction.

This assertion seems false in OOP.

You might consider OOP as a hangover of the 1990s and that FP is the wayforward, and you may well be right, but the vast majority of Pythonlibraries have a distinctly OO flavor, and we need to respect that.

When we added the "with" statement, there was no attempt to forceexisting code to support it. We made the standard library support it,and let the community add support as and when it suited them.


We should do the same with pattern matching.

 2. We considered and discussed something similar to the __deconstruct__
    approach (as an override for getattr()), and also noted is
    potentially expensive, requiring to allocate an object and evaluate
    *all* attributes, even if only one is required. That is against the
    principle of " it should perform at least as well as an equivalent
    sequence of if, elif statements."

The majority of cases will want more than one attribute, and evaluatingall attributes at once may well be more efficient than doing so one at atime, especially for builtin classes.

I assure you that this is not going to be a performance bottleneck untilthe implementation is a lot slicker than the PEP 634 "referenceimplementation".

 3. You removed or patterns, and added multiple "case P" for each case
    body. This easily covers cases where the multiple options are at top
    level in the pattern, but if the or-pattern is in a subpattern you
    have to duplicate much of the outer context. And this "duplication"
    is actually exponential on the number of or patterns, so for
    matching the pattern "[0|1, 0|1, 0|1, 'foo'|'bar'|'baz']" you need
    to desugar this into 24 case clauses.

Why is this a problem? The compiler can handle millions of cases withoutit being a problem. The number of tests to distinguish N cases islog(N), so the number of tests remains the same.

 4. Something else about decomposing patterns into multiple case clauses
    is that it makes it harder to express the constraint that all
    alternatives must bind the same variable.


Why? It is a mechanical transformation.

 5. I found a bit confusing to read the multiple cases on top. It looks
    like C switch statements, which fall-through by default, but in a
    context where some case clauses do fall-through (if empty) and
    others aren't (if they have a body, even if it's "pass"). I know
    this is not user exposed syntax so it's not that important, but this
    made the description harder to read for me.


You'll just have to let go of your C-based preconceptions ;)

 6. Given the previous points, added to not seeing the gains of not
    putting the or patterns into the desugared version, I'd prefer it to
    be included in the desugaring.

A key point of the syntax is that if something looks like a Pythonexpression, then it is a Python expression. That is hard to do with a"pattern or" operator embedded in the patterns.That `0|1` means something completely different to `0+1` in PEP 634seems like an unnecessary trap.

 7. I think there's some surprising behaviour in the assignments being
    done after a successful match but before the guard is evaluated. In
    your proposal the guard has no access to the variables, so it has to
    be compiled differently (using $0, $1, ... rather than the actual
    names that appear in the expression). And if this guard calls a
    function which exposes those variables in any way (for example if
    the variable is in a closure) I think the behaviour may be
    unexpected /surprising; same if I stop a debugger inside that
    function and try to inspect the frame where the matching statement is.

All the "$0, $1" variables have a very limited scope. They're notvisible outside of that scope, so they won't be visible to the debuggerat all.

Modifying variables during matching, as you describe, is a serious flawin PEP 634/642. You don't need a debugger for them to have surprisingbehavior, failed matches can change global state in an unspecified way.

Worse than that, PEP 634 comes close to blaming the user for anyunwanted side-effects.

https://www.python.org/dev/peps/pep-0634/#side-effects-and-undefined-behavior

 8. I like your implementation approach to capture on the stack and then
    assign. I was curious if you considered, rather than using a
    variable number of stack cells using a single object/dict to store
    those values. The compiler and the generated bytecode could end up
    being simpler, and you need less stack juggling and possibly no PEEK
    operation. a small list/array would suffice, but a dict may provide
    further debugging opportunities (and it's likely that a split table
    dict could make the representation quite compact). I know this is
    less performant but I'm also thinking of simplicity.

Implement it how you please, as long as it's correct, maintainable, andnot too slow :)

 9. I think your optimisation approaches are great, the spec was made
    lax expecting for people like you to come up with a proposal of this
    kind :) I don't think the first implementation of this should be
    required to optimize/implement things in a certain way, but if the
    spec is turned into implementation dependent and then fixed, it
    shouldn't break anything (it's like the change in dictionary order
    moving to "undefined/arbitrary" to "preserving insertion order") and
    can be done later one


I think it is important that *all* implementations, including the first,

respect the exact semantics as defined. The first implementation shouldbe reasonably efficient, but doesn't have to be super quick.


Cheers,
Mark.


Thanks again,

Daniel

On Mon, 16 Nov 2020 at 14:44, Mark Shannon <m...@hotpy.org<mailto:m...@hotpy.org>> wrote:

Hi everyone,

There has been much discussion on the syntax of pattern matching for
Python (in case you hadn't noticed ;)

Unfortunately the semantics seem to have been somewhat overlooked.
What pattern matching actually does seems at least as important as the
syntax.

I believe that a pattern matching implementation must have the
following
properties:

* The semantics must be precisely defined.
* It must be implemented efficiently.
* Failed matches must not pollute the enclosing namespace.
* Objects should be able determine which patterns they match.
* It should be able to handle erroneous patterns, beyond just syntax
errors.

PEP 634 and PEP 642 don't have *any* of these properties.

I've written up a document to specify a possible semantics of pattern
matching for Python that has the above properties, and includes reasons
why they are necessary.

https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst

It's in the format of a PEP, but it isn't a complete PEP as it lacks
surface syntax.

Please, let me know what you think.

Cheers,
Mark.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
<mailto:python-dev@python.org>
To unsubscribe send an email to python-dev-le...@python.org
<mailto:python-dev-le...@python.org>
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at

https://mail.python.org/archives/list/python-dev@python.org/message/BTPODVYLPKY5IHWFKYQJICONTNTRNDB2/
Code of Conduct: http://python.org/psf/codeofconduct/

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NN4DNK4MXK6LGZMHY4XW22FF3EQZBSUB/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: The semantics of pattern matching for Python

Reply via email to