Hi Daniel,

On 06/02/2021 7:47 pm, Daniel Moisset wrote:
Hi Mark,

I think some of these issues have already been raised and replied to (even if no agreement has been reached), but this is a good summary, so let me reply with a summary of responses for this.

On Sat, 6 Feb 2021 at 15:51, Mark Shannon <m...@hotpy.org> wrote:

    Hi,

    Since a decision on PEP 634 is imminent, I'd like to reiterate some
    concerns that I voiced last year.

    I am worried about the semantics and implementation of PEP 634.
    I don't want to comment on the merits of pattern matching in
    general, or
    the proposed syntax in PEP 634 (or PEP 640 or PEP 642).

    Semantics
    ---------

    1. PEP 634 eschews the object model, in favour of adhoc instance
    checks,
    length checks and attribute accesses.

    This is in contrast to almost all of the rest of the language,
    which
    uses special (dunder) methods:
        All operators,
        subscripting,
        attribute lookup,
        iteration,
        calls,
        tests,
        object creation,
        conversions,
        and the with statement

    AFAICT, no justification is given for this.
    Why can't pattern matching use the object model?


No one has said that "we can't". It's just that "we don't have to". The underlying mechanisms used by pattern matching (instance checks, length checks, attribute access) already have their defined protocols and support within the object model. It's analogous to the way in which iterable unpacking didn't need to define its own special methods in the object model, because the existing iteration mechanism used in for loops was sufficient.

You seem to be jumping on my use of the word "can't".
I should have said
"Why *doesn't* PEP 634 use the object model?"

As for unpacking, it builds on iteration in a way that is clear and precise.

For example, I can desugar:

a, b = t

into:

try:
    __tmp = iter(t)
except TypeError:
    raise TypeError("cannot unpack non-iterable ...")
try:
    __tmp_a = next(__tmp)
    __tmp_b = next(__tmp)
except StopIteration:
    raise ValueError("not enough values ...")
try:
    next(__tmp)
except StopIteration:
    pass
else:
    raise ValueError("too many values ...")
a = __tmp_a; b = __tmp_b

Note that the variables starting with "__tmp" are invisible.

Why not do something similar for PEP 634?
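For example, a class pattern could be desugared along the same lines. A rough sketch only (Point is a placeholder class, and this is my reading of PEP 634's described checks, not text from the PEP):

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

pt = Point(0, 0)

# match pt:
#     case Point(x=0, y=0):
#         print("origin")
#
# could be desugared into roughly:
if isinstance(pt, Point):
    try:
        __tmp_x = pt.x
        __tmp_y = pt.y
    except AttributeError:
        pass  # assuming, per PEP 634, a missing attribute simply fails the match
    else:
        if __tmp_x == 0 and __tmp_y == 0:
            print("origin")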



This does not exclude possible future extensions to the object model to include a richer protocol like described in https://www.python.org/dev/peps/pep-0622/#custom-matching-protocol (where it also describes why we're not proposing that *now*, why it can be done later, and why we think it's best to do it later)

I don't see how this is relevant. It is the semantics of PEP 634 as proposed that concern me.



    PEP 343 (the "with" statement) added the __enter__ and __exit__ methods
    to the object model, and that works very well.
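(For reference, PEP 343 specifies the with statement as a desugaring in terms of the object model. A runnable sketch of that expansion, simplified, with CM standing in for an arbitrary context manager:)

import sys

class CM:
    def __enter__(self):
        print("enter")
        return self
    def __exit__(self, exc_type, exc, tb):
        print("exit")
        return False  # do not suppress exceptions

# "with CM() as x: print('body')" desugars roughly into:
__mgr = CM()
__exit = type(__mgr).__exit__
x = type(__mgr).__enter__(__mgr)
try:
    print("body")
except BaseException:
    if not __exit(__mgr, *sys.exc_info()):
        raise
else:
    __exit(__mgr, None, None, None)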


    2. PEP 634 deliberately introduces a large area of undefined behaviour
    into Python.

    
    https://www.python.org/dev/peps/pep-0634/#side-effects-and-undefined-behavior

    Python has, in general, been strict about not having undefined
    behaviour.
    Having defined semantics means that you can reason about programs, even
    non-idiomatic ones.
    [This is not unique to Python, it is really only C and C++ that have
    areas of undefined behaviour]


The C standard uses a very peculiar definition of "undefined behaviour" (I'm not familiar enough with the C++ one to assert anything; I'll assume it's the same), where for a certain set of programs any resulting behaviour is valid, even at compile time (so a compiler that deletes all your files when trying to compile "void f() {int a[10]; a[10]=0;}" is standard compliant). Comparing that with the use of the term "undefined behaviour" in the PEP is not very useful, because even though they are the same words, they don't have the same meaning.

If you want to compare it with the C standards, the term we'd use would be "implementation-defined behaviour". Python has a lot of those. For example, the output of all these Python programs can change between implementations (and some of these even change between versions of CPython):

  * print({3,2,1})
  * print(hash("hello"))
  * if id(1) == id(1): print("hello")
  * import sys; print(sys.getsizeof([]))

Some order of operations is also implementation dependent. For example, in

def foo():
    print(open("/etc/passwd").read())
foo()

The moment where file closing happens is implementation dependent.
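(A minimal sketch of the difference; nothing here is from the PEP:)

# In CPython, reference counting happens to close the file as soon as
# the last reference to it goes away; other implementations may only
# close it when their garbage collector eventually runs.
print(open("/etc/passwd").read())

# With a context manager, the file is closed at a defined point:
with open("/etc/passwd") as f:
    print(f.read())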

The existence of a few cases of undefined behaviour (e.g. race conditions) does not excuse adding a whole lot more. Especially when there is no good reason.


The section you linked to introduces behaviour along the same lines as all of the above.


    I can see no good reason for adding undefined behaviour. It doesn't
    help
    anyone.


It helps with point 3, which you mention (see below).

No, it doesn't.
It is an impediment to performance for the reasons I give here:


    The lack of precise semantics makes programs harder to understand, and
    it makes the language harder to implement.
    If the semantics aren't specified, then the implementation becomes the
    specification.
    This bakes bugs into the language and makes it harder to maintain,
    as bug-for-bug compatibility must be maintained.


    3. Performance

    PEP 634 says that each pattern must be checked in turn.
    That means that multiple redundant checks must be performed on (for
    example) a sequence if there are several mapping patterns.
    This is unnecessarily slow.


What PEP 634 says about this (unless we're reading different sections; a quote could help here) is that the semantics are those of selecting the "first case block whose patterns succeed in matching it and whose guard condition (if present) is 'truthy'".

As long as the semantics are respected, a smart compiler could reduce some of the redundant checks (and that's *precisely* why the order of the checks is left as implementation dependent).
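(As an illustration of the kind of merging being suggested; the desugared form below is hypothetical, not something any current compiler emits:)

from collections.abc import Mapping

cmd = {"op": "get", "key": "x"}

# match cmd:
#     case {"op": "get", "key": k}: ...
#     case {"op": "set", "key": k, "value": v}: ...
#
# Naively, each case re-checks that cmd is a mapping. A compiler that is
# allowed to assume these checks are free of side effects could hoist
# the shared isinstance test and perform it once:
if isinstance(cmd, Mapping):
    if cmd.keys() >= {"op", "key"} and cmd["op"] == "get":
        k = cmd["key"]
    elif cmd.keys() >= {"op", "key", "value"} and cmd["op"] == "set":
        k, v = cmd["key"], cmd["value"]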

What is this mythical "smart compiler"? Are you expecting some research breakthrough to solve this, or an existing algorithm? If the latter, then what algorithm?


It's important here to separate "sequential semantics" from "implemented as sequential checks". For an analogy, look at the "serializable" concept in databases, where the outcome of multiple transactions must be equal to the outcome of some serial execution, but where in practice databases can find more efficient non-serial executions with the same outcome. The PEP specifies semantics, not how the implementation must behave.

I'm puzzled by that last sentence.
Isn't that what semantics are? The meaning of something.
How the implementation behaves determines the meanings of programs.

If the individual steps in a sequence have observable side effects, then removing, adding, or reordering those side effects changes the semantics. All of the operations you propose using to implement pattern matching potentially have side effects.
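For instance (a contrived sketch; nothing here is from the PEP):

from collections.abc import Sequence

class Chatty(Sequence):
    # Sequence patterns consult the length and items of the subject;
    # here both have visible side effects, so the number and order of
    # checks the implementation performs is observable in the output.
    def __len__(self):
        print("len called")
        return 2
    def __getitem__(self, index):
        print(f"getitem({index})")
        return index

# Under PEP 634, how many times the length is checked across these two
# sequence patterns is left to the implementation, so the full output
# of this program is not defined:
match Chatty():
    case [1, 0]:
        print("matched [1, 0]")
    case [_, _]:
        print("matched [_, _]")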


    Implementation
    --------------

    My main concern with the implementation is that it does too much work
    in the interpreter.
    Much of that work can and should be done in the compiler.
    For example, deep in the implementation of the MATCH_CLASS instruction
    is the following comment:
    https://github.com/brandtbucher/cpython/blob/patma/Python/ceval.c#L981

    Such complex control flow should be handled during compilation, rather
    than in the interpreter.
    Giant instructions like MATCH_CLASS are likely to have odd corner cases,
    and may well have a negative impact on the performance of the rest of
    the language.
    It is a lot easier to reason about a sequence of simple bytecodes than
    one giant instruction with context-dependent behaviour.

    We have spent quite a lot of effort over the last few years
    streamlining
    the interpreter.
    Adding these extremely complex instructions would be a big backward
    step.


What you say about this sounds very reasonable to me, although I don't understand how it is a "concern with PEP 634". The implementation is not part of the PEP, and I'm pretty sure all parties here (the PEP authors, the PEP critics, the SC) would be more than happy to see this implementation improving.

You say that the implementation is not part of the PEP, but what happens if the PEP is accepted? The implementation then becomes part of CPython, which is the de facto specification of Python, and because the PEP is not precisely defined, the implementation becomes the de facto specification of pattern matching in Python.

You say that you would be happy to see the implementation improving.
Are you going to do it? If not, then who?
If you can't do it now, then at least explain what optimisations are proposed.

Cheers,
Mark.


I'm pretty sure we both have presented these ideas before, so I'm not expecting either of us to change our minds at this point, but I'm leaving this reply here for future reference, in case other people want to see the different positions in this discussion.

Best,
D.

