Hi Daniel,
On 06/02/2021 7:47 pm, Daniel Moisset wrote:
Hi Mark,
I think some of these issues have already been raised and replied to (even
if no agreement has been reached), but this is a good summary, so let me
reply with a summary of responses.
On Sat, 6 Feb 2021 at 15:51, Mark Shannon <m...@hotpy.org
<mailto:m...@hotpy.org>> wrote:
Hi,
Since a decision on PEP 634 is imminent, I'd like to reiterate some
concerns that I voiced last year.
I am worried about the semantics and implementation of PEP 634.
I don't want to comment on the merits of pattern matching in
general, or
the proposed syntax in PEP 634 (or PEP 640 or PEP 642).
Semantics
---------
1. PEP 634 eschews the object model, in favour of adhoc instance
checks,
length checks and attribute accesses.
This is in contrast to almost all of the rest of the language,
which
uses special (dunder) methods:
All operators,
subscripting,
attribute lookup,
iteration,
calls,
tests,
object creation,
conversions,
and the with statement
AFAICT, no justification is given for this.
Why can't pattern matching use the object model?
No one has said that "we can't". It's just that "we don't have to". The
underlying mechanisms used by pattern matching (instance check, length,
attribute access) already have their defined protocols and support
within the object model. It's analogous to the way in which iterable
unpacking didn't need to define its own object model special methods,
because the existing iteration mechanism used in for loops was sufficient.
You seem to be jumping on my use of the word "can't".
I should have said
"Why *doesn't* PEP 634 use the object model?"
As for unpacking, it builds on iteration in a way that is clear and precise.
For example, I can desugar:
a, b = t
into:
try:
    __tmp = iter(t)
except TypeError:
    raise TypeError("cannot unpack non-iterable ...")
try:
    __tmp_a = next(__tmp)
    __tmp_b = next(__tmp)
except StopIteration:
    raise ValueError("not enough values ...")
try:
    next(__tmp)
except StopIteration:
    pass
else:
    raise ValueError("too many values ...")
a = __tmp_a; b = __tmp_b
Note that variables starting with "__tmp" are invisible.
Why not do something similar for PEP 634?
This does not exclude possible future extensions to the object model to
include a richer protocol like the one described in
https://www.python.org/dev/peps/pep-0622/#custom-matching-protocol
(where it also describes why we're not proposing that *now*, why it can
be done later, and why we think it's best to do it later).
I don't see how this is relevant. It is the semantics of PEP 634 as
proposed that concerns me.
PEP 343 (the "with" statement) added the __enter__ and __exit__ methods
to the object model, and that works very well.
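For illustration (a sketch of my own, not text from PEP 343), the whole
protocol is just two special methods that the compiler routes everything
through:

```python
# A minimal context manager: the "with" statement desugars into
# calls to __enter__ on entry and __exit__ on exit, nothing more.
class Resource:
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not suppress exceptions

r = Resource()
with r:
    r.events.append("body")
print(r.events)  # ['enter', 'body', 'exit']
```

Because the protocol is defined by the object model, any implementation
of Python knows exactly what calls a "with" block must make.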
2. PEP 634 deliberately introduces a large area of undefined behaviour
into Python.
https://www.python.org/dev/peps/pep-0634/#side-effects-and-undefined-behavior
Python has, in general, been strict about not having undefined
behaviour.
Having defined semantics means that you can reason about programs, even
non-idiomatic ones.
[This is not unique to Python, it is really only C and C++ that have
areas of undefined behaviour]
The C standard uses a very peculiar definition of "undefined behaviour"
(I'm not familiar enough with the C++ one to assert anything; I'll assume
it's the same), where for a certain set of programs any resulting
behaviour is valid, even at compile time (so a compiler that deletes all
your files when trying to compile "void f() {int a[10]; a[10]=0;}" is
standard compliant). Comparing that with the use of the term "undefined
behaviour" in the PEP is not very useful, because even if they are the
same words, they don't have the same meaning.
If you want to compare it with the C standards, the term we'd use would
be "implementation-defined behaviour". Python has a lot of those. For
example, the output of all these Python programs can change between
implementations (and some of them even change between versions of CPython):
* print({3,2,1})
* print(hash("hello"))
* if id(1) == id(1): print("hello")
* import sys; print(sys.getsizeof([]))
Some order of operations is also implementation dependent for example in
def foo():
    print(open("/etc/passwd").read())
foo()
The moment where file closing happens is implementation dependent.
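To illustrate (a sketch of my own, using io.StringIO as a stand-in for a
real file), a "with" block is the portable way to make the close happen
at a defined moment on every implementation, rather than whenever the
garbage collector gets around to it:

```python
import io

# Stand-in for open("/etc/passwd"): any IOBase object works the same way.
buf = io.StringIO("root:x:0:0\n")
with buf as f:
    first_line = f.read()
# On block exit the file is closed deterministically, on CPython,
# PyPy, or any other implementation.
print(buf.closed)  # True
```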
The existence of a few cases of undefined behaviour (e.g. race
conditions) does not excuse adding a whole lot more, especially when
there is no good reason.
The section you linked to introduces behaviour in similar lines to all
of the above.
I can see no good reason for adding undefined behaviour. It doesn't
help
anyone.
It helps for the point 3 that you mention (See below)
No it doesn't.
It is an impediment to performance for the reasons I give here:
The lack of precise semantics makes programs harder to understand, and
it makes the language harder to implement.
If the semantics aren't specified, then the implementation becomes the
specification.
This bakes bugs into the language and makes it harder to maintain,
as bug-for-bug compatibility must be maintained.
3. Performance
PEP 634 says that each pattern must be checked in turn.
That means that multiple redundant checks must be performed on (for
example) a sequence if there are several mapping patterns.
This is unnecessarily slow.
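To make the repetition concrete (a hand-written sketch of my own, not
the actual compiled code; the hypothetical Counted class just counts its
length checks), here is the work a run of sequence patterns implies:

```python
# Each sequence pattern performs its own length check on the subject,
# so several patterns in a row can repeat the same work.
class Counted(list):
    calls = 0

    def __len__(self):
        Counted.calls += 1
        return super().__len__()

subject = Counted([1, 2, 3])
# Three sequence patterns of different arities, checked in turn:
for required in (1, 2, 3):
    if len(subject) == required:
        pass  # a case body would run here
print(Counted.calls)  # 3 length checks for a single subject
```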
What PEP 634 says about this (unless we're reading different sections; a
quote could help here) is that the semantics are those of selecting the
*first case block whose pattern succeeds in matching it and whose
guard condition (if present) is "truthy"*.
As long as the semantics are respected, a smart compiler could reduce
some of the redundant checks (and that's *precisely* why the order of
the checks is left as implementation dependent).
What is this mythical "smart compiler"? Are you expecting some research
breakthrough to solve this, or an existing algorithm? If the latter,
what algorithm?
It's important here to separate "sequential semantics" vs "implemented
as sequential checks". For an analogy, look at the "serializable"
concept in databases, where the outcome of multiple statements must be
equal to the outcome of operations executed serially, but in practice
databases can find more efficient non-serial executions with the same
outcome. The PEP specifies semantics, not how the implementation must
behave.
I'm puzzled by that last sentence.
Isn't that what semantics are? The meaning of something.
How the implementation behaves determines the meanings of programs.
If the individual steps in a sequence have observable side-effects, then
it changes the semantics to remove, add to, or reorder those
side-effects. All of the operations you propose using to implement
pattern matching potentially have side effects.
Implementation
--------------
My main concern with the implementation is that it pushes too much work
into the interpreter.
Much of that work can and should be done in the compiler.
For example, deep in the implementation of the MATCH_CLASS instruction
is the following comment:
https://github.com/brandtbucher/cpython/blob/patma/Python/ceval.c#L981
Such complex control flow should be handled during compilation, rather
than in the interpreter.
Giant instructions like MATCH_CLASS are likely to have odd corner cases,
and may well have a negative impact on the performance of the rest of
the language.
It is a lot easier to reason about a sequence of simple bytecodes, than
one giant one with context-dependent behaviour.
We have spent quite a lot of effort over the last few years
streamlining
the interpreter.
Adding these extremely complex instructions would be a big backward
step.
What you say about this sounds very reasonable to me, although I don't
understand how it is a "concern with PEP 634". The implementation is not
part of the PEP, and I'm pretty sure all parties here (the PEP authors,
the PEP critics, the SC) would be more than happy to see this
implementation improving.
You say that the implementation is not part of the PEP, but what happens
if the PEP is accepted?
The implementation then becomes part of CPython, which is the de facto
specification of Python, and because the PEP is not precisely defined,
the implementation becomes the de facto specification of pattern
matching in Python.
You say that you would be happy to see the implementation improving.
Are you going to do it? If not, then who?
If you can't do it now, then at least explain what optimisations are
proposed.
Cheers,
Mark.
I'm pretty sure we both have presented these ideas before, so I'm not
expecting either of us to change our minds at this point, but I'm leaving
this reply here for future reference, in case other people want to see
the different positions in this discussion.
Best,
D.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/K34246PBAEJDOHKMJX35UC2COYLC4OXZ/
Code of Conduct: http://python.org/psf/codeofconduct/