Hi Daniel,
On 20/11/2020 10:50 am, Daniel Moisset wrote:
Hello again Mark, I took some time looking in more detail at your
proposal, and these are my thoughts. I'm writing with the assumption
that this proposal describes some "internal" representation of match
statements which is never exposed to the users (so I'd mostly steer away
from lexical/syntactic preferences).
My first general thought is that this is a useful way to describe and
discuss implementation, although I wouldn't wait on refinement of this
to choose whether to accept/reject PEP 634 (or 642, or your favourite
alternative), work can be done on parallel. It may be a good idea to
wait for your proposal to be refined before landing a specific
implementation into CPython (including how the chosen implementation,
assuming it's accepted, desugars into your semantics).
Going into more details, here's a list of thoughts on more specific points:
1. You mention a goal about "erroneous patterns" (which I'm still not
sure I agree on), and your proposal addresses that by forcing
classes to be explicit (via __atributes__ and __deconstruct__) about
what attributes are accepted as matches. This is against one design
principle that's not in your list but it was (at least implicitly)
in PEP622 and 634: "pattern matching must allow matching objects on
types not specifically designed for it"; this will allow to apply
this feature to classes that you can not modify (like instances
created by a 3rd party library ). That's why PEP634 relies on
getattr() ; that could be extended in the feature (providing some
override using attributes like yours) but that wouldn't be required
by default
Why force pattern matching onto library code that was not designed for
pattern matching? It seems risky.
Fishing arbitrary attributes out of an object and assuming that the
values returned by attribute lookup are equivalent to the internal
structure breaks abstraction and data-hiding.
An object's API may consist of methods only. Pulling arbitrary
attributes out of that object may have all sorts of unintended side-effects.
PEP 634 and the DLS paper assert that deconstruction, by accessing
attributes of an object, is the opposite of construction.
This assertion seems false in OOP.
You might consider OOP as a hangover of the 1990s and that FP is the way
forward, and you may well be right, but the vast majority of Python
libraries have a distinctly OO flavor, and we need to respect that.
When we added the "with" statement, there was no attempt to force
existing code to support it. We made the standard library support it,
and let the community add support as and when it suited them.
We should do the same with pattern matching.
2. We considered and discussed something similar to the __deconstruct__
approach (as an override for getattr()), and also noted is
potentially expensive, requiring to allocate an object and evaluate
*all* attributes, even if only one is required. That is against the
principle of " it should perform at least as well as an equivalent
sequence of if, elif statements."
The majority of cases will want more than one attribute, and evaluating
all attributes at once may well be more efficient than doing so one at a
time, especially for builtin classes.
I assure you that this is not going to be a performance bottleneck until
the implementation is a lot slicker than the PEP 634 "reference
implementation".
3. You removed or patterns, and added multiple "case P" for each case
body. This easily covers cases where the multiple options are at top
level in the pattern, but if the or-pattern is in a subpattern you
have to duplicate much of the outer context. And this "duplication"
is actually exponential on the number of or patterns, so for
matching the pattern "[0|1, 0|1, 0|1, 'foo'|'bar'|'baz']" you need
to desugar this into 24 case clauses.
Why is this a problem? The compiler can handle millions of cases without
it being a problem. The number of tests to distinguish N cases is
log(N), so the number of tests remains the same.
4. Something else about decomposing patterns into multiple case clauses
is that it makes it harder to express the constraint that all
alternatives must bind the same variable.
Why? It is a mechanical transformation.
5. I found a bit confusing to read the multiple cases on top. It looks
like C switch statements, which fall-through by default, but in a
context where some case clauses do fall-through (if empty) and
others aren't (if they have a body, even if it's "pass"). I know
this is not user exposed syntax so it's not that important, but this
made the description harder to read for me.
You'll just have to let go of your C-based preconceptions ;)
6. Given the previous points, added to not seeing the gains of not
putting the or patterns into the desugared version, I'd prefer it to
be included in the desugaring.
A key point of the syntax is that if something looks like a Python
expression, then it is a Python expression. That is hard to do with a
"pattern or" operator embedded in the patterns.
That `0|1` means something completely different to `0+1` in PEP 634
seems like an unnecessary trap.
7. I think there's some surprising behaviour in the assignments being
done after a successful match but before the guard is evaluated. In
your proposal the guard has no access to the variables, so it has to
be compiled differently (using $0, $1, ... rather than the actual
names that appear in the expression). And if this guard calls a
function which exposes those variables in any way (for example if
the variable is in a closure) I think the behaviour may be
unexpected /surprising; same if I stop a debugger inside that
function and try to inspect the frame where the matching statement is.
All the "$0, $1" variables have a very limited scope. They're not
visible outside of that scope, so they won't be visible to the debugger
at all.
Modifying variables during matching, as you describe, is a serious flaw
in PEP 634/642. You don't need a debugger for them to have surprising
behavior, failed matches can change global state in an unspecified way.
Worse than that, PEP 634 comes close to blaming the user for any
unwanted side-effects.
https://www.python.org/dev/peps/pep-0634/#side-effects-and-undefined-behavior
8. I like your implementation approach to capture on the stack and then
assign. I was curious if you considered, rather than using a
variable number of stack cells using a single object/dict to store
those values. The compiler and the generated bytecode could end up
being simpler, and you need less stack juggling and possibly no PEEK
operation. a small list/array would suffice, but a dict may provide
further debugging opportunities (and it's likely that a split table
dict could make the representation quite compact). I know this is
less performant but I'm also thinking of simplicity.
Implement it how you please, as long as it's correct, maintainable, and
not too slow :)
9. I think your optimisation approaches are great, the spec was made
lax expecting for people like you to come up with a proposal of this
kind :) I don't think the first implementation of this should be
required to optimize/implement things in a certain way, but if the
spec is turned into implementation dependent and then fixed, it
shouldn't break anything (it's like the change in dictionary order
moving to "undefined/arbitrary" to "preserving insertion order") and
can be done later one
I think it is important that *all* implementations, including the first,
respect the exact semantics as defined. The first implementation should
be reasonably efficient, but doesn't have to be super quick.
Cheers,
Mark.
Thanks again,
Daniel
On Mon, 16 Nov 2020 at 14:44, Mark Shannon <m...@hotpy.org
<mailto:m...@hotpy.org>> wrote:
Hi everyone,
There has been much discussion on the syntax of pattern matching for
Python (in case you hadn't noticed ;)
Unfortunately the semantics seem to have been somewhat overlooked.
What pattern matching actually does seems at least as important as the
syntax.
I believe that a pattern matching implementation must have the
following
properties:
* The semantics must be precisely defined.
* It must be implemented efficiently.
* Failed matches must not pollute the enclosing namespace.
* Objects should be able determine which patterns they match.
* It should be able to handle erroneous patterns, beyond just syntax
errors.
PEP 634 and PEP 642 don't have *any* of these properties.
I've written up a document to specify a possible semantics of pattern
matching for Python that has the above properties, and includes reasons
why they are necessary.
https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst
It's in the format of a PEP, but it isn't a complete PEP as it lacks
surface syntax.
Please, let me know what you think.
Cheers,
Mark.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
<mailto:python-dev@python.org>
To unsubscribe send an email to python-dev-le...@python.org
<mailto:python-dev-le...@python.org>
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/BTPODVYLPKY5IHWFKYQJICONTNTRNDB2/
Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/NN4DNK4MXK6LGZMHY4XW22FF3EQZBSUB/
Code of Conduct: http://python.org/psf/codeofconduct/