[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-29 Thread Pierre Quentel


Pierre Quentel  added the comment:

I found why len() is required, it's to avoid trying to match the subject (thus 
consuming a part of it) if its length is less than the number of non-star 
patterns, as explained in the PEP.

My mistake, sorry.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-27 Thread Pierre Quentel


Pierre Quentel  added the comment:

Oh, I did not invent this class, it is in the test script for pattern matching 
: 
https://github.com/python/cpython/blob/6948964ecf94e858448dd28eea634317226d2913/Lib/test/test_patma.py#L1932

With this class, [x, *_, y] matches, but not [x, *w, y] : this is what made me 
create this issue. Maybe it would be a good idea to change this class in 
test_patma.py ?

OTOH, if the current implementation remains the same, why does the PEP insist 
on subjects having a len() ? Could sequence patterns match a wider range of 
subjects that can be unpacked ?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-27 Thread Guido van Rossum


Guido van Rossum  added the comment:

Given that the class used to demonstrate this behavior has a bug (__len__ is 
inconsistent with __getitem__), I don't think it matters much -- if doing it 
your way is (as I suspect) slower than the current implementation for other 
(well-behaved) sequences, we should not change the implementation.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Pierre Quentel


Pierre Quentel  added the comment:

Thanks for the explanations, but I feel unconfortable with the fact that 
variable-length sequence patterns are implemented the same as unpacking. (sorry 
if this has been discussed before, I can't find references to the discussions 
that lead to the current version of the PEP).

There are obvious similarities between

[x, *y, z] = A

and

match A:
case [x, *y, z]:
print('ok')

but a big difference, put forward in PEP 634 : the classes supported for 
pattern matching are limited (unpacking a generator expression is possible, but 
they are not supported as subjects for sequence patterns), and the PEP 
explicitely refers to them having a len().

It seems to me that this implies that the algorithms should be different:

- for unpacking, the iterable must be completely consumed before binding the 
names on the left-hand side. If it is infinite, unpacking fails

- for variable-length sequence pattern matching (this is how I understand the 
last paragraph about them in PEP 634):
. get the subject length
. iterate one by one before the star pattern
. iterate (len(subject) - number of non-star patterns) times for the star 
pattern
. iterate one by one after the star pattern

In the second case, even if the subject never raises StopIteration, the match 
succeeds.

Does this make sense ?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

Ah, I missed that and totally misinterrpreted other comments. 
Confirmation bias at work: I saw the code I expected to see, not the 
code that was actually there :-(

All good, I agree that it is a bug in the class, not in the language. 
Sorry for the noise.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Brandt Bucher


Brandt Bucher  added the comment:

> All the examples that are said to go into an infinite loop work fine in 3.9. 
> [...] At the very least, the examples shouldn't swallow the IndexError and go 
> into an infinite loop.

Note that the original example posted did not include a "raise IndexError" 
branch in its __getitem__. That was class I was referring to, since the lack of 
an IndexError causes the iteration to continue forever.

As far as I know, nothing has changed about the iteration protocol in 3.10.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

How is this not a regression? And a very serious one by the sound of it. All 
the examples that are said to go into an infinite loop work fine in 3.9.

(Obviously I can't check the match statement example in 3.9.)

If the Sequence Protocol really has been dropped from Python's execution model, 
shouldn't there be a PEP for such a major change in behaviour? Legacy or not, 
it is still a part of the language.

At the very least, the examples shouldn't swallow the IndexError and go into an 
infinite loop.

Brandt said:

> the destructuring is basically the same as if you had written:
> 
> [x, *w, y] = Seq()
> 
> ...which also hangs.


It works fine in 3.9, with or without inheriting from the Sequence ABC.

# Python 3.9
>>> class A:
... def __len__(self):
... return 5
... def __getitem__(self, i):
... print(i)
... if i < 5:
... return i
... raise IndexError
... 
>>> [x, *y, z] = A()
0
1
2
3
4
5
>>> x, y, z
(0, [1, 2, 3], 4)


Likewise for list(A()), etc.

The __len__ method isn't needed for iteration to work correctly. I just 
included it to match Pierre's example.

Inheriting from collections.abc.Sequence does not change the behaviour.

Still in 3.9:

>>> list(A())
0
1
2
3
4
5
[0, 1, 2, 3, 4]


I think that closing this is as Not A Bug was premature. If this isn't a bug, 
then I have no idea what counts as a bug any longer :-(

--
nosy: +steven.daprano

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Brandt Bucher


Brandt Bucher  added the comment:

I agree, Pablo.

The only thing that makes me pause is that, depending on the pattern, 
*sometimes* we use iteration and *sometimes* we use indexing. That might be 
sort of surprising to classes like Seq if you change patterns or major Python 
versions and this thing starts hanging. Hopefully, though, the reason will be 
relatively easy to debug and fix.

Closing as "not a bug".

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type: crash -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

In general I would propose to close this as "not a bug"

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

I don't think there is anything technically wrong here. The Seq class is 
relying on the legacy version of the iterator protocol, which specifies that 
you need to raise IndexError there to stop the iteration.

For instance, this version works:

import collections.abc

class Seq(collections.abc.Sequence):
def __getitem__(self, i):
if i >= len(self):
raise IndexError
return i
def __len__(self):
return 42

match Seq():
case [x, *w, y]:
z = 0
print(z)


Also, one could argue that Seq has exactly the same problem all over the 
language. For instance

>> list(Seq())

will hang

>> [x,*y] = Seq()

will hang

and so on and so forth. So, if I am a user and I am trying to **predict** how 
this would behave based on these two experiments my conclusion would be:

"Anything that tries to iterate on this thing will hang"

I think that whatever we do, we should make it *predictable* and consistency 
across the language, and IMHO only fixing it in pattern matching doesn't make 
it predictable. Being said that, I don't think these use cases justifies a 
major overhaul of how this behaves everywhere.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Brandt Bucher


Brandt Bucher  added the comment:

(Raising the priority to "high" because any decision on this should ideally be 
made before the 3.10 RCs.)

Hm, interesting. This is because we use UNPACK_EX for these patterns, so the 
destructuring is basically the same as if you had written:

[x, *w, y] = Seq()

...which also hangs.

We actually have three implementations for destructuring sequences:
- Using UNPACK_EX, when there is a starred name.
- Using BINARY_SUBSCR, when there is a starred wildcard.
- Using UNPACK_SEQUENCE, when there is no star.

When using your Seq class:
- The first fails by falling into an infinite loop.
- The second works as expected.
- The third fails with a ValueError at runtime (for a length mismatch).

*If* we decide that this is a big enough problem to fix, then I think the 
easiest way of doing it is to use the BINARY_SUBSCR implementation for all 
three paths (we'll just also need to use BUILD_LIST / LIST_APPEND to actually 
collect the starred items). It will simplify the pattern compiler a bit, but I 
imagine it will also come with a performance penalty as well.

In my opinion, I don't think we should rewrite everything to support Seq 
(though my mind isn't made up entirely). Sequences are supposed to be sized and 
iterable, but Seq doesn't really support the iteration protocol correctly (it 
expects the iterating code to stop once the length is exhausted, rather than 
raising StopIteration).

I'm curious to hear opinions on whether we want to actually fix this, though. 
It seems that it will always be possible to write classes like Seq the hack our 
pattern-matching implementation with dubious sequences and mappings, so it 
really comes down to: is supporting classes like Seq worth potentially slowing 
down all other sequence patterns?

--
assignee:  -> brandtbucher
nosy: +Mark.Shannon, gvanrossum, pablogsal
priority: normal -> high

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Brandt Bucher


Change by Brandt Bucher :


--
nosy: +brandtbucher

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44741] Pattern Matching - star subpattern with a subject derived from collections.abc.Sequence

2021-07-26 Thread Pierre Quentel


New submission from Pierre Quentel :

This code

match range(42):
case [x, *w, y]:
z = 0

sets w to a list with 40 items : the length of the subject, minus the number of 
non-star subpatterns.

But this code (adapted from test_patma_186) enters an infinite loop

import collections.abc

class Seq(collections.abc.Sequence):
def __getitem__(self, i):
print('get item', i)
return i
def __len__(self):
return 42

match Seq():
case [x, *w, y]:
z = 0

__getitem__ gets called forever, instead of stopping when the expected number 
of items in w is reached.

--
components: Interpreter Core
messages: 398226
nosy: quentel
priority: normal
severity: normal
status: open
title: Pattern Matching - star subpattern with a subject derived from 
collections.abc.Sequence
type: crash
versions: Python 3.10, Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com