[Python-ideas] Re: Regex pattern matching

2022-02-17 Thread Valentin Berlier
I see. I guess the ambiguity would stem from trying to force match objects into 
the sequence protocol even though the custom __getitem__() means that they're 
essentially a mixed mapping:

Mapping[int | str, str | None]

If we avoid any sort of "smart" length derived from only mo.groups() or 
mo.groupdict(), there's nothing stopping match objects from acting as proper 
mappings. We would need __iter__() which would simply yield all the available 
keys, including group 0 and all the named groups, and __len__() which would 
return the total number of keys.
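
To make the competing lengths concrete, here is a small sketch that uses only existing re APIs. Which of these counts a hypothetical len(m) should return is exactly the open question; the variable names are illustrative and not part of any proposal.

```python
import re

m = re.match(r"(a) (?P<foo>b)(x)?", "a b")

# Candidate "lengths" that could be derived from a match object today.
positional = len(m.groups())       # capture groups only, excluding group 0
named = len(m.groupdict())         # named groups only
indexable = m.re.groups + 1        # integer keys accepted by m[i], including 0
all_keys = indexable + len(m.re.groupindex)  # integer keys plus group names

print(positional, named, indexable, all_keys)
```

Only the last count treats every key accepted by m[...] uniformly, which is the length described in the paragraph above.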

My point is that the match object doesn't need to masquerade as something else 
to be useful; it only needs to implement the mapping protocol to describe the 
available keys.

m = re.match(r"(a) (?P<foo>b)(x)?", "a b")
list(m)  # [0, 1, 2, 3, 'foo']
dict(m)  # {0: 'a b', 1: 'a', 2: 'b', 3: None, 'foo': 'b'}
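
The behaviour above can be approximated today with a wrapper. This is a minimal sketch, not a proposed implementation: MatchMapping is a hypothetical name, and the key order (integer indices first, then group names) is an assumption taken from the example output.

```python
import re
from collections.abc import Mapping


class MatchMapping(Mapping):
    """Hypothetical wrapper exposing a re.Match as a read-only mapping."""

    def __init__(self, m):
        self._m = m
        # Group 0, every numbered group, then every named group.
        self._keys = [*range(m.re.groups + 1), *m.re.groupindex]

    def __getitem__(self, key):
        try:
            return self._m[key]
        except IndexError:
            # re.Match raises IndexError for unknown groups; mappings use KeyError.
            raise KeyError(key) from None

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


m = MatchMapping(re.match(r"(a) (?P<foo>b)(x)?", "a b"))
keys = list(m)
items = dict(m)
```

Because the wrapper subclasses collections.abc.Mapping, it is accepted by mapping patterns in a match statement.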

This means that pattern matching with mapping patterns would work 
automatically. The first example I shared would look like this:

match re.match(r"(v|f) (\d+) (\d+) (\d+)", line):
    case {1: "v", 2: x, 3: y, 4: z}:
        print("Handle vertex")
    case {1: "f", 2: a, 3: b, 4: c}:
        print("Handle face")

The second example would work without any changes:

match re.match(r"(?P<number>\d+)|(?P<add>\+)|(?P<mul>\*)", line):
    case {"number": str(value)}:
        return Token(type="number", value=int(value))
    case {"add": str()}:
        return Token(type="add")
    case {"mul": str()}:
        return Token(type="mul")
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FUB4CD2DKPJV5QVKZEFJ6XBAFKIP4EZ6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Regex pattern matching

2022-02-16 Thread Eric V. Smith
See https://bugs.python.org/issue46692. It's not so easy to make match 
objects mappings or sequences because of the len() problem.


Eric

On 2/16/2022 9:46 AM, Valentin Berlier wrote:

Hi,

I've been thinking that it would be nice if regex match objects could be 
deconstructed with pattern matching. For example, a simple .obj parser could 
use it like this:

 match re.match(r"(v|f) (\d+) (\d+) (\d+)", line):
     case ["v", x, y, z]:
         print("Handle vertex")
     case ["f", a, b, c]:
         print("Handle face")

Sequence patterns would extract groups directly. Mapping patterns could be used 
to extract named groups, which would be nice for simple parsers/tokenizers:

 match re.match(r"(?P<number>\d+)|(?P<add>\+)|(?P<mul>\*)", line):
     case {"number": str(value)}:
         return Token(type="number", value=int(value))
     case {"add": str()}:
         return Token(type="add")
     case {"mul": str()}:
         return Token(type="mul")

Right now, match objects aren't proper sequence or mapping types, but that 
doesn't seem too complicated to achieve. If this is something that enough 
people would consider useful, I'm willing to look into how to implement this.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/EKMIJCSJGHJR36W2CNJE4CKO3S5MW3U4/

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BA6TGJJN65246H7MWYLTUGFSEJ2U2KQ7/


[Python-ideas] Re: Regex pattern matching

2022-02-16 Thread Paul Moore
On Wed, 16 Feb 2022 at 14:47, Valentin Berlier wrote:
>
> Hi,
>
> I've been thinking that it would be nice if regex match objects could be 
> deconstructed with pattern matching. For example, a simple .obj parser could 
> use it like this:
>
> match re.match(r"(v|f) (\d+) (\d+) (\d+)", line):
>     case ["v", x, y, z]:
>         print("Handle vertex")
>     case ["f", a, b, c]:
>         print("Handle face")
>
> Sequence patterns would extract groups directly. Mapping patterns could be 
> used to extract named groups, which would be nice for simple 
> parsers/tokenizers:
>
> match re.match(r"(?P<number>\d+)|(?P<add>\+)|(?P<mul>\*)", line):
>     case {"number": str(value)}:
>         return Token(type="number", value=int(value))
>     case {"add": str()}:
>         return Token(type="add")
>     case {"mul": str()}:
>         return Token(type="mul")
>
> Right now, match objects aren't proper sequence or mapping types, but that 
> doesn't seem too complicated to achieve. If this is something that enough 
> people would consider useful, I'm willing to look into how to implement 
> this.

I'm not sure I really see the benefit of this, but if you want to do
it, couldn't you just write a wrapper?

>>> import re
>>> from collections.abc import Sequence
>>> class MatchAsSeq(Sequence):
...     # Delegate everything else (group, groups, ...) to the wrapped match.
...     def __getattr__(self, attr):
...         return getattr(self.m, attr)
...     def __len__(self):
...         return len(self.m.groups())
...     def __init__(self, m):
...         self.m = m
...     def __getitem__(self, n):
...         # Group 0 is the whole match, so index 0 maps to group 1.
...         return self.group(n+1)
...
>>> line = "v 1 12 3"
>>> match MatchAsSeq(re.match(r"(v|f) (\d+) (\d+) (\d+)", line)):
...     case ["v", x, y, z]:
...         print("Handle vertex")
...     case ["f", a, b, c]:
...         print("Handle face")
...
Handle vertex

Paul
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LCWHARNW5OOCY7CHCXC5CVGFH4OAFOEW/