[Python-ideas] Re: findfirst() from findalliter(), and first

Andrew Barnert via Python-ideas Sat, 28 Dec 2019 12:06:16 -0800

> On Dec 28, 2019, at 10:12, Juancarlo Añez <apal...@gmail.com> wrote:
> As far as I understand it, my implementation of findalliter() matches the 
> semantics in the switch statement.


Nothing outside the switch statement converts Py_None values to empty strings, 
so whatever the difference is, it must be inside the switch statement. And even 
though your control flow is the same, there are two obvious ways in which you 
aren’t using the same input as the C code, so the semantics aren’t going to be 
the same.

The C code pulls values out of the pattern’s internal state without building a 
match object. My guess is that this is where the difference is: either while 
building the match object or somewhere inside the groups method, unmatched 
groups get converted into something else, which the groups method then replaces 
with its default parameter (None unless you pass one). The C state_getslice 
function, by contrast, just pulls a 0-length string out of the input. But 
without diving into the code, that’s just a guess.
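
Whatever the internal cause, the visible difference is easy to reproduce from 
Python. A minimal sketch (the pattern and input are made up for illustration):

```python
import re

pattern = re.compile(r'(a)|(b)')

# findall fills groups that did not participate with empty strings...
findall_result = pattern.findall('ab')

# ...while Match.groups() reports them as None unless given a default.
groups_result = [m.groups() for m in pattern.finditer('ab')]

print(findall_result)  # [('a', ''), ('', 'b')]
print(groups_result)   # [('a', None), (None, 'b')]
```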

The C code also switches on the number of groups in the pattern (which I think 
is exposed from the compiled pattern object?), not the number of results in the 
current match. I’d guess that’s guaranteed to always be the same even in weird 
cases like nested groups, so isn’t relevant here, but again that’s just a guess.
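
For what it’s worth, the compiled pattern object does document a groups 
attribute holding the number of capturing groups. Here is a quick spot check 
with a nested optional group; one made-up example is not a proof that the two 
counts always agree:

```python
import re

# Three capturing groups, one of them nested and optional.
pattern = re.compile(r'(a(b)?)|(c)')

# Number of groups according to the compiled pattern.
pattern_count = pattern.groups

# Number of entries in groups() for every match in a sample input.
match_counts = [len(m.groups()) for m in pattern.finditer('abac')]

print(pattern_count)  # 3
print(match_counts)   # [3, 3, 3]
```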

> This is the matching implementation:
> 
>     for m in re.finditer(pattern, string, flags=flags):
>         g = m.groups()
>         if len(g) == 1:
>             yield g[0]
>         elif g:
>             yield tuple(s if s else '' for s in g)
>         else:
>             yield m.group()

Why not just call groups(default='') instead of calling groups(), which fills 
unmatched groups with None, and then using a genexpr to convert each None to ''?
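
A small illustration of that, for str input only (the pattern is just an 
example):

```python
import re

m = re.search(r'(a)|(b)', 'b')

# Default behaviour: the unmatched group comes back as None.
with_none = m.groups()

# Passing a default does the None -> '' replacement in one call.
with_default = m.groups(default='')

# The genexpr from the quoted code gives the same tuple for this match.
with_genexpr = tuple(s if s else '' for s in m.groups())

print(with_none)     # (None, 'b')
print(with_default)  # ('', 'b')
```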

More importantly, you can’t always return ''; you have to return '' or b'' 
depending on the type of the input string, using the same rule (whatever it is) 
that findall and the rest of the module use. (I think that’s worked out at 
compile time and exposed on the compiled pattern object, but I’m not sure.)
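
Putting those two points together, here is a rough sketch of what I mean. The 
name findalliter_sketch is mine, and taking the empty value from a slice of the 
pattern source is an assumption: it relies on re refusing to mix str patterns 
with bytes input, and I haven’t checked that it matches findall’s rule in every 
case.

```python
import re

def findalliter_sketch(pattern, string, flags=0):
    """Lazily yield what re.findall would return (a sketch, not verified)."""
    compiled = re.compile(pattern, flags)
    # Assumption: pattern and input are the same kind (str or bytes), so an
    # empty slice of the pattern source gives the matching '' or b''.
    empty = compiled.pattern[:0]
    for m in compiled.finditer(string):
        # Switch on the pattern's group count, as the C code does.
        if compiled.groups == 0:
            yield m.group()
        elif compiled.groups == 1:
            yield m.groups(default=empty)[0]
        else:
            yield m.groups(default=empty)

print(list(findalliter_sketch(r'(a)?b', 'b')))       # ['']
print(list(findalliter_sketch(rb'(a)|(b)', b'ab')))  # [(b'a', b''), (b'', b'b')]
```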

And even using a default with groups assumes I guessed right about the problem, 
and that it’s the only difference in behavior. If not, it may still be a hack 
that only sometimes gets the right answer, and nobody has thought up a test 
case that breaks it yet. I think you really do need to go through either the C 
code or the docs to make sure there aren’t any other edge cases.

> Updated unit test:

Are there tests for findall and/or finditer in the stdlib test suite with wide 
coverage that you could adapt to compare list(findalliter) vs. findall or 
something?
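
Short of that, even a small hand-written comparison would catch some of this. A 
sketch of the idea: quoted_findalliter is the implementation quoted above 
wrapped in a function, while mismatches and CASES are hypothetical names and 
cases I made up, not anything from the stdlib test suite.

```python
import re

def quoted_findalliter(pattern, string, flags=0):
    # Body copied from the implementation quoted earlier in this message.
    for m in re.finditer(pattern, string, flags=flags):
        g = m.groups()
        if len(g) == 1:
            yield g[0]
        elif g:
            yield tuple(s if s else '' for s in g)
        else:
            yield m.group()

def mismatches(candidate, cases):
    """Return (pattern, string, expected, got) where candidate != findall."""
    failures = []
    for pattern, string in cases:
        expected = re.findall(pattern, string)
        got = list(candidate(pattern, string))
        if got != expected:
            failures.append((pattern, string, expected, got))
    return failures

CASES = [
    (r'\d+', 'a1b22'),      # no groups
    (r'(\d)(\w)', '1a2b'),  # every group matches
    (r'(a)|(b)', 'ab'),     # one of several groups unmatched
    (r'(a)?b', 'b'),        # single optional group unmatched
    (rb'(a)|(b)', b'ab'),   # bytes input
]

failures = mismatches(quoted_findalliter, CASES)
for pattern, string, expected, got in failures:
    print(pattern, string, expected, got)
```

With these cases, the quoted version differs from findall on the last two: it 
yields None for the lone unmatched group, and str '' inside a bytes result.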

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M5BZOBGXBEUGOATC36DTMDOPMT2CQAO2/
Code of Conduct: http://python.org/psf/codeofconduct/
