> On Dec 28, 2019, at 10:12, Juancarlo Añez <apal...@gmail.com> wrote:
>
> As far as I understand it, my implementation of findalliter() matches the
> semantics in the switch statement.

There’s nothing outside the switch statement that converts Py_None values to empty strings, so whatever the difference is, it must be inside the switch statement.

And, even though your control flow is the same, there are two obvious ways in which you aren’t using the same input as the C code, so the semantics aren’t going to be the same.

The C code pulls values out of the pattern’s internal state without building a match object. My guess is that this is where the difference is: either while building the match object, or somewhere inside the groups method, unmatched groups get converted into something else, which the groups method then replaces with its default parameter, which defaults to None, while the C state_getslice function is just pulling a 0-length string out of the input. But without diving into the code that’s just a guess.

The C code also switches on the number of groups in the pattern (which I think is exposed from the compiled pattern object?), not the number of results in the current match. I’d guess that’s guaranteed to always be the same even in weird cases like nested groups, so it isn’t relevant here, but again that’s just a guess.

> This is the matching implementation:
>
>     for m in re.finditer(pattern, string, flags=flags):
>         g = m.groups()
>         if len(g) == 1:
>             yield g[0]
>         elif g:
>             yield tuple(s if s else '' for s in g)
>         else:
>             yield m.group()

Why not just call groups(default='') instead of calling groups() to fill unmatched groups with None and then using a genexpr to convert that None to ''?

More importantly, you can’t return '', you have to return '' or b'' depending on the type of the input string, using the same rule (whatever it is) that findall and the rest of the module use. (I think that’s worked out at compile time and exposed on the compiled pattern object, but I’m not sure.)

And even using a default with groups assumes I guessed right about the problem, and that it’s the only difference in behavior.
If not, it may still be a hack that only sometimes gets the right answer, and nobody has thought up a test case that shows otherwise yet. I think you really do need to go through either the C code or the docs to make sure there aren’t any other edge cases.

> Updated unit test:

Are there tests for findall and/or finditer in the stdlib test suite with wide coverage that you could adapt to compare list(findalliter) vs. findall or something?
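
To make the suggestions above concrete, here is a rough sketch, not the stdlib implementation. It switches on the pattern's group count (the groups attribute of the compiled pattern) rather than on the length of each match's groups, and it passes a default to Match.groups() instead of post-processing None. Deriving the empty value by slicing compiled.pattern is my assumption about a rule that agrees with findall for str and bytes patterns; I haven't checked it against the C code.

```python
import re


def findalliter(pattern, string, flags=0):
    """Lazy counterpart to re.findall (a sketch under the assumptions above)."""
    compiled = re.compile(pattern, flags)
    # Assumption: an empty slice of the pattern source gives '' for str
    # patterns and b'' for bytes patterns, matching what findall returns
    # for unmatched groups.
    empty = compiled.pattern[:0]
    # Switch on the number of groups in the pattern, as the C code does,
    # not on the number of results in the current match.
    ngroups = compiled.groups
    for m in compiled.finditer(string):
        if ngroups == 0:
            yield m.group()
        elif ngroups == 1:
            yield m.groups(default=empty)[0]
        else:
            yield m.groups(default=empty)
```

For example, list(findalliter(r'(a)|(b)', 'ab')) gives [('a', ''), ('', 'b')], the same as re.findall. Agreeing on a handful of inputs doesn't prove there are no other edge cases, which is why comparing against the stdlib tests still seems worthwhile.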