> Code examples should of course be used sparingly, but I think re.finditer() could benefit from at least one
Clarification: I see that there's an example of it being used in https://docs.python.org/3.8/library/re.html#finding-all-adverbs-and-their-positions and a more complex example in https://docs.python.org/3.8/library/re.html#writing-a-tokenizer. I was specifically referring to including a basic example directly within https://docs.python.org/3.8/library/re.html#re.finditer, similar to the entries for https://docs.python.org/3.8/library/re.html#re.split or https://docs.python.org/3.8/library/re.html#re.sub.

Alternatively: creating a new section under https://docs.python.org/3.8/library/re.html#regular-expression-examples, titled "Finding the first match", that briefly explains the difference in behavior between using re.findall()[0] and next(re.finditer()).group(1) (or next(re.finditer()).group() when there's no subgroup). Based on the discussions in this thread and the code examples, this seems to be rather commonly misunderstood.

On Sat, Dec 7, 2019 at 7:29 AM Kyle Stanley <aeros...@gmail.com> wrote:

> Serhiy Storchaka wrote:
> > My concern is that this will add complexity to the module documentation,
> > which is already too complex. re.findfirst() has more complex semantics
> > (if no capture groups, returns this; if one capture group, returns that;
> > and in other cases returns something of a different type) than
> > re.search(), which just returns a match object or None. This will
> > increase the chance that the user misses the appropriate function and
> > uses suboptimal functions like findall()[0].
> >
> > re.finditer() is a more modern and powerful function than re.findall().
> > The latter may even be deprecated in the future.
>
> Hmm, perhaps another consideration then would be to think of improvements
> to make to the existing documentation, particularly by including some
> code examples or expanding upon the docs for re.finditer() to make its
> usage clearer.
> Personally, it took me quite a while to understand its
> role in the module (as someone who does not use it on a frequent basis).
> Code examples should of course be used sparingly, but I think re.finditer()
> could benefit from at least one, especially considering that far less
> complex functions in the module have several examples. See
> https://docs.python.org/3.8/library/re.html#re.finditer.
>
> Serhiy Storchaka wrote:
> > > Another option to consider might be adding a boolean parameter to
> > > re.search() that changes the behavior to directly return a string
> > > instead of a match object, similar to re.findall() when there are not
> > > multiple subgroups.
> >
> > Oh, no, this is the worst idea!
>
> Yeah, after having some time to reflect on that idea a bit more I don't
> think it would work. That would just end up adding confusion to
> re.search(), ultimately defeating the purpose of the parameter in the first
> place. It would be too drastic a change in behavior for a single
> parameter to make.
>
> Thanks for the honesty though, not all of my ideas are good ones. But, if
> I can come up with something half-decent every once in a while I think it's
> worth throwing them out there. (:
>
> On Sat, Dec 7, 2019 at 2:56 AM Serhiy Storchaka <storch...@gmail.com>
> wrote:
>
>> 06.12.19 23:20, Kyle Stanley wrote:
>> > Serhiy Storchaka wrote:
>> > > It seems that in most cases the author just does not know about
>> > > re.search(). Adding re.findfirst() will not fix this.
>> >
>> > That's definitely possible, but it might be just as likely that they saw
>> > re.findall() as being simpler to use compared to re.search().
>> > Although it has worse performance by a substantial amount when parsing
>> > decent amounts of text (assuming the first match isn't at the end),
>> > ``re.findall()[0]`` /consistently/ returns the first string that was
>> > matched, as long as no subgroups were used.
>> > This allows them to circumvent the usage of match objects entirely,
>> > which makes it a bit easier to learn, especially for those who are
>> > less familiar with OOP or are already familiar with other popular
>> > flavors of regex (such as JS).
>> >
>> > I'll admit this is mostly speculation, but I think there's an especially
>> > large number of re users (compared to other modules) who aren't
>> > necessarily developers, and might just be someone who wants to write a
>> > script to quickly parse some documents. These types of users are the
>> > ones who would likely benefit the most from the proposed re.findfirst(),
>> > particularly if it directly returns a string as Guido is suggesting.
>> >
>> > I think at the end of the day, the critical question to answer is this:
>> >
>> > *Do we want to add a new helper function that's easy to use, consistent,
>> > and provides good performance for finding the first match, even if the
>> > functionality already exists within the module?*
>>
>> My concern is that this will add complexity to the module documentation,
>> which is already too complex. re.findfirst() has more complex semantics
>> (if no capture groups, returns this; if one capture group, returns that;
>> and in other cases returns something of a different type) than
>> re.search(), which just returns a match object or None. This will
>> increase the chance that the user misses the appropriate function and
>> uses suboptimal functions like findall()[0].
>>
>> re.finditer() is a more modern and powerful function than re.findall().
>> The latter may even be deprecated in the future.
>>
>> In the future we may add a few more functions/methods: re.rmatch() (like
>> re.match(), but matches at the end of the string instead of the start),
>> re.rsearch() (searches from the end), re.rfinditer() (iterates in
>> reversed order). Unlike findfirst(), they will implement features that
>> cannot be easily expressed using existing functions.
>>
>> > Another option to consider might be adding a boolean parameter to
>> > re.search() that changes the behavior to directly return a string
>> > instead of a match object, similar to re.findall() when there are not
>> > multiple subgroups.
>>
>> Oh, no, this is the worst idea!
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/C4VUEDFVLRJ5G7KTDI5G5RNC3MMP7X6V/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
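
For reference, here is a rough sketch of what a "Finding the first match" example might show. The sample text, pattern, and variable names are made up purely for illustration and are not taken from the re docs; this is only one possible way to present the comparison.

```python
import re

# Hypothetical sample text and pattern, chosen only for illustration.
text = "alpha=1, beta=22, gamma=333"
pattern = r"(\w+)=(\d+)"

# re.findall() scans the entire string and builds the full list before
# [0] is applied. With two capture groups, each item is a tuple of groups.
first_via_findall = re.findall(pattern, text)[0]

# Without capture groups, findall() returns plain strings instead.
first_no_groups = re.findall(r"\d+", text)[0]

# re.search() stops at the first match and returns a match object or None.
match = re.search(pattern, text)
first_via_search = match.group(0) if match else None

# re.finditer() is lazy; next() pulls only the first match object.
# The None default avoids StopIteration when nothing matches.
first_match = next(re.finditer(pattern, text), None)
first_via_finditer = first_match.group(1) if first_match else None

print(first_via_findall)   # ('alpha', '1')
print(first_no_groups)     # 1
print(first_via_search)    # alpha=1
print(first_via_finditer)  # alpha
```

Note how the type returned by findall()[0] changes with the number of capture groups, while re.search() and next(re.finditer()) always hand back a match object (or None), from which group() selects exactly what is wanted.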
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NBGSISE66WRMI3LHU6CNIDBBHVO4FXO5/
Code of Conduct: http://python.org/psf/codeofconduct/