[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

Kyle Stanley Sat, 07 Dec 2019 04:48:43 -0800

> Code examples should of course be used sparingly, but I think
> re.finditer() could benefit from at least one


Clarification: I see that there's an example of it being used in
https://docs.python.org/3.8/library/re.html#finding-all-adverbs-and-their-positions
and one more complex example with
https://docs.python.org/3.8/library/re.html#writing-a-tokenizer. I was
specifically referring to including a basic example directly within
https://docs.python.org/3.8/library/re.html#re.finditer, similar to the
section for https://docs.python.org/3.8/library/re.html#re.split or
https://docs.python.org/3.8/library/re.html#re.sub.
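
As a rough idea of what such a basic example might look like (the sample
string and the variable names are my own, not taken from the docs), here is
a minimal sketch that iterates over the match objects re.finditer() yields:

```python
import re

text = "a1 b22 c333"  # made-up sample input

# re.finditer() returns an iterator that yields one match object per
# non-overlapping match, scanning the string from left to right.
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

for matched_text, (start, end) in spans:
    print(f"{start}-{end}: {matched_text}")
```

Each item is a full match object, so the position information is available
alongside the matched text, which re.findall() does not provide.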

Alternatively: creating a new section under
https://docs.python.org/3.8/library/re.html#regular-expression-examples,
titled "Finding the first match", that briefly explains the difference in
behavior between re.findall()[0] and next(re.finditer()).group(1) (or
next(re.finditer()).group() when there's no subgroup). Based on the
discussions and code examples in this thread, this seems to be rather
commonly misunderstood.
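
To make that difference concrete, here is a small sketch of what such a
section could show. The sample string, the variable names, and the
first_match() helper are all hypothetical, made up for illustration; only
re.findall(), re.search() and re.finditer() are existing functions:

```python
import re

text = "id=42; id=7"  # made-up sample input

# findall() collects every match before [0] picks the first one, and the
# type of each element depends on the number of capture groups.
no_group = re.findall(r"id=\d+", text)[0]         # whole match, a str
one_group = re.findall(r"id=(\d+)", text)[0]      # the group's text, a str
two_groups = re.findall(r"(\w+)=(\d+)", text)[0]  # a tuple of group strings

# search() stops at the first match and returns a match object or None.
match = re.search(r"id=(\d+)", text)
whole, group1 = match.group(), match.group(1)

# finditer() is lazy, so next() only scans as far as the first match.
first = next(re.finditer(r"id=(\d+)", text))


def first_match(pattern, string, default=None):
    """Hypothetical helper: return the first whole match, or a default."""
    m = re.search(pattern, string)
    return m.group() if m else default
```

When nothing matches, findall()[0] raises IndexError and next() raises
StopIteration (unless given a default), while re.search() simply returns
None, which is what the helper above relies on.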

On Sat, Dec 7, 2019 at 7:29 AM Kyle Stanley <[email protected]> wrote:

> Serhiy Storchaka wrote:
> > My concern is that this will add complexity to the module documentation,
> > which is already too complex. re.findfirst() has more complex semantics
> > (with no capture groups it returns one thing, with one capture group
> > another, and in other cases something of a different type altogether)
> > than re.search(), which just returns a match object or None. This will
> > increase the chance that users miss the appropriate function and use
> > suboptimal constructs like findall()[0].
>
> > re.finditer() is a more modern and powerful function than re.findall().
> > The latter may even be deprecated in the future.
>
> Hmm, perhaps another consideration then would be improving the existing
> documentation, particularly by including some code examples or expanding
> the docs for re.finditer() to make its usage clearer. Personally, it took
> me quite a while to understand its role in the module (as someone who does
> not use it on a frequent basis). Code examples should of course be used
> sparingly, but I think re.finditer() could benefit from at least one,
> especially considering that far less complex functions in the module have
> several examples. See
> https://docs.python.org/3.8/library/re.html#re.finditer.
>
> Serhiy Storchaka wrote:
> > > Another option to consider might be adding a boolean parameter to
> > > re.search() that changes the behavior to directly return a string
> > > instead of a match object, similar to re.findall() when there are not
> > > multiple subgroups.
>
> > Oh, no, this is the worst idea!
>
> Yeah, after having some time to reflect on that idea a bit more, I don't
> think it would work. That would just end up adding confusion to
> re.search(), ultimately defeating the purpose of the parameter in the first
> place. It would be too drastic a change in behavior for a single
> parameter to make.
>
> Thanks for the honesty though, not all of my ideas are good ones. But, if
> I can come up with something half-decent every once in a while I think it's
> worth throwing them out there. (:
>
>
>
>
> On Sat, Dec 7, 2019 at 2:56 AM Serhiy Storchaka <[email protected]>
> wrote:
>
>> 06.12.19 23:20, Kyle Stanley wrote:
>> > Serhiy Storchaka wrote:
>> >  > It seems that in most cases the author just does not know about
>> >  > re.search(). Adding re.findfirst() will not fix this.
>> >
>> > That's definitely possible, but it might be just as likely that they
>> > saw re.findall() as being more simple to use compared to re.search().
>> > Although it has worse performance by a substantial amount when parsing
>> > decent amounts of text (assuming the first match isn't at the end),
>> > ``re.findall()[0]`` /consistently/ returns the first string that was
>> > matched, as long as no subgroups were used. This allows them to
>> > circumvent the usage of match objects entirely, which makes it a bit
>> > easier to learn. Especially for those who are less familiar with OOP,
>> > or are already familiar with other popular flavors of regex (such as
>> > JS).
>> >
>> > I'll admit this is mostly speculation, but I think there's an
>> > especially large number of re users (compared to other modules) that
>> > aren't necessarily developers, and might just be someone who wants to
>> > write a script to quickly parse some documents. These types of users
>> > are the ones who would likely benefit the most from the proposed
>> > re.findfirst(), particularly if it directly returns a string as Guido
>> > is suggesting.
>> >
>> > I think at the end of the day, the critical question to answer is this:
>> >
>> > *Do we want to add a new helper function that's easy to use,
>> > consistent, and provides good performance for finding the first match,
>> > even if the functionality already exists within the module?*
>>
>> My concern is that this will add complexity to the module documentation,
>> which is already too complex. re.findfirst() has more complex semantics
>> (with no capture groups it returns one thing, with one capture group
>> another, and in other cases something of a different type altogether)
>> than re.search(), which just returns a match object or None. This will
>> increase the chance that users miss the appropriate function and use
>> suboptimal constructs like findall()[0].
>>
>> re.finditer() is a more modern and powerful function than re.findall().
>> The latter may even be deprecated in the future.
>>
>> In the future we may add a few more functions/methods: re.rmatch() (like
>> re.match(), but matching at the end of the string instead of the start),
>> re.rsearch() (searches from the end), re.rfinditer() (iterates in
>> reverse order). Unlike findfirst(), they would implement features that
>> cannot be easily expressed using existing functions.
>>
>> > Another option to consider might be adding a boolean parameter to
>> > re.search() that changes the behavior to directly return a string
>> > instead of a match object, similar to re.findall() when there are not
>> > multiple subgroups.
>>
>> Oh, no, this is the worst idea!
>> _______________________________________________
>> Python-ideas mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/[email protected]/message/C4VUEDFVLRJ5G7KTDI5G5RNC3MMP7X6V/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>

_______________________________________________
Message archived at
https://mail.python.org/archives/list/[email protected]/message/NBGSISE66WRMI3LHU6CNIDBBHVO4FXO5/
