[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

Serhiy Storchaka Thu, 05 Dec 2019 23:49:21 -0800

05.12.19 23:47, Kyle Stanley wrote:

Serhiy Storchaka wrote:
 > We still do not know a use case for findfirst. If the OP would show his
 > code and several examples in others' code this could be an argument for
 > usefulness of this feature.
I'm not sure about the OP's exact use case, but using GitHub's code search for .py files that match "first re.findall" shows a decent amount of code that uses the format ``re.findall()[0]``. It would be nice if GitHub's search properly supported symbols and regular expressions, but this presents a decent number of examples. See https://github.com/search?l=Python&q=first+re.findall&type=Code.
I also spent some time looking for a few specific examples, since there were a number of false positives in the above results. Note that I didn't look much into the actual purpose of the code or judge it based on quality; I was just looking for anything that seemed remotely practical and contained something along the lines of ``re.findall()[0]``. Several of the links below contain multiple lines where findfirst would likely be a better alternative, but I only included one permalink per code file.


Thank you Kyle for your investigation!

https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d39369365867496ce5b2aa/fifa_workspace/fifa_market_analysis/fifa_market_analysis/items.py#L325


It is easy to rewrite it using re.search().

- input_processor=MapCompose(lambda x: re.findall(r'pointDRI =([0-9]+)', x)[0], eval),
+ input_processor=MapCompose(lambda x: re.search(r'pointDRI =([0-9]+)', x).group(1), eval),

I also wonder whether it is worth replacing eval with the more efficient and safer int.
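A minimal sketch of that rewrite, assuming a hypothetical input string (the real scraped text is not shown in this thread). It only illustrates that, for a pattern with exactly one group, ``findall()[0]`` and ``search().group(1)`` return the same string, and that int converts it without evaluating arbitrary code:

```python
import re

# Hypothetical sample input; the real scraped page is not shown in the thread.
text = "var pointPAC =90; var pointDRI =87;"
pattern = r'pointDRI =([0-9]+)'

# Original style: build the full list of matches, then take the first one.
via_findall = re.findall(pattern, text)[0]

# Suggested style: stop at the first match and read group 1.
via_search = re.search(pattern, text).group(1)

# int() parses digits only, unlike eval(), which would execute any expression.
value = int(via_search)
```

Both forms raise an exception when nothing matches (IndexError for findall, AttributeError for search on None), so neither is safer than the other in that respect.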

https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc67a3bc43f09d/fifa_data/fifa_data/items.py#L370


It is the same code differently formatted.

https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/new_jersey.py#L82


-       clerk_name = name_re.findall(clerk)[0]
+       clerk_name = name_re.search(clerk).group(1)

https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/connecticut.py#L182


-     official_name = name_re.findall(town)[0].title()
+     official_name = name_re.search(town).group().title()

https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6034c37edd9e18abb2e0/ZhongBiao2/spiders/zhongbiao2.py#L176


-             first_1_results = re.findall(first_1,all_list9)[0]
+             first_1_results = re.search(first_1,all_list9).group(1)
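Whether the replacement should call group(), group(1) or groups() depends on how many capturing groups the pattern has, because findall() changes the shape of its items accordingly. A small sketch with hypothetical patterns and input (the real name_re and first_1 are not shown here):

```python
import re

# Hypothetical input and patterns, for illustration only.
text = "Clerk: Jane Doe, Clerk: John Roe"

# No capturing group: findall() yields whole matches, so use group().
no_group = re.compile(r'Clerk: \w+ \w+')
whole_a = no_group.findall(text)[0]
whole_b = no_group.search(text).group()

# Exactly one capturing group: findall() yields that group, so use group(1).
one_group = re.compile(r'Clerk: (\w+ \w+)')
single_a = one_group.findall(text)[0]
single_b = one_group.search(text).group(1)

# Several capturing groups: findall() yields tuples, so use groups().
two_groups = re.compile(r'Clerk: (\w+) (\w+)')
pair_a = two_groups.findall(text)[0]
pair_b = two_groups.search(text).groups()
```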

https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784cf2442/giscrape/spiders/people_search.py#L26

It is a complex example which performs multiple searches with different regular expressions. All of it can be replaced with a single, more efficient regular expression.


-   if re.search('^(\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) &amp; (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) &amp: (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) &amp; (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) &amp: (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) &amp; (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) &amp: (\w+) (\w+) (\w+)$',parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
+   m = re.fullmatch('(\w+) (\w+)(?: (\w+))?(?: &(?: \w+){1,3})?',parcel.owner)
+   if m:
+     last, first, middle = m.groups()
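To see how the single pattern behaves, here is a sketch that wraps it in a hypothetical helper, split_owner, and runs it on made-up owner strings. It assumes the separator in the data is a plain "&", as in the replacement regex above; re.fullmatch requires Python 3.4 or later.

```python
import re

# Same pattern as in the suggested replacement, written as a raw string.
OWNER_RE = re.compile(r'(\w+) (\w+)(?: (\w+))?(?: &(?: \w+){1,3})?')

def split_owner(owner):
    """Return (last, first, middle) or None; middle is None when absent.

    split_owner is a hypothetical helper name used only for this sketch.
    """
    m = OWNER_RE.fullmatch(owner)
    if m is None:
        return None
    return m.groups()

# Made-up sample values, not taken from the real data set.
two_names = split_owner("SMITH JOHN")
three_names = split_owner("SMITH JOHN A")
with_spouse = split_owner("SMITH JOHN & MARY")
middle_and_spouse = split_owner("SMITH JOHN A & MARY ANN")
no_match = split_owner("SMITH")
```

Unlike the original chain, this always yields three values, so code that previously left middle unset has to handle None instead.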

https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e2921f00a0/dao/parseMarc.py#L211

This is the only example which checks whether findall() returns an empty list. It calls findall() twice! Fortunately it can be easily optimized using the fact that Match objects support subscripting. I used group() above because it is more explicit and works in older Python versions.

- self.item.first_tutor_name = REGPX_A.findall(value)[0] if REGPX_A.findall(value) else ''
+ self.item.first_tutor_name = (REGPX_A.search(value) or [''])[0]
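A sketch of why this idiom works, using a hypothetical stand-in for REGPX_A (the real pattern is not shown here). A Match object is always truthy and None is falsy, so ``or`` picks either the match or the one-element list, and ``[0]`` then gives the whole match or the empty string. Subscripting a Match requires Python 3.6 or later.

```python
import re

# Hypothetical stand-in; the real REGPX_A from the linked file is not shown.
REGPX_A = re.compile(r'[A-Z]\w+')

def first_match(value):
    # search() returns a Match (always truthy) or None (falsy).
    # match[0] is the whole match; [''][0] is the empty-string fallback.
    return (REGPX_A.search(value) or [''])[0]

found = first_match("tutor: Wang Li")
missing = first_match("no capitalised word here")
```

Note that ``m[0]`` is the whole match, so this equals ``findall()[0]`` only when the pattern has no capturing groups; with exactly one group, ``[1]`` would be the equivalent index, and the fallback list would need a second element.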

It seems that in most cases the authors simply do not know about re.search(). Adding re.findfirst() will not fix this.

