05.12.19 23:47, Kyle Stanley wrote:
> Serhiy Storchaka wrote:
>> We still do not know a use case for findfirst. If the OP would show his
>> code and several examples in others code this could be an argument for
>> usefulness of this feature.
>
> I'm not sure about the OP's exact use case, but using GitHub's code
> search for .py files that match with "first re.findall" shows a decent
> amount of code that uses the format ``re.findall()[0]``. It would be
> nice if GitHub's search properly supported symbols and regular
> expressions, but this presents a decent number of examples. See
> https://github.com/search?l=Python&q=first+re.findall&type=Code.
>
> I also spent some time looking for a few specific examples, since there
> were a number of false positives in the above results. Note that I
> didn't look much into the actual purpose of the code or judge it based
> on quality, I was just looking for anything that seemed remotely
> practical and contained something along the lines of
> ``re.findall()[0]``. Several of the links below contain multiple lines
> where findfirst would likely be a better alternative, but I only
> included one permalink per code file.

Thank you Kyle for your investigation!
https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d39369365867496ce5b2aa/fifa_workspace/fifa_market_analysis/fifa_market_analysis/items.py#L325
It is easy to rewrite it using re.search().
- input_processor=MapCompose(lambda x: re.findall(r'pointDRI = ([0-9]+)', x)[0], eval),
+ input_processor=MapCompose(lambda x: re.search(r'pointDRI = ([0-9]+)', x).group(1), eval),
I also wonder whether it is worth replacing eval with the safer and more
efficient int.
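A minimal sketch of that rewrite, run on a made-up input string (the sample text is an assumption, not taken from the linked project):

```python
import re

# Hypothetical input resembling the scraped page text.
text = "pointDRI = 87; pointDRI = 91"
pattern = r'pointDRI = ([0-9]+)'

# findall() scans the whole string and builds a list of every match.
first_via_findall = re.findall(pattern, text)[0]

# search() stops at the first match; group(1) is the captured digits.
first_via_search = re.search(pattern, text).group(1)

# int() converts the digits without evaluating arbitrary code.
value = int(first_via_search)
```

Both spellings fail when nothing matches, only with different exceptions: findall()[0] raises IndexError, while search() returns None and the .group(1) call raises AttributeError.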
https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc67a3bc43f09d/fifa_data/fifa_data/items.py#L370
It is the same code, formatted differently.
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/new_jersey.py#L82
- clerk_name = name_re.findall(clerk)[0]
+ clerk_name = name_re.search(clerk).group(1)
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/connecticut.py#L182
- official_name = name_re.findall(town)[0].title()
+ official_name = name_re.search(town).group().title()
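Whether group() or group(1) is the right replacement depends on how many capturing groups the pattern has, because findall() changes its return shape accordingly. A small sketch with hypothetical patterns and input (not the ones from the linked files):

```python
import re

text = "Clerk: Jane Doe"  # hypothetical input

no_group = re.compile(r'[A-Z][a-z]+ [A-Z][a-z]+')
one_group = re.compile(r'Clerk: (.+)')
two_groups = re.compile(r'([A-Z][a-z]+) ([A-Z][a-z]+)')

# No capturing group: findall() yields whole matches, like group().
whole_findall = no_group.findall(text)[0]
whole_search = no_group.search(text).group()

# Exactly one capturing group: findall() yields that group, like group(1).
single_findall = one_group.findall(text)[0]
single_search = one_group.search(text).group(1)

# Several capturing groups: findall() yields tuples, like groups().
tuple_findall = two_groups.findall(text)[0]
tuple_search = two_groups.search(text).groups()
```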
https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6034c37edd9e18abb2e0/ZhongBiao2/spiders/zhongbiao2.py#L176
- first_1_results = re.findall(first_1,all_list9)[0]
+ first_1_results = re.search(first_1,all_list9).group(1)
https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784cf2442/giscrape/spiders/people_search.py#L26
It is a complex example that performs multiple searches with different
regular expressions. It can all be replaced with a single, more
efficient regular expression.
- if re.search('^(\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) & (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+) &: (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) & (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+) &: (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) & (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+) &: (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)', parcel.owner )[0]
+ m = re.fullmatch(r'(\w+) (\w+)(?: (\w+))?(?: &(?: \w+){1,3})?', parcel.owner)
+ if m:
+     last, first, middle = m.groups()
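A runnable sketch of the single-pattern version, assuming owner strings shaped like the samples below (the helper name parse_owner and the sample names are hypothetical):

```python
import re

# One pattern: last name, first name, an optional middle name, and an
# optional " & ..." tail of one to three further words.
OWNER_RE = re.compile(r'(\w+) (\w+)(?: (\w+))?(?: &(?: \w+){1,3})?')

def parse_owner(owner):
    """Return (last, first, middle) or None if the string does not fit."""
    m = OWNER_RE.fullmatch(owner)
    if m is None:
        return None
    # middle is None when the optional third group did not take part.
    return m.groups()

two_names = parse_owner("SMITH JOHN")
three_names = parse_owner("SMITH JOHN A")
with_spouse = parse_owner("SMITH JOHN & MARY")
with_middle_and_spouse = parse_owner("SMITH JOHN A & MARY ANN")
too_many = parse_owner("SMITH JOHN A B")
```

Unlike the original chain, middle is always assigned here (to None when absent). The original three-name branches spell the separator as "&:", which this pattern does not accept as written; if that colon is intentional, "&:?" would cover both spellings.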
https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e2921f00a0/dao/parseMarc.py#L211
This is the only example that checks whether findall() returns an empty
list. It calls findall() twice! Fortunately, it can easily be optimized
using the fact that Match objects support subscripting (since Python
3.6). I used group() above because it is more explicit and works in
older Python versions.
- self.item.first_tutor_name = REGPX_A.findall(value)[0] if REGPX_A.findall(value) else ''
+ self.item.first_tutor_name = (REGPX_A.search(value) or [''])[0]
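A short sketch of the idiom in isolation. The pattern below is a hypothetical stand-in for REGPX_A, and the helper name first_or_empty is made up for illustration; it needs Python 3.6 or later for Match subscripting:

```python
import re

# Hypothetical stand-in for REGPX_A; the real pattern is not shown here.
REGPX_A = re.compile(r'[A-Z][a-z]+')

def first_or_empty(value):
    # search() returns a Match (always truthy) or None. On None the
    # fallback list [''] is used, so [0] yields '' instead of failing.
    return (REGPX_A.search(value) or [''])[0]

found = first_or_empty("tutor: Wang Li")
missing = first_or_empty("1234")
```

Here m[0] is the whole match, the same as m.group(0). If the real pattern contains a capturing group, findall() would have returned that group instead, so the index and the fallback list would need adjusting.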
It seems that in most cases the authors simply do not know about
re.search(). Adding re.findfirst() will not fix this.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/5O2TP5HZHHJC7E55K2OYVKND4ITDB5DM/