[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Serhiy Storchaka

06.12.19 04:31, Guido van Rossum wrote:
> (There are apparently subtle differences between re.search() and
> re.findall() -- not sure if they matter in this case.)


There are no differences.

Also, judging from the examples on GitHub, in most cases the pattern contains 
zero or one group, so the code can be written as (if there are no groups)


result = (re.search(pattern, string) or [default])[0]

or (if there is a single group)

result = (re.search(pattern, string) or ['', default])[1]

And since most code does not handle the case where the pattern is not found 
anyway, it can be simplified even further.
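
A minimal sketch of the two idioms above. It relies on Match objects
supporting subscription (Python 3.6+); the patterns, input strings and
the default value are made up for illustration.

```python
import re

default = 'n/a'  # hypothetical default value

# No groups: m[0] is the whole match; the fallback list mimics that shape.
whole = (re.search(r'\d+', 'abc 123 def') or [default])[0]
missing = (re.search(r'\d+', 'no digits') or [default])[0]

# One group: m[1] is the first group; index 0 of the fallback is a dummy.
value = (re.search(r'id=(\w+)', 'id=42;x') or ['', default])[1]
absent = (re.search(r'id=(\w+)', 'nothing') or ['', default])[1]

print(whole, missing, value, absent)  # 123 n/a 42 n/a
```

The "or" works because re.search() returns None when nothing matches,
while a Match object is always truthy.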

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PTZKHHPQYR336R5G5YGSCBYJJRRVBLUP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Serhiy Storchaka

05.12.19 23:47, Kyle Stanley wrote:
> Serhiy Storchaka wrote:
> > We still do not know a use case for findfirst. If the OP would show his
> > code and several examples in others' code this could be an argument for
> > the usefulness of this feature.
>
> I'm not sure about the OP's exact use case, but using GitHub's code
> search for .py files that match "first re.findall" shows a decent
> amount of code that uses the format ``re.findall()[0]``. It would be
> nice if GitHub's search properly supported symbols and regular
> expressions, but this presents a decent number of examples. See
> https://github.com/search?l=Python=first+re.findall=Code.
>
> I also spent some time looking for a few specific examples, since there
> were a number of false positives in the above results. Note that I
> didn't look much into the actual purpose of the code or judge it based
> on quality, I was just looking for anything that seemed remotely
> practical and contained something along the lines of
> ``re.findall()[0]``. Several of the links below contain multiple lines
> where findfirst would likely be a better alternative, but I only
> included one permalink per code file.


Thank you Kyle for your investigation!


https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d39369365867496ce5b2aa/fifa_workspace/fifa_market_analysis/fifa_market_analysis/items.py#L325


It is easy to rewrite it using re.search().

- input_processor=MapCompose(lambda x: re.findall(r'pointDRI = ([0-9]+)', x)[0], eval),
+ input_processor=MapCompose(lambda x: re.search(r'pointDRI = ([0-9]+)', x).group(1), eval),


I also wonder whether it is worth replacing eval with the safer and more 
efficient int.
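
A small sketch of why int is the safer choice here, assuming the
captured text is always a run of digits. The sample input is made up.

```python
import re

text = 'pointDRI = 87'  # made-up sample input
digits = re.search(r'pointDRI = ([0-9]+)', text).group(1)

# int() only parses a number; eval() would execute arbitrary expressions.
score = int(digits)
print(score)  # 87

# int() rejects anything that is not a plain integer literal.
rejected = False
try:
    int('__import__("os")')
except ValueError:
    rejected = True
```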




https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc67a3bc43f09d/fifa_data/fifa_data/items.py#L370


It is the same code, formatted differently.


https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/new_jersey.py#L82


-   clerk_name = name_re.findall(clerk)[0]
+   clerk_name = name_re.search(clerk).group(1)



https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/connecticut.py#L182


- official_name = name_re.findall(town)[0].title()
+ official_name = name_re.search(town).group().title()



https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6034c37edd9e18abb2e0/ZhongBiao2/spiders/zhongbiao2.py#L176


- first_1_results = re.findall(first_1,all_list9)[0]
+ first_1_results = re.search(first_1,all_list9).group(1)




https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784cf2442/giscrape/spiders/people_search.py#L26


It is a complex example which performs multiple searches with different 
regular expressions. It can all be replaced with a single, more 
efficient regular expression.


-   if re.search('^(\w+) (\w+)$', parcel.owner):
-       last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+)$', parcel.owner):
-       last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+)  (\w+)$', parcel.owner):
-       last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) : (\w+)$', parcel.owner):
-       last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+)  (\w+) (\w+)$', parcel.owner):
-       last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) : (\w+) (\w+)$', parcel.owner):
-       last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+)  (\w+) (\w+) (\w+)$', parcel.owner):
-       last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) : (\w+) (\w+) (\w+)$', parcel.owner):
-       last, first, middle = re.findall( '(\w+) (\w+) (\w+)', parcel.owner )[0]

+   m = re.fullmatch('(\w+) (\w+)(?: (\w+))?(?: (?: \w+){1,3})?', parcel.owner)
+   if m:
+       last, first, middle = m.groups()
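
To illustrate the idea of one pattern with an optional group, here is a
sketch with a simplified, hypothetical pattern ("LAST FIRST" with an
optional " MIDDLE"). The helper name and the sample owners are invented;
this is not the pattern from the repository.

```python
import re

# Simplified, hypothetical pattern: "LAST FIRST" with an optional " MIDDLE".
OWNER_RE = re.compile(r'(\w+) (\w+)(?: (\w+))?')

def split_owner(owner):
    m = OWNER_RE.fullmatch(owner)
    if m is None:
        return None
    # A group that did not take part in the match is reported as None.
    return m.groups()

print(split_owner('SMITH JOHN'))    # ('SMITH', 'JOHN', None)
print(split_owner('SMITH JOHN Q'))  # ('SMITH', 'JOHN', 'Q')
print(split_owner('SMITH'))         # None
```

Because the optional group yields None when it is absent, the three-name
unpacking always works once fullmatch() succeeds.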



https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e2921f00a0/dao/parseMarc.py#L211


This is the only example which checks whether findall() returns an empty 
list. It calls findall() twice! Fortunately it can be easily optimized 
using the fact that Match objects support subscription. I used group() 
above because it is more explicit and works in older Python versions.


- self.item.first_tutor_name = REGPX_A.findall(value)[0] if REGPX_A.findall(value) else ''
+ self.item.first_tutor_name = (REGPX_A.search(value) or [''])[0]
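
A sketch comparing the two forms, assuming a pattern without capturing
groups (PAT and the sample values are hypothetical). Match subscription
requires Python 3.6 or later.

```python
import re

PAT = re.compile(r'[A-Z][a-z]+')  # hypothetical group-free pattern

def first_old(value):
    # Scans the whole string twice.
    return PAT.findall(value)[0] if PAT.findall(value) else ''

def first_new(value):
    # Stops at the first match; m[0] is the whole match (Python 3.6+).
    return (PAT.search(value) or [''])[0]

for value in ('tutor: Wang Li', 'no capitals'):
    assert first_old(value) == first_new(value)
```

If the pattern had a single capturing group, findall() would return the
group text, so the search() form would need index 1 and a two-element
fallback list instead.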



It seems that in most cases the authors simply do not know about 
re.search(). Adding re.findfirst() will not fix this.


[Python-ideas] Re: Argumenting in favor of first()

2019-12-05 Thread Guido van Rossum
On Thu, Dec 5, 2019 at 22:08 Jonathan Goble  wrote:

> On Fri, Dec 6, 2019, 12:47 AM Steven D'Aprano  wrote:
>
>> On Thu, Dec 05, 2019 at 05:40:05PM -0400, Juancarlo Añez wrote:
>> > I just found this code:
>> >
>> > def get_product_item(jsonld_items):
>> >     for item in jsonld_items:
>> >         if item['@type'] == 'Product':
>> >             return item
>> >     else:
>> >         return {}
>>
>> I'm sorry, I can't tell what that is supposed to do. Is the "return {}"
>> supposed to be inside the loop? If so, it has been accidentally
>> dedented. Is it meant to be outside the loop? The "for-else" is
>> redundant, since there is no break.
>>
>
> "return", like "break", causes the "else" suite to be skipped.
>
> https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
> does not clearly specify this; it only says that the else suite is executed
> when the iterator is exhausted or empty, and that "break" skips it. Perhaps
> a sentence should be added to clearly and unambiguously state that "return"
> skips it also?
>

No, that follows from the semantics of return. (Same for raise.) Break is
mentioned specifically because it transfers control to the code after the
loop, just like exhausting the iterable. The difference between exhaustion
and break is the reason the else clause (on loops) exists -- so we can do
something only when no break is taken. (Only a finally clause can intercept
return.)
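
A small sketch of these semantics. The function names are invented for
illustration; the behaviour shown is standard Python.

```python
def find_with_return(items, wanted):
    for item in items:
        if item == wanted:
            return 'found'        # leaves the function; else is skipped
    else:
        return 'exhausted'        # runs only when the loop runs to completion

def find_with_break(items, wanted):
    log = []
    for item in items:
        if item == wanted:
            log.append('break')
            break                 # jumps past the else clause
    else:
        log.append('else')        # no break was taken
    log.append('after loop')
    return log

def return_through_finally():
    events = []

    def inner():
        try:
            return 'value'
        finally:
            events.append('finally ran')  # runs before the return completes

    return inner(), events

print(find_with_return([1, 2, 3], 2))  # found
print(find_with_return([1, 2, 3], 9))  # exhausted
print(find_with_break([1, 2, 3], 2))   # ['break', 'after loop']
print(find_with_break([1, 2, 3], 9))   # ['else', 'after loop']
print(return_through_finally())        # ('value', ['finally ran'])
```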

> --
--Guido (mobile)
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MEAUG55DPM5FOZCYLIX3EWUJX6YYWYYV/


[Python-ideas] Re: Argumenting in favor of first()

2019-12-05 Thread Jonathan Goble
On Fri, Dec 6, 2019, 12:47 AM Steven D'Aprano  wrote:

> On Thu, Dec 05, 2019 at 05:40:05PM -0400, Juancarlo Añez wrote:
> > I just found this code:
> >
> > def get_product_item(jsonld_items):
> >     for item in jsonld_items:
> >         if item['@type'] == 'Product':
> >             return item
> >     else:
> >         return {}
>
> I'm sorry, I can't tell what that is supposed to do. Is the "return {}"
> supposed to be inside the loop? If so, it has been accidentally
> dedented. Is it meant to be outside the loop? The "for-else" is
> redundant, since there is no break.
>

"return", like "break", causes the "else" suite to be skipped.

https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
does not clearly specify this; it only says that the else suite is executed
when the iterator is exhausted or empty, and that "break" skips it. Perhaps
a sentence should be added to clearly and unambiguously state that "return"
skips it also?

>
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NLXONUKK7LDUPDST6WCVDQY2GTEXJNE4/


[Python-ideas] Re: Argumenting in favor of first()

2019-12-05 Thread Steven D'Aprano
On Thu, Dec 05, 2019 at 05:40:05PM -0400, Juancarlo Añez wrote:
> I just found this code:
> 
> def get_product_item(jsonld_items):
>     for item in jsonld_items:
>         if item['@type'] == 'Product':
>             return item
>     else:
>         return {}

I'm sorry, I can't tell what that is supposed to do. Is the "return {}" 
supposed to be inside the loop? If so, it has been accidentally 
dedented. Is it meant to be outside the loop? The "for-else" is 
redundant, since there is no break.

    for item in jsonld_items:
        if item['@type'] == 'Product':
            return item
        else:  # Even this "else" is redundant too.
            return {}

    for item in jsonld_items:
        if item['@type'] == 'Product':
            return item
    # else is unnecessary here
    return {}

The name says it returns an item, but the default is to return an empty 
dict, which seems like an unusual choice for the "default product where 
no product is specified". I would have guessed None if the product is 
missing.


> My argument is that the intent is clearer in:
> 
> def get_product_item(jsonld_items):
>     return first((item for item in jsonld_items if item['@type'] ==
>                   'Product'), {})

That's certainly an improvement, but the helper function "first" is 
redundant. You can just write:

    return next((item for item in jsonld_items if item['@type'] == 
                 'Product'), {})

What's the benefit of adding a new builtin which is essentially just a 
thin do-almost-nothing wrapper around `next`? Have I missed something?
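
A runnable sketch of the next()-based version, with made-up sample data.
Note that the built-in next() accepts its default only as a positional
argument; it takes no keyword arguments.

```python
def get_product_item(jsonld_items):
    # The second positional argument of next() is the default.
    return next(
        (item for item in jsonld_items if item['@type'] == 'Product'),
        {},
    )

items = [{'@type': 'Offer'}, {'@type': 'Product', 'name': 'kettle'}]

print(get_product_item(items))                  # {'@type': 'Product', 'name': 'kettle'}
print(get_product_item([{'@type': 'Offer'}]))   # {}
```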


-- 
Steven
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/S5JG75H7DSQINDAPGZSKGVOLIQRFOESD/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Guido van Rossum
On Thu, Dec 5, 2019 at 6:16 PM Juancarlo Añez  wrote:

>> It’s unfortunate that these functions aren’t better matched. Why is there
>> a simple-semantics find-everything and a match-semantics find-iteratively
>> and find-one? But I don’t think adding a simple-semantics find-one that
>> works by inefficiently finding all is the right solution.
>>
>
> The proposed implementation for *findfirst()* is:
>
> *return next(finditer(pattern, text, flags=flags), default)*
>
>
Um, finditer() yields Match objects, and IIUC findfirst() should return a
string, or a tuple of groups if there's more than one group. So the actual
implementation would be a bit more involved. Something like this, to match
findall() better:

    for match in re.finditer(pattern, text, flags=flags):
        # Only act on first match
        groups = match.groups()
        if not groups:
            return match.group(0)  # Whole match
        if len(groups) == 1:
            return groups[0]  # One group
        return groups
    # No match, use default
    return default

Alternatively, replace the first line with this:

    match = re.search(pattern, text, flags=flags)
    if match is not None:

(There are apparently subtle differences between re.search() and
re.findall() -- not sure if they matter in this case.)
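
Put together as a complete function, the sketch above could look like
this. The name findfirst and its signature are hypothetical (no such
function exists in the re module); the body uses the re.search()
variant.

```python
import re

def findfirst(pattern, text, default=None, flags=0):
    """Hypothetical helper: first findall()-style result, or default."""
    match = re.search(pattern, text, flags=flags)
    if match is None:
        return default
    groups = match.groups()
    if not groups:
        return match.group(0)   # no groups: the whole match
    if len(groups) == 1:
        return groups[0]        # one group: just that group
    return groups               # several groups: a tuple

print(findfirst(r'\d+', 'a 12 b 34'))         # 12
print(findfirst(r'(\w)=(\d)', 'x=1 y=2'))     # ('x', '1')
print(findfirst(r'z', 'abc', default=''))     # prints an empty line
```

One difference from findall() remains in this sketch: a group that did
not take part in the match comes back as None here, whereas findall()
reports it as an empty string.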

>> And if the point of proposing first is that novices will figure out how to
>> write first(findall(…)) so we don’t need to add findfirst, then I think we
>> need findfirst even more, because novices shouldn’t learn that bad idea.
>>
>
Yes, my point exactly.


> I posted another thread to argue in favor of *first()*, independently of
> *findfirst().*
>

Also agreed, I've observed that as a common pattern.

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RC2IAT2IP55XFYZLZ2ENAXZCXQCHER2F/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Sebastian Kreft
To overcome GitHub's search limitations, one can use Chromium's code search or
the public GitHub dataset available on BigQuery (note: it's only a sample
from 2012, if I'm not mistaken).

https://cs.chromium.org/search/?q=lang:py+re%5C.findall%5C(.*%5C)%5C%5B0%5C%5D=package:chromium=cs
returns 5 results,

while the following query:

SELECT COUNT(*) FROM (
  SELECT
    c.id id,
    c.content content,
    f.repo_name repo_name,
    f.path path
  FROM
    `bigquery-public-data.github_repos.sample_files` f
  JOIN (
    SELECT
      *
    FROM
      `bigquery-public-data.github_repos.sample_contents`
  ) c
  ON
    f.id = c.id
  WHERE
    ENDS_WITH(f.path, ".py") AND
    REGEXP_CONTAINS(c.content, "re\\.findall\\(.*\\)\\[0\\]")
)

returns 84 entries.
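
For a quick local check, the same regular expression can be applied with
Python's re module. This is only a sketch: the sample lines are made up,
and like the query it matches the textual shape of a call, not its
meaning.

```python
import re

# Same pattern as in the query: a re.findall(...) call indexed with [0].
USAGE_RE = re.compile(r're\.findall\(.*\)\[0\]')

samples = [  # made-up source lines
    "name = re.findall(r'\\w+', text)[0]",
    "names = re.findall(r'\\w+', text)",
    "m = re.search(r'\\w+', text)",
]
hits = [line for line in samples if USAGE_RE.search(line)]
print(len(hits))  # 1
```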

On Thu, Dec 5, 2019 at 6:51 PM Kyle Stanley  wrote:

>
> Serhiy Storchaka wrote:
> > We still do not know a use case for findfirst. If the OP would show his
> > code and several examples in others code this could be an argument for
> > usefulness of this feature.
>
> I'm not sure about the OP's exact use case, but using GitHub's code search
> for .py files that match with "first re.findall" shows a decent amount of
> code that uses the format ``re.findall()[0]``. It would be nice if GitHub's
> search properly supported symbols and regular expressions, but this
> presents a decent number of examples. See
> https://github.com/search?l=Python=first+re.findall=Code.
>
> I also spent some time looking for a few specific examples, since there
> were a number of false positives in the above results. Note that I didn't
> look much into the actual purpose of the code or judge it based on quality,
> I was just looking for anything that seemed remotely practical and
> contained something along the lines of ``re.findall()[0]``. Several of the
> links below contain multiple lines where findfirst would likely be a better
> alternative, but I only included one permalink per code file.
>
>
> https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d39369365867496ce5b2aa/fifa_workspace/fifa_market_analysis/fifa_market_analysis/items.py#L325
>
> https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc67a3bc43f09d/fifa_data/fifa_data/items.py#L370
>
> https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/new_jersey.py#L82
>
> https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/connecticut.py#L182
>
> https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6034c37edd9e18abb2e0/ZhongBiao2/spiders/zhongbiao2.py#L176
>
> https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784cf2442/giscrape/spiders/people_search.py#L26
>
> https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e2921f00a0/dao/parseMarc.py#L211
>
> I'm sure there are far more examples and perhaps some more "realistic"
> ones, I only went through the first few pages of results.
>
> On Thu, Dec 5, 2019 at 3:08 PM Serhiy Storchaka 
> wrote:
>
>> 05.12.19 21:07, Guido van Rossum wrote:
>> > The case for findfirst() becomes stronger! There seem plenty of ways to
>> > get this wrong.
>>
>> I write several functions every day. There are many ways to get this
>> wrong. But I do not propose to include all these functions in the
>> stdlib. If I want to include even a single function, I try to find
>> several examples that would benefit from adding this function in the
>> stdlib. If I found less examples than I expected I withdraw my idea.
>>
>> We still do not know a use case for findfirst. If the OP would show his
>> code and several examples in others code this could be an argument for
>> usefulness of this feature.


-- 
Sebastian Kreft
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3Y4FHNORMERUZCXJS7XX2ZE4O4KIJCKN/


[Python-ideas] Re: Argumenting in favor of first()

2019-12-05 Thread Ethan Furman

On 12/05/2019 03:11 PM, Josh Rosenberg wrote:


"Also, the for-loop version quits the moment it finds a Product type, while the 
`first` version has to first process the entire jsonld_items structure."

The first version doesn't have to process the whole structure; it's written 
with a generator expression, so it only tests and produces values on demand, 
and next stops demanding them as soon as it gets a single result.


Ah, thanks.  My genexp foo is weak.  :(
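
A short sketch that makes the laziness visible by recording which items
actually get tested. The sample data and the is_product helper are made
up.

```python
examined = []

def is_product(item):
    examined.append(item['@type'])   # record every item that gets tested
    return item['@type'] == 'Product'

items = [{'@type': 'Offer'}, {'@type': 'Product'}, {'@type': 'Review'}]

# The generator expression is evaluated lazily: next() pulls items one at
# a time and stops as soon as the first one passes the test.
result = next((item for item in items if is_product(item)), {})

print(result)    # {'@type': 'Product'}
print(examined)  # ['Offer', 'Product']
```

The third item is never examined.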

--
~Ethan~
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/S5OTRK3SKWNES6FKRUZ344MALEFFWMVJ/


[Python-ideas] Re: Argumenting in favor of first()

2019-12-05 Thread Guido van Rossum
I have encountered plenty of uses for first(), usually the argument is a
list or even a dict or set that’s already computed — for sets, it’s common
that it’s known there’s only one element, the question is how to get that
element.
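
For reference, a sketch of how that is commonly written today without a
first() helper. The sample values are made up.

```python
only = {'spam'}            # a set known to hold exactly one element

# Take the first item of any iterable, with a fallback for empty input.
item = next(iter(only), None)

# Tuple unpacking also works, and raises ValueError unless there is
# exactly one element.
(same,) = only

empty_default = next(iter(set()), 'fallback')

d = {'a': 1, 'b': 2}
first_key = next(iter(d))  # dicts iterate over keys in insertion order

print(item, same, empty_default, first_key)  # spam spam fallback a
```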

On Thu, Dec 5, 2019 at 14:28 Andrew Barnert via Python-ideas <
python-ideas@python.org> wrote:

> On Dec 5, 2019, at 13:43, Juancarlo Añez  wrote:
>
>
> 
> I just found this code:
>
> def get_product_item(jsonld_items):
>     for item in jsonld_items:
>         if item['@type'] == 'Product':
>             return item
>     else:
>         return {}
>
>
> My argument is that the intent is clearer in:
>
> def get_product_item(jsonld_items):
>     return first((item for item in jsonld_items if item['@type'] ==
>                   'Product'), {})
>
>
> There’s already a first_true in the recipes that does the filtering and
> firsting together.
>
> Of course there’s the usual issue about filter vs. genexpr when your
> predicate is naturally an in-line expression rather than a function, but
> otherwise:
>
> from more_itertools import first_true
>
> def get_product_item(jsonld_items):
>     return first_true(jsonld_items, default={},
>                       pred=lambda item: item['@type'] == 'Product')
>
> Or even:
>
> get_product_item = partial(first_true, pred=lambda item: item['@type']
> == 'Product', default={})
>
> I think in this case, because of the expression-vs.-lambda issue, I’d
> write it with first and a genexpr if we had both. But a lot of itertoolsy
> code is full of lambdas and partials like this (to call filter and map, use
> as groupby and unique keys, etc.), and often the condition is something
> you’ve already wrapped up as a function in the first place, so I’m not sure
> how generally that applies without more examples.
>
> As a reminder, first()'s definition in Python is:
>
> def first(seq, default=None):
>     return next(iter(seq), default)
>
>
> It could be optimized (implemented in C) if it makes it into the stdlib.
>
>
> I don’t think it needs to be optimized. Most itertoolsy things that can be
> built by composing existing functions (like most of the recipes) are fast
> enough in Python; it’s the stuff that needs to loop and yield (like most of
> the stuff actually in the module) that gets a major improvement in C. It’s
> worth measuring rather than guessing, but my guess would be that the same
> applies here.
>
> The only problem is that right now, the entire module is in C, so anything
> that’s not optimized has to be a recipe and vice-versa. I’m pretty sure the
> idea of splitting itertools into a Python module with a C accelerator (like
> a lot of other modules in the stdlib) has come up before, but either
> there’s never a good enough use case, or just nobody volunteers to do it.
> And first might well be the candidate that’s worth it.
>
> Then again, maybe it is worth optimizing. Or maybe it’s fine to just list
> it as a recipe (the recipes docs already link to more_itertools, which
> already has it).
-- 
--Guido (mobile)
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WYGECCEPYBYJF5ALJTMMVKT45FZIB2IG/


[Python-ideas] Re: Argumenting in favor of first()

2019-12-05 Thread Andrew Barnert via Python-ideas
On Dec 5, 2019, at 13:43, Juancarlo Añez  wrote:
> 
> 
> I just found this code:
> 
> def get_product_item(jsonld_items):
>     for item in jsonld_items:
>         if item['@type'] == 'Product':
>             return item
>     else:
>         return {}
> 
> My argument is that the intent is clearer in:
> 
> def get_product_item(jsonld_items):
>     return first((item for item in jsonld_items if item['@type'] == 
>                   'Product'), {})

There’s already a first_true in the recipes that does the filtering and 
firsting together.

Of course there’s the usual issue about filter vs. genexpr when your predicate 
is naturally an in-line expression rather than a function, but otherwise:

    from more_itertools import first_true

    def get_product_item(jsonld_items):
        return first_true(jsonld_items, default={},
                          pred=lambda item: item['@type'] == 'Product')

Or even:

get_product_item = partial(first_true, pred=lambda item: item['@type'] == 
'Product', default={})

I think in this case, because of the expression-vs.-lambda issue, I’d write it 
with first and a genexpr if we had both. But a lot of itertoolsy code is full 
of lambdas and partials like this (to call filter and map, use as groupby and 
unique keys, etc.), and often the condition is something you’ve already wrapped 
up as a function in the first place, so I’m not sure how generally that applies 
without more examples.
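
A self-contained sketch of the partial() variant. To avoid depending on
the third-party more_itertools package, first_true is defined locally
here, following the itertools recipe; is_product and the sample data are
made up.

```python
from functools import partial

def first_true(iterable, default=False, pred=None):
    """First true value in iterable, else default (itertools recipe)."""
    return next(filter(pred, iterable), default)

def is_product(item):
    return item['@type'] == 'Product'

# Keyword arguments keep pred and default from being mixed up.
get_product_item = partial(first_true, pred=is_product, default={})

items = [{'@type': 'Offer'}, {'@type': 'Product', 'name': 'kettle'}]
print(get_product_item(items))  # {'@type': 'Product', 'name': 'kettle'}
print(get_product_item([]))     # {}
```

Note the parameter order in the recipe: default comes before pred, so
passing both by keyword is the less error-prone spelling.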

> As a reminder, first()'s definition in Python is:
> 
> def first(seq, default=None):
>     return next(iter(seq), default)
> 
> It could be optimized (implemented in C) if it makes it into the stdlib.

I don’t think it needs to be optimized. Most itertoolsy things that can be 
built by composing existing functions (like most of the recipes) are fast 
enough in Python; it’s the stuff that needs to loop and yield (like most of the 
stuff actually in the module) that gets a major improvement in C. It’s worth 
measuring rather than guessing, but my guess would be that the same applies 
here.
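
One way to measure rather than guess is a quick timeit comparison of a
pure-Python first() against the inlined next(iter(...)) call. This is
only a sketch: the helper is the one described in the thread, the data
size and repetition count are arbitrary, and the numbers depend on the
machine and Python version.

```python
import timeit

def first(iterable, default=None):
    # Pure-Python helper as described in the thread.
    return next(iter(iterable), default)

data = list(range(1000))

t_helper = timeit.timeit(lambda: first(data), number=100_000)
t_inline = timeit.timeit(lambda: next(iter(data), None), number=100_000)
print(f'first(): {t_helper:.4f}s  inline next(iter()): {t_inline:.4f}s')
```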

The only problem is that right now, the entire module is in C, so anything 
that’s not optimized has to be a recipe and vice-versa. I’m pretty sure the 
idea of splitting itertools into a Python module with a C accelerator (like a 
lot of other modules in the stdlib) has come up before, but either there’s 
never a good enough use case, or just nobody volunteers to do it. And first 
might well be the candidate that’s worth it.

Then again, maybe it is worth optimizing. Or maybe it’s fine to just list it as 
a recipe (the recipes docs already link to more_itertools, which already has 
it).
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EEONODVROFGOL6AE3D7F772OHZYASRZN/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Kyle Stanley
Serhiy Storchaka wrote:
> We still do not know a use case for findfirst. If the OP would show his
> code and several examples in others code this could be an argument for
> usefulness of this feature.

I'm not sure about the OP's exact use case, but using GitHub's code search
for .py files that match with "first re.findall" shows a decent amount of
code that uses the format ``re.findall()[0]``. It would be nice if GitHub's
search properly supported symbols and regular expressions, but this
presents a decent number of examples. See
https://github.com/search?l=Python=first+re.findall=Code.

I also spent some time looking for a few specific examples, since there
were a number of false positives in the above results. Note that I didn't
look much into the actual purpose of the code or judge it based on quality,
I was just looking for anything that seemed remotely practical and
contained something along the lines of ``re.findall()[0]``. Several of the
links below contain multiple lines where findfirst would likely be a better
alternative, but I only included one permalink per code file.

https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d39369365867496ce5b2aa/fifa_workspace/fifa_market_analysis/fifa_market_analysis/items.py#L325
https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc67a3bc43f09d/fifa_data/fifa_data/items.py#L370
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/new_jersey.py#L82
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab0d5e3379b04588c/connecticut.py#L182
https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6034c37edd9e18abb2e0/ZhongBiao2/spiders/zhongbiao2.py#L176
https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784cf2442/giscrape/spiders/people_search.py#L26
https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e2921f00a0/dao/parseMarc.py#L211

I'm sure there are far more examples and perhaps some more "realistic"
ones, I only went through the first few pages of results.

On Thu, Dec 5, 2019 at 3:08 PM Serhiy Storchaka  wrote:

> 05.12.19 21:07, Guido van Rossum wrote:
> > The case for findfirst() becomes stronger! There seem plenty of ways to
> > get this wrong.
>
> I write several functions every day. There are many ways to get this
> wrong. But I do not propose to include all these functions in the
> stdlib. If I want to include even a single function, I try to find
> several examples that would benefit from adding this function in the
> stdlib. If I found less examples than I expected I withdraw my idea.
>
> We still do not know a use case for findfirst. If the OP would show his
> code and several examples in others code this could be an argument for
> usefulness of this feature.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7P4ZZMIL2ZFJOONUSZPNUBOZTAAEMASY/


[Python-ideas] Re: lowercase exception names trip-up .

2019-12-05 Thread Matthias Bussonnier
Thanks for pointing those out. 

At least when the alias is `error = OtherName`, the text in the stack trace is 
informative. 
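
A sketch of that situation with a made-up exception class: the
lowercase name is only an alias, so the traceback reports the class's
real name.

```python
import traceback

class OtherName(Exception):
    """Hypothetical exception class with a conventional CapWords name."""

error = OtherName   # lowercase alias, as in the pattern discussed

try:
    raise error('boom')
except error:
    text = traceback.format_exc()

# The last traceback line names the class, not the alias.
print(text.splitlines()[-1])
print(error.__name__)  # OtherName
```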
-- 
M
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JSHM7FGTJA7CSCUVZFLKYYFMS3DU7NKB/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Serhiy Storchaka

05.12.19 21:07, Guido van Rossum wrote:
> The case for findfirst() becomes stronger! There seem plenty of ways to
> get this wrong.


I write several functions every day. There are many ways to get this 
wrong. But I do not propose to include all these functions in the 
stdlib. If I want to include even a single function, I try to find 
several examples that would benefit from adding this function to the 
stdlib. If I find fewer examples than I expected, I withdraw my idea.


We still do not know a use case for findfirst. If the OP would show his 
code and several examples in others' code, this could be an argument for 
the usefulness of this feature.

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YKVXRQAST6B7CRNN7LFBZXWVHH6G42YC/


[Python-ideas] Re: Allow user extensions to operators [related to: Moving PEP 584 forward (dict + and += operators)]

2019-12-05 Thread Brett Cannon
On Wed, Dec 4, 2019 at 3:12 PM Guido van Rossum  wrote:

> We could treat it as a kind of future statement. If there’s a top level
> statement that defines the magic identifier we generate the special
> bytecode.
>

True, that would help solve the performance issue.

But I'm still -1 on the idea regardless of the performance. :)
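
As a sketch of the alternative argued for in the quoted messages below
(define a function instead of overriding an operator), here is a
made-up example. The helper name and its clamping behaviour are purely
illustrative.

```python
def saturating_add(a, b, limit=255):
    """Hypothetical module-local helper: add, but clamp at limit."""
    return min(a + b, limit)

# The call site says what happens; "+" keeps its standard meaning.
total = saturating_add(200, 100)
plain = 200 + 100

print(total, plain)  # 255 300
```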


>
> On Wed, Dec 4, 2019 at 12:26 Brett Cannon  wrote:
>
>>
>>
>> On Tue, Dec 3, 2019 at 5:58 PM Random832  wrote:
>>
>>> On Tue, Dec 3, 2019, at 13:43, Brett Cannon wrote:
>>> > -1 from me. I can see someone not realizing an operator was changed,
>>> > assuming it's standard semantics, and then having things break subtly.
>>> > And debugging this wouldn't be fun either. To me this is monkeypatching
>>> > without an explicit need for it, i.e. if you really want different
>>> > semantics in your module then define a function and use that instead of
>>> > influence-at-a-distance overriding of syntax.
>>>
>>> Does it make a difference that it'd only apply to code that is
>>> physically in the same module where the function is defined? I'd originally
>>> planned to suggest full lexical scope for the lookup, in fact, so you could
>>> in theory do it within a single function.
>>>
>>
>> Not enough to change my opinion. Changing how fundamental operators work
>> just in a module is still influencing too far from the code site. For
>> instance, if I navigate in my editor directly into the middle of a file or
>> open a file and immediately start searching for the function I care about I
>> won't notice that "+" no longer means what I thought it meant for integers
>> because someone thought it would be smart to redefine that.
>>
>> And as Serhiy pointed out, performance is going to get slammed by this,
>> new opcode or not, as you just introduced a new lookup on every syntactic
>> operation.
>>
> --
> --Guido (mobile)
>
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/34DZCM7APSYSN2WGVP6U4RDGYYTROQ32/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Andrew Barnert via Python-ideas
On Dec 5, 2019, at 08:53, Juancarlo Añez  wrote:
> 
> 
> The proposed implementation of a findfirst() would handle many common cases, 
> and be friendly to newcomers (why do I need to deal with a Match object?), 
> especially if the semantics are those of findall():
> 
>  next(iter(findall(...)), default)

The problem with using findall instead of finditer or search is that it scans 
the whole document rather than just until the first match, and it builds a 
potentially huge list just to throw it away. It’s pretty common that one or 
both of those will be a serious performance issue. Imagine asking to find the 
first double consonant in the OED and it takes a minute to run and pins a 
gigabyte of memory.

It’s unfortunate that these functions aren’t better matched. Why is there a 
simple-semantics find-everything and a match-semantics find-iteratively and 
find-one? But I don’t think adding a simple-semantics find-one that works by 
inefficiently finding all is the right solution.

And if the point of proposing first is that novices will figure out how to 
write first(findall(…)) so we don’t need to add findfirst, then I think we need 
findfirst even more, because novices shouldn’t learn that bad idea.
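
A minimal sketch of a find-one helper that stops at the first match instead of scanning the whole input. The name findfirst() and its signature are hypothetical (no such function exists in the re module); the return convention imitates findall(): the whole match when the pattern has no groups, the group's text when it has one group, and a tuple of groups otherwise.

```python
import re


def findfirst(pattern, string, default=None, flags=0):
    """Hypothetical helper: first findall()-style result, or `default`."""
    # re.search() returns at the first match; nothing after it is scanned.
    match = re.search(pattern, string, flags)
    if match is None:
        return default
    # findall() reports groups that did not participate as '', not None.
    groups = match.groups(default="")
    if not groups:
        return match.group(0)   # no groups: the whole match
    if len(groups) == 1:
        return groups[0]        # one group: just that group's text
    return groups               # several groups: a tuple


print(findfirst(r"\d+", "abc 123 def 456"))
print(findfirst(r"(\w)=(\d)", "a=1 b=2"))
print(findfirst(r"x", "abc", default="none"))
```

With this shape, the work done depends on where the first match is, not on how many matches follow it, and no intermediate list is built.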

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7ZBFNY6I74V2DBONGSCHDEDVKHNU/


[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Andrew Barnert via Python-ideas
On Dec 5, 2019, at 08:53, Juancarlo Añez  wrote:
> 
> BTW, a common function in extensions to itertools is first():
> 
> def first(seq, default=None):
>     return next(iter(seq), default)
> 
> That function, first(), would also be a nice addition in itertools, and 
> findfirst() could be implemented using it. first() avoids most use cases 
> needing to check if a sequence or iterator is empty before using a default 
> value. MHO is that first() deals with so many common cases that it should be 
> a builtin.

I think this was proposed for itertools and rejected. I don’t remember why, but 
generally there’s resistance to adding anything that you could write yourself 
(and are unlikely to get wrong) on top of itertools and builtins, unless it 
needs to loop and yield itself (in which case it might need the performance 
boost of iterating in C instead of Python), because that’s what the recipes are 
for. And I suppose if you see the recipe for nth you don’t learn anything from 
the recipe for first.
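
For reference, the nth recipe as given in the itertools documentation, with a first() written on top of it. The first() shown here is only an illustrative sketch and is not part of itertools.

```python
from itertools import islice


def nth(iterable, n, default=None):
    "Returns the nth item or a default value."
    return next(islice(iterable, n, None), default)


def first(iterable, default=None):
    # Illustrative only: first() is just nth() with n == 0.
    return nth(iterable, 0, default)


print(first([10, 20, 30]))
print(first([], default="empty"))
```

Both functions work on any iterable, including one-shot iterators, and consume no more items than they need.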

But people seem more open to recipes being “everything useful” rather than only 
“everything useful that also teaches an important idea”, and the recipe docs 
even link to more-itertools for people looking to use them out of the box (and 
first is in more-itertools). Also, I think it’s pretty clear that people often 
don’t think of first when they need it, so even if they could write it if they 
thought of it, they don’t because they don’t.

So maybe it’s worth at least adding first as a recipe, even if people don’t 
think it’s worth adding to the module itself?

(Personally, I use first if I’ve already imported more-itertools for something 
else, but otherwise I just call next(iter(...)).)
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/H3GURL35C7AZ3ZBK6CQZGGCISUZ42WDV/


[Python-ideas] Fwd: Re: Fwd: re.findfirst()

2019-12-05 Thread Juancarlo Añez
On Wed, Dec 4, 2019 at 3:02 PM Guido van Rossum  wrote:

> Fair enough. I’ll let the OP defend his use case.
>

The OP thinks that the case for wanting just the string for a first regex
match, or a verifiable default if there is no match, is way too common,
that the advice on the web is not very good (it should be "write a
findfirst() using next() over finditer()"), and that novices default to
using findall(...)[0], which is troublesome.

The proposed implementation of a findfirst() would handle many common
cases, and be friendly to newcomers (why do I need to deal with a Match
object?), especially if the semantics are those of *findall()*:

 next(iter(findall(...)), default)

BTW, a common function in extensions to *itertools* is *first():*

def first(seq, default=None):
    return next(iter(seq), default)

That function, *first()*, would also be a nice addition in *itertools*, and
*findfirst()* could be implemented using it. *first()* avoids most use
cases needing to check if a sequence or iterator is empty before using a
default value. MHO is that *first()* deals with so many common cases that
it should be a builtin.

Note that the case for *findfirst()* is weaker if *first()* is available.
Yet *findfirst()* solves the bigger problem.
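
A small sketch of the findall(...)[0] pitfall and of how a first() helper sidesteps it. The sample strings and variable names are made up for illustration. Note that next() takes its default positionally; it accepts no keyword arguments.

```python
import re


def first(iterable, default=None):
    # The default is passed to next() positionally.
    return next(iter(iterable), default)


text = "no digits here"

# Indexing the result of findall() fails when nothing matched.
try:
    value = re.findall(r"\d+", text)[0]
except IndexError:
    value = "indexing an empty list failed"

# first() returns the default instead of raising.
safe = first(re.findall(r"\d+", text), default="0")

# Over finditer() the search is lazy and stops after the first match.
lazy = first((m.group() for m in re.finditer(r"\d+", "a1 b22")), default="0")

print(value, safe, lazy)
```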

-- 
Juancarlo *Añez*
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M2OZCN5C26YUJJ4EXLIIXHQBGF6IM5GW/


[Python-ideas] Re: Suggestion for language addition

2019-12-05 Thread Eric Fahlgren
My little experiments in 3.7 show that exception setup is about 40% more costly
than just a do-nothing loop, but actually raising and catching the exception is
about 9x more expensive than doing nothing, so there is very little cost if your
loop only rarely catches the exception (I assume you'll probably actually do
something inside the loop, which would reduce the proportional overhead of
try-setup).

no try   :   0.22s : N= 1000 : stddev=0.0
unraised :   0.30s : N= 1000 : stddev=0.5
raised   :   0.000180s : N= 1000 : stddev=0.0


from tools.timethis import timethis  # Dabeaz's timing tool.

N = 1_000
for __ in range(1_000):
    with timethis("no try"):
        for _ in range(N):
            pass

    with timethis("unraised"):
        for _ in range(N):
            try:
                pass
            except ZeroDivisionError:
                pass

    with timethis("raised"):
        for _ in range(N):
            try:
                raise ZeroDivisionError
            except ZeroDivisionError:
                pass
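
For anyone without the timethis helper, a rough equivalent using only the stdlib timeit module might look like the sketch below. The function names, the measure() helper and the repeat counts are arbitrary choices made for illustration, and absolute numbers will vary by machine and Python version.

```python
import timeit

N = 1_000


def no_try():
    for _ in range(N):
        pass


def unraised():
    # try block is entered, but no exception is ever raised.
    for _ in range(N):
        try:
            pass
        except ZeroDivisionError:
            pass


def raised():
    # An exception is raised and caught on every iteration.
    for _ in range(N):
        try:
            raise ZeroDivisionError
        except ZeroDivisionError:
            pass


def measure(func, repeat=5, number=100):
    # Best of `repeat` runs, reported as seconds per single call of func.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


timings = {f.__name__: measure(f) for f in (no_try, unraised, raised)}
for name, seconds in timings.items():
    print(f"{name:9}: {seconds:.6f}s per {N} iterations")
```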



On Wed, Dec 4, 2019 at 9:22 PM Serhiy Storchaka  wrote:

> 05.12.19 04:43, Andrew Barnert via Python-ideas wrote:
> > Yes, you have to unlearn it. Exceptions are not that expensive in Python
> (and in a lot of other modern languages)—but even if they were, you’d still
> have to deal with the fact that Python uses them pervasively. Every for
> loop ends with an exception being thrown and caught, whether you like it or
> not.
>
> Raising and catching an exception in C code is much cheaper than in
> Python code. In Python you instantiate an exception and set its
> traceback, context and cause; in C you can just store several
> thread-local pointers (most of them NULL). In Python you execute complex
> bytecode to catch an exception; in C you just read and compare a few
> pointers in the common case. StopIteration is not even raised in most cases
> internally in the C code; it is raised only when leaked to the Python code.
>
> So exceptions are still expensive in Python, even compared with other
> Python code. But outside of tight loops, or if you do some input/output,
> they are a proper tool for control flow.
>
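
The for-loop protocol discussed in the quoted text can be spelled out in pure Python. This is only a sketch of the semantics, with a hypothetical for_each() helper; as the quoted reply notes, iterators implemented in C usually signal exhaustion without instantiating an exception object.

```python
def for_each(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # Exhaustion is signalled by StopIteration, which ends the loop.
            break
        body(item)


collected = []
for_each(range(3), collected.append)
print(collected)
```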
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EKRBMHHUVFY7P2KDJRSLTAK7CWEXOJHI/