This is just bar talk at this point.  I think we've shown that this is
easy enough to do that programmers can roll their own.

But as idle chat goes, note that in your code:

   set(unicodedata.category(ch) for ch in s)

If `s` is a billion characters long, then we make a billion calls to
the `.category()` function.  Python function calls are comparatively
expensive, even on well-optimized data structures like strings.
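
To make the call count concrete, here is a minimal sketch that tallies
the calls explicitly.  The helper name `count_category_calls` and the
sample string are made up for illustration; they are not from the
thread.

```python
import unicodedata

def count_category_calls(s):
    """Per-character approach: one category() call for every character of s."""
    calls = 0
    cats = set()
    for ch in s:
        cats.add(unicodedata.category(ch))
        calls += 1
    return cats, calls

# Only five distinct characters, but the call count tracks the full length.
cats, calls = count_category_calls("$1.00 " * 1000)
print(len(cats), calls)
```

The number of distinct categories stays tiny while the number of calls
grows linearly with `len(s)`.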

In my version:

    bool(set(s) & set(unicode_categories['Sc']))

The billion characters are first reduced to a smallish set of hundreds
or thousands of distinct characters without needing method calls. Then
that is intersected with a smallish set of characters in the category.
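
For anyone following along, a rough sketch of how that lookup could be
set up.  I'm assuming `unicode_categories` is a dict mapping each
category code to the characters in it; the builder function and
`has_category` are illustrative names, not an existing API.

```python
import sys
import unicodedata
from collections import defaultdict

def build_unicode_categories():
    """Map each general category code (e.g. 'Sc') to the set of characters in it."""
    table = defaultdict(set)
    for codepoint in range(sys.maxunicode + 1):
        ch = chr(codepoint)
        table[unicodedata.category(ch)].add(ch)
    return dict(table)

# One-time scan of every code point; afterwards each query is pure set work.
unicode_categories = build_unicode_categories()

def has_category(s, cat):
    """True if at least one character of s belongs to category `cat`."""
    return bool(set(s) & unicode_categories[cat])

print(has_category("price: $5", "Sc"))
print(has_category("hello", "Sc"))
```

The table costs one pass over all code points up front, so this only
pays off when it is reused across many queries or very long strings.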

You could optimize your version, however, simply by using:

   set(unicodedata.category(ch) for ch in set(s))

Yours provides more information, since it lists all the categories.
But if you REALLY only care about one category, then you still have to
ask `'Sc' in set(unicodedata.category(ch) for ch in set(s))`, which
is fine; that's not a hard question to ask.
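
If anyone wants to measure rather than guess, here is a small sketch
comparing the per-character version with the one that deduplicates
first.  The function names and the sample string are my own
placeholders, and the timings will vary by machine and input.

```python
import timeit
import unicodedata

def categories_per_char(s):
    """One category() call per character of s."""
    return set(unicodedata.category(ch) for ch in s)

def categories_deduped(s):
    """Reduce s to its distinct characters first, then one call per distinct character."""
    return set(unicodedata.category(ch) for ch in set(s))

# A long string with few distinct characters (illustrative only).
sample = "The price is $5 or €4. " * 10_000

# Both versions must report the same categories.
assert categories_per_char(sample) == categories_deduped(sample)

for fn in (categories_per_char, categories_deduped):
    seconds = timeit.timeit(lambda: fn(sample), number=3)
    print(f"{fn.__name__}: {seconds:.4f}s for 3 runs")
```

For short strings the extra `set(s)` may not pay for itself, which is
the "typical strings" caveat from earlier in the thread.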

On Fri, Jun 2, 2023 at 5:36 PM Chris Angelico <ros...@gmail.com> wrote:
>
> On Sat, 3 Jun 2023 at 07:28, David Mertz, Ph.D. <david.me...@gmail.com> wrote:
> >
> > Sure. That's fine. With sufficiently long strings my code is faster, but
> > for "typical" strings yours will be.
>
> Really? How? Your code has to build a set of every character in the
> string; mine builds a set of every category in the string. Set
> intersection won't be slower for a smaller set.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/5C7WSPFDJ4A6LRHL67N7UFPXGU4KI56O/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5XXPVXLWZQXEQW7B35QIPXHJK7G4N6X7/
