On Sat, 3 Jun 2023 at 10:12, David Mertz, Ph.D. <david.me...@gmail.com> wrote:
>
> Let's call the styles a tie. Using the SOWPODS scrabble wordlist (no
> currency symbols, so False answer):
>
> >>> unicode_currency = {chr(c) for c in range(0xFFFF) if unicodedata.category(chr(c)) == "Sc"}
> >>> wordlist = open('/usr/local/share/sowpods').read()
> >>> len(wordlist)
> 2707021
> >>> %timeit any(unicodedata.category(ch) == "Sc" for ch in wordlist)
> 176 ms ± 1.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
> >>> %timeit any(unicodedata.category(ch) == "Sc" for ch in set(wordlist))
> 17.8 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
> >>> bool(set(wordlist) & unicode_currency)
> False
> >>> %timeit bool(set(wordlist) & unicode_currency)
> 18 ms ± 216 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> Of course, this is a small character set of 26 lowercase letters (and
> newline as I did it). A more diverse alphabet might tip the timing
> slightly, but it's going to be a small matter either way.
>
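The quoted session depends on IPython's %timeit magic and a local copy of the SOWPODS wordlist. Below is a rough, self-contained sketch of the same three checks, written as plain functions that take a string and timed with the standard timeit module. It is an illustration under stated assumptions, not code from either message. Several choices belong to this sketch alone: the function names, the synthetic sample text, and the scan of the full code-point range (sys.maxunicode rather than range(0xFFFF)).

```python
import sys
import timeit
import unicodedata

# Precomputed table of every currency symbol (general category "Sc").
# Assumption of this sketch: scan the full code-point range rather than
# range(0xFFFF), so symbols above U+FFFF are included as well.
CURRENCY = frozenset(
    chr(c)
    for c in range(sys.maxunicode + 1)
    if unicodedata.category(chr(c)) == "Sc"
)


def has_currency_any(text: str) -> bool:
    """Check each character in order; stop at the first currency symbol."""
    return any(unicodedata.category(ch) == "Sc" for ch in text)


def has_currency_set_any(text: str) -> bool:
    """Deduplicate first, so category() runs once per distinct character.

    Building set(text) still has to walk the entire string.
    """
    return any(unicodedata.category(ch) == "Sc" for ch in set(text))


def has_currency_intersection(text: str) -> bool:
    """Intersect the distinct characters with the precomputed table."""
    return bool(set(text) & CURRENCY)


if __name__ == "__main__":
    # Hypothetical sample: about 2.7 million characters of lowercase letters
    # and newlines (roughly the size of the quoted wordlist), no currency.
    text = "abcdefghijklmnopqrstuvwxyz\n" * 100_000
    early = "$" + text  # same text with a currency symbol at the very start

    runs = 5
    for label, fn, sample in [
        ("any, no match", has_currency_any, text),
        ("any, early match", has_currency_any, early),
        ("set + any, no match", has_currency_set_any, text),
        ("set + any, early match", has_currency_set_any, early),
        ("intersection, no match", has_currency_intersection, text),
        ("intersection, early match", has_currency_intersection, early),
    ]:
        # The lambda is called inside this iteration, so fn/sample are current.
        secs = timeit.timeit(lambda: fn(sample), number=runs) / runs
        print(f"{label:>26}: {secs * 1000:8.3f} ms per call")
```

When a match appears near the start of the string, has_currency_any can return almost immediately, because any() stops at the first True. Both set-based variants still pay for building set(text) from the whole string. When there is no match, every variant has to see every character. Absolute numbers depend on the machine and Python version, so none are claimed here.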
Remember though, the original request was not for a set, but for a string. Try your timing again when working with a string. The any() form is almost certainly the most effective, although I suppose it could be implemented in C for better performance (avoiding calling back into Python repeatedly). Not sure it's necessary though.

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TOAR5FT3MDIEZFBVT7YGR6CTZ2JKCZCQ/
Code of Conduct: http://python.org/psf/codeofconduct/