Steven D'Aprano added the comment:
Here's my implementation:
from unicodedata import name
from unicodedata import lookup as _lookup
from fnmatch import translate
from re import compile, I

_NAMES = None

def getnames():
    # Build the list of every assigned character name once, then cache it.
    global _NAMES
    if _NAMES is None:
        _NAMES = []
        for i in range(0x110000):
            s = name(chr(i), '')
            if s:
                _NAMES.append(s)
    return _NAMES

def lookup(name_or_glob):
    # Any glob metacharacter means "return every matching name";
    # otherwise defer to unicodedata.lookup() and return the character.
    if any(c in name_or_glob for c in '*?['):
        match = compile(translate(name_or_glob), flags=I).match
        return [n for n in getnames() if match(n)]
    else:
        return _lookup(name_or_glob)
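To see what the glob branch does in isolation, here is a small standalone sketch. The three names are a hand-picked sample standing in for the full getnames() list. fnmatch.translate() turns a shell-style pattern into a regular expression, and re.IGNORECASE (the long form of re.I) lets a lowercase glob match the UPPERCASE names.

```python
import re
from fnmatch import translate

# fnmatch.translate() converts a shell-style glob into a regex string that
# is anchored at the end; re.match() anchors it at the start, so the whole
# name has to match. re.IGNORECASE is the same flag as re.I.
glob = 'greek small letter alpha*'
match = re.compile(translate(glob), flags=re.IGNORECASE).match

# A small hand-picked sample standing in for the full getnames() list.
sample = [
    'GREEK SMALL LETTER ALPHA',
    'GREEK SMALL LETTER ALPHA WITH TONOS',
    'GREEK SMALL LETTER BETA',
]
matches = [n for n in sample if match(n)]
print(matches)  # ['GREEK SMALL LETTER ALPHA', 'GREEK SMALL LETTER ALPHA WITH TONOS']
```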
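The major limitation of my implementation is that globs don't match name aliases
or named sequences. unicodedata.lookup() has accepted both since Python 3.3, but
getnames() only collects the canonical names returned by unicodedata.name(), so
the glob branch never sees them.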
http://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt
http://www.unicode.org/Public/11.0.0/ucd/NamedSequences.txt
For example:
lookup('TAMIL SYLLABLE TAA?') # NamedSequence
ought to return ['தா'] but doesn't.
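One possible way for the glob branch to cover aliases and named sequences as well is to parse the two data files linked above and run the same pattern over their names. This is a hypothetical sketch: the helper names are made up, it assumes the semicolon-separated layouts `code point;alias;type` (NameAliases.txt) and `name;code points` (NamedSequences.txt) with `#` comments, and it uses a few inline sample lines instead of downloading the files.

```python
import re
from fnmatch import translate

# Hypothetical helpers for the UCD data files. Assumed line layouts
# (anything after '#' is a comment):
#   NameAliases.txt:     <code point>;<alias>;<type>
#   NamedSequences.txt:  <name>;<code point> <code point> ...

def _data_lines(lines):
    """Yield the non-empty, comment-stripped lines of a UCD file."""
    for line in lines:
        line = line.split('#', 1)[0].strip()
        if line:
            yield line

def parse_name_aliases(lines):
    """Map each alias to the single character it names."""
    aliases = {}
    for line in _data_lines(lines):
        code, alias, _kind = (field.strip() for field in line.split(';'))
        aliases[alias] = chr(int(code, 16))
    return aliases

def parse_named_sequences(lines):
    """Map each sequence name to the string of code points it names."""
    sequences = {}
    for line in _data_lines(lines):
        seq_name, codes = (field.strip() for field in line.split(';'))
        sequences[seq_name] = ''.join(chr(int(c, 16)) for c in codes.split())
    return sequences

def glob_table(glob, table):
    """Return the (name, string) pairs whose name matches the glob."""
    match = re.compile(translate(glob), flags=re.IGNORECASE).match
    return [(n, s) for n, s in table.items() if match(n)]

# Inline sample lines in the assumed format, in place of the real files.
ALIAS_LINES = ['# comment', '000A;LINE FEED;control', '000A;LF;abbreviation']
SEQUENCE_LINES = ['TAMIL SYLLABLE TAA;0BA4 0BBE', 'TAMIL SYLLABLE TAI;0BA4 0BC8']

aliases = parse_name_aliases(ALIAS_LINES)
sequences = parse_named_sequences(SEQUENCE_LINES)
print(glob_table('line*', aliases))
print(glob_table('TAMIL SYLLABLE TA?', sequences))
```

The matches from these tables could then be appended to the list built from getnames(), so a single glob reports canonical names, aliases and sequences together.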
Parts of the Unicode documentation use the convention that canonical names are
in UPPERCASE, aliases are lowercase, and sequences are in Mixed Case. I think
we should follow that convention:
http://www.unicode.org/charts/aboutcharindex.html
That makes it easy to see which names are canonical and which aren't.
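A minimal sketch of that display convention, assuming each result carries a kind label ('canonical', 'alias' or 'sequence'; both the labels and the display_name() helper are hypothetical) and reading "Mixed Case" as Python's str.title():

```python
def display_name(name, kind):
    """Format a character name by kind, following the proposed convention.

    kind is a hypothetical label: 'canonical', 'alias' or 'sequence'.
    """
    if kind == 'canonical':
        return name.upper()      # canonical names stay in UPPERCASE
    if kind == 'alias':
        return name.lower()      # aliases in lowercase
    if kind == 'sequence':
        return name.title()      # named sequences in Mixed (title) Case
    raise ValueError(f'unknown kind: {kind!r}')

print(display_name('GREEK SMALL LETTER ALPHA', 'canonical'))  # GREEK SMALL LETTER ALPHA
print(display_name('LINE FEED', 'alias'))                     # line feed
print(display_name('TAMIL SYLLABLE TAA', 'sequence'))         # Tamil Syllable Taa
```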
----------
_______________________________________
Python tracker
<https://bugs.python.org/issue35549>
_______________________________________