[Python-ideas] Unicode Name Aliases keyword argument abbreviation in unicodedata.name for missing names

Robert Vanden Eynde Wed, 11 Jul 2018 19:06:39 -0700

unicodedata.name raises ValueError for a few Unicode characters like '\0' or 
'\n'. Although the documentation is very clear about this behaviour, it is 
often not what people want, i.e. a string describing the character.
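As a minimal illustration of the current behaviour (the helper name `describe` 
is made up for this example and is not part of unicodedata):

```python
import unicodedata


def describe(ch, fallback=None):
    """Return the Unicode name of ch, or fallback when it has none."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        # CPython raises ValueError for characters without a name.
        return fallback


print(describe("a"))    # a named character
print(describe("\0"))   # a control character: falls back to None
# The existing optional second argument avoids the exception entirely.
print(unicodedata.name("\n", "<no name>"))
```

The existing default argument already avoids the exception, but it cannot 
produce a descriptive string on its own.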


Since Python 3.3, name aliases are accepted on the lookup side: 
unicodedata.lookup('NULL') works and '\N{NULL}' == '\N{NUL}'.
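A short check of the alias support on the lookup side, assuming Python 3.3 or 
later (the helper `lookup_all` is only there for the example):

```python
import unicodedata


def lookup_all(names):
    """Resolve each name or alias to its character."""
    return [unicodedata.lookup(n) for n in names]


# Both the "control" alias and the abbreviation resolve to U+0000.
print(lookup_all(["NULL", "NUL"]) == ["\x00", "\x00"])
# The same aliases work in \N{...} string escapes.
print("\N{NULL}" == "\N{NUL}")
```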

One could expect lookup(name(x)) == x to hold for all Unicode characters, but 
this property doesn't hold because of the few characters that do not have a 
name (mainly control characters).
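The failing round trip can be made visible with a small sketch; 
`roundtrip_failures` is a hypothetical helper, and only the ASCII range is 
scanned here to keep the example quick:

```python
import unicodedata


def roundtrip_failures(code_points):
    """Return the code points for which lookup(name(x)) == x does not hold."""
    failures = []
    for cp in code_points:
        ch = chr(cp)
        try:
            ok = unicodedata.lookup(unicodedata.name(ch)) == ch
        except (ValueError, KeyError):
            # ValueError: name() found no name; KeyError: lookup() failed.
            ok = False
        if not ok:
            failures.append(cp)
    return failures


ascii_failures = roundtrip_failures(range(0x80))
print([hex(cp) for cp in ascii_failures])
```

In the ASCII range the failures are exactly the C0 control characters and 
DELETE (U+007F).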

Raising ValueError is however still useful when the code point is unassigned 
or belongs to a Unicode version newer than the one Python ships.

In NameAliases.txt (https://www.unicode.org/Public/6.3.0/ucd/NameAliases.txt) 
one can see that some characters have multiple aliases, so there are multiple 
ways to map a character to a name.

I propose adding a keyword argument to unicodedata.name that would select one 
of a few useful fallback behaviours when the character has no name.

One simple behaviour would be to choose the name from the "abbreviation" 
aliases. Currently all characters that have an abbreviation, except three, 
have exactly one, so that would be a good pick. I'd imagine 
name('\x00', abbreviation=True) == 'NUL'.

The three characters in NameAliases.txt that have more than one abbreviation 
are:

'\n' with ['LF', 'NL', 'EOL']
'\t' with ['HT', 'TAB']
'\ufeff' with ['BOM', 'ZWNBSP']
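
The list above can be derived from the file itself. The sketch below parses a 
hand-copied excerpt in the `code point;alias;type` layout of NameAliases.txt; 
the excerpt is deliberately incomplete, and the names `EXCERPT` and 
`abbreviations` are assumptions made for the example:

```python
from collections import defaultdict

# Hand-copied excerpt, not the full file.
EXCERPT = """\
# code point;alias;type
0000;NULL;control
0000;NUL;abbreviation
0009;CHARACTER TABULATION;control
0009;HT;abbreviation
0009;TAB;abbreviation
000A;LINE FEED;control
000A;LF;abbreviation
000A;NL;abbreviation
000A;EOL;abbreviation
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def abbreviations(text):
    """Map each character to its abbreviation aliases, in file order."""
    result = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        code, alias, kind = line.split(";")
        if kind == "abbreviation":
            result[chr(int(code, 16))].append(alias)
    return dict(result)


multi = {ch: names for ch, names in abbreviations(EXCERPT).items()
         if len(names) > 1}
print(multi)
```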

In case multiple abbreviations exist, one could take the one first introduced 
in Unicode (for backward compatibility across Python versions). If there is a 
tie, one could take the first in the list. If the character has no name and no 
abbreviation, unicodedata.name raises an error or returns default as usual.
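
A rough pure-Python sketch of the proposed behaviour, under stated 
assumptions: the `ABBREVIATIONS` table is a hand-written subset, the wrapper 
`name` and its `abbreviation` keyword are hypothetical, and the tie-break is 
simplified to "first in the list" because the version in which each alias was 
introduced is not recorded here:

```python
import unicodedata

# Hypothetical, hand-written subset of the "abbreviation" aliases.
ABBREVIATIONS = {
    "\x00": ["NUL"],
    "\t": ["HT", "TAB"],
    "\n": ["LF", "NL", "EOL"],
}

_MISSING = object()  # sentinel so that None remains a valid default


def name(ch, default=_MISSING, *, abbreviation=False):
    """Like unicodedata.name, with an optional abbreviation fallback."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        if abbreviation and ch in ABBREVIATIONS:
            return ABBREVIATIONS[ch][0]  # simplified tie-break: first listed
        if default is not _MISSING:
            return default
        raise


# With the fallback enabled, the round trip holds for these characters.
for ch in "\x00\t\n":
    assert unicodedata.lookup(name(ch, abbreviation=True)) == ch
```

Characters that already have a name are unaffected, and without the keyword 
the wrapper behaves like the existing function.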

lookup(name(x)) == x for all x is natural, isn't it?

