Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:

The behaviour is technically correct, but confusing and unfortunate, and I 
don't think we can fix it.

Unicode does not define names for the ASCII (C0) control characters, or for the 
C1 controls. But it does define aliases for them, based on the C0 and C1 control 
character standards.

unicodedata.lookup() looks for aliases as well as names (since Python 3.3).

https://www.unicode.org/Public/UNIDATA/UnicodeData.txt
https://www.unicode.org/Public/UNIDATA/NameAliases.txt
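A quick check with the standard unicodedata module shows the asymmetry: lookup() 
accepts both the full alias and the abbreviation for U+0000, while name() has 
nothing to return. This is only a minimal sketch; the exact wording of the 
ValueError message may differ between Python versions.

```python
import unicodedata

# lookup() resolves both the full alias and the abbreviated alias (3.3+).
full = unicodedata.lookup("NULL")
abbrev = unicodedata.lookup("NUL")
print(repr(full), repr(abbrev))

# name() only reports official names, and U+0000 has none.
try:
    unicodedata.name("\0")
except ValueError as exc:
    print("name() raised ValueError:", exc)

# Passing a default suppresses the exception instead.
print(unicodedata.name("\0", None))
```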

It is unfortunate that we have only a single function for looking up a Unicode 
code point by name, alias, alias abbreviation, and named sequence. That keeps 
the API simple, but in corner cases like this it leads to confusion.

The obvious "fix" is to make name() return the alias if there is no official 
name to return, but that is a change in behaviour. I have code that assumes 
that C0 and C1 control characters have no name, and relies on name() raising an 
exception for them.
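For illustration, here is the kind of code that depends on the current 
behaviour. The describe() helper is hypothetical, not from any library: it treats 
"name() raised" as the cue to fall back to code point notation. If name() started 
returning aliases, the fallback branch would silently stop running for control 
characters.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: label a single character for display."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        # Relies on name() raising for nameless code points, which today
        # include the C0 and C1 control characters.
        if unicodedata.category(ch) == "Cc":
            return f"<control U+{ord(ch):04X}>"
        return f"<unnamed U+{ord(ch):04X}>"

for ch in "A\x00\x85\ue000":
    print(describe(ch))
```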

Even if we changed the behaviour to return the alias, which alias should be 
returned, the full alias or the abbreviation?

This doesn't fix the problem that name() and lookup() aren't inverses of each 
other:

lookup('NUL') -> '\0'  # using the abbreviated alias
name('\0') -> 'NULL'  # if name() returned the full alias (or vice versa)
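One way to make the non-inverse relationship concrete is a small round-trip 
check. The round_trips() function below is hypothetical; it asks whether 
name(lookup(label)) gives back the original label. With today's behaviour it 
succeeds for ordinary names and fails for aliases, full or abbreviated, and for 
named sequences.

```python
import unicodedata

def round_trips(label: str) -> bool:
    """Hypothetical check: does name(lookup(label)) return label?"""
    ch = unicodedata.lookup(label)
    if len(ch) != 1:
        return False  # named sequences: name() only accepts one character
    return unicodedata.name(ch, None) == label

for label in ("LATIN SMALL LETTER A", "NULL", "NUL"):
    print(label, round_trips(label))
```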

It gets worse with named sequences:

>>> from unicodedata import lookup, name
>>> c = lookup('LATIN CAPITAL LETTER A WITH MACRON AND GRAVE')
>>> name(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: name() argument 1 must be a unicode character, not str
>>> len(c)
2

So we cannot possibly make name() and lookup() inverses of each other.
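A named sequence is just an ordinary str of several code points, so name() 
still works on each piece individually. A short sketch:

```python
import unicodedata

seq = unicodedata.lookup("LATIN CAPITAL LETTER A WITH MACRON AND GRAVE")
# name() rejects the whole sequence, but each code point has its own name.
parts = [unicodedata.name(ch) for ch in seq]
print(len(seq), parts)
```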

What we really should have had is separate functions for name and alias 
lookups, or, better still, the raw Unicode tables exposed as mappings so that 
people could build their own higher-level interfaces.
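As a rough illustration of the "raw tables as mappings" idea, the sketch below 
parses lines in the NameAliases.txt format (code point;alias;type, with # 
comments) into a dict from character to its (alias, type) pairs. The function 
name parse_name_aliases is hypothetical, and the few sample lines are embedded 
so it runs offline; a real version would read the whole file from the URL 
above. With such a mapping, each caller can pick the alias type it wants, which 
sidesteps the "full alias or abbreviation?" question.

```python
from collections import defaultdict

# Excerpt of lines in NameAliases.txt format, embedded for offline use.
SAMPLE = """\
# code point;alias;type
0000;NULL;control
0000;NUL;abbreviation
0085;NEXT LINE;control
0085;NEL;abbreviation
"""

def parse_name_aliases(text: str) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical parser: map each character to its (alias, type) pairs."""
    aliases: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(";")
        aliases[chr(int(code, 16))].append((alias, kind))
    return dict(aliases)

table = parse_name_aliases(SAMPLE)
# The caller decides which kind of alias it wants.
abbreviations = [alias for alias, kind in table["\0"] if kind == "abbreviation"]
print(table["\0"], abbreviations)
```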

----------
nosy: +steven.daprano

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46947>
_______________________________________
_______________________________________________
Python-bugs-list mailing list