[issue27496] unicodedata.name() doesn't have names for control characters

2021-03-08 Thread STINNER Victor


Change by STINNER Victor:


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2021-02-26 Thread Eryk Sun


Change by Eryk Sun:


--
versions: +Python 3.10, Python 3.8, Python 3.9 -Python 2.7, Python 3.5, Python 
3.6


2016-07-12 Thread Eryk Sun

Changes by Eryk Sun:


--
versions: +Python 2.7, Python 3.6


2016-07-12 Thread Eryk Sun

Changes by Eryk Sun:


--
components: +Unicode
nosy: +ezio.melotti, haypo


2016-07-12 Thread Zack Weinberg

Zack Weinberg added the comment:

It looks to me as if NameAliases.txt is the better reference for the C0 and C1 
controls.  It matches the UnicodeData.txt field 10 names for most entries where 
the field 1 name is "<control>", but it also has names for U+0080, U+0081, U+0084, 
and U+0099, which have no field 10 name.  The only catch is that NameAliases 
may have *several* names for the same character, with the same category tag, 
e.g.

0009;CHARACTER TABULATION;control
0009;HORIZONTAL TABULATION;control

It probably makes sense to consistently use the first listed.
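A minimal sketch of the "first listed" rule, assuming NameAliases.txt's semicolon-separated "code point;alias;type" layout. The helper first_control_aliases is hypothetical (it is not part of makeunicodedata.py). The sample holds the two TAB entries quoted above plus one abbreviation-type line in the same format, added only to exercise the type filter.

```python
def first_control_aliases(lines):
    """Map code point -> the first alias of type 'control' (hypothetical helper)."""
    names = {}
    for raw in lines:
        # Strip comments and surrounding whitespace; skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        cp = int(code, 16)
        # Keep only the first control alias seen for each code point.
        if kind == "control" and cp not in names:
            names[cp] = alias
    return names


sample = [
    "0009;CHARACTER TABULATION;control",
    "0009;HORIZONTAL TABULATION;control",
    "0009;HT;abbreviation",  # illustrative line, filtered out by type
]
print(first_control_aliases(sample))  # {9: 'CHARACTER TABULATION'}
```

Because the dictionary is only written when the code point is not yet present, file order decides which alias wins, which is the consistency property suggested above.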

--


2016-07-12 Thread Eryk Sun

Eryk Sun added the comment:

Character names are in field 1 of UnicodeData.txt [1][2]. For controls the name 
is just "<control>". In Tools/unicode/makeunicodedata.py, the makeunicodename 
function skips names that start with "<". Instead of skipping the character, it 
could fall back on the Unicode 1.0 name (field 10), if it's defined. For 
controls, this is the ISO 6429 name:

(10) Old name as published in Unicode 1.0 or ISO 6429 names 
for control functions. This field is empty unless it is 
significantly different from the current name for the 
character. No longer used in code chart production. See 
Name_Alias. 
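The suggested fallback can be sketched for a single UnicodeData.txt record. This is an illustration, not the actual makeunicodename code: display_name is a hypothetical helper, and the three sample records are written in the documented 15-field format for demonstration (check them against the real file before relying on them).

```python
def display_name(record):
    """Return field 1 (Name), else field 10 (Unicode 1.0 name), else None."""
    fields = record.rstrip("\n").split(";")
    name, old_name = fields[1], fields[10]
    # Real names are used as-is; "<...>" placeholders such as "<control>" are not.
    if not name.startswith("<"):
        return name
    # Fall back on the Unicode 1.0 / ISO 6429 name when one is present.
    return old_name or None


records = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;",
    "0080;<control>;Cc;0;BN;;;;;N;;;;;",
]
for r in records:
    print(r.split(";")[0], display_name(r))
```

U+0080 has an empty field 10, so the fallback alone still yields no name for it.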

The names of control characters are also in NameAliases.txt, which gets 
processed as the unicode.aliases list of (name, char) tuples.

[1]: http://www.unicode.org/reports/tr44/#UnicodeData.txt
[2]: http://www.unicode.org/Public/8.0.0/ucd
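For comparison, the alias data already serves the reverse direction: since Python 3.3, unicodedata.lookup() accepts name aliases, while name() does not consult them. A quick check, assuming Python 3.3 or later:

```python
import unicodedata

# Name -> character: lookup() consults the alias table.
tab = unicodedata.lookup("CHARACTER TABULATION")
print(repr(tab))  # '\t'

# Character -> name: name() has no entry for the control character.
try:
    unicodedata.name(tab)
except ValueError as exc:
    print("name() failed:", exc)

# Supplying a default avoids the exception but still gives no name.
print(unicodedata.name(tab, None))  # None
```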

--
nosy: +eryksun


2016-07-12 Thread R. David Murray

R. David Murray added the comment:

That information is programmatically generated from data files obtained from the 
Unicode project, as far as I know.

--
nosy: +r.david.murray


2016-07-12 Thread Zack Weinberg

New submission from Zack Weinberg:

unicodedata.name() doesn't have name information for the C0 and C1 control 
characters.  To see this, run

import pprint
import unicodedata
pprint.pprint(["U+{:04X} {}".format(n, unicodedata.name(chr(n), "<no name>"))
               for n in range(256)])

and you will observe <no name> printed for U+0000 through U+001F and U+007F 
through U+009F.  These characters do have official Unicode names and they 
should be known to name().
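The observation can also be checked mechanically instead of by reading the pprint output. A small sketch, not part of the original report, assuming Python 3 and its bundled unicodedata module:

```python
import unicodedata

# Code points below 256 for which name() has no entry.
nameless = [n for n in range(256) if unicodedata.name(chr(n), None) is None]

c0 = list(range(0x00, 0x20))  # C0 controls U+0000..U+001F
c1 = list(range(0x7F, 0xA0))  # U+007F DELETE plus C1 controls U+0080..U+009F
print(nameless == c0 + c1)  # True
print(len(nameless))        # 65
```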

I may see if I can come up with a patch for this one, in my copious free time.

--
components: Library (Lib)
messages: 270242
nosy: zwol
priority: normal
severity: normal
status: open
title: unicodedata.name() doesn't have names for control characters
type: behavior
versions: Python 3.5
