[issue10459] missing character names in unicodedata (CJK...)
Martin v. Löwis mar...@v.loewis.de added the comment: For 3.2, this is now fixed in r86681. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10459 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10459] missing character names in unicodedata (CJK...)
Martin v. Löwis mar...@v.loewis.de added the comment: The patch for 3.1 is r86685. The patch for 2.7 is r86686. -- resolution: -> fixed status: open -> closed
[issue10459] missing character names in unicodedata (CJK...)
Martin v. Löwis mar...@v.loewis.de added the comment: Marc-Andre: Many of the characters you refer to actually do have names assigned, even if the names don't appear in the Unicode character database. Instead, they are specified in section 4.8 of the Unicode standard, and unicodedata.c already implements that (it just wasn't updated when the ranges changed; I will look into this). -- nosy: +loewis
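As a rough illustration of the rule referred to above: the name of a CJK unified ideograph is not stored per character but derived from the code point, as the prefix "CJK UNIFIED IDEOGRAPH-" followed by the code point in uppercase hex. The sketch below assumes Python 3; derived_cjk_name is a hypothetical helper written for this note, not a function of the unicodedata module, and it does not check that the code point lies in an ideograph range.

```python
import unicodedata


def derived_cjk_name(code_point):
    """Build the derived name for a CJK unified ideograph.

    Hypothetical helper for illustration only; it performs no range check.
    """
    return "CJK UNIFIED IDEOGRAPH-%04X" % code_point


# U+4E00 is the first code point of the CJK Unified Ideographs block,
# so the module's answer can be compared with the derived name.
print(unicodedata.name(chr(0x4E00)) == derived_cjk_name(0x4E00))

# The example name from the original report.
print(derived_cjk_name(0x2A700))
```

Whether unicodedata.name() returns such a name for a given code point depends on the ranges compiled into the module, which is the subject of this issue.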
[issue10459] missing character names in unicodedata (CJK...)
Marc-Andre Lemburg m...@egenix.com added the comment:

Martin v. Löwis wrote:
> Martin v. Löwis mar...@v.loewis.de added the comment:
>
> Marc-Andre: Many of the characters you refer to actually do have names assigned, even if the names don't appear in the Unicode character database. Instead, they are specified in section 4.8 of the Unicode standard, and unicodedata.c already implements that (it just wasn't updated when the ranges changed; I will look into this).

Thanks for pointing this out. I wasn't aware of there being a standard for constructing names for CJK ideograph ranges.

--
[issue10459] missing character names in unicodedata (CJK...)
New submission from Vlastimil Brom vlastimil.b...@gmail.com:

I just noticed an omission of some character names in the unicodedata module. These are some CJK ideographs:

龼 (0x9fbc) - 鿋 (0x9fcb) (CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])
꜀ (0x2a700) - 뜴 (0x2b734) (CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])
띀 (0x2b740) - 렝 (0x2b81d) (CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])

The names are probably to be generated, e.g. CJK UNIFIED IDEOGRAPH-2A700 ... etc.

(Tested with the recompiled unicodedata using Unicode 6.0. With the Python 2.7 builtin module (unidata_version: '5.2.0') only the first two ranges are relevant, as CJK Unified Ideographs Extension D is an addition of Unicode 6.)

(Also there are the unprintable ASCII controls, surrogates and private use areas, where the missing names are probably OK.)

I tested with the following rather clumsy code:

# # # # # # # # # # # # # # # #
# wide_unichr = custom unichr emulating unicode ranges beyond 0xFFFF on narrow python build
import unicodedata

codepoints_missing_char_names = [[-2, -2],]  # dummy first entry, skipped when printing
for i in xrange(0x10FFFF+1):
    # skip the 'C' categories (controls, surrogates, private use, unassigned)
    if unicodedata.category(wide_unichr(i))[:1] != 'C' and unicodedata.name(wide_unichr(i), u"??noname??") == u"??noname??":
        if codepoints_missing_char_names[-1][1] == i-1:
            # extend the current run of nameless code points
            codepoints_missing_char_names[-1][1] = i
        else:
            codepoints_missing_char_names.append([i, i])

for first, last in codepoints_missing_char_names[1:]:
    print u"%s (%s) - %s (%s)" % (wide_unichr(first), hex(first), wide_unichr(last), hex(last),)
# # # # # # # # # # # # # # # # # # # # # # # # # #

Unfortunately, I can't provide a fix, as unicodedata involves C code, where my knowledge is near zero.

vbr

-- messages: 121521 nosy: vbr priority: normal severity: normal status: open title: missing character names in unicodedata (CJK...)
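For readers on Python 3, where chr() covers the full code point range and no wide_unichr emulation is needed, the same check can be sketched as below. This is an assumed port of the idea in the report, not the reporter's code; unnamed_ranges is a hypothetical name, and what it prints depends on the Unicode version built into the interpreter.

```python
import unicodedata


def unnamed_ranges(start=0, stop=0x110000):
    """Return [first, last] runs of code points that are outside the
    'C' general categories but for which unicodedata.name() has no name.

    Hypothetical helper; start and stop delimit the half-open scan range.
    """
    ranges = []
    for cp in range(start, stop):
        ch = chr(cp)
        if unicodedata.category(ch)[0] == "C":
            continue  # controls, surrogates, private use, unassigned
        if unicodedata.name(ch, None) is not None:
            continue  # the module knows a name for this character
        if ranges and ranges[-1][1] == cp - 1:
            ranges[-1][1] = cp  # extend the current run
        else:
            ranges.append([cp, cp])  # start a new run
    return ranges


for first, last in unnamed_ranges():
    print("%s - %s" % (hex(first), hex(last)))
```

Printing only the hexadecimal bounds avoids encoding problems on terminals that cannot display the characters themselves.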
[issue10459] missing character names in unicodedata (CJK...)
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti
[issue10459] missing character names in unicodedata (CJK...)
Changes by Vlastimil Brom vlastimil.b...@gmail.com: -- components: +Library (Lib), Unicode type: -> behavior
[issue10459] missing character names in unicodedata (CJK...)
Marc-Andre Lemburg m...@egenix.com added the comment:

Vlastimil Brom wrote:
> New submission from Vlastimil Brom vlastimil.b...@gmail.com:
>
> I just noticed an omission of some character names in the unicodedata module. These are some CJK ideographs:
>
> 龼 (0x9fbc) - 鿋 (0x9fcb) (CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])
> ꜀ (0x2a700) - 뜴 (0x2b734) (CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])
> 띀 (0x2b740) - 렝 (0x2b81d) (CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])
>
> The names are probably to be generated, e.g. CJK UNIFIED IDEOGRAPH-2A700 ... etc.

I don't think we should fill those rather big ranges with generated names, unless there's a standard for this. There are quite a few ranges in the Unicode database that are assigned, but don't have a literal name associated with them.

-- nosy: +lemburg