[issue18614] Enhanced \N{} escapes for Unicode strings
New submission from Steven D'Aprano:

As per the discussion here:

http://mail.python.org/pipermail/python-ideas/2013-July/022419.html

\N{} escapes should support the Unicode code point notation U+ (where there are four, five or six hex digits after the U+). E.g.:

    '\N{U+03BB}' == 'λ'

unicodedata.lookup should also support such numeric names, e.g.:

    unicodedata.lookup('U+03BB') == 'λ'

As '+' is otherwise prohibited in Unicode character names, there should never be ambiguity between 'U+' as a code point and an actual name, and a single lookup function can handle both. (See http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf#G39 for details on characters allowed in names.)

Also add a function for the reverse:

    unicodedata.codepoint('λ') == 'U+03BB'

    def codepoint(c):
        return 'U+{:04X}'.format(ord(c))

--
components: Unicode
messages: 194075
nosy: ezio.melotti, stevenjd
priority: normal
severity: normal
status: open
title: Enhanced \N{} escapes for Unicode strings
type: enhancement
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18614
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
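The behaviour requested in this submission can be approximated in pure Python as a rough sketch. The names `lookup_with_codepoints` and `_U_PLUS` are hypothetical and chosen only for illustration; `codepoint` is the function proposed in the message, not an existing `unicodedata` function. The sketch assumes that exactly four to six hex digits must follow `U+`, as described in the proposal.

```python
import re
import unicodedata

# Hypothetical pattern: 'U+' followed by four to six hex digits.
_U_PLUS = re.compile(r'U\+([0-9A-Fa-f]{4,6})')


def lookup_with_codepoints(name):
    """Resolve either 'U+XXXX' notation or a regular Unicode character name.

    Hypothetical wrapper, not part of unicodedata.
    """
    match = _U_PLUS.fullmatch(name)
    if match:
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(match.group(1), 16))
    # Fall back to the existing name lookup (raises KeyError if unknown).
    return unicodedata.lookup(name)


def codepoint(c):
    """Reverse direction, as written in the proposal."""
    return 'U+{:04X}'.format(ord(c))
```

Because '+' cannot appear in a character name, checking for the `U+` form before falling back to `unicodedata.lookup` should not shadow any real name.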
[issue18614] Enhanced \N{} escapes for Unicode strings
Matthew Barnett added the comment:

I've attached a patch for this.

--
keywords: +patch
nosy: +mrabarnett
Added file: http://bugs.python.org/file31112/issue18614.patch
[issue18614] Enhanced \N{} escapes for Unicode strings
Terry J. Reedy added the comment:

I agree with the proposal. Some of the code seems redundant with code we already have. In Python, I would write

    def codepoint_from_U_notation(name, namelen):
        # name holds the hex digits that follow 'U+'
        if not (4 <= namelen <= 6):
            raise ValueError('wrong length')
        return chr(int(name, 16))

maybe with try-except to re-write error messages like

    ValueError: invalid literal for int() with base 16: '99x3'
    ValueError: chr() arg not in range(0x110000)

My point is that we already have code to convert hex strings to int; I presume PyUnicode_FromOrdinal(code) is the C version of 'chr' that already checks the max value.

--
nosy: +terry.reedy
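The try-except variant suggested in this comment might look like the sketch below. The function name `codepoint_from_u_notation` and the replacement error messages are illustrative assumptions, not taken from the attached patch; the argument is assumed to be the hex digits that follow `U+`.

```python
def codepoint_from_u_notation(digits):
    """Convert the hex digits after 'U+' to a character.

    Hypothetical helper; the message texts are made up for illustration.
    """
    if not (4 <= len(digits) <= 6):
        raise ValueError('expected 4 to 6 hex digits, got %d' % len(digits))
    try:
        # Reuse the existing hex-string parsing.
        code = int(digits, 16)
    except ValueError:
        raise ValueError('invalid hex digits in code point: %r' % digits) from None
    try:
        # chr() already rejects values above 0x10FFFF.
        return chr(code)
    except ValueError:
        raise ValueError('code point out of range: U+%s' % digits.upper()) from None
```

One caveat of reusing `int(digits, 16)`: it also accepts forms such as a leading sign or a `0x` prefix, so a stricter implementation would probably validate the digits before converting.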