[issue18614] Enhanced \N{} escapes for Unicode strings

2013-08-01 Thread Steven D'Aprano

New submission from Steven D'Aprano:

As per the discussion here:

http://mail.python.org/pipermail/python-ideas/2013-July/022419.html

\N{} escapes should support the Unicode code point notation U+ (where there 
are four, five or six hex digits after the U+).

E.g. '\N{U+03BB}' == 'λ'

unicodedata.lookup should also support such numeric names, e.g.:

unicodedata.lookup('U+03BB') == 'λ'

As '+' is otherwise prohibited in Unicode character names, there should never 
be ambiguity between 'U+' as a code point and an actual name, and a single 
lookup function can handle both.

(See http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf#G39 for details on 
characters allowed in names.)
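A minimal sketch of the combined lookup described above, written in pure Python. The name extended_lookup and the regular expression are assumptions made for illustration; they are not part of unicodedata and are not taken from any attached patch.

```python
import re
import unicodedata

# Hypothetical pattern: 'U+' followed by four to six hex digits.
_U_NOTATION = re.compile(r'U\+([0-9A-Fa-f]{4,6})')

def extended_lookup(name):
    """Return the character for a Unicode name or for U+ notation."""
    match = _U_NOTATION.fullmatch(name)
    if match is None:
        # '+' cannot occur in a real character name, so anything that
        # is not U+ notation goes to the ordinary name lookup.
        return unicodedata.lookup(name)
    # chr() raises ValueError for values above 0x10FFFF.
    return chr(int(match.group(1), 16))
```

With this sketch, extended_lookup('U+03BB') and extended_lookup('LATIN SMALL LETTER A') both go through the same entry point, which is the behaviour the proposal asks of unicodedata.lookup.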


Also add a function for the reverse:

unicodedata.codepoint('λ') == 'U+03BB'


def codepoint(c):
    return 'U+{:04X}'.format(ord(c))

--
components: Unicode
messages: 194075
nosy: ezio.melotti, stevenjd
priority: normal
severity: normal
status: open
title: Enhanced \N{} escapes for Unicode strings
type: enhancement
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18614
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18614] Enhanced \N{} escapes for Unicode strings

2013-08-01 Thread Matthew Barnett

Matthew Barnett added the comment:

I've attached a patch for this.

--
keywords: +patch
nosy: +mrabarnett
Added file: http://bugs.python.org/file31112/issue18614.patch




[issue18614] Enhanced \N{} escapes for Unicode strings

2013-08-01 Thread Terry J. Reedy

Terry J. Reedy added the comment:

I agree with the proposal.

Some of the code seems redundant with code we already have.
In Python, I would write

def codepoint_from_U_notation(name, namelen):
  if not (4 <= namelen <= 6): raise ValueError('wrong length')
  return chr(int(name, 16))

maybe with try-except to re-write error messages like
ValueError: invalid literal for int() with base 16: '99x3'
ValueError: chr() arg not in range(0x110000)

My point is that we already have code to convert hex strings to int; I presume 
PyUnicode_FromOrdinal(code) is the C version of 'chr' that already checks the 
max value.
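The try-except variant mentioned above might look roughly like the following. This is only a sketch: the function name, the single digits argument (the text after 'U+') and the error messages are assumptions, not what the patch does.

```python
def codepoint_from_u_notation(digits):
    """Convert the hex digits after 'U+' into a one-character string."""
    if not (4 <= len(digits) <= 6):
        raise ValueError('expected 4 to 6 hex digits, got %r' % digits)
    try:
        # int() rejects non-hex text; chr() rejects values > 0x10FFFF.
        return chr(int(digits, 16))
    except ValueError:
        # Replace the int()/chr() message with one about code points.
        raise ValueError('invalid code point: U+%s' % digits) from None
```

Note that int() is more lenient than the U+ notation: with base 16 it also accepts a sign, surrounding whitespace and a '0x' prefix, so a stricter implementation would probably check the digits itself first.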

--
nosy: +terry.reedy
