[issue18234] Unicodedata module should provide access to codepoint aliases

2014-10-11 Thread flying sheep
flying sheep added the comment:
I don't know whether it came with Unicode 7.0, but there is a clarification:
# Note that currently the only instances of multiple aliases of the same
# type for a single code point are either of type control or abbreviation.
# An alias of type abbreviation can, in principle, be

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 23.06.2013 22:43, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> unicodedata.name() was discussed in #12353 (msg144739) where MvL argued that misspelled names are better than corrected because they are more likely to appear

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Perhaps unicodedata.aliases() should return not a list, but an ordered dict. Which name should the namereplace error handler use: the original or the corrected one? Should it use the first alias if there is no original name?

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 24.06.2013 10:05, Serhiy Storchaka wrote:
> Serhiy Storchaka added the comment:
> Perhaps unicodedata.aliases() should return not a list, but an ordered dict. What name should use the namereplace error handler? Original or corrected? Should it use

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment:
MAL> Please leave the function as it is, i.e. a 1-1 mapping to the
MAL> official, non-changing Unicode name reference (including
MAL> spelling errors, etc). Same with code points that have no name.
Since we have code points with no name - it is not 1-1

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 24.06.2013 16:35, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> MAL> Please leave the function as it is, i.e. a 1-1 mapping to the
> MAL> official, non-changing Unicode name reference (including
> MAL> spelling errors, etc). Same

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Here is an example of prior art that is relevant to this discussion: charnames::viacode(code) .. As mentioned above under ALIASES, Unicode 6.1 defines extra names (synonyms or aliases) for some code points, most of which were already available as Perl

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 24.06.2013 16:58, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> Here is an example of prior art that is relevant to this discussion: charnames::viacode(code) .. As mentioned above under ALIASES, Unicode 6.1 defines extra

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Alexander Belopolsky
Alexander Belopolsky added the comment:
> The .aliases() function would have to return a list, not a single name, so a parameter would cause the return type to change, which is not a good idea.
You misunderstood my proposal. .name() will still return a single name, but the type parameter

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Martin v . Löwis
Martin v. Löwis added the comment: But some of these types could still have lists as values, no? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18234 ___

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment:
On 24.06.2013 18:10, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> > The .aliases() function would have to return a list, not a single name, so a parameter would cause the return type to change, which is not a good idea.
> You

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: Rather than adding a new method to unicodedata, what do you think about adding a "type" keyword argument to unicodedata.name()? It could default to "canonical" and accept values such as "control", "abbreviation", etc. See also #12753.
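A minimal sketch of what such a keyword could look like, written as a pure-Python wrapper. The function below and its small hand-copied alias table are hypothetical and exist only for illustration; unicodedata.name() itself takes no such argument. The sketch returns the first alias of the requested type, which makes the question raised elsewhere in the thread visible: a character such as LINE FEED has several aliases of the same type.

```python
import unicodedata

# Hand-copied subset of formal name aliases, for illustration only.
_ALIASES = {
    "\x1b": {"control": ["ESCAPE"], "abbreviation": ["ESC"]},
    "\n": {
        "control": ["LINE FEED", "NEW LINE", "END OF LINE"],
        "abbreviation": ["LF", "NL", "EOL"],
    },
}


def name(char, type="canonical", default=None):
    """Hypothetical variant of unicodedata.name() with a `type` keyword.

    The parameter is called `type` only to mirror the proposal; it shadows
    the builtin inside this function.
    """
    if type == "canonical":
        found = unicodedata.name(char, None)
    else:
        # Only the first alias of the requested type is returned, even
        # though a character may have several of the same type.
        candidates = _ALIASES.get(char, {}).get(type, [])
        found = candidates[0] if candidates else None
    if found is None:
        if default is None:
            raise ValueError("no such name")
        return default
    return found


print(name("\x1b", type="control"))
print(name("\n", type="abbreviation"))
print(name("A"))
```

Keeping the return type a single string is what the proposal asks for, at the cost of hiding the remaining aliases of the same type.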

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Can a character or sequence have multiple aliases? What will the result type of unicodedata.name() be with the "abbreviation" keyword value?

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment:
> Can a character or sequence have multiple aliases?
Yes, for example, most control characters have two aliases (and no name).
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
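As a rough illustration of how records in this format could back a list-returning lookup, here is a small sketch. The helper names parse_aliases() and aliases() are assumptions made for this example and are not part of unicodedata; the sample data is limited to the first few NameAliases.txt records for the control characters discussed here.

```python
from collections import defaultdict

# A few records in the NameAliases.txt format: code point; alias; type.
SAMPLE = """\
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
"""


def parse_aliases(text):
    """Map each character to a list of (alias, type) pairs, in file order."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(";")
        table[chr(int(code, 16))].append((alias, kind))
    return dict(table)


ALIASES = parse_aliases(SAMPLE)


def aliases(char):
    """Hypothetical helper: return the (alias, type) pairs for `char`."""
    return ALIASES.get(char, [])


print(aliases("\x00"))
print(aliases("\x01"))
```

Returning pairs keeps the alias type available to the caller, so nothing from the data file is lost.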

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: unicodedata.name() was discussed in #12353 (msg144739) where MvL argued that misspelled names are better than corrected because they are more likely to appear misspelled in other sources. I am not sure I buy this argument. Someone googling for

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-23 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: I mistyped the issue reference above; it should be #12753, not #12353.

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-20 Thread Martin v . Löwis
Martin v. Löwis added the comment: I think the best way would be to provide a function unicodedata.aliases, returning a list of names for a given character or sequence.

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-20 Thread Alexander Belopolsky
Alexander Belopolsky added the comment: UCD provides more than just a list of aliases: formal name aliases have a type (control, abbreviation, etc.). See http://www.unicode.org/Public/UNIDATA/NameAliases.txt.

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-17 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +benjamin.peterson, ezio.melotti, lemburg, loewis, serhiy.storchaka

[issue18234] Unicodedata module should provide access to codepoint aliases

2013-06-16 Thread Alexander Belopolsky
New submission from Alexander Belopolsky:
Python is aware of Unicode codepoint aliases, but unicodedata does not provide a way to find aliases of a given codepoint:
>>> ucd.lookup('ESCAPE') == '\N{ESCAPE}'
True
>>> ucd.lookup('RS') == '\N{RS}'
True
but
>>> ucd.name('\N{ESCAPE}')
Traceback (most
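The asymmetry described in this report can be reproduced with a short script. This is a sketch that assumes Python 3.3 or later, where lookup() resolves formal name aliases; the "<unnamed>" fallback string is an arbitrary choice for the example.

```python
import unicodedata as ucd

# lookup() accepts formal aliases, including abbreviations.
esc = ucd.lookup("ESCAPE")
rs = ucd.lookup("RS")
print(hex(ord(esc)), hex(ord(rs)))

# The reverse direction fails: control characters have no canonical name,
# and name() does not fall back to their aliases.
try:
    ucd.name(esc)
except ValueError as exc:
    print("name() failed:", exc)

# A default value avoids the exception but still gives no alias.
print(ucd.name(esc, "<unnamed>"))
```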