flying sheep added the comment:
I don't know whether it came with Unicode 7.0, but there is a clarification:
# Note that currently the only instances of multiple aliases of the same
# type for a single code point are either of type control or abbreviation.
# An alias of type abbreviation can, in principle, be
Marc-Andre Lemburg added the comment:
On 23.06.2013 22:43, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> unicodedata.name() was discussed in #12353 (msg144739) where MvL argued that
> misspelled names are better than corrected because they are more likely to
> appear
Serhiy Storchaka added the comment:
Perhaps unicodedata.aliases() should return an ordered dict rather than a list.
Which name should the namereplace error handler use, the original or the
corrected one? Should it use the first alias if there is no original name?
--
Marc-Andre Lemburg added the comment:
On 24.06.2013 10:05, Serhiy Storchaka wrote:
> Serhiy Storchaka added the comment:
> Perhaps unicodedata.aliases() should return not a list, but an ordered dict.
> What name should use the namereplace error handler? Original or corrected?
> Should it use
Alexander Belopolsky added the comment:
MAL> Please leave the function as it is, i.e. a 1-1 mapping to the
MAL> official, non-changing Unicode name reference (including
MAL> spelling errors, etc). Same with code points that have no name.
Since we have code points with no name - it is not 1-1
Marc-Andre Lemburg added the comment:
On 24.06.2013 16:35, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> MAL> Please leave the function as it is, i.e. a 1-1 mapping to the
> MAL> official, non-changing Unicode name reference (including
> MAL> spelling errors, etc). Same
Alexander Belopolsky added the comment:
Here is an example of prior art that is relevant to this discussion:
charnames::viacode(code)
..
As mentioned above under ALIASES, Unicode 6.1 defines extra names (synonyms or
aliases) for some code points, most of which were already available as Perl
Marc-Andre Lemburg added the comment:
On 24.06.2013 16:58, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> Here is an example of prior art that is relevant to this discussion:
> charnames::viacode(code)
> ..
> As mentioned above under ALIASES, Unicode 6.1 defines extra
Alexander Belopolsky added the comment:
> The .aliases() function would have to return a list, not a single
> name, so a parameter would cause the return type to change, which
> is not a good idea.
You misunderstood my proposal. .name() will still return a single name, but
the type parameter
Martin v. Löwis added the comment:
But some of these types could still have lists as values, no?
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18234
___
Marc-Andre Lemburg added the comment:
On 24.06.2013 18:10, Alexander Belopolsky wrote:
> Alexander Belopolsky added the comment:
> > The .aliases() function would have to return a list, not a single
> > name, so a parameter would cause the return type to change, which
> > is not a good idea.
> You
Alexander Belopolsky added the comment:
Rather than adding a new method to unicodedata, what do you think about adding
a type keyword argument to unicodedata.name()? It can default to canonical
and have possible values control, abbreviation, etc.
See also #12753.
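
A minimal sketch of what such a keyword might look like, written as a wrapper around the existing unicodedata.name(). The wrapper, its `type` parameter, and the `_ALIASES` table are hypothetical and exist only for illustration; nothing like this is part of unicodedata. The table holds a few entries as they appear in NameAliases.txt.

```python
import unicodedata

# Hypothetical alias table keyed by (code point, alias type). A real
# implementation would be generated from NameAliases.txt.
_ALIASES = {
    (0x0000, 'control'): 'NULL',
    (0x0000, 'abbreviation'): 'NUL',
    (0x001B, 'control'): 'ESCAPE',
    (0x001B, 'abbreviation'): 'ESC',
}

_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def name(char, default=_MISSING, *, type='canonical'):
    """Hypothetical variant of unicodedata.name() with a `type` keyword."""
    if type == 'canonical':
        # Defer to the real function for the official character name.
        if default is _MISSING:
            return unicodedata.name(char)
        return unicodedata.name(char, default)
    try:
        return _ALIASES[(ord(char), type)]
    except KeyError:
        if default is _MISSING:
            raise ValueError('no such name') from None
        return default


print(name('\x1b', type='control'))
print(name('\x1b', type='abbreviation'))
print(name('A'))
```

The sketch stores exactly one name per (code point, type) pair. Whether that holds for every alias type is the open question in this thread, so a real design would have to settle it first.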
--
Serhiy Storchaka added the comment:
Can a character or sequence have multiple aliases? What would the result type
of unicodedata.name() be with the abbreviation keyword value?
--
Alexander Belopolsky added the comment:
> Can a character or sequence have multiple aliases?
Yes, for example, most control characters have two aliases (and no name).
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
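
As a rough illustration of how records in this layout (code point;alias;type) group into several typed aliases per code point, here is a small parsing sketch. The function name `parse_aliases` and the embedded sample are made up for this example, and the first two entries are taken to belong to code point 0000, as in NameAliases.txt.

```python
from collections import defaultdict

# Sample lines in the NameAliases.txt layout: code point;alias;type
SAMPLE = """\
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
"""


def parse_aliases(text):
    """Map each code point to a list of (alias, type) pairs, in file order."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(';')
        table[int(code, 16)].append((alias, kind))
    return dict(table)


aliases = parse_aliases(SAMPLE)
for code, entries in sorted(aliases.items()):
    print('U+%04X' % code, entries)
```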
Alexander Belopolsky added the comment:
unicodedata.name() was discussed in #12353 (msg144739) where MvL argued that
misspelled names are better than corrected because they are more likely to
appear misspelled in other sources. I am not sure I buy this argument.
Someone googling for
Alexander Belopolsky added the comment:
I mistyped the issue reference above; it should be #12753, not #12353.
--
Martin v. Löwis added the comment:
I think the best way would be to provide a function unicodedata.aliases,
returning a list of names for a given character or sequence.
--
Alexander Belopolsky added the comment:
UCD provides more than just a list of aliases: each formal name alias has a
type (control, abbreviation, etc.). See
http://www.unicode.org/Public/UNIDATA/NameAliases.txt.
--
Changes by Antoine Pitrou pit...@free.fr:
--
nosy: +benjamin.peterson, ezio.melotti, lemburg, loewis, serhiy.storchaka
New submission from Alexander Belopolsky:
Python is aware of Unicode code point aliases, but unicodedata does not provide
a way to find the aliases of a given code point:
>>> ucd.lookup('ESCAPE') == '\N{ESCAPE}'
True
>>> ucd.lookup('RS') == '\N{RS}'
True
but
>>> ucd.name('\N{ESCAPE}')
Traceback (most
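
The asymmetry described in this report can be reproduced with a short script. This is only a sketch. It assumes Python 3.3 or later, where lookup() resolves formal aliases, and it assumes `ucd` stands for `import unicodedata as ucd`.

```python
import unicodedata as ucd

# lookup() resolves formal aliases, both control names and abbreviations.
esc = ucd.lookup('ESCAPE')
rs = ucd.lookup('RS')
print(hex(ord(esc)), hex(ord(rs)))

# name() has no entry for the same code point, so it raises ValueError
# unless a default is supplied.
try:
    ucd.name(esc)
except ValueError as exc:
    print('ValueError:', exc)

print(ucd.name(esc, '<no name>'))
```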