Alexander Belopolsky added the comment:
about the problems you mentioned in msg144836, can you report
it in a new issue or, if there are already issues about them,
add a message there?
I believe that would be #4610.
--
nosy: +belopolsky
superseder: - Unicode case mappings are
Martin v. Löwis mar...@v.loewis.de added the comment:
LGTM
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12753
___
Roundup Robot devn...@psf.upfronthosting.co.za added the comment:
New changeset a985d733b3a3 by Ezio Melotti in branch 'default':
#12753: Add support for Unicode name aliases and named sequences.
http://hg.python.org/cpython/rev/a985d733b3a3
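
As an editorial illustration of what this changeset enables, here is a minimal sketch, assuming a Python version that includes the change (3.3 or later). The specific names used are examples taken from the Unicode NameAliases.txt and NamedSequences.txt files, not from the commit message itself.

```python
import unicodedata

# A formal alias resolves to the same character as the original UCD name.
gha = unicodedata.lookup("LATIN CAPITAL LETTER GHA")   # alias
oi = unicodedata.lookup("LATIN CAPITAL LETTER OI")     # original name
print(gha == oi, hex(ord(gha)))

# The \N{...} escape accepts the alias as well.
escaped = "\N{LATIN CAPITAL LETTER GHA}"
print(escaped == gha)

# A named sequence resolves to a string of more than one code point.
seq = unicodedata.lookup("LATIN SMALL LETTER R WITH TILDE")
print([hex(ord(c)) for c in seq])
```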
--
nosy: +python-dev
Ezio Melotti ezio.melo...@gmail.com added the comment:
I committed the patch and the buildbots seem happy. Thanks for the report and
the feedback!
Tom, about the problems you mentioned in msg144836, can you report it in a new
issue or, if there are already issues about them, add a message
Roundup Robot devn...@psf.upfronthosting.co.za added the comment:
New changeset 329b96fe4472 by Ezio Melotti in branch 'default':
#12753: fix compilation on Windows.
http://hg.python.org/cpython/rev/329b96fe4472
--
Ezio Melotti ezio.melo...@gmail.com added the comment:
If the latest patch is fine I'll commit it shortly.
--
stage: patch review -> commit review
Tom Christiansen tchr...@perl.com added the comment:
Yes, it looks good. Thank you very much.
-tom
--
Changes by Florent Xicluna florent.xicl...@gmail.com:
--
nosy: +flox
Martin v. Löwis mar...@v.loewis.de added the comment:
If you don't use git-style diffs, Rietveld will much better accommodate patches
that don't apply to tip cleanly. Unfortunately, hg git-style diffs don't
indicate the base revision, so Rietveld guesses that the base line is tip, and
then
Changes by Ezio Melotti ezio.melo...@gmail.com:
Removed file: http://bugs.python.org/file23355/issue12753-4.diff
Changes by Ezio Melotti ezio.melo...@gmail.com:
Added file: http://bugs.python.org/file23365/issue12753-4.diff
Changes by Ezio Melotti ezio.melo...@gmail.com:
Removed file: http://bugs.python.org/file23365/issue12753-4.diff
Changes by Ezio Melotti ezio.melo...@gmail.com:
Added file: http://bugs.python.org/file23374/issue12753-4.diff
Ezio Melotti ezio.melo...@gmail.com added the comment:
(I had to re-upload the patch a couple of times to get the review button to
work. Apparently if there are some conflicts Rietveld fails to apply the
patch, whereas hg is able to merge files without problems here. Sorry for the
noise.)
Ezio Melotti ezio.melo...@gmail.com added the comment:
Here is a new patch that stores the names of aliases and named sequences in the
Private Use Area.
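
The idea of storing extra names against Private Use Area code points can be sketched roughly as follows. This is only an illustration under stated assumptions: the base code points, the tiny tables, and the `resolve` helper are all hypothetical and are not taken from the patch.

```python
import unicodedata

# Hypothetical bases inside a private use plane; the values used by the
# actual patch are not given in this thread.
ALIASES_START = 0xF0000
NAMED_SEQUENCES_START = 0xF0200

# Tiny illustrative tables. Real data would come from NameAliases.txt and
# NamedSequences.txt.
ALIASES = [
    ("LATIN CAPITAL LETTER GHA", 0x01A2),
    ("LATIN SMALL LETTER GHA", 0x01A3),
]
NAMED_SEQUENCES = [
    ("LATIN SMALL LETTER R WITH TILDE", "r\u0303"),
]

# Each extra name is stored against a private-use placeholder code point,
# so the ordinary name table can hold it like any other character name.
NAME_TO_PUA = {}
for i, (name, _) in enumerate(ALIASES):
    NAME_TO_PUA[name] = ALIASES_START + i
for i, (name, _) in enumerate(NAMED_SEQUENCES):
    NAME_TO_PUA[name] = NAMED_SEQUENCES_START + i


def resolve(name, allow_sequences=True):
    """Resolve a name to a string, translating private-use placeholders."""
    code = NAME_TO_PUA.get(name)
    if code is None:
        # Ordinary character name; raises KeyError if unknown.
        return unicodedata.lookup(name)
    if ALIASES_START <= code < ALIASES_START + len(ALIASES):
        # Placeholder for an alias: map back to the real code point.
        return chr(ALIASES[code - ALIASES_START][1])
    if not allow_sequences:
        # Callers that need a single character reject named sequences.
        raise KeyError(name)
    return NAMED_SEQUENCES[code - NAMED_SEQUENCES_START][1]
```

The `allow_sequences` flag in this sketch stands in for the distinction between a lookup that may return several code points and one that must return exactly one character.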
To summarize a bit, this is what we want:
        | 6.0.0 | 3.2.0 |
--------+-------+-------+
\N{...} |   A   |   -   |
.name   |   -   |   -
Tom Christiansen tchr...@perl.com added the comment:
Ezio Melotti rep...@bugs.python.org wrote
on Sun, 09 Oct 2011 13:21:00 -:
Here is a new patch that stores the names of aliases and named
sequences in the Private Use Area.
Looks good! Thanks!
--tom
--
title: \N{...}
Martin v. Löwis mar...@v.loewis.de added the comment:
There are no official English titling rules and as you noted,
publishers vary.
If there aren't any rules, then how come all book and movie titles always
look the same? :)
Can we please leave the English language out of this issue?
Ezio Melotti ezio.melo...@gmail.com added the comment:
The patch is pretty much complete, it just needs a review (I left some comments
on the review page).
One thing that can be added is some compression for the names of the named
sequences. I'm not sure I can reuse the same compression used
Martin v. Löwis mar...@v.loewis.de added the comment:
The patch needs to take versioning into account. It seems that NamedSequences
were added in 4.1, and NameAliases in 5.0. So for the moment, when using 3.2
(i.e. when self is not NULL), it is fine to look up neither. Please put an
assertion
Tom Christiansen tchr...@perl.com added the comment:
Ezio Melotti rep...@bugs.python.org wrote
on Mon, 03 Oct 2011 04:15:51 -:
But it still has to happen at compile time, of course, so I don't know
what you could do in Python. Is there any way to change how the compiler
behaves even
Martin v. Löwis mar...@v.loewis.de added the comment:
The main underlying problem is that the internal macros are defined in a
way that made sense a long time ago, but no longer do ever since (for
example) the Unicode lowercase property stopped being synonymous with
GC=Ll and started also
Ezio Melotti ezio.melo...@gmail.com added the comment:
The problem with official names is that they have things in them that
you would not expect in names. Do you really and truly mean to tell
me you think it is somehow **good** that people are forced to write
\N{LINE FEED (LF)}
Ezio Melotti ezio.melo...@gmail.com added the comment:
Attached a new patch with more tests and doc.
--
Added file: http://bugs.python.org/file23291/issue12753-3.diff
Tom Christiansen tchr...@perl.com added the comment:
Ezio Melotti rep...@bugs.python.org wrote
on Sun, 02 Oct 2011 06:46:26 -:
Actually Python doesn't seem to support \N{LINE FEED (LF)}, most likely because
that's a Unicode 1 name, and nowadays these codepoints are simply marked
Terry J. Reedy tjre...@udel.edu added the comment:
Really? White space makes things harder to read? I thought Pythonistas
believed the opposite of that.
I was surprised at that too ;-). One person's opinion in a specific
context. Don't generalize.
English titling rules
only capitalize
Tom Christiansen tchr...@perl.com added the comment:
Really? White space makes things harder to read? I thought Pythonistas
believed the opposite of that.
I was surprised at that too ;-). One person's opinion in a specific
context. Don't generalize.
The example I initially showed
Ezio Melotti ezio.melo...@gmail.com added the comment:
But it still has to happen at compile time, of course, so I don't know
what you could do in Python. Is there any way to change how the compiler
behaves even vaguely along these lines?
I think things like from __future__ import ... do
Martin v. Löwis mar...@v.loewis.de added the comment:
Does that sound fine?
Yes, that's fine as well.
--
title: \N{...} neglects formal aliases and named sequences from Unicode
charnames namespace -> \N{...} neglects formal aliases and named sequences from
Unicode charnames namespace
Martin v. Löwis mar...@v.loewis.de added the comment:
You may wish unicode.name() to return the alias in preference, however.
-1. .name() is documented (and users familiar with it expect it) as
returning the name of the character from the UCD.
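
A minimal sketch of the behaviour described here, assuming a Python version with alias support (3.3 or later): a character found through an alias still reports its original UCD name from `unicodedata.name()`. The alias chosen is just an example.

```python
import unicodedata

# Look the character up through its formal alias...
ch = unicodedata.lookup("LATIN CAPITAL LETTER GHA")

# ...and ask for its name: the UCD name comes back, not the alias.
ucd_name = unicodedata.name(ch)
print(hex(ord(ch)), ucd_name)
```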
It doesn't really matter much to me if it's
Tom Christiansen tchr...@perl.com added the comment:
Perl does not provide the old 1.0 names at all. We don't have a Unicode
1.0 legacy to support, which makes this cleaner. However, we do provide
for the names of the C0 and C1 Control Codes, because apart from Unicode
1.0, they don't
Ezio Melotti ezio.melo...@gmail.com added the comment:
The attached patch changes Tools/unicode/makeunicodedata.py to create a list of
names and codepoints taken from
http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt and adds it to
Modules/unicodename_db.h.
During the lookup the
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
assignee:  -> ezio.melotti
stage: needs patch -> patch review
Martin v. Löwis mar...@v.loewis.de added the comment:
I propose to use a better lookup algorithm using binary search, and then
integrate the NamedSequences into this as well. The search result could be a
record
struct {
char *name;
int len;
Py_UCS4 chars[3]; /* no sequence is more
Ezio Melotti ezio.melo...@gmail.com added the comment:
Leaving named sequences for unicodedata.lookup() only (and not for \N{}) makes
sense.
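
A rough sketch of that split, assuming a Python version with the feature (3.3 or later): `unicodedata.lookup()` returns the multi-code-point string for a named sequence, while a `\N{...}` escape with the same name is rejected at compile time. The helper `escape_is_accepted` is hypothetical and exists only for this illustration.

```python
import unicodedata

SEQ_NAME = "LATIN SMALL LETTER R WITH TILDE"


def escape_is_accepted(name):
    """Return True if a string literal using \\N{name} compiles."""
    source = '"\\N{%s}"' % name
    try:
        compile(source, "<escape-check>", "eval")
    except SyntaxError:
        return False
    return True


# lookup() may return more than one code point for a named sequence.
looked_up = unicodedata.lookup(SEQ_NAME)
print([hex(ord(c)) for c in looked_up])

# The escape form accepts an alias but not a named sequence.
print(escape_is_accepted("LATIN CAPITAL LETTER GHA"))
print(escape_is_accepted(SEQ_NAME))
```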
The list of aliases is so small (11 entries) that I'm not sure using a binary
search for it would bring any advantage. Having a single lookup algorithm
Tom Christiansen tchr...@perl.com added the comment:
Ezio Melotti ezio.melo...@gmail.com added the comment:
Leaving named sequences for unicodedata.lookup() only (and not for
\N{}) makes sense.
There are certainly advantages to that strategy: you don't have to
deal with [\N{sequence}]
Ezio Melotti ezio.melo...@gmail.com added the comment:
Attached a new patch that adds support for named sequences (still needs some
test and can probably be improved).
There are certainly advantages to that strategy: you don't have to
deal with [\N{sequence}] issues.
I assume with [] you
Guido van Rossum gu...@python.org added the comment:
+1 on the feature request.
--
nosy: +gvanrossum
Terry J. Reedy tjre...@udel.edu added the comment:
I verified that the test file raises the quoted SyntaxError on 3.2 on Win7.
This:
\N{LATIN CAPITAL LETTER GHA}
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in
position 0-27: unknown Unicode character name
is most
Tom Christiansen tchr...@perl.com added the comment:
Terry J. Reedy rep...@bugs.python.org wrote
on Fri, 19 Aug 2011 22:50:58 -:
My current opinion is that adding the aliases might be done in current
releases. It certainly would serve any user who does not know to
misspell
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
For the Line_Break property, one of the possible values is Inseparable,
with 2 permitted aliases, the shorter IN (which is reasonable) and
Inseperable (ouch!).
--
Tom Christiansen tchr...@perl.com added the comment:
Matthew Barnett rep...@bugs.python.org wrote
on Fri, 19 Aug 2011 23:36:45 -:
For the Line_Break property, one of the possible values is
Inseparable, with 2 permitted aliases, the shorter IN (which
is reasonable) and Inseperable
New submission from Tom Christiansen tchr...@perl.com:
Unicode character names share a common namespace with formal aliases and with
named sequences, but Python recognizes only the original name. That means not
everything in the namespace is accessible from Python. (If this is construed
to
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
components: +Unicode
nosy: +ezio.melotti
stage:  -> test needed
versions: -Python 2.7, Python 3.1, Python 3.2
Tom Christiansen tchr...@perl.com added the comment:
Here’s the right test file for the right ticket.
--
Added file: http://bugs.python.org/file22903/nametests.py