[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Daniel Urban

Changes by Daniel Urban urban.dani...@gmail.com:


--
nosy: +durban

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12732
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever




[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Ouch!
Do the rejected characters qualify as identifier characters as defined in 
Reference 2.3 Identifiers and keywords?
http://docs.python.org/py3k/reference/lexical_analysis.html#identifiers
If some interpreter version accepts extra characters beyond the definition (as 
happened in 2.x), it is not a bug for another version to accept only what 
is defined.

Side question: That section has "A non-normative HTML file listing all valid 
identifier characters for Unicode 4.1 can be found at 
http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html." Is the set of 
identifier characters now larger, and if so, has the table been enlarged?
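
One way to look into the side question locally, as a rough sketch rather than an authoritative answer: the standard `unicodedata` module reports which Unicode version the running interpreter was built against, and `str.isidentifier()` can be used to count the code points that may start an identifier. The helper name `count_identifier_starts` is made up for this illustration, and on a narrow build `sys.maxunicode` stops at 0xFFFF, so astral characters would not be counted there.

```python
import sys
import unicodedata


def count_identifier_starts(limit=sys.maxunicode):
    """Count code points up to `limit` that are valid one-character identifiers.

    A one-character string is an identifier exactly when that character
    may start an identifier, so this counts identifier-start characters
    for the Unicode version bundled with this interpreter.
    """
    return sum(1 for cp in range(limit + 1) if chr(cp).isidentifier())


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    print("identifier-start code points:", count_identifier_starts())
```

Comparing the count across interpreter versions gives a quick indication of whether the set has grown since the Unicode 4.1 table was generated.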

--
nosy: +haypo, lemburg, loewis, terry.reedy
stage:  -> needs patch




[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Matthew Barnett

Changes by Matthew Barnett pyt...@mrabarnett.plus.com:


--
nosy: +mrabarnett




[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Terry J. Reedy rep...@bugs.python.org wrote
   on Fri, 12 Aug 2011 23:05:27 -: 

> Ouch!
>
> Do the rejected characters qualify as identifier characters as defined
> in Reference 2.3 Identifiers and keywords?
>
> http://docs.python.org/py3k/reference/lexical_analysis.html#identifiers

Yes, that's right, they do.  You're using the standard IDS and IDC, and
XIDS and XIDC, definitions.  Here were the three identifiers that were
a problem:

𝔘𝔫𝔦𝔠𝔬𝔡𝔢 = super
𐐔𐐯𐑅𐐨𐑉𐐯𐐻 = Deseret
𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂 = Gothic our father

If you cannot read those, then when piped through `uniquote -v` they are:

\N{MATHEMATICAL FRAKTUR CAPITAL U}\N{MATHEMATICAL FRAKTUR SMALL N}\N{MATHEMATICAL FRAKTUR SMALL I}\N{MATHEMATICAL FRAKTUR SMALL C}\N{MATHEMATICAL FRAKTUR SMALL O}\N{MATHEMATICAL FRAKTUR SMALL D}\N{MATHEMATICAL FRAKTUR SMALL E} = super
\N{DESERET CAPITAL LETTER DEE}\N{DESERET SMALL LETTER SHORT E}\N{DESERET SMALL LETTER ES}\N{DESERET SMALL LETTER LONG I}\N{DESERET SMALL LETTER ER}\N{DESERET SMALL LETTER SHORT E}\N{DESERET SMALL LETTER TEE} = Deseret
\N{GOTHIC LETTER AHSA}\N{GOTHIC LETTER TEIWS}\N{GOTHIC LETTER TEIWS}\N{GOTHIC LETTER AHSA}\N{UNDERTIE}\N{GOTHIC LETTER URUS}\N{GOTHIC LETTER NAUTHS}\N{GOTHIC LETTER SAUIL}\N{GOTHIC LETTER AHSA}\N{GOTHIC LETTER RAIDA} = Gothic our father

I'm not sure whether you recognize the scripts they belong to, but they're
all in the astral planes.  Using `uniquote -x` on them shows:

\x{1D518}\x{1D52B}\x{1D526}\x{1D520}\x{1D52C}\x{1D521}\x{1D522} = super
\x{10414}\x{1042F}\x{10445}\x{10428}\x{10449}\x{1042F}\x{1043B} = Deseret
\x{10330}\x{10344}\x{10344}\x{10330}\x{203F}\x{1033F}\x{1033D}\x{10343}\x{10330}\x{10342} = Gothic our father
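
The same listing can be reproduced with Python's standard `unicodedata` module. This is only a sketch: the strings are written with `\U` escapes built from the code points above, and the names `IDENTIFIERS` and `describe` are invented for the example. It assumes a wide build (or any Python from 3.3 on), where each character is a single code point; on a narrow 3.2 build the loop would see surrogate halves instead.

```python
import unicodedata

# The three problem identifiers, spelled with escapes so that this source
# file stays pure ASCII.  The dictionary name is illustrative only.
IDENTIFIERS = {
    "super": (
        "\U0001D518\U0001D52B\U0001D526\U0001D520"
        "\U0001D52C\U0001D521\U0001D522"
    ),
    "Deseret": (
        "\U00010414\U0001042F\U00010445\U00010428"
        "\U00010449\U0001042F\U0001043B"
    ),
    "Gothic our father": (
        "\U00010330\U00010344\U00010344\U00010330\u203F"
        "\U0001033F\U0001033D\U00010343\U00010330\U00010342"
    ),
}


def describe(text):
    """Return (hex code point, character name) pairs for each character."""
    return [("%04X" % ord(ch), unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for label, ident in IDENTIFIERS.items():
        print(label, "is identifier:", ident.isidentifier())
        for code, name in describe(ident):
            print("   U+" + code, name)
```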

As to whether they're proper identifiers per your reference above, I
will take the first letter from each of 𝔘𝔫𝔦𝔠𝔬𝔡𝔢, 𐐔𐐯𐑅𐐨𐑉𐐯𐐻, and 𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂,
which are respectively 𝔘, 𐐔, and 𐌰, or

MATHEMATICAL FRAKTUR CAPITAL U
DESERET CAPITAL LETTER DEE
GOTHIC LETTER AHSA

or 

1D518
10414
10330

and show you the full Unicode properties of these rejected code points.
This requires the uniprops command; with it, these three commands 
are completely identical:

% uniprops -ga 𝔘 𐐔 𐌰
% uniprops -ga 1D518 10414 10330
% uniprops -ga 'MATHEMATICAL FRAKTUR CAPITAL U' 'DESERET CAPITAL LETTER DEE' 'GOTHIC LETTER AHSA'

and produce this output:

U+1D518 ‹𝔘› \N{MATHEMATICAL FRAKTUR CAPITAL U}
\w \pL \p{LC} \p{L_} \p{L} \p{Lu}
All Any Alnum Alpha Alphabetic Assigned InMathematicalAlphanumericSymbols 
Cased Cased_Letter LC Changes_When_NFKC_Casefolded
   CWKCF Common Zyyy Lu L Gr_Base Grapheme_Base Graph GrBase ID_Continue 
IDC ID_Start IDS Letter L_ Uppercase_Letter Math
   Mathematical_Alphanumeric_Symbols Print Upper Uppercase Word 
XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha
   X_POSIX_Graph X_POSIX_Print X_POSIX_Upper X_POSIX_Word
Age=3.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L 
Block=Mathematical_Alphanumeric_Symbols Canonical_Combining_Class=0
   Canonical_Combining_Class=Not_Reordered CCC=NR 
Canonical_Combining_Class=NR General_Category=Cased_Letter Script=Common
   Decomposition_Type=Font DT=Font Decomposition_Type=Non_Canon 
Decomposition_Type=Non_Canonical DT=NonCanon
   East_Asian_Width=Neutral GC=LC General_Category=L 
General_Category=Letter General_Category=L_ General_Category=LC GC=L
   General_Category=Lu General_Category=Uppercase_Letter GC=Lu 
Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX
   Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA 
Joining_Group=No_Joining_Group JG=NoJoiningGroup
   Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL 
Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None
   Numeric_Value=NaN NV=NaN Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 
Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1
   Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 
Present_In=6.0 IN=6.0 SC=Zyyy Script=Zyyy
   Sentence_Break=UP Sentence_Break=Upper SB=UP Word_Break=ALetter WB=LE 
Word_Break=LE _X_Begin
U+10414 ‹𐐔› \N{DESERET CAPITAL LETTER DEE}
\w \pL \p{LC} \p{L_} \p{L} \p{Lu}
All Any Alnum Alpha Alphabetic Assigned InDeseret Cased Cased_Letter LC 
Changes_When_Casefolded CWCF Changes_When_Casemapped
   CWCM Changes_When_Lowercased CWL Changes_When_NFKC_Casefolded CWKCF 
Deseret Dsrt Lu L Gr_Base Grapheme_Base Graph GrBase
   ID_Continue IDC ID_Start IDS Letter L_ Uppercase_Letter Print Upper 
Uppercase Word XID_Continue XIDC XID_Start XIDS
   X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Upper 
X_POSIX_Word
Age=3.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Deseret 
Canonical_Combining_Class=0
   Canonical_Combining_Class=Not_Reordered CCC=NR 
Canonical_Combining_Class=NR General_Category=Cased_Letter
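
A few of the properties listed in the uniprops output can be cross-checked from Python with the standard `unicodedata` module, which exposes only a small subset of what uniprops prints. The following is a sketch, with `summarize` as a made-up helper name, and it assumes an interpreter where `chr()` accepts astral code points (a wide build, or Python 3.3 and later).

```python
import unicodedata


def summarize(cp):
    """Collect a handful of Unicode properties for the code point number cp."""
    ch = chr(cp)
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),       # General_Category
        "bidi": unicodedata.bidirectional(ch),      # Bidi_Class
        "combining": unicodedata.combining(ch),     # Canonical_Combining_Class
        "decomposition": unicodedata.decomposition(ch),
        "identifier_start": ch.isidentifier(),
    }


if __name__ == "__main__":
    for cp in (0x1D518, 0x10414, 0x10330):
        print("U+%04X" % cp, summarize(cp))
```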
   

[issue12732] Can't portably use Unicode in Python identifiers

2011-08-12 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 787ed1a7aba8 by Benjamin Peterson in branch '3.2':
in narrow builds, make sure to test codepoints as identifier characters (closes 
#12732)
http://hg.python.org/cpython/rev/787ed1a7aba8

New changeset 5af15f018e20 by Benjamin Peterson in branch 'default':
merge 3.2 (#12732)
http://hg.python.org/cpython/rev/5af15f018e20
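
To illustrate what the changeset message refers to, here is a sketch of the underlying issue in Python, not the actual C patch: on a narrow build an astral character is stored as two UTF-16 surrogates, and neither surrogate on its own is an identifier character, so the pair has to be combined into one code point before it is tested. The helper names `utf16_units` and `combine_surrogates` are invented for this example.

```python
import struct


def utf16_units(ch):
    """Return the list of UTF-16 code units that encode the string ch."""
    data = ch.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def combine_surrogates(high, low):
    """Join a high/low surrogate pair into the code point it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = utf16_units("\U0001D518")
    print("code units: %04X %04X" % (high, low))
    # Tested one unit at a time, neither half qualifies as an identifier.
    print(chr(high).isidentifier(), chr(low).isidentifier())
    # Tested as the combined code point, the character is accepted.
    print(chr(combine_surrogates(high, low)).isidentifier())
```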

--
nosy: +python-dev
resolution:  -> fixed
stage: needs patch -> committed/rejected
status: open -> closed




[issue12732] Can't portably use Unicode in Python identifiers

2011-08-11 Thread Tom Christiansen

New submission from Tom Christiansen tchr...@perl.com:

You cannot reliably use Unicode in Python identifiers because of the 
narrow/wide build issue.  The enclosed file is fine on wide builds but fails 
with compile errors on narrow ones.

Go, Ruby, Java, and Perl all handle this situation without any problem; only 
Python has the bug.
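
The attached file is not reproduced here, but a minimal reproduction along the same lines might look like the sketch below. It compiles a one-line assignment whose target is the Deseret identifier quoted elsewhere in this thread; the function name `try_compile` and the filename `<sketch>` are placeholders. On an affected narrow build the `compile()` call would be expected to raise `SyntaxError`, while on a wide build it succeeds.

```python
def try_compile(identifier, value):
    """Compile and run `identifier = value`; return the resulting namespace.

    Returns None if the interpreter rejects the source with SyntaxError.
    """
    source = "%s = %r\n" % (identifier, value)
    try:
        code = compile(source, "<sketch>", "exec")
    except SyntaxError:
        return None
    namespace = {}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    # Seven Deseret letters, all outside the Basic Multilingual Plane.
    deseret = (
        "\U00010414\U0001042F\U00010445\U00010428"
        "\U00010449\U0001042F\U0001043B"
    )
    result = try_compile(deseret, "Deseret")
    print("accepted" if result is not None else "rejected")
```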

--
components: Interpreter Core
files: badidents.python
messages: 141923
nosy: tchrist
priority: normal
severity: normal
status: open
title: Can't portably use Unicode in Python identifiers
type: behavior
versions: Python 3.2
Added file: http://bugs.python.org/file22882/badidents.python




[issue12732] Can't portably use Unicode in Python identifiers

2011-08-11 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti
