Changes by Arc Riley arcri...@gmail.com:
--
nosy: +ArcRiley
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24511
___
Arc Riley arcri...@gmail.com added the comment:
It looks right to me, but I would include more verbose pydoc strings.
I.e., "The tail attribute can be used to hold additional data associated with the
element" tells me nothing. You could explain here what .tail actually is, a
few XML examples
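
For illustration, here is the kind of short example such a docstring could carry. It is only a sketch, assuming the docstring under discussion is the one for Element.tail in xml.etree.ElementTree; the sample markup is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: text before, inside, and after a child element.
root = ET.fromstring("<p>Hello <b>bold</b> world</p>")
b = root.find("b")

# .text is the text between an element's start tag and its first child
# (or its end tag if it has no children).
print(repr(root.text))  # text of <p> before <b>
print(repr(b.text))     # text inside <b>

# .tail is the text that follows an element's END tag, up to the next tag.
print(repr(b.tail))     # text after </b>, before </p>
```

In other words, .tail belongs to the element it follows, not to the parent, which is what the one-line description leaves unsaid.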
Arc Riley arcri...@gmail.com added the comment:
Python 3.1.1 (r311:74480, Sep 13 2009, 22:19:17)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> u = '𐑑'
>>> print(u)
Traceback (most recent call last):
  File "<stdin>", line 1
Arc Riley arcri...@gmail.com added the comment:
Amaury, you are absolutely correct, \ud801 is not a valid Unicode glyph;
however, I am not giving Python \ud801, I am giving Python '𐑑' (==
'\U00010451').
I am attaching a different short example that demonstrates that Python
is mishandling UTF-8
Arc Riley arcri...@gmail.com added the comment:
This behavior is identical whether u.py or u.pyc is run on my systems,
whereas that previous ticket concerns differing behavior, though it is
obviously related.
--
versions: -Python 2.6, Python 3.0
New submission from Arc Riley arcri...@gmail.com:
The following is a minimal example which does not work under Python
3.1.1 but functions as expected on Python 2.6 and 3.0.
Python 3.1.1 believes the single UTF-8 glyph is two entirely different
(and illegal) Unicode characters:
Traceback (most
Arc Riley arcri...@gmail.com added the comment:
While t.py only bugs on 3.1, the following happens with 3.0 as well:
>>> line = '𐑑𐑧𐑕𐑑𐑦𐑙'
>>> first = '𐑑'
>>> first
'𐑑'
>>> line[0]
'\ud801'
>>> line[0] == first
False
And with 2.6:
>>> line = u'𐑑𐑧𐑕𐑑𐑦𐑙'
>>> first = u'𐑑'
>>> first
u'\ud801\udc51'
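
For reference, the '\ud801' and u'\ud801\udc51' values in the sessions above are the UTF-16 surrogate halves of '\U00010451'. The following is a minimal sketch of that arithmetic, not taken from the ticket; the helper name utf16_surrogates is hypothetical, and the code assumes a Python 3.3 or later interpreter, where strings are indexed by code point.

```python
def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


ch = "\U00010451"
high, low = utf16_surrogates(ord(ch))
print(hex(high), hex(low))

# The same code point is a single four-byte sequence in UTF-8.
print(ch.encode("utf-8"))

# Assumption: Python 3.3+, so this is one character rather than two.
print(len(ch))
```

An interpreter that stores strings as 16-bit units indexes the two halves separately, so line[0] yields only the high surrogate and cannot compare equal to the full character.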
--
versions