[issue1552880] Unicode Imports

2010-09-01 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: > Yes, but in Python, the U+DC80..U+DCFF range is used to store undecodable bytes. > E.g. b'abc\xff'.decode('ascii', 'surrogateescape') gives 'abc\udcff'. That's an inventive way of breaking the Unicode standard :) Anyway, why would you worry about that? My

2010-09-01 Thread Éric Araujo
Changes by Éric Araujo: -- nosy: +eric.araujo

2010-09-01 Thread STINNER Victor
STINNER Victor added the comment: > According to the Unicode standard the high and low surrogate halves used > by UTF-16 (...) Yes, but in Python, the U+DC80..U+DCFF range is used to store undecodable bytes. E.g. b'abc\xff'.decode('ascii', 'surrogateescape') gives 'abc\udcff'. > Anyway, as you rema
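The surrogateescape behaviour Victor describes can be checked with a minimal Python 3 sketch (the helper name escape_roundtrip is hypothetical). Each byte the codec cannot decode (0x80..0xFF for ASCII) becomes a lone surrogate in U+DC80..U+DCFF, and encoding with the same error handler restores the original bytes. In Python 3 the input has to be a bytes object, hence the b'' prefix.

```python
def escape_roundtrip(raw: bytes, encoding: str = "ascii") -> str:
    """Decode raw with surrogateescape and check that re-encoding restores it."""
    # Bytes the codec cannot decode become lone surrogates U+DC80..U+DCFF.
    text = raw.decode(encoding, "surrogateescape")
    # Encoding with the same error handler maps the surrogates back to bytes.
    assert text.encode(encoding, "surrogateescape") == raw
    return text


# ascii() is used because printing a lone surrogate directly may fail.
print(ascii(escape_roundtrip(b"abc\xff")))  # prints 'abc\udcff'
```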

2010-08-31 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: I confess that I didn't follow the utf-8/surrogate discussion. But the utf-8 encoding can encode all valid Unicode characters: UTF-8 may only legally be used to encode valid Unicode scalar values. According to the Unicode standard the high and low sur

2010-08-31 Thread STINNER Victor
STINNER Victor added the comment: The utf-8 codec (in strict mode) rejects surrogates in Python 3, so you don't support undecodable filenames (filenames decoded with the surrogateescape error handler, which produces surrogate characters). It may be possible if you use surrogateescape everywhere.
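Victor's point about the strict UTF-8 codec can be illustrated with a short Python 3 sketch. The filename bytes b'caf\xe9.py' (a Latin-1 'é', which is not valid UTF-8 here) are an arbitrary example, and strict_utf8_ok is a hypothetical helper. After decoding with surrogateescape the name contains U+DCE9, which strict UTF-8 refuses to encode, while using surrogateescape on the encoding side as well round-trips it.

```python
def strict_utf8_ok(text: str) -> bool:
    """Return True if the strict UTF-8 codec can encode text."""
    try:
        text.encode("utf-8")  # default errors="strict"
    except UnicodeEncodeError:
        # Raised for lone surrogates such as U+DCE9 ("surrogates not allowed").
        return False
    return True


raw_name = b"caf\xe9.py"  # hypothetical Latin-1 filename; 0xE9 is not valid UTF-8 here
name = raw_name.decode("utf-8", "surrogateescape")

print(ascii(name))            # prints 'caf\udce9.py'
print(strict_utf8_ok(name))   # prints False
# Using surrogateescape on both sides restores the original bytes.
print(name.encode("utf-8", "surrogateescape") == raw_name)  # prints True
```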

2010-08-24 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Possibly. I made a comment in issue 9425 explaining the particular trick that this patch uses (using UTF-8 as an intermediate form to avoid having to change all the machinery in import.c).
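The general idea of UTF-8 as an intermediate form can be sketched in Python 3. This is not the C patch itself: both helper names are hypothetical, and cp1252 merely stands in for a typical 8-bit Windows ANSI code page. Any Unicode path without lone surrogates survives a round trip through UTF-8 bytes, so byte-oriented code can carry it unchanged, whereas an 8-bit code page cannot represent many names at all.

```python
def via_utf8(path: str) -> str:
    """Carry path as UTF-8 bytes (as byte-oriented code would) and decode it back."""
    data = path.encode("utf-8")
    return data.decode("utf-8")


def fits_codepage(path: str, codepage: str = "cp1252") -> bool:
    """Return True if path can be represented in the given 8-bit code page."""
    try:
        path.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


name = "\u30e2\u30b8\u30e5\u30fc\u30eb.py"  # Japanese katakana module name

print(via_utf8(name) == name)   # prints True
print(fits_codepage(name))      # prints False
```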

2010-08-24 Thread STINNER Victor
STINNER Victor added the comment: > I think #9425 super*s*edes this. Am I correct? #8611 or #9425, as you prefer. Anyway, I'm working on this topic and I will try to fix it before the Python 3.2 release. -- nosy: +haypo

2010-08-24 Thread Mark Lawrence
Mark Lawrence added the comment: I think #9425 supercedes this. Am I correct? -- nosy: +BreamoreBoy

2009-04-01 Thread Brett Cannon
Changes by Brett Cannon: -- assignee: brett.cannon ->

2009-02-11 Thread Thomas Heller
Changes by Thomas Heller: -- nosy: +theller

2009-02-11 Thread Kristján Valur Jónsson
Kristján Valur Jónsson added the comment: Ah, this one is still alive? We still use this patch at CCP for our 2.x Python. I'll give it some more love to answer the issues raised. Hm, is this still an issue with 3.x? Does the import machinery use Unicode as the internal format when working wit

2009-02-10 Thread Daniel Diniz
Changes by Daniel Diniz: -- assignee: -> brett.cannon nosy: +brett.cannon, ezio.melotti