[issue1433] marshal roundtripping for unicode

2007-11-15 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I think you have a wrong understanding of round-tripping. In Unicode it is really irrelevant if you're using a UCS2 surrogate pair or a UCS4 representation to describe a code point. The length of the Unicode representation may change, but the meaning
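To make the "same code point, different representation" point concrete, here is a minimal sketch of the standard UTF-16 surrogate arithmetic: a non-BMP code point such as U+10000 is either one UCS4 value or a UCS2 high/low surrogate pair, and the two forms convert losslessly into each other. The helper names `split_code_point` and `join_surrogates` are hypothetical, chosen for illustration only.

```python
from typing import Tuple

# Constants from the UTF-16 encoding scheme.
HIGH_START = 0xD800           # first high (lead) surrogate
LOW_START = 0xDC00            # first low (trail) surrogate
SUPPLEMENTARY_BASE = 0x10000  # first code point outside the BMP


def split_code_point(cp: int) -> Tuple[int, int]:
    """Hypothetical helper: UCS4 code point -> UCS2-style surrogate pair."""
    if not SUPPLEMENTARY_BASE <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    offset = cp - SUPPLEMENTARY_BASE  # a 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def join_surrogates(high: int, low: int) -> int:
    """Hypothetical helper: surrogate pair -> single UCS4 code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    pair = split_code_point(0x10000)
    print([hex(u) for u in pair])       # ['0xd800', '0xdc00']
    print(hex(join_surrogates(*pair)))  # 0x10000
```

Both directions are pure arithmetic, which is why the length of the representation can change while the code point it denotes does not.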

[issue1433] marshal roundtripping for unicode

2007-11-13 Thread Carl Friedrich Bolz
New submission from Carl Friedrich Bolz: Marshal does not round-trip unicode surrogate pairs for wide unicode-builds: marshal.loads(marshal.dumps(u'\ud800\udc00')) == u'\U00010000' This is very annoying, because the size of unicode constants differs between when you run a module for the first
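Python 3.3 and later no longer have separate narrow and wide builds, so the exact marshal behaviour from this report is not reproduced here. As an assumption-labelled analogue, the sketch below uses the UTF-16 codec with the `surrogatepass` error handler to show the same symptom: a string of two surrogate code points goes out and comes back as one supplementary code point, so its length changes.

```python
# Two separate surrogate code points (length 2), the way a narrow
# build would store U+10000.
pair = "\ud800\udc00"

# Strict codecs refuse lone surrogates; "surrogatepass" writes them
# out as raw UTF-16 code units instead.
raw = pair.encode("utf-16-be", "surrogatepass")

# Decoding reads those code units as a genuine pair and joins them.
loaded = raw.decode("utf-16-be")

print(len(pair), len(loaded))  # 2 1
print(loaded == "\U00010000")  # True
print(loaded == pair)          # False: the round trip changed the value
```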

[issue1433] marshal roundtripping for unicode

2007-11-13 Thread Guido van Rossum
Guido van Rossum added the comment: I think this is unavoidable. Depending on whether you happen to be using a narrow or wide unicode build of Python, \U00010000 may be turned into a pair of surrogates anyway. It's not just marshal that's not roundtripping; the utf-8 codec has the same issue
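A rough illustration of why codecs are involved too, sketched against modern Python 3 rather than the 2007 builds discussed here: U+10000 has one proper UTF-8 form, while the surrogate-pair spelling of the same character is rejected by strict UTF-8 and only encodes, via `surrogatepass`, to a different six-byte sequence (the CESU-8-style form).

```python
text_pair = "\ud800\udc00"  # two surrogate code points
text_cp = "\U00010000"      # one supplementary code point

# Proper UTF-8 for U+10000 is a single 4-byte sequence.
cp_bytes = text_cp.encode("utf-8")
print(cp_bytes)  # b'\xf0\x90\x80\x80'

# Strict UTF-8 refuses to encode surrogate code points at all.
try:
    text_pair.encode("utf-8")
    strict_rejected = False
except UnicodeEncodeError as exc:
    strict_rejected = True
    print("strict utf-8 rejects surrogates:", exc.reason)

# With "surrogatepass" each surrogate becomes its own 3-byte sequence,
# so the "same" character now has two different byte forms.
pair_bytes = text_pair.encode("utf-8", "surrogatepass")
print(pair_bytes)  # b'\xed\xa0\x80\xed\xb0\x80'
```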

[issue1433] marshal roundtripping for unicode

2007-11-13 Thread Martin v. Löwis
Martin v. Löwis added the comment: As Guido says: this is by design. The Unicode type doesn't really support storage of surrogates; so don't use it for that. -- nosy: +loewis resolution: -> wont fix status: open -> closed