Marc-Andre Lemburg <m...@egenix.com> added the comment:

Martin v. Löwis wrote:
>
> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
>> Your name will end up being partially escaped as surrogate:
>>
>> 'L\udcf6wis'
>>
>> Further processing will fail
>
> That depends on the further processing, no?
>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'latin-1' codec can't encode character '\udcf6' in
>> position 1: ordinal not in range(256)
>
> Where did you get this error from?

The roundup email interface must have eaten this first line of the
traceback:

>>> _.encode('latin-1')

>> It doesn't work if an application tries to work *with* the data,
>> e.g. tries to convert it
>
> Converting it to what?
>
>> parse it
>
> Parsing will work fine.
>
>> decode it
>
> It's a string. You shouldn't decode it.
>
>> The reason is
>> that information included by the use of the 'surrogateescape'
>> error handler is lost along the way and this then causes data
>> corruption.
>
> And how would that not happen if it was bytes? The problems you describe
> were one of the primary motivations to switch to Unicode: it's *byte*
> strings that have these problems.

Martin, it's obvious that you are not even trying to understand
what I'm saying. That's not a good basis for discussion.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8603>
_______________________________________
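
For reference, here is a minimal sketch of the behaviour under discussion. It assumes the name arrives as Latin-1 bytes and is decoded as UTF-8 with the 'surrogateescape' error handler. The byte string and the choice of codecs are assumptions for illustration; the messages above show only the resulting string and the traceback.

```python
# Assumption: the name was stored as Latin-1 bytes, so 'ö' is the single byte 0xF6.
raw = b"L\xf6wis"

# Decoding as UTF-8 with surrogateescape maps the undecodable byte 0xF6
# to the lone surrogate U+DCF6 instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")
assert name == "L\udcf6wis"

# Encoding to a codec without the handler fails, as in the quoted traceback.
error = None
try:
    name.encode("latin-1")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Encoding back with the same codec and the same handler restores the bytes.
roundtrip = name.encode("utf-8", "surrogateescape")
assert roundtrip == raw
```

The sketch touches both positions in the thread. The round trip is lossless as long as the string goes back out through the same codec and handler, while any step that encodes it without the handler raises UnicodeEncodeError.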