On Mon, Sep 29, 2008 at 5:14 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
>> There's no solution except to not
>> decode, and 8859-1 is the way to do that.
>
> I think you need to elaborate that. What does ISO-8859-1 has to do
> with a Python datatype in this context: which datatype, and what
> algorithm on it are you specifically referring to?
>
> When I do (in 2.x)
>
> py> "foo".decode("iso-8859-1")
> u'foo'
>
> ISTM that 8859-1 is all about decoding, so I don't understand why
> you say it is a way not to decode.

8859-1 has no invalid bytes and is a 1-to-1 mapping: every byte value
decodes to the code point with the same number. If you have an API that
always returns unicode but accepts an encoding, you can pass it 8859-1,
then re-encode the result using 8859-1 to get back the original bytes.

An ugly hack, but more correct than UTF-8b or any similar attempt to do
"unicode but not quite unicode". Either it's lossy, or it's not unicode;
there's no in between.

-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
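
To make the round trip described above concrete, here is a minimal sketch. It assumes Python 3 syntax (bytes literals, str as unicode) rather than the 2.x session quoted earlier, and the sample byte string and variable names are made up for illustration; they do not come from the thread.

```python
# Arbitrary bytes that are not valid UTF-8 (illustrative sample only).
raw = b"caf\xe9 \xff\xfe.txt"

# A strict UTF-8 decode rejects these bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same
# value, so decoding cannot fail and encoding restores the input.
text = raw.decode("iso-8859-1")
restored = text.encode("iso-8859-1")

# The same holds for all 256 possible byte values.
all_bytes = bytes(range(256))
round_trip_all = all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes

print(utf8_ok, restored == raw, round_trip_all)
```

Note that the intermediate string is only a carrier for the bytes: its characters are whatever Latin-1 assigns to each byte value, which may not be the text the bytes were meant to represent.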