On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik <[email protected]> wrote:
> What I need is a bulletproof way to convert from anything to unicode. This
> requires some kind of escaping to go forward and back. Some helper
> methods like u2b() (unicode to binary) and b2u(). I am quite surprised that
> so far I found nothing for this "simple" case.
>

That's because in general the encoding of the "binary" string is unknown. Is it ascii, utf-8, Windows CP-1252, shift-JIS, or something else? You can't decode such a string to Unicode without knowing the encoding. Check out the python-3 branch where we've been working through some of those issues.

Your u2b is "easy" if you assume you want the binary to be utf-8 encoded, which is normally safe; this conversion is guaranteed to work.

Your b2u is not so easy. You can't just assume utf-8 as you might think; if the string has invalid utf-8 bytes it'll raise an error or generate dummy chars depending on the args you pass to str.decode(). At least it'll get mangled if it's in a different encoding than you expect.

-- Gary
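
To make the asymmetry above concrete, here is a minimal sketch of the two helpers. The names u2b and b2u come from the quoted message; the `encoding` and `errors` parameters, and the sample byte strings, are assumptions added for illustration. It is written against Python 3, where the decode method lives on bytes rather than str, and it is not code from the SCons tree.

```python
def u2b(text, encoding="utf-8"):
    """Unicode text -> bytes. Assumes UTF-8 is the wanted output encoding."""
    return text.encode(encoding)


def b2u(data, encoding="utf-8", errors="strict"):
    """Bytes -> Unicode text. Only correct if `encoding` matches the data.

    errors="strict" raises UnicodeDecodeError on invalid bytes;
    errors="replace" substitutes U+FFFD for each undecodable sequence.
    """
    return data.decode(encoding, errors)


if __name__ == "__main__":
    # Encoding to UTF-8 is the easy direction.
    print(u2b("caf\u00e9"))                        # b'caf\xc3\xa9'

    cp1252_bytes = b"caf\xe9"                      # "café" encoded as CP-1252

    # Wrong guess, strict mode: an error is raised.
    try:
        b2u(cp1252_bytes)
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc.reason)

    # Wrong guess, replace mode: a dummy character appears.
    print(ascii(b2u(cp1252_bytes, errors="replace")))   # 'caf\ufffd'

    # Right guess: the text comes back intact.
    print(ascii(b2u(cp1252_bytes, encoding="cp1252")))  # 'caf\xe9'

    # Valid bytes, wrong encoding: no error, but the text is mangled.
    print(ascii(b2u(u2b("caf\u00e9"), encoding="cp1252")))  # 'caf\xc3\xa9'
```

The last case is the one that tends to bite: the decode succeeds silently, so nothing signals that the encoding guess was wrong.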
_______________________________________________
Scons-dev mailing list
[email protected]
https://pairlist2.pair.net/mailman/listinfo/scons-dev
