On Sunday 25 March 2007 12:49, Daniel Henninger wrote:
> Because it's dumb. ;D (I'm half serious) OK, so the logic behind
> guess_encoding and the gaim stuff is to "try" the various encodings
> and see which one doesn't fail miserably, then go with that one.
> However, the problem I ran into is that utf-16be expects 2 bytes for
> every character. Unfortunately, if your character string has an even
> number of characters in it, regardless of whether they are 2-byte
> UTF-16 or 1-byte ASCII, it goes "oh ok, utf-16be worked". So anyone
> sending ASCII with an even number of characters got gobbledygook. Cute,
> huh? Apparently the C libraries are much better behaved. I couldn't
> figure out a way around this so I had to comment it out. If you know
> of a better way... =)
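For reference, the failure described above is easy to reproduce. This is only a minimal sketch in present-day Python 3 (an assumption; the transport code of the time would have been Python 2), and the sample bytes are made up for illustration:

```python
# Pure-ASCII input with an even number of bytes decodes as UTF-16BE
# without raising, so "try it and see whether it fails" accepts it.
data = b"hello!"                     # 6 bytes of plain ASCII
decoded = data.decode("utf-16-be")   # no exception is raised

# Each pair of ASCII bytes has been fused into one 16-bit code unit.
print(len(decoded))                          # 3
print([hex(ord(ch)) for ch in decoded])      # ['0x6865', '0x6c6c', '0x6f21']

# Only an odd number of bytes makes the decoder complain.
try:
    b"hello".decode("utf-16-be")
    odd_length_rejected = False
except UnicodeDecodeError:
    odd_length_rejected = True
print(odd_length_rejected)                   # True
```

So a successful decode on its own says almost nothing about whether utf-16be was the right guess.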
I guess one way is to check, after decoding, that every resulting
character is actually an assigned Unicode code point. Another way (what
ICU does) is to use statistics to determine whether what you got back
looks like text in some language.

TX

--
Web site: http://trypticon.org/
GPG Fingerprint: 9EEB 97D7 8F7B 7977 F39F A62C B8C7 BC8B 037E EA73
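P.S. A rough sketch of the first idea, in case it helps. It assumes Python 3 and the standard unicodedata module; the names looks_plausible and guess_encoding_checked and the candidate list are hypothetical and not taken from the transport code. It is only a partial filter: many pairs of ASCII bytes land on assigned CJK code points, so it will not catch every bad guess, which is where a statistical check would come in.

```python
import unicodedata

def looks_plausible(text):
    """Return False if text holds code points unlikely in real messages."""
    for ch in text:
        category = unicodedata.category(ch)
        # Cn = unassigned, Co = private use, Cs = surrogate
        if category in ("Cn", "Co", "Cs"):
            return False
        # Control characters other than tab, newline, and carriage return
        if category == "Cc" and ch not in "\t\n\r":
            return False
    return True

def guess_encoding_checked(data, candidates=("utf-16-be", "utf-8", "iso-8859-1")):
    """Return (encoding, text) for the first candidate that both decodes
    and passes looks_plausible, or (None, None) if none does."""
    for encoding in candidates:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue
        if looks_plausible(text):
            return encoding, text
    return None, None

# b"\xff\xff" decodes as UTF-16BE to U+FFFF, which is never assigned,
# so the check moves on to the next candidate.
print(guess_encoding_checked(b"\xff\xff", ("utf-16-be", "iso-8859-1")))

# Limitation: b"hello!" still passes as UTF-16BE, because the three
# resulting code points are all assigned CJK ideographs.
print(guess_encoding_checked(b"hello!")[0])
```

Putting the most likely encodings first in the candidate list probably matters at least as much as the check itself.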
_______________________________________________
py-transports mailing list
py-transports@blathersource.org
http://lists.modevia.com/cgi-bin/mailman/listinfo/py-transports