Re: [py-transports] UTF-8 Decode problem

Trejkaz Sat, 24 Mar 2007 21:38:00 -0800

On Sunday 25 March 2007 12:49, Daniel Henninger wrote:
> Because it's dumb.  ;D  (I'm half serious)  Ok, so the logic behind
> guess_encoding and the gaim stuff is to "try" the various encodings
> and see which one doesn't fail miserably.  Then go with that one.
> However, the problem I ran into is that utf-16be expects 2 bytes for
> every character.  Unfortunately, if your character string has an even
> number of characters in it, regardless of whether they are 2-byte
> UTF-16 or 1-byte ASCII, it goes "oh ok  utf-16be worked".  So anyone
> sending ASCII with even number of characters got gobbledygook.  Cute,
> huh?  Apparently the c libraries are much better behaved.  I couldn't
> figure out a way around this so I had to comment it out.  If you know
> of a better way.....  =)


I guess one way is to check, after decoding, that every character you got 
back is an assigned Unicode code point.  Another way (what ICU does) is to 
use statistics to determine whether what you got back looks like a language.
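
A rough sketch of the first idea, in case it helps.  The names all_assigned
and guess_decode are made up for illustration and are not anything in the
transport code, and the candidate order (strict UTF-8 before UTF-16BE) is my
own assumption.  Note that the code point check on its own is only a partial
filter: plenty of ASCII byte pairs land on perfectly valid CJK characters,
which is why the order of the candidates still matters here.

```python
import unicodedata

# General categories that suggest a wrong guess: unassigned (Cn),
# private use (Co), and surrogate (Cs).
SUSPECT_CATEGORIES = {"Cn", "Co", "Cs"}


def all_assigned(text):
    """Return True if no character falls in a suspect category."""
    return all(unicodedata.category(ch) not in SUSPECT_CATEGORIES for ch in text)


def guess_decode(data, candidates=("utf-8", "utf-16-be")):
    """Hypothetical helper: return (text, encoding) for the first candidate
    that decodes without error and passes all_assigned(), else None."""
    for encoding in candidates:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue
        if all_assigned(text):
            return text, encoding
    return None


# The failure case: two ASCII bytes also form one valid UTF-16BE code unit.
print(repr(b"hi".decode("utf-16-be")))  # a single CJK character, U+6869
print(guess_decode(b"hi"))              # ('hi', 'utf-8')
```

Since U+6869 is an assigned character, all_assigned() would happily accept
the bogus UTF-16BE reading of "hi".  Something statistical, like what ICU
does, is probably needed to tell such cases apart reliably.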

TX



-- 
             Email: [EMAIL PROTECTED]
         Jabber ID: [EMAIL PROTECTED]
          Web site: http://trypticon.org/
   GPG Fingerprint: 9EEB 97D7 8F7B 7977 F39F  A62C B8C7 BC8B 037E EA73
