Hi Cameron,

    Raffi asked me about this, but since I have a few moments over
lunch I figured I would reply to the list. It's been so long, but it
feels good. Anyway, the issue is the last two bytes of your
URL-encoded values. From the Ruby irb console I can see:

>> CGI.unescape("%e3%83")
=> "###"
>> CGI.unescape("%e3%83").unpack('U*')
ArgumentError: malformed UTF-8 character (expected 3 bytes, given 2 bytes)
        from (irb):13:in `unpack'
        from (irb):13

    The problem is that %e3%83 is an incomplete UTF-8 sequence. The
lead byte %e3 marks a three-byte character, so it must be followed by
two more bytes, as in the "TE" character [1], which is %e3%83%86:

>> CGI.unescape("%e3%83%86")
=> "テ"
>> CGI.unescape("%e3%83%86").unpack('U*')
=> [12486]
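
    To make the "expected 3 bytes, given 2 bytes" rule concrete, here is
a minimal sketch of how the lead byte determines the sequence length.
The helper name expected_utf8_length is hypothetical (it is not part of
Ruby or CGI), and it only inspects the bit pattern of the lead byte; it
does not validate the rest of the sequence.

```ruby
# Hypothetical helper (not part of any library): the total number of bytes
# a UTF-8 sequence should have, judged from the lead byte's bit pattern.
def expected_utf8_length(lead_byte)
  if    (lead_byte & 0x80) == 0x00 then 1   # 0xxxxxxx: ASCII
  elsif (lead_byte & 0xE0) == 0xC0 then 2   # 110xxxxx
  elsif (lead_byte & 0xF0) == 0xE0 then 3   # 1110xxxx, e.g. 0xE3
  elsif (lead_byte & 0xF8) == 0xF0 then 4   # 11110xxx
  end                                       # nil: continuation or invalid byte
end

truncated = "\xE3\x83".b        # the raw bytes that %e3%83 decodes to
complete  = "\xE3\x83\x86".b    # the raw bytes that %e3%83%86 decodes to

# The lead byte asks for 3 bytes, but the truncated value only has 2.
puts expected_utf8_length(truncated.getbyte(0))   # prints 3
puts truncated.bytesize                           # prints 2

# With the third byte present, the sequence decodes to the "TE" code point.
p complete.force_encoding('UTF-8').unpack('U*')   # prints [12486]
```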

    Since the exact length of the escape sequence is 140, I'm guessing
there is still some code truncating the value based on byte counts.
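
    For illustration only, the sketch below shows how cutting at a byte
count can split a three-byte character, and one way to cut without
leaving a partial sequence behind. The helper truncate_utf8_bytes is
hypothetical and assumes valid UTF-8 input; it is not the code running
on the server, just an example of the failure mode guessed at above.

```ruby
# Hypothetical helper: cut a valid UTF-8 string to at most `limit` bytes
# without leaving a partial multi-byte sequence at the end.
def truncate_utf8_bytes(str, limit)
  return str if str.bytesize <= limit
  cut = str.byteslice(0, limit)
  # Drop trailing bytes until what remains is valid UTF-8 again.
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

text = "\u30C6" * 47            # 47 "TE" characters, 3 bytes each

puts text.length                # prints 47  (characters)
puts text.bytesize              # prints 141 (bytes)

# A naive cut at 140 bytes ends with the two stray bytes e3 83.
naive = text.byteslice(0, 140)
puts naive.valid_encoding?      # prints false

# Backing up to a character boundary keeps 46 whole characters.
safe = truncate_utf8_bytes(text, 140)
puts safe.length                # prints 46
puts safe.bytesize              # prints 138
```

    Counting with String#length instead of String#bytesize is the other
half of the fix: the 47-character string above is well under a
140-character limit even though it is over 140 bytes.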

Thanks;
  — Matt Sanford / Twitter Engineer

[1] - http://www.fileformat.info/info/unicode/char/30c6/index.htm

On Mar 9, 10:35 am, Cameron Kaiser <spec...@floodgap.com> wrote:
> So I rewrote TTYtter to count in characters instead of bytes, because users
> have been asking for ages for full 140-character tweets, and I was under
> the impression that the API now supported them thanks to Raffi's confirmation.
> Unfortunately, there seems to be a bug as soon as the tweet gets over 140
> bytes (user credentials removed). The Japanese was picked to be exactly 10
> characters long (the "yo" hiragana lands on the 10th character). The return
> block is the response from the server, which is only edited for length. I
> attached the transcript. Notice that as soon as it gets overlength, it bombs.
>
> --
> ------------------------------------ personal: http://www.cameronkaiser.com/ --
>   Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
> -- Shady business do not make for sunny life. -- Charlie Chan -----------------
>
>  utft.txt
