>     Raffi asked me about this but since I have a few moments over
> lunch I figured I would reply to the list. It's been so long but it
> feels good. Anyway, the issue is the last two bytes of your
> URL-encoded values. From the Ruby irb console I can see:
> 
> >> CGI.unescape("%e3%83")
> => "###"
> >> CGI.unescape("%e3%83").unpack('U*')
> ArgumentError: malformed UTF-8 character (expected 3 bytes, given 2
> bytes)
>         from (irb):13:in `unpack'
>         from (irb):13
> 
>     The issue is that %e3%83 is incomplete UTF-8. The lead byte %e3
> must be followed by two continuation bytes, as in the "TE" character
> [1], which is %e3%83%86:
> 
> >> CGI.unescape("%e3%83%86")
> => "___"
> >> CGI.unescape("%e3%83%86").unpack('U*')
> => [12486]
> 
>     Since the exact length of the escape sequence is 140 I'm guessing
> there is still some code truncating the value based on byte counts.

Not sure how I missed that. Thanks for the find, Matt. If I still find some
weirdness after correcting that, I'll report back.
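
For the archive, here is a minimal sketch of a byte-limited truncation that
stops on a character boundary instead of splitting a multibyte sequence. The
helper name truncate_utf8 and the 140-byte limit in the usage lines are
assumptions for illustration, not the code actually in question, and the
sketch assumes the input is valid UTF-8 to begin with:

```ruby
# Hypothetical helper (not from the original thread): trim a UTF-8 string
# to at most max_bytes bytes without leaving a partial character at the end.
# Assumes str is valid UTF-8 before truncation.
def truncate_utf8(str, max_bytes)
  # Cut on a raw byte count first; this may split a multibyte character.
  cut = str.byteslice(0, max_bytes)
  # Drop trailing bytes one at a time until the remainder is valid UTF-8.
  # For valid input this removes at most three bytes.
  cut = cut.byteslice(0, cut.bytesize - 1) until cut.valid_encoding?
  cut
end

te = "\u30C6"                     # the "TE" character, bytes E3 83 86
long = te * 47                    # 141 bytes, one byte over the limit
short = truncate_utf8(long, 140)
puts short.bytesize               # 138
puts short.valid_encoding?        # true
```

A plain byte cut at 140 would leave the trailing bytes E3 83, which is the
same incomplete sequence shown in the irb session above; backing up to the
previous boundary drops the whole character instead.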

-- 
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- Knowledge puffs up, but love builds up. -- 1 Corinthians 8:1 ---------------
