Okay, I'm still not there, but I've realized I've been confusing a few things. This Stack Overflow answer helped a lot:

http://stackoverflow.com/questions/5083032/how-to-utf-8-encode-a-character-string/5083858#5083858

I was conflating Unicode with UTF-8. But I think that's also essentially what is happening somewhere in the process of ASCII-8BIT (the output of Rack::Utils.unescape()) getting converted to UTF-8.

I suppose I have to figure out how to override unescape() in my own initializer, or intercept unescape()'s output and properly encode that.

I think I'm close to a solution, since I'm starting to understand what all the values should be and what is happening. But any help will still be greatly appreciated, since something is still eluding me.

Thanks,
Dave

On May 16, 8:47 PM, ddellacosta <[email protected]> wrote:
> Hi folks,
>
> Here's my basic issue, hopefully this is clear. I'm trying to submit
> some UTF-8 values in my query string, but they are coming out mangled
> on the other end. It *seems* like the problem is that what
> Rack::Utils.unescape() pushes out gets converted to UTF-8 somewhere in
> the chain (using Rails 3.0.7 and Ruby 1.9.2, by the way), and it's
> mangling characters which are two bytes (for example, "%20," which is
> space and a one-byte character, gets converted fine). I feel like I've
> almost figured this out, but I'm still stumped. Here's my "evidence:"
>
> # Example UTF-8 string:
>
> "Adélaïde de Hongrie"
>
> # GET string (obviously URI encoded):
>
> Started GET "/registers/results?filter[title][]=Ad%E9la%EFde%20de%20Hongrie&search=&limit=4" for 127.0.0.1 at 2011-05-16 14:17:33 +0700
>
> # What Rack produces/Rails sees (in Controller):
>
> Parameters: {"filter"=>{"title"=>["Ad\xE9la\xEFde de Hongrie"]},
> "search"=>"", "limit"=>"4"}
>
> # Error I'm getting, when I try to "do stuff" with the above string:
>
> ArgumentError (invalid byte sequence in UTF-8):
>
> # What would actually be a valid string with hex UTF-8 bytes in
> the format above:
>
> "Ad\xC3\xA9la\xC3\xAFde de Hongrie"
>
> Or, in the "\u ..." format (see anything interesting here? Something
> obvious is eluding me...):
>
> "Ad\u{E9}la\u{EF}de de Hongrie"
>
> To be clear, this is not a form, but an ajax query. I've tried adding
> the 'utf8' snowman thing manually too, but that doesn't seem to do
> anything...of course, maybe I'm doing that wrong.
>
> Any thoughts/questions/pointing out of obvious errors or confused ways
> of thinking? I'd also appreciate any pointers to Rails documentation
> which describes in more detail how this stuff happens; I've just been
> digging through the code and it's slow going for me.
>
> Help much appreciated!
>
> Cheers,
> Dave
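
A minimal Ruby sketch of what the evidence in this thread suggests, offered as a guess rather than a confirmed diagnosis: %E9 and %EF are the ISO-8859-1 bytes for é and ï, so the query string may not be UTF-8 encoded in the first place. The sketch uses URI.decode_www_form_component from the standard library as a stand-in for Rack::Utils.unescape(), and the ISO-8859-1 source encoding is an assumption about what the client sent.

```ruby
require 'uri'

# Percent-encoded value copied from the GET line in the quoted message.
raw = 'Ad%E9la%EFde%20de%20Hongrie'

# Decode to raw bytes tagged ASCII-8BIT (a stand-in for Rack::Utils.unescape).
bytes = URI.decode_www_form_component(raw, Encoding::BINARY)

# Relabeling the bytes as UTF-8 changes only the tag, not the bytes,
# and a lone 0xE9 is not a complete UTF-8 sequence.
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?

# Assumption: the client sent ISO-8859-1. Tag the bytes as such first,
# then transcode to UTF-8 so that é becomes the two bytes C3 A9.
fixed = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts fixed
puts fixed.valid_encoding?
puts fixed.bytes.map { |b| format('%02X', b) }.join(' ')
```

If the client side can be changed, percent-encoding the value as UTF-8 there (JavaScript's encodeURIComponent does this, producing %C3%A9 for é) would likely make the transcoding step unnecessary.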

