Update: it looks like this behavior (which I consider buggy) is common
to both Firefox and IE.
Both correctly encode the path part of the URL in UTF-8, but neither
encodes the query string part in UTF-8 (I believe for backward
compatibility with old buggy websites).  Chrome does use UTF-8 for both.

It's easy to verify with netcat:

$ nc -l 5000

Then cut and paste the following URL into the browser's address bar
(the port has to match the one netcat is listening on):
http://localhost:5000/héllo?q=héllo

Netcat will then print the following for Firefox (note %C3%A9 in the
path but %E9 in the query string):

GET /h%C3%A9llo?q=h%E9llo HTTP/1.1
Host: localhost:5000
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
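
To make the difference in the request line above concrete, here is a rough sketch in Python (my choice for illustration only, nothing Solr-specific). It percent-encodes the same string once as UTF-8 and once as ISO-8859-1, then shows what a server that assumes UTF-8 would make of the second form. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

text = "héllo"

# UTF-8 encodes é as two bytes (C3 A9); this matches the path in the request.
utf8_form = quote(text, encoding="utf-8")

# ISO-8859-1 encodes é as one byte (E9); this matches the query string.
latin1_form = quote(text, encoding="iso-8859-1")

print(utf8_form)    # h%C3%A9llo
print(latin1_form)  # h%E9llo

# A server that assumes UTF-8 cannot decode the lone E9 byte;
# with errors="replace" it substitutes U+FFFD for it.
decoded_as_utf8 = unquote(latin1_form, encoding="utf-8", errors="replace")
print(decoded_as_utf8 == "h\ufffdllo")  # True
```

So the query term only survives if the receiving side happens to decode it with the same single-byte charset the browser used.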


Note: this appears to be an issue only when testing with browsers
(i.e. cutting and pasting, manually modifying, or typing unencoded URLs).
When you do something like submit a form from the Solr admin page,
the browser uses the encoding of the form (which is UTF-8) and
everything works fine.
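
If you are testing from a script rather than a browser, one way to sidestep the problem is to percent-encode the query string yourself before sending it. A minimal sketch, again in Python and assuming a placeholder URL (the host, port, and path below are hypothetical, pointed at the netcat listener from the test above):

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the real host, port, and handler path.
base_url = "http://localhost:5000/select"

# urlencode percent-encodes values as UTF-8 by default.
query = urlencode({"q": "héllo"})
url = f"{base_url}?{query}"

print(url)  # http://localhost:5000/select?q=h%C3%A9llo
```

Because the URL is already fully encoded, nothing is left for the client to guess about.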

-Yonik
http://www.lucidimagination.com
