Re: µTorrent indexed as µTorrent

2009-07-30 Thread Bill Au
Thanks, Robert. That's exactly what my problem was. Things work fine after I made sure that all my processing (index and query) is using UTF-8. FYI, it took me a while to discover that SolrJ by default uses a GET request for queries, which uses ISO-8859-1. I had to explicitly use a POST to do

Re: µTorrent indexed as µTorrent

2009-07-30 Thread Yonik Seeley
On Thu, Jul 30, 2009 at 6:34 PM, Bill Au bill.w...@gmail.com wrote:
> FYI, it took me a while to discover that SolrJ by default uses a GET request for query, which uses ISO-8859-1.

That depends on the servlet container. SolrJ GET requests are sent in UTF-8. Some servlet containers such as
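
The point about the servlet container can be illustrated with the JDK alone. The sketch below is only a stand-in for what happens to a GET query string: the client percent-encodes the value as UTF-8, and the result depends on which charset the receiving side assumes when decoding. The class and method names (`QueryStringCharsetDemo`, `encodeUtf8`, `decode`) are hypothetical, and the code does not reproduce how SolrJ or any particular container is implemented.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryStringCharsetDemo {

    // Percent-encode a query value the way a UTF-8 client would send it.
    public static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode a percent-encoded value using the charset the receiver assumes.
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00B5 (micro sign) followed by "Torrent"; escapes keep the source ASCII-only.
        String sent = encodeUtf8("\u00B5Torrent");
        System.out.println(sent); // %C2%B5Torrent

        // Receiver assumes UTF-8: the original text comes back.
        System.out.println(decode(sent, "UTF-8").equals("\u00B5Torrent")); // true

        // Receiver assumes ISO-8859-1: each byte becomes its own character.
        System.out.println(decode(sent, "ISO-8859-1").equals("\u00C2\u00B5Torrent")); // true
    }
}
```

The same bytes arrive in both cases; only the decoding charset differs, which is why the fix lies in how the receiving side is configured or in which request method is used.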

µTorrent indexed as µTorrent

2009-07-28 Thread Bill Au
I am using SolrJ to index the word µTorrent. After a commit I was not able to query for it. It turns out that the document in my Solr index contains the word µTorrent instead of µTorrent. Does anyone have any idea what's going on? Bill

Re: µTorrent indexed as µTorrent

2009-07-28 Thread Robert Muir
Bill, somewhere in the process I think you might be treating your UTF-8 text as ISO-8859-1. Your character: 00B5 (µ) Bits: 10110101 UTF8-encoded: 11000010 10110101 If you were to treat these bytes as ISO-8859-1 (e.g. reading from a file or wrong URL encoding) then it looks like: 0xC2 (Â)
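
The mismatch described above can be reproduced in a few lines of plain Java. This is a minimal sketch, not code from the thread: the names `MojibakeDemo`, `misdecode`, and `utf8Hex` are hypothetical, and nothing here touches Solr or SolrJ. It encodes the text as UTF-8 and then decodes those bytes as ISO-8859-1, which is the assumed source of the extra character.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text as UTF-8, then (incorrectly) decode the bytes as ISO-8859-1.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Space-separated hex dump of the UTF-8 bytes of a string.
    public static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00B5 (micro sign) is written as an escape so the source stays ASCII-only.
        System.out.println(utf8Hex("\u00B5")); // C2 B5

        // Each UTF-8 byte is read as a separate ISO-8859-1 character:
        // 0xC2 becomes U+00C2 and 0xB5 becomes U+00B5.
        String garbled = misdecode("\u00B5Torrent");
        System.out.println(garbled.length()); // 9, one more than the original 8
        System.out.println(garbled.equals("\u00C2\u00B5Torrent")); // true
    }
}
```

The output is checked with `equals` rather than printed directly because what a console shows for non-ASCII characters depends on its own encoding.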