Thanks, Robert. That's exactly what my problem was. Things work fine after
I made sure that all my processing (index and query) is using UTF-8. FYI,
it took me a while to discover that SolrJ by default uses a GET request for
query, which uses ISO-8859-1. I had to explicitly use a POST to do
On Thu, Jul 30, 2009 at 6:34 PM, Bill Au <bill.w...@gmail.com> wrote:
FYI, it took me a while to discover that SolrJ by default uses a GET request
for query, which uses ISO-8859-1.
That depends on the servlet container. SolrJ GET requests are sent in
UTF-8. Some servlet containers such as
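To see why the servlet container matters, here is a JDK-only sketch. It assumes Java 10+ (for the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode`); the class and method names are hypothetical, and no SolrJ or servlet code is involved. The client percent-encodes µTorrent as UTF-8. Decoding those %XX escapes as UTF-8 gives the term back, while decoding the same escapes as ISO-8859-1 gives ÂµTorrent.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of SolrJ): one query term, percent-encoded
// as UTF-8 by the client, decoded two different ways on the server side.
public final class QueryParamCharset {

    // Client side: percent-encode the term's UTF-8 bytes.
    static String encodeUtf8(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Server side, matching charset: the %XX bytes are read back as UTF-8.
    static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Server side, mismatched charset: the same %XX bytes are read as ISO-8859-1.
    static String decodeAsLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String term = "\u00B5Torrent";          // the mu sign followed by "Torrent"
        String encoded = encodeUtf8(term);      // "%C2%B5Torrent"
        String good = decodeAsUtf8(encoded);    // equals term
        String bad = decodeAsLatin1(encoded);   // an extra U+00C2 before the mu sign
        System.out.println(encoded);            // %C2%B5Torrent
        System.out.println(good.equals(term));  // true
        System.out.println(bad.equals(term));   // false
    }
}
```

In Tomcat, for example, the charset used for this decoding step is set by the HTTP Connector's `URIEncoding` attribute; Tomcat 7 and earlier default to ISO-8859-1 when it is not set.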
I am using SolrJ to index the word µTorrent. After a commit I was not able
to query for it. It turns out that the document in my Solr index contains
the word ÂµTorrent instead of µTorrent. Does anyone have any idea what's
going on?
Bill
Bill, somewhere in the process I think you might be treating your
UTF-8 text as ISO-8859-1.
Your character: U+00B5 (µ)
Bits: 10110101
UTF-8 encoded: 11000010 10110101
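Those bits follow from the standard two-byte UTF-8 pattern 110xxxxx 10xxxxxx: U+00B5 is the 11-bit value 00010110101, split into 00010 for the lead byte and 110101 for the continuation byte, giving 0xC2 0xB5. A minimal sketch, assuming Java 11+ (for `String.repeat`); `Utf8TwoByte` is a hypothetical name, and the JDK's own encoder is used only as a cross-check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: hand-encodes a code point in the two-byte UTF-8 range
// (U+0080..U+07FF) to show where the 11 payload bits of U+00B5 end up.
public final class Utf8TwoByte {

    static int[] encode(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte UTF-8 code point");
        }
        int lead = 0xC0 | (codePoint >> 6);    // 110xxxxx: the high 5 payload bits
        int trail = 0x80 | (codePoint & 0x3F); // 10xxxxxx: the low 6 payload bits
        return new int[] {lead, trail};
    }

    // Formats the low 8 bits of b as a zero-padded binary string.
    static String bits(int b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return "0".repeat(8 - s.length()) + s;
    }

    public static void main(String[] args) {
        int[] mine = encode(0xB5);
        System.out.println(bits(mine[0]) + " " + bits(mine[1])); // 11000010 10110101

        // Cross-check against the JDK's own UTF-8 encoder.
        byte[] jdk = "\u00B5".getBytes(StandardCharsets.UTF_8);
        System.out.println(bits(jdk[0]) + " " + bits(jdk[1]));   // 11000010 10110101
    }
}
```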
If you were to treat these bytes as ISO-8859-1 (e.g. when reading from a
file or with the wrong URL encoding), then it looks like:
0xC2 (Â)
0xB5 (µ)
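This diagnosis can be reproduced with the JDK alone. A minimal sketch with a hypothetical class name: it lists the two UTF-8 bytes of µ with their ISO-8859-1 reading (0xC2 is Â, 0xB5 is µ, because in ISO-8859-1 each byte value equals its Unicode code point), then shows that decoding the whole term that way yields ÂµTorrent.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the symptom by decoding UTF-8 bytes
// as ISO-8859-1, then shows that nothing was lost by reversing the step.
public final class Latin1Mojibake {

    // The mistake: UTF-8 bytes decoded as ISO-8859-1, one char per byte.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The reverse: ISO-8859-1 maps each char U+0000..U+00FF back to one byte,
    // recovering the original UTF-8 bytes, which are then decoded correctly.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "\u00B5Torrent"; // the mu sign followed by "Torrent"

        // List the UTF-8 bytes of the mu sign with their ISO-8859-1 reading.
        for (byte b : "\u00B5".getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;
            // Prints 0xC2 (U+00C2), then 0xB5 (U+00B5).
            System.out.printf("0x%02X (U+%04X)%n", value, value);
        }

        String garbled = misdecode(term);
        System.out.println(garbled.equals("\u00C2" + term)); // true
        System.out.println(repair(garbled).equals(term));    // true
    }
}
```

The repair step only demonstrates that the bytes survive the wrong decoding; the fix in this thread was to keep both indexing and querying in UTF-8 throughout, not to patch strings afterwards.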