Re: Cyrillic characters

2006-07-18 Thread WHIRLYCOTT
On Jul 18, 2006, at 5:53 PM, Tricia Williams wrote: that using the packaged example admin interface entering a query with a string of cyrillic characters causes a java.lang.ArrayIndexOutOfBoundsException ... I have this much fixed as well. However, I'm still walking data through the stack

Re: Cyrillic characters

2006-07-18 Thread Yonik Seeley
On 7/18/06, Tricia Williams <[EMAIL PROTECTED]> wrote: My sample query is: .. (the english word _canada_ translated into russian) or %D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (utf-8) or %26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B (solr url encoding) Hi Tricia,
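The two encoded forms quoted above can be related with a short sketch. It assumes only Python's standard library. The variable names are illustrative and not from the thread, and calling the second form "numeric character references, then percent-encoded" is an interpretation of what the message labels "solr url encoding".

```python
from urllib.parse import quote, unquote

# UTF-8 percent-encoded query string, copied from the message above.
utf8_form = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0"

# Decode the percent-escapes as UTF-8 bytes to recover the Cyrillic word.
word = unquote(utf8_form, encoding="utf-8")

# Assumption: the second form is each character written as a decimal
# numeric character reference (&#NNNN;), then percent-encoded as ASCII.
entity_form = "".join(f"&#{ord(ch)};" for ch in word)
entity_url = quote(entity_form, safe="")

print(word)
print(entity_url)
```

If the interpretation holds, `entity_url` reproduces the second string from the message, which would mean the browser substituted character references for characters it could not express in the form's charset.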

Re: Cyrillic characters

2006-07-18 Thread WHIRLYCOTT
I've started poking around and have already fixed one bug related to URL encoding of data. I'm going to work some more on this tonight and will hopefully have a patch for you soon. phil. On Jul 18, 2006, at 6:19 PM, Yonik Seeley wrote: On 7/18/06, WHIRLYCOTT <[EMAIL PROTECTED]> wrote: Ho

Re: Cyrillic characters

2006-07-18 Thread Yonik Seeley
Definitely some Firefox bugs with UTF8 at least: If I go to the admin screen, and paste in héllo into the query box, then kill Solr and run netcat to see exactly what I get, it's the following: $ nc -l -p 8983 GET /solr/select/?stylesheet=&q=h%E9llo&version=2.1&start=0&rows=10&indent=on HTTP/1.1
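To see why `q=h%E9llo` is a problem for a server that expects UTF-8, here is a small sketch using only the Python standard library. The variable names are illustrative. The single byte 0xE9 is "é" in ISO-8859-1 (Latin-1) but is not a valid UTF-8 sequence on its own.

```python
from urllib.parse import quote, unquote_to_bytes

# The q parameter exactly as captured by netcat in the message above.
raw = unquote_to_bytes("h%E9llo")

# Interpreted as Latin-1, the bytes give back the pasted text.
as_latin1 = raw.decode("latin-1")

# Interpreted as UTF-8, decoding fails: 0xE9 starts a multi-byte
# sequence, but the next byte ("l") is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# What a UTF-8 percent-encoding of the same text would look like.
utf8_form = quote("héllo", safe="")

print(as_latin1, utf8_ok, utf8_form)
```

So the request on the wire carries the query in a single-byte encoding, and any component that decodes it as UTF-8 will either fail or substitute a replacement character.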

Re: Cyrillic characters

2006-07-18 Thread Chris Hostetter
: ps. I am using mozilla firefox as my main browser which leads to the : behaviour I reported above. IE 6.0 works fine for cyrillics although : there is still a strange but different encoding (%CA%E0%ED%E0%E4%E0 for : the same query as before). The problem may not be in the Solr internals as mu
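The "strange but different encoding" from IE 6.0 can be checked against a legacy Cyrillic code page. The sketch below uses only the Python standard library. That IE chose Windows-1251 is an assumption the thread does not state. The code only shows that the bytes are consistent with it.

```python
from urllib.parse import unquote, unquote_to_bytes

# Query string reported for IE 6.0 in the message above.
ie_bytes = unquote_to_bytes("%CA%E0%ED%E0%E4%E0")

# The UTF-8 form of the same query, quoted elsewhere in the thread.
expected = unquote("%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0", encoding="utf-8")

# Assumption: try Windows-1251 (Python codec name "cp1251").
decoded = ie_bytes.decode("cp1251")

print(decoded == expected)
```

If the comparison is true, both browsers are sending the same word, just in different byte encodings, which fits the suggestion that the problem may not lie in the Solr internals.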

Re: Cyrillic characters

2006-07-18 Thread Yonik Seeley
OK, let's split up the indexing side from the query side for a moment and assume that you are indexing correctly (setting the content-type correctly, etc). I just added a new value to the multi-valued features field to the solr.xml example document: "Good unicode support: héllo (hello with an acc

Re: Cyrillic characters

2006-07-18 Thread Yonik Seeley
On 7/18/06, WHIRLYCOTT <[EMAIL PROTECTED]> wrote: How much testing have people done using UTF-8 data on Solr? UTF-8 query *output* is well tested with Resin within CNET. Indexing UTF-8 is also well tested (again, mostly with Resin). UTF-8 query input is not really tested at all AFAIK (the q par

Re: Cyrillic characters

2006-07-18 Thread WHIRLYCOTT
Crap, you're right. I have a well-tested application that's using UTF-8 everywhere possible and I just tested with some Russian text. Solr's coughing up this as an exception: Jul 18, 2006 6:00:05 PM org.apache.solr.core.SolrException log SEVERE: java.lang.ArrayIndexOutOfBoundsException: 1

Cyrillic characters

2006-07-18 Thread Tricia Williams
Hi all, I'm trying to adapt our old cocoon/lucene based web search application to one that is more solrish. Our old web app was capable of searching for queries with cyrillic characters in them. I'm finding that using the packaged example admin interface entering a query with a string of