On Jul 16, 2008, at 4:33 AM, Stefan Oestreicher wrote:
Yes you're right. I was testing with analysis.jsp but it chokes on
multibyte chars.
I modified the jsp and set the encoding using
request.setCharacterEncoding("UTF-8");
and it's working fine. Bug in analysis.jsp?
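
A minimal sketch of why the missing request encoding could produce this symptom, assuming the container fell back to ISO-8859-1 for the form parameter (the servlet default when no encoding is set). The class and method names are hypothetical, and Character.isLetter is only a stand-in for whatever character classification WordDelimiterFilter applies; this is not the filter's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr or analysis.jsp.
public class EncodingDemo {

    // Encode `text` as UTF-8 (what the browser sends), then decode the bytes
    // with `assumed` (what the container believes the request encoding is).
    public static String redecode(String text, Charset assumed) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, assumed);
    }

    // True if every char in `s` is a Unicode letter.
    // Assumption: used here only as a rough proxy for "no word delimiter inside".
    public static boolean allLetters(String s) {
        return s.chars().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        String word = "h\u00e4lse"; // "hälse"

        // Wrong assumption: the two UTF-8 bytes of 'ä' become U+00C3 and U+00A4.
        String wrong = redecode(word, StandardCharsets.ISO_8859_1);
        // Right assumption: the word survives unchanged.
        String right = redecode(word, StandardCharsets.UTF_8);

        // prints "6 chars, all letters: false"
        System.out.println(wrong.length() + " chars, all letters: " + allLetters(wrong));
        // prints "5 chars, all letters: true"
        System.out.println(right.length() + " chars, all letters: " + allLetters(right));
    }
}
```

Under the wrong assumption the word contains a non-letter (U+00A4, a currency sign) where the "ä" used to be, which a delimiter-based filter would plausibly treat as a split point. Note that request.setCharacterEncoding("UTF-8") only takes effect if it is called before any request parameter is read.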
Yeah, it's recently been
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Yonik Seeley
> Sent: Tuesday, July 15, 2008 6:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: WordDelimiterFilter splits at non-ASCII chars
>
> On Tue, Jul 15, 2008 at 10:29 AM, Stefan Oestreicher
>
On Tue, Jul 15, 2008 at 10:29 AM, Stefan Oestreicher
<[EMAIL PROTECTED]> wrote:
> as I understand the WordDelimiterFilter should split on case changes, word
> delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does however. The wo
Hi Stefan,
I wrote a test case for the problem you described, but it is working fine. I
used the following definition:
What configuration are you using? If it is different, please share it so
that I can test with it.
On Tue, Jul 15, 2008 at 7:59 PM, Stefan Oestreicher
<[EMAIL PROTECTED]> wrote:
Hi,
as I understand the WordDelimiterFilter should split on case changes, word
delimiters and changes from character to digit, but it should not
differentiate between ASCII and multibyte chars. It does however. The word
"hälse" (German plural of "neck") gets split into "h", "ä" and "lse", which
un