Also, I don't think bean.LOG outputs UTF-8, only ISO-Latin-1 (ISO-8859-1).
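To illustrate: the `<E4>` in the quoted log line `query: k<E4>si` is consistent with ISO-8859-1, which encodes `ä` as the single byte 0xE4. UTF-8 encodes the same character as the two bytes 0xC3 0xA4. A minimal Java sketch that prints both encodings of the query word (the class name `EncodingBytes` is just for illustration):

```java
import java.nio.charset.StandardCharsets; // StandardCharsets needs Java 7+

public class EncodingBytes {
    // Format a byte array as space-separated upper-case hex, e.g. "6B E4 73 69".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "k\u00e4si"; // "käsi", escaped so the source file encoding does not matter
        // ISO-8859-1 (Latin-1): one byte per character, so 'ä' becomes 0xE4.
        System.out.println("ISO-8859-1: " + hex(word.getBytes(StandardCharsets.ISO_8859_1))); // 6B E4 73 69
        // UTF-8: 'ä' (U+00E4) needs two bytes, 0xC3 0xA4.
        System.out.println("UTF-8:      " + hex(word.getBytes(StandardCharsets.UTF_8)));      // 6B C3 A4 73 69
    }
}
```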

2010/4/1, MilleBii <mille...@gmail.com>:
> I use tomcat6 with UTF-8 and have no problems.
>
> Don't forget that a Java String is Unicode (a sequence of UTF-16 chars), not
> UTF-8, so in the query filter you should think in Unicode, not UTF-8.
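Following that advice, one way to hand the word to native code that expects UTF-8 is to encode the String explicitly instead of relying on the JVM's default charset. (Note that JNI's `GetStringUTFChars` returns the JVM's *modified* UTF-8, which differs from standard UTF-8 for U+0000 and for characters outside the BMP.) A minimal sketch, assuming the native side takes a NUL-terminated `char*`; the class and method names are hypothetical, not part of Nutch:

```java
import java.nio.charset.StandardCharsets; // StandardCharsets needs Java 7+
import java.util.Arrays;

public class Utf8ForNative {
    /**
     * Hypothetical helper: encode a Java String as standard UTF-8 and append a
     * terminating 0 byte, so native code expecting a C string receives valid
     * UTF-8 regardless of the JVM's default charset.
     */
    static byte[] toNulTerminatedUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Arrays.copyOf pads the extra slot with 0, which serves as the C terminator.
        return Arrays.copyOf(utf8, utf8.length + 1);
    }

    public static void main(String[] args) {
        byte[] bytes = toNulTerminatedUtf8("k\u00e4si"); // "käsi"
        // 5 UTF-8 bytes (6B C3 A4 73 69) plus the terminator.
        System.out.println(bytes.length); // 6
    }
}
```

The native method would then take a `byte[]` rather than a `String` (for example a hypothetical `native int analyze(byte[] utf8Word)`), so the C side reads the exact bytes produced above.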
>
> 2010/4/1, Hannu Väisänen <hvais...@joyx.joensuu.fi>:
>> I have a query filter that works when I search from the command line:
>>
>> $ bin/nutch org.apache.nutch.searcher.NutchBean word
>>
>>
>> However, when I search through tomcat6 for a word that contains non-ASCII
>> letters, the query filter crashes when it calls native code.
>>
>> The filter assumes that its input is in UTF-8, and
>> I have configured tomcat6 to use UTF-8 everywhere.
>>
>> So either I have configured tomcat6 incorrectly or I should
>> configure Nutch to use UTF-8.
>>
>>
>> This is a log snippet from the file catalina.out (MorphologyHVQueryFilter
>> is my query filter):
>>
>> $ less catalina.out
>>
>> 2010-03-31 19:30:05,455 INFO  NutchBean - query request from ::1
>> 2010-03-31 19:30:05,466 INFO  NutchBean - query: k<E4>si   <==== This is not UTF-8.
>> 2010-03-31 19:30:05,466 INFO  NutchBean - lang: fi
>> 2010-03-31 19:30:05,472 INFO  NutchBean - searching for 20 raw hits
>> 2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - MorphologyHVQueryFilter.filter käsi @ +(url:käsi^4.0 anchor:käsi^2.0 content:käsi title:käsi^1.5 host:käsi^2.0)
>> 2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - clauses.length 1
>> 2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - TermSet [käsi]
>> Clause käsi
>> 2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - Word [käsi]
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x0115f0e5, pid=3840, tid=2392849264
>>
>>
>> As you can see, the INFO output from NutchBean is not in UTF-8.
>> Does that mean that I should configure Nutch or reconfigure tomcat6?
>>
>> Do you have any ideas on what I should do next?
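For comparison, this is roughly what a Tomcat-side decoding mistake would look like: if the UTF-8 bytes of the query were decoded as ISO-8859-1, `käsi` would reach the filter as `kÃ¤si`. Checking which form the filter actually receives can help decide between the two options above. A small sketch reproducing that mis-decoding and its reversal (the class and method names are illustrative only):

```java
import java.nio.charset.StandardCharsets; // StandardCharsets needs Java 7+

public class MisdecodeDemo {
    // Simulates a server reading UTF-8 request bytes as if they were ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses misdecode(): ISO-8859-1 maps chars U+0000..U+00FF back to the
    // same byte values, so re-encoding recovers the original UTF-8 bytes exactly.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "k\u00e4si"; // "käsi"
        String bad = misdecode(word);
        // 'ä' (C3 A4 in UTF-8) turns into the two characters U+00C3 'Ã' and U+00A4 '¤'.
        System.out.println(bad.equals("k\u00c3\u00a4si")); // true
        System.out.println(repair(bad).equals(word));      // true
    }
}
```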
>>
>
>
> --
> -MilleBii-
>


-- 
-MilleBii-
