Also, I don't think the bean.LOG works in UTF-8, but only in ISO Latin-1.

2010/4/1, MilleBii <mille...@gmail.com>:
> I use tomcat6 in UTF-8, no problem.
>
> Don't forget a Java String is Unicode, not UTF-8, so in the query filter
> you should think Unicode and not UTF-8.
>
> 2010/4/1, Hannu Väisänen <hvais...@joyx.joensuu.fi>:
>> I have a query filter that works when I search from the command line
>>
>> $ bin/nutch org.apache.nutch.searcher.NutchBean word
>>
>>
>> The query filter crashes when it calls native code when I search
>> through tomcat6 for a word that contains letters that are not in
>> ASCII.
>>
>> Filter assumes that its input is in UTF-8 and
>> I have configured tomcat6 to use UTF-8 everywhere.
>>
>> So either I have configured tomcat6 incorrectly or I should
>> configure Nutch to use UTF-8.
>>
>>
>> This is a log snippet from file catalina.out (MorphologyHVQueryFilter is
>> my query filter),
>>
>> $ less catalina.out
>>
>> 2010-03-31 19:30:05,455 INFO NutchBean - query request from ::1
>> 2010-03-31 19:30:05,466 INFO NutchBean - query: k<E4>si    <==== This is not UTF-8.
>> 2010-03-31 19:30:05,466 INFO NutchBean - lang: fi
>> 2010-03-31 19:30:05,472 INFO NutchBean - searching for 20 raw hits
>> 2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter -
>> MorphologyHVQueryFilter.filter käsi @ +(url:käsi^4.0 anchor:käsi^2.0
>> content:käsi title:käsi^1.5 host:käsi^2.0)
>> 2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - clauses.length 1
>> 2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - TermSet
>> [käsi]
>> Clause käsi
>> 2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - Word [käsi]
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGSEGV (0xb) at pc=0x0115f0e5, pid=3840, tid=2392849264
>>
>>
>> As you can see, the INFO output from NutchBean is not in UTF-8.
>> Does that mean that I should configure Nutch or reconfigure tomcat6?
>>
>> Do you have any ideas on what I should do next?
>>
>
>
> --
> -MilleBii-
>
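
To illustrate the Unicode versus UTF-8 point, here is a minimal sketch in plain Java. It assumes nothing about Nutch or about how the filter hands text to the native library, which the thread does not show. The class name EncodingSketch and the helpers toUtf8 and hex are hypothetical, made up for this example. It shows that a String has no byte encoding until you pick one, and that the same word gives different bytes in UTF-8 and in ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Nutch.
final class EncodingSketch {

    // Encode explicitly to UTF-8; a native library that expects UTF-8
    // would need bytes like these rather than the String itself.
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as upper-case hex so the two encodings can be compared.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "k\u00E4si"; // "käsi", written with an escape so the source file encoding does not matter

        // The String holds 4 Unicode characters, whatever the byte encoding.
        System.out.println(word.length());                                    // 4

        // UTF-8 encodes the a-umlaut as two bytes, C3 A4.
        System.out.println(hex(toUtf8(word)));                                // 6BC3A47369

        // ISO-8859-1 encodes it as the single byte E4.
        System.out.println(hex(word.getBytes(StandardCharsets.ISO_8859_1)));  // 6BE47369
    }
}
```

The single E4 byte is what less displays as <E4> in a UTF-8 terminal, so the k<E4>si line would be consistent with the log being written in ISO Latin-1. That alone says nothing about which bytes the filter passed to the native code.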
--
-MilleBii-