I have a query filter that works when I search from the command line:

  $ bin/nutch org.apache.nutch.searcher.NutchBean word
The query filter crashes when it calls native code when I search through tomcat6 for a word that contains non-ASCII letters. The filter assumes that its input is in UTF-8, and I have configured tomcat6 to use UTF-8 everywhere. So either I have configured tomcat6 incorrectly, or I need to configure Nutch to use UTF-8.

This is a log snippet from catalina.out (MorphologyHVQueryFilter is my query filter):

  $ less catalina.out
  2010-03-31 19:30:05,455 INFO NutchBean - query request from ::1
  2010-03-31 19:30:05,466 INFO NutchBean - query: k<E4>si   <==== This is not UTF-8.
  2010-03-31 19:30:05,466 INFO NutchBean - lang: fi
  2010-03-31 19:30:05,472 INFO NutchBean - searching for 20 raw hits
  2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - MorphologyHVQueryFilter.filter käsi @ +(url:käsi^4.0 anchor:käsi^2.0 content:käsi title :käsi^1.5 host:käsi^2.0)
  2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - clauses.length 1
  2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - TermSet [käsi] Clause käsi
  2010-03-31 19:30:05,472 INFO MorphologyHVQueryFilter - Word [käsi]
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  # SIGSEGV (0xb) at pc=0x0115f0e5, pid=3840, tid=2392849264

As you can see, the query that NutchBean logs is not in UTF-8. Does that mean that I should configure Nutch, or reconfigure tomcat6? Do you have any ideas on what I should do next?
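
To make the byte-level difference concrete, here is a minimal Java sketch. It shows that "käsi" encoded as ISO-8859-1 contains the single byte E4 (what less displays as <E4>), while the UTF-8 encoding uses the two bytes C3 A4 for the same letter. It also shows one way to check that a byte array is well-formed UTF-8 before it is handed to native code. The class and method names (EncodingCheck, hex, isValidUtf8) are hypothetical and are not part of Nutch or of MorphologyHVQueryFilter; how the real filter passes its input to the native library is not shown in the log, so treat this as an illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of Nutch).
public class EncodingCheck {

    // Render a byte array as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Return true only if the bytes form a well-formed UTF-8 sequence.
    // A strict decoder reports malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e4 is the letter a-umlaut; the escape keeps this source file ASCII-only.
        String word = "k\u00e4si";

        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1: " + hex(latin1) + " valid UTF-8? " + isValidUtf8(latin1));
        System.out.println("UTF-8:      " + hex(utf8) + " valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

Calling a check like isValidUtf8 on the bytes just before the native call, and logging hex(...) of the same bytes, would show whether the native library ever receives an E4 byte that is not part of a valid UTF-8 sequence.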