I have a query filter that works when I search from the command line:

$ bin/nutch org.apache.nutch.searcher.NutchBean word


When I search through tomcat6 for a word that contains non-ASCII
letters, the query filter crashes as soon as it calls native code.

The filter assumes that its input is in UTF-8, and
I have configured tomcat6 to use UTF-8 everywhere.

So either I have configured tomcat6 incorrectly or I should
configure Nutch to use UTF-8.


This is a log snippet from catalina.out (MorphologyHVQueryFilter is my
query filter):

$ less catalina.out

2010-03-31 19:30:05,455 INFO  NutchBean - query request from ::1
2010-03-31 19:30:05,466 INFO  NutchBean - query: k<E4>si             <==== This is not UTF-8.
2010-03-31 19:30:05,466 INFO  NutchBean - lang: fi
2010-03-31 19:30:05,472 INFO  NutchBean - searching for 20 raw hits
2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - MorphologyHVQueryFilter.filter käsi @ +(url:käsi^4.0 anchor:käsi^2.0 content:käsi title:käsi^1.5 host:käsi^2.0)
2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - clauses.length 1
2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - TermSet      [käsi] Clause käsi
2010-03-31 19:30:05,472 INFO  MorphologyHVQueryFilter - Word [käsi]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0115f0e5, pid=3840, tid=2392849264


As you can see, the INFO output from NutchBean is not in UTF-8.
Does that mean that I should configure Nutch or reconfigure tomcat6?
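
For comparison, here is a minimal Java sketch that prints the JVM's
default charset and the bytes of "käsi" in ISO-8859-1 and in UTF-8.
The class name EncodingCheck and the helper hex() are made up for
illustration; they are not part of Nutch or of my filter. The single
byte E4 in the NutchBean log line matches the ISO-8859-1 form. I am
only assuming that the log is written with the JVM default charset,
I have not verified that.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Nutch.
public class EncodingCheck {

    // Encode s with the named charset and return the bytes as
    // space-separated lowercase hex.
    public static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "käsi", written with an escape so that the encoding of this
        // source file does not matter.
        String word = "k\u00e4si";

        // The charset the JVM uses when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println("ISO-8859-1: " + hex(word, "ISO-8859-1")); // 6b e4 73 69
        System.out.println("UTF-8:      " + hex(word, "UTF-8"));      // 6b c3 a4 73 69
    }
}
```

If the default charset printed under tomcat6 differs from the one
printed from the command line, that might explain why the same query
behaves differently in the two environments.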

Do you have any ideas on what I should do next?
