Hi all,
We're trying to implement a Nutch app (version 0.8) that allows for
Boolean OR, e.g. (this OR that) AND (something OR other). I've found some
relevant posts in the mailing list archive, but I think I'm missing
something. For example, here's a snippet from a post from Doug Cutting:
<snip>
that said, one can implement OR as a filter (replacing or altering
BasicQueryFilter) that scans for terms whose text is "OR" in the default
field.
</snip>
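For concreteness, the scan described in that snippet might look roughly
like the plain-Java sketch below. It is only an illustration: it works on
a list of already-tokenized strings rather than on Nutch classes, the
names OrGrouper and groupByOr are made up for this post, and it assumes
the "OR" tokens are still present in the default field when the filter
runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrGrouper {

    // Hypothetical helper, not part of Nutch: groups tokens into clauses.
    // Tokens joined by a literal "OR" share a clause; separate clauses
    // are treated as implicitly ANDed together.
    static List<List<String>> groupByOr(List<String> tokens) {
        List<List<String>> clauses = new ArrayList<List<String>>();
        boolean joinNext = false;
        for (String token : tokens) {
            if ("OR".equals(token)) {
                // "OR" only makes sense after an operand; ignore a leading one.
                joinNext = !clauses.isEmpty();
                continue;
            }
            if (joinNext) {
                // Attach this operand to the clause that preceded the "OR".
                clauses.get(clauses.size() - 1).add(token);
            } else {
                // Start a new (required) clause.
                List<String> clause = new ArrayList<String>();
                clause.add(token);
                clauses.add(clause);
            }
            joinNext = false;
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("this", "OR", "that", "something", "OR", "other");
        // prints [[this, that], [something, other]]
        System.out.println(groupByOr(tokens));
    }
}
```

In a real filter, each inner list would presumably become a group of
optional clauses, with the groups themselves required.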
The problem I'm finding is that the NutchAnalysis analyzer seems to
swallow all boolean terms before the QueryFilter is even executed
(perhaps because OR is a stop word?). To wit:
    String queryText = "this OR that";
    org.apache.nutch.searcher.Query query =
        org.apache.nutch.searcher.Query.parse(queryText, conf);
    for (int i = 0; i < query.getTerms().length; i++) {
        System.out.println("Term = " + query.getTerms()[i]);
    }
This results in output that looks like this:
Term = this
Term = that
So am I correct in believing that in order to implement boolean OR using
Nutch search and a QueryFilter, one must also (minimally) hack the
NutchAnalysis.jj file to produce a new analyzer? Also, given that a
Nutch Query object doesn't seem to have a method to add a non-required
Term or Phrase, does that need to be modified as well?
Sorry for the long post, and thanks in advance...
-David Odmark