This looks like a good approach. Note also that you will probably need
to change BasicQueryFilter and perhaps other filters to work correctly
with optional terms.
Nguyen Ngoc Giang wrote:
Sorry, I'm a newbie in open source, and I'm not familiar with how to submit
patches :D
I'll post my solution here first to get comments from the community. Since
we must distinguish three possibilities (must have, may have, and must not
have), we need at least two boolean variables in
org.apache.nutch.searcher.Query. These two variables are isRequired and
isProhibited.
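
As a rough illustration of how two booleans cover the three cases, here is a minimal sketch. ClauseSketch and describe() are hypothetical names made up for this example; this is not the actual Nutch Query.Clause code, only the isRequired/isProhibited pair comes from the message above.

```java
// Hypothetical sketch: ClauseSketch is NOT the real Nutch Query.Clause.
// It only shows how isRequired and isProhibited encode three states.
final class ClauseSketch {
  private final String term;
  private final boolean isRequired;
  private final boolean isProhibited;

  ClauseSketch(String term, boolean isRequired, boolean isProhibited) {
    // The fourth combination (both true) has no meaning, so reject it.
    if (isRequired && isProhibited) {
      throw new IllegalArgumentException(
          "a clause cannot be both required and prohibited");
    }
    this.term = term;
    this.isRequired = isRequired;
    this.isProhibited = isProhibited;
  }

  // Maps the two flags onto the three possibilities named in the thread.
  String describe() {
    if (isProhibited) return "must not have";
    if (isRequired) return "must have";
    return "may have";
  }

  public static void main(String[] args) {
    System.out.println(new ClauseSketch("this", true, false).describe());
    System.out.println(new ClauseSketch("that", false, false).describe());
    System.out.println(new ClauseSketch("other", false, true).describe());
  }
}
```

With this encoding, an optional term is simply the case where both flags are false.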
-First, I define an OR token separately in the jj file. It is placed before
<WORD>, so it looks like this:
<OR: "OR">
-Second, I define a new function called disjunction:
void disjunction() :
{}
{
  <OR> nonOpOrTerm()
}
-Third, in the function parse(), I declare a boolean variable disj,
initialized to false so that it is definitely assigned before it is read:
boolean disj = false;
-Fourth, inside parse(), once the lookahead has finished, we check for the
OR token:
( LOOKAHEAD ... )?
// check OR
(disjunction() { disj = true; })*
-Finally, I changed the handling portion in parse():
if (stop
    && field == Clause.DEFAULT_FIELD
    && terms.size()==1
    && isStopWord(array[0])) {
  // ignore stop words only when single, unadorned terms in default field
} else {
  if (prohibited)
    query.addProhibitedPhrase(array, field);
  else if (disj)
    query.addOptionalPhrase(array, field);
  else
    query.addRequiredPhrase(array, field);
}
That completes the changes to the jj file. Please note that I also had to
add the method addOptionalPhrase() to org.apache.nutch.searcher.Query. This
method basically sets isRequired=false and isProhibited=false. The rest is
already taken care of by Nutch.
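
To make the dispatch in parse() and the new addOptionalPhrase() concrete, here is a minimal, self-contained sketch. QuerySketch is a hypothetical stand-in for org.apache.nutch.searcher.Query, not the real class: it drops the field argument, and the "+"/"-" rendering in toString() and the addPhrase() helper are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.nutch.searcher.Query. Only the three
// add*Phrase methods discussed in this thread are modelled, without fields.
final class QuerySketch {
  private final List<String> clauses = new ArrayList<>();

  void addRequiredPhrase(String[] terms)   { add(terms, true, false); }
  void addProhibitedPhrase(String[] terms) { add(terms, false, true); }
  // The new method: isRequired=false and isProhibited=false.
  void addOptionalPhrase(String[] terms)   { add(terms, false, false); }

  private void add(String[] terms, boolean isRequired, boolean isProhibited) {
    // "+" marks required, "-" marks prohibited, no prefix marks optional.
    String prefix = isProhibited ? "-" : (isRequired ? "+" : "");
    clauses.add(prefix + String.join(" ", terms));
  }

  // Mirrors the if/else chain in parse(): prohibited is checked first,
  // then disj, and anything else is required.
  void addPhrase(String[] terms, boolean prohibited, boolean disj) {
    if (prohibited) {
      addProhibitedPhrase(terms);
    } else if (disj) {
      addOptionalPhrase(terms);
    } else {
      addRequiredPhrase(terms);
    }
  }

  @Override
  public String toString() {
    return String.join(" ", clauses);
  }

  public static void main(String[] args) {
    QuerySketch q = new QuerySketch();
    q.addPhrase(new String[] {"this"}, false, false);
    q.addPhrase(new String[] {"that"}, false, true);
    System.out.println(q); // prints "+this that"
  }
}
```

Note that in this chain a term that is both prohibited and preceded by OR is treated as prohibited, because that branch is tested first.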
Regards,
Giang
On 3/15/06, Laurent Michenaud <[EMAIL PROTECTED]> wrote:
I would like to use Boolean Query too :)
-----Original Message-----
From: Alexander Hixon [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 15 March 2006 08:38
To: [email protected]
Subject: RE: Boolean OR QueryFilter
Maybe you could post the code on JIRA, if anyone else wishes to use
Boolean operators in their search queries..? We could probably get a
developer or two to put this in the 0.8 release? Since it IS open source.
;)
Just a thought,
Alex
-----Original Message-----
From: Nguyen Ngoc Giang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 15 March 2006 3:45 PM
To: [email protected]; [EMAIL PROTECTED]
Subject: Re: Boolean OR QueryFilter
Hi David,
I also did a similar task. In fact, I hacked into jj code to add the
definition for OR and NOT. If you need any help, don't hesitate to contact
me :).
Regards,
Giang
PS: I also believe that a hack to jj code is necessary.
On 3/8/06, David Odmark <[EMAIL PROTECTED]> wrote:
Hi all,
We're trying to implement a nutch app (version 0.8) that allows for
Boolean OR e.g. (this OR that) AND (something OR other). I've found
some relevant posts in the mailing list archive, but I think I'm
missing something. For example, here's a snippet from a post from Doug
Cutting:
<snip>
that said, one can implement OR as a filter (replacing or altering
BasicQueryFilter) that scans for terms whose text is "OR" in the
default field.
</snip>
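
For reference, the idea in the quoted snippet could look roughly like the sketch below. It assumes the query terms arrive as plain strings and that the literal "OR" is still present among them; OrScanSketch and mark() are hypothetical names, and the "+" prefix for required terms is only a convention of this example, not a Nutch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of scanning for terms whose text is "OR".
// It does not use any Nutch classes.
final class OrScanSketch {
  // Returns the search terms without the "OR" operators. A term next to
  // an "OR" is optional (no prefix); every other term is required ("+").
  static List<String> mark(List<String> terms) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < terms.size(); i++) {
      String t = terms.get(i);
      if (t.equals("OR")) {
        continue; // the operator itself is not a search term
      }
      boolean afterOr = i > 0 && terms.get(i - 1).equals("OR");
      boolean beforeOr = i + 1 < terms.size() && terms.get(i + 1).equals("OR");
      out.add((afterOr || beforeOr) ? t : "+" + t);
    }
    return out;
  }

  public static void main(String[] args) {
    // prints [this, that, +other]
    System.out.println(mark(Arrays.asList("this", "OR", "that", "other")));
  }
}
```

This only works if "OR" actually reaches the filter as a term.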
The problem I'm finding is that the NutchAnalysis analyzer seems to be
swallowing all boolean terms by the time the QueryFilter is even
executed (perhaps because OR is a stop word?). To wit:
String queryText = "this OR that";
org.apache.nutch.searcher.Query query =
    org.apache.nutch.searcher.Query.parse(queryText, conf);
for (int i = 0; i < query.getTerms().length; i++) {
  System.out.println("Term = " + query.getTerms()[i]);
}
This results in output that looks like this:
Term = this
Term = that
So am I correct in believing that in order to implement boolean OR
using Nutch search and a QueryFilter, one must also (minimally) hack
the NutchAnalysis.jj file to produce a new analyzer? Also, given that
a Nutch Query object doesn't seem to have a method to add a
non-required Term or Phrase, does that need to be modified as well?
Sorry for the long post, and thanks in advance...
-David Odmark
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general