This looks like a good approach. Note also that you will probably need to change BasicQueryFilter and perhaps other filters to work correctly with optional terms.
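
As a rough illustration of what "working correctly with optional terms" could mean inside a filter, here is a minimal sketch of the three-way mapping a filter would need when it translates clauses into boolean occurrences. The names SketchOccur and OccurMapper are hypothetical; SketchOccur only stands in for Lucene's BooleanClause.Occur (MUST, SHOULD, MUST_NOT), and none of this is the actual BasicQueryFilter code.

```java
// Hypothetical stand-in for Lucene's BooleanClause.Occur; not a Nutch or Lucene type.
enum SketchOccur { MUST, SHOULD, MUST_NOT }

// Hypothetical helper: maps the two flags on a clause to one of three occurrences.
final class OccurMapper {
    private OccurMapper() {}

    static SketchOccur toOccur(boolean isRequired, boolean isProhibited) {
        if (isProhibited) {
            return SketchOccur.MUST_NOT;  // "must not have"; checked first, so it wins
        }
        if (isRequired) {
            return SketchOccur.MUST;      // "must have"
        }
        return SketchOccur.SHOULD;        // neither flag set: "may have" (optional)
    }
}
```

A filter that only ever distinguishes required from prohibited would silently treat the third case as one of the other two, which is the kind of spot that would need checking.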

Nguyen Ngoc Giang wrote:
Sorry, I'm a newbie in open source, and I'm not familiar with the way of
submitting patches :D
I'll try to put my solution here first to receive comments from our
community. Since we must differentiate 3 possibilities (must have, may have
and must not have), we need at least 2 boolean variables in
org.apache.nutch.searcher.Query. In fact, these 2 boolean variables are
isRequired and isProhibited.

-In the first step, I define an OR token separately in the jj file. This is
put before <WORD>, so it looks like this:
<OR: "OR">

-Second, I define a new function called disjunction:
void disjunction() :
{}
{
    <OR> nonOpOrTerm()
}

-Third, in the function parse(), I declare a boolean variable disj,
initialized to false so that it is definitely assigned before it is read:
boolean disj = false;

-Fourth, inside parse(), once we have finished looking ahead, we check for
the OR token:
( LOOKAHEAD ... )?
// check OR
(disjunction() { disj = true; })*

-Finally, I changed the handling portion in parse():
if (stop
    && field == Clause.DEFAULT_FIELD
    && terms.size() == 1
    && isStopWord(array[0])) {
  // ignore stop words only when single, unadorned terms in default field
} else {
  if (prohibited)
    query.addProhibitedPhrase(array, field);
  else if (disj)
    query.addOptionalPhrase(array, field);
  else
    query.addRequiredPhrase(array, field);
}

  After this point, I have finished changing the jj file. Please note that I
also had to add the method addOptionalPhrase() to
org.apache.nutch.searcher.Query. This method basically sets isRequired=false
and isProhibited=false. The rest is already taken care of by Nutch.
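
  To make the two-flag idea concrete, here is a minimal, self-contained sketch of how the three add methods could record their clauses. SketchQuery and its nested Clause are hypothetical stand-ins written for illustration only; they are not the real org.apache.nutch.searcher.Query, and the field constant and isOptional() helper are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.nutch.searcher.Query; illustration only.
class SketchQuery {
    static final String DEFAULT_FIELD = "DEFAULT";  // assumed constant name and value

    static final class Clause {
        final String[] terms;
        final String field;
        final boolean isRequired;
        final boolean isProhibited;

        Clause(String[] terms, String field, boolean isRequired, boolean isProhibited) {
            this.terms = terms.clone();
            this.field = field;
            this.isRequired = isRequired;
            this.isProhibited = isProhibited;
        }

        // Assumed helper: "may have" is the state where neither flag is set.
        boolean isOptional() {
            return !isRequired && !isProhibited;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    void addRequiredPhrase(String[] terms, String field) {
        clauses.add(new Clause(terms, field, true, false));   // must have
    }

    void addProhibitedPhrase(String[] terms, String field) {
        clauses.add(new Clause(terms, field, false, true));   // must not have
    }

    void addOptionalPhrase(String[] terms, String field) {
        clauses.add(new Clause(terms, field, false, false));  // may have
    }

    List<Clause> getClauses() {
        return clauses;
    }
}
```

  With this shape, a caller can add "that" through addOptionalPhrase() and later tell it apart from required and prohibited clauses by checking both flags.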

  Regards,
  Giang


On 3/15/06, Laurent Michenaud <[EMAIL PROTECTED]> wrote:

I would like to use Boolean Query too :)

-----Original Message-----
From: Alexander Hixon [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 15 March 2006 08:38
To: [email protected]
Subject: RE: Boolean OR QueryFilter

Maybe you could post the code on JIRA, if anyone else wishes to use
Boolean operators in their search queries..? We could probably get a
developer or two to put this in the 0.8 release? Since it IS open source.
;)

Just a thought,
Alex

-----Original Message-----
From: Nguyen Ngoc Giang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 15 March 2006 3:45 PM
To: [email protected]; [EMAIL PROTECTED]
Subject: Re: Boolean OR QueryFilter

 Hi David,

 I also did a similar task. In fact, I hacked into jj code to add the
definition for OR and NOT. If you need any help, don't hesitate to contact
me :).

 Regards,
  Giang

PS: I also believe that a hack to jj code is necessary.

On 3/8/06, David Odmark <[EMAIL PROTECTED]> wrote:

Hi all,

We're trying to implement a Nutch app (version 0.8) that allows for
Boolean OR, e.g. (this OR that) AND (something OR other). I've found
some relevant posts in the mailing list archive, but I think I'm
missing something. For example, here's a snippet from a post from Doug
Cutting:

<snip>
that said, one can implement OR as a filter (replacing or altering
BasicQueryFilter) that scans for terms whose text is "OR" in the
default field.
</snip>

The problem I'm finding is that the NutchAnalysis analyzer seems to be
swallowing all boolean terms by the time the QueryFilter is even
executed (perhaps because OR is a stop word?). To wit:

String queryText = "this OR that";
org.apache.nutch.searcher.Query query =
    org.apache.nutch.searcher.Query.parse(queryText, conf);
for (int i = 0; i < query.getTerms().length; i++) {
    System.out.println("Term = " + query.getTerms()[i]);
}

This results in output that looks like this:

Term = this
Term = that

So am I correct in believing that in order to implement boolean OR
using Nutch search and a QueryFilter, one must also (minimally) hack
the NutchAnalysis.jj file to produce a new analyzer? Also, given that
a Nutch Query object doesn't seem to have a method to add a
non-required Term or Phrase, does that need to be modified as well?

Sorry for the long post, and thanks in advance...

-David Odmark







