Has anyone written something like a GoogleQueryParser?
Such a parser would differ from the behavior of the default parser in the
following points:
- Default AND rather than OR.
- Treat a-b as a-b rather than a -b.
- Perhaps disallow ~.
I guess I could write my own QueryParser, just wanted to be
- Treat a-b as a-b rather than a -b.
I came across the same. Quite an essential issue for some European sites
(as you surely know :-)
I'm not very familiar with JavaCC, but I changed QueryParser.jj in the
following way:
I changed
| MINUS: -
to
| MINUS: -
and removed - from the
In the FAQ it reads:
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
* coord_q_d
1. I think the new document boost is missing, isn't it?
With that it should be something like
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
* coord_q_d * boost_d
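To make the FAQ formula concrete, here is a minimal Java sketch that evaluates it term by term, with the document boost boost_d appended as proposed above. It is a literal transcription of the formula, not Lucene's actual scorer: the class, field and method names are hypothetical, and the per-term inputs (tf_q, idf_t, tf_d, norm_d_t, boost_t) are supplied by the caller instead of being read from an index. As written, idf_t enters each term twice, once on the query side and once on the document side.

```java
import java.util.Arrays;
import java.util.List;

public class FaqScore {
    /** Hypothetical per-term inputs, named after the symbols in the FAQ formula. */
    static final class TermStats {
        final double tfQ, idfT, tfD, normDT, boostT;

        TermStats(double tfQ, double idfT, double tfD, double normDT, double boostT) {
            this.tfQ = tfQ;
            this.idfT = idfT;
            this.tfD = tfD;
            this.normDT = normDT;
            this.boostT = boostT;
        }
    }

    /** score_d = sum_t(tf_q*idf_t/norm_q * tf_d*idf_t/norm_d_t * boost_t) * coord_q_d * boost_d */
    static double score(List<TermStats> terms, double normQ, double coordQD, double boostD) {
        double sum = 0.0;
        for (TermStats t : terms) {
            double queryPart = t.tfQ * t.idfT / normQ;  // query-side weight
            double docPart = t.tfD * t.idfT / t.normDT; // document-side weight (idf_t again)
            sum += queryPart * docPart * t.boostT;
        }
        return sum * coordQD * boostD;
    }

    public static void main(String[] args) {
        List<TermStats> terms = Arrays.asList(
                new TermStats(1, 2.0, 3, 1.0, 1.0),
                new TermStats(1, 1.0, 1, 1.0, 2.0));
        System.out.println(score(terms, 1.0, 1.0, 1.0));
    }
}
```

With the inputs in main, the two terms contribute 12 and 2, so the score is 14.0; since coord_q_d and boost_d are plain multipliers, halving either one halves the score.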
- Default AND rather than OR.
As for this part: This can be accomplished with
queryParser = new QueryParser(defaultField, new MyAnalyzer());
queryParser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
- Treat a-b as a-b rather than a -b.
That would be interesting for
This actually changes the behaviour to that of Google, and I haven't
experienced any negative side effects (yet).
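The hyphen rule being discussed can be illustrated outside JavaCC with a small Java sketch: "-" acts as the prohibit operator only when it starts a whitespace-separated token, so a-b stays a single term while a -b excludes b. This is one reading of the intended Google-like behaviour, not Clemens' actual grammar change; the class name HyphenRule and the "NOT " rendering are hypothetical, and quotes, parentheses and the other operators are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenRule {
    /** Splits on whitespace; only a leading '-' marks a clause as prohibited. */
    static List<String> clauses(String query) {
        List<String> out = new ArrayList<>();
        for (String tok : query.trim().split("\\s+")) {
            if (tok.length() > 1 && tok.charAt(0) == '-') {
                out.add("NOT " + tok.substring(1)); // "-b" excludes b
            } else {
                out.add(tok);                       // "a-b" stays one term
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(clauses("a-b"));
        System.out.println(clauses("a -b"));
    }
}
```

The design choice is simply to decide by position: a hyphen inside a token is data, a hyphen at the start of a token is syntax.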
Thanks. I hope there will eventually be some standard way to accomplish
this...
--
Eric Jain
--
To unsubscribe, e-mail: mailto:[EMAIL PROTECTED]
For additional commands, e-mail:
-----Original Message-----
From: Eric Jain [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 11, 2002 1:44 PM
To: Clemens Marschner
Cc: Lucene Users List
Subject: Re: GoogleQueryParser
queryParser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
Thanks, that would be exactly what I need.
I have seen that TermScorer.score() reads a norm factor between 0 and 255
via IndexReader.norms().
I now see that this byte encodes an 8-bit float.
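The 0-255 norm byte can be decoded with a sketch like the following. It assumes the layout that later Lucene releases expose as SmallFloat.floatToByte315/byte315ToFloat: the byte keeps part of the IEEE exponent plus the two leading explicit mantissa bits (three significant bits counting the implicit leading 1). That the 2002 encoding uses the same layout is an assumption, and the class name NormEncoding is hypothetical.

```java
public class NormEncoding {
    // Shifts the float exponent so that the encoded range starts at byte value 0.
    private static final int OFFSET = (63 - 15) << 3;

    /** Encodes a float into one byte, truncating (not rounding) the mantissa. */
    public static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Drop the low 21 bits: keeps sign, exponent and the top two mantissa bits.
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= OFFSET) {
            // Zero and negatives map to 0; positive underflow maps to the smallest value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= OFFSET + 0x100) {
            return (byte) -1; // overflow clamps to the largest value (unsigned 255)
        }
        return (byte) (smallfloat - OFFSET);
    }

    /** Decodes a byte produced by floatToByte back into a float. */
    public static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3); // treat the byte as unsigned 0..255
        bits += (63 - 15) << 24;           // restore the exponent offset
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {1.0f, 0.5f, 0.3f}) {
            byte b = floatToByte(f);
            System.out.println(f + " -> " + (b & 0xff) + " -> " + byteToFloat(b));
        }
    }
}
```

In this scheme 1.0f encodes to byte 124 and 0.5f to 120, both round-tripping exactly, while 0.3f comes back as 0.25f, so small differences between norm values are lost.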
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
* coord_q_d
One last thing I wondered about: Is idf_t really going into that equation
twice?
From what I see, idf_t/norm_q is completely left out, isn't it?
tf_q is applied although it is never calculated - if a term
I think there's a bug. If I set the default operator to OR and run
java org.apache.lucene.queryParser.QueryParser a AND b OR c
it gives me +a +b c.
If I set the default operator to AND and run it with the query a b OR c,
it gives me +a b c, which is different, even though with AND as the
default a b should mean the same as a AND b.
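The asymmetry in this report can be reproduced with a simplified model of how the parser combines clauses. The sketch assumes the rule found in the addClause method of later Lucene QueryParser versions: an explicit AND makes the preceding clause required, but only in AND-default mode does an explicit OR make the preceding clause optional. Whether the 2002 grammar used exactly this rule is an assumption, although the model reproduces both outputs reported above. ClauseModel and its members are hypothetical, and only bare terms plus the AND/OR keywords are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseModel {
    enum Conj { NONE, AND, OR }
    enum DefaultOp { AND, OR }

    static final class Clause {
        final String term;
        boolean required;

        Clause(String term, boolean required) {
            this.term = term;
            this.required = required;
        }
    }

    /** Parses whitespace-separated terms and AND/OR keywords into "+term"/"term" clauses. */
    static String parse(String query, DefaultOp op) {
        List<Clause> clauses = new ArrayList<>();
        Conj conj = Conj.NONE;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND")) { conj = Conj.AND; continue; }
            if (tok.equals("OR")) { conj = Conj.OR; continue; }
            if (!clauses.isEmpty()) {
                Clause prev = clauses.get(clauses.size() - 1);
                // An explicit AND makes the preceding clause required.
                if (conj == Conj.AND) prev.required = true;
                // Only AND-default mode relaxes the preceding clause on an explicit OR.
                if (op == DefaultOp.AND && conj == Conj.OR) prev.required = false;
            }
            boolean required = (op == DefaultOp.OR) ? conj == Conj.AND : conj != Conj.OR;
            clauses.add(new Clause(tok, required));
            conj = Conj.NONE;
        }
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.required ? "+" : "").append(c.term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("a AND b OR c", DefaultOp.OR));
        System.out.println(parse("a b OR c", DefaultOp.AND));
    }
}
```

In this model OR-default mode never relaxes b after an explicit OR, which is why a AND b OR c keeps +b there while a b OR c under AND-default drops the + on b.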
Clemens Marschner wrote:
1. I think the new document boost is missing, isn't it?
With that it should be something like
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
* coord_q_d * boost_d
Is that correct?
Almost. This should actually be boost_d * boost_d_t,