Re: How do I make sure the resulting documents contain the query terms?

Gabriele Kahlout Tue, 07 Jun 2011 08:37:04 -0700

You are right, Lucene will return results based on my scoring function
implementation (the Similarity class,
<http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html>):

score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

It can be seen that for a single-term query, whenever tf(t in d) = 0 the sum,
and therefore the whole score, will be 0, so as you say C will never be
returned.
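
To make this concrete, here is a rough Python sketch of that sum for the three example documents quoted further down. It assumes the tf and idf definitions of Lucene's DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) and simply sets queryNorm, the boosts and norm to 1, so the numbers are only illustrative. The function names are made up for this sketch and are not Lucene API.

```python
import math

# Example documents from the thread: identifier -> list of terms.
DOCS = {
    "A": ["k0", "k1", "k2"],
    "B": ["k1", "k2", "k3"],
    "C": ["k0", "k2", "k3"],
}

def tf(term, doc_terms):
    """Term frequency factor: square root of the raw count (assumed)."""
    return math.sqrt(doc_terms.count(term))

def idf(term, docs):
    """Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)) (assumed)."""
    doc_freq = sum(1 for terms in docs.values() if term in terms)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))

def coord(query_terms, doc_terms):
    """Fraction of the query terms that occur in the document."""
    overlap = sum(1 for t in query_terms if t in doc_terms)
    return overlap / len(query_terms)

def score(query_terms, doc_id, docs):
    """Simplified score: queryNorm, boosts and norm are all taken as 1."""
    doc_terms = docs[doc_id]
    total = sum(tf(t, doc_terms) * idf(t, docs) ** 2 for t in query_terms)
    return coord(query_terms, doc_terms) * total

print(score(["k1"], "C", DOCS))        # 0.0, C does not contain k1
print(score(["k1"], "A", DOCS))        # 1.0
print(score(["k1", "k2"], "C", DOCS))  # non-zero: k2 alone contributes
```

The last line shows the multi-term case: a document that lacks k1 can still get a positive score from k2.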

My issue is when the query has multiple terms (my example was too simple!),
and some are 'mandatory' while others are not. In that case I should make a
query that uses the + operator
<http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#+> (e.g. q=+k1).
I'm unsure I'll get the syntax right, but let's say k1 is mandatory and
k2 and k3 are optional, then q=k2 k3 +k1. I see that queries made through
solrj are received with + in place of the " " (which defaults to OR), so
q=k2+k3++k1.
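
A minimal Python sketch of which documents a query like q=k2 k3 +k1 should select, using the three example documents from this thread. It is plain set logic, not Lucene or Solr code, and the helper names (`build_index`, `match`) are hypothetical. The final line uses standard URL form encoding, where a space becomes + and a literal + becomes %2B; I would therefore expect the mandatory marker to travel as %2B rather than as a second +, but that is an assumption about the wire format and not something checked against solrj.

```python
from urllib.parse import quote_plus

# Example documents from the thread: identifier -> terms it contains.
DOCS = {
    "A": {"k0", "k1", "k2"},
    "B": {"k1", "k2", "k3"},
    "C": {"k0", "k2", "k3"},
}

def build_index(docs):
    """Build an inverted index: term -> set of document ids."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def match(index, query):
    """Return the ids matching a query such as 'k2 k3 +k1'.

    Terms prefixed with '+' are mandatory; the others are optional (OR).
    """
    tokens = query.split()
    required = [t[1:] for t in tokens if t.startswith("+")]
    optional = [t for t in tokens if not t.startswith("+")]
    if required:
        # Every mandatory term must be present; optional terms
        # would only influence ranking, not membership.
        return set.intersection(*(index.get(t, set()) for t in required))
    # No mandatory term: matching any optional term is enough.
    return set().union(*(index.get(t, set()) for t in optional))

INDEX = build_index(DOCS)

print(sorted(match(INDEX, "k2 k3 +k1")))  # ['A', 'B']
print(sorted(match(INDEX, "k2 k3")))      # ['A', 'B', 'C']
print(quote_plus("k2 k3 +k1"))            # k2+k3+%2Bk1
```

With k1 mandatory, C drops out even though it contains both optional terms.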

On Tue, Jun 7, 2011 at 5:23 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:

> Um, normally that would never happen, because, well, like you say, the
> inverted index doesn't have docC for term K1, because doc C didn't include
> term K1.
>
> If you search on q=K1, then how/why would docC ever be in your result set?
>  Are you seeing it in your result set? The question then would be _why_,
> what weird thing is going on to make that happen,  that's not expected.
>
> The result set _starts_ from only the documents that actually include the
> term.  Boosting/relevancy ranking only affects what order these documents
> appear in, but there's no reason documentC should be in the result set at
> all in your case of q=k1, where docC is not indexed under k1.
>
>
> On 6/7/2011 2:35 AM, Gabriele Kahlout wrote:
>
>> Sorry for being unclear, and thank you for answering.
>> Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3),
>> where A, B, C are document identifiers and the ks in brackets after each
>> are the terms it contains.
>> So the Solr inverted index should be something like:
>>
>> k0 -->  A | C
>> k1 -->  A | B
>> k2 -->  A | B | C
>> k3 -->  B | C
>>
>> Now let q=k1, how do I make sure C doesn't appear as a result since it
>> doesn't contain any occurrence of k1?
>>
>

-- 
Regards,
K. Gabriele

