Okay, if you're using a custom similarity, I'm not sure what's going on; I'm not familiar with that.

But ordinarily, you are right, you would require k1 with "+k1".

What you say about the "+" being lost suggests something is going wrong. Either you are not sending your query to Solr properly escaped, or there's a bug in your custom similarity or query parser, or (not too likely) there's a bug in Solr.

My experience is with the standard query parser, the standard similarity class, and contacting Solr via HTTP. (Are you using SolrJ or HTTP?) In that case, when you send the "q" parameter to Solr, you are responsible for URI-encoding it. So if you want to send a query like "k2 k3 +k1", you need to URI-escape it first, and this is what you'd send:

q=k2+k3+%2Bk1

or, escaping spaces as %20 instead, which is actually more 'correct' with current standards:

q=k2%20k3%20%2Bk1

The important thing is that "+" escapes as "%2B". You need to escape it before sending it to Solr via an HTTP URI query string or HTTP form post data. If you send a raw "+", Solr will understand it as representing a space, not an actual "+", because in a query string or form data an unescaped "+" is the encoding for a space. The programming language of your choice probably already has a library function for URI-escaping values.
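
As a rough illustration of the escaping described above, here is a minimal Python sketch using only the standard library. Python's own decoder (parse_qs) stands in for whatever decodes the request on the Solr side, which is an assumption for illustration; the query string "k2 k3 +k1" is the one from this thread.

```python
from urllib.parse import parse_qs, quote, quote_plus

query = "k2 k3 +k1"  # k1 required, k2 and k3 optional

# Form-style escaping: spaces become "+", a literal "+" becomes "%2B".
form_encoded = quote_plus(query)

# Percent-escaping everything: spaces become "%20" instead.
percent_encoded = quote(query, safe="")

# What a form-style decoder recovers from a properly escaped value...
decoded_escaped = parse_qs("q=" + form_encoded)["q"][0]
# ...and from a value whose "+" operator was sent raw.
decoded_raw = parse_qs("q=k2+k3++k1")["q"][0]

print(form_encoded)
print(percent_encoded)
print(repr(decoded_escaped))
print(repr(decoded_raw))
```

In the last case the decoder turns every raw "+" into a space, so the "+" in front of k1 never reaches the query parser as an operator.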

On 6/7/2011 11:36 AM, Gabriele Kahlout wrote:
You are right, Lucene will return based on my scoring function
implementation (Similarity class
<http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html>):

score(q,d) = coord(q,d) · queryNorm(q) · ∑ ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

It can be seen that whenever tf(t in d) = 0, that term contributes 0 to the
sum, so for a single-term query the whole score will be 0 and, as you say,
C will never be returned.

My issue is when the query has multiple terms (my example was too simple!),
and some are 'mandatory' while others are not. In that case I should make a
query that uses the + operator
<http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#+> (e.g.
q=+k1).
I'm unsure I'll get the syntax right, but let's say k1 is mandatory and
k2 and k3 are optional, then q=k2 k3 +k1. I see that queries made through
solrj are received with + in place of the " " (default to OR), so
q=k2+k3++k1.
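
To make the intended meaning of q=k2 k3 +k1 concrete, here is a toy Python sketch of required versus optional terms over the three example documents from the original question in this thread. The match function and its ranking by number of optional-term hits are hypothetical simplifications for illustration, not Lucene's actual scoring.

```python
# Example documents from the original question in this thread.
docs = {
    "A": {"k0", "k1", "k2"},
    "B": {"k1", "k2", "k3"},
    "C": {"k0", "k2", "k3"},
}

def match(docs, required, optional):
    """Return matching doc ids, most optional-term hits first (toy ranking)."""
    hits = []
    for doc_id, terms in docs.items():
        # Every required ("+") term must be present in the document.
        if not required <= terms:
            continue
        # With no required terms, at least one optional term must match.
        if not required and not (optional & terms):
            continue
        hits.append((len(optional & terms), doc_id))
    # Sort by descending optional-term count, then by id for stable output.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits]

# q = k2 k3 +k1: k1 is mandatory, k2 and k3 only influence the order.
results = match(docs, required={"k1"}, optional={"k2", "k3"})
print(results)
```

C is excluded because it lacks k1, and B ranks ahead of A in this toy ordering because it contains both optional terms.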



On Tue, Jun 7, 2011 at 5:23 PM, Jonathan Rochkind<rochk...@jhu.edu>  wrote:

Um, normally that would never happen, because, well, like you say, the
inverted index doesn't have docC for term K1, because doc C didn't include
term K1.

If you search on q=K1, then how/why would docC ever be in your result set?
Are you seeing it in your result set? The question then would be _why_:
what weird thing is going on to make that happen? That's not expected.

The result set _starts_ from only the documents that actually include the
term.  Boosting/relevancy ranking only affects what order these documents
appear in, but there's no reason documentC should be in the result set at
all in your case of q=k1, where docC is not indexed under k1.
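
The point that the result set starts from the postings for the term can be sketched in a few lines of Python. This is only a toy inverted index built from the three example documents in the original question; build_index is a hypothetical helper, not how Lucene stores its index.

```python
from collections import defaultdict

# Example documents from the original question in this thread.
docs = {
    "A": ["k0", "k1", "k2"],
    "B": ["k1", "k2", "k3"],
    "C": ["k0", "k2", "k3"],
}

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

index = build_index(docs)

# For q=k1 the candidates are exactly the postings for k1;
# ranking would only reorder this set, never add to it.
candidates = sorted(index.get("k1", set()))
print(candidates)
```

Document C never enters the candidate set for k1, so no amount of boosting could bring it into the results.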


On 6/7/2011 2:35 AM, Gabriele Kahlout wrote:

Sorry for being unclear and thank you for answering.
Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3),
where A,B,C are document identifiers and the ks in brackets with each are
the terms each contains.
So the Solr inverted index should be something like:

k0 -->   A | C
k1 -->   A | B
k2 -->   A | B | C
k3 -->   B | C

Now let q=k1, how do I make sure C doesn't appear as a result since it
doesn't contain any occurrence of k1?

