Re: custom scoring

Em Thu, 16 Feb 2012 11:10:31 -0800

Hello Carlos,

> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQuery) are very good, we're
> getting very good response times and high quality answers. But when we've
> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
> QueryValueSource that wraps the DisMaxQuery), then the times move from
> 10-20 msec to 200-300msec.
I reviewed the source code and yes, the FunctionQuery iterates over the
whole index. However, let's take a closer look.


In relation to the DisMaxQuery you create within your parser: What kind
of clause is the FunctionQuery and what kind of clause are your other
queries (MUST, SHOULD, MUST_NOT...)?

*I* would expect that, as the set of documents matching the overall query
shrinks, the function query only checks those documents that are
guaranteed to be within the result set.
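
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not Lucene or Solr code: the function names, the dictionary of inner-query scores and the `app_score` callback are all hypothetical and only illustrate the two behaviours discussed in this thread (scoring every document in the index versus scoring only the documents matched by the wrapped query).

```python
# Toy model only: plain Python, not Lucene/Solr code. All names are hypothetical.

def score_match_all(max_doc, inner_scores, app_score):
    """Model of a function query that visits every document in the index.

    inner_scores maps doc id -> score of the wrapped query; a missing
    entry means the wrapped query did not match that document.
    Returns (scores, number_of_docs_visited).
    """
    scores = {}
    visited = 0
    for doc in range(max_doc):              # walks the whole index
        visited += 1
        inner = inner_scores.get(doc, 0.0)  # assumed default of 0 for non-matches
        scores[doc] = app_score(doc) * inner
    return scores, visited


def score_matches_only(inner_scores, app_score):
    """Model of the expected behaviour: only matched documents are scored."""
    scores = {}
    visited = 0
    for doc, inner in inner_scores.items():  # walks only the matches
        visited += 1
        scores[doc] = app_score(doc) * inner
    return scores, visited


if __name__ == "__main__":
    inner = {10: 2.0, 500: 1.5, 99_999: 0.5}   # three matching docs
    app = lambda doc: 1.0 + (doc % 3)          # stand-in for a stored score field
    _, n_all = score_match_all(100_000, inner, app)
    _, n_hits = score_matches_only(inner, app)
    print("docs visited:", n_all, "vs", n_hits)
```

In this model both variants produce the same scores for the matching documents; the only difference is how many documents are visited, which is the cost being discussed here.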

> Note that we're using early termination of queries (via a custom
> collector), and therefore (as shown by the numbers I included above) even
> if the query is very complex, we're getting very fast answers. The only
> situation where the response time explodes is when we include a
> FunctionQuery.
Could you give us some details about how and where you plugged in the
Collector, please?
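
For readers following along, early termination through a collector can be pictured with the toy model below. This is an assumption about the general technique, not Carlos's actual collector and not the Lucene `Collector` API: the class names, the `search` helper and the exception used to abort the scan are all hypothetical.

```python
# Toy model only: plain Python, not Lucene/Solr code. All names are hypothetical.

class StopCollecting(Exception):
    """Raised by the collector to abort the scan once it has enough hits."""


class EarlyTerminatingCollector:
    def __init__(self, max_hits):
        self.max_hits = max_hits
        self.hits = []

    def collect(self, doc, score):
        # Record the hit, then abort the scan if the quota is reached.
        self.hits.append((doc, score))
        if len(self.hits) >= self.max_hits:
            raise StopCollecting()


def search(matching_docs, collector):
    """Feed (doc, score) pairs to the collector; return how many were visited."""
    visited = 0
    try:
        for doc, score in matching_docs:
            visited += 1
            collector.collect(doc, score)
    except StopCollecting:
        pass  # the collector asked to stop early
    return visited


if __name__ == "__main__":
    collector = EarlyTerminatingCollector(max_hits=3)
    candidates = ((doc, 1.0) for doc in range(100))
    print("visited:", search(candidates, collector))
```

Where such a collector is attached matters, because it can only cut the work short for documents that reach it; whatever the scorer does before handing a document to the collector is still paid for.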

Kind regards,
Em

On 16.02.2012 19:41, Carlos Gonzalez-Cadenas wrote:
> Hello Em:
> 
> Thanks for your answer.
> 
> Yes, we initially also thought that the excessive increase in response time
> was caused by the several queries being executed, and we did another test.
> We executed one of the subqueries that I've shown to you directly in the
> "q" parameter and then we tested this same subquery (only this one, without
> the others) with the function query "query($q1)" in the "q" parameter.
> 
> Theoretically the times for these two queries should be more or less the
> same, but the second one is several times slower than the first one. After
> this observation we learned more about function queries and we learned from
> the code and from some comments in the forums [1] that the FunctionQueries
> are expected to match all documents.
> 
> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQuery) are very good, we're
> getting very good response times and high quality answers. But when we've
> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
> QueryValueSource that wraps the DisMaxQuery), then the times move from
> 10-20 msec to 200-300msec.
> 
> Note that we're using early termination of queries (via a custom
> collector), and therefore (as shown by the numbers I included above) even
> if the query is very complex, we're getting very fast answers. The only
> situation where the response time explodes is when we include a
> FunctionQuery.
> 
> Re: your question of what we're trying to achieve ... We're implementing a
> powerful query autocomplete system, and we use several fields to a) improve
> performance on wildcard queries and b) have a very precise control over the
> score.
> 
> Thanks a lot for your help,
> Carlos
> 
> [1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0
> 
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
> 
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
> 
> 
> On Thu, Feb 16, 2012 at 7:09 PM, Em <mailformailingli...@yahoo.de> wrote:
> 
>> Hello Carlos,
>>
>> well, you must take into account that you are executing up to 8 queries
>> per request instead of one query per request.
>>
>> I am not totally sure about the details of the implementation of the
>> max-function-query, but I guess it first iterates over the results of
>> the first max-query, afterwards over the results of the second max-query
>> and so on. This is a much higher complexity than in the case of a normal
>> query.
>>
>> I would suggest optimizing your request. I don't think that this
>> particular function query is matching *all* docs. Instead I think it
>> just matches those docs specified by your inner-query (although I might
>> be wrong about that).
>>
>> What are you trying to achieve by your request?
>>
>> Regards,
>> Em
>>
>> On 16.02.2012 16:24, Carlos Gonzalez-Cadenas wrote:
>>> Hello Em:
>>>
>>> The URL is quite large (w/ shards, ...), maybe it's best if I paste the
>>> relevant parts.
>>>
>>> Our "q" parameter is:
>>>
>>>
>>> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)))))\"",
>>>
>>> The subqueries q8, q7, q4 and q3 are regular queries, for example:
>>>
>>> "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
>>> wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
>>> (stopword_phrase:las AND stopword_phrase:de)"
>>>
>>> We've executed the subqueries q3-q8 independently and they're very fast,
>>> but when we introduce the function queries as described above, it all goes
>>> 10X slower.
>>>
>>> Let me know if you need anything else.
>>>
>>> Thanks
>>> Carlos
>>>
>>>
>>> Carlos Gonzalez-Cadenas
>>> CEO, ExperienceOn - New generation search
>>> http://www.experienceon.com
>>>
>>> Mobile: +34 652 911 201
>>> Skype: carlosgonzalezcadenas
>>> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>>>
>>>
>>> On Thu, Feb 16, 2012 at 4:02 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>
>>>> Hello Carlos,
>>>>
>>>> could you show us what your Solr call looks like?
>>>>
>>>> Regards,
>>>> Em
>>>>
>>>> On 16.02.2012 14:34, Carlos Gonzalez-Cadenas wrote:
>>>>> Hello all:
>>>>>
>>>>> We'd like to score the matching documents using a combination of SOLR's IR
>>>>> score with another application-specific score that we store within the
>>>>> documents themselves (i.e. a float field containing the app-specific
>>>>> score). In particular, we'd like to calculate the final score doing some
>>>>> operations with both numbers (e.g. product, sqrt, ...)
>>>>>
>>>>> According to what we know, there are two ways to do this in SOLR:
>>>>>
>>>>> A) Sort by function [1]: We've tested an expression like
>>>>> "sort=product(score, query_score)" in the SOLR query, where score is the
>>>>> common SOLR IR score and query_score is our own precalculated score, but it
>>>>> seems that SOLR can only do this with stored/indexed fields (and obviously
>>>>> "score" is not stored/indexed).
>>>>>
>>>>> B) Function queries: We've used _val_ and function queries like max, sqrt
>>>>> and query, and we've obtained the desired results from a functional point
>>>>> of view. However, our index is quite large (400M documents) and the
>>>>> performance degrades heavily, given that function queries are AFAIK
>>>>> matching all the documents.
>>>>>
>>>>> I have two questions:
>>>>>
>>>>> 1) Apart from the two options I mentioned, is there any other (simple) way
>>>>> to achieve this that we're not aware of?
>>>>>
>>>>> 2) If we have to choose the function queries path, would it be very
>>>>> difficult to modify the actual implementation so that it doesn't match all
>>>>> the documents, that is, to pass a query so that it only operates over the
>>>>> documents matching the query? Looking at the FunctionQuery.java source
>>>>> code, there's a comment that says "// instead of matching all docs, we
>>>>> could also embed a query. the score could either ignore the subscore, or
>>>>> boost it", which is giving us some hope that maybe it's possible and even
>>>>> desirable to go in this direction. If you can give us some directions about
>>>>> how to go about this, we may be able to do the actual implementation.
>>>>>
>>>>> BTW, we're using Lucene/SOLR trunk.
>>>>>
>>>>> Thanks a lot for your help.
>>>>> Carlos
>>>>>
>>>>> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
>>>>>
>>>>
>>>
>>
>
