Hello Carlos,

> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQuery) are very good, we're
> getting very good response times and high quality answers. But when we've
> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
> QueryValueSource that wraps the DisMaxQuery), then the times move from
> 10-20 msec to 200-300 msec.

I reviewed the source code and yes, the FunctionQuery iterates over the whole
index. However... let's see!
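To make the cost difference concrete, here is a toy Python model of the two iteration strategies discussed in this thread. It is only a sketch and not Lucene/Solr code: the index layout, the function names and the constant scores are all hypothetical, and the "IR score" is simplified to 1.0 for a match and 0.0 otherwise.

```python
from typing import Dict, List, Set, Tuple

# Toy index: doc id -> (terms in the doc, precomputed application score).
# Hypothetical structure for illustration only; not the Lucene/Solr API.
Index = Dict[int, Tuple[Set[str], float]]


def build_postings(index: Index) -> Dict[str, List[int]]:
    """Build a term -> doc ids map, standing in for an inverted index."""
    postings: Dict[str, List[int]] = {}
    for doc_id, (terms, _) in index.items():
        for term in terms:
            postings.setdefault(term, []).append(doc_id)
    return postings


def score_match_all(index: Index, wanted: str) -> Tuple[Dict[int, float], int]:
    """Model of a function query that matches all docs: every doc is visited."""
    visited = 0
    scores: Dict[int, float] = {}
    for doc_id, (terms, app_score) in index.items():
        visited += 1
        ir_score = 1.0 if wanted in terms else 0.0  # non-matches score 0
        scores[doc_id] = app_score * ir_score
    return scores, visited


def score_matches_only(
    index: Index, postings: Dict[str, List[int]], wanted: str
) -> Tuple[Dict[int, float], int]:
    """Model of iteration driven by the inner query: only its matches are visited."""
    visited = 0
    scores: Dict[int, float] = {}
    for doc_id in postings.get(wanted, []):
        visited += 1
        _, app_score = index[doc_id]
        scores[doc_id] = app_score * 1.0  # every visited doc is a match
    return scores, visited


# 100,000 docs, of which every 1,000th contains the term "hotel".
index: Index = {
    i: ({"hotel"} if i % 1000 == 0 else {"other"}, 0.5) for i in range(100_000)
}
postings = build_postings(index)
all_scores, all_visited = score_match_all(index, "hotel")
hit_scores, hit_visited = score_matches_only(index, postings, "hotel")
```

In this model both strategies assign the same non-zero scores, but the match-all variant touches every document while the other touches only the documents the inner query matches. Whether the real FunctionQuery behaves like the first variant in your setup depends on how it is combined with the other clauses.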
In relation to the DisMaxQuery you create within your parser: what kind of
clause is the FunctionQuery, and what kind of clause are your other queries
(MUST, SHOULD, MUST_NOT...)?

*I* would expect that, as the set of documents matching the overall query
shrinks, the function query only checks those documents that are guaranteed
to be within the result set.

> Note that we're using early termination of queries (via a custom
> collector), and therefore (as shown by the numbers I included above) even
> if the query is very complex, we're getting very fast answers. The only
> situation where the response time explodes is when we include a
> FunctionQuery.

Could you give us some details about how and where you plugged in the
Collector, please?

Kind regards,
Em

On 16.02.2012 19:41, Carlos Gonzalez-Cadenas wrote:
> Hello Em:
>
> Thanks for your answer.
>
> Yes, we initially also thought that the excessive increase in response time
> was caused by the several queries being executed, and we did another test.
> We executed one of the subqueries that I've shown to you directly in the
> "q" parameter and then we tested this same subquery (only this one, without
> the others) with the function query "query($q1)" in the "q" parameter.
>
> Theoretically the times for these two queries should be more or less the
> same, but the second one is several times slower than the first one. After
> this observation we learned more about function queries and we learned from
> the code and from some comments in the forums [1] that the FunctionQueries
> are expected to match all documents.
>
> We have some more tests on that matter: now we're moving from issuing this
> large query through the SOLR interface to creating our own QueryParser. The
> initial tests we've done in our QParser (that internally creates multiple
> queries and inserts them inside a DisjunctionMaxQuery) are very good, we're
> getting very good response times and high quality answers.
> But when we've tried to wrap the DisjunctionMaxQuery within a FunctionQuery
> (i.e. with a QueryValueSource that wraps the DisMaxQuery), then the times
> move from 10-20 msec to 200-300 msec.
>
> Note that we're using early termination of queries (via a custom
> collector), and therefore (as shown by the numbers I included above) even
> if the query is very complex, we're getting very fast answers. The only
> situation where the response time explodes is when we include a
> FunctionQuery.
>
> Re: your question of what we're trying to achieve ... We're implementing a
> powerful query autocomplete system, and we use several fields to a) improve
> performance on wildcard queries and b) have very precise control over the
> score.
>
> Thanks a lot for your help,
> Carlos
>
> [1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0
>
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
>
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>
>
> On Thu, Feb 16, 2012 at 7:09 PM, Em <mailformailingli...@yahoo.de> wrote:
>
>> Hello Carlos,
>>
>> well, you must take into account that you are executing up to 8 queries
>> per request instead of one query per request.
>>
>> I am not totally sure about the details of the implementation of the
>> max-function-query, but I guess it first iterates over the results of
>> the first max-query, afterwards over the results of the second max-query
>> and so on. This is a much higher complexity than in the case of a normal
>> query.
>>
>> I would suggest optimizing your request. I don't think that this
>> particular function query is matching *all* docs. Instead I think it
>> just matches those docs specified by your inner-query (although I might
>> be wrong about that).
>>
>> What are you trying to achieve by your request?
>>
>> Regards,
>> Em
>>
>> On 16.02.2012 16:24, Carlos Gonzalez-Cadenas wrote:
>>> Hello Em:
>>>
>>> The URL is quite large (w/ shards, ...), maybe it's best if I paste the
>>> relevant parts.
>>>
>>> Our "q" parameter is:
>>>
>>> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)))))\"",
>>>
>>> The subqueries q8, q7, q4 and q3 are regular queries, for example:
>>>
>>> "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
>>> wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
>>> (stopword_phrase:las AND stopword_phrase:de)"
>>>
>>> We've executed the subqueries q3-q8 independently and they're very fast,
>>> but when we introduce the function queries as described below, it all
>>> goes 10X slower.
>>>
>>> Let me know if you need anything else.
>>>
>>> Thanks
>>> Carlos
>>>
>>>
>>> On Thu, Feb 16, 2012 at 4:02 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>
>>>> Hello Carlos,
>>>>
>>>> could you show us what your Solr call looks like?
>>>>
>>>> Regards,
>>>> Em
>>>>
>>>> On 16.02.2012 14:34, Carlos Gonzalez-Cadenas wrote:
>>>>> Hello all:
>>>>>
>>>>> We'd like to score the matching documents using a combination of SOLR's
>>>>> IR score with another application-specific score that we store within
>>>>> the documents themselves (i.e. a float field containing the app-specific
>>>>> score). In particular, we'd like to calculate the final score doing some
>>>>> operations with both numbers (e.g. product, sqrt, ...).
>>>>>
>>>>> According to what we know, there are two ways to do this in SOLR:
>>>>>
>>>>> A) Sort by function [1]: We've tested an expression like
>>>>> "sort=product(score, query_score)" in the SOLR query, where score is the
>>>>> common SOLR IR score and query_score is our own precalculated score, but
>>>>> it seems that SOLR can only do this with stored/indexed fields (and
>>>>> obviously "score" is not stored/indexed).
>>>>>
>>>>> B) Function queries: We've used _val_ and function queries like max, sqrt
>>>>> and query, and we've obtained the desired results from a functional point
>>>>> of view. However, our index is quite large (400M documents) and the
>>>>> performance degrades heavily, given that function queries are AFAIK
>>>>> matching all the documents.
>>>>>
>>>>> I have two questions:
>>>>>
>>>>> 1) Apart from the two options I mentioned, is there any other (simple)
>>>>> way to achieve this that we're not aware of?
>>>>>
>>>>> 2) If we have to choose the function queries path, would it be very
>>>>> difficult to modify the current implementation so that it doesn't match
>>>>> all the documents, that is, to pass a query so that it only operates over
>>>>> the documents matching the query? Looking at the FunctionQuery.java source
>>>>> code, there's a comment that says "// instead of matching all docs, we
>>>>> could also embed a query. the score could either ignore the subscore, or
>>>>> boost it", which is giving us some hope that maybe it's possible and even
>>>>> desirable to go in this direction. If you can give us some directions
>>>>> about how to go about this, we may be able to do the actual
>>>>> implementation.
>>>>>
>>>>> BTW, we're using Lucene/SOLR trunk.
>>>>>
>>>>> Thanks a lot for your help.
>>>>> Carlos
>>>>>
>>>>> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function