Re[2]: Sort hits in the order of subqueries

2018-06-07 Thread Robert K .
Hello,

I had a look at the constant-score approach suggested by Emir:
(q0)^=100 OR (q1)^=90 ...

As Alexandre observed, it introduces stratification at the cost of the
intra-query ranking, which is not satisfactory.

If I think of Constant Score as a function f(x) = C applied to the document
score x within one subquery, then what I would like to have instead is a
sigmoid function F(x, C) = C + 1 / (1 + exp(-x)) applied to the document
scores within each subquery.

Instead of:

ConstantScore(q0, 100) OR ConstantScore(q1, 90) ...

then:

SigmoidScore(q0, 100) OR SigmoidScore(q1, 90) ...

I am fairly sure it is possible to take the ConstantScore class and turn it
into a Sigmoid variant as a custom extension.
Still, I am hoping for a hint about the simplest approach to achieve the
stratification.
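
To make the idea concrete, here is a minimal sketch of the banding in plain
Python. It is not Solr or Lucene code: the names sigmoid_band and rank are
hypothetical, raw scores are assumed to be non-negative, and a document that
matches several subqueries is assumed to keep its highest banded score (max),
whereas a plain boolean query would add the clause scores together.

```python
import math

def sigmoid_band(score, band):
    """F(x, C) = C + 1 / (1 + exp(-x)): shifts a raw score into the band above C."""
    return band + 1.0 / (1.0 + math.exp(-score))

def rank(hits_per_query, bands):
    """hits_per_query: one dict (doc id -> raw score) per subquery.
    Assumption: a doc matching several subqueries keeps its best banded score."""
    best = {}
    for hits, band in zip(hits_per_query, bands):
        for doc, score in hits.items():
            banded = sigmoid_band(score, band)
            if banded > best.get(doc, float("-inf")):
                best[doc] = banded
    # Highest banded score first: band decides the stratum, sigmoid the order inside it.
    return sorted(best, key=best.get, reverse=True)

q0 = {"a": 0.3, "b": 2.5}
q1 = {"b": 9.0, "c": 7.0, "d": 1.2}
order = rank([q0, q1], [100, 90])
```

Document b matches both subqueries but is ranked inside the q0 band, and the
relative order inside each band follows the raw scores. Note that the sigmoid
flattens out for large raw scores, so high-scoring documents in one band end
up very close together.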


My next question in this context: in some cases we sort individual
subqueries by different fields.
It looks like this:

(q0 sorted by date) OR (q1 sorted by relevancy)


I wonder whether you have any idea how this could be formulated in Solr.
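
What I mean can be sketched as a two-level sort key, again in plain Python
rather than Solr syntax. The field names band, date and score and the function
mixed_order are hypothetical; band is assumed to be the index of the first
subquery a document matched.

```python
from datetime import date

def mixed_order(docs):
    """Order hits by band first; inside band 0 by date (newest first),
    inside every other band by relevancy score (highest first)."""
    def key(doc):
        if doc["band"] == 0:
            within = doc["date"].toordinal()
        else:
            within = doc["score"]
        # Negate the secondary value so that larger values sort first.
        return (doc["band"], -within)
    return sorted(docs, key=key)

docs = [
    {"id": "a", "band": 0, "date": date(2018, 6, 1), "score": 3.0},
    {"id": "b", "band": 0, "date": date(2018, 6, 7), "score": 1.0},
    {"id": "c", "band": 1, "date": date(2018, 6, 7), "score": 0.4},
    {"id": "d", "band": 1, "date": date(2017, 1, 1), "score": 2.2},
]
ids = [d["id"] for d in mixed_order(docs)]
```

Here b precedes a because it is newer, although a has the higher score, while
d precedes c purely on score.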

Regards,

Robert


>Thursday, 7 June 2018, 15:20 +02:00, from Alexandre Rafalovitch:
>
>I think this solution will destroy intra-query ranking. So all results in
>q0 come before q1 but would be random within q0 results.
>
>Would instead just a bunch of boost queries with different weights
>(additive probably) be a better way to introduce stratification?
>
>Regards,
>   Alex





Re: Sort hits in the order of subqueries

2018-06-07 Thread Alexandre Rafalovitch
I think this solution will destroy intra-query ranking. So all results in
q0 come before q1 but would be random within q0 results.

Would instead just a bunch of boost queries with different weights
(additive probably) be a better way to introduce stratification?
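
A rough sketch of why additive boosts could work, and when they stop working,
in plain Python. The helper names boosted and strata_hold are made up for
illustration, and raw scores are assumed to be non-negative with a known upper
bound max_score.

```python
def boosted(score, boost):
    """Additive boost: the raw relevancy score is kept and shifted by the boost."""
    return score + boost

def strata_hold(max_score, boosts):
    """True if every gap between neighbouring boosts is larger than the highest
    raw score, so a hit from a lower stratum can never overtake a higher one."""
    ordered = sorted(boosts, reverse=True)
    return all(hi - lo > max_score for hi, lo in zip(ordered, ordered[1:]))

safe = strata_hold(5.0, [100, 90])      # gap of 10 exceeds the best raw score
unsafe = strata_hold(12.0, [100, 90])   # a q1 hit scoring 11.5 beats a weak q0 hit
```

With a gap of 10 between the boosts, a q1 hit with raw score 11.5 ends up at
101.5 and overtakes a q0 hit with raw score 0.5 at 100.5, so the weights have
to be spaced wider than the largest raw score one expects.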

Regards,
   Alex

On Thu, Jun 7, 2018, 13:19 Emir Arnautović wrote:

> Hi Robert,
> If I get your requirement right, you can solve it with the following:
> (q0)^=100 OR (q1)^=90 ...
>
> Assuming there are no overlaps; otherwise, a document matching multiple
> conditions can change the ordering.


Re: Sort hits in the order of subqueries

2018-06-07 Thread Emir Arnautović
Hi Robert,
If I get your requirement right, you can solve it with the following:
(q0)^=100 OR (q1)^=90 ...

This assumes there are no overlaps; otherwise, a document matching multiple
conditions can change the ordering.
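
The overlap caveat can be illustrated with a small plain-Python sketch. It only
models the fact that the scores of matching SHOULD clauses are added up; the
function name constant_score_sum and the third constant 80 are made up for the
example. The powers-of-two variant at the end is one possible workaround
offered as an assumption, not something tested against Solr.

```python
def constant_score_sum(matches, constants):
    """Add up the constants of all matching subqueries, the way the scores of
    matching SHOULD clauses are summed."""
    return sum(c for matched, c in zip(matches, constants) if matched)

constants = [100, 90, 80]
only_q0 = constant_score_sum([True, False, False], constants)
q1_and_q2 = constant_score_sum([False, True, True], constants)

# Possible workaround: each constant exceeds the sum of all lower ones.
spaced = [4, 2, 1]
only_q0_spaced = constant_score_sum([True, False, False], spaced)
q1_and_q2_spaced = constant_score_sum([False, True, True], spaced)
```

With 100/90/80 a document matching only q1 and q2 scores 170 and jumps ahead
of a document matching only q0. With 4/2/1 the same document scores 3 and
stays below every q0 match.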

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






Sort hits in the order of subqueries

2018-06-07 Thread Robert K .
Hello,

I am investigating the following use case.

Suppose I have a list of queries q_0, q_1, ..., q_n which I combine into a
boolean query using 'SHOULD' clauses.
The requirement for the hit sorting is that the results of q_0 precede the
results of q_1, the results of q_1 precede the results of q_2, and so on.
If a hit occurs in the results of more than one query, it should appear only
once, among the results of the query with the smallest index.

I have searched for some solutions but didn't find anything useful so far.

I have considered the following approaches:

1. Reformulate: q_0 OR (q_1 & !q_0) OR (q_2 & !q_0 & !q_1) OR ...

While this is possible, it may hurt performance because the same subqueries
are evaluated multiple times.
I have not done any measurements, though. It is technically possible to
optimize the execution of this query so that each subquery q_i is evaluated
only once, but I don't know whether this kind of optimization is implemented
in current Lucene/Solr.

2. Implement a CustomScoreQuery. General idea: take a list of queries and
execute them in the context of a BooleanQuery, mapping the scores of the
corresponding subqueries to disjoint score ranges, like q_n -> [0,1),
q_(n-1) -> [1,2) and so on.

Problem: CustomScoreQuery is deprecated and FunctionQuery is the recommended
approach. Still, I don't see any obvious way to implement the idea with
FunctionQuery. Is it possible, and should I dive in and try to do it with
FunctionQuery?

3. Assuming there is some possibility to solve the task with FunctionQuery
(or anything else in out-of-the-box Solr), my question is: is there any
solution that does not require writing our own Solr extension, using only
what is delivered in the standard Solr distribution?
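
To illustrate what approaches 1 and 2 are meant to compute, here is a small
sketch in plain Python. It does not use any Lucene or Solr API. The function
names first_match_partition and banded_score are hypothetical, raw scores are
assumed to be non-negative, and x / (1 + x) is just one possible way to squash
a score into [0, 1).

```python
def first_match_partition(result_sets):
    """Approach 1: bucket i keeps the hits of q_i that no earlier query matched."""
    seen = set()
    buckets = []
    for hits in result_sets:
        buckets.append(set(hits) - seen)
        seen |= set(hits)
    return buckets

def banded_score(raw, index, n_queries):
    """Approach 2: squash a non-negative raw score into [0, 1) and shift it so
    the last query lands in [0, 1), the one before it in [1, 2), and so on."""
    return (n_queries - 1 - index) + raw / (1.0 + raw)

# One dict (doc id -> raw score) per subquery q_0, q_1, q_2.
results = [{"a": 0.3, "b": 2.5}, {"b": 9.0, "c": 7.0}, {"c": 1.0, "d": 4.0}]
buckets = first_match_partition(results)
scores = {
    doc: banded_score(results[i][doc], i, len(results))
    for i, bucket in enumerate(buckets)
    for doc in bucket
}
order = sorted(scores, key=scores.get, reverse=True)
```

Each document appears once, in the range of the first query that matched it,
and the order inside a range still follows the raw score of that query.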


Note: In the past we solved the problem in our legacy application with a
modified BooleanQuery/BooleanScorer. We could migrate (i.e. rewrite) this
extension to the current Solr/Lucene, but that may not be the best option,
so I am exploring all other possibilities.

Thank you all & Best regards,

Robert