Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Spyros Kapnissis
Hi Dmitry,

No, we were not able to solve the sorting/re-ranking issue. In the end we
moved the custom sorting formula into the 'q' param instead of 'sort', so
that the results come back sorted by score as expected.

That mostly solved our issues with inconsistent Solr scores. Maybe sorting
and re-ranking are conflicting concepts.
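
For anyone reading along, here is a rough sketch of what the request could look like after that change. The message does not say where the original matching query went, so this assumes it moves to 'fq'. The field names in the formula, the model name and the helper function are all made up for illustration. Only the {!func} and {!ltr} local-params syntax is standard Solr.

```python
from urllib.parse import urlencode

def build_params(match_query, sort_formula, ltr_model, rerank_docs=100):
    """Build Solr request params with the formula as the main query.

    Assumption: the text match moves to fq, so the function query in q
    provides the first-pass score and no 'sort' param is sent at all.
    """
    return {
        "q": "{!func}" + sort_formula,          # formula output becomes the score
        "fq": match_query,                      # restricts the result set only
        "rq": "{!ltr model=%s reRankDocs=%d}" % (ltr_model, rerank_docs),
        "fl": "id,score",
    }

# Hypothetical values, not taken from the thread.
params = build_params(
    match_query="title:shoes",
    sort_formula="sum(product(popularity,2),recency)",
    ltr_model="myModel",
)
query_string = urlencode(params)
print(query_string)
```

With no 'sort' param, both the shards and the merging node order by score, so the two steps should no longer compete.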

Hope this helps.


On Fri, Aug 28, 2020 at 4:28 PM Jörn Franke wrote:

> Maybe this can help you?
>
> https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf
>
> On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis wrote:
>
> > Hi all,
> >
> > On our current master/slave setup (no cloud), we use a custom sorting
> > function to get the first pass results (using the sort param), and then we
> > use LTR for re-ranking. This works fine, i.e. re-ranking is applied on the
> > topN, after sorting has completed and the order is correct.
> >
> > However, as we are migrating to SolrCloud (version 7.3.1) with multiple
> > shards, this does not seem to work as expected. To my understanding, Solr
> > collects the reranked results from the shards back on a single node to
> > merge them, and then tries to re-apply sorting.
> >
> > We would expect the results to at least follow the sorting formula, even if
> > this is not what we want. But this is still not even the case, as the
> > combination of the two (sorting + reranking) results in erratic ordering.
> >
> > Example result, where $sort_score is the sorting formula output, and score
> > is the LTR re-ranked output:
> >
> > {"id": "152",
> > "$sort_score": 17.38543,
> > "score": 0.22140852
> > },
> > {"id": "2016",
> > "$sort_score": 14.612957,
> > "score": 0.19214153
> > },
> > { "id": "1523",
> > "$sort_score": 14.4093275,
> > "score": 0.26738763
> > },
> > { "id": "6704",
> > "$sort_score": 13.956842,
> > "score": 0.17357588
> > },
> > { "id": "6512",
> > "$sort_score": 14.43907,
> > "score": 0.11575622
> > },
> >
> > We also tried with other simple re-rank queries apart from LTR, and the
> > issue persisted.
> >
> > Could someone please help troubleshoot? Ideally, we would want to have the
> > re-rank results merged on the single node, and not re-apply sorting.
> >
> > Thank you!
> >
>


Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-05-11 Thread Spyros Kapnissis
Hi all,

On our current master/slave setup (no cloud), we use a custom sorting
function to get the first pass results (using the sort param), and then we
use LTR for re-ranking. This works fine, i.e. re-ranking is applied on the
topN, after sorting has completed and the order is correct.

However, as we are migrating to SolrCloud (version 7.3.1) with multiple
shards, this does not seem to work as expected. To my understanding, Solr
collects the reranked results from the shards back on a single node to
merge them, and then tries to re-apply sorting.

We would expect the results to at least follow the sorting formula, even if
this is not what we want. But this is still not even the case, as the
combination of the two (sorting + reranking) results in erratic ordering.

Example result, where $sort_score is the sorting formula output, and score
is the LTR re-ranked output:

{"id": "152",
"$sort_score": 17.38543,
"score": 0.22140852
},
{"id": "2016",
"$sort_score": 14.612957,
"score": 0.19214153
},
{ "id": "1523",
"$sort_score": 14.4093275,
"score": 0.26738763
},
{ "id": "6704",
"$sort_score": 13.956842,
"score": 0.17357588
},
{ "id": "6512",
"$sort_score": 14.43907,
"score": 0.11575622
},

We also tried with other simple re-rank queries apart from LTR, and the
issue persisted.

Could someone please help troubleshoot? Ideally, we would want to have the
re-rank results merged on the single node, and not re-apply sorting.

Thank you!
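
A toy model of the behaviour described above may help. It is only a sketch of my understanding, not Solr's actual merge code: each shard returns its documents in re-rank order, and the merging node then picks, step by step, whichever shard's first remaining document has the highest $sort_score, as if every shard list were already sorted by that value. The shard assignment is hypothetical. The ids and values are the ones from the example.

```python
def rerank_on_shard(docs):
    """Each shard returns its docs ordered by re-rank score, descending."""
    return sorted(docs, key=lambda d: d["score"], reverse=True)

def merge_by_sort_key(shard_lists):
    """Merge by repeatedly taking the head with the largest sort_score.

    This is only correct if every input list is sorted by sort_score,
    which is not the case once the shards have re-ranked their results.
    """
    queues = [list(s) for s in shard_lists]
    merged = []
    while any(queues):
        best = max((q for q in queues if q), key=lambda q: q[0]["sort_score"])
        merged.append(best.pop(0))
    return merged

# Hypothetical split of the five example documents across two shards.
shard_a = [
    {"id": "152", "sort_score": 17.38543, "score": 0.22140852},
    {"id": "2016", "sort_score": 14.612957, "score": 0.19214153},
    {"id": "6704", "sort_score": 13.956842, "score": 0.17357588},
    {"id": "6512", "sort_score": 14.43907, "score": 0.11575622},
]
shard_b = [
    {"id": "1523", "sort_score": 14.4093275, "score": 0.26738763},
]

merged = merge_by_sort_key([rerank_on_shard(shard_a), rerank_on_shard(shard_b)])
merged_ids = [d["id"] for d in merged]
print(merged_ids)  # ['152', '2016', '1523', '6704', '6512']
```

Under this made-up split the toy merge gives the same order as the example: the output follows neither $sort_score nor score. That is consistent with the explanation above but does not prove it.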


Filter results based on their number of terms, relative to the search query

2013-08-21 Thread Spyros Kapnissis
Hi,

We have an index of several small expressions, let's say 4-20 words on average. 
I have a requirement to search for approximate results only, relevant to the 
search query. 

For example, when someone searches for (+a +b +c), we would like to return only
those expressions that contain all terms, plus one irrelevant term at most (e.g.
a,b,c,d), and filter out any results that are longer.

Any ideas on this? One thought I had is that maybe we could use a filter 
function query using Similarity's queryNorm along with the coord factor. Is 
this even possible?

Thanks,
Spyros
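
To pin down the requirement, here is a small predicate that states it in code. It is a plain Python check over whitespace-split tokens, with no stemming or analysis, and it is not a Solr feature. The function name and the max_extra parameter are made up for illustration.

```python
def is_close_match(query_terms, expression, max_extra=1):
    """True if the expression has every query term and few extra tokens.

    Assumption: tokens are produced by a simple whitespace split.
    """
    tokens = expression.split()
    has_all_terms = set(query_terms) <= set(tokens)
    short_enough = len(tokens) <= len(set(query_terms)) + max_extra
    return has_all_terms and short_enough

query = ["a", "b", "c"]
print(is_close_match(query, "a b c d"))      # True: one extra term
print(is_close_match(query, "x y a b c z"))  # False: three extra terms
```

In Solr itself, one option might be to store the token count in a separate field at index time and filter on it with a range query. That field would be an addition to the schema, not something described in this message.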


Match only documents which contain all query terms

2011-07-01 Thread Spyros Kapnissis
Hello to all,


Is it possible to make Solr return only documents that contain all or
most of my query terms for a specific field? Or will I need some
post-processing on the results?

So, for example, if I search for (a b c), I would like the following documents 
returned:

a b c
a' c b (where a' is a stem for example)

but not 
x y a b c z

Thanks,
Spyros
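
One possible starting point, sketched here as request parameters, is the "mm" (minimum should match) parameter of the dismax/edismax query parsers. The field name "phrase" and the helper function are assumptions. Note that mm only controls how many query terms must match; on its own it would not exclude a longer document such as "x y a b c z".

```python
from urllib.parse import urlencode

def all_terms_params(terms, field, min_match="100%"):
    """Build edismax params that require min_match of the query terms."""
    return {
        "defType": "edismax",
        "q": " ".join(terms),
        "qf": field,        # hypothetical field holding the phrases
        "mm": min_match,    # "100%" = all terms, "75%" = most terms
    }

all_required = all_terms_params(["a", "b", "c"], "phrase")
most_required = all_terms_params(["a", "b", "c"], "phrase", min_match="75%")
print(urlencode(all_required))
```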

Synonyms valid only in specific categories of data

2011-06-01 Thread Spyros Kapnissis
Hello to all,


I have a collection of text phrases in more than 20 languages that I'm indexing
in Solr. Each phrase belongs to one of about 30 different phrase categories. I
have specified different fields for each language and added a synonym filter at
query time. I would however like the synonym filter to take into account the
category as well. So, a specific synonym should be valid and used only in one or
more categories per language. (The category is indexed in another field.)

Is this somehow possible in the current SynonymFilterFactory implementation? 

Hope it makes sense. 

Thank you,
Spyros


Re: Synonyms valid only in specific categories of data

2011-06-01 Thread Spyros Kapnissis
Yes, that would probably be a lot of fields. I guess one way would be to extend
the SynonymFilter and change the format of the synonyms.txt file to take the
categories into account.
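
As a rough illustration of what such a format could look like, here is a sketch of a parser for a category-prefixed synonyms file. The "categories: terms" line syntax is invented for this example and is not something SynonymFilterFactory understands. A real solution would still need a custom filter that knows the category at query time.

```python
from collections import defaultdict

def parse_category_synonyms(lines):
    """Parse lines like 'cat1,cat2: term1, term2' into per-category groups.

    Hypothetical format: the part before ':' lists the categories in which
    the comma-separated synonym group after ':' is valid.
    """
    table = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cats, _, terms = line.partition(":")
        group = {t.strip() for t in terms.split(",") if t.strip()}
        for cat in (c.strip() for c in cats.split(",")):
            for term in group:
                table[cat].setdefault(term, set()).update(group)
    return table

def expand(table, category, term):
    """Return the synonyms of term valid in category, or just the term."""
    return sorted(table.get(category, {}).get(term, {term}))

table = parse_category_synonyms([
    "# categories: synonyms",
    "greetings,farewells: bye, goodbye",
    "greetings: hi, hello",
])
print(expand(table, "greetings", "hi"))
print(expand(table, "farewells", "hi"))
```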


Thanks again for your answer.




From: lee carroll lee.a.carr...@googlemail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, June 1, 2011 12:23 PM
Subject: Re: Synonyms valid only in specific categories of data

I don't think you can assign a synonyms file dynamically to a field.
You would need to create a separate field for each language/category
combination, each referencing its own synonyms file. That would
be a lot of fields.



On 1 June 2011 09:59, Spyros Kapnissis ska...@yahoo.com wrote:
> Hello to all,
>
>
> I have a collection of text phrases in more than 20 languages that I'm indexing
> in Solr. Each phrase belongs to one of about 30 different phrase categories. I
> have specified different fields for each language and added a synonym filter at
> query time. I would however like the synonym filter to take into account the
> category as well. So, a specific synonym should be valid and used only in one or
> more categories per language. (The category is indexed in another field.)
>
> Is this somehow possible in the current SynonymFilterFactory implementation?
>
> Hope it makes sense.
>
> Thank you,
> Spyros