Re: Unable to get the results in ordered manner from SOLR

2018-02-10 Thread Erick Erickson
In general, trying to impose a strict ordering by this kind of
boosting is a losing proposition. Any change to the underlying index
will alter the term and doc stats and your particular use-case won't
work at some point.

So I'd just go with a "constant score boost", see:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html

Rather than
"254eead5-0c17-11e8-a4e0-067beceeb88f"^14501 OR
"f05a7d41-0b19-11e8-8f16-067beceeb88f"^14001

use
"254eead5-0c17-11e8-a4e0-067beceeb88f"^=100 OR
"f05a7d41-0b19-11e8-8f16-067beceeb88f"^=99
etc.

Note the ^= rather than just ^. A constant score boost sets the score of a
matching clause to the given value, so idf and the other term statistics no
longer affect the ordering.
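If the list of IDs is generated in code, a query like the one above can be built programmatically. This is a minimal sketch in Python, assuming the IDs are already in the desired order; `build_constant_score_query` is a hypothetical helper, not part of any Solr client. It gives the first ID ^=N and the last ^=1, so steps of 1 are enough because each clause scores exactly its constant value.

```python
from typing import Sequence


def build_constant_score_query(field: str, ids: Sequence[str]) -> str:
    """Hypothetical helper: rank `ids` in list order using constant-score boosts.

    The first ID gets ^=len(ids), the last gets ^=1, so every score is positive
    and no two clauses share a value.
    """
    n = len(ids)
    clauses = [f'"{doc_id}"^={n - rank}' for rank, doc_id in enumerate(ids)]
    return f"{field}:({' OR '.join(clauses)})"


if __name__ == "__main__":
    print(build_constant_score_query("pageID", [
        "254eead5-0c17-11e8-a4e0-067beceeb88f",
        "f05a7d41-0b19-11e8-8f16-067beceeb88f",
    ]))
```

With unique pageIDs each document should match only one clause, so its score is that clause's constant and the ordering should stay the same after atomic updates change the term statistics.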

Best,
Erick

On Fri, Feb 9, 2018 at 10:41 PM, Anupam Bhattacharya wrote:

Re: Unable to get the results in ordered manner from SOLR

2018-02-09 Thread Anupam Bhattacharya
I looked further into the results: because the idf value is smaller for the
document with the higher boost, the higher boost is not helping. Can we ensure
that the idf value does not affect the ordering?


73690.7 = sum of:
  73690.7 = weight(pageID:5d368d4f-0c16-11e8-a4e0-067beceeb88f in 36) [SchemaSimilarity], result of:
    73690.7 = score(doc=36,freq=1.0 = termFreq=1.0
), product of:
      13501.0 = boost
      5.458166 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        351.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.0 = parameter b (norms omitted for field)


69267.71 = sum of:
  69267.71 = weight(pageID:f05a7d41-0b19-11e8-8f16-067beceeb88f in 2) [SchemaSimilarity], result of:
    69267.71 = score(doc=2,freq=1.0 = termFreq=1.0
), product of:
      14001.0 = boost
      4.9473405 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        2.0 = docFreq
        351.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.0 = parameter b (norms omitted for field)
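The arithmetic behind the two explain blocks can be checked directly. This sketch (Python; the function names are illustrative) plugs the reported values into the formulas printed in the explain output, using the natural log, k1 = 1.2 and b = 0. It shows that docFreq = 2 lowers idf from about 5.458 to about 4.947, which outweighs the 500-point boost difference.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    # idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), natural log
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float = 1.2) -> float:
    # tfNorm with b = 0 (norms omitted): (freq * (k1 + 1)) / (freq + k1)
    return (freq * (k1 + 1)) / (freq + k1)


def term_score(boost: float, doc_freq: float, doc_count: float, freq: float = 1.0) -> float:
    # Per-term score as reported by the explain output: boost * idf * tfNorm
    return boost * bm25_idf(doc_freq, doc_count) * bm25_tf_norm(freq)


# Values copied from the two explain blocks above.
lower_boost = term_score(13501, doc_freq=1, doc_count=351)   # 5d368d4f-...
higher_boost = term_score(14001, doc_freq=2, doc_count=351)  # f05a7d41-...
print(round(lower_boost, 2), round(higher_boost, 2))
```

Any change to docFreq or docCount (for example after documents are updated) changes idf, which is why a plain ^ boost cannot guarantee a fixed ordering.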



On Sat, Feb 10, 2018 at 10:58 AM, Anupam Bhattacharya wrote:

> For me the Solr sort was working fine with boost increments of 1. But
> after some recent changes to do atomic updates on existing Solr documents,
> the results now come back in the wrong order when I request them sorted.
> I have tried keeping a large difference between the boost values of
> individual terms, but the results are very inconsistent.
>
> Below is the query with debugQuery=on.
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":19,
> "params":{
>   "q":"pageID:(\"254eead5-0c17-11e8-a4e0-067beceeb88f\"^14501 OR
> \"f05a7d41-0b19-11e8-8f16-067beceeb88f\"^14001 OR
> \"5d368d4f-0c16-11e8-a4e0-067beceeb88f\"^13501 OR
> \"ebcdc438-0702-11e8-9ea1-068a0dc6bc8f\"^13001 OR
> \"f\r\n72e001a-0c1a-11e8-a4e0-067beceeb88f\"^12501 OR
> \"dd73b9b6-0c17-11e8-a4e0-067beceeb88f\"^12001 OR
> \"bbd40922-0c8c-11e8-9b48-067beceeb88f\"^11501 OR
> \"d00791e9-0c1a-11e8-a4e0-067beceeb88f\"^11001 OR
> \"fbe95d\r\nb9-0bf8-11e8-a5c6-067beceeb88f\"^10501 OR
> \"8778980f-0d8c-11e8-b7a8-067beceeb88f\"^10001 OR
> \"e48e4073-02a4-11e8-a9a6-068a0dc6bc8f\"^9501 OR
> \"3b91f3ab-00ed-11e8-ac1d-067beceeb88f\"^9001 OR
> \"e878f0a7-0d98\r\n-11e8-916e-067beceeb88f\"^8501 OR
> \"beca0178-0bf6-11e8-a5c6-067beceeb88f\"^8001 OR
> \"864a7dee-01e4-11e8-80b0-067beceeb88f\"^7501 OR
> \"70b34ec1-ff38-11e7-b9c9-067beceeb88f\"^7001 OR
> \"ddd51847-0d8d-11e8-b7a\r\n8-067beceeb88f\"^6501 OR
> \"ad8d954d-000a-11e8-92a3-067beceeb88f\"^6001 OR
> \"93f24713-0ae4-11e8-866d-068a0dc6bc8f\"^5501 OR
> \"87033dd0-ff37-11e7-b9c9-067beceeb88f\"^5001 OR
> \"65b079c8-0c19-11e8-a4e0-067bece\r\neb88f\"^4501 OR
> \"9c5f0007-0c18-11e8-a4e0-067beceeb88f\"^4001 OR
> \"0a5796d5-0d7f-11e8-bf64-067beceeb88f\"^3501 OR
> \"b3800b06-0104-11e8-99f2-067beceeb88f\"^3001 OR
> \"9ed6136c-058c-11e8-8951-067beceeb88f\"^25\r\n01 OR
> \"af34b7a5-0102-11e8-99f2-067beceeb88f\"^2001 OR
> \"9f30cfa1-fe59-11e7-adca-067beceeb88f\"^1501 OR
> \"91a40514-ff3e-11e7-b9c9-067beceeb88f\"^1001 OR
> \"1aed96f2-0b44-11e8-94a2-067beceeb88f\"^501 OR
> \"51d\r\n42489-0049-11e8-8628-067beceeb88f\"^1)\r\n",
>   "sort=score desc":"",
>   "fl":"pageID,score",
>   "rows":"100",
>   "debugQuery":"on"}},
>   "response":{"numFound":24,"start":0,"maxScore":79148.87,"docs":[
>   {
> "pageID":"254eead5-0c17-11e8-a4e0-067beceeb88f",
> "score":79148.87},
>   {
> "pageID":"5d368d4f-0c16-11e8-a4e0-067beceeb88f",
> "score":73690.7},
>   {
> "pageID":"f05a7d41-0b19-11e8-8f16-067beceeb88f",
> "score":69267.71},
>   {
> "pageID":"dd73b9b6-0c17-11e8-a4e0-067beceeb88f",
> "score":65503.45},
>   {
> "pageID":"ebcdc438-0702-11e8-9ea1-068a0dc6bc8f",
> "score":64320.375},
>   {
> "pageID":"bbd40922-0c8c-11e8-9b48-067beceeb88f",
> "score":62774.367},
>   {
> "pageID":"d00791e9-0c1a-11e8-a4e0-067beceeb88f",
> "score":60045.28},
>   {
> "pageID":"8778980f-0d8c-11e8-b7a8-067beceeb88f",
> "score":54587.12},
>   {
> "pageID":"e48e4073-02a4-11e8-a9a6-068a0dc6bc8f",
> "score":47004.684},
>   {
> "pageID":"3b91f3ab-00ed-11e8-ac1d-067beceeb88f",
> "score":44531.01},
>   {
> "pageID":"beca0178-0bf6-11e8-a5c6-067beceeb88f",
> "score":43670.79},
>   {
> "pageID":"864a7dee-01e4-11e8-80b0-067beceeb88f",
> "score":37110.0},
>   {
> "pageID":"70b34ec1-ff38-11e7-b9c9-067beceeb88f",
> "score":34636.332},
>   {
> "pageID":"ad8d954d-000a-11e8-92a3-067beceeb88f",
> "score":32754.455},
>   {
> "pageID":"87033dd0-ff37-11e7-b9c9-067beceeb88f",
>
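The echoed `params` in the debug output above show `"sort=score desc":""`, i.e. a parameter literally named `sort=score desc` with an empty value, and several IDs inside `q` contain `\r\n`. A minimal sketch of building the same request from code with the Python standard library, so that `sort` is its own parameter and pasted line breaks are rejected before they reach `q`. The endpoint URL and core name `mycore` are hypothetical placeholders, and `build_select_url` is an illustrative helper.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host, port and core name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_select_url(q: str, rows: int = 100) -> str:
    # Reject line breaks, which would otherwise end up inside the IDs in q.
    if "\r" in q or "\n" in q:
        raise ValueError("q contains a line break; build it from unbroken IDs")
    params = {
        "q": q,
        "sort": "score desc",   # 'sort' as its own parameter, value 'score desc'
        "fl": "pageID,score",
        "rows": str(rows),
        "debugQuery": "on",
    }
    # urlencode percent-encodes quotes, ^, = and : and turns spaces into '+'.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url('pageID:("254eead5-0c17-11e8-a4e0-067beceeb88f"^=1)'))
```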