Hi

As part of a project using grouped results, we looked at using a sharded index. 
The first thing the team noted was that some of our application tests failed, 
which was due to the term distribution across the shards. However, switching on 
ExactStatsCache didn't help. This was because the grouped results feature uses 
separate code paths for large parts of its functionality, and exact stats wasn't 
enabled on them. Attempting to resolve this uncovered a couple of other issues: 
debug explain plans don't use exact stats either, which makes the information 
they report misleading, and underlying all of this was a minor problem with the 
exact stats not correctly distributing term frequencies in all cases (highly 
dependent upon your document distribution, of course).
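
To illustrate why term distribution across shards affects scoring, here is a 
minimal sketch. It is not Solr code: it assumes the classic Lucene TF-IDF 
formula idf = 1 + ln(numDocs / (docFreq + 1)), and the shard names and counts 
are made up for illustration.

```python
import math

def idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF style inverse document frequency
    # (assumed here for illustration only).
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

# Hypothetical shards: (documents containing the term, total documents).
shards = {"shard1": (90, 100), "shard2": (2, 100)}

# Without shared stats, each shard scores with its own local counts.
local_idf = {name: idf(df, n) for name, (df, n) in shards.items()}

# With exact stats, document frequency and document count are summed
# across all shards before the idf is computed.
global_df = sum(df for df, _ in shards.values())
global_n = sum(n for _, n in shards.values())
global_idf = idf(global_df, global_n)

for name, value in local_idf.items():
    print(f"{name}: local idf={value:.3f}, global idf={global_idf:.3f}")
```

With a skew like this, the same term gets a much higher weight on the shard 
where it is rare than on the shard where it is common, so merged results can 
be ordered differently from those of a single unsharded index.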

So I have registered three bugs (listed below) for these issues. They are in 
reverse order to the descriptions above, as I went back through tackling the 
primary cause first and creating patches with test cases for each. Note that I 
did these against the 5.x branch, because when I attempted to apply them to 
master I couldn't get the test cases to behave as expected. After going back to 
5.x, where I had originally worked through the fixes, I finally determined that 
the use of caching in the test case environment was the problem. I believe 
applying the changes to master, based on where I was at a few months ago, should 
be fairly straightforward; sadly I haven't had time to revisit this. Also, 
because of the way the issues relate to one another, the patch attached to each 
Jira issue is dependent upon the preceding issue's patch (hopefully this isn't 
too much of an issue).

SOLR-9122 - ExactStatsCache doesn't share all stats
SOLR-9123 - Explain plans not using ExactStatsCache in debug mode
SOLR-9124 - Grouped Results does not support ExactStatsCache

It is worth noting that these fixes will of course cause subtle changes in 
behaviour, and potentially some performance overhead in some cases, depending on 
how the features have been used.

Hopefully these changes will be accepted as-is, but should any queries arise 
I'll attempt to answer them as necessary.

Tony

Antony Scerri
Lead Architect, Elsevier

