Markus Jelsma <markus.jel...@openindex.io> wrote:
> I tried the overrequest ratio/count and set them to 1.0/0. Oddly enough,
> with these settings both a high facet.limit and an extremely high facet.limit
> are up to twice as slow as with the 1.5/10 settings.

I am not sure whether this is the right explanation for your "extremely high 
facet.limit" case, but here goes...


The two phases in distributed simple String faceting in Solr are very different 
from each other:

The first phase allocates a counter structure, iterates the query hits and 
increments the counters, then extracts the top-X facet terms and returns them.
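
To make that concrete, here is a rough, self-contained Java sketch of what 
phase 1 does conceptually. It is not Solr's actual code (Solr counts into an 
array indexed by term ordinal rather than a HashMap); the hitTerms array is 
just a stand-in for the facet field values of the documents matching the query:

  import java.util.HashMap;
  import java.util.Map;

  public class Phase1Sketch {
    public static void main(String[] args) {
      // Stand-in for the facet field values of the docs that matched the query
      String[] hitTerms = {"a", "a", "b", "a", "c", "b", "a", "c", "a", "b"};

      // Allocate a counter structure and increment while iterating the hits
      Map<String, Integer> counts = new HashMap<>();
      for (String term : hitTerms) {
        counts.merge(term, 1, Integer::sum);
      }

      // Extract the top-X facet terms (X = 2 here) and return them
      counts.entrySet().stream()
          .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
          .limit(2)
          .forEach(e -> System.out.println(e.getKey() + ":" + e.getValue()));
    }
  }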

The second phase receives a list of facet terms to count: the terms from the 
merged top-X that the particular shard did not deliver in phase 1.
An example might help here: For phase 1, shard 1 returns [a:5 b:3 c:3], while 
shard 2 returns [d:2 e:2 c:1]. This is merged to [a:5 c:4 b:3]. Since shard 2 
did not return counts for the terms a and b, these counts are requested from 
shard 2 in phase 2.
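
A small illustrative Java sketch of that merge and of how the coordinating 
node decides which counts to request in phase 2 (again not the actual Solr 
code, just the idea, using the numbers from the example above):

  import java.util.*;
  import java.util.stream.Collectors;

  public class MergeSketch {
    public static void main(String[] args) {
      Map<String, Integer> shard1 = Map.of("a", 5, "b", 3, "c", 3);
      Map<String, Integer> shard2 = Map.of("d", 2, "e", 2, "c", 1);

      // Merge the per-shard counts: [a:5 c:4 b:3 d:2 e:2]
      Map<String, Integer> merged = new HashMap<>(shard1);
      shard2.forEach((term, count) -> merged.merge(term, count, Integer::sum));

      // Keep the top-3 candidates: [a:5 c:4 b:3]
      List<String> top = merged.entrySet().stream()
          .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
          .limit(3)
          .map(Map.Entry::getKey)
          .collect(Collectors.toList());

      // Top terms that shard 2 did not report must be counted there in phase 2
      List<String> askShard2 = top.stream()
          .filter(term -> !shard2.containsKey(term))
          .collect(Collectors.toList());
      System.out.println("Phase 2 request to shard 2: " + askShard2); // [a, b]
    }
  }
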
In the current implementation, the term counts in the second phase are 
calculated in the same way as enum faceting: basically one tiny search per 
term, using the query facetfield:term. This does not scale well, so it does not 
take many terms before phase 2 becomes _slower_ than phase 1 (you can see this 
for yourself in solr.log). We therefore want to keep the number of phase 2 
term counts down, even if it means that phase 1 gets a bit slower.
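
To give an impression of that per-term cost, here is a minimal Lucene sketch 
(assuming a recent Lucene on the classpath; this is not Solr's actual 
refinement code) that does what phase 2 effectively does: one small search per 
requested term:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.index.*;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.store.ByteBuffersDirectory;
  import org.apache.lucene.store.Directory;

  public class Phase2Sketch {
    public static void main(String[] args) throws Exception {
      // Tiny in-memory index standing in for one shard
      Directory dir = new ByteBuffersDirectory();
      try (IndexWriter writer =
               new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        for (String term : new String[]{"a", "a", "b", "c"}) {
          Document doc = new Document();
          doc.add(new StringField("facetfield", term, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (IndexReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // One tiny search per term the coordinator asks this shard to refine;
        // the cost grows linearly with the number of requested terms
        for (String term : new String[]{"a", "b"}) {
          int count = searcher.count(new TermQuery(new Term("facetfield", term)));
          System.out.println(term + ":" + count);
        }
      }
    }
  }
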
This is where over-requesting comes into play: the more you over-request, the 
slower phase 1 gets, but the chance of the merger having to ask for extra 
term counts in phase 2 also drops, since those counts were probably already 
returned in phase 1.
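
For reference, the knobs Markus mentions are the facet.overrequest.ratio and 
facet.overrequest.count parameters (defaults 1.5 and 10): with facet.limit=L, 
each shard is asked for roughly L*ratio + count terms in phase 1. A SolrJ 
sketch of setting them (this only builds the request, no server needed):

  import org.apache.solr.client.solrj.SolrQuery;

  public class OverrequestSketch {
    public static void main(String[] args) {
      SolrQuery query = new SolrQuery("*:*");
      query.setFacet(true);
      query.addFacetField("facetfield");
      query.setFacetLimit(100);
      // Defaults shown; raising these trades a slower phase 1 for fewer
      // phase 2 refinement requests
      query.set("facet.overrequest.ratio", "1.5");
      query.set("facet.overrequest.count", "10");
      System.out.println(query); // prints the encoded request parameters
    }
  }
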
I wrote a bit about the phenomenon in 
https://sbdevel.wordpress.com/2014/09/11/even-sparse-faceting-is-limited/

- Toke Eskildsen
