Markus Jelsma <markus.jel...@openindex.io> wrote:
> I tried the overrequest ratio/count and set them to 1.0/0. Odd enough,
> with these settings high facet.limit and extremely high facet.limit are
> both up to twice as slow as with 1.5/10 settings.
Not sure if it is the right explanation for your "extremely high facet.limit" case, but here goes...

The two phases of distributed simple String faceting in Solr are very different from each other. The first phase allocates a counter structure, iterates the query hits and increments the counters, then extracts the top-X facet terms and returns them. The second phase receives a list of facet terms to count: the terms that the shard did not deliver in phase 1.

An example might help here: in phase 1, shard 1 returns [a:5 b:3 c:3], while shard 2 returns [d:2 e:2 c:1]. This is merged to [a:5 c:4 b:3]. Since shard 2 did not return counts for the terms a and b, those counts are requested from shard 2 in phase 2.

In the current implementation, the term counts in the second phase are calculated the same way as enum faceting: basically one tiny search per term, with the query facetfield:term. This does not scale well, so it does not take many terms before phase 2 gets _slower_ than phase 1 (you can see for yourself in solr.log). We therefore want to keep the number of phase 2 term-counts down, even if it means that phase 1 gets a bit slower.

This is where over-requesting comes into play: the more you over-request, the slower phase 1 gets, but the chance of the merger having to ask for extra term counts also gets lower, as those terms were probably already returned in phase 1.

I wrote a bit about the phenomenon in https://sbdevel.wordpress.com/2014/09/11/even-sparse-faceting-is-limited/

- Toke Eskildsen
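To make the merge mechanics concrete, here is a toy Python sketch of the two phases, using the shard counts from the example above. The function name and the over-request arithmetic (limit * ratio + count) are simplifications for illustration, not Solr's actual code; in real Solr, phase 2 issues one facetfield:term query per missing term against the relevant shard.

```python
# Hypothetical per-shard term counts, matching the example in the mail.
shard1 = {"a": 5, "b": 3, "c": 3}
shard2 = {"c": 1, "d": 2, "e": 2}
shards = [shard1, shard2]

def distributed_facet(limit, ratio=1.5, count=10):
    # Phase 1: each shard returns its top (limit*ratio + count) terms,
    # mimicking facet.overrequest.ratio / facet.overrequest.count.
    request = int(limit * ratio) + count
    phase1 = [dict(sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:request])
              for s in shards]

    # Merge the phase-1 counts.
    merged = {}
    for part in phase1:
        for term, n in part.items():
            merged[term] = merged.get(term, 0) + n

    # Candidate top terms after the merge.
    top = sorted(merged, key=lambda t: (-merged[t], t))[:limit]

    # Phase 2: for every candidate term a shard did not report, ask that
    # shard for its exact count. Each of these is a separate tiny search
    # in the real implementation, which is why refinements are expensive.
    refinements = 0
    for i, part in enumerate(phase1):
        for term in top:
            if term not in part:
                merged[term] += shards[i].get(term, 0)
                refinements += 1
    return {t: merged[t] for t in top}, refinements

# With no over-request (ratio 1.0, count 0), shard 2 never reported
# a or b, so two phase-2 refinement requests are needed.
counts, refined = distributed_facet(limit=3, ratio=1.0, count=0)
print(counts, refined)  # {'a': 5, 'c': 4, 'b': 3} 2
```

The more generous the phase-1 request, the more likely it is that a candidate term was already counted by every shard, trading slower phase-1 counting for fewer of the expensive per-term phase-2 lookups.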