> As far as I can see, JSON Facets does not have this delayed mapping mechanism: Every increment requires a call to the segment->global-ordinal map. With a large field this map cannot be in the fast caches. Combine this with a gazillion references and it makes sense that JSON Facets is slower in this scenario. A factor 20 sounds like way too much though. I would have expected maybe 2.
I'm not sure that the really large content is what causes this. I have found some other fields where, when I index them as String and the values are more than 5 different words long, the JSON facet is slightly slower than the Legacy facet, but that is within your expected factor of 2 (Legacy Facet QTime: 10, JSON Facet QTime: 25).

The content field is the only one with a factor of more than 20, as some of the indexed documents are more than 200 pages long.

So can I say that when faceting on a large content field, using Legacy Facet is better than using the newer JSON Facet, but that for other, shorter fields, using JSON Facet would be better?

Regards,
Edwin

On 3 September 2015 at 02:44, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> Yonik Seeley <ysee...@gmail.com> wrote:
> > Hmmm, well something is really wrong for this orders of magnitude
> > difference. I've never seen anything like that and we should
> > definitely try to get to the bottom of it.
>
> This might be a wild goose chase, but...
>
> Zheng states it is a text field with the content of fairly large
> documents. This means a high amount of unique values and a gazillion
> references from documents to those values.
>
> When incrementing counters for String faceting, segment ordinal -> index
> ordinal mapping takes place. Legacy facets has a mechanism where temporary
> segment-specific counters are used. These are updated directly with the
> segment ordinals and the mapping to global ordinals is performed after the
> counting.
>
> As far as I can see, JSON Facets does not have this delayed mapping
> mechanism: Every increment requires a call to the segment->global-ordinal
> map. With a large field this map cannot be in the fast caches. Combine this
> with a gazillion references and it makes sense that JSON Facets is slower
> in this scenario. A factor 20 sounds like way too much though. I would have
> expected maybe 2.
>
> - Toke Eskildsen
>
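
For anyone following the thread, here is a rough sketch of the two counting strategies Toke describes. It is not Solr's or Lucene's actual code: the class name, the method names, and the plain int arrays standing in for the segment->global-ordinal map are all made up for illustration. The first method does a map lookup on every increment; the second counts into a temporary per-segment array and maps to global ordinals only once per unique term after the counting.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names come from Solr or Lucene. */
final class OrdinalCountingSketch {

    /**
     * Immediate mapping: every document-to-term reference triggers one lookup
     * in the segment->global-ordinal map (modelled here as segToGlobal[seg][ord]).
     */
    static int[] countWithImmediateMapping(int[][] segmentOrds, int[][] segToGlobal, int globalTerms) {
        int[] global = new int[globalTerms];
        for (int seg = 0; seg < segmentOrds.length; seg++) {
            for (int ord : segmentOrds[seg]) {
                global[segToGlobal[seg][ord]]++; // one map lookup per reference
            }
        }
        return global;
    }

    /**
     * Delayed mapping: count with segment ordinals in a temporary per-segment
     * array, then map to global ordinals once per unique term in the segment.
     */
    static int[] countWithDelayedMapping(int[][] segmentOrds, int[][] segToGlobal, int globalTerms) {
        int[] global = new int[globalTerms];
        for (int seg = 0; seg < segmentOrds.length; seg++) {
            int[] local = new int[segToGlobal[seg].length];
            for (int ord : segmentOrds[seg]) {
                local[ord]++; // no map lookup while counting
            }
            for (int ord = 0; ord < local.length; ord++) {
                if (local[ord] != 0) {
                    global[segToGlobal[seg][ord]] += local[ord]; // one lookup per unique term
                }
            }
        }
        return global;
    }

    public static void main(String[] args) {
        // Two toy segments; each inner array lists the segment ordinals referenced by documents.
        int[][] segmentOrds = { {0, 1, 1, 2, 0}, {0, 0, 0, 1} };
        // Toy segment->global-ordinal maps: segment 0 has 3 terms, segment 1 has 2.
        int[][] segToGlobal = { {0, 2, 3}, {1, 3} };
        System.out.println(Arrays.toString(countWithImmediateMapping(segmentOrds, segToGlobal, 4)));
        System.out.println(Arrays.toString(countWithDelayedMapping(segmentOrds, segToGlobal, 4)));
    }
}
```

Both methods produce the same counts. The difference is only in how often the map is consulted: once per reference in the first case, once per unique term per segment in the second, which is where a field with a very large number of references would be expected to feel it.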