Hi all,
I'm converting my legacy facets to JSON facets and am seeing much better
performance, especially with high cardinality facet fields. However, the one
issue I can't seem to resolve is excessive memory usage (and OOM errors) when
trying to simulate the effect of "group.facet" to sort facets according to a
grouping field.
My situation, slightly simplified, is:
* Solr 4.6.1
* Doc set: ~200,000 docs
* Grouping by item_id, an indexed, stored, single value string field with
~50,000 unique values, ~4 docs per item
* Faceting by person_id, an indexed, stored, multi-value string field with
~50,000 values (w/ a very skewed distribution)
* No docValues fields
Each document here is a description of an item, and there are several
descriptions per item in multiple languages.
With legacy facets I use group.field=item_id and group.facet=true, which gives
me facet counts of items rather than descriptions, correctly sorted by
descending item count.
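(For reference, the legacy request amounts to parameters roughly like the
following; facet.field=person_id is my assumption from the setup described
above, the grouping parameters are as stated:)

```
q=*:*
&group=true
&group.field=item_id
&group.facet=true
&facet=true
&facet.field=person_id
```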
With JSON facets I'm doing the equivalent like so:
&json.facet={
  "people": {
    "type": "terms",
    "field": "person_id",
    "facet": {
      "grouped_count": "unique(item_id)"
    },
    "sort": "grouped_count desc"
  }
}
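(Conceptually, the unique(item_id) aggregation computes, per person_id bucket,
the number of distinct items rather than the number of description documents.
A toy sketch of that calculation in Python, with invented data, just to show
the semantics I'm after; this is of course not how Solr implements it:)

```python
from collections import defaultdict

def grouped_counts(docs):
    """Per person_id bucket, count distinct item_id values,
    then sort buckets by that grouped count, descending."""
    items_per_person = defaultdict(set)
    for doc in docs:
        # person_id is multi-valued, so one doc can feed several buckets
        for person in doc["person_id"]:
            items_per_person[person].add(doc["item_id"])
    return sorted(
        ((person, len(items)) for person, items in items_per_person.items()),
        key=lambda kv: -kv[1],
    )

# Two descriptions of item1 and one of item2; p1 appears on 3 docs
# but only 2 distinct items, which is the count I want.
docs = [
    {"item_id": "item1", "person_id": ["p1"]},
    {"item_id": "item1", "person_id": ["p1"]},
    {"item_id": "item2", "person_id": ["p1", "p2"]},
]
print(grouped_counts(docs))  # [('p1', 2), ('p2', 1)]
```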
This works, and is somewhat faster than legacy faceting, but it also produces a
massive spike in memory usage when (and only when) the sort parameter is set to
the aggregate field. A server that runs happily with a 512MB heap OOMs unless I
give it a 4GB heap. With sort set to (the default) "count desc" there is no
memory usage spike.
I would be curious if anyone has experienced this kind of memory usage when
sorting JSON facets by stats and if there's anything I can do to mitigate it.
I've tried reindexing with docValues enabled on the relevant fields and it
seems to make no difference in this respect.
Many thanks,
~Mike