On 10/14/2018 6:32 AM, yasoobhaider wrote:
Memory Analyzer output:

One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
"java.util.HashMap$Node[]" loaded by "<system class loader>".
<snip>
But I also noticed that the fieldcache entries on solr UI have the same
entries for all collections on that solr instance.

Ques 1. Is the field cache reset on commit? If so, is it reset when any of
the collections are committed? Or it is not reset at all and I'm missing
something here?

ALL caches are invalidated when a new searcher is opened. Some of the caches support autowarming.  The field cache isn't one of them.  The documentCache also cannot be warmed ... but warming queryResultCache will populate documentCache.
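For context, autowarming is configured per cache in solrconfig.xml.  A sketch with illustrative sizes (not a recommendation -- tune these for your own query patterns):

    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="32"/>

When a new searcher opens, the top entries of the old queryResultCache are re-executed against it, which also repopulates documentCache as a side effect.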

Ques 2. Is there a way to reset/delete this cache every x minutes (the
current autocommit duration) irrespective of whether documents were added or
not?

Not that I know of.  Opening a new searcher is generally required to clear out Solr's caches.  Opening a new searcher requires a change to the index and a commit.  One thing you could do is have your indexing software insert/update a dummy document (with a special value in the uniqueKey field and all non-required fields missing) on a regular basis.
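A sketch of such a dummy update, assuming the uniqueKey field is named "id" and "dummy-heartbeat" is a value reserved for this purpose (both names are placeholders), posted to the collection's /update handler:

    <add>
      <doc>
        <field name="id">dummy-heartbeat</field>
      </doc>
    </add>

Combined with autoSoftCommit, sending this on a schedule guarantees a new searcher opens at least that often, even when no real indexing is happening.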

Other than this, I think the reason for huge heap usage (as others have
pointed out) is that we are not using docValues for any of the fields, and
we use a large number of fields in sorting functions (between 15-20 over all
queries combined). As the next step on this front, I will add new fields
with docvalues true and reindex the entire collection. Hopefully that will
help.

Yes, if you facet, do group queries, or sort on fields that do not have docValues, then Solr must build an uninverted index to do those things, and I think it uses the field cache for that. The docValues structure is the same info as an uninverted index, so Solr can just read it directly, rather than generating it and using heap memory.

Adding docValues will make your index bigger.  For fields with high cardinality, the increase could be large.
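As an illustration, a field used only for sorting might be defined in the schema like this (the field name and type are placeholders -- and as you noted, changing docValues requires reindexing everything):

    <field name="price_sort" type="long" indexed="false" stored="false" docValues="true"/>

Note that a field used purely for sorting, faceting, or grouping doesn't need indexed="true" or stored="true" when it has docValues.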

We use quite a few dynamic fields in sorting. There is no mention of using
docvalues with dynamic fields in the official documentation
(https://lucene.apache.org/solr/guide/6_6/docvalues.html).

Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
in particular that I should look out for, like the cardinality of the field
(ie number of different x's in example_dynamic_field_x)?

The ONLY functional difference between a dynamic field and a "standard" field is that dynamic fields are not explicitly named in the schema, but use wildcard naming.  This difference is only at the Solr level.  At the Lucene level, there is NO difference at all.  A dynamic field can have any of the same attributes as other fields, including docValues.
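For example, a dynamic field pattern with docValues enabled looks just like a regular field definition (the pattern name and type here are only illustrative):

    <dynamicField name="*_sort" type="long" indexed="false" stored="false" docValues="true"/>

Every concrete field matching the pattern gets the same attributes, docValues included.  Cardinality affects index size, as mentioned above, but not whether docValues work.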

Shawn, I've uploaded my configuration files for the two collections here:
https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)

c1 collection is ~10GB when optimized, and has 2.5 million documents.
ca collection is ~2GB when optimized, and has 9.5 million documents.

Please let me know if you think there is something amiss in the
configuration that I should fix.

I think your autowarm counts, particularly on the filterCache, are probably too large.  But if commits that open a new searcher are happening quickly, you probably won't need to fiddle with that.  You can check the admin UI "plugins/stats" area to see how long it takes to open the last searcher, and the caches have information about how long it took to warm.

You have increased ramBufferSizeMB. The default value is 100, and for most indexes, increasing it just consumes memory without making things work any faster.  Increasing it a little bit might make a difference on your c1 collection, since its documents are a bit larger than what I'd call typical.
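For reference, the default is equivalent to this in solrconfig.xml, and for most indexes there's little reason to go much beyond it:

    <ramBufferSizeMB>100</ramBufferSizeMB>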

Here's what I would recommend you use for autoCommit (removing maxDocs, lowering maxTime, and setting openSearcher to false):

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

Opening a new searcher on autoCommit isn't necessary if you configure autoSoftCommit, and disabling it will make those commits faster.  You *do* want autoCommit configured with a fairly short interval, so don't remove it.  Not configuring maxDocs makes the operation more predictable.

For autoSoftCommit, 30 seconds is pretty aggressive, but as long as commits are happening very quickly, it shouldn't be a problem.  If commits are taking more than a few seconds, I would increase the interval.  The autoSoftCommit interval in the config of one of your collections is set to an hour ... if you're not overriding that with the solr.autoSoftCommit.maxTime property, you could decrease that one.
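One way to wire that up is to use the property with a built-in default, so it can still be overridden at startup (the 60-second value here is only illustrative):

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>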

Thanks,
Shawn
