On 10/14/2018 6:32 AM, yasoobhaider wrote:
Memory Analyzer output:

One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
"java.util.HashMap$Node[]" loaded by "<system class loader>".
<snip>
But I also noticed that the fieldcache entries on solr UI have the same
entries for all collections on that solr instance.

Ques 1. Is the field cache reset on commit? If so, is it reset when any of
the collections are committed? Or it is not reset at all and I'm missing
something here?

ALL caches are invalidated when a new searcher is opened. Some of the caches support autowarming.  The field cache isn't one of them.  The documentCache also cannot be warmed ... but warming queryResultCache will populate documentCache.
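For context, autowarming is configured per cache in solrconfig.xml.  A sketch with illustrative sizes (not a recommendation -- tune these for your own query patterns):

    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="32"/>

When a new searcher opens, the top entries of the old queryResultCache are re-executed against it, which also repopulates documentCache as a side effect.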

Ques 2. Is there a way to reset/delete this cache every x minutes (the
current autocommit duration) irrespective of whether documents were added or
not?

Not that I know of.  Opening a new searcher is generally required to clear out Solr's caches.  Opening a new searcher requires a change to the index and a commit.  One thing you could do is have your indexing software insert/update a dummy document (with a special value in the uniqueKey field and all non-required fields missing) on a regular basis.
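A sketch of such a dummy update, assuming the uniqueKey field is named "id" and "dummy-heartbeat" is a value reserved for this purpose (both names are placeholders), posted to the collection's /update handler:

    <add>
      <doc>
        <field name="id">dummy-heartbeat</field>
      </doc>
    </add>

Combined with autoSoftCommit, sending this on a schedule guarantees a new searcher opens at least that often, even when no real indexing is happening.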

Other than this, I think the reason for huge heap usage (as others have
pointed out) is that we are not using docValues for any of the fields, and
we use a large number of fields in sorting functions (between 15-20 over all
queries combined). As the next step on this front, I will add new fields
with docvalues true and reindex the entire collection. Hopefully that will
help.

Yes, if you facet, do group queries, or sort on fields that do not have docValues, then Solr must build an uninverted index to do those things, and I think it uses the field cache for that. The docValues structure is the same info as an uninverted index, so Solr can just read it directly, rather than generating it and using heap memory.

Adding docValues will make your index bigger.  For fields with high cardinality, the increase could be large.
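As an illustration, a field used only for sorting might be defined in the schema like this (the field name and type are placeholders -- and as you noted, changing docValues requires reindexing everything):

    <field name="price_sort" type="long" indexed="false" stored="false" docValues="true"/>

Note that a field used purely for sorting, faceting, or grouping doesn't need indexed="true" or stored="true" when it has docValues.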

We use quite a few dynamic fields in sorting. There is no mention of using
docvalues with dynamic fields in the official documentation
(https://lucene.apache.org/solr/guide/6_6/docvalues.html).

Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
in particular that I should look out for, like the cardinality of the field
(ie number of different x's in example_dynamic_field_x)?

The ONLY functional difference between a dynamic field and a "standard" field is that dynamic fields are not explicitly named in the schema, but use wildcard naming.  This difference is only at the Solr level.  At the Lucene level, there is NO difference at all.  A dynamic field can have any of the same attributes as other fields, including docValues.
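For example, a dynamic field pattern with docValues enabled looks just like a regular field definition (the pattern name and type here are only illustrative):

    <dynamicField name="*_sort" type="long" indexed="false" stored="false" docValues="true"/>

Every concrete field matching the pattern gets the same attributes, docValues included.  Cardinality affects index size, as mentioned above, but not whether docValues work.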

Shawn, I've uploaded my configuration files for the two collections here:
https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)

c1 collection is ~10GB when optimized, and has 2.5 million documents.
ca collection is ~2GB when optimized, and has 9.5 million documents.

Please let me know if you think there is something amiss in the
configuration that I should fix.

I think your autowarm counts, particularly on the filterCache, are probably too large.  But if commits that open a new searcher are happening quickly, you probably won't need to fiddle with that.  You can check the admin UI "plugins/stats" area to see how long it takes to open the last searcher, and the caches have information about how long it took to warm.

You have increased ramBufferSizeMB. The default value is 100, and for most indexes, increasing it just consumes memory without making things work any faster.  Increasing it a little bit might make a difference on your c1 collection, since its documents are a bit larger than what I'd call typical.
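For reference, the default is equivalent to this in solrconfig.xml, and for most indexes there's little reason to go much beyond it:

    <ramBufferSizeMB>100</ramBufferSizeMB>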

Here's what I would recommend you use for autoCommit (removing maxDocs, lowering maxTime, and setting openSearcher to false):

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

Opening a new searcher on autoCommit isn't necessary if you configure autoSoftCommit, and disabling it will make those commits faster.  You *do* want autoCommit configured with a fairly short interval, so don't remove it.  Not configuring maxDocs makes the operation more predictable.

For autoSoftCommit, 30 seconds is pretty aggressive, but as long as commits are happening very quickly, it shouldn't be a problem.  If commits are taking more than a few seconds, I would increase the interval.  The autoSoftCommit interval in the config of one of your collections is set to an hour ... if you're not overriding that with the solr.autoSoftCommit.maxTime property, you could decrease that one.
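One way to wire that up is to use the property with a built-in default, so it can still be overridden at startup (the 60-second value here is only illustrative):

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>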

Thanks,
Shawn
