I think you're on the wrong track here. Shawn is absolutely
right in his statements about when the field cache is invalidated.

That said, the cure for your problem has nothing to do with
opening new searchers. Solr puts values in caches as it needs
them _and keeps them there_. In the sort/group/facet/function
case, it needs the values for field X so it uninverts the field and
puts it in the cache where it stays. The assumption is that if it's
used once, it'll be used again. Since the uninversion process is
expensive, rebuilding the structure over and over would perform
poorly.

This problem will go away if you define these fields with
docValues="true" (and completely re-index into a new collection).
The index will get bigger, true. But it's a case of
pay-me-now-or-pay-me-later. Essentially at index time, the
uninverted structure is built and serialized to disk. Thereafter,
it's in MMapDirectory (OS memory) space. Here's the go-to
blog about that:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
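
For what it's worth, here's a rough sketch of the kind of field
definitions I mean. The field names are made up, and the type names
("string", "long") are assumptions: use whatever types your schema
already defines. Note that docValues applies to string, numeric, date
and boolean types, not to tokenized text fields.

```xml
<!-- Sketch only: field names are hypothetical, and the type names
     ("string", "long") assume types already defined in your schema. -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
<field name="popularity" type="long" indexed="true" stored="true" docValues="true"/>

<!-- Dynamic fields take the same attribute. -->
<dynamicField name="*_sort" type="long" indexed="true" stored="false" docValues="true"/>
```

Changing docValues on an existing field is exactly the kind of change
that requires the full re-index mentioned above.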

So the growth in the index size is roughly the amount of heap that
will _not_ be used for uninverting fields. By removing the structures
from the heap, the memory allocated to the JVM can get much smaller,
reducing GC pressure and the like.

There are no advantages and many disadvantages to _not_ using docValues
for fields used for sorting/grouping/faceting.

I'm overstating the case a tiny bit, but for all practical purposes I
stand behind it ;).

Best,
Erick
On Sun, Oct 14, 2018 at 11:46 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 10/14/2018 6:32 AM, yasoobhaider wrote:
> > Memory Analyzer output:
> >
> > One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
> > "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
> > 61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
> > "java.util.HashMap$Node[]" loaded by "<system class loader>".
> <snip>
> > But I also noticed that the fieldcache entries on solr UI have the same
> > entries for all collections on that solr instance.
> >
> > Ques 1. Is the field cache reset on commit? If so, is it reset when any of
> > the collections are committed? Or it is not reset at all and I'm missing
> > something here?
>
> ALL caches are invalidated when a new searcher is opened. Some of the
> caches support autowarming.  The field cache isn't one of them.  The
> documentCache also cannot be warmed ... but warming queryResultCache
> will populate documentCache.
>
> > Ques 2. Is there a way to reset/delete this cache every x minutes (the
> > current autocommit duration) irrespective of whether documents were added or
> > not?
>
> Not that I know of.  Opening a new searcher is generally required to
> clear out Solr's caches.  Opening a new searcher requires a change to
> the index and a commit.  One thing you could do is have your indexing
> software insert/update a dummy document (with a special value in the
> uniqueKey field and all non-required fields missing) on a regular basis.
>
> > Other than this, I think the reason for huge heap usage (as others have
> > pointed out) is that we are not using docValues for any of the fields, and
> > we use a large number of fields in sorting functions (between 15-20 over all
> > queries combined). As the next step on this front, I will add new fields
> > with docvalues true and reindex the entire collection. Hopefully that will
> > help.
>
> Yes, if you facet, do group queries, or sort on fields that do not have
> docValues, then Solr must build an uninverted index to do those things,
> and I think it uses the field cache for that. The docValues structure is
> the same info as an uninverted index, so Solr can just read it directly,
> rather than generating it and using heap memory.
>
> Adding docValues will make your index bigger.  For fields with high
> cardinality, the increase could be large.
>
> > We use quite a few dynamic fields in sorting. There is no mention of using
> > docvalues with dynamic fields in the official documentation
> > (https://lucene.apache.org/solr/guide/6_6/docvalues.html).
> >
> > Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
> > in particular that I should look out for, like the cardinality of the field
> > (ie number of different x's in example_dynamic_field_x)?
>
> The ONLY functional difference between a dynamic field and a "standard"
> field is that dynamic fields are not explicitly named in the schema, but
> use wildcard naming.  This difference is only at the Solr level.  At the
> Lucene level, there is NO difference at all.  A dynamic field can have
> any of the same attributes as other fields, including docValues.
>
> > Shawn, I've uploaded my configuration files for the two collections here:
> > https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)
> >
> > c1 collection is ~10GB when optimized, and has 2.5 million documents.
> > ca collection is ~2GB when optimized, and has 9.5 million documents.
> >
> > Please let me know if you think there is something amiss in the
> > configuration that I should fix.
>
> I think your autowarm counts, particularly on the filterCache, are
> probably too large.  But if commits that open a new searcher are
> happening quickly, you probably won't need to fiddle with that.  You can
> check the admin UI "plugins/stats" area to see how long it takes to open
> the last searcher, and the caches have information about how long it
> took to warm.
>
> You have increased ramBufferSizeMB. The default value is 100, and for
> most indexes, increasing it just consumes memory without making things
> work any faster.  Increasing it a little bit might make a difference on
> your c1 collection, since its documents are a bit larger than what I'd
> call typical.
>
> Here's what I would recommend you use for autoCommit (removing maxDocs,
> lowering maxTime, and setting openSearcher to false):
>
>      <autoCommit>
>        <maxTime>60000</maxTime>
>        <openSearcher>false</openSearcher>
>      </autoCommit>
>
> Opening a new searcher on autoCommit isn't necessary if you configure
> autoSoftCommit, and disabling it will make those commits faster.  You
> *do* want autoCommit configured with a fairly short interval, so don't
> remove it.  Not configuring maxDocs makes the operation more predictable.
>
> For autoSoftCommit, 30 seconds is pretty aggressive, but as long as
> commits are happening very quickly, it shouldn't be a problem.  If commits
> are taking more than a few seconds, I would increase the interval.  The
> autoSoftCommit interval in the config for one of your collections is set to
> an hour ... if you're not overriding that with the
> solr.autoSoftCommit.maxTime property, you could decrease that one.
>
> Thanks,
> Shawn
>
