Thanks, Shawn.
This makes sense. Filter queries with high hit counts could be the trigger
for the out-of-memory errors; that would explain why they are so infrequent.
We will take another look at our filter queries and also try reducing the
filter cache size.
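
For reference, the filter cache size is set in solrconfig.xml. This is an
illustrative snippet only — the class name and values below are assumptions
to adapt to your own setup (older Solr versions use solr.FastLRUCache
instead of solr.CaffeineCache):

```xml
<!-- Illustrative filterCache config; tune size/autowarmCount for your index. -->
<filterCache class="solr.CaffeineCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
```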

One question, though:

> There is an alternate format for filterCache entries, that just lists
> the IDs of the matching documents.  This only gets used when the
> hitcount for the filter is low.

Does this alternate format use a different data structure (other than the
bitmap) to store the document IDs for filters with a low document count?

In other words, does the size constraint (the filter cache size) apply only
to the bitmap entries, to this alternate structure as well, or to their sum?
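
As a back-of-the-envelope illustration of why the alternate format matters
(this is not Solr's actual code — the real threshold and data structures
live inside Lucene/Solr, and the maxDoc value below is an assumption derived
from the 12.7 MB bitmap figure in the thread):

```java
// Rough memory estimate for the two filterCache entry representations.
// A full bitmap costs one bit per document in the index, regardless of
// how many documents match; a plain ID list costs one int per match.
public class FilterEntrySize {

    // Full bitmap: maxDoc bits, rounded up to whole bytes.
    static long bitmapBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Alternate format: one 32-bit document ID per matching document.
    static long idListBytes(long hitCount) {
        return hitCount * 4L;
    }

    public static void main(String[] args) {
        long maxDoc = 101_600_000L; // assumed: yields the ~12.7 MB bitmap
        System.out.println("bitmap:    " + bitmapBytes(maxDoc) + " bytes");
        // The ID list is smaller only while hitCount < maxDoc / 32.
        System.out.println("break-even hit count: " + (maxDoc / 32));
        System.out.println("10k hits:  " + idListBytes(10_000) + " bytes");
    }
}
```

With these assumed numbers, a filter matching 10,000 documents needs about
40 KB as an ID list versus 12.7 MB as a bitmap — which is why a cache full
of low-hit-count entries can fit where a cache of bitmaps cannot.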


On Wed, 11 Aug, 2021, 6:50 pm Shawn Heisey, <apa...@elyograg.org> wrote:

> On 8/11/2021 6:04 AM, Satya Nand wrote:
> > *Filter cache stats:*
> >
> https://drive.google.com/file/d/19MHEzi9m3KS4s-M86BKFiwmnGkMh3DGx/view?usp=sharing
>
>
> This shows the current size as 3912, almost full.
>
> There is an alternate format for filterCache entries, that just lists
> the IDs of the matching documents.  This only gets used when the
> hitcount for the filter is low.  I do not know what threshold it uses to
> decide that the hitcount is low enough to use the alternate format, and
> I do not know where in the code to look for the answer.  This is
> probably why you can have 3912 entries in the cache without blowing the
> heap.
>
> I bet that when the heap gets blown, the filter queries Solr receives
> are such that they cannot use the alternate format, and thus require the
> full 12.7 million bytes.  Get enough of those, and you're going to need
> more heap than 30GB.  I bet that if you set the heap to 31G, the OOMEs
> would occur a little less frequently.  Note that if you set the heap to
> 32G, you actually have less memory available than if you set it to 31G
> -- At 32GB, Java must switch from 32 bit pointers to 64 bit pointers.
> Solr creates a LOT of objects on the heap, so that difference adds up.
>
> Discussion item for those with an interest in the low-level code:  What
> kind of performance impact would it cause to use a filter bitmap
> compressed with run-length encoding?  Would that happen at the Lucene
> level rather than the Solr level?
>
> To fully solve this issue, you may need to re-engineer your queries so
> that fq values are highly reusable, and non-reusable filters are added
> to the main query.  Then you would not need a very large cache to obtain
> a good hit ratio.
>
> Thanks,
> Shawn
>
>
