On 8/10/2021 1:06 AM, Satya Nand wrote:
Document count is 101893353.
The OOME exception confirms that we are dealing with heap memory. That means we won't have to look into the other resource types that can cause OOME.
With that document count, each filterCache entry is a bitset with one bit per document: 101893353 bits, which rounds up to 12736670 bytes, plus a small number of bytes for Java object overhead. That's 12.7 million bytes per entry.
If your configured 4000 entry filterCache were to actually fill up, it would require nearly 51 billion bytes, and that's just for the one core with 101 million documents. This is much larger than the 30GB heap you have specified ... I am betting that the filterCache is the reason you're hitting OOME.
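The arithmetic above can be checked with a quick sketch; the document count, cache size, and heap size all come straight from this thread:

```python
import math

num_docs = 101_893_353     # document count reported for the core
max_cache_entries = 4_000  # configured filterCache size

# Each filterCache entry is a bitset: one bit per document,
# rounded up to whole bytes.
entry_bytes = math.ceil(num_docs / 8)
print(entry_bytes)  # 12736670 -- about 12.7 million bytes per entry

# Worst case: the cache fills completely.
full_cache_bytes = entry_bytes * max_cache_entries
print(full_cache_bytes)  # 50946680000 -- nearly 51 billion bytes

# The configured heap is 30GB, well below the worst case.
heap_bytes = 30 * 2**30
print(full_cache_bytes > heap_bytes)  # True
```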
You need to dramatically reduce the size of your filterCache. Start with 256 and see what that gets you. Solr ships with a size of 512. Also, see what you can do about making it so that there is a lot of re-use possible with queries that you put in the fq parameter. It's better to have several fq parameters rather than one parameter with a lot of AND clauses -- much more chance of filter re-use.
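To illustrate the re-use point (field names here are made up for the example, not taken from your schema):

```
# Two separate fq parameters: each filter is cached on its own, so
# either one can be re-used by any other query that needs it.
q=*:*&fq=category:books&fq=inStock:true

# One combined fq: cached as a single entry that only helps queries
# sending this exact combined clause.
q=*:*&fq=category:books AND inStock:true
```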
I notice that you have autowarmCount set to 100 on two caches. (The autowarmCount on the documentCache, which you have set to 512, won't be used -- that cache cannot be warmed directly. It is indirectly warmed when the other caches are warmed.) This means that every time you issue a commit that opens a new searcher, Solr will execute up to 200 queries as part of the cache warming. This can make the warming take a VERY long time. Consider reducing autowarmCount. It's not causing your OOME problems, but it might be making commits take a very long time.
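Putting the suggestions together, a solrconfig.xml cache section might look something like the sketch below. The sizes and autowarmCount values are starting points to experiment with, not tuned numbers, and this assumes a Solr version that ships solr.CaffeineCache (older versions use solr.FastLRUCache or solr.LRUCache instead):

```xml
<!-- Sketch only: starting values, adjust after observing hit ratios. -->
<filterCache class="solr.CaffeineCache"
             size="256"
             initialSize="256"
             autowarmCount="16"/>

<queryResultCache class="solr.CaffeineCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="16"/>

<!-- documentCache cannot be warmed directly, so autowarmCount is omitted. -->
<documentCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"/>
```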
Thanks, Shawn