Hi Shawn,

Thanks for explaining it so well. We will work on reducing the filterCache size and autowarmCount.

Though I have one question.

> If your configured 4000 entry filterCache were to actually fill up, it
> would require nearly 51 billion bytes, and that's just for the one core
> with 101 million documents. This is much larger than the 30GB heap you
> have specified ... I am betting that the filterCache is the reason
> you're hitting OOME.

As you can see from the screenshots below, the filter cache is almost full, yet heap usage is approximately 18-20 GB. I think this means the cache is not actually taking 51 GB of heap. Otherwise, the issue would have been much more frequent. I also believed that Solr uses some compressed data structures to hold its cache entries, and that this is how it is able to store the cache in less memory. Is that right?

Also, the issue is not very frequent. It occurs once or twice a month, when all follower servers stop working at the same time due to an OutOfMemory error.

*Filter Cache statistics as of 10:08 IST*
[image: image.png]

*Heap Usage*
[image: image.png]

On Wed, Aug 11, 2021 at 4:12 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 8/10/2021 1:06 AM, Satya Nand wrote:
> > Document count is 101893353.
>
> The OOME exception confirms that we are dealing with heap memory. That
> means we won't have to look into the other resource types that can cause
> OOME.
>
> With that document count, each filterCache entry is 12736670 bytes, plus
> some small number of bytes for java object overhead. That's 12.7
> million bytes.
>
> If your configured 4000 entry filterCache were to actually fill up, it
> would require nearly 51 billion bytes, and that's just for the one core
> with 101 million documents. This is much larger than the 30GB heap you
> have specified ... I am betting that the filterCache is the reason
> you're hitting OOME.
>
> You need to dramatically reduce the size of your filterCache. Start
> with 256 and see what that gets you. Solr ships with a size of 512.
> Also, see what you can do about making it so that there is a lot of
> re-use possible with queries that you put in the fq parameter. It's
> better to have several fq parameters rather than one parameter with a
> lot of AND clauses -- much more chance of filter re-use.
>
> I notice that you have autowarmCount set to 100 on two caches. (The
> autowarmCount on the documentCache, which you have set to 512, won't be
> used -- that cache cannot be warmed directly. It is indirectly warmed
> when the other caches are warmed.) This means that every time you issue
> a commit that opens a new searcher, Solr will execute up to 200 queries
> as part of the cache warming. This can make the warming take a VERY
> long time. Consider reducing autowarmCount. It's not causing your OOME
> problems, but it might be making commits take a very long time.
>
> Thanks,
> Shawn
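
For reference, the arithmetic behind the estimate quoted above can be reproduced with a small Python sketch. It assumes every filterCache entry is a bitset holding one bit per document (the worst case described in the quoted message) and ignores the Java object overhead. The function names are illustrative and are not part of Solr.

```python
def filter_cache_entry_bytes(doc_count: int) -> int:
    """Bytes for one bitset-style entry: one bit per document, rounded up.

    Assumption: the entry is a plain bitset; object overhead is ignored.
    """
    return (doc_count + 7) // 8


def filter_cache_worst_case_bytes(doc_count: int, cache_size: int) -> int:
    """Upper bound on heap used if every cache slot holds a full bitset."""
    return filter_cache_entry_bytes(doc_count) * cache_size


if __name__ == "__main__":
    docs = 101_893_353  # document count reported in this thread
    # Sizes mentioned in the thread: suggested, Solr default, configured.
    for size in (256, 512, 4000):
        total = filter_cache_worst_case_bytes(docs, size)
        print(f"size={size:>4}: {total:,} bytes (~{total / 1e9:.1f} billion)")
```

With the document count from this thread, one entry comes to 12,736,670 bytes, so 4000 entries give about 50.9 billion bytes, while a size of 256 gives about 3.3 billion bytes in the same worst case.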