Hello Jan, thanks for your reply! I'm not very experienced with cache settings in Solr; this is the first time I'm setting them up myself.
These are the settings I was able to find in our solrconfig.xml:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <cache name="perSegFilter" class="solr.search.LRUCache" size="10" initialSize="0" autowarmCount="10" regenerator="solr.NoOpRegenerator"/>

In the meantime, I'll investigate caching. Thanks again!

MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
matias.la...@passare.com | +54 11-6357-2143

-----Original Message-----
From: Jan Høydahl <jan....@cominvent.com>
Sent: Thursday, December 1, 2022 10:11 PM
To: users@solr.apache.org
Subject: Re: Very High CPU when indexing

What are your cache settings? Are you using autoWarmCount or explicit cache warming? It could be a source of long commit times.

Jan
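For context, the autowarming Jan mentions is configured per cache in solrconfig.xml, and explicit warming is done with a newSearcher listener. A minimal sketch with illustrative values (the non-zero autowarmCount and the warming query below are examples, not settings taken from this thread):

  <!-- With autowarmCount > 0, each new searcher (i.e. each soft commit)
       re-executes the top N entries of the old cache before it starts
       serving queries. That speeds up post-commit queries but adds
       work to every commit. -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

  <!-- Explicit warming: run fixed queries against every new searcher. -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">id asc</str>
      </lst>
    </arr>
  </listener>

With autowarmCount="0" everywhere, as in the settings above, cache warming is unlikely to be what makes the commits slow.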
> 1. des. 2022 kl. 22:35 skrev Matias Laino <matias.la...@passare.com.INVALID>:
>
> I've tried multiple different autoSoftCommit and autoCommit configurations, and it always takes 2:30-3 minutes for the records to become available in search. CPU has been pretty good since I upgraded, and memory should be plenty unless I'm mistaken; I'm lost at this point.
>
> Any help will be really appreciated.
>
> MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> matias.la...@passare.com | +54 11-6357-2143
>
> -----Original Message-----
> From: Matias Laino <matias.la...@passare.com.INVALID>
> Sent: Thursday, December 1, 2022 1:11 PM
> To: users@solr.apache.org
> Subject: RE: Very High CPU when indexing
>
> Hi Shawn, thanks again for the reply.
>
> I've tried increasing the memory to 32 GB, with a 16 GB heap and 8 cores, and even though I still see peaks of 300% CPU on the Solr process, it can handle it (Solr doesn't go down).
> But I've tried several different configurations for autoCommit and autoSoftCommit, and results always take a few minutes to show up in search, which is really unacceptable for us. I'm not sure how to proceed now.
>
> I've looked at the cores, and for the collection I'm testing against right now, for example, I see these values:
>
> Core 1:
> Num Docs: 4806841
> Max Doc: 4845793
> Heap Memory Usage: 387392
>
> Core 2:
> Num Docs: 4810159
> Max Doc: 4849229
> Heap Memory Usage: 450008
>
> Other collections look fairly similar, except for this one:
>
> Preview Core 1:
> Num Docs: 5774937
> Max Doc: 5832482
> Heap Memory Usage: 407424
>
> Preview Core 2:
> Num Docs: 5774937
> Max Doc: 5833942
> Heap Memory Usage: 463632
>
> Preview Core 3:
> Num Docs: 5778245
> Max Doc: 5790174
> Heap Memory Usage: 480672
>
> For some reason, the "Preview" collection has 3 shards instead of 2 like it had before... maybe that could be related? The collection overview says 2 shards and a replication factor of 2.
>
> As additional info, ZooKeeper is running on its own server, and Solr is the only thing running on its server, aside from some system processes.
>
> Thanks again!
>
> MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> matias.la...@passare.com | +54 11-6357-2143
>
> -----Original Message-----
> From: Shawn Heisey <elyog...@elyograg.org>
> Sent: Thursday, December 1, 2022 1:07 AM
> To: users@solr.apache.org
> Subject: Re: Very High CPU when indexing
>
> On 11/30/22 08:57, Matias Laino wrote:
>> Q: What is the total document count?
>> A: Based on the dashboard, it's Total #docs: 68.6mn on each node (I'm replicating the same data on both)
>
> Each core has a count. And here you can see what I was talking about with max doc compared to num docs:
>
> https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0
>
>> Q: It would be great to have an on-disk size and document count (max docs, not num docs) for each collection
>> A: I'm not sure where to get that from the metrics; based on the cloud dashboard it says the following by shard:
>> preview_s1r2: 1.9GB
>> preview_s2r11: 1.9GB
>> preview_s2r6: 1.9GB
>> staging-d_s1r1: 1.8GB
>> staging-d_s2r4: 1.8GB
>> staging-a_s1r1: 1.7GB
>> staging-a_s2r4: 1.7GB
>> staging-c_s2r5: 1.6GB
>> staging-c_s1r2: 1.6GB
>> pre-prod_s1r1: 1.6GB
>> pre-prod_s2r4: 1.6GB
>> staging-b_s1r2: 1.5GB
>> staging-b_s2r5: 1.5GB
>> That is replicated on the other node.
>
> So you've got 22GB of data and, assuming Solr is the only thing running on the machine, only about 8GB of memory to cache it (total RAM of 16GB minus 8GB for the Solr heap). I would hope for at least 12GB of cache for that, and more is always better; 8GB may not be enough. If you have other software running on the machine, it will be even less. Does ZK live on the same instance? If so, how much heap are you giving it?
>
> Performance of a system is often perfectly fine up until some threshold, and once you throw just a little bit more data into the mix so it goes over that threshold, performance drops drastically. That is how a small increase can bring a system to its knees.
>
> If you can upgrade the instance to one with more memory, that might also help, but I do think that the biggest problem is the autoSoftCommit setting. If you really can't make it at least two minutes, which is the value I would use, then set it as high as you can: 10 to 30 seconds, maybe.
>
> Thanks,
> Shawn
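For reference, the commit configuration Shawn describes would look roughly like this in solrconfig.xml (a sketch with illustrative times, using his suggested two-minute soft commit; openSearcher=false on the hard commit is the usual companion setting, so that visibility is controlled by the soft commit alone):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Hard commit: flushes segments to disk and rolls the transaction
         log, but with openSearcher=false it does not make new documents
         visible to searches. -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- Soft commit: opens a new searcher, which is what makes documents
         visible and what invalidates (and rewarms) the caches. Keep it
         as infrequent as the application can tolerate. -->
    <autoSoftCommit>
      <maxTime>120000</maxTime>
    </autoSoftCommit>
  </updateHandler>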