Hello, I’m facing some performance issues after moving from NRT replica types to TLOG + PULL. We’re constantly indexing new data and querying heavily (~2k rps).

- index size is ~2.5Gi
- number of docs is ~4.6M
- 2 shards
- 7 cores and 14Gi of memory
- 30 instances
- JVM heap is 12Gi

When running on NRT only, response times are ~150ms at p99 and 40ms at p95. After changing to TLOG (6 TLOG replicas) + 24 PULL replicas, response times grow to ~350ms at p99 and 120ms at p95.

Here are some fragments from our solrconfig:

> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.data.dir:}</str>
>     <int name="tlogDfsReplication">${solr.ulog.tlogDfsReplication:3}</int>
>   </updateLog>
>
>   <autoCommit>
>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>     <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
>     <openSearcher>true</openSearcher>
>   </autoCommit>
>
>   <autoSoftCommit>
>     <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
>   </autoSoftCommit>
> </updateHandler>

> <query>
>   <maxBooleanClauses>1000</maxBooleanClauses>
>   <filterCache class="solr.CaffeineCache"
>                size="${filterCache.size:32768}"
>                initialSize="${filterCache.initialSize:32768}"
>                autowarmCount="20%"/>
>
>   <queryResultCache class="solr.CaffeineCache"
>                     size="${queryResultCache.size:32768}"
>                     initialSize="${queryResultCache.initialSize:32768}"
>                     autowarmCount="0%"/>
>
>   <documentCache class="solr.CaffeineCache"
>                  size="${documentCache.size:150000}"
>                  initialSize="${documentCache.initialSize:150000}"
>                  autowarmCount="0%"/>
>
>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>   <useFilterForSortedQuery>true</useFilterForSortedQuery>
>
>   <queryResultWindowSize>160</queryResultWindowSize>
>   <queryResultMaxDocsCached>300</queryResultMaxDocsCached>
>
>   <listener event="newSearcher" class="solr.QuerySenderListener">
>   </listener>
>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>   </listener>
>
>   <useColdSearcher>false</useColdSearcher>
>   <maxWarmingSearchers>8</maxWarmingSearchers>
> </query>

One of my assumptions was that I should reduce maxWarmingSearchers and increase the autoCommit maxTime, since softCommit isn’t available anymore in TLOG replicas. Is that valid?

I couldn’t find any documentation on the differences and considerations we need to take into account between NRT and TLOG. Could you please help?

Thanks a lot in advance. Please let me know if anything else is required.

Best regards,
Nick Vladiceanu