Hi Tim,

thanks for your reply.

I forgot to mention that I’ve already tried shards.preference=replica.type:PULL,replica.type:TLOG,replica.location:local, and the results are basically the same as with only replica.location:local or with no additional query parameters at all.

Sometimes, under heavier load, a few random nodes show a much higher system load (1-minute load average of ~14-20) with CPU usage at 100%: no I/O wait, all of it user time.

Traffic is evenly distributed across all nodes (round-robin), and each shard has an equal number of replicas. Data is also (almost) evenly distributed across the shards.

Thank you
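For reference, a minimal sketch of how the query variants compared above could be built. The host, collection name, and the `build_select_url` helper are hypothetical placeholders for illustration, not our actual client code; only the `shards.preference` values come from the tests described above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, q, shards_preference=None):
    """Build a Solr /select URL, optionally adding a shards.preference parameter.

    base_url and collection are hypothetical placeholders; shards_preference
    is a list of "property:value" rules, given in priority order.
    """
    params = {"q": q}
    if shards_preference:
        # Solr expects the rules as one comma-separated value.
        params["shards.preference"] = ",".join(shards_preference)
    # Keep ':', ',' and '*' unescaped so the URL stays readable.
    return f"{base_url}/{collection}/select?" + urlencode(params, safe=":,*")


# Variant 1: prefer PULL, then TLOG, then local replicas.
pull_tlog_local = build_select_url(
    "http://localhost:8983/solr",  # assumed host
    "mycollection",                # assumed collection name
    "*:*",
    ["replica.type:PULL", "replica.type:TLOG", "replica.location:local"],
)

# Variant 2: prefer local replicas only.
local_only = build_select_url(
    "http://localhost:8983/solr", "mycollection", "*:*",
    ["replica.location:local"],
)

# Variant 3: no additional query parameters.
no_preference = build_select_url("http://localhost:8983/solr", "mycollection", "*:*")
```

All three variants gave roughly the same response times in the tests described above.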
Best,
Nick

> On 11. Jun 2021, at 5:17 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>
> Hi Nick,
>
> What does your response time look like if you use
> shards.preference=replica.type:PULL,replica.location:local as a query
> parameter? Basically route all queries to PULL replicas only.
>
> LMK
>
> Tim
>
> On Fri, Jun 11, 2021 at 6:55 AM Nick Vladiceanu <vladicean...@gmail.com>
> wrote:
>>
>> hello,
>>
>> I’m facing some performance issues when moving from NRT replica types to
>> TLOG + PULL. We’re constantly indexing new data and heavily querying (~2k
>> rps).
>>
>> - index size is ~ 2.5Gi;
>> - number of docs ~4.6M;
>> - 2 shards;
>> - 7 cores and 14Gi of memory
>> - 30 instances
>> - JVM Heap is 12Gi
>>
>> When running on NRT only, the response time in avg is ~150ms p99 and 40ms
>> p95. When changing to TLOG (6 tlog replicas) + 24 PULL, the response time
>> grows to ~350ms p99 and 120ms p95.
>>
>> Here are some fragments from our solrconfig:
>>
>>> <updateHandler class="solr.DirectUpdateHandler2">
>>>   <updateLog>
>>>     <str name="dir">${solr.data.dir:}</str>
>>>     <int name="tlogDfsReplication">${solr.ulog.tlogDfsReplication:3}</int>
>>>   </updateLog>
>>>
>>>   <autoCommit>
>>>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>>>     <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
>>>     <openSearcher>true</openSearcher>
>>>   </autoCommit>
>>>
>>>   <autoSoftCommit>
>>>     <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
>>>   </autoSoftCommit>
>>> </updateHandler>
>>
>>> <query>
>>>   <maxBooleanClauses>1000</maxBooleanClauses>
>>>   <filterCache class="solr.CaffeineCache"
>>>                size="${filterCache.size:32768}"
>>>                initialSize="${filterCache.initialSize:32768}"
>>>                autowarmCount="20%"/>
>>>
>>>   <queryResultCache class="solr.CaffeineCache"
>>>                     size="${queryResultCache.size:32768}"
>>>                     initialSize="${queryResultCache.initialSize:32768}"
>>>                     autowarmCount="0%"/>
>>>
>>>   <documentCache class="solr.CaffeineCache"
>>>                  size="${documentCache.size:150000}"
>>>                  initialSize="${documentCache.initialSize:150000}"
>>>                  autowarmCount="0%"/>
>>>
>>>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>   <useFilterForSortedQuery>true</useFilterForSortedQuery>
>>>
>>>   <queryResultWindowSize>160</queryResultWindowSize>
>>>   <queryResultMaxDocsCached>300</queryResultMaxDocsCached>
>>>
>>>   <listener event="newSearcher" class="solr.QuerySenderListener">
>>>   </listener>
>>>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>   </listener>
>>>
>>>   <useColdSearcher>false</useColdSearcher>
>>>   <maxWarmingSearchers>8</maxWarmingSearchers>
>>> </query>
>>
>> One of my assumption was to reduce the maxWarmingSearchers and to increase
>> the autoCommit maxTime, since the softCommit isn’t available anymore in TLOG
>> replicas. Is that valid?
>>
>> I couldn’t find any documents with the differences/considerations we need to
>> take into account between NRT and TLOG, could you please help? Thanks a lot
>> in advance. Please let me know if there is anything else required.
>>
>> Best regards,
>> Nick Vladiceanu