On 12/20/22 06:34, Nick Vladiceanu wrote:
> Thank you, Shawn, for sharing; this is indeed useful information.
> However, I must say that we only used deleteById and never deleteByQuery. We
> also rely only on automatic segment merging and never issue an optimize command.
That is very unusual. I've never seen a core reload take more than a
few seconds, even when I was dealing with core sizes of double-digit GB.
Unless you have hundreds or thousands of replicas for each of your 6
shards, it really should complete very quickly.
Have you been able to determine which Solr cores in the collection are
causing the delay, and take a look at those machines?
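One way to narrow that down is to time a reload of each core individually. Below is a minimal sketch, assuming the CoreAdmin API (action=RELOAD) is reachable on each node and that you already have the core names, for example from the admin UI or CLUSTERSTATUS. The host name, port, file name, and helper functions are all hypothetical. Note that it really does reload the core on that node. In SolrCloud, the normal route is the Collections API RELOAD, which reloads every replica, so reloading one core at a time is only a diagnostic shortcut.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for timing individual core reloads through the
# CoreAdmin API. Host, port, and core names are assumptions.

# Build the CoreAdmin RELOAD URL for one core.
reload_url() {
  local host="$1" port="$2" core="$3"
  printf 'http://%s:%s/solr/admin/cores?action=RELOAD&core=%s\n' "$host" "$port" "$core"
}

# Reload one core and print "<core> <HTTP status> <total seconds>".
time_reload() {
  local host="$1" port="$2" core="$3"
  local result
  result=$(curl -s -o /dev/null -w '%{http_code} %{time_total}' \
    "$(reload_url "$host" "$port" "$core")")
  printf '%s %s\n' "$core" "$result"
}

# Example (not run here): time every core listed in cores.txt on one node,
# slowest last.
# while read -r core; do time_reload solr-node-1 8983 "$core"; done < cores.txt | sort -k3,3n
```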
Some thoughts:
When you said 96 nodes, were you talking about Solr instances or
servers? You really should only run one Solr instance per server,
especially for a small index like this.
A 23GB heap seems excessive for a 4.7GB index that has fewer than 4
million documents. I'm sure you can reduce it substantially and see
shorter GC pauses as a result. If you can share your GC logs, I should
be able to provide a recommendation.
I've been looking at what MinHeapFreeRatio and MaxHeapFreeRatio do.
Those settings are probably unnecessary. This is what I currently use
for GC tuning on JDK 11 or JDK 17. It produces EXTREMELY short
collection pauses. I have noticed that throughput-heavy work like
indexing runs a bit slower, but if your indexing is multi-threaded, I
don't think it would be affected much.
GC_TUNE=" \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseZGC \
-XX:+ParallelRefProcEnabled \
-XX:+ExplicitGCInvokesConcurrent \
-XX:+UseStringDeduplication \
-XX:+AlwaysPreTouch \
-XX:+UseNUMA \
"
ZGC has one unexpected disadvantage. Using it disables Compressed
OOPs, meaning that even with a heap smaller than 32GB, it uses 64-bit
pointers. This hasn't really affected me; my index is so small that
a 1GB heap is more than enough. If low pauses are the most
important thing you need from GC and you're running at least JDK 11, I
would strongly recommend ZGC. It does make indexing slower for me: a
full rebuild that takes 10 minutes with G1 takes 11 minutes with ZGC.
But even the worst-case GC pauses are single-digit milliseconds.
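To check worst-case pauses against your own logs, here is a minimal sketch that scans a JDK 9+ unified-logging GC log. It assumes pause lines contain the word "Pause" and end with a duration such as 3.456ms. With gc* logging enabled, G1 and ZGC usually report pauses in that form. The function name is hypothetical, and so is the log path in the usage line. Point it at wherever your Solr install writes solr_gc.log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pause lines and report the longest pause in a
# JDK 9+ unified-logging GC log (file arguments, or stdin if none given).
# Assumes each pause line contains "Pause" and ends with "<number>ms".
max_gc_pause() {
  awk '
    /Pause/ && match($0, /[0-9]+(\.[0-9]+)?ms$/) {
      ms = substr($0, RSTART, RLENGTH - 2) + 0   # strip the "ms" suffix
      n++
      if (ms > max) max = ms
    }
    END { printf "pauses=%d max_ms=%.3f\n", n, max }
  ' "$@"
}

# Example (path is an assumption):
# max_gc_pause /var/solr/logs/solr_gc.log
```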
For G1GC, which is still the best option for JDK8, this is what I used
to have:
#GC_TUNE=" \
# -XX:+UseG1GC \
# -XX:+ParallelRefProcEnabled \
# -XX:MaxGCPauseMillis=100 \
# -XX:+ExplicitGCInvokesConcurrent \
# -XX:+UseStringDeduplication \
# -XX:+AlwaysPreTouch \
# -XX:+UseNUMA \
#"
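If the same solr.in.sh has to serve machines on different JDKs, the two GC_TUNE settings in this message could be selected by Java major version. This is a sketch, assuming solr.in.sh is sourced by bash (as bin/solr does), so functions defined there are available. The function names are hypothetical and not part of Solr. The flag lists are copied from the two settings above. The unlock flag is needed only on JDK 11 through 14, where ZGC is still experimental. It is harmless on later versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that encode the advice above: ZGC on JDK 11 or newer,
# G1 on JDK 8.

# Convert a Java version string to its major version:
# "1.8.0_352" -> 8 (legacy scheme), "11.0.17" -> 11, "17.0.5+8" -> 17.
jdk_major() {
  local v="$1"
  v="${v#1.}"                     # drop the legacy "1." prefix if present
  printf '%s\n' "${v%%[._+-]*}"   # keep everything before the first separator
}

# Print the GC_TUNE flags for a given JDK major version.
gc_tune_for() {
  local major="$1"
  if [ "$major" -ge 11 ]; then
    printf '%s' "-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+ParallelRefProcEnabled -XX:+ExplicitGCInvokesConcurrent -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+UseNUMA"
  else
    printf '%s' "-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=100 -XX:+ExplicitGCInvokesConcurrent -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+UseNUMA"
  fi
}

# Example for solr.in.sh (version string is an assumption):
# GC_TUNE="$(gc_tune_for "$(jdk_major 17.0.5)")"
```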
Thanks,
Shawn