I would like to request help from the community on something. I'm not
in a position to do the kind of testing that I want, as I no longer have
access to Solr servers with large amounts of data.
What I want to test is the Shenandoah garbage collector. I've done some
testing on my own, but the index is very small (629MB) and so is the
heap size (512MB).
Here is a GC log from my most recent test:
https://www.dropbox.com/s/8cbncuax7kv0x9c/solr_gc.log?dl=0
For this test, I deleted all the GC logs, restarted Solr, deleted all
docs and optimized the index so it had 0 segments, and then asked
dovecot (POP/IMAP server) to do a full reindex. At this moment there
are 158905 docs in the index. Then I grabbed the GC log linked above
and had the gceasy.io website analyze it. The GC performance looks very
good ... but with the heap at only 512MB, even a bad GC config would
probably look good. Here are the GC settings that I put in
/etc/default/solr.in.sh:
GC_TUNE=" \
-XX:+AlwaysPreTouch \
-XX:+UseNUMA \
-XX:+UseShenandoahGC \
-XX:+ParallelRefProcEnabled \
-XX:+UseStringDeduplication \
-XX:ParallelGCThreads=2 \
"
I'm running this on a t3a.medium EC2 instance, which only has 2 CPUs, so
I limited the GC threads to 2. This instance is my personal mail
server. If you are brave enough to help me test and you have a server
with a LOT of cores, you could increase the number of threads.
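As a rough sketch of what that could look like in solr.in.sh, the fragment below derives the thread count from the number of online CPUs. The gc_threads variable and the one-thread-per-CPU choice are assumptions for illustration, not a tested recommendation. Leaving -XX:ParallelGCThreads out entirely lets the JVM pick its own default from the CPU count.

```shell
# Assumption: one GC thread per online CPU. Adjust to suit your server.
gc_threads=$(getconf _NPROCESSORS_ONLN)

# Same flags as above, with only the thread count changed.
GC_TUNE=" \
-XX:+AlwaysPreTouch \
-XX:+UseNUMA \
-XX:+UseShenandoahGC \
-XX:+ParallelRefProcEnabled \
-XX:+UseStringDeduplication \
-XX:ParallelGCThreads=${gc_threads} \
"
```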
What I need to see are the GC logs that Solr creates, along with some
details about the indexes on the server that generated the log. Best
results will come from very busy servers that have a large index ...
hoping for 100GB or more of index per Solr core, and a max heap size of
at least 4GB. If you want to get really adventurous, you could gather GC
logs with the default GC settings (which in later Solr versions is G1GC)
and with Shenandoah.
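For a quick local comparison of two such logs before sending them, something like the following could be used. It is only a sketch: the summarize_pauses name is made up, and it assumes Java 11 unified-logging output where each pause line contains the word "Pause" and ends in a duration such as "1.234ms". Other log formats would need a different pattern.

```shell
# Hypothetical helper: summarize stop-the-world pauses in one GC log.
# Assumes lines like: "[1.200s][info][gc] GC(0) Pause Final Mark 1.500ms"
summarize_pauses() {
  awk '
    / Pause / && /[0-9.]+ms$/ {
      ms = $NF              # last field, e.g. "1.500ms"
      sub(/ms$/, "", ms)    # strip the unit
      ms += 0               # force numeric
      n++
      total += ms
      if (ms > max) max = ms
    }
    END {
      if (n == 0) { print "no pause lines found"; exit }
      printf "pauses=%d total_ms=%.3f max_ms=%.3f avg_ms=%.3f\n", n, total, max, total / n
    }
  ' "$1"
}

# Usage (the path is an assumption, use wherever your Solr writes its GC log):
#   summarize_pauses /var/solr/logs/solr_gc.log
```

Running it once on the G1 log and once on the Shenandoah log gives a pair of one-line summaries to compare. It is not a substitute for a full analysis of the logs.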
A recent version of Java 11 is required to enable the Shenandoah
collector. I think it was added to upstream OpenJDK 11 in 11.0.9. I am running
OpenJDK 11.0.11, the latest available on Ubuntu 20.04 LTS.
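One way to check whether a particular Java build includes Shenandoah is to ask the JVM to start with the flag and print its version. This is a minimal sketch. The supports_shenandoah function name is made up for illustration, and JDK 12 through 14 treat the collector as experimental, so those would also need -XX:+UnlockExperimentalVMOptions.

```shell
# Hypothetical helper: returns 0 if the given java binary accepts
# -XX:+UseShenandoahGC, non-zero if the JVM refuses to start with it.
supports_shenandoah() {
  java_bin="${1:-java}"
  "$java_bin" -XX:+UseShenandoahGC -version >/dev/null 2>&1
}

if supports_shenandoah java; then
  echo "Shenandoah is available in this JVM"
else
  echo "Shenandoah is NOT available in this JVM"
fi
```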
I'm not advocating that anyone try this on a mission-critical production
system, but I would not expect it to cause problems on such a setup.
Use your own judgement.
Thanks,
Shawn