How big are the indexes? If performance improves with a smaller heap, that
suggests the indexes were not fitting in the OS file buffers (page cache).

You can verify this by looking at iostat with the different heap sizes. There
should be almost no disk reads while Solr is handling queries. If there is disk
I/O, there is not enough RAM available to cache the index files and queries will
be a lot slower.
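
A rough way to check, assuming the sysstat package is installed (device names
and exact column names vary a bit between sysstat versions):

    # watch extended device stats every 5 seconds while running test queries
    iostat -x 5
    # the read columns (r/s, rkB/s) for the device holding the index should
    # stay near zero once the cache is warm; sustained reads mean cache misses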

There could be some change between RHEL versions where new daemons or
something else is taking up more RAM.
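
One quick way to compare what is resident on each host (just a sketch; service
names will differ between your RHEL 6 and RHEL 7 builds):

    # top memory consumers, sorted by resident set size
    ps aux --sort=-rss | head -n 15
    # overall memory and cache picture
    free -m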

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 26, 2021, at 10:07 AM, Dave <hastings.recurs...@gmail.com> wrote:
> 
> I have always preferred completely turning off swap on Solr-dedicated 
> machines, especially if you can’t use an SSD. 
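> 
> If you want to try it, roughly (run as root; this is just a sketch, and 
> remember to also comment out the swap entry in /etc/fstab if you want the 
> change to survive a reboot):
> 
>     # disable all swap devices immediately
>     swapoff -a
>     # confirm nothing is listed as swap any more
>     swapon -s
>     free -m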
> 
>> On Oct 26, 2021, at 12:59 PM, Paul Russell <paul.russ...@qflow.com> wrote:
>> 
>> Thanks for all the helpful information.
>> 
>> Currently we are averaging about 5.5k requests a minute for this collection,
>> which is supported by a 3 node SOLR cluster. RHEL 6 (current servers) and
>> RHEL 7 (new servers) are both using OpenJDK 8. The older servers have an
>> older JDK build, 8u131; the new servers have 8u302.
>> 
>> GC is configured the same on all servers.
>> 
>> GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200
>> -XX:+AggressiveOpts -XX:+AlwaysPreTouch -XX:+PerfDisableSharedMem
>> -XX:MetaspaceSize=64M"
>> 
>> 
>> Because I can bring the nodes on-line during off-peak hours and load test,
>> I'll take a look at the 'swap-off' option. I don't control the hardware, but I
>> also think a larger SSD-based swap fs is an option in case turning swap
>> off doesn't work.
>> 
>> 
>> Thanks again.
>> 
>>> On Tue, Oct 26, 2021 at 9:20 AM Shawn Heisey <apa...@elyograg.org> wrote:
>>> 
>>>> On 10/26/21 6:10 AM, Paul Russell wrote:
>>>> I have a current SOLR cluster running SOLR 6.6 on RHEL 6 servers. All SOLR
>>>> instances use a 25G JVM on the RHEL 6 servers, each configured with 64G of
>>>> memory, managing a 900G collection. Measured response time to queries
>>>> averages about 100ms.
>>> 
>>> Congrats on getting that performance.  With the numbers you have
>>> described, I would not expect to see anything that good.
>>> 
>>>> On the RHEL 7 servers the kswapd0 process is consuming up to 30% of the CPU
>>>> and response time is being measured at 500-1000 ms for queries.
>>> 
>>> How long are you giving the system, and how many queries have been
>>> handled by the cluster before you begin benchmarking?  The only way the
>>> old cluster could see performance that good is handling a LOT of queries
>>> ... enough that the OS can figure out how to effectively cache the index
>>> with limited memory.  By my calculations, your systems have less than
>>> 40GB of free memory to cache a 900GB index.  And that assumes that Solr
>>> is the only software running on these systems.
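>>> (Roughly: 64GB total minus the 25GB heap, minus whatever the OS and any
>>> other processes need, leaves something under 40GB of page cache for a
>>> 900GB index -- under 5% of the index resident in memory.)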
>>> 
>>>> I tried using the vm.swappiness setting at both 0 and 1 and have been
>>>> unable to change the behavior.
>>> 
>>> Did you see any information other than kswapd0 CPU usage that led you to
>>> this action?  I would not expect swap to be the problem with this, and
>>> your own experiments seem to say the same.
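>>> 
>>> For completeness, a quick sketch of how to confirm the setting actually
>>> took effect at runtime and will persist (use /etc/sysctl.conf or a file
>>> under /etc/sysctl.d, whichever your setup already uses):
>>> 
>>>      # value the running kernel is using
>>>      sysctl vm.swappiness
>>>      # apply immediately without a reboot
>>>      sysctl -w vm.swappiness=1
>>>      # persist across reboots
>>>      echo 'vm.swappiness = 1' >> /etc/sysctl.conf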
>>> 
>>>> If I trim the SOLR JVM to 16GB, response
>>>> times get better and GC logs show the JVM is operating correctly.
>>> 
>>> 
>>> Sounds like you have a solution.  Is there a problem with simply
>>> changing the heap size?  If everything works with a lower heap size,
>>> then the lower heap size is strongly encouraged.  You seem to be making
>>> a point here about the JVM operating correctly with a 16GB heap.  Are
>>> you seeing something in the GC logs to indicate incorrect operation
>>> with the higher heap?  Solr 6.x uses CMS for garbage collection by
>>> default.  You might see better GC performance by switching to G1.
>>> Switching to a collector newer than G1 would require a much newer Java
>>> version, one that is probably not compatible with Solr 6.x.  Here is
>>> the GC_TUNE setting (goes in solr.in.sh) for newer Solr versions:
>>> 
>>>      GC_TUNE=('-XX:+UseG1GC' \
>>>        '-XX:+PerfDisableSharedMem' \
>>>        '-XX:+ParallelRefProcEnabled' \
>>>        '-XX:MaxGCPauseMillis=250' \
>>>        '-XX:+UseLargePages' \
>>>        '-XX:+AlwaysPreTouch' \
>>>        '-XX:+ExplicitGCInvokesConcurrent')
>>> 
>>> If your servers have more than one physical CPU and NUMA architecture,
>>> then I would strongly recommend adding "-XX:+UseNUMA" to the argument
>>> list.  Adding it on systems with only one NUMA node will not cause
>>> problems.
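>>> 
>>> If you are not sure whether the new hosts are NUMA, a quick generic
>>> check (numactl may need to be installed separately):
>>> 
>>>      # number of NUMA nodes reported by the kernel
>>>      lscpu | grep -i numa
>>>      # per-node memory layout, if the numactl package is present
>>>      numactl --hardware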
>>> 
>>> I would not expect the problem to be in the OS, but I could be wrong.
>>> It is possible that changes in the newer kernel make it less efficient
>>> at figuring out proper cache operation, and that would affect Solr.
>>> Usually things get better with an upgrade, but you never know.
>>> 
>>> It seems more likely to be some other difference between the systems.
>>> Top culprit in my mind is Java.  Are the two systems running the same
>>> version of Java from the same vendor?  What I would recommend for Solr
>>> 6.x is the latest OpenJDK 8.  In the past I would have recommended
>>> Oracle Java, but they changed their licensing, so now I go with
>>> OpenJDK.  Avoid IBM Java or anything that descends from it -- it is
>>> known to have bugs running Lucene software.  If you want to use a newer
>>> Java version than Java 8, you'll need to upgrade Solr.  Upgrading from
>>> 6.x to 8.x is something that requires extensive testing, and a complete
>>> reindex from scratch.
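>>> 
>>> A quick way to confirm exactly which JVM each cluster is on (the binary
>>> on the PATH is not always the one Solr starts with, so check the running
>>> process too):
>>> 
>>>      # vendor and exact build of the JVM on the PATH
>>>      java -version
>>>      # the java binary and flags of the running Solr process
>>>      ps -ef | grep [j]ava | grep solr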
>>> 
>>> I would be interested in seeing the screenshot described here:
>>> 
>>> 
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Askingforhelponamemory/performanceissue
>>> 
>>> RHEL uses GNU top.
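>>> 
>>> Roughly, to produce it: run top, press shift-M to sort by resident
>>> memory, and capture the whole terminal:
>>> 
>>>      top
>>>      # inside top, press Shift+M to sort by %MEM, then take the screenshot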
>>> 
>>> My own deployments use Ubuntu.  Back when I did have access to large
>>> Solr installs, they were running on CentOS, which is effectively the
>>> same as RHEL.  I do not recall whether they were CentOS 6 or 7.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>>> 
>> 
>> -- 
>> Paul
>> Russell
>> VP Integration/Support Services
>> *main:* 314.968.9906
>> *direct:* 314.255.2135
>> *cell:* 314.258.0864
>> 9317 Manchester Rd.
>> St. Louis, MO 63119
>> qflow.com <https://www.qflow.com/>
