On 9/18/2018 11:00 AM, Ganesh Sethuraman wrote:
We are using Solr 7.2.1 with SolrCloud, 35 collections, and a 1-node ZK
ensemble (in the lower environment; we will have a 3-node ensemble) in AWS. We
are testing running an async SolrCloud BACKUP (
https://lucene.apache.org/solr/guide/7_2/collections-api.html#backup)
every time we create a new collection or update an existing one.
Each collection has 8 shards with 1 replica, across two Solr nodes.
For the largest collection (index size of 80 GB), the BACKUP to the
EFS drive takes about 10 minutes. The application makes heavy use of
/get (real-time get). We are seeing read (/get) performance degrade
significantly (about 2x) while a BACKUP is running in parallel.
My best guess here is that you do not have enough memory. For good
performance, Solr is extremely reliant on having certain parts of the
index data sitting in memory, so that it doesn't have to actually read
the disk to discover matches for a query. When all is working well,
that data will be read from memory instead of the disk. Memory is MUCH
MUCH faster than a disk.
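As a rough back-of-the-envelope check (the 80 GB index size comes from the
question above; the total RAM and heap figures below are hypothetical
examples, not your actual numbers):

```python
# Rough check: does the index fit in the OS page cache? Only the 80 GB
# index size is from the thread; the RAM figures are hypothetical.
index_size_gb = 80          # largest collection, from the question
total_ram_gb = 64           # hypothetical instance size
solr_heap_gb = 16           # hypothetical Solr JVM heap

# Memory left over for the OS disk cache after the heap is carved out
# (ignoring other processes, which only make this smaller).
spare_for_cache_gb = total_ram_gb - solr_heap_gb

if spare_for_cache_gb < index_size_gb:
    print("Index cannot be fully cached: a full sequential read "
          "(like a backup) will evict hot pages, and queries will "
          "then have to hit the disk.")
else:
    print("Index fits in the spare memory; a backup should not "
          "evict the hot parts of the index.")
```

With these example numbers there is only 48 GB spare for an 80 GB index,
which is exactly the situation where a backup pushes query-critical data
out of the cache.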
Making a backup is going to read ALL of the index data. So if you do
not have enough spare memory to cache the entire index, reading the
index to make the backup is going to push the important parts of the
index out of the cache, and then Solr will have to actually go and read
the disk in order to satisfy a query.
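For reference, the async BACKUP flow mentioned in the question amounts to
two Collections API calls: the BACKUP request (returning immediately
because of the async parameter) and a REQUESTSTATUS poll. A minimal sketch
of building those URLs, assuming a hypothetical host, collection name,
backup location, and request id:

```python
# Sketch of the async Collections API BACKUP flow (Solr 7.x).
# Host, collection, location, and request id below are hypothetical.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/admin/collections"

def backup_url(collection, name, location, request_id):
    """BACKUP call; the 'async' parameter makes it return immediately."""
    params = {
        "action": "BACKUP",
        "collection": collection,
        "name": name,          # backup name under the location
        "location": location,  # e.g. an EFS mount shared by all nodes
        "async": request_id,   # track completion via REQUESTSTATUS
    }
    return SOLR + "?" + urlencode(params)

def status_url(request_id):
    """REQUESTSTATUS poll for the async backup request."""
    return SOLR + "?" + urlencode({"action": "REQUESTSTATUS",
                                   "requestid": request_id})

print(backup_url("largest_coll", "nightly", "/mnt/efs/backups", "bk-1"))
print(status_url("bk-1"))
```

Note that even though the call is async, the actual work of reading every
index file still happens on the Solr nodes, which is why it competes with
/get traffic for the page cache.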
https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
Can you gather a screenshot of your process list and put it on a file
sharing website? You'll find instructions on how to do this here:
https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
Thanks,
Shawn