Walter already said that setting soft commit max time to 100 ms is a
recipe for disaster. That alone could be the issue, but unless you're
willing to try higher values, there's no way to be sure. You also have
huge JVM heaps without any explanation of why they're needed. If neither
of those causes problems, you mentioned that you also run other software
on the same server. Is it possible that those processes hog CPU, disk
or network and starve Solr?
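Purely as an illustration (the 10-second value here is just an assumption; pick whatever freshness your application can actually tolerate), the soft commit interval could be raised in solr.in.cmd like this:

```
REM solr.in.cmd: try a soft commit interval of e.g. 10 seconds instead of 100 ms
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=10000
```

Even a few seconds usually still feels near-real-time to users while making Solr open far fewer new searchers.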

I must add that Solr 6.1.0 is over four years old. You could be hitting
a bug that has been fixed for years, but even if you encounter an issue
that's still present, you will need to upgrade to get it fixed. If you
look at the number of fixes done in subsequent 6.x versions alone in the
changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
you'll see that there are a lot of them. You could be hitting something
like SOLR-10420, which has been fixed for over three years.

Best,
Ere

vishal patel kirjoitti 10.7.2020 klo 7.52:
> I’ve been running Solr for a dozen years and I’ve never needed a heap larger 
> than 8 GB.
>>> What is your data size? Is it 1 TB like ours? Do you search or index
>>> frequently? Do you use the NRT model?
> 
> My question is: why does the replica go into recovery? When the replica went 
> down, I checked the GC log, and the GC pause was never more than 2 seconds.
> I also cannot find any reason for the recovery in the Solr log file. I want to 
> know why the replica goes into recovery.
> 
> Regards,
> Vishal Patel
> ________________________________
> From: Walter Underwood <wun...@wunderwood.org>
> Sent: Friday, July 10, 2020 3:03 AM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
> 
> Those are extremely large JVMs. Unless you have proven that you MUST
> have 55 GB of heap, use a smaller heap.
> 
> I’ve been running Solr for a dozen years and I’ve never needed a heap
> larger than 8 GB.
> 
> Also, there is usually no need to use one JVM per replica.
> 
> Your configuration is using 110 GB (two JVMs) just for Java
> where I would configure it with a single 8 GB JVM. That would
> free up 100 GB for file caches.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Jul 8, 2020, at 10:10 PM, vishal patel <vishalpatel200...@outlook.com> 
>> wrote:
>>
>> Thanks for reply.
>>
>> what you mean by "Shard1 Allocated memory”
>>>> It means the JVM memory of one Solr node or instance.
>>
>> How many Solr JVMs are you running?
>>>> On one server there are 2 Solr JVMs: one is a shard and the other is a replica.
>>
>> What is the heap size for your JVMs?
>>>> 55GB per Solr JVM.
>>
>> Regards,
>> Vishal Patel
>>
>> Sent from Outlook<http://aka.ms/weboutlook>
>> ________________________________
>> From: Walter Underwood <wun...@wunderwood.org>
>> Sent: Wednesday, July 8, 2020 8:45 PM
>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>
>> I don’t understand what you mean by "Shard1 Allocated memory”. I don’t know 
>> of
>> any way to dedicate system RAM to an application object like a replica.
>>
>> How many Solr JVMs are you running?
>>
>> What is the heap size for your JVMs?
>>
>> Setting soft commit max time to 100 ms does not magically make Solr super 
>> fast.
>> It makes Solr do too much work, makes the work queues fill up, and makes it 
>> fail.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vishalpatel200...@outlook.com> 
>>> wrote:
>>>
>>> Thanks for your reply.
>>>
>>> One server has 320GB of RAM in total. It runs 2 Solr nodes: one is shard1 and 
>>> the second is the shard2 replica. Each Solr node has 55GB of memory allocated. 
>>> Shard1 has 585GB of data and the shard2 replica has 492GB, so there is almost 
>>> 1TB of data on this server. The server also runs other applications with 60GB 
>>> of memory allocated, so 150GB of memory is left.
>>>
>>> Proper formatting details:
>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>
>>> Are you running multiple huge JVMs?
>>>>> Not huge, but 60GB of memory is allocated for our 11 applications. 150GB of 
>>>>> memory is still free.
>>>
>>> The servers will be doing a LOT of disk IO, so look at the read and write 
>>> iops. I expect that the solr processes are blocked on disk reads almost all 
>>> the time.
>>>>> Is there a chance of going into recovery mode if there is heavy IO read/write or blocking?
>>>
>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>> Our requirement is NRT, so we keep the time low.
>>>
>>> Regards,
>>> Vishal Patel
>>> ________________________________
>>> From: Walter Underwood <wun...@wunderwood.org>
>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>
>>> This isn’t a support list, so nobody looks at issues. We do try to help.
>>>
>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>> I don’t know what "Shard1 Allocated memory” is, but maybe half of
>>> that RAM is used by JVMs or some other process, I guess. Are you
>>> running multiple huge JVMs?
>>>
>>> The servers will be doing a LOT of disk IO, so look at the read and
>>> write iops. I expect that the solr processes are blocked on disk reads
>>> almost all the time.
>>>
>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>> That is probably causing your outages.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vishalpatel200...@outlook.com> 
>>>> wrote:
>>>>
>>>> Is anyone looking at my issue? Please guide me.
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>>
>>>>
>>>> ________________________________
>>>> From: vishal patel <vishalpatel200...@outlook.com>
>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>
>>>> I am using Solr version 6.1.0 with Java 8 and G1GC in production. We 
>>>> have 2 shards and each shard has 1 replica. We have 3 collections.
>>>> We do not use any cache and have also disabled it in the Solr config. Search 
>>>> and update requests come in frequently on our live platform.
>>>>
>>>> *Our commit configuration in the Solr config is below:
>>>> <autoCommit>
>>>>     <maxTime>600000</maxTime>
>>>>     <maxDocs>20000</maxDocs>
>>>>     <openSearcher>false</openSearcher>
>>>> </autoCommit>
>>>> <autoSoftCommit>
>>>>     <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>> </autoSoftCommit>
>>>>
>>>> *We use Near Real Time searching, so we set the following in solr.in.cmd:
>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>
>>>> *Our collections details are below:
>>>>
>>>> Collection    | Shard1               | Shard1 Replica       | Shard2               | Shard2 Replica
>>>>               | Docs       Size(GB)  | Docs       Size(GB)  | Docs       Size(GB)  | Docs       Size(GB)
>>>> collection1   | 26913364   201       | 26913379   202       | 26913380   198       | 26913379   198
>>>> collection2   | 13934360   310       | 13934367   310       | 13934368   219       | 13934367   219
>>>> collection3   | 351539689  73.5      | 351540040  73.5      | 351540136  75.2      | 351539722  75.2
>>>>
>>>> *My server configurations are below:
>>>>
>>>>                                          Server1          Server2
>>>> CPU (both servers)                       Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 MHz, 10 Core(s), 20 Logical Processor(s)
>>>> HardDisk(GB)                             3845 (3.84 TB)   3485 (3.48 TB)
>>>> Total memory(GB)                         320              320
>>>> Shard1 Allocated memory(GB)              55               -
>>>> Shard2 Replica Allocated memory(GB)      55               -
>>>> Shard2 Allocated memory(GB)              -                55
>>>> Shard1 Replica Allocated memory(GB)      -                55
>>>> Other Applications Allocated Memory(GB)  60               22
>>>> Other Number Of Applications             11               7
>>>>
>>>>
>>>> Sometimes one of the replicas goes into recovery mode. Why does a replica go 
>>>> into recovery? Is it due to heavy search, heavy update/insert, or long GC 
>>>> pause times? If it is one of these, what should we change in the configuration?
>>>> Should we add more shards to address the recovery issue?
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>>
>>>
>>
> 
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland
