"Could be", certainly. "Definitely is" is iffier ;)...

But the statement "If we restart the Solr service or optimize the core
it seems to kick back in again", especially the "optimize" bit (which,
by the way, you should only do if you can do it periodically [1]), is
some evidence that the problem is in this vicinity. One effect of an
optimize is to merge your segment files from N down to 1. Say you have
10 segments, each consisting of 10-15 individual files, all of which
are held open: that's roughly 150 open file handles, dropping to about
15 after the merge. Multiply that by your ~150 cores and a 1024-file
limit gets tight fast.
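
A quick way to watch this on a Linux box (a rough sketch: lsof also
counts memory-mapped files, pgrep assumes a single Solr process, and
"mycore" is a placeholder for one of your core names):

    lsof -p $(pgrep -f start.jar) | wc -l    # rough count of open files
    curl "http://localhost:8983/solr/mycore/update?optimize=true"

Run the count before and after the optimize and you should see it drop.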

[1] https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Fri, Jan 19, 2018 at 9:32 AM, Pouliot, Scott
<scott.poul...@peoplefluent.com> wrote:
> Erick,
>
> Thanks!  Could these settings be toying with replication?  Solr itself seems 
> to be working like a champ, except when things get out of sync.
>
> Scott
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, January 19, 2018 12:27 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Solr Replication being flaky (6.2.0)
>
> Scott:
>
> We usually recommend setting the open-file limit very, very high, like
> 65K, or unlimited if you can.
>
> Max user processes should be bumped just as high, to 65K or so.
>
> Max memory and virtual memory should both be unlimited.
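>
> A minimal sketch of what that might look like in /etc/security/limits.conf,
> assuming Solr runs as a "solr" user (the user name and exact values here
> are illustrative, not a recommendation for your particular setup):
>
>     solr  soft  nofile  65000
>     solr  hard  nofile  65000
>     solr  soft  nproc   65000
>     solr  hard  nproc   65000
>
> The user has to log out and back in (or the service be restarted) before
> new limits take effect.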
>
> We've included warnings at startup for open files and processes; see
> SOLR-11703.
>
> Best,
> Erick
>
> On Fri, Jan 19, 2018 at 7:54 AM, Pouliot, Scott 
> <scott.poul...@peoplefluent.com> wrote:
>> I do have a ticket in with our systems team to raise the file handle limit, 
>> since I am seeing the "Too many open files" error on occasion on our prod 
>> servers.  Is this the setting you're referring to?  We found we were set to 
>> 1024 using the "ulimit" command.
>>
>> -----Original Message-----
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, January 19, 2018 10:48 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Replication being flaky (6.2.0)
>>
>> On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
>>> So we're running Solr in a Master/Slave configuration (1 of each) and it 
>>> seems that the replication stalls or stops functioning every now and again. 
>>>  If we restart the Solr service or optimize the core it seems to kick back 
>>> in again.
>>>
>>> Anyone have any idea what might be causing this?  We do have a good number 
>>> of cores on each server (150 or so), but I have heard reports of a LOT 
>>> more than that in use.
>>
>> Have you increased the number of processes that the user running Solr is 
>> allowed to start?  Most operating systems limit the number of 
>> threads/processes a user can start to a low value like 1024.  With 150 
>> cores, particularly with background tasks like replication configured, 
>> chances are that Solr is going to need to start a lot of threads.  This is 
>> an OS setting that a lot of Solr admins end up needing to increase.
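>>
>> One rough way to see how many threads Solr actually has on Linux (a
>> sketch: the pgrep pattern assumes a single Jetty/start.jar process and
>> may need adjusting for your install):
>>
>>     ps -o nlwp= -p $(pgrep -f start.jar)    # NLWP = thread count
>>
>> If that number is anywhere near the ulimit -u value, you've likely
>> found the culprit.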
>>
>> I ran into the process limit on my servers and I don't have anywhere near 
>> 150 cores.
>>
>> The fact that restarting Solr gets it working again (at least
>> temporarily) would fit with a process limit being the problem.  I'm not 
>> guaranteeing that this is the problem, only saying that it fits.
>>
>> Thanks,
>> Shawn
