I do have a ticket in with our systems team to raise the file handle limit, since I am occasionally seeing the "Too many open files" error on our prod servers. Is this the setting you're referring to? Using the "ulimit" command, I found we were set to 1024.
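For reference, both limits mentioned in this thread can be checked from a shell as the user that runs Solr. A minimal sketch (the values in the comments are common Linux defaults, not readings from these particular servers):

```shell
# Check the soft limit on open file descriptors for the current user
# (this is what produces "Too many open files" when exhausted).
ulimit -n    # often 1024 by default

# Check the soft limit on user processes/threads
# (this is what produces "unable to create new native thread" in Java).
ulimit -u    # often 1024 by default on older distros
```

Note that these report the limits of the current shell session; the Solr process may have been started under a different user or via systemd with its own limits.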
-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, January 19, 2018 10:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication being flaky (6.2.0)

On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
> So we're running Solr in a Master/Slave configuration (1 of each) and it
> seems that the replication stalls or stops functioning every now and again.
> If we restart the Solr service or optimize the core it seems to kick back in
> again.
>
> Anyone have any idea what might be causing this? We do have a good amount of
> cores on each server (@150 or so), but I have heard reports of a LOT more
> than that in use.

Have you increased the number of processes that the user running Solr is allowed to start? Most operating systems limit the number of threads/processes a user can start to a low value like 1024. With 150 cores, particularly with background tasks like replication configured, chances are that Solr is going to need to start a lot of threads. This is an OS setting that a lot of Solr admins end up needing to increase.

I ran into the process limit on my servers, and I don't have anywhere near 150 cores. The fact that restarting Solr gets it working again (at least temporarily) would fit with a process limit being the problem. I'm not guaranteeing that this is the problem, only saying that it fits.

Thanks,
Shawn
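On Linux, a common way to raise both the open-file and process limits permanently is an entry in /etc/security/limits.conf (or a file under /etc/security/limits.d/). A sketch, assuming Solr runs as a user named "solr" and using illustrative values rather than anything prescribed by this thread:

```
# /etc/security/limits.conf
# Raise open-file and process/thread limits for the Solr user.
# "solr" and 65536 are assumptions; adjust for your environment.
solr  soft  nofile  65536
solr  hard  nofile  65536
solr  soft  nproc   65536
solr  hard  nproc   65536
```

These take effect on the next login session for that user, so Solr has to be restarted (from a fresh session) to pick them up; if Solr is started by systemd, the equivalent settings are LimitNOFILE and LimitNPROC in the unit file instead.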