On 1/19/2018 7:50 AM, Pouliot, Scott wrote:
> So we're running Solr in a Master/Slave configuration (1 of each) and it seems
> that the replication stalls or stops functioning every now and again. If we
> restart the Solr service or optimize the core it seems to kick back in again.
> Anyone have any idea what might be causing this? We do have a good number of
> cores on each server (about 150), but I have heard reports of a LOT more than
> that in use.

Have you increased the number of processes that the user running Solr is
allowed to start? Many operating systems cap the number of
threads/processes a user can start at a low default like 1024 (on Linux
this is the nproc limit, shown by "ulimit -u"). With 150 cores,
particularly with background tasks like replication configured, chances
are that Solr is going to need to start a lot of threads. This is an OS
setting that a lot of Solr admins end up needing to raise.
I ran into the process limit on my servers and I don't have anywhere
near 150 cores.
The fact that restarting Solr gets it working again (at least
temporarily) would fit with a process limit being the problem. I'm not
guaranteeing that this is the problem, only saying that it fits.
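
One rough way to check the limit described above, assuming a Linux/Unix host
with Python 3 available, is the minimal sketch below. It only reads the limit
for whichever user runs the script, so it would need to be run as the Solr
user. The function names and the 4096 threshold are illustrative assumptions,
not Solr recommendations.

```python
import resource


def nproc_limits():
    """Return (soft, hard) process/thread limits for the current user.

    A value of None means the limit is unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def normalize(value):
        return None if value == resource.RLIM_INFINITY else value

    return normalize(soft), normalize(hard)


def is_low(limit, threshold=4096):
    """Flag a limit below an arbitrary, assumed threshold (not a Solr figure)."""
    return limit is not None and limit < threshold


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print("soft limit:", "unlimited" if soft is None else soft)
    print("hard limit:", "unlimited" if hard is None else hard)
    if is_low(soft):
        print("Soft limit looks low for a Solr instance with many cores.")
```

This reports the limits for a new process started by that user. A Solr
process that is already running keeps the limits it was started with, which
on Linux can be read from /proc/<pid>/limits.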
Thanks,
Shawn