To add to what Dave said, if you have a particular machine that’s prone to
suddenly stopping, that’s usually a red flag that you should seriously 
think about hardware issues.

If the problem strikes different machines, then I agree with Shawn that
the first thing I’d be suspicious of is OOM errors.
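
To tell the two OOM flavors Shawn describes apart (a Java OutOfMemoryError versus the Linux oom-killer), it can help to scan the relevant logs for their usual signatures. Below is a minimal sketch, not anything that ships with Solr. The class name `OomLogScan` and the labels are made up for illustration, and the matched strings are assumptions based on typical JVM and Linux kernel messages, which vary between versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: scans log lines for signs of the two kinds of OOM. */
final class OomLogScan {

    /** Returns a label for a suspicious line, or null if the line looks unrelated. */
    static String classify(String line) {
        // Java-level OOME, as it would appear in a stack trace in solr.log.
        if (line.contains("java.lang.OutOfMemoryError")) {
            return "JAVA_OOME";
        }
        // Typical Linux kernel messages (assumed wording; it differs by kernel version):
        //   "java invoked oom-killer: ..."
        //   "Out of memory: Kill process 1234 (java) ..." or "... Killed process ..."
        String lower = line.toLowerCase(Locale.ROOT);
        if (lower.contains("oom-killer") || lower.contains("out of memory: kill")) {
            return "OS_OOM_KILLER";
        }
        return null;
    }

    /** Returns every suspicious line, prefixed with its label. */
    static List<String> scan(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String label = classify(line);
            if (label != null) {
                hits.add(label + ": " + line);
            }
        }
        return hits;
    }

    /** Usage: pass one or more log file paths as arguments. */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            for (String hit : scan(Files.readAllLines(path))) {
                System.out.println(path + " -> " + hit);
            }
        }
    }
}
```

You would point it at solr.log and at a saved copy of the kernel log (for example, the output of `dmesg` redirected to a file). Where those files live depends on your install, so no paths are hard-coded here.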

FWIW,
Erick

> On Jun 9, 2020, at 6:05 AM, Dave <hastings.recurs...@gmail.com> wrote:
> 
> I’ll add that whenever I’ve had a Solr instance shut down, for me it’s been a 
> hardware failure. Either the RAM or the disk got a “glitch”. Both of these 
> are relatively fragile, wear-and-tear parts of the machine, and should be 
> expected to fail and be replaced from time to time. Solr is pretty 
> aggressive with its logging, so there are always a lot of writes happening, 
> and of course reads. If the disk or the memory has any issues, it can lock 
> Solr up and bring it down, more so if you have any spellcheck dictionaries 
> or suggesters being built on startup. 
> 
> Just my experience with this, could be wrong (most likely wrong) but we 
> always have extra drives and memory around the server room for this reason.  
> At least once or twice a year we will have a disk failure in the RAID and 
> need to swap in a new one. 
> 
> Good luck though. Also, Solr should be logging its failures, so it would be 
> good to look there too.
> 
>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> 
>> On 5/14/2020 7:22 AM, Ryan W wrote:
>>> I manage a site where solr has stopped running a couple times in the past
>>> week. The server hasn't been rebooted, so that's not the reason.  What else
>>> causes solr to stop running?  How can I investigate why this is happening?
>> 
>> Any situation where Solr stops running and nobody requested the stop is a 
>> result of a serious problem that must be thoroughly investigated.  I think 
>> it's a bad idea for Solr to automatically restart when it stops 
>> unexpectedly.  Chances are that whatever caused the crash is going to simply 
>> make the crash happen again until the problem is solved. Automatically 
>> restarting could hide problems from the system administrator.
>> 
>> The only way a Solr auto-restart would be acceptable to me is if it sends a 
>> high priority alert to the sysadmin EVERY time it executes an auto-restart.  
>> It really is that bad of a problem.
>> 
>> The causes of Solr crashes (that I can think of) include the following. I 
>> believe I have listed these four options from most likely to least likely:
>> 
>> * Java OutOfMemoryError exceptions.  On non-Windows systems, the "bin/solr" 
>> script starts Solr with an option that results in Solr's death anytime one 
>> of these exceptions occurs.  We do this because program operation is 
>> indeterminate and completely unpredictable when OOME occurs, so it's far 
>> safer to stop running.  That exception can be caused by several things, some 
>> of which actually do not involve memory at all.  If you're running on 
>> Windows via the bin\solr.cmd command, then this will not happen ... but OOME 
>> could still cause a crash, because as I already mentioned, program operation 
>> is unpredictable when OOME occurs.
>> 
>> * The OS kills Solr because system memory is completely exhausted and Solr 
>> is the process using the most memory.  Linux calls this the "oom-killer" ... 
>> I am pretty sure something like it exists on most operating systems.
>> 
>> * Corruption somewhere in the system.  Could be in Java, the OS, Solr, or 
>> data used by any of those.
>> 
>> * A very serious bug in Solr's code that we haven't discovered yet.
>> 
>> I included that last one simply for completeness.  A bug that causes a crash 
>> *COULD* exist, but as of right now, we have not seen any supporting evidence.
>> 
>> My guess is that Java OutOfMemoryError is the cause here, but I can't be 
>> certain.  If that is happening, then some resource (which might not be 
>> memory) is fully depleted.  We would need to see the full OutOfMemoryError 
>> exception in order to determine why it is happening. Sometimes the exception 
>> is logged in solr.log, sometimes it isn't.  We cannot predict what part of 
>> the code will be running when OOME occurs, so it would be nearly impossible 
>> for us to guarantee logging.  OOME can happen ANYWHERE - even in code that 
>> the compiler thinks is immune to exceptions.
>> 
>> Side note to fellow committers:  I wonder if we should implement an uncaught 
>> exception handler in Solr.  I have found in my own programs that it helps 
>> figure out thorny problems.  And while I am on the subject of handlers that 
>> might not be general knowledge, I didn't find a shutdown hook or a security 
>> manager outside of tests.
>> 
>> Thanks,
>> Shawn
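
For anyone curious what Shawn's side note about an uncaught exception handler and a shutdown hook looks like in plain Java, here is a minimal sketch. It is not Solr code and says nothing about how Solr would implement this. The class `CrashDiagnostics` and its method names are hypothetical; only `Thread.setDefaultUncaughtExceptionHandler` and `Runtime.addShutdownHook` are standard JDK APIs.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: record uncaught exceptions and log JVM shutdown. */
final class CrashDiagnostics {

    /** Last throwable that escaped any thread, kept so it can be inspected later. */
    static final AtomicReference<Throwable> LAST_UNCAUGHT = new AtomicReference<>();

    /** Installs a JVM-wide uncaught exception handler and a shutdown hook. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            LAST_UNCAUGHT.set(error);
            System.err.println("Uncaught exception in thread "
                    + thread.getName() + ": " + error);
            error.printStackTrace();
        });
        // Runs on orderly JVM exit (normal exit, System.exit, SIGTERM).
        // It does not run if the process is killed outright (e.g. SIGKILL).
        Runtime.getRuntime().addShutdownHook(new Thread(
                () -> System.err.println("JVM is shutting down"),
                "shutdown-logger"));
    }

    /** Runs the task on a new thread and returns whatever escaped it, if anything. */
    static Throwable captureFrom(Runnable task) {
        LAST_UNCAUGHT.set(null);
        Thread worker = new Thread(task, "worker-1");
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread before join returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return LAST_UNCAUGHT.get();
    }

    public static void main(String[] args) {
        install();
        Throwable seen = captureFrom(() -> {
            throw new IllegalStateException("simulated failure");
        });
        System.err.println("Handler recorded: " + seen);
    }
}
```

As Shawn points out, when an OOME is involved nothing is guaranteed: the handler itself may be unable to allocate what it needs to log, so a sketch like this helps with ordinary uncaught exceptions more than with memory exhaustion. The shutdown hook likewise will not fire when the OS oom-killer ends the process.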
