Re: How to determine why solr stops running?

2020-06-30 Thread Otis Gospodnetić
Hi,

Maybe https://github.com/sematext/solr-diagnostics can be of use?

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



On Mon, Jun 29, 2020 at 3:46 PM Erick Erickson 
wrote:

> Really look at your cache size settings.

Re: How to determine why solr stops running?

2020-06-29 Thread Erick Erickson
Really look at your cache size settings.

This is to eliminate this scenario:
- your cache sizes are very large
- when you looked and the memory was 9G, you also had a lot of cache entries
- there was a commit, which threw out the old cache and reduced your cache size

This is frankly kind of unlikely, but worth checking.
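To act on the cache-size check, here is a small sketch that prints the cache declarations from a core's solrconfig.xml. It assumes the caches are declared as self-closing elements in the usual `<filterCache ... />` form. The function name `show_cache_settings` is hypothetical. The example path is based on the solr home (`/opt/solr/server/solr`) in the thread's `ps` output, and `<core>` is a placeholder.

```bash
# Print the filterCache, queryResultCache and documentCache declarations from
# a solrconfig.xml, including attributes split over several lines, so their
# size and autowarmCount values can be reviewed.
show_cache_settings() {
  awk '/<(filterCache|queryResultCache|documentCache)[ \t]/ { p = 1 }
       p { print }
       p && /\/>/ { p = 0 }' "$1"
}

# Example (path is an assumption; replace <core> with the real core name):
#   show_cache_settings /opt/solr/server/solr/<core>/conf/solrconfig.xml
```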

The other option is that you haven’t been hitting OOMs at all and that’s a
complete red herring. Let’s say that in actuality you only need an 8G heap,
or even smaller. By overallocating memory, you let garbage simply accumulate
for a long time, and when it is eventually collected, _lots_ of memory will
be collected.

Another rather unlikely scenario, but again worth checking.
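The GC log the process already writes can help separate the accumulate-then-collect pattern from real memory pressure. The `ps` output in this thread shows it at `-Xloggc:/opt/solr/server/logs/solr_gc.log`. This sketch assumes the Java 8 G1 `-XX:+PrintGCDetails` format, in which each pause ends with a `Heap: before(capacity)->after(capacity)` summary. `heap_transitions` is a hypothetical helper name.

```bash
# Print the "Heap: before(capacity)->after(capacity)" summary that Java 8 G1
# writes for each pause when -XX:+PrintGCDetails is on. Accepts one or more
# (possibly rotated) GC log files.
heap_transitions() {
  grep -h -o -e 'Heap: [^]]*' -- "$@"
}

# Example, using the log path from the ps output in this thread (rotated
# files are read in glob order, which is not necessarily chronological):
#   heap_transitions /opt/solr/server/logs/solr_gc.log* | tail -n 20
```

Suppose the heap repeatedly climbs and then drops sharply after each collection, and the after-GC values do not creep towards the 16G capacity. That would point towards the overallocation scenario rather than an OOM.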

Best,
Erick

> On Jun 29, 2020, at 3:27 PM, Ryan W  wrote:

Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson 
wrote:

> ps aux | grep solr
>

[solr@faspbsy0002 database-backups]$ ps aux | grep solr
solr  72072  1.6 33.4 22847816 10966476 ?   Sl   13:35   1:36 java
-server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
-XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
-DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
-Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
-Dsolr.install.dir=/opt/solr
-Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
-Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-jar start.jar --module=http



> should show you all the parameters Solr is running with, as would the
> admin screen. You should see something like:
>
> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>
> And there should be some logs lying around if that was the case
> similar to:
> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
>

This log is not being written, even though oom_solr.sh does appear to write a
solr_oom_killer-$SOLR_PORT-$NOW.log file to the logs directory. There are
some log files in /opt/solr/server/logs, and they are indeed being written
to.  There are fresh entries in the logs, but no sign of any problem.  If I
grep for oom in the logs directory, the only references I see are benign:
just a few entries that list all the flags, with oom_solr.sh among the
settings visible in each entry.  And someone did a search for "Mushroom," so
there's another instance of oom from that search.
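The Linux kernel's OOM killer, which comes up in Shawn's quoted message further down, is one explanation consistent with a missing solr_oom_killer log. It ends a process with SIGKILL, so the JVM gets no chance to run its `-XX:OnOutOfMemoryError` hook. This sketch pulls the killed processes out of kernel log text. It assumes the "Killed process PID (name)" wording seen in the quoted dmesg lines, and `kernel_oom_kills` is a hypothetical helper. If Solr had been killed this way, the name would typically show as `java`.

```bash
# Read kernel log text on stdin and print "PID NAME" for each process that the
# Linux OOM killer reports as killed ("Killed process PID (NAME) ...").
kernel_oom_kills() {
  sed -n 's/.*Killed process \([0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Examples (reading the kernel log may require root):
#   dmesg | kernel_oom_kills
#   journalctl -k | kernel_oom_kills   # current boot, if journald keeps kernel messages
```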



Re: How to determine why solr stops running?

2020-06-29 Thread Jörn Franke
Can you identify any critical queries in the log files?

What is the total size of the index?

What client are you using on the web app side? Are you reusing clients, or
creating a new one for every query?
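For the index-size question, this sketch sums the disk usage of the index directories. It assumes the default layout in which each core keeps its index under `<solr home>/<core>/data/index`. The `ps` output in this thread shows `-Dsolr.data.home=` empty, so the data should live under the solr home. `index_size_kb` is a hypothetical helper.

```bash
# Total disk usage, in KiB, of one or more index directories.
index_size_kb() {
  du -sk -- "$@" | awk '{ total += $1 } END { print total + 0 }'
}

# Example, assuming the default per-core data directory under the solr home:
#   index_size_kb /opt/solr/server/solr/*/data/index
```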

> On 29.06.2020 at 21:14, Ryan W wrote:

Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
On Mon, Jun 29, 2020 at 1:49 PM David Hastings 
wrote:

> little nit picky note here, use 31gb, never 32.


Good to know.

Just now I got this output from bin/solr status:

  "solr_home":"/opt/solr/server/solr",
  "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy -
2019-05-28 23:37:48",
  "startTime":"2020-06-29T17:35:13.966Z",
  "uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
  "memory":"9.3 GB (%57.9) of 16 GB"}

That's the highest memory use I've seen.  Not sure if this indicates 16GB
isn't enough.  Then I ran it again a couple minutes later and it was down
to 598.3 MB.  I wonder what accounts for these wide swings.  I can't
imagine that a few users doing searches would suddenly use 9 GB of RAM.
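A single `bin/solr status` reading includes garbage that has not been collected yet, so a series of readings says more than any one. This sketch assumes the status output contains a `"memory":"..."` field as shown above. `status_memory` is a hypothetical helper, and the polling loop in the comments is only an example.

```bash
# Extract the "memory" value from `bin/solr status` output read on stdin.
status_memory() {
  sed -n 's/.*"memory":"\([^"]*\)".*/\1/p'
}

# Example: log a timestamped reading once a minute (Ctrl-C to stop).
#   while true; do
#     printf '%s  %s\n' "$(date -u +%FT%TZ)" "$(/opt/solr/bin/solr status | status_memory)"
#     sleep 60
#   done
```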



Re: How to determine why solr stops running?

2020-06-29 Thread Erick Erickson
ps aux | grep solr

should show you all the parameters Solr is running with, as would the
admin screen. You should see something like:

-XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh

And there should be some logs lying around if that was the case,
similar to:
$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
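These two hypothetical helpers answer the "am I running the OOM-killer script" question without reading the whole command line. `oom_hook` pulls the `-XX:OnOutOfMemoryError` value out of a JVM command line, and `list_oom_logs` lists any `solr_oom_killer-*.log` files in a logs directory. The file-name pattern is taken from the path above. The helper names and the example PID/path (from Ryan's `ps` output) are assumptions.

```bash
# Print the script configured with -XX:OnOutOfMemoryError in a JVM command
# line (empty output means no hook). The script's own arguments follow on the
# command line, so only the first whitespace-delimited token is printed.
oom_hook() {
  printf '%s\n' "$1" | grep -o -e '-XX:OnOutOfMemoryError=[^ ]*' | cut -d= -f2-
}

# List solr_oom_killer-*.log files in a logs directory (no output if none).
list_oom_logs() {
  local f
  for f in "$1"/solr_oom_killer-*.log; do
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done
}

# Example, with the PID and logs directory from the ps output in this thread:
#   oom_hook "$(ps -ww -o args= -p 72072)"
#   list_oom_logs /opt/solr/server/logs
```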

As for memory, It Depends (tm). There are configuration choices you can
make that will affect the heap requirements. You can’t really draw
comparisons between different projects. Your Drupal + Solr app has how many
documents? Indexed how? Searched how? And how does that compare with this one?

The usual suspects among configuration settings include:

- filterCache size too large. Each filterCache entry is bounded by
maxDoc/8 bytes. I’ve seen people set the size to over 1M…

- using non-docValues fields for sorting, grouping, function queries
or faceting. Solr will uninvert the field on the heap, whereas if you have
specified docValues=true, the memory is out in OS memory space rather than heap.

- People just putting too many docs in a collection in a single JVM in
aggregate. All replicas in the same instance are using part of the heap.

- Having unnecessary options on your fields, although that’s more MMap space
than heap.
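A worst-case estimate puts the filterCache bound in numbers: the cache size times maxDoc/8 bytes. The example assumes a hypothetical index of 10,000,000 documents. With a size of 512, that is 640,000,000 bytes (about 610 MiB). A size of 1,000,000 would allow up to 1.25 TB, far beyond any heap. Actual use can be lower, since Solr can store small result sets more compactly than a full bitset. `filtercache_worst_case_bytes` is a hypothetical helper.

```bash
# Worst-case filterCache heap use in bytes: each cached filter can be a bitset
# with one bit per document, i.e. maxDoc/8 bytes, for every entry in the cache.
filtercache_worst_case_bytes() {
  local size="$1" max_doc="$2"
  echo $(( size * max_doc / 8 ))
}

# Hypothetical index with 10,000,000 documents:
#   filtercache_worst_case_bytes 512 10000000       # 640000000 (about 610 MiB)
#   filtercache_worst_case_bytes 1000000 10000000   # 1250000000000 (1.25 TB)
```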

The problem basically is that all of Solr’s access is essentially random, so for
performance reasons lots of stuff has to be in memory.

That said, Solr hasn’t been as careful as it should be about using up memory;
that’s ongoing work.

If you really want to know what’s using up memory, throw a heap analysis tool
at it. That’ll give you a clue what’s hogging memory and you can go from there.
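For the heap-analysis route, the JDK's `jmap` can write a heap dump that tools such as Eclipse Memory Analyzer (MAT) can open. This minimal sketch assumes that `jmap` from the same JDK as Solr is on the PATH. It also assumes it is run as the user that owns the JVM (`solr` in the `ps` output). `dump_heap` is a hypothetical wrapper, and `JMAP` is an optional override added here for testing. The `live` option triggers a full GC first. Taking the dump pauses Solr and needs roughly as much disk as the live heap.

```bash
# Write a binary heap dump of the JVM with the given PID for offline analysis.
# "live" dumps only reachable objects (and forces a full GC first). Run as the
# user that owns the JVM; JMAP may point at a specific JDK's jmap.
dump_heap() {
  local pid="$1" out="$2"
  "${JMAP:-jmap}" -dump:live,format=b,file="$out" "$pid"
}

# Example (the dump can be as large as the live heap, so pick a roomy disk):
#   dump_heap 72072 /var/tmp/solr-heap.hprof
# Equivalent with jcmd:
#   jcmd 72072 GC.heap_dump /var/tmp/solr-heap.hprof
```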

> On Jun 29, 2020, at 1:48 PM, David Hastings  
> wrote:
> 
> little nit picky note here, use 31gb, never 32.

Re: How to determine why solr stops running?

2020-06-29 Thread David Hastings
Little nitpicky note here: use 31gb, never 32.
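The 31 GB advice concerns compressed object pointers (compressed oops). The JVM can only use them for heaps below about 32 GB, so a 32 GB heap can end up holding no more, or even less, data than a 31 GB one. The hypothetical helpers in this sketch convert heap sizes and check them against a 32 GiB cutoff. They also apply the rule of thumb from Erick's message quoted below: keep the heap to 25-50 percent of physical RAM. The exact cutoff can be slightly lower than 32 GiB, so confirm with `-XX:+PrintFlagsFinal`.

```bash
# Convert a JVM heap size such as 31g, 31000m or 1048576k to MiB.
heap_mb() {
  local v="$1" n="${1%[gGmMkK]}"
  case "$v" in
    *[gG]) echo $(( n * 1024 )) ;;
    *[mM]) echo "$n" ;;
    *[kK]) echo $(( n / 1024 )) ;;
    *)     echo $(( v / 1024 / 1024 )) ;;  # plain byte count
  esac
}

# Succeeds if the heap is below 32 GiB, where compressed oops stop being
# available. The real cutoff can be slightly lower; confirm with
#   java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
below_oops_limit() {
  [ "$(heap_mb "$1")" -lt 32768 ]
}

# 25% and 50% of physical RAM (given in MiB): the range the combined heap of
# all JVMs on the machine should stay under, per the rule of thumb.
heap_budget_mb() {
  echo "$(( $1 / 4 )) $(( $1 / 2 ))"
}

# Example: physical RAM in MiB on Linux, then the budget for this machine.
#   heap_budget_mb "$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)"
```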

On Mon, Jun 29, 2020 at 1:45 PM Ryan W  wrote:

> It figures it would happen again a couple hours after I suggested the issue
> might be resolved.  Just now, Solr stopped running.  I cleared the cache in
> my app a couple times around the time that it happened, so perhaps that was
> somehow too taxing for the server.  However, I've never allocated so much
> RAM to a website before, so it's odd that I'm getting these failures.  My
> colleagues were astonished when I said people on the solr-user list were
> telling me I might need 32GB just for solr.
>
> I manage another project that uses Drupal + Solr, and we have a total of
> 8GB of RAM on that server and Solr never, ever stops.  I've been managing
> that site for years and never seen a Solr outage.  On that project,
> Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
> more?
>
> "The thing that’s unsettling about this is that assuming you were hitting
> OOMs, and were running the OOM-killer script, you _should_ have had very
> clear evidence that that was the cause."
>
> How do I know if I'm running the OOM-killer script?
>
> Thank you.
>
> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson 
> wrote:
>
> > The thing that’s unsettling about this is that assuming you were hitting
> > OOMs, and were running the OOM-killer script, you _should_ have had very
> > clear evidence that that was the cause.
> >
> > If you were not running the killer script, then apologies for not asking
> > about that in the first place. Java’s performance is unpredictable when
> > OOMs happen, which is the point of the killer script: at least Solr stops
> > rather than do something inexplicable.
> >
> > Best,
> > Erick
> >
> > > On Jun 29, 2020, at 11:52 AM, David Hastings <
> > > hastings.recurs...@gmail.com> wrote:
> > >
> > > sometimes just throwing money/ram/ssd at the problem is just the best
> > > answer.
> > >
> > > On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:
> > >
> > >> Thanks everyone. Just to give an update on this issue, I bumped the
> > >> RAM available to Solr up to 16GB a couple weeks ago, and haven’t had
> > >> any problem since.
> > >>
> > >>
> > >> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> > >> hastings.recurs...@gmail.com>
> > >> wrote:
> > >>
> > >>> me personally, around 290gb.  as much as we could shove into them
> > >>>
> > >>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
> > erickerick...@gmail.com
> > >>>
> > >>> wrote:
> > >>>
> >  How much physical RAM? A rule of thumb is that you should allocate
> no
> > >>> more
> >  than 25-50 percent of the total physical RAM to Solr. That's
> > >> cumulative,
> >  i.e. the sum of the heap allocations across all your JVMs should be
> > >> below
> >  that percentage. See Uwe Schindler's mmapdirectiry blog...
> > 
> >  Shot in the dark...
> > 
> >  On Tue, Jun 16, 2020, 11:51 David Hastings <
> > >> hastings.recurs...@gmail.com
> > 
> >  wrote:
> > 
> > > To add to this, i generally have solr start with this:
> > > -Xms31000m-Xmx31000m
> > >
> > > and the only other thing that runs on them are maria db gallera
> > >> cluster
> > > nodes that are not in use (aside from replication)
> > >
> > > the 31gb is not an accident either, you dont want 32gb.
> > >
> > >
> > > On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey  >
> >  wrote:
> > >
> > >> On 6/11/2020 11:52 AM, Ryan W wrote:
> >  I will check "dmesg" first, to find out any hardware error
> > >>> message.
> > >>
> > >> 
> > >>
> > >>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
> > >> score 9
> >  or
> > >>> sacrifice child
> > >>> [1521232.782908] Killed process 117529 (httpd), UID 48,
> > >> total-vm:675824kB,
> > >>> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> > >>>
> > >>> Is this a relevant "Out of memory" message?  Does this suggest an
> > >>> OOM
> > >>> situation is the culprit?
> > >>
> > >> Because this was in the "dmesg" output, it indicates that it is
> the
> > >> operating system killing programs because the *system* doesn't
> have
> > >>> any
> > >> memory left.  It wasn't Java that did this, and it wasn't Solr
> that
> > >>> was
> > >> killed.  It very well could have been Solr that was killed at
> > >> another
> > >> time, though.
> > >>
> > >> The process that it killed this time is named httpd ... which is
> > >> most
> > >> likely the Apache webserver.  Because the UID is 48, this is
> > >> probably
> >  an
> > >> OS derived from Redhat, where the "apache" user has UID and GID 48
> > >> by
> > >> default.  Apache with its default config can be VERY memory hungry
> > >>> when
> > >> it gets busy.
> > >>
> > >>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> > >>
> > >> This says that you started Solr with the 

Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
It figures it would happen again a couple hours after I suggested the issue
might be resolved.  Just now, Solr stopped running.  I cleared the cache in
my app a couple times around the time that it happened, so perhaps that was
somehow too taxing for the server.  However, I've never allocated so much
RAM to a website before, so it's odd that I'm getting these failures.  My
colleagues were astonished when I said people on the solr-user list were
telling me I might need 32GB just for solr.

I manage another project that uses Drupal + Solr, and we have a total of
8GB of RAM on that server and Solr never, ever stops.  I've been managing
that site for years and never seen a Solr outage.  On that project,
Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or
more?

"The thing that’s unsettling about this is that assuming you were hitting
OOMs, and were running the OOM-killer script, you _should_ have had very
clear evidence that that was the cause."

How do I know if I'm running the OOM-killer script?

Thank you.
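This can be checked in two places, sketched here in Python. The Java command line (for example the output of `ps auxww | grep solr`) shows whether `-XX:OnOutOfMemoryError` points at a script. The Solr log directory shows whether that script ever ran. The `solr_oom_kill*` file-name pattern is an assumption based on the reading of `oom_solr.sh` elsewhere in this thread, so check the script on your own install. `find_oom_hook` and `find_oom_logs` are hypothetical helper names.

```python
import glob
import os
import re

# The flag value can contain spaces (script path followed by its arguments),
# so capture the path plus any following tokens that do not start with "-".
OOM_FLAG = re.compile(r"-XX:OnOutOfMemoryError=(\S+)((?:\s+[^-\s]\S*)*)")


def find_oom_hook(cmdline):
    """Return (script, args) if the JVM was started with an OOM hook, else None."""
    match = OOM_FLAG.search(cmdline)
    if match is None:
        return None
    return match.group(1), match.group(2).split()


def find_oom_logs(log_dir):
    """List files whose names start with 'solr_oom_kill' (assumed naming)."""
    return sorted(glob.glob(os.path.join(log_dir, "solr_oom_kill*")))


if __name__ == "__main__":
    # Paste the full command line from `ps auxww | grep solr` here.
    cmdline = ("java -server -Xmx16g "
               "-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs "
               "-XX:ParallelGCThreads=4")
    hook = find_oom_hook(cmdline)
    if hook is None:
        print("No OnOutOfMemoryError hook found")
    else:
        script, args = hook
        print("OOM hook:", script, args)
        # In the command line above, the last argument is the log directory.
        if args:
            print("OOM-killer logs:", find_oom_logs(args[-1]))
```

If the hook is present but no matching log exists, that suggests, without proving it, that the JVM never hit an OutOfMemoryError while the hook was configured.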

On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson 
wrote:

> The thing that’s unsettling about this is that assuming you were hitting
> OOMs, and were running the OOM-killer script, you _should_ have had very
> clear evidence that that was the cause.
>
> If you were not running the killer script, then apologies for not asking
> about that in the first place. Java’s performance is unpredictable when
> OOMs happen, which is the point of the killer script: at least Solr stops
> rather than doing something inexplicable.
>
> Best,
> Erick

Re: How to determine why solr stops running?

2020-06-29 Thread Erick Erickson
The thing that’s unsettling about this is that assuming you were hitting OOMs,
and were running the OOM-killer script, you _should_ have had very clear
evidence that that was the cause.

If you were not running the killer script, then apologies for not asking about
that in the first place. Java’s performance is unpredictable when OOMs happen,
which is the point of the killer script: at least Solr stops rather than doing
something inexplicable.

Best,
Erick
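That evidence would normally be a `java.lang.OutOfMemoryError` in Solr's own logs, plus the OOM-killer script's log if one is configured. Below is a minimal Python sketch for finding those lines. `find_java_oom` and `scan_log` are hypothetical helpers, and a path such as `/opt/solr/server/logs/solr.log` is only an assumed example. If the kernel kills the JVM instead, Java gets no chance to log anything, so `dmesg` is worth checking too.

```python
import re

# HotSpot reports heap exhaustion with this exception class name, usually
# followed by a detail such as "Java heap space" or "GC overhead limit exceeded".
JAVA_OOM = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(.*))?")


def find_java_oom(lines):
    """Return (line_number, detail) for every OutOfMemoryError mention."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = JAVA_OOM.search(line)
        if match:
            hits.append((number, (match.group(1) or "").strip()))
    return hits


def scan_log(path):
    """Scan one log file; errors='replace' tolerates stray bytes in logs."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_java_oom(handle)
```

For example, `scan_log("/opt/solr/server/logs/solr.log")`. Adjust the path to wherever your Solr install writes its logs.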

> On Jun 29, 2020, at 11:52 AM, David Hastings  
> wrote:
> 
> sometimes just throwing money/ram/ssd at the problem is just the best
> answer.


Re: How to determine why solr stops running?

2020-06-29 Thread David Hastings
Sometimes just throwing money/RAM/SSDs at the problem is the best
answer.

On Mon, Jun 29, 2020 at 11:38 AM Ryan W  wrote:

> Thanks everyone. Just to give an update on this issue, I bumped the RAM
> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> problem since.


Re: How to determine why solr stops running?

2020-06-29 Thread Ryan W
Thanks everyone. Just to give an update on this issue, I bumped the RAM
available to Solr up to 16GB a couple weeks ago, and haven’t had any
problem since.


On Tue, Jun 16, 2020 at 1:00 PM David Hastings 
wrote:

> me personally, around 290gb.  as much as we could shove into them


Re: How to determine why solr stops running?

2020-06-16 Thread David Hastings
Me personally, around 290GB: as much as we could shove into them.

On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson 
wrote:

> How much physical RAM? A rule of thumb is that you should allocate no more
> than 25-50 percent of the total physical RAM to Solr. That's cumulative,
> i.e. the sum of the heap allocations across all your JVMs should be below
> that percentage. See Uwe Schindler's MMapDirectory blog...
>
> Shot in the dark...


Re: How to determine why solr stops running?

2020-06-16 Thread Erick Erickson
How much physical RAM? A rule of thumb is that you should allocate no more
than 25-50 percent of the total physical RAM to Solr. That's cumulative,
i.e. the sum of the heap allocations across all your JVMs should be below
that percentage. See Uwe Schindler's MMapDirectory blog...

Shot in the dark...
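The rule of thumb above is simple arithmetic: add up the maximum heap (`-Xmx`) of every JVM on the machine and divide by physical RAM. Here is a minimal Python sketch. `parse_heap`, `heap_fraction` and `check_budget` are hypothetical helpers. The RAM and heap figures in the example are made up, and the 25/50 percent thresholds are the rule of thumb, not hard limits.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_heap(flag):
    """Convert '-Xmx16g', '-Xms512m' or a bare '16g' into bytes."""
    value = flag
    for prefix in ("-Xmx", "-Xms"):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def heap_fraction(total_ram_bytes, heap_flags):
    """Share of physical RAM claimed by the summed max heaps of all JVMs."""
    return sum(parse_heap(f) for f in heap_flags) / total_ram_bytes


def check_budget(total_ram_bytes, heap_flags, low=0.25, high=0.50):
    """Compare the summed heaps against the 25-50 percent rule of thumb."""
    fraction = heap_fraction(total_ram_bytes, heap_flags)
    if fraction <= low:
        verdict = "at or below 25 percent"
    elif fraction <= high:
        verdict = "within 25-50 percent"
    else:
        verdict = "above 50 percent: less RAM left for the OS to cache index files"
    return fraction, verdict


# Hypothetical example: 64 GiB of RAM, a Solr JVM with -Xmx16g and another JVM with -Xmx8g.
print(check_budget(64 * 1024 ** 3, ["-Xmx16g", "-Xmx8g"]))
```

The RAM left over after the heaps is what the OS can use to cache the index files, which is the point of the MMapDirectory blog post mentioned above.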

On Tue, Jun 16, 2020, 11:51 David Hastings 
wrote:

> To add to this, i generally have solr start with this:
> -Xms31000m -Xmx31000m
>
> and the only other thing that runs on them are maria db gallera cluster
> nodes that are not in use (aside from replication)
>
> the 31gb is not an accident either, you dont want 32gb.


Re: How to determine why solr stops running?

2020-06-16 Thread David Hastings
To add to this, I generally have Solr start with this:
-Xms31000m -Xmx31000m

and the only other things that run on them are MariaDB Galera cluster
nodes that are not in use (aside from replication).

The 31GB is not an accident either; you don't want 32GB.
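The usual reason for staying just under 32GB is compressed object pointers. 64-bit HotSpot JVMs can use compressed ordinary object pointers (`UseCompressedOops`) only while the maximum heap is below roughly 32 GiB, with the default 8-byte object alignment. Past that point each object reference grows from 4 to 8 bytes, so a 32GB heap can hold fewer objects than a 31GB one. Below is a rough Python sketch of the check. The exact cutoff is slightly below 32 GiB and varies by JVM, so confirm with `java -Xmx31000m -XX:+PrintFlagsFinal -version | grep UseCompressedOops`. The helper names are hypothetical.

```python
GIB = 1024 ** 3

# With the default 8-byte object alignment, a 32-bit compressed pointer can
# address 2**32 * 8 bytes = 32 GiB. HotSpot's real cutoff is a little lower,
# so heaps right at the edge should be checked on the actual JVM.
COMPRESSED_OOPS_CEILING = 2 ** 32 * 8

SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": GIB}


def xmx_to_bytes(flag):
    """Parse an -Xmx style value such as '-Xmx31000m' or '32g' into bytes."""
    value = flag[len("-Xmx"):] if flag.startswith("-Xmx") else flag
    value = value.strip().lower()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


def likely_compressed_oops(flag):
    """Rough estimate: True if the heap is below the 32 GiB ceiling."""
    return xmx_to_bytes(flag) < COMPRESSED_OOPS_CEILING


for heap in ("-Xmx31000m", "-Xmx32g"):
    print(heap, round(xmx_to_bytes(heap) / GIB, 2), "GiB",
          "compressed oops likely" if likely_compressed_oops(heap) else "no compressed oops")
```

Note that 31000m is about 30.3 GiB, which leaves some margin below the ceiling.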


On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey  wrote:

> On 6/11/2020 11:52 AM, Ryan W wrote:
> >> I will check "dmesg" first, to find out any hardware error message.
>
> 
>
> > [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or
> > sacrifice child
> > [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB,
> > anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> >
> > Is this a relevant "Out of memory" message?  Does this suggest an OOM
> > situation is the culprit?
>
> Because this was in the "dmesg" output, it indicates that it is the
> operating system killing programs because the *system* doesn't have any
> memory left.  It wasn't Java that did this, and it wasn't Solr that was
> killed.  It very well could have been Solr that was killed at another
> time, though.
>
> The process that it killed this time is named httpd ... which is most
> likely the Apache webserver.  Because the UID is 48, this is probably an
> OS derived from Redhat, where the "apache" user has UID and GID 48 by
> default.  Apache with its default config can be VERY memory hungry when
> it gets busy.
>
> > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>
> This says that you started Solr with the default 512MB heap.  Which is
> VERY VERY small.  The default is small so that Solr will start on
> virtually any hardware.  Almost every user must increase the heap size.
> And because the OS is killing processes, it is likely that the system
> does not have enough memory installed for what you have running on it.
>
> It is generally not a good idea to share the server hardware between
> Solr and other software, unless the system has a lot of spare resources,
> memory in particular.
>
> Thanks,
> Shawn
>


Re: How to determine why solr stops running?

2020-06-16 Thread Shawn Heisey

On 6/11/2020 11:52 AM, Ryan W wrote:
>> I will check "dmesg" first, to find out any hardware error message.
>
> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or
> sacrifice child
> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB,
> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>
> Is this a relevant "Out of memory" message?  Does this suggest an OOM
> situation is the culprit?

Because this was in the "dmesg" output, it indicates that it is the
operating system killing programs because the *system* doesn't have any 
memory left.  It wasn't Java that did this, and it wasn't Solr that was 
killed.  It very well could have been Solr that was killed at another 
time, though.
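Since Solr could have been the process killed at another time, it helps to list every process the kernel OOM killer picked. You can then check whether any of them was the JVM, which normally appears as `java`. Below is a minimal Python sketch. It assumes the message format in the dmesg excerpt above; the wording differs a little between kernel versions. `killed_processes` and `jvm_was_killed` are hypothetical helpers. Feed them the text of `dmesg` (which may need root) or `journalctl -k`.

```python
import re

# Kernel OOM-killer line, e.g.
# "[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, ..."
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)(?:,? UID (\d+))?")


def killed_processes(dmesg_text):
    """Return (pid, name, uid) for every process the OOM killer killed."""
    found = []
    for match in KILLED.finditer(dmesg_text):
        pid, name, uid = match.groups()
        found.append((int(pid), name, int(uid) if uid is not None else None))
    return found


def jvm_was_killed(dmesg_text):
    """Solr runs inside a JVM, which normally shows up as 'java' here."""
    return any(name == "java" for _, name, _ in killed_processes(dmesg_text))
```

Applied to the excerpt above, this finds only PID 117529 (`httpd`, UID 48), which matches the conclusion that Solr was not the victim that time.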


The process that it killed this time is named httpd ... which is most 
likely the Apache webserver.  Because the UID is 48, this is probably an 
OS derived from Redhat, where the "apache" user has UID and GID 48 by 
default.  Apache with its default config can be VERY memory hungry when 
it gets busy.

> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912

This says that you started Solr with the default 512MB heap.  Which is
VERY VERY small.  The default is small so that Solr will start on 
virtually any hardware.  Almost every user must increase the heap size. 
And because the OS is killing processes, it is likely that the system 
does not have enough memory installed for what you have running on it.
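The 512MB figure follows directly from the byte counts in those flags: 536870912 / 1024 / 1024 = 512. A small Python sketch of the conversion, assuming the `-XX:InitialHeapSize=`/`-XX:MaxHeapSize=` form shown above. `heap_sizes_mib` is a hypothetical helper.

```python
import re

# -XX:InitialHeapSize / -XX:MaxHeapSize are reported in plain bytes.
HEAP_FLAG = re.compile(r"-XX:(InitialHeapSize|MaxHeapSize)=(\d+)")


def heap_sizes_mib(jvm_flags):
    """Map each heap-size flag found in jvm_flags to its size in MiB."""
    return {name: int(value) / (1024 * 1024)
            for name, value in HEAP_FLAG.findall(jvm_flags)}


print(heap_sizes_mib("-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912"))
# prints {'InitialHeapSize': 512.0, 'MaxHeapSize': 512.0}
```

After raising the heap with `-Xmx16g`, the same check would report a MaxHeapSize of 16384 MiB (17179869184 bytes).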


It is generally not a good idea to share the server hardware between 
Solr and other software, unless the system has a lot of spare resources, 
memory in particular.


Thanks,
Shawn


Re: How to determine why solr stops running?

2020-06-15 Thread Ryan W
> >>>> -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
> >>>> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
> >>>> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
> >>>> -XX:-OmitStackTraceInFastThrow
> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
> >>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
> >>>> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
> >>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> >>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
> >>>> -XX:+UseParNewGC
> >>>>
> >>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But
> >>>> I think this is just a setting that indicates what to do in case of an
> >>>> OOM. And if I look in that oom_solr.sh file, I see it would write an entry
> >>>> to a solr_oom_kill log. And there is no such log in the logs directory.
> >>>>
> >>>> Many thanks.
> >>>>
> >>>>> Then use some system admin tools to monitor that server,
> >>>>> for instance, top, vmstat, lsof, iostat ... or simply install some nice
> >>>>> free monitoring tool into this system, like monit, monitorix, nagios.
> >>>>> Good luck!
> >>>>>
> >>>>>
> >>>>> From: Ryan W
> >>>>> Sent: Thursday, June 11, 2020 2:13 AM
> >>>>> To: solr-user@lucene.apache.org
> >>>>> Subject: Re: How to determine why solr stops running?
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> People keep suggesting I check the logs for errors.  What do those errors
> >>>>> look like?  Does anyone have examples of the text of a Solr OOM error?  Or
> >>>>> the text of any other errors I should be looking for the next time solr
> >>>>> fails?  Are there phrases I should grep for in the logs?  Should I be
> >>>>> looking in the Solr logs for an OOM error, or in the Apache logs?
> >>>>>
> >>>>> There is nothing failing on the server except for solr -- at least not
> >>>>> that I can see.  There is no apparent problem with the hardware or
> >>>>> anything else on the server.  The OS is Red Hat Enterprise Linux. The
> >>>>> server has 16 GB of RAM and hosts one website that does not get a huge
> >>>>> amount of traffic.
> >>>>>
> >>>>> When the start command is given to solr, does it first check to see if
> >>>>> solr is running, or does it always start solr whether it is already
> >>>>> running or not?
> >>>>>
> >>>>> Many thanks!
> >>>>> Ryan
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <erickerick...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> To add to what Dave said, if you have a particular machine that’s prone
> >>>>>> to suddenly stopping, that’s usually a red flag that you should
> >>>>>> seriously think about hardware issues.
> >>>>>>
> >>>>>> If the problem strikes different machines, then I agree with Shawn that
> >>>>>> the first thing I’d be suspicious of is OOM errors.
> >>>>>>
> >>>>>> FWIW,
> >>>>>> Erick
> >>>>>>
> >>>>>>> On Jun 9, 2020, at 6:05 AM, Dave wrote:
> >>>>>>>
> >>>>>>> I’ll add that whenever I’ve had a solr instance shut down, for me

Re: How to determine why solr stops running?

2020-06-15 Thread Jörn Franke

Re: How to determine why solr stops running?

2020-06-15 Thread Ryan W
It happened again today.  Again, no other apparent problems on the server.
Nothing else is stopping.  Nothing in the logs that strikes me as useful.
I'm using Red Hat Linux 7.8 and Solr 7.7.2.

Solr is stopping a couple times per week and I don't know how to determine
why.


Re: How to determine why solr stops running?

2020-06-14 Thread Ryan W
Thank you.  I pasted those settings at the end of my /etc/default/solr.in.sh
just now and restarted solr.  I will see if that fixes it.  Previously, I
had no settings at all in solr.in.sh except for SOLR_PORT.
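To confirm that the restarted JVM really picked up the new solr.in.sh values, you can inspect its command line. This is a minimal sketch that assumes `bin/solr` turns `SOLR_HEAP=8g` into `-Xms8g -Xmx8g` and that Solr listens on the default port 8983. The helper name `heap_flags` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the heap-size options (-Xms / -Xmx) found in a java command line
# read from stdin, one per line.
heap_flags() {
  tr ' ' '\n' | grep -E '^-Xm[sx]'
}

# Hypothetical usage against the running Solr JVM, assuming it was started
# with -Djetty.port=8983 (the default port):
#   ps -o args= -p "$(pgrep -f 'jetty.port=8983' | head -n 1)" | heap_flags
# With SOLR_HEAP=8g in effect, this would be expected to print -Xms8g and -Xmx8g.
```

If the flags still show 512m, the service is probably reading a different solr.in.sh than the one that was edited.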


Re: How to determine why solr stops running?

2020-06-11 Thread Walter Underwood
1. You have a tiny heap. 536 Megabytes is not enough.
2. I stopped using the CMS GC years ago.
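The 536 MB figure comes from the `-XX:MaxHeapSize=536870912` flag in the GC log Ryan grepped. That value is exactly 2^29 bytes, which is 512 MiB (about 536.9 MB in decimal units). It appears to be the default heap `bin/solr` uses when no heap size is configured. Here is a small sketch for pulling that value out of a log line and converting it; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the first MaxHeapSize=<bytes> value from text on stdin, e.g. the
# "CommandLine flags:" line that HotSpot writes at the top of solr_gc.log.
max_heap_bytes() {
  grep -o 'MaxHeapSize=[0-9]*' | head -n 1 | cut -d= -f2
}

# Convert a byte count to MiB using integer shell arithmetic.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Hypothetical usage (log file name as quoted in this thread):
#   bytes_to_mib "$(max_heap_bytes < /opt/solr/server/logs/solr_gc.log.4.current)"
```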

Here is the GC config we use on every one of our 150+ Solr hosts. We’re still 
on Java 8, but will be upgrading soon.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: How to determine why solr stops running?

2020-06-11 Thread Ryan W
On Wed, Jun 10, 2020 at 8:35 PM Hup Chen  wrote:

> I will check "dmesg" first, to find out any hardware error message.
>

Here is what I see toward the end of the output from dmesg:

[1521232.781785] [118857]48 118857   108785  677 201 901 0 httpd
[1521232.781787] [118860]48 118860   108785  710 201 881 0 httpd
[1521232.781788] [118862]48 118862   113063 5256 210 725 0 httpd
[1521232.781790] [118864]48 118864   114085 6634 212 703 0 httpd
[1521232.781791] [118871]48 118871   13968732323 262 620 0 httpd
[1521232.781793] [118873]48 118873   108785  821 201 792 0 httpd
[1521232.781795] [118879]48 118879   14026332719 263 621 0 httpd
[1521232.781796] [118903]48 118903   108785  812 201 771 0 httpd
[1521232.781798] [118905]48 118905   113575 5606 211 660 0 httpd
[1521232.781800] [118906]48 118906   113563 5694 211 626 0 httpd
[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB

Is this a relevant "Out of memory" message?  Does this suggest an OOM
situation is the culprit?
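In the excerpt above, the kernel's oom-killer picked an Apache `httpd` worker, not the Solr JVM. If it had chosen Solr, the victim name would presumably be `java`. The filter below summarises every oom-killer victim in the kernel log. It is a sketch: it assumes the "Killed process <pid> (<name>)" wording shown above, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of each process the kernel OOM killer terminated, taken from
# "Killed process <pid> (<name>)" lines in kernel log text on stdin.
# The "Out of memory: Kill process ..." line is deliberately not matched.
oom_victims() {
  sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p'
}

# Hypothetical usage: count victims by name.
#   dmesg | oom_victims | sort | uniq -c
```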

When I grep in the solr logs for oom, I see some entries like this...

./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
-XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
-XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
-XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
-XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
-XX:-OmitStackTraceInFastThrow
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
-XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
-XX:+UseParNewGC

Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I
think this is just a setting that indicates what to do in case of an OOM.
And if I look in that oom_solr.sh file, I see it would write an entry to a
solr_oom_kill log. And there is no such log in the logs directory.

Many thanks.

Re: How to determine why solr stops running?

2020-06-10 Thread Shawn Heisey

On 6/10/2020 12:13 PM, Ryan W wrote:

> People keep suggesting I check the logs for errors.  What do those errors
> look like?  Does anyone have examples of the text of a Solr oom error?  Or
> the text of any other errors I should be looking for the next time solr
> fails?  Are there phrases I should grep for in the logs?  Should I be
> looking in the Solr logs for an OOM error, or in the Apache logs?

Are you running Solr on Windows?   If you are, then a Java OOME will NOT 
cause Solr to stop.  On pretty much any other operating system, Solr 
will terminate when OOME occurs.  This termination will create a 
separate logfile, one that contains very little actual information; 
really the only thing it says is that the oom killer script was 
executed.  That logfile will have a filename like the following:


solr_oom_killer-8983-2019-08-11_22_57_56.log

If OOME is the reason Solr stops running, then the only place that 
exception will be logged is solr.log as far as I know ... but there 
exists a very real possibility that it won't actually be logged.  It 
could occur at a place in the code that does not have any logging.
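Based on this description, a rough check for both kinds of evidence (the killer logfile and a logged `java.lang.OutOfMemoryError`) might look like the sketch below. The default directory is the log directory passed to oom_solr.sh in the GC flags quoted in this thread. The glob `solr_oom_kill*.log` is chosen to match both the filename given here and the "solr_oom_kill" name Ryan mentions. `check_oome` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Look for evidence of a Java OutOfMemoryError in a Solr log directory.
# Prints what it finds; returns 0 if any evidence exists, 1 otherwise.
check_oome() {
  local log_dir="${1:-/opt/solr/server/logs}"
  local found=1
  local f
  # Files written by the oom killer script when the JVM is stopped on OOME.
  for f in "$log_dir"/solr_oom_kill*.log; do
    [ -e "$f" ] || continue   # glob did not match anything
    echo "oom killer log: $f"
    found=0
  done
  # The exception itself, if it was logged at all (it may not have been).
  # grep -l prints the names of matching solr.log* files.
  if grep -lF 'java.lang.OutOfMemoryError' "$log_dir"/solr.log* 2>/dev/null; then
    found=0
  fi
  return "$found"
}
```

An empty result does not rule out OOME, since, as noted above, the exception may never reach solr.log.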


At the URL below is an example of a logged OOME on a Solr server.  In 
this case, it wasn't memory that was exhausted; the error was logging an 
inability to start a new thread:


https://paste.apache.org/aznyg

Thanks,
Shawn


Re: How to determine why solr stops running?

2020-06-10 Thread Hup Chen
I would check "dmesg" first, to find any hardware error messages.
Then use some system admin tools to monitor that server,
for instance top, vmstat, lsof, iostat ... or simply install a nice
free monitoring tool on this system, like monit, monitorix, or nagios.
Good luck!
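As a starting point for the `dmesg` check, the filter below picks out kernel lines that commonly accompany memory exhaustion or failing hardware. It is a sketch: the pattern list is an assumption and not exhaustive, and `kernel_trouble` is a hypothetical name. The commented `vmstat` line shows one way to keep a timestamped record, so the state just before the next stop is on disk.

```shell
#!/usr/bin/env bash
# Print kernel log lines (stdin) that mention memory exhaustion or
# typical hardware trouble. Case-insensitive; extend the list as needed.
kernel_trouble() {
  grep -iE 'out of memory|killed process|i/o error|hardware error|machine check'
}

# Hypothetical usage (dmesg -T prints human-readable timestamps):
#   dmesg -T | kernel_trouble
# Record memory/swap/IO statistics once a minute with timestamps:
#   nohup vmstat -t 60 >> /var/tmp/vmstat.log 2>&1 &
```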



Re: How to determine why solr stops running?

2020-06-10 Thread Ryan W
Hi all,

People keep suggesting I check the logs for errors.  What do those errors
look like?  Does anyone have examples of the text of a Solr oom error?  Or
the text of any other errors I should be looking for the next time solr
fails?  Are there phrases I should grep for in the logs?  Should I be
looking in the Solr logs for an OOM error, or in the Apache logs?

There is nothing failing on the server except for solr -- at least not that
I can see.  There is no apparent problem with the hardware or anything else
on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr
is running, or does it always start solr whether it is already running or
not?

Many thanks!
Ryan


On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson 
wrote:

> To add to what Dave said, if you have a particular machine that’s prone to
> suddenly stopping, that’s usually a red flag that you should seriously
> think about hardware issues.
>
> If the problem strikes different machines, then I agree with Shawn that
> the first thing I’d be suspicious of is OOM errors.
>
> FWIW,
> Erick
>
> > On Jun 9, 2020, at 6:05 AM, Dave  wrote:
> >
> > I’ll add that whenever I’ve had a solr instance shut down, for me it’s
> > been a hardware failure. Either the ram or the disk got a “glitch” and
> > both of these are relatively fragile and wear and tear type parts of the
> > machine, and should be expected to fail and be replaced from time to
> > time. Solr is pretty aggressive with its logging so there are a lot of
> > writes always happening and of course reads, if the disk has any issues
> > or the memory it can lock it up and bring her down, more so if you have
> > any spellcheck dictionaries or suggesters being built on start up.
> >
> > Just my experience with this, could be wrong (most likely wrong) but we
> > always have extra drives and memory around the server room for this
> > reason.  At least once or twice a year we will have a disk failure in
> > the raid and need to swap in a new one.
> >
> > Good luck though, also solr should be logging its failures so it would
> > be good to look there too
> >
> >> On Jun 9, 2020, at 2:35 AM, Shawn Heisey  wrote:
> >>
> >> On 5/14/2020 7:22 AM, Ryan W wrote:
> >>> I manage a site where solr has stopped running a couple times in the
> >>> past week. The server hasn't been rebooted, so that's not the reason.
> >>> What else causes solr to stop running?  How can I investigate why
> >>> this is happening?
> >>
> >> Any situation where Solr stops running and nobody requested the stop
> >> is a result of a serious problem that must be thoroughly investigated.
> >> I think it's a bad idea for Solr to automatically restart when it
> >> stops unexpectedly.  Chances are that whatever caused the crash is
> >> going to simply make the crash happen again until the problem is
> >> solved.  Automatically restarting could hide problems from the system
> >> administrator.
> >>
> >> The only way a Solr auto-restart would be acceptable to me is if it
> >> sends a high priority alert to the sysadmin EVERY time it executes an
> >> auto-restart.  It really is that bad of a problem.
> >>
> >> The causes of Solr crashes (that I can think of) include the
> >> following.  I believe I have listed these four options from most
> >> likely to least likely:
> >>
> >> * Java OutOfMemoryError exceptions.  On non-Windows systems, the
> >> "bin/solr" script starts Solr with an option that results in Solr's
> >> death anytime one of these exceptions occurs.  We do this because
> >> program operation is indeterminate and completely unpredictable when
> >> OOME occurs, so it's far safer to stop running.  That exception can be
> >> caused by several things, some of which actually do not involve memory
> >> at all.  If you're running on Windows via the bin\solr.cmd command,
> >> then this will not happen ... but OOME could still cause a crash,
> >> because as I already mentioned, program operation is unpredictable
> >> when OOME occurs.
> >>
> >> * The OS kills Solr because system memory is completely exhausted and
> >> Solr is the process using the most memory.  Linux calls this the
> >> "oom-killer" ... I am pretty sure something like it exists on most
> >> operating systems.
> >>
> >> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
> >> or data used by any of those.
> >>
> >> * A very serious bug in Solr's code that we haven't discovered yet.
> >>
> >> I included that last one simply for completeness.  A bug that causes a
> >> crash *COULD* exist, but as of right now, we have not seen any
> >> supporting evidence.
> >>
> >> My guess is that Java OutOfMemoryError is the cause here, but I can't
> >> be certain.  If that is happening, then some resource (which might not
> >> be memory) is fully depleted.  We would need to see the full
> >> OutOfMemoryError exception in order to determine why it is happening.
> >> Sometimes the exception is logged in solr.log, sometimes it 

Re: How to determine why solr stops running?

2020-06-09 Thread Erick Erickson
To add to what Dave said, if you have a particular machine that’s prone to
suddenly stopping, that’s usually a red flag that you should seriously 
think about hardware issues.

If the problem strikes different machines, then I agree with Shawn that
the first thing I’d be suspicious of is OOM errors.

FWIW,
Erick




Re: How to determine why solr stops running?

2020-06-09 Thread Dave
I’ll add that whenever I’ve had a Solr instance shut down, for me it’s been a
hardware failure. Either the RAM or the disk got a “glitch”. Both are
relatively fragile, wear-and-tear parts of the machine and should be expected
to fail and be replaced from time to time. Solr is pretty aggressive with its
logging, so there are always a lot of writes (and of course reads) happening.
If the disk or the memory has any issues, it can lock the machine up and bring
Solr down, more so if you have any spellcheck dictionaries or suggesters being
built on startup.

Just my experience with this, could be wrong (most likely wrong), but we always
keep extra drives and memory around the server room for this reason. At least
once or twice a year we have a disk failure in the RAID and need to swap in a
new one.

Good luck though. Also, Solr should be logging its failures, so it would be
good to look there too.
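A quick way to follow up on the hardware theory is to look at the kernel log around the time Solr died. The sketch below is a minimal filter, assuming a Linux host; the function name `scan_hw_errors` is hypothetical and the pattern list is only illustrative (exact messages differ between kernels, filesystems and drivers), so a clean result is weak evidence, not proof that the hardware is fine.

```shell
#!/usr/bin/env bash
# Minimal sketch: keep only kernel log lines (read from stdin) that often
# point at failing disks or memory. The pattern list is illustrative only.
scan_hw_errors() {
  # -i: case-insensitive, -E: extended regex.
  # "|| true" so that "no matches" is not reported as a failure.
  grep -iE 'I/O error|Hardware Error|EDAC MC[0-9]|EXT4-fs error|XFS.*(error|corrupt)|medium error' || true
}

# Hypothetical usage (reading the kernel log usually needs root or the
# systemd-journal/adm group):
#   journalctl -k --no-pager | scan_hw_errors
#   dmesg -T | scan_hw_errors
```

If the same machine keeps showing I/O or machine-check errors, that lines up with the "one box keeps dying" pattern described in this thread.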



Re: How to determine why solr stops running?

2020-06-09 Thread Shawn Heisey

On 5/14/2020 7:22 AM, Ryan W wrote:
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason.  What else
> causes solr to stop running?  How can I investigate why this is happening?


Any situation where Solr stops running and nobody requested the stop is 
a result of a serious problem that must be thoroughly investigated.  I 
think it's a bad idea for Solr to automatically restart when it stops 
unexpectedly.  Chances are that whatever caused the crash is going to 
simply make the crash happen again until the problem is solved. 
Automatically restarting could hide problems from the system administrator.


The only way a Solr auto-restart would be acceptable to me is if it 
sends a high priority alert to the sysadmin EVERY time it executes an 
auto-restart.  It really is that bad of a problem.


The causes of Solr crashes (that I can think of) include the following. 
I believe I have listed these four options from most likely to least likely:


* Java OutOfMemoryError exceptions.  On non-Windows systems, the 
"bin/solr" script starts Solr with an option that results in Solr's 
death any time one of these exceptions occurs.  We do this because 
program operation is indeterminate and completely unpredictable when 
OOME occurs, so it's far safer to stop running.  That exception can be 
caused by several things, some of which actually do not involve memory 
at all.  If you're running on Windows via the bin\solr.cmd command, then 
this will not happen ... but OOME could still cause a crash, because as 
I already mentioned, program operation is unpredictable when OOME occurs.


* The OS kills Solr because system memory is completely exhausted and 
Solr is the process using the most memory.  Linux calls this the 
"oom-killer" ... I am pretty sure something like it exists on most 
operating systems.


* Corruption somewhere in the system.  Could be in Java, the OS, Solr, 
or data used by any of those.


* A very serious bug in Solr's code that we haven't discovered yet.

I included that last one simply for completeness.  A bug that causes a 
crash *COULD* exist, but as of right now, we have not seen any 
supporting evidence.
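To tell the two memory-related causes apart, it helps to check both sides. On Linux, the kernel oom-killer logs a line containing `Killed process <pid> (<name>)` when it kills a process. On the JVM side, in many Solr versions the `-XX:OnOutOfMemoryError` hook added by `bin/solr` runs `bin/oom_solr.sh`, which writes a `solr_oom_killer-<port>-<timestamp>.log` file into the logs directory. The helper names below are hypothetical, the default path `/opt/solr/server/logs` is an assumption matching the `-Dsolr.log.dir` seen in one of the `ps` listings in this thread, and the marker file name should be verified against your Solr version.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: Linux kernel log text on stdin; Solr logs in
# /opt/solr/server/logs unless another directory is passed.

# Print PIDs of java processes killed by the kernel oom-killer. Older and
# newer kernels both log "Killed process <pid> (<name>)".
kernel_killed_java_pids() {
  sed -nE 's/.*Killed process ([0-9]+) \(java\).*/\1/p'
}

# List marker files left by bin/oom_solr.sh after a JVM OutOfMemoryError.
# The solr_oom_killer-*.log name is an assumption to verify for your version.
jvm_oom_markers() {
  local logs_dir=${1:-/opt/solr/server/logs}
  find "$logs_dir" -maxdepth 1 -name 'solr_oom_killer-*.log' -print 2>/dev/null
}

# Hypothetical usage:
#   journalctl -k --no-pager | kernel_killed_java_pids
#   jvm_oom_markers /opt/solr/server/logs
```

Keep in mind that `journalctl` only reaches back as far as the journal is retained, so an empty result is more convincing when the crash was recent.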


My guess is that Java OutOfMemoryError is the cause here, but I can't be 
certain.  If that is happening, then some resource (which might not be 
memory) is fully depleted.  We would need to see the full 
OutOfMemoryError exception in order to determine why it is happening. 
Sometimes the exception is logged in solr.log, sometimes it isn't.  We 
cannot predict what part of the code will be running when OOME occurs, 
so it would be nearly impossible for us to guarantee logging.  OOME can 
happen ANYWHERE - even in code that the compiler thinks is immune to 
exceptions.
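When the exception does get logged, the text after `OutOfMemoryError:` names the resource that ran out: `Java heap space` points at the heap, while `unable to create new native thread` (wording varies by Java version) points at thread or process limits rather than heap memory. A minimal sketch that lists the distinct messages, keying on the exception class instead of a bare word like "memory", which also matches harmless INFO lines; the function name `oome_reasons` is hypothetical.

```shell
#!/usr/bin/env bash
# Print each distinct OutOfMemoryError message found in log text on stdin.
# The text after "OutOfMemoryError:" names the exhausted resource.
oome_reasons() {
  # Keep only the "java.lang.OutOfMemoryError..." tail of matching lines,
  # then de-duplicate with a locale-independent sort.
  sed -nE 's/.*(java\.lang\.OutOfMemoryError.*)$/\1/p' | LC_ALL=C sort -u
}

# Hypothetical usage; the glob also picks up rotated logs and the
# solr-<port>-console.log that bin/solr writes when running in the background:
#   cat /opt/solr/server/logs/solr*.log* | oome_reasons
# For the full stack trace around each occurrence:
#   grep -h -A 20 'java.lang.OutOfMemoryError' /opt/solr/server/logs/solr*.log*
```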


Side note to fellow committers:  I wonder if we should implement an 
uncaught exception handler in Solr.  I have found in my own programs 
that it helps figure out thorny problems.  And while I am on the subject 
of handlers that might not be general knowledge, I didn't find a 
shutdown hook or a security manager outside of tests.


Thanks,
Shawn


Re: How to determine why solr stops running?

2020-06-08 Thread Radu Gheorghe
I assumed it does, based on your description. If you installed it as a service
(systemd), then systemd can start the service again if it fails (something
like Restart=always in the [Service] section of the unit file).

But if it doesn’t restart automatically now, I think it’s easier to
troubleshoot: just check the last log entries after it crashes.

Best regards,
Radu

https://sematext.com
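Whether systemd restarts Solr on its own can be checked rather than guessed. `systemctl show <unit> -p Restart -p NRestarts` prints `key=value` lines; `Restart=no` means systemd will not bring the service back after a crash, and `NRestarts` (only present on newer systemd releases) counts automatic restarts. The unit name `solr` and the helper `restart_summary` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Summarise the restart policy from `systemctl show` key=value output on stdin.
restart_summary() {
  awk -F= '
    $1 == "Restart"   { policy = $2 }
    $1 == "NRestarts" { count = $2 }
    END {
      if (policy == "") policy = "unknown"   # property not printed at all
      if (count == "")  count = "n/a"        # older systemd without NRestarts
      printf "policy=%s restarts=%s\n", policy, count
    }'
}

# Hypothetical usage (unit name is an assumption):
#   systemctl show solr -p Restart -p NRestarts | restart_summary
```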




Re: How to determine why solr stops running?

2020-06-08 Thread Ryan W
"If Solr auto-restarts"

It doesn't auto-restart.  Is there some auto-restart functionality?  I'm
not aware of that.



Re: How to determine why solr stops running?

2020-06-08 Thread Radu Gheorghe
Hi Ryan,

If Solr auto-restarts, I suppose it's systemd doing that. When it restarts
the Solr service, systemd should log this (maybe something like: journalctl
--no-pager | grep -i solr).

Then you can go into your Solr logs and check what happened right before that
time. Also, check the system logs for what happened before Solr was restarted.

Best regards,
Radu

https://sematext.com/
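Once you have a crash time from the journal, a small time-window filter makes "what happened right before" easier to read. This sketch assumes Solr's default log line prefix `YYYY-MM-DD HH:MM:SS.mmm`, which sorts correctly as a plain string, and keeps untimestamped continuation lines (stack traces) with the entry they belong to; `log_window` is a hypothetical name. One caveat: a `ps` listing in this thread shows `-Duser.timezone=UTC`, so Solr's timestamps may be in UTC while `journalctl` prints local time by default; `journalctl --utc` lines the two up.

```shell
#!/usr/bin/env bash
# Print Solr log lines (stdin) whose leading timestamp falls between $1 and $2,
# compared as strings, plus the untimestamped lines that follow them.
log_window() {
  local from=$1 to=$2
  awk -v from="$from" -v to="$to" '
    # A new log entry starts with "YYYY-MM-DD "; decide whether to keep it.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      ts = substr($0, 1, 23)
      keep = (ts >= from && ts <= to)
    }
    # Continuation lines inherit the decision of the entry above them.
    keep { print }'
}

# Hypothetical usage:
#   journalctl --utc --no-pager --since '2020-06-04 13:00' --until '2020-06-04 14:00'
#   log_window '2020-06-04 13:00' '2020-06-04 14:00' < /opt/solr/server/logs/solr.log
```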



Re: How to determine why solr stops running?

2020-06-04 Thread Ryan W
Happened again today. Solr stopped running. Apache hasn't stopped in 10
days, so this is not due to a server reboot.

Solr is not being run with the oom-killer.  And when I grep for ERROR in
the logs, there is nothing from today.



Re: How to determine why solr stops running?

2020-05-18 Thread James Greene
I usually do a combination of grepping for ERROR in solr logs and checking
journalctl to see if an external program may have killed the process.


Cheers,

/
*   James Austin Greene
*  www.jamesaustingreene.com
*  336-lol-nerd
/
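A caveat about grepping for ERROR: if the process was killed with SIGKILL (the signal the kernel oom-killer sends, and what `kill -9` sends), the JVM gets no chance to log anything, so an empty ERROR grep does not rule out an external kill. Counting ERROR lines per day can still show whether errors cluster around the days Solr stopped. A minimal sketch, assuming the default Solr log layout where the third whitespace-separated field is the level; `errors_per_day` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count ERROR entries per day in Solr log text on stdin ("date count" lines).
errors_per_day() {
  awk '$3 == "ERROR" { n[$1]++ } END { for (d in n) print d, n[d] }' | LC_ALL=C sort
}

# Hypothetical usage:
#   cat /opt/solr/server/logs/solr.log* | errors_per_day
#   journalctl --no-pager | grep -iE 'solr|java' | grep -iE 'killed|out of memory'
```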




Re: How to determine why solr stops running?

2020-05-18 Thread Erick Erickson
ps aux | grep solr

on a *nix system will show you all the runtime parameters.
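The full `ps` line is long, so it can help to pull out just the flags that matter for memory questions. This bash sketch (hypothetical function name `jvm_flag_report`) prints the `-Xms`/`-Xmx` settings and whether `-XX:OnOutOfMemoryError` is set; because `ps` shows the command line without quoting, only the first word of the hook (normally the script path) is reported. The `[s]olr` pattern in the usage line is a common trick that stops `grep` from matching its own process.

```shell
#!/usr/bin/env bash
# Report heap flags and the OOM hook from a Java command line passed as $1.
jvm_flag_report() {
  local arg hook="none"
  local -a args
  # Split on whitespace without glob expansion (classpaths may contain '*').
  read -r -a args <<< "$1"
  for arg in "${args[@]}"; do
    case $arg in
      -Xms*|-Xmx*) printf '%s\n' "$arg" ;;
      -XX:OnOutOfMemoryError=*) hook=${arg#-XX:OnOutOfMemoryError=} ;;
    esac
  done
  printf 'OnOutOfMemoryError: %s\n' "$hook"
}

# Hypothetical usage:
#   ps -eo args | grep '[s]olr' | while IFS= read -r cmd; do jvm_flag_report "$cmd"; done
```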




Re: How to determine why solr stops running?

2020-05-18 Thread Ryan W
Is there a config file containing the start params?  I run solr like...

bin/solr start

I have not seen anything in the logs that seems informative. When I grep in
the logs directory for 'memory', I see nothing besides a couple entries
like...

2020-05-14 13:05:56.155 INFO  (main) [   ] o.a.s.h.a.MetricsHistoryHandler
No .system collection, keeping metrics history in memory.

I don't know what that entry means, though the date does roughly coincide
with the last time solr stopped running.
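Grepping for one word across the log directory can easily miss the relevant entries. A possibly more useful approach is to print everything Solr logged in a window around the time it stopped and then filter for WARN/ERROR. Entries from the (main) thread are usually written while Solr is starting, so a line like the one above may mark the restart rather than the stop; the minutes before it are likely more informative. Below is a minimal bash/awk sketch, assuming the log line format shown above (a leading "YYYY-MM-DD HH:MM:SS.mmm" timestamp). log_window is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print log lines whose leading "YYYY-MM-DD HH:MM:SS"
# timestamp lies in [FROM, TO] (inclusive, compared as strings).
# Lines without a timestamp (e.g. stack-trace continuations) are printed
# when the timestamped line before them was in range.
# Usage: log_window FROM TO [FILE...]   (reads stdin if no files given)
log_window() {
  local from="$1" to="$2"
  shift 2
  awk -v from="$from" -v to="$to" '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
      ts = substr($0, 1, 19)
      inrange = (ts >= from && ts <= to)
    }
    inrange { print }
  ' "$@"
}

# Example (adjust the path to your solr.log.dir):
# log_window '2020-05-14 12:50:00' '2020-05-14 13:06:00' /opt/solr/server/logs/solr.log* | grep -E 'WARN|ERROR'
```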

Thank you.


On Mon, May 18, 2020 at 12:00 PM Erick Erickson 
wrote:

> Probably, but check that you are running with the oom-killer; it'll be in
> your start params.
>
> But absent that, something external will be the culprit; Solr doesn't stop
> by itself. Do look at the Solr log once things stop; it should show whether
> someone or something stopped it.


Re: How to determine why solr stops running?

2020-05-18 Thread Erick Erickson
Probably, but check that you are running with the oom-killer; it'll be in
your start params.
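One way to check: assume the *nix bin/solr script registers the hook as a -XX:OnOutOfMemoryError=... JVM flag pointing at bin/oom_solr.sh, as the Solr 7.x/8.x scripts do (treat this as an assumption for other versions). Then search the running command line for that flag. Below is a minimal bash sketch; has_oom_hook is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the -XX:OnOutOfMemoryError=... flag found in a
# JVM command line read from stdin; grep exits non-zero if there is none.
# Only the part up to the first space (the script path) is printed, since
# the script's own arguments appear as separate words in ps output.
has_oom_hook() {
  grep -o -- '-XX:OnOutOfMemoryError=[^ ]*'
}

# Example (Linux procps):
# ps -o args= -C java | has_oom_hook || echo "no OnOutOfMemoryError hook found"
```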

But absent that, something external will be the culprit; Solr doesn't stop
by itself. Do look at the Solr log once things stop; it should show whether
someone or something stopped it.
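On Linux, one frequent external culprit is the kernel's own OOM killer, which kills a large process when the machine runs out of RAM. A JVM with a big heap is an obvious target. The kill is a SIGKILL, so Solr gets no chance to log anything about it. Below is a minimal bash sketch that extracts the PIDs of killed java processes from kernel log text. It assumes the "Killed process <pid> (java)" wording printed by current Linux kernels, so treat the exact format as an assumption. kernel_killed_java is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print the PID of
# every java process the kernel OOM killer reports as killed.
kernel_killed_java() {
  sed -nE 's/.*Killed process ([0-9]+) \(java\).*/\1/p'
}

# Example (may need root):  dmesg -T | kernel_killed_java
# or on systemd hosts:      journalctl -k | kernel_killed_java
```

If it prints the PID that ps showed for Solr before it disappeared, the kernel stopped it, not Solr itself.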

On Mon, May 18, 2020, 10:43 Ryan W  wrote:

> I don't see any log file with "oom" in the file name.  Does that mean there
> hasn't been an out-of-memory issue?  Thanks.


Re: How to determine why solr stops running?

2020-05-18 Thread Ryan W
I don't see any log file with "oom" in the file name.  Does that mean there
hasn't been an out-of-memory issue?  Thanks.

On Thu, May 14, 2020 at 10:05 AM James Greene 
wrote:

> Check the log for an OOM crash.  Fatal exceptions will be in the main
> solr log and out of memory errors will be in their own -oom log.
>
> I've encountered quite a few solr crashes, and usually it's when the number
> of concurrent users and/or the amount of indexing crosses some threshold.


Re: How to determine why solr stops running?

2020-05-14 Thread James Greene
Check the log for an OOM crash.  Fatal exceptions will be in the main
solr log and out of memory errors will be in their own -oom log.
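Here is a minimal bash sketch of that check. It assumes the OOM script writes files named like solr_oom_killer-<port>-<timestamp>.log into the Solr log directory, which is the naming bin/oom_solr.sh uses in Solr 7/8 (verify for your version). find_oom_evidence is a hypothetical name; pass it your solr.log.dir.

```bash
#!/usr/bin/env bash
# Hypothetical helper: look for OOM evidence in a Solr log directory.
#  1) list any file whose name contains "oom" (e.g. solr_oom_killer-*.log)
#  2) count OutOfMemoryError lines in solr.log and its rotated copies
find_oom_evidence() {
  local dir="$1"
  echo "OOM-related files:"
  find "$dir" -maxdepth 1 -type f -name '*oom*' -print
  echo "OutOfMemoryError lines per log file:"
  # -H forces "file:count" output even when only one file matches the glob
  grep -Hc 'OutOfMemoryError' "$dir"/solr.log* 2>/dev/null
}

# Example: find_oom_evidence /opt/solr/server/logs
```

An empty file list plus zero counts makes an in-JVM OOM less likely, though it does not rule out the process being killed from outside.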

I've encountered quite a few solr crashes, and usually it's when the number
of concurrent users and/or the amount of indexing crosses some threshold.



On Thu, May 14, 2020, 9:23 AM Ryan W  wrote:

> Hi all,
>
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason.  What else
> causes solr to stop running?  How can I investigate why this is happening?
>
> Thank you,
> Ryan
>