Re: How to determine why solr stops running?
Hi,

Maybe https://github.com/sematext/solr-diagnostics can be of use?

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Mon, Jun 29, 2020 at 3:46 PM Erick Erickson wrote:

> Really look at your cache size settings.
> [...]
Re: How to determine why solr stops running?
Really look at your cache size settings.

This is to eliminate this scenario:
- your cache sizes are very large
- when you looked and the memory was 9G, you also had a lot of cache entries
- there was a commit, which threw out the old cache and reduced your cache size

This is frankly kind of unlikely, but worth checking.

The other option is that you haven’t been hitting OOMs at all and that’s a complete red herring. Let’s say in actuality, you only need an 8G heap or even smaller. By overallocating memory, garbage will simply accumulate for a long time, and when it is eventually collected, _lots_ of memory will be collected.

Another rather unlikely scenario, but again worth checking.

Best,
Erick

> On Jun 29, 2020, at 3:27 PM, Ryan W wrote:
>
> On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson wrote:
>
>> ps aux | grep solr
>
> [solr@faspbsy0002 database-backups]$ ps aux | grep solr
> [...]
Re: How to determine why solr stops running?
On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson wrote:

> ps aux | grep solr

[solr@faspbsy0002 database-backups]$ ps aux | grep solr
solr 72072 1.6 33.4 22847816 10966476 ? Sl 13:35 1:36 java
  -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
  -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
  -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
  -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
  -Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
  -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
  -Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
  -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
  -Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
  -Dsolr.install.dir=/opt/solr
  -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
  -Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
  -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
  -jar start.jar --module=http

> should show you all the parameters Solr is running with, as would the
> admin screen. You should see something like:
>
> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>
> And there should be some logs laying around if that was the case
> similar to:
> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log

This log is not being written. From oom_solr.sh it does appear that a solr_oom_killer-$SOLR_PORT-$NOW.log should be written to the logs directory, but it isn't. There are some log files in /opt/solr/server/logs, and they are indeed being written to. There are fresh entries in the logs, but no sign of any problem. If I grep for oom in the logs directory, the only references I see are benign... just a few entries that list all the flags, and oom_solr.sh is among the settings visible in the entry. And someone did a search for "Mushroom," so there's another instance of oom from that search.

> As for memory, It Depends (tm). There are configurations
> you can make choices about that will affect the heap requirements.
> [...]
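The two checks discussed above (is the OOM hook on the JVM command line, and did the killer script ever leave a log) can be sketched in Python. This is only an illustration: the function names `find_oom_hook` and `oom_killer_logs` are made up for this sketch, and the only details taken from the thread are the `-XX:OnOutOfMemoryError=` flag and the `solr_oom_killer-$SOLR_PORT-$NOW.log` naming pattern.

```python
import glob
import os
import re


def find_oom_hook(cmdline):
    """Return the script given to -XX:OnOutOfMemoryError, or None if the flag is absent."""
    match = re.search(r"-XX:OnOutOfMemoryError=(\S+)", cmdline)
    return match.group(1) if match else None


def oom_killer_logs(log_dir):
    """List files matching the solr_oom_killer-<port>-<timestamp>.log pattern (assumed naming)."""
    pattern = os.path.join(log_dir, "solr_oom_killer-*.log")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Shortened copy of the command line from the ps output in this thread.
    sample = (
        "java -server -Xms16g -Xmx16g "
        "-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs "
        "-jar start.jar --module=http"
    )
    print("OOM hook:", find_oom_hook(sample))
    # An empty list means the killer script has not written a log in that directory.
    print("killer logs:", oom_killer_logs("/opt/solr/server/logs"))
```

If the hook is present but the log list stays empty after a stop, that points away from a JVM-level OutOfMemoryError as the trigger, which is consistent with what was observed here.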
Re: How to determine why solr stops running?
Maybe you can identify some critical queries in the logfiles?
What is the total size of the index?

What client are you using on the web app side? Are you reusing clients or creating a new one for every query?

> On 29.06.2020 at 21:14, Ryan W wrote:
>
> On Mon, Jun 29, 2020 at 1:49 PM David Hastings wrote:
>
>> little nit picky note here, use 31gb, never 32.
>
> Good to know.
>
> Just now I got this output from bin/solr status:
> [...]
Re: How to determine why solr stops running?
On Mon, Jun 29, 2020 at 1:49 PM David Hastings wrote:

> little nit picky note here, use 31gb, never 32.

Good to know.

Just now I got this output from bin/solr status:

  "solr_home":"/opt/solr/server/solr",
  "version":"7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48",
  "startTime":"2020-06-29T17:35:13.966Z",
  "uptime":"0 days, 1 hours, 32 minutes, 7 seconds",
  "memory":"9.3 GB (%57.9) of 16 GB"}

That's the highest memory use I've seen. Not sure if this indicates 16GB isn't enough. Then I ran it again a couple minutes later and it was down to 598.3 MB. I wonder what accounts for these wide swings. I can't imagine that a few users doing searches would suddenly use 9 GB of RAM.

On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:

> It figures it would happen again a couple hours after I suggested the issue
> might be resolved. Just now, Solr stopped running.
> [...]
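To compare readings like the two above over time, the "memory" field can be parsed into numbers. The sketch below assumes the field always looks like the one string quoted in this message ("9.3 GB (%57.9) of 16 GB") and that units are MB or GB; the name `parse_memory` is made up for illustration. A single reading says little on its own, since heap use rises between garbage collections and drops after them.

```python
import re

# Matches strings shaped like: 9.3 GB (%57.9) of 16 GB
MEMORY_RE = re.compile(
    r"(?P<used>[\d.]+)\s*(?P<used_unit>MB|GB)\s*"
    r"\(%(?P<pct>[\d.]+)\)\s*of\s*"
    r"(?P<max>[\d.]+)\s*(?P<max_unit>MB|GB)"
)

# Conversion to megabytes; treating 1 GB as 1024 MB is an assumption.
_TO_MB = {"MB": 1.0, "GB": 1024.0}


def parse_memory(field):
    """Return (used_mb, max_mb) parsed from a status 'memory' string."""
    match = MEMORY_RE.search(field)
    if match is None:
        raise ValueError(f"unrecognised memory field: {field!r}")
    used_mb = float(match.group("used")) * _TO_MB[match.group("used_unit")]
    max_mb = float(match.group("max")) * _TO_MB[match.group("max_unit")]
    return used_mb, max_mb


if __name__ == "__main__":
    used, maximum = parse_memory("9.3 GB (%57.9) of 16 GB")
    print(f"heap in use: {used:.0f} MB of {maximum:.0f} MB ({used / maximum:.1%})")
```

Sampling this every few seconds and looking at the low points after each collection gives a better idea of the live data set than any one high reading.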
Re: How to determine why solr stops running?
ps aux | grep solr

should show you all the parameters Solr is running with, as would the admin screen. You should see something like:

-XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh

And there should be some logs laying around if that was the case, similar to:
$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log

As for memory, It Depends (tm). There are configuration choices you can make that will affect the heap requirements. You can’t really draw comparisons between different projects. Your Drupal + Solr app has how many documents? Indexed how? Searched how? vs. this one.

The usual suspects among configuration settings include:

- filterCache size too large. Each filterCache entry is bounded by maxDoc/8 bytes. I’ve seen people set this to over 1M…

- using non-docValues for fields used for sorting, grouping, function queries or faceting. Solr will uninvert the field on the heap, whereas if you have specified docValues=true, the memory is out in OS memory space rather than heap.

- People just putting too many docs in a collection in a single JVM in aggregate. All replicas in the same instance are using part of the heap.

- Having unnecessary options on your fields, although that’s more MMap space than heap.

The problem basically is that all of Solr’s access is essentially random, so for performance reasons lots of stuff has to be in memory.

That said, Solr hasn’t been as careful as it should be about using up memory; that’s ongoing.

If you really want to know what’s using up memory, throw a heap analysis tool at it. That’ll give you a clue what’s hogging memory and you can go from there.

> On Jun 29, 2020, at 1:48 PM, David Hastings wrote:
>
> little nit picky note here, use 31gb, never 32.
>
> On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:
>
>> It figures it would happen again a couple hours after I suggested the issue
>> might be resolved. Just now, Solr stopped running.
>> [...]
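The filterCache bound mentioned above (each entry is at most maxDoc/8 bytes) is easy to turn into a worst-case estimate. A rough sketch follows; the function name and the example document count are hypothetical and not taken from this thread, and real usage is usually lower because not every entry is stored as a full bitset.

```python
def filter_cache_upper_bound_bytes(max_doc, cache_size):
    """Worst case: every cache entry is a bitset with one bit per document."""
    bytes_per_entry = (max_doc + 7) // 8  # maxDoc/8, rounded up to whole bytes
    return bytes_per_entry * cache_size


if __name__ == "__main__":
    max_doc = 50_000_000  # hypothetical index size, for illustration only
    for size in (512, 16_384, 1_000_000):
        gib = filter_cache_upper_bound_bytes(max_doc, size) / 2**30
        print(f"filterCache size={size:>9,}: up to {gib:,.1f} GiB of heap")
```

The point of the arithmetic is that the bound scales with both the index size and the configured cache size, so a size setting in the millions can exceed any practical heap.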
Re: How to determine why solr stops running?
little nit picky note here, use 31gb, never 32.

On Mon, Jun 29, 2020 at 1:45 PM Ryan W wrote:

> It figures it would happen again a couple hours after I suggested the issue might be resolved. Just now, Solr stopped running. I cleared the cache in my app a couple times around the time that it happened, so perhaps that was somehow too taxing for the server. However, I've never allocated so much RAM to a website before, so it's odd that I'm getting these failures. My colleagues were astonished when I said people on the solr-user list were telling me I might need 32GB just for solr.
>
> I manage another project that uses Drupal + Solr, and we have a total of 8GB of RAM on that server and Solr never, ever stops. I've been managing that site for years and never seen a Solr outage. On that project, Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or more?
>
> "The thing that’s unsettling about this is that assuming you were hitting OOMs, and were running the OOM-killer script, you _should_ have had very clear evidence that that was the cause."
>
> How do I know if I'm running the OOM-killer script?
>
> Thank you.
>
> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote:
>
>> The thing that’s unsettling about this is that assuming you were hitting OOMs, and were running the OOM-killer script, you _should_ have had very clear evidence that that was the cause.
>>
>> If you were not running the killer script, then apologies for not asking about that in the first place. Java’s performance is unpredictable when OOMs happen, which is the point of the killer script: at least Solr stops rather than do something inexplicable.
>>
>> Best,
>> Erick
>>
>>> On Jun 29, 2020, at 11:52 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>
>>> sometimes just throwing money/ram/ssd at the problem is just the best answer.
>>>
>>> On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote:
>>>
>>>> Thanks everyone. Just to give an update on this issue, I bumped the RAM available to Solr up to 16GB a couple weeks ago, and haven’t had any problem since.
>>>>
>>>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>
>>>>> me personally, around 290gb. as much as we could shove into them
>>>>>
>>>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>
>>>>>> How much physical RAM? A rule of thumb is that you should allocate no more than 25-50 percent of the total physical RAM to Solr. That's cumulative, i.e. the sum of the heap allocations across all your JVMs should be below that percentage. See Uwe Schindler's MMapDirectory blog...
>>>>>>
>>>>>> Shot in the dark...
>>>>>>
>>>>>> On Tue, Jun 16, 2020, 11:51 David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>
>>>>>>> To add to this, i generally have solr start with this:
>>>>>>> -Xms31000m -Xmx31000m
>>>>>>>
>>>>>>> and the only other thing that runs on them are MariaDB Galera cluster nodes that are not in use (aside from replication)
>>>>>>>
>>>>>>> the 31gb is not an accident either, you dont want 32gb.
>>>>>>>
>>>>>>> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey wrote:
>>>>>>>
>>>>>>>> On 6/11/2020 11:52 AM, Ryan W wrote:
>>>>>>>>>> I will check "dmesg" first, to find out any hardware error message.
>>>>>>>>>
>>>>>>>>> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
>>>>>>>>> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>>>>>>>>>
>>>>>>>>> Is this a relevant "Out of memory" message? Does this suggest an OOM situation is the culprit?
>>>>>>>>
>>>>>>>> Because this was in the "dmesg" output, it indicates that it is the operating system killing programs because the *system* doesn't have any memory left. It wasn't Java that did this, and it wasn't Solr that was killed. It very well could have been Solr that was killed at another time, though.
>>>>>>>>
>>>>>>>> The process that it killed this time is named httpd ... which is most likely the Apache webserver. Because the UID is 48, this is probably an OS derived from Redhat, where the "apache" user has UID and GID 48 by default. Apache with its default config can be VERY memory hungry when it gets busy.
>>>>>>>>
>>>>>>>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>>>>>>>>
>>>>>>>> This says that you started Solr with the
Re: How to determine why solr stops running?
It figures it would happen again a couple hours after I suggested the issue might be resolved. Just now, Solr stopped running. I cleared the cache in my app a couple times around the time that it happened, so perhaps that was somehow too taxing for the server. However, I've never allocated so much RAM to a website before, so it's odd that I'm getting these failures. My colleagues were astonished when I said people on the solr-user list were telling me I might need 32GB just for solr.

I manage another project that uses Drupal + Solr, and we have a total of 8GB of RAM on that server and Solr never, ever stops. I've been managing that site for years and never seen a Solr outage. On that project, Drupal + Solr is OK with 8GB, but somehow this other project needs 64 GB or more?

"The thing that’s unsettling about this is that assuming you were hitting OOMs, and were running the OOM-killer script, you _should_ have had very clear evidence that that was the cause."

How do I know if I'm running the OOM-killer script?

Thank you.

On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson wrote:

> The thing that’s unsettling about this is that assuming you were hitting
> OOMs, and were running the OOM-killer script, you _should_ have had very
> clear evidence that that was the cause.
>
> If you were not running the killer script, then apologies for not asking
> about that in the first place. Java’s performance is unpredictable when
> OOMs happen, which is the point of the killer script: at least Solr stops
> rather than do something inexplicable.
>
> Best,
> Erick
Re: How to determine why solr stops running?
The thing that’s unsettling about this is that assuming you were hitting OOMs, and were running the OOM-killer script, you _should_ have had very clear evidence that that was the cause.

If you were not running the killer script, then apologies for not asking about that in the first place. Java’s performance is unpredictable when OOMs happen, which is the point of the killer script: at least Solr stops rather than do something inexplicable.

Best,
Erick

> On Jun 29, 2020, at 11:52 AM, David Hastings wrote:
>
> sometimes just throwing money/ram/ssd at the problem is just the best
> answer.
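A rough way to check whether the OOM-killer script is wired in is to look for the `-XX:OnOutOfMemoryError` flag on the running JVM's command line and then look for the log file the script is supposed to leave behind. The sketch below uses the flag, script path, and log file pattern quoted in this thread; the `oom_hook` helper name is hypothetical and exists only for illustration.

```shell
# Print the OnOutOfMemoryError setting found in a JVM command line, if any.
# Exits non-zero when the flag is absent (hypothetical helper, not part of Solr).
oom_hook() {
  printf '%s\n' "$1" | grep -o -e '-XX:OnOutOfMemoryError=[^ ]*'
}

# Example with a shortened command line like the one quoted in this thread:
oom_hook "java -server -Xms16g -Xmx16g -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs -jar start.jar"

# Against a live system (Linux procps assumed; adjust paths to your install):
#   oom_hook "$(ps -o args= -C java)"
#   ls -l /opt/solr/server/logs/solr_oom_killer-*.log
```

If the flag is present but no `solr_oom_killer-*.log` file ever appears, that suggests the JVM did not invoke the script, which points away from a Java heap OOM as the cause of the stop.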
Re: How to determine why solr stops running?
sometimes just throwing money/ram/ssd at the problem is just the best answer.

On Mon, Jun 29, 2020 at 11:38 AM Ryan W wrote:

> Thanks everyone. Just to give an update on this issue, I bumped the RAM
> available to Solr up to 16GB a couple weeks ago, and haven’t had any
> problem since.
Re: How to determine why solr stops running?
Thanks everyone. Just to give an update on this issue, I bumped the RAM available to Solr up to 16GB a couple weeks ago, and haven’t had any problem since.

On Tue, Jun 16, 2020 at 1:00 PM David Hastings wrote:

> me personally, around 290gb. as much as we could shove into them
Re: How to determine why solr stops running?
me personally, around 290gb. as much as we could shove into them

On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson wrote:

> How much physical RAM? A rule of thumb is that you should allocate no more
> than 25-50 percent of the total physical RAM to Solr. That's cumulative,
> i.e. the sum of the heap allocations across all your JVMs should be below
> that percentage. See Uwe Schindler's MMapDirectory blog...
>
> Shot in the dark...
Re: How to determine why solr stops running?
How much physical RAM? A rule of thumb is that you should allocate no more than 25-50 percent of the total physical RAM to Solr. That's cumulative, i.e. the sum of the heap allocations across all your JVMs should be below that percentage. See Uwe Schindler's MMapDirectory blog...

Shot in the dark...

On Tue, Jun 16, 2020, 11:51 David Hastings wrote:

> To add to this, i generally have solr start with this:
> -Xms31000m -Xmx31000m
>
> and the only other thing that runs on them are MariaDB Galera cluster
> nodes that are not in use (aside from replication)
>
> the 31gb is not an accident either, you don't want 32gb.
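The 25-50 percent rule of thumb above is easy to turn into numbers. The sketch below is only arithmetic on the machine's total memory; `heap_budget_mib` is a hypothetical helper name, and the result is a ceiling for the sum of all JVM heaps on the box, not a recommended setting.

```shell
# Print the 25% and 50% marks of physical RAM, in MiB, given MemTotal in kB.
heap_budget_mib() {
  total_kb=$1
  echo "$(( total_kb / 4 / 1024 )) $(( total_kb / 2 / 1024 ))"
}

# A 16 GiB machine (16777216 kB): prints "4096 8192"
heap_budget_mib 16777216

# On Linux, the real total can be read from /proc/meminfo:
#   heap_budget_mib "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

By this rule, a 16GB heap on a server with 16 GB of RAM leaves nothing for the OS page cache that Lucene's memory-mapped index files rely on.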
Re: How to determine why solr stops running?
To add to this, i generally have solr start with this:
-Xms31000m -Xmx31000m

and the only other thing that runs on them are MariaDB Galera cluster nodes that are not in use (aside from replication)

the 31gb is not an accident either, you don't want 32gb.

On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey wrote:

> On 6/11/2020 11:52 AM, Ryan W wrote:
> >> I will check "dmesg" first, to find out any hardware error message.
>
> > [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
> > [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
> >
> > Is this a relevant "Out of memory" message? Does this suggest an OOM
> > situation is the culprit?
>
> Because this was in the "dmesg" output, it indicates that it is the
> operating system killing programs because the *system* doesn't have any
> memory left. It wasn't Java that did this, and it wasn't Solr that was
> killed. It very well could have been Solr that was killed at another
> time, though.
>
> The process that it killed this time is named httpd ... which is most
> likely the Apache webserver. Because the UID is 48, this is probably an
> OS derived from Redhat, where the "apache" user has UID and GID 48 by
> default. Apache with its default config can be VERY memory hungry when
> it gets busy.
>
> > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>
> This says that you started Solr with the default 512MB heap. Which is
> VERY VERY small. The default is small so that Solr will start on
> virtually any hardware. Almost every user must increase the heap size.
> And because the OS is killing processes, it is likely that the system
> does not have enough memory installed for what you have running on it.
>
> It is generally not a good idea to share the server hardware between
> Solr and other software, unless the system has a lot of spare resources,
> memory in particular.
>
> Thanks,
> Shawn
Re: How to determine why solr stops running?
On 6/11/2020 11:52 AM, Ryan W wrote:
>> I will check "dmesg" first, to find out any hardware error message.

> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>
> Is this a relevant "Out of memory" message? Does this suggest an OOM
> situation is the culprit?

Because this was in the "dmesg" output, it indicates that it is the operating system killing programs because the *system* doesn't have any memory left. It wasn't Java that did this, and it wasn't Solr that was killed. It very well could have been Solr that was killed at another time, though.

The process that it killed this time is named httpd ... which is most likely the Apache webserver. Because the UID is 48, this is probably an OS derived from Redhat, where the "apache" user has UID and GID 48 by default. Apache with its default config can be VERY memory hungry when it gets busy.

> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912

This says that you started Solr with the default 512MB heap. Which is VERY VERY small. The default is small so that Solr will start on virtually any hardware. Almost every user must increase the heap size. And because the OS is killing processes, it is likely that the system does not have enough memory installed for what you have running on it.

It is generally not a good idea to share the server hardware between Solr and other software, unless the system has a lot of spare resources, memory in particular.

Thanks,
Shawn
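The byte counts in the JVM flags above are easier to read once converted. The snippet below is a minimal sketch of that conversion, followed by a commented-out search of the kernel log for OOM-killer events like the ones quoted; `bytes_to_mib` is a hypothetical helper name used only here.

```shell
# Convert a JVM byte count (e.g. from -XX:MaxHeapSize=...) to whole mebibytes.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mib 536870912   # prints 512, i.e. the default 512MB heap

# Kernel OOM-killer events (as opposed to a Java OutOfMemoryError) appear in
# the kernel log; on Linux they can be searched for with:
#   dmesg -T | grep -i -E 'out of memory|killed process'
```

A match naming `java` in that output would mean the operating system killed Solr directly, in which case Solr's own logs and the `oom_solr.sh` log would show nothing.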
Re: How to determine why solr stops running?
17728 -XX:MaxTenuringThreshold=8 >> >>>> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728 >> >>>> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184 >> >>>> -XX:-OmitStackTraceInFastThrow >> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 >> >>> /opt/solr/server/logs >> >>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled >> >>>> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC >> >>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps >> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC >> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 >> >>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256 >> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers >> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC >> -XX:+UseGCLogFileRotation >> >>>> -XX:+UseParNewGC >> >>>> >> >>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". >> >>> But I >> >>>> think this is just a setting that indicates what to do in case of an >> >>> OOM. >> >>>> And if I look in that oom_solr.sh file, I see it would write an entry >> >>> to a >> >>>> solr_oom_kill log. And there is no such log in the logs directory. >> >>>> >> >>>> Many thanks. >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> Then use some system admin tools to monitor that server, >> >>>>> for instance, top, vmstat, lsof, iostat ... or simply install some >> nice >> >>>>> free monitoring tool into this system, like monit, monitorix, >> nagios. >> >>>>> Good luck! >> >>>>> >> >>>>> >> >>>>> From: Ryan W >> >>>>> Sent: Thursday, June 11, 2020 2:13 AM >> >>>>> To: solr-user@lucene.apache.org >> >>>>> Subject: Re: How to determine why solr stops running? >> >>>>> >> >>>>> Hi all, >> >>>>> >> >>>>> People keep suggesting I check the logs for errors. What do those >> >>> errors >> >>>>> look like? Does anyone have examples of the text of a Solr oom >> >>> error? 
Or >> >>>>> the text of any other errors I should be looking for the next time >> solr >> >>>>> fails? Are there phrases I should grep for in the logs? Should I >> be >> >>>>> looking in the Solr logs for an OOM error, or in the Apache logs? >> >>>>> >> >>>>> There is nothing failing on the server except for solr -- at least >> not >> >>> that >> >>>>> I can see. There is no apparent problem with the hardware or >> anything >> >>> else >> >>>>> on the server. The OS is Red Hat Enterprise Linux. The server has >> 16 >> >>> GB of >> >>>>> RAM and hosts one website that does not get a huge amount of >> traffic. >> >>>>> >> >>>>> When the start command is given to solr, does it first check to see >> if >> >>> solr >> >>>>> is running, or does it always start solr whether it is already >> running >> >>> or >> >>>>> not? >> >>>>> >> >>>>> Many thanks! >> >>>>> Ryan >> >>>>> >> >>>>> >> >>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson < >> erickerick...@gmail.com >> >>>> >> >>>>> wrote: >> >>>>> >> >>>>>> To add to what Dave said, if you have a particular machine that’s >> >>> prone >> >>>>> to >> >>>>>> suddenly stopping, that’s usually a red flag that you should >> seriously >> >>>>>> think about hardware issues. >> >>>>>> >> >>>>>> If the problem strikes different machines, then I agree with Shawn >> >>> that >> >>>>>> the first thing I’d be suspicious of is OOM
Re: How to determine why solr stops running?
; -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8 > >>>> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728 > >>>> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184 > >>>> -XX:-OmitStackTraceInFastThrow > >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 > >>> /opt/solr/server/logs > >>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled > >>>> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC > >>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps > >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC > >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 > >>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256 > >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers > >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC > -XX:+UseGCLogFileRotation > >>>> -XX:+UseParNewGC > >>>> > >>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". > >>> But I > >>>> think this is just a setting that indicates what to do in case of an > >>> OOM. > >>>> And if I look in that oom_solr.sh file, I see it would write an entry > >>> to a > >>>> solr_oom_kill log. And there is no such log in the logs directory. > >>>> > >>>> Many thanks. > >>>> > >>>> > >>>> > >>>> > >>>>> Then use some system admin tools to monitor that server, > >>>>> for instance, top, vmstat, lsof, iostat ... or simply install some > nice > >>>>> free monitoring tool into this system, like monit, monitorix, nagios. > >>>>> Good luck! > >>>>> > >>>>> > >>>>> From: Ryan W > >>>>> Sent: Thursday, June 11, 2020 2:13 AM > >>>>> To: solr-user@lucene.apache.org > >>>>> Subject: Re: How to determine why solr stops running? > >>>>> > >>>>> Hi all, > >>>>> > >>>>> People keep suggesting I check the logs for errors. What do those > >>> errors > >>>>> look like? Does anyone have examples of the text of a Solr oom > >>> error? 
Or > >>>>> the text of any other errors I should be looking for the next time > solr > >>>>> fails? Are there phrases I should grep for in the logs? Should I be > >>>>> looking in the Solr logs for an OOM error, or in the Apache logs? > >>>>> > >>>>> There is nothing failing on the server except for solr -- at least > not > >>> that > >>>>> I can see. There is no apparent problem with the hardware or > anything > >>> else > >>>>> on the server. The OS is Red Hat Enterprise Linux. The server has 16 > >>> GB of > >>>>> RAM and hosts one website that does not get a huge amount of traffic. > >>>>> > >>>>> When the start command is given to solr, does it first check to see > if > >>> solr > >>>>> is running, or does it always start solr whether it is already > running > >>> or > >>>>> not? > >>>>> > >>>>> Many thanks! > >>>>> Ryan > >>>>> > >>>>> > >>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson < > erickerick...@gmail.com > >>>> > >>>>> wrote: > >>>>> > >>>>>> To add to what Dave said, if you have a particular machine that’s > >>> prone > >>>>> to > >>>>>> suddenly stopping, that’s usually a red flag that you should > seriously > >>>>>> think about hardware issues. > >>>>>> > >>>>>> If the problem strikes different machines, then I agree with Shawn > >>> that > >>>>>> the first thing I’d be suspicious of is OOM errors. > >>>>>> > >>>>>> FWIW, > >>>>>> Erick > >>>>>> > >>>>>>> On Jun 9, 2020, at 6:05 AM, Dave > >>> wrote: > >>>>>>> > >>>>>>> I’ll add that whenever I’ve had a solr instance shut down, for me > >>&g
Re: How to determine why solr stops running?
ngDistribution -XX:SurvivorRatio=4
>>>> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
>>>> -XX:+UseParNewGC
>>>>
>>>> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I
>>>> think this is just a setting that indicates what to do in case of an OOM.
>>>> And if I look in that oom_solr.sh file, I see it would write an entry to a
>>>> solr_oom_kill log. And there is no such log in the logs directory.
>>>>
>>>> Many thanks.
>>>>
>>>>> Then use some system admin tools to monitor that server,
>>>>> for instance, top, vmstat, lsof, iostat ... or simply install some nice
>>>>> free monitoring tool into this system, like monit, monitorix, nagios.
>>>>> Good luck!
>>>>>
>>>>> From: Ryan W
>>>>> Sent: Thursday, June 11, 2020 2:13 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: How to determine why solr stops running?
>>>>>
>>>>> Hi all,
>>>>>
>>>>> People keep suggesting I check the logs for errors. What do those errors
>>>>> look like? Does anyone have examples of the text of a Solr oom error? Or
>>>>> the text of any other errors I should be looking for the next time solr
>>>>> fails? Are there phrases I should grep for in the logs? Should I be
>>>>> looking in the Solr logs for an OOM error, or in the Apache logs?
>>>>>
>>>>> There is nothing failing on the server except for solr -- at least not that
>>>>> I can see. There is no apparent problem with the hardware or anything else
>>>>> on the server. The OS is Red Hat Enterprise Linux. The server has 16 GB of
>>>>> RAM and hosts one website that does not get a huge amount of traffic.
>>>>>
>>>>> When the start command is given to solr, does it first check to see if solr
>>>>> is running, or does it always start solr whether it is already running or
>>>>> not?
>>>>>
>>>>> Many thanks!
>>>>> Ryan
>>>>>
>>>>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson wrote:
>>>>>
>>>>>> To add to what Dave said, if you have a particular machine that’s prone to
>>>>>> suddenly stopping, that’s usually a red flag that you should seriously
>>>>>> think about hardware issues.
>>>>>>
>>>>>> If the problem strikes different machines, then I agree with Shawn that
>>>>>> the first thing I’d be suspicious of is OOM errors.
>>>>>>
>>>>>> FWIW,
>>>>>> Erick
>>>>>>
>>>>>>> On Jun 9, 2020, at 6:05 AM, Dave wrote:
>>>>>>>
>>>>>>> I’ll add that whenever I’ve had a solr instance shut down, for me it’s
>>>>>>> been a hardware failure. Either the ram or the disk got a “glitch” and both
>>>>>>> of these are relatively fragile and wear and tear type parts of the
>>>>>>> machine, and should be expected to fail and be replaced from time to time.
>>>>>>> Solr is pretty aggressive with its logging so there are a lot of writes
>>>>>>> always happening and of course reads, if the disk has any issues or the
>>>>>>> memory it can lock it up and bring her down, more so if you have any
>>>>>>> spellcheck dictionaries or suggesters being built on start up.
>>>>>>>
>>>>>>> Just my experience with this, could be wrong (most likely wrong) but we
>>>>>>> always have extra drives and memory around the server room for this
>>>>>>> reason. At least once or twice a year we will have a disk failure in the
>>>>>>> raid and need to swap in a new one.
Re: How to determine why solr stops running?
It happened again today. Again, no other apparent problems on the server. Nothing else is stopping. Nothing in the logs that strikes me as useful. I'm using Red Hat Linux 7.8 and Solr 7.7.2.

Solr is stopping a couple times per week and I don't know how to determine why.

On Sun, Jun 14, 2020 at 9:41 AM Ryan W wrote:

> Thank you. I pasted those settings at the end of my /etc/default/solr.in.sh
> just now and restarted solr. I will see if that fixes it. Previously, I had
> no settings at all in solr.in.sh except for SOLR_PORT.
>
> On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood wrote:
>
>> 1. You have a tiny heap. 536 Megabytes is not enough.
>> 2. I stopped using the CMS GC years ago.
>>
>> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
>> still on Java 8, but will be upgrading soon.
>>
>> SOLR_HEAP=8g
>> # Use G1 GC -- wunder 2017-01-23
>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> GC_TUNE=" \
>> -XX:+UseG1GC \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=8m \
>> -XX:MaxGCPauseMillis=200 \
>> -XX:+UseLargePages \
>> -XX:+AggressiveOpts \
>> "
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>> > On Jun 11, 2020, at 10:52 AM, Ryan W wrote:
>> >
>> > On Wed, Jun 10, 2020 at 8:35 PM Hup Chen wrote:
>> >
>> >> I will check "dmesg" first, to find out any hardware error message.
Re: How to determine why solr stops running?
Thank you. I pasted those settings at the end of my /etc/default/solr.in.sh just now and restarted solr. I will see if that fixes it. Previously, I had no settings at all in solr.in.sh except for SOLR_PORT.

On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood wrote:

> 1. You have a tiny heap. 536 Megabytes is not enough.
> 2. I stopped using the CMS GC years ago.
>
> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
> still on Java 8, but will be upgrading soon.
>
> SOLR_HEAP=8g
> # Use G1 GC -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Jun 11, 2020, at 10:52 AM, Ryan W wrote:
> >
> > On Wed, Jun 10, 2020 at 8:35 PM Hup Chen wrote:
> >
> >> I will check "dmesg" first, to find out any hardware error message.
Re: How to determine why solr stops running?
1. You have a tiny heap. 536 Megabytes is not enough.
2. I stopped using the CMS GC years ago.

Here is the GC config we use on every one of our 150+ Solr hosts. We’re still on Java 8, but will be upgrading soon.

SOLR_HEAP=8g
# Use G1 GC -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 11, 2020, at 10:52 AM, Ryan W wrote:
>
> On Wed, Jun 10, 2020 at 8:35 PM Hup Chen wrote:
>
>> I will check "dmesg" first, to find out any hardware error message.
>
> Here is what I see toward the end of the output from dmesg:
>
> [1521232.781785] [118857]48 118857 108785 677 201 901 0 httpd
> [1521232.781787] [118860]48 118860 108785 710 201 881 0 httpd
> [1521232.781788] [118862]48 118862 113063 5256 210 725 0 httpd
> [1521232.781790] [118864]48 118864 114085 6634 212 703 0 httpd
> [1521232.781791] [118871]48 118871 13968732323 262 620 0 httpd
> [1521232.781793] [118873]48 118873 108785 821 201 792 0 httpd
> [1521232.781795] [118879]48 118879 14026332719 263 621 0 httpd
> [1521232.781796] [118903]48 118903 108785 812 201 771 0 httpd
> [1521232.781798] [118905]48 118905 113575 5606 211 660 0 httpd
> [1521232.781800] [118906]48 118906 113563 5694 211 626 0 httpd
> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>
> Is this a relevant "Out of memory" message? Does this suggest an OOM
> situation is the culprit?
>
> When I grep in the solr logs for oom, I see some entries like this...
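The heap size in the GC log quoted in this thread is given in bytes (-XX:MaxHeapSize=536870912), which works out to 512 MiB. As a rough sketch of how to check what heap a Solr JVM actually got, for example after changing SOLR_HEAP and restarting, the function below reads JVM flags on stdin and prints the maximum heap in MiB. The name max_heap_mib is made up for illustration and is not part of Solr.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): read JVM flags on stdin, for
# example a "CommandLine flags:" line from solr_gc.log, and print the
# maximum heap size in MiB.
max_heap_mib() {
    # One flag per line, keep the byte count of the first MaxHeapSize flag.
    tr ' ' '\n' |
        sed -n 's/^-XX:MaxHeapSize=\([0-9][0-9]*\)$/\1/p' |
        head -n 1 |
        while read -r bytes; do
            echo $((bytes / 1024 / 1024))
        done
}
```

Assuming the log location shown in this thread, one way to use it would be: grep -h 'CommandLine flags' /opt/solr/server/logs/solr_gc.log* | max_heap_mib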
Re: How to determine why solr stops running?
On Wed, Jun 10, 2020 at 8:35 PM Hup Chen wrote:

> I will check "dmesg" first, to find out any hardware error message.

Here is what I see toward the end of the output from dmesg:

[1521232.781785] [118857]48 118857 108785 677 201 901 0 httpd
[1521232.781787] [118860]48 118860 108785 710 201 881 0 httpd
[1521232.781788] [118862]48 118862 113063 5256 210 725 0 httpd
[1521232.781790] [118864]48 118864 114085 6634 212 703 0 httpd
[1521232.781791] [118871]48 118871 13968732323 262 620 0 httpd
[1521232.781793] [118873]48 118873 108785 821 201 792 0 httpd
[1521232.781795] [118879]48 118879 14026332719 263 621 0 httpd
[1521232.781796] [118903]48 118903 108785 812 201 771 0 httpd
[1521232.781798] [118905]48 118905 113575 5606 211 660 0 httpd
[1521232.781800] [118906]48 118906 113563 5694 211 626 0 httpd
[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB

Is this a relevant "Out of memory" message? Does this suggest an OOM situation is the culprit?

When I grep in the solr logs for oom, I see some entries like this...

./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
-XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
-XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
-XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
-XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
-XX:-OmitStackTraceInFastThrow
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
-XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
-XX:+UseParNewGC

Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I think this is just a setting that indicates what to do in case of an OOM. And if I look in that oom_solr.sh file, I see it would write an entry to a solr_oom_kill log. And there is no such log in the logs directory.

Many thanks.

> Then use some system admin tools to monitor that server,
> for instance, top, vmstat, lsof, iostat ... or simply install some nice
> free monitoring tool into this system, like monit, monitorix, nagios.
> Good luck!
>
> From: Ryan W
> Sent: Thursday, June 11, 2020 2:13 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to determine why solr stops running?
>
> Hi all,
>
> People keep suggesting I check the logs for errors. What do those errors
> look like? Does anyone have examples of the text of a Solr oom error? Or
> the text of any other errors I should be looking for the next time solr
> fails?
Re: How to determine why solr stops running?
On 6/10/2020 12:13 PM, Ryan W wrote:
> People keep suggesting I check the logs for errors. What do those errors
> look like? Does anyone have examples of the text of a Solr oom error? Or
> the text of any other errors I should be looking for the next time solr
> fails? Are there phrases I should grep for in the logs? Should I be
> looking in the Solr logs for an OOM error, or in the Apache logs?

Are you running Solr on Windows? If you are, then a Java OOME will NOT cause Solr to stop. On pretty much any other operating system, Solr will terminate when OOME occurs. This termination will create a separate logfile, one that contains very little actual information; really the only thing it says is that the oom killer script was executed. That logfile will have a filename like the following:

solr_oom_killer-8983-2019-08-11_22_57_56.log

If OOME is the reason Solr stops running, then the only place that exception will be logged is solr.log as far as I know ... but there exists a very real possibility that it won't actually be logged. It could occur at a place in the code that does not have any logging.

At the URL below is an example of a logged OOME on a Solr server. In this case, it wasn't memory that was exhausted; the error was logging an inability to start a new thread:

https://paste.apache.org/aznyg

Thanks,
Shawn
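The two checks described above can be sketched as a small shell function. This is only an illustration: the name check_solr_oom is made up, the default directory /opt/solr/server/logs is the one that appears in the command lines quoted in this thread, and matching rotated files with solr.log* is an assumption. Searching for the full exception name instead of "oom" avoids hits on unrelated text such as a search for "Mushroom".

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): look for evidence of a Java
# OutOfMemoryError in a Solr logs directory. Returns 0 if any is found.
check_solr_oom() {
    logdir="${1:-/opt/solr/server/logs}"
    found=1

    # The oom killer script leaves a marker file named like
    # solr_oom_killer-8983-2019-08-11_22_57_56.log
    for f in "$logdir"/solr_oom_killer-*.log; do
        if [ -e "$f" ]; then
            echo "oom killer log: $f"
            found=0
        fi
    done

    # The exception itself, if it was logged at all, should be in solr.log.
    # grep -l prints the names of the files that contain a match.
    if grep -lF 'java.lang.OutOfMemoryError' "$logdir"/solr.log* 2>/dev/null; then
        found=0
    fi

    return "$found"
}
```

A non-zero return only means no evidence was found in that directory; as noted above, an OOME is not guaranteed to be logged.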
Re: How to determine why solr stops running?
I will check "dmesg" first, to find out any hardware error message.
Then use some system admin tools to monitor that server, for instance, top, vmstat, lsof, iostat ... or simply install some nice free monitoring tool into this system, like monit, monitorix, nagios.
Good luck!

From: Ryan W
Sent: Thursday, June 11, 2020 2:13 AM
To: solr-user@lucene.apache.org
Subject: Re: How to determine why solr stops running?

Hi all,

People keep suggesting I check the logs for errors. What do those errors look like? Does anyone have examples of the text of a Solr oom error? Or the text of any other errors I should be looking for the next time solr fails? Are there phrases I should grep for in the logs? Should I be looking in the Solr logs for an OOM error, or in the Apache logs?

There is nothing failing on the server except for solr -- at least not that I can see. There is no apparent problem with the hardware or anything else on the server. The OS is Red Hat Enterprise Linux. The server has 16 GB of RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr is running, or does it always start solr whether it is already running or not?

Many thanks!
Ryan

On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson wrote:

> To add to what Dave said, if you have a particular machine that’s prone to
> suddenly stopping, that’s usually a red flag that you should seriously
> think about hardware issues.
>
> If the problem strikes different machines, then I agree with Shawn that
> the first thing I’d be suspicious of is OOM errors.
>
> FWIW,
> Erick
>
> > On Jun 9, 2020, at 6:05 AM, Dave wrote:
> >
> > I’ll add that whenever I’ve had a solr instance shut down, for me it’s
> > been a hardware failure. Either the ram or the disk got a “glitch” and
> > both of these are relatively fragile and wear and tear type parts of the
> > machine, and should be expected to fail and be replaced from time to time.
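For the monitoring suggestion at the top of this message, a minimal sketch of sampling a process's resident memory on Linux is shown below. It reads the VmRSS field from /proc/PID/status. The function name sample_rss and its optional second argument (an alternative proc root, useful for trying it against a copied status file) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one "timestamp pid rss_kb" sample for a process.
# Linux only: reads the VmRSS line (reported in kB) from /proc/PID/status.
sample_rss() {
    pid="$1"
    proc="${2:-/proc}"
    rss=$(awk '/^VmRSS:/ {print $2}' "$proc/$pid/status" 2>/dev/null)
    [ -n "$rss" ] || return 1    # no such process, or no VmRSS line
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $pid $rss"
}
```

Run from cron or a loop and appended to a file, samples like these give a memory history to look at after the next unexplained stop. The tools named above (top, vmstat, monit and so on) cover the same ground with less effort.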
Re: How to determine why solr stops running?
Hi all,

People keep suggesting I check the logs for errors. What do those errors look like? Does anyone have examples of the text of a Solr oom error? Or the text of any other errors I should be looking for the next time solr fails? Are there phrases I should grep for in the logs? Should I be looking in the Solr logs for an OOM error, or in the Apache logs?

There is nothing failing on the server except for solr -- at least not that I can see. There is no apparent problem with the hardware or anything else on the server. The OS is Red Hat Enterprise Linux. The server has 16 GB of RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr is running, or does it always start solr whether it is already running or not?

Many thanks!
Ryan

On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson wrote:

> To add to what Dave said, if you have a particular machine that’s prone to
> suddenly stopping, that’s usually a red flag that you should seriously
> think about hardware issues.
>
> If the problem strikes different machines, then I agree with Shawn that
> the first thing I’d be suspicious of is OOM errors.
>
> FWIW,
> Erick
>
> > On Jun 9, 2020, at 6:05 AM, Dave wrote:
> >
> > I’ll add that whenever I’ve had a solr instance shut down, for me it’s
> > been a hardware failure. Either the ram or the disk got a “glitch” and
> > both of these are relatively fragile and wear and tear type parts of the
> > machine, and should be expected to fail and be replaced from time to time.
> > Solr is pretty aggressive with its logging so there are a lot of writes
> > always happening and of course reads, if the disk has any issues or the
> > memory it can lock it up and bring her down, more so if you have any
> > spellcheck dictionaries or suggesters being built on start up.
Re: How to determine why solr stops running?
To add to what Dave said, if you have a particular machine that’s prone to suddenly stopping, that’s usually a red flag that you should seriously think about hardware issues.

If the problem strikes different machines, then I agree with Shawn that the first thing I’d be suspicious of is OOM errors.

FWIW,
Erick

> On Jun 9, 2020, at 6:05 AM, Dave wrote:
>
> I’ll add that whenever I’ve had a solr instance shut down, for me it’s been a
> hardware failure. Either the ram or the disk got a “glitch” and both of these
> are relatively fragile and wear and tear type parts of the machine, and
> should be expected to fail and be replaced from time to time. Solr is pretty
> aggressive with its logging so there are a lot of writes always happening and
> of course reads, if the disk has any issues or the memory it can lock it up
> and bring her down, more so if you have any spellcheck dictionaries or
> suggesters being built on start up.
>
> Just my experience with this, could be wrong (most likely wrong) but we
> always have extra drives and memory around the server room for this reason.
> At least once or twice a year we will have a disk failure in the raid and
> need to swap in a new one.
>
> Good luck though, also solr should be logging its failures so it would be
> good to look there too
>
>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey wrote:
>>
>> On 5/14/2020 7:22 AM, Ryan W wrote:
>>> I manage a site where solr has stopped running a couple times in the past
>>> week. The server hasn't been rebooted, so that's not the reason. What else
>>> causes solr to stop running? How can I investigate why this is happening?
>>
>> Any situation where Solr stops running and nobody requested the stop is a
>> result of a serious problem that must be thoroughly investigated. I think
>> it's a bad idea for Solr to automatically restart when it stops
>> unexpectedly. Chances are that whatever caused the crash is going to simply
>> make the crash happen again until the problem is solved. Automatically
>> restarting could hide problems from the system administrator.
>>
>> The only way a Solr auto-restart would be acceptable to me is if it sends a
>> high priority alert to the sysadmin EVERY time it executes an auto-restart.
>> It really is that bad of a problem.
>>
>> The causes of Solr crashes (that I can think of) include the following. I
>> believe I have listed these four options from most likely to least likely:
>>
>> * Java OutOfMemoryError exceptions. On non-Windows systems, the "bin/solr"
>> script starts Solr with an option that results in Solr's death anytime one
>> of these exceptions occurs. We do this because program operation is
>> indeterminate and completely unpredictable when OOME occurs, so it's far
>> safer to stop running. That exception can be caused by several things, some
>> of which actually do not involve memory at all. If you're running on
>> Windows via the bin\solr.cmd command, then this will not happen ... but OOME
>> could still cause a crash, because as I already mentioned, program operation
>> is unpredictable when OOME occurs.
>>
>> * The OS kills Solr because system memory is completely exhausted and Solr
>> is the process using the most memory. Linux calls this the "oom-killer" ...
>> I am pretty sure something like it exists on most operating systems.
>>
>> * Corruption somewhere in the system. Could be in Java, the OS, Solr, or
>> data used by any of those.
>>
>> * A very serious bug in Solr's code that we haven't discovered yet.
>>
>> I included that last one simply for completeness. A bug that causes a crash
>> *COULD* exist, but as of right now, we have not seen any supporting evidence.
>>
>> My guess is that Java OutOfMemoryError is the cause here, but I can't be
>> certain. If that is happening, then some resource (which might not be
>> memory) is fully depleted. We would need to see the full OutOfMemoryError
>> exception in order to determine why it is happening. Sometimes the exception
>> is logged in solr.log, sometimes it isn't. We cannot predict what part of
>> the code will be running when OOME occurs, so it would be nearly impossible
>> for us to guarantee logging. OOME can happen ANYWHERE - even in code that
>> the compiler thinks is immune to exceptions.
>>
>> Side note to fellow committers: I wonder if we should implement an uncaught
>> exception handler in Solr. I have found in my own programs that it helps
>> figure out thorny problems. And while I am on the subject of handlers that
>> might not be general knowledge, I didn't find a shutdown hook or a security
>> manager outside of tests.
>>
>> Thanks,
>> Shawn
Re: How to determine why solr stops running?
I’ll add that whenever I’ve had a solr instance shut down, for me it’s been a hardware failure. Either the RAM or the disk got a “glitch”; both of these are relatively fragile, wear-and-tear parts of the machine, and should be expected to fail and be replaced from time to time. Solr is pretty aggressive with its logging, so there are a lot of writes always happening, and of course reads. If the disk or the memory has any issues, it can lock Solr up and bring it down, more so if you have any spellcheck dictionaries or suggesters being built on startup.

Just my experience with this, could be wrong (most likely wrong), but we always have extra drives and memory around the server room for this reason. At least once or twice a year we will have a disk failure in the RAID and need to swap in a new one.

Good luck though. Also, Solr should be logging its failures, so it would be good to look there too.

> On Jun 9, 2020, at 2:35 AM, Shawn Heisey wrote:
>
> On 5/14/2020 7:22 AM, Ryan W wrote:
>> I manage a site where solr has stopped running a couple times in the past
>> week. The server hasn't been rebooted, so that's not the reason. What else
>> causes solr to stop running? How can I investigate why this is happening?
>
> Any situation where Solr stops running and nobody requested the stop is a
> result of a serious problem that must be thoroughly investigated. I think
> it's a bad idea for Solr to automatically restart when it stops unexpectedly.
> Chances are that whatever caused the crash is going to simply make the crash
> happen again until the problem is solved. Automatically restarting could hide
> problems from the system administrator.
Re: How to determine why solr stops running?
On 5/14/2020 7:22 AM, Ryan W wrote:
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason. What else
> causes solr to stop running? How can I investigate why this is happening?

Any situation where Solr stops running and nobody requested the stop is a result of a serious problem that must be thoroughly investigated. I think it's a bad idea for Solr to automatically restart when it stops unexpectedly. Chances are that whatever caused the crash is going to simply make the crash happen again until the problem is solved. Automatically restarting could hide problems from the system administrator.

The only way a Solr auto-restart would be acceptable to me is if it sends a high priority alert to the sysadmin EVERY time it executes an auto-restart. It really is that bad of a problem.

The causes of Solr crashes (that I can think of) include the following. I believe I have listed these four options from most likely to least likely:

* Java OutOfMemoryError exceptions. On non-Windows systems, the "bin/solr" script starts Solr with an option that results in Solr's death anytime one of these exceptions occurs. We do this because program operation is indeterminate and completely unpredictable when OOME occurs, so it's far safer to stop running. That exception can be caused by several things, some of which actually do not involve memory at all. If you're running on Windows via the bin\solr.cmd command, then this will not happen ... but OOME could still cause a crash, because as I already mentioned, program operation is unpredictable when OOME occurs.

* The OS kills Solr because system memory is completely exhausted and Solr is the process using the most memory. Linux calls this the "oom-killer" ... I am pretty sure something like it exists on most operating systems.

* Corruption somewhere in the system. Could be in Java, the OS, Solr, or data used by any of those.

* A very serious bug in Solr's code that we haven't discovered yet.

I included that last one simply for completeness. A bug that causes a crash *COULD* exist, but as of right now, we have not seen any supporting evidence.

My guess is that Java OutOfMemoryError is the cause here, but I can't be certain. If that is happening, then some resource (which might not be memory) is fully depleted. We would need to see the full OutOfMemoryError exception in order to determine why it is happening. Sometimes the exception is logged in solr.log, sometimes it isn't. We cannot predict what part of the code will be running when OOME occurs, so it would be nearly impossible for us to guarantee logging. OOME can happen ANYWHERE - even in code that the compiler thinks is immune to exceptions.

Side note to fellow committers: I wonder if we should implement an uncaught exception handler in Solr. I have found in my own programs that it helps figure out thorny problems. And while I am on the subject of handlers that might not be general knowledge, I didn't find a shutdown hook or a security manager outside of tests.

Thanks,
Shawn
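The second cause in the list above (the Linux kernel's oom-killer, as opposed to a Java OutOfMemoryError) leaves its trace in the kernel log rather than in Solr's own logs. A minimal sketch of how one might check for it is below. The helper name filter_oom_killer is made up for illustration, and the patterns are an assumption covering common kernel message wordings; they may need adjusting for a given kernel version.

```shell
# Sketch: filter kernel log text (read from stdin) for Linux OOM-killer activity.
# filter_oom_killer is a hypothetical helper name, not part of Solr or the OS.
filter_oom_killer() {
  # Matches "... invoked oom-killer ..." and both the older
  # "Out of memory: Kill process" and newer "Out of memory: Killed process" forms.
  grep -E 'invoked oom-killer|Out of memory: Kill(ed)? process' || true
}

# Typical use on a Linux host (reading the kernel log may require root):
#   dmesg -T | filter_oom_killer
#   journalctl -k --no-pager | filter_oom_killer
```

If this prints a line naming the java process around the time Solr disappeared, the OS killed it; empty output only means no such message is still in the kernel log.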
Re: How to determine why solr stops running?
I assumed it does, based on your description. If you installed it as a service (systemd), then systemd can start the service again if it fails (something like Restart=always in your [Service] definition).

But if it doesn’t restart automatically now, I think it’s easier to troubleshoot: just check the last logs after it crashed.

Best regards,
Radu
https://sematext.com

> On 8 Jun 2020, at 16:28, Ryan W wrote:
>
> "If Solr auto-restarts"
>
> It doesn't auto-restart. Is there some auto-restart functionality? I'm
> not aware of that.
Re: How to determine why solr stops running?
"If Solr auto-restarts"

It doesn't auto-restart. Is there some auto-restart functionality? I'm not aware of that.

On Mon, Jun 8, 2020 at 7:10 AM Radu Gheorghe wrote:

> Hi Ryan,
>
> If Solr auto-restarts, I suppose it's systemd doing that. When it restarts
> the Solr service, systemd should log this (maybe something like: journalctl
> --no-pager | grep -i solr).
>
> Then you can go in your Solr logs and check what happened right before that
> time. Also, check system logs for what happened before Solr was restarted.
>
> Best regards,
> Radu
>
> https://sematext.com/
Re: How to determine why solr stops running?
Hi Ryan,

If Solr auto-restarts, I suppose it's systemd doing that. When it restarts the Solr service, systemd should log this (maybe something like: journalctl --no-pager | grep -i solr).

Then you can go in your Solr logs and check what happened right before that time. Also, check system logs for what happened before Solr was restarted.

Best regards,
Radu

https://sematext.com/

On Thu, 4 Jun 2020, 19:24, Ryan W wrote:

> Happened again today. Solr stopped running. Apache hasn't stopped in 10
> days, so this is not due to a server reboot.
>
> Solr is not being run with the oom-killer. And when I grep for ERROR in
> the logs, there is nothing from today.
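The journalctl suggestion above can be narrowed to the lines systemd writes when a service's main process dies or is restarted. This is only a sketch under assumptions: the unit name "solr" is a guess at how the service is registered, unit_events is a hypothetical helper name, and the matched phrases are common systemd message wordings that can differ between systemd versions.

```shell
# Sketch: from journal text on stdin, keep the systemd lifecycle messages
# for one unit. unit_events is a hypothetical helper; pass the unit name
# without the ".service" suffix.
unit_events() {
  ue_unit="$1"
  grep -F "${ue_unit}.service" \
    | grep -E 'Main process exited|Failed with result|Scheduled restart job' \
    || true
}

# Typical use, assuming the unit is called "solr":
#   journalctl --no-pager | unit_events solr
```

A "Main process exited, code=killed" line gives the signal that ended the process, which helps tell an external kill apart from a normal stop.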
Re: How to determine why solr stops running?
Happened again today. Solr stopped running. Apache hasn't stopped in 10 days, so this is not due to a server reboot.

Solr is not being run with the oom-killer. And when I grep for ERROR in the logs, there is nothing from today.

On Mon, May 18, 2020 at 3:15 PM James Greene wrote:

> I usually do a combination of grepping for ERROR in solr logs and checking
> journalctl to see if an external program may have killed the process.
Re: How to determine why solr stops running?
I usually do a combination of grepping for ERROR in solr logs and checking journalctl to see if an external program may have killed the process.

Cheers,

James Austin Greene
www.jamesaustingreene.com
336-lol-nerd

On Mon, May 18, 2020 at 1:39 PM Erick Erickson wrote:

> ps aux | grep solr
>
> on a *nix system will show you all the runtime parameters.
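As a rough sketch of the "grep for ERROR" step, the helper below limits the search to a single day, so that old errors do not mask the fact that nothing was logged on the day Solr stopped. The name errors_on_day is hypothetical, and the pattern assumes log lines start with a "yyyy-MM-dd HH:mm:ss.SSS LEVEL" prefix like the sample entry quoted elsewhere in this thread; a customised log layout would need a different pattern.

```shell
# Sketch: print ERROR lines logged on one day.
# Usage: errors_on_day 2020-06-04 /path/to/logs/solr.log*
# errors_on_day is a hypothetical helper name.
errors_on_day() {
  eod_day="$1"
  shift
  # -h drops the file name prefix so lines from rotated logs look alike.
  grep -h -E "^${eod_day} [0-9:.]+ +ERROR " "$@" || true
}
```

Empty output for the day of the crash is itself a hint that the process was killed from outside before it could log anything.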
Re: How to determine why solr stops running?
ps aux | grep solr

on a *nix system will show you all the runtime parameters.

> On May 18, 2020, at 12:46 PM, Ryan W wrote:
>
> Is there a config file containing the start params? I run solr like...
>
> bin/solr start
>
> I have not seen anything in the logs that seems informative. When I grep in
> the logs directory for 'memory', I see nothing besides a couple entries
> like...
>
> 2020-05-14 13:05:56.155 INFO (main) [ ] o.a.s.h.a.MetricsHistoryHandler No .system collection, keeping metrics history in memory.
>
> I don't know what that entry means, though the date does roughly coincide
> with the last time solr stopped running.
>
> Thank you.
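The ps output for a Solr process is one very long line, so it can help to pull out just the flags that matter for this question. The sketch below is illustrative only: show_jvm_flags is a made-up helper name, and it assumes the flags of interest are the heap settings and the OnOutOfMemoryError hook.

```shell
# Sketch: read a java command line on stdin and print only the heap size
# flags and the OOM hook, one per line. show_jvm_flags is a hypothetical helper.
show_jvm_flags() {
  tr ' ' '\n' | grep -E '^-(Xms|Xmx|XX:OnOutOfMemoryError=)' || true
}

# Typical use; the [s] keeps grep from matching its own process:
#   ps aux | grep '[s]olr' | show_jvm_flags
```

If no -XX:OnOutOfMemoryError line is printed, Solr was not started with the OOM kill script, and an absent solr_oom_killer log proves nothing either way.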
Re: How to determine why solr stops running?
Is there a config file containing the start params? I run solr like...

bin/solr start

I have not seen anything in the logs that seems informative. When I grep in the logs directory for 'memory', I see nothing besides a couple entries like...

2020-05-14 13:05:56.155 INFO (main) [ ] o.a.s.h.a.MetricsHistoryHandler No .system collection, keeping metrics history in memory.

I don't know what that entry means, though the date does roughly coincide with the last time solr stopped running.

Thank you.

On Mon, May 18, 2020 at 12:00 PM Erick Erickson wrote:

> Probably, but check that you are running with the oom-killer, it'll be in
> your start params.
>
> But absent that, something external will be the culprit, Solr doesn't stop
> by itself. Do look at the Solr log once things stop, it should show if
> someone or something stopped it.
Re: How to determine why solr stops running?
Probably, but check that you are running with the oom-killer; it'll be in your start params.

But absent that, something external will be the culprit; Solr doesn't stop by itself. Do look at the Solr log once things stop; it should show if someone or something stopped it.

On Mon, May 18, 2020, 10:43 Ryan W wrote:

> I don't see any log file with "oom" in the file name. Does that mean there
> hasn't been an out-of-memory issue? Thanks.
Re: How to determine why solr stops running?
I don't see any log file with "oom" in the file name. Does that mean there hasn't been an out-of-memory issue? Thanks.

On Thu, May 14, 2020 at 10:05 AM James Greene wrote:

> Check the log for an OOM crash. Fatal exceptions will be in the main
> solr log and out of memory errors will be in their own -oom log.
>
> I've encountered quite a few solr crashes and usually it's when there's a
> threshold of concurrent users and/or indexing happening.
Re: How to determine why solr stops running?
Check the log for an OOM crash. Fatal exceptions will be in the main solr log and out of memory errors will be in their own -oom log.

I've encountered quite a few solr crashes and usually it's when there's a threshold of concurrent users and/or indexing happening.

On Thu, May 14, 2020, 9:23 AM Ryan W wrote:

> Hi all,
>
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason. What else
> causes solr to stop running? How can I investigate why this is happening?
>
> Thank you,
> Ryan
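A small sketch of the check described above follows. It assumes the OOM kill script writes files named solr_oom_killer-PORT-TIMESTAMP.log into the logs directory (the naming mentioned elsewhere in this thread) and that the main log files are called solr.log with optional rotation suffixes. find_oom_evidence is a hypothetical helper name. It searches for the full exception class name rather than "oom", so a startup line listing the -XX:OnOutOfMemoryError flag, or a query for "Mushroom", is not reported.

```shell
# Sketch: list files in a Solr logs directory that point to a Java OOM.
# Usage: find_oom_evidence /opt/solr/server/logs
# find_oom_evidence is a hypothetical helper name.
find_oom_evidence() {
  foe_dir="$1"
  # Files left behind by the OOM kill script, if it ever ran.
  for foe_file in "$foe_dir"/solr_oom_killer-*.log; do
    if [ -e "$foe_file" ]; then
      printf '%s\n' "$foe_file"
    fi
  done
  # Main log files that contain an actual exception, matched as a fixed
  # string on the full class name.
  grep -l -F 'java.lang.OutOfMemoryError' "$foe_dir"/solr.log* 2>/dev/null
  return 0
}
```

No output is not proof that memory was never exhausted, since the exception is not always logged, but any output is a strong lead.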