[ https://issues.apache.org/jira/browse/SOLR-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851462#action_12851462 ]

Shawn Smith edited comment on SOLR-1855 at 3/30/10 9:58 PM:
------------------------------------------------------------

I've attached a first pass implementation of this script: 'checksolr' attachment. It's basically the script we're using in our production environment to monitor Solr health. As such, it's not completely generic, but it should be a decent start:

* bash script tested only on Linux
* dependencies on curl, xmllint, xmlstarlet (curl, libxml2, xmlstarlet packages)
* assumes url structure corresponding to the default multi-core Solr configuration (http://<host>:<port>/solr/admin/cores, .../solr/<core>/admin/ping, .../solr/<core>/replication?command=details)
* checks slave replication health assuming Solr 1.4 Java replication
* dynamically determines the set of Solr cores, so it's useful in a multi-core deployment where the set of cores may change relatively often

Example usage:

{noformat}
$ ./checksolr -?
Usage:
    checksolr [OPTIONS]

Options:
    --help | -h
        Print the brief help message and exit.

    --man
        Print the manual page and exit.

    --host | -H HOST
        Check this host instead of localhost.

    --port | -P Port
        Use this port instead of the default(8983) to connect.

    --diff | -D Time difference between now and when solr last replicated
        Use this option to set the maximum difference in seconds between
        the time when the solr slave replicated and now.

    --slave
        Perform slave checks on the host instead of ping tests.

$ ./checksolr --host solrmaster1
Core "core0" returned "OK".
Core "core1" returned "OK".
Core "core2" returned "OK".
$ echo $?
0

$ ./checksolr --slave --host solrslave1
Core "core0" is up to date.
Core "core1" is up to date.
Core "core2" is having trouble replicating.
$ echo $?
1
{noformat}

> Script to monitor Solr health including replication status
> ----------------------------------------------------------
>
>                 Key: SOLR-1855
>                 URL: https://issues.apache.org/jira/browse/SOLR-1855
>             Project: Solr
>          Issue Type: New Feature
>          Components: replication (java)
>    Affects Versions: 1.4
>            Reporter: Shawn Smith
>         Attachments: checksolr
>
>
> It would be useful to have a simple monitor script that checks the health of
> all cores on a solr server.
> # Call the "ping" command and verify success.
> # Check for replication failures, for replication slaves.
> The script should return a non-zero exit code if any serious errors are
> discovered. This should make it easy to plug the script into a monitoring
> framework (Nagios, etc.)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
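
For readers without the attachment, the ping half of the check described above might look roughly like the following. This is a minimal sketch, not the attached checksolr script: the function names (fetch, list_cores, ping_status, check_cores) and the failure message are made up for illustration, and it assumes that the CoreAdmin STATUS response lists each core as a named lst element under the "status" list and that the ping handler returns a str element named "status" in its XML response. It also assumes an xmllint build that supports --xpath.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and messages here are hypothetical.

# Fetch a URL quietly; fail on HTTP errors or after 10 seconds.
fetch() {
    curl --silent --fail --max-time 10 "$1"
}

# Print one core name per line, taken from the CoreAdmin STATUS response.
# Assumed layout: /response/lst[@name="status"]/lst[@name="<core>"]
list_cores() {
    fetch "$1/admin/cores?action=STATUS" |
        xmlstarlet sel -t -m '/response/lst[@name="status"]/lst' -v '@name' -n
}

# Print the status string reported by one core's ping handler.
# Assumed layout: /response/str[@name="status"]
ping_status() {
    fetch "$1/$2/admin/ping" |
        xmllint --xpath 'string(/response/str[@name="status"])' -
}

# Ping every core under the base URL given as $1.
# Returns 0 only if every core answers "OK", otherwise 1.
check_cores() {
    local base=$1 core status rc=0
    local cores
    cores=$(list_cores "$base") || return 1
    if [ -z "$cores" ]; then
        echo "No cores found at $base" >&2
        return 1
    fi
    # Core names are assumed to contain no whitespace.
    for core in $cores; do
        status=$(ping_status "$base" "$core") || status=""
        if [ "$status" = "OK" ]; then
            echo "Core \"$core\" returned \"OK\"."
        else
            echo "Core \"$core\" failed ping (status: \"${status:-none}\")."
            rc=1
        fi
    done
    return $rc
}
```

Calling check_cores "http://localhost:8983/solr" and then inspecting $? gives the zero or non-zero result a monitoring framework such as Nagios would act on. Keeping the fetch and parse steps in separate functions means the slave replication check could be added alongside ping_status without changing the loop.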