Hi Jim,

Based on our discussion, I cooked up this solution for my book, Solr in Action, and would appreciate it if you could look it over to see whether it meets your needs. The basic idea is to extend Solr's built-in PingRequestHandler so that, in addition to the usual local ping, it verifies the replica is connected to Zookeeper and is in the "active" state. If it isn't, the ping fails with an error. To enable this, install the custom JAR and then update your solrconfig.xml so the /admin/ping request handler uses this class instead of the built-in one (only the class attribute changes):
<requestHandler name="/admin/ping" class="sia.ch13.ClusterStateAwarePingRequestHandler">
  <!-- keep your existing ping configuration (e.g. invariants) here -->
</requestHandler>

>>>> Code <<<<

package sia.ch13;

import org.apache.solr.cloud.CloudDescriptor;
import org.apache.solr.cloud.ZkController;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.PingRequestHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Extends Solr's PingRequestHandler to check a replica's cluster status as part of the health check.
 */
public class ClusterStateAwarePingRequestHandler extends PingRequestHandler {

    private static final Logger log = LoggerFactory.getLogger(ClusterStateAwarePingRequestHandler.class);

    @Override
    public void handleRequestBody(SolrQueryRequest solrQueryRequest, SolrQueryResponse solrQueryResponse)
            throws Exception {
        // delegate to the base class to check the status of this local index
        super.handleRequestBody(solrQueryRequest, solrQueryResponse);

        // if the ping status is OK, then check the cluster state of this core
        if ("OK".equals(solrQueryResponse.getValues().get("status"))) {
            verifyThisReplicaIsActive(solrQueryRequest.getCore());
        }
    }

    /**
     * Verifies this replica is "active"; otherwise throws a SolrException, which fails the ping.
     */
    protected void verifyThisReplicaIsActive(SolrCore solrCore) throws SolrException {
        String replicaState = "unknown";
        String nodeName = "?";
        String shardName = "?";
        String collectionName = "?";
        String role = "?";
        Exception exc = null;
        try {
            CoreDescriptor coreDescriptor = solrCore.getCoreDescriptor();
            CoreContainer coreContainer = coreDescriptor.getCoreContainer();
            CloudDescriptor cloud = coreDescriptor.getCloudDescriptor();
            ZkController zkController = coreContainer.getZkController();
            // both are null when Solr is not running in SolrCloud mode
            if (zkController != null && cloud != null) {
                shardName = cloud.getShardId();
                collectionName = cloud.getCollectionName();
                role = (cloud.isLeader() ? "Leader" : "Replica");
                nodeName = zkController.getNodeName();
                if (zkController.isConnected()) {
                    ClusterState clusterState = zkController.getClusterState();
                    Slice slice = clusterState.getSlice(collectionName, shardName);
                    // note: Slice.getState() is the state of the shard as a whole,
                    // not of this core's own replica entry within the slice
                    replicaState = (slice != null) ? slice.getState() : "gone";
                } else {
                    replicaState = "not connected to Zookeeper";
                }
            } else {
                replicaState = "Zookeeper not enabled/configured";
            }
        } catch (Exception e) {
            replicaState = "error determining cluster state";
            exc = e;
        }

        if ("active".equals(replicaState)) {
            log.info(String.format("%s at %s for %s in the %s collection is active.",
                    role, nodeName, shardName, collectionName));
        } else {
            // fail the ping by raising an exception
            String errMsg = String.format("%s at %s for %s in the %s collection is not active! State is: %s",
                    role, nodeName, shardName, collectionName, replicaState);
            if (exc != null) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg, exc);
            } else {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg);
            }
        }
    }
}

On Tue, Jul 23, 2013 at 1:46 PM, jimtronic <jimtro...@gmail.com> wrote:

> I think the best bet here would be a ping like handler that would simply
> return the state of only this box in the cluster:
>
> Something like /admin/state which would return
> "down","active","leader","recovering"
>
> I'm not really sure where to begin however. Any ideas?
>
> jim
>
> On Mon, Jul 22, 2013 at 12:52 PM, Timothy Potter [via Lucene] <
> ml-node+s472066n4079518...@n3.nabble.com> wrote:
>
>> There is but I couldn't get it to work in my environment on Jetty, see:
>>
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-BnPXQ@...%3E
>>
>> Let me know if you have any better luck. I had to resort to something
>> hacky but was out of time I could devote to such unproductive
>> endeavors ;-)
>>
>> On Mon, Jul 22, 2013 at 10:49 AM, jimtronic <[hidden email]> wrote:
>>
>> > I'm not sure why it went down exactly -- I restarted the process and
>> > lost the logs. (d'oh!)
>> >
>> > An OOM seems likely, however. Is there a setting for killing the
>> > processes when solr encounters an OOM?
>> >
>> > Thanks!
>> >
>> > Jim
>> >
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
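If you want to sanity-check the pass/fail logic without standing up a SolrCloud cluster, here is a minimal, Solr-free sketch of the same decision verifyThisReplicaIsActive() makes. It's an illustration under stated assumptions, not the handler itself: the class and method names are hypothetical, the two booleans stand in for the ZkController null check and zkController.isConnected(), the sliceState argument stands in for slice.getState(), and IllegalStateException stands in for the SolrException the real handler throws. The stateForStatusEndpoint() method is likewise a hypothetical take on the /admin/state idea Jim describes in the quoted message.

```java
/**
 * Solr-free sketch of the decision made in verifyThisReplicaIsActive().
 * All names here are hypothetical and for illustration only; none are Solr APIs.
 */
public class PingDecisionSketch {

    /**
     * Mirrors the handler's branches: zkEnabled stands in for the ZkController
     * null check, zkConnected for zkController.isConnected(), and sliceState for
     * slice.getState() (null meaning the slice was not found in the cluster state).
     */
    public static String resolveState(boolean zkEnabled, boolean zkConnected, String sliceState) {
        if (!zkEnabled) {
            return "Zookeeper not enabled/configured";
        }
        if (!zkConnected) {
            return "not connected to Zookeeper";
        }
        return (sliceState != null) ? sliceState : "gone";
    }

    /** Passes only for "active"; the real handler throws a SolrException here instead. */
    public static void verifyActive(String role, String state) {
        if (!"active".equals(state)) {
            throw new IllegalStateException(role + " is not active! State is: " + state);
        }
    }

    /**
     * Hypothetical /admin/state-style summary: report "leader" for an active
     * leader, otherwise pass the raw state through unchanged.
     */
    public static String stateForStatusEndpoint(boolean isLeader, String state) {
        return (isLeader && "active".equals(state)) ? "leader" : state;
    }

    public static void main(String[] args) {
        String state = resolveState(true, true, "active");
        verifyActive("Leader", state);                            // no exception: ping passes
        System.out.println(stateForStatusEndpoint(true, state));  // prints "leader"

        try {
            verifyActive("Replica", resolveState(true, false, null));
        } catch (IllegalStateException e) {
            // prints "ping fails: Replica is not active! State is: not connected to Zookeeper"
            System.out.println("ping fails: " + e.getMessage());
        }
    }
}
```

Failing by throwing, rather than returning a different status value, is what turns the ping into an HTTP error response, which is typically what a load balancer or monitor polling /admin/ping keys on.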