Hi Jim,

Based on our discussion, I cooked up this solution for my book Solr in
Action and would appreciate you looking it over to see if it meets
your needs. The basic idea is to extend Solr's built-in
PingRequestHandler so that it also verifies the replica is connected
to ZooKeeper and is in the "active" state. To enable this, install the
custom JAR and then update your solrconfig.xml to use this class
instead of the built-in one for the /admin/ping request handler:

<requestHandler name="/admin/ping"
class="sia.ch13.ClusterStateAwarePingRequestHandler">



>>>> Code <<<<

package sia.ch13;

import org.apache.solr.cloud.CloudDescriptor;
import org.apache.solr.cloud.ZkController;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.PingRequestHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Extends Solr's PingRequestHandler to check a replica's cluster
 * status as part of the health check.
 */
public class ClusterStateAwarePingRequestHandler extends PingRequestHandler {

    public static Logger log =
            LoggerFactory.getLogger(ClusterStateAwarePingRequestHandler.class);

    @Override
    public void handleRequestBody(SolrQueryRequest solrQueryRequest,
            SolrQueryResponse solrQueryResponse) throws Exception {
        // delegate to the base class to check the status of this local index
        super.handleRequestBody(solrQueryRequest, solrQueryResponse);

        // if ping status is OK, then check cluster state of this core
        if ("OK".equals(solrQueryResponse.getValues().get("status"))) {
            verifyThisReplicaIsActive(solrQueryRequest.getCore());
        }
    }

    /**
     * Verifies this replica is "active".
     */
    protected void verifyThisReplicaIsActive(SolrCore solrCore)
            throws SolrException {
        String replicaState = "unknown";
        String nodeName = "?";
        String shardName = "?";
        String collectionName = "?";
        String role = "?";
        Exception exc = null;
        try {
            CoreDescriptor coreDescriptor = solrCore.getCoreDescriptor();
            CoreContainer coreContainer = coreDescriptor.getCoreContainer();
            CloudDescriptor cloud = coreDescriptor.getCloudDescriptor();

            shardName = cloud.getShardId();
            collectionName = cloud.getCollectionName();
            role = (cloud.isLeader() ? "Leader" : "Replica");

            ZkController zkController = coreContainer.getZkController();
            if (zkController != null) {
                nodeName = zkController.getNodeName();
                if (zkController.isConnected()) {
                    ClusterState clusterState = zkController.getClusterState();
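                    // NOTE: Slice.getState() reports the state of the shard (slice)
                    // as recorded in the cluster state.
                    Slice slice = clusterState.getSlice(collectionName, shardName);
                    replicaState = (slice != null) ? slice.getState() : "gone";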
                } else {
                    replicaState = "not connected to Zookeeper";
                }
            } else {
                replicaState = "Zookeeper not enabled/configured";
            }
        } catch (Exception e) {
            replicaState = "error determining cluster state";
            exc = e;
        }

        if ("active".equals(replicaState)) {
            log.info(String.format("%s at %s for %s in the %s collection is active.",
                    role, nodeName, shardName, collectionName));
        } else {
            // fail the ping by raising an exception
            String errMsg = String.format(
                    "%s at %s for %s in the %s collection is not active! State is: %s",
                    role, nodeName, shardName, collectionName, replicaState);
            if (exc != null) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg, exc);
            } else {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg);
            }
        }
    }
}
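
To try the handler from the outside (a monitoring script or a load
balancer health check, say), something like the following should do.
This is only a rough sketch: the PingCheck class, the default URL
(host, port, and the "collection1" core name), and the 2-second
timeout are all made up for illustration. It also assumes that a
failed ping comes back as a non-2xx HTTP status, which is what I would
expect from a SolrException raised with SERVER_ERROR (HTTP 500).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of Solr: polls the ping URL and
// treats anything other than a 2xx response as "unhealthy".
class PingCheck {

    /** Assumption: a failed ping surfaces as a non-2xx status (e.g. 500). */
    static boolean isHealthyStatus(int httpStatus) {
        return httpStatus >= 200 && httpStatus < 300;
    }

    /** Returns true only if the URL answers with a 2xx status within the timeout. */
    static boolean ping(String pingUrl, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(pingUrl).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            return isHealthyStatus(conn.getResponseCode());
        } catch (IOException e) {
            // malformed URL, connection refused, timeout, etc. all count as unhealthy
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Example URL only; adjust host, port, and core name for your setup.
        String url = args.length > 0
                ? args[0]
                : "http://localhost:8983/solr/collection1/admin/ping";
        System.out.println(ping(url, 2000) ? "healthy" : "unhealthy");
    }
}
```

Treating every I/O failure as "unhealthy" is deliberate: for a health
check, a node that cannot be reached is as good as down.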

On Tue, Jul 23, 2013 at 1:46 PM, jimtronic <jimtro...@gmail.com> wrote:
> I think the best bet here would be a ping like handler that would simply
> return the state of only this box in the cluster:
>
> Something like /admin/state which would return
> "down","active","leader","recovering"
>
> I'm not really sure where to begin however. Any ideas?
>
> jim
>
> On Mon, Jul 22, 2013 at 12:52 PM, Timothy Potter [via Lucene] <
> ml-node+s472066n4079518...@n3.nabble.com> wrote:
>
>> There is but I couldn't get it to work in my environment on Jetty, see:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-BnPXQ@...%3E<http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-bn...@mail.gmail.com%3E>
>>
>> Let me know if you have any better luck. I had to resort to something
>> hacky but was out of time I could devote to such unproductive
>> endeavors ;-)
>>
>> On Mon, Jul 22, 2013 at 10:49 AM, jimtronic <[hidden email]> wrote:
>>
>> > I'm not sure why it went down exactly -- I restarted the process and
>> > lost the logs. (d'oh!)
>> >
>> > An OOM seems likely, however. Is there a setting for killing the
>> > processes when solr encounters an OOM?
>> >
>> > Thanks!
>> >
>> > Jim
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079856.html
> Sent from the Solr - User mailing list archive at Nabble.com.
