First of all, killing with -9 is A Very Bad Idea. You can
leave write lock files lying around. You can leave
the state in an "interesting" place. You haven't given
Solr a chance to tell ZooKeeper that it's going away
(which would set the state to "down"). In short,
when you do this you have to deal with the consequences
yourself, one of which is this mismatch between
cluster state and live_nodes.
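
You can check for that mismatch yourself. Here is a minimal Python sketch,
assuming you have already fetched the collection's state.json and the list of
children under /live_nodes from ZooKeeper (the fetching is left out). The
function name, the sample node names and the trimmed-down sample document are
made up for illustration; only the collection, shard and replica names come
from the error message in this thread.

```python
import json

def split_active_replicas(state_json, live_nodes):
    """Split replicas marked "active" into (really_active, stale).

    state_json: parsed content of a collection's state.json
    live_nodes: node names found under /live_nodes
    A replica is stale when its state says "active" but its node is not live.
    """
    live = set(live_nodes)
    really_active, stale = [], []
    for coll_name, coll in state_json.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                if replica.get("state") != "active":
                    continue  # already recorded as down/recovering/etc.
                path = f"{coll_name}/{shard_name}/{replica_name}"
                if replica.get("node_name") in live:
                    really_active.append(path)
                else:
                    stale.append(path)
    return really_active, stale

# Hypothetical, heavily trimmed state.json for illustration only.
SAMPLE_STATE = json.loads("""
{"demo.public.tbl": {"shards": {"shard0": {"replicas": {
    "core_node1": {"state": "active", "node_name": "host1:8983_solr"},
    "core_node4": {"state": "active", "node_name": "host4:8983_solr"}
}}}}}
""")
```

With only host1:8983_solr listed under /live_nodes, core_node4 lands in the
stale list even though state.json still calls it active.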

Now, that rant done, the bin/solr script tries to stop Solr
gracefully but issues a kill if Solr doesn't stop nicely within a timeout.
Personally I think that timeout should be longer, but that's another story.
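
The general "ask nicely first, force only as a last resort" pattern looks
roughly like the Python sketch below. This is not what bin/solr actually runs;
stop_process and the 30 second default are made up for illustration, and the
right grace period for a real Solr node depends on your setup.

```python
import subprocess

def stop_process(proc, grace_seconds=30.0):
    """Ask proc to exit, and force-kill it only if it does not comply.

    proc: a subprocess.Popen object
    Returns "graceful" if it exited within grace_seconds, else "killed".
    """
    proc.terminate()  # SIGTERM on POSIX: the process can still clean up
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX, like kill -9: no cleanup runs
        proc.wait()   # reap the process so it does not linger as a zombie
        return "killed"
```

The longer the grace period, the less often you end up on the "killed" path
and its consequences.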

The onlyIfDown='true' option is there specifically as a
safety valve. It was provided for those who want to guard against
typos and the like, so just don't specify it and you should be fine.
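
For reference, a small Python sketch that builds the Collections API
DELETEREPLICA request with and without the option. The base URL
http://localhost:8983/solr and the helper name are assumptions for
illustration; the collection, shard and replica names are the ones from the
error message quoted below.

```python
from urllib.parse import urlencode

def delete_replica_url(base_url, collection, shard, replica, only_if_down=False):
    """Build a Collections API DELETEREPLICA URL.

    onlyIfDown is added only when requested; leaving it out lets the delete
    proceed even if state.json still says the replica is active.
    """
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
    }
    if only_if_down:
        params["onlyIfDown"] = "true"
    return f"{base_url}/admin/collections?{urlencode(params)}"

# Assumed host and port; adjust for your cluster.
url = delete_replica_url(
    "http://localhost:8983/solr", "demo.public.tbl", "shard0", "core_node4"
)
```

Issuing a GET against that URL (with curl or a browser, say) is what performs
the delete, so double-check the replica name first since the safety valve is off.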

Best,
Erick

On Mon, Jul 18, 2016 at 11:51 PM, Jerome Yang <jey...@pivotal.io> wrote:
> Hi all,
>
> Here's the situation.
> I'm using Solr 5.3 in cloud mode.
>
> I have 4 nodes.
>
> After using "kill -9 pid-solr-node" to kill 2 nodes,
> the replicas on those two nodes are still "ACTIVE" in ZooKeeper's
> state.json.
>
> The problem is that when I try to delete these down replicas with
> the parameter onlyIfDown='true',
> it says,
> "Delete replica failed: Attempted to remove replica :
> demo.public.tbl/shard0/core_node4 with onlyIfDown='true', but state is
> 'active'."
>
> From this link:
> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.State.html#ACTIVE
>
> It says:
> *NOTE*: when the node the replica is hosted on crashes, the replica's state
> may remain ACTIVE in ZK. To determine if the replica is truly active, you
> must also verify that its node
> <http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/Replica.html#getNodeName-->
> is under /live_nodes in ZK (or use ClusterState.liveNodesContain(String)
> <http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/common/cloud/ClusterState.html#liveNodesContain-java.lang.String->
> ).
>
> So, is this a bug?
>
> Regards,
> Jerome
