Our environment still runs Solr 4.7. Recently we noticed something in a test: when
we stopped one Solr server (solr02, via an OS shutdown), all of solr02's cores
were shown as "down", but a few of them still remained leaders. After that, we
quickly saw that all the other servers were still sending requests to the
down server, and as a result a lot of TCP-waiting threads piled up in the
thread pools of the other Solr servers, since solr02 was already down.

"shard53":{
        "range":"26660000-2998ffff",
        "state":"active",
        "replicas":{
          "core_node102":{
            "state":"down",
            "base_url":"https://solr02.myhost/solr",
            "core":"collection2_shard53_replica1",
            "node_name":"https://solr02.myhost_solr",
            "leader":"true"},
          "core_node104":{
            "state":"active",
            "base_url":"https://solr04.myhost/solr",
            "core":"collection2_shard53_replica2",
            "node_name":"https://solr04.myhost/solr_solr"}}},

Is this a known bug in 4.7 that was fixed in a later release? Is there a
reference JIRA we can study? If the Solr service is stopped gracefully, we can
see the leader election happen and leadership switch to another active core. But
if we shut down the Solr host's OS directly, we can reproduce in our environment
that some "down" cores remain "leader" in the ZK clusterstate.json.
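As a way to spot this condition quickly, the clusterstate.json can be scanned for replicas that are flagged as leader but are not active. Below is a minimal sketch, assuming the standard clusterstate layout (collection -> "shards" -> shard -> "replicas"); the `find_down_leaders` helper and the inlined sample state are illustrative, not part of any Solr API:

```python
import json

def find_down_leaders(clusterstate):
    """Scan a parsed clusterstate.json dict for replicas that are
    marked leader="true" but whose state is not 'active'."""
    stale = []
    for coll_name, coll in clusterstate.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for core_node, replica in shard.get("replicas", {}).items():
                if replica.get("leader") == "true" and replica.get("state") != "active":
                    stale.append((coll_name, shard_name, core_node,
                                  replica.get("node_name")))
    return stale

# Sample state modeled on the shard53 snippet above (node_name values
# are hypothetical placeholders):
state = {
    "collection2": {
        "shards": {
            "shard53": {
                "range": "26660000-2998ffff",
                "state": "active",
                "replicas": {
                    "core_node102": {
                        "state": "down",
                        "base_url": "https://solr02.myhost/solr",
                        "core": "collection2_shard53_replica1",
                        "node_name": "solr02.myhost_solr",
                        "leader": "true"},
                    "core_node104": {
                        "state": "active",
                        "base_url": "https://solr04.myhost/solr",
                        "core": "collection2_shard53_replica2",
                        "node_name": "solr04.myhost_solr"}}}}}}

for hit in find_down_leaders(state):
    print(hit)  # → ('collection2', 'shard53', 'core_node102', 'solr02.myhost_solr')
```

In practice the JSON would be fetched from ZooKeeper (or via the Collections API CLUSTERSTATUS call) and passed through `json.loads` before scanning.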
