Our environment still runs Solr 4.7. We recently noticed the following in a test: when we stopped one Solr server (solr02, via an OS shutdown), all of solr02's cores were shown as "down", but a few of them were still listed as leaders. After that, all the other servers kept sending requests to the downed server, so we saw a large number of TCP waiting threads pile up in the thread pools of the other Solr servers, since solr02 was already down.
"shard53":{ "range":"26660000-2998ffff", "state":"active", "replicas":{ "core_node102":{ "state":"down", "base_url":"https://solr02.myhost/solr", "core":"collection2_shard53_replica1", "node_name":"https://solr02.myhost_solr", "leader":"true"}, "core_node104":{ "state":"active", "base_url":"https://solr04.myhost/solr", "core":"collection2_shard53_replica2", "node_name":"https://solr04.myhost/solr_solr"}}}, Is this something known bug in 4.7 and late on fixed? Any reference JIRA we can study about? If the solr service is stopped gracefully, we can see leader core election happens and switched to other active core. But if we just directly shutdown a Solr OS, we can reproduce in our environment that some "Down" cores remains "leader" at ZK clusterstate.json