On 3/16/2016 8:14 AM, Tom Evans wrote:
> The problem occurs when we attempt to query a node to see if products
> or items is active on that node. The balancer (haproxy) requests the
> ping handler for the appropriate collection, however all the nodes
> return OK for all the collections(!)
>
> Eg, on node01, it has replicas for products and skus, but the ping
> handler for /solr/items/admin/ping returns 200!
This returns OK because, as long as at least one replica of every shard
in "items" is available somewhere in the cloud, you can send a request
for "items" to that node and it will work. Or at least it *should*
work, and if it doesn't, that's a bug. I remember that one of the older
4.x versions *did* have a bug where queries for a collection would only
work if the node actually hosted a replica for that collection.
> This means that as far as the balancer is concerned, node01 is a valid
> destination for item queries, and inevitably it blows up as soon as
> such a query is made to it.
What version of Solr are you running?
> As I understand it, this is because the URL we are checking is for the
> collection ("items") rather than a specific core
> ("items_shard1_replica1")
>
> Is there a way to make the ping handler only check local shards? I
> have tried with distrib=false&preferLocalShards=false, but it still
> returns a 200.
Most requests sent to an individual shard replica will expand to the
entire collection. Some handlers will limit the request to only the
queried shard replica if you add distrib=false to the request. I do
not know whether the ping handler is one of them. But you should not
need to query individual shards. As explained above, if all nodes are
part of the same cloud (using the same zkHost string), you should be
able to query any node in the cloud for any collection in the cloud,
whether that node contains shards for that collection or not.
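
If the balancer really does need to know which nodes hold replicas of a
collection locally, one possible approach is to read the cluster state
from the Collections API (action=CLUSTERSTATUS) instead of relying on
the ping handler. The Python below is only a rough sketch and has not
been tested against a live cluster. It assumes the JSON response nests
as cluster -> collections -> <name> -> shards -> <shard> -> replicas,
with each replica carrying "node_name", "state" and "core". The function
name, host names and sample data are made up for illustration.

```python
def local_active_replicas(cluster_status, collection, node_name):
    """Return the core names of active replicas of `collection` on `node_name`.

    `cluster_status` is the parsed JSON of a CLUSTERSTATUS response
    (layout assumed, see the note above).
    """
    coll = (
        cluster_status.get("cluster", {})
        .get("collections", {})
        .get(collection, {})
    )
    cores = []
    for shard in coll.get("shards", {}).values():
        for replica in shard.get("replicas", {}).values():
            on_node = replica.get("node_name") == node_name
            active = replica.get("state") == "active"
            if on_node and active:
                cores.append(replica.get("core"))
    return sorted(cores)


# Hand-written sample in the assumed layout, NOT real Solr output.
sample = {
    "cluster": {
        "collections": {
            "items": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {
                                "core": "items_shard1_replica1",
                                "node_name": "node02:8983_solr",
                                "state": "active",
                            }
                        }
                    }
                }
            },
            "products": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {
                                "core": "products_shard1_replica1",
                                "node_name": "node01:8983_solr",
                                "state": "active",
                            }
                        }
                    }
                }
            },
        }
    }
}

# In this sample node01 hosts a "products" replica but nothing for "items".
print(local_active_replicas(sample, "items", "node01:8983_solr"))
print(local_active_replicas(sample, "products", "node01:8983_solr"))
```

A health check built on this would mark a node as a local destination
for a collection only when the returned list is non-empty. That is a
routing preference, not a requirement, since any node should still be
able to answer for any collection.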
Other messages you've sent to the list are about Solr 5, so I would be
very surprised to learn that the old 4.x bug is still there, but
anything's possible.
Thanks,
Shawn