Hi all,

I found an interesting problem in Solr 7.5.0 with the implicit router when the _route_ param is provided in a non-distributed request.

Imagine the following set-up...

1. Collection: foo
2. Physical nodes: nodeA, nodeB
3. Shards: shard1, shard2
4. Replication factor: 2 (pure NRT)

- nodeA
-- foo_shard1_replica_n1
-- foo_shard2_replica_n1
- nodeB
-- foo_shard1_replica_n2
-- foo_shard2_replica_n2

TL;DR: two shards, two replicas each, with every node hosting one replica of each shard.


Request: new SolrQuery("filter:value").setParam("_route_", "shard1").setParam("distrib", "false");

This request will return unpredictable results, depending on which core it hits.


The reason is that CloudSolrClient resolves node URLs to the collection rather than to individual cores. This is the critical snippet in the code:

---------------- Start from line 1072 -----------------------

      List<String> replicas = new ArrayList<>();
      String joinedInputCollections = StrUtils.join(inputCollections, ',');
      for (Slice slice : slices.values()) {
        for (ZkNodeProps nodeProps : slice.getReplicasMap().values()) {
          ZkCoreNodeProps coreNodeProps = new ZkCoreNodeProps(nodeProps);
          String node = coreNodeProps.getNodeName();
          if (!liveNodes.contains(node) // Must be a live node to continue
              || Replica.State.getState(coreNodeProps.getState()) != Replica.State.ACTIVE) // Must be an ACTIVE replica to continue
            continue;
          if (seenNodes.add(node)) { // if we haven't yet collected a URL to this node...
            String url = ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP), joinedInputCollections); // BOOM!
            if (sendToLeaders && coreNodeProps.isLeader()) {
              theUrlList.add(url); // put leaders here eagerly (if sendToLeader mode)
            } else {
              replicas.add(url); // replicas here
            }
          }
        }
      }

--------------------------------------------------------------------

The URL of each replica is formed using the collection name, not the core name:
Line 1082: ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP), joinedInputCollections)


Instead of getting URLs like:
- http://nodeA/solr/foo_shard1_replica_n1
- http://nodeB/solr/foo_shard1_replica_n2

We end up with:
- http://nodeA/solr/foo
- http://nodeB/solr/foo

Because the shards in this example share physical nodes, the request is sometimes routed to a core of the proper shard and sometimes not.
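
To make the ambiguity concrete, here is a minimal stand-alone sketch. It does not use SolrJ at all; the Core class and the candidateCores helper are hypothetical names invented for illustration, and the sketch only assumes that a collection-level URL on a node can be served by any core of that collection hosted there.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: models the example layout, not real Solr classes.
class CollectionUrlAmbiguity {

    /** Hypothetical stand-in for one replica entry in the cluster state. */
    static final class Core {
        final String baseUrl;
        final String collection;
        final String shard;
        final String coreName;

        Core(String baseUrl, String collection, String shard, String coreName) {
            this.baseUrl = baseUrl;
            this.collection = collection;
            this.shard = shard;
            this.coreName = coreName;
        }
    }

    /** The set-up from this mail: two shards, two replicas each, on two nodes. */
    static List<Core> exampleLayout() {
        List<Core> cores = new ArrayList<>();
        cores.add(new Core("http://nodeA/solr", "foo", "shard1", "foo_shard1_replica_n1"));
        cores.add(new Core("http://nodeA/solr", "foo", "shard2", "foo_shard2_replica_n1"));
        cores.add(new Core("http://nodeB/solr", "foo", "shard1", "foo_shard1_replica_n2"));
        cores.add(new Core("http://nodeB/solr", "foo", "shard2", "foo_shard2_replica_n2"));
        return cores;
    }

    /** Names of all cores on a node that belong to the given collection. */
    static List<String> candidateCores(List<Core> cores, String baseUrl, String collection) {
        List<String> names = new ArrayList<>();
        for (Core c : cores) {
            if (c.baseUrl.equals(baseUrl) && c.collection.equals(collection)) {
                names.add(c.coreName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // http://nodeA/solr/foo names the collection, so both local cores match.
        System.out.println(candidateCores(exampleLayout(), "http://nodeA/solr", "foo"));
    }
}
```

With this layout the collection URL on nodeA matches a shard1 core and a shard2 core, which is why a distrib=false request can land on the wrong shard.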

Should the CloudSolrClient resolve exact core URLs when distrib=false? I am guessing yes.
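
Roughly what I have in mind, as a stand-alone sketch rather than a patch: when distrib=false, build the URL from the core name of each replica in the routed shard; otherwise keep the collection-level URL. The Replica class and resolveUrls method below are hypothetical names for illustration and are not part of SolrJ; liveness and leader handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a simplified model of URL resolution, not CloudSolrClient code.
class CoreUrlResolutionSketch {

    /** Hypothetical stand-in for one replica entry in the cluster state. */
    static final class Replica {
        final String baseUrl;
        final String shard;
        final String coreName;

        Replica(String baseUrl, String shard, String coreName) {
            this.baseUrl = baseUrl;
            this.shard = shard;
            this.coreName = coreName;
        }
    }

    /**
     * Returns one URL per replica of the routed shard.
     * distrib=true  -> collection-level URL, de-duplicated per node.
     * distrib=false -> exact core URL, so the request cannot hit another shard.
     */
    static List<String> resolveUrls(List<Replica> replicas, String collection,
                                    String routedShard, boolean distrib) {
        List<String> urls = new ArrayList<>();
        for (Replica r : replicas) {
            if (!r.shard.equals(routedShard)) {
                continue; // replica belongs to a shard the request was not routed to
            }
            String url = r.baseUrl + "/" + (distrib ? collection : r.coreName);
            if (!urls.contains(url)) {
                urls.add(url);
            }
        }
        return urls;
    }

    /** The set-up from this mail. */
    static List<Replica> exampleLayout() {
        return Arrays.asList(
            new Replica("http://nodeA/solr", "shard1", "foo_shard1_replica_n1"),
            new Replica("http://nodeA/solr", "shard2", "foo_shard2_replica_n1"),
            new Replica("http://nodeB/solr", "shard1", "foo_shard1_replica_n2"),
            new Replica("http://nodeB/solr", "shard2", "foo_shard2_replica_n2"));
    }

    public static void main(String[] args) {
        System.out.println(resolveUrls(exampleLayout(), "foo", "shard1", false));
        System.out.println(resolveUrls(exampleLayout(), "foo", "shard1", true));
    }
}
```

For _route_=shard1 with distrib=false this yields the two foo_shard1 core URLs listed earlier; with distrib=true it yields the two collection URLs, which is fine there because the receiving node fans the request out itself.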


--
Jaroslaw Rozanski | e: m...@jarekrozanski.eu
