Dear all,
I'm seeing strange behaviour in a SolrCloud cluster.

Cluster description
I have a cluster with a total of 38 nodes. All nodes have the following 
software installed:

  *   OS: Debian GNU/Linux 9.13 (stretch)
  *   JRE: openjdk version "11.0.6" 2020-01-14
  *   Apache Solr: Apache Solr 8.11.2

The cluster nodes are divided as follows:

Nodes used for indexing
solrindex-01
solrindex-02

Nodes used for queries
solrquery-01
solrquery-02

Cluster nodes with collections
solrnode-01
...
solrnode-34

Configuration of the collection
In the cluster I have a collection (e.g. testcollection) distributed across 
the various nodes as separate shards (one shard for each month, e.g. 
shard_202201, shard_202202, ...).

Problem
From time to time the solrquery-01 node is no longer able to query the entire 
collection; in particular, it is unable to contact some replicas of the 
collection hosted on the other nodes of the cluster. The problem does not 
resolve itself: I have to restart the Apache Solr service on the 
solrquery-01 node.

In particular:
If I try to query a specific replica from the solrquery-01 node, the request 
remains pending until it times out.

Query
http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=true&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/

Response
[cid:image001.jpg@01D8BE28.664755F0]

Running the same query from another node (e.g. solrnode-01) succeeds.

Query
http://solrnode-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=true&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/


Response
[cid:image002.jpg@01D8BE28.664755F0]
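
The comparison above can also be scripted, which makes it easy to repeat 
while the problem is occurring. The following is only a minimal sketch in 
Python, assuming the standard library's urllib is sufficient and that a 
10-second client-side timeout is acceptable. The helper names 
build_select_url and probe are made up for illustration; the host names, 
collection and replica URL are the ones from the queries above.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def build_select_url(coordinator, collection, query, shard_url):
    """Build a /select URL asking `coordinator` to query only `shard_url`."""
    params = urllib.parse.urlencode(
        {"q": query, "debug": "true", "shards": shard_url}
    )
    return f"http://{coordinator}/solr/{collection}/select?{params}"


def probe(url, timeout_s=10.0):
    """Send one GET and return (outcome, elapsed_seconds).

    The timeout is enforced by this client, so a hanging request shows up
    as a failure after roughly `timeout_s` seconds instead of blocking.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            outcome = f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        outcome = f"HTTP {exc.code}"
    except OSError as exc:  # includes URLError and socket timeouts
        outcome = f"failed: {exc}"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    shard = ("http://solrnode-24.volo.local:8080/solr/"
             "volocomapi_search_shard_201501_replica_n575/")
    query = ("UniqueReference:DOC_EBF3D4C11F1239852490280F583D052F"
             "C214A10D6E716BD98C19CBC599E5EFED")
    # Same request, sent through the failing node and through a healthy one.
    for coordinator in ("solrquery-01:8080", "solrnode-01:8080"):
        url = build_select_url(coordinator, "volocomapi_search", query, shard)
        outcome, elapsed = probe(url)
        print(f"{coordinator}: {outcome} after {elapsed:.1f}s")
```

Running it on a schedule and logging the output would show when 
solrquery-01 starts timing out while solrnode-01 still answers.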

The same happens if I run the query against a different replica.

Query
http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=true&shards=http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/

Response
[cid:image003.jpg@01D8BE28.664755F0]


Checking the network traffic with tcpdump on the solrquery-01 machine shows 
no connection at all, whereas on the solrnode-01 machine the connection is 
visible.

tcpdump from the solrquery-01 machine
[cid:image004.jpg@01D8BE28.664755F0]

tcpdump on the solrnode-01 machine
[cid:image005.jpg@01D8BE28.664755F0]
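
A complementary check, separate from Solr, is to open a plain TCP connection 
from the solrquery-01 machine to the replica's host and port. The sketch 
below is an assumption about how one might do that in Python with the 
standard socket module; tcp_check is a hypothetical helper name and the 
3-second timeout is arbitrary. If such a connect succeeds while Solr's own 
requests show no traffic, that would suggest (but not prove) that the 
request is stuck inside the Solr process and not in the network.

```python
import socket
import time


def tcp_check(host, port, timeout_s=3.0):
    """Attempt one TCP connect; return (ok, detail, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, "connected", time.monotonic() - start
    except OSError as exc:  # refused, unreachable, DNS failure, timeout
        detail = f"{type(exc).__name__}: {exc}"
        return False, detail, time.monotonic() - start


if __name__ == "__main__":
    # Replica hosts taken from the queries earlier in this message.
    for host in ("solrnode-23.volo.local", "solrnode-24.volo.local"):
        ok, detail, elapsed = tcp_check(host, 8080)
        print(f"{host}:8080 -> {detail} ({elapsed:.2f}s)")
```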

Question
Do you have any suggestions on how to investigate this issue further, or on 
possible solutions?


Thank you in advance,
Matteo
