Hi, I'm trying to understand the difference between doing load balancing via an HTTP proxy vs. using SolrJ's CloudSolrClient. I created a test cluster (3x SolrCloud 8.8 nodes, 3x ZooKeeper nodes) and then tested a few things:
1) Configure a proxy to do the load balancing. I figured that:
- I can delegate health checks to both the proxy and my container orchestration.
- I can connect to the cluster with SolrJ using HttpSolrClient with the proxy URL.

My concern is that, since health checks are done on the Solr instance (e.g. GET /solr/) and not on a specific collection, the proxy could route a request to a healthy node with a faulty collection. Is this a real concern?

2) Alternatively, I could use CloudSolrClient and configure either the list of `solrBaseUrls` or `zkHosts`. The constraint here is that, when using CloudSolrClient, the SolrJ client gets back a list of resolved IP addresses from the SolrCloud cluster or ZooKeeper ensemble. The client must be able to reach those resolved IP addresses or the connection will fail. Therefore, either the client must live in the same network as the servers (subnet, VPN, etc.), or the servers must be publicly accessible.

I'm new to Solr, so I wonder if there are any other specifics or alternatives that I'm not considering. Are there any particular reasons why you'd recommend one setup over the other?

Any insight is appreciated,
Ixai
