Hi, I'm trying to understand the difference between doing load balancing
via an HTTP proxy vs. using SolrJ's CloudSolrClient. I created a test
cluster (3x SolrCloud 8.8 nodes, 3x ZooKeeper nodes) and then tested a few
things:

1) Configure a proxy to do the load balancing. I figured that:
- I can delegate health checks to both the proxy and my container
orchestration.
- I can connect to the cluster with SolrJ using the HttpSolrClient with the
proxy URL.
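
A minimal sketch of this setup, assuming SolrJ 8.x on the classpath. The
proxy URL, the collection name `mycollection`, and the timeout values are
placeholders for illustration, not values from my cluster:

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ProxyClientExample {

    // Builds a client that only ever talks to the proxy; the proxy decides
    // which Solr node receives each request.
    static HttpSolrClient buildClient(String proxyUrl) {
        return new HttpSolrClient.Builder(proxyUrl)
                .withConnectionTimeout(5000)  // milliseconds, arbitrary value
                .withSocketTimeout(10000)     // milliseconds, arbitrary value
                .build();
    }

    public static void main(String[] args) throws SolrServerException, IOException {
        // Hypothetical proxy address in front of the three Solr nodes.
        try (HttpSolrClient client = buildClient("http://solr-proxy.example.com/solr")) {
            // The collection is named per request because the base URL ends at /solr.
            QueryResponse response = client.query("mycollection", new SolrQuery("*:*"));
            System.out.println("Documents found: " + response.getResults().getNumFound());
        }
    }
}
```

With this approach the client needs network access to the proxy only, not
to the individual Solr nodes.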

My concern is that, since health checks are run against the Solr instance
(e.g. GET /solr/) and not against a specific collection, the proxy could
route a request to a node that is healthy but hosts a faulty collection. Is
this a real concern?
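
To illustrate the kind of collection-level check I have in mind, here is a
rough sketch using only the Java 11+ standard library. It assumes the
collection exposes Solr's ping handler at `/admin/ping`; the node URL and
collection name are hypothetical. I have not verified whether, in SolrCloud,
such a request is always answered by the node that receives it, so this may
not prove the health of that node's local replica:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class CollectionHealthCheck {

    // Builds the ping URL for one collection, tolerating a trailing slash
    // on the base URL.
    static URI pingUri(String solrBaseUrl, String collection) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return URI.create(base + "/" + collection + "/admin/ping");
    }

    // Returns true only if the ping endpoint answers with HTTP 200 in time.
    static boolean isCollectionHealthy(String solrBaseUrl, String collection) {
        HttpClient http = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(pingUri(solrBaseUrl, collection))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    http.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // unreachable, timed out, or connection refused
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: CollectionHealthCheck <solrBaseUrl> <collection>");
            return;
        }
        System.out.println(isCollectionHealthy(args[0], args[1]));
    }
}
```

A proxy would normally do this itself by pointing its health check at the
collection-specific path instead of /solr/.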

2) Alternatively, I could use CloudSolrClient and configure either the list
of `solrBaseUrls` or `zkHosts`.

The constraint here is that, when using CloudSolrClient, the SolrJ client
gets back a list of resolved IP addresses from the SolrCloud cluster or
ZooKeeper ensemble. The client must be able to reach those resolved IP
addresses or the connection will fail. Therefore, either the client must
live in the same network as the servers (subnet, VPN, etc.), or the servers
must be publicly accessible.
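
For comparison, a minimal sketch of the two CloudSolrClient variants,
assuming SolrJ 8.x. All host names and the collection name are
placeholders, and the ZooKeeper ensemble is assumed to use no chroot:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class CloudClientExample {

    // Variant A: cluster state comes from ZooKeeper. The client must be able
    // to reach both the ZooKeeper nodes and the Solr nodes they advertise.
    static CloudSolrClient buildFromZooKeeper(List<String> zkHosts) {
        return new CloudSolrClient.Builder(zkHosts, Optional.empty()).build();
    }

    // Variant B: cluster state comes from the Solr nodes themselves. The
    // client must still be able to reach the node addresses it gets back.
    static CloudSolrClient buildFromSolrUrls(List<String> solrUrls) {
        return new CloudSolrClient.Builder(solrUrls).build();
    }

    public static void main(String[] args) throws SolrServerException, IOException {
        // Hypothetical ZooKeeper ensemble addresses.
        List<String> zkHosts = Arrays.asList(
                "zk1.example.com:2181",
                "zk2.example.com:2181",
                "zk3.example.com:2181");

        try (CloudSolrClient client = buildFromZooKeeper(zkHosts)) {
            client.setDefaultCollection("mycollection");
            long found = client.query(new SolrQuery("*:*")).getResults().getNumFound();
            System.out.println("Documents found: " + found);
        }
    }
}
```

In both variants the addresses used for the actual queries are the ones the
cluster reports, which is why the network constraint above applies.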

I'm new to Solr, so I wonder if there are any other specifics or
alternatives that I'm not considering. Are there any particular reasons why
you'd recommend one setup over the other?

Any insight is appreciated,
Ixai
