We've been trying to figure out ways to "migrate" existing SolrClouds to
another ZK ensemble which will be built on different infrastructure than
the current ensemble.  Also, ZK will be upgraded from 3.4.13 (old ensemble)
to 3.6.3 (new ensemble).  We're running Solr 8.10.1.

One option we are experimenting with is to transfer the ZK snapshot and
transaction logs from the old ensemble to the new one, change ZK_HOST to
point to the new ensemble, then restart Solr.  (We use chroots to keep each
SolrCloud separated.)  We are NOT using dynamic reconfiguration (zoo.cfg
reconfigEnabled=false).

Transferring the snapshot and tlogs seemingly worked in ZK: no errors, and
poking around ZK shows all the data is current.

When we did the Solr part, it seemed to work as well.  Looking at the Cloud
in the UI, all the nodes, replicas, collections, etc. are there; the
clusterstatus is valid, and we can query and index new content.

But the ZK Status page is just strange - it shows this error:

Errors:
Your ZK connection string (3 hosts) is different from the dynamic ensemble
config (3 hosts). Solr does not currently support dynamic reconfiguration
and will only be able to connect to the zk hosts in your connection string.
Failed talking to Zookeeper localhost:2181
Failed talking to Zookeeper localhost:2182
Failed talking to Zookeeper localhost:2183

ZK connection string:
10.xx.xx.xx:2181,10.xx.xx.xx:2182,10.xx.xx.xx:2183/solr8
Ensemble size: 3
Ensemble mode:
Dynamic reconfig enabled: true

And the little table under all that shows the following headers:
localhost:2181 localhost:2182 localhost:2183
...of course with ok=false for all 3, because there is no ZK running on
localhost.

And as stated before, zoo.cfg has reconfigEnabled=false.

So for all the important bits, Solr seems to be using the ZK ensemble in the
connection string.  But I don't understand what's going on in the UI, and I'm
not sure how to stop Solr from looking for ZKs on localhost.  Is it
somehow related to this: https://issues.apache.org/jira/browse/SOLR-13801?
There are various linkages to other bugs/improvements, but we're not doing
anything special here:  we have whitelisted the ZK 4lw's, we've disabled
ACLs, and we're not using TLS.
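
In case it helps anyone reproduce this outside the UI, here is a minimal
sketch of how the 4lw's can be checked against each host in the connection
string.  It assumes the standard "ruok" and "conf" commands are whitelisted;
the helper names parse_zk_host and four_letter_word are made up for this
example and are not part of Solr or ZooKeeper.

```python
import socket
import sys


def parse_zk_host(zk_host):
    """Split a ZK_HOST string into ([(host, port), ...], chroot)."""
    hosts_part, slash, chroot = zk_host.partition("/")
    servers = []
    for entry in hosts_part.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host:
            # No port given; assume ZooKeeper's default client port.
            host, port = entry.strip(), "2181"
        servers.append((host, int(port)))
    return servers, (slash + chroot if slash else "")


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter-word command and return the server's full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            # ZooKeeper closes the connection once it has replied.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    servers, chroot = parse_zk_host(sys.argv[1])
    print("chroot:", chroot or "(none)")
    for host, port in servers:
        for cmd in ("ruok", "conf"):
            try:
                reply = four_letter_word(host, port, cmd)
            except OSError as exc:
                reply = "FAILED: {}".format(exc)
            print("--- {}:{} {}".format(host, port, cmd))
            print(reply)
```

Run it with the same string that is in ZK_HOST as the only argument.  Comparing
the "conf" output from each server with the hosts in the connection string
should at least show what each ZK node reports about itself.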

Anyone have any ideas?  Thanks
