> The zookeeper list isn't really the right place for most of this. The
> residents of this list will have zero knowledge of how Solr uses
> zookeeper. I'm on both lists -- and I'm a lot more familiar with Solr
> than Zookeeper.
I think I see that now. I was under the impression that this might be more
of a ZooKeeper question, since clusterstate.json is where I think I will
run into issues (a sketch of how I'm inspecting it is below, after my sig).
But I guess that JSON structure is more of a Solr design, and ZooKeeper
doesn't really have any say in it?

> On second thought, DON'T TRY THIS.
>
> I wouldn't want to take the chance that the DELETE would actually try to
> contact the mentioned servers and truly delete the collection.

I agree. The only part of the scenario I'm particularly worried about is
the API calls through Solr (they look like they might try to do extra
things I don't want). I will definitely be testing this before actually
doing it, but I posted the questions to see if anyone has other ideas.

NOTE: The example I gave was minimal for the sake of making the idea easier
to understand. It confuses me sometimes too. My real ZK cluster is about 18
nodes, spans multiple physical locations, and manages over 200 collections.
I'm not worried about ZooKeeper performance. Part of the reason for needing
to do this is that the current setup causes problems when trying to create
a new collection with the same name in a different physical location with a
different schema. Also, adding/removing ZooKeeper nodes can be problematic
to manage over a large cluster (partly because 3.4.6 doesn't support live
config changes, but 3.5.0+ does).

> If you have a collection that has replicas on all four Solr servers,
> then your four solr servers are *one* SolrCloud cluster, not two. If
> they were separate clusters, it would not be possible to have one
> collection with shards/replicas on all four servers.

I agree about the current *one* SolrCloud. I technically do have a single
massive SolrCloud because of this fact (which also poses some potential
issues). All the dynamic collections have always been logically separated,
and only the static collections span the whole SolrCloud cluster.

> The first thing you need to do is rearrange the static collection so it
> only lives on two of the Solr servers. To do this, you can use
> ADDREPLICA if additional replicas are required, then DELETEREPLICA to
> remove it from two of the servers.

Unfortunately, the final goal is for each resulting SolrCloud cluster to
have knowledge of every static collection, while each local ZooKeeper
cluster has no knowledge of the nodes in the other clusters (effectively
duplicating the collection in each cluster). So there is no rearranging to
do, only removing the "extra" nodes after splitting the ZooKeeper cluster
(example Collections API calls are sketched below as well). This may sound
counterproductive, but the static collections are managed outside of Solr.
In the event that I do need to update the content in one, I can reload the
collection on a per-location basis for a less risky deployment. It's a bit
scary when you need to reload a large static collection across 20+ Solr
servers.

-- Eric
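P.S. In case it helps anyone following along, this is roughly how I've been
inspecting the state Solr keeps in ZooKeeper. It's a minimal sketch using
the kazoo Python client; the connect string and chroot are made up, and
note that older SolrCloud versions keep everything in a single
/clusterstate.json while newer ones use per-collection state.json znodes.

    from kazoo.client import KazooClient

    # Hypothetical ensemble and /solr chroot -- substitute your own.
    zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181/solr')
    zk.start()

    # On older SolrCloud releases the whole cluster state is one znode;
    # newer releases keep /collections/<name>/state.json per collection.
    data, stat = zk.get('/clusterstate.json')
    print(data.decode('utf-8'))

    zk.stop()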
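And for the ADDREPLICA/DELETEREPLICA route, these are the kinds of
Collections API calls I would script. Just a sketch: the Solr URL,
collection, shard, and core_node names are made up, and I'd dry run it
against a test cluster first.

    import requests

    collections_api = 'http://solr1:8983/solr/admin/collections'

    # Add a replica of shard1 on a specific node (node name as Solr sees it).
    r = requests.get(collections_api, params={
        'action': 'ADDREPLICA',
        'collection': 'static_example',
        'shard': 'shard1',
        'node': 'solr3:8983_solr',
    })
    r.raise_for_status()

    # Then drop the replica that should no longer exist; 'replica' is the
    # core_node name shown in the cluster state.
    r = requests.get(collections_api, params={
        'action': 'DELETEREPLICA',
        'collection': 'static_example',
        'shard': 'shard1',
        'replica': 'core_node2',
    })
    r.raise_for_status()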
