You're right - ZK simply manages the shared config information for the
cluster and plays no part in queries or transactions between the actual nodes,
except insofar as they depend on that shared config information (e.g., what
the shards are and where the nodes are).
Somewhere in there I was simply making the point that ZK manages blobs of
data of up to roughly 1MB each, so a database of the status of millions of
Solr cores would be beyond what ZK can readily manage.
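As a rough back-of-envelope check, here is a small Python sketch of that size argument. The 100 bytes per core status is purely an assumed figure for illustration, as is the idea that all core status would live in one blob; the limit is only approximately ZK's default per-znode size (the setting is jute.maxbuffer).

```python
# Back-of-envelope check; every number here is an assumption for illustration.
ZNODE_LIMIT_BYTES = 1024 * 1024   # roughly ZK's default per-znode limit (jute.maxbuffer)
BYTES_PER_CORE_STATUS = 100       # hypothetical size of one core's status entry


def status_blob_bytes(num_cores, bytes_per_core=BYTES_PER_CORE_STATUS):
    """Size of a single blob holding one status entry per core."""
    return num_cores * bytes_per_core


def fits_in_one_znode(num_cores):
    """True if the combined status blob stays within the assumed limit."""
    return status_blob_bytes(num_cores) <= ZNODE_LIMIT_BYTES


for cores in (100, 10_000, 1_000_000):
    print(cores, status_blob_bytes(cores), fits_in_one_znode(cores))
```

Under these assumptions, a few thousand cores fit comfortably, while a million cores would need about 100 times the limit.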
-- Jack Krupansky
-----Original Message-----
From: Upayavira
Sent: Sunday, June 09, 2013 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: LotsOfCores feature
On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote:
> AFAICT, SolrCloud addresses the use case of distributed update for a
> relatively smaller number of collections (dozens?) that have a relatively
> larger number of rows - billions over a modest to moderate number of nodes
> (a handful to a dozen or dozens). So, maybe dozens of collections (some
> people still call these "cores") that distribute hundreds of millions if
> not billions of rows over dozens (or potentially low hundreds) of nodes.
> Technically, ZK was designed for thousands of nodes, but I don't think that
> was for the use case of distributed query that constantly fans out to all
> shards.
Not sure I get what you're saying here. ZK was designed for thousands of
nodes, and it works by making sure that each node keeps an
active cache of all relevant data, so it doesn't need to poll
ZK for the data. Therefore, as far as ZK is concerned it is irrelevant
how many hosts are involved in any particular transaction - the node
that is handling the distribution consults its cache of the list of
active nodes, decides which one to hit, and off it goes, no interaction
with ZK required.
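To make that concrete, here is a toy Python sketch of the pattern, not SolrCloud's or ZK's actual code. FakeCoordinator and Node are made-up names standing in for ZK and a Solr node. One simplification: real ZK watches only signal that something changed and the client then re-reads, whereas here the new list is pushed directly.

```python
import random


class FakeCoordinator:
    """Hypothetical stand-in for ZK: holds the live-node list and notifies watchers."""

    def __init__(self, live_nodes):
        self._live_nodes = list(live_nodes)
        self._watchers = []
        self.reads = 0  # how many times a client fetched data from the coordinator

    def get_live_nodes(self):
        self.reads += 1
        return list(self._live_nodes)

    def watch(self, callback):
        self._watchers.append(callback)

    def set_live_nodes(self, live_nodes):
        self._live_nodes = list(live_nodes)
        for callback in self._watchers:
            # Simplification: push the new list instead of signal-then-re-read.
            callback(list(self._live_nodes))


class Node:
    """Hypothetical node that routes requests using only its local cache."""

    def __init__(self, coordinator):
        self.cache = coordinator.get_live_nodes()  # single read at startup
        coordinator.watch(self._on_change)

    def _on_change(self, live_nodes):
        self.cache = live_nodes  # cache is refreshed only when something changes

    def pick_target(self):
        return random.choice(self.cache)  # no coordinator call per request


zk = FakeCoordinator(["node1", "node2", "node3"])
node = Node(zk)
targets = [node.pick_target() for _ in range(1000)]
print("coordinator reads after 1000 requests:", zk.reads)

zk.set_live_nodes(["node1", "node3"])  # node2 goes away
print("cache after change:", node.cache)
```

However many requests the node distributes, the coordinator sees one read at startup plus one notification per change to the node list.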
Or am I missing something?
Upayavira