Re: LotsOfCores feature

Jack Krupansky Sun, 09 Jun 2013 13:57:09 -0700

You're right - ZK is simply managing the shared config information for the cluster and has no part in queries or transactions between the actual nodes, except insofar as they depend on shared config information (e.g., what the shards are and where the nodes are).

Somewhere in there I was simply making the point that ZK manages 1MB-size blobs of data (roughly the default per-znode limit), so a database of the status of millions of Solr cores would be beyond what can readily be managed by ZK.
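To put rough numbers on that, here is a small sketch that serializes a status record for each of N cores into one JSON blob and checks it against an assumed limit of 1,048,576 bytes per ZK node (ZK's real default is set by jute.maxbuffer and is just under 1 MB). The record layout and field names are hypothetical, made up for illustration; they are not Solr's actual cluster state format.

```python
import json

# Assumed per-znode data limit (~1 MB). ZooKeeper's actual default is
# controlled by the jute.maxbuffer setting.
ZNODE_LIMIT_BYTES = 1024 * 1024


def status_blob(num_cores: int) -> bytes:
    """Serialize a hypothetical status record for every core into one JSON blob."""
    cores = {
        f"core{i:07d}": {"state": "active", "node": f"host{i % 100}:8983"}
        for i in range(num_cores)
    }
    # Compact separators keep the estimate on the optimistic (small) side.
    return json.dumps(cores, separators=(",", ":")).encode("utf-8")


def fits_in_one_znode(num_cores: int) -> bool:
    """True if the status of num_cores cores fits under the assumed limit."""
    return len(status_blob(num_cores)) <= ZNODE_LIMIT_BYTES
```

With these assumptions each record is roughly 50 bytes, so 10,000 cores fit but 100,000 already overflow a single node, and millions of cores would need about 50 times the limit.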


-- Jack Krupansky

-----Original Message-----
From: Upayavira
Sent: Sunday, June 09, 2013 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: LotsOfCores feature



On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote:

AFAICT, SolrCloud addresses the use case of distributed update for a
relatively smaller number of collections (dozens?) that have a relatively
larger number of rows - billions over a modest to moderate number of nodes
(a handful to a dozen or dozens). So, maybe dozens of collections (some
people still call these "cores") that distribute hundreds of millions if not
billions of rows over dozens (or potentially low hundreds) of nodes.
Technically, ZK was designed for thousands of nodes, but I don't think that
was for the use case of distributed query that constantly fans out to all
shards.


Not sure I get what you're saying here. ZK was designed for thousands of
nodes, and the way it works is by making sure that each node has an
active cache of all relevant data within it so they don't need to poll
ZK for the data. Therefore, as far as ZK is concerned it is irrelevant
how many hosts are involved in any particular transaction - the node
that is handling the distribution consults its cache of the list of
active nodes, decides which one to hit, and off it goes, no interaction
with ZK required.
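A minimal sketch of that pattern in plain Python, with no real ZooKeeper client involved. FakeZk stands in for ZK and counts reads. The hypothetical ClusterStateCache reads the state once, re-reads only when a one-shot watch fires, and answers every routing decision from memory. All class and method names here are illustrative assumptions, not Solr or ZooKeeper APIs.

```python
import random
from typing import Callable, Dict, List

# Hypothetical cluster state: shard name -> list of active replica hosts.
ClusterState = Dict[str, List[str]]


class FakeZk:
    """Stand-in for ZooKeeper that counts how often the state is read."""

    def __init__(self, state: ClusterState) -> None:
        self._state = state
        self.reads = 0
        self._watchers: List[Callable[[], None]] = []

    def get_state(self, watcher: Callable[[], None]) -> ClusterState:
        # Like a ZK watch, the watcher is registered for the next change only.
        self.reads += 1
        self._watchers.append(watcher)
        return {shard: list(hosts) for shard, hosts in self._state.items()}

    def set_state(self, state: ClusterState) -> None:
        # Update the state, then fire (and clear) all pending watchers.
        self._state = state
        watchers, self._watchers = self._watchers, []
        for fire in watchers:
            fire()


class ClusterStateCache:
    """Node-local copy of the cluster state, refreshed only on change."""

    def __init__(self, zk: FakeZk) -> None:
        self._zk = zk
        self._state = zk.get_state(self._on_change)

    def _on_change(self) -> None:
        # Re-read the state and re-arm the one-shot watch.
        self._state = self._zk.get_state(self._on_change)

    def route(self, shard: str) -> str:
        # Pick an active replica using only the cached copy; no ZK round trip.
        return random.choice(self._state[shard])

    def fan_out(self) -> Dict[str, str]:
        # A distributed query picks one replica per shard, again from cache.
        return {shard: self.route(shard) for shard in self._state}
```

Calling fan_out() any number of times leaves zk.reads at 1; only a set_state() call, which fires the watch, causes a second read. That is the sense in which the number of hosts touched by a query doesn't matter to ZK.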

Or am I missing something?

Upayavira
