[Solr Wiki] Update of "SolrCloud" by YonikSeeley

Apache Wiki Thu, 03 Dec 2009 13:00:14 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: a start on some high level requirements / design.
http://wiki.apache.org/solr/SolrCloud

--------------------------------------------------

New page:
=== High level design goals ===
These are long term goals for SolrCloud.  Many of these features will not be 
developed in the first versions, but we're designing for the long haul.

===== High Availability and Fault Tolerance =====
No external load balancer should be required.  We should eliminate any single 
points of failure (i.e. start with a design that will allow us to add this 
feature at a later point with minimal changes to things like the zookeeper 
schema)

===== Cluster resizing and rebalancing =====
To grow the cluster or to rebalance due to hotspots, shards should be 
resizable.  Pieces of existing shards should be able to be split off and 
assigned to new servers.

===== Clients should not have to know about cluster layout. =====
A simple client (like a browser) should be able to hit any solr search server 
in the cluster with a request, and that search server should be able to execute 
a distributed search against the cluster as a whole, including load balancing 
and failover, to return the results to the client.

===== Open API =====
The zookeeper schema should be well defined and public, allowing other software 
components other than the master node to inspect and change the cluster via 
zookeeper.  A task like creating a new collection eventually be achievable by 
creating the correct znodes/files in zookeeper, w/o having to talk to any solr 
servers.

===== Support various levels of custom clusters =====
Support various splits between how much the user manages and how much solr 
manages.  One could have a set of servers in zookeeper with defined indexes 
(shards or complete copies) and want to just use the client search 
capabilities.  One may also want a traditional master/slave relationship, even 
if more advanced options are available.

===== Support user specified partitioning =====
Partitioning of documents by geographic region, time, user, etc, brings huge 
performance benefits by allowing only a part of the cluster to be queried.  
This should be an option even in conjunction with the most advanced modes of 
operation (automatic document->shard assignment, index rebalancing, no single 
points of failure, etc) presumably by allowing user-specified hashcodes for 
documents.

=== Entities we want to model or record in zookeeper: ===
'''host'''

 * The actual physical (or virtual, as it may be) machine
 * operating system, RAM, disk
 * some sort of metric for what level of load should be placed on this machine

'''node'''

 * a single JVM running one or more solr cores

'''collection'''

 * A collection of documents sharing the same schema

'''shard'''

 * A piece of a collection

'''core'''

 * A solr core (is there a better name we can use for this?)
 * If a core and a shard have a one-to-one mapping, they could be redundant.

'''role'''

 * is a core a master, a search node, a spell-checker node, etc
 * we probably want a generic way to map from role to different configs or 
config overrides

'''network topology'''

 * switch, rack, data center, etc

Resources

http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers

[Solr Wiki] Update of "SolrCloud" by YonikSeeley

Reply via email to