Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: a start on some high level requirements / design.
http://wiki.apache.org/solr/SolrCloud

--------------------------------------------------

New page:
=== High level design goals ===
These are long term goals for SolrCloud. Many of these features will not be developed in the first versions, but we're designing for the long haul.

===== High Availability and Fault Tolerance =====
No external load balancer should be required. We should eliminate any single points of failure (that is, start with a design that will allow us to add this feature at a later point with minimal changes to things like the ZooKeeper schema).

===== Cluster resizing and rebalancing =====
To grow the cluster or to rebalance due to hotspots, shards should be resizable. It should be possible to split off pieces of existing shards and assign them to new servers.

===== Clients should not have to know about cluster layout. =====
A simple client (like a browser) should be able to hit any Solr search server in the cluster with a request, and that search server should be able to execute a distributed search against the cluster as a whole, including load balancing and failover, and return the results to the client.

===== Open API =====
The ZooKeeper schema should be well defined and public, allowing software components other than the master node to inspect and change the cluster via ZooKeeper. A task like creating a new collection should eventually be achievable by creating the correct znodes/files in ZooKeeper, without having to talk to any Solr servers.

===== Support various levels of custom clusters =====
Support various splits between how much the user manages and how much Solr manages. One could have a set of servers in ZooKeeper with defined indexes (shards or complete copies) and want to just use the client search capabilities. One may also want a traditional master/slave relationship, even if more advanced options are available.
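
As a rough illustration of the "clients should not have to know about cluster layout" goal, the sketch below shows how a search server receiving a request might query one live replica per shard, with random replica choice standing in for load balancing and a retry loop standing in for failover. Everything here is an assumption made for illustration: the `CLUSTER` layout, the host names, the `fake_send` transport, and the score-based merge are hypothetical and not part of the design on this page. In the real system the layout would presumably be read from ZooKeeper.

```python
import random

# Hypothetical cluster layout: shard name -> replica addresses.
# Assumption: in the real design this would be read from ZooKeeper.
CLUSTER = {
    "shard1": ["host1:8983", "host2:8983"],
    "shard2": ["host3:8983", "host4:8983"],
}

# Hypothetical canned scores so the example is deterministic.
FAKE_SCORES = {"host2:8983": 2.0, "host3:8983": 3.0}
DOWN = {"host1:8983", "host4:8983"}  # replicas we pretend are unreachable


def fake_send(address):
    """Stand-in for a network request to one replica (not a real Solr call)."""
    if address in DOWN:
        raise ConnectionError(address)
    return [{"id": "doc@" + address, "score": FAKE_SCORES[address]}]


def query_shard(replicas, send):
    """Try replicas in random order until one of them answers."""
    candidates = list(replicas)
    random.shuffle(candidates)  # crude load balancing across replicas
    for address in candidates:
        try:
            return send(address)
        except ConnectionError:
            continue  # failover: move on to the next replica of this shard
    raise RuntimeError("no live replica for shard")


def distributed_search(cluster, send):
    """Query one live replica per shard and merge the hits by score."""
    hits = []
    for shard in sorted(cluster):
        hits.extend(query_shard(cluster[shard], send))
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


results = distributed_search(CLUSTER, fake_send)
```

The client only ever talks to the server running `distributed_search`; which replicas exist, and which of them are down, stays hidden behind it.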

===== Support user specified partitioning =====
Partitioning of documents by geographic region, time, user, etc., brings huge performance benefits by allowing only a part of the cluster to be queried. This should be an option even in conjunction with the most advanced modes of operation (automatic document->shard assignment, index rebalancing, no single points of failure, etc.), presumably by allowing user-specified hashcodes for documents.

=== Entities we want to model or record in zookeeper: ===
'''host'''
 * The actual physical (or virtual, as it may be) machine
 * operating system, RAM, disk
 * some sort of metric for what level of load should be placed on this machine

'''node'''
 * a single JVM running one or more solr cores

'''collection'''
 * A collection of documents sharing the same schema

'''shard'''
 * A piece of a collection

'''core'''
 * A solr core (is there a better name we can use for this?)
 * If a core and a shard have a one-to-one mapping, they could be redundant.

'''role'''
 * is a core a master, a search node, a spell-checker node, etc.
 * we probably want a generic way to map from role to different configs or config overrides

'''network topology'''
 * switch, rack, data center, etc.

Resources
http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers
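
One possible reading of "user-specified hashcodes", together with the shard-splitting goal, is sketched below. It assumes each shard owns a half-open range of a 32-bit hash space, that a document may carry an optional `hashcode` field overriding the hash of its id, and that splitting a shard means halving its range. None of this is specified on the page: the field name, the range scheme, and the helper names are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed size of the hash space


def doc_hash(doc):
    """Use the user-specified hashcode if present, else hash the document id."""
    if "hashcode" in doc:
        return doc["hashcode"] % HASH_SPACE
    return zlib.crc32(doc["id"].encode("utf-8")) % HASH_SPACE


def shard_for(doc, ranges):
    """Return the shard whose half-open range [start, end) covers the hash."""
    h = doc_hash(doc)
    for shard, (start, end) in ranges.items():
        if start <= h < end:
            return shard
    raise LookupError("no shard covers hash %d" % h)


def split_shard(ranges, shard, new_shard):
    """Halve one shard's range and give the upper half to new_shard."""
    start, end = ranges[shard]
    middle = (start + end) // 2
    updated = dict(ranges)  # leave the caller's mapping untouched
    updated[shard] = (start, middle)
    updated[new_shard] = (middle, end)
    return updated


# Two shards covering the whole hash space between them.
ranges = {
    "shard1": (0, HASH_SPACE // 2),
    "shard2": (HASH_SPACE // 2, HASH_SPACE),
}

# A document pinned to a chosen part of the hash space by its owner.
pinned_doc = {"id": "doc-42", "hashcode": 100}

# Grow the cluster: shard1 gives up the upper half of its range to shard3.
resized = split_shard(ranges, "shard1", "shard3")
```

Under this scheme a query restricted to one region, time span, or user only needs the shards whose ranges cover the corresponding hashcodes, and a split only moves the documents in the upper half of one range.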
