Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "SolrCloud" page has been changed by ShawnHeisey:
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=88&rev2=89

Comment:
Added information about solr port, zookeeper, and solr.xml.

If you haven't yet, go through the simple [[http://lucene.apache.org/solr/tutorial.html|Solr Tutorial]] to familiarize yourself with Solr. Note: reset all configuration and remove documents from the tutorial before going through the cloud features. Copying the example directories with pre-existing Solr indexes will cause document counts to be off.
- Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination - think of it as a distributed filesystem that contains information about all of the Solr servers.
+ Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination - think of it as a distributed filesystem that contains information about all of the Solr servers
+
+ If you want to use a port other than 8983 for Solr, see the note about solr.xml under Parameter Reference below.
=== Example A: Simple two shard cluster ===
{{http://people.apache.org/~markrmiller/2shard2server.jpg}}
@@ -165, +167 @@
Create http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
About the params
+
* '''name''': The name of the collection to be created
* '''numShards''': The number of logical shards (sometimes called slices) to be created as part of the collection
* '''replicationFactor''': The number of copies of each document (or, the number of physical replicas to be created for each logical shard of the collection). A replicationFactor of 3 means that there will be 3 replicas (one of which is normally designated to be the leader) for each logical shard. NOTE: in Solr 4.0, replicationFactor was the number of *additional* copies as opposed to the total number of copies.
* '''maxShardsPerNode''': A create operation will spread numShards*replicationFactor shard replicas across your live Solr nodes, distributing them fairly and never placing two replicas of the same shard on the same Solr node. If a Solr node is not live when the create operation is carried out, it will not get any part of the new collection. To prevent too many replicas from being created on a single Solr node, use maxShardsPerNode to limit how many replicas the create operation may create on each node; the default is 1. If the create operation cannot fit all numShards*replicationFactor replicas of the collection on your live Solr nodes, it will not create anything at all.
* '''createNodeSet''': If not provided, the create operation will spread shard replicas across all of your live Solr nodes. You can provide the "createNodeSet" parameter to change the set of nodes to spread the shard replicas across. The format of values for this param is "<node-name1>,<node-name2>,...,<node-nameN>", e.g. "localhost:8983_solr,localhost:8984_solr,localhost:8985_solr"
- Delete http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection
@@ -298, +300 @@
=== SolrCloud Instance Params ===
- These are set in solr.xml, but by default they are setup in solr.xml to also work with system properties.
+ These are set in solr.xml, but by default they are set up in solr.xml to also work with system properties. Important note: the port found here will be used (via ZooKeeper) to tell the rest of the cluster which port each Solr instance is using. The default port is 8983. The example solr.xml uses the jetty.port system property, so if you want to use a port other than 8983, you must either set this property when starting Solr or change solr.xml to fit your particular installation.
||host ||Defaults to the first local host address found ||If the wrong host address is found automatically, you can override the host address with this param. ||
||hostPort ||Defaults to the jetty.port system property ||The port that Solr is running on. By default this is found by looking at the jetty.port system property. ||
||hostContext ||Defaults to solr ||The context path for the Solr webapp. (Note: in Solr 4.0, the hostContext was not allowed to contain "/" or "_" characters. Beginning with Solr 4.1, this limitation was removed, and it is recommended that you specify the leading slash. When running with the example Jetty configs, the "hostContext" system property can be used to control both the servlet context used by Jetty and the hostContext used by SolrCloud, e.g. {{{-DhostContext=/solr}}}) ||
+
+
=== SolrCloud Instance ZooKeeper Params ===
@@ -372, +376 @@
}}}
=== Zookeeper chroot ===
If you are already using Zookeeper for other applications and want to keep the ZNodes organized by application, or if you want multiple separate SolrCloud clusters to share one Zookeeper ensemble, you can use Zookeeper's "chroot" option. From Zookeeper's documentation: http://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#ch_zkSessions
+
{{{
An optional "chroot" suffix may also be appended to the connection string. This will run the client commands while interpreting all paths relative to this root (similar to the unix chroot command). If used the example would look like: "127.0.0.1:4545/app/a" or "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a" where the client would be rooted at "/app/a" and all paths would be relative to this root - ie getting/setting/etc... "/foo/bar" would result in operations being run on "/app/a/foo/bar" (from the server perspective).
}}}
To use this Zookeeper feature, simply start Solr with the "chroot" suffix in the zkHost parameter.
For example:
+
{{{
java -DzkHost=localhost:9983/foo/bar -jar start.jar
}}}
or
+
{{{
java -DzkHost=zoo1:9983,zoo2:9983,zoo3:9983/foo/bar -jar start.jar
}}}
'''NOTE:''' With Solr 4.0 you'll need to create the initial path in ZooKeeper before starting Solr. Since Solr 4.1, the initial path is created automatically if you are using either ''bootstrap_conf'' or ''bootstrap_confdir''.
+
== Known Limitations ==
A small number of Solr search components do not support distributed search. In some cases, a component may never get distributed support; in other cases, it may just be a matter of time and effort. All of the search components that do not yet support standard distributed search have the same limitation with SolrCloud. You can pass distrib=false to use these components on a single SolrCore.
The Grouping feature only works if groups are in the same shard. Proper support will require custom hashing, and there is already a JIRA issue working towards this.
== Glossary ==
- ||'''Collection''': ||A single search index.||
+ ||'''Collection''': ||A single search index. ||
||'''Shard''': ||A logical section of a single collection (also called Slice). Sometimes people will talk about "Shard" in a physical sense (a manifestation of a logical shard) ||
||'''Replica''': ||A physical manifestation of a logical Shard, implemented as a single Lucene index on a SolrCore ||
- ||'''Leader''': ||One Replica of every Shard will be designated as a Leader to coordinate indexing for that Shard||
+ ||'''Leader''': ||One Replica of every Shard will be designated as a Leader to coordinate indexing for that Shard ||
||'''SolrCore''': ||Encapsulates a single physical index. One or more make up logical shards (or slices) which make up a collection. ||
||'''Node''': ||A single instance of Solr. A single Solr instance can have multiple SolrCores that can be part of any number of collections. ||
||'''Cluster''': ||All of the nodes you are using to host SolrCores. ||
@@ -401, +409 @@
== FAQ ==
* '''Q:''' I'm seeing lots of session timeout exceptions. What should I do?
- . '''A:''' Try raising the ZooKeeper session timeout by editing solr.xml - see the zkClientTimeout attribute. The minimum session timeout is 2 times your ZooKeeper defined tickTime. The maximum is 20 times the tickTime. The default tickTime is 2 seconds. You should avoiding raising this for no good reason, but it should be high enough that you don't see a lot of false session timeouts due to load, network lag, or garbage collection pauses. Some environments might need to go as high as 30-60 seconds.
+ . '''A:''' Try raising the ZooKeeper session timeout by editing solr.xml (see the zkClientTimeout attribute). The minimum session timeout is 2 times your ZooKeeper-defined tickTime, and the maximum is 20 times the tickTime. The default tickTime is 2 seconds. You should avoid raising this without good reason, but it should be high enough that you don't see a lot of false session timeouts due to load, network lag, or garbage collection pauses. Some environments might need to go as high as 30-60 seconds.
* '''Q:''' How do I use SolrCloud, but distribute updates myself?
. '''A:''' Add the following UpdateProcessorFactory somewhere in your update chain: '''NoOpDistributingUpdateProcessorFactory'''
* '''Q:''' What is the difference between a Collection and a SolrCore?