[Solr Wiki] Update of "SolrCloud" by ShawnHeisey

Apache Wiki Sat, 26 Jan 2013 11:16:07 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The "SolrCloud" page has been changed by ShawnHeisey:
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=88&rev2=89

Comment:
Added information about solr port, zookeeper, and solr.xml.

  
  If you haven't yet, go through the simple 
[[http://lucene.apache.org/solr/tutorial.html|Solr Tutorial]] to familiarize 
yourself with Solr. Note: reset all configuration and remove documents from the 
tutorial before going through the cloud features. Copying the example 
directories with pre-existing Solr indexes will cause document counts to be off.
  
- Solr embeds and uses Zookeeper as a repository for cluster configuration and 
coordination - think of it as a distributed filesystem that contains 
information about all of the Solr servers.
+ Solr embeds and uses Zookeeper as a repository for cluster configuration and 
coordination - think of it as a distributed filesystem that contains 
information about all of the Solr servers
+ 
+ If you want to use a port other than 8983 for Solr, see the note about 
solr.xml under Parameter Reference below.
  
  === Example A: Simple two shard cluster ===
  {{http://people.apache.org/~markrmiller/2shard2server.jpg}}
@@ -165, +167 @@

  Create 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
  
  About the params
+ 
   * '''name''': The name of the collection to be created
   * '''numShards''': The number of logical shards (sometimes called slices) to 
be created as part of the collection
   * '''replicationFactor''': The number of copies of each document (or, the 
number of physical replicas to be created for each logical shard of the 
collection.)  A replicationFactor of 3 means that there will be 3 replicas (one 
of which is normally designated to be the leader) for each logical shard.  
NOTE: in Solr 4.0, replicationFactor was the number of *additional* copies as 
opposed to the total number of copies.
   * '''maxShardsPerNode''': A create operation spreads 
numShards*replicationFactor shard replicas across your live Solr nodes, 
distributing them fairly and never placing two replicas of the same shard on 
the same Solr node. A node that is not live when the create operation is 
carried out will not get any part of the new collection. To prevent too many 
replicas being created on a single Solr node, use maxShardsPerNode to limit 
how many replicas the create operation is allowed to create on each node; the 
default is 1. If the full set of numShards*replicationFactor replicas cannot 
fit on your live nodes, nothing is created at all.
   * '''createNodeSet''': If not provided, the create operation spreads the 
shard replicas across all of your live Solr nodes. Provide the 
"createNodeSet" parameter to change the set of nodes to spread the shard 
replicas across. The format of values for this param is 
"<node-name1>,<node-name2>,...,<node-nameN>" - e.g. 
"localhost:8983_solr,localhost:8984_solr,localhost:8985_solr"
- 
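The sizing rule in the parameter list above can be checked before issuing a CREATE request. The following is a minimal sketch, not part of Solr: the helper names `can_place` and `create_url` are hypothetical, and the capacity check is a simplified reading of the maxShardsPerNode description (it ignores createNodeSet and the rule that two replicas of one shard never share a node).

```python
from urllib.parse import urlencode


def can_place(num_shards, replication_factor, max_shards_per_node, live_nodes):
    """Return True if numShards*replicationFactor replicas fit on the live nodes."""
    total_replicas = num_shards * replication_factor
    return total_replicas <= max_shards_per_node * live_nodes


def create_url(base, name, num_shards, replication_factor, max_shards_per_node=1):
    """Build a Collections API CREATE URL from the parameters described above."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return base + "/admin/collections?" + urlencode(params)


# 3 shards x 4 replicas = 12 replicas; four live nodes need maxShardsPerNode >= 3.
print(can_place(3, 4, 1, 4))
print(can_place(3, 4, 3, 4))
print(create_url("http://localhost:8983/solr", "mycollection", 3, 4, 3))
```

With the example request above (numShards=3, replicationFactor=4) and the default maxShardsPerNode of 1, at least twelve live nodes would be needed by this reading; otherwise nothing is created.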
  
  Delete 
http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection
  
@@ -298, +300 @@

  
  
  === SolrCloud Instance Params ===
- These are set in solr.xml, but by default they are set up in solr.xml to 
also work with system properties.
+ These are set in solr.xml, but by default they are set up in solr.xml to 
also work with system properties.  Important note: the port found here will be 
used (via zookeeper) to inform the rest of the cluster what port each Solr 
instance is using.  The default port is 8983.  The example solr.xml uses the 
jetty.port system property, so to use a port other than 8983 you must either 
set this property when starting Solr or change solr.xml to fit your 
particular installation.
  ||host ||Defaults to the first local host address found ||If the wrong host 
address is found automatically, you can override the host address with this 
param. ||
  ||hostPort ||Defaults to the jetty.port system property ||The port that Solr 
is running on - by default this is found by looking at the jetty.port system 
property. ||
  ||hostContext ||Defaults to solr ||The context path for the Solr webapp.  
(Note: in Solr 4.0, the hostContext could not contain "/" or "_" characters.  
Beginning with Solr 4.1, this limitation was removed, and it is recommended 
that you specify the leading slash.  When running with the example jetty 
configs, the "hostContext" system property can be used to control both the 
servlet context used by jetty and the hostContext used by SolrCloud, e.g. 
{{{-DhostContext=/solr}}}) ||
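
To illustrate how a solr.xml value can fall back to a default when a system property is not set, here is a small sketch of `${name:default}` placeholder resolution. It assumes the example solr.xml declares the port as something like `hostPort="${jetty.port:8983}"`; the `resolve` function is a hypothetical illustration and is not Solr's own implementation.

```python
import re

# Matches ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value, system_properties):
    """Replace each placeholder with the property value, else its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in system_properties:
            return system_properties[name]
        if default is None:
            raise KeyError("no value or default for " + name)
        return default
    return PLACEHOLDER.sub(substitute, value)


# No property set: the default after the colon is used.
print(resolve("${jetty.port:8983}", {}))
# Equivalent of starting Solr with -Djetty.port=8984.
print(resolve("${jetty.port:8983}", {"jetty.port": "8984"}))
```

Under this assumption, a Solr instance started on a non-default jetty port without setting the property would still advertise 8983 to the rest of the cluster, which is the situation the note above warns about.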
+ 
+ 
  
  
  === SolrCloud Instance ZooKeeper Params ===
@@ -372, +376 @@

  }}}
  === Zookeeper chroot ===
  If you are already using Zookeeper for other applications and you want to 
keep the ZNodes organized by application, or if you want multiple separate 
SolrCloud clusters to share one Zookeeper ensemble, you can use Zookeeper's 
"chroot" option. From Zookeeper's documentation: 
http://zookeeper.apache.org/doc/r3.3.6/zookeeperProgrammers.html#ch_zkSessions
+ 
  {{{
  An optional "chroot" suffix may also be appended to the connection string. 
This will run the client commands while interpreting all paths relative to this 
root (similar to the unix chroot command). If used the example would look like: 
"127.0.0.1:4545/app/a" or "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a" 
where the client would be rooted at "/app/a" and all paths would be relative to 
this root - ie getting/setting/etc... "/foo/bar" would result in operations 
being run on "/app/a/foo/bar" (from the server perspective).
  }}}
  To use this Zookeeper feature, simply start Solr with the "chroot" suffix in 
the zkHost parameter. For example:
+ 
  {{{
  java -DzkHost=localhost:9983/foo/bar -jar start.jar
  }}}
  or
+ 
  {{{
  java -DzkHost=zoo1:9983,zoo2:9983,zoo3:9983/foo/bar -jar start.jar
  }}}
  '''NOTE:''' With Solr 4.0 you'll need to create the initial path in 
Zookeeper before starting Solr. Since Solr 4.1, the initial path will 
automatically be created if you are using either ''bootstrap_conf'' or 
''bootstrap_confdir''.
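
The way a chroot suffix is read out of a zkHost value can be sketched as follows. This is an illustration of the connection string format quoted above, not ZooKeeper's client code; the function names `split_zk_host` and `server_path` are hypothetical.

```python
def split_zk_host(zk_host):
    """Split a zkHost string into (list of host:port servers, chroot path)."""
    slash = zk_host.find("/")
    if slash == -1:
        # No chroot suffix: paths are used as-is.
        return zk_host.split(","), ""
    return zk_host[:slash].split(","), zk_host[slash:]


def server_path(chroot, client_path):
    """Map a path as seen by the client to the path the server operates on."""
    return chroot.rstrip("/") + client_path


servers, chroot = split_zk_host("zoo1:9983,zoo2:9983,zoo3:9983/foo/bar")
print(servers)
print(chroot)
print(server_path("/app/a", "/foo/bar"))
```

Note that the chroot suffix appears once, after the last server in the list, and applies to the whole ensemble.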
+ 
  == Known Limitations ==
  A small number of Solr search components do not support distributed search. 
In some cases a component may never get distributed support; in other cases it 
may just be a matter of time and effort. All of the search components that do 
not yet support standard distributed search have the same limitation with 
SolrCloud. You can pass distrib=false to use these components on a single 
SolrCore.
  
  The Grouping feature only works if groups are in the same shard. Proper 
support will require custom hashing and there is already a JIRA issue working 
towards this.
  
  == Glossary ==
- ||'''Collection''': ||A single search index.||
+ ||'''Collection''': ||A single search index. ||
  ||'''Shard''': ||A logical section of a single collection (also called 
Slice). Sometimes people will talk about "Shard" in a physical sense (a 
manifestation of a logical shard) ||
  ||'''Replica''': ||A physical manifestation of a logical Shard, implemented 
as a single Lucene index on a SolrCore ||
- ||'''Leader''': ||One Replica of every Shard will be designated as a Leader 
to coordinate indexing for that Shard||
+ ||'''Leader''': ||One Replica of every Shard will be designated as a Leader 
to coordinate indexing for that Shard ||
  ||'''SolrCore''': ||Encapsulates a single physical index. One or more make up 
logical shards (or slices) which make up a collection. ||
  ||'''Node''': ||A single instance of Solr. A single Solr instance can have 
multiple SolrCores that can be part of any number of collections. ||
  ||'''Cluster''': ||All of the nodes you are using to host SolrCores. ||
@@ -401, +409 @@

  
  == FAQ ==
   * '''Q:''' I'm seeing lots of session timeout exceptions - what should I do?
-   . '''A:''' Try raising the ZooKeeper session timeout by editing solr.xml - 
see the zkClientTimeout attribute. The minimum session timeout is 2 times your 
ZooKeeper defined tickTime. The maximum is 20 times the tickTime. The default 
tickTime is 2 seconds. You should avoid raising this for no good reason, but 
it should be high enough that you don't see a lot of false session timeouts due 
to load, network lag, or garbage collection pauses. Some environments might 
need to go as high as 30-60 seconds. 
+   . '''A:''' Try raising the ZooKeeper session timeout by editing solr.xml - 
see the zkClientTimeout attribute. The minimum session timeout is 2 times your 
ZooKeeper defined tickTime. The maximum is 20 times the tickTime. The default 
tickTime is 2 seconds. You should avoid raising this for no good reason, but 
it should be high enough that you don't see a lot of false session timeouts due 
to load, network lag, or garbage collection pauses. Some environments might 
need to go as high as 30-60 seconds.
   * '''Q:''' How do I use SolrCloud, but distribute updates myself?
    . '''A:''' Add the following UpdateProcessorFactory somewhere in your 
update chain: '''NoOpDistributingUpdateProcessorFactory'''
   * '''Q:''' What is the difference between a Collection and a SolrCore?
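
Returning to the session timeout answer earlier in this FAQ, the bounds it gives (2 and 20 times the tickTime) can be worked through in a few lines. This is a sketch under the assumption that the ZooKeeper server keeps its default minimum and maximum session timeouts and clamps a requested timeout into that range; the function names are hypothetical.

```python
def session_timeout_bounds(tick_time_ms=2000):
    """Return (minimum, maximum) session timeout in ms for a given tickTime."""
    return 2 * tick_time_ms, 20 * tick_time_ms


def effective_timeout(requested_ms, tick_time_ms=2000):
    """Clamp a requested zkClientTimeout into the range allowed by tickTime."""
    low, high = session_timeout_bounds(tick_time_ms)
    return max(low, min(requested_ms, high))


print(session_timeout_bounds())        # default tickTime of 2 seconds
print(effective_timeout(15000))        # within bounds, unchanged
print(effective_timeout(60000))        # capped at 20 * tickTime
print(effective_timeout(60000, 3000))  # a larger tickTime allows 60 seconds
```

With the default tickTime of 2 seconds, the upper bound is 40 seconds, so a 60 second zkClientTimeout would also require a larger tickTime on the ZooKeeper side.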
