Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "SolrCloud" page has been changed by jayqhacker: http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=77&rev2=78

Comment: Mention that the zkRun localhost default does not work for a distributed ensemble.

## page was renamed from SolrCloud2
{{http://people.apache.org/~markrmiller/2shard4serverFull.jpg}}

<<TableOfContents()>>

== SolrCloud ==
SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities will let you set up a highly available, fault-tolerant cluster of Solr servers. Use SolrCloud when you want high-scale, fault-tolerant, distributed indexing and search capabilities. Look at the following 'Getting Started' section to quickly learn how to start up a cluster. There are three quick examples to follow, each showing how to start up a progressively more complicated cluster. After checking out the examples, consult the following sections for more detailed information as needed.

@@ -81, +78 @@

This example simply builds on the previous example by creating another copy of shard1 and shard2. Extra shard copies can be used for high availability and fault tolerance, or simply for increasing the query capacity of the cluster.

First, run through the previous example so we already have two shards and some documents indexed into each. Then simply make a copy of those two servers:

{{{
@@ -150, +146 @@
}}}

Now since we are running three embedded zookeeper servers as an ensemble, everything can keep working even if a server is lost. To demonstrate this, kill the exampleB server by pressing CTRL+C in its window and then browse to the [[http://localhost:8983/solr/#/~cloud|Solr Zookeeper Admin UI]] to verify that the zookeeper service still works.

Note that when running on multiple hosts, you will need to set `-DzkRun=hostname:port` on each host to the exact name and port used in `-DzkHost` -- the default `localhost` will not work.
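As a sketch of what that means in practice, assume three hosts named solr1, solr2, and solr3 (hypothetical names), each running Solr on port 8983 with an embedded ZooKeeper on port 9983. Each host would then be started along these lines:

{{{
# on solr1 (repeat on solr2 and solr3, changing only the
# -DzkRun value to name that host)
java -DzkRun=solr1:9983 \
     -DzkHost=solr1:9983,solr2:9983,solr3:9983 \
     -DnumShards=2 -jar start.jar
}}}

The value given to -DzkRun must appear verbatim in the -DzkHost list; that is how the node identifies itself within the ensemble.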
== ZooKeeper ==
Multiple Zookeeper servers running together for fault tolerance and high availability are called an ensemble. For production, it's recommended that you run an external zookeeper ensemble rather than having Solr run embedded servers. See the [[http://zookeeper.apache.org/|Apache ZooKeeper]] site for more information on downloading and running a zookeeper ensemble. More specifically, try [[http://zookeeper.apache.org/doc/r3.3.4/zookeeperStarted.html|Getting Started]] and [[http://zookeeper.apache.org/doc/r3.3.4/zookeeperAdmin.html|ZooKeeper Admin]]. It's actually pretty simple to get going. You can stick with having Solr run ZooKeeper, but keep in mind that a ZooKeeper cluster is not easily changed dynamically. Until further support is added to ZooKeeper, changes are best done with rolling restarts. Handling this in a separate process from Solr will usually be preferable. When Solr runs an embedded zookeeper server, it defaults to using the solr port plus 1000 for the zookeeper client port.
In addition, it defaults to adding one to the client port for the zookeeper server port, and two for the zookeeper leader election port. So in the first example with Solr running at 8983, the embedded zookeeper server used port 9983 for the client port and 9984,9985 for the server ports.

In terms of trying to make sure ZooKeeper is set up to be very fast, keep a few things in mind: Solr does not use ZooKeeper intensively - optimizations may not be necessary in many cases. Also, while adding more ZooKeeper nodes will help some with read performance, it will slightly hurt write performance. Again, Solr does not really do much with ZooKeeper when your cluster is in a steady state. If you do need to optimize ZooKeeper, here are a few helpful notes:

 1. ZooKeeper works best when it has a dedicated machine. ZooKeeper is a timely service and a dedicated machine helps ensure timely responses. A dedicated machine is not required, however.
 1. ZooKeeper works best when you put its transaction log and snapshots on different disk drives.
 1. If you do colocate ZooKeeper with Solr, using separate disk drives for Solr and ZooKeeper will help with performance.

== Managing collections via the Collections API ==
The collections API lets you manage collections. Under the hood, it generally uses the CoreAdmin API to manage SolrCores on each server - it's essentially sugar for actions that you could handle yourself if you made individual CoreAdmin API calls to each server you wanted an action to take place on.
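Since the collection actions are plain HTTP calls, any HTTP client will do; for example, with curl against one node of the cluster (the collection name and shard/replica counts here are illustrative):

{{{
# create a collection named mycollection with 3 shards, one extra replica per shard
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=1'

# reload it after a config change, or delete it when no longer needed
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'
curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection'
}}}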
Create http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4

Note: replicationFactor defines the maximum number of replicas created in addition to the leader from amongst the nodes currently running (i.e. nodes added later will not be used for this collection). Imagine you have a cluster with 20 nodes and want to add an additional smaller collection to your installation with 2 shards, each shard with a leader and two replicas. You would specify replicationFactor=2. Now six of your nodes will host this new collection and the other 14 will not host it.

Delete http://localhost:8983/solr/admin/collections?action=DELETE&name=mycollection

Reload http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection

== Creating cores via CoreAdmin ==
New Solr cores may also be created and associated with a collection via CoreAdmin.

@@ -222, +216 @@

{{{
http://localhost:8983/solr/collection1/select?collection=collection1_NY,collection1_NJ,collection1_CT
}}}

== Required Config ==
All of the required config is already set up in the example configs shipped with Solr.
The following is what you need to add if you are migrating old config files, or what you should not remove if you are starting with new config files.

=== schema.xml ===
You must have a _version_ field defined:

{{{
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
}}}

=== solrconfig.xml ===
You must have an UpdateLog defined - this should be defined in the updateHandler section.

{{{
<!-- Enables a transaction log, currently used for real-time get.
     "dir" - the target directory for transaction logs, defaults to the
     solr data directory. -->
<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>
}}}

You must have a replication handler called /replication defined:

{{{
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
}}}

You must have a realtime get handler called /get defined:

{{{
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
@@ -261, +250 @@
</requestHandler>
}}}

You must have the admin handlers defined:

{{{
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
}}}

And you must leave the admin path in solr.xml as the default:

{{{
<cores adminPath="/admin/cores"
}}}

@@ -278, +269 @@
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
}}}

If you do not want the '''DistributedUpdateProcessorFactory''' auto injected into your chain (say you want to use SolrCloud functionality, but you want to distribute updates yourself) then specify the following update processor factory in your chain: '''NoOpDistributingUpdateProcessorFactory'''

== Re-sizing a Cluster ==

@@ -296, +286 @@

If you want to use the Near Realtime search support, you will probably want to enable auto soft commits in your solrconfig.xml file before putting it into zookeeper.
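A minimal sketch of such a setting in the updateHandler section of solrconfig.xml (the 1000 ms interval is purely illustrative; tune it to your freshness needs):

{{{
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- open a new searcher over uncommitted documents at most once per second -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
}}}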
Otherwise you can send explicit soft commits to the cluster as you desire. See NearRealtimeSearch.

== Parameter Reference ==

=== Cluster Params ===
||numShards ||Defaults to 1 ||The number of shards to hash documents to. There will be one leader per shard and each leader can have N replicas. ||

=== SolrCloud Instance Params ===
These are set in solr.xml, but by default they are set up in solr.xml to also work with system properties.

||host ||Defaults to the first local host address found ||If the wrong host address is found automatically, you can override the host address with this param. ||
||hostPort ||Defaults to the jetty.port system property ||The port that Solr is running on - by default this is found by looking at the jetty.port system property. ||
||hostContext ||Defaults to solr ||The context path for the Solr webapp. ||

=== SolrCloud Instance ZooKeeper Params ===
||zkRun ||Defaults to localhost:<solrPort+1001> ||Causes Solr to run an embedded version of ZooKeeper. Set to the address of ZooKeeper on this node - this allows us to know who 'we are' in the list of addresses in the `zkHost` connect string. Simply using `-DzkRun` gets you the default value. Note this must be one of the exact strings from `zkHost`; in particular, the default `localhost` will not work for a multi-machine ensemble. ||
||zkHost ||No default ||The host address for ZooKeeper - usually this should be a comma separated list of addresses to each node in your ZooKeeper ensemble. ||
||zkClientTimeout ||Defaults to 15000 ||The time a client is allowed to not talk to ZooKeeper before having its session expired. ||

zkRun and zkHost are set using system properties. zkClientTimeout is set in solr.xml, but by default can also be set using a system property.

=== SolrCloud Core Params ===
||shardId ||Defaults to being automatically assigned based on numShards ||Allows you to specify the id used to group SolrCores into shards. ||

shardId can be configured in solr.xml for each core element as an attribute.

== Getting your Configuration Files into ZooKeeper ==

=== Config Startup Bootstrap Params ===
There are two different ways you can use system properties to upload your initial configuration files to ZooKeeper the first time you start Solr. Remember that these are meant to be used only on first startup or when overwriting configuration files - every time you start Solr with these system properties, any current configuration files in ZooKeeper may be overwritten when 'conf set' names match.

 1. Look at solr.xml and upload the conf for each SolrCore found. The 'config set' name will be the collection name for that SolrCore, and collections will use the 'config set' that has a matching name.

||bootstrap_conf ||No default ||If you pass -Dbootstrap_conf=true on startup, each SolrCore you have configured will have its configuration files automatically uploaded and linked to the collection that SolrCore is part of. ||

 2. Upload the given directory as a 'conf set' with the given name. No linking of collection to 'config set' is done. However, if only one 'conf set' exists, a collection will auto link to it.

||bootstrap_confdir ||No default ||If you pass -Dbootstrap_confdir=<directory> on startup, that specific directory of configuration files will be uploaded to ZooKeeper with a 'conf set' name defined by the system property below, collection.configName. ||
||collection.configName ||Defaults to configuration1 ||Determines the name of the conf set pointed to by bootstrap_confdir. ||

=== Command Line Util ===
The CLI tool also lets you upload config to ZooKeeper. It allows you to do it the same two ways that you can above. It also provides a few other commands that let you link collection sets to collections, make ZooKeeper paths or clear them, as well as download configs from ZooKeeper to the local filesystem.
@@ -348, +346 @@
  -s,--solrhome <arg>   for bootstrap, runzk: solrhome location
  -z,--zkhost <arg>     ZooKeeper host address
}}}

==== Examples ====
{{{
# try uploading a conf dir
@@ -364, +361 @@
}}}

==== Scripts ====
There are scripts in example/cloud-scripts that handle the classpath and class name for you if you are using Solr out of the box with Jetty. Commands then become:

{{{
sh zkcli.sh -cmd linkconfig -zkhost 127.0.0.1:9983 -collection collection1 -confname conf1 -solrhome example/solr
}}}

== Known Limitations ==
A small number of Solr search components do not support distributed search. In some cases, a component may never get distributed support; in other cases it may just be a matter of time and effort. All of the search components that do not yet support standard distributed search have the same limitation with SolrCloud. You can pass distrib=false to use these components on a single SolrCore.

The Grouping feature only works if groups are in the same shard. Proper support will require custom hashing and there is already a JIRA issue working towards this.

== Glossary ==
||'''Collection''': ||A single search index. ||
||'''Shard''': ||Either a logical or physical section of a single index depending on context. A logical section is also called a slice. A physical shard is expressed as a SolrCore. ||
||'''Slice''': ||A logical section of a single index. One or more identical, physical shards make up a slice. ||
||'''SolrCore''': ||Encapsulates a single physical index. One or more make up logical shards (or slices) which make up a collection. ||
||'''Node''': ||A single instance of Solr. A single Solr instance can have multiple SolrCores that can be part of any number of collections. ||
||'''Cluster''': ||All of the nodes you are using to host SolrCores. ||

== FAQ ==
 * '''Q:''' I'm seeing lots of session timeout exceptions - what to do?
  . '''A:''' Try raising the ZooKeeper session timeout by editing solr.xml - see the zkClientTimeout attribute. The minimum session timeout is 2 times your ZooKeeper-defined tickTime. The maximum is 20 times the tickTime. The default tickTime is 2 seconds.
 * '''Q:''' How do I use SolrCloud, but distribute updates myself?
  . '''A:''' Add the following UpdateProcessorFactory somewhere in your update chain: '''NoOpDistributingUpdateProcessorFactory'''
 * '''Q:''' What is the difference between a Collection and a SolrCore?
  . '''A:''' In classic single-node Solr, a SolrCore is basically equivalent to a Collection. It presents one logical index. In SolrCloud, the SolrCores on multiple nodes form a Collection. This is still just one logical index, but multiple SolrCores host different 'shards' of the full collection. So a SolrCore encapsulates a single physical index on an instance. A Collection is a combination of all of the SolrCores that together provide a logical index that is distributed across many nodes.
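As a sketch of the update chain mentioned in the FAQ above (the chain name here is hypothetical; the surrounding processors follow the usual default chain), a solrconfig.xml fragment that opts out of automatic update distribution might look like:

{{{
<updateRequestProcessorChain name="nodistrib">
  <!-- stands in for the auto-injected DistributedUpdateProcessorFactory,
       so updates are applied only to the core that receives them -->
  <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
}}}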