Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The "SolrCloud" page has been changed by YonikSeeley.
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=7&rev2=8

--------------------------------------------------

  
  Have some sort of command list that every server should execute before 
certain actions (could involve hitting URLs, executing system commands, etc.)?
  
+ === Distributed Search ===
+ ==== Basic Distributed Search ====
+ The state of the cluster will be read at startup.  Changes to the state will 
be immediately reflected in the internal representation via zookeeper watches.  
Once a cluster state has been built, a connection to zookeeper is not needed to 
serve requests (i.e. it can work when disconnected from zk).
+ 
+ Implementation detail:  certain information about the internal representation 
of the cluster should be copied at the start of a request and probably 
shouldn't change during the request.  This probably includes the shards that 
will be included in the request (we don't want that changing between phases of 
a request), and the nodes we are querying for those shards.  Someone may take a 
node out of service, or zookeeper may have marked the node as failed, but we 
can simply continue using the normal request/failover logic for the duration of 
that distributed request.
+ 
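The copy-at-request-start idea above could be sketched roughly as follows. This is only an illustration in Python, not Solr code: `ClusterState`, `ClusterSnapshot`, `set_nodes` and the node URLs are hypothetical names, and thread safety is left out.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ClusterSnapshot:
    """Read-only per-request view: shard name -> candidate node URLs."""
    shards: Mapping[str, Tuple[str, ...]]


class ClusterState:
    """Mutable model that zookeeper watch callbacks would keep up to date (hypothetical)."""

    def __init__(self):
        self._shards = {}

    def set_nodes(self, shard, nodes):
        # Called when the watched cluster state changes.
        self._shards[shard] = list(nodes)

    def snapshot(self):
        # Copy shards and nodes so later updates cannot alter a running request.
        frozen = {shard: tuple(nodes) for shard, nodes in self._shards.items()}
        return ClusterSnapshot(MappingProxyType(frozen))


state = ClusterState()
state.set_nodes("shard1", ["http://host1:8983/solr", "http://host2:8983/solr"])

view = state.snapshot()  # taken once, at the start of a distributed request
state.set_nodes("shard1", ["http://host2:8983/solr"])  # host1 removed mid-request

print(view.shards["shard1"])  # the request still sees both nodes
```

Every phase of the request reads from `view`; if host1 really is gone, the normal request/failover logic handles the error.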
+ Connection refused errors from solr_server->solr_server (or other errors 
that we believe would not result in an error if executed on a different node) 
should result in failover behavior (re-request a different shard).  It can be a 
local policy decision to not try that node again for a certain amount of time 
after a given number of these errors.  Zookeeper does not need to be updated 
with this info (but could be in the future).
+ 
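One possible shape for the failover plus local "do not retry this node for a while" policy is sketched below. It assumes that "re-request a different shard" means trying another node that hosts the same shard. `NodePenaltyBox`, `query_shard`, the `send` callable and the default thresholds are all hypothetical, and nothing here is written to zookeeper.

```python
import time


class NodePenaltyBox:
    """Local-only bookkeeping of connection errors per node (hypothetical helper)."""

    def __init__(self, max_errors=3, penalty_seconds=30.0, clock=time.monotonic):
        self.max_errors = max_errors
        self.penalty_seconds = penalty_seconds
        self.clock = clock
        self._errors = {}
        self._benched_until = {}

    def usable(self, node):
        # A node is usable unless it is still serving a penalty.
        return self.clock() >= self._benched_until.get(node, float("-inf"))

    def record_error(self, node):
        count = self._errors.get(node, 0) + 1
        if count >= self.max_errors:
            # Too many errors: skip this node for a while, then start counting afresh.
            self._benched_until[node] = self.clock() + self.penalty_seconds
            count = 0
        self._errors[node] = count

    def record_success(self, node):
        self._errors[node] = 0


def query_shard(nodes, send, penalties):
    """Try each node hosting one shard in turn; `send(node)` is an assumed transport call."""
    last_error = None
    for node in nodes:
        if not penalties.usable(node):
            continue
        try:
            result = send(node)
        except ConnectionRefusedError as exc:
            # An error we believe would not occur on a different node: fail over.
            penalties.record_error(node)
            last_error = exc
            continue
        penalties.record_success(node)
        return result
    raise RuntimeError("no usable node for shard") from last_error
```

The `clock` parameter exists only so the penalty window can be exercised without sleeping.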
+ ==== Timeouts ====
+ Zookeeper ephemeral znodes can be used to determine what servers are 
available for requests.
+ 
+ Q: if zookeeper dies and comes back up, does it come back with all the 
ephemeral nodes?  If all the ephemeral nodes are deleted, we need to disregard 
the deletion and continue using our last internal model.
+ 
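If the answer turns out to be that the ephemeral nodes can all vanish, the "keep the last internal model" rule might look like this small sketch. `apply_live_nodes` is a hypothetical helper, and treating an empty list as "zookeeper lost its state" is an assumption, not confirmed behavior.

```python
def apply_live_nodes(current, reported):
    """Return the live-node set to use after zookeeper reports its ephemeral znodes."""
    if current and not reported:
        # Every ephemeral node disappeared at once: assume zookeeper lost them
        # and keep serving from the last known model.
        return current
    return frozenset(reported)


live = frozenset({"host1:8983", "host2:8983"})
live = apply_live_nodes(live, [])              # all ephemeral nodes gone: model kept
live = apply_live_nodes(live, ["host2:8983"])  # a normal update is applied
```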
+ solr_server->solr_server requests may result in a timeout after 
"shard-socket-timeout".  If a flag indicating partialResults is set, we should 
not retry a different shard.  If a flag indicating partialResults is not set, 
we fail the request, or retry a different shard, depending on a new 
"retryOnTimeout" flag.  After a configurable number of timeouts, where other 
shards did not time out, we can mark the node as "slow" or "timedout" in 
zookeeper.  A leader could optionally act on that information to remove the 
node or reallocate resources.
+ 
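The timeout rules above amount to a small decision table plus a per-node counter. A rough Python sketch, where `TimeoutAction`, `on_shard_timeout`, `TimeoutTracker` and the default threshold are hypothetical names and values, and the actual zookeeper update is left to the caller:

```python
from enum import Enum


class TimeoutAction(Enum):
    NO_RETRY = "partialResults set: do not retry a different shard"
    RETRY = "retry a different shard"
    FAIL = "fail the request"


def on_shard_timeout(partial_results, retry_on_timeout):
    """Decide what to do after a shard request exceeds shard-socket-timeout."""
    if partial_results:
        return TimeoutAction.NO_RETRY
    return TimeoutAction.RETRY if retry_on_timeout else TimeoutAction.FAIL


class TimeoutTracker:
    """Counts timeouts per node, ignoring requests in which other shards also timed out."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self._counts = {}

    def record(self, node, others_timed_out):
        """Return True once the node should be marked "slow" or "timedout" in zookeeper."""
        if others_timed_out:
            # Other shards timed out too, so this says little about this node.
            return False
        self._counts[node] = self._counts.get(node, 0) + 1
        return self._counts[node] >= self.threshold
```

A caller would invoke `on_shard_timeout` for the current request and `record` for the node that timed out; only when `record` returns True would it write the marker to zookeeper.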
  == Resources ==
  http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers
  
