The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DistributedSearchDesign

The comment on the change is:
Added notes on the current DistributedSearch approach

------------------------------------------------------------------------------
  Distributing the indexing will be up to users via a Multiple Master approach.
  In the future, we may want to migrate to "Consistency via Specifying Index 
Version" and lucene internal docids.
  
+ The query is executed in phases. In each phase, a request is sent to each 
relevant shard in a separate thread. Once the responses to all requests have 
been received, the next phase is executed.
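
The phased execution described above can be sketched roughly as follows. This is a minimal illustration in Python, not Solr's actual code: `run_phase`, `run_query` and the `send_request` callable are hypothetical names, and a thread pool stands in for "a separate thread" per shard.

```python
from concurrent.futures import ThreadPoolExecutor


def run_phase(shards, send_request):
    """Send one request per shard, each in its own thread.

    `send_request` is a hypothetical callable taking a shard name and
    returning that shard's response. Returns {shard: response} only
    after every shard has answered.
    """
    if not shards:
        return {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = {shard: pool.submit(send_request, shard) for shard in shards}
        # result() blocks, so this dict is complete only when all shards replied.
        return {shard: future.result() for shard, future in futures.items()}


def run_query(phases):
    """Run phases strictly in order; `phases` is a list of (shards, send_request)."""
    responses = []
    for shards, send_request in phases:
        # The next phase starts only after run_phase has returned.
        responses.append(run_phase(shards, send_request))
    return responses
```

Because `run_phase` waits on every future before returning, a slow shard delays the whole phase, which matches the "all responses, then next phase" behaviour described in the text.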
+ 
+ ==== Phase 1: GET_TOP_IDS [& GET_FACETS] ====
+ Each shard is asked for the unique keys and sort fields of its top matching 
documents, along with facets, for the given query. The number of keys requested 
in this phase is 'N' (start=0&rows=N) regardless of the start specified, so that 
the results can be correctly merged together. For the merge to be correct, N 
has to be at least start + rows.
+ 
+ Each response contains the unique key and score of every returned document. If 
GET_FACETS is requested, it also contains the top 'n' facets, where 
n=facet.count. Once the responses are obtained, they are merged and sorted by 
rank. From the sorted list, the documents to be returned are identified on the 
basis of the 'start' and 'rows' parameters.
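
As an illustration of the merge step, here is a small Python sketch. It assumes ranking by score only (the real merge would also honour the sort fields), and the function name and data layout are hypothetical.

```python
def merge_top_ids(shard_responses, start, rows):
    """Merge per-shard top hits and pick the page to return.

    shard_responses: {shard_name: [(unique_key, score), ...]}, where each
    shard returned its top hits from position 0 (start=0&rows=N).
    Returns [(unique_key, shard_name), ...] for the requested page.
    """
    merged = []
    for shard, hits in shard_responses.items():
        for key, score in hits:
            merged.append((score, key, shard))
    # Highest score first; ties are broken by key purely to keep the
    # order deterministic (an assumption of this sketch).
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    # 'start' and 'rows' are applied only now, on the merged list.
    page = merged[start:start + rows]
    return [(key, shard) for _score, key, shard in page]
```

Keeping the shard name next to each key is what lets the following phase contact only the shard that holds a given document.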
+ 
+ ==== Phase 2 ====
+ Requests are sent to fetch fields, highlighting and MoreLikeThis information 
only for the documents identified in Phase 1. Each request contains the unique 
keys of the documents and is sent only to the shard that holds them.
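
A possible way to build these per-shard requests is sketched below in Python. The function name and the `keys` / `fields` request layout are made up for illustration and are not Solr's wire format.

```python
from collections import defaultdict


def build_phase2_requests(selected, fields):
    """Group the documents chosen in Phase 1 by the shard that holds them.

    selected: [(unique_key, shard_name), ...] in final result order.
    fields:   list of field names to fetch (hypothetical request layout).
    Returns {shard_name: {"keys": [...], "fields": [...]}}; shards that
    hold none of the selected documents get no request at all.
    """
    keys_by_shard = defaultdict(list)
    for key, shard in selected:
        keys_by_shard[shard].append(key)
    return {
        shard: {"keys": keys, "fields": list(fields)}
        for shard, keys in keys_by_shard.items()
    }
```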
+ 
+ ==== Phase 3: REFINE_FACETS (only for faceted search) ====
+ The facets returned in Phase 1 may be incomplete, so further requests are sent 
to the shards to refine them. Note that this approach gives accurate counts for 
the facet terms it returns, but in theory it can miss some facet terms.
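
One way to picture the refinement step is the Python sketch below. It is an assumption about the general idea, not a description of the actual implementation: counts are summed over the shards that reported a term, and every shard that did not report it is asked again for that term.

```python
def plan_facet_refinement(shard_facets):
    """Work out which shards must be re-queried for which facet terms.

    shard_facets: {shard_name: {term: count}} holding each shard's top
    facet terms from Phase 1.
    Returns (partial_counts, missing), where `missing` maps a shard to
    the sorted list of terms it did not report.
    """
    partial_counts = {}
    for counts in shard_facets.values():
        for term, count in counts.items():
            partial_counts[term] = partial_counts.get(term, 0) + count
    missing = {}
    for shard, counts in shard_facets.items():
        absent = sorted(term for term in partial_counts if term not in counts)
        if absent:
            missing[shard] = absent
    return partial_counts, missing
```

Under this sketch, a term that falls outside every shard's top list never enters `partial_counts`, so no refinement request is ever issued for it. That would explain how a term can be missed even though the counts of the returned terms are accurate.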
+ 
+ After the document fields and facets are obtained, the response is constructed 
and sent back to the client.
+ 
+ The index may change during the small window of time between Phase 1 and 
Phase 3, in which case the response may contain incorrect data. This is ignored 
for the time being.
+ 
