[Solr Wiki] Update of "DistributedSearch" by YonikSeeley

Apache Wiki Tue, 20 Nov 2007 14:15:27 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/DistributedSearch

------------------------------------------------------------------------------
  
  Q: how would a request handler get a particular version of a SolrIndexSearcher? 
 One is already bound to a SolrQueryRequest (the newest), but if an older one 
is requested, should that logic be built into SolrCore?  Handling it in SolrCore 
would be cleaner for request handlers, but it would complicate SolrCore 
too.
  
+ === Multi-phased approach, allowing for inconsistency ===
+ Do a multi-phased approach (separate the query phase from stored field retrieval 
and document highlighting), but communicate
+ using the uniqueKey fields rather than internal docids.
+ 
+ This is simpler, but opens a window of inconsistency because the index could 
always change between phases.
+ This level of inconsistency may be acceptable given that there is already 
inconsistency caused by clients paging through results.
+ 
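The two phases described above can be sketched roughly as follows. This is only an illustration in Python, not Solr code: the in-memory shard dictionaries and the names `query_phase`, `fetch_phase` and `distributed_search` are hypothetical, and a real implementation would make network calls to sub-searchers instead of local lookups.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a sub-searcher:
# uniqueKey -> (score for the current query, stored fields).
Shard = Dict[str, Tuple[float, dict]]


def query_phase(shard: Shard, top_n: int) -> List[Tuple[float, str]]:
    """Phase 1: return only (score, uniqueKey) for the shard's top_n hits."""
    hits = [(score, key) for key, (score, _fields) in shard.items()]
    return heapq.nlargest(top_n, hits)


def fetch_phase(shard: Shard, keys: List[str]) -> Dict[str, dict]:
    """Phase 2: return stored fields for the requested uniqueKeys.

    A document deleted between the phases is simply absent from the
    result; this is the window of inconsistency.
    """
    return {key: shard[key][1] for key in keys if key in shard}


def distributed_search(shards: List[Shard], start: int, rows: int) -> List[dict]:
    # Every shard must return start + rows hits, because any of them
    # could end up on the requested page after merging.
    top_n = start + rows
    merged = []
    for shard_id, shard in enumerate(shards):
        for score, key in query_phase(shard, top_n):
            merged.append((score, key, shard_id))

    # Merge by descending score, breaking ties on uniqueKey.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    page = merged[start:start + rows]

    # Ask each shard only for the documents that made it onto the page.
    wanted: Dict[int, List[str]] = {}
    for _score, key, shard_id in page:
        wanted.setdefault(shard_id, []).append(key)
    fetched = {sid: fetch_phase(shards[sid], keys) for sid, keys in wanted.items()}

    results = []
    for score, key, shard_id in page:
        fields = fetched[shard_id].get(key)
        if fields is not None:  # skip documents that vanished between phases
            results.append({"id": key, "score": score, **fields})
    return results
```

Only uniqueKeys and scores cross the wire in the first phase, so no internal docid has to stay valid between the two requests.
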
+ Downsides of using uniqueKeys instead of lucene docids:
+  * doing internal_id->uniqueKey is either expensive, or requires a lot of 
memory (FieldCache entry)
+    * could mitigate in the future with payloads, or up-and-coming 
fields-stored-separately
+  * paging deeper into results will require more network bandwidth since 
uniqueKeys could be large
+ 
  === High Availability ===
  How can High Availability be obtained on the query side?
   * sub-searchers could be identified by VIPs (top-level-searcher would go 
through a load-balancer to access sub-searchers).
@@ -226, +238 @@

  
  == Misc ==
   * Any realistic way to use Hadoop?
+    * probably not... map-reduce is more for long-running batch jobs (data 
mining, index building, log processing, etc.)
   * Multi-core seems to be a lasting trend.
     * x86: Dual cores are now standard, 4 cores/chip are right around the 
corner (2006 end)
     * parallelize certain request portions to lower latency...
@@ -239, +252 @@

   * Simple Distributed Search: After analysis of search patterns (for a 
particular application needing distributed search), simple distributed search 
is not an option because of the depth (topN) of the searches.  The solution 
would quickly become network-bound from the large responses from sub-searchers, 
and IO bound from reading all of the stored fields of the documents.
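
A back-of-the-envelope estimate shows why deep paging makes the single-phase approach network-bound, and why sending uniqueKeys is cheaper but still grows with depth. The function names and all of the sizes below (shard count, bytes per document, bytes per key) are illustrative assumptions, not measurements, and protocol overhead is ignored.

```python
def simple_bytes(shards: int, start: int, rows: int, doc_bytes: int) -> int:
    """Single phase: each sub-searcher returns stored fields for its
    top (start + rows) documents."""
    return shards * (start + rows) * doc_bytes


def multi_phase_bytes(shards: int, start: int, rows: int,
                      key_bytes: int, doc_bytes: int,
                      score_bytes: int = 4) -> int:
    """Phase 1 ships (uniqueKey, score) for (start + rows) hits per shard;
    phase 2 ships stored fields for only the `rows` documents on the page."""
    id_traffic = shards * (start + rows) * (key_bytes + score_bytes)
    doc_traffic = rows * doc_bytes
    return id_traffic + doc_traffic


# Assumed workload: 10 shards, page 100 of 10 rows each,
# 2000-byte stored documents, 32-byte uniqueKeys.
single = simple_bytes(shards=10, start=990, rows=10, doc_bytes=2000)
multi = multi_phase_bytes(shards=10, start=990, rows=10,
                          key_bytes=32, doc_bytes=2000)
print(single, multi)
```

Under these assumptions the single-phase traffic scales with the stored document size at every depth, while the multi-phased traffic scales with the uniqueKey size, which is the bandwidth downside noted for large keys.
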
  
  === Current approach ===
- "Consistency via Specifying Index Version" would allow fewer changes to the 
core, request handlers, highlighting & faceting code.
+ "Multi-phased approach, allowing for inconsistency" is what is being used 
first for
+ the query side of https://issues.apache.org/jira/browse/SOLR-303 .
+ Distributing the indexing will be up to users via a Multiple Master approach.
+ In the future, we may want to migrate to "Consistency via Specifying Index 
Version" and lucene internal docids.
