Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/DistributedSearch

------------------------------------------------------------------------------

  Q: How would a request handler get a particular version of a SolrIndexSearcher? One is already bound to a SolrQueryRequest (the newest), but if an older one is requested, should that logic be built into SolrCore? Handling it in SolrCore would be cleaner for request handlers, but it would also complicate SolrCore.

+ === Multi-phased approach, allowing for inconsistency ===
+ Do a multi-phased approach (separate the query phase from stored field retrieval and document highlighting), but communicate
+ using the uniqueKey fields rather than internal docids.
+
+ This is simpler, but opens a window of inconsistency because the index could always change between phases.
+ This level of inconsistency may be acceptable given that there is already inconsistency caused by clients paging through results.
+
+ Downsides of using uniqueKeys instead of Lucene docids:
+  * mapping internal_id->uniqueKey is either expensive or requires a lot of memory (a FieldCache entry)
+   * this could be mitigated in the future with payloads, or with the up-and-coming fields-stored-separately work
+  * paging deeper into results will require more network bandwidth, since uniqueKeys could be large
+
  === High Availability ===
  How can High Availability be obtained on the query side?
   * sub-searchers could be identified by VIPs (the top-level searcher would go through a load balancer to access sub-searchers).
@@ -226, +238 @@
  == Misc ==
   * Any realistic way to use Hadoop?
+   * probably not... map-reduce is more for long-running batch jobs (data mining, index building, log processing, etc.)
   * Multi-core seems to be a lasting trend.
    * x86: Dual cores are now standard, and 4 cores/chip are right around the corner (end of 2006)
    * parallelize certain request portions to lower latency...
@@ -239, +252 @@
   * Simple Distributed Search: After analysis of search patterns (for a particular application needing distributed search), simple distributed search is not an option because of the depth (topN) of the searches. The solution would quickly become network-bound from the large responses from sub-searchers, and I/O-bound from reading all of the stored fields of the documents.
  === Current approach ===
- "Consistency via Specifying Index Version" would allow fewer changes to the core, request handlers, highlighting & faceting code.
+ "Multi-phased approach, allowing for inconsistency" is the approach being used first for
+ the query side of https://issues.apache.org/jira/browse/SOLR-303 .
+ Distributing the indexing will be left to users via a Multiple Master approach.
+ In the future, we may want to migrate to "Consistency via Specifying Index Version" and Lucene internal docids.
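A minimal Java sketch of the "Multi-phased approach, allowing for inconsistency", assuming hypothetical `Shard`, `queryIds` and `fetchFields` stand-ins rather than any real Solr API. Phase 1 collects only (uniqueKey, score) pairs from each sub-searcher and merges them; phase 2 asks each owning shard for the stored fields of just the keys on the requested page. The `betweenPhases` hook is also an illustrative assumption: it simulates an index change landing between the two phases, which is the window of inconsistency this approach accepts.

```java
import java.util.*;
import java.util.stream.Collectors;

/**
 * Sketch of a multi-phased distributed search that passes uniqueKeys
 * (not Lucene docids) between phases. Shard, ScoredKey, queryIds and
 * fetchFields are hypothetical stand-ins, not Solr APIs.
 */
public class MultiPhaseSearch {

    /** One (uniqueKey, score) hit from the query phase, tagged with its shard. */
    record ScoredKey(String key, double score, int shard) {}

    /** Highest score first; ties broken by uniqueKey so the merge is deterministic. */
    static final Comparator<ScoredKey> BY_SCORE =
        Comparator.comparingDouble(ScoredKey::score).reversed()
                  .thenComparing(ScoredKey::key);

    /** Hypothetical sub-searcher: uniqueKey -> score and uniqueKey -> stored fields. */
    static class Shard {
        final Map<String, Double> scores = new HashMap<>();
        final Map<String, String> stored = new HashMap<>();

        void add(String key, double score, String fields) {
            scores.put(key, score);
            stored.put(key, fields);
        }

        void delete(String key) {
            scores.remove(key);
            stored.remove(key);
        }

        /** Phase 1: return only the top-n (uniqueKey, score) pairs, no stored fields. */
        List<ScoredKey> queryIds(int n, int shardId) {
            return scores.entrySet().stream()
                .map(e -> new ScoredKey(e.getKey(), e.getValue(), shardId))
                .sorted(BY_SCORE)
                .limit(n)
                .collect(Collectors.toList());
        }

        /** Phase 2: look up stored fields by uniqueKey; keys deleted since phase 1 are absent. */
        Map<String, String> fetchFields(List<String> keys) {
            Map<String, String> out = new HashMap<>();
            for (String k : keys) {
                String f = stored.get(k);
                if (f != null) out.put(k, f);
            }
            return out;
        }
    }

    /** Returns the page [start, start+rows) as "key: fields" strings. */
    static List<String> search(List<Shard> shards, int start, int rows, Runnable betweenPhases) {
        int n = start + rows;  // every shard must return its own top start+rows hits
        List<ScoredKey> merged = new ArrayList<>();
        for (int i = 0; i < shards.size(); i++) {
            merged.addAll(shards.get(i).queryIds(n, i));
        }
        merged.sort(BY_SCORE);
        List<ScoredKey> page =
            merged.subList(Math.min(start, merged.size()), Math.min(n, merged.size()));

        betweenPhases.run();  // the index may change here: the window of inconsistency

        // Phase 2: ask each owning shard only for the keys on this page.
        Map<Integer, List<String>> byShard = new HashMap<>();
        for (ScoredKey sk : page) {
            byShard.computeIfAbsent(sk.shard(), s -> new ArrayList<>()).add(sk.key());
        }
        Map<String, String> fields = new HashMap<>();
        byShard.forEach((s, keys) -> fields.putAll(shards.get(s).fetchFields(keys)));

        // Keep merged order; documents deleted between phases silently drop out.
        List<String> results = new ArrayList<>();
        for (ScoredKey sk : page) {
            String f = fields.get(sk.key());
            if (f != null) results.add(sk.key() + ": " + f);
        }
        return results;
    }

    public static void main(String[] args) {
        Shard s0 = new Shard(), s1 = new Shard();
        s0.add("a", 0.9, "title=A");
        s0.add("c", 0.5, "title=C");
        s1.add("b", 0.8, "title=B");
        s1.add("d", 0.3, "title=D");
        List<Shard> shards = List.of(s0, s1);
        System.out.println(search(shards, 0, 2, () -> {}));              // [a: title=A, b: title=B]
        // Simulate a delete on shard 1 arriving between the two phases.
        System.out.println(search(shards, 0, 2, () -> s1.delete("b")));  // [a: title=A]
    }
}
```

In this sketch a document deleted between phases simply disappears from the page, so a client can receive fewer than `rows` results; that is one visible form of the inconsistency being traded for simplicity.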

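A back-of-the-envelope Java sketch of why search depth dominates network cost: every sub-searcher must return its own top `start + rows` entries, so the coordinator receives `shards * (start + rows)` entries per request. The shard count and byte sizes below (a 40-byte uniqueKey, a 4-byte score, a 2,000-byte stored document) are assumptions chosen only for illustration, and `phaseOneBytes` is a hypothetical helper, not part of Solr.

```java
/**
 * Rough estimate of the query-phase response size in a distributed search.
 * All byte sizes are illustrative assumptions, not measurements.
 */
public class PagingCost {

    /** Bytes the coordinator receives when each shard returns its top (start + rows) entries. */
    static long phaseOneBytes(int shards, int start, int rows, int bytesPerEntry) {
        return (long) shards * (start + rows) * bytesPerEntry;
    }

    public static void main(String[] args) {
        int shards = 10;
        int keyPlusScore = 40 + 4;  // assumed: 40-byte uniqueKey plus a 4-byte score
        int fullDocument = 2_000;   // assumed: all stored fields of one document

        for (int start : new int[] {0, 100, 1_000}) {
            long keysOnly = phaseOneBytes(shards, start, 10, keyPlusScore);
            long fullDocs = phaseOneBytes(shards, start, 10, fullDocument);
            System.out.printf("start=%d: keys only %,d bytes, full docs %,d bytes%n",
                              start, keysOnly, fullDocs);
        }
    }
}
```

Under these assumed sizes, returning only uniqueKeys shrinks the query-phase payload by a constant factor compared with shipping whole documents, but it still grows linearly with `start`, and longer uniqueKeys raise the per-entry cost.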