Katta has a very flexible and usable option for this, even in the absence of replicas.

The idea is that a shard may report results, report a failure, report late, never report at all, or hit a transport-layer problem, and all of these behaviors should be handled. What Katta does is give each search a deadline and a partial-results policy. At any time, if all results have been received, the complete result set is returned. If the deadline is reached, the policy is interrogated with the results received so far. The policy can return a failure, return partial results (with timeouts reported for the missing shards), or set a new deadline and possibly a new policy (so that the tolerance for missing results relaxes as time passes). The policy is also consulted each time a new result arrives or a failure is noted.

Transport-layer issues and explicit error returns are handled by the framework. Whenever one is encountered, the search is immediately re-dispatched to a replica of the shard, if one exists. That re-dispatched query starts late and may not return by the deadline, depending on the policy. If no replica remains that has not already been queried, an error result is recorded for that shard.

Note that Katta even supports fail-fast in this scenario, since the partial-results policy can keep returning a new deadline as long as the partial results contain no hard failures, and can return a failure as soon as it notes any shard failure.
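To make that concrete, here is a rough sketch of what such a policy hook could look like. Every name in it (ShardResults, PartialResultsPolicy, Decision, FailFastPolicy) is hypothetical, not Katta's actual API; it is only meant to illustrate the shape of the mechanism:

import java.util.Set;

// What the framework hands the policy: the state of the scatter so far.
// (Hypothetical sketch -- not Katta's real interfaces.)
interface ShardResults {
  Set<String> missingShards();  // shards that have not reported yet
  Set<String> failedShards();   // shards with a recorded hard failure
}

// Consulted when the deadline expires, and also each time a result
// or failure arrives before then.
interface PartialResultsPolicy {
  Decision evaluate(ShardResults soFar, long nowMillis);
}

final class Decision {
  enum Kind { COMPLETE, PARTIAL, FAIL, EXTEND }
  final Kind kind;
  final long newDeadline;               // meaningful only for EXTEND
  final PartialResultsPolicy newPolicy; // optional replacement policy

  private Decision(Kind kind, long deadline, PartialResultsPolicy policy) {
    this.kind = kind; this.newDeadline = deadline; this.newPolicy = policy;
  }
  static Decision complete() { return new Decision(Kind.COMPLETE, 0, null); }
  static Decision partial()  { return new Decision(Kind.PARTIAL, 0, null); }
  static Decision fail()     { return new Decision(Kind.FAIL, 0, null); }
  static Decision extend(long deadline, PartialResultsPolicy next) {
    return new Decision(Kind.EXTEND, deadline, next);
  }
}

// Fail-fast: keep extending the deadline while shards are merely slow,
// but fail immediately if any shard has a hard failure.
final class FailFastPolicy implements PartialResultsPolicy {
  public Decision evaluate(ShardResults soFar, long nowMillis) {
    if (!soFar.failedShards().isEmpty()) return Decision.fail();
    if (soFar.missingShards().isEmpty()) return Decision.complete();
    return Decision.extend(nowMillis + 500, this);  // wait another 500 ms
  }
}

The relaxation over time falls out naturally: an EXTEND decision can hand back a different, more lenient policy for the next round.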
On Tue, Feb 9, 2010 at 5:25 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> The SolrCloud branch now has load balancing and fail-over amongst
> shard replicas.
> Partial results aren't available yet (if there are no up replicas for
> a shard), but that is planned.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Tue, Feb 9, 2010 at 8:21 AM, Jan Høydahl / Cominvent
> <jan....@cominvent.com> wrote:
> > Isn't that OK as long as there is the option of allowing partial
> > results if you really want?
> > Keeping the logic simple has its benefits. Let the client be
> > responsible for the query resubmit strategy, and let the load balancer
> > (or shard manager) be responsible for marking a node/shard as
> > dead/unresponsive and choosing another for the next query.
> >
> > --
> > Jan Høydahl - search architect
> > Cominvent AS - www.cominvent.com
> >
> > On 9. feb. 2010, at 04.36, Lance Norskog wrote:
> >
> >> At this point, Distributed Search does not support any recovery
> >> when one or more shards fail. If any fail or time out, the whole
> >> query fails.
> >>
> >> On Sat, Feb 6, 2010 at 9:34 AM, mike anderson <saidthero...@gmail.com> wrote:
> >>> "so if we received the response from shard2 before shard1, we would
> >>> just queue it up and wait for the response to shard1."
> >>>
> >>> This crossed my mind, but my concern was how to handle the case
> >>> when shard1 never responds. Is this something I need to worry about?
> >>>
> >>> -mike
> >>>
> >>> On Sat, Feb 6, 2010 at 11:33 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> >>>
> >>>> It seems like changing an element in a priority queue breaks the
> >>>> invariants, and hence it's not doable with a priority queue and with
> >>>> the current strategy of adding sub-responses as they are received.
> >>>>
> >>>> One way to continue using a priority queue would be to add
> >>>> sub-responses to the queue in the preferred order... so if we received
> >>>> the response from shard2 before shard1, we would just queue it up and
> >>>> wait for the response to shard1.
> >>>>
> >>>> -Yonik
> >>>> http://www.lucidimagination.com
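That ordering trick, combined with the deterministic shard rule in the commented-out code quoted below, becomes straightforward once the queue supports removal. Here is a rough, untested sketch using java.util.PriorityQueue (which does support remove()); ShardDoc here is a simplified stand-in for Solr's class, and "keep the copy from the shard that sorts first" is just one reasonable deterministic choice:

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Simplified stand-in for Solr's ShardDoc -- not the actual class.
class ShardDoc {
  final String id;     // unique document key
  final String shard;  // shard this response came from
  final float score;
  ShardDoc(String id, String shard, float score) {
    this.id = id; this.shard = shard; this.score = score;
  }
}

class DedupingMerger {
  private final Map<String, ShardDoc> uniqueDoc = new HashMap<>();
  // Min-heap on score: the lowest-scoring doc sits at the head.
  private final PriorityQueue<ShardDoc> queue =
      new PriorityQueue<>(Comparator.comparingDouble((ShardDoc d) -> d.score));

  void add(ShardDoc doc) {
    ShardDoc prev = uniqueDoc.get(doc.id);
    if (prev != null) {
      // Duplicate: deterministically keep the copy from whichever shard
      // sorts first, regardless of which response arrived first.
      if (prev.shard.compareTo(doc.shard) <= 0) return;  // keep previous
      queue.remove(prev);  // possible with java.util.PriorityQueue
    }
    uniqueDoc.put(doc.id, doc);
    queue.add(doc);
  }
}

PriorityQueue.remove() is linear in the queue size, but since the merge queue is bounded by the number of rows requested, that cost should be negligible.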
> >>>> On Sat, Feb 6, 2010 at 10:35 AM, mike anderson <saidthero...@gmail.com> wrote:
> >>>>> I have a need to favor documents from one shard over another when
> >>>>> duplicates occur. I found this code in the query component:
> >>>>>
> >>>>>   String prevShard = uniqueDoc.put(id, srsp.getShard());
> >>>>>   if (prevShard != null) {
> >>>>>     // duplicate detected
> >>>>>     numFound--;
> >>>>>
> >>>>>     // For now, just always use the first encountered since we can't
> >>>>>     // currently remove the previous one added to the priority queue.
> >>>>>     // If we switched to the Java5 PriorityQueue, this would be easier.
> >>>>>     continue;
> >>>>>     // make which duplicate is used deterministic based on shard
> >>>>>     // if (prevShard.compareTo(srsp.shard) >= 0) {
> >>>>>     //   TODO: remove previous from priority queue
> >>>>>     //   continue;
> >>>>>     // }
> >>>>>   }
> >>>>>
> >>>>> Is there a ticket open for this issue? What would it take to fix?
> >>>>>
> >>>>> Thanks,
> >>>>> Mike
> >>>
> >>
> >> --
> >> Lance Norskog
> >> goks...@gmail.com
> >

--
Ted Dunning, CTO
DeepDyve