Re: Katta's goodness for Solr

Noble Paul നോബിള്‍ नोब्ळ् Wed, 12 Nov 2008 20:51:00 -0800

On Thu, Nov 13, 2008 at 10:11 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> I'm not sure if you are commenting on how Katta does things in that 
> LoadBalancers part, but Katta doesn't do that as far as I know.  Passing 
> shard URL in request is the Solr thing, but I think we concluded shard URLs 
> can also live in "defaults" for the handler, no?
For Katta it may not matter because the config comes down from a
central repo. So it knows who the other shards are and it can contact
them directly (using Hadoop IPC).


Putting the shard URLs in solrconfig does not make our solution any
more elegant. The shards may go up or go down at any moment and the
whole system has no means of coping with it (unless you use a
load balancer).
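
To illustrate the point, here is a minimal sketch (not how Solr or Katta
is implemented). It assumes a central shard map plus a set of live nodes
that something like Zookeeper notifications keeps up to date; the shard
list is then rebuilt per request instead of being fixed in the config.
All names below (build_shards_param, shard_replicas, live_nodes) are
hypothetical.

```python
from typing import Dict, List, Set


def build_shards_param(shard_replicas: Dict[str, List[str]],
                       live_nodes: Set[str]) -> str:
    """Pick one live replica per shard and join the picks into a
    comma-separated value in the style of Solr's `shards` parameter.

    shard_replicas maps a shard name to the URLs of its replicas.
    live_nodes is assumed to be maintained elsewhere (hypothetical).
    """
    chosen = []
    for shard in sorted(shard_replicas):
        # Keep only the replicas whose node is currently reported alive.
        live = [url for url in shard_replicas[shard] if url in live_nodes]
        if not live:
            # A static list in solrconfig cannot even detect this case.
            raise RuntimeError(f"no live replica for {shard}")
        chosen.append(live[0])
    return ",".join(chosen)
```

With a static list, a dead node stays in the list until someone edits
the config; here it simply drops out of the next request.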
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
>
> ________________________________
> From: Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, November 12, 2008 11:06:21 PM
> Subject: Re: Katta's goodness for Solr
>
> The way we do distributed search is not straightforward. Introducing
> extra layers (LoadBalancers) in between the shards looks like a hack
> to me. Moreover, passing in the shard URL in the request is not a
> very nice design. Ideally, the clients should be unaware of the fact
> that they are doing a distributed search.
>
> We must move fast in order to catch up with the developments in other
> projects.
>
> On Tue, Nov 11, 2008 at 11:45 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
>> Quick thought.  I saw Stefan's Katta presentation last night.  Katta seems 
>> nice and simple.  If I understood correctly, juicy stuff that is interesting 
>> to Solr is:
>> - Katta has a notion of a Primary Master and N Secondary Slaves (no SPOF 
>> there)
>> - Search Nodes serve index shards copied locally from some shared storage
>> - Zookeeper instances (again Primary Master and N Secondary Slaves) that 
>> facilitate communication among distributed components
>>
>> The master:
>> -- knows how to distribute a set of index shards it is given across a number 
>> of search nodes (distribution policy pluggable, similar to Hadoop's, but 
>> different)
>> -- has a map of which shard is on which search node (in Zookeeper)
>> -- knows how to replicate each shard (replication factor configurable)
>> -- knows when a search node goes down (via Zookeeper notification)
>> -- knows how to create more replicas of the shards that were on a dead
>> search node (and remove the extra replicas when that search node is revived)
>> -- can notify search nodes when a new index is available (via Zookeeper)
>>
>> More in:
>> http://joa23.files.wordpress.com/2008/09/katta-overview.pdf
>>
>> Paul Noble will like slide #13 ;)
>>
>> In particular, I think that:
>> - Making use of Zookeeper for index snapshot + replication might be useful
>> (Master publishes the info about a new snapshot to Zookeeper and Search Slaves
>> get notified immediately and start copying the index)
>> - Making use of Zookeeper for keeping a map of index shards + applying a
>> replication factor would be very useful
>> - Making use of pluggable shard placement policy would be useful
>>
>> Thoughts?
>>
>> Also:
>> While Katta provides shard->search server functionality via pluggable impl, 
>> what both Solr and Katta are still missing is the doc->shard functionality.  
>> However, this might not be terribly hard if we do something similar to 
>> Katta's pluggable shard->search server distribution policy.  Please mind I'm 
>> saying this without having looked at any of the Katta code.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul
