Jason,

Its predecessor, Lucandra, did. But Solandra is a new approach that manages
shards of documents across the cluster for you and uses Solr's distributed
search to query the indexes.
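(Editor's note: for readers who haven't used Solr's distributed search, below
is a minimal SolrJ sketch of a manually sharded query. The host names, port,
and core name are made up, and this is not Solandra code; per Jake's
description above, Solandra works out the shard layout for you, so the sketch
only illustrates the underlying fan-out/merge mechanism. The classes shown are
from the SolrJ of that era, e.g. CommonsHttpSolrServer.)

// Sketch: querying two Solr shards with the "shards" parameter.
// Hypothetical hosts/core names; SolrJ API from the Solr 1.4/3.x era.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchSketch {
    public static void main(String[] args) throws Exception {
        // Any node can act as the coordinator for a distributed query.
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://host1:8983/solr/core1");

        SolrQuery query = new SolrQuery("title:cassandra");
        // The shards parameter tells the coordinating node which shards
        // to fan the query out to and merge results from.
        query.set("shards", "host1:8983/solr/core1,host2:8983/solr/core1");

        QueryResponse rsp = solr.query(query);
        System.out.println("Total hits: " + rsp.getResults().getNumFound());
    }
}

With plain Solr you maintain that shards list yourself; with Solandra, as Jake
says, the cluster manages the document shards and the node you query uses this
same distributed-search mechanism under the hood.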
Jake

On Mar 9, 2011, at 5:15 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> Doesn't Solandra partition by term instead of document?
>
> On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. <dsmi...@mitre.org> wrote:
>> I was just about to jump in this conversation to mention Solandra and go
>> fig, Solandra's committer comes in. :-) It was nice to meet you at Strata,
>> Jake.
>>
>> I haven't dug into the code yet, but Solandra strikes me as a killer way
>> to scale Solr. I'm looking forward to playing with it; particularly
>> looking at disk requirements and performance measurements.
>>
>> ~ David Smiley
>>
>> On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:
>>
>>> Hi Otis,
>>>
>>> Have you considered using Solandra with Quorum writes
>>> to achieve master/master with CA semantics?
>>>
>>> -Jake
>>>
>>>
>>> On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic
>>> <otis_gospodne...@yahoo.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> ---- Original Message ----
>>>>
>>>>> From: Robert Petersen <rober...@buy.com>
>>>>>
>>>>> Can't you skip the SAN and keep the indexes locally? Then you would
>>>>> have two redundant copies of the index and no lock issues.
>>>>
>>>> I could, but then I'd have the issue of keeping them in sync, which
>>>> seems more fragile. I think SAN makes things simpler overall.
>>>>
>>>>> Also, can't master02 just be a slave to master01 (in the master farm
>>>>> and separate from the slave farm) until such time as master01 fails?
>>>>> Then
>>>>
>>>> No, because it wouldn't be in sync. It would always be N minutes
>>>> behind, and when the primary master fails, the secondary would not have
>>>> all the docs - data loss.
>>>>
>>>>> master02 would start receiving the new documents with an index
>>>>> complete up to the last replication at least, and the other slaves
>>>>> would be directed by the LB to poll master02 also...
>>>>
>>>> Yeah, "complete up to the last replication" is the problem. It's a data
>>>> gap that now needs to be filled somehow.
>>>>
>>>> Otis
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>>>> Sent: Wednesday, March 09, 2011 9:47 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: True master-master fail-over without data gaps (choosing
>>>>> CA in CAP)
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> ----- Original Message ----
>>>>>> From: Walter Underwood <wun...@wunderwood.org>
>>>>>>
>>>>>> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
>>>>>>
>>>>>>> You mean it's not possible to have 2 masters that are in nearly
>>>>>>> real-time sync?
>>>>>>> How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs
>>>>>>> (their edit logs) in sync to avoid the current NN SPOF, for example,
>>>>>>> so I'm thinking this could be doable with Solr masters, too, no?
>>>>>>
>>>>>> If you add fault-tolerant, you run into the CAP Theorem. Consistency,
>>>>>> availability, partition: choose two. You cannot have it all.
>>>>>
>>>>> Right, so I'll take Consistency and Availability, and I'll put my 2
>>>>> masters in the same rack (which has redundant switches, power supply,
>>>>> etc.) and thus minimize/avoid partitioning.
>>>>> Assuming the above actually works, I think my Q remains:
>>>>>
>>>>> How do you set up 2 Solr masters so they are in near real-time sync?
>>>>> DRBD?
>>>>>
>>>>> But here is maybe a simpler scenario that more people may be
>>>>> considering:
>>>>>
>>>>> Imagine 2 masters on 2 different servers in 1 rack, pointing to the
>>>>> same index on the shared storage (SAN) that also happens to live in
>>>>> the same rack. The 2 Solr masters are behind 1 LB VIP that the indexer
>>>>> talks to. The VIP is configured so that all requests always get routed
>>>>> to the primary master (because only 1 master can be modifying an index
>>>>> at a time), except when this primary is down, in which case the
>>>>> requests are sent to the secondary master.
>>>>>
>>>>> So in this case my Q is around automation of this, around Lucene index
>>>>> locks, around the need for manual intervention, and such.
>>>>> Concretely, if you have these 2 master instances, the primary master
>>>>> has the Lucene index lock in the index dir. When the secondary master
>>>>> needs to take over (i.e., when it starts receiving documents via the
>>>>> LB), it needs to be able to write to that same index. But what if that
>>>>> lock is still around? One could use the Native lock to make the lock
>>>>> disappear if the primary master's JVM exited unexpectedly, and in that
>>>>> case everything *should* work and be completely transparent, right?
>>>>> That is, the secondary will start getting new docs, it will use its
>>>>> IndexWriter to write to that same shared index, which won't be locked
>>>>> for writes because the lock is gone, and everyone will be happy. Did I
>>>>> miss something important here?
>>>>>
>>>>> Assuming the above is correct, what if the lock is *not* gone because
>>>>> the primary master's JVM is actually not dead, although maybe
>>>>> unresponsive, so the LB thinks the primary master is dead? Then the LB
>>>>> will route indexing requests to the secondary master, which will
>>>>> attempt to write to the index, but be denied because of the lock. So a
>>>>> human needs to jump in, remove the lock, and manually reindex failed
>>>>> docs if the upstream component doesn't buffer docs that failed to get
>>>>> indexed and doesn't retry indexing them automatically. Is this
>>>>> correct, or is there a way to avoid humans here?
>>>>>
>>>>> Thanks,
>>>>> Otis
>>>>> ----
>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> http://twitter.com/tjake
>>
>>
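(Editor's note: to make Otis's lock scenario concrete, here is a minimal
Lucene sketch, not Solr's own failover logic, of the check a secondary master
would face before writing to the shared index. The SAN path is hypothetical;
the calls are the Lucene 3.x-era NativeFSLockFactory, IndexWriter.isLocked()
and IndexWriter.unlock().)

// Sketch: what a secondary master must check before taking over writes
// on a shared index. Path is hypothetical; Lucene 3.x-era API.
import java.io.File;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NativeFSLockFactory;

public class SharedIndexTakeoverSketch {
    public static void main(String[] args) throws Exception {
        // Open the shared index with native OS file locks, so a lock held
        // by a JVM that exited is released by the OS rather than left behind.
        Directory dir = FSDirectory.open(
            new File("/mnt/san/solr/data/index"), new NativeFSLockFactory());

        if (IndexWriter.isLocked(dir)) {
            // The write lock is still held: either the primary JVM is alive
            // (perhaps just unresponsive to the LB) or the lock is stale.
            // Forcibly removing it while the primary is still writing would
            // corrupt the index, so some external fencing of the primary
            // (or the human Otis describes) is needed before calling:
            //     IndexWriter.unlock(dir);
            System.out.println("Index still write-locked; not taking over.");
        } else {
            System.out.println("No write lock; safe to open an IndexWriter.");
        }
        dir.close();
    }
}

Assuming native locks behave correctly on the shared filesystem (which is the
assumption in Otis's mail), a dead primary JVM covers the first case
automatically; in the second case the lock is still legitimately held, which
is exactly where the manual intervention question arises.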