Re: New leader/replica solution for HDFS

Erick Erickson Wed, 25 Feb 2015 13:36:08 -0800

bq: Is adding replicas going to increase search performance?

Absolutely, assuming you've maxed out Solr. You can scale Solr's
queries-per-second rate nearly linearly by adding replicas, regardless
of whether the index lives on HDFS or not.
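
As a rough illustration of adding a replica, here is a minimal sketch
that only builds the request URL for the Collections API ADDREPLICA
action; it does not send anything. The host, port, collection name and
shard name are placeholder assumptions, not values from this thread.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard):
    """Build (but do not send) a Collections API ADDREPLICA request URL.

    base_url, collection and shard are hypothetical placeholders.
    """
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder values; substitute your own Solr base URL and names.
url = add_replica_url("http://localhost:8983/solr", "mycollection", "shard1")
print(url)
```

Issuing an HTTP GET against the resulting URL (with curl or a browser)
would ask SolrCloud to create one more replica of that shard on a node
of its choosing.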


Having multiple replicas per shard _also_ increases fault tolerance,
so you get both. Even with HDFS, though, a single replica (just a
leader) per shard means that you don't have any redundancy if the
motherboard on that server dies even though HDFS has multiple copies
of the _data_.
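
The storage side of this trade-off (raised in the earlier message quoted
below) can be sketched with simple arithmetic: each Solr replica writes
its own index into HDFS, and HDFS then keeps its own copies of every
block. The function names and the 100 GB figure are illustrative
assumptions only.

```python
def physical_copies(solr_replicas_per_shard, hdfs_replication):
    """Number of on-disk copies of each shard's index.

    Each Solr replica (leader included) stores its own index, and HDFS
    stores hdfs_replication copies of each of those.
    """
    return solr_replicas_per_shard * hdfs_replication


def raw_storage_gb(index_size_gb, solr_replicas_per_shard, hdfs_replication):
    """Raw disk consumed across the HDFS cluster, in GB."""
    return index_size_gb * physical_copies(solr_replicas_per_shard, hdfs_replication)


# Hypothetical 100 GB index, leader plus one replica per shard.
print(raw_storage_gb(100, 2, 3))  # HDFS replication 3
print(raw_storage_gb(100, 2, 1))  # HDFS replication 1
print(raw_storage_gb(100, 1, 3))  # leader only, HDFS replication 3
```

The leader-only case stores the least with HDFS replication at 3, but
it is exactly the setup with no redundancy at the Solr node level.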

Best,
Erick

On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger
<j...@lovehorsepower.com> wrote:
> I am also confused on this.  Is adding replicas going to increase search
> performance?  I'm not sure I see the point of any replicas when using HDFS.
> Is there one?
> Thank you!
>
> -Joe
>
>
> On 2/25/2015 10:57 AM, Erick Erickson wrote:
>>
>> bq: And the data sync between leader/replica is always a problem
>>
>> Not quite sure what you mean by this. There shouldn't need to be
>> any syncing in the sense of the index being replicated; the
>> incoming documents should be sent to each node (and indexed
>> to HDFS) as they come in.
>>
>> bq: There is duplicate index computing on the replica side.
>>
>> Yes, that's the design of SolrCloud, explicitly to provide data safety.
>> If you instead rely on the leader to index and somehow pull that
>> indexed form to the replica, then you will lose data if the leader
>> goes down before sending the indexed form.
>>
>> bq: My thought is that the leader and the replica all bind to the
>> same data index directory.
>>
>> This is unsafe. They would both then try to _write_ to the same
>> index, which can easily corrupt it. Alternatively, every node but
>> the first one to open the index would be locked out.
>>
>> All that said, the HDFS triple-redundancy compounded with the
>> Solr leaders/replicas redundancy means a bunch of extra
>> storage. You can turn the HDFS replication down to 1, but that has
>> other implications.
>>
>> Best,
>> Erick
>>
>> On Tue, Feb 24, 2015 at 11:12 PM, longsan <longsan...@sina.com> wrote:
>>>
>>> We use HDFS as our Solr index storage and we have a really heavy update
>>> load. We have run into many problems with the current leader/replica
>>> solution. There is duplicate index computing on the replica side. And
>>> the data sync between leader/replica is always a problem.
>>>
>>> As HDFS already provides data replication at the data layer, could Solr
>>> provide just service-layer replication?
>>>
>>> My thought is that the leader and the replica all bind to the same data
>>> index directory. The leader will build the index for new requests, and
>>> the replica will just keep its index version up to date with the leader
>>> (such as by a periodic soft commit?). If the leader is lost, the replica
>>> will take over the duty immediately.
>>>
>>> Thanks for any suggestions on this idea.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
