bq: Is adding replicas going to increase search performance?

Absolutely, assuming you've maxed out Solr (i.e., the existing replicas are the bottleneck). You can scale the Solr queries-per-second rate nearly linearly by adding replicas, regardless of whether the index is on HDFS or not.
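A back-of-the-envelope sketch of why the scaling is nearly linear: any replica of a shard can answer a query for that shard, so spreading queries over N replicas gives each roughly 1/N of the load. This is a toy model under stated assumptions. The round-robin dispatch, the function names, and the 200 QPS per-replica figure are all hypothetical; Solr's actual replica selection is not round-robin, and real clusters fall somewhat short of the ideal ceiling.

```python
from itertools import cycle


def distribute_queries(num_queries: int, num_replicas: int) -> list[int]:
    """Hypothetical round-robin dispatch of queries across the replicas of one
    shard. Returns how many queries each replica handled."""
    counts = [0] * num_replicas
    replicas = cycle(range(num_replicas))
    for _ in range(num_queries):
        counts[next(replicas)] += 1
    return counts


def max_cluster_qps(per_replica_qps: float, num_replicas: int) -> float:
    """Idealized ceiling: each replica serves its share independently, so
    capacities add up. Ignores network, client, and coordination overhead."""
    return per_replica_qps * num_replicas


if __name__ == "__main__":
    print(distribute_queries(10, 3))  # [4, 3, 3]
    for n in (1, 2, 4):
        # Assumed (made-up) capacity of 200 QPS per replica.
        print(n, max_cluster_qps(200.0, n))
```

The point of the model is only the shape of the curve: doubling the replicas roughly doubles the ceiling, as long as Solr itself (not the network or the clients) is what's saturated.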
Having multiple replicas per shard _also_ increases fault tolerance, so you get both. Even with HDFS, though, a single replica (just a leader) per shard means that you don't have any redundancy if the motherboard on that server dies, even though HDFS has multiple copies of the _data_.

Best,
Erick

On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger <j...@lovehorsepower.com> wrote:
> I am also confused on this. Is adding replicas going to increase search
> performance? I'm not sure I see the point of any replicas when using HDFS.
> Is there one?
> Thank you!
>
> -Joe
>
> On 2/25/2015 10:57 AM, Erick Erickson wrote:
>>
>> bq: And the data sync between leader/replica is always a problem
>>
>> Not quite sure what you mean by this. There shouldn't need to be
>> any syncing in the sense that the index gets replicated; the
>> incoming documents should be sent to each node (and indexed
>> to HDFS) as they come in.
>>
>> bq: There is duplicate index computing on the replica side.
>>
>> Yes, that's the design of SolrCloud, explicitly to provide data safety.
>> If you instead rely on the leader to index and somehow pull that
>> indexed form to the replica, then you will lose data if the leader
>> goes down before sending the indexed form.
>>
>> bq: My thought is that the leader and the replica both bind to the same
>> data index directory.
>>
>> This is unsafe. They would both then try to _write_ to the same
>> index, which can easily corrupt it, or else all but the first
>> one to access the index would be locked out.
>>
>> All that said, the HDFS triple redundancy compounded with the
>> Solr leader/replica redundancy means a bunch of extra
>> storage. You can turn the HDFS replication down to 1, but that has
>> other implications.
>>
>> Best,
>> Erick
>>
>> On Tue, Feb 24, 2015 at 11:12 PM, longsan <longsan...@sina.com> wrote:
>>>
>>> We use HDFS as our Solr index storage and we have a really heavy update
>>> load. We have run into many problems with the current leader/replica
>>> solution. There is duplicate index computing on the replica side. And the
>>> data sync between leader/replica is always a problem.
>>>
>>> As HDFS already provides data replication at the data layer, could Solr
>>> provide just service-layer replication?
>>>
>>> My thought is that the leader and the replica both bind to the same data
>>> index directory. The leader would build the index for new requests, and
>>> the replica would just keep its index version up to date with the leader
>>> (e.g., via a periodic soft commit?). If the leader is lost, the replica
>>> would take over immediately.
>>>
>>> Thanks for any suggestions on this idea.
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
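To make the data-safety argument in Erick's Feb 25 reply concrete, here is a toy comparison of the two designs discussed in the thread: pushing each raw document to the replica to index itself (what the reply describes SolrCloud doing) versus having only the leader index while the replica periodically pulls the leader's indexed state (closer to longsan's proposal). This is a sketch, not Solr code: documents are plain strings, the function names and the `pull_every` interval are hypothetical, and it assumes an update counts as acknowledged once the leader has accepted it.

```python
def surviving_docs_push(docs: list[str], crash_after: int) -> list[str]:
    """Push model: the leader forwards every raw document to the replica,
    which indexes its own copy as the document arrives."""
    replica: list[str] = []
    for doc in docs[:crash_after]:
        replica.append(doc)  # replica indexes the document itself
    return replica


def surviving_docs_pull(docs: list[str], crash_after: int, pull_every: int) -> list[str]:
    """Pull model (hypothetical): only the leader indexes; the replica copies
    the leader's indexed state every `pull_every` documents."""
    leader: list[str] = []
    replica: list[str] = []
    for i, doc in enumerate(docs[:crash_after], start=1):
        leader.append(doc)
        if i % pull_every == 0:
            replica = list(leader)  # replica syncs to a snapshot of the leader
    return replica


def lost(docs: list[str], crash_after: int, survivor: list[str]) -> list[str]:
    """Documents accepted before the leader crashed but missing on the replica."""
    return [d for d in docs[:crash_after] if d not in survivor]


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10)]
    # The leader dies after accepting 7 documents.
    print(lost(docs, 7, surviving_docs_push(docs, 7)))     # []
    print(lost(docs, 7, surviving_docs_pull(docs, 7, 5)))  # ['doc5', 'doc6']
```

The extra indexing work on the replica is the price of the push model: every document the leader accepted already exists on the replica, whereas in the pull model anything indexed since the last sync disappears with the leader.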