OK, does this imply that if I have X replicas of a shard, the document is 
indexed X+1 times? Once for each replica plus once for the leader? That seems 
to me a huge waste of resources.  

In a master/slave scenario, indexing takes place only on the master node, and 
the slaves then replicate the analyzed data.  

--
Gian Maria Ricci
Cell: +39 320 0136949
    


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, 11 January 2016 19:03
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Change leader in SolrCloud

You have to assign the preferredLeader role first. You can do that node-by-node 
via ADDREPLICAPROP or have the system do it for you with BALANCESHARDUNIQUE.
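
As a minimal sketch of the calls named above (not part of the original 
message), the Collections API URLs could be built like this in Python. The 
base URL, collection name, shard name and replica name are placeholders 
assumed for illustration. REBALANCELEADERS is the call mentioned elsewhere in 
this thread; it is issued after the preferredLeader property has been set.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host and port for your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params, "wt": "json"})
    return f"{SOLR_BASE}/admin/collections?{query}"


def add_preferred_leader_url(collection, shard, replica):
    """Node-by-node: mark one replica as the preferred leader of its shard."""
    return collections_api_url("ADDREPLICAPROP", {
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "property": "preferredLeader",
        "property.value": "true",
    })


def balance_preferred_leaders_url(collection):
    """Let Solr spread the preferredLeader property across the nodes."""
    return collections_api_url("BALANCESHARDUNIQUE", {
        "collection": collection,
        "property": "preferredLeader",
    })


def rebalance_leaders_url(collection):
    """Ask Solr to make the preferred leaders the actual leaders."""
    return collections_api_url("REBALANCELEADERS", {"collection": collection})


if __name__ == "__main__":
    # "mycollection" is a placeholder collection name.
    print(balance_preferred_leaders_url("mycollection"))
    print(rebalance_leaders_url("mycollection"))
```

Each URL can then be requested with curl or a browser. The sketch only builds 
the URLs and does not contact a cluster.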

As I said before, in SolrCloud the leader forwards the raw document to each 
follower. There is no pre-processing, analysis, or anything else done on the 
leader first.

Best,
Erick

On Mon, Jan 11, 2016 at 9:19 AM, Gian Maria Ricci - aka Alkampfer 
<alkamp...@nablasoft.com> wrote:
> Thanks.
>
> This raises a different question: when I index a document, it is assigned to 
> one of the three shards based on the value of the ID field. Indexing a 
> document is usually CPU- and RAM-intensive work (parsing text, tokenizing, 
> etc.). How does this work in SolrCloud? I probably incorrectly assumed that 
> the indexing task is carried out by the shard leader, and that the data is 
> then propagated to the replicas of that shard. This led me to think that, 
> with all three shard leaders on one node, the other nodes are not used to 
> index data and performance will suffer.
>
> I've tried to use REBALANCELEADERS but nothing changes (probably because 
> there are few shards).
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Monday, 11 January 2016 17:49
> To: solr-user@lucene.apache.org
> Subject: Re: Change leader in SolrCloud
>
> On 1/11/2016 8:45 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> Probably because of different reboot times, I’ve noticed that upon 
>> reboot all three shard leaders are on a single machine. I expected 
>> shard leaders to be distributed evenly across machines, because if 
>> all shard leaders are on the same machine, all new documents to index 
>> will be routed to that machine, so the indexing load is not distributed.
>
> You're looking for the REBALANCELEADERS functionality ... but because you 
> only have three nodes, the fact that one machine has the leaders for all 
> three shards is not really a problem.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
>
> This feature was added for a use case where there are hundreds of nodes and 
> hundreds of total shards, with the leader roles heavily concentrated on a 
> small number of nodes.  With REBALANCELEADERS, the leader roles can be spread 
> more evenly around the cluster.
>
> It is true that the shard leader does do a small amount of extra work, but 
> for a very small installation like yours, the overhead is nothing to be 
> concerned about.  You can do something about it if it bothers you, though.
>
> Thanks,
> Shawn
>
