On 6 Nov 2014, at 15:27, kishore g <[email protected]> wrote:

Thanks Tom. Good observation. The reason Helix moves the partition back is to 
maintain an equal distribution of locks at all times; if we don't move it back, 
the node that came back up will be idle. This assumes the number of replicas is 
greater than the number of nodes.

I think I get this - if, say, all instances have a capacity of 2, then you 
might end up with some instances containing 2 and some 0, using the current 
rebalancing algorithm, which isn’t what you want (idle node). I guess the 
algorithm would need tweaking to make sure that every node had either capacity 
or capacity-1 partitions, so that those 0’s wouldn’t be acceptable in that case 
and would have partitions moved from nodes with full capacity. I could possibly 
look at making this change for you? I’d need info on how to submit patches.
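
To make the idea concrete, here is a rough sketch in plain Java of the kind of 
rebalance I have in mind. It is not Helix code and does not use any Helix API: 
the class name BalancedRebalanceSketch and the method rebalance are made up for 
illustration, and it assumes one replica per partition (a lock) and ignores 
state model constraints. Each live node ends up with either floor or floor+1 
partitions, and a partition is only moved when its current holder is over its 
target or no longer live.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Helix.
final class BalancedRebalanceSketch {

    /**
     * Returns a new node -> partitions map in which every live node holds
     * either floor or floor+1 partitions, moving as few partitions as possible.
     * Assumes node names in liveNodes are unique and partitions are not duplicated.
     */
    static Map<String, List<String>> rebalance(Map<String, List<String>> current,
                                               List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one live node is required");
        }
        // Start from what each live node already holds.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String node : liveNodes) {
            result.put(node, new ArrayList<>(current.getOrDefault(node, List.of())));
        }
        // Partitions held by nodes that are no longer live go into the pool.
        Deque<String> pool = new ArrayDeque<>();
        int total = 0;
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            total += e.getValue().size();
            if (!result.containsKey(e.getKey())) {
                pool.addAll(e.getValue());
            }
        }
        int floor = total / result.size();
        int ceilSlots = total % result.size(); // how many nodes may hold floor+1

        // Trim overloaded nodes. Nodes that already hold more than floor get the
        // floor+1 slots first, so they give up as little as possible.
        for (List<String> held : result.values()) {
            if (held.size() > floor) {
                int keep = floor;
                if (ceilSlots > 0) {
                    keep = floor + 1;
                    ceilSlots--;
                }
                while (held.size() > keep) {
                    pool.add(held.remove(held.size() - 1));
                }
            }
        }
        // Bring underloaded nodes up to floor.
        for (List<String> held : result.values()) {
            while (held.size() < floor) {
                held.add(pool.remove());
            }
        }
        // Hand any leftovers, one each, to nodes still sitting at floor.
        for (List<String> held : result.values()) {
            if (pool.isEmpty()) {
                break;
            }
            if (held.size() == floor) {
                held.add(pool.remove());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // C has just come back up: A and B hold two locks each, C holds none.
        Map<String, List<String>> current = new LinkedHashMap<>();
        current.put("A", List.of("p1", "p2"));
        current.put("B", List.of("p3", "p4"));
        System.out.println(rebalance(current, List.of("A", "B", "C")));

        // Single lock held by A while B rejoins: nothing should move.
        Map<String, List<String>> single = new LinkedHashMap<>();
        single.put("A", List.of("lock"));
        System.out.println(rebalance(single, List.of("A", "B")));
    }
}
```

In the first case only one partition moves (p4 goes from B to C), so the node 
that came back is no longer idle. In the second case the single lock stays on A, 
since A is already within its floor+1 target.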

For a single partition, or in general when numPartitions * numReplicas < 
number of nodes, I agree that moving back is unnecessary. We can think about 
making the algorithm smarter.

Same with the second case; I expected minimum movement. Your suggestion makes 
sense. Kanak, what do you think?

For the single partition use case, I think you can probably use the 
LeaderStandby model and set the number of replicas to the number of nodes. In 
this case, I believe the leader will not move back when the old node comes back 
up. Kanak/Jason, I believe we made this change some time back. Correct me if I 
am wrong.

I had a look at this option, but the problem is that I’d need to hard-code the 
number of instances, which I’d rather avoid. I guess it might work if I 
allocated a number larger than the expected number of nodes I’d ever have?

I tried setting up a state machine with ’N’ standby nodes, but 
ZKHelixAdmin.rebalance has some checks saying you can only have:

  *   no more than 1 state with an upper bound of 1
  *   no more than 1 state with an upper bound of R
  *   no more than 1 state with an upper bound of N, in which case you can’t 
have any other states with either R or 1 as their upper bound (which messes up 
my case, where I’d want 1 leader and (N-1) standbys, ideally)
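
To spell out how I read those three rules, here is a small plain-Java sketch. 
It is my paraphrase of the checks as listed above, not the actual 
ZKHelixAdmin source; the class name UpperBoundRulesSketch and the method 
isAccepted are made up, and I am assuming upper bounds are given as the 
strings "1", "R" and "N", with any other value ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical paraphrase of the rules listed above; not Helix code.
final class UpperBoundRulesSketch {

    /** Returns true if the state -> upper bound map satisfies all three rules. */
    static boolean isAccepted(Map<String, String> upperBoundByState) {
        int ones = 0;
        int rs = 0;
        int ns = 0;
        for (String bound : upperBoundByState.values()) {
            switch (bound) {
                case "1": ones++; break;
                case "R": rs++; break;
                case "N": ns++; break;
                default: break; // other bounds are not covered by these rules
            }
        }
        boolean atMostOneOfEach = ones <= 1 && rs <= 1 && ns <= 1;
        // A state bounded by N excludes any state bounded by 1 or R.
        boolean nStandsAlone = ns == 0 || (ones == 0 && rs == 0);
        return atMostOneOfEach && nStandsAlone;
    }

    public static void main(String[] args) {
        Map<String, String> leaderStandby = new LinkedHashMap<>();
        leaderStandby.put("LEADER", "1");
        leaderStandby.put("STANDBY", "R");
        System.out.println(isAccepted(leaderStandby));

        // What I would ideally like: 1 leader and standbys on all other nodes.
        Map<String, String> leaderPlusN = new LinkedHashMap<>();
        leaderPlusN.put("LEADER", "1");
        leaderPlusN.put("STANDBY", "N");
        System.out.println(isAccepted(leaderPlusN));
    }
}
```

Under this reading the usual 1/R combination passes, while the 1/N 
combination I would want is rejected by the third rule.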

Are those checks definitely all necessary for full-auto mode?

Any alternatives other than writing a user-defined rebalancer?

Thanks,

Tom