Re: Auto-rebalancing question

kishore g Sun, 09 Nov 2014 23:24:52 -0800

wow, you should really stop using Microsoft :-)

On Sun, Nov 9, 2014 at 11:09 PM, Kanak Biscuitwala <[email protected]>
wrote:


> I'm not sure why Microsoft is so bad at formatting email. Let me try that
> again:
>
> From: [email protected]
> To: [email protected]
> Subject: RE: Auto-rebalancing question
> Date: Sun, 9 Nov 2014 23:06:59 -0800
>
> Sorry for the late reply. Here are some general comments:
>
> - The original behavior described is "by design" but probably handled too
> stringently. The idea is that each partition replica should have some
> "affinity" to nodes, so that if there are multiple node restart events, in
> the general case the partition movement is minimized (i.e. the only
> movements occur when the node with affinity to the partition either enters
> or exits the cluster).
>
> - It's a known issue that lexicographically earlier node names will by
> default have the higher "remainder" capacities. There is a routine in the
> algorithm that allows "stealing" capacities, but that only comes into play
> when an unassigned replica cannot be assigned to any node with available
> capacity (i.e. all nodes with capacity are already serving a replica for
> that partition). As Kishore said, we can definitely do better. A good fix
> to this bug is to assign remainder capacities first based on existing
> assignments.
>
> - There should never be a situation in which some live instances have
> capacity C and others have capacity D >= C+2 (or <= C-2). All live node
> capacities should differ by at most 1. If a node is not live, then yes it
> has capacity 0 because it doesn't logically make sense for a non-live node
> to accept replicas. If any of that is not true, then there's a bug in the
> algorithm.
>
> - You don't need to hard-code the number of instances, but it would
> probably help the algorithm if the new instances are lexicographically
> after the existing ones. You can alternatively provide a different
> ReplicaPlacementScheme which more appropriately handles default node
> affinity in AutoRebalanceStrategy.
>
> - ZKHelixAdmin is definitely more restrictive than it needs to be; it was
> implemented well before AutoRebalanceStrategy was.
>
> - See these links for contributing to Helix (either approach works for us,
> but submitting a pull request on GitHub is probably faster for you):
> https://cwiki.apache.org/confluence/display/HELIX/Contributor+Workflow
> and
> https://cwiki.apache.org/confluence/display/HELIX/Merging+Pull+Requests
>
> - Kishore, I'm not sure what you're referring to regarding using
> LeaderStandby. For FULL_AUTO LeaderStandby, we will definitely take
> previous state assignment into account when computing a new assignment, and
> won't initially change states if possible, but the algorithm does try to
> eventually settle on an assignment where states are evenly distributed when
> possible. This could mean extra state transitions that occur eventually in
> order to maintain that balance.
>
> ________________________________
> > Date: Sun, 9 Nov 2014 22:43:15 -0800
> > Subject: Re: Auto-rebalancing question
> > From: [email protected]
> > To: [email protected]
> >
> > I will try this and get back to you.
> >
> > On Fri, Nov 7, 2014 at 8:21 AM, Tom Widmer
> > <[email protected]> wrote:
> > On 6 Nov 2014, at 15:27, kishore g
> > <[email protected]> wrote:
> >
> > Thanks Tom. Good observation. The reason Helix moves the partition back
> > is to maintain an equal distribution of locks at all times; if we don't
> > move it back, the node that came back up will be idle. This assumes the
> > number of replicas is more than the number of nodes.
> >
> > I think I get this - if, say, all instances have a capacity of 2, then
> > you might end up with some instances containing 2 and some 0, using the
> > current rebalancing algorithm, which isn’t what you want (idle node). I
> > guess the algorithm would need tweaking to make sure that every node
> > had either capacity or capacity-1 partitions, so that those 0’s
> > wouldn’t be acceptable in that case and would have partitions moved
> > from nodes with full capacity. I could possibly look at making this
> > change for you? I’d need info on how to submit patches.
> >
> > For a single partition, or in general when numPartitions *
> > numReplicas < nodes, I agree that moving back is unnecessary. We can
> > think about making the algorithm smarter.
> >
> > Same with the second case; I expected minimum movement. Your suggestion
> > makes sense. Kanak, what do you think?
> >
> > For the single partition use case, I think you can probably use the
> > LeaderStandby model and set the number of replicas to the number of
> > nodes. In this case, I believe the leader will not move back when the
> > old node comes back up. Kanak/Jason, I believe we made this change some
> > time back. Correct me if I am wrong.
> >
> > I had a look at this option, but the problem is that I’d need to
> > hard-code the number of instances, which I’d rather avoid. I guess it
> > might work if I allocated a number larger than the expected number of
> > nodes I’d ever have?
> >
> > I tried setting up a state machine with ’N’ standby nodes, but
> > ZKHelixAdmin.rebalance has some checks saying you can only have:
> >
> > * no more than 1 state with an upper bound of 1
> > * no more than 1 state with an upper bound of R
> > * no more than 1 state with an upper bound of N, in which case you
> > can’t have any other states with either R or 1 as their upper bound
> > (which messes up my case, where I’d want 1 leader and (N-1) standbys,
> > ideally)
> >
> > Are those checks definitely all necessary for full-auto mode?
> >
> > Any alternatives other than writing a user-defined rebalancer?
> >
> > Thanks,
> >
> > Tom
> >
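
The three ZKHelixAdmin.rebalance checks Tom lists above can be restated as a small validator. This is only a sketch of the rules as worded in his message, not the actual ZKHelixAdmin code: the class name, the method name, and the map from state name to upper-bound string ("1", "R", "N") are assumptions made for illustration, and any other bound value is simply ignored.

```java
import java.util.Map;

/** Illustrative only: restates the checks described in the thread, not Helix source. */
class UpperBoundCheckSketch {

    /**
     * Returns true when the state upper bounds satisfy the three rules:
     * at most one state bounded by "1", at most one bounded by "R",
     * and at most one bounded by "N", with "N" excluding any "1" or "R" state.
     */
    static boolean passesChecks(Map<String, String> upperBoundByState) {
        int ones = 0;
        int rs = 0;
        int ns = 0;
        for (String bound : upperBoundByState.values()) {
            if ("1".equals(bound)) {
                ones++;
            } else if ("R".equals(bound)) {
                rs++;
            } else if ("N".equals(bound)) {
                ns++;
            }
            // Any other bound (hypothetically "-1" for an unbounded state) is not counted.
        }
        if (ones > 1 || rs > 1 || ns > 1) {
            return false;
        }
        // A state bounded by "N" may not coexist with a "1" or "R" state.
        return ns == 0 || (ones == 0 && rs == 0);
    }
}
```

Under these rules, a model with LEADER bounded by "1" and STANDBY bounded by "N" (the 1 leader plus N-1 standbys case Tom wants) is rejected, while LEADER "1" with STANDBY "R" is accepted.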
>
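
Kanak's points about capacities (live nodes differ by at most 1, non-live nodes get 0, and lexicographically earlier names receive the "remainder") can be illustrated with a short sketch. It is not the AutoRebalanceStrategy implementation: the class, the method names, and the node names in the usage note are hypothetical, and the second ordering is just one possible reading of "assign remainder capacities first based on existing assignments".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not the Helix AutoRebalanceStrategy code. */
class CapacitySketch {

    /**
     * Splits totalReplicas across the given live nodes so that capacities
     * differ by at most 1. Nodes earlier in priorityOrder get the remainder.
     * Non-live nodes are simply absent from the list (capacity 0).
     */
    static Map<String, Integer> capacities(List<String> priorityOrder, int totalReplicas) {
        Map<String, Integer> result = new LinkedHashMap<>();
        if (priorityOrder.isEmpty()) {
            return result;
        }
        int base = totalReplicas / priorityOrder.size();
        int remainder = totalReplicas % priorityOrder.size();
        for (int i = 0; i < priorityOrder.size(); i++) {
            result.put(priorityOrder.get(i), base + (i < remainder ? 1 : 0));
        }
        return result;
    }

    /** Behavior described in the thread: earlier names get the remainder. */
    static List<String> lexicographicOrder(List<String> liveNodes) {
        List<String> sorted = new ArrayList<>(liveNodes);
        Collections.sort(sorted);
        return sorted;
    }

    /**
     * Assumed variant of the suggested fix: nodes already holding more
     * replicas get the remainder first; ties fall back to name order.
     */
    static List<String> existingAssignmentOrder(List<String> liveNodes,
                                                Map<String, Integer> currentCounts) {
        List<String> sorted = new ArrayList<>(liveNodes);
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(currentCounts.getOrDefault(b, 0),
                                          currentCounts.getOrDefault(a, 0));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return sorted;
    }
}
```

With 4 replicas on live nodes node_1, node_2, and node_3, where node_3 currently holds 2 replicas and the others hold 1 each, the lexicographic ordering gives node_1 a capacity of 2 and forces a replica off node_3. The existing-assignment ordering gives node_3 the capacity of 2, so nothing needs to move.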
