Changing the partition list can be done by modifying the resource's
IdealState (this will move into the resource config in the future).
Basically, you can:
1. modify the partition list in the IdealState directly, or
2. reconfigure the number of partitions and call HelixAdmin.rebalance() to
automatically regenerate the list with default partition names.

Once the IdealState is changed, the next controller rebalance will assign
the new partitions.
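
For example, option 2 could look roughly like this (just a sketch; the ZK
address, cluster/resource names, partition count, and replica count are
placeholders for your own values):

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;

    HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
    // Bump the partition count in the IdealState.
    IdealState is = admin.getResourceIdealState("myCluster", "myResource");
    is.setNumPartitions(16);
    admin.setResourceIdealState("myCluster", "myResource", is);
    // Regenerate the partition list (default names) and the assignment.
    admin.rebalance("myCluster", "myResource", 3); // 3 = replica count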

As for the additional partition info, some of our users leverage the
property store. I think you can do the same.
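
A hedged sketch of what that could look like for your Kafka offsets (the
"/KAFKA_OFFSETS" path, the "kafkaOffset" field name, and the helper
variables are made up for illustration; "manager" is your connected
HelixManager):

    import org.apache.helix.AccessOption;
    import org.apache.helix.ZNRecord;
    import org.apache.helix.store.HelixPropertyStore;

    HelixPropertyStore<ZNRecord> store = manager.getHelixPropertyStore();

    // Old owner records the last consumed offset before handing off.
    ZNRecord meta = new ZNRecord("myResource_0");
    meta.setSimpleField("kafkaOffset", Long.toString(lastConsumedOffset));
    store.set("/KAFKA_OFFSETS/myResource_0", meta, AccessOption.PERSISTENT);

    // New owner reads it back and resumes consuming from there.
    ZNRecord saved =
        store.get("/KAFKA_OFFSETS/myResource_0", null, AccessOption.PERSISTENT);
    long resumeFrom = Long.parseLong(saved.getSimpleField("kafkaOffset"));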

Best Regards,
Jiajun

On Thu, May 24, 2018 at 9:03 PM, Chris Lu <[email protected]> wrote:

> Jiajun, thanks for answering!
>
> I could not find any doc about "Helix supports changing the partition
> list".
>
> Bootstrapping a partition replica can be done by reading from other
> existing partitions, as long as the new replica knows its own partition
> index.
>
> Basically, in my design, all machines follow the same Kafka topic, but
> they may have progressed to different Kafka offsets. When data moves, they
> need to resume from the original offsets.
> It's hard to track different offsets for many moving partitions. Is there
> any way to attach some metadata, such as the Kafka offsets, to the moving
> partition?
>
> Chris
>
> On Thu, May 24, 2018 at 3:24 PM, Wang Jiajun <[email protected]>
> wrote:
>
>> Hi Chris,
>>
>> Could you please elaborate on the expected behavior a little bit?
>> Helix supports changing the partition list now.
>> But the application needs to define how data is transferred between
>> partitions. For example, when a new partition replica is brought up, the
>> state transition logic can move data accordingly (rough sketch below).
>> If you are using the CRUSH algorithm, the placement of the existing
>> partitions should be stable.
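>>
>> A rough sketch of such a transition callback, assuming the OnlineOffline
>> state model (bootstrapData() is a hypothetical application method that
>> copies the partition's data from existing replicas):
>>
>>     import org.apache.helix.NotificationContext;
>>     import org.apache.helix.model.Message;
>>     import org.apache.helix.participant.statemachine.StateModel;
>>     import org.apache.helix.participant.statemachine.StateModelInfo;
>>     import org.apache.helix.participant.statemachine.Transition;
>>
>>     @StateModelInfo(initialState = "OFFLINE",
>>                     states = {"ONLINE", "OFFLINE"})
>>     public class BootstrapStateModel extends StateModel {
>>       @Transition(from = "OFFLINE", to = "ONLINE")
>>       public void onBecomeOnlineFromOffline(Message msg,
>>                                             NotificationContext ctx) {
>>         // Pull this partition's data over before it starts serving.
>>         bootstrapData(msg.getResourceName(), msg.getPartitionName());
>>       }
>>     }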
>>
>>
>> Best Regards,
>> Jiajun
>>
>> On Mon, May 21, 2018 at 11:12 PM, Chris Lu <[email protected]> wrote:
>>
>>> In most distributed systems, the data is over-sharded, and Helix seems
>>> to take this as an assumption.
>>>
>>> Is there any way to use Helix to manage splitting the shards, for data
>>> stores?
>>>
>>> I am trying to fit the jump consistent hash into Helix model.
>>>
>>> Basically, jump hash can change from N shards to M shards, where N and
>>> M can be any positive integers. When growing, (M-N)/M of the data on the
>>> N shards would be moved to the M-N new shards.
>>>
>>> https://arxiv.org/abs/1406.2294
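>>>
>>> For reference, the whole algorithm from the paper is tiny; a direct
>>> Java port of the C++ in the paper ("key" is the record's 64-bit hash):
>>>
>>>     // Maps key to a bucket in [0, numBuckets); only about (M-N)/M of
>>>     // the keys move when growing from N to M buckets.
>>>     static int jumpConsistentHash(long key, int numBuckets) {
>>>       long b = -1, j = 0;
>>>       while (j < numBuckets) {
>>>         b = j;
>>>         key = key * 2862933555777941757L + 1;
>>>         j = (long) ((b + 1)
>>>             * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
>>>       }
>>>       return (int) b;
>>>     }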
>>>
>>> Jump hash is needed to provide an atomic operation to switch to the new
>>> topology. When growing, data queries still go to the existing shards
>>> until the new shards are ready, so the new servers can prepare data as
>>> fast as possible rather than having to throttle the data preparation.
>>>
>>> Chris
>>>
>>>
>>
>
