I believe it's #2, but perhaps I should explain:

Here's a simplified view of mapFields:
  "mapFields" : {
    "partition_11" : {
      "server05.verizonmedia.com" : "ONLINE"
    },
    "partition_22" : {
      "server05.verizonmedia.com" : "ONLINE"
    }
  }

Server 5 has partitions (replicas?) 11 and 22 assigned to it, which is
currently fine. We could, for example, have partition_17 assigned to it as
well and still be fine, but if a fourth partition were assigned, we would
stand a high likelihood of crashing.

Bootstrapping replicas is also expensive, so we'd like to minimize that as
well.
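
To make that concrete, here's a rough sketch of how we'd want to express
the constraint through the Helix admin API. The ZooKeeper address, cluster,
and resource names below are placeholders, and I'm assuming the IdealState's
max-partitions-per-instance field is the right knob; whether the CRUSH-ed
strategy actually honors it is exactly what we're unsure about:

  import org.apache.helix.HelixAdmin;
  import org.apache.helix.controller.rebalancer.strategy.CrushEdRebalanceStrategy;
  import org.apache.helix.manager.zk.ZKHelixAdmin;
  import org.apache.helix.model.IdealState;

  public class PartitionCapSketch {
    public static void main(String[] args) {
      // Placeholder ZK address, cluster, and resource names.
      HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
      IdealState idealState =
          admin.getResourceIdealState("MyCluster", "MyResource");

      // Keep the CRUSH-ed strategy so partition movement stays minimal.
      idealState.setRebalanceStrategy(CrushEdRebalanceStrategy.class.getName());

      // Cap each instance at 3 partitions, since a 4th risks an OOM crash.
      // (Whether the CRUSH-ed strategy respects this cap is the open question.)
      idealState.setMaxPartitionsPerInstance(3);

      admin.setResourceIdealState("MyCluster", "MyResource", idealState);
    }
  }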

On Mon, Feb 8, 2021 at 3:14 PM Xue Junkai <[email protected]> wrote:

> Thanks Phong. Can you clarify which of these you are looking for?
> 1. the number of state transitions run in parallel when bootstrapping replicas.
> 2. a limit on the number of replicas an instance can hold.
>
> Best,
>
> Junkai
>
>
> On Mon, Feb 8, 2021 at 3:06 PM Phong X. Nguyen <[email protected]>
> wrote:
>
>> Hello!
>>
>> I'm currently on a project that uses Apache Helix 0.8.4 (with a pending
>> upgrade to Helix 1.0.1) to distribute partitions across a number of hosts
>> (currently 32 partitions, 16 hosts). Once a partition is allocated to a
>> host, a series of expensive initialization steps occurs, and the system
>> then performs computations for that partition on a scheduled interval. We
>> seek to minimize initializations when possible.
>>
>> If a system goes down (due to either maintenance or failure), the
>> partitions get reshuffled. Currently we are using
>> the CrushEdRebalanceStrategy in the hopes of minimizing partition
>> movements. However, we noticed that unlike the earlier AutoRebalancer
>> scheme, the CrushEdRebalanceStrategy does not limit the number of
>> partitions per node. In our case, this can cause severe out-of-memory
>> issues, which will then cascade as node after node gets more and more
>> partitions that it cannot handle. We have on rare occasion seen our entire
>> cluster fail as a result, and then our production engineers must manually -
>> and carefully - bring the system back online. This is undesirable.
>>
>> Does Helix have a rebalancing strategy that minimizes partition movement
>> yet also permits enforcement of maximum partitions per node?
>>
>> Thanks,
>> - Phong X. Nguyen
>>
>
