I mostly agree with the solution proposed by Åsmund. One addition: you can
have replicas for each partition, and that would give you fault tolerance.
You can use FULL_AUTO mode and let Helix manage everything.

Regarding the statement "My understanding of helix is that it isn't trivial
to dynamically add a partition", it is possible to add partitions
dynamically and we heavily rely on that feature in Pinot.
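As a rough illustration of the fixed-partition idea discussed in this thread
(the class name and the choice of MD5 are mine, not anything Helix prescribes),
the load-balancer could map an entity ID to a partition like this:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Maps an entity ID onto one of a fixed number of partitions. */
public final class PartitionMapper {
    private final int numPartitions;

    public PartitionMapper(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    /**
     * Hash the entity ID with MD5 so the mapping is stable across JVM
     * restarts (String.hashCode would also work on a single JVM version,
     * but a cryptographic hash spreads keys more evenly).
     */
    public int partitionOf(String entityId) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(entityId.getBytes(StandardCharsets.UTF_8));
            // Take the first four digest bytes as a big-endian int.
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                  | ((d[2] & 0xFF) << 8)  |  (d[3] & 0xFF);
            // floorMod keeps the result in [0, numPartitions) even for
            // negative hash values.
            return Math.floorMod(h, numPartitions);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With a fixed partition count (say 100), Helix can then assign those partitions
to servers, and every lookup for the same entity ID lands on the same
partition without needing one partition per entity.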

thanks,
Kishore G

On Thu, Jul 12, 2018 at 11:06 AM, Åsmund Tokheim <[email protected]> wrote:

> Hi
>
> I'm not that experienced with Helix, so wait and see whether anyone offers
> any corrections.
>
> My understanding of Helix is that it isn't trivial to dynamically add a
> partition, and in any case you wouldn't want thousands of partitions or
> "tasks".
>
> For a problem like yours, I would define one task with, say, 100 partitions.
> When the load balancer receives an entity ID, you could use something like
> consistent hashing to identify which partition that ID belongs to.
>
> That would also, to some degree, reduce your need for resource weights:
> averaged over thousands of random entities, the load on each partition
> should be roughly the same. I'm not aware of any concept like
> resource/partition weights, but you can probably achieve the same effect
> by using a custom rebalancer.
>
> Regards
> Åsmund
>
>
> On Thu, 12 Jul 2018, 16:56 Diot Sébastien, <[email protected]> wrote:
>
>> Hi,
>>
>> First message. I've just discovered Apache Helix, while looking at
>> Pinterest Rocksplicator. I was wondering if Helix could replace our
>> home-grown load-balancer.
>>
>> We have a production cluster of 12 "large" Java application servers, with
>> a home-grown Java load-balancer that acts as a "registry", but not as a
>> reverse-proxy. The clients call the load-balancer with the "entity ID" they
>> want to access, and the load-balancer returns them a URL to the application
>> server they should (currently) use to access that entity. The entities are
>> stored in a central DB, and the application servers' session pools provide
>> a cache to reduce the load on the DB. The entities vary in size by a factor
>> of 100, with the vast majority (95%) being "small", but a few being "very
>> large". We have about 130K "entities", and per day about 7K different
>> entities are accessed.
>>
>> Firstly, I'm not sure whether this would be modeled as one single "task"
>> with one "partition" per currently cached entity (dynamically added and
>> removed), OR one task per entity (dynamically added and removed), with a
>> single partition per task. Since all data is basically changed on each
>> access, and is stored in a central DB, we have no use for "replicas".
>>
>> Secondly, since the "size" of each entity can vary a lot, our LB takes
>> the entity size into consideration (together with CPU load and a few other
>> factors) when computing the "load" of each node. So, can "resource weight"
>> be taken into consideration when load-balancing using Helix?
>>
>> Regards,
>>
>> Sébastien Diot
>> Software Developer
>> Software Development, edlohn
>>
>> eurodata AG | Großblittersdorfer Str. 257-259 | D-66119 Saarbrücken
>> Telefon +49 681 8808 768 | Telefax +49 681 8808 787
>> [email protected] | www.eurodata.de | www.facebook.com/eurodata.de
>>
>> HRB 101336 Amtsgericht Saarbrücken | USt-IdNr. DE 182634634
>> Chairman of the Supervisory Board: Franz-Josef Wernze
>> Executive Board: Dieter Leinen
>>
>>
