You are right: currently Helix distributes partitions from the same resource (DB or table) evenly across nodes, i.e., each node will hold roughly the same number of partitions from one DB. That works fine as long as the traffic (or data size) across partitions of the same DB is even, even if different DBs carry different traffic loads.
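For illustration, here is a minimal sketch of how a resource ends up with that default even distribution, using the standard Helix admin API in FULL_AUTO mode; the ZooKeeper address, cluster, resource, and state model names below are just placeholders:

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState.RebalanceMode;

    public class FullAutoExample {
      public static void main(String[] args) {
        // Placeholder ZooKeeper address; cluster and resource names are made up.
        HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

        // Register a resource (a DB/table) with 32 partitions in FULL_AUTO mode:
        // Helix spreads those 32 partitions, and their replicas, evenly across
        // the live instances of the cluster, per resource.
        admin.addResource("MyCluster", "MyDB", 32, "MasterSlave",
            RebalanceMode.FULL_AUTO.name());

        // Ask Helix to compute the assignment with 3 replicas per partition.
        admin.rebalance("MyCluster", "MyDB", 3);
        admin.close();
      }
    }

Nothing in this snippet looks at partition weight; it only balances partition counts, which is exactly the limitation described above.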
We are actively developing a new rebalancer (https://github.com/apache/helix/wiki/Weight-aware-Globally-Evenly-distributed-Rebalancer) with which Helix will distribute partitions based on their weight (a vector of metrics defined by the user). This will help solve the "hot" partition issue (a rough sketch of the idea is appended at the end of this thread). We expect this new rebalancer to be released by the end of this year.

Lei

Lei Xia
Data Infra/Helix
[email protected]
www.linkedin.com/in/lxia1

________________________________
From: Jianzhou Zhao <[email protected]>
Sent: Sunday, October 20, 2019 9:40 PM
To: [email protected] <[email protected]>
Subject: Re: who uses helix

This is amazing. That number means each table could have thousands of partitions on average. At this scale, I feel the Helix UI could become a bottleneck for monitoring them :)

Not a related question, but it came to mind after seeing the numbers... In the Helix code, I did not see how it balances partitions based on how busy a resource is; Helix simply ensures the number of shards is evenly distributed. What happens if some resources are busy for some reason and their placement makes all other resources busy because they share nodes? I am new to Helix, and its tutorial says we can use semi-auto or customized balancing strategies, but I am still curious how the semi-auto approach scales in practice.

On Sun, Oct 20, 2019 at 9:31 PM kishore g <[email protected]> wrote:

At LinkedIn, Helix manages thousands of nodes and 1 million segments (the equivalent of partitions) across thousands of tables.

On Sun, Oct 20, 2019 at 9:16 PM Jianzhou Zhao <[email protected]> wrote:

Cool. In the Rocksplicator case, how many partitions, replicas, and nodes can a Helix cluster manage?

On Sun, Oct 20, 2019 at 9:05 PM Bo Liu <[email protected]> wrote:

We use Helix extensively at Pinterest. This blog post has more details and some tips: https://medium.com/pinterest-engineering/automated-cluster-management-and-recovery-for-rocksplicator-f1f8fd35c833

On Sun, Oct 20, 2019 at 8:30 PM Jianzhou Zhao <[email protected]> wrote:

Hi,

I was looking for who uses Helix and found https://cwiki.apache.org/confluence/display/HELIX/Powered+By+Helix
The page has not been updated since 2017. Is the list still up to date?

Thank you,
j

--
Best regards,
Bo
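Regarding the weight-aware rebalancer mentioned at the top of this thread: its API had not been released when this was written, so the snippet below is only a hypothetical illustration of the underlying idea of placing partitions by a user-defined weight vector rather than by count. It is not Helix code, and all names and numbers are made up.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class WeightAwareSketch {
      // Hypothetical partition weight: a user-defined vector of metrics.
      record Partition(String name, int cpuUnits, int diskGb) {}

      public static void main(String[] args) {
        List<Partition> partitions = List.of(
            new Partition("MyDB_0", 8, 120),  // one "hot" partition
            new Partition("MyDB_1", 1, 10),
            new Partition("MyDB_2", 2, 30),
            new Partition("MyDB_3", 1, 15));
        List<String> nodes = List.of("host1", "host2");

        // Greedy illustration: put each partition on the currently least loaded
        // node, measuring load by the partition weights rather than by the
        // partition count. The node that takes the hot MyDB_0 ends up holding
        // fewer of the light partitions.
        Map<String, Integer> load = new HashMap<>();
        for (String n : nodes) load.put(n, 0);
        for (Partition p : partitions) {
          String target = nodes.get(0);
          for (String n : nodes) {
            if (load.get(n) < load.get(target)) target = n;
          }
          load.put(target, load.get(target) + p.cpuUnits() + p.diskGb());
          System.out.println(p.name() + " -> " + target);
        }
      }
    }

A count-based assignment would put two partitions on each node; the weight-based one isolates the hot partition and packs the light ones together, which is the behavior the new rebalancer aims for.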

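And for the semi-auto question raised in the thread: a minimal sketch of SEMI_AUTO mode, where the operator pins each partition's preference list (the first instance is the preferred master) and Helix only drives the state transitions. Cluster, resource, and instance names are again placeholders.

    import java.util.Arrays;
    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;
    import org.apache.helix.model.IdealState.RebalanceMode;

    public class SemiAutoExample {
      public static void main(String[] args) {
        HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

        // Switch an existing resource to SEMI_AUTO: the operator chooses the
        // placement, Helix only decides the replica states and transitions.
        IdealState idealState = admin.getResourceIdealState("MyCluster", "MyDB");
        idealState.setRebalanceMode(RebalanceMode.SEMI_AUTO);

        // Operator-chosen placement per partition; the first instance in each
        // list is the preferred master.
        idealState.setPreferenceList("MyDB_0",
            Arrays.asList("host1_12913", "host2_12913", "host3_12913"));
        idealState.setPreferenceList("MyDB_1",
            Arrays.asList("host2_12913", "host3_12913", "host1_12913"));

        admin.setResourceIdealState("MyCluster", "MyDB", idealState);
        admin.close();
      }
    }

In this mode an external tool (or a person) can account for per-partition load when building the preference lists, which is how busy resources are handled before the weight-aware rebalancer is available.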