Re: hash and range partition uneven distribution for one tablet server

Alexey Serbin Thu, 26 Mar 2020 20:38:09 -0700

Hi,

Do you mean you still see an uneven distribution of leader replicas?



Thanks,

Alexey

On Thu, Mar 26, 2020 at 7:56 PM Fisk Xia <wuskyfant...@gmail.com> wrote:

> Hi,
>
> Thanks for your time and attention.
>
> To elaborate further on the situation: we are using a replication factor of 1.
> We have tried running the Kudu rebalancer tool, but we are still seeing the uneven
> distribution described below.
>
> Please advise us if there is any alternative way to improve the situation
> other than running the Kudu rebalancer tool.
>
> Thank you.
>
>
> On 2020/03/25 04:43:13, Adar Lieber-Dembo <a...@cloudera.com> wrote:
>
> What you're seeing makes sense given that partition assignment
> uses the "power of two choices" selection process: two servers are chosen at random,
> and the one with fewer partitions is selected as the recipient of
> the new partition. Given enough partitions, this algorithm should
> result in an even distribution of partitions across servers. But since
> you're only assigning 5 (or 15, if the replication factor is 3)
> partitions to 5 servers, there may be some skew.
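To see why 5 tablets on 5 servers often end up skewed, here is a small simulation of the selection process described above. It is a simplified model, not Kudu's placement code: it assumes an empty cluster, counts only this table's tablets, picks two distinct servers uniformly at random, and breaks ties at random. The function names `place_tablets` and `fraction_perfectly_even` are hypothetical.

```python
import random


def place_tablets(num_tablets, num_servers, rng):
    """Place tablets one at a time with "power of two choices".

    Simplified model (an assumption, not Kudu's actual placement code):
    the cluster starts empty, two distinct servers are picked uniformly at
    random, the one hosting fewer replicas wins, and ties are broken at random.
    Returns the number of tablets placed on each server.
    """
    load = [0] * num_servers
    for _ in range(num_tablets):
        a, b = rng.sample(range(num_servers), 2)
        if load[a] == load[b]:
            chosen = rng.choice((a, b))
        else:
            chosen = a if load[a] < load[b] else b
        load[chosen] += 1
    return load


def fraction_perfectly_even(num_tablets, num_servers, trials, seed=0):
    """Fraction of simulated runs in which max(load) - min(load) <= 1."""
    rng = random.Random(seed)
    even = 0
    for _ in range(trials):
        load = place_tablets(num_tablets, num_servers, rng)
        if max(load) - min(load) <= 1:
            even += 1
    return even / trials


if __name__ == "__main__":
    # 5 hash buckets x 1 range partition = 5 tablets on 5 tablet servers.
    print("one run:", place_tablets(5, 5, random.Random(42)))
    print("P(one tablet per server) ~", fraction_perfectly_even(5, 5, 20_000))
    # With many more tablets, the skew relative to the average load is small.
    big = place_tablets(500, 5, random.Random(1))
    print("500 tablets:", big, "max - min =", max(big) - min(big))
```

Under this model the first two tablets always land on empty servers, and the next three avoid a doubled-up server with probability 0.9, 0.7 and 0.4, so a one-tablet-per-server layout happens only about 0.9 × 0.7 × 0.4 ≈ 0.25 of the time. Roughly three runs in four leave some server with two tablets and another with none.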
>
> Have you tried running the Kudu rebalancer tool? That's "kudu cluster
> rebalance". It'll redistribute your partitions to minimize skew across
> tservers.
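As a rough illustration of what "minimize skew" means, the toy sketch below repeatedly moves one replica from the most-loaded server to the least-loaded one until no two servers differ by more than one. This is not the algorithm used by "kudu cluster rebalance", which is more involved; `greedy_rebalance` is a hypothetical name and the starting layout is made up to resemble the one described further down in this thread.

```python
def greedy_rebalance(counts):
    """Toy rebalancer: move one replica at a time from the most-loaded
    server to the least-loaded one until max - min <= 1.

    This only illustrates the goal of minimizing skew; it is NOT the
    algorithm used by `kudu cluster rebalance`.
    Returns (new_counts, moves), where each move is (from_index, to_index).
    """
    counts = list(counts)
    moves = []
    while True:
        # max()/min() return the first index that has the extreme value.
        src = max(range(len(counts)), key=counts.__getitem__)
        dst = min(range(len(counts)), key=counts.__getitem__)
        if counts[src] - counts[dst] <= 1:
            return counts, moves
        counts[src] -= 1
        counts[dst] += 1
        moves.append((src, dst))


if __name__ == "__main__":
    # Hypothetical skewed layout of 5 tablets across 5 tablet servers.
    balanced, moves = greedy_rebalance([3, 2, 0, 0, 0])
    print(balanced, moves)  # prints [1, 1, 1, 1, 1] [(0, 2), (0, 3), (1, 4)]
```

Each move strictly reduces the sum of squared per-server counts, so the loop always terminates.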
>
> All that said, we currently don't have a mechanism to distribute
> tablet leaders evenly across the cluster, so you may still see
> hotspotting on writes if one server happens to host more leaders than
> the others and if those leaders are servicing a high write load.
>
> On Tue, Mar 24, 2020 at 9:09 PM 夏天松 <wu...@gmail.com> wrote:
>
>
> I have a write hotspot problem when inserting data into Kudu. If I use both hash and
> range partitioning, the buckets are unevenly distributed.
> Kudu cluster layout: 3 masters and 5 tablet servers
>
>
> My CREATE TABLE statement:
> CREATE TABLE tmp.sales_by_year (
>   device_id STRING NOT NULL,
>   update_date STRING NOT NULL,
>   update_time STRING NOT NULL,
>   object_name STRING NOT NULL,
>   attribute_name STRING NOT NULL,
>   present_value STRING NULL,
>   PRIMARY KEY (device_id, update_date, update_time, object_name,
>     attribute_name)
> )
> PARTITION BY HASH (device_id) PARTITIONS 5, RANGE (update_date) (
>   PARTITION '2020-03-21' <= VALUES < '2020-03-22'
> )
> STORED AS KUDU;
>
>
> I expected that for update_date = '2020-03-21' every tablet server would host
> exactly one partition, but that is not what happens. In practice, some machines
> have no partitions and others have 2 or 3. This leads to high CPU usage on
> some machines when writing large amounts of time series data.
>
>
> Please help: how can I solve this problem?
>
>
