Hi,

Do you mean you still see an uneven distribution of leader replicas?

Thanks,
Alexey

On Thu, Mar 26, 2020 at 7:56 PM Fisk Xia <wuskyfant...@gmail.com> wrote:
> Hi,
>
> Thanks for your time and attention.
>
> To further elaborate on the situation, we have replication factor = 1.
> We have tried running the Kudu Rebalancer Tool, but we are still seeing
> the uneven distribution mentioned.
>
> Please advise us if there is any alternative to improve the situation
> other than running the Kudu Rebalancer Tool.
>
> Thank you.
>
> On 2020/03/25 04:43:13, Adar Lieber-Dembo <a...@cloudera.com> wrote:
> > What you're seeing sort of makes sense given that partition assignment
> > uses a "power of 2" selection process: two servers are chosen at random,
> > and the one with the fewer partitions is selected as the recipient of
> > the new partition. Given enough partitions, this algorithm should
> > result in an even distribution of partitions across servers. But since
> > you're only assigning 5 (or 15, if the replication factor is 3)
> > partitions to 5 servers, there may be some skew.
> >
> > Have you tried running the Kudu rebalancer tool? That's "kudu cluster
> > rebalance". It'll redistribute your partitions to minimize skew across
> > tservers.
> >
> > All that said, we currently don't have a mechanism to distribute
> > tablet leaders evenly across the cluster, so you may still see
> > hotspotting on writes if one server happens to host more leaders than
> > the others and if those leaders are servicing a high write load.
> >
> > On Tue, Mar 24, 2020 at 9:09 PM 夏天松 <wu...@gmail.com> wrote:
> > >
> > > I have a hot data insert problem when using Kudu. If I use both hash
> > > and range partitioning, the buckets are unevenly distributed.
> > >
> > > Kudu cluster: 3 masters and 5 tablet servers
> > >
> > > My create table SQL:
> > >
> > > CREATE TABLE tmp.sales_by_year (
> > >   device_id STRING NOT NULL,
> > >   update_date STRING NOT NULL,
> > >   update_time STRING NOT NULL,
> > >   object_name STRING NOT NULL,
> > >   attribute_name STRING NOT NULL,
> > >   present_value STRING NULL,
> > >   PRIMARY KEY (device_id, update_date, update_time, object_name,
> > >                attribute_name)
> > > )
> > > PARTITION BY HASH (device_id) PARTITIONS 5, RANGE (update_date) (
> > >   PARTITION '2020-03-21' <= VALUES < '2020-03-22'
> > > )
> > > STORED AS KUDU;
> > >
> > > I expected that for update_date = '2020-03-21' every tablet server
> > > would hold one partition, but the real distribution is different:
> > > some machines have no partitions, and some have 2 or 3. This leads
> > > to high CPU usage on some machines when writing large amounts of
> > > time series data.
> > >
> > > Please help me. How can I solve this problem?
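
For anyone following the thread, the "power of 2" selection Adar describes above can be illustrated with a small simulation. This is only a sketch of the idea as stated in his message (pick two servers at random, give the partition to the one holding fewer), not Kudu's actual placement code. The function names, the tie-breaking rule, and the choice of two distinct servers are assumptions made for the illustration.

```python
import random
from collections import Counter


def place_partitions(num_partitions, num_servers, seed=None):
    """Assign partitions one at a time using two random choices.

    Hypothetical helper: for each partition, pick two distinct servers
    at random and place the partition on the one currently holding
    fewer partitions. Ties go to the first server picked (assumption).
    Returns the per-server partition counts.
    """
    rng = random.Random(seed)
    load = [0] * num_servers
    for _ in range(num_partitions):
        a, b = rng.sample(range(num_servers), 2)
        target = a if load[a] <= load[b] else b
        load[target] += 1
    return load


def skew_histogram(num_partitions, num_servers, trials=10_000, seed=0):
    """Repeat the placement many times and count how often each skew
    (max load minus min load) occurs."""
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(trials):
        load = place_partitions(num_partitions, num_servers,
                                seed=rng.getrandbits(32))
        hist[max(load) - min(load)] += 1
    return hist


if __name__ == "__main__":
    # The case from this thread: 5 partitions on 5 tablet servers.
    print("one placement:", place_partitions(5, 5, seed=1))
    print("skew histogram, 5 on 5:", sorted(skew_histogram(5, 5).items()))
    # Many more partitions than servers: counts end up close together.
    print("5000 on 5:", place_partitions(5000, 5, seed=1))
```

Under these assumptions, a perfectly even result (one partition per server) for 5 partitions on 5 servers happens only about a quarter of the time, since each later placement must avoid drawing two servers that are already loaded (0.9 x 0.7 x 0.4 = 0.252). That is consistent with seeing some servers with no partitions and others with 2 or 3, while the skew becomes negligible once the number of partitions is large relative to the number of servers.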