I have a hot-spotting problem when inserting data into Kudu. When I use
both hash and range partitioning, the buckets end up unevenly distributed.
Kudu cluster: 3 masters and 5 tablet servers

My CREATE TABLE statement:
CREATE TABLE tmp.sales_by_year (
  device_id STRING NOT NULL,
  update_date STRING NOT NULL,
  update_time STRING NOT NULL,
  object_name STRING NOT NULL,
  attribute_name STRING NOT NULL,
  present_value STRING NULL,
  PRIMARY KEY (device_id, update_date, update_time, object_name, attribute_name)
)
PARTITION BY HASH (device_id) PARTITIONS 5, RANGE (update_date) (
  PARTITION '2020-03-21' <= VALUES < '2020-03-22'
)
STORED AS KUDU;

I expected that for update_date = '2020-03-21' every tablet server would
hold one partition, but that is not what happens. In practice, some
machines hold no partitions, while others hold 2 or 3. This leads to high
CPU usage on some machines when writing large amounts of time series data.
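
For reference, a minimal sketch of how the placement can be inspected
from Impala. It assumes that SHOW PARTITIONS on a Kudu table lists one
row per tablet together with its key range and leader replica, so the
exact columns may differ between versions:

```sql
-- One row per tablet; the leader replica column shows which tablet
-- server currently leads that hash bucket (assumed output layout).
SHOW PARTITIONS tmp.sales_by_year;

-- Lists only the range partitions defined on update_date.
SHOW RANGE PARTITIONS tmp.sales_by_year;
```

Counting how often each tablet server appears as leader gives the
per-machine distribution described above.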

Please help: how can I solve this problem?
