Hi team, I am looking for a storage solution that supports high ingestion/update rates and can also run OLAP queries. Apache Kudu looks like one promising option. Please help me check whether Apache Kudu is the right fit.
Use Case:
---------
- I am receiving 40K records per second. Record size is small: 5 fields max (2 string, 2 timestamp, 1 number).
- Based on the primary key, I will get ~2 billion unique records per day; the rest will be updates. With Apache Spark aggregation we can reduce the updates by about 20%.
- TTL of each record will be 30 days.

Questions:
----------
1. How much data can we store in Kudu per node?
2. With a large volume of updates, will get/scan requests become slow over time?
3. How large can tables in Kudu grow?
4. Will random reads and updates be supported at this scale?
5. How many parallel ingestion jobs can we run against a Kudu cluster, across different tables?

Please also suggest some articles on Kudu sizing and performance.

Regards,
Chetan Rautela
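P.S. To make the sizing question concrete, here is a rough back-of-envelope calculation for the workload above. The per-field byte sizes (~20 B per string, 8 B per timestamp/number) and the 3x replication factor are my assumptions, not measured values, and compression is ignored:

```python
# Back-of-envelope sizing for the workload described above.
# Field sizes are assumptions: ~20 B per string, 8 B per timestamp/number.
RECORDS_PER_SEC = 40_000
UNIQUE_PER_DAY = 2_000_000_000
TTL_DAYS = 30
REPLICATION = 3  # assumed tablet replication factor

record_bytes = 2 * 20 + 2 * 8 + 1 * 8          # 64 bytes per record
ingested_per_day = RECORDS_PER_SEC * 86_400    # writes arriving per day

# Only unique rows are retained; updates overwrite existing rows in place.
retained_bytes = UNIQUE_PER_DAY * record_bytes * TTL_DAYS
retained_tb = retained_bytes / 1e12            # decimal TB, pre-compression

print(f"writes/day: {ingested_per_day:,}")
print(f"retained (1 replica): {retained_tb:.2f} TB")
print(f"retained (x{REPLICATION} replication): {retained_tb * REPLICATION:.2f} TB")
```

Under these assumptions that is ~3.46 billion writes/day and roughly 3.84 TB of live data (about 11.5 TB with 3x replication) before compression, which gives a starting point for the per-node capacity question.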