Hi team, I am looking for a storage solution that supports high ingestion/update rates and can also run OLAP queries. Apache Kudu looks like one promising option. Please help me check whether Apache Kudu is the right fit.
Use Case:
---------
- I am receiving 40K records per second. Record size is small: 5 fields max (2 string, 2 timestamp, 1 number).
- Based on the primary key, I will get ~2 billion unique records per day; the rest will be updates. With Apache Spark aggregation we can reduce the updates by about 20%.
- TTL of each record will be 30 days.

Questions:
----------
1. How much data can we store in Kudu per node?
2. With a large volume of updates, will get/scan requests become slow over time?
3. How large can tables in Kudu grow?
4. Will random reads and updates be supported at this scale?
5. How many parallel ingestion jobs can we run against a Kudu cluster, across different tables?

Please also suggest some articles on Kudu sizing and performance.

Regards,
Chetan Rautela
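P.S. To make the sizing question concrete, here is a rough back-of-envelope calculation for the workload above. The per-field byte sizes (~20 B per string, 8 B per timestamp/number) and the 3x replication factor are my assumptions, not measured values, and compression is ignored:

```python
# Back-of-envelope sizing for the workload described above.
# Field sizes are assumptions: ~20 B per string, 8 B per timestamp/number.
RECORDS_PER_SEC = 40_000
UNIQUE_PER_DAY = 2_000_000_000
TTL_DAYS = 30
REPLICATION = 3  # assumed tablet replication factor

record_bytes = 2 * 20 + 2 * 8 + 1 * 8          # 64 bytes per record
ingested_per_day = RECORDS_PER_SEC * 86_400    # writes arriving per day

# Only unique rows are retained; updates overwrite existing rows in place.
retained_bytes = UNIQUE_PER_DAY * record_bytes * TTL_DAYS
retained_tb = retained_bytes / 1e12            # decimal TB, pre-compression

print(f"writes/day: {ingested_per_day:,}")
print(f"retained (1 replica): {retained_tb:.2f} TB")
print(f"retained (x{REPLICATION} replication): {retained_tb * REPLICATION:.2f} TB")
```

Under these assumptions that is ~3.46 billion writes/day and roughly 3.84 TB of live data (about 11.5 TB with 3x replication) before compression, which gives a starting point for the per-node capacity question.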