Hi guys,

I was really excited about Kudu until I saw this:

https://kudu.apache.org/docs/known_issues.html


   - Recommended maximum amount of stored data, post-replication and
     post-compression, per tablet server is 8TB.

   - Recommended maximum number of tablets per tablet server is 2000,
     post-replication.

   - Maximum number of tablets per table for each tablet server is 60,
     post-replication, at table-creation time.

These numbers are very concerning to me because the project I am working on
will have 300+ tables: 20 tables have over 1B rows, 50-100 tables average
200M rows, and the rest are below 50M rows. I want to see if I can build a
near-real-time data lake, ingesting data from our source RDBMS systems.

My cluster has 30 nodes with 12 spinning HDDs each (each drive is 8 TB),
and each node is a 2-CPU, 22-core beast with 512 GB of DDR4 RAM.

Do the limitations above still apply in my case? It looks like I can only
have 24 TB worth of data in Kudu, which is way below what I need. My modest
estimate is 80-100 TB.

I am also concerned that I can only have 20,000 tablets after replication;
as I mentioned above, I am going to have a bunch of tables with lots of rows.
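
As a rough sanity check on these figures, the per-server limits can be
multiplied out across the cluster. The sketch below assumes one tablet
server per node and a replication factor of 3 (Kudu's default); neither is
stated above, and the function and variable names are illustrative only.

```python
# Back-of-envelope cluster limits derived from the documented per-server limits.
# ASSUMPTIONS (not stated in the post): one tablet server per node and a
# replication factor of 3, which is Kudu's default.
TABLET_SERVERS = 30
MAX_TB_PER_SERVER = 8           # post-replication, post-compression
MAX_TABLETS_PER_SERVER = 2000   # post-replication
REPLICATION_FACTOR = 3


def cluster_limits(servers, tb_per_server, tablets_per_server, replication):
    """Scale per-server limits to the cluster, before and after replication."""
    stored_tb = servers * tb_per_server
    tablet_replicas = servers * tablets_per_server
    return {
        "stored_tb": stored_tb,                          # includes all replicas
        "unique_tb": stored_tb / replication,            # one copy of the data
        "tablet_replicas": tablet_replicas,              # includes all replicas
        "unique_tablets": tablet_replicas // replication,
    }


limits = cluster_limits(
    TABLET_SERVERS, MAX_TB_PER_SERVER, MAX_TABLETS_PER_SERVER, REPLICATION_FACTOR
)
for name, value in limits.items():
    print(f"{name}: {value}")
```

Under those assumptions the tablet limit works out to 60,000 replicas, or
20,000 distinct tablets, which matches the figure above. The data limit
works out to 240 TB stored, or about 80 TB of distinct data, which differs
from the 24 TB figure, so that number may be worth rechecking.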

I do not have an option to pick a different hardware configuration for our
cluster.

Thanks
