Hi guys, I was really excited about Kudu until I saw this:
https://kudu.apache.org/docs/known_issues.html

- Recommended maximum amount of stored data, post-replication and post-compression, per tablet server is 8TB.
- Recommended maximum number of tablets per tablet server is 2000, post-replication.
- Maximum number of tablets per table for each tablet server is 60, post-replication, at table-creation time.

These numbers are very concerning to me because the project I am working on will have 300+ tables: 20 tables have over 1B rows, 50-100 tables average 200M rows, and the rest are below 50M rows. I want to see if I can build a near real-time data lake, ingesting data from our source RDBMS systems.

My cluster has 30 nodes. Each node has 12 spinning HDDs (8TB per drive) and is a 2-CPU, 22-core beast with 512GB of DDR4 RAM.

Do the limitations above still apply in my case? It looks like I can only have 24TB worth of data in Kudu, which is way below what I need. My modest estimate is 80-100TB. I am also concerned that I can only have 20,000 tablets after replication; as I mentioned above, I am going to have a bunch of tables with lots of rows. I do not have the option of picking a different hardware configuration for our cluster.

Thanks
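
P.S. Here is a rough sketch of how the documented per-tablet-server limits would scale across my cluster. It assumes a replication factor of 3 (Kudu's default, but I have not confirmed that is what we would use) and that the per-server recommendations simply multiply across all 30 tablet servers. Both are my assumptions, and the function and variable names are only for illustration, so the results may not match the figures I quoted above.

```python
# Back-of-the-envelope cluster limits derived from the per-tablet-server
# recommendations on the known-issues page. Not an official sizing method.

TABLET_SERVERS = 30            # nodes in my cluster
MAX_DATA_TB_PER_SERVER = 8     # recommended, post-replication and post-compression
MAX_TABLETS_PER_SERVER = 2000  # recommended, post-replication
REPLICATION_FACTOR = 3         # ASSUMPTION: Kudu's default, not stated for my setup


def cluster_limits(servers, data_tb_per_server, tablets_per_server, replication):
    """Scale per-server limits to the whole cluster (assumes they add linearly)."""
    total_data_tb = servers * data_tb_per_server
    total_tablet_replicas = servers * tablets_per_server
    return {
        "data_tb_post_replication": total_data_tb,
        "data_tb_pre_replication": total_data_tb / replication,
        "tablet_replicas": total_tablet_replicas,
        "tablets_pre_replication": total_tablet_replicas // replication,
    }


limits = cluster_limits(
    TABLET_SERVERS,
    MAX_DATA_TB_PER_SERVER,
    MAX_TABLETS_PER_SERVER,
    REPLICATION_FACTOR,
)

for name, value in limits.items():
    print(f"{name}: {value}")
```

If the replication factor or the number of tablet servers differs, only the constants at the top need to change.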
