On Tue, Mar 20, 2018 at 2:15 AM, Кравец Владимир Александрович < [email protected]> wrote:
> Hi, I'm new to Kudu and I'm trying to understand its applicability for our
> purposes. I came across the following article about Kudu limitations:
> https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_limitations.html#concept_cws_n4n_5z.
> Do I understand correctly that this means that the maximum total amount of
> useful compressed stored data in one Kudu table is 8TB? Here are my calcs:
>

I think there are a few mistakes below. Comments inline.

> 1. Amount of stored data per tablet = Recommended maximum amount of stored
> data / Recommended maximum number of tablets per tablet server = 8 000 /
> 2 000 = 4 GB per tablet
>

That assumes that every tablet is equally sized and that you have hit the
limit on number of tablets. Even though you _can_ have 2000 tablets per
server, you might want fewer. In addition, you don't need to have every
tablet be the same size -- some might be 10GB while others might be 1GB or
smaller.

> 2. Maximum number of tablets per table for each tablet server
> pre-replication = Maximum number of tablets per table for each tablet
> server is 60, post-replication / number of replicas = 60 / 3 = 20 tablets
> per table per tablet server
>

The key phrase that you didn't copy here is "at table-creation time". This
limitation has to do with avoiding some issues we have seen when trying to
create too many tablets at the same time on the cluster. With range
partitioning, you can always add more partitions later. For example, it's
very common to add a new partition for each day. So, a single table can,
after some days, have more than 20 tablets on a given server.

> 3. Total amount of stored data per table, pre-replication = Amount of
> stored data per tablet * Maximum number of tablets per table for each
> tablet server pre-replication * Maximum number of tablet servers = 4 GB *
> 20 * 100 = 8TB
>

Per above, this isn't really the case. For example, on one cluster at
Cloudera which runs an internal workload, we have one table that is 82TB and
another which is 46TB. I've seen much larger tables in some user
installations as well.

> And I also would like to understand how fundamental the nature of the
> limitation "Maximum number of tablets per table for each tablet server is
> 60, post-replication"? Is it possible that this restriction will be removed?
>

See above.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
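
For reference, the arithmetic discussed in this thread can be sketched as below. This is only an illustration: the constants are the recommended limits quoted in the question, while the function names and the second scenario's inputs (10 GB tablets, 60 tablets of the table per server before replication) are hypothetical values chosen to show how the 8TB figure stops applying once partitions are added after table creation.

```python
# Recommended limits as quoted in the question (guidelines, not hard caps).
MAX_DATA_PER_TSERVER_GB = 8000        # stored data per tablet server
MAX_TABLETS_PER_TSERVER = 2000        # tablets per tablet server
CREATE_TIME_TABLETS_PER_TSERVER = 60  # per table, post-replication, at table-creation time
REPLICATION_FACTOR = 3
MAX_TSERVERS = 100


def original_estimate_tb():
    """Reproduce the 8 TB estimate from the question (steps 1 to 3)."""
    gb_per_tablet = MAX_DATA_PER_TSERVER_GB / MAX_TABLETS_PER_TSERVER          # 4 GB
    tablets_pre_repl = CREATE_TIME_TABLETS_PER_TSERVER / REPLICATION_FACTOR    # 20
    return gb_per_tablet * tablets_pre_repl * MAX_TSERVERS / 1000


def table_size_tb(gb_per_tablet, tablets_per_tserver_pre_repl):
    """Pre-replication table size for hypothetical tablet sizes and counts.

    The tablet count here is not bound by the creation-time limit, because
    range partitions can be added after the table exists.
    """
    return gb_per_tablet * tablets_per_tserver_pre_repl * MAX_TSERVERS / 1000


print(original_estimate_tb())   # estimate from the question
print(table_size_tb(10, 60))    # hypothetical: larger tablets, partitions added over time
```

With these hypothetical inputs each server would hold 10 GB * 60 * 3 replicas = 1 800 GB for the table, still well under the quoted 8 000 GB per-server recommendation.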
