Right, I think you're interpreting that correctly. If you're feeling adventurous, you could experiment with those limits even further :)
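To spell that math out, here's a rough back-of-the-envelope sketch in Python using the numbers from this thread; it assumes the documented 8TB-per-tablet-server recommendation (post-replication, post-compression) and the default replication factor of 3:

# Rough capacity sketch based on the numbers discussed in this thread.
# Assumes the documented 8TB recommended max per tablet server
# (post-replication, post-compression) and replication factor 3.

NODES = 30                      # tablet servers in the cluster
DISKS_PER_NODE = 12             # spinning HDDs per node
DISK_TB = 8                     # capacity of each HDD, in TB
RECOMMENDED_TB_PER_TSERVER = 8  # recommended max stored data per tablet server
REPLICATION_FACTOR = 3          # default Kudu replication

raw_capacity = NODES * DISKS_PER_NODE * DISK_TB       # 2880 TB (~2.9 PB) of raw disk
kudu_stored = NODES * RECOMMENDED_TB_PER_TSERVER      # 240 TB of replicated Kudu data
logical_data = kudu_stored / REPLICATION_FACTOR       # ~80 TB of user-visible data

print(f"Raw HDD capacity:             {raw_capacity} TB")
print(f"Recommended Kudu stored max:  {kudu_stored} TB (post-replication)")
print(f"Logical data at RF=3:         {logical_data:.0f} TB")

So yes, roughly 80TB of user-visible data under the current recommendation, even though the raw disk is close to 2.9PB.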
Node density is something we're tracking and hoping to improve in the near future. There have already been some pretty drastic bumps in this area (see here <https://issues.apache.org/jira/browse/KUDU-1967>), although I don't think there's an exact timeline.

Andrew

On Wed, Nov 29, 2017 at 11:16 AM, Boris Tyukin <[email protected]> wrote:

> Thanks for your response, Andrew. Every node has 12 8TB HDDs - so 96TB total per node. Our production cluster will have 30 nodes, so 2.8PB total of local HDD space. Looks like with Kudu we will only be able to use 8TB x 30 = 240TB total before replication, so realistically it will be 80TB tops. Can you confirm that?
>
> This is exactly my concern: a lot of space is wasted. We can use it for HDFS of course, and Kafka or something else, but my concern is why Kudu cannot use more than 8TB per node. Is it something that is going to change in the future, maybe?
>
> On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <[email protected]> wrote:
>
>> Hi Boris,
>>
>> The recommendations listed indicate what has been tested. Going beyond that is uncharted territory, although that isn't to say it can't be done!
>>
>> This sort of planning depends on what your schemas look like. Without that, it's hard to gauge how many tablets are needed for your tables. That would then guide the number of tablets you could hold total.
>>
>> In terms of space, it seems like the number of nodes would provide ample space (30 nodes * 8TB per node >> 80-100TB), unless I'm missing something. Although given the number of HDDs per node, it sounds like a lot would go unused. If you meant that you have 3 nodes, that's a different story. Would you mind clarifying?
>>
>> Andrew
>>
>> On Tue, Nov 28, 2017 at 7:25 AM, Boris Tyukin <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> I was really excited about Kudu until I saw this:
>>>
>>> https://kudu.apache.org/docs/known_issues.html
>>>
>>>    - Recommended maximum amount of stored data, post-replication and post-compression, per tablet server is 8TB.
>>>    - Recommended maximum number of tablets per tablet server is 2000, post-replication.
>>>    - Maximum number of tablets per table for each tablet server is 60, post-replication, at table-creation time.
>>>
>>> These numbers are very concerning to me because the project I am working on will have 300+ tables; 20 tables have over 1B rows, 50-100 tables are 200M rows on average, and the rest are below 50M rows. I want to see if I can build a near-real-time data lake, ingesting data from our source RDBMS systems.
>>>
>>> My cluster is a 30-node cluster with 12 spinning HDDs each (each drive is 8TB), and each node is a 2-CPU, 22-core beast with 512GB of DDR4 RAM.
>>>
>>> Do these limitations above still apply in my case? Looks like I can only have 24TB worth of data in Kudu, which is way below what I need. My modest estimate is 80-100TB.
>>>
>>> Also concerned that I can only have 20,000 tablets after replication - as I mentioned above, I am going to have a bunch of tables with lots of rows.
>>>
>>> I do not have an option to pick a different hardware configuration for our cluster.
>>>
>>> thanks
>>
>>
>> --
>> Andrew Wong
>
>

--
Andrew Wong
