awesome, this is great to know! thanks again Andrew

On Wed, Nov 29, 2017 at 2:35 PM, Andrew Wong <aw...@cloudera.com> wrote:
> Right, I think you're interpreting that correctly. If you're feeling
> adventurous, you could experiment with those limits even further :)
>
> Node density is something we're tracking and hoping to improve in the near
> future. There have already been some pretty drastic bumps in this area (see
> here <https://issues.apache.org/jira/browse/KUDU-1967>), although I don't
> think there's an exact timeline.
>
>
> Andrew
>
> On Wed, Nov 29, 2017 at 11:16 AM, Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> thanks for your response, Andrew. every node has 12 8TB HDDs - so 96TB
>> total per node. our production cluster will have 30 nodes so 2.8PB total
>> of local HDD space. Looks like with Kudu we will only be able to use 8TB x
>> 30 = 240TB total before replication so realistically it will be 80TB tops.
>> Can you confirm that?
>>
>> This is exactly my concern: a lot of space is wasted. We can use it
>> for HDFS of course and Kafka or something else but my concern is why Kudu
>> cannot use more than 8TB per node. Is it something that is going to change
>> in the future maybe?
>>
>> On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <aw...@cloudera.com> wrote:
>>
>>> Hi Boris,
>>>
>>> The recommendations listed indicate what has been tested. Going beyond
>>> that is uncharted territory, although that isn't to say it can't be done!
>>>
>>> This sort of planning depends on what your schemas look like. Without
>>> that, it's hard to gauge how many tablets are needed for your tables. That
>>> would then guide the number of tablets you could hold total.
>>>
>>> In terms of space, it seems like the number of nodes would provide ample
>>> space (30 nodes * 8TB per node >> 80-100TB), unless I'm missing something.
>>> Although given the number of HDDs per node, it sounds like a lot would go
>>> unused. If you meant that you have 3 nodes, that's a different story. Would
>>> you mind clarifying?
>>>
>>>
>>> Andrew
>>>
>>> On Tue, Nov 28, 2017 at 7:25 AM, Boris Tyukin <bo...@boristyukin.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I was really excited about Kudu until I saw this:
>>>>
>>>> https://kudu.apache.org/docs/known_issues.html
>>>>
>>>>    - Recommended maximum amount of stored data, post-replication and
>>>>    post-compression, per tablet server is 8TB.
>>>>    - Recommended maximum number of tablets per tablet server is 2000,
>>>>    post-replication.
>>>>    - Maximum number of tablets per table for each tablet server is 60,
>>>>    post-replication, at table-creation time.
>>>>
>>>> These numbers are very concerning to me because the project I am
>>>> working on will have 300+ tables: 20 tables have over 1B rows,
>>>> 50-100 tables are 200M rows on average and the rest are below 50M rows. I
>>>> want to see if I can build a near real-time data lake, ingesting data from
>>>> our source RDBMS systems.
>>>>
>>>> My cluster is a 30-node cluster with 12 spinning HDDs each (each drive
>>>> is 8TB) and each node is a 2 CPU 22 core beast with 512GB of DDR4 RAM.
>>>>
>>>> Do these limitations above still apply in my case? Looks like I can
>>>> only have 24TB worth of data in Kudu, which is way below what I need. My
>>>> modest estimate is 80-100TB.
>>>>
>>>> Also concerned that I can only have 20,000 tablets after replication -
>>>> as I mentioned above I am going to have a bunch of tables with lots of
>>>> rows.
>>>>
>>>> I do not have an option to pick a different hardware configuration for
>>>> our cluster.
>>>>
>>>> thanks
>>>>
>>>
>>>
>>> --
>>> Andrew Wong
>>>
>>
>
>
> --
> Andrew Wong
>
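
For anyone rerunning the sizing numbers from this thread, here is a minimal Python sketch of the arithmetic. The hardware figures and the two per-tablet-server limits are the ones quoted above. The replication factor of 3 is an assumption: it is not stated in the thread, but it is what the 240TB -> 80TB and 20,000-tablet figures imply. The function and key names are made up for illustration and are not part of any Kudu API.

```python
def cluster_limits(nodes=30, disks_per_node=12, disk_tb=8,
                   max_data_per_tserver_tb=8, max_tablets_per_tserver=2000,
                   replication_factor=3):
    """Back-of-the-envelope capacity figures for the cluster in this thread.

    replication_factor=3 is an assumption (not stated in the thread);
    the two max_* values are the recommended limits quoted from the
    known-issues page, both counted post-replication.
    """
    raw_disk_tb = nodes * disks_per_node * disk_tb        # all local HDD space
    stored_tb = nodes * max_data_per_tserver_tb           # Kudu data incl. replicas
    unique_tb = stored_tb / replication_factor            # data before replication
    tablet_replicas = nodes * max_tablets_per_tserver     # replicas cluster-wide
    unique_tablets = tablet_replicas // replication_factor
    return {
        "raw_disk_tb": raw_disk_tb,
        "stored_tb": stored_tb,
        "unique_tb": unique_tb,
        "tablet_replicas": tablet_replicas,
        "unique_tablets": unique_tablets,
        "disk_share_used": stored_tb / raw_disk_tb,       # fraction of raw disk
    }


if __name__ == "__main__":
    for name, value in cluster_limits().items():
        print(f"{name}: {value}")
```

With the defaults this gives 2880TB of raw disk, 240TB of stored data, 80TB of unique data and 20,000 unique tablets, so under the recommended limits Kudu would occupy roughly one twelfth of the local disk space. Passing nodes=3 gives 24TB of stored data, which matches the "3 nodes" reading Andrew asked about.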