Thanks for your response, Andrew. Every node has 12 x 8TB HDDs, so 96TB
total per node. Our production cluster will have 30 nodes, so roughly 2.9PB
total of local HDD space. It looks like with Kudu we will only be able to use
8TB x 30 = 240TB total post-replication, so realistically it will be ~80TB of
actual data at 3x replication, tops. Can you confirm that?
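
For what it's worth, here is the back-of-the-envelope math I am doing, as a
small Python sketch (the 8TB figure is the post-replication recommendation
from the known-issues page quoted below; the 3x replication factor is my
assumption):

    # Rough Kudu capacity math for our cluster (assumed numbers, not measured).
    NODES = 30
    KUDU_LIMIT_TB = 8       # recommended max stored data per tablet server, post-replication
    REPLICATION_FACTOR = 3  # assumed Kudu replication factor

    usable_post_replication = NODES * KUDU_LIMIT_TB               # 240 TB of replicated data
    logical_data = usable_post_replication / REPLICATION_FACTOR   # ~80 TB of actual data

    print(f"Post-replication ceiling: {usable_post_replication} TB")
    print(f"Logical data at {REPLICATION_FACTOR}x replication: {logical_data:.0f} TB")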

This is exactly my concern: a lot of space would be wasted. We can use it for
HDFS, Kafka, or something else, of course, but my question is why Kudu
cannot use more than 8TB per node. Is this something that is going to change
in the future?
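
To put the per-node picture in numbers, with the same caveats (drive counts
are from our hardware; the 8TB cap is the documented recommendation):

    # Per-node allocation sketch: raw disk left over per node if Kudu
    # is capped at the recommended 8 TB (numbers from this thread).
    DRIVES_PER_NODE = 12
    DRIVE_SIZE_TB = 8
    KUDU_LIMIT_TB = 8  # recommended max per tablet server, post-replication

    raw_per_node = DRIVES_PER_NODE * DRIVE_SIZE_TB  # 96 TB raw per node
    leftover = raw_per_node - KUDU_LIMIT_TB         # ~88 TB per node for HDFS/Kafka/etc.

    print(f"Raw per node: {raw_per_node} TB")
    print(f"Kudu share: {KUDU_LIMIT_TB} TB ({KUDU_LIMIT_TB / raw_per_node:.0%})")
    print(f"Left for other workloads: {leftover} TB")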

On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <aw...@cloudera.com> wrote:

> Hi Boris,
>
> The recommendations listed indicate what has been tested. Going beyond
> that is uncharted territory, although that isn't to say it can't be done!
>
> This sort of planning depends on what your schemas look like. Without
> that, it's hard to gauge how many tablets are needed for your tables,
> which would in turn determine the total number of tablets you could host.
>
> In terms of space, it seems like the number of nodes would provide ample
> space (30 nodes * 8TB per node >> 80-100TB), unless I'm missing something.
> Although given the number of HDDs per node, it sounds like a lot would go
> unused. If you meant that you have 3 nodes, that's a different story. Would
> you mind clarifying?
>
>
> Andrew
>
> On Tue, Nov 28, 2017 at 7:25 AM, Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> Hi guys,
>>
>> I was really excited about Kudu until I saw this:
>>
>> https://kudu.apache.org/docs/known_issues.html
>>
>>
>>    - Recommended maximum amount of stored data, post-replication and
>>      post-compression, per tablet server is 8TB.
>>    - Recommended maximum number of tablets per tablet server is 2000,
>>      post-replication.
>>    - Maximum number of tablets per table for each tablet server is 60,
>>      post-replication, at table-creation time.
>>
>> These numbers are very concerning to me because the project I am working
>> on will have 300+ tables: 20 tables have over 1B rows, 50-100 tables
>> average around 200M rows, and the rest are below 50M rows. I want to
>> see if I can build a near real-time data lake, ingesting data from our
>> source RDBMS systems.
>>
>> My cluster is a 30-node cluster with 12 spinning HDDs each (each drive is
>> 8TB), and each node is a 2-CPU, 22-core beast with 512GB of DDR4 RAM.
>>
>> Do these limitations still apply in my case? It looks like I can
>> only have 24TB worth of data in Kudu, which is way below what I need. My
>> modest estimate is 80-100TB.
>>
>> I'm also concerned that I can only have 20,000 tablets after replication;
>> as I mentioned above, I am going to have a bunch of tables with lots of rows.
>>
>> I do not have an option to pick a different hardware configuration for
>> our cluster.
>>
>> thanks
>>
>
>
>
> --
> Andrew Wong
>
