Right, I think you're interpreting that correctly. If you're feeling adventurous, you could experiment with those limits even further :)
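To spell that math out, here's a rough back-of-the-envelope sketch in Python using the numbers from this thread; it assumes the documented 8TB-per-tablet-server recommendation (post-replication, post-compression) and the default replication factor of 3:

# Rough capacity sketch based on the numbers discussed in this thread.
# Assumes the documented 8TB recommended max per tablet server
# (post-replication, post-compression) and replication factor 3.

NODES = 30                      # tablet servers in the cluster
DISKS_PER_NODE = 12             # spinning HDDs per node
DISK_TB = 8                     # capacity of each HDD, in TB
RECOMMENDED_TB_PER_TSERVER = 8  # recommended max stored data per tablet server
REPLICATION_FACTOR = 3          # default Kudu replication

raw_capacity = NODES * DISKS_PER_NODE * DISK_TB       # 2880 TB (~2.9 PB) of raw disk
kudu_stored = NODES * RECOMMENDED_TB_PER_TSERVER      # 240 TB of replicated Kudu data
logical_data = kudu_stored / REPLICATION_FACTOR       # ~80 TB of user-visible data

print(f"Raw HDD capacity:             {raw_capacity} TB")
print(f"Recommended Kudu stored max:  {kudu_stored} TB (post-replication)")
print(f"Logical data at RF=3:         {logical_data:.0f} TB")

So yes, roughly 80TB of user-visible data under the current recommendation, even though the raw disk is close to 2.9PB.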
Node density is something we're tracking and hoping to improve in the near future. There have already been some pretty drastic bumps in this area (see here <https://issues.apache.org/jira/browse/KUDU-1967>), although I don't think there's an exact timeline.

Andrew

On Wed, Nov 29, 2017 at 11:16 AM, Boris Tyukin <[email protected]> wrote:

> Thanks for your response, Andrew. Every node has 12 8TB HDDs - so 96TB total per node. Our production cluster will have 30 nodes, so 2.8PB total of local HDD space. Looks like with Kudu we will only be able to use 8TB x 30 = 240TB total before replication, so realistically it will be 80TB tops. Can you confirm that?
>
> This is exactly my concern: a lot of space is wasted. We can use it for HDFS of course, and Kafka or something else, but my concern is why Kudu cannot use more than 8TB per node. Is it something that is going to change in the future, maybe?
>
> On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <[email protected]> wrote:
>
>> Hi Boris,
>>
>> The recommendations listed indicate what has been tested. Going beyond that is uncharted territory, although that isn't to say it can't be done!
>>
>> This sort of planning depends on what your schemas look like. Without that, it's hard to gauge how many tablets are needed for your tables. That would then guide the number of tablets you could hold total.
>>
>> In terms of space, it seems like the number of nodes would provide ample space (30 nodes * 8TB per node >> 80-100TB), unless I'm missing something. Although given the number of HDDs per node, it sounds like a lot would go unused. If you meant that you have 3 nodes, that's a different story. Would you mind clarifying?
>>
>> Andrew
>>
>> On Tue, Nov 28, 2017 at 7:25 AM, Boris Tyukin <[email protected]> wrote:
>>
>>> Hi guys,
>>>
>>> I was really excited about Kudu until I saw this:
>>>
>>> https://kudu.apache.org/docs/known_issues.html
>>>
>>>    - Recommended maximum amount of stored data, post-replication and post-compression, per tablet server is 8TB.
>>>    - Recommended maximum number of tablets per tablet server is 2000, post-replication.
>>>    - Maximum number of tablets per table for each tablet server is 60, post-replication, at table-creation time.
>>>
>>> These numbers are very concerning to me because the project I am working on will have 300+ tables; 20 tables have over 1B rows, 50-100 tables are 200M rows on average, and the rest are below 50M rows. I want to see if I can build a near-real-time data lake, ingesting data from our source RDBMS systems.
>>>
>>> My cluster is a 30-node cluster with 12 spinning HDDs each (each drive is 8TB), and each node is a 2-CPU, 22-core beast with 512GB of DDR4 RAM.
>>>
>>> Do these limitations above still apply in my case? Looks like I can only have 24TB worth of data in Kudu, which is way below what I need. My modest estimate is 80-100TB.
>>>
>>> Also concerned that I can only have 20,000 tablets after replication - as I mentioned above, I am going to have a bunch of tables with lots of rows.
>>>
>>> I do not have an option to pick a different hardware configuration for our cluster.
>>>
>>> thanks
>>
>>
>> --
>> Andrew Wong
>
>

--
Andrew Wong
