Re: Help me understand Kudu scalability limitations

Boris Tyukin Wed, 29 Nov 2017 12:09:52 -0800

awesome, this is great to know! thanks again Andrew

On Wed, Nov 29, 2017 at 2:35 PM, Andrew Wong <[email protected]> wrote:


> Right, I think you're interpreting that correctly. If you're feeling
> adventurous, you could experiment with those limits even further :)
>
> Node density is something we're tracking and hoping to improve in the near
> future. There have already been some pretty drastic bumps in this area (see
> here <https://issues.apache.org/jira/browse/KUDU-1967>), although I don't
> think there's an exact timeline.
>
>
> Andrew
>
> On Wed, Nov 29, 2017 at 11:16 AM, Boris Tyukin <[email protected]>
> wrote:
>
>> Thanks for your response, Andrew. Every node has 12 x 8 TB HDDs, so 96 TB
>> total per node. Our production cluster will have 30 nodes, so about 2.9 PB
>> total of local HDD space. It looks like with Kudu we will only be able to
>> use 8 TB x 30 = 240 TB total before accounting for replication, so
>> realistically it will be 80 TB at most. Can you confirm that?
>>
>> This is exactly my concern: a lot of space is wasted. We can use it
>> for HDFS of course, and Kafka or something else, but my question is why
>> Kudu cannot use more than 8 TB per node. Is this something that is going
>> to change in the future?
>>
>> On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong <[email protected]> wrote:
>>
>>> Hi Boris,
>>>
>>> The recommendations listed indicate what has been tested. Going beyond
>>> that is uncharted territory, although that isn't to say it can't be done!
>>>
>>> This sort of planning depends on what your schemas look like. Without
>>> that, it's hard to gauge how many tablets are needed for your tables. That
>>> would then guide the number of tablets you could hold total.
>>>
>>> In terms of space, it seems like the number of nodes would provide ample
>>> space (30 nodes * 8TB per node >> 80-100TB), unless I'm missing something.
>>> Although given the number of HDDs per node, it sounds like a lot would go
>>> unused. If you meant that you have 3 nodes, that's a different story. Would
>>> you mind clarifying?
>>>
>>>
>>> Andrew
>>>
>>> On Tue, Nov 28, 2017 at 7:25 AM, Boris Tyukin <[email protected]>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I was really excited about Kudu until I saw this:
>>>>
>>>> https://kudu.apache.org/docs/known_issues.html
>>>>
>>>>
>>>>    - Recommended maximum amount of stored data, post-replication and
>>>>      post-compression, per tablet server is 8TB.
>>>>    - Recommended maximum number of tablets per tablet server is 2000,
>>>>      post-replication.
>>>>    - Maximum number of tablets per table for each tablet server is 60,
>>>>      post-replication, at table-creation time.
>>>>
>>>> These numbers are very concerning to me because the project I am
>>>> working on will have 300+ tables: 20 tables have over 1B rows,
>>>> 50-100 tables have 200M rows on average, and the rest are below 50M
>>>> rows. I want to see if I can build a near real-time data lake, ingesting
>>>> data from our source RDBMS systems.
>>>>
>>>> My cluster has 30 nodes with 12 spinning HDDs each (each drive is
>>>> 8 TB), and each node is a 2-CPU, 22-core beast with 512 GB of DDR4 RAM.
>>>>
>>>> Do these limitations above still apply in my case? It looks like I can
>>>> only have 24 TB worth of data in Kudu, which is way below what I need.
>>>> My modest estimate is 80-100 TB.
>>>>
>>>> I am also concerned that I can only have 20,000 tablets after
>>>> replication. As I mentioned above, I am going to have a bunch of tables
>>>> with lots of rows.
>>>>
>>>> I do not have an option to pick a different hardware configuration for
>>>> our cluster.
>>>>
>>>> thanks
>>>>
>>>
>>>
>>>
>>> --
>>> Andrew Wong
>>>
>>
>>
>
>
> --
> Andrew Wong
>
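
For readers following the arithmetic in this thread, here is a minimal Python sketch of the capacity estimate. It uses only the figures quoted above (30 nodes, 12 x 8 TB drives per node, and the recommended limits of 8 TB and 2000 tablets per tablet server). The replication factor of 3 is an assumption inferred from the 240 TB to 80 TB step, one tablet server per node is also assumed, and the function and key names are illustrative, not part of any Kudu API.

```python
def kudu_capacity_estimate(nodes, disks_per_node, disk_tb,
                           max_tb_per_tserver=8,
                           max_tablets_per_tserver=2000,
                           replication_factor=3):
    """Rough cluster-wide estimate from the recommended per-server limits.

    Assumes one tablet server per node and that the recommended limits
    (quoted in the thread from the known-issues page) are the binding
    constraint. replication_factor=3 is an assumption, not a measured value.
    """
    raw_disk_tb = nodes * disks_per_node * disk_tb
    # The 8 TB limit is stated post-replication, so this is on-disk footprint.
    post_replication_tb = nodes * max_tb_per_tserver
    # The 2000-tablet limit is also post-replication, i.e. it counts replicas.
    tablet_replicas = nodes * max_tablets_per_tserver
    return {
        "raw_disk_tb": raw_disk_tb,
        "kudu_post_replication_tb": post_replication_tb,
        "kudu_unique_data_tb": post_replication_tb / replication_factor,
        "tablet_replicas": tablet_replicas,
        "distinct_tablets": tablet_replicas // replication_factor,
        "fraction_of_disk_used": post_replication_tb / raw_disk_tb,
    }


if __name__ == "__main__":
    estimate = kudu_capacity_estimate(nodes=30, disks_per_node=12, disk_tb=8)
    for name, value in estimate.items():
        print(f"{name}: {value}")
```

With the thread's inputs this gives 2,880 TB of raw disk, 240 TB of Kudu data after replication, and 80 TB of unique data, which matches the numbers Boris asked Andrew to confirm. Under the same assumptions, the tablet limit works out to 60,000 replicas, or 20,000 distinct tablets.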
