Re: Re: Recommended maximum amount of stored data per tablet server

2018-08-04 Thread Boris Tyukin
How much space is typically allocated just for the WAL and metadata? We have
two 400GB SSDs in RAID 5 for the OS and twelve 12TB HDDs. Is it still a good
idea to carve out maybe 100GB on the SSDs, or should we use a dedicated HDD?


Re: Re: Re: Recommended maximum amount of stored data per tablet server

2018-08-03 Thread Quanlong Huang
Thank you, Todd! I really appreciate your help!



Re: Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Todd Lipcon

That should be fine. Typically we actually recommend sharing all the disks
across all of the services. There is a trade-off between static partitioning
(exclusive access to a smaller number of disks) and dynamic sharing
(potential contention but more available resources). Unless your workload is
very latency-sensitive, I usually think it's better to have the bigger pool
of resources available, even if it needs to be shared with other systems.

One recommendation, though, is to consider using a dedicated disk for the
Kudu WAL and metadata, which can help performance, since the WAL can be
sensitive to other heavy workloads monopolizing bandwidth on the same
spindle.
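
As a rough sketch of that layout (the flag names are real, but the paths are
hypothetical, and --fs_metadata_dir only exists in newer Kudu releases; it
otherwise defaults to the WAL directory), a tablet server gflagfile might
look like:

    # Keep the WAL and tablet metadata on a dedicated disk (here an SSD),
    # and spread tablet data across the disks shared with HDFS/YARN.
    --fs_wal_dir=/mnt/ssd0/kudu/wal
    --fs_metadata_dir=/mnt/ssd0/kudu/meta
    --fs_data_dirs=/mnt/disk01/kudu,/mnt/disk02/kudu,/mnt/disk03/kudu,/mnt/disk04/kudu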

-Todd



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Quanlong Huang
Thanks, Adar and Todd! We'd like to contribute when we can.


Are there any concerns if we share the machines with HDFS DataNodes and YARN
NodeManagers? The network bandwidth is 10Gbps. I think it's OK if they don't
share the same disks, e.g. 4 disks for Kudu and the other 11 disks for the
DataNode and NodeManager, leaving enough CPU and memory for Kudu. Is that right?


Thanks,
Quanlong


Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Todd Lipcon
+1 to what Adar said.

One tension we have currently for scaling is that we don't want to let
individual tablets grow too large, because of problems like the superblock
issue that Adar mentioned. However, the solution of just having more tablets
is also not a great one, since many of our startup time problems are affected
primarily by the number of tablets rather than their size (see KUDU-38 as the
prime, ancient example). Additionally, having lots of tablets increases Raft
heartbeat traffic, and you may need to dial back those heartbeat intervals to
keep things stable.
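
As an illustration (the flag is real, but the value below is just an example,
not a tested recommendation), a deployment with a very high tablet count
might relax the interval uniformly across masters and tablet servers:

    # Default is 500ms. A longer interval reduces Raft heartbeat traffic,
    # at the cost of slower detection of leader failures.
    --raft_heartbeat_interval_ms=1500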

All of these things can be addressed in time and with some work. If you are
interested in working on these areas to improve density, that would be a
great contribution.

-Todd



-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Adar Lieber-Dembo
The 8TB limit isn't a hard one, it's just a reflection of the scale
that Kudu developers commonly test. Beyond 8TB we can't vouch for
Kudu's stability and performance. For example, we know that as the
amount of on-disk data grows, node restart times get longer and longer
(see KUDU-2014 for some ideas on how to improve that). Furthermore, as
tablets accrue more data blocks, their superblocks become larger,
raising the minimum amount of I/O for any operation that rewrites a
superblock (such as a flush or compaction). Lastly, the tablet copy
protocol used in re-replication tries to copy the entire superblock in
one RPC message; if the superblock is too large, it'll run up against
the default 50 MB RPC transfer size (see src/kudu/rpc/transfer.cc).
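
For what it's worth, that cap is itself a gflag defined in transfer.cc, so an
oversized superblock can be worked around by raising the limit (the value
below is just an example, not a recommendation):

    # Default is 50MB (52428800 bytes). Applies to each RPC message a
    # server accepts, including the superblock sent during tablet copy.
    --rpc_max_message_size=104857600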

These examples are just off the top of my head; there may be others
lurking. So this goes back to what I led with: beyond the recommended
limit we aren't quite sure how Kudu's performance and stability are
affected.

All that said, you're welcome to try it out and report back with your findings.




Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Quanlong Huang
Hi all,


In the "Known Issues and Limitations" document, it's recommended that the
"maximum amount of stored data, post-replication and post-compression, per
tablet server is 8TB". How is the 8TB figure calculated?


We have some machines, each with 15 * 4TB spinning disk drives, 256GB RAM, and
48 CPU cores. Does that mean the other 52TB (= 15 * 4 - 8) of space should be
left for other systems? We would prefer to make the machines dedicated to Kudu.
Can the tablet server leverage the whole space efficiently?


Thanks,
Quanlong