On Thu, Aug 2, 2018 at 4:54 PM, Quanlong Huang <[email protected]> wrote:
> Thanks, Adar and Todd! We'd like to contribute when we can.
>
> Are there any concerns if we share the machines with HDFS DataNodes and Yarn NodeManagers? The network bandwidth is 10Gbps. I think it's ok if they don't share the same disks, e.g. 4 disks for Kudu and the other 11 disks for the DataNode and NodeManager, and leave enough CPU & mem for Kudu. Is that right?
>

That should be fine. Typically we actually recommend sharing all of the disks across all of the services. There is a trade-off between static partitioning (exclusive access to a smaller number of disks) and dynamic sharing (potential contention, but more available resources). Unless your workload is very latency-sensitive, I usually think it's better to have the bigger pool of resources available, even if it needs to be shared with other systems.

One recommendation, though, is to consider using a dedicated disk for the Kudu WAL and metadata, which can help performance, since the WAL can be sensitive to other heavy workloads monopolizing bandwidth on the same spindle.
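
As a rough sketch (the paths, the one-disk/14-disk split, and the memory numbers below are only placeholders for your own layout), a tablet server flag file along these lines puts the WAL and tablet metadata on a dedicated spindle, spreads data across the remaining drives, and caps memory so the DataNode and NodeManager keep their share of the 256GB:

    # Illustrative kudu-tserver flags -- paths and sizes are placeholders.
    # WAL and tablet metadata on a dedicated spindle:
    --fs_wal_dir=/data/0/kudu/wal
    --fs_metadata_dir=/data/0/kudu/meta
    # The remaining 14 spindles, shared with the DataNode/NodeManager:
    --fs_data_dirs=/data/1/kudu,/data/2/kudu,/data/3/kudu,/data/4/kudu,/data/5/kudu,/data/6/kudu,/data/7/kudu,/data/8/kudu,/data/9/kudu,/data/10/kudu,/data/11/kudu,/data/12/kudu,/data/13/kudu,/data/14/kudu
    # Cap Kudu's memory (e.g. 64 GiB of the 256GB) and block cache so HDFS/YARN keep headroom:
    --memory_limit_hard_bytes=68719476736
    --block_cache_capacity_mb=4096

The WAL is mostly small sequential writes, so spending one spindle out of 15 on it costs very little capacity while keeping commit latency more predictable. (If your release doesn't have --fs_metadata_dir yet, pointing --fs_wal_dir at the dedicated disk is the important part.)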

-Todd

> At 2018-08-03 02:26:37, "Todd Lipcon" <[email protected]> wrote:
>
> +1 to what Adar said.
>
> One tension we currently have for scaling is that we don't want to scale individual tablets too large, because of problems like the superblock issue Adar mentioned. However, the solution of just having more tablets is not a great one either, since many of our startup-time problems are affected more by the number of tablets than by their size (see KUDU-38 as the prime, ancient example). Additionally, having lots of tablets increases Raft heartbeat traffic, and you may need to dial back those heartbeat intervals to keep things stable.
>
> All of these things can be addressed in time and with some work. If you are interested in working on these areas to improve density, that would be a great contribution.
>
> -Todd
>
> On Thu, Aug 2, 2018 at 11:17 AM, Adar Lieber-Dembo <[email protected]> wrote:
>
>> The 8TB limit isn't a hard one; it's just a reflection of the scale that Kudu developers commonly test. Beyond 8TB we can't vouch for Kudu's stability and performance. For example, we know that as the amount of on-disk data grows, node restart times get longer and longer (see KUDU-2014 for some ideas on how to improve that). Furthermore, as tablets accrue more data blocks, their superblocks become larger, raising the minimum amount of I/O for any operation that rewrites a superblock (such as a flush or compaction). Lastly, the tablet copy protocol used in re-replication tries to copy the entire superblock in one RPC message; if the superblock is too large, it'll run up against the default 50 MB RPC transfer size (see src/kudu/rpc/transfer.cc).
>>
>> These examples are just off the top of my head; there may be others lurking. So this goes back to what I led with: beyond the recommended limit we aren't quite sure how Kudu's performance and stability are affected.
>>
>> All that said, you're welcome to try it out and report back with your findings.
>>
>> On Thu, Aug 2, 2018 at 7:23 AM Quanlong Huang <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > In the document of "Known Issues and Limitations", it's recommended that the "maximum amount of stored data, post-replication and post-compression, per tablet server is 8TB". How is the 8TB calculated?
>> >
>> > We have some machines, each with 15 * 4TB spinning disk drives, 256GB RAM, and 48 CPU cores. Does it mean the other 52 (= 15 * 4 - 8) TB of space is recommended to be left for other systems? We prefer to make the machines dedicated to Kudu. Can the tablet server leverage the whole space efficiently?
>> >
>> > Thanks,
>> > Quanlong
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
