Hi Jason,

By "bulk indexing only" do you mean you are loading data with a high rate
of inserts?

It seems that there is a lot of contention on the memory trackers.
https://issues.apache.org/jira/browse/KUDU-1502 is one JIRA where I noted
this was the case. If that's the culprit, I would look into the following:

- try to change your insert pattern so that it is more sequential in
  nature (random inserts will cause a lot of block cache lookups to check
  for duplicate keys)
- if you have RAM available, increase both the block cache capacity and
  the server's memory limit accordingly, so that the bloom lookups will
  hit Kudu's cache instead of having to go to the operating system cache.

Aside from that, we'll be spending some time on improving performance of
write-heavy workloads in upcoming releases, and I think fixing this
MemTracker contention will be one of the issues tackled.

In case the above isn't the issue, do you think you could use
'perf record -g -a' and generate a flame graph?
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

-Todd

On Tue, Mar 14, 2017 at 6:14 AM, Jason Heo <[email protected]> wrote:
> Hi. I'm experiencing high load and high CPU usage. Kudu is running on 5
> Kudu-dedicated nodes. Two nodes' load is 40, while the other three
> nodes' load is 15.
>
> Here is the output of `perf record -a & perf report` during a
> bulk-indexing-only operation.
>
> http://imgur.com/8lz1CRk
>
> I'm wondering if this is a reasonable situation.
>
> I'm using Kudu on CDH 5.10.
>
> Thanks.

--
Todd Lipcon
Software Engineer, Cloudera
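[Editor's note] The "increase the block cache and memory limit" suggestion corresponds to two tablet-server gflags. A minimal sketch, with purely illustrative sizes (tune them to your actual RAM; flag names are from Kudu's configuration reference):

```shell
# Illustrative tablet-server flags -- sizes are examples, not recommendations.
# Raise the server's hard memory limit (in bytes; here 16 GiB):
--memory_limit_hard_bytes=17179869184
# Raise the block cache so bloom filter and index lookups during inserts
# hit Kudu's own cache rather than falling back to the OS page cache
# (in MiB; here 4 GiB, up from the 512 MiB default):
--block_cache_capacity_mb=4096
```

The block cache is carved out of the server's overall memory budget, so the two limits should be raised together rather than independently.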
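[Editor's note] The perf/flame-graph workflow referenced above can be sketched roughly as follows, assuming Brendan Gregg's FlameGraph scripts (github.com/brendangregg/FlameGraph) are available locally; the 60-second window and output file names are arbitrary:

```shell
# Sample call stacks on all CPUs for 60 seconds (typically needs root):
perf record -g -a -- sleep 60

# Convert the samples to text, fold the stacks, and render an SVG
# flame graph using the FlameGraph helper scripts:
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > kudu-cpu.svg
```

Opening kudu-cpu.svg in a browser then shows which code paths (e.g. MemTracker calls) dominate the CPU time.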
