Thanks Todd.

The Kudu cluster runs on CentOS 7.2, and each tablet node has 40 cores. The 
test table is about 140GB after 3x replication and is hash partitioned; I have 
tried both 24 and 120 hash buckets.

I ran one test: 
1. Stop all ingestion to the cluster.
2. Randomly upsert 3000 rows, once. Each upsert is either a new row or an 
update to an existing row (it updates the whole row, not just one or more 
columns).
3. On the CDH monitoring dashboard, the cluster's disk I/O rises from 
~300Mb/s to ~1.5Gb/s, and returns to ~300Mb/s 30 minutes or more later.
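
As a rough back-of-envelope check on step 3, the sketch below estimates how 
much extra data moves during the 30 minute spike. It is only an illustration: 
the helper name extra_io_gb is made up here, the dashboard units are ambiguous 
(bits or bytes per second), and the dashboard figure presumably sums reads and 
writes, so both readings are shown.

```python
def extra_io_gb(baseline_rate, peak_rate, minutes, rate_in_bits=True):
    """Extra data moved above the baseline, in gigabytes.

    Rates are in Gb/s when rate_in_bits is True, otherwise in GB/s.
    Hypothetical helper, not part of any Kudu or CDH tooling.
    """
    delta = peak_rate - baseline_rate
    if rate_in_bits:
        delta /= 8.0  # convert gigabits to gigabytes
    return delta * minutes * 60


# Numbers from the test above: ~0.3 -> ~1.5 for about 30 minutes.
as_bits = extra_io_gb(0.3, 1.5, 30, rate_in_bits=True)
as_bytes = extra_io_gb(0.3, 1.5, 30, rate_in_bits=False)
print(f"if Gb/s means gigabits:  ~{as_bits:.0f} GB extra I/O")
print(f"if Gb/s means gigabytes: ~{as_bytes:.0f} GB extra I/O")
```

Under either reading, the extra I/O is larger than the ~140GB the table 
occupies on disk.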

I checked the INFO logs on some of the tablet nodes; they are constantly doing 
compactions, each compacting anywhere from 1 to hundreds of thousands of rows.

My questions:
1. Is the maintenance manager rewriting the whole table? Will a single upsert 
of 3000 rows trigger a rewrite of the whole table?
2. Does the background I/O impact scan performance?
3. About the number of hash buckets: I partitioned the table into 24 or 120 
buckets. What is the difference in upsert and scan performance, and what are 
the best practices?
4. What is the recommended setting for the tablet server memory hard limit?

Thanks.



[email protected]
 
From: Todd Lipcon
Date: 2016-08-17 01:58
To: user
Subject: Re: abnormal high disk I/O rate when upsert into kudu table?
Hi Jacky,

Answers inline below

On Tue, Aug 16, 2016 at 8:13 AM, [email protected] <[email protected]> wrote:
Dear Kudu Developers, 

I am a new tester for Kudu. Our Kudu cluster has 3+12 nodes: 3 separate master 
nodes and 12 tablet nodes. 
Each node has 128GB of memory, 1 SSD for the WAL, and 6 1TB SAS disks for data.

We are using CDH 5.7.0 with the impala-kudu 2.7.0 and kudu 0.9.1 parcels, and 
we set a 16GB memory hard limit for each tablet node.

Sounds like a good cluster setup. Thanks for providing the details. 

 
One of our test tables has about 80-100 columns and 1 key column. With the 
Java client, we can insert/upsert into the Kudu table at about 100,000/s.
The table has 300m rows, and about 300,000 rows are updated per day; we also 
use the Java client upsert API to update the rows.

We found that the Kudu cluster may show an abnormally high disk I/O rate, 
about 1.5-2.0Gb/s, even when we update just 1,000~10,000 rows/s.
I would like to know whether, with our row update frequency, this high disk 
I/O rate is normal.

Are your upserts randomly spread across the range of rows in the table? If so, 
then when the updates flush, they'll trigger compactions of the updates and 
inserted rows into the existing data. This will cause, over time, a rewrite of 
the whole table, in order to incorporate the updates.
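
To illustrate why even a small batch of randomly spread upserts reaches the 
whole table, here is a minimal sketch. It assumes one tablet per hash bucket 
and models the key hash as uniformly random; it does not model Kudu's actual 
hash function or the rowsets inside each tablet, and the function names are 
made up for this example.

```python
import random


def expected_buckets_touched(num_buckets, num_upserts):
    """Expected number of distinct buckets hit by uniformly random keys."""
    miss_probability = (1.0 - 1.0 / num_buckets) ** num_upserts
    return num_buckets * (1.0 - miss_probability)


def simulate_buckets_touched(num_buckets, num_upserts, seed=0):
    """Count distinct buckets hit in one simulated batch of random upserts."""
    rng = random.Random(seed)
    return len({rng.randrange(num_buckets) for _ in range(num_upserts)})


# The bucket counts and batch size mentioned in this thread.
for buckets in (24, 120):
    expected = expected_buckets_touched(buckets, 3000)
    simulated = simulate_buckets_touched(buckets, 3000)
    print(f"{buckets} buckets: expected {expected:.3f}, simulated {simulated}")
```

Under these assumptions, a batch of 3000 random upserts lands in essentially 
every bucket for both 24 and 120 buckets, so every tablet ends up with 
updates to flush and compact.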

This background I/O is run by the "maintenance manager". You can visit 
http://tablet-server:8050/maintenance-manager to see a dashboard of currently 
running maintenance operations such as compactions.
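
If you would rather pull that page from a script than a browser, a minimal 
sketch could look like the following. The host name "tablet-server" is a 
placeholder for one of your tablet servers, the helper names are made up for 
this example, and it assumes the web UI is reachable on port 8050 as in the 
URL above.

```python
from urllib.request import urlopen


def maintenance_manager_url(host, port=8050):
    """Build the maintenance manager dashboard URL for one tablet server."""
    return f"http://{host}:{port}/maintenance-manager"


def fetch_maintenance_page(host, port=8050, timeout=5.0):
    """Download the dashboard page and return its body as text."""
    with urlopen(maintenance_manager_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (needs network access to a real tablet server):
#   page = fetch_maintenance_page("tablet-server")
#   print(page[:500])
```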

The maintenance manager runs a preset number of threads, so the amount of 
background I/O you're experiencing won't increase if you increase the number of 
upserts.

I'm curious, is the background I/O causing an issue, or just unexpected?

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
