This could be a problem… If this is a bad byproduct carried over from HBase, 
then it is a common issue for all HBase users. It would be too bad if it also 
exists in Kudu. We HBase users have been trying to eradicate this behavior for 
a long time.

It’s only an opinion…

Cheers,
Ben


> On Aug 16, 2016, at 6:05 PM, [email protected] wrote:
> 
> Thanks Todd.
> 
> The Kudu cluster is running on CentOS 7.2, and each tablet node has 40 cores. 
> The test table is about 140GB after 3 replicas and is partitioned by hash 
> bucket; I have tried both 24 and 120 hash buckets.
> 
> I did one test: 
> 1. Stop all ingestion to the cluster.
> 2. Randomly upsert 3,000 rows once; the upserts contain both new rows and 
> updates to existing rows (each update rewrites the whole row, not just one or 
> more columns). A sketch of this step follows below.
> 3. From the CDH monitoring dashboard, I see the cluster's disk I/O rise from 
> ~300Mb/s to ~1.5Gb/s, then return to ~300Mb/s 30 minutes or more later.
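> 
> Here is a minimal sketch of step 2 with the Kudu Java client. The master 
> address, table name, and column names are placeholders, not our real schema; 
> I am using the org.apache.kudu.client package names (the 0.9.x releases used 
> org.kududb.client):
> 
>     import java.util.Random;
>     import org.apache.kudu.client.KuduClient;
>     import org.apache.kudu.client.KuduSession;
>     import org.apache.kudu.client.KuduTable;
>     import org.apache.kudu.client.PartialRow;
>     import org.apache.kudu.client.Upsert;
> 
>     public class RandomUpsertTest {
>       public static void main(String[] args) throws Exception {
>         KuduClient client =
>             new KuduClient.KuduClientBuilder("master1:7051").build();
>         KuduTable table = client.openTable("test_table");
>         KuduSession session = client.newSession();
>         Random rng = new Random();
>         for (int i = 0; i < 3000; i++) {
>           Upsert upsert = table.newUpsert();
>           PartialRow row = upsert.getRow();
>           // Keys are drawn at random, so some upserts hit existing rows
>           // (whole-row updates) and others insert brand-new rows.
>           row.addLong("id", rng.nextLong());
>           row.addString("col1", "value-" + i);
>           session.apply(upsert);
>         }
>         session.flush();
>         client.shutdown();
>       }
>     }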
> 
> I checked the INFO logs on some of the tablet nodes; they are always doing 
> compactions, compacting anywhere from one row to hundreds of thousands of rows.
> 
> My questions:
> 1. Is the maintenance manager rewriting the whole table? Will a single upsert 
> of 3,000 rows trigger a rewrite of the whole table?
> 2. Does the background I/O impact scan performance?
> 3. About the number of hash buckets: I partitioned the table into 24 or 120 
> buckets; what is the difference in upsert and scan performance, and what are 
> the best practices? (A sketch of how the bucket count is specified follows 
> this list.)
> 4. What is the recommended setting for the tablet server memory hard limit?
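> 
> For reference on question 3: the bucket count is fixed when the table is 
> created. A minimal sketch of how it can be specified with the Java client 
> (placeholder schema and names, not our real table):
> 
>     import java.util.Arrays;
>     import org.apache.kudu.ColumnSchema;
>     import org.apache.kudu.Schema;
>     import org.apache.kudu.Type;
>     import org.apache.kudu.client.CreateTableOptions;
>     import org.apache.kudu.client.KuduClient;
> 
>     public class CreateHashPartitionedTable {
>       public static void main(String[] args) throws Exception {
>         KuduClient client =
>             new KuduClient.KuduClientBuilder("master1:7051").build();
>         Schema schema = new Schema(Arrays.asList(
>             new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64)
>                 .key(true).build(),
>             new ColumnSchema.ColumnSchemaBuilder("col1", Type.STRING)
>                 .build()));
>         // 24 vs. 120 buckets: each bucket becomes one tablet (x3 replicas),
>         // spread across the 12 tablet servers.
>         CreateTableOptions options = new CreateTableOptions()
>             .addHashPartitions(Arrays.asList("id"), 24);
>         client.createTable("test_table", schema, options);
>         client.shutdown();
>       }
>     }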
> 
> Thanks.
> 
> [email protected]
>  
> From: Todd Lipcon
> Date: 2016-08-17 01:58
> To: user
> Subject: Re: abnormal high disk I/O rate when upsert into kudu table?
> Hi Jacky,
> 
> Answers inline below
> 
> On Tue, Aug 16, 2016 at 8:13 AM, [email protected] wrote:
> Dear Kudu Developers, 
> 
> I am a new tester of Kudu. Our Kudu cluster has 3+12 nodes: 3 separate master 
> nodes and 12 tablet nodes. 
> Each node has 128GB of memory, 1 SSD for the WAL, and 6 1TB SAS disks for data.
> 
> We are using CDH 5.7.0 with the impala-kudu 2.7.0 and kudu 0.9.1 parcels, and 
> we set a 16GB memory hard limit for each tablet node.
> 
> Sounds like a good cluster setup. Thanks for providing the details. 
> 
>  
> One of our test tables has about 80-100 columns and 1 key column. With the 
> Java client, we can insert/upsert into the Kudu table at about 100,000 rows/s.
> The Kudu table has 300M rows, and about 300,000 rows are updated per day; we 
> also use the Java client upsert API to update the rows.
> 
> We found that the Kudu cluster may be encountering an abnormally high disk I/O 
> rate, about 1.5-2.0Gb/s, even when we update only 1,000~10,000 rows/s.
> I would like to know: given our row update frequency, is this high disk I/O 
> rate normal or not?
> 
> Are your upserts randomly spread across the range of rows in the table? If so, 
> then when the updates flush, they'll trigger compactions of the updates and 
> inserted rows into the existing data. Over time, this will cause a rewrite of 
> the whole table in order to incorporate the updates.
> 
> This background I/O is run by the "maintenance manager". You can visit 
> http://tablet-server:8050/maintenance-manager to see a dashboard of currently 
> running maintenance operations such as compactions.
> 
> The maintenance manager runs a preset number of threads, so the amount of 
> background I/O you're experiencing won't increase if you increase the number 
> of upserts.
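> 
> (For reference, I believe the thread count is controlled by the tablet 
> server's --maintenance_manager_num_threads flag, which defaults to 1.)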
> 
> I'm curious, is the background I/O causing an issue, or just unexpected?
> 
> Thanks
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
