This could be a problem… If this compaction I/O is an unwanted byproduct carried over from HBase, then it is an issue familiar to every HBase user. It would be too bad if it also exists in Kudu; we HBase users have been trying to eradicate it for a long time.
It’s only an opinion…

Cheers,
Ben

> On Aug 16, 2016, at 6:05 PM, [email protected] wrote:
>
> Thanks Todd.
>
> The Kudu cluster is running on CentOS 7.2; each tablet node has 40 cores.
> The test table is about 140GB after 3 replicas and is hash-partitioned;
> I tried both 24 and 120 hash buckets.
>
> I ran one test:
> 1. Stop all ingestion to the cluster.
> 2. Randomly upsert 3000 rows once; each upsert is either a new row or an
>    update to an existing row (the whole row is rewritten, not just one or
>    more columns).
> 3. On the CDH monitoring dashboard, the cluster's disk I/O rises from
>    ~300MB/s to ~1.5GB/s, and returns to ~300MB/s 30 minutes or more later.
>
> Checking some tablet servers' INFO logs, they are always doing compaction,
> compacting anywhere from one to hundreds of thousands of rows.
>
> My questions:
> 1. Is the maintenance manager rewriting the whole table? Will a single
>    upsert of 3000 rows trigger a rewrite of the whole table?
> 2. Does the background I/O impact scan performance?
> 3. About the number of hash buckets: I partitioned the table into 24 or
>    120 buckets. What is the difference in upsert and scan performance,
>    and what is the best practice?
> 4. What is the recommended setting for the tablet server memory hard limit?
>
> Thanks.
>
> [email protected]
>
> From: Todd Lipcon
> Date: 2016-08-17 01:58
> To: user
> Subject: Re: abnormal high disk I/O rate when upsert into kudu table?
> Hi Jacky,
>
> Answers inline below.
>
> On Tue, Aug 16, 2016 at 8:13 AM, [email protected] wrote:
> Dear Kudu Developers,
>
> I am a new tester of Kudu. Our cluster has 3+12 nodes: 3 separate master
> nodes and 12 tablet nodes. Each node has 128GB of memory, 1 SSD for the
> WAL, and 6 1TB SAS disks for data.
>
> We are using CDH 5.7.0 with the impala-kudu 2.7.0 and kudu 0.9.1 parcels,
> and we set a 16GB memory hard limit for each tablet node.
>
> Sounds like a good cluster setup. Thanks for providing the details.
>
> One of our test tables has about 80-100 columns plus 1 key column. With
> the Java client, we can insert/upsert into the Kudu table at about
> 100,000 rows/s. The table has 300M rows and sees about 300,000 row
> updates per day; we also use the Java client's upsert API for updates.
>
> We found that the Kudu cluster encounters an abnormally high disk I/O
> rate, about 1.5-2.0GB/s, even when we update only 1,000~10,000 rows/s.
> I would like to know: given our row-update frequency, is this high disk
> rate normal or not?
>
> Are your upserts randomly spread across the range of rows in the table?
> If so, then when the updates flush, they'll trigger compactions of the
> updates and inserted rows into the existing data. Over time, this will
> cause a rewrite of the whole table in order to incorporate the updates.
>
> This background I/O is run by the "maintenance manager". You can visit
> http://tablet-server:8050/maintenance-manager to see a dashboard of
> currently running maintenance operations such as compactions.
>
> The maintenance manager runs a preset number of threads, so the amount of
> background I/O you're experiencing won't increase if you increase the
> number of upserts.
>
> I'm curious: is the background I/O causing an issue, or is it just
> unexpected?
>
> Thanks
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
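For anyone trying to reproduce a test like Jacky's, a minimal sketch of the random-upsert loop with the Kudu Java client could look like the following. The master address, table name, and column names (id, payload) are hypothetical; the imports use the org.apache.kudu.client package of Kudu 1.x and later (the 0.9.x parcels discussed in this thread shipped the same classes under org.kududb.client).

    import java.util.Random;

    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduException;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.KuduTable;
    import org.apache.kudu.client.PartialRow;
    import org.apache.kudu.client.SessionConfiguration;
    import org.apache.kudu.client.Upsert;

    public class RandomUpsertTest {
        public static void main(String[] args) throws KuduException {
            // Hypothetical master address and table name.
            KuduClient client =
                new KuduClient.KuduClientBuilder("kudu-master:7051").build();
            try {
                KuduTable table = client.openTable("test_table");
                KuduSession session = client.newSession();
                // Buffer writes and flush in the background -- this is how
                // bulk throughput like the ~100,000 rows/s mentioned in the
                // thread is usually reached.
                session.setFlushMode(
                    SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);

                Random rng = new Random();
                for (int i = 0; i < 3000; i++) {
                    Upsert upsert = table.newUpsert();
                    PartialRow row = upsert.getRow();
                    // Random keys spread the writes across the whole key
                    // range, which is exactly what makes the flushed data
                    // compact against many existing rowsets.
                    row.addLong("id", rng.nextLong());
                    row.addString("payload", "value-" + i);
                    session.apply(upsert);
                }
                session.flush();  // block until all buffered ops are written
                session.close();
            } finally {
                client.shutdown();
            }
        }
    }

With AUTO_FLUSH_BACKGROUND, write errors are reported asynchronously, so a production version would also check session.countPendingErrors() after the flush.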
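On question 3 (24 vs. 120 buckets): the bucket count is fixed at table-creation time, so comparing the two requires two tables. A hedged sketch of creating a hash-partitioned table from the Java client, using the same hypothetical schema as above:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.kudu.ColumnSchema;
    import org.apache.kudu.Schema;
    import org.apache.kudu.Type;
    import org.apache.kudu.client.CreateTableOptions;
    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduException;

    public class CreateHashPartitionedTable {
        public static void main(String[] args) throws KuduException {
            KuduClient client =
                new KuduClient.KuduClientBuilder("kudu-master:7051").build();
            try {
                List<ColumnSchema> columns = new ArrayList<>();
                columns.add(new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64)
                        .key(true).build());
                columns.add(new ColumnSchema.ColumnSchemaBuilder("payload", Type.STRING)
                        .build());
                Schema schema = new Schema(columns);

                // Hash the primary key into 24 buckets (the other test used
                // 120) and keep 3 replicas, matching the setup in the thread.
                CreateTableOptions options = new CreateTableOptions()
                        .addHashPartitions(Collections.singletonList("id"), 24)
                        .setNumReplicas(3);

                client.createTable("test_table", schema, options);
            } finally {
                client.shutdown();
            }
        }
    }

Each bucket becomes one tablet per replica, so 120 buckets at 3 replicas spread over 12 tablet servers works out to about 30 tablets per server for this one table: more buckets mean more write parallelism, but each tablet carries its own MemRowSet and its own share of maintenance work.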
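Todd's dashboard can also be polled programmatically while a test runs, e.g. to log compaction activity alongside the disk I/O graphs. A trivial JDK-only sketch; the host name is hypothetical, and 8050 is the tablet server's default web UI port as in the URL above. The page is plain HTML, so crude substring filtering is enough for a quick look.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class PollMaintenanceManager {
        public static void main(String[] args) throws IOException {
            // Same endpoint Todd points at above.
            URL url = new URL("http://tablet-server:8050/maintenance-manager");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Print only rows that look compaction-related.
                    if (line.contains("Compaction") || line.contains("compact")) {
                        System.out.println(line);
                    }
                }
            } finally {
                conn.disconnect();
            }
        }
    }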
