Thanks for the explanation, Todd. Now waiting for the 0.10.0 parcels to test.




On Fri, Aug 19, 2016 at 2:11 AM +0800, "Todd Lipcon" <[email protected]> wrote:

Hey Ben and Jacky,
Apologies for my late response. As you might have seen on the Kudu blog, a lot 
of the contributors have been busy wrapping up the 0.10.0 release this week. 
Answers inline


On Aug 16, 2016, at 6:05 PM, [email protected] wrote:
Thanks Todd.
The Kudu cluster is running on CentOS 7.2; each tablet node has 40 cores. The test table is about 140GB after 3 replicas and is partitioned by hash bucket; I have tried 24 and 120 hash buckets.
I ran one test:
1. Stop all ingestion to the cluster.
2. Randomly upsert 3,000 rows once; each upsert is either a new row or an update to an existing row (the whole row is rewritten, not just one or more columns).
3. From the CDH monitoring dashboard, I see the cluster's disk I/O rise from ~300Mb/s to ~1.5Gb/s, and it only drops back to ~300Mb/s 30 minutes or more later.
I checked some of the tablet node INFO logs; they are constantly doing compactions, compacting anywhere from one to hundreds of thousands of rows.
My questions:
1. Is the maintenance manager rewriting the whole table? Will a single upsert of 3,000 rows trigger a rewrite of the whole table?
If those 3000 rows are spread across the whole key space, then yes, it 
currently will. If, on the other hand, you had a table something like:
CREATE TABLE t (
  ts INTEGER PRIMARY KEY,
  other_data STRING,
  ...
) DISTRIBUTE BY HASH(ts) INTO 120 BUCKETS
and your inserts were for a small range of time (e.g. concentrated around "now"), then the compaction would only rewrite the portion of the key space that has new rows (or updates) affecting it.

2. Does the background I/O have an impact on scan performance?
Of course it has some. However, it is restricted by default to a single thread, so it should use only a small percentage of the machine's total capacity. I worked on a patch a few years ago to use ioprio_set to mark these I/Os as "low priority", but I didn't have enough time to validate that it helped with any workload, and thus it didn't get committed. In case anyone's interested in trying it, I posted the (very old) diff at https://gist.github.com/toddlipcon/faaf8e74b4dae93e668e8bda1118b58a . It will probably need some conflict resolution to apply.

3. About the number of hash-partitioned buckets: I partitioned the table into 24 or 120 buckets; what's the difference in upsert and scan performance, and what are the best practices?
For write performance, I'd recommend having several (5-10) tablets per tablet server being actively written to. Individual tablets can handle multi-threaded writes pretty well, though there are some bottlenecks at very high throughputs, so having just one tablet per tablet server would not get peak performance.
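
A rough sketch of that multi-writer pattern, assuming the Kudu Java client (org.apache.kudu.client in recent releases; the 0.x parcels used org.kududb.client), a hypothetical master address "master1:7051", and the example table t above, with one session per writer thread and background flushing:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kudu.client.*;

public class MultiThreadedUpserts {
  public static void main(String[] args) throws Exception {
    final KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build();
    final KuduTable table = client.openTable("t");
    final int numThreads = 8;  // several concurrent writers per cluster
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int i = 0; i < numThreads; i++) {
      pool.submit(() -> {
        try {
          // One session per writer thread; AUTO_FLUSH_BACKGROUND batches writes to the servers.
          KuduSession session = client.newSession();
          session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
          for (int n = 0; n < 100_000; n++) {
            Upsert upsert = table.newUpsert();
            PartialRow row = upsert.getRow();
            // Keys concentrated around "now" keep compaction localized,
            // while HASH(ts) still spreads the writes across all buckets.
            row.addInt("ts", (int) (System.currentTimeMillis() / 1000) + n);
            row.addString("other_data", "value-" + n);
            session.apply(upsert);
          }
          session.flush();
          session.close();
        } catch (KuduException e) {
          e.printStackTrace();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    client.close();
  }
}
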
On the read side, it's worth noting that _currently_ both the Spark and Impala integrations start only one scanner thread per tablet, so the number of tablets limits scan parallelism. Given that, I expect you would see better performance with 120 buckets. This is, however, a temporary limitation: we intend to add an API at some point to make it easier for query engines to divide a scan into smaller chunks which can be spread across more threads, regardless of the number of tablets.
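
For illustration, a minimal single-scanner sketch with the same hypothetical client setup as above; Impala and Spark currently open one scanner like this per tablet, which is why the bucket count caps scan parallelism:

import java.util.Arrays;
import org.apache.kudu.client.*;

public class SimpleScan {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build();
    KuduTable table = client.openTable("t");

    // Project only the columns we need; error handling is elided.
    KuduScanner scanner = client.newScannerBuilder(table)
        .setProjectedColumnNames(Arrays.asList("ts", "other_data"))
        .build();
    long count = 0;
    while (scanner.hasMoreRows()) {
      RowResultIterator batch = scanner.nextRows();  // one RPC's worth of rows
      while (batch.hasNext()) {
        batch.next();
        count++;
      }
    }
    scanner.close();
    client.close();
    System.out.println("scanned " + count + " rows");
  }
}
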
4. What is the recommended setting for the tablet server memory hard limit?
It depends on how much you want to devote to other applications :) On the
write-buffering side, I have seen diminishing returns after 10GB or so. 
However, you can use a much bigger memory limit and devote an arbitrary amount 
to the block cache, which will improve both write performance (since you can 
get 100% cache hit rate on bloom filters) as well as read performance (since 
you will hit cache more on reads). Of course, it's worth noting that if you 
stick to a small memory limit for Kudu, the Linux block cache is still used, 
and you can still get many reads serviced from memory.

On Tue, Aug 16, 2016 at 6:09 PM, Benjamin Kim <[email protected]> wrote:
This could be a problem… If this is a bad byproduct brought over from HBase, 
then this is a common issue for all HBase users. It would be too bad if this 
also exists in Kudu. We HBase users have been trying to eradicate this for a 
long time.
The main difference between compaction in Kudu and in HBase is that all of our 
compaction is "incremental". We also have the design that compaction is always 
running at the same speed. As you've noticed, you should expect that in a live 
Kudu cluster, there's always some background work consuming IO. This has the 
downside that you're always seeing some performance impact due to it. This has 
the huge (IMHO) upside that you are _always_ seeing the performance impact -- 
in other words, you will never be "surprised" by a compaction starting.
This is unlike the design in HBase (and many other LSM-tree designs) where 
there is a distinct "trigger point" at which compaction starts. In those 
systems, you can have something performing well during testing, and then all of 
a sudden reach some threshold where the performance profile drastically changes 
(possibly resulting in getting paged in the middle of the night). Personally, 
as an operator, I would always pick a system which is _consistently_ a bit 
slower than one which is sometimes faster and at arbitrary times goes into a 
slow mode.
One thing we should probably consider is having some sort of very low threshold 
below which we don't trigger a compaction. The example of a large table with a 
few thousand inserts is a good one - we are probably better off just waiting 
until the situation is a little bit worse before starting compaction. We just 
don't want to wait until it's out of control.
Hope that reasoning makes sense to you, and that your experience testing Kudu 
has borne it out.
-Todd
From: Todd Lipcon
Date: 2016-08-17 01:58
To: user
Subject: Re: abnormal high disk I/O rate when upsert into kudu table?

Hi Jacky,
Answers inline below
On Tue, Aug 16, 2016 at 8:13 AM, [email protected] <[email protected]> wrote:
Dear Kudu Developers, 
I am a new tester of Kudu. Our Kudu cluster has 3+12 nodes: 3 separate master nodes and 12 tablet nodes. Each node has 128GB of memory, 1 SSD for the WAL, and 6 1TB SAS disks for data.
We are using CDH 5.7.0 with the impala-kudu 2.7.0 and kudu 0.9.1 parcels, and we set a 16GB memory hard limit for each tablet node.
Sounds like a good cluster setup. Thanks for providing the details. 
One of our test tables has about 80-100 columns and 1 key column. With the Java client, we can insert/upsert into the Kudu table at about 100,000 rows/s.
The Kudu table has 300M rows, with about 300,000 rows updated per day; we also use the Java client upsert API to update the rows.
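
A compressed sketch of that upsert path, assuming the Kudu Java client; the table name, key column, and value column below are invented placeholders, and the randomly distributed keys match the workload described here:

import java.util.concurrent.ThreadLocalRandom;
import org.apache.kudu.client.*;

public class DailyUpserts {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build();
    KuduTable table = client.openTable("wide_test_table");  // hypothetical table name
    KuduSession session = client.newSession();
    session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);

    for (int i = 0; i < 300_000; i++) {  // ~300,000 upserts per day
      Upsert upsert = table.newUpsert();
      PartialRow row = upsert.getRow();
      // Random keys across the ~300M-row key space; every upsert rewrites the whole row.
      row.addLong("id", ThreadLocalRandom.current().nextLong(300_000_000L));
      row.addString("col_1", "...");
      // ... set the remaining 80-100 columns the same way ...
      session.apply(upsert);
    }
    session.flush();
    session.close();
    client.close();
  }
}
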

We found that the Kudu cluster encounters an abnormally high disk I/O rate, about 1.5-2.0Gb/s, even though we update only 1,000~10,000 rows/s.
I would like to know: with our row update frequency, is this high disk I/O rate normal or not?
Are your upserts randomly spread across the range of rows in the table? If so,
then when the updates flush, they'll trigger compactions of the updates and 
inserted rows into the existing data. This will cause, over time, a rewrite of 
the whole table, in order to incorporate the updates.
This background I/O is run by the "maintenance manager". You can visit 
http://tablet-server:8050/maintenance-manager to see a dashboard of currently 
running maintenance operations such as compactions.
The maintenance manager runs a preset number of threads, so the amount of 
background I/O you're experiencing won't increase if you increase the number of 
upserts.
I'm curious, is the background I/O causing an issue, or just unexpected?
Thanks
-Todd

--
Todd Lipcon
Software Engineer, Cloudera



-- 
Todd Lipcon
Software Engineer, Cloudera





