ZhangYao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13412 )

Change subject: KUDU-2826: Add tail for mutation list to accelerate update in 
memrowset
......................................................................


Patch Set 3:

(3 comments)

@Adar Dembo@Todd Lipcon
I run TestMemRowSet.TestMemRowSetInsertCountAndScan with --roundtrip_num_rows = 
1000000.
[Before change]:
Time spent Inserting rows: real 0.593s  user 0.530s sys 0.050s
Time spent Counting rows: real 0.011s  user 0.010s sys 0.000s
Time spent Scanning rows where none are committed: real 0.014s user 0.010s sys 
0.000s
Time spent Scanning rows where all are committed: real 0.045s  user 0.040s sys 
0.000s

[After change]:
Time spent Inserting rows: real 0.594s  user 0.570s sys 0.000s
Time spent Counting rows: real 0.012s  user 0.010s sys 0.000s
Time spent Scanning rows where none are committed: real 0.015s user 0.020s sys 
0.000s
Time spent Scanning rows where all are committed: real 0.043s  user 0.040s sys 
0.000s

Becase my change only affects the update in memrowset, so it will not affect 
the insert and scan,
so I don't expect any difference for this test, and this result also 
illustrates this.

I add a update microbenchmark in memrowset-test named 
TestMemRowSetUpdatePerformance.
Use --update_ratio to control the update percent of rows and --times_to_update 
to control
the update times for selected row. I make a test with --update_ratio = 0.2 and 
--update_ratio = 5000.

[Before change]:
Time spent Inserting rows: real 0.001s user 0.000s sys 0.000s
Time spent Updating rows: real 192.291s  user 191.650s sys 0.230s

[After change]:
Time spent Inserting rows: real 0.001s user 0.000s sys 0.000s
Time spent Updating rows: real 3.130s  user 3.080s sys 0.040s

The performance improvement by this change is adorable.

By the way, the result of a Zipfian update YCSB workload will be done in few 
days.

http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.h@102
PS2, Line 102:     memrowset_ = memrowset;
             :   }
             :
             :   const Schema* schema() const;
             :
             :   Timestamp insertion_timestamp() const { return 
header_->insertion_timestamp; }
             :
             :   Mutation* redo_head() {
> For added clarity, I'd rewrite this:
Done.


http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.cc
File src/kudu/tablet/memrowset.cc:

http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.cc@70
PS2, Line 70:   const Mutation *mut_tail = header_->redo_tail;
> I think you probably want to update this function to use the tail. If the t
I have update this function to  use the tail, but this will not check the 
rationality of updates(such as reinsert must behind delete) anymore.


http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/mutation.h
File src/kudu/tablet/mutation.h:

http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/mutation.h@87
PS2, Line 87:
> Rather than passing this MRS-specific data structure in, can we pass the he
Done.



--
To view, visit http://gerrit.cloudera.org:8080/13412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I945f332f8241ecb3cc6ab66e67ee3ee2b4e49be8
Gerrit-Change-Number: 13412
Gerrit-PatchSet: 3
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Yao Xu <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Mon, 27 May 2019 06:46:43 +0000
Gerrit-HasComments: Yes

Reply via email to