ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/13412 )
Change subject: KUDU-2826: Add tail for mutation list to accelerate update in memrowset ...................................................................... Patch Set 3: (3 comments) @Adar Dembo@Todd Lipcon I run TestMemRowSet.TestMemRowSetInsertCountAndScan with --roundtrip_num_rows = 1000000. [Before change]: Time spent Inserting rows: real 0.593s user 0.530s sys 0.050s Time spent Counting rows: real 0.011s user 0.010s sys 0.000s Time spent Scanning rows where none are committed: real 0.014s user 0.010s sys 0.000s Time spent Scanning rows where all are committed: real 0.045s user 0.040s sys 0.000s [After change]: Time spent Inserting rows: real 0.594s user 0.570s sys 0.000s Time spent Counting rows: real 0.012s user 0.010s sys 0.000s Time spent Scanning rows where none are committed: real 0.015s user 0.020s sys 0.000s Time spent Scanning rows where all are committed: real 0.043s user 0.040s sys 0.000s Becase my change only affects the update in memrowset, so it will not affect the insert and scan, so I don't expect any difference for this test, and this result also illustrates this. I add a update microbenchmark in memrowset-test named TestMemRowSetUpdatePerformance. Use --update_ratio to control the update percent of rows and --times_to_update to control the update times for selected row. I make a test with --update_ratio = 0.2 and --update_ratio = 5000. [Before change]: Time spent Inserting rows: real 0.001s user 0.000s sys 0.000s Time spent Updating rows: real 192.291s user 191.650s sys 0.230s [After change]: Time spent Inserting rows: real 0.001s user 0.000s sys 0.000s Time spent Updating rows: real 3.130s user 3.080s sys 0.040s The performance improvement by this change is adorable. By the way, the result of a Zipfian update YCSB workload will be done in few days. http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.h File src/kudu/tablet/memrowset.h: http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.h@102 PS2, Line 102: memrowset_ = memrowset; : } : : const Schema* schema() const; : : Timestamp insertion_timestamp() const { return header_->insertion_timestamp; } : : Mutation* redo_head() { > For added clarity, I'd rewrite this: Done. http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.cc File src/kudu/tablet/memrowset.cc: http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/memrowset.cc@70 PS2, Line 70: const Mutation *mut_tail = header_->redo_tail; > I think you probably want to update this function to use the tail. If the t I have update this function to use the tail, but this will not check the rationality of updates(such as reinsert must behind delete) anymore. http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/mutation.h File src/kudu/tablet/mutation.h: http://gerrit.cloudera.org:8080/#/c/13412/2/src/kudu/tablet/mutation.h@87 PS2, Line 87: > Rather than passing this MRS-specific data structure in, can we pass the he Done. -- To view, visit http://gerrit.cloudera.org:8080/13412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I945f332f8241ecb3cc6ab66e67ee3ee2b4e49be8 Gerrit-Change-Number: 13412 Gerrit-PatchSet: 3 Gerrit-Owner: ZhangYao <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Yao Xu <[email protected]> Gerrit-Reviewer: ZhangYao <[email protected]> Gerrit-Comment-Date: Mon, 27 May 2019 06:46:43 +0000 Gerrit-HasComments: Yes
