Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18503 )
Change subject: KUDU-3367 [compaction] add supplement to gc algorithm ...................................................................... Patch Set 31: (19 comments) http://gerrit.cloudera.org:8080/#/c/18503/31//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18503/31//COMMIT_MSG@9 PS31, Line 9: delta with full of delete op ... delta full of delete ops ... http://gerrit.cloudera.org:8080/#/c/18503/31//COMMIT_MSG@10 PS31, Line 10: no update op else in the file ... not a single update operation in the delta ... http://gerrit.cloudera.org:8080/#/c/18503/31//COMMIT_MSG@10 PS31, Line 10: The current compact algorithm will : not schedule the file do compact The current compaction algorithm doesn't run GC on such deltamemstores. http://gerrit.cloudera.org:8080/#/c/18503/31//COMMIT_MSG@11 PS31, Line 11: If such files exist, after : accumulating for a period of time, it will greatly affect our scan : speed. The accumulation of deltamemstores like that negatively affects performance of scan operations. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.h File src/kudu/tablet/delta_tracker.h: http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.h@399 PS31, Line 399: need release storage need to release storage http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.cc File src/kudu/tablet/delta_tracker.cc: http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.cc@64 PS31, Line 64: DEFINE_uint64(all_delete_op_delta_file_cnt_for_compact, 1, : "The compaction scheduler will not deal the compact delta files " : "with only delete ops until the file count threshold reached. " What sort of files do we count here? Delta memstores? Would be great to be a bit more precise. Also, why it's important to have this as some sort sort of numeric threshold instead of just a boolean? http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.cc@954 PS31, Line 954: // Get all delete op only delta file count. : // Ignore only one delete op for the delta file. Count the number of REDO delta stores that contain only DELETE operations where more than a single DELETE operation is present. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.cc@966 PS31, Line 966: kOutdatedInterval nit: why not to call this just 'max_delta_age'? http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/delta_tracker.cc@967 PS31, Line 967: style nit: wrong indent (see https://google.github.io/styleguide/cppguide.html#Function_Calls for acceptable options) http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test-base.h File src/kudu/tablet/diskrowset-test-base.h: http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test-base.h@155 PS31, Line 155: Delete some data at a given rate from the given rowset. Delete rows in the specified range from the given rowset. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc File src/kudu/tablet/diskrowset-test.cc: http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@647 PS31, Line 647: AffectCompaction Would be great to have more meaningful name here, or maybe add a comment explaining what this test scenario tries to verify. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@648 PS31, Line 648: With this setting, we want major compactions to basically always have a score. Make major delta compaction runnable even with tiny amount of data accumulated across rowset's deltas. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@650 PS31, Line 650: FLAGS_rowset_metadata_store_keys = true; Is this essential to have set flag set to 'true'? If yes, please add a comment explaining the reasoning; if not, remove this custom setting. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@657 PS31, Line 657: Get Generate http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@671 PS31, Line 671: deleted data DELETE operations http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@671 PS31, Line 671: Get Generate http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@676 PS31, Line 676: + style nit: add spaces around '+' sign http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@681 PS31, Line 681: FLAGS_tablet_history_max_age_sec = 1; Please add a comment what's the intention behind setting this parameter so low. http://gerrit.cloudera.org:8080/#/c/18503/31/src/kudu/tablet/diskrowset-test.cc@684 PS31, Line 684: Compaction of the delete op only delta files will be scheduled Major delta compaction of the DRS should obtain a non-zero score. -- To view, visit http://gerrit.cloudera.org:8080/18503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8b26737dffecc17688b42188da959b2ba16351ed Gerrit-Change-Number: 18503 Gerrit-PatchSet: 31 Gerrit-Owner: KeDeng <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: KeDeng <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Sun, 11 Dec 2022 00:38:55 +0000 Gerrit-HasComments: Yes
