David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background ......................................................................
Patch Set 9: (9 comments) http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tserver/tserver_mm_ops.cc File src/kudu/tserver/tserver_mm_ops.cc: PS9, Line 34: . maybe add "(doesn't prevent compactions from deleting ancient data)" PS9, Line 37: DEFINE_int32(undo_delta_block_gc_init_budget_millis, 100, this should likely be tagged as advanced, maybe even hidden. PS9, Line 43: DEFINE_double(undo_delta_block_gc_delete_max_selection_ratio, 0.5, : "The maximum fraction of the time that undo delta block gc may be selected " : "to run by the maintenance manager. This is measured by the ratio of number " : "of times UpdateStats() is called over the number of times Perform() is " : "called on the UndoDeltaBlockGCOp. This is intended to avoid that task " : "starving other maintenance tasks."); tags (hidden, likely). this is hard to understand as a prioritization cap. http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/tserver/tserver_mm_ops.h File src/kudu/tserver/tserver_mm_ops.h: PS9, Line 1: // Licensed to the Apache Software Foundation (ASF) under one Not sure I got the complete picture of all the discussions and the controversy around the server level mm. I think it's a great idea to have one but I think that MaintenanceMananger::FindBestOp() is way too tablet specific for the current implementation to make any sense in this case. In specific I think it's forcing you to expose flags that we don't really want if we had a server level implementation of that maintenance manager method. PS9, Line 55: OVERRIDE use c++11 overrides PS9, Line 65: LogPrefix docs http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/util/maintenance_manager.cc File src/kudu/util/maintenance_manager.cc: PS9, Line 69: The target minimum size (in megabytes), per tablet, of ancient " : "data on disk needed to prioritize deletion of that ancient data. this sentence is very hard to parse. also why do we need a min? if we have a single gc'able undo delta shouldn't we always gc it, even if at a lower priority? in other words shouldn't the scheduler prioritize gc from something like 0 (no data to gc) to 1 (a lot of data to gc/disk is full) in a continuous spectrum? http://gerrit.cloudera.org:8080/#/c/4363/9/src/kudu/util/maintenance_manager.h File src/kudu/util/maintenance_manager.h: PS9, Line 118: Important: Update Clear() when adding fields to this class. nice add, but could you move it to the class header? PS9, Line 135: The approximate amount of disk space that not doing this operation keeps us from GCing from : // the data blocks. I couldn't parse this, I know its a parallel comment to the fields above, but those are pretty unparseable too. How about: "Approximate amount of data that would be GCd if this operation ran" -- To view, visit http://gerrit.cloudera.org:8080/4363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
