Mike Percy has posted comments on this change.

Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

PS13, Line 136: enable_undo_delta_block_gc
> ah, I missed this flag lived here (I'm reviewing in reverse file order) yea
tablet.cc is the best place to prevent registration of the task since we 
register it here in Tablet::RegisterMaintenanceOps(). However if you think I 
should DECLARE it here and DEFINE it in tablet_mm_ops.cc then I can do that. I 
originally had it run but mark the task as not runnable but JD asked me to 
prevent it from running at all, so I changed it to do this.


PS13, Line 1764: InitAncientUndoDeltas
> why are you returning bytes_in_ancient_undos? the op seems to ignore this.
Added a short-circuit to UndoDeltaBlockGCOp::Perform() if this returns 0 bytes.


PS13, Line 1774: // No need to hold the selection lock here, since we are not 
actually making
               :   // any changes.
> nit: this is slightly confusing cuz we do get a read lock to copy them, I t
Done


PS13, Line 1782:   vector<pair<size_t, int64_t>> 
rowset_ancient_undos_est_sizes; // index, bytes
               :   rowset_ancient_undos_est_sizes.reserve(rowsets.size());
               :   for (size_t i = 0; i < rowsets.size(); i++) {
               :     const auto& rowset = rowsets[i];
               :     int64_t bytes;
               :     
RETURN_NOT_OK(rowset->EstimateBytesInPotentiallyAncientUndoDeltas(ancient_history_mark,
               :                                                                
       &bytes));
               :     rowset_ancient_undos_est_sizes.emplace_back(i, bytes);
               :   }
> are adding already intialized undo deltas part of this? why? eventually thi
It's estimating the size of each rowset so we can initialize them greedily, 
starting with the largest first. Updated some of the comments.


PS13, Line 1793: td::sort(rowset_ancient_undos_est_sizes.begin(), 
rowset_ancient_undos_est_sizes.end(),
               :             [&](const pair<size_t, int64_t>& a, const 
pair<size_t, int64_t>& b) {
               :               return a.second > b.second; // Descending order.
               :             });
> if we only had the uninitialized undos in rowset_ancient_undos_est_sizes wo
Yes, because that means we start checking and initializing with the rowset with 
the most data in its undos.


PS13, Line 1834: std::lock_guard<std::mutex> compact_lock(compact_select_lock_);
> we hold this lock for the whole time? all other uses of this lock in tablet
Oh, you're right. Fixed.


http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/tablet/tablet_mm_ops.cc
File src/kudu/tablet/tablet_mm_ops.cc:

PS13, Line 276: stats->set_runnable(true);
> set runnable to false if there no data to gc
Done


PS13, Line 283: WARN_NOT_OK(tablet_->InitAncientUndoDeltas(time_budget, 
&tablet_bytes_in_ancient_undos),
              :               Substitute("$0Unable to initialize old undo delta 
files on tablet $1",
              :                           LogPrefix(), tablet_->tablet_id()));
> is tablet_bytes_in_ancient_undos defined if the method returns not ok? seem
Good point, I hadn't reconsidered the method signature after moving it out of 
UpdateStats(). Done.


http://gerrit.cloudera.org:8080/#/c/4363/13/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

PS13, Line 69: DEFINE_int64(data_gc_min_size_mb, 0,
             :              "The (exclusive) minimum number of megabytes of 
ancient data on "
             :              "disk, per tablet, needed to prioritize deletion of 
that data.");
> do we actually need this knob? when would we turn this up or down? if you w
My thought was that since I am not really sure what the ideal settings are yet, 
2 knobs would allow us to set this high but the prioritization probability 
high, for example.

With the new data GC job, disabling can take place with the "enable_..." flag, 
but if you think it's generally useful for the future we could add -1 as a 
special value here... but we don't need it yet.


PS13, Line 324: op->io_usage() == MaintenanceOp::LOW_IO_USAGE
> remind me what this is doing?
We have to give some classification to each job so I chose LOW_IO_USAGE for the 
data GC job... but I think that's the wrong classification now that we are 
doing the Init() on the worker thread. I'll remove this part.


PS13, Line 395: rand_.NextDoubleFraction() > FLAGS_data_gc_prioritization_prob
> nit: can you revert this (<= not >) and flip the if?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

Reply via email to