Mike Percy has posted comments on this change. Change subject: KUDU-1601. Delete ancient UNDO delta blocks in the background ......................................................................
Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/4363/5//COMMIT_MSG Commit Message: PS5, Line 15: The task was written to operate at the server level because the : UpdateStats() part of the task spends a budgeted amount of time opening : (initializing) undo delta blocks in order to read the stats from their : headers. > Maybe I'm missing something obvious, but I don't see how the explanation (s > What is it about UpdateStats() doing I/O that forces the op to be > server-level, or makes it more desirable? The idea is to attempt to budget the amount of time that can be spent per MM cycle on initializing delta files. If we have # of these tasks == # tablets then it's not really obvious how to do that without building a special facility for it. Doing it at the server level makes it easy because it's effectively a singleton, so we can put that logic in UpdateStats(). > Would it be possible to make opening of eager UNDO delta blocks an MM op in > its own right? Doing I/O in UpdateStats() (regardless of at what level) seems > weird; it has the potential to stall other cheap MM operations (such as log > GC) if the I/O is blocked for a really long time. That seems problematic because it's not really clear how to schedule it. In general we want to sort-of "always" be opening deltas. Once we have all deltas open, it's pretty cheap to make decisions about GCing data (and we want to do it with high priority because it's cheap and important to do). > It also puts the I/O on the scheduler thread instead of on the MM operations > thread(s). I do agree that putting the scheduler at the mercy of disk latency is not ideal. OTOH, if the disk(s) are really so burdened, all of the MM ops will be slow anyway. > If you don't want to introduce two new ops, perhaps they can be "combined" > such that Perform() either opens some UNDO blocks, or does ancient history > GC, and we'll decide which approach to take during UpdateStats()? I suppose another way to do this could be to schedule the job with very high priority but not more than some fraction of the time (I've already added logic to do this, and defaulted it to 50% of the time as the max). Then Perform() would basically consist of: InitializeAncientBlocks(tablets, budget_deadline); DeleteAncientBlocks(tablets); Such an approach doesn't try to solve the related but separate problem of eventually opening *all* delta files, not just ancient ones... but maybe it could be made to in a follow-up change. http://gerrit.cloudera.org:8080/#/c/4363/5/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 2357: // The catalog manager has only one tablet peer. > It's possible for sys_catalog_ to not yet be set, in which case this functi Thanks for the catch. -- To view, visit http://gerrit.cloudera.org:8080/4363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0309bf7acfb6d018860c80f354012c3500da5c68 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
