Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12048 )
Change subject: KUDU-2618 Factor the amount of data into time-based flush decisions ...................................................................... Patch Set 1: (10 comments) Did you test this on a real cluster? http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG@20 PS1, Line 20: This Nit: consider starting a new paragraph here. http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG@24 PS1, Line 24: Kudu will wait longer before flushing Note that some integration tests (such as node_density-itest) set flush_threshold_secs to 1 in order to force the tserver to flush as quickly as possible. Such tests are probably flushing far less now. To maintain the existing behavior, you should look at all tests that override flush_threshold_secs and flush_threshold_mb, and probably override flush_threshold_upper_bound_secs too. http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG@27 PS1, Line 27: (memory-pressure-bassed flushing would kick in then) and the amount of based http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc File src/kudu/tablet/tablet_replica_mm_ops.cc: http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@75 PS1, Line 75: DEFINE_int32(flush_threshold_upper_bound_secs, 60 * 60, Should probably tagged experimental, or at least advanced. http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@76 PS1, Line 76: MemRowSet Not just MRS. http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@102 PS1, Line 102: if (anchored_mb > FLAGS_flush_threshold_mb) { Rereading this code after not looking at it for years, I was somewhat surprised that we convert anchored RAM into a perf improvement score. I went back and reread how the maintenance manager prioritizes operations and summarized it below: 1. The low-IO operation that retains the most WAL bytes, if one exists (LogGC). 2. If we're under memory pressure (60% of the hard limit), the operation that anchors the most RAM (FlushMRS or FlushDMS). 3. The operation that retains the most WAL bytes provided that quantity is at least 1 GB (LogGC). 4. The operation that retains the most data bytes, selected only half of the time if there's an operation with a perf improvement (UndoDeltaGC). 5. The operation with the best perf improvement (various). I guess the idea is that we still want to flush even when not under memory pressure, and given the above prioritization, the only way to get the MM to flush is to show a perf improvement. http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@106 PS1, Line 106: MRS Since this function is also used to set perf improvements for DMS flushes, I'd reword the comments here to account for the fact that we could be talking about a DMS, not an MRS. http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@121 PS1, Line 121: ti time http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@128 PS1, Line 128: target_rowset_size Nit: target_rowset_size_mb http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@133 PS1, Line 133: if (perf > 1.0) { else if -- To view, visit http://gerrit.cloudera.org:8080/12048 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I27fbd3ec34c5615d07deaf47b3400ca7c4ea426a Gerrit-Change-Number: 12048 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Thu, 06 Dec 2018 22:58:09 +0000 Gerrit-HasComments: Yes