Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8090 )
Change subject: KUDU-1125: issue one catalog write per tablet report ...................................................................... Patch Set 6: (10 comments) http://gerrit.cloudera.org:8080/#/c/8090/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8090/6//COMMIT_MSG@10 PS6, Line 10: When the : master is configured to fsync WAL writes, this can add a lot of load during : election storms and when the master is restarted. any chance you did a before/after test here? eg start a small cluster with 1000 tablets per node and grab some metric which shows the improvement? Also did you consider if there is any bound beyond which we end up trying to write single batches which are 50MB+ and thus might blow out the max RPC size when using multi-master? Some back-of-the-envelope math says that a pretty loose upper bound on SysTabletsEntryPB is probably 1KB, so we'd need a batch of 50,000 tablet writes before we overrun the 50MB limit, right? That shoudl be safe given we'd blow out a lot of per-TS limits long before this one, but would be good if you can sanity-check my thinking here. http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3106 PS6, Line 3106: "num_tablets", full_report.updated_tablets_size()); would be nice to also use TRACE_COUNTER_INCREMENT("reported_tablets", updated_tablets.size()) or somesuch here so we can see the metric in /rpcz if we get any latency outliers http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3122 PS6, Line 3122: unordered_map<string, ReportedTabletUpdatesPB*> updates; worth adding to the comment explaining ownership of these raw pointers http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3151 PS6, Line 3151: ReportedTabletUpdatesPB* update = full_report_update->add_tablets(); to cut down on small allocations here, consider doing full_report_update->mutable_tablets()->Reserve(full_report.update_tablets_size()) up top before the loop? http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3157 PS6, Line 3157: shared_lock<LockType> l(lock_); I think this would likely perform better if you can acquire this once for the whole report instead of acquire/release for each tablet. i.e a single "long" acquisition is probably better than N "short" acquisitions, considering that 'lock_' is only rarely accessed for write If such a restructuring is a pain, just add a TODO? http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3227 PS6, Line 3227: bootstrapping should this now say "copying"? http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3249 PS6, Line 3249: with an error nit: rephrase as "non-deleted tablet which is reporting an error" or somesuch? Initially I wasn't clear whether "with an error" was referring to "Skip" or "tablet" http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3268 PS6, Line 3268: if (!cstate.committed_config().has_opid_index()) { nit: would it be possible to move this if statement up a few lines before copying 'cstate' since I think this case is relatively common (servers may have thousands of tombstoned replicas) and because copying the cstate PB is probably expensive? (lots of embedded strings, etc) http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3285 PS6, Line 3285: // TODO(adar): ShouldTransitionTabletToRunning doesn't take step 7b into : // account. Should it? not sure I understand this question -- isn't the condition from ShouldTransitiontabletToRunning equivalent to the condition in 7b? is it actually redundant now? Maybe it's worth inlining that function here since it seems like, given the redundancy above, it is really just one or two boolean expressions? http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3344 PS6, Line 3344: ConsensusStatePB prev_cstate = tablet->metadata().state().pb.consensus_state(); this is a little confusing - this is shadowing a variable with the same name in outer scope? -- To view, visit http://gerrit.cloudera.org:8080/8090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie6f5cf0e4b1cd1160b3b310d89c6dbf3dd62e43b Gerrit-Change-Number: 8090 Gerrit-PatchSet: 6 Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Fri, 29 Sep 2017 18:38:54 +0000 Gerrit-HasComments: Yes
