Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8090 )

Change subject: KUDU-1125: issue one catalog write per tablet report
......................................................................


Patch Set 7: Code-Review+1

(4 comments)

lgtm after you address Mike's comments

http://gerrit.cloudera.org:8080/#/c/8090/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8090/6//COMMIT_MSG@10
PS6, Line 10: When the
            : master is configured to fsync WAL writes, this can add a lot of 
load during
            : election storms and when the master is restarted.
> Haven't done any perf testing yet. I was planning to do it post-merge, but
That's fair. If it's going to take a lot of effort I don't think it's worth 
trying to hit this issue. It might also take a lot of machines since we won't 
generate writes for TOMBSTONED tablets, and the non-tombstoned tablet count is 
limited by the thread scalability issues on the TS side. I'd skip doing 
anything fancy for now and just try to enable fsync on a cluster with a 
reasonable number of tablets and see if there's a noticeable difference in the 
TSHeartbeat latency profile (there should be an enormous one)


http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3106
PS6, Line 3106:                "requestor", rpc->requestor_string(),
> Todd, I'm not sure how your example can help measure latency. Did you mean
well, /rpcz already keeps samples based on the overall latency of the request 
handling. For each sampling bucket (eg >1sec) it'll keep a single sampled 
trace, which includes the metrics. So the idea is that if we go to /rpcz and 
look at the ">1sec TSHeartbeat" sample, we'll have a useful metric to see 
whether the report was particularly large or something.


http://gerrit.cloudera.org:8080/#/c/8090/6/src/kudu/master/catalog_manager.cc@3285
PS6, Line 3285:       // leader elected. We need to wait for a leader before 
marking a tablet
              :       // as RUNNING, or else
> This TODO is trying to draw attention to the fact that ShouldTransitionTabl
Ack


http://gerrit.cloudera.org:8080/#/c/8090/7/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/8090/7/src/kudu/master/catalog_manager.cc@3173
PS7, Line 3173:         // reasonable to ignore it and wait for an operator fix 
the situation.
> The end of this comment only makes sense after going to the flag definition
I had a similar thought but seems out of scope for this commit. Perhaps add a 
TODO to consider removing this flag? Seems like it's only used by one test, and 
that test's goal is just to see that the flag works :)



--
To view, visit http://gerrit.cloudera.org:8080/8090
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6f5cf0e4b1cd1160b3b310d89c6dbf3dd62e43b
Gerrit-Change-Number: 8090
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 06 Oct 2017 00:36:57 +0000
Gerrit-HasComments: Yes

Reply via email to