Will Berkeley has posted comments on this change.

Change subject: [WIP] KUDU-1860: ksck doesn't identify tablets that are evicted 
but still in config
......................................................................


Patch Set 1:

(8 comments)

I implemented a GetAllConsensusStates RPC and used it instead of one 
GetConsensusState RPC per tablet replica; however, this makes the new features 
available only for servers >= 1.4. Should we stick both forms of logic in so it 
works on older servers?

http://gerrit.cloudera.org:8080/#/c/6772/1/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

PS1, Line 210: s.ToString()
> This s refers to the one from line 204, when I assume it should be the one 
Done


PS1, Line 616: shared_ptr<KsckTabletReplica> r
> const shared_ptr<KsckTabletReplica>& r
Done


PS1, Line 624: std::shared_ptr<KsckTabletReplica> r
> same
Done


Line 716:   } else if (consensus_conflict) {
> If we catch a replica in the middle of a config change then we can still se
Yes, that's part of a core problem with these consensus checks: we can't 
retrieve the information from all servers simultaneously, and config changes 
can happen as a normal part of the tablet's operation. However, AFAICT 
consensus doesn't track any timestamps on its configs, so we'd have to 
introduce that mechanism just for these checks. Would it be better to, instead 
of surfacing all conflicts, have some known bad situations that we check for 
and alert on? We could additionally allow ALL info to be printed, so a 
knowledgable admin could use that to investigate misbehaving tablets that our 
heuristics don't cover.


http://gerrit.cloudera.org:8080/#/c/6772/1/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

PS1, Line 100: peers
> you should also std::move(peers) here
Tidy bot caught that on patch set 2; it's done in set 3.


PS1, Line 124:   int64_t term_;
             :   boost::optional<string> leader_uuid_;
             :   vector<string> peers_;
> it looks like these can all be const
If you make them const it breaks the assignment at ksck.cc L663 since the 
copy-assign operator is implicitly deleted. I think the compiler with 
copy-elide there.


http://gerrit.cloudera.org:8080/#/c/6772/1/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

PS1, Line 114: for (auto it = tablet_status_map_.begin(); it != 
tablet_status_map_.end(); it++)
> Since you're not mutating tablet_status_map_, it's more idiomatic to write 
Done


PS1, Line 120: InsertOrDi
> It's not safe to trust that multiple remotes won't have the same UUID and t
Good point.


-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-HasComments: Yes

Reply via email to