Will Berkeley has posted comments on this change.

Change subject: KUDU-1860: ksck doesn't identify tablets that are evicted but 
still in config
......................................................................


Patch Set 7:

(54 comments)

http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

Line 471:   // The ids of the tablets.
> Can we add: An empty list means return info for all tablets known to the ta
Done


PS7, Line 557:   // Returns the consensus state for a tablet.
             :   rpc GetConsensusState(GetConsensusStateRequestPB) returns 
(GetConsensusStateResponsePB);
> Can we delete this function now? Gotta love deleting code. I don't think th
Done


Line 560:   // Returns the consensus state for a set of tablets.
> I think we should document whether this returns info for tombstoned tablets
Done


PS7, Line 561: GetConsensusStates
> I think GetConsensusState() is still a valid name for this endpoint, or if 
I like GetConsensusState.


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

Line 70:       const std::string& tablet_id,
> warning: parameter 'tablet_id' is unused [misc-unused-parameters]
Done


Line 71:       const Schema& schema,
> warning: parameter 'schema' is unused [misc-unused-parameters]
Done


PS7, Line 232:   
> nit: spacing
Done


PS7, Line 236: InsertOrDie
> Do we also need to worry about the duplicate UUID issue here?
No. This is test code that's manually setting up a vanilla consensus 
configuration on each mock TS.

If you wanted to test a case like that, you'd reach in and set it up in the 
specific test, as I do in the tests below for various situations


Line 243:                            string tablet_id,
> warning: the parameter 'tablet_id' is copied for each invocation but only u
Done


Line 243:                            string tablet_id,
> I think this is complaining because we should either specify the input argu
Kept it simple for the test and went with const string&.


Line 394:   FLAGS_check_consensus_info = false;
> No need to reset your flags between tests due to KuduTest::flag_saver_
:)


PS7, Line 401: shared_ptr<KsckTabletServer> ts
> is it safe to declare this const shared_ptr<KsckTabletServer>& ts ?
Done


PS7, Line 401: "ts-id-0")
> can we make "ts-id-0" and other identifier strings in here (except perhaps 
I don't think making constants for the whole file or for KsckTest makes sense 
just to cover these three specific tests, and the literals are actually used 
just once per test, so I don't think it makes it clearer or easier to store 
them in a variable.

FWIW, it's also common to use a literal "ts-id-0" or similar in other tests in 
this file.


Line 413:   FLAGS_check_consensus_info = false;
> also here
Done


Line 432:   FLAGS_check_consensus_info = false;
> and here
Done


Line 451:   FLAGS_check_consensus_info = false;
> and here
Done


Line 453: 
> Would also be interesting to test the duplicate TS UUID case.
You mean the rare + bad case from L123 of ksck_remote.cc? That code path can't 
be tested here as that situation is dealt with by the RemoteKsckTabletServer, 
not by Ksck.

What would happen, though, is that one of the duplicates' (same ts_id and 
tablet id) consensus info would be left out of the info provided to Ksck. I 
think it'd complicate the code to cover that rare edge case nicely, but it is 
easy to spit out a warning if we run into a duplicate. I'll do that.


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

Line 63: using std::left;
> warning: using decl 'left' is unused [misc-unused-using-decls]
Done


PS7, Line 623: peers_from_master
> nit: peer_uuids_from_master might be clearer since when I read peer I think
Done


PS7, Line 624:   std::transform(tablet->replicas().cbegin(), 
tablet->replicas().cend(),
             :                  peers_from_master.begin(),
             :                  [](const std::shared_ptr<KsckTabletReplica>& r) 
{ return r->ts_uuid(); });
> nit: This could be more simply written as a for-each loop:
Done


PS7, Line 627: boost::none
> Doesn't the master know the OpId index?
In its cmeta for the tablet I would guess so but that's not actually info that 
ksck gathers presently. KsckRemoteMaster is sneaky and gets its info about the 
master's POV on tablets by requesting scan tokens for a scan it'll never do. 
We'd have to refactor a fair bit more to add in the term/opid.


PS7, Line 627: -1
> nit: magic number. Also, doesn't the master know the term?
Magic number removed by using an enum. See above for why the master doesn't 
have the term here.


PS7, Line 649: strings::Substitute("$0:$1", ts->uuid(), tablet->id())
> as noted elsewhere this seems hacky compared to:
Done


PS7, Line 658: peers_from_replica
> same as above on naming
Done


PS7, Line 659:         std::transform(peers.begin(), peers.end(), 
peers_from_replica.begin(),
             :                        [](const consensus::RaftPeerPB& pb) { 
return pb.permanent_uuid(); });
> nit: same as above regarding functional vs. iterative but I guess it's also
Done


PS7, Line 693:     conflicting_states = std::count_if(replica_states.cbegin(), 
replica_states.cend(),
             :         [&](const ReplicaState& r) -> bool { return 
r.consensus_state != master_config; });
> nit: this is quite wordy compared to:
Done


Line 728:   } else if (conflicting_states > 0) {
> It seems quite easy to achieve this on a large cluster, which may be moving
I think that ksck "cries wolf" whenever there's nominal tablet movement going 
on in a cluster-- under-replication and consensus conflicts. Maybe later we can 
do a more thorough job of separating out meaningful warnings; for now, users 
can re-run ksck to see if the problem persists (targeting a specific table or 
tablet if needed).


PS7, Line 778: all_peers
> nit: peer_uuid_mapping?
Done


PS7, Line 795:     // A lambda to print out consensus from the point of view of 
the master and each replica.
             :     auto print_config = [&all_peers](const string& name,
             :                                      const 
boost::optional<KsckConsensusState>& config)
> Perhaps this lambda can be extracted into a function? Ksck::VerifyTablet() 
Done


Line 815:       if (config->term() >= 0) {
> Seems like term should be an optional
Done


PS7, Line 817: config->opid_index().get() >= 0
> Do we ever expect a negative number here? I don't think that's possible.
Done


PS7, Line 820:         Out() << Substitute("  $0.$1", config->term(), opid_str);
> I don't like this layout because this looks like this was the last log entr
Done


PS7, Line 834: "master"
> perhaps "config from master" to make this easier to understand?
Done


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

PS7, Line 97: bool pending
> can we make this an enum?
Good idea. I think if we have three types: MASTER, COMMITTED, and PENDING, it 
should make the class easier to understand.


Line 109:   bool pending() const {
> Needs API docs on the methods in this class, just a one liner like:
N/A. Removed method


PS7, Line 117: boost::optional<int64_t>
> const boost::optional<int64_t>& ?
N/A. Removed method


PS7, Line 121: const boost::optional<std::string> 
> const boost::optional<std::string>& ?
N/A. Removed method


Line 129:   bool operator==(const KsckConsensusState &other) const {
> Would be useful to doc this equality operator
Done


Line 130:     bool same_except_term = leader_uuid_ == other.leader_uuid_ && 
peers_ == other.peers_;
> why not check pending_?
pending_ removed in favor of enum type; type is now checked


Line 131:     if (term_ >= 0 && other.term_ >= 0) {
> why not just term_ == other.term_ ?
N/A with addition of MASTER type


Line 137:  private:
> It looks like we can make all of these member variables const because they'
I made it struct since that seems simpler. I want the copy assign operator 
because of ksck's logic of building a replica state one bit at a time, so I 
didn't make the fields const.


Line 139:   int64_t term_;
> Since we are using a magic number -1 somewhere in ksck.cc maybe we should m
Done


PS7, Line 142: peers_
> Not sure what this is. How about peer_uuids_ ?
Done


PS7, Line 142: std::set
> I guess using a std::set here saves us a sort if we want to print these pee
Yup. That's the idea.


PS7, Line 238: std::string
> Hmm, I think it would be a little cleaner if we keyed this map by std::pair
Done


PS7, Line 247: FetchConsensusInfo
> nit: Maybe FetchConsensusState() would be more clear
Done


PS7, Line 444: CONFLICTED
> nit: maybe CONSENSUS_MISMATCH? Also add a comment to document its meaning.
Done


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

Line 116:   for (const auto& entry : tablet_status_map_) {
> Is there a reason we don't just ask for all of them?
No reason. My bad-- ksck lets you filter which tables to look at, but that is 
done at a higher level than this, so we will always gather all the info here.


PS7, Line 122: Don't crash
> Seems like we should warn in this case? Or perhaps detect this and warn els
Came to the same conclusion while looking at another one of your comments 
before I saw this one. Done.


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/ksck_remote.h
File src/kudu/tools/ksck_remote.h:

PS7, Line 58: FetchConsensusInfo
> nit: Maybe FetchConsensusState()?
Done. I also renamed the flag from check_consensus_info to 
check_consensus_state.


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tools/tool_action_cluster.cc
File src/kudu/tools/tool_action_cluster.cc:

PS7, Line 138: check_consensus_info
> how about just --consensus ?
Done


http://gerrit.cloudera.org:8080/#/c/6772/7/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

PS7, Line 1067: req->tablet_ids()
> Is this is empty we can get them all via tablet_manager_->GetTabletReplicas
Done


Line 1069:     if (!tablet_manager_->GetTabletReplica(tablet_id, 
&tablet_replica).ok()) {
> As mentioned elsewhere, this excludes tombstoned tablets. I actually think 
I agree we should leave it out and perhaps add something in later to get it 
later.


PS7, Line 1079:     ConsensusStatePB state = consensus->ConsensusState();
              :     tablet_info->mutable_cstate()->Swap(&state);
> do this in one line like:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6772
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16e4de09821b372c3773b4ade3fd9e37ab818808
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-HasComments: Yes

Reply via email to