Alexey Serbin has posted comments on this change.

Change subject: [catalog_manager] proper handling of catalog shutdown
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6170/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS2, Line 458:  {
             :           bool error_is_not_leader;
             :           const Status& s =
             :               
catalog_manager_->TryGenerateNewTskUnlocked(&error_is_not_leader);
             :           if (!s.ok()) {
             :             LOG(WARNING) << "Failed to add new TSK: " << 
s.ToString();
             :             if (!error_is_not_leader) {
             :               // Do not expect any other errors besides 
NOT_THE_LEADER error.
             :               CHECK_OK(s);
             :             }
             :           }
             :         }
> say that the error is something else (like the machine is partitioned) does
AFAIK, your understanding is correct -- it might happen that checks in the 
ScopedLeaderSharedLock ctor went fine, but then situation might change down the 
road.  For example, the master could lose its leadership role while holding 
that shared lock -- there is no guarantee it will stay a master up to the very 
end of the operation which holds that lock.

I think you are correct -- we shouldn't crash the master server if we can let 
it continue working knowing its operation  in such a state does not bring 
inconsistency into the clustered master.  This is exactly the case: if master 
some error happened and the master cannot write the newly generated TSK into 
the system table, we should not let it run only if it stays the leader master 
AND no any active TSK is left.  I need to figure out how to check that master 
is still a leader (the latter part with TSK it clear to me).  Probably, I can 
use the underlying system_catalog_'s consensus implementation to figure that 
out.

This is a background task which runs on its own -- I'm not sure it's possible 
to implement the behavior you described.  However, I might be missing something.


PS2, Line 751:   if (!(s.ok() || s.IsNotFound())) {
             :     return s;
             :   }
> could you leave a comment that we only handle the OK and NotFound cases bel
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to