[ 
https://issues.apache.org/jira/browse/SENTRY-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li resolved SENTRY-1813.
---------------------------
       Resolution: Duplicate
    Fix Version/s:     (was: 2.1.0)
                   2.0.0

> LeaderStatusMonitor could get into limbo state upon ZK connection loss
> ----------------------------------------------------------------------
>
>                 Key: SENTRY-1813
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1813
>             Project: Sentry
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Vamsee Yarlagadda
>            Assignee: Vamsee Yarlagadda
>             Fix For: 2.0.0
>
>         Attachments: Screenshot.png
>
>
> During failover testing I noticed that if the Sentry servers lose their 
> connection to ZK, the server that is currently the leader gets into a limbo 
> state, because it interrupts the Curator-LeaderSelector thread, which is 
> never revived in the running Sentry process (unless the process is 
> restarted).
> Relevant code in LeaderStatusMonitor:
> http://github.mtv.cloudera.com/CDH/sentry/blob/cdh5-1.5.1/sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/LeaderStatusMonitor.java#L243-L246
> {code}
>     try {
>       isLeader = true;
>       // Wait until we are interrupted or receive a signal
>       cond.await();
>     } catch (InterruptedException ignored) {
>       Thread.currentThread().interrupt();
>       LOG.info("LeaderStatusMonitor: interrupted");
>     } finally {
>       isLeader = false;
>       lock.unlock();
>       LOG.info("LeaderStatusMonitor: becoming standby");
>     }
> {code}
> I realized that even upon loss of the ZK connection, the Curator framework 
> raises an InterruptedException in LeaderStatusMonitor, whose handler then 
> calls interrupt() on Thread.currentThread(), which is essentially the 
> *Curator-LeaderSelector* thread.
> <SCREENSHOT_ATTACHED>
> So if the LeaderSelector thread is interrupted, this particular Sentry server 
> loses the capability of participating in a leader election in the future. And 
> if this happens to all the Sentry servers in the cluster, any further ZK 
> connection loss could leave the whole cluster in a limbo state.
> While in this state, Sentry no longer reads events from HMS, so users can no 
> longer issue DDL statements such as CREATE. However, GRANT and REVOKE still 
> work, as they don't go through HMSFollower.
>   
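
The catch block quoted above re-asserts the interrupt flag on whatever thread
runs it. The following is a minimal, stand-alone sketch of what that does to a
thread that is expected to keep doing blocking work afterwards. It is plain
JDK code, not Sentry or Curator code: the class name InterruptFlagDemo, the
method run() and the thread name "selector-stand-in" are hypothetical, and
Thread.sleep() merely stands in for any later blocking call. Whether the real
Curator-LeaderSelector thread follows this exact path is what the report
above asserts, not something this sketch verifies.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class InterruptFlagDemo {

    // Returns {flag still set after the handler, next blocking call failed}.
    static boolean[] run() {
        final ReentrantLock lock = new ReentrantLock();
        final Condition cond = lock.newCondition();
        final CountDownLatch waiting = new CountDownLatch(1);
        final boolean[] result = new boolean[2];

        // Hypothetical stand-in for the thread that runs the leadership callback.
        Thread selector = new Thread(() -> {
            lock.lock();
            try {
                waiting.countDown();
                cond.await();                        // same wait as in the quoted code
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();  // re-asserts the interrupt flag
            } finally {
                lock.unlock();
            }
            // The callback has returned, but the thread is still marked interrupted.
            result[0] = Thread.currentThread().isInterrupted();
            try {
                Thread.sleep(10);                    // any later blocking call
                result[1] = false;
            } catch (InterruptedException e) {
                result[1] = true;                    // fails at once because the flag is set
            }
        }, "selector-stand-in");

        selector.start();
        try {
            waiting.await();                         // wait until the thread is about to block
            selector.interrupt();                    // simulate the interrupt on connection loss
            selector.join();                         // join() makes the results visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("flag set after handler: " + r[0]);
        System.out.println("next blocking call interrupted: " + r[1]);
    }
}
```

Both values come back true: once the handler has re-set the flag, the next
blocking call on the same thread throws InterruptedException immediately
instead of waiting.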



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
