Jean-Daniel Cryans created KUDU-1860:
----------------------------------------

             Summary: ksck doesn't identify tablets that are evicted but still in config
                 Key: KUDU-1860
                 URL: https://issues.apache.org/jira/browse/KUDU-1860
             Project: Kudu
          Issue Type: Bug
          Components: util
    Affects Versions: 1.2.0
            Reporter: Jean-Daniel Cryans
            Priority: Critical


As reported by a user on Slack, ksck can produce misleading output such as:

{noformat}
  ca199fafca544df2a1b2a01be9d5266d (server1:7250): RUNNING [LEADER]
  a077957f627c4758ab5a989aca8a1ca8 (server2:7250): RUNNING
  5c09a555c205482b8131f15b2c249ec6 (server3:7250): bad state
    State:       NOT_STARTED
    Data state:  TABLET_DATA_TOMBSTONED
    Last status: Tablet initializing...
{noformat}

The problem is that, based on reading the logs, server2 had already been 
evicted from the configuration, but the eviction was never committed in the 
config (which contains servers 1 and 3) since there's really only 1 server 
left out of 3.

Ideally, ksck should ask each server what it thinks the configuration is and 
check whether that differs from what's in the master. As it stands, the output 
makes it look like we're missing 1 replica, when in reality this is a broken 
tablet.
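
For illustration only, the comparison could look roughly like the Python sketch 
below. It is not ksck code and uses no Kudu API: find_config_mismatches, the 
short replica names, and the sample data are all hypothetical, and it assumes 
the config reported by each replica has already been fetched somehow.

```python
from typing import Dict, Iterable, List, Optional


def find_config_mismatches(
    master_config: Iterable[str],
    replica_views: Dict[str, Optional[Iterable[str]]],
) -> List[str]:
    """Compare the master's config with the config each replica reports.

    master_config: replica IDs the master lists for the tablet.
    replica_views: replica ID -> IDs that replica believes are in the
                   config, or None if the replica reported nothing.
    Hypothetical helper for illustration; not part of ksck.
    """
    master = frozenset(master_config)
    problems = []
    for replica in sorted(replica_views):
        view = replica_views[replica]
        if view is None:
            problems.append(f"{replica}: no config reported")
            continue
        reported = frozenset(view)
        if reported != master:
            missing = sorted(master - reported)  # in master, not in replica's view
            extra = sorted(reported - master)    # in replica's view, not in master
            problems.append(
                f"{replica}: config differs from master "
                f"(missing={missing}, extra={extra})"
            )
    return problems


# Made-up data, not taken from the report above.
master_view = ["s1", "s2", "s3"]
views = {
    "s1": ["s1", "s3"],        # disagrees with the master
    "s2": ["s1", "s2", "s3"],  # agrees with the master
    "s3": None,                # could not report a config
}
for line in find_config_mismatches(master_view, views):
    print(line)
```

Any non-empty result would be a hint that the tablet needs a closer look, 
rather than just being one replica short.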



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
