[ 
https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066476#comment-17066476
 ] 

heng zhao commented on KUDU-3082:
---------------------------------

You can try "remote_replica delete" to delete the affected tablet replica, 
which forces it to be re-replicated from the leader.

[https://kudu.apache.org/docs/troubleshooting.html#cfile_corruption]

I solved the same problem this way on Kudu 1.7.0-cdh5.16.2.
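
As a rough sketch of the suggestion above, the command line can be assembled 
and reviewed before it is run. The form used here is 
"kudu remote_replica delete <tserver_address> <tablet_id> <reason>", as 
described on the troubleshooting page linked above. The tablet server address 
(host name plus port 7050), the reason string, and the choice of which replica 
to delete are assumptions for illustration only; the helper names are 
hypothetical and not part of Kudu.

```python
import shlex
import subprocess


def build_delete_command(tserver_address, tablet_id, reason):
    """Return the argv list for `kudu remote_replica delete`."""
    return ["kudu", "remote_replica", "delete", tserver_address, tablet_id, reason]


def delete_replica(tserver_address, tablet_id, reason, dry_run=True):
    """Print-friendly dry run by default; only runs the CLI when dry_run=False."""
    argv = build_delete_command(tserver_address, tablet_id, reason)
    if dry_run:
        # Return the shell-quoted command so it can be reviewed first.
        return " ".join(shlex.quote(arg) for arg in argv)
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return None


if __name__ == "__main__":
    # Placeholder values: the address and reason are assumed, and the tablet
    # ID is simply the first one listed in the report.
    print(delete_replica(
        "kudu-ts27:7050",
        "7404240f458f462d92b6588d07583a52",
        "stuck config change",
    ))
```

Keeping the dry run as the default makes it harder to delete a replica by 
accident; check the ksck output again before passing dry_run=False.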

> tablets in "CONSENSUS_MISMATCH" state for a long time
> -----------------------------------------------------
>
>                 Key: KUDU-3082
>                 URL: https://issues.apache.org/jira/browse/KUDU-3082
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.10.1
>            Reporter: YifanZhang
>            Priority: Major
>
> Lately we found that a few tablets in one of our clusters are unhealthy. 
> The ksck output looks like this:
>  
> {code:java}
> Tablet Summary
> Tablet 7404240f458f462d92b6588d07583a52 of table '' is conflicted: 3 
> replicas' active configs disagree with the leader master's
>   7380d797d2ea49e88d71091802fb1c81 (kudu-ts26): RUNNING
>   d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
>   A = 7380d797d2ea49e88d71091802fb1c81
>   B = d1952499f94a4e6087bee28466fcb09f
>   C = 47af52df1adc47e1903eb097e9c88f2e
>   D = 08beca5ed4d04003b6979bf8bac378d2
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B   C*       |              |              | Yes
>  A             | A   B   C*       | 5            | -1           | Yes
>  B             | A   B   C        | 5            | -1           | Yes
>  C             | A   B   C*  D~   | 5            | 54649        | No
> Tablet 6d9d3fb034314fa7bee9cfbf602bcdc8 of table '' is conflicted: 2 
> replicas' active configs disagree with the leader master's
>   d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
>   5a8aeadabdd140c29a09dabcae919b31 (kudu-ts21): RUNNING
> All reported replicas are:
>   A = d1952499f94a4e6087bee28466fcb09f
>   B = 47af52df1adc47e1903eb097e9c88f2e
>   C = 5a8aeadabdd140c29a09dabcae919b31
>   D = 14632cdbb0d04279bc772f64e06389f9
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B*  C        |              |              | Yes
>  A             | A   B*  C        | 5            | 5            | Yes
>  B             | A   B*  C   D~   | 5            | 96176        | No
>  C             | A   B*  C        | 5            | 5            | Yes
> Tablet bf1ec7d693b94632b099dc0928e76363 of table '' is conflicted: 1 
> replicas' active configs disagree with the leader master's
>   a9eaff3cf1ed483aae849549999d649a (kudu-ts23): RUNNING
>   f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
>   A = a9eaff3cf1ed483aae849549999d649a
>   B = f75df4a6b5ce404884313af5f906b392
>   C = 47af52df1adc47e1903eb097e9c88f2e
>   D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B   C*       |              |              | Yes
>  A             | A   B   C*       | 1            | -1           | Yes
>  B             | A   B   C*       | 1            | -1           | Yes
>  C             | A   B   C*  D~   | 1            | 2            | No
> Tablet 3190a310857e4c64997adb477131488a of table '' is conflicted: 3 
> replicas' active configs disagree with the leader master's
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
>   f0f7b2f4b9d344e6929105f48365f38e (kudu-ts24): RUNNING
>   f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
> All reported replicas are:
>   A = 47af52df1adc47e1903eb097e9c88f2e
>   B = f0f7b2f4b9d344e6929105f48365f38e
>   C = f75df4a6b5ce404884313af5f906b392
>   D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A*  B   C        |              |              | Yes
>  A             | A*  B   C   D~   | 1            | 1991         | No
>  B             | A*  B   C        | 1            | 4            | Yes
>  C             | A*  B   C        | 1            | 4            | Yes
> {code}
> These tablets did not recover for a couple of days, until we restarted 
> kudu-ts27.
> I found many duplicated log entries on kudu-ts27, such as:
> {code:java}
> I0314 04:38:41.511279 65731 raft_consensus.cc:937] T 
> 7404240f458f462d92b6588d07583a52 P 47af52df1adc47e1903eb097e9c88f2e [term 3 
> LEADER]: attempt to promote peer 08beca5ed4d04003b6979bf8bac378d2: there is 
> already a config change operation in progress. Unable to promote follower 
> until it completes. Doing nothing.
> I0314 04:38:41.751009 65453 raft_consensus.cc:937] T 
> 6d9d3fb034314fa7bee9cfbf602bcdc8 P 47af52df1adc47e1903eb097e9c88f2e [term 5 
> LEADER]: attempt to promote peer 14632cdbb0d04279bc772f64e06389f9: there is 
> already a config change operation in progress. Unable to promote follower 
> until it completes. Doing nothing.
> {code}
> There seem to be some RaftConfig change operations that somehow cannot 
> complete.
>  
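
The consensus matrices quoted above can also be checked mechanically. The 
following is a minimal sketch, assuming the ksck text layout shown in the 
report (five "|"-separated columns, optionally prefixed with "> " from email 
quoting). It lists the config sources whose active config is not committed. 
The function names are hypothetical and not part of any Kudu tool.

```python
def parse_consensus_matrix(text):
    """Parse the rows of a ksck consensus matrix into dictionaries."""
    rows = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").rstrip()  # drop email quote markers
        cells = [cell.strip() for cell in line.split("|")]
        # The separator line has no '|' and the header is not a data row.
        if len(cells) != 5 or cells[0] in ("", "Config source"):
            continue
        source, replicas, term, index, committed = cells
        rows.append({
            "source": source,
            "replicas": replicas.split(),       # e.g. ['A', 'B', 'C*', 'D~']
            "term": int(term) if term else None,
            "index": int(index) if index else None,
            "committed": committed == "Yes",
        })
    return rows


def uncommitted_sources(rows):
    """Return the config sources that report an uncommitted config."""
    return [row["source"] for row in rows if not row["committed"]]


if __name__ == "__main__":
    # Matrix for tablet 7404240f458f462d92b6588d07583a52 from the report.
    sample = """\
>  Config source |     Replicas     | Current term | Config index | Committed?
> ---------------+------------------+--------------+--------------+------------
>  master        | A   B   C*       |              |              | Yes
>  A             | A   B   C*       | 5            | -1           | Yes
>  B             | A   B   C        | 5            | -1           | Yes
>  C             | A   B   C*  D~   | 5            | 54649        | No
"""
    print(uncommitted_sources(parse_consensus_matrix(sample)))
```

For each of the four matrices in the report, the only uncommitted config is 
the one reported by the leader replica on kudu-ts27.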



--
This message was sent by Atlassian Jira
(v8.3.4#803005)