[ https://issues.apache.org/jira/browse/KUDU-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432525#comment-15432525 ]
zhangsong edited comment on KUDU-1576 at 8/23/16 9:52 AM:
----------------------------------------------------------
The timeline is as follows. At first, replica 2 is the leader.
At I0819 11:08:39.935573: replica 2 goes down (according to the last line of its kudu-tserver.INFO), which triggers the other two followers to request votes.
At I0819 11:09:02.246912: replica 1 wins the election.
At I0819 11:09:06.675046 20157: replica 3 goes down (according to the last line of its kudu-tserver.INFO).
At W0819 11:09:11.407464: a consensus timeout message related to replica 3 appears in kudu-tserver.INFO on replica 1.

was (Author: brucesz):
Last line of kudu-tserver.INFO on replica 2:
I0819 11:08:39.935573 12831 multi_column_writer.cc:85]...
Last line on replica 3:
I0819 11:09:06.675046 20157 raft_consensus.cc:380] ..
(followed by the same timeline as above)

> raft-config will stay in pending state a long time in node crash situation.
> ---------------------------------------------------------------------------
>
>                 Key: KUDU-1576
>                 URL: https://issues.apache.org/jira/browse/KUDU-1576
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: zhangsong
>
> After two physical nodes crashed, I found that one of my tables was
> read-only. I did some searching and found that both followers of a
> tablet were in the down state. But in the web UI those down followers are
> still listed.
> So I tried to recover the table with the kudu-admin tool's change_config,
> and it failed with the message below:
>
> Pending config: local: false
> peers { permanent_uuid: "515ab1adcbd64081b646a86133f5f60d" member_type: VOTER last_known_addr { host: "one_of_follower" port: 7052 } }
> peers { permanent_uuid: "3a77ef5039f447d29db5a44c92279a7a" member_type: VOTER last_known_addr { host: "current_leader" port: 7052 } }
>
> It seems that after one of the raft-config members went down, while the
> current leader was trying to replicate the new config,
> "515ab1adcbd64081b646a86133f5f60d" crashed. In that case the config stays
> pending indefinitely, because it can never be accepted by a majority.
> It would be better to have some mechanism to fix this, at least manually.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
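The reporter's reasoning about why the config stays pending can be checked with a small quorum calculation. This is a simplified model, not Kudu's implementation: it treats a Raft config as a set of voter UUIDs and assumes an entry commits only when a majority of that config's voters can acknowledge it. The two UUIDs come from the pending config in the report; `OTHER_FOLLOWER`, `majority`, and `can_commit` are hypothetical names introduced only for this illustration.

```python
from typing import Set

# UUIDs taken from the "Pending config" message in the report.
LEADER = "3a77ef5039f447d29db5a44c92279a7a"    # host "current_leader"
FOLLOWER = "515ab1adcbd64081b646a86133f5f60d"  # host "one_of_follower", crashed later
# Hypothetical placeholder: the report does not give the third replica's UUID.
OTHER_FOLLOWER = "third-replica-uuid"


def majority(num_voters: int) -> int:
    """Smallest vote count that is a strict majority of num_voters."""
    return num_voters // 2 + 1


def can_commit(config_voters: Set[str], live_voters: Set[str]) -> bool:
    """True if the live members of this config can form a majority of it."""
    reachable = len(config_voters & live_voters)
    return reachable >= majority(len(config_voters))


# Original three-voter config, and the pending config that drops the
# first follower to go down.
committed_config = {LEADER, FOLLOWER, OTHER_FOLLOWER}
pending_config = {LEADER, FOLLOWER}

# After both node crashes, only the leader is still alive.
live = {LEADER}

print(majority(len(pending_config)))                  # 2
print(can_commit(pending_config, live))               # False
print(can_commit(committed_config, live))             # False
print(can_commit(pending_config, {LEADER, FOLLOWER})) # True: would commit if 515ab... were up
```

Under this model, neither the old three-voter config nor the pending two-voter config can reach a majority with only the leader alive, so no ordinary config change can ever commit. Recovery would need some out-of-band step, such as the manual mechanism the reporter asks for.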