[jira] [Assigned] (KAFKA-10655) Raft leader should resign after write failures
[ https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Chen reassigned KAFKA-10655:
-----------------------------------

    Assignee: Boyang Chen  (was: Jason Gustafson)

> Raft leader should resign after write failures
> ----------------------------------------------
>
>                 Key: KAFKA-10655
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10655
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Boyang Chen
>            Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each
> write assumes that all previous writes are either committed or will
> eventually become committed. In order to protect this assumption, the
> controller must not accept additional writes in the same epoch if a preceding
> write has failed. Instead, it should resign so that another leader can be
> elected. There are basically three classes of failures that we consider:
> 1. Serialization/state errors. Any unexpected write errors should be treated
> as fatal. The leader should gracefully resign and the process should shut down.
> 2. Disk IO errors. Similarly, the leader should resign (gracefully if
> possible) and the process should shut down.
> 3. Commit failures. If the leader is unable to commit data after some time,
> then it should gracefully resign, but the process should not exit.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (KAFKA-10655) Raft leader should resign after write failures
[ https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson reassigned KAFKA-10655:
---------------------------------------

    Assignee: Jason Gustafson

> Raft leader should resign after write failures
> ----------------------------------------------
>
>                 Key: KAFKA-10655
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10655
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each
> write assumes that all previous writes are either committed or will
> eventually become committed. In order to protect this assumption, the
> controller must not accept additional writes in the same epoch if a preceding
> write has failed. Instead, it should resign so that another controller can be
> elected. There are basically three classes of failures that we consider:
> 1. Serialization/state errors. Any unexpected write errors should be
> treated as fatal. The leader should gracefully resign and the process should
> shut down.
> 2. Disk IO errors. Similarly, the leader should resign (gracefully if
> possible) and the process should shut down.
> 3. Commit failures. If the leader is unable to commit data after some time,
> then it should gracefully resign, but the process should not exit.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
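
For readers skimming the thread, the policy in the issue description can be sketched roughly as follows. This is only an illustration of the three failure classes and the "no further writes in the same epoch" rule as described above. Every name here (WriteFailurePolicy, FailureClass, Reaction, EpochWriteGuard) is hypothetical and does not come from the Kafka code base or from any patch attached to this issue.

```java
/** Hypothetical sketch of the policy in KAFKA-10655; not Kafka's actual API. */
final class WriteFailurePolicy {

    /** The three failure classes listed in the issue description. */
    enum FailureClass { SERIALIZATION_OR_STATE_ERROR, DISK_IO_ERROR, COMMIT_FAILURE }

    /** What the leader does in response to a failed write (illustrative only). */
    static final class Reaction {
        final boolean resign;            // give up leadership for this epoch
        final boolean gracefulRequired;  // false means "gracefully if possible"
        final boolean shutDownProcess;   // whether the process should exit

        Reaction(boolean resign, boolean gracefulRequired, boolean shutDownProcess) {
            this.resign = resign;
            this.gracefulRequired = gracefulRequired;
            this.shutDownProcess = shutDownProcess;
        }
    }

    /** Maps each failure class to the reaction described in the issue. */
    static Reaction reactionFor(FailureClass failure) {
        switch (failure) {
            case SERIALIZATION_OR_STATE_ERROR:
                return new Reaction(true, true, true);   // resign gracefully, shut down
            case DISK_IO_ERROR:
                return new Reaction(true, false, true);  // resign if possible, shut down
            case COMMIT_FAILURE:
                return new Reaction(true, true, false);  // resign gracefully, keep running
            default:
                throw new IllegalArgumentException("Unknown failure class: " + failure);
        }
    }

    /** Rejects further writes in an epoch once any write in that epoch has failed. */
    static final class EpochWriteGuard {
        private final int epoch;
        private boolean writeFailed = false;

        EpochWriteGuard(int epoch) {
            this.epoch = epoch;
        }

        /** A write is accepted only for this guard's epoch and only before any failure. */
        boolean canAccept(int writeEpoch) {
            return writeEpoch == epoch && !writeFailed;
        }

        /** Records the failure and returns the reaction the leader should take. */
        Reaction onWriteFailure(FailureClass failure) {
            writeFailed = true;
            return reactionFor(failure);
        }
    }
}
```

In this sketch the guard is scoped to one epoch, so a newly elected leader would start with a fresh guard. That choice is an assumption made for the example, not something stated in the issue.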