[jira] [Assigned] (KAFKA-10655) Raft leader should resign after write failures

2020-11-17, Boyang Chen (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boyang Chen reassigned KAFKA-10655:
---

Assignee: Boyang Chen  (was: Jason Gustafson)

> Raft leader should resign after write failures
> --
>
> Key: KAFKA-10655
> URL: https://issues.apache.org/jira/browse/KAFKA-10655
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Jason Gustafson
> Assignee: Boyang Chen
> Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each 
> write assumes that all previous writes are either committed or will 
> eventually become committed. In order to protect this assumption, the 
> controller must not accept additional writes in the same epoch if a preceding 
> write has failed. Instead, it should resign so that another leader can be 
> elected. There are basically three classes of failures that we consider:
> 1. Serialization/state errors. Any unexpected write errors should be treated 
> as fatal. The leader should gracefully resign and the process should shut down.
> 2. Disk IO errors. Similarly, the leader should resign (gracefully if
> possible) and the process should shut down.
> 3. Commit failures. If the leader is unable to commit data after some time, 
> then it should gracefully resign, but the process should not exit.
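A minimal Python sketch of the policy described above, assuming a single in-memory leader for one epoch. The names `WriteFailure`, `Resignation`, `FailureAction`, `POLICY`, `LeaderState`, and `NotLeaderError` are hypothetical illustrations, not Kafka's Raft API. The sketch shows two things from the description: a failed write fences the current epoch so no later write in that epoch is accepted, and each of the three failure classes maps to a resignation mode and to whether the process exits.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class WriteFailure(Enum):
    """Hypothetical names for the three failure classes in the issue."""
    SERIALIZATION_OR_STATE = auto()  # class 1: unexpected serialization/state errors
    DISK_IO = auto()                 # class 2: disk IO errors
    COMMIT_TIMEOUT = auto()          # class 3: data not committed after some time


class Resignation(Enum):
    GRACEFUL = auto()              # resign cleanly
    GRACEFUL_IF_POSSIBLE = auto()  # try to resign cleanly; a failing disk may prevent it


@dataclass(frozen=True)
class FailureAction:
    resignation: Resignation
    shutdown: bool  # True if the process should exit after resigning


# Every class resigns; only classes 1 and 2 also shut the process down.
POLICY = {
    WriteFailure.SERIALIZATION_OR_STATE: FailureAction(Resignation.GRACEFUL, shutdown=True),
    WriteFailure.DISK_IO: FailureAction(Resignation.GRACEFUL_IF_POSSIBLE, shutdown=True),
    WriteFailure.COMMIT_TIMEOUT: FailureAction(Resignation.GRACEFUL, shutdown=False),
}


class NotLeaderError(Exception):
    """Raised for writes attempted after the current epoch has been fenced."""


class LeaderState:
    """Hypothetical in-memory leader for a single epoch."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch
        self.log: list[bytes] = []
        self.failure: WriteFailure | None = None  # first failure seen in this epoch

    def append(self, record: bytes) -> int:
        # Later writes assume earlier ones commit, so once any write in this
        # epoch has failed, no further write may be accepted in the same epoch.
        if self.failure is not None:
            raise NotLeaderError(
                f"epoch {self.epoch} fenced after {self.failure.name}")
        self.log.append(record)
        return len(self.log) - 1  # offset of the new record

    def on_write_failure(self, failure: WriteFailure) -> FailureAction:
        # Fence the epoch on the first failure; each failure still maps to
        # the action for its own class.
        if self.failure is None:
            self.failure = failure
        return POLICY[failure]


if __name__ == "__main__":
    leader = LeaderState(epoch=5)
    leader.append(b"rec-0")
    action = leader.on_write_failure(WriteFailure.COMMIT_TIMEOUT)
    print(action.resignation.name, action.shutdown)  # prints: GRACEFUL False
    try:
        leader.append(b"rec-1")
    except NotLeaderError as err:
        print(err)  # prints: epoch 5 fenced after COMMIT_TIMEOUT
```

Keeping the mapping in one table makes the split explicit: classes 1 and 2 resign and then exit, while class 3 resigns and keeps the process running. In a real controller the resignation itself would go through the Raft layer so a new leader can be elected, which this sketch does not model.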



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10655) Raft leader should resign after write failures

2020-10-28, Jason Gustafson (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson reassigned KAFKA-10655:
---

Assignee: Jason Gustafson

> Raft leader should resign after write failures
> --
>
> Key: KAFKA-10655
> URL: https://issues.apache.org/jira/browse/KAFKA-10655
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Major
>
> The controller's state machine relies on strong ordering guarantees. Each 
> write assumes that all previous writes are either committed or will 
> eventually become committed. In order to protect this assumption, the 
> controller must not accept additional writes in the same epoch if a preceding 
> write has failed. Instead, it should resign so that another controller can be 
> elected. There are basically three classes of failures that we consider:
> 1. Serialization/state errors. Any unexpected write errors should be
> treated as fatal. The leader should gracefully resign and the process should
> shut down.
> 2. Disk IO errors. Similarly, the leader should resign (gracefully if
> possible) and the process should shut down.
> 3. Commit failures. If the leader is unable to commit data after some time, 
> then it should gracefully resign, but the process should not exit.
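Class 3 above leaves "after some time" unspecified. One way to detect it, sketched here in Python under stated assumptions, is to record when each uncommitted offset was appended and compare the oldest one against a timeout. The `CommitTimeoutMonitor` class, its `timeout_ms` parameter, the caller-supplied millisecond clock, and the convention that offsets below the high watermark are committed are all assumptions for illustration, not details taken from the issue.

```python
from __future__ import annotations


class CommitTimeoutMonitor:
    """Hypothetical detector for failure class 3 (commit failures).

    Assumes offsets below the high watermark are committed and that the
    caller supplies a monotonic clock in milliseconds.
    """

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.pending: dict[int, int] = {}  # uncommitted offset -> append time (ms)

    def on_append(self, offset: int, now_ms: int) -> None:
        self.pending[offset] = now_ms

    def on_high_watermark(self, high_watermark: int) -> None:
        # Everything below the new high watermark is committed; stop tracking it.
        for offset in [o for o in self.pending if o < high_watermark]:
            del self.pending[offset]

    def should_resign(self, now_ms: int) -> bool:
        # True once the oldest uncommitted write has waited at least timeout_ms.
        # Per class 3, the caller resigns gracefully but keeps the process running.
        if not self.pending:
            return False
        oldest = min(self.pending.values())
        return now_ms - oldest >= self.timeout_ms


if __name__ == "__main__":
    monitor = CommitTimeoutMonitor(timeout_ms=100)
    monitor.on_append(0, now_ms=0)
    monitor.on_append(1, now_ms=10)
    monitor.on_high_watermark(1)              # offset 0 committed, offset 1 pending
    print(monitor.should_resign(now_ms=105))  # prints: False (waited 95 ms)
    print(monitor.should_resign(now_ms=110))  # prints: True (waited 100 ms)
```

A real leader would also need to clear this state when it resigns or a new epoch begins; the sketch leaves that out.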


