[jira] [Commented] (KAFKA-16082) JBOD: Possible dataloss when moving leader partition

2024-01-10 Thread Stanislav Kozlovski (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805286#comment-17805286 ]

Stanislav Kozlovski commented on KAFKA-16082:
---------------------------------------------

Deeming this not a blocker, as per discussions with [~pprovenzano].

> JBOD: Possible dataloss when moving leader partition
> -----------------------------------------------------
>
> Key: KAFKA-16082
> URL: https://issues.apache.org/jira/browse/KAFKA-16082
> Project: Kafka
> Issue Type: Bug
> Components: jbod
> Affects Versions: 3.7.0
> Reporter: Proven Provenzano
> Assignee: Gaurav Narula
> Priority: Critical
> Fix For: 3.7.1
>
> There is a possible data loss scenario when using JBOD and moving the
> partition leader log from one directory to another on the same broker.
> After the destination log has caught up to the source log, and after the
> broker has sent the updated partition assignment to the controller, the
> broker may accept and commit a new record for the partition. If the broker
> then restarts and the original partition leader log is lost, the
> destination log will not contain the new record.
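To make the race concrete, here is a minimal, self-contained sketch of the
sequence described above, using in-memory lists in place of real logs; every
name in it is illustrative rather than a Kafka API.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class JbodDataLossSketch {
    public static void main(String[] args) {
        // Leader log in dir1 and its future copy in dir2, fully caught up.
        List<String> sourceLog = new ArrayList<>(List.of("r1", "r2"));
        List<String> futureLog = new ArrayList<>(sourceLog);

        // The broker tells the controller that dir2 now owns the partition.
        boolean assignmentUpdated = true;

        // Before the future replica is promoted, the leader commits one more
        // record, which lands only in the source log in dir1.
        sourceLog.add("r3");

        // The broker restarts and dir1 (holding the source log) is lost.
        sourceLog = null;

        // Metadata says dir2 owns the partition, so the broker serves the
        // future log, which never saw r3: a committed record is gone.
        System.out.println("assignment updated: " + assignmentUpdated);
        System.out.println("surviving log: " + futureLog); // prints [r1, r2]
    }
}
{code}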





[jira] [Commented] (KAFKA-16082) JBOD: Possible dataloss when moving leader partition

2024-01-09 Thread Proven Provenzano (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804880#comment-17804880 ]

Proven Provenzano commented on KAFKA-16082:
---------------------------------------------

[~gnarula] added an improvement for the handling of case 3 above: 
https://github.com/apache/kafka/pull/15136






[jira] [Commented] (KAFKA-16082) JBOD: Possible dataloss when moving leader partition

2024-01-09 Thread Proven Provenzano (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804879#comment-17804879 ]

Proven Provenzano commented on KAFKA-16082:
---------------------------------------------

For case 3:

If I understand this correctly, the scenario is that the broker restarts and, 
from the metadata log replay, sees that `dir2` is supposed to own `tp0`. 
However, it doesn't see the log in `dir2` because the failed future replica 
hasn't been renamed, so it creates a new replica for `tp0` in `dir2` and 
populates it with data from other replicas. Can we create a unit test to 
validate this? (A rough sketch of the decision such a test would exercise is 
below.) It may also be possible to reuse the current future replica, so long 
as the broker at restart went through a stage where the leader of the 
partition was moved to a different broker; it could then treat the partition 
as an out-of-sync replica, do the rename, and catch up immediately. Note that 
it cannot do the rename until after the partition leadership has been moved 
away from the broker, in case the broker restarts again.
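A rough, self-contained sketch of the restart-time decision that such a test 
would exercise, modelling a log dir as a plain set of directory names; the 
class and names here are illustrative stand-ins for the real LogManager, not 
Kafka code.

{code:java}
import java.util.HashSet;
import java.util.Set;

public class Case3RestartSketch {
    public static void main(String[] args) {
        // dir2 holds only the abandoned future dir after the crash.
        Set<String> dir2 = new HashSet<>(Set.of("tp0-future"));

        // Metadata replay says dir2 owns tp0, but no tp0 log exists there.
        if (!dir2.contains("tp0") && dir2.contains("tp0-future")) {
            // Safe path: ignore the abandoned future replica and rebuild
            // from the current leader. Reusing the future dir would only be
            // safe once leadership has moved to another broker, as noted
            // above.
            dir2.remove("tp0-future"); // schedule the stale dir for deletion
            dir2.add("tp0");           // fresh replica, filled from the leader
        }
        System.out.println("dir2 after restart: " + dir2); // [tp0]
    }
}
{code}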






[jira] [Commented] (KAFKA-16082) JBOD: Possible dataloss when moving leader partition

2024-01-04 Thread Gaurav Narula (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803205#comment-17803205 ]

Gaurav Narula commented on KAFKA-16082:
---------------------------------------------

We tried to analyse the failure scenarios further, keeping the existing design 
for intra-broker replica movement, and here's a summary of our findings:

Consider a partition `tp0` being moved from `dir1` to `dir2`. To recap, the 
current design:

(i) Waits for the future replica (in `dir2`) to catch up.
(ii) Sends an RPC to the controller to mark `dir2` as the log dir for the 
partition.
(iii) On getting a successful response back from the controller, waits for 
the future replica to catch up once again. Note that we hold locks while 
comparing the LEOs of the current and future replicas.
(iv) When caught up, promotes the future replica by atomically renaming the 
directory in `dir2` to drop the `-future` suffix. We also update broker-local 
caches to note that the partition resides in `dir2` and that no future 
replica exists. Finally, we atomically rename the directory in `dir1` with a 
`-delete` suffix so it can be cleaned up later. (A minimal sketch of this 
promotion step follows the list.)
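A minimal sketch of the promotion in step (iv), assuming a partition's log is 
simply a directory named after it; the `-future` and `-delete` suffixes follow 
the convention above, but the layout and method are illustrative, not Kafka's 
LogManager.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PromoteFutureReplica {
    static void promote(Path dir1, Path dir2, String partition) throws IOException {
        // Promote: atomically drop the -future suffix in dir2...
        Files.move(dir2.resolve(partition + "-future"),
                   dir2.resolve(partition), StandardCopyOption.ATOMIC_MOVE);
        // ...then atomically mark the old log in dir1 for later cleanup.
        // If either rename throws (e.g. the disk failed), the caller sees a
        // storage error: the failure mode analysed in case 1 below.
        Files.move(dir1.resolve(partition),
                   dir1.resolve(partition + "-delete"), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir1 = Files.createTempDirectory("dir1");
        Path dir2 = Files.createTempDirectory("dir2");
        Files.createDirectory(dir1.resolve("tp0"));
        Files.createDirectory(dir2.resolve("tp0-future"));
        promote(dir1, dir2, "tp0");
        System.out.println(Files.exists(dir2.resolve("tp0")));        // true
        System.out.println(Files.exists(dir1.resolve("tp0-delete"))); // true
    }
}
{code}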

Let's consider the following failure categories:

1. Log directory failure during (iv)

This can further be broken down into two scenarios:

(a) `dir2` fails

This would cause the atomic rename of the directory in `dir2` to fail, a 
KafkaStorageException to be propagated up in ReplicaAlterLogDirsThread, and 
the thread to abort. Eventually, `ReplicaManager::maybeUpdateTopicAssignment` 
will run while handling the metadata update from the controller and will 
correct the assignment back to `dir1`.

(b) `dir1` fails

This would cause the atomic rename of the directory in `dir1` to fail and a 
KafkaStorageException to be propagated up in ReplicaAlterLogDirsThread. Since 
the future replica has already been renamed and the caches are up to date, 
`ReplicaManager::maybeUpdateTopicAssignment` will be a no-op. However, when 
the broker is restarted, it will fail during startup because two log dirs 
will exist for the partition, in `dir1` and `dir2`. The error message here is 
clear in indicating that the partition must be removed from the recently 
failed directory (`dir1`). A sketch of such a startup check follows.
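A small sketch of that startup check, under the same directory-per-partition 
assumption as before; the error wording is illustrative, not Kafka's actual 
message.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicatePartitionCheck {
    // Fail startup if any partition directory appears in more than one log dir.
    static void checkNoDuplicates(List<Path> logDirs) throws IOException {
        Map<String, Path> seen = new HashMap<>();
        for (Path logDir : logDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(logDir)) {
                for (Path entry : entries) {
                    String partition = entry.getFileName().toString();
                    Path previous = seen.putIfAbsent(partition, logDir);
                    if (previous != null) {
                        throw new IllegalStateException("Partition " + partition
                                + " found in both " + previous + " and " + logDir
                                + "; remove it from the recently failed dir");
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir1 = Files.createTempDirectory("dir1");
        Path dir2 = Files.createTempDirectory("dir2");
        Files.createDirectory(dir1.resolve("tp0"));
        Files.createDirectory(dir2.resolve("tp0")); // duplicate, as in case 1(b)
        try {
            checkNoDuplicates(List.of(dir1, dir2));
        } catch (IllegalStateException e) {
            System.out.println("startup aborted: " + e.getMessage());
        }
    }
}
{code}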

2. Log directory failure during (iii)

This would result in `replicaAlterLogDirsManager.removeFetcherForPartitions` 
being invoked. Eventually, `ReplicaManager::maybeUpdateTopicAssignment` will 
run while handling the metadata update from the controller and will correct 
the assignment back to `dir1`.

3. Broker crashes during (iii) and starts with an empty `dir1`

The broker catches up with the metadata from the controller and realises that 
`dir2` should own `tp0`. It ignores the failed future replica in `dir2` and 
creates a new future replica there, streaming the logs from the new leader. 
This is safe as long as the new leader was in-sync prior to being elected.

What we overlooked earlier is that 
`ReplicaManager::maybeUpdateTopicAssignment` tries to reconcile the broker's 
state with the controller's, causing the two to converge eventually. So far, 
we're unable to come up with a scenario that loses data. The bug we have so 
far is the failure to remove an abandoned future directory in scenario (3), 
which seems more benign. (A condensed sketch of the reconciliation is below.)
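A condensed sketch of that reconciliation, with plain maps standing in for 
broker and controller state; the types are illustrative, and in the real 
broker the correction is sent via an AssignReplicasToDirs request (KIP-858) 
rather than updated in place.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class AssignmentReconciliation {
    public static void main(String[] args) {
        // The controller's view (from the metadata log) vs. where the log
        // actually lives on the broker after the failed move.
        Map<String, String> controllerView = new HashMap<>(Map.of("tp0", "dir2"));
        Map<String, String> brokerView = Map.of("tp0", "dir1");

        // On each metadata update, converge the controller toward the broker.
        brokerView.forEach((partition, actualDir) -> {
            if (!actualDir.equals(controllerView.get(partition))) {
                // The real broker would send an AssignReplicasToDirs request
                // here; this sketch just records the corrected assignment.
                controllerView.put(partition, actualDir);
            }
        });
        System.out.println(controllerView); // {tp0=dir1}
    }
}
{code}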

I'm curious to hear what others think about these scenarios, and about any 
others they come across. Perhaps someone who's worked closely on 
Partition.scala can pitch in?



