Re: [jira] [Created] (KAFKA-10314) KafkaStorageException on reassignment when offline log directories exist

2020-08-10 Thread Noa Resare
I guess it might be time to nag a bit about this, according to the contributing
code changes instructions :) I opened a pull request (with test) 6 days ago
that resolves this issue for me. I would be delighted to have a review or two
of this tiny change.

cheers
noa

> On 27 Jul 2020, at 16:46, Noa Resare (Jira) wrote:
> 
> Noa Resare created KAFKA-10314:
> --
> 
> Summary: KafkaStorageException on reassignment when offline log directories exist
> Key: KAFKA-10314
> URL: https://issues.apache.org/jira/browse/KAFKA-10314
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.5.0
> Reporter: Noa Resare
> 
> 
> If a reassignment of a partition is triggered to a broker with an offline 
> directory, the new broker will fail to follow, instead raising a 
> KafkaStorageException which causes the reassignment to stall indefinitely. 
> The error message we see is the following:
> 
> {{[2020-07-23 13:11:08,727] ERROR [Broker id=1] Skipped the become-follower 
> state change with correlation id 14 from controller 1 epoch 1 for partition 
> t2-0 (last update controller epoch 1) with leader 2 since the replica for the 
> partition is offline due to disk error 
> org.apache.kafka.common.errors.KafkaStorageException: Can not create log for 
> t2-0 because log directories /tmp/kafka/d1 are offline (state.change.logger)}}
> 
> It seems to me that unless the partition in question already existed in the 
> offline log directory, a better behaviour would be simply to assign the 
> partition to one of the available log directories.
> 
> The conditional in 
> [LogManager.scala:769|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/log/LogManager.scala#L769]
>  was introduced to prevent the issue in 
> [KAFKA-4763|https://issues.apache.org/jira/browse/KAFKA-4763], where 
> partitions in offline log directories would be re-created in an online 
> directory as soon as a LeaderAndISR message was processed. However, the 
> semantics of isNew seem to differ between LogManager (the replica is new on 
> this broker) and 
> [KafkaController.scala|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/controller/KafkaController.scala#L879]
>  (where isNew seems to refer to whether the topic partition itself is new; 
> all followers get {{isNew=false}}).
> 
> 
> 
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)



[jira] [Created] (KAFKA-10314) KafkaStorageException on reassignment when offline log directories exist

2020-07-27 Thread Noa Resare (Jira)
Noa Resare created KAFKA-10314:
--

Summary: KafkaStorageException on reassignment when offline log directories exist
Key: KAFKA-10314
URL: https://issues.apache.org/jira/browse/KAFKA-10314
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.5.0
Reporter: Noa Resare


If a reassignment of a partition is triggered to a broker with an offline 
directory, the new broker will fail to follow, instead raising a 
KafkaStorageException which causes the reassignment to stall indefinitely. The 
error message we see is the following:

{{[2020-07-23 13:11:08,727] ERROR [Broker id=1] Skipped the become-follower 
state change with correlation id 14 from controller 1 epoch 1 for partition 
t2-0 (last update controller epoch 1) with leader 2 since the replica for the 
partition is offline due to disk error 
org.apache.kafka.common.errors.KafkaStorageException: Can not create log for 
t2-0 because log directories /tmp/kafka/d1 are offline (state.change.logger)}}

It seems to me that unless the partition in question already existed in the 
offline log directory, a better behaviour would be simply to assign the 
partition to one of the available log directories.
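
The behaviour proposed above can be sketched roughly as follows. This is a 
minimal Python model, not Kafka code: pick_log_dir, StorageError and the 
may_exist_offline flag are hypothetical names introduced for illustration, and 
taking the first online directory is a simplifying assumption.

```python
from typing import List


class StorageError(Exception):
    """Stand-in for KafkaStorageException (hypothetical, not the Kafka class)."""


def pick_log_dir(partition: str,
                 online_dirs: List[str],
                 offline_dirs: List[str],
                 may_exist_offline: bool) -> str:
    """Return the directory in which to create the log for `partition`."""
    # Refuse only when the replica might already live in an offline
    # directory; re-creating it elsewhere is the KAFKA-4763 problem.
    if offline_dirs and may_exist_offline:
        raise StorageError(
            "Can not create log for %s because log directories %s are offline"
            % (partition, ",".join(offline_dirs)))
    if not online_dirs:
        raise StorageError("No online log directory for %s" % partition)
    # Assumption: take the first online directory; a real broker may
    # balance across directories instead.
    return online_dirs[0]
```

With one offline and one online directory, a replica that cannot already exist 
on this broker is placed in the online directory, while a replica that might 
live in the offline directory is still rejected.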

The conditional in 
[LogManager.scala:769|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/log/LogManager.scala#L769]
 was introduced to prevent the issue in 
[KAFKA-4763|https://issues.apache.org/jira/browse/KAFKA-4763], where partitions 
in offline log directories would be re-created in an online directory as soon 
as a LeaderAndISR message was processed. However, the semantics of isNew seem 
to differ between LogManager (the replica is new on this broker) and 
[KafkaController.scala|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/controller/KafkaController.scala#L879]
 (where isNew seems to refer to whether the topic partition itself is new; all 
followers get {{isNew=false}}).
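
To make the mismatch concrete, here is a small Python model of the two 
readings of isNew. It is only a sketch: controller_is_new and 
broker_may_create_log are hypothetical helpers, and the broker-side guard is 
inferred from the error message above rather than copied from 
LogManager.scala.

```python
from typing import List


def controller_is_new(partition_newly_created: bool) -> bool:
    # Controller-side meaning as read from KafkaController.scala above:
    # the flag describes the topic partition, not the replica.
    return partition_newly_created


def broker_may_create_log(is_new: bool, offline_dirs: List[str]) -> bool:
    # Broker-side guard as inferred from the error message: with any
    # offline directory, only replicas flagged as new may be created.
    return is_new or not offline_dirs


# Reassigning the existing partition t2-0 to a broker with one offline dir.
is_new = controller_is_new(partition_newly_created=False)
stalls = not broker_may_create_log(is_new, ["/tmp/kafka/d1"])
```

Here stalls ends up True: the partition already exists, so the controller 
sends isNew=false, and the broker refuses to create the replica even though 
the replica is new on that broker.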


