[jira] [Created] (KAFKA-12241) Partition offline when ISR shrinks to leader and LogDir goes offline

2021-01-26 Thread Noa Resare (Jira)
Noa Resare created KAFKA-12241:
--

 Summary: Partition offline when ISR shrinks to leader and LogDir 
goes offline
 Key: KAFKA-12241
 URL: https://issues.apache.org/jira/browse/KAFKA-12241
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.4.2
Reporter: Noa Resare


This is a long-standing issue that we haven't previously tracked in a JIRA. We 
experience this maybe once per month on average and we see the following 
sequence of events:
 # A broker shrinks ISR to just itself for a partition. However, the followers 
are at highWatermark: {{[Partition PARTITION broker=601] Shrinking ISR from 
1501,601,1201,1801 to 601. Leader: (highWatermark: 432385279, endOffset: 
432385280). Out of sync replicas: (brokerId: 1501, endOffset: 432385279) 
(brokerId: 1201, endOffset: 432385279) (brokerId: 1801, endOffset: 432385279).}}
 # Around this time (in the case I have in front of me, 20ms earlier according 
to the logging subsystem) LogDirFailureChannel captures an Error while 
appending records to PARTITION due to a readonly filesystem.
 # ~20 ms after the ISR shrink, LogDirFailureHandler offlines the partition: 
Logs for partitions LIST_OF_PARTITIONS are offline and logs for future 
partitions are offline due to failure on log directory /kafka/d6/data 
 # ~50ms later the controller marks the replicas on broker 601 as offline: 
[Controller id=901] Mark replicas LIST_OF_PARTITIONS on broker 601 as offline 
 # ~2ms later the controller offlines the partition: [Controller id=901 
epoch=4] Changed partition PARTITION state from OnlinePartition to 
OfflinePartition 

To resolve this, someone needs to manually enable unclean leader election, which 
is obviously not ideal. Since the leader knows that all the followers that were 
removed from the ISR are at highWatermark, maybe it could convey that to the 
controller in the LeaderAndIsr response so that the controller could pick a new 
leader without having to resort to unclean leader election.
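
The idea above could be sketched roughly as follows. This is a minimal, self-contained Java illustration and not Kafka's controller code: {{ReplicaState}}, {{caughtUpReplicas}} and {{electLeader}} are hypothetical names, and carrying the caught-up set in the LeaderAndIsr response is the suggestion made in this issue, not an existing protocol field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class LeaderHintSketch {
    /** Hypothetical view of one follower as seen by the leader when it shrinks the ISR. */
    static final class ReplicaState {
        final int brokerId;
        final long endOffset;

        ReplicaState(int brokerId, long endOffset) {
            this.brokerId = brokerId;
            this.endOffset = endOffset;
        }
    }

    /** Followers dropped from the ISR whose log end offset has reached the high watermark. */
    static List<Integer> caughtUpReplicas(long highWatermark, List<ReplicaState> removedFromIsr) {
        List<Integer> caughtUp = new ArrayList<>();
        for (ReplicaState replica : removedFromIsr) {
            if (replica.endOffset >= highWatermark) {
                caughtUp.add(replica.brokerId);
            }
        }
        return caughtUp;
    }

    /** Controller side: first live broker from the hint, or empty if there is no safe candidate. */
    static Optional<Integer> electLeader(List<Integer> caughtUp, Set<Integer> liveBrokers) {
        return caughtUp.stream().filter(liveBrokers::contains).findFirst();
    }

    public static void main(String[] args) {
        // Offsets taken from the log line quoted in the sequence of events.
        long highWatermark = 432385279L;
        List<ReplicaState> removed = List.of(
            new ReplicaState(1501, 432385279L),
            new ReplicaState(1201, 432385279L),
            new ReplicaState(1801, 432385279L));
        List<Integer> hint = caughtUpReplicas(highWatermark, removed);
        // Broker 601 has lost its log directory; the other brokers are still alive.
        System.out.println(electLeader(hint, Set.of(1501, 1201, 1801)));
    }
}
```

The sketch assumes that a follower whose end offset equals the leader's highWatermark holds every committed record, which is the premise of this issue; whether that holds across leader epoch changes would need to be checked against the real replication protocol.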



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10490) Make constructors public for Admin API value objects

2020-09-16 Thread Noa Resare (Jira)
Noa Resare created KAFKA-10490:
--

 Summary: Make constructors public for Admin API value objects
 Key: KAFKA-10490
 URL: https://issues.apache.org/jira/browse/KAFKA-10490
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 2.6.0
Reporter: Noa Resare


Developers writing automation that uses the {{Admin}} API will in many cases 
want to create a mock and configure that mock to return the expected value 
objects, in order to test other pieces of functionality in a controlled 
way.

However, since the constructors in the value objects that the various API 
endpoints return are either {{protected}} or have the default access level, 
instantiating such value objects requires some convoluted trick (either 
mocking them with a mocking framework, using reflection magic, or creating a 
helper method in the same package as the value objects).
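
To illustrate the reflection workaround mentioned above, here is a minimal sketch. {{StandInResult}} is a hypothetical stand-in and not a real Kafka class; it only mirrors the situation of a value object whose constructor is not public.

```java
import java.lang.reflect.Constructor;
import java.util.Map;

final class ReflectionWorkaround {
    /** Hypothetical stand-in for an Admin API value object; NOT a real Kafka class. */
    static final class StandInResult {
        private final Map<String, String> values;

        // Non-public constructor, mirroring the situation described above.
        private StandInResult(Map<String, String> values) {
            this.values = values;
        }

        Map<String, String> values() {
            return values;
        }
    }

    /** Creates an instance via reflection, the way test code in another package would have to. */
    static StandInResult newResult(Map<String, String> values) throws ReflectiveOperationException {
        Constructor<StandInResult> ctor = StandInResult.class.getDeclaredConstructor(Map.class);
        ctor.setAccessible(true); // switch off the Java access check for this constructor
        return ctor.newInstance(values);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        StandInResult result = newResult(Map.of("topic-a", "created"));
        System.out.println(result.values());
    }
}
```

Within a single file the enclosing class could call the private constructor directly; reflection is used here only to show what a test living in a different package ends up doing. With public constructors this would collapse to a plain {{new}} expression.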

Please consider making the constructors public, which would encourage good 
testing practices everywhere.

Here are some examples of classes affected by this:
 * CreateTopicsResult
 * DeleteTopicsResult
 * ListTopicsResult
 * DescribeTopicsResult
 * DescribeClusterResult
 * DescribeAclsResult
 * CreateAclsResult
 * DeleteAclsResult
 * DescribeConfigsResult
 * AlterConfigsResult
 * AlterReplicaLogDirsResult
 * DescribeLogDirsResult
 * DescribeReplicaLogDirsResult
 * CreatePartitionsResult
 * CreateDelegationTokenResult
 * RenewDelegationTokenResult
 * ExpireDelegationTokenResult
 * DescribeDelegationTokenResult
 * ...and so on





[jira] [Created] (KAFKA-10314) KafkaStorageException on reassignment when offline log directories exist

2020-07-27 Thread Noa Resare (Jira)
Noa Resare created KAFKA-10314:
--

 Summary: KafkaStorageException on reassignment when offline log 
directories exist
 Key: KAFKA-10314
 URL: https://issues.apache.org/jira/browse/KAFKA-10314
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.5.0
Reporter: Noa Resare


If a reassignment of a partition is triggered to a broker with an offline 
directory, the new broker will fail to follow, instead raising a 
KafkaStorageException which causes the reassignment to stall indefinitely. The 
error message we see is the following:

{{[2020-07-23 13:11:08,727] ERROR [Broker id=1] Skipped the become-follower 
state change with correlation id 14 from controller 1 epoch 1 for partition 
t2-0 (last update controller epoch 1) with leader 2 since the replica for the 
partition is offline due to disk error 
org.apache.kafka.common.errors.KafkaStorageException: Can not create log for 
t2-0 because log directories /tmp/kafka/d1 are offline (state.change.logger)}}

It seems to me that unless the partition in question already existed in the 
offline log directory, a better behaviour would simply be to assign the 
partition to one of the available log directories.
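
The suggested behaviour could look roughly like the sketch below. It is self-contained Java for illustration only and is not the LogManager code: {{chooseLogDir}} and its parameters are hypothetical, {{IllegalStateException}} stands in for KafkaStorageException, and picking the first online directory is a simplification.

```java
import java.util.List;
import java.util.Set;

final class LogDirChoiceSketch {
    /**
     * Hypothetical directory selection for a replica that is about to start following.
     *
     * @param newOnThisBroker true if this broker has never hosted the replica before
     * @param allDirs         configured log directories, in configuration order
     * @param offlineDirs     directories currently marked offline
     */
    static String chooseLogDir(boolean newOnThisBroker, List<String> allDirs, Set<String> offlineDirs) {
        if (!newOnThisBroker && !offlineDirs.isEmpty()) {
            // The replica may still have data on a failed disk, so it must not be
            // silently re-created elsewhere (the situation KAFKA-4763 guards against).
            throw new IllegalStateException("replica may live in an offline directory: " + offlineDirs);
        }
        // A replica that is new on this broker cannot have data on the failed disk,
        // so any online directory is acceptable.
        return allDirs.stream()
            .filter(dir -> !offlineDirs.contains(dir))
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("all log directories are offline"));
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("/tmp/kafka/d1", "/tmp/kafka/d2");
        // Reassignment to a broker with d1 offline: the new replica lands in d2.
        System.out.println(chooseLogDir(true, dirs, Set.of("/tmp/kafka/d1")));
    }
}
```

The sketch assumes the broker is told reliably whether the replica is new on it, which is exactly the flag whose meaning is in question here.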

The conditional in 
[LogManager.scala:769|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/log/LogManager.scala#L769]
 was introduced to prevent the issue in 
[KAFKA-4763|https://issues.apache.org/jira/browse/KAFKA-4763] where partitions 
in offline logdirs would be re-created in an online directory as soon as a 
LeaderAndISR message gets processed. However, the semantics of isNew seem 
different in LogManager (the replica is new on this broker) compared to when 
isNew is set in 
[KafkaController.scala|https://github.com/apache/kafka/blob/11f75691b87fcecc8b29bfd25c7067e054e408ea/core/src/main/scala/kafka/controller/KafkaController.scala#L879]
 (where it seems to refer to whether the topic partition itself is new; all 
followers get {{isNew=false}}).





[jira] [Created] (KAFKA-7685) Support loading trust stores from classpath

2018-11-28 Thread Noa Resare (JIRA)
Noa Resare created KAFKA-7685:
-

 Summary: Support loading trust stores from classpath
 Key: KAFKA-7685
 URL: https://issues.apache.org/jira/browse/KAFKA-7685
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 2.1.0
Reporter: Noa Resare


Certificate pinning as well as authenticating Kafka brokers using a non-public 
CA certificate maintained inside an organisation is desirable to a lot of 
users. This can be accomplished today using the {{ssl.truststore.location}} 
configuration property. Unfortunately, this value is always interpreted as a 
filesystem path which makes distribution of such an alternative truststore a 
needlessly cumbersome process. If we had the ability to load a trust store from 
the classpath as well as from a file, the trust store could be shipped in a jar 
that could be declared as a regular maven style dependency.

If we did this by supporting a {{classpath:}} prefix in 
{{ssl.truststore.location}}, this could be a backwards compatible change, one 
that builds on prior design patterns established by, for example, the Spring project.
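
A minimal sketch of how such a prefix could be resolved is shown below. It is not Kafka code: {{TrustStoreLocator}} and {{open}} are hypothetical names, and only standard JDK calls are used.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

final class TrustStoreLocator {
    static final String CLASSPATH_PREFIX = "classpath:";

    /** Opens the trust store from the classpath if prefixed, otherwise from the filesystem. */
    static InputStream open(String location) throws IOException {
        if (location.startsWith(CLASSPATH_PREFIX)) {
            String resource = location.substring(CLASSPATH_PREFIX.length());
            // ClassLoader resource names are written without a leading slash.
            if (resource.startsWith("/")) {
                resource = resource.substring(1);
            }
            InputStream in = TrustStoreLocator.class.getClassLoader().getResourceAsStream(resource);
            if (in == null) {
                throw new FileNotFoundException("No classpath resource: " + resource);
            }
            return in;
        }
        // No prefix: keep today's behaviour and treat the value as a filesystem path.
        return Files.newInputStream(Paths.get(location));
    }

    public static void main(String[] args) throws IOException {
        // Pass a location as the first argument, e.g. classpath:truststore.jks
        if (args.length == 1) {
            try (InputStream in = open(args[0])) {
                System.out.println("opened " + args[0]);
            }
        }
    }
}
```

The returned stream could then be handed to {{java.security.KeyStore.load(InputStream, char[])}}. Values without the prefix behave exactly as before, which is what would keep the change backwards compatible.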


