[jira] [Assigned] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion

2024-07-16 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-13403:


Assignee: Arun Mathew  (was: Haruki Okada)

> KafkaServer crashes when deleting topics due to the race in log deletion
> 
>
> Key: KAFKA-13403
> URL: https://issues.apache.org/jira/browse/KAFKA-13403
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.4.1
>Reporter: Haruki Okada
>Assignee: Arun Mathew
>Priority: Major
>
> h2. Environment
>  * OS: CentOS Linux release 7.6
>  * Kafka version: 2.4.1
>  ** As far as I checked the code, I think the same phenomenon could happen 
> even on trunk
>  * Kafka log directory: RAID1+0 (i.e. not using JBOD, so only a single 
> log.dirs entry is set)
>  * Java version: AdoptOpenJDK 1.8.0_282
> h2. Phenomenon
> When we were in the middle of deleting several topics with `kafka-topics.sh 
> --delete --topic blah-blah`, one broker in our cluster crashed due to the 
> following exception:
>  
> {code:java}
> [2021-10-21 18:19:19,122] ERROR Shutdown broker because all log dirs in 
> /data/kafka have failed (kafka.log.LogManager)
> {code}
>  
>  
> We also found that a NoSuchFileException was thrown right before the crash 
> when LogManager tried to delete the logs for some partitions.
>  
> {code:java}
> [2021-10-21 18:19:18,849] ERROR Error while deleting log for foo-bar-topic-5 
> in dir /data/kafka (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: 
> /data/kafka/foo-bar-topic-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete/03877066.timeindex.deleted
> at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at 
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at 
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at 
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
> at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
> at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
> at java.nio.file.Files.walkFileTree(Files.java:2706)
> at java.nio.file.Files.walkFileTree(Files.java:2742)
> at org.apache.kafka.common.utils.Utils.delete(Utils.java:732)
> at kafka.log.Log.$anonfun$delete$2(Log.scala:2036)
> at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2343)
> at kafka.log.Log.delete(Log.scala:2030)
> at kafka.log.LogManager.deleteLogs(LogManager.scala:826)
> at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:840)
> at 
> kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> So the log dir was marked as offline, and this ended up in a KafkaServer 
> crash because the broker has only a single log dir.
> h2. Cause
> We also found the following logs right before the NoSuchFileException.
>  
> {code:java}
> [2021-10-21 18:18:17,829] INFO Log for partition foo-bar-5 is renamed to 
> /data/kafka/foo-bar-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete and is 
> scheduled for deletion (kafka.log.LogManager)
> [2021-10-21 18:18:17,900] INFO [Log partition=foo-bar-5, dir=/data/kafka] 
> Found deletable segments with base offsets [3877066] due to retention time 
> 17280ms breach (kafka.log.Log)[2021-10-21 18:18:17,901] INFO [Log 
> partition=foo-bar-5, dir=/data/kafka] Scheduling segments for deletion 
> List(LogSegment(baseOffset=3877066, size=90316366, 
> lastModifiedTime=1634634956000, largestTime=1634634955854)) (kafka.log.Log)
> {code}
> After checking through Kafka code, w
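
A minimal stand-alone sketch (hypothetical code, not Kafka's Utils.delete or the 
LogManager) of how removing files concurrently with a directory-tree walk can 
surface the NoSuchFileException seen in the stack trace above:
{code:java}
// Hypothetical reproduction of the race, not Kafka code: one thread walks a directory
// tree to delete it wholesale (similar to what Utils.delete does via Files.walkFileTree),
// while another thread removes individual files underneath it (similar to the async
// segment deletion). If a file vanishes between being listed and having its attributes
// read, the default SimpleFileVisitor rethrows the resulting NoSuchFileException.
// Timing dependent, so it may take several runs to trigger.
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class WalkDeleteRace {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("walk-delete-race-");
        for (int i = 0; i < 50_000; i++) {
            Files.createFile(dir.resolve(String.format("%020d.timeindex.deleted", i)));
        }
        Thread asyncDeleter = new Thread(() -> {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path p : stream) Files.deleteIfExists(p);
            } catch (IOException | DirectoryIteratorException ignored) {
                // entries may already be gone; ignore for this demo
            }
        });
        asyncDeleter.start();
        // Walks and deletes the whole tree; throws NoSuchFileException when it loses the race.
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                if (exc != null) throw exc;
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
        asyncDeleter.join();
    }
}
{code}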

[jira] [Commented] (KAFKA-17076) logEndOffset could be lost due to log cleaning

2024-07-15 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866105#comment-17866105
 ] 

Haruki Okada commented on KAFKA-17076:
--

Also, since lastOffset() always returns the original last offset even after 
compaction, the log end offset of a compacted log will not rewind:
https://github.com/apache/kafka/blob/3.7.1/clients/src/main/java/org/apache/kafka/common/record/RecordBatch.java#L118-L120
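
A minimal sketch (hypothetical helper, not Kafka's actual log recovery code) of 
why an offset derived from the last remaining batch's lastOffset() never moves 
backwards:
{code:java}
// Hypothetical sketch, not Kafka's log recovery code: derive the next offset to assign
// from the batches that remain on disk. Because a cleaned batch keeps its original
// lastOffset (baseOffset + lastOffsetDelta), the derived offset does not rewind even if
// the records inside the batch were removed by compaction.
import java.io.File;
import java.io.IOException;
import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.RecordBatch;

public class NextOffsetSketch {
    static long nextOffset(File segmentFile, long segmentBaseOffset) throws IOException {
        long next = segmentBaseOffset;
        try (FileRecords records = FileRecords.open(segmentFile)) {
            for (RecordBatch batch : records.batches()) {
                next = batch.lastOffset() + 1; // preserved by the cleaner, so never smaller
            }
        }
        return next;
    }
}
{code}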

> logEndOffset could be lost due to log cleaning
> --
>
> Key: KAFKA-17076
> URL: https://issues.apache.org/jira/browse/KAFKA-17076
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jun Rao
>Priority: Major
>
> It's possible for the log cleaner to remove all records in the suffix of the 
> log. If the partition is then reassigned, the new replica won't be able to 
> see the true logEndOffset since there is no record batch associated with it. 
> If this replica becomes the leader, it will assign an already used offset to 
> a newly produced record, which is incorrect.
>  
> It's relatively rare to trigger this issue since the active segment is never 
> cleaned and typically is not empty. However, the following is one possibility.
>  # records with offset 100-110 are produced and fully replicated to all ISR. 
> All those records are delete records for certain keys.
>  # record with offset 111 is produced. It forces the roll of a new segment in 
> broker b1 and is added to the log. The record is not committed and is later 
> truncated from the log, leaving an empty active segment in this log. b1 at 
> some point becomes the leader.
>  # log cleaner kicks in and removes records 100-110.
>  # The partition is reassigned to another broker b2. b2 replicates all 
> records from b1 up to offset 100 and marks its logEndOffset at 100. Since 
> there is no record to replicate after offset 100 in b1, b2's logEndOffset 
> stays at 100 and b2 can join the ISR.
>  # b2 becomes the leader and assigns offset 100 to a new record.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17076) logEndOffset could be lost due to log cleaning

2024-07-14 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865823#comment-17865823
 ] 

Haruki Okada commented on KAFKA-17076:
--

[~junrao] Is that possible?

At step 2 in your scenario, I guess truncation doesn't happen unless at least 
one record is returned from the Fetch response (because of 
https://github.com/apache/kafka/pull/9382), so an empty active segment is not 
possible in my understanding.
refs: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-Fetch

> logEndOffset could be lost due to log cleaning
> --
>
> Key: KAFKA-17076
> URL: https://issues.apache.org/jira/browse/KAFKA-17076
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jun Rao
>Priority: Major
>
> It's possible for the log cleaner to remove all records in the suffix of the 
> log. If the partition is then reassigned, the new replica won't be able to 
> see the true logEndOffset since there is no record batch associated with it. 
> If this replica becomes the leader, it will assign an already used offset to 
> a newly produced record, which is incorrect.
>  
> It's relatively rare to trigger this issue since the active segment is never 
> cleaned and typically is not empty. However, the following is one possibility.
>  # records with offset 100-110 are produced and fully replicated to all ISR. 
> All those records are delete records for certain keys.
>  # record with offset 111 is produced. It forces the roll of a new segment in 
> broker b1 and is added to the log. The record is not committed and is later 
> truncated from the log, leaving an empty active segment in this log. b1 at 
> some point becomes the leader.
>  # log cleaner kicks in and removes records 100-110.
>  # The partition is reassigned to another broker b2. b2 replicates all 
> records from b1 up to offset 100 and marks its logEndOffset at 100. Since 
> there is no record to replicate after offset 100 in b1, b2's logEndOffset 
> stays at 100 and b2 can join the ISR.
>  # b2 becomes the leader and assigns offset 100 to a new record.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865110#comment-17865110
 ] 

Haruki Okada commented on KAFKA-17061:
--

I did a micro-benchmark to check the performance improvement of 
`addUpdateMetadataRequestForBrokers` with the patch.
Benchmark code: 
[https://gist.github.com/ocadaruma/e80be044227d6235126310e9058f546d]
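
A hypothetical JMH-style skeleton of the kind of comparison being measured (the 
actual benchmark is the gist linked above):
{code:java}
// Hypothetical JMH skeleton, not the linked gist: compare rebuilding the live-broker-id
// set from the epoch map on every check vs. reading a pre-derived (cached) set.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class LiveBrokerIdsBench {
    private final Map<Integer, Long> liveBrokerEpochs = new HashMap<>();
    private Set<Integer> cachedLiveBrokerIds;

    @Setup
    public void setup() {
        for (int i = 0; i < 200; i++) liveBrokerEpochs.put(i, (long) i);
        cachedLiveBrokerIds = new HashSet<>(liveBrokerEpochs.keySet());
    }

    @Benchmark
    public boolean recomputePerCall() {
        // Mimics deriving the set inside every isReplicaOnline check.
        return new HashSet<>(liveBrokerEpochs.keySet()).contains(42);
    }

    @Benchmark
    public boolean useCachedSet() {
        return cachedLiveBrokerIds.contains(42);
    }
}
{code}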

!screenshot-flame.png|width=320! 
!screenshot-flame-patched.png|width=320!

 

As we can see, isReplicaOnline is no longer a bottleneck.

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: flame-patched.html, flame.html, 
> image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png, 
> screenshot-flame-patched.png, screenshot-flame.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  called by `isReplicaOnline` takes significant time in 
> `addUpdateMetadataRequestForBrokers` invocation on broker startup.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Description: 
h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
called by `isReplicaOnline` takes significant time in the 
`addUpdateMetadataRequestForBrokers` invocation on broker startup.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!

  was:
h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
calculation takes significant time in the `addUpdateMetadataRequestForBrokers` 
invocation on broker startup.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!


> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: flame-patched.html, flame.html, 
> image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png, 
> screenshot-flame-patched.png, screenshot-flame.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}

[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Attachment: screenshot-flame-patched.png

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: flame-patched.html, flame.html, 
> image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png, 
> screenshot-flame-patched.png, screenshot-flame.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time in `addUpdateMetadataRequestForBrokers` 
> invocation on broker startup.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Attachment: screenshot-flame.png

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: flame-patched.html, flame.html, 
> image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png, 
> screenshot-flame.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time in `addUpdateMetadataRequestForBrokers` 
> invocation on broker startup.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Attachment: flame-patched.html

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: flame-patched.html, flame.html, 
> image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time in `addUpdateMetadataRequestForBrokers` 
> invocation on broker startup.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Attachment: flame.html

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: flame-patched.html, flame.html, 
> image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time in `addUpdateMetadataRequestForBrokers` 
> invocation on broker startup.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-11 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Description: 
h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
calculation takes significant time in the `addUpdateMetadataRequestForBrokers` 
invocation on broker startup.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!

  was:
h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
calculation takes significant time.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!


> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: image-2024-07-02-17-22-06-100.png, 
> image-2024-07-02-17-24-11-861.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time.

[jira] [Commented] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-05 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863197#comment-17863197
 ] 

Haruki Okada commented on KAFKA-17061:
--

[~showuon] Hi, I submitted a 
[patch|https://github.com/apache/kafka/pull/16529]. Could you take a look?

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: image-2024-07-02-17-22-06-100.png, 
> image-2024-07-02-17-24-11-861.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-03 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-17061:
-
Description: 
h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
calculation takes significant time.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!

  was:
h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
calculation takes significant time.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!

Since no concurrent modification of liveBrokerEpochs is expected, we can just 
cache the result to improve performance.


> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: image-2024-07-02-17-22-06-100.png, 
> image-2024-07-02-17-24-11-861.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time.

[jira] [Assigned] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-02 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-17061:


Assignee: Haruki Okada

> KafkaController takes long time to connect to newly added broker after 
> registration on large cluster
> 
>
> Key: KAFKA-17061
> URL: https://issues.apache.org/jira/browse/KAFKA-17061
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
> Attachments: image-2024-07-02-17-22-06-100.png, 
> image-2024-07-02-17-24-11-861.png
>
>
> h2. Environment
>  * Kafka version: 3.3.2
>  * Cluster: 200~ brokers
>  * Total num partitions: 40k
>  * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a 
> while, the controller took 6 seconds to connect to the broker after znode 
> registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
> deleted brokers: , bounced brokers: , all live brokers: 1,... 
> (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 
> 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
> (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
> for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 
> 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
> requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that 
> [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
>  calculation takes significant time.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!
> Since no concurrent modification of liveBrokerEpochs is expected, we can just 
> cache the result to improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-02 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-17061:


 Summary: KafkaController takes long time to connect to newly added 
broker after registration on large cluster
 Key: KAFKA-17061
 URL: https://issues.apache.org/jira/browse/KAFKA-17061
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada
 Attachments: image-2024-07-02-17-22-06-100.png, 
image-2024-07-02-17-24-11-861.png

h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long STW pause and came back after a 
while, the controller took 6 seconds to connect to the broker after znode 
registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] 
calculation takes significant time.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!

Since no concurrent modification of liveBrokerEpochs is expected, we can just 
cache the result to improve performance.
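
A minimal illustration of the caching idea (generic Java sketch, not the actual 
Scala ControllerContext code):
{code:java}
// Generic sketch of the proposed optimization, not the actual ControllerContext (Scala):
// instead of rebuilding the live-broker-id set from the epoch map on every isReplicaOnline
// check, derive it once and invalidate the cached value whenever the map changes.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LiveBrokerIdsCache {
    private final Map<Integer, Long> liveBrokerEpochs = new HashMap<>();
    private Set<Integer> cachedLiveBrokerIds = null; // invalidated on every mutation

    void addLiveBroker(int brokerId, long epoch) {
        liveBrokerEpochs.put(brokerId, epoch);
        cachedLiveBrokerIds = null;
    }

    void removeLiveBroker(int brokerId) {
        liveBrokerEpochs.remove(brokerId);
        cachedLiveBrokerIds = null;
    }

    // Called from hot paths such as per-partition replica-online checks.
    Set<Integer> liveBrokerIds() {
        if (cachedLiveBrokerIds == null) {
            cachedLiveBrokerIds = new HashSet<>(liveBrokerEpochs.keySet());
        }
        return cachedLiveBrokerIds;
    }
}
{code}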



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15612) Followup on whether the segment indexes need to be materialized or flushed before they are passed to RSM for writing them to tiered storage.

2024-06-19 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856223#comment-17856223
 ] 

Haruki Okada commented on KAFKA-15612:
--

In my understanding, we concluded in 
https://issues.apache.org/jira/browse/KAFKA-15609 that the index files don't 
need to be flushed, as the mmap-ed content is guaranteed to be consistent with 
the file.

Do we need other follow-ups?

> Followup on whether the segment indexes need to be materialized or flushed 
> before they are passed to RSM for writing them to tiered storage. 
> -
>
> Key: KAFKA-15612
> URL: https://issues.apache.org/jira/browse/KAFKA-15612
> Project: Kafka
>  Issue Type: Task
>Reporter: Satish Duggana
>Priority: Major
> Fix For: 3.9.0
>
>
> Followup on the [PR 
> comment|https://github.com/apache/kafka/pull/14529#discussion_r1360877868]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16916) ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run forever

2024-06-07 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853312#comment-17853312
 ] 

Haruki Okada commented on KAFKA-16916:
--

It seems [~apoorvmittal10] has already identified the root cause and submitted 
the PR. Thanks, Apoorv and Luke.

> ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will 
> run forever
> --
>
> Key: KAFKA-16916
> URL: https://issues.apache.org/jira/browse/KAFKA-16916
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Assignee: Apoorv Mittal
>Priority: Blocker
> Fix For: 3.8.0
>
>
> ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will 
> run forever



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16916) ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run forever

2024-06-07 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-16916:


Assignee: Apoorv Mittal  (was: Haruki Okada)

> ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will 
> run forever
> --
>
> Key: KAFKA-16916
> URL: https://issues.apache.org/jira/browse/KAFKA-16916
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Assignee: Apoorv Mittal
>Priority: Blocker
> Fix For: 3.8.0
>
>
> ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will 
> run forever



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash

2024-05-19 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847684#comment-17847684
 ] 

Haruki Okada commented on KAFKA-16541:
--

[~junrao] Hi, I've just submitted a patch. PTAL.

> Potential leader epoch checkpoint file corruption on OS crash
> -
>
> Key: KAFKA-16541
> URL: https://issues.apache.org/jira/browse/KAFKA-16541
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.7.0
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Minor
>
> Pointed out by [~junrao] on 
> [GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125]
> [A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid 
> of the fsync of the leader-epoch checkpoint file in some code paths for 
> performance reasons.
> However, since the checkpoint file is now flushed to the device asynchronously 
> by the OS, its content could be corrupted if the OS suddenly crashes (e.g. due 
> to a power failure or kernel panic) in the middle of a flush.
> A corrupted checkpoint file could prevent the Kafka broker from starting up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-05-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843116#comment-17843116
 ] 

Haruki Okada commented on KAFKA-16372:
--

[~mpedersencrwd] In either case, it will take time to fix because:
* reverting to the documented behavior => this is a breaking change, so we may 
not be able to fix it until the 4.0 release
* introducing an exception base class => in my understanding this needs a KIP

So fixing the javadoc might be appropriate as the short-term fix for now.

Besides, I would like to clarify the use case for differentiating 
synchronous/asynchronous timeouts.

{quote}our actions might vary because of this{quote}

Could you tell me how the action would differ depending on whether the broker 
may have received the message or not?
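
For reference, a minimal caller-side sketch (standard producer API, hypothetical 
bootstrap/topic values) of how both kinds of timeout currently surface through 
the returned future rather than being thrown from send():
{code:java}
// Caller-side sketch with hypothetical config values: with the current behavior, the
// TimeoutException from a max.block.ms breach is not thrown by send() but is delivered
// through the returned future (and the callback), just like an asynchronous delivery
// timeout, which is why the two are hard to tell apart.
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.serialization.StringSerializer;

public class MaxBlockMsBehavior {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("max.block.ms", "1000");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> future =
                    producer.send(new ProducerRecord<>("test-topic", "key", "value"));
            try {
                future.get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof TimeoutException) {
                    // Reached for a metadata/buffer (synchronous) timeout as well as for
                    // an ack/delivery (asynchronous) timeout.
                }
            }
        }
    }
}
{code}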

> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [buffer.memory config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> Seems this discrepancy has existed since 0.9.0.0, when max.block.ms was 
> introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash

2024-05-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842816#comment-17842816
 ] 

Haruki Okada commented on KAFKA-16541:
--

[~junrao] Yes.
My concern now is that only changing renameDir may not be enough, so I'm trying 
to figure out if we can fix it in another way without checking all call paths.

> Potential leader epoch checkpoint file corruption on OS crash
> -
>
> Key: KAFKA-16541
> URL: https://issues.apache.org/jira/browse/KAFKA-16541
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.7.0
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Minor
>
> Pointed out by [~junrao] on 
> [GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125]
> [A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid 
> of the fsync of the leader-epoch checkpoint file in some code paths for 
> performance reasons.
> However, since the checkpoint file is now flushed to the device asynchronously 
> by the OS, its content could be corrupted if the OS suddenly crashes (e.g. due 
> to a power failure or kernel panic) in the middle of a flush.
> A corrupted checkpoint file could prevent the Kafka broker from starting up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16651) KafkaProducer.send does not throw TimeoutException as documented

2024-05-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842582#comment-17842582
 ] 

Haruki Okada commented on KAFKA-16651:
--

Might be a duplicate of: https://issues.apache.org/jira/browse/KAFKA-16372

> KafkaProducer.send does not throw TimeoutException as documented
> 
>
> Key: KAFKA-16651
> URL: https://issues.apache.org/jira/browse/KAFKA-16651
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 3.6.2
>Reporter: Mike Pedersen
>Priority: Major
>
> In the JavaDoc for {{{}KafkaProducer#send(ProducerRecord, Callback){}}}, it 
> claims that it will throw a {{TimeoutException}} if blocking on fetching 
> metadata or allocating memory and surpassing {{{}max.block.ms{}}}.
> {quote}Throws:
> {{TimeoutException}} - If the time taken for fetching metadata or allocating 
> memory for the record has surpassed max.block.ms.
> {quote}
> ([link|https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#send(org.apache.kafka.clients.producer.ProducerRecord,org.apache.kafka.clients.producer.Callback)])
> But this is not the case. As {{TimeoutException}} is an {{ApiException}} it 
> will hit [this 
> catch|https://github.com/a0x8o/kafka/blob/54eff6af115ee647f60129f2ce6a044cb17215d0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1073-L1084]
>  which will result in a failed future being returned instead of the exception 
> being thrown.
> The "allocating memory" part likely changed as part of 
> [KAFKA-3720|https://github.com/apache/kafka/pull/8399/files#diff-43491ffa1e0f8d28db071d8c23f1a76b54f1f20ea98cf6921bfd1c77a90446abR29]
>  which changed the base exception for buffer exhaustion exceptions to 
> {{{}TimeoutException{}}}. Timing out waiting on metadata suffers the same 
> issue, but it is not clear whether this has always been the case.
> This is basically a discrepancy between documentation and behavior, so it's a 
> question of which one should be adjusted.
> And on that, being able to differentiate between synchronous timeouts (as 
> caused by waiting on metadata or allocating memory) and asynchronous timeouts 
> (eg. timing out waiting for acks) is useful. In the former case we _know_ 
> that the broker has not received the event but in the latter it _may_ be that 
> the broker has received it but the ack could not be delivered, and our 
> actions might vary because of this. The current behavior makes this hard to 
> differentiate since both result in a {{TimeoutException}} being delivered via 
> the callback. Currently, I am relying on the exception message string to 
> differentiate these two, but this is basically just relying on implementation 
> detail that may change at any time. Therefore I would suggest to either:
>  * Revert to the documented behavior of throwing in case of synchronous 
> timeouts
>  * Correct the javadoc and introduce an exception base class/interface for 
> synchronous timeouts



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash

2024-04-11 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-16541:


 Summary: Potential leader epoch checkpoint file corruption on OS 
crash
 Key: KAFKA-16541
 URL: https://issues.apache.org/jira/browse/KAFKA-16541
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Haruki Okada
Assignee: Haruki Okada


Pointed out by [~junrao] on 
[GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125]

[A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid of 
the fsync of the leader-epoch checkpoint file in some code paths for performance 
reasons.

However, since the checkpoint file is now flushed to the device asynchronously 
by the OS, its content could be corrupted if the OS suddenly crashes (e.g. due 
to a power failure or kernel panic) in the middle of a flush.

A corrupted checkpoint file could prevent the Kafka broker from starting up.
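
For illustration, a generic sketch of the write-then-fsync-then-rename pattern 
this is about (not Kafka's actual CheckpointFile code); dropping the force() 
call is what leaves the file exposed to a torn write on an OS crash:
{code:java}
// Generic checkpoint-write sketch, not Kafka's CheckpointFile implementation: write to a
// temp file, fsync it, then atomically rename it over the old file. If the fsync is
// skipped, the rename can become durable before the data does, so an OS crash can leave a
// corrupted (partially written) checkpoint that fails to parse on broker start-up.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class CheckpointWriteSketch {
    static void writeCheckpoint(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            channel.write(ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8)));
            channel.force(true); // the fsync in question: without it the OS flushes lazily
        }
        // Atomic rename so readers observe either the old or the new complete file.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }
}
{code}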



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16393) SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly

2024-03-21 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-16393:


Assignee: Haruki Okada

> SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly
> ---
>
> Key: KAFKA-16393
> URL: https://issues.apache.org/jira/browse/KAFKA-16393
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, SslTransportLayer.write(ByteBuffer[], int, int) is 
> implemented as follows:
> {code:java}
> public long write(ByteBuffer[] srcs, int offset, int length) throws 
> IOException {
> ...
> int i = offset;
> while (i < length) {
> if (srcs[i].hasRemaining() || hasPendingWrites()) {
> 
> {code}
> The loop index starts at `offset` and ends at `length`.
> However, this isn't correct because the end index should be `offset + length`.
> Let's say we have an array of ByteBuffer with length = 5 and call this method 
> with offset = 3, length = 1.
> In the current code, `write(srcs, 3, 1)` doesn't attempt any write because the 
> loop condition is immediately false.
> For now, this method seems to be called only with offset = 0 and length = 
> srcs.length in the Kafka code base, so it isn't causing any problem. Still, we 
> should fix this because it could introduce a subtle bug if the method is used 
> with different args in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16393) SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly

2024-03-20 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-16393:


 Summary: SslTransportLayer doesn't implement write(ByteBuffer[], 
int, int) correctly
 Key: KAFKA-16393
 URL: https://issues.apache.org/jira/browse/KAFKA-16393
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada


As of Kafka 3.7.0, SslTransportLayer.write(ByteBuffer[], int, int) is 
implemented like below:

{code:java}
public long write(ByteBuffer[] srcs, int offset, int length) throws IOException 
{
...
int i = offset;
while (i < length) {
if (srcs[i].hasRemaining() || hasPendingWrites()) {

{code}

The loop index starts at `offset` and the loop ends when it reaches `length`.
However this isn't correct, because the end index should be `offset + length`.

Let's say we have an array of ByteBuffers with length = 5 and call this method 
with offset = 3, length = 1.

With the current code, `write(srcs, 3, 1)` doesn't attempt any write because the 
loop condition is immediately false.

For now this method appears to be called only with offset = 0 and length = 
srcs.length in the Kafka code base, so it doesn't cause any problem yet, but we 
should fix it because it could introduce a subtle bug if the method is used with 
different arguments in the future.
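
For illustration, a self-contained sketch (it just counts remaining bytes 
instead of performing the real SSL write) showing why the upper bound must be 
`offset + length`:

{code:java}
import java.nio.ByteBuffer;

public class WriteRangeDemo {
    // Buggy bound, as in the snippet above: stops when i reaches `length`,
    // so write(srcs, 3, 1) never looks at srcs[3].
    static int remainingBuggy(ByteBuffer[] srcs, int offset, int length) {
        int n = 0;
        for (int i = offset; i < length; i++) {
            n += srcs[i].remaining();
        }
        return n;
    }

    // Fixed bound: iterates exactly `length` buffers starting at `offset`.
    static int remainingFixed(ByteBuffer[] srcs, int offset, int length) {
        int n = 0;
        for (int i = offset, end = offset + length; i < end; i++) {
            n += srcs[i].remaining();
        }
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer[] srcs = new ByteBuffer[5];
        for (int i = 0; i < srcs.length; i++) {
            srcs[i] = ByteBuffer.wrap(new byte[] { (byte) i });
        }
        System.out.println(remainingBuggy(srcs, 3, 1)); // prints 0: the buffer is skipped
        System.out.println(remainingFixed(srcs, 3, 1)); // prints 1: exactly srcs[3] is visited
    }
}
{code}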



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-20 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828960#comment-17828960
 ] 

Haruki Okada edited comment on KAFKA-16372 at 3/20/24 2:16 PM:
---

[~showuon] Agreed.
One concern is that, IMO, many developers expect the "exception thrown on buffer 
full after max.block.ms" behavior (it's stated in the Javadoc, and since we 
rarely hit a buffer-full situation, no one has noticed the discrepancy).

Even some well-known open-source projects have exception-handling code that 
doesn't actually work because of this. (e.g. 
[logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29])

I wonder if just fixing the Javadoc and the Kafka documentation is fine, or 
whether we should post a heads-up about this somewhere (e.g. on the Kafka user 
mailing list).

I would like to hear a committer's opinion.

Anyway, meanwhile let me start fixing the docs.


was (Author: ocadaruma):
[~showuon] Agreed.
One concern is, IMO many developers expect this "exception thrown on buffer 
full after max.block.ms"-behavior (because it's stated in Javadoc while we 
rarely hit buffer-full situation so no one realized this discrepancy).

Even some famous open-sources have exception-handling code which doesn't work 
actually due to this. (e.g. 
[logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29])

I wonder if just fixing Javadoc and Kafka documentation is fine, or we should 
include a heads up about this somewhere (e.g. at Kafka user mailing list).

I would like to hear committer's opinion.

Anyways, meanwhile let me start fixing the docs.

> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [buffer.memory config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-20 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-16372:


Assignee: Haruki Okada

> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [buffer.memory config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-20 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828960#comment-17828960
 ] 

Haruki Okada edited comment on KAFKA-16372 at 3/20/24 2:15 PM:
---

[~showuon] Agreed.
One concern is that, IMO, many developers expect the "exception thrown on buffer 
full after max.block.ms" behavior (it's stated in the Javadoc, and since we 
rarely hit a buffer-full situation, no one has noticed the discrepancy).

Even some well-known open-source projects have exception-handling code that 
doesn't actually work because of this. (e.g. 
[logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29])

I wonder if just fixing the Javadoc and the Kafka documentation is fine, or 
whether we should post a heads-up about this somewhere (e.g. on the Kafka user 
mailing list).

I would like to hear a committer's opinion.

Anyway, meanwhile let me start fixing the docs.


was (Author: ocadaruma):
[~showuon] Agreed.
One concern is, IMO many developers expect this "exception thrown on buffer 
full after max.block.ms"-behavior (because it's stated in Javadoc while we 
rarely hit buffer-full situation so no one realized this discrepancy).

Even some famous open-sources have exception-handling code which doesn't work 
actually due to this. (e.g. 
[logback-kafka-append|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29])

I wonder if just fixing Javadoc and Kafka documentation is fine, or we should 
include a heads up about this somewhere (e.g. at Kafka user mailing list).

I would like to hear committer's opinion.

Anyways, meanwhile let me start fixing the docs.

> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [buffer.memory config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-20 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828960#comment-17828960
 ] 

Haruki Okada commented on KAFKA-16372:
--

[~showuon] Agreed.
One concern is that, IMO, many developers expect the "exception thrown on buffer 
full after max.block.ms" behavior (it's stated in the Javadoc, and since we 
rarely hit a buffer-full situation, no one has noticed the discrepancy).

Even some well-known open-source projects have exception-handling code that 
doesn't actually work because of this. (e.g. 
[logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29])

I wonder if just fixing the Javadoc and the Kafka documentation is fine, or 
whether we should post a heads-up about this somewhere (e.g. on the Kafka user 
mailing list).

I would like to hear a committer's opinion.

Anyway, meanwhile let me start fixing the docs.
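
For illustration only (the topic name and record below are placeholders), this 
is roughly the pattern such projects use; the first catch block is never reached 
for the buffer-full / metadata case because the {{TimeoutException}} comes back 
through the returned {{Future}} (and the callback) instead of being thrown from 
{{send()}}:

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.TimeoutException;

public class MaxBlockMsHandlingSketch {
    static void sendWithHandling(Producer<String, String> producer) throws InterruptedException {
        ProducerRecord<String, String> record = new ProducerRecord<>("some-topic", "value");
        try {
            Future<RecordMetadata> future = producer.send(record);
            // Blocking on the future is where the max.block.ms timeout actually
            // becomes visible, wrapped in an ExecutionException.
            future.get();
        } catch (TimeoutException e) {
            // What the Javadoc suggests happens on buffer-full / metadata timeout,
            // but doSend currently wraps the exception into the returned Future instead.
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                // Where the buffer-full / metadata timeout is actually observed today.
            }
        }
    }
}
{code}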

> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [buffer.memory config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-14 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-16372:
-
Description: 
As of Kafka 3.7.0, the javadoc of 
[KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
 states that it throws TimeoutException when max.block.ms is exceeded on buffer 
allocation or initial metadata fetch.

Also it's stated in [buffer.memory config 
description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].

However, I found that this is not true because TimeoutException extends 
ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
 instead of throwing it.

I wonder if this is a bug or a documentation error.

It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.

  was:
As of Kafka 3.7.0, the javadoc of 
[KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
 states that it throws TimeoutException when max.block.ms is exceeded on buffer 
allocation or initial metadata fetch.

Also it's stated in [max.block.ms config 
description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].

However, I found that this is not true because TimeoutException extends 
ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
 instead of throwing it.

I wonder if this is a bug or the documentation error.

Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced.


> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [buffer.memory config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-14 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-16372:
-
Component/s: producer 
 (was: clients)
   Priority: Minor  (was: Major)

> max.block.ms behavior inconsistency with javadoc and the config description
> ---
>
> Key: KAFKA-16372
> URL: https://issues.apache.org/jira/browse/KAFKA-16372
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Haruki Okada
>Priority: Minor
>
> As of Kafka 3.7.0, the javadoc of 
> [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
>  states that it throws TimeoutException when max.block.ms is exceeded on 
> buffer allocation or initial metadata fetch.
> Also it's stated in [max.block.ms config 
> description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].
> However, I found that this is not true because TimeoutException extends 
> ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
> FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
>  instead of throwing it.
> I wonder if this is a bug or a documentation error.
> It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-14 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-16372:


 Summary: max.block.ms behavior inconsistency with javadoc and the 
config description
 Key: KAFKA-16372
 URL: https://issues.apache.org/jira/browse/KAFKA-16372
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Haruki Okada


As of Kafka 3.7.0, the javadoc of 
[KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
 states that it throws TimeoutException when max.block.ms is exceeded on buffer 
allocation or initial metadata fetch.

Also it's stated in [max.block.ms config 
description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].

However, I found that this is not true because TimeoutException extends 
ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as 
FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
 instead of throwing it.

I wonder if this is a bug or a documentation error.

It seems this discrepancy has existed since 0.9.0.0, in which max.block.ms was introduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

2023-11-29 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791354#comment-17791354
 ] 

Haruki Okada commented on KAFKA-9693:
-

[~paolomoriello] [~novosibman] Hi, I believe the latency spike due to flushing 
on log.roll is now resolved by https://issues.apache.org/jira/browse/KAFKA-15046

> Kafka latency spikes caused by log segment flush on roll
> 
>
> Key: KAFKA-9693
> URL: https://issues.apache.org/jira/browse/KAFKA-9693
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
> Environment: OS: Amazon Linux 2
> Kafka version: 2.2.1
>Reporter: Paolo Moriello
>Assignee: Paolo Moriello
>Priority: Major
>  Labels: Performance, latency, performance
> Fix For: 3.7.0
>
> Attachments: image-2020-03-10-13-17-34-618.png, 
> image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, 
> image-2020-03-10-15-00-54-204.png, image-2020-06-23-12-24-46-548.png, 
> image-2020-06-23-12-24-58-788.png, image-2020-06-26-13-43-21-723.png, 
> image-2020-06-26-13-46-52-861.png, image-2020-06-26-14-06-01-505.png, 
> latency_plot2.png
>
>
> h1. Summary
> When a log segment fills up, Kafka rolls over onto a new active segment and 
> forces a flush of the old segment to disk. When this happens, the log segment 
> _append_ duration increases, causing significant latency spikes on producer(s) 
> and replica(s). This ticket aims to highlight the problem and proposes a 
> simple mitigation: add a new configuration to enable/disable the rolled 
> segment flush.
> h1. 1. Phenomenon
> Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to 
> ~50x-200x more than usual. For instance, normally 99th %ile is lower than 
> 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th 
> %iles even jump to 500-700ms.
> Latency spikes happen at constant frequency (depending on the input 
> throughput), for small amounts of time. All the producers experience a 
> latency increase at the same time.
> h1. !image-2020-03-10-13-17-34-618.png|width=942,height=314!
> {{Example of response time plot observed on a single producer.}}
> URPs rarely appear in correspondence of the latency spikes too. This is 
> harder to reproduce, but from time to time it is possible to see a few 
> partitions going out of sync in correspondence of a spike.
> h1. 2. Experiment
> h2. 2.1 Setup
> Kafka cluster hosted on AWS EC2 instances.
> h4. Cluster
>  * 15 Kafka brokers: (EC2 m5.4xlarge)
>  ** Disk: 1100Gb EBS volumes (4750Mbps)
>  ** Network: 10 Gbps
>  ** CPU: 16 Intel Xeon Platinum 8000
>  ** Memory: 64Gb
>  * 3 Zookeeper nodes: m5.large
>  * 6 producers on 6 EC2 instances in the same region
>  * 1 topic, 90 partitions - replication factor=3
> h4. Broker config
> Relevant configurations:
> {quote}num.io.threads=8
>  num.replica.fetchers=2
>  offsets.topic.replication.factor=3
>  num.network.threads=5
>  num.recovery.threads.per.data.dir=2
>  min.insync.replicas=2
>  num.partitions=1
> {quote}
> h4. Perf Test
>  * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per 
> broker)
>  * record size = 2
>  * Acks = 1, linger.ms = 1, compression.type = none
>  * Test duration: ~20/30min
> h2. 2.2 Analysis
> Our analysis showed a high +correlation between the log segment flush 
> count/rate and the latency spikes+. This indicates that the spikes in max 
> latency are related to Kafka's behavior when rolling over new segments.
> The other metrics did not show any relevant impact on any hardware component 
> of the cluster, eg. cpu, memory, network traffic, disk throughput...
>  
>  !latency_plot2.png|width=924,height=308!
>  {{Correlation between latency spikes and log segment flush count. p50, p95, 
> p99, p999 and p latencies (left axis, ns) and the flush #count (right 
> axis, stepping blue line in plot).}}
> Kafka schedules logs flushing (this includes flushing the file record 
> containing log entries, the offset index, the timestamp index and the 
> transaction index) during _roll_ operations. A log is rolled over onto a new 
> empty log when:
>  * the log segment is full
>  * the maxtime has elapsed since the timestamp of first message in the 
> segment (or, in absence of it, since the create time)
>  * the index is full
> In this case, the increase in latency happens on _append_ of a new message 
> set to the active segment of the log. This is a synchronous operation which 
> therefore blocks producer requests, causing the latency increase.
> To confirm this, I instrumented Kafka to measure the duration of the 
> FileRecords.append(MemoryRecords) method, which is responsible for writing 
> memory records to the file. As a result, I observed the same spiky pattern as 
> in the producer latency, with a

[jira] [Created] (KAFKA-15924) Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive

2023-11-29 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15924:


 Summary: Flaky test - 
QuorumControllerTest.testFatalMetadataReplayErrorOnActive
 Key: KAFKA-15924
 URL: https://issues.apache.org/jira/browse/KAFKA-15924
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/15/tests]

 
{code:java}
Error
org.opentest4j.AssertionFailedError: expected:  
but was: 
Stacktrace
org.opentest4j.AssertionFailedError: expected:  
but was: 
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
at 
app//org.apache.kafka.controller.QuorumControllerTest.testFatalMetadataReplayErrorOnActive(QuorumControllerTest.java:1132)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
 Method)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hie

[jira] [Updated] (KAFKA-15924) Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive

2023-11-29 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15924:
-
Attachment: stdout.log

> Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive
> --
>
> Key: KAFKA-15924
> URL: https://issues.apache.org/jira/browse/KAFKA-15924
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
> Attachments: stdout.log
>
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/15/tests]
>  
> {code:java}
> Error
> org.opentest4j.AssertionFailedError: expected: 
>  but was: 
> Stacktrace
> org.opentest4j.AssertionFailedError: expected: 
>  but was: 
> at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
> at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
> at 
> app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
> at 
> app//org.apache.kafka.controller.QuorumControllerTest.testFatalMetadataReplayErrorOnActive(QuorumControllerTest.java:1132)
> at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>  Method)
> at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
> at 
> app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> at 
> app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
> at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
> at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
> at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> at 
> app//org.juni

[jira] [Updated] (KAFKA-15920) Flaky test - PlaintextConsumerTest.testCoordinatorFailover

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15920:
-
Attachment: stdout.log

> Flaky test - PlaintextConsumerTest.testCoordinatorFailover
> --
>
> Key: KAFKA-15920
> URL: https://issues.apache.org/jira/browse/KAFKA-15920
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
> Attachments: stdout.log
>
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
> {code:java}
> Error
> org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
> Stacktrace
> org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
> at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
> at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
> at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527)
> at 
> app//kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:326)
> at 
> app//kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:109)
> at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>  Method)
> at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
> at 
> app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
> at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
> at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
> at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> at 
> app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
> at 
> app//

[jira] [Updated] (KAFKA-15921) Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15921:
-
Attachment: stdout.log

> Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications
> --
>
> Key: KAFKA-15921
> URL: https://issues.apache.org/jira/browse/KAFKA-15921
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
> Attachments: stdout.log
>
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
> {code:java}
> Error
> org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
> Stacktrace
> org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
> at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at 
> app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
> at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
> at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
> at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628)
> at 
> app//kafka.api.SaslScramSslEndToEndAuthorizationTest.testAuthentications(SaslScramSslEndToEndAuthorizationTest.scala:92)
> at 
> java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> at 
> java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at 
> java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568)
> at 
> app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
> at 
> app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> at 
> app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
> at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
> at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
> at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
> at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
> at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
> at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
> at 
> app//org.junit.platf

[jira] [Updated] (KAFKA-15919) Flaky test - BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15919:
-
Description: 
[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected:  
but was: 
Stacktrace
org.opentest4j.AssertionFailedError: expected:  
but was: 
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
at 
app//kafka.server.BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs(BrokerLifecycleManagerTest.scala:236)
at 
java.base@21.0.1/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base@21.0.1/java.lang.reflect.Method.invoke(Method.java:580)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
at java.base@21.0.1/java.util.ArrayList.forEach(ArrayList.java:1596)
at 
app//org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$exec

[jira] [Updated] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15918:
-
Attachment: stdout.log

> Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets
> 
>
> Key: KAFKA-15918
> URL: https://issues.apache.org/jira/browse/KAFKA-15918
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
> Attachments: stdout.log
>
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
>  
> {code:java}
> Error
> org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected:  but was: 
> Stacktrace
> org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected:  but was: 
> at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
> at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
> at 
> org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331)
> at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
> at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
> at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.refle

[jira] [Updated] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15917:
-
Attachment: stdout.log

> Flaky test - 
> OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks
> ---
>
> Key: KAFKA-15917
> URL: https://issues.apache.org/jira/browse/KAFKA-15917
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
> Attachments: stdout.log
>
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
>  
>  
> {code:java}
> Error
> java.lang.AssertionError: 
> Expected: a string containing "zombie sink task"
>  but: was "Could not alter connector offsets. Error response: 
> {"error_code":500,"message":"Failed to alter consumer group offsets for 
> connector test-connector"}"
> Stacktrace
> java.lang.AssertionError: 
> Expected: a string containing "zombie sink task"
>  but: was "Could not alter connector offsets. Error response: 
> {"error_code":500,"message":"Failed to alter consumer group offsets for 
> connector test-connector"}"
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
> at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
> at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
> at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
> at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
> at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.jav

[jira] [Created] (KAFKA-15921) Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15921:


 Summary: Flaky test - 
SaslScramSslEndToEndAuthorizationTest.testAuthentications
 Key: KAFKA-15921
 URL: https://issues.apache.org/jira/browse/KAFKA-15921
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
Stacktrace
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628)
at 
app//kafka.api.SaslScramSslEndToEndAuthorizationTest.testAuthentications(SaslScramSslEndToEndAuthorizationTest.scala:92)
at 
java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
at 
java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hi

[jira] [Created] (KAFKA-15920) Flaky test - PlaintextConsumerTest.testCoordinatorFailover

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15920:


 Summary: Flaky test - PlaintextConsumerTest.testCoordinatorFailover
 Key: KAFKA-15920
 URL: https://issues.apache.org/jira/browse/KAFKA-15920
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
Stacktrace
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527)
at 
app//kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:326)
at 
app//kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:109)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
 Method)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
at 
app

[jira] [Updated] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15917:
-
Summary: Flaky test - 
OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks  (was: 
Flaky test - OffsetsApiIntegrationTest. 
testAlterSinkConnectorOffsetsZombieSinkTasks)

> Flaky test - 
> OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks
> ---
>
> Key: KAFKA-15917
> URL: https://issues.apache.org/jira/browse/KAFKA-15917
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
>  
>  
> {code:java}
> Error
> java.lang.AssertionError: 
> Expected: a string containing "zombie sink task"
>  but: was "Could not alter connector offsets. Error response: 
> {"error_code":500,"message":"Failed to alter consumer group offsets for 
> connector test-connector"}"
> Stacktrace
> java.lang.AssertionError: 
> Expected: a string containing "zombie sink task"
>  but: was "Could not alter connector offsets. Error response: 
> {"error_code":500,"message":"Failed to alter consumer group offsets for 
> connector test-connector"}"
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
> at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
> at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
> at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
> at com.sun.proxy

[jira] [Created] (KAFKA-15919) Flaky test - BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15919:


 Summary: Flaky test - 
BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs
 Key: KAFKA-15919
 URL: https://issues.apache.org/jira/browse/KAFKA-15919
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected:  
but was: 
Stacktrace
org.opentest4j.AssertionFailedError: expected:  
but was: 
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
at 
app//kafka.server.BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs(BrokerLifecycleManagerTest.scala:236)
at 
java.base@21.0.1/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base@21.0.1/java.lang.reflect.Method.invoke(Method.java:580)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
at java.base@21.0.1/java.util.ArrayList.forEach(ArrayList.java:1596)
at 
app//org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecuto

[jira] [Updated] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets

2023-11-28 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15918:
-
Summary: Flaky test - 
OffsetsApiIntegrationTest.testResetSinkConnectorOffsets  (was: Flaky test - 
OffsetsApiIntegrationTest. testResetSinkConnectorOffsets)

> Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets
> 
>
> Key: KAFKA-15918
> URL: https://issues.apache.org/jira/browse/KAFKA-15918
> Project: Kafka
>  Issue Type: Bug
>Reporter: Haruki Okada
>Priority: Major
>  Labels: flaky-test
>
> [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
>  
> {code:java}
> Error
> org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected: <true> but was: <false>
> Stacktrace
> org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected: <true> but was: <false>
> at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
> at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
> at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
> at 
> org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331)
> at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312)
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725)
> at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
> at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
> at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.

[jira] [Created] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest. testResetSinkConnectorOffsets

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15918:


 Summary: Flaky test - OffsetsApiIntegrationTest. 
testResetSinkConnectorOffsets
 Key: KAFKA-15918
 URL: https://issues.apache.org/jira/browse/KAFKA-15918
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]

 
{code:java}
Error
org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
Sink connector consumer group offsets should catch up to the topic end offsets 
==> expected: <true> but was: <false>
Stacktrace
org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
Sink connector consumer group offsets should catch up to the topic end offsets 
==> expected: <true> but was: <false>
at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
at 
org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331)
at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDi

[jira] [Created] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest. testAlterSinkConnectorOffsetsZombieSinkTasks

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15917:


 Summary: Flaky test - OffsetsApiIntegrationTest. 
testAlterSinkConnectorOffsetsZombieSinkTasks
 Key: KAFKA-15917
 URL: https://issues.apache.org/jira/browse/KAFKA-15917
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]

 

 
{code:java}
Error
java.lang.AssertionError: 
Expected: a string containing "zombie sink task"
 but: was "Could not alter connector offsets. Error response: 
{"error_code":500,"message":"Failed to alter consumer group offsets for 
connector test-connector"}"
Stacktrace
java.lang.AssertionError: 
Expected: a string containing "zombie sink task"
 but: was "Could not alter connector offsets. Error response: 
{"error_code":500,"message":"Failed to alter consumer group offsets for 
connector test-connector"}"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
at 
org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
at 
org.gradle.process.internal.worker.child.SystemApplicationCla

[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-11-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1778#comment-1778
 ] 

Haruki Okada commented on KAFKA-15609:
--

I added 
[MmapTest3.java|https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d] 
and confirmed that reads via read() and writes via mmap are consistent.

 

> Would you happen to have some reference that I can read about this?

 

I couldn't find a good web reference, but the book "The Linux Programming 
Interface" (2nd edition) mentions this.

excerpt:
{quote}Like many other modern UNIX implementations, Linux provides a so-called 
unified virtual memory system. This means that, where possible, memory mappings 
and blocks of the buffer cache share the same pages of physical memory. Thus, 
the views of a file obtained via a mapping and via I/O system calls (read(), 
write(), and so on) are always consistent, and the only use of msync() is to 
force the contents of a mapped region to be flushed to disk.{quote}
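
For illustration, here is a minimal sketch along the same lines as the gist above (it 
is not the gist itself, and it just uses a throwaway temp file): write through a 
MappedByteBuffer without calling force(), then read the same range back with an 
ordinary FileChannel.read(). On Linux, thanks to the unified page cache described in 
the quote, both views show the same bytes.
{code:java}
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapReadConsistencySketch {
    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("mmap-consistency", ".bin");
        byte[] payload = "written via mmap".getBytes(StandardCharsets.UTF_8);

        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Write through the mapping and intentionally skip force().
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            mmap.put(payload);

            // Read the same range back through the ordinary read() path.
            ByteBuffer readBack = ByteBuffer.allocate(payload.length);
            ch.read(readBack, 0);
            System.out.println(new String(readBack.array(), StandardCharsets.UTF_8));
        }
    }
}
{code}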

> Corrupted index uploaded to remote tier
> ---
>
> Key: KAFKA-15609
> URL: https://issues.apache.org/jira/browse/KAFKA-15609
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to RSM plugin contains all the 
> data which is present in MemoryByteBuffer i.e. we should have flushed the 
> MemoryByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of 
> MemoryByteBuffer before we give the file for RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-11-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782194#comment-17782194
 ] 

Haruki Okada edited comment on KAFKA-15609 at 11/2/23 3:36 PM:
---

[~divijvaidya]

Right, when we create two mmaps, they will be mapped to different virtual addresses.

However, as long as we use FileChannel.map, a write through one mmap is guaranteed 
to be visible through the other mmap, because they are mapped with the MAP_SHARED 
flag (except for MapMode.PRIVATE).

[https://github.com/adoptium/jdk11u/blob/jdk-11.0.21%2B7/src/java.base/unix/native/libnio/ch/FileChannelImpl.c#L88]
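
A minimal sketch of that point (illustration only, using a throwaway temp file): two 
READ_WRITE mappings of the same region see each other's writes because 
FileChannel.map creates them as shared mappings.
{code:java}
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedMmapSketch {
    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("shared-mmap", ".bin");
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Two independent mappings of the same 4-byte region.
            MappedByteBuffer first = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4);
            MappedByteBuffer second = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4);

            first.put(0, (byte) 42);            // write through the first mapping
            System.out.println(second.get(0));  // prints 42: both are shared views
            // MapMode.PRIVATE would instead create a copy-on-write mapping,
            // whose contents are not shared back to the file or other mappings.
        }
    }
}
{code}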


was (Author: ocadaruma):
[~divijvaidya]

Right, when we call two mmaps, they will be mapped to different virtual address.

However, as long as we use FileChannel.map, the write to one mmap is guaranteed 
to be visible to another mmap because they will be mapped with MAP_SHARED flag.

https://github.com/adoptium/jdk11u/blob/jdk-11.0.21%2B7/src/java.base/unix/native/libnio/ch/FileChannelImpl.c#L88

> Corrupted index uploaded to remote tier
> ---
>
> Key: KAFKA-15609
> URL: https://issues.apache.org/jira/browse/KAFKA-15609
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to RSM plugin contains all the 
> data which is present in MemoryByteBuffer i.e. we should have flushed the 
> MemoryByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of 
> MemoryByteBuffer before we give the file for RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-11-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782202#comment-17782202
 ] 

Haruki Okada edited comment on KAFKA-15609 at 11/2/23 4:12 PM:
---

I validated the MappedByteBuffer behavior with this Java code: 
[https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d]

 

When we create two mmaps from the same file, writes through the first one are 
always visible through the second one unless we specify MapMode.PRIVATE.

 

Also, in my understanding, the page cache is directly mapped to the mmap area, so 
even when we read a file with an ordinary read() call after it was written via 
mmap, the content should be consistent, at least on Linux.


was (Author: ocadaruma):
I validated the MappedByteBuffer behavior with this Java code: 
[https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d]

 

When we create two mmaps from the same file, writes to 1st one are always 
visible to 2nd one unless we specify MapMode.PRIVATE.

> Corrupted index uploaded to remote tier
> ---
>
> Key: KAFKA-15609
> URL: https://issues.apache.org/jira/browse/KAFKA-15609
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to RSM plugin contains all the 
> data which is present in MemoryByteBuffer i.e. we should have flushed the 
> MemoryByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of 
> MemoryByteBuffer before we give the file for RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-11-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782194#comment-17782194
 ] 

Haruki Okada commented on KAFKA-15609:
--

[~divijvaidya]

Right, when we call two mmaps, they will be mapped to different virtual address.

However, as long as we use FileChannel.map, the write to one mmap is guaranteed 
to be visible to another mmap because they will be mapped with MAP_SHARED flag.

https://github.com/adoptium/jdk11u/blob/jdk-11.0.21%2B7/src/java.base/unix/native/libnio/ch/FileChannelImpl.c#L88

> Corrupted index uploaded to remote tier
> ---
>
> Key: KAFKA-15609
> URL: https://issues.apache.org/jira/browse/KAFKA-15609
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to RSM plugin contains all the 
> data which is present in MemoryByteBuffer i.e. we should have flushed the 
> MemoryByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of 
> MemoryByteBuffer before we give the file for RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-11-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782202#comment-17782202
 ] 

Haruki Okada commented on KAFKA-15609:
--

I validated the MappedByteBuffer behavior with this Java code: 
[https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d]

 

When we create two mmaps from the same file, writes to 1st one are always 
visible to 2nd one unless we specify MapMode.PRIVATE.

> Corrupted index uploaded to remote tier
> ---
>
> Key: KAFKA-15609
> URL: https://issues.apache.org/jira/browse/KAFKA-15609
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to RSM plugin contains all the 
> data which is present in MemoryByteBuffer i.e. we should have flushed the 
> MemoryByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of 
> MemoryByteBuffer before we give the file for RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15688) Partition leader election not running when disk IO hangs

2023-10-26 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779956#comment-17779956
 ] 

Haruki Okada commented on KAFKA-15688:
--

> Is it possible to add such a feature to Kafka so that it shuts down in this 
> case as well?

 

That could be tricky to implement at the Kafka level: to make disk I/O time out 
when the device hangs, a timer has to be armed on another thread for every I/O, 
because the thread executing the I/O itself can do nothing to enforce the timeout.
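
As a hypothetical sketch of why that is awkward (this is not a proposal for Kafka's 
actual code): the only way to bound a blocking write is to hand it to a separate 
thread and let the caller act as the timer.
{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedWriteSketch {
    private static final ExecutorService IO_POOL = Executors.newCachedThreadPool();

    // Hypothetical helper: the write runs on IO_POOL, the caller acts as the timer thread.
    static int writeWithTimeout(FileChannel channel, ByteBuffer data, long timeoutMs) throws Exception {
        Future<Integer> pending = IO_POOL.submit(() -> channel.write(data));
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // A write stuck in the kernel cannot be interrupted reliably; all the
            // watcher can do is escalate (mark the dir offline, halt the broker, ...).
            throw new RuntimeException("disk write did not finish within " + timeoutMs + " ms", e);
        }
    }
}
{code}
And even when the timeout fires, the thread stuck in the I/O is typically not 
recoverable, so the watcher can only escalate.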

 

I guess there are several options to address the issue:

1) Set an I/O timeout at the OS/device level so that a hung disk surfaces as an 
IOException at the Kafka level (which causes Kafka to stop)

2) Deploy another process that watches disk health and kills Kafka when the disk 
hangs

 

For either solution, a concern is that while a broker is unable to process 
requests due to a hung disk (without a leadership change), it may unexpectedly 
kick other followers out of the ISR set before it gets killed, since it can't 
handle Fetch requests and therefore can't advance the HW.

In that case the broker could be the last replica in the ISR, so stopping it may 
take the partition offline, which would require unclean leader election.

[KIP-966|https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas]
 could be the solution to this problem, though.

 

Apart from the above, [https://github.com/apache/kafka/pull/14242] could mitigate 
your issue, I guess.

The thing is, even when the disk hangs, produce shouldn't be disrupted, because 
Kafka doesn't wait for log-append I/O to be synced to the device (unless too many 
dirty pages accumulate).

However, as of Kafka 3.3.2, there are several paths which call fsync on log-roll 
while holding UnifiedLog#lock. Due to this, if the disk hangs during fsync, 
UnifiedLog#lock will be held for a long time and all subsequent requests against 
the same partition may be blocked in the meantime.
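
To make the failure mode concrete, here is a heavily simplified, hypothetical sketch 
of the hazard (not Kafka's actual UnifiedLog code): if fsync runs while the per-log 
lock is held and the device hangs, every subsequent append to the same partition 
queues up behind that lock.
{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Simplified illustration only; names and structure are hypothetical.
class HypotheticalLog {
    private final Object lock = new Object();
    private final FileChannel activeSegment;

    HypotheticalLog(FileChannel activeSegment) {
        this.activeSegment = activeSegment;
    }

    void roll() throws Exception {
        synchronized (lock) {
            activeSegment.force(true); // blocks indefinitely if the disk hangs
        }
    }

    void append(ByteBuffer records) throws Exception {
        synchronized (lock) {          // stuck behind roll() while the fsync hangs
            activeSegment.write(records);
        }
    }
}
{code}
As I understand it, the patch above mitigates this by not performing the fsync 
inside that critical section.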

 

Actually, we encountered a similar issue on our on-prem Kafka clusters, which use 
a lot of HDDs, where some HDD breaks almost daily.

The frequency of the issue has indeed been mitigated by the above patch.

> Partition leader election not running when disk IO hangs
> 
>
> Key: KAFKA-15688
> URL: https://issues.apache.org/jira/browse/KAFKA-15688
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Peter Sinoros-Szabo
>Priority: Major
>
> We run our Kafka brokers on AWS EC2 nodes using AWS EBS as disk to store the 
> messages.
> Recently we had an issue when the EBS disk IO just stalled so Kafka was not 
> able to write or read anything from the disk, well except the data that was 
> still in page cache or that still fitted into the page cache before it is 
> synced to EBS.
> We experienced this issue in a few cases: sometimes partition leaders were 
> moved away to other brokers automatically, in other cases that didn't happen 
> and caused the Producers to fail producing messages to that broker.
> My expectation from Kafka in such a case would be that it notices it and 
> moves the leaders to other brokers where the partition has in sync replicas, 
> but as I mentioned this didn't happen always.
> I know Kafka will shut itself down in case it can't write to its disk, that 
> might be a good solution in this case as well as it would trigger the leader 
> election automatically.
> Is it possible to add such a feature to Kafka so that it shuts down in this 
> case as well?
> I guess similar issue might happen with other disk subsystems too or even 
> with a broken and slow disk.
> This scenario can be easily reproduced using AWS FIS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier

2023-10-16 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775965#comment-17775965
 ] 

Haruki Okada commented on KAFKA-15609:
--

> is the OS intelligent enough to understand that it should provide a "dirty" / 
> non-flushed view of the file to the second thread as well?

 

As Ismael pointed out, all file operations go through the page cache (Direct I/O 
is the exception, but it isn't used in Kafka), so uploading an unflushed index to 
the remote storage shouldn't be an issue.

> Corrupted index uploaded to remote tier
> ---
>
> Key: KAFKA-15609
> URL: https://issues.apache.org/jira/browse/KAFKA-15609
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.6.0
>Reporter: Divij Vaidya
>Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present 
> in remote tier. One such situation is covered here at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> such possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we are passing to RSM plugin contains all the 
> data which is present in MemoryByteBuffer i.e. we should have flushed the 
> MemoryByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, it might be possible 
> that when we are passing the file to RSM, the file doesn't contain flushed 
> data. Hence, we may end up uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force flush the content of 
> MemoryByteBuffer before we give the file for RSM. This will ensure that 
> indexes are not corrupted/incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15567) ReplicaFetcherThreadBenchmark is not working

2023-10-09 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15567:


 Summary: ReplicaFetcherThreadBenchmark is not working
 Key: KAFKA-15567
 URL: https://issues.apache.org/jira/browse/KAFKA-15567
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada
Assignee: Haruki Okada


* ReplicaFetcherThreadBenchmark is not working as of current trunk 
(https://github.com/apache/kafka/tree/c223a9c3761f796468ccfdae9e177e764ab6a965)

 
{code:java}
% jmh-benchmarks/jmh.sh ReplicaFetcherThreadBenchmark
(snip)
java.lang.NullPointerException
    at kafka.server.metadata.ZkMetadataCache.(ZkMetadataCache.scala:89)
    at kafka.server.MetadataCache.zkMetadataCache(MetadataCache.scala:120)
    at 
org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.setup(ReplicaFetcherThreadBenchmark.java:220)
    at 
org.apache.kafka.jmh.fetcher.jmh_generated.ReplicaFetcherThreadBenchmark_testFetcher_jmhTest._jmh_tryInit_f_replicafetcherthreadbenchmark0_G(ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.java:448)
    at 
org.apache.kafka.jmh.fetcher.jmh_generated.ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.testFetcher_AverageTime(ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.java:164)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:527)
    at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:504)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829) {code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread

2023-08-24 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758588#comment-17758588
 ] 

Haruki Okada commented on KAFKA-7504:
-

I would like to bump this issue again, since it still exists even in current 
Kafka.

 

> Are you currently running this patch in production?

> do you plan on contributing it to the project?

[~enether]

I'm a colleague of [~kawamuray], and the patch has been running on our production 
clusters for years.

This patch is crucial for keeping performance stable when catch-up reads happen.

 

We now plan to contribute it upstream.

I'll ping you once I submit a patch.

> Broker performance degradation caused by call of sendfile reading disk in 
> network thread
> 
>
> Key: KAFKA-7504
> URL: https://issues.apache.org/jira/browse/KAFKA-7504
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Major
>  Labels: latency, performance
> Attachments: Network_Request_Idle_After_Patch.png, 
> Network_Request_Idle_Per_Before_Patch.png, Response_Times_After_Patch.png, 
> Response_Times_Before_Patch.png, image-2018-10-14-14-18-38-149.png, 
> image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, 
> image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, 
> image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, 
> image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, 
> image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png
>
>
> h2. Environment
> OS: CentOS6
> Kernel version: 2.6.32-XX
>  Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with latest build from 
> trunk (2.2.0-SNAPSHOT)
> h2. Phenomenon
> Response time of Produce request (99th ~ 99.9th %ile) degrading to 50x ~ 100x 
> more than usual.
>  Normally 99th %ile is lower than 20ms, but when this issue occurs it marks 
> 50ms to 200ms.
> At the same time we could see two more things in metrics:
> 1. Disk read coincidence from the volume assigned to log.dirs.
>  2. Raise in network threads utilization (by 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`)
> As we didn't see increase of requests in metrics, we suspected blocking in 
> event loop ran by network thread as the cause of raising network thread 
> utilization.
>  Reading through Kafka broker source code, we understand that the only disk 
> IO performed in network thread is reading log data through calling 
> sendfile(2) (via FileChannel#transferTo).
>  To probe that the calls of sendfile(2) are blocking network thread for some 
> moments, I ran following SystemTap script to inspect duration of sendfile 
> syscalls.
> {code:java}
> # Systemtap script to measure syscall duration
> global s
> global records
> probe syscall.$1 {
> s[tid()] = gettimeofday_us()
> }
> probe syscall.$1.return {
> elapsed = gettimeofday_us() - s[tid()]
> delete s[tid()]
> records <<< elapsed
> }
> probe end {
> print(@hist_log(records))
> }{code}
> {code:java}
> $ stap -v syscall-duration.stp sendfile
> # value (us)
> value | count
> 0 | 0
> 1 |71
> 2 |@@@   6171
>16 |@@@  29472
>32 |@@@   3418
>  2048 | 0
> ...
>  8192 | 3{code}
> As you can see there were some cases taking more than few milliseconds, 
> implies that it blocks network thread for that long and applying the same 
> latency for all other request/response processing.
> h2. Hypothesis
> Gathering the above observations, I made the following hypothesis.
> Let's say network-thread-1 multiplexing 3 connections.
>  - producer-A
>  - follower-B (broker replica fetch)
>  - consumer-C
> Broker receives requests from each of those clients, [Produce, FetchFollower, 
> FetchConsumer].
> They are processed well by request handler threads, and now the response 
> queue of the network-thread contains 3 responses in following order: 
> [FetchConsumer, Produce, FetchFollower].
> network-thread-1 takes 3 responses and processes them sequentially 
> ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L632]).
>  Ideally processing of these 3 responses completes in microseconds as in it 
> just copies ready responses into client socket's buffer with non-blocking 
> manner.
>  However, Kafka uses sendfile(2) for t

[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline

2023-08-23 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758026#comment-17758026
 ] 

Haruki Okada commented on KAFKA-15391:
--

For this issue, we may need to swallow NoSuchFileException in Utils.flushDir, 
but I'll check the existing usages first and, if necessary, add another method 
instead of changing the existing one.
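For reference, a minimal sketch of the kind of helper being discussed (the name flushDirIfExists and its exact shape are hypothetical here, not the final API): a flushDir variant that treats a vanished directory as already flushed, so a concurrent topic deletion can no longer take the whole log dir offline.

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class FlushDirExample {
    /** Flush (fsync) a directory, ignoring the case where it no longer exists. */
    public static void flushDirIfExists(Path dir) throws IOException {
        if (dir == null) return;
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true); // fsync the directory entry itself
        } catch (NoSuchFileException e) {
            // The directory was renamed/deleted concurrently (e.g. by topic deletion);
            // there is nothing left to flush.
        }
    }
}
{code}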

> Delete topic may lead to directory offline
> --
>
> Key: KAFKA-15391
> URL: https://issues.apache.org/jira/browse/KAFKA-15391
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Divij Vaidya
>Assignee: Haruki Okada
>Priority: Major
> Fix For: 3.6.0
>
>
> This is an edge case where the entire log directory is marked offline when we 
> delete a topic. This symptoms of this scenario is characterised by the 
> following logs:
> {noformat}
> [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task 
> 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152)  
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing 
> log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 
> (exclusive) and recovery point 221 Caused by: 
> java.nio.file.NoSuchFileException: 
> /tmp/kafka-15093588566723278510/test-0{noformat}
> The above log is followed by logs such as:
> {noformat}
> [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task 
> 'flush-log' 
> (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException:
>  The log dir /tmp/kafka-15093588566723278510 is already offline due to a 
> previous IO exception.{noformat}
> The below sequence of events demonstrate the scenario where this bug manifests
> 1.  On the broker, partition lock is acquired and UnifiedLog.roll() is called 
> which schedules an async call for 
> flushUptoOffsetExclusive(). The roll may be called due to segment rotation 
> time or size.
> 2. Admin client calls deleteTopic
> 3. On the broker, LogManager.asyncDelete() is called which will call 
> UnifiedLog.renameDir()
> 4. The directory for the partition is successfully renamed with a "delete" 
> suffix.
> 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts 
> executing. It tries to call localLog.flush() without acquiring a partition 
> lock. 
> 6. LocalLog calls Utils.flushDir() which fails with an IOException.
> 7. On IOException, log directory is added to logDirFailureChannel
> 8. Any new interaction with this logDir fails and a log line is printed such 
> as 
> "The log dir $logDir is already offline due to a previous IO exception"
>  
> This is the reason DeleteTopicTest is flaky as well - 
> https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15391) Delete topic may lead to directory offline

2023-08-23 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-15391:


Assignee: Haruki Okada

> Delete topic may lead to directory offline
> --
>
> Key: KAFKA-15391
> URL: https://issues.apache.org/jira/browse/KAFKA-15391
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Divij Vaidya
>Assignee: Haruki Okada
>Priority: Major
> Fix For: 3.6.0
>
>
> This is an edge case where the entire log directory is marked offline when we 
> delete a topic. This symptoms of this scenario is characterised by the 
> following logs:
> {noformat}
> [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task 
> 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152)  
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing 
> log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 
> (exclusive) and recovery point 221 Caused by: 
> java.nio.file.NoSuchFileException: 
> /tmp/kafka-15093588566723278510/test-0{noformat}
> The above log is followed by logs such as:
> {noformat}
> [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task 
> 'flush-log' 
> (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException:
>  The log dir /tmp/kafka-15093588566723278510 is already offline due to a 
> previous IO exception.{noformat}
> The below sequence of events demonstrate the scenario where this bug manifests
> 1.  On the broker, partition lock is acquired and UnifiedLog.roll() is called 
> which schedules an async call for 
> flushUptoOffsetExclusive(). The roll may be called due to segment rotation 
> time or size.
> 2. Admin client calls deleteTopic
> 3. On the broker, LogManager.asyncDelete() is called which will call 
> UnifiedLog.renameDir()
> 4. The directory for the partition is successfully renamed with a "delete" 
> suffix.
> 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts 
> executing. It tries to call localLog.flush() without acquiring a partition 
> lock. 
> 6. LocalLog calls Utils.flushDir() which fails with an IOException.
> 7. On IOException, log directory is added to logDirFailureChannel
> 8. Any new interaction with this logDir fails and a log line is printed such 
> as 
> "The log dir $logDir is already offline due to a previous IO exception"
>  
> This is the reason DeleteTopicTest is flaky as well - 
> https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline

2023-08-22 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757792#comment-17757792
 ] 

Haruki Okada commented on KAFKA-15391:
--

May I take this ticket?

I'm interested since this issue may also happen on our cluster (3.3.2), so I'm 
happy to work on it. I can submit a patch today.

> Delete topic may lead to directory offline
> --
>
> Key: KAFKA-15391
> URL: https://issues.apache.org/jira/browse/KAFKA-15391
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.6.0
>
>
> This is an edge case where the entire log directory is marked offline when we 
> delete a topic. This symptoms of this scenario is characterised by the 
> following logs:
> {noformat}
> [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task 
> 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152)  
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing 
> log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 
> (exclusive) and recovery point 221 Caused by: 
> java.nio.file.NoSuchFileException: 
> /tmp/kafka-15093588566723278510/test-0{noformat}
> The above log is followed by logs such as:
> {noformat}
> [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task 
> 'flush-log' 
> (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException:
>  The log dir /tmp/kafka-15093588566723278510 is already offline due to a 
> previous IO exception.{noformat}
> The below sequence of events demonstrate the scenario where this bug manifests
> 1.  On the broker, partition lock is acquired and UnifiedLog.roll() is called 
> which schedules an async call for 
> flushUptoOffsetExclusive(). The roll may be called due to segment rotation 
> time or size.
> 2. Admin client calls deleteTopic
> 3. On the broker, LogManager.asyncDelete() is called which will call 
> UnifiedLog.renameDir()
> 4. The directory for the partition is successfully renamed with a "delete" 
> suffix.
> 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts 
> executing. It tries to call localLog.flush() without acquiring a partition 
> lock. 
> 6. LocalLog calls Utils.flushDir() which fails with an IOException.
> 7. On IOException, log directory is added to logDirFailureChannel
> 8. Any new interaction with this logDir fails and a log line is printed such 
> as 
> "The log dir $logDir is already offline due to a previous IO exception"
>  
> This is the reason DeleteTopicTest is flaky as well - 
> https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline

2023-08-22 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757442#comment-17757442
 ] 

Haruki Okada commented on KAFKA-15391:
--

I see, understood. Thanks

> Delete topic may lead to directory offline
> --
>
> Key: KAFKA-15391
> URL: https://issues.apache.org/jira/browse/KAFKA-15391
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.6.0
>
>
> This is an edge case where the entire log directory is marked offline when we 
> delete a topic. This symptoms of this scenario is characterised by the 
> following logs:
> {noformat}
> [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task 
> 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152)  
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing 
> log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 
> (exclusive) and recovery point 221 Caused by: 
> java.nio.file.NoSuchFileException: 
> /tmp/kafka-15093588566723278510/test-0{noformat}
> The above log is followed by logs such as:
> {noformat}
> [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task 
> 'flush-log' 
> (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException:
>  The log dir /tmp/kafka-15093588566723278510 is already offline due to a 
> previous IO exception.{noformat}
> The below sequence of events demonstrate the scenario where this bug manifests
> 1.  On the broker, partition lock is acquired and UnifiedLog.roll() is called 
> which schedules an async call for 
> flushUptoOffsetExclusive(). The roll may be called due to segment rotation 
> time or size.
> 2. Admin client calls deleteTopic
> 3. On the broker, LogManager.asyncDelete() is called which will call 
> UnifiedLog.renameDir()
> 4. The directory for the partition is successfully renamed with a "delete" 
> suffix.
> 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts 
> executing. It tries to call localLog.flush() without acquiring a partition 
> lock. 
> 6. LocalLog calls Utils.flushDir() which fails with an IOException.
> 7. On IOException, log directory is added to logDirFailureChannel
> 8. Any new interaction with this logDir fails and a log line is printed such 
> as 
> "The log dir $logDir is already offline due to a previous IO exception"
>  
> This is the reason DeleteTopicTest is flaky as well - 
> https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15391) Delete topic may lead to directory offline

2023-08-22 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757388#comment-17757388
 ] 

Haruki Okada edited comment on KAFKA-15391 at 8/22/23 11:57 AM:


-Related? https://issues.apache.org/jira/browse/KAFKA-13403-

Hmm, similar but seems different


was (Author: ocadaruma):
Related? https://issues.apache.org/jira/browse/KAFKA-13403

> Delete topic may lead to directory offline
> --
>
> Key: KAFKA-15391
> URL: https://issues.apache.org/jira/browse/KAFKA-15391
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.6.0
>
>
> This is an edge case where the entire log directory is marked offline when we 
> delete a topic. This symptoms of this scenario is characterised by the 
> following logs:
> {noformat}
> [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task 
> 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152)  
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing 
> log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 
> (exclusive) and recovery point 221 Caused by: 
> java.nio.file.NoSuchFileException: 
> /tmp/kafka-15093588566723278510/test-0{noformat}
> The above log is followed by logs such as:
> {noformat}
> [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task 
> 'flush-log' 
> (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException:
>  The log dir /tmp/kafka-15093588566723278510 is already offline due to a 
> previous IO exception.{noformat}
> The below sequence of events demonstrate the scenario where this bug manifests
> 1.  On the broker, partition lock is acquired and UnifiedLog.roll() is called 
> which schedules an async call for 
> flushUptoOffsetExclusive(). The roll may be called due to segment rotation 
> time or size.
> 2. Admin client calls deleteTopic
> 3. On the broker, LogManager.asyncDelete() is called which will call 
> UnifiedLog.renameDir()
> 4. The directory for the partition is successfully renamed with a "delete" 
> suffix.
> 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts 
> executing. It tries to call localLog.flush() without acquiring a partition 
> lock. 
> 6. LocalLog calls Utils.flushDir() which fails with an IOException.
> 7. On IOException, log directory is added to logDirFailureChannel
> 8. Any new interaction with this logDir fails and a log line is printed such 
> as 
> "The log dir $logDir is already offline due to a previous IO exception"
>  
> This is the reason DeleteTopicTest is flaky as well - 
> https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline

2023-08-22 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757388#comment-17757388
 ] 

Haruki Okada commented on KAFKA-15391:
--

Related? https://issues.apache.org/jira/browse/KAFKA-13403

> Delete topic may lead to directory offline
> --
>
> Key: KAFKA-15391
> URL: https://issues.apache.org/jira/browse/KAFKA-15391
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Divij Vaidya
>Priority: Major
> Fix For: 3.6.0
>
>
> This is an edge case where the entire log directory is marked offline when we 
> delete a topic. This symptoms of this scenario is characterised by the 
> following logs:
> {noformat}
> [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task 
> 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152)  
> org.apache.kafka.common.errors.KafkaStorageException: Error while flushing 
> log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 
> (exclusive) and recovery point 221 Caused by: 
> java.nio.file.NoSuchFileException: 
> /tmp/kafka-15093588566723278510/test-0{noformat}
> The above log is followed by logs such as:
> {noformat}
> [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task 
> 'flush-log' 
> (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException:
>  The log dir /tmp/kafka-15093588566723278510 is already offline due to a 
> previous IO exception.{noformat}
> The below sequence of events demonstrate the scenario where this bug manifests
> 1.  On the broker, partition lock is acquired and UnifiedLog.roll() is called 
> which schedules an async call for 
> flushUptoOffsetExclusive(). The roll may be called due to segment rotation 
> time or size.
> 2. Admin client calls deleteTopic
> 3. On the broker, LogManager.asyncDelete() is called which will call 
> UnifiedLog.renameDir()
> 4. The directory for the partition is successfully renamed with a "delete" 
> suffix.
> 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts 
> executing. It tries to call localLog.flush() without acquiring a partition 
> lock. 
> 6. LocalLog calls Utils.flushDir() which fails with an IOException.
> 7. On IOException, log directory is added to logDirFailureChannel
> 8. Any new interaction with this logDir fails and a log line is printed such 
> as 
> "The log dir $logDir is already offline due to a previous IO exception"
>  
> This is the reason DeleteTopicTest is flaky as well - 
> https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-08-18 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755904#comment-17755904
 ] 

Haruki Okada edited comment on KAFKA-15046 at 8/18/23 12:05 PM:


I submitted a patch [https://github.com/apache/kafka/pull/14242] .

In the meantime, I tested above patch (with porting it to 3.3.2, which is the 
version we use) in our experimental environment:
 * Setting:
 ** num.io.threads = 48
 ** incoming byte-rate: 18MB/sec
 ** Adding 300ms artificial write-delay into the device using 
[device-mapper|https://github.com/kawamuray/ddi]
 * Without patch:
 ** !image-2023-08-18-19-23-36-597.png|width=292,height=164!
 ** request-handler idle ratio is below 40%
 ** produce-response time 99.9%ile is over 1 sec
 ** We see producer-state snapshotting takes hundreds of millisecs
 *** 
{code:java}
(snip)
[2023-08-18 13:23:02,552] INFO [ProducerStateManager partition=xxx-3] Wrote 
producer snapshot at offset 3030259 with 0 producer ids in 777 ms. 
(kafka.log.ProducerStateManager)
[2023-08-18 13:23:02,852] INFO [ProducerStateManager partition=xxx-10] Wrote 
producer snapshot at offset 2991767 with 0 producer ids in 678 ms. 
(kafka.log.ProducerStateManager){code}

 * With patch:
 ** !image-2023-08-18-19-29-56-377.png|width=297,height=169!
 ** request-handler idle ratio is kept 75%
 ** produce-response time 99.9%ile is around 100ms
 ** producer-state snapshotting done in millisecs in most cases
 *** 
{code:java}
(snip)
[2023-08-18 13:40:09,383] INFO [ProducerStateManager partition=xxx-3] Wrote 
producer snapshot at offset 6219284 with 0 producer ids in 0 ms. 
(kafka.log.ProducerStateManager) 
[2023-08-18 13:40:09,818] INFO [ProducerStateManager partition=icbm-2] Wrote 
producer snapshot at offset 6208459 with 0 producer ids in 0 ms. 
(kafka.log.ProducerStateManager){code}


was (Author: ocadaruma):
I submitted a patch [https://github.com/apache/kafka/pull/14242] .

In the meantime, I tested above patch (with porting it to 3.3.2, which is the 
version we use) in our experimental environment:
 * Setting:
 ** num.io.threads = 48
 ** incoming byte-rate: 18MB/sec
 ** Adding 300ms artificial write-delay into the device using 
[device-mapper|https://github.com/kawamuray/ddi]
 * Without patch:
 ** !image-2023-08-18-19-23-36-597.png|width=292,height=164!
 ** request-handler idle ratio is below 40%
 ** produce-response time 99.9%ile is over 1 sec
 ** We see producer-state snapshotting takes hundreds of millisecs
 *** 
{code:java}
(snip)
[2023-08-18 13:23:02,552] INFO [ProducerStateManager partition=xxx-3] Wrote 
producer snapshot at offset 3030259 with 0 producer ids in 777 ms. 
(kafka.log.ProducerStateManager)
[2023-08-18 13:23:02,852] INFO [ProducerStateManager partition=xxx-10] Wrote 
producer snapshot at offset 2991767 with 0 producer ids in 678 ms. 
(kafka.log.ProducerStateManager){code}

 * With patch:
 ** !image-2023-08-18-19-29-56-377.png|width=297,height=169!
 ** request-handler idle ratio is kept 75%
 ** produce-response time 99.9%ile is around 100ms
 ** producer-state snapshotting takes few millisecs in most cases
 *** 
{code:java}
(snip)
[2023-08-18 13:40:09,383] INFO [ProducerStateManager partition=xxx-3] Wrote 
producer snapshot at offset 6219284 with 0 producer ids in 0 ms. 
(kafka.log.ProducerStateManager) 
[2023-08-18 13:40:09,818] INFO [ProducerStateManager partition=icbm-2] Wrote 
producer snapshot at offset 6208459 with 0 producer ids in 0 ms. 
(kafka.log.ProducerStateManager){code}

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png, image-2023-08-18-19-23-36-597.png, 
> image-2023-08-18-19-29-56-377.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utiliza

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-08-18 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755904#comment-17755904
 ] 

Haruki Okada commented on KAFKA-15046:
--

I submitted a patch [https://github.com/apache/kafka/pull/14242] .

In the meantime, I tested above patch (with porting it to 3.3.2, which is the 
version we use) in our experimental environment:
 * Setting:
 ** num.io.threads = 48
 ** incoming byte-rate: 18MB/sec
 ** Adding 300ms artificial write-delay into the device using 
[device-mapper|https://github.com/kawamuray/ddi]
 * Without patch:
 ** !image-2023-08-18-19-23-36-597.png|width=292,height=164!
 ** request-handler idle ratio is below 40%
 ** produce-response time 99.9%ile is over 1 sec
 ** We see producer-state snapshotting takes hundreds of millisecs
 *** 
{code:java}
(snip)
[2023-08-18 13:23:02,552] INFO [ProducerStateManager partition=xxx-3] Wrote 
producer snapshot at offset 3030259 with 0 producer ids in 777 ms. 
(kafka.log.ProducerStateManager)
[2023-08-18 13:23:02,852] INFO [ProducerStateManager partition=xxx-10] Wrote 
producer snapshot at offset 2991767 with 0 producer ids in 678 ms. 
(kafka.log.ProducerStateManager){code}

 * With patch:
 ** !image-2023-08-18-19-29-56-377.png|width=297,height=169!
 ** request-handler idle ratio is kept 75%
 ** produce-response time 99.9%ile is around 100ms
 ** producer-state snapshotting takes few millisecs in most cases
 *** 
{code:java}
(snip)
[2023-08-18 13:40:09,383] INFO [ProducerStateManager partition=xxx-3] Wrote 
producer snapshot at offset 6219284 with 0 producer ids in 0 ms. 
(kafka.log.ProducerStateManager) 
[2023-08-18 13:40:09,818] INFO [ProducerStateManager partition=icbm-2] Wrote 
producer snapshot at offset 6208459 with 0 producer ids in 0 ms. 
(kafka.log.ProducerStateManager){code}

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png, image-2023-08-18-19-23-36-597.png, 
> image-2023-08-18-19-29-56-377.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:

[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load

2023-08-18 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15046:
-
Attachment: image-2023-08-18-19-29-56-377.png

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png, image-2023-08-18-19-23-36-597.png, 
> image-2023-08-18-19-29-56-377.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs that writing producer snapshots took 
> hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
> producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
> producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
> (kafka.log.ProducerStateManager)
> ... {cod

[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load

2023-08-18 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15046:
-
Attachment: image-2023-08-18-19-23-36-597.png

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png, image-2023-08-18-19-23-36-597.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs that writing producer snapshots took 
> hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
> producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
> producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
> (kafka.log.ProducerStateManager)
> ... {code}
>  * From the analysis, we summariz

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-08-18 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755898#comment-17755898
 ] 

Haruki Okada commented on KAFKA-15046:
--

After digging into the fsync call paths in detail, I summarized the problem and the 
solutions as below:
h2. Problem
 * Any blocking operation performed while holding UnifiedLog.lock can lead to 
serious performance (even availability) issues, yet there are currently several 
paths that call fsync(2) inside the lock
 ** While the lock is held, all subsequent produces against the 
partition may block
 ** This easily causes all request-handlers to become busy when disk performance is bad
 ** Even worse, when a disk experiences a glitch of tens of seconds (not 
rare with spinning drives), the broker becomes unable to process any requests 
while remaining unfenced from the cluster (i.e. a "zombie"-like status)

h2. Analysis of fsync(2) inside UnifiedLog.lock

First, fsyncs at start-up/shutdown are not a problem, since the broker 
isn't processing requests then.

Given that, there are essentially 4 problematic call paths, listed below:
h3. 1. [ProducerStateManager.takeSnapshot at 
UnifiedLog.roll|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L2133]
 * Here, the solution is simply to move the fsync(2) call to the scheduler thread as part 
of the existing "flush-log" job (before incrementing the recovery point); see the sketch below
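To illustrate the direction for path 1, here is a minimal, hypothetical Java sketch (not Kafka's actual code; the class name, scheduler, lock object and snapshot arguments are stand-ins): the snapshot bytes are written while holding the log lock, but the blocking force()/fsync(2) is left to the scheduler thread that already runs the "flush-log" job, before the recovery point is advanced.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class DeferredSnapshotFsync {
    // Stand-in for the background scheduler that runs the periodic "flush-log" job.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Object logLock = new Object(); // stand-in for UnifiedLog.lock

    /** Called on the request-handler thread while rolling a segment. */
    public void takeSnapshot(Path snapshotFile, ByteBuffer snapshotData) throws IOException {
        synchronized (logLock) {
            // Writing only dirties the page cache, so holding the lock here stays cheap.
            try (FileChannel ch = FileChannel.open(snapshotFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                while (snapshotData.hasRemaining()) {
                    ch.write(snapshotData);
                }
            }
        }
        // The expensive fsync(2) runs off the lock, on the scheduler thread,
        // and must complete before the recovery point is advanced past this snapshot.
        scheduler.execute(() -> {
            try (FileChannel ch = FileChannel.open(snapshotFile, StandardOpenOption.WRITE)) {
                ch.force(true);
            } catch (IOException e) {
                // A real broker would report this through its log-dir failure handling.
                throw new RuntimeException(e);
            }
        });
    }
}
{code}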

h3. 2. [ProducerStateManager.removeAndMarkSnapshotForDeletion as part of log 
segment 
deletion|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L2133]
 * removeAndMarkSnapshotForDeletion calls Utils.atomicMoveWithFallback with 
parent-dir flushing when renaming the snapshot to add the .deleted suffix
 * Here, I suppose we don't need to flush the parent dir (see the sketch after this list).
 * In the worst case, a few producer snapshots that should have been deleted 
remain without the .deleted suffix after an unclean shutdown
 ** In that case, these files will eventually be deleted anyway, so it shouldn't be a big 
problem.

h3. 3. [LeaderEpochFileCache.truncateFromStart when incrementing 
log-start-offset|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L986]
 * This path is called from deleteRecords on request-handler threads.
 * Here, we actually don't need fsync(2) either.
 * Upon unclean shutdown, a few stale leader epochs might remain in the file, but 
they will be [handled by 
LogLoader|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/LogLoader.scala#L185]
 on start-up, so this is not a problem

h3. 4. [LeaderEpochFileCache.truncateFromEnd as part of log 
truncation|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L1663]
 
 * Though this path is called mainly on replica fetcher threads, blocking 
replica fetchers isn't ideal either, since it could cause remote-scope produce 
performance degradation on the leader side
 * Likewise, we don't need fsync(2) here, since any untruncated epochs 
will be handled by the log loading procedure
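For paths 2-4, the common point is that the rename/truncation itself does not need to be made durable synchronously: anything left behind by an unclean shutdown is repaired on the next start-up. As a hypothetical illustration of the path-2 case (the helper name and shape are illustrative, not Kafka's API), the rename could simply skip the directory flush:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class SnapshotRename {
    /**
     * Rename a producer snapshot to mark it for deletion WITHOUT fsync-ing the
     * parent directory. If the broker crashes before the rename reaches disk,
     * the snapshot simply keeps its old name and is cleaned up again later,
     * so durability of the rename itself is not required here.
     */
    static Path markForDeletion(Path snapshot) throws IOException {
        Path deleted = snapshot.resolveSibling(snapshot.getFileName() + ".deleted");
        // Atomic within the filesystem, but no directory flush afterwards.
        return Files.move(snapshot, deleted, StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}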

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 

[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-08-16 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755351#comment-17755351
 ] 

Haruki Okada edited comment on KAFKA-15046 at 8/17/23 4:09 AM:
---

[~junrao] Hi, sorry for the late response.

Thanks for your suggestion.

 

> Another way to improve this is to move the LeaderEpochFile flushing logic to 
> be part of the flushing of rolled segments

 

Yeah, that makes sense.

I think the ProducerState snapshot should then also be unified into the existing flushing 
logic, instead of fsync-ing the ProducerState separately in log.roll (i.e. 
current Kafka behavior) or submitting it to the scheduler separately (i.e. like the 
ongoing patch ([https://github.com/apache/kafka/pull/13782]) does)

 


was (Author: ocadaruma):
[~junrao] Hi, sorry for the late response.

Thanks for your suggestion.

 

> Another way to improve this is to move the LeaderEpochFile flushing logic to 
> be part of the flushing of rolled segments

 

Yeah, that sounds make sense.

I think ProducerState snapshot also should be the unified to existing flushing 
logic then, instead of fsync-ing ProducerState separately in log.roll (i.e. 
current Kafka behavior), nor submitting to scheduler separately (i.e. like 
ongoing patch([https://github.com/apache/kafka/pull/13782]) does)

 

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
>   

[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-08-16 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755351#comment-17755351
 ] 

Haruki Okada edited comment on KAFKA-15046 at 8/17/23 4:08 AM:
---

[~junrao] Hi, sorry for the late response.

Thanks for your suggestion.

 

> Another way to improve this is to move the LeaderEpochFile flushing logic to 
> be part of the flushing of rolled segments

 

Yeah, that sounds make sense.

I think ProducerState snapshot also should be the unified to existing flushing 
logic then, instead of fsync-ing ProducerState separately in log.roll (i.e. 
current Kafka behavior), nor submitting to scheduler separately (i.e. like 
ongoing patch([https://github.com/apache/kafka/pull/13782]) does)

 


was (Author: ocadaruma):
[~junrao] Hi, sorry for the late response.

Thanks for your suggestion.

 

> Another way to improve this is to move the LeaderEpochFile flushing logic to 
> be part of the flushing of rolled segments

 

Yeah, that sounds make sense.

I think ProducerState snapshot also should be the unified to existing flushing 
logic then, instead of fsync-ing ProducerState separately in log.roll (i.e. 
current Kafka behavior), nor submitting to scheduler separately (i.e. like 
[ongoing patch|[https://github.com/apache/kafka/pull/13782]] ongoing patch does)

 

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-08-16 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755351#comment-17755351
 ] 

Haruki Okada commented on KAFKA-15046:
--

[~junrao] Hi, sorry for the late response.

Thanks for your suggestion.

 

> Another way to improve this is to move the LeaderEpochFile flushing logic to 
> be part of the flushing of rolled segments

 

Yeah, that sounds make sense.

I think ProducerState snapshot also should be the unified to existing flushing 
logic then, instead of fsync-ing ProducerState separately in log.roll (i.e. 
current Kafka behavior), nor submitting to scheduler separately (i.e. like 
[ongoing patch|[https://github.com/apache/kafka/pull/13782]] ongoing patch does)

 

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs that writing producer snapshots took 
> hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> produce

[jira] [Commented] (KAFKA-15185) Consumers using the latest strategy may lose data after the topic adds partitions

2023-07-13 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742830#comment-17742830
 ] 

Haruki Okada commented on KAFKA-15185:
--

FYI: this may be a duplicate of https://issues.apache.org/jira/browse/KAFKA-12478 and 
https://issues.apache.org/jira/browse/KAFKA-12261

> Consumers using the latest strategy may lose data after the topic adds 
> partitions
> -
>
> Key: KAFKA-15185
> URL: https://issues.apache.org/jira/browse/KAFKA-15185
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 3.4.1
>Reporter: RivenSun
>Assignee: Luke Chen
>Priority: Major
>
> h2. condition:
> 1. Partitions are added to a business topic.
> 2. The metadata.max.age.ms configuration of both producers and consumers is set 
> to five minutes.
> However, the producer discovered the new partition before the consumer did, and 
> produced 100 messages to the new partition.
> 3. The consumer parameter auto.offset.reset is set to *latest*
> h2. result:
> Consumers will lose these 100 messages.
> First of all, we cannot simply set auto.offset.reset to {*}earliest{*}, 
> because the user's requirement is that a newly subscribed group discards all 
> old messages of the topic.
> However, after the group has subscribed, messages produced to the newly added 
> partitions {*}must be guaranteed not to be lost{*}, similar to starting 
> consumption from the earliest offset.
> h2. suggestion:
> We have set the consumer's metadata.max.age.ms to 1/2 or 1/3 of the 
> producer's metadata.max.age.ms configuration.
> But this still can't solve the problem, because in many cases the producer 
> may force-refresh the metadata.
> Also, a smaller metadata.max.age.ms value will cause more metadata 
> refresh requests, which increases the load on the broker.
> So can we add a parameter to control whether the consumer starts consuming 
> from the earliest or the latest offset for newly added partitions?
> Perhaps during the rebalance, the leader consumer needs to mark which 
> partitions are newly added when calculating the assignment.
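
As an aside, below is a rough client-side sketch of the requested behavior (auto.offset.reset=latest for the initial subscription, but newly created partitions read from the beginning). It assumes a single topic and uses the partitions known at subscribe time as the baseline (a real implementation would persist that baseline across restarts and also check committed offsets before seeking); the topic name, group id, and bootstrap servers are placeholders.
{code:java}
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class NewPartitionsFromEarliest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Baseline: partitions that existed when the group first subscribed.
        Set<Integer> initialPartitions = new HashSet<>();
        for (PartitionInfo p : consumer.partitionsFor("my-topic"))
            initialPartitions.add(p.partition());

        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Partitions created after the initial subscription: start from the
                // beginning so their first records are not skipped by "latest".
                List<TopicPartition> added = new ArrayList<>();
                for (TopicPartition tp : partitions)
                    if (!initialPartitions.contains(tp.partition()))
                        added.add(tp);
                if (!added.isEmpty())
                    consumer.seekToBeginning(added);
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> System.out.printf("%s-%d@%d%n", r.topic(), r.partition(), r.offset()));
        }
    }
}
{code}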



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2023-06-06 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada reassigned KAFKA-14445:


Assignee: Haruki Okada

> Producer doesn't request metadata update on REQUEST_TIMED_OUT
> -
>
> Key: KAFKA-14445
> URL: https://issues.apache.org/jira/browse/KAFKA-14445
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>
> Produce requests may fail with a timeout governed by `request.timeout.ms` in the 
> two cases below:
>  * The producer didn't receive a produce response within `request.timeout.ms`
>  * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in 
> the broker
> The former case usually happens when a broker machine fails or there's a 
> network glitch, etc.
> In this case, the connection is disconnected and a metadata update is 
> requested to discover the new leader: 
> [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]
>  
> The problem is the latter case (REQUEST_TIMED_OUT on the broker).
> In this case, the produce request ends up with a TimeoutException, 
> which doesn't inherit InvalidMetadataException, so it doesn't trigger a 
> metadata update.
>  
> A typical cause of REQUEST_TIMED_OUT is replication delay due to a follower-side 
> problem, in which case a metadata update indeed doesn't make much sense.
>  
> However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT 
> could cause produce requests to retry unnecessarily, which may end up with 
> batch expiration due to the delivery timeout.
> Below is the scenario we experienced:
>  * Environment:
>  ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
>  ** min.insync.replicas=2
>  ** acks=all
>  * Scenario:
>  ** broker 1 "partially" failed
>  *** It lost its ZooKeeper connection and was kicked out from the cluster
>   There was a controller log like:
>  * 
> {code:java}
> [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
> deleted brokers: 1, bounced brokers: {code}
>  * 
>  ** 
>  *** However, somehow the broker was able to continue receiving produce 
> requests
>   We're still investigating how this is possible.
>   Indeed, broker 1 was somewhat "alive" and kept working according to 
> server.log
>  *** In other words, broker 1 became a "zombie"
>  ** broker 2 was elected as the new leader
>  *** broker 3 became a follower of broker 2
>  *** However, since broker 1 was still out of the cluster, it didn't receive 
> LeaderAndIsr, so broker 1 kept considering itself the leader of tp-0
>  ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
> requests failed with REQUEST_TIMED_OUT because no broker replicated 
> from broker 1.
>  *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer didn't 
> have a chance to update its stale metadata
>  
> So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT error, 
> to address the case where the old leader has become a "zombie".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2023-06-06 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729695#comment-17729695
 ] 

Haruki Okada commented on KAFKA-14445:
--

[~kirktrue] Thanks for your patch about 
https://issues.apache.org/jira/browse/KAFKA-14317 .

 

However, if I read the patch correctly, I think our original issue should be 
addressed separately.

Our issue was the case where the producer receives a REQUEST_TIMED_OUT response 
(i.e. the request timed out inside the purgatory while waiting for replication), 
rather than a NetworkClient-level timeout.

So I think the || clause here 
([https://github.com/apache/kafka/pull/12813#discussion_r1048223644]) is still 
necessary, despite the discussion there.

 

Though this is kind of an extreme edge case, I would like to solve it anyway, as 
it caused batch expiration on our producer.

 

I'll submit a follow-up patch.
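
To make the follow-up concrete, here is a hypothetical sketch (not the actual Sender code) of the idea: treat REQUEST_TIMED_OUT like the invalid-metadata errors when deciding whether to ask for a metadata refresh.
{code:java}
// Hypothetical sketch, not the actual producer internals.
import org.apache.kafka.common.errors.InvalidMetadataException;
import org.apache.kafka.common.protocol.Errors;

final class TimedOutMetadataRefresh {

    // True if a produce error should trigger a metadata update. TimeoutException
    // does not extend InvalidMetadataException, hence the explicit extra check.
    static boolean shouldRefreshMetadata(Errors error) {
        return error.exception() instanceof InvalidMetadataException
                || error == Errors.REQUEST_TIMED_OUT;
    }

    // In the producer's response handling this would roughly translate to:
    //   if (shouldRefreshMetadata(error)) {
    //       metadata.requestUpdate(); // rediscover the leader on the next poll
    //   }
}
{code}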

> Producer doesn't request metadata update on REQUEST_TIMED_OUT
> -
>
> Key: KAFKA-14445
> URL: https://issues.apache.org/jira/browse/KAFKA-14445
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Priority: Major
>
> Produce requests may fail with a timeout governed by `request.timeout.ms` in the 
> two cases below:
>  * The producer didn't receive a produce response within `request.timeout.ms`
>  * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in 
> the broker
> The former case usually happens when a broker machine fails or there's a 
> network glitch, etc.
> In this case, the connection is disconnected and a metadata update is 
> requested to discover the new leader: 
> [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]
>  
> The problem is the latter case (REQUEST_TIMED_OUT on the broker).
> In this case, the produce request ends up with a TimeoutException, 
> which doesn't inherit InvalidMetadataException, so it doesn't trigger a 
> metadata update.
>  
> A typical cause of REQUEST_TIMED_OUT is replication delay due to a follower-side 
> problem, in which case a metadata update indeed doesn't make much sense.
>  
> However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT 
> could cause produce requests to retry unnecessarily, which may end up with 
> batch expiration due to the delivery timeout.
> Below is the scenario we experienced:
>  * Environment:
>  ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
>  ** min.insync.replicas=2
>  ** acks=all
>  * Scenario:
>  ** broker 1 "partially" failed
>  *** It lost its ZooKeeper connection and was kicked out from the cluster
>   There was a controller log like:
>  * 
> {code:java}
> [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
> deleted brokers: 1, bounced brokers: {code}
>  * 
>  ** 
>  *** However, somehow the broker was able to continue receiving produce 
> requests
>   We're still investigating how this is possible.
>   Indeed, broker 1 was somewhat "alive" and kept working according to 
> server.log
>  *** In other words, broker 1 became a "zombie"
>  ** broker 2 was elected as the new leader
>  *** broker 3 became a follower of broker 2
>  *** However, since broker 1 was still out of the cluster, it didn't receive 
> LeaderAndIsr, so broker 1 kept considering itself the leader of tp-0
>  ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
> requests failed with REQUEST_TIMED_OUT because no broker replicated 
> from broker 1.
>  *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer didn't 
> have a chance to update its stale metadata
>  
> So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT error, 
> to address the case where the old leader has become a "zombie".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-06-05 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729561#comment-17729561
 ] 

Haruki Okada commented on KAFKA-15046:
--

I see, thank you for pointing that out.

Hmm, now I agree that just making the file descriptor fsync call asynchronous 
should be fine.

(I still wonder whether we could move LeaderEpochFileCache's method calls outside 
of Log.lock, because the underlying CheckpointFile already does its own exclusive 
control 
([https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L56]).
 Still, carefully checking all paths that call fsync in order to move them outside 
of Log.lock is too error-prone and probably hard to maintain.)

 

May I assign this issue to myself?

I would like to submit a patch to make LeaderEpochFile's fsync asynchronous. 
(For the ProducerState snapshot, [https://github.com/apache/kafka/pull/13782] 
should already cover it.)
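
For reference, below is a minimal sketch with made-up names (not the real LeaderEpochCheckpointFile / KafkaScheduler classes) of what "async fsync" means here: the checkpoint content is still written on the caller's thread, but only the expensive force() call is handed off to a background thread.
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncFsyncCheckpoint {
    private final ExecutorService flusher =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "checkpoint-flusher"));

    // Called on the hot path (e.g. under the log lock): cheap buffered write, no fsync.
    void writeCheckpoint(Path file, String content) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            var buf = StandardCharsets.UTF_8.encode(content);
            while (buf.hasRemaining())
                ch.write(buf);
        }
        // Only durability is deferred: a crash before the fsync completes loses at
        // most the latest checkpoint, which can be rebuilt from the log on restart.
        flusher.submit(() -> {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
            } catch (IOException e) {
                throw new UncheckedIOException(e); // real broker: LogDirFailureChannel
            }
        });
    }
}
{code}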

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs t

[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-06-02 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728551#comment-17728551
 ] 

Haruki Okada edited comment on KAFKA-15046 at 6/2/23 7:19 AM:
--

[~showuon]  Maybe I linked the wrong file.

What I had in mind is to make any LeaderEpochFileCache method that needs flush() 
be called outside of the Log's global lock.

LeaderEpochFileCache already does its own exclusive control via its RW lock, so I 
think we don't need to call it inside the Log's global lock.

[https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L44]


was (Author: ocadaruma):
[~showuon]  Maybe I linked wrong file.

What I thought is to make any LeaderEpochFileCache methods (which needs 
flush()) to be called outside of Log's global lock.

LeaderEpochFileCache already does exclusive control by its RW lock so I think 
we don't need to call it inside the Log's global lock.

[https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L44]

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHan

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728551#comment-17728551
 ] 

Haruki Okada commented on KAFKA-15046:
--

[~showuon]  Maybe I linked the wrong file.

What I had in mind is to make any LeaderEpochFileCache method (that needs 
flush()) be called outside of the Log's global lock.

LeaderEpochFileCache already does its own exclusive control via its RW lock, so I 
think we don't need to call it inside the Log's global lock.

[https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L44]
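
Schematically (hypothetical names, not the real UnifiedLog / LeaderEpochFileCache code), the ordering suggested above would look like this: mutate the in-memory cache and take a copy of its contents while holding the log lock, then write and fsync the checkpoint file after releasing it, relying on the cache's RW lock and the checkpoint file's own synchronization for mutual exclusion.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class EpochCache {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final List<long[]> epochs = new ArrayList<>(); // {epoch, startOffset} pairs

    // Mutates the in-memory cache only and returns a copy to checkpoint later.
    // (Simplified: drops every entry below the new start offset.)
    List<long[]> truncateFromStart(long newStartOffset) {
        rwLock.writeLock().lock();
        try {
            epochs.removeIf(e -> e[1] < newStartOffset);
            return List.copyOf(epochs);
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}

class CheckpointFile {
    // Serializes concurrent writers itself, so no outer lock is required.
    synchronized void write(List<long[]> entries) {
        // write to a temp file, fsync, atomic rename (omitted in this sketch)
    }
}

class LogSketch {
    private final Object lock = new Object();
    private final EpochCache cache = new EpochCache();
    private final CheckpointFile checkpoint = new CheckpointFile();

    void maybeIncrementLogStartOffset(long newStartOffset) {
        List<long[]> snapshot;
        synchronized (lock) {               // fast: in-memory update only
            snapshot = cache.truncateFromStart(newStartOffset);
        }
        checkpoint.write(snapshot);         // slow: file write + fsync, log lock not held
    }
}
{code}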

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs that writing producer snapshots took 
> hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] W

[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728515#comment-17728515
 ] 

Haruki Okada edited comment on KAFKA-15046 at 6/2/23 4:01 AM:
--

Yeah, io_uring is promising.

However, it only works with newer kernels (which may not be easy to update for 
some on-premises Kafka users) and would require rewriting a lot of the code base.

-For leader-epoch cache, the checkpointing is already done in scheduler thread 
so we should adopt solution2 I think-

For the leader epoch cache, some paths already do checkpointing asynchronously 
(e.g. UnifiedLog.deleteOldSegments => UnifiedLog.maybeIncrementLogStartOffset 
=> LeaderEpochFileCache.truncateFromStart on the Kafka scheduler), so I think 
we have to make fsync be called outside of the lock (i.e. solution 2) anyway.

 

Writing to CheckpointFile is already synchronized, so can't we just move the 
checkpointing outside of the lock? 
[https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76]


was (Author: ocadaruma):
Yeah, io_uring is promising.

However it only works with newer kernel (which some on-premises Kafka users may 
not be easy to update) and require rewriting a lot of parts of the code base.

For leader-epoch cache, the checkpointing is already done in scheduler thread 
so we should adopt solution2 I think.

Writing to CheckpointFile is already synchronized, so can't we just move 
checkpointing to outside of the lock? 
[https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76]

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptim

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728515#comment-17728515
 ] 

Haruki Okada commented on KAFKA-15046:
--

Yeah, io_uring is promising.

However, it only works with newer kernels (which may not be easy to update for 
some on-premises Kafka users) and would require rewriting a lot of the code base.

For the leader-epoch cache, the checkpointing is already done in a scheduler 
thread, so I think we should adopt solution 2.

Writing to CheckpointFile is already synchronized, so can't we just move the 
checkpointing outside of the lock? 
https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs that writing producer snapshots took 
> hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids 

[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728515#comment-17728515
 ] 

Haruki Okada edited comment on KAFKA-15046 at 6/2/23 12:01 AM:
---

Yeah, io_uring is promising.

However, it only works with newer kernels (which may not be easy to update for 
some on-premises Kafka users) and would require rewriting a lot of the code base.

For the leader-epoch cache, the checkpointing is already done in a scheduler 
thread, so I think we should adopt solution 2.

Writing to CheckpointFile is already synchronized, so can't we just move the 
checkpointing outside of the lock? 
[https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76]


was (Author: ocadaruma):
Yeah, io_uring is promising.

However it only works with newer kernel (which some on-premises Kafka users may 
not be easy to update) and require a lot of parts of the code base.

For leader-epoch cache, the checkpointing is already done in scheduler thread 
so we should adopt solution2 I think.

Writing to CheckpointFile is already synchronized, so can't we just move 
checkpointing to outside of the lock? 
https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.Repl

[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15046:
-
Description: 
* Phenomenon:
 ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
 ** Producer response time 99%ile got quite bad when we performed replica 
reassignment on the cluster
 *** RequestQueue scope was significant
 ** Also, request-time throttling happened around the same time. This caused 
producers to delay sending messages in the meantime.
 ** The disk I/O latency was higher than usual due to the high load for replica 
reassignment.
 *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
 * Analysis:
 ** The request-handler utilization was much higher than usual.
 *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
 ** Also, thread time utilization was much higher than usual on almost all users
 *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
 ** From taking jstack several times, in most of them we found that a 
request-handler was doing fsync for flushing ProducerState while other 
request-handlers were waiting on Log#lock to append messages.

 * 
 ** 
 *** 
{code:java}
"data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
runnable  [0x7ef9a12e2000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
at 
kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
at 
kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
at 
kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
Source)
at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}

 * 
 ** Also, there were a bunch of logs showing that writing producer snapshots took 
hundreds of milliseconds.
 *** 
{code:java}
...
[2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
(kafka.log.ProducerStateManager)
... {code}

 * From the analysis, we summarized the issue as below:

 * 
 ** 1. Disk write latency got worse due to the replica reassignment
 *** We already use a replication quota, and lowering the quota further may not 
be acceptable because the assignment would take too long
 ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync 
latency
 *** This is done at every log segment roll.
 *** In our case, the broker hosts high-load partitions, so log rolls occur 
very frequently.
 ** 3. While ProducerStateManager#takeSnapshot is doing fsync, all subsequent 
produce requests to the partition are blocked on Log#lock
 ** 4. While produce requests are waiting for the lock, they consume request 
handler thread time, so it's accounted as thread-time utilization and caused 
throttling
 * Suggestion:
 ** We didn't see this phenomenon when we used Kafka 2.4.1.
 *** ProducerState fsync was introduced i

[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728266#comment-17728266
 ] 

Haruki Okada edited comment on KAFKA-15046 at 6/1/23 8:39 AM:
--

Hm, when I dug further into this, I noticed there's another path that causes 
essentially the same phenomenon.

 
{code:java}
"data-plane-kafka-request-handler-17" #169 daemon prio=5 os_prio=0 
cpu=50994542.49ms elapsed=595635.65s tid=0x7efdaebabe30 nid=0x1e707 
runnable  [0x7ef9a0fdf000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
        at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
        at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
        at org.apache.kafka.common.utils.Utils.flushDir(Utils.java:966)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:951)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:925)
        at 
org.apache.kafka.server.common.CheckpointFile.write(CheckpointFile.java:98)
        - locked <0x000680fc4930> (a java.lang.Object)
        at 
kafka.server.checkpoints.CheckpointFileWithFailureHandler.write(CheckpointFileWithFailureHandler.scala:37)
        at 
kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:71)
        at 
kafka.server.epoch.LeaderEpochFileCache.flush(LeaderEpochFileCache.scala:291)
        at 
kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3(LeaderEpochFileCache.scala:263)
        at 
kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3$adapted(LeaderEpochFileCache.scala:259)
        at 
kafka.server.epoch.LeaderEpochFileCache$$Lambda$571/0x00080045f040.apply(Unknown
 Source)
        at scala.Option.foreach(Option.scala:437)
        at 
kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$1(LeaderEpochFileCache.scala:259)
        at 
kafka.server.epoch.LeaderEpochFileCache.truncateFromStart(LeaderEpochFileCache.scala:254)
        at 
kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4(UnifiedLog.scala:1043)
        at 
kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4$adapted(UnifiedLog.scala:1043)
        at kafka.log.UnifiedLog$$Lambda$2324/0x000800b59040.apply(Unknown 
Source)
        at scala.Option.foreach(Option.scala:437)
        at 
kafka.log.UnifiedLog.maybeIncrementLogStartOffset(UnifiedLog.scala:1043)
        - locked <0x000680fc5080> (a java.lang.Object)
        at 
kafka.cluster.Partition.$anonfun$deleteRecordsOnLeader$1(Partition.scala:1476)
        at kafka.cluster.Partition.deleteRecordsOnLeader(Partition.scala:1463)
        at 
kafka.server.ReplicaManager.$anonfun$deleteRecordsOnLocalLog$2(ReplicaManager.scala:687)
        at 
kafka.server.ReplicaManager$$Lambda$3156/0x000800d7c840.apply(Unknown 
Source)
        at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
        at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
        at scala.collection.mutable.HashMap.map(HashMap.scala:35)
        at 
kafka.server.ReplicaManager.deleteRecordsOnLocalLog(ReplicaManager.scala:680)
        at kafka.server.ReplicaManager.deleteRecords(ReplicaManager.scala:875)
        at 
kafka.server.KafkaApis.handleDeleteRecordsRequest(KafkaApis.scala:2216)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:196)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
        at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
 

LeaderEpoch checkpointing also calls fsync while holding Log#lock, blocking 
request-handler threads from appending in the meantime.

 

This is called by the scheduler thread on log-segment breaching, so it might be 
less frequent than log rolls though.

Does it make sense to also make the LeaderEpochCheckpointFile flush happen 
outside of the lock?


was (Author: ocadaruma):
Hm, when I dug into further this, I noticed there's another path that causes 
essentially same phenomenon.

 
{code:java}
"data-plane-kafka-request-handler-17" #169 daemon prio=5 os_prio=0 
cpu=50994542.49ms elapsed=595635.65s tid=0x7efdaebabe30 nid=0x1e707 
runnable  [0x7ef9a0fdf000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
        at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
        at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
        at org.apache.kafka.common.utils.Utils.flushDir(Utils.java:966)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:951)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:925)
        at 
org.apache.kafk

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728266#comment-17728266
 ] 

Haruki Okada commented on KAFKA-15046:
--

Hm, when I dug further into this, I noticed there's another path that causes 
essentially the same phenomenon.

 
{code:java}
"data-plane-kafka-request-handler-17" #169 daemon prio=5 os_prio=0 
cpu=50994542.49ms elapsed=595635.65s tid=0x7efdaebabe30 nid=0x1e707 
runnable  [0x7ef9a0fdf000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
        at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
        at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
        at org.apache.kafka.common.utils.Utils.flushDir(Utils.java:966)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:951)
        at 
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:925)
        at 
org.apache.kafka.server.common.CheckpointFile.write(CheckpointFile.java:98)
        - locked <0x000680fc4930> (a java.lang.Object)
        at 
kafka.server.checkpoints.CheckpointFileWithFailureHandler.write(CheckpointFileWithFailureHandler.scala:37)
        at 
kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:71)
        at 
kafka.server.epoch.LeaderEpochFileCache.flush(LeaderEpochFileCache.scala:291)
        at 
kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3(LeaderEpochFileCache.scala:263)
        at 
kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3$adapted(LeaderEpochFileCache.scala:259)
        at 
kafka.server.epoch.LeaderEpochFileCache$$Lambda$571/0x00080045f040.apply(Unknown
 Source)
        at scala.Option.foreach(Option.scala:437)
        at 
kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$1(LeaderEpochFileCache.scala:259)
        at 
kafka.server.epoch.LeaderEpochFileCache.truncateFromStart(LeaderEpochFileCache.scala:254)
        at 
kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4(UnifiedLog.scala:1043)
        at 
kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4$adapted(UnifiedLog.scala:1043)
        at kafka.log.UnifiedLog$$Lambda$2324/0x000800b59040.apply(Unknown 
Source)
        at scala.Option.foreach(Option.scala:437)
        at 
kafka.log.UnifiedLog.maybeIncrementLogStartOffset(UnifiedLog.scala:1043)
        - locked <0x000680fc5080> (a java.lang.Object)
        at 
kafka.cluster.Partition.$anonfun$deleteRecordsOnLeader$1(Partition.scala:1476)
        at kafka.cluster.Partition.deleteRecordsOnLeader(Partition.scala:1463)
        at 
kafka.server.ReplicaManager.$anonfun$deleteRecordsOnLocalLog$2(ReplicaManager.scala:687)
        at 
kafka.server.ReplicaManager$$Lambda$3156/0x000800d7c840.apply(Unknown 
Source)
        at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
        at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
        at scala.collection.mutable.HashMap.map(HashMap.scala:35)
        at 
kafka.server.ReplicaManager.deleteRecordsOnLocalLog(ReplicaManager.scala:680)
        at kafka.server.ReplicaManager.deleteRecords(ReplicaManager.scala:875)
        at 
kafka.server.KafkaApis.handleDeleteRecordsRequest(KafkaApis.scala:2216)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:196)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
        at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
 

LeaderEpoch checkpointing also calls fsync while holding Log#lock, blocking 
request-handler threads from appending in the meantime.

 

This is called by the scheduler thread on log-segment breaching, so it might be 
less frequent than log rolls though.

Does it make sense to also make the LeaderEpochCheckpointFile flush 
asynchronous?

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
>  Labels: performance
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728262#comment-17728262
 ] 

Haruki Okada commented on KAFKA-15046:
--

Oh, I hadn't noticed there's another ticket and that a fix is already available.

Thank you, I will take a look!

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flushing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** There were also many log entries showing that writing producer snapshots 
> took hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
> producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
> producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
> (kafka.log.ProducerStateManager)
> ... {code}
>  * From the analysis, we summarized the issue as 

[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load

2023-06-01 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15046:
-
Description: 
* Phenomenon:
 ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
 ** The 99th-percentile producer response time got quite bad when we performed 
replica reassignment on the cluster
 *** The RequestQueue scope was the most significant contributor
 ** Request-time throttling also kicked in around the time of the incident, 
which caused producers to delay sending messages in the meantime.
 ** The disk I/O latency was higher than usual due to the high load from 
replica reassignment.
 *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
 * Analysis:
 ** The request-handler utilization was much higher than usual.
 *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
 ** Also, thread-time utilization was much higher than usual for almost all users
 *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
 ** From several jstack samples, in most of them we found one request handler 
doing an fsync to flush the ProducerState while other request handlers were 
waiting on Log#lock to append messages.

 * 
 ** 
 *** 
{code:java}
"data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
runnable  [0x7ef9a12e2000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
at 
kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
at 
kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
at 
kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
Source)
at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}

 * 
 ** There were also many log entries showing that writing producer snapshots 
took hundreds of milliseconds.
 *** 
{code:java}
...
[2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
(kafka.log.ProducerStateManager)
... {code}

 * From the analysis, we summarized the issue as below:

 * 
 ** 1. Disk write latency got worse due to the replica reassignment
 *** We already use a replication quota, and lowering it further may not be 
acceptable because the reassignment would take too long
 ** 2. ProducerStateManager#takeSnapshot started to take a long time due to 
fsync latency
 *** This is done at every log segment roll.
 *** In our case, the broker hosts high-load partitions, so log rolls occur 
very frequently.
 ** 3. While ProducerStateManager#takeSnapshot is doing the fsync, all 
subsequent produce requests to the partition are blocked on Log#lock (see the 
sketch after this list)
 ** 4. While produce requests wait for the lock, they occupy request-handler 
threads; that time is accounted as thread time and triggered throttling
 * Suggestion:
 ** We didn't see this phenomenon when we used Kafka 2.4.1.
 *** ProducerState fsync was introduced in 2.8.0 by this: 
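
To make the contention described in items 3 and 4 above concrete, here is a 
minimal, self-contained sketch (an editor's illustration, not Kafka's actual 
UnifiedLog or ProducerStateManager code; the class and method names are 
invented): one thread fsyncs a snapshot while holding the per-log lock, so 
every append that needs the same lock stalls for the full fsync duration.

{code:java}
// Illustration only: an fsync performed while holding the per-log lock.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncUnderLockSketch {
    private final Object lock = new Object();          // plays the role of Log#lock
    private final FileChannel snapshotChannel;

    public FsyncUnderLockSketch(Path snapshotFile) throws IOException {
        this.snapshotChannel = FileChannel.open(snapshotFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Roll path: writes a "snapshot" and fsyncs it while still holding the lock.
    public void rollAndSnapshot() throws IOException {
        synchronized (lock) {
            snapshotChannel.write(ByteBuffer.wrap("snapshot".getBytes(StandardCharsets.UTF_8)));
            snapshotChannel.force(true);                // can take hundreds of ms under disk load
        }
    }

    // Append path: every produce request needs the same lock, so it waits for the
    // whole fsync above to finish while occupying its request-handler thread.
    public void append(byte[] record) {
        synchronized (lock) {
            // append the record to the active segment (omitted)
        }
    }
}
{code}

The time those waiting appends spend blocked is charged to their 
request-handler threads, which is how the fsync latency surfaces as thread-time 
throttling in item 4.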

[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load

2023-05-31 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15046:
-
Description: 
* Phenomenon:
 ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
 ** The 99th-percentile producer response time got quite bad when we performed 
replica reassignment on the cluster
 *** The RequestQueue scope was the most significant contributor
 ** Request-time throttling also kicked in around the time of the incident, 
which caused producers to delay sending messages.
 ** At the time of the incident, the disk I/O latency was higher than usual due 
to the high load from replica reassignment.
 *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
 * Analysis:
 ** The request-handler utilization was much higher than usual.
 *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
 ** Also, thread-time utilization was much higher than usual for almost all users
 *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
 ** From several jstack samples, in most of them we found one request handler 
doing an fsync to flush the ProducerState while other request handlers were 
waiting on Log#lock to append messages.

 * 
 ** 
 *** 
{code:java}
"data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
runnable  [0x7ef9a12e2000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
at 
kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
at 
kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
at 
kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
Source)
at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}

 * 
 ** There were also many log entries showing that writing producer snapshots 
took hundreds of milliseconds.
 *** 
{code:java}
...
[2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
(kafka.log.ProducerStateManager)
... {code}

 * From the analysis, we summarized the issue as below:

 * 
 ** 1. Disk write latency got worse due to the replica reassignment
 *** We already use a replication quota, and lowering it further may not be 
acceptable because the reassignment would take too long
 ** 2. ProducerStateManager#takeSnapshot started to take a long time due to 
fsync latency
 *** This is done at every log segment roll.
 *** In our case, the broker hosts high-load partitions, so log rolls occur 
very frequently.
 ** 3. While ProducerStateManager#takeSnapshot is doing the fsync, all 
subsequent produce requests to the partition are blocked on Log#lock
 ** 4. While produce requests wait for the lock, they occupy request-handler 
threads; that time is accounted as thread time and triggered throttling
 * Suggestion:
 ** We didn't see this phenomenon when we used Kafka 2.4.1.
 *** ProducerState fsync was

[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load

2023-05-31 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728214#comment-17728214
 ] 

Haruki Okada commented on KAFKA-15046:
--

If the suggestion (stop fsync-ing) makes sense, I'm happy to submit a patch.
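
For reference, here is a minimal sketch of what "stop fsync-ing" means at the 
file level (an editor's illustration, not Kafka's ProducerStateManager code; 
the class name and the boolean flag are invented): the snapshot bytes are still 
written, and only the explicit force() call that waits for the disk is skipped, 
leaving the flush to the OS page cache.

{code:java}
// Illustration only: writing a snapshot file with and without an explicit fsync.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SnapshotWriteSketch {
    private SnapshotWriteSketch() {}

    static void writeSnapshot(Path file, ByteBuffer data, boolean fsync) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            while (data.hasRemaining()) {
                ch.write(data);          // cheap: goes to the page cache
            }
            if (fsync) {
                ch.force(true);          // expensive: blocks until the disk acknowledges the write
            }
        }
    }
}
{code}

The trade-off is snapshot durability across a crash; since producer state can 
be rebuilt from the log segments on recovery, skipping the force() here is the 
kind of change the suggestion points at.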

> Produce performance issue under high disk load
> --
>
> Key: KAFKA-15046
> URL: https://issues.apache.org/jira/browse/KAFKA-15046
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.3.2
>Reporter: Haruki Okada
>Priority: Major
> Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** The 99th-percentile producer response time got quite bad when we 
> performed replica reassignment on the cluster
>  *** The RequestQueue scope was the most significant contributor
>  ** Request-time throttling also happened almost all the time, which caused 
> producers to delay sending messages.
>  ** At the time of the incident, the disk I/O latency was higher than usual 
> due to the high load from replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread-time utilization was much higher than usual for almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From several jstack samples, in most of them we found one request handler 
> doing an fsync to flush the ProducerState while other request handlers were 
> waiting on Log#lock to append messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
> runnable  [0x7ef9a12e2000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
> at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at 
> kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
> Source)
> at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** There were also many log entries showing that writing producer snapshots 
> took hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
> producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
> producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
> (kafka.log.ProducerStateManager)
> ... {code}
>  * From the analysis, we summarized the issue as below:

[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load

2023-05-31 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-15046:
-
Description: 
* Phenomenon:
 ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
 ** The 99th-percentile producer response time got quite bad when we performed 
replica reassignment on the cluster
 *** The RequestQueue scope was the most significant contributor
 ** Request-time throttling also happened almost all the time, which caused 
producers to delay sending messages.
 ** At the time of the incident, the disk I/O latency was higher than usual due 
to the high load from replica reassignment.
 *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
 * Analysis:
 ** The request-handler utilization was much higher than usual.
 *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
 ** Also, thread-time utilization was much higher than usual for almost all users
 *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
 ** From several jstack samples, in most of them we found one request handler 
doing an fsync to flush the ProducerState while other request handlers were 
waiting on Log#lock to append messages.

 * 
 ** 
 *** 
{code:java}
"data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
runnable  [0x7ef9a12e2000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
at 
kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
at 
kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
at 
kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
Source)
at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}

 * 
 ** There were also many log entries showing that writing producer snapshots 
took hundreds of milliseconds.
 *** 
{code:java}
...
[2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
(kafka.log.ProducerStateManager)
... {code}

 * From the analysis, we summarized the issue as below:

 * 
 ** 1. Disk write latency got worse due to the replica reassignment
 *** We already use a replication quota, and lowering it further may not be 
acceptable because the reassignment would take too long
 ** 2. ProducerStateManager#takeSnapshot started to take a long time due to 
fsync latency
 *** This is done at every log segment roll.
 *** In our case, the broker hosts high-load partitions, so log rolls occur 
very frequently.
 ** 3. While ProducerStateManager#takeSnapshot is doing the fsync, all 
subsequent produce requests to the partition are blocked on Log#lock
 ** 4. While produce requests wait for the lock, they occupy request-handler 
threads; that time is accounted as thread time and triggered throttling
 * Suggestion:
 ** We didn't see this phenomenon when we used Kafka 2.4.1.
 *** ProducerState fsync was int

[jira] [Created] (KAFKA-15046) Produce performance issue under high disk load

2023-05-31 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15046:


 Summary: Produce performance issue under high disk load
 Key: KAFKA-15046
 URL: https://issues.apache.org/jira/browse/KAFKA-15046
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.3.2
Reporter: Haruki Okada
 Attachments: image-2023-06-01-12-46-30-058.png, 
image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
image-2023-06-01-12-56-19-108.png

* Phenomenon:
 ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
 ** The 99th-percentile producer response time got quite bad when we performed 
replica reassignment on the cluster
 *** The RequestQueue scope was the most significant contributor
 ** Request-time throttling also happened almost all the time, which caused 
producers to delay sending messages.
 ** At the time of the incident, the disk I/O latency was higher than usual due 
to the high load from replica reassignment.
 *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
 * Analysis:
 ** The request-handler utilization was much higher than usual.
 *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
 ** Also, thread-time utilization was much higher than usual for almost all users
 *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
 ** From several jstack samples, in most of them we found one request handler 
doing an fsync to flush the ProducerState while other request handlers were 
waiting on Log#lock to append messages.

 *** 
{code:java}
"data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
runnable  [0x7ef9a12e2000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
at 
kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
at 
kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
at 
kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
Source)
at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}

 ** There were also many log entries showing that writing producer snapshots 
took hundreds of milliseconds.
 *** 
{code:java}
...
[2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
(kafka.log.ProducerStateManager)
... {code}

 * From the analysis, we summarized the issue as below:

 ** 1. Disk write latency got worse due to the replica reassignment
 *** We already use a replication quota, and lowering it further may not be 
acceptable because the reassignment would take too long
 ** 2. ProducerStateManager#takeSnapshot started to take a long time due to 
fsync latency
 *** This is done at every log segment roll.
 *** In our case, the broker hosts hundreds of partition leaders with high 
load, so log rolls occur very frequently.
 ** 3. During ProducerStateManager#takeSnapshot is doing fsync

[jira] [Commented] (KAFKA-14757) Kafka Cooperative Sticky Assignor results in significant duplicate consumption

2023-02-23 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693013#comment-17693013
 ] 

Haruki Okada commented on KAFKA-14757:
--

Possibly related: https://issues.apache.org/jira/browse/KAFKA-9382

 

In cooperative rebalancing, consumers can keep processing new messages even 
while a rebalance is in progress, but offset commits are rejected during that 
window, which can surface as duplicate consumption (see the sketch below).
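
For illustration, a minimal consumer loop (an editor's sketch; the bootstrap 
address, group id, and topic name are made-up placeholders) showing how a 
rejected commit during a cooperative rebalance turns into duplicates: records 
polled before the rebalance are already processed, but their offsets never get 
committed, so the next owner of the partition reads them again.

{code:java}
// Editor's sketch: duplicate consumption when a commit is rejected mid-rebalance.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CooperativeDuplicateSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // side effects happen here, before the commit
                }
                try {
                    consumer.commitSync();
                } catch (CommitFailedException e) {
                    // A rebalance happened between poll() and commitSync(): the offsets of the
                    // records processed above are NOT advanced, so whichever consumer owns these
                    // partitions next re-reads them -> observed as duplicate consumption.
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // application logic (omitted)
    }
}
{code}

Depending on the client version, commitSync() may instead throw a retriable 
RebalanceInProgressException in this situation; either way, the offsets of the 
already-processed records are not advanced.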

> Kafka Cooperative Sticky Assignor results in significant duplicate consumption
> --
>
> Key: KAFKA-14757
> URL: https://issues.apache.org/jira/browse/KAFKA-14757
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 3.1.1
> Environment: AWS MSK (broker) and Spring Kafka (2.8.7) for use in 
> Spring Boot consumers.
>Reporter: Siddharth Anand
>Priority: Critical
>
> Details may be found within the linked document:
> [Kafka Cooperative Sticky Assignor Issue : Duplicate Consumption | 
> [https://docs.google.com/document/d/1E7qAwGOpF8jo_YhF4NwUx9CXxUGJmT8OhHEqIg7-GfI/edit?usp=sharing]]
> In a nutshell, we noticed that the Cooperative Sticky Assignor resulted in 
> significant duplicate message consumption. During last year's F1 Grand Prix 
> events and World Cup soccer events, our company's Kafka-based platform 
> received live traffic. This live traffic, coupled with autoscaled consumers, 
> resulted in as much as 70% duplicate message consumption at the Kafka 
> consumers. 
> In December 2022, we ran a synthetic load test to confirm that duplicate 
> message consumption occurs during consumer scale out/in and Kafka partition 
> rebalancing when using the Cooperative Sticky Assignor. This issue does not 
> occur when using the Range Assignor.
>  





[jira] [Commented] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644055#comment-17644055
 ] 

Haruki Okada commented on KAFKA-14445:
--

[~kirktrue] Oh I was not aware of KAFKA-14317. Thanks.

 

> Is there more involved in your patch that is not in the above PR

 

No.

However, as mentioned in KAFKA-10228, changing error type could be considered 
as breaking change so may need more discussions I guess.

My plan was just requesting metadata update on REQUEST_TIMED_OUT response as 
well, without changing the error type so more trivial.

 

> Producer doesn't request metadata update on REQUEST_TIMED_OUT
> -
>
> Key: KAFKA-14445
> URL: https://issues.apache.org/jira/browse/KAFKA-14445
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Priority: Major
>
> Produce requests may fail with a timeout after `request.timeout.ms` in the 
> following two cases:
>  * Didn't receive a produce response within `request.timeout.ms`
>  * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` 
> in the broker
> The former case usually happens when a broker machine fails or there's a 
> network glitch, etc.
> In this case, the connection is disconnected and a metadata update is 
> requested to discover the new leader: 
> [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]
>  
> The problem is the latter case (REQUEST_TIMED_OUT on the broker).
> In this case, the produce request ends up with a TimeoutException, which 
> doesn't inherit from InvalidMetadataException, so it doesn't trigger a 
> metadata update.
>  
> The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
> follower-side problem, for which a metadata update indeed doesn't make much 
> sense.
>  
> However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT 
> could cause produce requests to retry unnecessarily, which may end up with 
> batch expiration due to the delivery timeout.
> Below is the scenario we experienced:
>  * Environment:
>  ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
>  ** min.insync.replicas=2
>  ** acks=all
>  * Scenario:
>  ** broker 1 "partially" failed
>  *** It lost its ZooKeeper connection and was kicked out of the cluster
>   There was a controller log line like:
>  * 
> {code:java}
> [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
> deleted brokers: 1, bounced brokers: {code}
>  * 
>  ** 
>  *** However, somehow the broker was able to continue receiving produce 
> requests
>   We're still working on investigating how this is possible though.
>   Indeed, broker 1 was somewhat "alive" and kept working according to 
> server.log
>  *** In other words, broker 1 became a "zombie"
>  ** broker 2 was elected as the new leader
>  *** broker 3 became a follower of broker 2
>  *** However, since broker 1 was still out of the cluster, it didn't receive 
> LeaderAndIsr, so 1 kept considering itself the leader of tp-0
>  ** Meanwhile, the producer kept sending produce requests to broker 1, and 
> the requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
> broker 1.
>  *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer 
> didn't have a chance to update its stale metadata
>  
> So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT 
> exception, to address the case where the old leader has become a "zombie"





[jira] [Updated] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-14445:
-
Description: 
Produce requests may fail with a timeout after `request.timeout.ms` in the 
following two cases:
 * Didn't receive a produce response within `request.timeout.ms`
 * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in 
the broker

The former case usually happens when a broker machine fails or there's a 
network glitch, etc.

In this case, the connection is disconnected and a metadata update is requested 
to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which 
doesn't inherit from InvalidMetadataException, so it doesn't trigger a metadata 
update.

 

The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
follower-side problem, for which a metadata update indeed doesn't make much 
sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to retry unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out of the cluster
  There was a controller log line like:
 * 
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 * 
 ** 
 *** However, somehow the broker was able to continue receiving produce requests
  We're still working on investigating how this is possible though.
  Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
broker 1.
 *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer 
didn't have a chance to update its stale metadata

 

So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT 
exception, to address the case where the old leader has become a "zombie"

  was:
Produce requests may fail with a timeout after `request.timeout.ms` in the 
following two cases:
 * Didn't receive a produce response within `request.timeout.ms`
 * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in 
the broker

The former case usually happens when a broker machine fails or there's a 
network glitch, etc.

In this case, the connection is disconnected and a metadata update is requested 
to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which 
doesn't inherit from InvalidMetadataException, so it doesn't trigger a metadata 
update.

 

The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
follower-side problem, for which a metadata update indeed doesn't make much 
sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to retry unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out of the cluster
  There was a controller log line like:
 * 
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 * 
 ** 
 *** However, somehow the broker was able to continue receiving produce requests
  We're still working on investigating how this is possible though.
  Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
broker 1.
 *** REQUES

[jira] [Updated] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haruki Okada updated KAFKA-14445:
-
Description: 
Produce requests may fail with a timeout after `request.timeout.ms` in the 
following two cases:
 * Didn't receive a produce response within `request.timeout.ms`
 * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in 
the broker

The former case usually happens when a broker machine fails or there's a 
network glitch, etc.

In this case, the connection is disconnected and a metadata update is requested 
to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which 
doesn't inherit from InvalidMetadataException, so it doesn't trigger a metadata 
update.

 

The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
follower-side problem, for which a metadata update indeed doesn't make much 
sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to retry unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out of the cluster
  There was a controller log line like:
 * 
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 * 
 ** 
 *** However, somehow the broker was able to continue receiving produce requests
  We're still working on investigating how this is possible though.
  Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
broker 1.
 *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer 
didn't have a chance to update its stale metadata

 

So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT 
exception, for the case where the old leader has become a "zombie"

  was:
Produce requests may fail with a timeout after `request.timeout.ms` in the 
following two cases:
 * Didn't receive a produce response within `request.timeout.ms`
 * A produce response was received, but it ended up with `REQUEST_TIMEOUT_MS` 
in the broker

The former case usually happens when a broker machine fails or there's a 
network glitch, etc.

In this case, the connection is disconnected and a metadata update is requested 
to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which 
doesn't inherit from InvalidMetadataException, so it doesn't trigger a metadata 
update.

 

The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
follower-side problem, for which a metadata update indeed doesn't make much 
sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to retry unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out of the cluster
  There was a controller log line like:
 * 
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 *** However, somehow the broker was able to continue receiving produce requests
  We're still working on investigating how this is possible though.
  Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
broker 1.
 *** REQUEST_TIMED_OUT doe

[jira] [Commented] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643768#comment-17643768
 ] 

Haruki Okada commented on KAFKA-14445:
--

If the suggestion makes sense, we're happy to send a patch.

> Producer doesn't request metadata update on REQUEST_TIMED_OUT
> -
>
> Key: KAFKA-14445
> URL: https://issues.apache.org/jira/browse/KAFKA-14445
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Haruki Okada
>Priority: Major
>
> Produce requests may fail with a timeout after `request.timeout.ms` in the 
> following two cases:
>  * Didn't receive a produce response within `request.timeout.ms`
>  * A produce response was received, but it ended up with `REQUEST_TIMEOUT_MS` 
> in the broker
> The former case usually happens when a broker machine fails or there's a 
> network glitch, etc.
> In this case, the connection is disconnected and a metadata update is 
> requested to discover the new leader: 
> [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]
>  
> The problem is the latter case (REQUEST_TIMED_OUT on the broker).
> In this case, the produce request ends up with a TimeoutException, which 
> doesn't inherit from InvalidMetadataException, so it doesn't trigger a 
> metadata update.
>  
> The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
> follower-side problem, for which a metadata update indeed doesn't make much 
> sense.
>  
> However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT 
> could cause produce requests to retry unnecessarily, which may end up with 
> batch expiration due to the delivery timeout.
> Below is the scenario we experienced:
>  * Environment:
>  ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
>  ** min.insync.replicas=2
>  ** acks=all
>  * Scenario:
>  ** broker 1 "partially" failed
>  *** It lost its ZooKeeper connection and was kicked out of the cluster
>   There was a controller log line like:
>  * 
> {code:java}
> [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
> deleted brokers: 1, bounced brokers: {code}
>  *** However, somehow the broker was able to continue receiving produce 
> requests
>   We're still working on investigating how this is possible though.
>   Indeed, broker 1 was somewhat "alive" and kept working according to 
> server.log
>  *** In other words, broker 1 became a "zombie"
>  ** broker 2 was elected as the new leader
>  *** broker 3 became a follower of broker 2
>  *** However, since broker 1 was still out of the cluster, it didn't receive 
> LeaderAndIsr, so 1 kept considering itself the leader of tp-0
>  ** Meanwhile, the producer kept sending produce requests to broker 1, and 
> the requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
> broker 1.
>  *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer 
> didn't have a chance to update its stale metadata
>  
> So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT 
> exception, for the case where the old leader has become a "zombie"





[jira] [Created] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-14445:


 Summary: Producer doesn't request metadata update on 
REQUEST_TIMED_OUT
 Key: KAFKA-14445
 URL: https://issues.apache.org/jira/browse/KAFKA-14445
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada


Produce requests may fail with a timeout after `request.timeout.ms` in the 
following two cases:
 * Didn't receive a produce response within `request.timeout.ms`
 * A produce response was received, but it ended up with `REQUEST_TIMEOUT_MS` 
in the broker

The former case usually happens when a broker machine fails or there's a 
network glitch, etc.

In this case, the connection is disconnected and a metadata update is requested 
to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which 
doesn't inherit from InvalidMetadataException, so it doesn't trigger a metadata 
update.

 

The typical cause of REQUEST_TIMED_OUT is replication delay due to a 
follower-side problem, for which a metadata update indeed doesn't make much 
sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to retry unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out of the cluster
  There was a controller log line like:
 * 
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 *** However, somehow the broker was able to continue receiving produce requests
  We're still working on investigating how this is possible though.
  Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicate from 
broker 1.
 *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer 
didn't have a chance to update its stale metadata

 

So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT 
exception, for the case where the old leader has become a "zombie"
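
To make the proposal concrete, here is a minimal, illustrative decision helper 
(an editor's sketch, not the actual Sender patch; it only assumes the public 
Errors enum and the InvalidMetadataException family from kafka-clients). It 
expresses the rule "refresh metadata on the usual invalid-metadata errors, and 
additionally on REQUEST_TIMED_OUT", without changing the error type returned to 
the application.

{code:java}
// Illustration of the proposed rule; wiring this into the producer's Sender is not shown.
import org.apache.kafka.common.errors.InvalidMetadataException;
import org.apache.kafka.common.protocol.Errors;

public final class MetadataRefreshPolicy {

    private MetadataRefreshPolicy() {}

    // Returns true when the producer should ask for a metadata refresh after this error.
    public static boolean shouldRequestMetadataUpdate(Errors error) {
        // Existing behaviour: errors backed by InvalidMetadataException
        // (e.g. NOT_LEADER_OR_FOLLOWER, UNKNOWN_TOPIC_OR_PARTITION) already trigger a refresh.
        if (error.exception() instanceof InvalidMetadataException) {
            return true;
        }
        // Proposed addition: a REQUEST_TIMED_OUT returned by a "zombie" ex-leader is a hint
        // that the cached leader may be stale, so refresh metadata here as well.
        return error == Errors.REQUEST_TIMED_OUT;
    }
}
{code}

A producer following this rule could keep surfacing the same TimeoutException 
to the caller while still scheduling a metadata refresh, which matches the 
lighter-weight change (no change to the error type) described in the comment 
earlier in this thread.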





[jira] [Commented] (KAFKA-13572) Negative value for 'Preferred Replica Imbalance' metric

2022-07-14 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566872#comment-17566872
 ] 

Haruki Okada commented on KAFKA-13572:
--

We experienced a similar phenomenon in our Kafka cluster, and we found that the 
following scenario can produce a negative metric.

Let's say there are topic-A, topic-B.

 
 # Initiate topic deletion of topic-A
 ** TopicDeletionManager#enqueueTopicsForDeletion is called with argument 
Set(topic-A)
 *** 
[https://github.com/apache/kafka/blob/3.2.0/core/src/main/scala/kafka/controller/KafkaController.scala#L1771]
 # During topic-A's deletion procedure, all of topic-A's partitions are marked 
as Offline (Leader = -1)
 ** 
[https://github.com/apache/kafka/blob/3.2.0/core/src/main/scala/kafka/controller/ReplicaStateMachine.scala#L368]
 # Before topic-A's deletion procedure completes, initiate topic deletion of 
topic-B
 ** Since topic-A's ZK delete-topic node still exists, 
TopicDeletionManager#enqueueTopicsForDeletion is called with argument 
Set(topic-A, topic-B)
 ** ControllerContext#cleanPreferredReplicaImbalanceMetric is called for both 
topic-A, topic-B
 *** 
[https://github.com/apache/kafka/blob/3.2.0/core/src/main/scala/kafka/controller/ControllerContext.scala#L496]
 *** Since topic-A is now NoLeader, `!hasPreferredLeader(replicaAssignment, 
leadershipInfo)` evaluates to true, so `preferredReplicaImbalanceCount` is 
decremented unexpectedly (a toy model of this double decrement is sketched 
below)
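
A self-contained toy model (an editor's illustration, not the controller code; 
the class and method names are invented) of how cleaning the metric twice for 
the same topic can drive the gauge below zero once the second pass sees the 
partitions as leaderless:

{code:java}
// Toy model of the double-cleanup scenario; all names are invented for illustration.
import java.util.List;

public class PreferredImbalanceToyModel {

    int gauge = 0; // counts partitions whose current leader is not the preferred replica

    // Deletion-time cleanup: remove the contribution of every partition of the deleted
    // topic that currently lacks its preferred leader. There is no guard against running
    // this twice for the same topic.
    void cleanTopic(List<Boolean> partitionHasPreferredLeader) {
        for (boolean hasPreferred : partitionHasPreferredLeader) {
            if (!hasPreferred) {
                gauge--;
            }
        }
    }

    public static void main(String[] args) {
        PreferredImbalanceToyModel m = new PreferredImbalanceToyModel();
        // topic-A has one partition led by its preferred replica, so it never incremented
        // the gauge (gauge == 0).
        m.cleanTopic(List.of(true));   // 1st enqueue: nothing to remove, gauge stays 0
        // Deletion then marks the partition Offline (leader = -1), so it no longer "has"
        // its preferred leader, without any matching increment of the gauge in this model.
        m.cleanTopic(List.of(false));  // 2nd enqueue (topic-A + topic-B): gauge drops to -1
        System.out.println("preferredReplicaImbalanceCount = " + m.gauge); // prints -1
    }
}
{code}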

> Negative value for 'Preferred Replica Imbalance' metric
> ---
>
> Key: KAFKA-13572
> URL: https://issues.apache.org/jira/browse/KAFKA-13572
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Siddharth Ahuja
>Priority: Major
> Attachments: 
> kafka_negative_preferred-replica-imbalance-count_jmx_2.JPG
>
>
> A negative value (-822) for the metric - 
> {{kafka_controller_kafkacontroller_preferredreplicaimbalancecount}} has been 
> observed - please see the attached screenshot and the output below:
> {code:java}
> $ curl -s http://localhost:9101/metrics | fgrep 
> 'kafka_controller_kafkacontroller_preferredreplicaimbalancecount'
> # HELP kafka_controller_kafkacontroller_preferredreplicaimbalancecount 
> Attribute exposed for management (kafka.controller<type=KafkaController, name=PreferredReplicaImbalanceCount><>Value)
> # TYPE kafka_controller_kafkacontroller_preferredreplicaimbalancecount gauge
> kafka_controller_kafkacontroller_preferredreplicaimbalancecount -822.0
> {code}
> The issue appeared after an operation where the number of partitions for 
> some topics was increased, and some topics were deleted and re-created in 
> order to decrease the number of their partitions.
> Ran the following command to check whether there are any instances where the 
> preferred leader (the 1st broker in the Replica list) is not the current Leader:
>  
> {code:java}
> % grep ".*Topic:.*Partition:.*Leader:.*Replicas:.*Isr:.*Offline:.*" 
> kafka-topics_describe.out | awk '{print $6 " " $8}' | cut -d "," -f1 | awk 
> '{print $0, ($1==$2?_:"NOT") "MATCHED"}'|grep NOT | wc -l
>  0
> {code}
> but could not find any such instances.
> {{leader.imbalance.per.broker.percentage=2}} is set for all the brokers in 
> the cluster, which means that we are allowed to have an imbalance of up to 2% 
> for preferred leaders. This seems to be a valid value; as such, this setting 
> should not contribute towards a negative metric.
> The metric seems to be getting subtracted in the code 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerContext.scala#L474-L503],
>  however it is not clear when it can become negative (i.e. subtracted more 
> than added) in the absence of any comments or debug/trace-level logs in the 
> code. One thing is for sure, though: you either have no imbalance (0) or have 
> an imbalance (> 0); it doesn't make sense for the metric to be < 0. 
> FWIW, no other anomalies besides this have been detected.
> Considering these metrics get actively monitored, we should look at adding 
> DEBUG/TRACE logging around the addition/subtraction of these metrics (and 
> elsewhere where appropriate) to identify any potential issues.





[jira] [Comment Edited] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion

2022-04-26 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527800#comment-17527800
 ] 

Haruki Okada edited comment on KAFKA-13403 at 4/26/22 7:35 AM:
---

[~showuon] Hi, could you help reviewing the PR 
[https://github.com/apache/kafka/pull/11438] ?

 

-There seems to be another ticket likely due to the same cause: 
https://issues.apache.org/jira/browse/KAFKA-13855-


After taking another look at KAFKA-13855, it seems there's currently no clue to 
conclude that it has the same cause.


was (Author: ocadaruma):
[~showuon] Hi, could you help reviewing the PR 
[https://github.com/apache/kafka/pull/11438] ?

 

There seems to be another ticket likely due to the same cause: 
https://issues.apache.org/jira/browse/KAFKA-13855

> KafkaServer crashes when deleting topics due to the race in log deletion
> 
>
> Key: KAFKA-13403
> URL: https://issues.apache.org/jira/browse/KAFKA-13403
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.4.1
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>
> h2. Environment
>  * OS: CentOS Linux release 7.6
>  * Kafka version: 2.4.1
>  * 
>  ** But as far as I checked the code, I think same phenomenon could happen 
> even on trunk
>  * Kafka log directory: RAID1+0 (i.e. not using JBOD so only single log.dirs 
> is set)
>  * Java version: AdoptOpenJDK 1.8.0_282
> h2. Phenomenon
> When we were in the middle of deleting several topics by `kafka-topics.sh 
> --delete --topic blah-blah`, one broker in our cluster crashed due to 
> following exception:
>  
> {code:java}
> [2021-10-21 18:19:19,122] ERROR Shutdown broker because all log dirs in 
> /data/kafka have failed (kafka.log.LogManager)
> {code}
>  
>  
> We also found NoSuchFileException was thrown right before the crash when 
> LogManager tried to delete logs for some partitions.
>  
> {code:java}
> [2021-10-21 18:19:18,849] ERROR Error while deleting log for foo-bar-topic-5 
> in dir /data/kafka (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: 
> /data/kafka/foo-bar-topic-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete/03877066.timeindex.deleted
> at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at 
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at 
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at 
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
> at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
> at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
> at java.nio.file.Files.walkFileTree(Files.java:2706)
> at java.nio.file.Files.walkFileTree(Files.java:2742)
> at org.apache.kafka.common.utils.Utils.delete(Utils.java:732)
> at kafka.log.Log.$anonfun$delete$2(Log.scala:2036)
> at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2343)
> at kafka.log.Log.delete(Log.scala:2030)
> at kafka.log.LogManager.deleteLogs(LogManager.scala:826)
> at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:840)
> at 
> kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> So, the log-dir was marked as offline, and this ended up crashing the 
> KafkaServer because the broker has only a single log-dir.
> h2. Cause
> We also found below logs right before the NoSuchFileException.
>  
> {code:java}
> [2021-10-21 18:18:17,829] INFO Log for partition foo-bar-5 is r

[jira] [Commented] (KAFKA-13855) FileNotFoundException: Error while rolling log segment for topic partition in dir

2022-04-26 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527961#comment-17527961
 ] 

Haruki Okada commented on KAFKA-13855:
--

Hmm, sorry, it sounds like I spoke too soon.

Yeah, it seems we need to dig into this further. Please disregard my comment 
for now.

> FileNotFoundException: Error while rolling log segment for topic partition in 
> dir
> -
>
> Key: KAFKA-13855
> URL: https://issues.apache.org/jira/browse/KAFKA-13855
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.6.1
>Reporter: Sergey Ivanov
>Priority: Major
>
> Hello,
> We faced an issue where one of the Kafka brokers in the cluster failed with an
> exception and restarted:
>  
> {code:java}
> [2022-04-13T09:51:44,563][ERROR][category=kafka.server.LogDirFailureChannel] 
> Error while rolling log segment for prod_data_topic-7 in dir 
> /var/opt/kafka/data/1
> java.io.FileNotFoundException: 
> /var/opt/kafka/data/1/prod_data_topic-7/26872377.index (No such 
> file or directory)
>   at java.base/java.io.RandomAccessFile.open0(Native Method)
>   at java.base/java.io.RandomAccessFile.open(Unknown Source)
>   at java.base/java.io.RandomAccessFile.(Unknown Source)
>   at java.base/java.io.RandomAccessFile.(Unknown Source)
>   at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:183)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
>   at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
>   at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
>   at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:508)
>   at kafka.log.Log.$anonfun$roll$8(Log.scala:1916)
>   at kafka.log.Log.$anonfun$roll$2(Log.scala:1916)
>   at kafka.log.Log.roll(Log.scala:2349)
>   at kafka.log.Log.maybeRoll(Log.scala:1865)
>   at kafka.log.Log.$anonfun$append$2(Log.scala:1169)
>   at kafka.log.Log.append(Log.scala:2349)
>   at kafka.log.Log.appendAsLeader(Log.scala:1019)
>   at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:984)
>   at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:972)
>   at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:883)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
>   at 
> scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>   at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>   at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:273)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:871)
>   at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:571)
>   at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:605)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:132)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70)
>   at java.base/java.lang.Thread.run(Unknown Source)
> [2022-04-13T09:51:44,812][ERROR][category=kafka.log.LogManager] Shutdown 
> broker because all log dirs in /var/opt/kafka/data/1 have failed {code}
> There is no additional useful information in the logs, just one warning before
> this error:
> {code:java}
> [2022-04-13T09:51:44,720][WARN][category=kafka.server.ReplicaManager] 
> [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions 
> __consumer_offsets-22,prod_data_topic-5,__consumer_offsets-30,
> 
> prod_data_topic-0 and stopped moving logs for partitions  because they are in 
> the failed log directory /var/opt/kafka/data/1.
> [2022-04-13T09:51:44,720][WARN][category=kafka.log.LogManager] Stopping 
> serving logs in dir /var/opt/kafka/data/1{code}
> The topic configuration is:
> {code:java}
> /opt/kafka $ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 
> --describe --topic prod_data_topic
> Topic: prod_data_topic        PartitionCount: 12      ReplicationFactor: 3    
> Configs: 
> min.insync.replicas=2,segment.bytes=1073741824,max.message.bytes=15728640,retention.bytes=4294967296
>         Topic: prod_data_topic        Partition: 0    Leader: 3       
> Replicas: 3,1,2 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 1    Leader: 1       
> Replicas: 1,2,3 Isr: 3,2,1
>         Topic: 

[jira] [Commented] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion

2022-04-25 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527800#comment-17527800
 ] 

Haruki Okada commented on KAFKA-13403:
--

[~showuon] Hi, could you help review the PR
[https://github.com/apache/kafka/pull/11438]?

 

There also seems to be another ticket that is likely due to the same cause:
https://issues.apache.org/jira/browse/KAFKA-13855

> KafkaServer crashes when deleting topics due to the race in log deletion
> 
>
> Key: KAFKA-13403
> URL: https://issues.apache.org/jira/browse/KAFKA-13403
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.4.1
>Reporter: Haruki Okada
>Assignee: Haruki Okada
>Priority: Major
>
> h2. Environment
>  * OS: CentOS Linux release 7.6
>  * Kafka version: 2.4.1
>  * 
>  ** But as far as I checked the code, I think same phenomenon could happen 
> even on trunk
>  * Kafka log directory: RAID1+0 (i.e. not using JBOD so only single log.dirs 
> is set)
>  * Java version: AdoptOpenJDK 1.8.0_282
> h2. Phenomenon
> When we were in the middle of deleting several topics by `kafka-topics.sh 
> --delete --topic blah-blah`, one broker in our cluster crashed due to 
> following exception:
>  
> {code:java}
> [2021-10-21 18:19:19,122] ERROR Shutdown broker because all log dirs in 
> /data/kafka have failed (kafka.log.LogManager)
> {code}
>  
>  
> We also found NoSuchFileException was thrown right before the crash when 
> LogManager tried to delete logs for some partitions.
>  
> {code:java}
> [2021-10-21 18:19:18,849] ERROR Error while deleting log for foo-bar-topic-5 
> in dir /data/kafka (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: 
> /data/kafka/foo-bar-topic-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete/03877066.timeindex.deleted
> at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at 
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at 
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at 
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
> at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
> at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
> at java.nio.file.Files.walkFileTree(Files.java:2706)
> at java.nio.file.Files.walkFileTree(Files.java:2742)
> at org.apache.kafka.common.utils.Utils.delete(Utils.java:732)
> at kafka.log.Log.$anonfun$delete$2(Log.scala:2036)
> at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2343)
> at kafka.log.Log.delete(Log.scala:2030)
> at kafka.log.LogManager.deleteLogs(LogManager.scala:826)
> at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:840)
> at 
> kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> So, the log-dir was marked as offline, which ended up in a KafkaServer crash
> because the broker has only a single log-dir.
> h2. Cause
> We also found the following logs right before the NoSuchFileException.
>  
> {code:java}
> [2021-10-21 18:18:17,829] INFO Log for partition foo-bar-5 is renamed to 
> /data/kafka/foo-bar-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete and is 
> scheduled for deletion (kafka.log.LogManager)
> [2021-10-21 18:18:17,900] INFO [Log partition=foo-bar-5, dir=/data/kafka] 
> Found deletable segments with base offsets [3877066] due to retention time 
> 17280ms breach (kafka.log.Log)[2021-10-21 18:18:17,901] INFO [Log 
> partition=foo-bar-5, dir=/data/ka

[jira] [Commented] (KAFKA-13855) FileNotFoundException: Error while rolling log segment for topic partition in dir

2022-04-25 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527797#comment-17527797
 ] 

Haruki Okada commented on KAFKA-13855:
--

I guess that's the same cause as https://issues.apache.org/jira/browse/KAFKA-13403.
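
To illustrate the kind of race I mean, here is a minimal standalone sketch (not Kafka code; the class and path names are made up): one thread deletes a renamed log directory tree while another thread is still removing files inside it, which can surface exactly this kind of NoSuchFileException. In KAFKA-13855 the failing operation is a segment roll rather than a directory walk, but the pattern is the same: a file vanishes underneath a concurrent file operation.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Two threads race on the same directory, mimicking the suspected interaction:
// one walks the tree to delete the whole "-delete" directory, while the other
// is still removing individual segment/index files inside it. The walker can
// then hit a file that has just disappeared and fail with NoSuchFileException.
public class LogDeletionRaceSketch {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("foo-bar-5.deadbeef-delete");
        for (int i = 0; i < 1000; i++) {
            Files.createFile(dir.resolve(String.format("%020d.timeindex.deleted", i)));
        }

        // Stands in for an async "delete the whole renamed dir" task.
        Thread asyncDelete = new Thread(() -> deleteTree(dir));
        // Stands in for concurrent per-file deletion of the same segments.
        Thread segmentCleaner = new Thread(() -> {
            try (Stream<Path> files = Files.list(dir)) {
                files.forEach(p -> {
                    try { Files.deleteIfExists(p); } catch (IOException ignored) { }
                });
            } catch (IOException | UncheckedIOException ignored) { }
        });

        asyncDelete.start();
        segmentCleaner.start();
        asyncDelete.join();
        segmentCleaner.join();
    }

    static void deleteTree(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            // Files.delete() (unlike deleteIfExists) throws NoSuchFileException if
            // the other thread removed the file first; Files.walk() itself can also
            // throw while reading attributes of a vanished file, which is the same
            // symptom as in the broker logs.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException e) { throw new UncheckedIOException(e); }
            });
        } catch (IOException | UncheckedIOException e) {
            System.err.println("Tree deletion failed: " + e);
        }
    }
}
{code}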

> FileNotFoundException: Error while rolling log segment for topic partition in 
> dir
> -
>
> Key: KAFKA-13855
> URL: https://issues.apache.org/jira/browse/KAFKA-13855
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.6.1
>Reporter: Sergey Ivanov
>Priority: Major
>
> Hello,
> We faced an issue where one of the Kafka brokers in the cluster failed with an
> exception and restarted:
>  
> {code:java}
> [2022-04-13T09:51:44,563][ERROR][category=kafka.server.LogDirFailureChannel] 
> Error while rolling log segment for prod_data_topic-7 in dir 
> /var/opt/kafka/data/1
> java.io.FileNotFoundException: 
> /var/opt/kafka/data/1/prod_data_topic-7/26872377.index (No such 
> file or directory)
>   at java.base/java.io.RandomAccessFile.open0(Native Method)
>   at java.base/java.io.RandomAccessFile.open(Unknown Source)
>   at java.base/java.io.RandomAccessFile.(Unknown Source)
>   at java.base/java.io.RandomAccessFile.(Unknown Source)
>   at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:183)
>   at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
>   at 
> kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
>   at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
>   at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:508)
>   at kafka.log.Log.$anonfun$roll$8(Log.scala:1916)
>   at kafka.log.Log.$anonfun$roll$2(Log.scala:1916)
>   at kafka.log.Log.roll(Log.scala:2349)
>   at kafka.log.Log.maybeRoll(Log.scala:1865)
>   at kafka.log.Log.$anonfun$append$2(Log.scala:1169)
>   at kafka.log.Log.append(Log.scala:2349)
>   at kafka.log.Log.appendAsLeader(Log.scala:1019)
>   at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:984)
>   at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:972)
>   at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:883)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
>   at 
> scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>   at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>   at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:273)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:871)
>   at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:571)
>   at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:605)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:132)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70)
>   at java.base/java.lang.Thread.run(Unknown Source)
> [2022-04-13T09:51:44,812][ERROR][category=kafka.log.LogManager] Shutdown 
> broker because all log dirs in /var/opt/kafka/data/1 have failed {code}
> There is no additional useful information in the logs, just one warning before
> this error:
> {code:java}
> [2022-04-13T09:51:44,720][WARN][category=kafka.server.ReplicaManager] 
> [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions 
> __consumer_offsets-22,prod_data_topic-5,__consumer_offsets-30,
> 
> prod_data_topic-0 and stopped moving logs for partitions  because they are in 
> the failed log directory /var/opt/kafka/data/1.
> [2022-04-13T09:51:44,720][WARN][category=kafka.log.LogManager] Stopping 
> serving logs in dir /var/opt/kafka/data/1{code}
> The topic configuration is:
> {code:java}
> /opt/kafka $ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 
> --describe --topic prod_data_topic
> Topic: prod_data_topic        PartitionCount: 12      ReplicationFactor: 3    
> Configs: 
> min.insync.replicas=2,segment.bytes=1073741824,max.message.bytes=15728640,retention.bytes=4294967296
>         Topic: prod_data_topic        Partition: 0    Leader: 3       
> Replicas: 3,1,2 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 1    Leader: 1       
> Replicas: 1,2,3 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 2  

[jira] [Comment Edited] (KAFKA-10690) Produce-response delay caused by lagging replica fetch which affects in-sync one

2022-03-09 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503911#comment-17503911
 ] 

Haruki Okada edited comment on KAFKA-10690 at 3/10/22, 12:14 AM:
-

Thanks for the comment. 

 

[~showuon] 

> Are you sure this issue is due to the `in-sync` replica fetch?

 

Yeah, as long as a replica fetch is `out-of-sync`, it doesn't block
produce requests, so the issue only affects `in-sync` replicas, and only when
`in-sync` replica fetching and `out-of-sync` replica fetching are done in the
same replica fetcher thread on the follower side.

 

> Could you have a PoC to add an additional thread pool for lagging replica to 
> confirm this solution?

 

We haven't tried it yet, as we first wanted to confirm whether anyone has
encountered a similar issue (and whether anyone has addressed it in some way).
But let us consider it!

 

[~junrao] 

> Have you tried enabling replication throttling?

 

Yeah, we use replication throttling, and we believe the disk's performance
itself stays stable even during lagging-replica fetches.

However, we use HDDs, so reading the data takes a few to tens of milliseconds
per IO even when the disk is healthy.

So if a lagging replica fetch (whose data is likely not in the page cache and
therefore causes actual disk reads) and an in-sync replica fetch are done in the
same replica fetcher thread, the in-sync one is greatly delayed by the lagging
one.
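
To make the interaction concrete, here is a toy simulation (not Kafka code; the class name and timings are made up) of why one slow, disk-bound read for a lagging partition stretches every fetch round, and therefore the in-sync partition's effective fetch cadence, when both are served by the same fetcher thread.

{code:java}
import java.util.List;
import java.util.concurrent.TimeUnit;

// Toy model of one replica fetcher thread on the follower: each round it issues
// a single blocking Fetch covering all assigned partitions, so the round time
// (and hence how quickly the in-sync partition's data is fetched and acked back
// to the leader) is dominated by the slowest, disk-bound partition.
public class BlockingFetcherSketch {
    record PartitionRead(String name, long readMillis) { }

    public static void main(String[] args) throws InterruptedException {
        List<PartitionRead> inOneFetchRequest = List.of(
                new PartitionRead("partition-A (in-sync, page cache, ~1 ms)", 1),
                new PartitionRead("partition-B (lagging, cold disk read, ~50 ms)", 50));

        for (int round = 1; round <= 3; round++) {
            long start = System.nanoTime();
            for (PartitionRead p : inOneFetchRequest) {
                // Stands in for the leader reading this partition's data for the
                // Fetch response; the whole response waits for every partition.
                TimeUnit.MILLISECONDS.sleep(p.readMillis());
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.printf("fetch round %d took ~%d ms; partition-A waited for the slow read%n",
                    round, elapsedMs);
        }
    }
}
{code}

In reality the partitions are batched into one Fetch request rather than read in a loop, but the effect is the same: the response, and therefore the in-sync partition's replication and produce ack, waits for the slowest read.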


was (Author: ocadaruma):
Thanks for the comment. 

 

[~showuon] 

 

> Are you sure this issue is due to the `in-sync` replica fetch?

 

Yeah, as long as a replica fetch is `out-of-sync`, it doesn't block
produce requests, so the issue only affects `in-sync` replicas, and only when
`in-sync` replica fetching and `out-of-sync` replica fetching are done in the
same replica fetcher thread on the follower side.

 

> Could you have a PoC to add an additional thread pool for lagging replica to 
> confirm this solution?

 

We haven't tried it yet, as we first wanted to confirm whether anyone has
encountered a similar issue (and whether anyone has addressed it in some way).
But let us consider it!

 

[~junrao] 

 

> Have you tried enabling replication throttling?

 

Yeah, we use replication throttling, and we believe the disk's performance
itself stays stable even during lagging-replica fetches.

However, we use HDDs, so reading the data takes a few to tens of milliseconds
per IO even when the disk is healthy.

So if a lagging replica fetch (whose data is likely not in the page cache and
therefore causes actual disk reads) and an in-sync replica fetch are done in the
same replica fetcher thread (i.e. in the same Fetch request), the in-sync one is
greatly delayed by the lagging one.

> Produce-response delay caused by lagging replica fetch which affects in-sync 
> one
> 
>
> Key: KAFKA-10690
> URL: https://issues.apache.org/jira/browse/KAFKA-10690
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.4.1
>Reporter: Haruki Okada
>Priority: Major
> Attachments: image-2020-11-06-11-15-21-781.png, 
> image-2020-11-06-11-15-38-390.png, image-2020-11-06-11-17-09-910.png
>
>
> h2. Our environment
>  * Kafka version: 2.4.1
> h2. Phenomenon
>  * Produce response time 99th percentile (remote scope) degraded to 500 ms,
> which is 20 times worse than usual
>  ** Meanwhile, the cluster was running a replica reassignment to service-in a
> new machine to recover the replicas that had been held by a broker machine
> that failed due to a hardware issue
> !image-2020-11-06-11-15-21-781.png|width=292,height=166!
> h2. Analysis
> Let's say
>  * broker-X: The broker we observed produce latency degradation
>  * broker-Y: The broker under servicing-in
> broker-Y was catching up replicas of partitions:
>  * partition-A: has a relatively small log size
>  * partition-B: has a large log size
> (actually, broker-Y was catching up on many other partitions; I noted only two
> partitions here to keep the explanation simple)
> broker-X was the leader for both partition-A and partition-B.
> We found that both partition-A and partition-B were assigned to the same
> ReplicaFetcherThread of broker-Y, and produce latency started to degrade
> right after broker-Y finished catching up on partition-A.
> !image-2020-11-06-11-17-09-910.png|width=476,height=174!
> Besides, we observed disk reads on broker-X during service-in. (This is 
> natural since old segments are likely not in page cache)
> !image-2020-11-06-11-15-38-390.png|width=292,height=193!
> So we suspected that:
>  * The in-sync replica fetch (partition-A) was held up by the lagging replica
> fetch (partition-B), which is expected to be slow because it causes actual
> disk reads
>  ** Since ReplicaFetcherThread sends fetch requests in a blocking manner, the
> next fetch request can't be sent until the current one completes
>  ** => This delays the in-sync replica fetch for partitions assigned to the
> same replica fetcher thread
>  ** => This degrades remote-scope produce latency
> h2. Possib

[jira] [Commented] (KAFKA-10690) Produce-response delay caused by lagging replica fetch which affects in-sync one

2022-03-09 Thread Haruki Okada (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503911#comment-17503911
 ] 

Haruki Okada commented on KAFKA-10690:
--

Thanks for the comment. 

 

[~showuon] 

 

> Are you sure this issue is due to the `in-sync` replica fetch?

 

Yeah, as long as a replica fetch is `out-of-sync`, it doesn't block
produce requests, so the issue only affects `in-sync` replicas, and only when
`in-sync` replica fetching and `out-of-sync` replica fetching are done in the
same replica fetcher thread on the follower side.

 

> Could you have a PoC to add an additional thread pool for lagging replica to 
> confirm this solution?

 

We haven't tried it yet, as we first wanted to confirm whether anyone has
encountered a similar issue (and whether anyone has addressed it in some way).
But let us consider it!

 

[~junrao] 

 

> Have you tried enabling replication throttling?

 

Yeah, we use replication throttling, and we believe the disk's performance
itself stays stable even during lagging-replica fetches.

However, we use HDDs, so reading the data takes a few to tens of milliseconds
per IO even when the disk is healthy.

So if a lagging replica fetch (whose data is likely not in the page cache and
therefore causes actual disk reads) and an in-sync replica fetch are done in the
same replica fetcher thread (i.e. in the same Fetch request), the in-sync one is
greatly delayed by the lagging one.

> Produce-response delay caused by lagging replica fetch which affects in-sync 
> one
> 
>
> Key: KAFKA-10690
> URL: https://issues.apache.org/jira/browse/KAFKA-10690
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.4.1
>Reporter: Haruki Okada
>Priority: Major
> Attachments: image-2020-11-06-11-15-21-781.png, 
> image-2020-11-06-11-15-38-390.png, image-2020-11-06-11-17-09-910.png
>
>
> h2. Our environment
>  * Kafka version: 2.4.1
> h2. Phenomenon
>  * Produce response time 99th percentile (remote scope) degraded to 500 ms,
> which is 20 times worse than usual
>  ** Meanwhile, the cluster was running a replica reassignment to service-in a
> new machine to recover the replicas that had been held by a broker machine
> that failed due to a hardware issue
> !image-2020-11-06-11-15-21-781.png|width=292,height=166!
> h2. Analysis
> Let's say
>  * broker-X: The broker we observed produce latency degradation
>  * broker-Y: The broker under servicing-in
> broker-Y was catching up replicas of partitions:
>  * partition-A: has a relatively small log size
>  * partition-B: has a large log size
> (actually, broker-Y was catching up on many other partitions; I noted only two
> partitions here to keep the explanation simple)
> broker-X was the leader for both partition-A and partition-B.
> We found that both partition-A and partition-B were assigned to the same
> ReplicaFetcherThread of broker-Y, and produce latency started to degrade
> right after broker-Y finished catching up on partition-A.
> !image-2020-11-06-11-17-09-910.png|width=476,height=174!
> Besides, we observed disk reads on broker-X during service-in. (This is 
> natural since old segments are likely not in page cache)
> !image-2020-11-06-11-15-38-390.png|width=292,height=193!
> So we suspected that:
>  * The in-sync replica fetch (partition-A) was held up by the lagging replica
> fetch (partition-B), which is expected to be slow because it causes actual
> disk reads
>  ** Since ReplicaFetcherThread sends fetch requests in a blocking manner, the
> next fetch request can't be sent until the current one completes
>  ** => This delays the in-sync replica fetch for partitions assigned to the
> same replica fetcher thread
>  ** => This degrades remote-scope produce latency
> h2. Possible fix
> We think this issue can be addressed by designating some of the
> ReplicaFetcherThreads (or creating another thread pool) for lagging-replica
> catch-up, but we are not so sure this is the appropriate way.
> Please give your opinions about this issue.
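
As a rough illustration of that possible fix, here is a minimal sketch with hypothetical names (LaggingAwareFetcherPool, the lag threshold, and the pool sizes are not Kafka's actual classes or defaults) that routes a partition's fetches to a separate pool once its lag crosses a threshold, so a cold, disk-bound catch-up never shares a thread with in-sync fetches:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the proposed direction: keep two fetcher pools and pick one per
// partition based on its current lag. All names and values are hypothetical.
public class LaggingAwareFetcherPool {
    private static final long LAG_THRESHOLD = 10_000; // messages behind the leader

    private final ExecutorService inSyncFetchers  = Executors.newFixedThreadPool(4);
    private final ExecutorService laggingFetchers = Executors.newFixedThreadPool(2);
    private final Map<String, Long> lagByPartition = new ConcurrentHashMap<>();

    /** Called from wherever follower lag is tracked. */
    public void updateLag(String topicPartition, long lag) {
        lagByPartition.put(topicPartition, lag);
    }

    /** Submit one fetch iteration for a partition to the pool matching its lag. */
    public void submitFetch(String topicPartition, Runnable fetchOnce) {
        long lag = lagByPartition.getOrDefault(topicPartition, 0L);
        ExecutorService pool = (lag > LAG_THRESHOLD) ? laggingFetchers : inSyncFetchers;
        pool.execute(fetchOnce);
    }
}
{code}

The open questions from the discussion (where to draw the lag threshold, and how partitions migrate back once they catch up) are left out of this sketch.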



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

