[jira] [Created] (KAFKA-6377) Flaky test BufferPoolTest.testBlockTimeout
Matthias J. Sax created KAFKA-6377:
--------------------------------------

             Summary: Flaky test BufferPoolTest.testBlockTimeout
                 Key: KAFKA-6377
                 URL: https://issues.apache.org/jira/browse/KAFKA-6377
             Project: Kafka
          Issue Type: Bug
          Components: producer
            Reporter: Matthias J. Sax

{noformat}
java.lang.AssertionError: available memory8
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.kafka.clients.producer.internals.BufferPoolTest.testBlockTimeout(BufferPoolTest.java:190)
{noformat}

https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/10036/testReport/junit/org.apache.kafka.clients.producer.internals/BufferPoolTest/testBlockTimeout/

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
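For context, timing assertions like the one failing here are a classic source of test flakiness: the elapsed time of a timed wait is compared against a bound that a loaded CI host or a GC pause can easily exceed. The following is a hypothetical plain-Java illustration of the pattern, not Kafka's actual `BufferPoolTest` code; all names are made up for the example.

```java
import java.util.concurrent.TimeUnit;

// Illustrates why a tight wall-clock bound on a timed wait is flaky,
// and what a slack-tolerant assertion looks like instead.
public class TimeoutAssertionDemo {

    // Stand-in for a call that should block for at most maxBlockMs.
    static long blockingCall(long maxBlockMs) {
        long start = System.nanoTime();
        try {
            Thread.sleep(maxBlockMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long maxBlockMs = 10;
        long elapsed = blockingCall(maxBlockMs);

        // Flaky: fails whenever the scheduler overshoots even slightly.
        boolean tightBound = elapsed <= maxBlockMs;

        // More robust: only require that the call did not return early,
        // and allow generous slack on the upper bound.
        boolean slackBound = elapsed >= maxBlockMs && elapsed < maxBlockMs * 100;

        System.out.println("elapsed=" + elapsed
                + " tightBound=" + tightBound
                + " slackBound=" + slackBound);
    }
}
```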
[jira] [Updated] (KAFKA-6303) Potential lack of synchronization in NioEchoServer#AcceptorThread
[ https://issues.apache.org/jira/browse/KAFKA-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated KAFKA-6303:
--------------------------
    Component/s: network

> Potential lack of synchronization in NioEchoServer#AcceptorThread
> -----------------------------------------------------------------
>
>                 Key: KAFKA-6303
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6303
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>            Reporter: Ted Yu
>            Assignee: siva santhalingam
>            Priority: Minor
>
> In the run() method:
> {code}
> SocketChannel socketChannel = ((ServerSocketChannel) key.channel()).accept();
> socketChannel.configureBlocking(false);
> newChannels.add(socketChannel);
> {code}
> Modification of newChannels should be protected by a synchronized block.
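A minimal sketch of the suggested fix, assuming a list of accepted channels shared between an acceptor thread and a processing thread. The class, field, and method names here are illustrative only, not NioEchoServer's actual members, and String is used as a stand-in for SocketChannel so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Both the thread that adds newly accepted channels and the thread that
// drains them must synchronize on the same lock object, here the list itself.
public class AcceptorSketch {
    private final List<String> newChannels = new ArrayList<>();

    // Called from the acceptor thread for each accepted connection.
    public void accept(String channel) {
        synchronized (newChannels) {   // guard the shared list
            newChannels.add(channel);
        }
    }

    // Called from the processing thread; copies and clears under the same lock.
    public List<String> drain() {
        synchronized (newChannels) {
            List<String> drained = new ArrayList<>(newChannels);
            newChannels.clear();
            return drained;
        }
    }

    public static void main(String[] args) {
        AcceptorSketch server = new AcceptorSketch();
        server.accept("channel-1");
        server.accept("channel-2");
        System.out.println("drained: " + server.drain());
    }
}
```

Without the synchronized blocks, an add racing with a drain on an ArrayList is undefined behavior; a `ConcurrentLinkedQueue` would be an equally valid fix that avoids explicit locking.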
[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately
[ https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293719#comment-16293719 ]

Frederic Arno commented on KAFKA-6323:
--------------------------------------

Although I have very little experience with Kafka (and none at all with STREAM_TIME punctuation), Guozhang's proposal sounds good to me.

I think there is something to discuss about managing the gap between punctuations. Say the last punctuation happened at T0 and the next is planned at T1. Also consider T2 = T1 + x*interval, where x >= 2.

With STREAM_TIME, if we get no data until T2, the gap is bigger than the interval, and according to Guozhang's proposal we punctuate only once, at T2, with the next punctuation at T2 + interval.

With WALL_CLOCK_TIME it could also happen that we don't effectively punctuate before T2 (GC pause, overload, ...). If that happens, should we likewise punctuate only once at T2, with the next punctuation at T2 + interval?

> punctuate with WALL_CLOCK_TIME triggered immediately
> ----------------------------------------------------
>
>                 Key: KAFKA-6323
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6323
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Frederic Arno
>            Assignee: Frederic Arno
>             Fix For: 1.1.0, 1.0.1
>
> While working on a custom Processor from which I am scheduling a punctuation
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I set,
> a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I found that all
> PunctuationSchedule timestamps are matched against the current time in order
> to decide whether or not to trigger the punctuator
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate).
> However, I've only seen code that initializes a PunctuationSchedule's
> timestamp to 0, which I guess is what causes the immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's
> timestamp be initialized to current time + interval?
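The initialization the report asks about, together with the gap-handling policy discussed in the comment above, can be modeled in a few lines. This is an illustrative sketch only, not kafka-streams' actual PunctuationQueue code; the class and method names are made up for the example.

```java
// Models the proposed behavior: the first punctuation fires at registration
// time plus the interval (never immediately), and after a gap longer than
// one interval we punctuate only once, scheduling the next punctuation
// relative to the late fire time.
public class PunctuationScheduleSketch {
    private final long intervalMs;
    private long nextPunctuationMs;

    public PunctuationScheduleSketch(long nowMs, long intervalMs) {
        this.intervalMs = intervalMs;
        this.nextPunctuationMs = nowMs + intervalMs;  // not 0: no immediate fire
    }

    // Returns true if a punctuation fires at nowMs; reschedules accordingly.
    public boolean mayPunctuate(long nowMs) {
        if (nowMs < nextPunctuationMs)
            return false;
        // Fire once even if several intervals elapsed (GC pause, overload, ...),
        // then schedule the next fire one interval after this late one.
        nextPunctuationMs = nowMs + intervalMs;
        return true;
    }

    public static void main(String[] args) {
        PunctuationScheduleSketch schedule = new PunctuationScheduleSketch(0L, 1000L);
        System.out.println(schedule.mayPunctuate(0L));    // false: nothing fires before now + interval
        System.out.println(schedule.mayPunctuate(1000L)); // true: first fire at now + interval
        System.out.println(schedule.mayPunctuate(4500L)); // true, but only once despite the gap
        System.out.println(schedule.mayPunctuate(5000L)); // false: next fire rescheduled to 5500
    }
}
```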
[jira] [Updated] (KAFKA-6376) Improve Streams metrics for skipped recrods
[ https://issues.apache.org/jira/browse/KAFKA-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated KAFKA-6376:
-----------------------------------
    Labels: needs-kip  (was: )

> Improve Streams metrics for skipped recrods
> -------------------------------------------
>
>                 Key: KAFKA-6376
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6376
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics, streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>              Labels: needs-kip
>
> Copied from the KIP-210 discussion thread:
> {quote}
> Note that currently we have two metrics for `skipped-records` on different
> levels:
> 1) On the highest level, the thread level, we have a `skipped-records`
> metric that records all records skipped due to deserialization errors.
> 2) On the lower, processor-node level, we have a
> `skippedDueToDeserializationError` metric that records the records skipped
> on that specific source node due to deserialization errors.
> So you can see that 1) does not cover any other scenarios and can just be
> thought of as an aggregate of 2) across all the tasks' source nodes.
> However, there are other places that can cause a record to be dropped, for
> example:
> 1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
> dropped because their window has elapsed.
> 2) KIP-210: records could be dropped on the producer side.
> 3) Records could be dropped during user-customized processing on errors.
> {quote}
> [~guozhang] Not sure what you mean by "3) records could be dropped during
> user-customized processing on errors."
> Btw: we also drop records with {{null}} key and/or value for certain DSL
> operations. This should be included as well.
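The relationship between the two metric levels described in the quoted thread, a per-source-node count and a thread-level count that is simply the aggregate across nodes, can be sketched as follows. The class is purely illustrative and uses none of Streams' real metrics machinery; all names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Per-node skip counters plus an aggregate view: the thread-level
// "skipped-records" number is just the sum of the per-node
// "skippedDueToDeserializationError"-style counts.
public class SkippedRecordsSketch {
    private final Map<String, Long> skippedPerNode = new HashMap<>();

    // Called wherever a record is skipped on a given source node.
    public void recordSkip(String sourceNode) {
        skippedPerNode.merge(sourceNode, 1L, Long::sum);
    }

    // Processor-node level view.
    public long skippedForNode(String sourceNode) {
        return skippedPerNode.getOrDefault(sourceNode, 0L);
    }

    // Thread-level view: aggregate across all source nodes.
    public long totalSkipped() {
        return skippedPerNode.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        SkippedRecordsSketch metrics = new SkippedRecordsSketch();
        metrics.recordSkip("source-a");
        metrics.recordSkip("source-a");
        metrics.recordSkip("source-b");
        System.out.println("source-a: " + metrics.skippedForNode("source-a"));
        System.out.println("total:    " + metrics.totalSkipped());
    }
}
```

The ticket's point is that `recordSkip` today is only invoked for deserialization errors; the other drop sites it lists (elapsed windows, producer-side drops, null keys/values in DSL operations) would need to feed the same counters for the aggregate to be meaningful.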
[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately
[ https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293898#comment-16293898 ]

Matthias J. Sax commented on KAFKA-6323:
----------------------------------------

I like this. Having consistent behavior should improve the user experience! We just need to document it in detail.
[jira] [Updated] (KAFKA-6376) Improve Streams metrics for skipped records
[ https://issues.apache.org/jira/browse/KAFKA-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated KAFKA-6376:
-----------------------------------
    Summary: Improve Streams metrics for skipped records  (was: Improve Streams metrics for skipped recrods)
[jira] [Commented] (KAFKA-6366) StackOverflowError in kafka-coordinator-heartbeat-thread
[ https://issues.apache.org/jira/browse/KAFKA-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293932#comment-16293932 ]

Jason Gustafson commented on KAFKA-6366:
----------------------------------------

[~joerg.heinicke] It would be helpful to see your consumer configuration and, if possible, the full logs. So far I've been unable to reproduce the problem. The only thing I can think of is that one thread is in a tight loop sending requests while the other is trying to mark the coordinator dead. As far as I can tell, the retry backoff should make this impossible, but I could be missing something.

> StackOverflowError in kafka-coordinator-heartbeat-thread
> --------------------------------------------------------
>
>                 Key: KAFKA-6366
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6366
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 1.0.0
>            Reporter: Joerg Heinicke
>         Attachments: 6366.v1.txt
>
> With Kafka 1.0, our consumer groups fall into a permanent cycle of rebalancing
> once a StackOverflowError has occurred in the heartbeat thread due to
> connectivity issues between the consumers and the coordinating broker.
> Immediately before the exception there are hundreds, if not thousands, of log
> entries of the following type:
> 2017-12-12 16:23:12.361 [kafka-coordinator-heartbeat-thread | my-consumer-group] INFO - [Consumer clientId=consumer-4, groupId=my-consumer-group] Marking the coordinator : (id: 2147483645 rack: null) dead
> The exceptions always happen somewhere in the DateFormat code, though at
> different lines:
> 2017-12-12 16:23:12.363 [kafka-coordinator-heartbeat-thread | my-consumer-group] ERROR - Uncaught exception in thread 'kafka-coordinator-heartbeat-thread | my-consumer-group':
> java.lang.StackOverflowError
>     at java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:362)
>     at java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
>     at java.util.Calendar.getDisplayName(Calendar.java:2110)
>     at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
>     at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
>     at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
>     at java.text.DateFormat.format(DateFormat.java:345)
>     at org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
>     at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>     at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
>     at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
>     at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>     at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>     at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>     at org.apache.log4j.Category.callAppenders(Category.java:206)
>     at org.apache.log4j.Category.forcedLog(Category.java:391)
>     at org.apache.log4j.Category.log(Category.java:856)
>     at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:324)
>     at org.apache.kafka.common.utils.LogContext$KafkaLogger.info(LogContext.java:341)
>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:649)
>     at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>     at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>     at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>     at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
> ...
> the following 9 lines are repeated around a hundred times:
> ...
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.failUnsentRequests(ConsumerNetworkClient.java:416)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.disconnect(ConsumerNetworkClient.java:388)
>     at
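The shape of the overflow in the trace above (fireCompletion -> onFailure -> failUnsentRequests -> fireCompletion -> ...) can be reduced to a small hypothetical model: if completing one pending callback recursively drives completion of the next, stack depth grows with the number of pending requests, while an iterative drain keeps it constant. None of the names below are the consumer's actual code; this is only a sketch of the two styles.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A queue of pending completion callbacks, drained two ways.
public class CallbackDrainSketch {
    private final Deque<Runnable> pending = new ArrayDeque<>();

    public void enqueue(Runnable callback) {
        pending.add(callback);
    }

    // Recursive style (the hazard): each completion recurses to drive the
    // next one, so the stack grows with the queue size and a large backlog
    // of failed requests can end in StackOverflowError.
    public void drainRecursively() {
        Runnable next = pending.poll();
        if (next == null)
            return;
        next.run();
        drainRecursively();
    }

    // Iterative style (the safe pattern): constant stack depth regardless
    // of how many callbacks are pending, even if callbacks enqueue more.
    public void drainIteratively() {
        Runnable next;
        while ((next = pending.poll()) != null)
            next.run();
    }

    public static void main(String[] args) {
        CallbackDrainSketch sketch = new CallbackDrainSketch();
        final int[] completed = {0};
        for (int i = 0; i < 100_000; i++)
            sketch.enqueue(() -> completed[0]++);
        sketch.drainIteratively();  // no StackOverflowError at this size
        System.out.println("completed " + completed[0] + " callbacks");
    }
}
```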
[jira] [Created] (KAFKA-6376) Improve Streams metrics for skipped recrods
Matthias J. Sax created KAFKA-6376:
--------------------------------------

             Summary: Improve Streams metrics for skipped recrods
                 Key: KAFKA-6376
                 URL: https://issues.apache.org/jira/browse/KAFKA-6376
             Project: Kafka
          Issue Type: Bug
          Components: metrics, streams
    Affects Versions: 1.0.0
            Reporter: Matthias J. Sax
[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately
[ https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294014#comment-16294014 ]

Frederic Arno commented on KAFKA-6323:
--------------------------------------

I updated my PR according to the above discussion, see https://github.com/apache/kafka/pull/4301