[jira] [Created] (KAFKA-6377) Flaky test BufferPoolTest.testBlockTimeout

2017-12-16 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-6377:
--

 Summary: Flaky test BufferPoolTest.testBlockTimeout
 Key: KAFKA-6377
 URL: https://issues.apache.org/jira/browse/KAFKA-6377
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Reporter: Matthias J. Sax


{noformat}
java.lang.AssertionError: available memory8
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.kafka.clients.producer.internals.BufferPoolTest.testBlockTimeout(BufferPoolTest.java:190)
{noformat}

https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/10036/testReport/junit/org.apache.kafka.clients.producer.internals/BufferPoolTest/testBlockTimeout/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6303) Potential lack of synchronization in NioEchoServer#AcceptorThread

2017-12-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated KAFKA-6303:
--
Component/s: network

> Potential lack of synchronization in NioEchoServer#AcceptorThread
> -
>
> Key: KAFKA-6303
> URL: https://issues.apache.org/jira/browse/KAFKA-6303
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Reporter: Ted Yu
>Assignee: siva santhalingam
>Priority: Minor
>
> In the run() method:
> {code}
> SocketChannel socketChannel = 
> ((ServerSocketChannel) key.channel()).accept();
> socketChannel.configureBlocking(false);
> newChannels.add(socketChannel);
> {code}
> Modification of newChannels should be protected by a synchronized block; a
> minimal sketch follows.
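>
> A minimal sketch of such a fix (editor's illustration; it assumes newChannels
> is a plain List drained by another thread, and that the draining side
> synchronizes on the same monitor):
> {code}
> SocketChannel socketChannel =
>         ((ServerSocketChannel) key.channel()).accept();
> socketChannel.configureBlocking(false);
> // guard the shared list so the add is safely published to the reader
> synchronized (newChannels) {
>     newChannels.add(socketChannel);
> }
> {code}
> Without a matching synchronized block on the consuming side, the lock
> protects nothing.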



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-16 Thread Frederic Arno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293719#comment-16293719
 ] 

Frederic Arno commented on KAFKA-6323:
--

Although I've got very little experience with Kafka (and none at all with 
STREAM_TIME punctuation), Guozhang's proposal sounds good to me.

I think there's something to discuss about managing the gap between 
punctuations. Let's say the last punctuation happened at T0, and the next is 
planned at T1. Let's also consider T2 = T1 + x*interval, where x >= 2.

With STREAM_TIME, if we get no data until T2, the gap is bigger than the
interval, and according to Guozhang's proposal we punctuate only once, at T2,
with the next punctuation at T2+interval.

With WALL_CLOCK_TIME it could also happen that we don't effectively punctuate
before T2 (GC pause, overload, ...). If that happens, should we likewise
punctuate only once at T2, with the next at T2+interval? A sketch of these
semantics follows.
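
A small sketch of the semantics in question (editor's illustration with
hypothetical field names, not the actual PunctuationQueue code):

{code}
// T1 = lastPunctuation + intervalMs is the planned next punctuation.
// If time is only observed again at T2 >= T1 (no data, GC pause, ...),
// fire once at T2 and schedule the next punctuation at T2 + intervalMs,
// rather than firing x catch-up punctuations for the missed slots.
long planned = lastPunctuation + intervalMs;
if (now >= planned) {
    punctuator.punctuate(now);   // single punctuation at T2
    lastPunctuation = now;       // next one is due at T2 + intervalMs
}
{code}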

> punctuate with WALL_CLOCK_TIME triggered immediately
> 
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Frederic Arno
>Assignee: Frederic Arno
> Fix For: 1.1.0, 1.0.1
>
>
> While working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I set, 
> a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I found that all 
> PunctuationSchedules' timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes a PunctuationSchedule's 
> timestamp to 0, which I guess is what causes the immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?
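>
> A sketch of the suggested behavior (editor's illustration; the names are
> hypothetical, the real logic lives in PunctuationQueue#mayPunctuate):
> {code}
> // Initialize the first WALL_CLOCK_TIME punctuation one interval from now,
> // instead of leaving the timestamp at 0, which always satisfies
> // "now >= timestamp" and therefore fires immediately.
> long nextTimestamp = time.milliseconds() + intervalMs;
>
> boolean mayPunctuate(long now) {
>     if (now >= nextTimestamp) {
>         nextTimestamp = now + intervalMs;
>         return true;
>     }
>     return false;
> }
> {code}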



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6376) Improve Streams metrics for skipped recrods

2017-12-16 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6376:
---
Labels: needs-kip  (was: )

> Improve Streams metrics for skipped recrods
> ---
>
> Key: KAFKA-6376
> URL: https://issues.apache.org/jira/browse/KAFKA-6376
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>  Labels: needs-kip
>
> Copied from the KIP-210 discussion thread:
> {quote}
> Note that currently we have two metrics for `skipped-records` on different
> levels:
> 1) on the highest level, the thread-level, we have a `skipped-records` metric
> that records all the records skipped due to deserialization errors.
> 2) on the lower processor-node level, we have a
> `skippedDueToDeserializationError` metric that records the records skipped on
> that specific source node due to deserialization errors.
> So you can see that 1) does not cover any other scenarios and can just be
> thought of as an aggregate of 2) across all the tasks' source nodes.
> However, there are other places that can cause a record to be dropped, for
> example:
> 1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
> dropped because their window has elapsed.
> 2) KIP-210: records could be dropped on the producer side.
> 3) records could be dropped during user-customized processing on errors.
> {quote}
> [~guozhang] Not sure what you mean by "3) records could be dropped during 
> user-customized processing on errors."
> Btw: we also drop records with {{null}} key and/or value for certain DSL 
> operations. This should be included as well.
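>
> One way a unified metric could be wired up (editor's sketch against the
> public org.apache.kafka.common.metrics API; the sensor and metric names are
> hypothetical and would be fixed by a KIP):
> {code}
> Metrics metrics = new Metrics();
> Sensor skipped = metrics.sensor("skipped-records");
> skipped.add(metrics.metricName("skipped-records-rate", "stream-metrics",
>         "Rate of records skipped for any reason"), new Rate());
>
> // call at every drop site: deserialization error, elapsed window
> // (KAFKA-5784), producer-side drop (KIP-210), null key/value in the DSL
> skipped.record();
> {code}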



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-16 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293898#comment-16293898
 ] 

Matthias J. Sax commented on KAFKA-6323:


I like this. Having consistent behavior should improve user experience! We just 
need to document it in detail.

> punctuate with WALL_CLOCK_TIME triggered immediately
> 
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Frederic Arno
>Assignee: Frederic Arno
> Fix For: 1.1.0, 1.0.1
>
>
> While working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I set, 
> a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I found that all 
> PunctuationSchedules' timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes a PunctuationSchedule's 
> timestamp to 0, which I guess is what causes the immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6376) Improve Streams metrics for skipped records

2017-12-16 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6376:
---
Summary: Improve Streams metrics for skipped records  (was: Improve Streams 
metrics for skipped recrods)

> Improve Streams metrics for skipped records
> ---
>
> Key: KAFKA-6376
> URL: https://issues.apache.org/jira/browse/KAFKA-6376
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>  Labels: needs-kip
>
> Copied from the KIP-210 discussion thread:
> {quote}
> Note that currently we have two metrics for `skipped-records` on different
> levels:
> 1) on the highest level, the thread-level, we have a `skipped-records` metric
> that records all the records skipped due to deserialization errors.
> 2) on the lower processor-node level, we have a
> `skippedDueToDeserializationError` metric that records the records skipped on
> that specific source node due to deserialization errors.
> So you can see that 1) does not cover any other scenarios and can just be
> thought of as an aggregate of 2) across all the tasks' source nodes.
> However, there are other places that can cause a record to be dropped, for
> example:
> 1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
> dropped because their window has elapsed.
> 2) KIP-210: records could be dropped on the producer side.
> 3) records could be dropped during user-customized processing on errors.
> {quote}
> [~guozhang] Not sure what you mean by "3) records could be dropped during 
> user-customized processing on errors."
> Btw: we also drop records with {{null}} key and/or value for certain DSL 
> operations. This should be included as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6366) StackOverflowError in kafka-coordinator-heartbeat-thread

2017-12-16 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293932#comment-16293932
 ] 

Jason Gustafson commented on KAFKA-6366:


[~joerg.heinicke] It would be helpful to see your consumer configuration and, 
if possible, the full logs. So far I've been unable to reproduce the problem. 
The only thing I can think of is that one thread is in a tight loop sending 
requests while the other is trying to mark the coordinator dead. As far as I 
can tell, the retry backoff should make this impossible, but I could be missing 
something.
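
For reference, the setting behind that reasoning is presumably the standard
retry.backoff.ms client config (a minimal sketch, default value shown):

{code}
Properties props = new Properties();
// retry.backoff.ms bounds how quickly a client retries a failed request,
// which should keep the failure/retry cycle out of a tight loop.
props.put(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG, "100");
{code}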

> StackOverflowError in kafka-coordinator-heartbeat-thread
> 
>
> Key: KAFKA-6366
> URL: https://issues.apache.org/jira/browse/KAFKA-6366
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 1.0.0
>Reporter: Joerg Heinicke
> Attachments: 6366.v1.txt
>
>
> With Kafka 1.0 our consumer groups fall into a permanent cycle of rebalancing 
> once a StackOverflowError occurs in the heartbeat thread due to connectivity 
> issues between the consumers and the coordinating broker.
> Immediately before the exception there are hundreds, if not thousands, of log 
> entries of the following type:
> 2017-12-12 16:23:12.361 [kafka-coordinator-heartbeat-thread | 
> my-consumer-group] INFO  - [Consumer clientId=consumer-4, 
> groupId=my-consumer-group] Marking the coordinator : (id: 
> 2147483645 rack: null) dead
> The exceptions always happen somewhere in the DateFormat code, albeit at 
> different lines.
> 2017-12-12 16:23:12.363 [kafka-coordinator-heartbeat-thread | 
> my-consumer-group] ERROR - Uncaught exception in thread 
> 'kafka-coordinator-heartbeat-thread | my-consumer-group':
> java.lang.StackOverflowError
>  at 
> java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:362)
>  at 
> java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
>  at java.util.Calendar.getDisplayName(Calendar.java:2110)
>  at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
>  at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
>  at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
>  at java.text.DateFormat.format(DateFormat.java:345)
>  at 
> org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
>  at 
> org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>  at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
>  at 
> org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
>  at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>  at 
> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>  at 
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>  at org.apache.log4j.Category.callAppenders(Category.java:206)
>  at org.apache.log4j.Category.forcedLog(Category.java:391)
>  at org.apache.log4j.Category.log(Category.java:856)
>  at 
> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:324)
>  at 
> org.apache.kafka.common.utils.LogContext$KafkaLogger.info(LogContext.java:341)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:649)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
> ...
> the following 9 lines are repeated around a hundred times.
> ...
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.failUnsentRequests(ConsumerNetworkClient.java:416)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.disconnect(ConsumerNetworkClient.java:388)
>  at 
> 

[jira] [Created] (KAFKA-6376) Improve Streams metrics for skipped recrods

2017-12-16 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-6376:
--

 Summary: Improve Streams metrics for skipped recrods
 Key: KAFKA-6376
 URL: https://issues.apache.org/jira/browse/KAFKA-6376
 Project: Kafka
  Issue Type: Bug
  Components: metrics, streams
Affects Versions: 1.0.0
Reporter: Matthias J. Sax


Copied from the KIP-210 discussion thread:

{quote}
Note that currently we have two metrics for `skipped-records` on different
levels:

1) on the highest level, the thread-level, we have a `skipped-records` metric
that records all the records skipped due to deserialization errors.
2) on the lower processor-node level, we have a
`skippedDueToDeserializationError` metric that records the records skipped on
that specific source node due to deserialization errors.


So you can see that 1) does not cover any other scenarios and can just be
thought of as an aggregate of 2) across all the tasks' source nodes.
However, there are other places that can cause a record to be dropped, for
example:

1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
dropped because their window has elapsed.
2) KIP-210: records could be dropped on the producer side.
3) records could be dropped during user-customized processing on errors.
{quote}

[~guozhang] Not sure what you mean by "3) records could be dropped during 
user-customized processing on errors."

Btw: we also drop records with {{null}} key and/or value for certain DSL 
operations. This should be included as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-16 Thread Frederic Arno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294014#comment-16294014
 ] 

Frederic Arno commented on KAFKA-6323:
--

I updated my PR according to the above discussion, see 
https://github.com/apache/kafka/pull/4301

> punctuate with WALL_CLOCK_TIME triggered immediately
> 
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Frederic Arno
>Assignee: Frederic Arno
> Fix For: 1.1.0, 1.0.1
>
>
> While working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I set, 
> a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I found that all 
> PunctuationSchedules' timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes a PunctuationSchedule's 
> timestamp to 0, which I guess is what causes the immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)