[jira] [Created] (KAFKA-7214) Mystic FATAL error

2018-07-29 Thread Seweryn Habdank-Wojewodzki (JIRA)
Seweryn Habdank-Wojewodzki created KAFKA-7214:
-

 Summary: Mystic FATAL error
 Key: KAFKA-7214
 URL: https://issues.apache.org/jira/browse/KAFKA-7214
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.11.0.3
Reporter: Seweryn Habdank-Wojewodzki


Dears,

Very often at startup of the streaming application I get the following exception:

{code}
Exception caught in process. taskId=0_1, processor=KSTREAM-SOURCE-00,
topic=my_instance_medium_topic, partition=1, offset=198900203;
[org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:212),
 org.apache.kafka.streams.processor.internals.AssignedTasks$2.apply(AssignedTasks.java:347),
 org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:420),
 org.apache.kafka.streams.processor.internals.AssignedTasks.process(AssignedTasks.java:339),
 org.apache.kafka.streams.processor.internals.StreamThread.processAndPunctuate(StreamThread.java:648),
 org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:513),
 org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:482),
 org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:459)]
in thread
my_application-my_instance-my_instance_medium-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62
{code}

and then, without any shutdown request from my side:

{code}
2018-07-30 07:45:02 [ar313] [INFO ] StreamThread:912 - stream-thread [my_application-my_instance-my_instance-72ee1819-edeb-4d85-9d65-f67f7c321618-StreamThread-62] State transition from PENDING_SHUTDOWN to DEAD.
{code}

What does this error mean?
How should it be handled correctly?

Thanks in advance for help.
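
[Editor's note] One way to at least surface such thread deaths (and then decide whether to restart or shut down) is to register an uncaught exception handler and a state listener on the KafkaStreams instance. A minimal sketch, assuming a 1.0+ Streams API; the topology and config values below are placeholders, not taken from the report:

{code}
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsDeathWatch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder configuration, not taken from the report above.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my_application");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Placeholder topology; the real application's topology is not shown in the report.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("my_instance_medium_topic").to("output_topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Surface the exception that killed a stream thread instead of only
        // seeing the PENDING_SHUTDOWN -> DEAD transition in the logs.
        streams.setUncaughtExceptionHandler((thread, throwable) ->
                System.err.println("Stream thread " + thread.getName() + " died: " + throwable));

        // Observe client-level state transitions; note that DEAD in the log above
        // is a per-thread state, while this listener reports the overall client state.
        streams.setStateListener((newState, oldState) ->
                System.err.println("KafkaStreams state " + oldState + " -> " + newState));

        streams.start();
    }
}
{code}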



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5998) /.checkpoint.tmp Not found exception

2018-07-29 Thread Yogesh BG (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561447#comment-16561447
 ] 

Yogesh BG commented on KAFKA-5998:
--

Is there any update on this? I know it's not a critical issue, but I would like
to suppress these logs, as the log file is getting filled with these messages.
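
[Editor's note] Until the root cause is fixed, one way to keep these messages out of the logs is to raise the level of the logger that emits them. A minimal sketch, assuming a log4j 1.x backend; if you use logback or another slf4j binding, set the same logger name to ERROR in that framework's configuration instead:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class SuppressCheckpointWarning {
    public static void main(String[] args) {
        // Programmatic equivalent of adding this line to log4j.properties:
        //   log4j.logger.org.apache.kafka.streams.processor.internals.ProcessorStateManager=ERROR
        // It drops the repeated "Failed to write checkpoint file" WARN messages
        // while still letting errors from the same class through.
        Logger.getLogger("org.apache.kafka.streams.processor.internals.ProcessorStateManager")
              .setLevel(Level.ERROR);
    }
}
{code}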

> /.checkpoint.tmp Not found exception
> 
>
> Key: KAFKA-5998
> URL: https://issues.apache.org/jira/browse/KAFKA-5998
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0, 0.11.0.1
>Reporter: Yogesh BG
>Priority: Major
> Attachments: 5998.v1.txt, 5998.v2.txt
>
>
> I have one Kafka broker and one Kafka Streams instance running... I have been 
> running it for two days under a load of around 2500 msgs per second. On the 
> third day I started getting the exception below for some of the partitions. I 
> have 16 partitions; only 0_0 and 0_1 give this error:
> {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or 
> directory)
> at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
> at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> 09:43:25.974 [ks_0_inst-StreamThread-15] WARN  
> o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to 
> /data/kstreams/rtp-kafkastreams/0_0/.checkpoint:
> java.io.FileNotFoundException: 
> /data/kstreams/rtp-kafkastreams/0_0/.checkpoint.tmp (No such file or 
> directory)
> at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
> ~[na:1.7.0_111]
> at java.io.FileOutputStream.<init>(FileOutputStream.java:171) 
> ~[na:1.7.0_111]
> at 
> org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> 

[jira] [Updated] (KAFKA-7180) In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has joined the ISR before shutting down server2

2018-07-29 Thread Ismael Juma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-7180:
---
Fix Version/s: 2.0.1 (was: 2.0.0)

> In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has 
> joined the ISR before shutting down server2
> ---
>
> Key: KAFKA-7180
> URL: https://issues.apache.org/jira/browse/KAFKA-7180
> Project: Kafka
>  Issue Type: Bug
>Reporter: Lucas Wang
>Assignee: Lucas Wang
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> In the testHWCheckpointWithFailuresSingleLogSegment method, the test logic is:
> 1. shut down server1 and then capture the leadership of a partition in the 
> variable "leader", which should be server2
> 2. shut down server2 and wait until the leadership has changed to a broker 
> other than server2, via the line 
> waitUntilLeaderIsElectedOrChanged(zkClient, topic, partitionId, oldLeaderOpt 
> = Some(leader))
> However, when we execute step 2 and shut down server2, it is possible that 
> server1 has not caught up with the partition and has not joined the ISR. 
> With unclean leader election turned off, leadership cannot be transferred 
> to server1, so the condition waited on in step 2 is never met. 
> The obvious fix is to wait until server1 has joined the ISR before shutting 
> down server2.
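
[Editor's note] For reference, the shape of the proposed fix as a hedged sketch. The real test is Scala code in core that uses kafka.utils.TestUtils and ZooKeeper; the snippet below polls the AdminClient instead, and the topic name, broker ids, and the server2.shutdown() handle are purely illustrative:

{code}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

public class WaitForIsrSketch {

    // Poll topic metadata until brokerId appears in the ISR of the given
    // partition, or the deadline expires. Illustrative only; the actual test
    // uses the ZooKeeper-backed helpers in kafka.utils.TestUtils (Scala).
    static void waitUntilInIsr(AdminClient admin, String topic, int partition,
                               int brokerId, long timeoutMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            TopicDescription description =
                    admin.describeTopics(Collections.singletonList(topic)).all().get().get(topic);
            for (TopicPartitionInfo info : description.partitions()) {
                if (info.partition() != partition) {
                    continue;
                }
                for (Node node : info.isr()) {
                    if (node.id() == brokerId) {
                        return; // broker has joined the ISR
                    }
                }
            }
            Thread.sleep(100);
        }
        throw new AssertionError("Broker " + brokerId + " did not join the ISR in time");
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        AdminClient admin = AdminClient.create(props);
        try {
            // Only shut down server2 once server1 (broker id 1 here) is back in the ISR.
            waitUntilInIsr(admin, "test-topic", 0, 1, 30_000L);
            // server2.shutdown();  // hypothetical handle to the second broker
        } finally {
            admin.close();
        }
    }
}
{code}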



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7180) In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has joined the ISR before shutting down server2

2018-07-29 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7180.
-
Resolution: Fixed

> In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has 
> joined the ISR before shutting down server2
> ---
>
> Key: KAFKA-7180
> URL: https://issues.apache.org/jira/browse/KAFKA-7180
> Project: Kafka
>  Issue Type: Bug
>Reporter: Lucas Wang
>Assignee: Lucas Wang
>Priority: Minor
> Fix For: 2.1.0, 2.0.0
>
>
> In the testHWCheckpointWithFailuresSingleLogSegment method, the test logic is:
> 1. shut down server1 and then capture the leadership of a partition in the 
> variable "leader", which should be server2
> 2. shut down server2 and wait until the leadership has changed to a broker 
> other than server2, via the line 
> waitUntilLeaderIsElectedOrChanged(zkClient, topic, partitionId, oldLeaderOpt 
> = Some(leader))
> However, when we execute step 2 and shut down server2, it is possible that 
> server1 has not caught up with the partition and has not joined the ISR. 
> With unclean leader election turned off, leadership cannot be transferred 
> to server1, so the condition waited on in step 2 is never met. 
> The obvious fix is to wait until server1 has joined the ISR before shutting 
> down server2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7180) In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has joined the ISR before shutting down server2

2018-07-29 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-7180:

Fix Version/s: 2.1.0

> In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has 
> joined the ISR before shutting down server2
> ---
>
> Key: KAFKA-7180
> URL: https://issues.apache.org/jira/browse/KAFKA-7180
> Project: Kafka
>  Issue Type: Bug
>Reporter: Lucas Wang
>Assignee: Lucas Wang
>Priority: Minor
> Fix For: 2.0.0, 2.1.0
>
>
> In the testHWCheckpointWithFailuresSingleLogSegment method, the test logic is:
> 1. shut down server1 and then capture the leadership of a partition in the 
> variable "leader", which should be server2
> 2. shut down server2 and wait until the leadership has changed to a broker 
> other than server2, via the line 
> waitUntilLeaderIsElectedOrChanged(zkClient, topic, partitionId, oldLeaderOpt 
> = Some(leader))
> However, when we execute step 2 and shut down server2, it is possible that 
> server1 has not caught up with the partition and has not joined the ISR. 
> With unclean leader election turned off, leadership cannot be transferred 
> to server1, so the condition waited on in step 2 is never met. 
> The obvious fix is to wait until server1 has joined the ISR before shutting 
> down server2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7180) In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has joined the ISR before shutting down server2

2018-07-29 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-7180:

Fix Version/s: 2.0.0

> In testHWCheckpointWithFailuresSingleLogSegment, wait until server1 has 
> joined the ISR before shutting down server2
> ---
>
> Key: KAFKA-7180
> URL: https://issues.apache.org/jira/browse/KAFKA-7180
> Project: Kafka
>  Issue Type: Bug
>Reporter: Lucas Wang
>Assignee: Lucas Wang
>Priority: Minor
> Fix For: 2.0.0, 2.1.0
>
>
> In the testHWCheckpointWithFailuresSingleLogSegment method, the test logic is:
> 1. shut down server1 and then capture the leadership of a partition in the 
> variable "leader", which should be server2
> 2. shut down server2 and wait until the leadership has changed to a broker 
> other than server2, via the line 
> waitUntilLeaderIsElectedOrChanged(zkClient, topic, partitionId, oldLeaderOpt 
> = Some(leader))
> However, when we execute step 2 and shut down server2, it is possible that 
> server1 has not caught up with the partition and has not joined the ISR. 
> With unclean leader election turned off, leadership cannot be transferred 
> to server1, so the condition waited on in step 2 is never met. 
> The obvious fix is to wait until server1 has joined the ISR before shutting 
> down server2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6690) Priorities for Source Topics

2018-07-29 Thread Nick Afshartous (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561264#comment-16561264
 ] 

Nick Afshartous commented on KAFKA-6690:


[~guozhang] After I asked on the dev list about how to proceed, [~enether] 
asked that I write a KIP.
Last Monday I requested permission on the dev list to create a KIP, but I have 
not yet seen a reply granting it.

> Priorities for Source Topics
> 
>
> Key: KAFKA-6690
> URL: https://issues.apache.org/jira/browse/KAFKA-6690
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Bala Prassanna I
>Assignee: Nick Afshartous
>Priority: Major
>
> We often encounter use cases where we need to prioritise source topics. If a 
> consumer is listening to more than one topic, say HighPriorityTopic and 
> LowPriorityTopic, it should consume events from LowPriorityTopic only after 
> all the events from HighPriorityTopic have been consumed. This is needed in 
> Kafka Streams processor topologies as well.
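
[Editor's note] Not part of this ticket, but as a rough illustration of the workaround available today with the plain consumer API: pause the low-priority topic's partitions whenever the high-priority topic still returns data, and resume them when it runs dry. The topic names, group id, and polling loop below are illustrative only (poll(Duration) requires a 2.0+ client):

{code}
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PrioritizedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "priority-demo");           // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("HighPriorityTopic", "LowPriorityTopic"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));

                Set<TopicPartition> lowPartitions = consumer.assignment().stream()
                        .filter(tp -> tp.topic().equals("LowPriorityTopic"))
                        .collect(Collectors.toSet());

                // While the high-priority topic keeps returning data, hold the
                // low-priority partitions back; resume them once it runs dry.
                boolean highHasData = records.records("HighPriorityTopic").iterator().hasNext();
                if (highHasData) {
                    consumer.pause(lowPartitions);
                } else {
                    consumer.resume(lowPartitions);
                }

                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s[%d]@%d: %s%n",
                            record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
{code}

This only approximates strict prioritisation, since low-priority records already fetched in the same poll are still delivered, which is part of the motivation for a first-class feature.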



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-3514) Stream timestamp computation needs some further thoughts

2018-07-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561178#comment-16561178
 ] 

ASF GitHub Bot commented on KAFKA-3514:
---

guozhangwang opened a new pull request #5428: KAFKA-3514: Part III, Refactor 
StreamThread main loop
URL: https://github.com/apache/kafka/pull/5428
 
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Stream timestamp computation needs some further thoughts
> 
>
> Key: KAFKA-3514
> URL: https://issues.apache.org/jira/browse/KAFKA-3514
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: architecture
>
> Our current stream task's timestamp is used for the punctuate function as well 
> as for selecting which stream to process next (i.e. best-effort stream 
> synchronization). It is defined as the smallest timestamp over all 
> partitions in the task's partition group. This results in two unintuitive 
> corner cases:
> 1) observing a late-arriving record keeps that stream's timestamp low for 
> a period of time, and hence that stream keeps being processed until the late 
> record is reached. For example, take two partitions within the same task, 
> annotated by their record timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late-arriving record with timestamp "1" causes stream A to be selected 
> continuously in the thread loop, i.e. the messages with timestamps 5, 6, 7, 8, 9 
> are processed until the late record itself is dequeued and processed; only then 
> is stream B selected, starting with timestamp 2.
> 2) an empty buffered partition causes its timestamp not to advance, and hence 
> the task timestamp does not advance either, since it is the smallest among all 
> partitions. This may not be as severe a problem as 1) above, though.
> *Update*
> There is one more thing to consider (full discussion found here: 
> http://search-hadoop.com/m/Kafka/uyzND1iKZJN1yz0E5?subj=Order+of+punctuate+and+process+in+a+stream+processor)
> {quote}
> Let's assume the following case.
> - a stream processor that uses the Processor API
> - context.schedule(1000) is called in init()
> - the processor reads only one topic that has one partition
> - using a custom timestamp extractor, but that timestamp is just a wall 
> clock time
> Imagine the following events:
> 1. for 10 seconds I send 5 messages / second
> 2. I do not send any messages for 3 seconds
> 3. I start sending 5 messages / second again
> I see that punctuate() is not called during the 3 seconds when I do not 
> send any messages. This is OK according to the documentation, because 
> there are no new messages to trigger the punctuate() call. When the 
> first few messages arrive after I restart the sending (point 3 above), I 
> see the following sequence of method calls:
> 1. process() on the 1st message
> 2. punctuate() is called 3 times
> 3. process() on the 2nd message
> 4. process() on each following message
> What I would expect instead is that punctuate() is called first and then 
> process() is called on the messages, because the first message's timestamp 
> is already 3 seconds later than when punctuate() was last called, so the 
> first message belongs after the 3 punctuate() calls.
> {quote}
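
[Editor's note] To make corner case 1) concrete, here is a small standalone sketch, not the actual PartitionGroup/RecordQueue code, of the selection behaviour described above: each partition's timestamp is the smallest timestamp among its buffered records, the partition with the smallest such timestamp is chosen next, and records within a partition are processed in offset order. Run against the Stream A / Stream B example it reproduces the 5, 6, 7, 8, 9, 1, then 2, 3, 4, 5, then 10 ordering:

{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class TimestampSelectionSketch {

    // Partition timestamp = smallest timestamp among its buffered records.
    static long partitionTimestamp(Deque<Long> queue) {
        long min = Long.MAX_VALUE;
        for (long ts : queue) {
            min = Math.min(min, ts);
        }
        return min;
    }

    public static void main(String[] args) {
        Map<String, Deque<Long>> buffered = new LinkedHashMap<>();
        buffered.put("A", new ArrayDeque<>(Arrays.asList(5L, 6L, 7L, 8L, 9L, 1L, 10L)));
        buffered.put("B", new ArrayDeque<>(Arrays.asList(2L, 3L, 4L, 5L)));

        StringBuilder order = new StringBuilder();
        while (buffered.values().stream().anyMatch(q -> !q.isEmpty())) {
            // Pick the partition whose buffered records contain the smallest timestamp...
            String next = null;
            long best = Long.MAX_VALUE;
            for (Map.Entry<String, Deque<Long>> e : buffered.entrySet()) {
                if (!e.getValue().isEmpty() && partitionTimestamp(e.getValue()) < best) {
                    best = partitionTimestamp(e.getValue());
                    next = e.getKey();
                }
            }
            // ...but process that partition's records in offset order (head first),
            // which is what lets the late record "1" pin stream A's timestamp low.
            long processed = buffered.get(next).pollFirst();
            order.append(next).append(':').append(processed).append(' ');
        }
        // Prints: A:5 A:6 A:7 A:8 A:9 A:1 B:2 B:3 B:4 B:5 A:10
        System.out.println(order.toString().trim());
    }
}
{code}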



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)