[jira] [Created] (KAFKA-8439) Readability issues on mobile phone

2019-05-27 Thread Daniel Noga (JIRA)
Daniel Noga created KAFKA-8439:
--

 Summary: Readability issues on mobile phone
 Key: KAFKA-8439
 URL: https://issues.apache.org/jira/browse/KAFKA-8439
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Daniel Noga
 Attachments: Screenshot_20190528_083509.png, 
Screenshot_20190528_083603.png

I tried to read the 
[https://kafka.apache.org/22/documentation/streams/quickstart] page on my 
Samsung Galaxy S7 phone and ran into two problems.

First, zooming is disabled. So when I want to see the image at the bottom of 
the page, I cannot zoom in on it or open it, and its content is unreadable -> 
zooming should be enabled.

Second, when I switch to portrait view, roughly one third of the page is 
hidden by the menu -> the bottom menu should not always be visible.

The attached images are screenshots taken in desktop Chrome with developer 
tools.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8438) Add API to allow user to define end behavior of consumer failure

2019-05-27 Thread Boyang Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849322#comment-16849322
 ] 

Boyang Chen commented on KAFKA-8438:


Thanks for the proposal [~Yohan123]! It certainly does no harm for users to 
register a callback on consumer failure, for example if they want to send 
alerts to a customized service. However, my assumption is that a consumer 
thread usually dies due to some uncaught exception during data processing or 
something similar. If the exception is fatal, can we guarantee that this 
callback is triggered when the failure is not caused by the consumer itself?

Also, in terms of bootstrapping a new thread at that moment, who would be the 
caller? Does that mean we need to initialize a daemon thread on start-up to 
monitor consumer thread health?

The current plan is to use an external monitoring service to detect consumer 
failures like this. When something goes wrong, we can simply restart the 
service so that the consumer rejoins with the same `group.instance.id`. I 
still feel this new API could be useful, but it may be overkill for solving 
the rebalance problem.

Maybe [~guozhang] and [~hachikuji] could also share some opinions here.
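The restart-based recovery described above (rejoining with the same 
`group.instance.id`) can be sketched as plain consumer configuration. This is 
a minimal illustration using string keys from the consumer config namespace; 
the group name, instance id, and 30-second timeout are made-up example values, 
not recommendations:

```java
import java.util.Properties;

public class StaticMembershipConfig {
    // Build consumer properties for static membership: a stable
    // group.instance.id lets a restarted process rejoin the group
    // without triggering a rebalance, provided it comes back within
    // session.timeout.ms.
    public static Properties build(String groupId, String instanceId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Must stay identical across process restarts for the broker
        // to treat the rejoining member as the same static member.
        props.setProperty("group.instance.id", instanceId);
        // The window within which the restarted member must rejoin
        // before a rebalance is triggered (example value).
        props.setProperty("session.timeout.ms", "30000");
        return props;
    }
}
```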

> Add API to allow user to define end behavior of consumer failure
> 
>
> Key: KAFKA-8438
> URL: https://issues.apache.org/jira/browse/KAFKA-8438
> Project: Kafka
>  Issue Type: New Feature
>  Components: consumer
>Reporter: Richard Yu
>Priority: Major
>  Labels: needs-dicussion, needs-kip
>
> Recently, in a concerted effort to make Kafka's rebalances less painful, 
> various approaches have been used to reduce both the number and the impact 
> of rebalances. Often, the trigger of a rebalance is a failure of some sort, 
> in which case the workload is redistributed among the surviving threads. To 
> reduce rebalances caused by random consumer crashes, a recent change to 
> Kafka internals (which introduces the concept of static membership) prevents 
> a rebalance from occurring within {{session.timeout.ms}}, in the hope that 
> the crashed consumer thread recovers in that interval and rejoins the group.
> However, in some cases consumer threads go down permanently or remain dead 
> for long periods of time. In these scenarios, Kafka users might not become 
> aware of the crash until hours after it happened, forcing them to manually 
> start a new KafkaConsumer process long after the failure occurred. That is 
> where a callback such as {{onConsumerFailure}} would help. There are 
> multiple use cases for this user-defined callback. {{onConsumerFailure}} is 
> called when a particular consumer thread has been down for some specified 
> time interval (i.e. a config such as 
> {{acceptable.consumer.failure.timeout.ms}}). When called, this method could 
> be used to log the consumer failure or, should the user wish it, to create a 
> new thread which would then rejoin the consumer group (possibly with the 
> required {{group.instance.id}} so that a rebalance wouldn't be re-triggered; 
> we would need to think about that).
> Should the old thread recover and attempt to rejoin the consumer group (with 
> the substitute thread now part of the group), the old thread will be denied 
> access and an exception will be thrown (to indicate that another process has 
> already taken its place).





[jira] [Commented] (KAFKA-8438) Add API to allow user to define end behavior of consumer failure

2019-05-27 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849321#comment-16849321
 ] 

Richard Yu commented on KAFKA-8438:
---

Oh, by the way, I will probably work on this issue once I'm done with the 
other PRs I have in progress.






[jira] [Commented] (KAFKA-8438) Add API to allow user to define end behavior of consumer failure

2019-05-27 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849287#comment-16849287
 ] 

Richard Yu commented on KAFKA-8438:
---

cc [~bchen225242] WDYT?






[jira] [Updated] (KAFKA-8438) Add API to allow user to define end behavior of consumer failure

2019-05-27 Thread Richard Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Yu updated KAFKA-8438:
--
Description: 
Recently, in a concerted effort to make Kafka's rebalances less painful, 
various approaches have been used to reduce both the number and the impact of 
rebalances. Often, the trigger of a rebalance is a failure of some sort, in 
which case the workload is redistributed among the surviving threads. To 
reduce rebalances caused by random consumer crashes, a recent change to Kafka 
internals (which introduces the concept of static membership) prevents a 
rebalance from occurring within {{session.timeout.ms}}, in the hope that the 
crashed consumer thread recovers in that interval and rejoins the group.

However, in some cases consumer threads go down permanently or remain dead for 
long periods of time. In these scenarios, Kafka users might not become aware 
of the crash until hours after it happened, forcing them to manually start a 
new KafkaConsumer process long after the failure occurred. That is where a 
callback such as {{onConsumerFailure}} would help. There are multiple use 
cases for this user-defined callback. {{onConsumerFailure}} is called when a 
particular consumer thread has been down for some specified time interval 
(i.e. a config such as {{acceptable.consumer.failure.timeout.ms}}). When 
called, this method could be used to log the consumer failure or, should the 
user wish it, to create a new thread which would then rejoin the consumer 
group (possibly with the required {{group.instance.id}} so that a rebalance 
wouldn't be re-triggered; we would need to think about that).

Should the old thread recover and attempt to rejoin the consumer group (with 
the substitute thread now part of the group), the old thread will be denied 
access and an exception will be thrown (to indicate that another process has 
already taken its place).

  was:
Recently, in a concerted effort to make Kafka's rebalances less painful, 
various approaches has been used to reduce the number of and impact of 
rebalances. Often, the trigger of a rebalance is a failure of some sort, in 
which case, the workload will be redistributed among surviving threads. Working 
to reduce rebalances due to random consumer crashes, a recent change to Kafka 
internals had been made (which introduces the concept of static membership) 
that prevents a rebalance from occurring within {{session.timeout.ms}} in the 
hope that the consumer thread which crashed would recover in that time interval 
and rejoin the group.

However, in some cases, some consumer threads would permanently go down or 
remain dead for long periods of time. In these scenarios, users of Kafka would 
possibly not be aware of such a crash until hours later after it happened which 
forces Kafka users to manually start a new KafkaConsumer process a considerable 
period of time after the failure had occurred. That is where the addition of a 
callback such as {{onConsumerFailure}} would help. There are multiple use cases 
for this callback (which is defined by the user). {{onConsumerFailure}} is 
called when a particular consumer thread goes under for some specified time 
interval (i.e. a config called {{acceeptable.consumer.failure.timeout.ms}}). 
When called, this method could be used to log a consumer failure or should the 
user wish it, create a new thread which would then rejoin the consumer group 
(which could also include the required {{group.instance.id}} so that a 
rebalance wouldn't be re-triggered). 

Should the old thread recover and attempt to rejoin the consumer group (with 
the substitute thread being part of the group), the old thread will be denied 
access and an exception would be thrown (to indicate that another process has 
already taken its place).

 

 

 



[jira] [Updated] (KAFKA-8438) Add API to allow user to define end behavior of consumer failure

2019-05-27 Thread Richard Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Yu updated KAFKA-8438:
--
Description: 
Recently, in a concerted effort to make Kafka's rebalances less painful, 
various approaches has been used to reduce the number of and impact of 
rebalances. Often, the trigger of a rebalance is a failure of some sort, in 
which case, the workload will be redistributed among surviving threads. Working 
to reduce rebalances due to random consumer crashes, a recent change to Kafka 
internals had been made (which introduces the concept of static membership) 
that prevents a rebalance from occurring within {{session.timeout.ms}} in the 
hope that the consumer thread which crashed would recover in that time interval 
and rejoin the group.

However, in some cases, some consumer threads would permanently go down or 
remain dead for long periods of time. In these scenarios, users of Kafka would 
possibly not be aware of such a crash until hours later after it happened which 
forces Kafka users to manually start a new KafkaConsumer process a considerable 
period of time after the failure had occurred. That is where the addition of a 
callback such as {{onConsumerFailure}} would help. There are multiple use cases 
for this callback (which is defined by the user). {{onConsumerFailure}} is 
called when a particular consumer thread goes under for some specified time 
interval (i.e. a config called {{acceeptable.consumer.failure.timeout.ms}}). 
When called, this method could be used to log a consumer failure or should the 
user wish it, create a new thread which would then rejoin the consumer group 
(which could also include the required {{group.instance.id}} so that a 
rebalance wouldn't be re-triggered). 

Should the old thread recover and attempt to rejoin the consumer group (with 
the substitute thread being part of the group), the old thread will be denied 
access and an exception would be thrown (to indicate that another process has 
already taken its place).

 

 

 

  was:
Recently, in a concerted effort to make Kafka's rebalances less painful, 
various approaches has been used to reduce the number of and impact of 
rebalances. Often, the trigger of a rebalance is a failure of some sort, in 
which case, the workload will be redistributed among surviving threads. Working 
to reduce rebalances due to random consumer crashes, a recent change to Kafka 
internals had been made (which introduces the concept of static membership) 
that prevents a rebalance from occurring within {{session.timeout.ms}} in the 
hope that the consumer thread which crashed would recover in that time interval.

However, in some cases, some consumer threads would permanently go down or 
remain dead for long periods of time. In these scenarios, users of Kafka would 
possibly not be aware of such a crash until hours later after it happened which 
forces Kafka users to manually start a new KafkaConsumer process a considerable 
period of time after the failure had occurred. That is where the addition of a 
callback such as {{onConsumerFailure}} would help. There are multiple use cases 
for this callback (which is defined by the user). {{onConsumerFailure}} is 
called when a particular consumer thread goes under for some specified time 
interval (i.e. a config called {{acceeptable.consumer.failure.timeout.ms}}). 
When called, this method could be used to log a consumer failure or should the 
user wish it, create a new thread which would then rejoin the consumer group 
(which could also include the required {{group.instance.id}} so that a 
rebalance wouldn't be re-triggered). 

Should the old thread recover and attempt to rejoin the consumer group (with 
the substitute thread being part of the group), the old thread will be denied 
access and an exception would be thrown (to indicate that another process has 
already taken its place).

 

 

 



[jira] [Created] (KAFKA-8438) Add API to allow user to define end behavior of consumer failure

2019-05-27 Thread Richard Yu (JIRA)
Richard Yu created KAFKA-8438:
-

 Summary: Add API to allow user to define end behavior of consumer 
failure
 Key: KAFKA-8438
 URL: https://issues.apache.org/jira/browse/KAFKA-8438
 Project: Kafka
  Issue Type: New Feature
  Components: consumer
Reporter: Richard Yu


Recently, in a concerted effort to make Kafka's rebalances less painful, 
various approaches have been used to reduce both the number and the impact of 
rebalances. Often, the trigger of a rebalance is a failure of some sort, in 
which case the workload is redistributed among the surviving threads. To 
reduce rebalances caused by random consumer crashes, a recent change to Kafka 
internals (which introduces the concept of static membership) prevents a 
rebalance from occurring within {{session.timeout.ms}}, in the hope that the 
crashed consumer thread recovers in that interval.

However, in some cases consumer threads go down permanently or remain dead for 
long periods of time. In these scenarios, Kafka users might not become aware 
of the crash until hours after it happened, forcing them to manually start a 
new KafkaConsumer process long after the failure occurred. That is where a 
callback such as {{onConsumerFailure}} would help. There are multiple use 
cases for this user-defined callback. {{onConsumerFailure}} is called when a 
particular consumer thread has been down for some specified time interval 
(i.e. a config such as {{acceptable.consumer.failure.timeout.ms}}). When 
called, this method could be used to log the consumer failure or, should the 
user wish it, to create a new thread which would then rejoin the consumer 
group (which could also include the required {{group.instance.id}} so that a 
rebalance wouldn't be re-triggered).

Should the old thread recover and attempt to rejoin the consumer group (with 
the substitute thread now part of the group), the old thread will be denied 
access and an exception will be thrown (to indicate that another process has 
already taken its place).
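A rough sketch of how a user-registered failure callback might hang together. 
Everything below (`ConsumerFailureListener`, `reportDown`, the threshold 
field) is a hypothetical shape invented for illustration; none of these names 
exist in the Kafka client API, and the real design would need a KIP:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type mirroring the proposed onConsumerFailure hook.
interface ConsumerFailureListener {
    void onConsumerFailure(String groupInstanceId, Duration downFor);
}

public class FailureCallbackSketch {
    private final List<ConsumerFailureListener> listeners = new ArrayList<>();
    // Would correspond to the proposed acceptable.consumer.failure.timeout.ms.
    private final Duration acceptableFailureTimeout;

    public FailureCallbackSketch(Duration acceptableFailureTimeout) {
        this.acceptableFailureTimeout = acceptableFailureTimeout;
    }

    public void register(ConsumerFailureListener listener) {
        listeners.add(listener);
    }

    // Invoked by some health monitor once a consumer thread has been
    // observed down; callbacks fire only past the configured threshold.
    public boolean reportDown(String groupInstanceId, Duration downFor) {
        if (downFor.compareTo(acceptableFailureTimeout) < 0) {
            return false; // still within the grace period, no callback yet
        }
        for (ConsumerFailureListener listener : listeners) {
            listener.onConsumerFailure(groupInstanceId, downFor);
        }
        return true;
    }
}
```

A caller could register a listener that logs the failure, pages an on-call 
service, or spawns a replacement thread carrying the same 
{{group.instance.id}}.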

 

 

 





[jira] [Resolved] (KAFKA-8219) Add web documentation for static membership

2019-05-27 Thread Boyang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-8219.

Resolution: Fixed

> Add web documentation for static membership
> ---
>
> Key: KAFKA-8219
> URL: https://issues.apache.org/jira/browse/KAFKA-8219
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer, streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Need official documentation update.





[jira] [Assigned] (KAFKA-2939) Make AbstractConfig.logUnused() tunable for clients

2019-05-27 Thread Rens Groothuijsen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rens Groothuijsen reassigned KAFKA-2939:


Assignee: (was: Rens Groothuijsen)

> Make AbstractConfig.logUnused() tunable for clients
> ---
>
> Key: KAFKA-2939
> URL: https://issues.apache.org/jira/browse/KAFKA-2939
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> Today we always log unused configs in KafkaProducer / KafkaConsumer in their 
> constructors, however for some cases like Kafka Streams that make use of 
> these clients, other configs may be passed in to configure Partitioner / 
> Serializer classes, etc. So it would be better to make this function call 
> optional to avoid printing unnecessary and confusing WARN entries.





[jira] [Commented] (KAFKA-8187) State store record loss across multiple reassignments when using standby tasks

2019-05-27 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849218#comment-16849218
 ] 

Guozhang Wang commented on KAFKA-8187:
--

[~hustclf] Thanks for filing the PR! [~bbejeck] and I will take a look and see 
if it can be merged into 2.3 ASAP.

> State store record loss across multiple reassignments when using standby tasks
> --
>
> Key: KAFKA-8187
> URL: https://issues.apache.org/jira/browse/KAFKA-8187
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.1
>Reporter: William Greer
>Assignee: Bill Bejeck
>Priority: Major
>
> Overview:
> There is a race condition that can cause a partitioned state store to be 
> missing records up to an offset when using standby tasks.
> When a reassignment occurs and a task is migrated to a StandbyTask in 
> another StreamThread/TaskManager on the same JVM, lock contention can 
> prevent the StandbyTask on the newly assigned StreamThread from acquiring 
> the lock, and the acquisition is never retried because all of the active 
> StreamTasks for that StreamThread are running. If the StandbyTask does not 
> acquire the lock before the StreamThread enters the RUNNING state, the 
> StandbyTask will not consume any records. If there is no subsequent 
> reassignment before the second execution of the stateDirCleaner thread, the 
> task directory for the StandbyTask is deleted. When the next reassignment 
> occurs, the offset that the StandbyTask read at creation time (before 
> acquiring the lock) is written back to the state store directory, which 
> re-creates that directory.
> An example:
> StreamThread(A) and StreamThread(B) are running on the same JVM in the same 
> streams application.
> StreamThread(A) has StandbyTask 1_0
> StreamThread(B) has no tasks
> A reassignment is triggered by another host in the streams application fleet.
> StreamThread(A) is notified with a PARTITIONS_REVOKED event for the thread's 
> one task
> StreamThread(B) is notified with a PARTITIONS_ASSIGNED event of a standby 
> task for 1_0
> Here begins the race condition.
> StreamThread(B) creates the StandbyTask which reads the current checkpoint 
> from disk.
> StreamThread(B) then attempts updateNewAndRestoringTasks() for its assigned 
> tasks. [0]
> StreamThread(B) initializes the new tasks for the active and standby tasks. 
> [1] [2]
> StreamThread(B) attempts to lock the state directory for task 1_0 but fails 
> with a LockException [3], since StreamThread(A) still holds the lock.
> StreamThread(B) returns true from updateNewAndRestoringTasks() due to the 
> check at [4] which only checks that the active assigned tasks are running.
> StreamThread(B) state is set to RUNNING
> StreamThread(A) closes the previous StandbyTask specifically calling 
> closeStateManager() [5]
> StreamThread(A) state is set to RUNNING
> Streams application for this host has completed re-balancing and is now in 
> the RUNNING state.
> State at this point is the following: State directory exists for 1_0 and all 
> data is present.
> Then, 1 to 2 intervals of [6] (default 10 minutes) after the reassignment 
> has completed, the stateDirCleaner thread executes [7].
> The stateDirCleaner then runs [8]: it finds the directory 1_0, sees that 
> there is no active lock for that directory, acquires the lock, and deletes 
> the directory.
> State at this point is the following: State directory does not exist for 1_0.
> When the next reassignment occurs. The offset that was read by 
> StreamThread(B) during construction of the StandbyTask for 1_0 will be 
> written back to disk. This write re-creates the state store directory and 
> writes the .checkpoint file with the old offset.
> State at this point is the following: State directory exists for 1_0 with a 
> '.checkpoint' file in it, but there is no other state store data in the 
> directory.
> If this host is assigned the active task for 1_0, then all state store 
> history from before the offset read at the previous reassignment will be 
> missing. 
> If this host is assigned the standby task for 1_0 then the lock will be 
> acquired and the standby will start to consume records, but it will still be 
> missing all records from before the offset that was read at the previous 
> reassignment.
> If this host is not assigned 1_0, then the state directory will get cleaned 
> up by the stateDirCleaner thread 10 to 20 minutes later and the record loss 
> issue will be hidden.
> [0] 
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L865-L869
> [1] 
> https://
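The core failure mode described above (a single failed tryLock with no retry, 
leaving the standby uninitialized) can be illustrated with a plain 
ReentrantLock. This is a schematic stand-in, not Kafka Streams' actual 
StateDirectory code:

```java
import java.util.concurrent.locks.ReentrantLock;

public class StandbyLockSketch {
    // Schematic version of the race: the standby makes one tryLock
    // attempt while the previous owner still holds the lock, and since
    // nothing retries, the task never restores its state.
    public static boolean initializeStandby(ReentrantLock stateDirLock) {
        if (!stateDirLock.tryLock()) {
            // The KAFKA-8187 scenario: returning here without a retry
            // means the standby silently consumes nothing.
            return false;
        }
        try {
            // State restoration would happen here.
            return true;
        } finally {
            stateDirLock.unlock();
        }
    }
}
```

A retry loop, or deferring the RUNNING transition until standby tasks are also 
initialized, would close this window.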

[jira] [Commented] (KAFKA-8219) Add web documentation for static membership

2019-05-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849217#comment-16849217
 ] 

ASF GitHub Bot commented on KAFKA-8219:
---

guozhangwang commented on pull request #6790: KAFKA-8219: add doc changes for 
static membership release
URL: https://github.com/apache/kafka/pull/6790
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org







[jira] [Resolved] (KAFKA-8437) Consumer should wait for api version before checking if offset validation is possible

2019-05-27 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8437.

Resolution: Fixed

> Consumer should wait for api version before checking if offset validation is 
> possible
> -
>
> Key: KAFKA-8437
> URL: https://issues.apache.org/jira/browse/KAFKA-8437
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> KAFKA-8422 fixed a problem with ACL assumptions made during offset 
> validation. If using an old api version, we skip the validation. However, if 
> we do not have an active connection to the node, then we will not have api 
> version information available. Rather than skipping the validation check as 
> we did in KAFKA-8422, we should await the version information.





[jira] [Commented] (KAFKA-8437) Consumer should wait for api version before checking if offset validation is possible

2019-05-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849214#comment-16849214
 ] 

ASF GitHub Bot commented on KAFKA-8437:
---

hachikuji commented on pull request #6823: KAFKA-8437; Await node api versions 
before checking if offset validation is possible
URL: https://github.com/apache/kafka/pull/6823
 
 
   
 



> Consumer should wait for api version before checking if offset validation is 
> possible
> -
>
> Key: KAFKA-8437
> URL: https://issues.apache.org/jira/browse/KAFKA-8437
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> KAFKA-8422 fixed a problem with ACL assumptions made during offset 
> validation. If using an old api version, we skip the validation. However, if 
> we do not have an active connection to the node, then we will not have api 
> version information available. Rather than skipping the validation check as 
> we did in KAFKA-8422, we should await the version information.





[jira] [Commented] (KAFKA-8437) Consumer should wait for api version before checking if offset validation is possible

2019-05-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849122#comment-16849122
 ] 

ASF GitHub Bot commented on KAFKA-8437:
---

hachikuji commented on pull request #6823: KAFKA-8437; Await node api versions 
before checking if offset validation is possible
URL: https://github.com/apache/kafka/pull/6823
 
 
   The consumer should await api version information before determining whether 
the broker supports offset validation. In KAFKA-8422, we skip the validation if 
we don't have api version information, which means we always skip validation 
the first time we connect to a node.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Consumer should wait for api version before checking if offset validation is 
> possible
> -
>
> Key: KAFKA-8437
> URL: https://issues.apache.org/jira/browse/KAFKA-8437
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> KAFKA-8422 fixed a problem with ACL assumptions made during offset 
> validation. If using an old api version, we skip the validation. However, if 
> we do not have an active connection to the node, then we will not have api 
> version information available. Rather than skipping the validation check as 
> we did in KAFKA-8422, we should await the version information.





[jira] [Created] (KAFKA-8437) Consumer should wait for api version before checking if offset validation is possible

2019-05-27 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-8437:
--

 Summary: Consumer should wait for api version before checking if 
offset validation is possible
 Key: KAFKA-8437
 URL: https://issues.apache.org/jira/browse/KAFKA-8437
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


KAFKA-8422 fixed a problem with ACL assumptions made during offset validation. 
If using an old api version, we skip the validation. However, if we do not have 
an active connection to the node, then we will not have api version information 
available. Rather than skipping the validation check as we did in KAFKA-8422, 
we should await the version information.





[jira] [Commented] (KAFKA-8433) Give the opportunity to use serializers and deserializers with IntegrationTestUtils

2019-05-27 Thread Anthony Callaert (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849083#comment-16849083
 ] 

Anthony Callaert commented on KAFKA-8433:
-

[~mjsax] I agree with you. Testing without a public API was not the best 
approach, and I have resolved it by testing topologies.

It may be pointless to merge this PR if there is an intention to remove the 
methods that are not used internally.

Regards

> Give the opportunity to use serializers and deserializers with 
> IntegrationTestUtils
> ---
>
> Key: KAFKA-8433
> URL: https://issues.apache.org/jira/browse/KAFKA-8433
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Anthony Callaert
>Priority: Minor
>
> Currently, the static methods that use a producer or a consumer do not allow 
> passing serializers or deserializers as arguments.
> Because of that, we are not able to mock the schema registry (for example) or 
> other producer/consumer-specific attributes.
> To resolve this, we just need to add methods that take serializers or 
> deserializers as arguments.
> The Kafka producer and consumer constructors already accept null serializers 
> and deserializers.





[jira] [Commented] (KAFKA-7763) KafkaProducer with transactionId endless waits when network is disconnection for 10-20s

2019-05-27 Thread Terence Mill (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849037#comment-16849037
 ] 

Terence Mill commented on KAFKA-7763:
-

@huxihx Is there a snapshot version (repo) we can use which includes the fix?

> KafkaProducer with transactionId endless waits when network is disconnection 
> for 10-20s
> ---
>
> Key: KAFKA-7763
> URL: https://issues.apache.org/jira/browse/KAFKA-7763
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, producer 
>Affects Versions: 2.1.0
>Reporter: weasker
>Assignee: huxihx
>Priority: Major
> Fix For: 2.3.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> When the client loses its connection to the bootstrap server, a KafkaProducer 
> with a transactional ID waits endlessly on commitTransaction. This is the same 
> issue as the one below:
> https://issues.apache.org/jira/browse/KAFKA-6446
> The reproduction steps are as follows:
> 1. producer.initTransactions();
> 2. producer.beginTransaction();
> 3. producer.send(record1); // set the breakpoint here
> Key step: hit the breakpoint at step 3, then disconnect the network manually; 
> after 10-20 seconds, restore the network and continue the program by removing 
> the breakpoint.
> 4. producer.send(record2);
> 5. producer.commitTransaction(); // waits endlessly
>  
> I found that version 2.1.0 modified the initTransactions method, but 
> commitTransaction and abortTransaction have, I think, the same problem as 
> initTransactions...





[jira] [Commented] (KAFKA-7994) Improve Stream-Time for rebalances and restarts

2019-05-27 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849022#comment-16849022
 ] 

Richard Yu commented on KAFKA-7994:
---

Hi Bruno. I think this is one of the cases that this issue is working to fix. If 
my PR is merged, the example you mentioned should no longer be a problem (time 
is re-initialized correctly after restarts).

> Improve Stream-Time for rebalances and restarts
> ---
>
> Key: KAFKA-7994
> URL: https://issues.apache.org/jira/browse/KAFKA-7994
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Richard Yu
>Priority: Major
> Attachments: possible-patch.diff
>
>
> We compute a per-partition partition-time as the maximum timestamp over all 
> records processed so far. Furthermore, we use partition-time to compute 
> stream-time for each task as the maximum over all partition-times (for all 
> corresponding task partitions). This stream-time is used to make decisions 
> about processing out-of-order records or dropping them if they are late (i.e., 
> timestamp < stream-time - grace-period).
> During rebalances and restarts, stream-time is initialized as UNKNOWN (i.e., 
> -1) for tasks that are newly created (or migrated). In effect, we forget the 
> current stream-time in this case, which may lead to non-deterministic behavior 
> if we stop processing right before a late record that would be dropped if we 
> continued processing, but is not dropped after the rebalance/restart. Let's 
> look at an example with a grace period of 5 ms for a tumbling window of 5 ms, 
> and the following records (timestamps in parentheses):
>  
> {code:java}
> r1(0) r2(5) r3(11) r4(2){code}
> In the example, stream-time advances as 0, 5, 11, 11, and thus record `r4` is 
> dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or 
> rebalance after processing `r3` but before processing `r4`, we would 
> reinitialize stream-time as -1, and thus would process `r4` on restart/after 
> the rebalance. The problem is that stream-time then advances differently from 
> a global point of view: 0, 5, 11, 2.
> Note, this is a corner case: if we had stopped processing one record earlier, 
> i.e., after processing `r2` but before processing `r3`, stream-time would 
> advance correctly from a global point of view.
> A potential fix would be to store the latest observed partition-time in the 
> metadata of committed offsets. This way, on restart/rebalance we can 
> re-initialize time correctly.
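The stream-time bookkeeping in the quoted example can be simulated in a few lines. This is a hypothetical standalone sketch of the described semantics, not Kafka Streams' implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class StreamTimeSketch {
    static final long GRACE_MS = 5;
    long streamTime = -1; // UNKNOWN, as on a restart/rebalance

    /** Returns true if the record is processed, false if dropped as late. */
    boolean process(long timestamp) {
        if (timestamp < streamTime - GRACE_MS) {
            return false; // late: timestamp < stream-time - grace-period
        }
        streamTime = Math.max(streamTime, timestamp);
        return true;
    }

    public static void main(String[] args) {
        StreamTimeSketch task = new StreamTimeSketch();
        List<Boolean> results = new ArrayList<>();
        for (long ts : new long[] {0, 5, 11, 2}) { // r1..r4
            results.add(task.process(ts));
        }
        System.out.println(results); // [true, true, true, false]: r4 dropped, 2 < 11 - 5

        // A restart between r3 and r4 re-initializes stream-time to -1,
        // so r4 is now (incorrectly) processed.
        StreamTimeSketch restarted = new StreamTimeSketch();
        System.out.println(restarted.process(2)); // true
    }
}
```

The second instance models the non-determinism in the ticket: the same record `r4` is dropped or processed depending on whether a restart happened just before it.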





[jira] [Commented] (KAFKA-6029) Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest

2019-05-27 Thread Jiangjie Qin (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848809#comment-16848809
 ] 

Jiangjie Qin commented on KAFKA-6029:
-

[~hachikuji] Yes, I think the issue should have been resolved. There are 
actually two scenarios mentioned in this ticket.

The original one is more of an issue of the controller messages being processed 
at different times on different brokers. This should have been addressed by 
[KIP-291|https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Separating+controller+connections+and+requests+from+the+data+plane].

Another issue was that shutting-down brokers were incorrectly added back to the 
ISR by the leader, which is addressed by 
[KIP-320|https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation]
 as you mentioned.

> Controller should wait for the leader migration to finish before ack a 
> ControlledShutdownRequest
> 
>
> Key: KAFKA-6029
> URL: https://issues.apache.org/jira/browse/KAFKA-6029
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller, core
>Affects Versions: 1.0.0
>Reporter: Jiangjie Qin
>Assignee: Zhanxiang (Patrick) Huang
>Priority: Major
>
> In the controlled shutdown process, the controller returns the 
> ControlledShutdownResponse immediately after the state machine is updated. 
> Because the LeaderAndIsrRequests and UpdateMetadataRequests may not yet have 
> been successfully processed by the brokers, the leader migration and active 
> ISR shrink may not have completed when the shutting-down broker proceeds to 
> shut down. This can cause some of the leaders to take up to 
> replica.lag.time.max.ms to kick the broker out of the ISR. Meanwhile, the 
> produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and 
> UpdateMetadataRequests have been acked before sending back the 
> ControlledShutdownResponse.
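The proposed ordering can be sketched with a latch: the controller only answers the shutdown request once every in-flight request has been acked. This is a toy model under that assumption, with hypothetical names, not the controller's actual code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ControlledShutdownSketch {

    /** Blocks until all in-flight controller requests are acked; true on success. */
    static boolean awaitAllAcks(int inFlightRequests) throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(inFlightRequests);
        ExecutorService brokers = Executors.newFixedThreadPool(inFlightRequests);
        for (int i = 0; i < inFlightRequests; i++) {
            brokers.submit(acks::countDown); // each broker acks its request
        }
        try {
            // Wait for every LeaderAndIsr/UpdateMetadata ack before responding.
            return acks.await(5, TimeUnit.SECONDS);
        } finally {
            brokers.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (awaitAllAcks(3)) {
            System.out.println("All requests acked; ControlledShutdownResponse sent");
        }
    }
}
```

The point of the sketch is only the ordering: the response is produced strictly after the last ack, instead of immediately after the state-machine update.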





[jira] [Commented] (KAFKA-7994) Improve Stream-Time for rebalances and restarts

2019-05-27 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848791#comment-16848791
 ] 

Bruno Cadonna commented on KAFKA-7994:
--

I ran into a case where the reset of the stream time to -1 after a restart 
resulted in incorrect results. More specifically, an event that should have 
been dropped because the grace period was exceeded opened a window.

Suppose the following application:
{code:java}
builder
    .stream(INPUT_TOPIC, Consumed.with(Serdes.String(), Serdes.String()))
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(60)).grace(Duration.ofMinutes(30)))
    ...
{code}

with the following input (numbers are timestamps):
m1, m2, m3, m61, m4, m90, m5 
application restart
m6, m150

With the reset of the stream time to -1 after the restart, the application will 
create the following windows:
[0, 60), {m1, m2, m3, m4}
[0, 60), {m6}
[60, 120), {m61, m90}

Window [0, 60), {m6} is wrong because, had there not been a restart, stream 
time would be at 90 when m6 reaches the processor. Stream time 90 is outside 
the grace period, so m6 should be dropped. However, with the restart, stream 
time is reset to -1 and m6 is not dropped, because the condition to drop 
events, timestamp < stream-time - grace-period, is not satisfied, i.e., 
6 < -1 - 30 is false. Hence, a second window [0, 60) is incorrectly opened.
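Under the stated window/grace semantics, the behavior above can be reproduced with a toy model (a hypothetical sketch, not Kafka Streams itself): a record may still update its window while the window's end plus the grace period is ahead of stream time.

```java
import java.util.ArrayList;
import java.util.List;

public class GraceAfterRestartSketch {
    static final long WINDOW = 60, GRACE = 30;
    long streamTime = -1; // UNKNOWN; also what a restart resets it to

    /** Returns true if the record may still update its window, false if dropped. */
    boolean accept(long ts) {
        streamTime = Math.max(streamTime, ts);
        long windowEnd = (ts / WINDOW) * WINDOW + WINDOW;
        return windowEnd + GRACE > streamTime; // false => window already closed
    }

    public static void main(String[] args) {
        GraceAfterRestartSketch task = new GraceAfterRestartSketch();
        List<Boolean> before = new ArrayList<>();
        for (long ts : new long[] {1, 2, 3, 61, 4, 90, 5}) {
            before.add(task.accept(ts));
        }
        // m5 is correctly dropped: m90 advanced stream time to 90,
        // closing window [0, 60) whose end + grace is 90.
        System.out.println(before);

        task.streamTime = -1; // simulate the restart resetting stream time
        System.out.println(task.accept(6)); // true: m6 wrongly re-opens [0, 60)
    }
}
```

With the restart simulated, m6 is accepted because stream time 6 has not passed 90, which matches the incorrect second [0, 60) window described above.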

> Improve Stream-Time for rebalances and restarts
> ---
>
> Key: KAFKA-7994
> URL: https://issues.apache.org/jira/browse/KAFKA-7994
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Richard Yu
>Priority: Major
> Attachments: possible-patch.diff
>
>
> We compute a per-partition partition-time as the maximum timestamp over all 
> records processed so far. Furthermore, we use partition-time to compute 
> stream-time for each task as the maximum over all partition-times (for all 
> corresponding task partitions). This stream-time is used to make decisions 
> about processing out-of-order records or dropping them if they are late (i.e., 
> timestamp < stream-time - grace-period).
> During rebalances and restarts, stream-time is initialized as UNKNOWN (i.e., 
> -1) for tasks that are newly created (or migrated). In effect, we forget the 
> current stream-time in this case, which may lead to non-deterministic behavior 
> if we stop processing right before a late record that would be dropped if we 
> continued processing, but is not dropped after the rebalance/restart. Let's 
> look at an example with a grace period of 5 ms for a tumbling window of 5 ms, 
> and the following records (timestamps in parentheses):
>  
> {code:java}
> r1(0) r2(5) r3(11) r4(2){code}
> In the example, stream-time advances as 0, 5, 11, 11, and thus record `r4` is 
> dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or 
> rebalance after processing `r3` but before processing `r4`, we would 
> reinitialize stream-time as -1, and thus would process `r4` on restart/after 
> the rebalance. The problem is that stream-time then advances differently from 
> a global point of view: 0, 5, 11, 2.
> Note, this is a corner case: if we had stopped processing one record earlier, 
> i.e., after processing `r2` but before processing `r3`, stream-time would 
> advance correctly from a global point of view.
> A potential fix would be to store the latest observed partition-time in the 
> metadata of committed offsets. This way, on restart/rebalance we can 
> re-initialize time correctly.


