[jira] [Commented] (KAFKA-7132) Consider adding faster form of rebalancing

2019-01-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731744#comment-16731744
 ] 

ASF GitHub Bot commented on KAFKA-7132:
---

ConcurrencyPractitioner commented on pull request #5340:  [KAFKA-7132] [WIP] 
Consider adding a faster form of rebalance 
URL: https://github.com/apache/kafka/pull/5340
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider adding faster form of rebalancing
> ------------------------------------------
>
>         Key: KAFKA-7132
>         URL: https://issues.apache.org/jira/browse/KAFKA-7132
>     Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>    Reporter: Richard Yu
>    Assignee: Richard Yu
>    Priority: Critical
>      Labels: performance
>
> Currently, when a consumer falls out of a consumer group, it restarts 
> processing from the last checkpointed offset. However, this design can 
> result in lag that some users cannot afford. For example, 
> let's say a consumer crashed at offset 100, with the last checkpointed offset 
> being 70. When it recovers at a later offset (say, 120), it will be behind 
> by an offset range of 50 (120 - 70). This is because the consumer restarted 
> at 70, forcing it to reprocess old data. To avoid this, one 
> option would be to allow the current consumer to start processing not from 
> the last checkpointed offset (which is 70 in the example), but from 120, where 
> it recovers. Meanwhile, a new KafkaConsumer would be instantiated and start 
> reading from offset 70 concurrently with the old process, and would be 
> terminated once it reaches 120. In this manner, a considerable amount of lag 
> can be avoided, particularly since the old consumer could proceed as if 
> nothing had happened.
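
The arithmetic in the description can be checked with a small sketch. This is only an illustration under stated assumptions: offsets are modeled as plain integers, no Kafka client is involved, and the names `replay_lag` and `proposed_split` are hypothetical helpers, not part of any Kafka API.

```python
def replay_lag(committed: int, log_end: int) -> int:
    """Number of records a consumer restarting at `committed` must read
    before it reaches `log_end` (hypothetical helper, integers only)."""
    if committed > log_end:
        raise ValueError("committed offset cannot be past the log end")
    return log_end - committed


def proposed_split(committed: int, log_end: int):
    """Offset assignment under the proposal in the description:
    the existing consumer continues from `log_end`, while a second,
    temporary consumer covers the half-open range [committed, log_end)."""
    return log_end, range(committed, log_end)


# Numbers taken from the example: checkpoint at 70, recovery at 120.
lag = replay_lag(committed=70, log_end=120)
live_start, catch_up = proposed_split(committed=70, log_end=120)

print("lag when restarting from the checkpoint:", lag)
print("existing consumer continues from:", live_start)
print("temporary consumer covers:", catch_up[0], "to", catch_up[-1])
```

The temporary consumer's range stops just before 120 because the description says it is terminated once it reaches 120.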



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7132) Consider adding faster form of rebalancing

2018-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534455#comment-16534455
 ] 

ASF GitHub Bot commented on KAFKA-7132:
---

ConcurrencyPractitioner opened a new pull request #5340:  [KAFKA-7132] [WIP] 
Consider adding a faster form of rebalance 
URL: https://github.com/apache/kafka/pull/5340
 
 
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   




[jira] [Commented] (KAFKA-7132) Consider adding faster form of rebalancing

2018-07-04 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533252#comment-16533252
 ] 

Richard Yu commented on KAFKA-7132:
---

Hi all,

You can find the KIP here (discussion thread TBD):

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-333%3A+Add+faster+mode+of+rebalancing#KIP-333:Addfastermodeofrebalancing-RejectedAlternatives]

 



[jira] [Commented] (KAFKA-7132) Consider adding faster form of rebalancing

2018-07-04 Thread Richard Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533162#comment-16533162
 ] 

Richard Yu commented on KAFKA-7132:
---

Well, when I was imagining this situation, I was thinking that the log's 
maximum extent is at offset 70.



[jira] [Commented] (KAFKA-7132) Consider adding faster form of rebalancing

2018-07-04 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532980#comment-16532980
 ] 

Guozhang Wang commented on KAFKA-7132:
--

[~Yohan123] There are two things we should consider here. [~enether] has 
mentioned one: guaranteeing offset ordering for consumption. The other 
is guaranteeing at-least-once semantics by default. Resuming from the 
last committed offset would likely cause some records to be processed 
twice, but it would also avoid data loss. Restarting from the latest offset 
(I'm not sure what you mean by "it recovers at a later offset", so I 
assume you mean that when the consumer resumes, the log has grown to offset 120) 
would cause you to lose the data from 100 - 120, while using a separate 
consumer to cover the gap would violate ordering guarantees.
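
The trade-off between duplicates and data loss can be made concrete with a small sketch. It assumes, for illustration only, that every offset below the crash point was already processed (but not necessarily committed) and that offsets are plain integers. The function `resume_outcome` is a hypothetical helper, not a Kafka API.

```python
def resume_outcome(crashed_at: int, resume_from: int):
    """Return (duplicated, lost) offset ranges for a consumer that had
    processed everything below `crashed_at` and restarts at `resume_from`.

    Assumption: offsets in [resume_from, crashed_at) were already processed
    once, so reading them again duplicates them; offsets in
    [crashed_at, resume_from) are skipped and never processed.
    """
    duplicated = range(resume_from, max(resume_from, crashed_at))
    lost = range(crashed_at, max(crashed_at, resume_from))
    return duplicated, lost


# Resume from the last committed offset (70): duplicates, but no loss.
dup_committed, lost_committed = resume_outcome(crashed_at=100, resume_from=70)

# Resume from the latest offset (120): no duplicates, but a gap is skipped.
dup_latest, lost_latest = resume_outcome(crashed_at=100, resume_from=120)

print("from committed:", len(dup_committed), "duplicated,", len(lost_committed), "lost")
print("from latest:   ", len(dup_latest), "duplicated,", len(lost_latest), "lost")
```

With the numbers from this thread, resuming from 70 reprocesses offsets 70 through 99, while resuming from 120 skips offsets 100 through 119.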



[jira] [Commented] (KAFKA-7132) Consider adding faster form of rebalancing

2018-07-04 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532545#comment-16532545
 ] 

Stanislav Kozlovski commented on KAFKA-7132:


The best way to have this considered is to open a KIP and send it to the 
mailing list for thorough discussion.
This is a good way to avoid lag, but unfortunately it will break every 
ordering guarantee.
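
The ordering concern can be illustrated with a toy schedule. This sketch assumes two readers on the same partition whose records reach the application alternately. The round-robin interleaving is one hypothetical schedule chosen for determinism, and the helper names are made up for this example. Only the first five offsets of each reader are shown for brevity.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so that offset 0 is never mistaken for "no value"


def round_robin(first, second):
    """Interleave two offset streams one record at a time."""
    merged = []
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is not _MISSING:
            merged.append(a)
        if b is not _MISSING:
            merged.append(b)
    return merged


def is_in_offset_order(offsets):
    """True if every offset is strictly greater than the one before it."""
    return all(x < y for x, y in zip(offsets, offsets[1:]))


live = range(120, 125)     # existing consumer, continuing from offset 120
catch_up = range(70, 75)   # temporary consumer, replaying from offset 70

processed = round_robin(live, catch_up)
print(processed)
print("in offset order:", is_in_offset_order(processed))
```

Each reader on its own sees offsets in order, but the combined stream the application observes does not, which is the guarantee a single consumer per partition normally provides.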
