[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955393#comment-16955393
 ] 

Raman Gupta edited comment on KAFKA-8803 at 10/20/19 5:34 AM:
--

[~bbejeck] Now the error is happening again, for two different streams than the 
stream which was failing with this error before. Both of the streams now 
experiencing issues have also been running just fine until now, and changing 
`max.block.ms` for them has no effect. I still get the same error message. 
After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask    : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea looking better and better.
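
For anyone trying to reproduce the override above, here is a minimal sketch of 
how the producer-prefixed `max.block.ms` setting typically sits in a Streams 
configuration. The application id, bootstrap servers, and the 20-minute value 
are illustrative assumptions, not taken from the failing deployment:
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProducerTimeoutConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        // Hypothetical application id and bootstrap servers, for illustration only.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-stream-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // Exactly-once processing is what triggers the InitProducerId round trip
        // when a task initializes its transactions.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        // Forward max.block.ms to the internally created producers; 1200000 ms = 20 minutes.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 1200000);
        return props;
    }
}
{code}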

 


was (Author: rocketraman):
[~bbejeck] Now the error is happening again, for two different streams than the 
one that was failing before. Both of these streams have also been running just 
fine until now, and changing `max.block.ms` for the stream has no effect. I 
still get the same error message. After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask    : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea looking better and better.

 

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.

[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955393#comment-16955393
 ] 

Raman Gupta commented on KAFKA-8803:


[~bbejeck] Now the error is happening again, for two different streams than the 
one that was failing before. Both of these streams have also been running just 
fine until now, and changing `max.block.ms` for the stream has no effect. I 
still get the same error message. After setting:
{code:java}
props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 
1200000);{code}
it takes longer for the error to occur but after 20 minutes it still does:
{code:java}
2019-10-19 22:10:52,910 ERROR --- [c892e-StreamThread-1] 
org.apa.kaf.str.pro.int.StreamTask    : task [0_1] Timeout 
exception caught when initializing transactions for task 0_1. This might happen 
if the broker is slow to respond, if the network connection to the broker was 
interrupted, or if similar circumstances arise. You can increase producer 
parameter `max.block.ms` to increase this timeout
org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
1200000milliseconds while awaiting InitProducerId{code}
So at this point, short of waiting 17 days for the stream to finally recover on 
its own as it did before, I don't know how to solve this, other than ditching 
Kafka entirely, which, admittedly, is an idea looking better and better.

 

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7689) Add Commit/List Offsets Operations to AdminClient

2019-10-19 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-7689:
---
Fix Version/s: (was: 2.4.0)
   2.5.0

> Add Commit/List Offsets Operations to AdminClient
> -
>
> Key: KAFKA-7689
> URL: https://issues.apache.org/jira/browse/KAFKA-7689
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
> Fix For: 2.5.0
>
>
> Jira for KIP-396: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484
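
Since the fix version has been pushed to 2.5.0, the final API may still change; 
below is a rough sketch of how the KIP-396 operations are expected to be used, 
based on the KIP proposal. The broker address, topic, and group id are 
hypothetical, and the method names should be treated as assumptions until the 
work is merged:
{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class Kip396OffsetsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical broker
        try (AdminClient admin = AdminClient.create(props)) {
            TopicPartition tp = new TopicPartition("example-topic", 0); // hypothetical topic

            // List the current end offset of the partition (proposed listOffsets operation).
            ListOffsetsResult listed =
                admin.listOffsets(Collections.singletonMap(tp, OffsetSpec.latest()));
            long endOffset = listed.partitionResult(tp).get().offset();

            // Commit that offset on behalf of a consumer group
            // (proposed alterConsumerGroupOffsets operation).
            Map<TopicPartition, OffsetAndMetadata> offsets =
                Collections.singletonMap(tp, new OffsetAndMetadata(endOffset));
            admin.alterConsumerGroupOffsets("example-group", offsets).all().get();
        }
    }
}
{code}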



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8964) Refactor Stream-Thread-level Metrics

2019-10-19 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8964:
-
Fix Version/s: 2.5.0

> Refactor Stream-Thread-level Metrics 
> -
>
> Key: KAFKA-8964
> URL: https://issues.apache.org/jira/browse/KAFKA-8964
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.5.0
>
>
> Refactor Stream-Thread-level metrics as specified in KIP-444



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8964) Refactor Stream-Thread-level Metrics

2019-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955286#comment-16955286
 ] 

ASF GitHub Bot commented on KAFKA-8964:
---

guozhangwang commented on pull request #7474: KAFKA-8964: Refactor thread-level 
metrics depending on built-in metrics version
URL: https://github.com/apache/kafka/pull/7474
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Stream-Thread-level Metrics 
> -
>
> Key: KAFKA-8964
> URL: https://issues.apache.org/jira/browse/KAFKA-8964
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
>
> Refactor Stream-Thread-level metrics as specified in KIP-444



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2019-10-19 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955262#comment-16955262
 ] 

Raman Gupta commented on KAFKA-8803:


And now suddenly I have this same problem again... this is super-frustrating.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Raman Gupta
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7739) Kafka Tiered Storage

2019-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955233#comment-16955233
 ] 

ASF GitHub Bot commented on KAFKA-7739:
---

satishd commented on pull request #7561: [WIP] KAFKA-7739: Tiered storage
URL: https://github.com/apache/kafka/pull/7561
 
 
   [WIP] This is the initial **draft** version of KIP-405. It includes the 
initial set of changes required for plugging in a RemoteStorageManager. We will 
update the KIP and this PR in the next few days with more details.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kafka Tiered Storage
> 
>
> Key: KAFKA-7739
> URL: https://issues.apache.org/jira/browse/KAFKA-7739
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Harsha
>Assignee: Harsha
>Priority: Major
>
> More details are in the KIP 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9041) Flaky Test LogCleanerIntegrationTest#testIsThreadFailed

2019-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955222#comment-16955222
 ] 

ASF GitHub Bot commented on KAFKA-9041:
---

hachikuji commented on pull request #7542: [KAFKA-9041] Flaky Test 
LogCleanerIntegrationTest#testIsThreadFailed
URL: https://github.com/apache/kafka/pull/7542
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test LogCleanerIntegrationTest#testIsThreadFailed
> ---
>
> Key: KAFKA-9041
> URL: https://issues.apache.org/jira/browse/KAFKA-9041
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/2622/testReport/junit/kafka.log/LogCleanerIntegrationTest/testIsThreadFailed/]
> {quote}java.lang.AssertionError: expected:<1> but was:<0> at 
> org.junit.Assert.fail(Assert.java:89) at 
> org.junit.Assert.failNotEquals(Assert.java:835) at 
> org.junit.Assert.assertEquals(Assert.java:647) at 
> org.junit.Assert.assertEquals(Assert.java:633) at 
> kafka.log.LogCleanerIntegrationTest.testIsThreadFailed(LogCleanerIntegrationTest.scala:211){quote}
> STDOUT:
> {quote}[2019-10-14 22:33:01,382] ERROR [kafka-log-cleaner-thread-0]: Error 
> due to (kafka.log.LogCleaner:76) java.lang.InterruptedException at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1081)
>  at 
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1369)
>  at 
> java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:278) 
> at kafka.utils.ShutdownableThread.pause(ShutdownableThread.scala:82) at 
> kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:315) at 
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96){quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9026) Replace DescribeAcls request/response with automated protocol

2019-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955221#comment-16955221
 ] 

ASF GitHub Bot commented on KAFKA-9026:
---

mimaison commented on pull request #7560: KAFKA-9026: Use automatic RPC 
generation in DescribeAcls
URL: https://github.com/apache/kafka/pull/7560
 
 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace DescribeAcls request/response with automated protocol
> -
>
> Key: KAFKA-9026
> URL: https://issues.apache.org/jira/browse/KAFKA-9026
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9067) BigDecimal conversion unnecessarily enforces the scale

2019-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955154#comment-16955154
 ] 

ASF GitHub Bot commented on KAFKA-9067:
---

piotrsmolinski commented on pull request #7559: KAFKA-9067: added support for 
changing the provided BigDecimal scale
URL: https://github.com/apache/kafka/pull/7559
 
 
   Kafka Connect schema framework failed whenever the scale of the supplied 
   BigDecimal value was different from the one declared in the field definition. 
   This change allows the scale to be gracefully expanded or reduced using 
   defined rounding rules.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> BigDecimal conversion unnecessarily enforces the scale 
> ---
>
> Key: KAFKA-9067
> URL: https://issues.apache.org/jira/browse/KAFKA-9067
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0
>Reporter: Piotr Smolinski
>Priority: Major
>
> In the Kafka Connect schema framework it is possible to use fixed-point decimal 
> numbers mapped to the logical type Decimal. The type is related to the 
> Avro-defined logical type. When the type is used, the scale value is stored in 
> the schema definition (later it might end up in the Avro schema) and the 
> unscaled value is stored as an integer of unbounded size.
> The problem arises when the decimal value to decode has a different scale than 
> the one declared in the schema. During conversion to Avro or JSON using the 
> standard converters the operation fails with a DataException.
> The proposed solution is to use the setScale method to adapt the scale to the 
> correct value and provide the rounding mode as a parameter to the schema:
> https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#setScale-int-java.math.RoundingMode-
>  
>  
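
A minimal sketch of the proposed adjustment, built around BigDecimal.setScale 
as described above; the class and method names are illustrative, not the actual 
Connect converter code:
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalScaleAdjuster {
    /**
     * Adjust a BigDecimal to the scale declared in the Connect schema instead of
     * failing with a DataException when the scales differ.
     */
    public static BigDecimal adjustScale(BigDecimal value, int schemaScale, RoundingMode roundingMode) {
        if (value.scale() == schemaScale) {
            return value; // already matches the declared scale
        }
        // Expanding the scale pads with zeros; reducing it rounds using the supplied mode.
        return value.setScale(schemaScale, roundingMode);
    }

    public static void main(String[] args) {
        // Example: schema declares scale 2, value arrives with scale 4.
        BigDecimal provided = new BigDecimal("12.3456");
        System.out.println(adjustScale(provided, 2, RoundingMode.HALF_UP)); // prints 12.35
    }
}
{code}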



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2019-10-19 Thread Navinder Brar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navinder Brar updated KAFKA-6144:
-
Summary: Allow serving interactive queries from in-sync Standbys  (was: 
Allow state stores to serve stale reads during rebalance)

> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state stores can take a very 
> long time; for small state stores, anything more than a few ms of unavailability 
> can be a deal breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when the 
> use case allows it.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
> potentially a two-phase rebalance.
> This is the description from KAFKA-6031 (keeping this JIRA as the title is 
> more descriptive):
> {quote}
> Currently reads for a key are served by a single replica, which has 2 drawbacks:
>  - if the replica is down, there is downtime in serving reads for the keys it 
> was responsible for until a standby replica takes over
>  - in case of semantic partitioning, some replicas might become hot and there 
> is no easy way to scale the read load
> If standby replicas had endpoints exposed in StreamsMetadata, it would enable 
> serving reads from several replicas, which would mitigate the above drawbacks.
> Due to the lag between replicas, reading from multiple replicas simultaneously 
> would have weaker (eventual) consistency compared to reads from a single 
> replica. This, however, should be an acceptable tradeoff in many cases.
> {quote}
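
For context, a minimal sketch of how a caller locates the host serving a key 
with the existing interactive-query metadata API (the store name and serde are 
hypothetical); the change discussed here would additionally expose in-sync 
standby hosts through the same StreamsMetadata so reads could fall back to a 
standby replica:
{code:java}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.StreamsMetadata;

public class InteractiveQueryRouting {
    /**
     * Find the host currently serving the active copy of a key's state store
     * partition. Today only the active replica is returned; the proposal is to
     * also expose in-sync standby replicas so callers can tolerate the active
     * host being unavailable during a rebalance.
     */
    public static String activeHostFor(KafkaStreams streams, String key) {
        StreamsMetadata metadata = streams.metadataForKey(
            "example-store",                 // hypothetical store name
            key,
            Serdes.String().serializer());
        return metadata.hostInfo().host() + ":" + metadata.hostInfo().port();
    }
}
{code}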



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6144) Allow state stores to serve stale reads during rebalance

2019-10-19 Thread Navinder Brar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navinder Brar updated KAFKA-6144:
-
Summary: Allow state stores to serve stale reads during rebalance  (was: 
Allow serving interactive queries from in-sync Standbys)

> Allow state stores to serve stale reads during rebalance
> 
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state stores can take a very 
> long time; for small state stores, anything more than a few ms of unavailability 
> can be a deal breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when the 
> use case allows it.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
> potentially a two-phase rebalance.
> This is the description from KAFKA-6031 (keeping this JIRA as the title is 
> more descriptive):
> {quote}
> Currently reads for a key are served by a single replica, which has 2 drawbacks:
>  - if the replica is down, there is downtime in serving reads for the keys it 
> was responsible for until a standby replica takes over
>  - in case of semantic partitioning, some replicas might become hot and there 
> is no easy way to scale the read load
> If standby replicas had endpoints exposed in StreamsMetadata, it would enable 
> serving reads from several replicas, which would mitigate the above drawbacks.
> Due to the lag between replicas, reading from multiple replicas simultaneously 
> would have weaker (eventual) consistency compared to reads from a single 
> replica. This, however, should be an acceptable tradeoff in many cases.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2019-10-19 Thread Navinder Brar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navinder Brar updated KAFKA-6144:
-
Labels: kip-535  (was: needs-kip)

> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state stores can take a very 
> long time; for small state stores, anything more than a few ms of unavailability 
> can be a deal breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when the 
> use case allows it.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
> potentially a two-phase rebalance.
> This is the description from KAFKA-6031 (keeping this JIRA as the title is 
> more descriptive):
> {quote}
> Currently reads for a key are served by a single replica, which has 2 drawbacks:
>  - if the replica is down, there is downtime in serving reads for the keys it 
> was responsible for until a standby replica takes over
>  - in case of semantic partitioning, some replicas might become hot and there 
> is no easy way to scale the read load
> If standby replicas had endpoints exposed in StreamsMetadata, it would enable 
> serving reads from several replicas, which would mitigate the above drawbacks.
> Due to the lag between replicas, reading from multiple replicas simultaneously 
> would have weaker (eventual) consistency compared to reads from a single 
> replica. This, however, should be an acceptable tradeoff in many cases.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2019-10-19 Thread Navinder Brar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navinder Brar updated KAFKA-6144:
-
Summary: Allow serving interactive queries from in-sync Standbys  (was: 
Allow state stores to serve stale reads during rebalance)

> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: needs-kip
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state stores can take a very 
> long time; for small state stores, anything more than a few ms of unavailability 
> can be a deal breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when the 
> use case allows it.
> Relates to KAFKA-6145 - Warm up new KS instances before migrating tasks - 
> potentially a two-phase rebalance.
> This is the description from KAFKA-6031 (keeping this JIRA as the title is 
> more descriptive):
> {quote}
> Currently reads for a key are served by a single replica, which has 2 drawbacks:
>  - if the replica is down, there is downtime in serving reads for the keys it 
> was responsible for until a standby replica takes over
>  - in case of semantic partitioning, some replicas might become hot and there 
> is no easy way to scale the read load
> If standby replicas had endpoints exposed in StreamsMetadata, it would enable 
> serving reads from several replicas, which would mitigate the above drawbacks.
> Due to the lag between replicas, reading from multiple replicas simultaneously 
> would have weaker (eventual) consistency compared to reads from a single 
> replica. This, however, should be an acceptable tradeoff in many cases.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)