[jira] [Created] (KAFKA-10258) Get rid of use_zk_connection flag in kafka.py public methods

2020-07-09 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-10258:
--

 Summary: Get rid of use_zk_connection flag in kafka.py public 
methods
 Key: KAFKA-10258
 URL: https://issues.apache.org/jira/browse/KAFKA-10258
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10213) Prefer --bootstrap-server in ducktape tests for Kafka clients

2020-06-29 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-10213:
--

 Summary: Prefer --bootstrap-server in ducktape tests for Kafka 
clients
 Key: KAFKA-10213
 URL: https://issues.apache.org/jira/browse/KAFKA-10213
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar








[jira] [Resolved] (KAFKA-10138) Prefer --bootstrap-server for reassign_partitions command in ducktape tests

2020-06-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-10138.

Resolution: Fixed

> Prefer --bootstrap-server for reassign_partitions command in ducktape tests
> ---
>
> Key: KAFKA-10138
> URL: https://issues.apache.org/jira/browse/KAFKA-10138
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>






[jira] [Created] (KAFKA-10174) Prefer --bootstrap-server ducktape tests using kafka_configs.sh

2020-06-16 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-10174:
--

 Summary: Prefer --bootstrap-server ducktape tests using 
kafka_configs.sh
 Key: KAFKA-10174
 URL: https://issues.apache.org/jira/browse/KAFKA-10174
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar








[jira] [Created] (KAFKA-10138) Prefer --bootstrap-server for reassign_partitions command in ducktape tests

2020-06-10 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-10138:
--

 Summary: Prefer --bootstrap-server for reassign_partitions command 
in ducktape tests
 Key: KAFKA-10138
 URL: https://issues.apache.org/jira/browse/KAFKA-10138
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar








[jira] [Created] (KAFKA-10131) Minimize use of --zookeeper flag in ducktape tests

2020-06-09 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-10131:
--

 Summary: Minimize use of --zookeeper flag in ducktape tests
 Key: KAFKA-10131
 URL: https://issues.apache.org/jira/browse/KAFKA-10131
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar


Get the ducktape tests working without the --zookeeper flag (except for SCRAM).

(Note: when doing compatibility testing we'll still use the old flags.)
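As an illustration, a minimal Python sketch (hypothetical names, not the actual kafka.py helpers) of how a ducktape service helper might build the connection arguments, defaulting to --bootstrap-server and keeping --zookeeper only for the SCRAM exception noted above:

```python
def topic_command_connect_args(bootstrap_servers, zk_connect, for_scram=False):
    """Build the connection arguments for a Kafka CLI tool invocation.

    Prefer --bootstrap-server everywhere; fall back to the deprecated
    --zookeeper flag only for SCRAM credential setup, which at the time of
    this ticket still required direct ZooKeeper access.
    """
    if for_scram:
        return ["--zookeeper", zk_connect]
    return ["--bootstrap-server", bootstrap_servers]
```

A helper like this keeps the fallback in one place, so removing the last --zookeeper path later is a one-line change.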

 

 





[jira] [Created] (KAFKA-10071) TopicCommand tool should make more efficient metadata calls to Kafka Servers

2020-05-29 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-10071:
--

 Summary: TopicCommand tool should make more efficient metadata 
calls to Kafka Servers
 Key: KAFKA-10071
 URL: https://issues.apache.org/jira/browse/KAFKA-10071
 Project: Kafka
  Issue Type: Improvement
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar


This is a follow-up from the discussion on KAFKA-9945:

[https://github.com/apache/kafka/pull/8737]

Today, alter, describe, and delete all pull down the entire topic list in order to support regex matching. We need to make these commands much more efficient. (There is also the issue that topic names can legally contain a period, which regex treats as a metacharacter, so we may need two different switches going forward.)
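As a sketch of the efficiency idea (hypothetical helper, not the TopicCommand code): only fall back to a full topic listing when the argument actually contains regex metacharacters; a literal name needs just a single describe call. The period ambiguity mentioned above is visible here, since '.' is both a legal topic-name character and a metacharacter:

```python
import re

# Characters that make an argument a regex rather than a literal topic name.
# Note the ambiguity: '.' is a legal topic-name character AND a metacharacter.
_REGEX_CHARS = set(".^$*+?()[]{}|\\")

def topics_to_fetch(requested, all_topics):
    """Return the topics a command needs metadata for.

    A literal name can be described directly (one metadata call); anything
    with metacharacters forces listing every topic and filtering.
    """
    if not any(c in _REGEX_CHARS for c in requested):
        return [requested]
    pattern = re.compile(requested + "$")
    return [t for t in all_topics if pattern.match(t)]
```

Splitting literal and regex arguments into two switches would make the cheap path unambiguous.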





[jira] [Resolved] (KAFKA-9512) Flaky Test LagFetchIntegrationTest.shouldFetchLagsDuringRestoration

2020-02-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-9512.
---
Resolution: Fixed

Closing since the PR has landed.

> Flaky Test LagFetchIntegrationTest.shouldFetchLagsDuringRestoration
> ---
>
> Key: KAFKA-9512
> URL: https://issues.apache.org/jira/browse/KAFKA-9512
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.5.0
>Reporter: Matthias J. Sax
>Assignee: Vinoth Chandar
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.5.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/497/testReport/junit/org.apache.kafka.streams.integration/LagFetchIntegrationTest/shouldFetchLagsDuringRestoration/]
> {quote}java.lang.NullPointerException at 
> org.apache.kafka.streams.integration.LagFetchIntegrationTest.shouldFetchLagsDuringRestoration(LagFetchIntegrationTest.java:306){quote}





[jira] [Created] (KAFKA-9431) Expose API in KafkaStreams to fetch all local offset lags

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9431:
-

 Summary: Expose API in KafkaStreams to fetch all local offset lags
 Key: KAFKA-9431
 URL: https://issues.apache.org/jira/browse/KAFKA-9431
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 2.5.0








[jira] [Created] (KAFKA-9430) Tighten up lag estimates when source topic optimization is on

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9430:
-

 Summary: Tighten up lag estimates when source topic optimization 
is on 
 Key: KAFKA-9430
 URL: https://issues.apache.org/jira/browse/KAFKA-9430
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar


Right now, we use the _endOffsets_ of the source topic for the computation. Since source topics can also receive user-produced events, this is an over-estimate.

From John:

For "optimized" changelogs, this will be wrong, strictly speaking, but it's an over-estimate (which seems better than an under-estimate), and it's also still an apples-to-apples comparison, since all replicas would use the same upper bound to compute their lags, so "pick the freshest replica" is still going to pick the right one. We can add a new 2.5 blocker ticket to really fix it, and not worry about it until after this KSQL stuff is done.

For active tasks: we need to use the consumed offsets, not the end offsets of the source topic.
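A toy sketch of the two estimates (dict-based, hypothetical names, not Streams code): the end-offset bound over-estimates standby lag, but because every replica is ranked with the same bound, the freshest-replica choice is unaffected:

```python
def lag_upper_bound(end_offsets, positions):
    """Standby lag measured against source-topic end offsets: an
    over-estimate when the source topic also receives user produce traffic."""
    return sum(max(end_offsets[p] - positions.get(p, 0), 0) for p in end_offsets)

def lag_exact(consumed_offsets, positions):
    """What the active task should report: lag against the offsets the task
    actually consumed, not the end of the source topic."""
    return sum(max(consumed_offsets[p] - positions.get(p, 0), 0)
               for p in consumed_offsets)

def freshest_replica(lags):
    """Pick the replica with the smallest lag; a uniform over-estimate
    preserves this ordering."""
    return min(lags, key=lags.get)
```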





[jira] [Created] (KAFKA-9429) Allow ability to control whether stale reads out of state stores are desirable

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9429:
-

 Summary: Allow ability to control whether stale reads out of state 
stores are desirable
 Key: KAFKA-9429
 URL: https://issues.apache.org/jira/browse/KAFKA-9429
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 2.5.0


From John:

I also meant to talk with you about the change to allow querying recovering stores. I think you might have already talked with Matthias a little about this in the scope of KIP-216, but it's probably not OK to just change the default from only allowing queries while running, since there are actually people depending on full-consistency queries for correctness right now.

What we can do is add an overload {{KafkaStreams.store(name, QueryableStoreType, QueryableStoreOptions)}}, with one option: {{queryStaleState(true/false)}} (your preference on the name, I just made that up right now). The default would be false, and KSQL would set it to true. While false, it would not allow querying recovering stores OR standbys. This basically allows a single switch to preserve existing behavior.
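A Python sketch of the switch John describes (the names queryStaleState/StoreOptions are made up, as he says; the real API would be Java):

```python
class StoreOptions:
    """Hypothetical options object mirroring the proposed
    QueryableStoreOptions with a single queryStaleState flag."""
    def __init__(self, query_stale_state=False):
        self.query_stale_state = query_stale_state

def queryable_stores(stores, options):
    """With the default (False), only RUNNING active stores are queryable,
    preserving today's full-consistency behaviour; KSQL would pass True to
    also query standbys and restoring stores."""
    if options.query_stale_state:
        return list(stores)
    return [s for s in stores
            if s["role"] == "active" and s["state"] == "RUNNING"]
```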

 

 





[jira] [Created] (KAFKA-9428) Expose standby information in KafkaStreams via queryMetadataForKey API

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9428:
-

 Summary: Expose standby information in KafkaStreams via 
queryMetadataForKey API
 Key: KAFKA-9428
 URL: https://issues.apache.org/jira/browse/KAFKA-9428
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 2.5.0








[jira] [Resolved] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

2019-10-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-8994.
---
Resolution: Duplicate

> Streams should expose standby replication information & allow stale reads of 
> state store
> 
>
> Key: KAFKA-8994
> URL: https://issues.apache.org/jira/browse/KAFKA-8994
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: needs-kip
>
> Currently, Streams interactive queries (IQ) fail while a rebalance is in
> progress.
> Consider the following scenario in a three-node Streams cluster with node A,
> node B and node R, executing a stateful sub-topology/topic group with 1
> partition and `_num.standby.replicas=1_`:
>  * *t0*: A is the active instance owning the partition, B is the standby that
> keeps replicating A's state to its local disk, and R just routes Streams
> IQs to the active instance using StreamsMetadata.
>  * *t1*: IQs pick node R as the router; R forwards the query to A, and A
> responds to R, which forwards the results back.
>  * *t2*: Active instance A is killed and a rebalance begins. IQs routed to A
> start failing.
>  * *t3*: The rebalance assignment happens and standby B is promoted to
> active instance. IQs continue to fail.
>  * *t4*: B fully catches up to the changelog tail and rewinds offsets to A's
> last commit position. IQs continue to fail.
>  * *t5*: IQs to R get routed to B, which is now ready to serve results. IQs
> start succeeding again.
> Depending on Kafka consumer group session/heartbeat timeouts, steps t2 and t3
> can take a few seconds (~10 seconds based on default values). Depending on
> how laggy standby B was prior to A being killed, t4 can take anywhere from a
> few seconds to minutes.
> While this behavior favors consistency over availability at all times, the
> long unavailability window might be undesirable for certain classes of
> applications (e.g. simple caches or dashboards).
> This issue aims to also expose information about standby B to R during each
> rebalance, such that queries can be routed by an application to a standby to
> serve stale reads, choosing availability over consistency.





[jira] [Reopened] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

2019-10-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened KAFKA-8994:
---

> Streams should expose standby replication information & allow stale reads of 
> state store
> 
>
> Key: KAFKA-8994
> URL: https://issues.apache.org/jira/browse/KAFKA-8994
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: needs-kip
>





[jira] [Resolved] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

2019-10-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-8994.
---
Resolution: Fixed

> Streams should expose standby replication information & allow stale reads of 
> state store
> 
>
> Key: KAFKA-8994
> URL: https://issues.apache.org/jira/browse/KAFKA-8994
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: needs-kip
>





[jira] [Created] (KAFKA-8994) Streams should expose standby replication information & allow stale reads of state store

2019-10-07 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-8994:
-

 Summary: Streams should expose standby replication information & 
allow stale reads of state store
 Key: KAFKA-8994
 URL: https://issues.apache.org/jira/browse/KAFKA-8994
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Vinoth Chandar


Currently, Streams interactive queries (IQ) fail while a rebalance is in progress.

Consider the following scenario in a three-node Streams cluster with node A, node B and node R, executing a stateful sub-topology/topic group with 1 partition and `_num.standby.replicas=1_`:
 * *t0*: A is the active instance owning the partition, B is the standby that keeps replicating A's state to its local disk, and R just routes Streams IQs to the active instance using StreamsMetadata.
 * *t1*: IQs pick node R as the router; R forwards the query to A, and A responds to R, which forwards the results back.
 * *t2*: Active instance A is killed and a rebalance begins. IQs routed to A start failing.
 * *t3*: The rebalance assignment happens and standby B is promoted to active instance. IQs continue to fail.
 * *t4*: B fully catches up to the changelog tail and rewinds offsets to A's last commit position. IQs continue to fail.
 * *t5*: IQs to R get routed to B, which is now ready to serve results. IQs start succeeding again.

Depending on Kafka consumer group session/heartbeat timeouts, steps t2 and t3 can take a few seconds (~10 seconds based on default values). Depending on how laggy standby B was prior to A being killed, t4 can take anywhere from a few seconds to minutes.

While this behavior favors consistency over availability at all times, the long unavailability window might be undesirable for certain classes of applications (e.g. simple caches or dashboards).

This issue aims to also expose information about standby B to R during each rebalance, such that queries can be routed by an application to a standby to serve stale reads, choosing availability over consistency.
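The proposed routing change can be sketched as follows (illustrative Python with hypothetical names, not the Streams API): during t2-t4 above there is no RUNNING active instance, so instead of failing, a query could fall back to the least-lagged standby:

```python
def route_query(replicas):
    """Prefer a RUNNING active replica; otherwise serve a stale read from
    the least-lagged standby, choosing availability over consistency."""
    for r in replicas:
        if r["role"] == "active" and r["state"] == "RUNNING":
            return r["node"]
    standbys = [r for r in replicas if r["role"] == "standby"]
    if not standbys:
        raise RuntimeError("store unavailable during rebalance")
    return min(standbys, key=lambda r: r["lag"])["node"]
```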

 

 

 

 

 

 

 

 

 

 

 

 





[jira] [Resolved] (KAFKA-8839) Improve logging in Kafka Streams around debugging task lifecycle

2019-10-07 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-8839.
---
Resolution: Fixed

Closing since the PR has been merged

> Improve logging in Kafka Streams around debugging task lifecycle 
> -
>
> Key: KAFKA-8839
> URL: https://issues.apache.org/jira/browse/KAFKA-8839
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.4.0
>
>
> As a follow up to KAFKA-8831, this Jira will track efforts around improving 
> logging/docs around 
>  
>  * Being able to follow state of tasks from assignment to restoration 
>  * Better detection of misconfigured state store dir 
>  * Docs giving guidance for rebalance time and state store config





[jira] [Resolved] (KAFKA-8913) Document topic based configs & ISR settings for Streams apps

2019-09-18 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-8913.
---
Resolution: Fixed

> Document topic based configs & ISR settings for Streams apps
> 
>
> Key: KAFKA-8913
> URL: https://issues.apache.org/jira/browse/KAFKA-8913
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation, streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Noticed that it was not clear how to configure the internal topics.





[jira] [Created] (KAFKA-8913) Document topic based configs & ISR settings for Streams apps

2019-09-16 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-8913:
-

 Summary: Document topic based configs & ISR settings for Streams 
apps
 Key: KAFKA-8913
 URL: https://issues.apache.org/jira/browse/KAFKA-8913
 Project: Kafka
  Issue Type: Improvement
  Components: documentation, streams
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar


Noticed that it was not clear how to configure the internal topics.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (KAFKA-8870) Prevent dirty reads of Streams state store from Interactive queries

2019-09-04 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-8870:
-

 Summary: Prevent dirty reads of Streams state store from 
Interactive queries
 Key: KAFKA-8870
 URL: https://issues.apache.org/jira/browse/KAFKA-8870
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Vinoth Chandar


Today, Interactive Queries (IQ) against a Streams state store can see uncommitted data, even with EOS processing guarantees (these are actually orthogonal, but clarifying since EOS may give the impression that everything is dandy). This is caused primarily because state updates in RocksDB are visible even before the Kafka transaction is committed. Thus, if the instance fails, the failed-over instance will redo the uncommitted old transaction, and the following could happen during recovery.

The value for key K can go from *V0 → V1 → V2* on active instance A; IQ reads V1; instance A fails, and any failure/rebalancing will leave the standby instance B rewinding offsets and reprocessing, during which time IQ can again see V0, V1, or any number of previous values for the same key.

In this issue, we will plan work towards providing consistency for IQ for a single row in a single state store, i.e. once a query sees V1, it can only see either V1 or V2.
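One way to state the single-row guarantee is a per-key monotonic-read check. This hedged Python sketch (hypothetical helper, not a Streams API) refuses to serve a value older than one already served:

```python
class MonotonicReader:
    """Track the highest version (e.g. changelog offset) served per key and
    reject reads that would travel back in time after a failover."""
    def __init__(self):
        self._high_water = {}

    def read(self, key, store):
        # store maps key -> (version, value); version grows with each update
        version, value = store[key]
        if version < self._high_water.get(key, float("-inf")):
            raise ValueError("rejecting stale read for key %r" % key)
        self._high_water[key] = version
        return value
```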

 

 

 





[jira] [Created] (KAFKA-8839) Improve logging in Kafka Streams around debugging task lifecycle

2019-08-27 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-8839:
-

 Summary: Improve logging in Kafka Streams around debugging task 
lifecycle 
 Key: KAFKA-8839
 URL: https://issues.apache.org/jira/browse/KAFKA-8839
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Vinoth Chandar


As a follow up to KAFKA-8831, this Jira will track efforts around improving 
logging/docs around 

 
 * Being able to follow state of tasks from assignment to restoration 
 * Better detection of misconfigured state store dir 
 * Docs giving guidance for rebalance time and state store config





[jira] [Resolved] (KAFKA-8831) Joining a new instance sometimes does not cause rebalancing

2019-08-26 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved KAFKA-8831.
---
Resolution: Not A Problem

I will think about better logging and put up a patch. Closing this issue.

> Joining a new instance sometimes does not cause rebalancing
> ---
>
> Key: KAFKA-8831
> URL: https://issues.apache.org/jira/browse/KAFKA-8831
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Chris Pettitt
>Assignee: Chris Pettitt
>Priority: Major
> Attachments: StandbyTaskTest.java, fail.log
>
>
> See attached log. The application is in a REBALANCING state. The second 
> instance joins a bit after the first instance (~250ms). The group coordinator 
> says it is going to rebalance but nothing happens. The first instance gets 
> all partitions (2). The application transitions to RUNNING.
> See attached test, which starts one client and then starts another about 
> 250ms later. This seems to consistently repro the issue for me.
> This is blocking my work on KAFKA-8755, so I'm inclined to pick it up





[jira] [Created] (KAFKA-8810) Add mechanism to detect topology mismatch between streams instances

2019-08-16 Thread Vinoth Chandar (JIRA)
Vinoth Chandar created KAFKA-8810:
-

 Summary: Add mechanism to detect topology mismatch between streams 
instances
 Key: KAFKA-8810
 URL: https://issues.apache.org/jira/browse/KAFKA-8810
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Vinoth Chandar


Noticed this while reading through the StreamsPartitionAssignor-related code. If a user accidentally deploys a different topology on one of the instances, there is no mechanism to detect this and refuse assignment or take action. Given that Kafka Streams is designed as an embeddable library, I feel this is a rather important scenario to handle. For example, Kafka Streams is embedded into a web front-end tier, and operators deploy a hotfix for a site issue to a few instances that are leaking memory; the hotfix accidentally also deploys some topology changes with it.

Please feel free to close the issue if it's a duplicate. (I could not find an existing ticket for this.)
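A minimal sketch of one possible mechanism (purely hypothetical; per this ticket, nothing like it exists in Streams): each instance embeds a fingerprint of its topology in its subscription, and the assignor refuses to produce an assignment when fingerprints diverge:

```python
import hashlib

def topology_fingerprint(topology_description):
    """Stable hash of the topology description an instance is running."""
    return hashlib.sha256(topology_description.encode("utf-8")).hexdigest()

def check_uniform_topology(fingerprints_by_member):
    """Raise instead of silently assigning when members disagree on the
    topology; the group leader could run this over all subscriptions."""
    if len(set(fingerprints_by_member.values())) > 1:
        raise RuntimeError(
            "topology mismatch across instances: %s" % fingerprints_by_member)
```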



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8799) Support ability to pass global user data to consumers during Assignment

2019-08-13 Thread Vinoth Chandar (JIRA)
Vinoth Chandar created KAFKA-8799:
-

 Summary: Support ability to pass global user data to consumers 
during Assignment
 Key: KAFKA-8799
 URL: https://issues.apache.org/jira/browse/KAFKA-8799
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Vinoth Chandar


This is a follow-up from KAFKA-7149.

*Background:*

Although we reduced the size of the AssignmentInfo object sent during each rebalance from the leader to all followers in KAFKA-7149, we still repeat the same _partitionsByHost_ map for each host (all this when interactive queries are enabled), and thus still end up sending redundant bytes to the broker and logging a large Kafka message.

With hundreds of Streams instances, this overhead can easily grow into tens of megabytes.

*Proposal:*

Extend the group assignment protocol to support passing an additional byte[], which can contain the HostInfo -> partitionsByHost data just once.

{code}
final class GroupAssignment {
    // per-member partition assignments, keyed by member id
    private final Map<String, Assignment> assignments;

    // bytes sent once to each consumer from the leader
    private final byte[] globalUserData;
    ...
}
{code}

This can be handy in general for any other application, like Streams, that does some stateful processing or lightweight cluster management.




[jira] [Commented] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-11-02 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985707#comment-14985707
 ] 

Vinoth Chandar commented on KAFKA-2580:
---

[~jkreps] ah ok. point taken :) 

> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Vinoth Chandar
>Assignee: Grant Henke
>
> We noticed this in one of our clusters where we stage logs for a longer
> amount of time. It appears that the Kafka broker keeps file handles open even
> for non-active (not written to or read from) files. (In fact, there are some
> threads going back to 2013:
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever)
> Needless to say, this is a problem and forces us to either artificially bump
> up the ulimit (it's already at 100K) or expand the cluster (even if we have
> sufficient IO and everything).
> Filing this ticket, since I could not find anything similar. Very interested
> to know if there are plans to address this (given how Samza's changelog topic
> is meant to be a persistent large-state use case).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-10-31 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984243#comment-14984243
 ] 

Vinoth Chandar commented on KAFKA-2580:
---

Based on this, looks like we can close this? 

>> So a lot of this comes down to the implementation. A naive 10k item LRU 
>> cache could easily be far more memory hungry than having 50k open FDs, plus 
>> being in heap this would add a huge number of objects to manage.

[~jkreps] I am a little confused. What I meant by an LRU cache was simply limiting
the number of "java.io.File" objects (or the equivalent in the Kafka codebase)
that represent the handle to a segment. So, if there are 10K such objects in a
(properly sized) ConcurrentHashMap, how would that add so much memory overhead,
compared to holding 50K/200K objects anyway?

> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Vinoth Chandar
>Assignee: Grant Henke
>





[jira] [Commented] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-10-19 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963574#comment-14963574
 ] 

Vinoth Chandar commented on KAFKA-2580:
---

[~jjkoshy] Good point. If I understand correctly, even if, say, all consumers
start bootstrapping with startTime=earliest, which can force opening all file
handles at once, an LRU-based scheme would keep closing file handles internally
from oldest to latest, which is still good behaviour. We could lessen the
impact of fs.close() on an old file by delegating it to a background thread,
with a config that caps the number of items in the file-handle cache.

I like the cache approach better since it will be one place through which all
accesses go, so future features transparently play nicely with overall system
limits.
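The cache idea can be sketched like this (illustrative Python, not broker code): a bounded LRU that closes the least-recently-used handle on eviction, the close being a natural candidate to hand to a background thread as described above:

```python
from collections import OrderedDict

class FileHandleCache:
    """Bounded LRU of open segment handles."""
    def __init__(self, capacity, opener):
        self.capacity = capacity
        self.opener = opener       # path -> object with a close() method
        self._lru = OrderedDict()  # path -> handle, oldest first

    def get(self, path):
        if path in self._lru:
            self._lru.move_to_end(path)  # mark as most recently used
            return self._lru[path]
        if len(self._lru) >= self.capacity:
            _, oldest = self._lru.popitem(last=False)
            oldest.close()               # could be deferred to a background thread
        handle = self._lru[path] = self.opener(path)
        return handle
```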

> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Vinoth Chandar
>Assignee: Grant Henke
>
> We noticed this in one of our clusters where we stage logs for a longer 
> amount of time. It appears that the Kafka broker keeps file handles open even 
> for non active (not written to or read from) files. (in fact, there are some 
> threads going back to 2013 
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever) 
> Needless to say, this is a problem and forces us to either artificially bump 
> up ulimit (it's already at 100K) or expand the cluster (even if we have 
> sufficient IO and everything). 
> Filing this ticket, since I couldn't find anything similar. Very interested to 
> know if there are plans to address this (given how Samza's changelog topic is 
> meant to be a persistent large state use case).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-10-02 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941910#comment-14941910
 ] 

Vinoth Chandar commented on KAFKA-2580:
---

An LRU file handle cache is something very commonly employed in databases, 
where it works pretty well in practice (considering that random access is 
involved). So +1 on that path. 

[~granthenke], would you have cycles for this? If no one is working on this 
currently, we (Uber) can take a stab at it later this quarter. 


> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Vinoth Chandar
>
> We noticed this in one of our clusters where we stage logs for a longer 
> amount of time. It appears that the Kafka broker keeps file handles open even 
> for non active (not written to or read from) files. (in fact, there are some 
> threads going back to 2013 
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever) 
> Needless to say, this is a problem and forces us to either artificially bump 
> up ulimit (it's already at 100K) or expand the cluster (even if we have 
> sufficient IO and everything). 
> Filing this ticket, since I couldn't find anything similar. Very interested to 
> know if there are plans to address this (given how Samza's changelog topic is 
> meant to be a persistent large state use case).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-09-25 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908298#comment-14908298
 ] 

Vinoth Chandar commented on KAFKA-2580:
---

More context on how we determined this:

{code}
vinoth@kafka-agg:~$ sudo ls -l /proc//fd | wc -l
50820
vinoth@kafka-agg:~$ ls -R /var/kafka-spool/data | grep -e ".log" -e ".index" | wc -l
97242
vinoth@kafka-agg:~$ ls -R /var/kafka-spool/data | grep -e ".index" | wc -l
48456
vinoth@kafka-agg:~$ ls -R /var/kafka-spool/data | grep -e ".log" | wc -l
48788


vinoth@kafka-changelog-cluster:~$ sudo ls -l /proc//fd | wc -l
59128
vinoth@kafka-changelog-cluster:~$ ls -R /var/kafka-spool/data | grep -e ".log" -e ".index" | wc -l
117548
vinoth@kafka-changelog-cluster:~$ ls -R /var/kafka-spool/data | grep -e ".index" | wc -l
58774
vinoth@kafka-changelog-cluster:~$ ls -R /var/kafka-spool/data | grep -e ".log" | wc -l
58774
{code}

> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Vinoth Chandar
>
> We noticed this in one of our clusters where we stage logs for a longer 
> amount of time. It appears that the Kafka broker keeps file handles open even 
> for non active (not written to or read from) files. (in fact, there are some 
> threads going back to 2013 
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever) 
> Needless to say, this is a problem and forces us to either artificially bump 
> up ulimit (it's already at 100K) or expand the cluster (even if we have 
> sufficient IO and everything). 
> Filing this ticket, since I couldn't find anything similar. Very interested to 
> know if there are plans to address this (given how Samza's changelog topic is 
> meant to be a persistent large state use case).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-09-25 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908910#comment-14908910
 ] 

Vinoth Chandar commented on KAFKA-2580:
---

Thanks for jumping in, [~guozhang]. We have 256MB segment sizes and 100K 
descriptors. 

>> Adding a feature that closes inactive segments' open file handlers and 
>> re-open them upon being read / written again is possible, but would be 
>> tricky.

Can you please elaborate? Looks straightforward to me from the outside :) 


> Kafka Broker keeps file handles open for all log files (even if its not 
> written to/read from)
> -
>
> Key: KAFKA-2580
> URL: https://issues.apache.org/jira/browse/KAFKA-2580
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Vinoth Chandar
>
> We noticed this in one of our clusters where we stage logs for a longer 
> amount of time. It appears that the Kafka broker keeps file handles open even 
> for non active (not written to or read from) files. (in fact, there are some 
> threads going back to 2013 
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever) 
> Needless to say, this is a problem and forces us to either artificially bump 
> up ulimit (it's already at 100K) or expand the cluster (even if we have 
> sufficient IO and everything). 
> Filing this ticket, since I couldn't find anything similar. Very interested to 
> know if there are plans to address this (given how Samza's changelog topic is 
> meant to be a persistent large state use case).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2580) Kafka Broker keeps file handles open for all log files (even if its not written to/read from)

2015-09-25 Thread Vinoth Chandar (JIRA)
Vinoth Chandar created KAFKA-2580:
-

 Summary: Kafka Broker keeps file handles open for all log files 
(even if its not written to/read from)
 Key: KAFKA-2580
 URL: https://issues.apache.org/jira/browse/KAFKA-2580
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.1
Reporter: Vinoth Chandar


We noticed this in one of our clusters where we stage logs for a longer amount 
of time. It appears that the Kafka broker keeps file handles open even for non 
active (not written to or read from) files. (in fact, there are some threads 
going back to 2013 
http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever) 

Needless to say, this is a problem and forces us to either artificially bump up 
ulimit (it's already at 100K) or expand the cluster (even if we have sufficient 
IO and everything). 

Filing this ticket, since I couldn't find anything similar. Very interested to 
know if there are plans to address this (given how Samza's changelog topic is 
meant to be a persistent large state use case).  




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)