[jira] [Commented] (KAFKA-9338) Incremental fetch sessions do not maintain or use leader epoch for fencing purposes

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016601#comment-17016601
 ] 

ASF GitHub Bot commented on KAFKA-9338:
---

hachikuji commented on pull request #7970: KAFKA-9338; Fetch session should 
cache request leader epoch
URL: https://github.com/apache/kafka/pull/7970
 
 
   Since the leader epoch was not maintained in the fetch session cache, no 
validation would be done except for the initial (full) fetch request. This 
patch adds the leader epoch to the session cache and addresses the testing gaps.
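   As a rough illustration of that idea (not the actual patch or the server-side 
Scala code in FetchSession.scala; the class and field names below are made up), 
a cached partition entry that keeps the requested epoch might look like this:

{code:java}
import java.util.Optional;

// Illustrative sketch only: the cached per-partition state keeps the epoch
// sent in the fetch request so that incremental fetches can still be
// validated against the current leader epoch.
class CachedPartitionSketch {
    long fetchOffset;
    int maxBytes;
    Optional<Integer> requestLeaderEpoch = Optional.empty();

    void update(long fetchOffset, int maxBytes, Optional<Integer> leaderEpoch) {
        this.fetchOffset = fetchOffset;
        this.maxBytes = maxBytes;
        this.requestLeaderEpoch = leaderEpoch; // previously dropped as Optional.empty()
    }

    // On every fetch, full or incremental, a stale epoch should be fenced.
    boolean isFenced(int currentLeaderEpoch) {
        return requestLeaderEpoch.map(epoch -> epoch < currentLeaderEpoch).orElse(false);
    }
}
{code}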
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incremental fetch sessions do not maintain or use leader epoch for fencing 
> purposes
> ---
>
> Key: KAFKA-9338
> URL: https://issues.apache.org/jira/browse/KAFKA-9338
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Lucas Bradstreet
>Assignee: Jason Gustafson
>Priority: Major
>
> KIP-320 adds the ability to fence replicas by detecting stale leader epochs 
> from followers, and helping consumers handle unclean truncation.
> Unfortunately the incremental fetch session handling does not maintain or use 
> the leader epoch in the fetch session cache. As a result, it does not appear 
> that the leader epoch is used for fencing a majority of the time. I'm not 
> sure if this is only the case after incremental fetch sessions are 
> established - it may be the case that the first "full" fetch session is safe.
> Optional.empty is returned for the FetchRequest.PartitionData here:
> [https://github.com/apache/kafka/blob/a4cbdc6a7b3140ccbcd0e2339e28c048b434974e/core/src/main/scala/kafka/server/FetchSession.scala#L111]
> I believe this affects brokers from 2.1.0, when fencing was improved on the 
> replica fetcher side, and consumers from 2.3.0 and above, when client-side 
> truncation detection was added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9410) Make groupId Optional in KafkaConsumer

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016553#comment-17016553
 ] 

ASF GitHub Bot commented on KAFKA-9410:
---

hachikuji commented on pull request #7943: KAFKA-9410: Make groupId Optional in 
KafkaConsumer
URL: https://github.com/apache/kafka/pull/7943
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make groupId Optional in KafkaConsumer
> --
>
> Key: KAFKA-9410
> URL: https://issues.apache.org/jira/browse/KAFKA-9410
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: David Mollitor
>Priority: Minor
>
> ... because it is.
>  
> [https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html]
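A minimal sketch of what the ticket asks for, assuming the internal representation 
simply switches from a nullable String to java.util.Optional (the class below is 
made up, not the KafkaConsumer internals):

{code:java}
import java.util.Optional;

// Illustrative only: an absent group.id modeled as Optional.empty() rather
// than as a null String.
class GroupIdExample {
    private final Optional<String> groupId;

    GroupIdExample(String configuredGroupId) {
        this.groupId = Optional.ofNullable(configuredGroupId);
    }

    String describe() {
        return groupId.map(id -> "member of group " + id)
                      .orElse("stand-alone consumer (no group.id configured)");
    }
}
{code}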



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7317) Use collections subscription for main consumer to reduce metadata

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016522#comment-17016522
 ] 

ASF GitHub Bot commented on KAFKA-7317:
---

ableegoldman commented on pull request #7969: KAFKA-7317: Use collections 
subscription for main consumer to reduce metadata
URL: https://github.com/apache/kafka/pull/7969
 
 
   Also addresses [KAFKA-8821](https://issues.apache.org/jira/browse/KAFKA-8821)
   
   Note that we still have to fall back to using pattern subscription if the 
user has added any regex-based source nodes to the topology. Includes some 
minor cleanup on the side
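   As a rough sketch of that fallback (not the Streams internals; the method and 
parameter names below are assumptions), the choice looks roughly like this:

{code:java}
import java.util.List;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.Consumer;

// Illustrative sketch of the subscription choice described above.
class SubscriptionChoiceSketch {
    static void subscribe(Consumer<byte[], byte[]> mainConsumer,
                          boolean topologyHasPatternSources,
                          Pattern sourcePattern,
                          List<String> sourceTopics) {
        if (topologyHasPatternSources) {
            // Regex-based source nodes still require pattern subscription.
            mainConsumer.subscribe(sourcePattern);
        } else {
            // Collection subscription keeps the metadata response limited
            // to the subscribed topics.
            mainConsumer.subscribe(sourceTopics);
        }
    }
}
{code}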
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use collections subscription for main consumer to reduce metadata
> -
>
> Key: KAFKA-7317
> URL: https://issues.apache.org/jira/browse/KAFKA-7317
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sophie Blee-Goldman
>Priority: Major
>
> In KAFKA-4633 we switched from "collection subscription" to "pattern 
> subscription" for `Consumer#subscribe()` to avoid triggering auto topic 
> creating on the broker. In KAFKA-5291, the metadata request was extended to 
> overwrite the broker config within the request itself. However, this feature 
> is only used in `KafkaAdminClient`. KAFKA-7320 adds this feature for the 
> consumer client, too.
> This ticket proposes to use the new feature within Kafka Streams to allow the 
> usage of collection-based subscription in the consumer and admin clients, to 
> reduce the metadata response size, which can be very large for a large number 
> of partitions in the cluster.
> Note that Streams needs to be able to distinguish whether it connects to older 
> brokers that do not support the new metadata request, and still use pattern 
> subscription in that case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9259) suppress() for windowed-Serdes does not work with default serdes

2020-01-15 Thread John Roesler (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016489#comment-17016489
 ] 

John Roesler commented on KAFKA-9259:
-

Yep! That sounds right.

> suppress() for windowed-Serdes does not work with default serdes
> 
>
> Key: KAFKA-9259
> URL: https://issues.apache.org/jira/browse/KAFKA-9259
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Matthias J. Sax
>Assignee: Omkar Mestry
>Priority: Major
>  Labels: newbie
>
> The suppress() operator either inherits serdes from its upstream operator or 
> falls back to default serdes from the config.
> If the upstream operator is a windowed aggregation, the window-aggregation 
> operator wraps the user-supplied serde with a window serde and pushes it 
> into suppress() – however, if default serdes are used, the window-aggregation 
> operator cannot push anything into suppress(). At runtime, it just creates a 
> default serde and wraps it accordingly. For this case, suppress() also falls 
> back to default serdes; however, it does not wrap the serde, and thus a 
> ClassCastException is thrown when the serde is used later.
> suppress() is already aware if the upstream aggregation is time/session 
> windowed or not and thus should use this information to wrap default serdes 
> accordingly.
> The current workaround for windowed-suppress is to overwrite the default 
> serde upstream to suppress(), such that suppress() inherits serdes and does 
> not fall back to default serdes.
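A minimal sketch of that workaround, with made-up topic names: pass the serdes 
explicitly to the windowed aggregation so that suppress() inherits a properly 
wrapped windowed serde instead of falling back to the (unwrapped) default serdes.

{code:java}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

// Sketch of the workaround: explicit serdes upstream of suppress().
public class SuppressSerdeWorkaround {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
               .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ofMinutes(1)))
               // Explicit serdes here, so suppress() does not fall back to defaults.
               .count(Materialized.with(Serdes.String(), Serdes.Long()))
               .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
               .toStream()
               .to("output", Produced.with(
                       WindowedSerdes.timeWindowedSerdeFrom(String.class), Serdes.Long()));
    }
}
{code}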



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9338) Incremental fetch sessions do not maintain or use leader epoch for fencing purposes

2020-01-15 Thread Jason Gustafson (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016488#comment-17016488
 ] 

Jason Gustafson commented on KAFKA-9338:


[~lucasbradstreet] Thanks, good find.

> Incremental fetch sessions do not maintain or use leader epoch for fencing 
> purposes
> ---
>
> Key: KAFKA-9338
> URL: https://issues.apache.org/jira/browse/KAFKA-9338
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Lucas Bradstreet
>Assignee: Jason Gustafson
>Priority: Major
>
> KIP-320 adds the ability to fence replicas by detecting stale leader epochs 
> from followers, and helping consumers handle unclean truncation.
> Unfortunately the incremental fetch session handling does not maintain or use 
> the leader epoch in the fetch session cache. As a result, it does not appear 
> that the leader epoch is used for fencing a majority of the time. I'm not 
> sure if this is only the case after incremental fetch sessions are 
> established - it may be the case that the first "full" fetch session is safe.
> Optional.empty is returned for the FetchRequest.PartitionData here:
> [https://github.com/apache/kafka/blob/a4cbdc6a7b3140ccbcd0e2339e28c048b434974e/core/src/main/scala/kafka/server/FetchSession.scala#L111]
> I believe this affects brokers from 2.1.0, when fencing was improved on the 
> replica fetcher side, and consumers from 2.3.0 and above, when client-side 
> truncation detection was added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9439) Add more public API tests for KafkaProducer

2020-01-15 Thread Xiang Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016479#comment-17016479
 ] 

Xiang Zhang commented on KAFKA-9439:


I am interested in this. As a newbie, could I get some advice on this? Thanks.

> Add more public API tests for KafkaProducer
> ---
>
> Key: KAFKA-9439
> URL: https://issues.apache.org/jira/browse/KAFKA-9439
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Priority: Major
>  Labels: newbie
>
> While working on KIP-447, we realized a lack of test coverage on the 
> KafkaProducer public APIs. For example, `commitTransaction` and 
> `sendOffsetsToTransaction` are not even called in the 
> `KafkaProducerTest.java` and the code coverage is only 75%. 
> Adding more unit tests here will be pretty valuable.
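As a purely illustrative sketch (not the KafkaProducerTest approach itself), the 
calls that lack coverage can be exercised against MockProducer; the names and 
values below are made up:

{code:java}
import java.util.Collections;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal sketch exercising the transactional producer calls mentioned above.
public class TransactionalApiSketch {
    public static void main(String[] args) {
        MockProducer<String, String> producer =
                new MockProducer<>(true, new StringSerializer(), new StringSerializer());

        producer.initTransactions();
        producer.beginTransaction();
        producer.send(new ProducerRecord<>("output-topic", "key", "value"));
        producer.sendOffsetsToTransaction(
                Collections.singletonMap(new TopicPartition("input-topic", 0),
                                         new OffsetAndMetadata(42L)),
                "test-group");
        producer.commitTransaction();

        // The kind of assertion a real unit test would make:
        if (!producer.transactionCommitted()) {
            throw new AssertionError("transaction should have been committed");
        }
    }
}
{code}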



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9338) Incremental fetch sessions do not maintain or use leader epoch for fencing purposes

2020-01-15 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-9338:
--

Assignee: Jason Gustafson

> Incremental fetch sessions do not maintain or use leader epoch for fencing 
> purposes
> ---
>
> Key: KAFKA-9338
> URL: https://issues.apache.org/jira/browse/KAFKA-9338
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0
>Reporter: Lucas Bradstreet
>Assignee: Jason Gustafson
>Priority: Major
>
> KIP-320 adds the ability to fence replicas by detecting stale leader epochs 
> from followers, and helping consumers handle unclean truncation.
> Unfortunately the incremental fetch session handling does not maintain or use 
> the leader epoch in the fetch session cache. As a result, it does not appear 
> that the leader epoch is used for fencing a majority of the time. I'm not 
> sure if this is only the case after incremental fetch sessions are 
> established - it may be the case that the first "full" fetch session is safe.
> Optional.empty is returned for the FetchRequest.PartitionData here:
> [https://github.com/apache/kafka/blob/a4cbdc6a7b3140ccbcd0e2339e28c048b434974e/core/src/main/scala/kafka/server/FetchSession.scala#L111]
> I believe this affects brokers from 2.1.0, when fencing was improved on the 
> replica fetcher side, and consumers from 2.3.0 and above, when client-side 
> truncation detection was added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9323) Refactor Streams' upgrade system tests

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016438#comment-17016438
 ] 

Sophie Blee-Goldman commented on KAFKA-9323:


I can help you figure out the upgrade matrix though. Might need some input on 
the much much older versions, which may not be compatible for upgrades at all, 
but here are the general rules (the first matching upgrade path is the correct 
one):
 # Cooperative upgrade: an upgrade from EAGER to COOPERATIVE rebalancing. The 
cooperative protocol was introduced in 2.4 and turned on by default, so this 
includes upgrades where from_version < 2.4 and to_version >= 2.4 
 # Version probing upgrade: an upgrade across a metadata version bump (see 
below), where from_version >= 2.0 (version probing was introduced in 2.0)
 # Metadata upgrade: an upgrade across a metadata version bump, where 
from_version < 2.0
 # Simple upgrade: normal upgrade (no change in rebalance protocol or metadata 
version)

The Streams versions corresponding to each metadata version so far are (a sketch 
applying the rules above follows this list):
 # 1 --> 0.10.0
 # 2 --> 0.10.1, 0.10.2, 0.11.0, 1.0, 1.1
 # 3 --> 2.0
 # 4 --> 2.1, 2.2, 2.3
 # 5 --> 2.4
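A sketch that encodes the rules above (the real system tests are Python; this 
Java snippet only captures the decision order, and the helper names are 
assumptions):

{code:java}
// Illustrative encoding of the upgrade-path rules listed above.
enum UpgradePath { COOPERATIVE, VERSION_PROBING, METADATA, SIMPLE }

final class UpgradePathRules {

    static UpgradePath pathFor(String from, String to, int fromMetadata, int toMetadata) {
        if (!atLeast(from, 2, 4) && atLeast(to, 2, 4)) {
            return UpgradePath.COOPERATIVE;      // EAGER -> COOPERATIVE rebalancing
        }
        if (fromMetadata != toMetadata && atLeast(from, 2, 0)) {
            return UpgradePath.VERSION_PROBING;  // metadata bump, VP available since 2.0
        }
        if (fromMetadata != toMetadata) {
            return UpgradePath.METADATA;         // metadata bump, pre-2.0 starting point
        }
        return UpgradePath.SIMPLE;               // no rebalance-protocol or metadata change
    }

    // True if version "major.minor[.patch]" is >= the given major.minor.
    private static boolean atLeast(String version, int major, int minor) {
        String[] parts = version.split("\\.");
        int vMajor = Integer.parseInt(parts[0]);
        int vMinor = Integer.parseInt(parts[1]);
        return vMajor > major || (vMajor == major && vMinor >= minor);
    }
}
{code}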

> Refactor Streams'  upgrade system tests
> ---
>
> Key: KAFKA-9323
> URL: https://issues.apache.org/jira/browse/KAFKA-9323
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Assignee: Nikolay Izhikov
>Priority: Major
>
> With the introduction of version probing in 2.0 and cooperative rebalancing 
> in 2.4, the specific upgrade path depends heavily on the to & from version. 
> This can be a complex operation, and we should make sure to test a realistic 
> upgrade scenario across all possible combinations. The current system tests 
> have gaps however, which have allowed bugs in the upgrade path to slip by 
> unnoticed for several versions. 
> Our current system tests include a "simple upgrade downgrade" test, a 
> metadata upgrade test, a version probing test, and a cooperative upgrade 
> test. This has a few drawbacks and current issues:
> a) only the version probing test tests "forwards compatibility" (upgrade from 
> latest to future version)
> b) nothing tests version probing "backwards compatibility" (upgrade from 
> older version to latest), except:
> c) the cooperative rebalancing test actually happens to involve a version 
> probing step, and so coincidentally DOES test VP (but only starting with 2.4)
> d) each test itself tries to test the upgrade across different versions, 
> meaning there may be overlap and/or unnecessary tests. For example, currently 
> both the metadata_upgrade and cooperative_upgrade tests will test the upgrade 
> of 1.0 -> 2.4
> e) worse, a number of (to, from) pairs are not tested according to the 
> correct upgrade path at all, which has led to easily reproducible bugs 
> slipping past for several versions.
> f) we have a test_simple_upgrade_downgrade test which does not actually do a 
> downgrade, and for some reason only tests upgrading within the range [0.10.1 
> - 1.1]
> g) as new versions are released, it is unclear to those not directly involved 
> in these tests and/or projects whether and what needs to be updated (eg 
> should this new version be added to the cooperative test? should the old 
> version be added to the metadata test?)
> We should definitely fill in the testing gap here, but how to do so is of 
> course up for discussion.
> I would propose to refactor the upgrade tests, and rather than maintain 
> different lists of versions to pass as input to each different test, we 
> should have a single matrix that we update with each new version that 
> specifies which upgrade path that version combination actually requires. We 
> can then loop through each version combination and test only the actual 
> upgrade path that users would actually need to follow. This way we can be 
> sure we are not missing anything, as each and every possible upgrade would be 
> tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9323) Refactor Streams' upgrade system tests

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016420#comment-17016420
 ] 

Sophie Blee-Goldman edited comment on KAFKA-9323 at 1/16/20 12:58 AM:
--

Hey [~nizhikov], thanks for picking this up! The relevant files to this are 
streams_cooperative_rebalance_upgrade_test.py and streams_upgrade_test.py

We don't have this upgrade matrix written down anywhere, at the moment this 
information is pretty scattered around the system tests and the upgrade guide. 
This makes it difficult for maintainers to understand and update the tests 
correctly, and difficult for users to discover how to upgrade between specific 
versions. We should really add this to the docs in addition to the system 
tests... 

Besides compiling this matrix, we definitely need to add a new version probing 
test that tests upgrading TO the current version. And obviously, we should only 
run this test for upgrades that actually require version probing, but _don't_ 
require a cooperative upgrade (since that will include VP) – note this means we 
can probably just copy over much of the code from the first half of the 
cooperative test. I'd personally want to call this new test 
test_version_probing_upgrade and then rename the existing VP test to 
test_future_version_probing_upgrade (will refer to these like this going 
forward)

Then, we should refactor the tests to leverage a single source of truth (the 
upgrade matrix) to determine which upgrade path should be followed for a given 
to/from version. Each version combination should be run by _exactly one_ system 
test.

I think how exactly to go about this is up for some discussion (and I'm sure 
[~mjsax] has strong opinions on it :P ) but I would propose to have a single 
test_upgrade system test parametrized by all versions (with to > from) that 
just calls the appropriate upgrade path by looking it up in the upgrade matrix. 
This way it can share all the setup, etc and the actual individual upgrades can 
isolate and focus on the specific upgrade path it's testing. Each entry would 
point to one of the following functions, which contain just the upgrade logic 
extracted from the corresponding system test. 
 * existing test_simple_upgrade_downgrade -> do_simple_upgrade(to, from):
 * cooperative_rebalance_upgrade_test -> do_cooperative_upgrade(to, from):
 * test_metadata_upgrade -> do_metadata_upgrade(to, from):
 * test_version_probing_upgrade (the new one, not the "future" one) -> 
do_version_probing_upgrade(to, from):

We can list all the "rules" for determining which upgrade path above the 
upgrade matrix, so maintainers will only need to look at one place to update as 
new versions come out. This way we are guaranteed to test all possible 
upgrades, and do so according to the correct upgrade path. We also make it 
_much_ easier and faster for developers working on new features to add an 
upgrade path to the system tests, and be sure it will be followed (when 
appropriate)


was (Author: ableegoldman):
Hey [~nizhikov], thanks for picking this up! The relevant files to this are 
streams_cooperative_rebalance_upgrade_test.py and streams_upgrade_test.py

We don't have this upgrade matrix written down anywhere, at the moment this 
information is pretty scattered around the system tests and the upgrade guide. 
This makes it difficult for maintainers to understand and update the tests 
correctly, and difficult for users to discover how to upgrade between specific 
versions. We should really add this to the docs in addition to the system 
tests... 

 

Besides compiling this matrix, we definitely need to add a new version probing 
test that tests upgrading TO the current version. And obviously, we should only 
run this test for upgrades that actually require version probing, but _don't_ 
require a cooperative upgrade (since that will include VP) – note this means we 
can probably just copy over much of the code from the first half of the 
cooperative test. I'd personally want to call this new test 
test_version_probing_upgrade and then rename the existing VP test to 
test_future_version_probing_upgrade (will refer to these like this going 
forward)

Then, we should refactor the tests to leverage a single source of truth (the 
upgrade matrix) to determine which upgrade path should be followed for a given 
to/from version. Each version combination should be run by _exactly one_ system 
test.

I think how exactly to go about this is up for some discussion (and I'm sure 
[~mjsax] has strong opinions on it :P ) but I would propose to have a single 
test_upgrade system test parametrized by all versions (with to > from) that 
just calls the appropriate upgrade path by looking it up in the upgrade matrix. 
This way it can share all the setup, etc and the actual individual upgrades can 
isolate and focus on the specific upgrade path it's testing. Each entry 

[jira] [Commented] (KAFKA-9323) Refactor Streams' upgrade system tests

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016420#comment-17016420
 ] 

Sophie Blee-Goldman commented on KAFKA-9323:


Hey [~nizhikov], thanks for picking this up! The relevant files to this are 
streams_cooperative_rebalance_upgrade_test.py and streams_upgrade_test.py

We don't have this upgrade matrix written down anywhere, at the moment this 
information is pretty scattered around the system tests and the upgrade guide. 
This makes it difficult for maintainers to understand and update the tests 
correctly, and difficult for users to discover how to upgrade between specific 
versions. We should really add this to the docs in addition to the system 
tests... 

 

Besides compiling this matrix, we definitely need to add a new version probing 
test that tests upgrading TO the current version. And obviously, we should only 
run this test for upgrades that actually require version probing, but _don't_ 
require a cooperative upgrade (since that will include VP) – note this means we 
can probably just copy over much of the code from the first half of the 
cooperative test. I'd personally want to call this new test 
test_version_probing_upgrade and then rename the existing VP test to 
test_future_version_probing_upgrade (will refer to these like this going 
forward)

Then, we should refactor the tests to leverage a single source of truth (the 
upgrade matrix) to determine which upgrade path should be followed for a given 
to/from version. Each version combination should be run by _exactly one_ system 
test.

I think how exactly to go about this is up for some discussion (and I'm sure 
[~mjsax] has strong opinions on it :P ) but I would propose to have a single 
test_upgrade system test parametrized by all versions (with to > from) that 
just calls the appropriate upgrade path by looking it up in the upgrade matrix. 
This way it can share all the setup, etc and the actual individual upgrades can 
isolate and focus on the specific upgrade path it's testing. Each entry would 
point to one of the following functions, which contain just the upgrade logic 
extracted from the corresponding system test. 
 * existing test_simple_upgrade_downgrade -> do_simple_upgrade(to, from):
 * cooperative_rebalance_upgrade_test -> do_cooperative_upgrade(to, from):
 * test_metadata_upgrade -> do_metadata_upgrade(to, from):
 * test_version_probing_upgrade (the new one, not the "future" one) -> 
do_version_probing_upgrade(to, from):

We can list all the "rules" for determining which upgrade path above the 
upgrade matrix, so maintainers will only need to look at one place to update as 
new versions come out. This way we are guaranteed to test all possible 
upgrades, and do so according to the correct upgrade path. We also make it 
_much_ easier and faster for developers working on new features to add an 
upgrade path to the system tests, and be sure it will be followed (when 
appropriate)
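A compact sketch of that proposal (again, the real tests would be Python system 
tests; the matrix keys and do* method names below just mirror the comment and 
are assumptions):

{code:java}
import java.util.Map;

// Sketch of the proposed single entry point: look the (from, to) pair up in
// one upgrade matrix and dispatch to the extracted upgrade routine.
class UpgradeTestDispatchSketch {
    enum UpgradePath { SIMPLE, COOPERATIVE, METADATA, VERSION_PROBING }

    static void testUpgrade(String from, String to, Map<String, UpgradePath> upgradeMatrix) {
        // Single source of truth: the required path for this (from, to) pair.
        UpgradePath path = upgradeMatrix.get(from + "->" + to);
        switch (path) {
            case SIMPLE:          doSimpleUpgrade(from, to); break;
            case COOPERATIVE:     doCooperativeUpgrade(from, to); break;
            case METADATA:        doMetadataUpgrade(from, to); break;
            case VERSION_PROBING: doVersionProbingUpgrade(from, to); break;
        }
    }

    // Each of these would hold just the upgrade logic extracted from the
    // corresponding existing system test.
    static void doSimpleUpgrade(String from, String to) { }
    static void doCooperativeUpgrade(String from, String to) { }
    static void doMetadataUpgrade(String from, String to) { }
    static void doVersionProbingUpgrade(String from, String to) { }
}
{code}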

> Refactor Streams'  upgrade system tests
> ---
>
> Key: KAFKA-9323
> URL: https://issues.apache.org/jira/browse/KAFKA-9323
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Assignee: Nikolay Izhikov
>Priority: Major
>
> With the introduction of version probing in 2.0 and cooperative rebalancing 
> in 2.4, the specific upgrade path depends heavily on the to & from version. 
> This can be a complex operation, and we should make sure to test a realistic 
> upgrade scenario across all possible combinations. The current system tests 
> have gaps however, which have allowed bugs in the upgrade path to slip by 
> unnoticed for several versions. 
> Our current system tests include a "simple upgrade downgrade" test, a 
> metadata upgrade test, a version probing test, and a cooperative upgrade 
> test. This has a few drawbacks and current issues:
> a) only the version probing test tests "forwards compatibility" (upgrade from 
> latest to future version)
> b) nothing tests version probing "backwards compatibility" (upgrade from 
> older version to latest), except:
> c) the cooperative rebalancing test actually happens to involve a version 
> probing step, and so coincidentally DOES test VP (but only starting with 2.4)
> d) each test itself tries to test the upgrade across different versions, 
> meaning there may be overlap and/or unnecessary tests. For example, currently 
> both the metadata_upgrade and cooperative_upgrade tests will test the upgrade 
> of 1.0 -> 2.4
> e) worse, a number of (to, from) pairs are not tested according to the 
> correct upgrade path at all, which has led to easily reproducible bugs 
> slipping past for several versions.
> f) we have a test_simple_upgrade_downgrade test which does not actually do a 
> downgrade, and for some 

[jira] [Updated] (KAFKA-9323) Refactor Streams' upgrade system tests

2020-01-15 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-9323:
---
Description: 
With the introduction of version probing in 2.0 and cooperative rebalancing in 
2.4, the specific upgrade path depends heavily on the to & from version. This 
can be a complex operation, and we should make sure to test a realistic upgrade 
scenario across all possible combinations. The current system tests have gaps 
however, which have allowed bugs in the upgrade path to slip by unnoticed for 
several versions. 

Our current system tests include a "simple upgrade downgrade" test, a metadata 
upgrade test, a version probing test, and a cooperative upgrade test. This has 
a few drawbacks and current issues:

a) only the version probing test tests "forwards compatibility" (upgrade from 
latest to future version)

b) nothing tests version probing "backwards compatibility" (upgrade from older 
version to latest), except:

c) the cooperative rebalancing test actually happens to involve a version 
probing step, and so coincidentally DOES test VP (but only starting with 2.4)

d) each test itself tries to test the upgrade across different versions, 
meaning there may be overlap and/or unnecessary tests. For example, currently 
both the metadata_upgrade and cooperative_upgrade tests will test the upgrade 
of 1.0 -> 2.4

e) worse, a number of (to, from) pairs are not tested according to the correct 
upgrade path at all, which has led to easily reproducible bugs slipping past 
for several versions.

f) we have a test_simple_upgrade_downgrade test which does not actually do a 
downgrade, and for some reason only tests upgrading within the range [0.10.1 - 
1.1]

g) as new versions are released, it is unclear to those not directly involved 
in these tests and/or projects whether and what needs to be updated (eg should 
this new version be added to the cooperative test? should the old version be 
added to the metadata test?)

We should definitely fill in the testing gap here, but how to do so is of 
course up for discussion.

I would propose to refactor the upgrade tests, and rather than maintain 
different lists of versions to pass as input to each different test, we should 
have a single matrix that we update with each new version that specifies which 
upgrade path that version combination actually requires. We can then loop 
through each version combination and test only the actual upgrade path that 
users would actually need to follow. This way we can be sure we are not missing 
anything, as each and every possible upgrade would be tested.

  was:
With the introduction of version probing in 2.0 and cooperative rebalancing in 
2.4, the specific upgrade path depends heavily on the to & from version. This 
can be a complex operation, and we should make sure to test a realistic upgrade 
scenario across all possible combinations. The current system tests have gaps 
however, which have allowed bugs in the upgrade path to slip by unnoticed for 
several versions. 

Our current system tests include a metadata upgrade test, a version probing 
test, and a cooperative upgrade test. This has a few drawbacks:

a) only the version probing test tests "forwards compatibility" (upgrade from 
latest to future version)

b) nothing tests version probing "backwards compatibility" (upgrade from older 
version to latest), except:

c) the cooperative rebalancing test actually happens to involve a version 
probing step, and so coincidentally DOES test VP (but only starting with 2.4)

d) each test itself tries to test the upgrade across different versions, 
meaning there may be overlap and/or unnecessary tests 

e) as new versions are released, it is unclear to those not directly involved 
in these tests and/or projects whether and what needs to be updated (eg should 
this new version be added to the cooperative test? should the old version be 
added to the metadata test?)

We should definitely fill in the testing gap here, but how to do so is of 
course up for discussion.

I would propose to refactor the upgrade tests, and rather than maintain 
different lists of versions to pass as input to each different test, we should 
have a single matrix that we update with each new version that specifies which 
upgrade path that version combination actually requires. We can then loop 
through each version combination and test only the actual upgrade path that 
users would actually need to follow. This way we can be sure we are not missing 
anything, as each and every possible upgrade would be tested.


> Refactor Streams'  upgrade system tests
> ---
>
> Key: KAFKA-9323
> URL: https://issues.apache.org/jira/browse/KAFKA-9323
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>   

[jira] [Created] (KAFKA-9442) Kafka connect REST API times out when trying to create a connector

2020-01-15 Thread Kedar Shenoy (Jira)
Kedar Shenoy created KAFKA-9442:
---

 Summary: Kafka connect REST API times out when trying to create a 
connector
 Key: KAFKA-9442
 URL: https://issues.apache.org/jira/browse/KAFKA-9442
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.3.1
 Environment: Kafka - Amazon MSK running version 2.3.1 with only 
Plaintext configured
Connect - Docker image version : latest (5.4.0) running on Amazon ECS cluster 
with tasks configured. I have attached a snapshot of the environment variables 
passed to the image. 
Reporter: Kedar Shenoy
 Attachments: 404.txt, Startup-log.txt, environment-variables.png, get 
call1.jpg, get-wrong-connector-name.png, post-s3.png

REST API calls for some resources result in a timeout, and there is no exception 
in the logs. 

Summary of what was done so far:
 * Kafka Connect Docker image running as an ECS service with 2 tasks 
 ** Service starts up and creates the 3 topics as expected.
 ** There are a few warnings during start-up but no errors - logs attached
 * REST API 
 ** Root resource / returns the Connect version.
 ** GET call to /connectors returns an empty list as expected.
 ** A wrong resource, e.g. /connectors1, returns a 404 
 ** GET call with a wrong id results in a timeout, e.g. /connectors/abc where 
abc does not exist
 ** POST call to create a connector config results in a timeout
 ** POST call to create a connector with some wrong data returns a 400, e.g. 
providing different values for attribute "name" at the top level vs the config entry

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-15 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016395#comment-17016395
 ] 

Matthias J. Sax commented on KAFKA-7591:


SGTM.

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config
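Until Streams updates existing changelog configs itself, one way to keep the 
retention under explicit control is Materialized#withRetention (available since 
2.1). Note this only affects what Streams configures when it creates the 
changelog; an existing topic's retention.ms still has to be altered manually, as 
noted above. Topic and store names below are made up:

{code:java}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

// Sketch: pin the store retention explicitly so the changelog retention is
// derived from a value you control, not only from the window size.
public class ExplicitRetention {
    public static void main(String[] args) {
        Duration windowSize = Duration.ofHours(1);
        Duration retention = Duration.ofDays(2); // must be >= window size + grace

        new StreamsBuilder()
                .stream("events", Consumed.with(Serdes.String(), Serdes.Long()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .windowedBy(TimeWindows.of(windowSize))
                .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts")
                        .withRetention(retention));
    }
}
{code}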



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9441) Refactor commit logic

2020-01-15 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-9441:
--

 Summary: Refactor commit logic
 Key: KAFKA-9441
 URL: https://issues.apache.org/jira/browse/KAFKA-9441
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Matthias J. Sax


Using a producer per thread in combination with EOS, it is no longer possible 
to commit individual tasks independently (as done currently).

We need to refactor StreamsThread to commit all tasks at the same time for the 
new model.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9441) Refactor commit logic

2020-01-15 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-9441:
--

Assignee: Matthias J. Sax

> Refactor commit logic
> -
>
> Key: KAFKA-9441
> URL: https://issues.apache.org/jira/browse/KAFKA-9441
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
>
> Using a producer per thread in combination with EOS, it is no longer possible 
> to commit individual tasks independently (as done currently).
> We need to refactor StreamsThread to commit all tasks at the same time for 
> the new model.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016372#comment-17016372
 ] 

Sophie Blee-Goldman commented on KAFKA-7591:


That's fair, I suppose if it _was_ the window size that changed then all bets 
are off anyway, while users who just changed the retention period will be able 
to benefit. We should just change it and log that we did, maybe in the meantime 
including a warning about what is/isn't a compatible change.

Tangentially, we should compile a list of compatible changes that can be made 
dynamically vs incompatible changes that require a reset, and document that 
somewhere. This seems like a common question/source of confusion

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-3596) Kafka Streams: Window expiration needs to consider more than event time

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016366#comment-17016366
 ] 

Sophie Blee-Goldman commented on KAFKA-3596:


I see, maybe I misunderstood the original ticket or previous behavior then. I'm 
not sure if this was different for Streams in the past, but as John said we 
don't have "future" events, only late ones, with the grace period determining 
the tolerance. I'll close this as "not a problem"

> Kafka Streams: Window expiration needs to consider more than event time
> ---
>
> Key: KAFKA-3596
> URL: https://issues.apache.org/jira/browse/KAFKA-3596
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Henry Cai
>Priority: Minor
>  Labels: architecture
>
> Currently in Kafka Streams, the way the windows are expired in RocksDB is 
> triggered by new event insertion.  When a window is created at T0 with 10 
> minutes retention, and we then see a new record coming with event timestamp 
> T0 + 10 + 1, we will expire that window (remove it from RocksDB).
> In the real world, it's very easy to see events coming with future timestamps 
> (or out-of-order events coming with big time gaps between events), so this way 
> of retiring a window based on one event's timestamp is dangerous.  I 
> think at least we need to consider both the event's event time and the 
> elapsed server/stream time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-3596) Kafka Streams: Window expiration needs to consider more than event time

2020-01-15 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman resolved KAFKA-3596.

Resolution: Not A Problem

> Kafka Streams: Window expiration needs to consider more than event time
> ---
>
> Key: KAFKA-3596
> URL: https://issues.apache.org/jira/browse/KAFKA-3596
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Henry Cai
>Priority: Minor
>  Labels: architecture
>
> Currently in Kafka Streams, the way the windows are expired in RocksDB is 
> triggered by new event insertion.  When a window is created at T0 with 10 
> minutes retention, and we then see a new record coming with event timestamp 
> T0 + 10 + 1, we will expire that window (remove it from RocksDB).
> In the real world, it's very easy to see events coming with future timestamps 
> (or out-of-order events coming with big time gaps between events), so this way 
> of retiring a window based on one event's timestamp is dangerous.  I 
> think at least we need to consider both the event's event time and the 
> elapsed server/stream time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-3596) Kafka Streams: Window expiration needs to consider more than event time

2020-01-15 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016358#comment-17016358
 ] 

Matthias J. Sax commented on KAFKA-3596:


I don't think that the behavior changed in the 2.1 release: note that if a "future 
event" arrives (ie, we have a big jump between the current stream-time and the 
timestamp of the new record), stream-time would jump ahead, "still-live 
windows" would be closed as their grace period and retention time pass, and 
thus those windows would be dropped.

However, I think that this is correct behavior and changing how we compute 
stream-time or how we handle grace-period or retention time would be wrong. 
Hence, I would close this ticket as "not a problem" (not as "fixed", because 
there is nothing to be fixed IMHO).

To actually address the problem of wrongly timestamped input records (ie, records 
that jump ahead in time) there are only two correct approaches:

1. If the records are indeed correctly timestamped, it implies a larger amount of 
out-of-order data, and increasing the grace-period and retention time configs is 
the right way to address it.
2. If the record timestamp is incorrect, it should be fixed by a custom 
TimestampExtractor that compares the current partition-time (which is passed into 
the extractor) to the record timestamp and either returns a different (corrected) 
timestamp or filters out the record by returning -1 (ie, the extractor can 
enforce a bound on how much time can be advanced by a single record before the 
"jump" is considered incorrect); a minimal sketch of this follows below.

> Kafka Streams: Window expiration needs to consider more than event time
> ---
>
> Key: KAFKA-3596
> URL: https://issues.apache.org/jira/browse/KAFKA-3596
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Henry Cai
>Priority: Minor
>  Labels: architecture
>
> Currently in Kafka Streams, the way the windows are expired in RocksDB is 
> triggered by new event insertion.  When a window is created at T0 with 10 
> minutes retention, and we then see a new record coming with event timestamp 
> T0 + 10 + 1, we will expire that window (remove it from RocksDB).
> In the real world, it's very easy to see events coming with future timestamps 
> (or out-of-order events coming with big time gaps between events), so this way 
> of retiring a window based on one event's timestamp is dangerous.  I 
> think at least we need to consider both the event's event time and the 
> elapsed server/stream time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-15 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016352#comment-17016352
 ] 

Matthias J. Sax commented on KAFKA-7591:


Personally, I don't see why updating the config and enabling this feature would 
make the situation worse, or why we would need to block this feature. We can't 
detect the discussed case today, and hence, enabling this feature does not 
change the current "level of protection" (not better but also not worse), but 
would just make Kafka Streams more feature rich (ie, better).

I don't see what we gain by logging a WARN if the expected config does not 
match the actual config: if the reason is indeed a change of the window size, 
the program is already starting up and if we just log a WARN and proceed, the 
state will get "corrupted", so we don't really gain anything from my point of 
view.

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9358) Explicitly resign controller leadership and broker znode

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016325#comment-17016325
 ] 

ASF GitHub Bot commented on KAFKA-9358:
---

mumrah commented on pull request #7968: KAFKA-9358 Explicitly unregister 
brokers and controller from ZK
URL: https://github.com/apache/kafka/pull/7968
 
 
   The controller now resigns leadership explicitly on shutdown, rather than 
relying on the zkClient being closed immediately afterwards to implicitly do so.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Explicitly resign controller leadership and broker znode
> 
>
> Key: KAFKA-9358
> URL: https://issues.apache.org/jira/browse/KAFKA-9358
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, core
>Reporter: Lucas Bradstreet
>Assignee: Lucas Bradstreet
>Priority: Major
>
> When shutting down, the broker shuts down the controller and 
> then closes the zookeeper connection. Closing the zookeeper connection 
> results in ephemeral nodes being removed. It is currently critical that the 
> zkClient is closed after the controller is shutdown, otherwise a controller 
> election will not occur if the broker being shutdown is currently the 
> controller.
> We should consider resigning leadership explicitly in the controller rather 
> than relying on the zookeeper client being closed. This would ensure that any 
> changes in shutdown order cannot lead to periods where a broker's controller 
> component is stopped while also maintaining leadership until the zkClient is 
> closed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-6629) SegmentedCacheFunctionTest does not cover session window serdes

2020-01-15 Thread Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman reassigned KAFKA-6629:
--

Assignee: (was: Manasvi Gupta)

> SegmentedCacheFunctionTest does not cover session window serdes
> ---
>
> Key: KAFKA-6629
> URL: https://issues.apache.org/jira/browse/KAFKA-6629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie, unit-test
>
> The SegmentedCacheFunctionTest.java only covers time window serdes, but not 
> session window serdes. We should fill in this coverage gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6629) SegmentedCacheFunctionTest does not cover session window serdes

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016306#comment-17016306
 ] 

Sophie Blee-Goldman commented on KAFKA-6629:


Unassigned since there has been no activity for over a year

> SegmentedCacheFunctionTest does not cover session window serdes
> ---
>
> Key: KAFKA-6629
> URL: https://issues.apache.org/jira/browse/KAFKA-6629
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Manasvi Gupta
>Priority: Major
>  Labels: newbie, unit-test
>
> The SegmentedCacheFunctionTest.java only covers time window serdes, but not 
> session window serdes. We should fill in this coverage gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-3596) Kafka Streams: Window expiration needs to consider more than event time

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016296#comment-17016296
 ] 

Sophie Blee-Goldman commented on KAFKA-3596:


[~vvcephei] Can we close this? I think I agree with your assessment, and this 
is no longer an issue. Is 2.1 the correct fix version?

> Kafka Streams: Window expiration needs to consider more than event time
> ---
>
> Key: KAFKA-3596
> URL: https://issues.apache.org/jira/browse/KAFKA-3596
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Henry Cai
>Priority: Minor
>  Labels: architecture
>
> Currently in Kafka Streams, the way the windows are expired in RocksDB is 
> triggered by new event insertion.  When a window is created at T0 with 10 
> minutes retention, and we then see a new record coming with event timestamp 
> T0 + 10 + 1, we will expire that window (remove it from RocksDB).
> In the real world, it's very easy to see events coming with future timestamps 
> (or out-of-order events coming with big time gaps between events), so this way 
> of retiring a window based on one event's timestamp is dangerous.  I 
> think at least we need to consider both the event's event time and the 
> elapsed server/stream time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-15 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016294#comment-17016294
 ] 

Sophie Blee-Goldman commented on KAFKA-7591:


Right, that is why I say if we can't distinguish between a window size change 
(incompatible) and a retention period change (fine) we should _not_ alter the 
configs, and just log a warning if there is a mismatch. Once KAFKA-8307 is 
fixed then we can consider allowing Streams to change existing topic configs. 
For now, I think it would still be valuable to check whether there are 
contradictory configs and raise it to the user if so.

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9440) Add ConsumerGroupCommand to delete static members

2020-01-15 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9440:
--

 Summary: Add ConsumerGroupCommand to delete static members
 Key: KAFKA-9440
 URL: https://issues.apache.org/jira/browse/KAFKA-9440
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen


We introduced a new AdminClient API, removeMembersFromConsumerGroup, in 2.4. It 
would be good to expose the API as part of the ConsumerGroupCommand for 
easy command-line usage. 
This change requires a new KIP; posting it here in case anyone who 
uses static membership would like to pick it up.
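For reference, the existing 2.4 AdminClient call that the proposed command-line 
option would wrap looks like this (the group name, group.instance.id and 
bootstrap address are made-up values):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.MemberToRemove;
import org.apache.kafka.clients.admin.RemoveMembersFromConsumerGroupOptions;

// Removes a static member (identified by its group.instance.id) from a group.
public class RemoveStaticMemberExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            admin.removeMembersFromConsumerGroup(
                    "my-group",
                    new RemoveMembersFromConsumerGroupOptions(
                            Collections.singleton(new MemberToRemove("instance-1"))))
                 .all()
                 .get(); // blocks until the static member has been removed
        }
    }
}
{code}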



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016273#comment-17016273
 ] 

ASF GitHub Bot commented on KAFKA-6144:
---

vvcephei commented on pull request #7962: KAFKA-6144: option to query restoring 
and standby
URL: https://github.com/apache/kafka/pull/7962
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state can take a very long 
> time; even for small state stores, more than a few ms of unavailability can be 
> a deal-breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when 
> the use case allows it. Adding the use case from KAFKA-8994 as it is more 
> descriptive.
> "Consider the following scenario in a three-node Streams cluster with node A, 
> node B and node R, executing a stateful sub-topology/topic group with 1 
> partition and `_num.standby.replicas=1_`  
>  * *t0*: A is the active instance owning the partition, B is the standby that 
> keeps replicating A's state into its local disk, R just routes streams 
> IQs to active instance using StreamsMetadata
>  * *t1*: IQs pick node R as router, R forwards query to A, A responds back to 
> R which reverse forwards back the results.
>  * *t2:* Active A instance is killed and rebalance begins. IQs start failing 
> to A
>  * *t3*: Rebalance assignment happens and standby B is now promoted as active 
> instance. IQs continue to fail
>  * *t4*: B fully catches up to changelog tail and rewinds offsets to A's last 
> commit position, IQs continue to fail
>  * *t5*: IQs to R, get routed to B, which is now ready to serve results. IQs 
> start succeeding again
>  
> Depending on Kafka consumer group session/heartbeat timeouts, steps t2,t3 can 
> take a few seconds (~10 seconds based on default values). Depending on how 
> laggy the standby B was prior to A being killed, t4 can take a few 
> seconds to minutes. 
> While this behavior favors consistency over availability at all times, the 
> long unavailability window might be undesirable for certain classes of 
> applications (e.g simple caches or dashboards). 
> This issue aims to also expose information about standby B to R, during each 
> rebalance such that the queries can be routed by an application to a standby 
> to serve stale reads, choosing availability over consistency."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9439) Add more public API tests for KafkaProducer

2020-01-15 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9439:
--

 Summary: Add more public API tests for KafkaProducer
 Key: KAFKA-9439
 URL: https://issues.apache.org/jira/browse/KAFKA-9439
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen


While working on KIP-447, we realized there is a lack of test coverage of the 
KafkaProducer public APIs. For example, `commitTransaction` and 
`sendOffsetsToTransaction` are not even called in `KafkaProducerTest.java`, 
and the code coverage is only 75%. 

Adding more unit tests here will be pretty valuable.
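
As a sketch of the kind of coverage that could be added, here is a `MockProducer`-based test exercising the transactional call sequence. The real `KafkaProducerTest.java` mocks the network layer instead, so treat this as illustrative only; the topic, partition and group names below are made up.

{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.Test;

public class ProducerTransactionApiSketchTest {

    @Test
    public void shouldCommitOffsetsWithinTransaction() {
        // autoComplete=true so send() futures complete immediately.
        MockProducer<String, String> producer =
            new MockProducer<>(true, new StringSerializer(), new StringSerializer());

        producer.initTransactions();
        producer.beginTransaction();
        producer.send(new ProducerRecord<>("output-topic", "key", "value"));

        // Offsets for a hypothetical input partition, committed atomically
        // with the produced record.
        Map<TopicPartition, OffsetAndMetadata> offsets = Collections.singletonMap(
            new TopicPartition("input-topic", 0), new OffsetAndMetadata(42L));
        producer.sendOffsetsToTransaction(offsets, "test-group");
        producer.commitTransaction();

        assertTrue(producer.transactionCommitted());
        assertEquals(1, producer.history().size());
        assertEquals(1, producer.consumerGroupOffsetsHistory().size());
    }
}
{code}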



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-15 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016236#comment-17016236
 ] 

Raman Gupta commented on KAFKA-8803:


This is not just a dev-ops issue; from what I'm seeing here, it 
looks like *the EXACTLY_ONCE guarantee of the stream that failed to start with 
this error is being violated by Kafka*.

After restarting all my brokers I was expecting the stream to start where it 
left off and process all input messages. However, the stream offset was already 
past a message that had never been processed. This is a huge problem.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.
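
For anyone hitting the same timeout, a small sketch of raising the producer `max.block.ms` that the error message refers to, for the producers that Kafka Streams creates internally. The application id, bootstrap server, and the 180000 ms value are placeholders; whether a larger timeout actually helps depends on the broker-side cause described above.

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsMaxBlockMsExample {
    public static void main(String[] args) {
        // Kafka Streams owns its producers, so the producer parameter suggested
        // by the error message has to be passed through StreamsConfig with the
        // producer prefix. All values below are placeholders for this sketch.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 180000);
        System.out.println(props);
    }
}
{code}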



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9323) Refactor Streams' upgrade system tests

2020-01-15 Thread Nikolay Izhikov (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016167#comment-17016167
 ] 

Nikolay Izhikov commented on KAFKA-9323:


Hello, [~ableegoldman]

Could you please confirm my analysis of the current source code of the Streams 
system tests:

We have 4 Python scripts with upgrade tests:
 * streams_broker_compatibility_test.py
 * streams_cooperative_rebalance_upgrade_test.py
 * streams_static_membership_test.py
 * streams_upgrade_test.py

> a) only the version probing test tests "forwards compatibility" (upgrade from 
> latest to future version)

streams_upgrade_test.py#test_version_probing_upgrade

> b) nothing tests version probing "backwards compatibility" (upgrade from 
> older version to latest), except:

a new test that should be created.

> c) the cooperative rebalancing test actually happens to involve a version 
> probing step, and so coincidentally DOES test VP (but only starting with 2.4)

streams_cooperative_rebalance_upgrade_test.py#test_upgrade_to_cooperative_rebalance

> we should have a single matrix that we update with each new version that 
> specifies which upgrade path that version combination actually requires

Do we have this matrix written somewhere?

As I understand your proposal, we should refactor all scripts from the 
mentioned files.
 Is that correct?

> Refactor Streams'  upgrade system tests
> ---
>
> Key: KAFKA-9323
> URL: https://issues.apache.org/jira/browse/KAFKA-9323
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Assignee: Nikolay Izhikov
>Priority: Major
>
> With the introduction of version probing in 2.0 and cooperative rebalancing 
> in 2.4, the specific upgrade path depends heavily on the to & from version. 
> This can be a complex operation, and we should make sure to test a realistic 
> upgrade scenario across all possible combinations. The current system tests 
> have gaps however, which have allowed bugs in the upgrade path to slip by 
> unnoticed for several versions. 
> Our current system tests include a metadata upgrade test, a version probing 
> test, and a cooperative upgrade test. This has a few drawbacks:
> a) only the version probing test tests "forwards compatibility" (upgrade from 
> latest to future version)
> b) nothing tests version probing "backwards compatibility" (upgrade from 
> older version to latest), except:
> c) the cooperative rebalancing test actually happens to involve a version 
> probing step, and so coincidentally DOES test VP (but only starting with 2.4)
> d) each test itself tries to test the upgrade across different versions, 
> meaning there may be overlap and/or unnecessary tests 
> e) as new versions are released, it is unclear to those not directly involved 
> in these tests and/or projects whether and what needs to be updated (e.g. 
> should this new version be added to the cooperative test? should the old 
> version be added to the metadata test?)
> We should definitely fill in the testing gap here, but how to do so is of 
> course up for discussion.
> I would propose to refactor the upgrade tests, and rather than maintain 
> different lists of versions to pass as input to each different test, we 
> should have a single matrix that we update with each new version that 
> specifies which upgrade path that version combination actually requires. We 
> can then loop through each version combination and test only the actual 
> upgrade path that users would actually need to follow. This way we can be 
> sure we are not missing anything, as each and every possible upgrade would be 
> tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9438) Issue with mm2 active/active replication

2020-01-15 Thread Roman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman updated KAFKA-9438:
-
Description: 
Hi,

 

I am trying to configure active/active replication with the new Kafka 2.4.0 and MM2.

I have 2 Kafka clusters, one in (let's say) BO and one in IN.

In each cluster there are 3 brokers.

Topics are replicated properly, so in BO I see
{quote}topics

in.topics
{quote}
 

in IN I see
{quote}topics

bo.topic
{quote}
 

That is according to the documentation.

 

But when I stop the replication process on one data center and start it up again, 
the replication replicates the topics with the prefix applied twice (bo.bo.topics or 
in.in.topics), depending on which connector I restart.

I have also blacklisted the topics, but they are still being replicated.

 

bo.properties file
{quote}name = in-bo
 #topics = .*
 topics.blacklist = "bo.*"
 #groups = .*
 connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
 tasks.max = 10

source.cluster.alias = in
 target.cluster.alias = bo
 source.cluster.bootstrap.servers = IN-kafka1:9092,IN-kafka2:9092,IN-kafka3:9092
 target.cluster.bootstrap.servers = BO-kafka1:9092,BO-kafka2:9092,bo-kafka3:9092

# use ByteArrayConverter to ensure that records are not re-encoded
 key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
 value.converter = org.apache.kafka.connect.converters.ByteArrayConverter

 
{quote}
in.properties
{quote}name = bo-in
 #topics = .*
 topics.blacklist = "in.*"
 #groups = .*
 connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
 tasks.max = 10

source.cluster.alias = bo
 target.cluster.alias = in
 target.cluster.bootstrap.servers = IN-kafka1:9092,IN-kafka2:9092,IN-kafka3:9092
 source.cluster.bootstrap.servers = BO-kafka1:9092,BO-kafka2:9092,BO-kafka3:9092

# use ByteArrayConverter to ensure that records are not re-encoded
 key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
 value.converter = org.apache.kafka.connect.converters.ByteArrayConverter
{quote}
 

 

 


> Issue with mm2 active/active replication
> 
>
> Key: KAFKA-9438
> URL: https://issues.apache.org/jira/browse/KAFKA-9438
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0
>Reporter: Roman
>Priority: Minor
>
> Hi,
>  
> I am trying to configure active/active replication with the new Kafka 2.4.0 and MM2.
> I have 2 Kafka clusters, one in (let's say) BO and one in IN.
> In each cluster there are 3 brokers.
> Topics are replicated properly, so in BO I see
> {quote}topics
> in.topics
> {quote}
>  
> in IN i see
> {quote}topics
> bo.topic
> {quote}
>  
> That is according to the documentation.
>  
> But when I stop the replication process on one data center and start it up again, 
> the replication replicates the topics with the prefix applied twice (bo.bo.topics 
> or in.in.topics), depending on which connector I restart.
> I have also blacklisted the topics, but they are still being replicated.

[jira] [Created] (KAFKA-9438) Issue with mm2 active/active replication

2020-01-15 Thread Roman (Jira)
Roman created KAFKA-9438:


 Summary: Issue with mm2 active/active replication
 Key: KAFKA-9438
 URL: https://issues.apache.org/jira/browse/KAFKA-9438
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 2.4.0
Reporter: Roman


Hi,

 

I am trying to configure active/active replication with the new Kafka 2.4.0 and MM2.

I have 2 Kafka clusters, one in (let's say) BO and one in IN.

In each cluster there are 3 brokers.

Topics are replicated properly, so in BO I see
{quote}topics

in.topics
{quote}
 

in IN I see
{quote}topics

bo.topic
{quote}
 

That is according to the documentation.

 

But when I stop the replication process on one data center and start it up again, 
the replication replicates the topics with the prefix applied twice (bo.bo.topics or 
in.in.topics), depending on which connector I restart.

I have also blacklisted the topics, but they are still being replicated.

 

bo.properties file
{quote}name = in-bo
#topics = .*
topics.blacklist = "bo.*"
#groups = .*
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
tasks.max = 10

source.cluster.alias = in
target.cluster.alias = bo
source.cluster.bootstrap.servers = IN-kafka1:9092,IN-kafka2:9092,IN-kafka3:9092
target.cluster.bootstrap.servers = BO-kafka1:9092,BO-kafka2:9092,bo-kafka3:9092


# use ByteArrayConverter to ensure that records are not re-encoded
key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter = org.apache.kafka.connect.converters.ByteArrayConverter

 
{quote}
in.properties
{quote}name = bo-in
#topics = .*
topics.blacklist = "in.*"
#groups = .*
connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
tasks.max = 10

source.cluster.alias = bo
target.cluster.alias = in
target.cluster.bootstrap.servers = IN-kafka1:9092,IN-kafka2:9092,IN-kafka3:9092
source.cluster.bootstrap.servers = BO-kafka1:9092,BO-kafka2:9092,BO-kafka3:9092


# use ByteArrayConverter to ensure that records are not re-encoded
key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
value.converter = org.apache.kafka.connect.converters.ByteArrayConverter
{quote}
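
One observation on the configs above (an assumption on my side, not a confirmed fix): a Java properties file keeps quote characters as part of the value, so topics.blacklist = "bo.*" blacklists topics matching the literal pattern "bo.*" (including the quotes) rather than bo.*. A sketch of the unquoted form:

{code}
# Sketch only (assumption, not a verified fix): unquoted patterns so the
# blacklist regex matches the already-prefixed remote topics.

# bo.properties (in -> bo connector)
topics.blacklist = bo.*

# in.properties (bo -> in connector)
topics.blacklist = in.*
{code}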
 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-15 Thread Raman Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016128#comment-17016128
 ] 

Raman Gupta commented on KAFKA-8803:


This happened to me again today. Recycling 3 of 4 brokers fixed the issue.

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Boyang Chen
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9428) Expose standby information in KafkaStreams via queryMetadataForKey API

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016094#comment-17016094
 ] 

ASF GitHub Bot commented on KAFKA-9428:
---

vvcephei commented on pull request #7960: KAFKA-9428: Add KeyQueryMetadata APIs 
to KafkaStreams
URL: https://github.com/apache/kafka/pull/7960
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Expose standby information in KafkaStreams via queryMetadataForKey API
> --
>
> Key: KAFKA-9428
> URL: https://issues.apache.org/jira/browse/KAFKA-9428
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8847) Deprecate and remove usage of supporting classes in kafka.security.auth

2020-01-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016093#comment-17016093
 ] 

ASF GitHub Bot commented on KAFKA-8847:
---

rajinisivaram commented on pull request #7966: KAFKA-8847; Deprecate and remove 
usage of supporting classes in kafka.security.auth
URL: https://github.com/apache/kafka/pull/7966
 
 
   Removes references to the old Scala ACL classes from `kafka.security.auth` 
(Acl, Operation, ResourceType, Resource etc.) and replaces them with the Java 
API. Only the old `SimpleAclAuthorizer`, the `AuthorizerWrapper` used to wrap 
legacy authorizer instances, and tests using `SimpleAclAuthorizer` continue to 
use the old API. The old Scala API is deprecated.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Deprecate and remove usage of supporting classes in kafka.security.auth
> ---
>
> Key: KAFKA-8847
> URL: https://issues.apache.org/jira/browse/KAFKA-8847
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.5.0
>
>
> Deprecate Acl, Resource etc. from `kafka.security.auth` and replace 
> references to these with the equivalent Java classes.
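
For call sites being migrated, a small sketch of the Java-API equivalent of an old `kafka.security.auth` Acl/Resource pair; the principal, host and topic names here are made up.

{code:java}
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class JavaAclExample {
    public static void main(String[] args) {
        // Java-API equivalent of an old Acl/Resource pair:
        // allow User:alice to READ topic "payments" from any host.
        ResourcePattern resource =
            new ResourcePattern(ResourceType.TOPIC, "payments", PatternType.LITERAL);
        AccessControlEntry entry = new AccessControlEntry(
            "User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW);
        AclBinding binding = new AclBinding(resource, entry);
        System.out.println(binding);
    }
}
{code}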



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9323) Refactor Streams' upgrade system tests

2020-01-15 Thread Nikolay Izhikov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Izhikov reassigned KAFKA-9323:
--

Assignee: Nikolay Izhikov

> Refactor Streams'  upgrade system tests
> ---
>
> Key: KAFKA-9323
> URL: https://issues.apache.org/jira/browse/KAFKA-9323
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Assignee: Nikolay Izhikov
>Priority: Major
>
> With the introduction of version probing in 2.0 and cooperative rebalancing 
> in 2.4, the specific upgrade path depends heavily on the to & from version. 
> This can be a complex operation, and we should make sure to test a realistic 
> upgrade scenario across all possible combinations. The current system tests 
> have gaps however, which have allowed bugs in the upgrade path to slip by 
> unnoticed for several versions. 
> Our current system tests include a metadata upgrade test, a version probing 
> test, and a cooperative upgrade test. This has a few drawbacks:
> a) only the version probing test tests "forwards compatibility" (upgrade from 
> latest to future version)
> b) nothing tests version probing "backwards compatibility" (upgrade from 
> older version to latest), except:
> c) the cooperative rebalancing test actually happens to involve a version 
> probing step, and so coincidentally DOES test VP (but only starting with 2.4)
> d) each test itself tries to test the upgrade across different versions, 
> meaning there may be overlap and/or unnecessary tests 
> e) as new versions are released, it is unclear to those not directly involved 
> in these tests and/or projects whether and what needs to be updated (e.g. 
> should this new version be added to the cooperative test? should the old 
> version be added to the metadata test?)
> We should definitely fill in the testing gap here, but how to do so is of 
> course up for discussion.
> I would propose to refactor the upgrade tests, and rather than maintain 
> different lists of versions to pass as input to each different test, we 
> should have a single matrix that we update with each new version that 
> specifies which upgrade path that version combination actually requires. We 
> can then loop through each version combination and test only the actual 
> upgrade path that users would actually need to follow. This way we can be 
> sure we are not missing anything, as each and every possible upgrade would be 
> tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9414) sink-task-metrics.sink-record-lag-max metric is not exposed

2020-01-15 Thread Aidar Makhmutov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aidar Makhmutov resolved KAFKA-9414.

Resolution: Not A Bug

It has not been implemented yet.

> sink-task-metrics.sink-record-lag-max metric is not exposed
> ---
>
> Key: KAFKA-9414
> URL: https://issues.apache.org/jira/browse/KAFKA-9414
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.2.0
>Reporter: Aidar Makhmutov
>Priority: Minor
>
> In the group of metrics
> {{kafka.connect:type=sink-task-metrics,connector="\{connector}",task="\{task}"}}
> the following metric is not exposed via JMX (but is present in the documentation): 
> {{sink-record-lag-max}}
>  
> Details:
>  * Docker image: confluentinc/cp-kafka-connect:5.2.1
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9437) KIP-559: Make the Kafka Protocol Friendlier with L7 Proxies

2020-01-15 Thread David Jacot (Jira)
David Jacot created KAFKA-9437:
--

 Summary: KIP-559: Make the Kafka Protocol Friendlier with L7 
Proxies
 Key: KAFKA-9437
 URL: https://issues.apache.org/jira/browse/KAFKA-9437
 Project: Kafka
  Issue Type: Improvement
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)

2020-01-15 Thread whsoul (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

whsoul updated KAFKA-9436:
--
Comment: was deleted

(was: [https://github.com/apache/kafka/pull/7965])

> New Kafka Connect SMT for plainText => Struct(or Map)
> -
>
> Key: KAFKA-9436
> URL: https://issues.apache.org/jira/browse/KAFKA-9436
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: whsoul
>Priority: Major
>
> I'd like to parse and convert plain text rows to struct (or map) data, and 
> load them into a document database such as MongoDB, Elasticsearch, etc. with an SMT
>  
> For example
>  
> plain text apache log
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS 
> /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\; 
> \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, 
> like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
> The SMT connect config with the regular expression below can easily transform plain 
> text to struct (or map) data.
>  
> {code:java}
> "transforms": "TimestampTopic, RegexTransform",
> "transforms.RegexTransform.type": 
> "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) 
> \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) 
> (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\""
> "transforms.RegexTransform.mapping": 
> "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>  
> I have a PR for this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)

2020-01-15 Thread whsoul (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015833#comment-17015833
 ] 

whsoul commented on KAFKA-9436:
---

[https://github.com/apache/kafka/pull/7965]

> New Kafka Connect SMT for plainText => Struct(or Map)
> -
>
> Key: KAFKA-9436
> URL: https://issues.apache.org/jira/browse/KAFKA-9436
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: whsoul
>Priority: Major
>
> I'd like to parse and convert plain text rows to struct (or map) data, and 
> load them into a document database such as MongoDB, Elasticsearch, etc. with an SMT
>  
> For example
>  
> plain text apache log
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS 
> /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\; 
> \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, 
> like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
> The SMT connect config with the regular expression below can easily transform plain 
> text to struct (or map) data.
>  
> {code:java}
> "transforms": "TimestampTopic, RegexTransform",
> "transforms.RegexTransform.type": 
> "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) 
> \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) 
> (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\""
> "transforms.RegexTransform.mapping": 
> "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>  
> I have a PR for this



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)

2020-01-15 Thread whsoul (Jira)
whsoul created KAFKA-9436:
-

 Summary: New Kafka Connect SMT for plainText => Struct(or Map)
 Key: KAFKA-9436
 URL: https://issues.apache.org/jira/browse/KAFKA-9436
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: whsoul


I'd like to parse and convert plain text rows to struct (or map) data, and load 
them into a document database such as MongoDB, Elasticsearch, etc. with an SMT

 

For example

 

plain text apache log
{code:java}
"111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS 
/api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\; 
\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, 
like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
{code}
The SMT connect config with the regular expression below can easily transform plain 
text to struct (or map) data.

 
{code:java}
"transforms": "TimestampTopic, RegexTransform",
"transforms.RegexTransform.type": 
"org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",

"transforms.RegexTransform.struct.field": "message",
"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) 
\\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) 
(.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\""

"transforms.RegexTransform.mapping": 
"IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
{code}
 

I have a PR for this
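
For context, a rough sketch of what the core of such a transform could look like. This is not the code from the PR, just an illustration of the Connect `Transformation` hook; the `regex` and `mapping` config keys mirror the example above, type hints such as `Ms:NUMBER` are ignored, and every captured group is kept as a string.

{code:java}
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

/** Illustrative sketch only: splits a plain-text value into a Struct via a regex. */
public class ToStructByRegexSketch<R extends ConnectRecord<R>> implements Transformation<R> {

    private Pattern pattern;
    private String[] fieldNames;
    private Schema schema;

    @Override
    public void configure(Map<String, ?> configs) {
        pattern = Pattern.compile((String) configs.get("regex"));
        String[] mapping = ((String) configs.get("mapping")).split(",");
        fieldNames = new String[mapping.length];
        SchemaBuilder builder = SchemaBuilder.struct();
        for (int i = 0; i < mapping.length; i++) {
            fieldNames[i] = mapping[i].split(":")[0];        // drop type hints
            builder.field(fieldNames[i], Schema.OPTIONAL_STRING_SCHEMA);
        }
        schema = builder.build();
    }

    @Override
    public R apply(R record) {
        Matcher matcher = pattern.matcher(String.valueOf(record.value()));
        if (!matcher.matches()) {
            return record;                                   // pass non-matching rows through
        }
        Struct struct = new Struct(schema);
        for (int i = 0; i < fieldNames.length; i++) {
            struct.put(fieldNames[i], matcher.group(i + 1)); // group 0 is the whole line
        }
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(), schema, struct, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void close() {
    }
}
{code}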



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9435) Replace DescribeLogDirs request/response with automated protocol

2020-01-15 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-9435:
--

 Summary: Replace DescribeLogDirs request/response with automated 
protocol
 Key: KAFKA-9435
 URL: https://issues.apache.org/jira/browse/KAFKA-9435
 Project: Kafka
  Issue Type: Sub-task
Reporter: Tom Bentley
Assignee: Tom Bentley






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9434) Replace AlterReplicaLogDirs request/response with automated protocol

2020-01-15 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-9434:
--

 Summary: Replace AlterReplicaLogDirs request/response with 
automated protocol
 Key: KAFKA-9434
 URL: https://issues.apache.org/jira/browse/KAFKA-9434
 Project: Kafka
  Issue Type: Sub-task
Reporter: Tom Bentley
Assignee: Tom Bentley






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9433) Replace AlterConfigs request/response with automated protocol

2020-01-15 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-9433:
--

 Summary: Replace AlterConfigs request/response with automated 
protocol
 Key: KAFKA-9433
 URL: https://issues.apache.org/jira/browse/KAFKA-9433
 Project: Kafka
  Issue Type: Sub-task
Reporter: Tom Bentley
Assignee: Tom Bentley






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9432) Replace DescribeConfigs request/response with automated protocol

2020-01-15 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-9432:
--

 Summary: Replace DescribeConfigs request/response with automated 
protocol
 Key: KAFKA-9432
 URL: https://issues.apache.org/jira/browse/KAFKA-9432
 Project: Kafka
  Issue Type: Sub-task
Reporter: Tom Bentley
Assignee: Tom Bentley






--
This message was sent by Atlassian Jira
(v8.3.4#803005)