[jira] [Commented] (KAFKA-9066) Kafka Connect JMX : source & sink task metrics missing for tasks in failed state

2020-02-24 Thread Selman Kayrancioglu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044186#comment-17044186
 ] 

Selman Kayrancioglu commented on KAFKA-9066:


We're also experiencing this with version 2.3.0.

> Kafka Connect JMX : source & sink task metrics missing for tasks in failed 
> state
> 
>
> Key: KAFKA-9066
> URL: https://issues.apache.org/jira/browse/KAFKA-9066
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.1.1
>Reporter: Mikołaj Stefaniak
>Priority: Major
>
> h2. Overview
> Kafka Connect exposes various metrics via JMX. Those metrics can be exported 
> i.e. by _Prometheus JMX Exporter_ for further processing.
> One of crucial attributes is connector's *task status.*
> According to official Kafka docs, status is available as +status+ attribute 
> of following MBean:
> {quote}kafka.connect:type=connector-task-metrics,connector="\{connector}",task="\{task}"status
>  - The status of the connector task. One of 'unassigned', 'running', 
> 'paused', 'failed', or 'destroyed'.
> {quote}
> h2. Issue
> Generally +connector-task-metrics+ are exposed propery for tasks in +running+ 
> status but not exposed at all if task is +failed+.
> Failed Task *appears* properly with failed status when queried via *REST API*:
>  
> {code:java}
> $ curl -X GET -u 'user:pass' 
> http://kafka-connect.mydomain.com/connectors/customerconnector/status
> {"name":"customerconnector","connector":{"state":"RUNNING","worker_id":"kafka-connect.mydomain.com:8080"},"tasks":[{"id":0,"state":"FAILED","worker_id":"kafka-connect.mydomain.com:8080","trace":"org.apache.kafka.connect.errors.ConnectException:
>  Received DML 'DELETE FROM mysql.rds_sysinfo .."}],"type":"source"}
> $ {code}
>  
> Failed Task *doesn't appear* as bean with +connector-task-metrics+ type when 
> queried via *JMX*:
>  
> {code:java}
> $ echo "beans -d kafka.connect" | java -jar 
> target/jmxterm-1.1.0-SNAPSHOT-uber.jar -l localhost:8081 -n -v silent | grep 
> connector=customerconnector
> kafka.connect:connector=customerconnector,task=0,type=task-error-metricskafka.connect:connector=customerconnector,type=connector-metrics
> $
> {code}
> h2. Expected result
> It is expected, that bean with +connector-task-metrics+ type will appear also 
> for tasks that failed.
> Below is example of how beans are properly registered for tasks in Running 
> state:
>  
> {code:java}
> $ echo "get -b 
> kafka.connect:connector=sinkConsentSubscription-1000,task=0,type=connector-task-metrics
>  status" | java -jar target/jmxterm-1.1.0-SNAPSHOT-ube r.jar -l 
> localhost:8081 -n -v silent
> status = running;
> $
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9602) Incorrect close of producer instance during partition assignment

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044162#comment-17044162
 ] 

ASF GitHub Bot commented on KAFKA-9602:
---

abbccdda commented on pull request #8166: (WIP) KAFKA-9602: Close the stream 
producer only in EOS
URL: https://github.com/apache/kafka/pull/8166
 
 
   This bug reproduces through the trunk stream test, the producer was closed 
unexpectedly when EOS is not turned on.
   
   Will work on adding unit test to guard this logic.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incorrect close of producer instance during partition assignment
> 
>
> Key: KAFKA-9602
> URL: https://issues.apache.org/jira/browse/KAFKA-9602
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> The new StreamProducer instance close doesn't distinguish between an 
> EOS/non-EOS shutdown. The StreamProducer should take care of that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9602) Incorrect close of producer instance during partition assignment

2020-02-24 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9602:
--

 Summary: Incorrect close of producer instance during partition 
assignment
 Key: KAFKA-9602
 URL: https://issues.apache.org/jira/browse/KAFKA-9602
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Boyang Chen
Assignee: Boyang Chen


The new StreamProducer instance close doesn't distinguish between an 
EOS/non-EOS shutdown. The StreamProducer should take care of that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9601) Workers log raw connector configs, including values

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044147#comment-17044147
 ] 

ASF GitHub Bot commented on KAFKA-9601:
---

C0urante commented on pull request #8165: KAFKA-9601: Stop logging raw 
connector config values
URL: https://github.com/apache/kafka/pull/8165
 
 
   [Jira](https://issues.apache.org/jira/browse/KAFKA-9601)
   
   whoopsie daisy
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Workers log raw connector configs, including values
> ---
>
> Key: KAFKA-9601
> URL: https://issues.apache.org/jira/browse/KAFKA-9601
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Critical
>
> [This line right 
> here|https://github.com/apache/kafka/blob/5359b2e3bc1cf13a301f32490a6630802afc4974/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConnector.java#L78]
>  logs all configs (key and value) for a connector, which is bad, since it can 
> lead to secrets (db credentials, cloud storage credentials, etc.) being 
> logged in plaintext.
> We can remove this line. Or change it to just log config keys. Or try to do 
> some super-fancy parsing that masks sensitive values. Well, hopefully not 
> that. That sounds like a lot of work.
> Affects all versions of Connect back through 0.10.1.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9601) Workers log raw connector configs, including values

2020-02-24 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-9601:


 Summary: Workers log raw connector configs, including values
 Key: KAFKA-9601
 URL: https://issues.apache.org/jira/browse/KAFKA-9601
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Chris Egerton
Assignee: Chris Egerton


[This line right 
here|https://github.com/apache/kafka/blob/5359b2e3bc1cf13a301f32490a6630802afc4974/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConnector.java#L78]
 logs all configs (key and value) for a connector, which is bad, since it can 
lead to secrets (db credentials, cloud storage credentials, etc.) being logged 
in plaintext.

We can remove this line. Or change it to just log config keys. Or try to do 
some super-fancy parsing that masks sensitive values. Well, hopefully not that. 
That sounds like a lot of work.

Affects all versions of Connect back through 0.10.1.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9600) EndTxn handler should check strict epoch equality

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044133#comment-17044133
 ] 

ASF GitHub Bot commented on KAFKA-9600:
---

abbccdda commented on pull request #8164: KAFKA-9600: EndTxn should enforce 
strict epoch checking if from client
URL: https://github.com/apache/kafka/pull/8164
 
 
   This PR enhances the epoch checking logic for endTransaction call in 
TransactionCoordinator. Previously it relaxes the checking by allowing a 
producer epoch bump, which is error-prone since there is no reason to see a 
producer epoch bump from client.
   
   Since this is purely a server side bug which requires no client side change, 
we haven't added any integration test to verify this behavior yet.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> EndTxn handler should check strict epoch equality
> -
>
> Key: KAFKA-9600
> URL: https://issues.apache.org/jira/browse/KAFKA-9600
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Boyang Chen
>Priority: Major
>
> The EndTxn path in TransactionCoordinator is shared between direct calls to 
> EndTxn from the client and internal transaction abort logic. To support the 
> latter, the code is written to allow an epoch bump. However, if the client 
> bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the 
> internal invariants are violated which results in a hanging transaction. 
> Specifically, the transaction is left in a pending state because the epoch 
> following append to the log does not match what we expect.
> To fix this, we should ensure that an EndTxn from the client checks for 
> strict epoch equality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-7711) Add a bounded flush() API to Kafka Producer

2020-02-24 Thread highluck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

highluck reassigned KAFKA-7711:
---

Assignee: (was: highluck)

> Add a bounded flush()  API to Kafka Producer
> 
>
> Key: KAFKA-7711
> URL: https://issues.apache.org/jira/browse/KAFKA-7711
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: kun du
>Priority: Minor
>  Labels: needs-kip
>
> Currently the call to Producer.flush() can be hang there for indeterminate 
> time.
> It is a good idea to add a bounded flush() API and timeout if producer is 
> unable to flush all the batch records in a limited time. In this way the 
> caller of flush() has a chance to decide what to do next instead of just wait 
> forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9455) Consider using TreeMap for in-memory stores of Streams

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044128#comment-17044128
 ] 

ASF GitHub Bot commented on KAFKA-9455:
---

highluck commented on pull request #8163: KAFKA-9455; Consider using TreeMap 
for in-memory stores of Streams
URL: https://github.com/apache/kafka/pull/8163
 
 
   https://issues.apache.org/jira/browse/KAFKA-9455
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider using TreeMap for in-memory stores of Streams
> --
>
> Key: KAFKA-9455
> URL: https://issues.apache.org/jira/browse/KAFKA-9455
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: highluck
>Priority: Major
>  Labels: newbie++
>
> From [~ableegoldman]: It's worth noting that it might be a good idea to 
> switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap 
> allows us to safely perform range queries without copying over the entire 
> keyset, but the performance on point queries seems to scale noticeably worse 
> with the number of unique keys. Point queries are used by aggregations while 
> range queries are used by windowed joins, but of course both are available 
> within the PAPI and for interactive queries so it's hard to say which we 
> should prefer. Maybe rather than make that tradeoff we should have one 
> version for efficient range queries (a "JoinWindowStore") and one for 
> efficient point queries ("AggWindowStore") - or something. I know we've had 
> similar thoughts for a different RocksDB store layout for Joins (although I 
> can't find that ticket anywhere..), it seems like the in-memory stores could 
> benefit from a special "Join" version as well cc/ Guozhang Wang
> Here are some random thoughts:
> 1. For kafka streams processing logic (i.e. without IQ), it's better to make 
> all processing logic relying on point queries rather than range queries. 
> Right now the only processor that use range queries are, as mentioned above, 
> windowed stream-stream joins. I think we should consider using a different 
> window implementation for this (and as a result also get rid of the 
> retainDuplicate flags) to refactor the windowed stream-stream join operation.
> 2. With 1), range queries would only be exposed as IQ. Depending on its usage 
> frequency I think it makes lots of sense to optimize for single-point queries.
> Of course, even without step 1) we should still consider using tree-map for 
> windowed in-memory stores to have a better scaling effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9455) Consider using TreeMap for in-memory stores of Streams

2020-02-24 Thread highluck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044126#comment-17044126
 ] 

highluck commented on KAFKA-9455:
-

[~guozhang]

thank you!

`JoinWindowStore` is my mistake

> Consider using TreeMap for in-memory stores of Streams
> --
>
> Key: KAFKA-9455
> URL: https://issues.apache.org/jira/browse/KAFKA-9455
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: highluck
>Priority: Major
>  Labels: newbie++
>
> From [~ableegoldman]: It's worth noting that it might be a good idea to 
> switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap 
> allows us to safely perform range queries without copying over the entire 
> keyset, but the performance on point queries seems to scale noticeably worse 
> with the number of unique keys. Point queries are used by aggregations while 
> range queries are used by windowed joins, but of course both are available 
> within the PAPI and for interactive queries so it's hard to say which we 
> should prefer. Maybe rather than make that tradeoff we should have one 
> version for efficient range queries (a "JoinWindowStore") and one for 
> efficient point queries ("AggWindowStore") - or something. I know we've had 
> similar thoughts for a different RocksDB store layout for Joins (although I 
> can't find that ticket anywhere..), it seems like the in-memory stores could 
> benefit from a special "Join" version as well cc/ Guozhang Wang
> Here are some random thoughts:
> 1. For kafka streams processing logic (i.e. without IQ), it's better to make 
> all processing logic relying on point queries rather than range queries. 
> Right now the only processor that use range queries are, as mentioned above, 
> windowed stream-stream joins. I think we should consider using a different 
> window implementation for this (and as a result also get rid of the 
> retainDuplicate flags) to refactor the windowed stream-stream join operation.
> 2. With 1), range queries would only be exposed as IQ. Depending on its usage 
> frequency I think it makes lots of sense to optimize for single-point queries.
> Of course, even without step 1) we should still consider using tree-map for 
> windowed in-memory stores to have a better scaling effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9598) RocksDB exception when grouping dynamically appearing topics into a KTable

2020-02-24 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044035#comment-17044035
 ] 

Guozhang Wang commented on KAFKA-9598:
--

I think it maybe related to some old bugs that we fixed in trunk, [~sergem] 
could you compile the latest trunk and run to see if it works?

> RocksDB exception when grouping dynamically appearing topics into a KTable 
> ---
>
> Key: KAFKA-9598
> URL: https://issues.apache.org/jira/browse/KAFKA-9598
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.2.0, 2.4.0
>Reporter: Sergey Menshikov
>Priority: Major
> Attachments: exception-details.txt
>
>
> A streams application consumes a number of topics via a whitelisted regex. 
> The topics appear dynamically, generated from dynamically appearing MongoDB 
> collections by debezium MongoDB source driver.
> The development is running on debezium docker images (Debezium 0.9 and 
> Debezium 1.0 -> Kafka 2.2.0 and 2.4.0), single instance of Kafka, Connect and 
> the streams consumer app.
> As the MongoDB driver provides only deltas of the changes, to collect full 
> record for each key, the code creates KTable which is then transformed into a 
> KStream for further joining with other KTables and Global KTables.
> The following piece of code results in the exception when a new topic is 
> added:
>  
> {code:java}
> Pattern tResultPattern =
>  
> Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");
> KStream tResultsTempStream = builder.stream(tResultPattern, 
> Consumed.with(stringSerde, jsonSerde));
>  KTable tResultsTempTable = 
> tResultsTempStream.groupByKey(Grouped.with(stringSerde,jsonSerde))
>  .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); // 
> mergeNodes is a Json traverse/merger procedure
> KStream tResults =
>  tResultsTempTable.toStream();
>  
> {code}
> kconsumer_1 | Exception in thread "split-reader-client3-StreamThread-1" 
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> KSTREAM-REDUCE-STATE-STORE-32 at location 
> /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32
> ...
> kconsumer_1 | Caused by: org.rocksdb.RocksDBException: lock : 
> /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32/LOCK: 
> No locks available
> Kstore 10_0 contains tr[0-9a-fA-F] 32 records, I checked.
>  
> more details about exception are in the attached file.
>  
> The exception is no longer present when I use an intermediate topic instead:
>  
>   
> {code:java}
> Pattern tResultPattern =
>  
> Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");
> KStream tResultsTempStream = builder.stream(tResultPattern, 
> Consumed.with(stringSerde, jsonSerde));
>  
> tResultsTempStream.transform(trTransformer::new).to(config.getProperty("tr_intermediate_topic_name"),Produced.with(stringSerde,
>  jsonSerde)); // trTransformer adds topic name into value Json, in previous 
> snippet it was done in the pipeline after grouping/streaming
> KStream tResultsTempStream2 = 
> builder.stream(config.getProperty("tr_intermediate_topic_name"), 
> Consumed.with(stringSerde, jsonSerde));
>  KTable tResultsTempTable = 
> tResultsTempStream2.groupByKey(Grouped.with(stringSerde,jsonSerde))
>  .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue));
> KStream tResults =
>  tResultsTempTable.toStream();
> {code}
>  
>  
> If making KTable from multiple whitelisted topics is something that is 
> outside of scope of Kafka Streams, perhaps it would make sense to mention it 
> in the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9598) RocksDB exception when grouping dynamically appearing topics into a KTable

2020-02-24 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-9598:
-
Description: 
A streams application consumes a number of topics via a whitelisted regex. The 
topics appear dynamically, generated from dynamically appearing MongoDB 
collections by debezium MongoDB source driver.

The development is running on debezium docker images (Debezium 0.9 and Debezium 
1.0 -> Kafka 2.2.0 and 2.4.0), single instance of Kafka, Connect and the 
streams consumer app.

As the MongoDB driver provides only deltas of the changes, to collect full 
record for each key, the code creates KTable which is then transformed into a 
KStream for further joining with other KTables and Global KTables.

The following piece of code results in the exception when a new topic is added:

 
{code:java}
Pattern tResultPattern =
 
Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");
KStream tResultsTempStream = builder.stream(tResultPattern, 
Consumed.with(stringSerde, jsonSerde));
 KTable tResultsTempTable = 
tResultsTempStream.groupByKey(Grouped.with(stringSerde,jsonSerde))
 .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); // mergeNodes 
is a Json traverse/merger procedure
KStream tResults =
 tResultsTempTable.toStream();
 
{code}
kconsumer_1 | Exception in thread "split-reader-client3-StreamThread-1" 
org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
KSTREAM-REDUCE-STATE-STORE-32 at location 
/tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32

...

kconsumer_1 | Caused by: org.rocksdb.RocksDBException: lock : 
/tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32/LOCK: No 
locks available

Kstore 10_0 contains tr[0-9a-fA-F] 32 records, I checked.
 
more details about exception are in the attached file.
 
The exception is no longer present when I use an intermediate topic instead:
 
  
{code:java}
Pattern tResultPattern =
 
Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");
KStream tResultsTempStream = builder.stream(tResultPattern, 
Consumed.with(stringSerde, jsonSerde));
 
tResultsTempStream.transform(trTransformer::new).to(config.getProperty("tr_intermediate_topic_name"),Produced.with(stringSerde,
 jsonSerde)); // trTransformer adds topic name into value Json, in previous 
snippet it was done in the pipeline after grouping/streaming
KStream tResultsTempStream2 = 
builder.stream(config.getProperty("tr_intermediate_topic_name"), 
Consumed.with(stringSerde, jsonSerde));
 KTable tResultsTempTable = 
tResultsTempStream2.groupByKey(Grouped.with(stringSerde,jsonSerde))
 .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue));
KStream tResults =
 tResultsTempTable.toStream();
{code}
 

 

If making KTable from multiple whitelisted topics is something that is outside 
of scope of Kafka Streams, perhaps it would make sense to mention it in the 
docs.

  was:
A streams application consumes a number of topics via a whitelisted regex. The 
topics appear dynamically, generated from dynamically appearing MongoDB 
collections by debezium MongoDB source driver.

The development is running on debezium docker images (Debezium 0.9 and Debezium 
1.0 -> Kafka 2.2.0 and 2.4.0), single instance of Kafka, Connect and the 
streams consumer app.

As the MongoDB driver provides only deltas of the changes, to collect full 
record for each key, the code creates KTable which is then transformed into a 
KStream for further joining with other KTables and Global KTables.

The following piece of code results in the exception when a new topic is added:

 
{code:java}
Pattern tResultPattern =
 
Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");
KStream tResultsTempStream = builder.stream(tResultPattern, 
Consumed.with(stringSerde, jsonSerde));
 KTable tResultsTempTable = 
tResultsTempStream.groupByKey(Grouped.with(stringSerde,jsonSerde))
 .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); // mergeNodes 
is a Json traverse/merger procedure
KStream tResults =
 tResultsTempTable.toStream();
 
{code}
kconsumer_1 | Exception in thread "split-reader-client3-StreamThread-1" 
org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
KSTREAM-REDUCE-STATE-STORE-32 at location 
/tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32

...

kconsumer_1 | Caused by: org.rocksdb.RocksDBException: lock : 
/tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32/LOCK: No 
locks available

Kstore 10_0 contains tr[0-9a-fA-F]

{32} records, I checked.
 
 more details about exception are in the attached file.
 
 The exception is no longer present when I use an intermediate topic instead:
 
  
{code:java}
Pattern tResultPattern =
 
Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");

[jira] [Commented] (KAFKA-9572) Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records

2020-02-24 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044030#comment-17044030
 ] 

Guozhang Wang commented on KAFKA-9572:
--

I looked into the source code of 2.4 and 2.5 and did not find any new issues 
other than conjectured https://issues.apache.org/jira/browse/KAFKA-8574 – the 
logs themselves cannot further validates whether it was the root cause, but at 
least it shows that some restoring tasks are closed before restoration is 
completed, which could possibly lead to the bug of KAFKA-8574. This bug is 
fixed as part of the tech debt cleanup as in KAFKA-9113. So I think I have 
about 60 percent confidence that this issue is no longer there in 2.6 but it 
would still be in 2.4 and 2.5 since the fix itself incurs a lot of the cleanup 
it is hard to cherry-pick to older branches.

I'd suggest we close this ticket for 2.6 only and if 
StreamsEosTest.test_failure_and_recovery failed on trunk again we could look 
into this once more. WDYT [~cadonna] [~apurva]


> Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some 
> Records
> ---
>
> Key: KAFKA-9572
> URL: https://issues.apache.org/jira/browse/KAFKA-9572
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.4.0
>Reporter: Bruno Cadonna
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 2.5.0
>
> Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, 
> streams23.log, streams30.log, sum-1.txt
>
>
> System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a 
> wrongly computed aggregation under exactly-once (EOS). The specific error is:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Result verification 
> failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset 
> = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value 
> size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = 
> [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698>
>   at 
> org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444)
>   at 
> org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196)
>   at 
> org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69)
> {code} 
> That means, the sum computed by the Streams app seems to be wrong for key 
> 6069. I checked the dumps of the log segments of the input topic partition 
> (attached: data-1.txt) and indeed two input records are not considered in the 
> sum. With those two missed records the sum would be correct. More concretely, 
> the input values for key 6069 are:
> # 147
> # 9250
> # 5340 
> # 1231
> # 1301
> The sum of this values is 17269 as stated in the exception above as expected 
> sum. If you subtract values 3 and 4, i.e., 5340 and 1231 from 17269, you get 
> 10698 , which is the actual sum in the exception above. Somehow those two 
> values are missing.
> In the log dump of the output topic partition (attached: sum-1.txt), the sum 
> is correct until the 4th value 1231 , i.e. 15968, then it is overwritten with 
> 10698.
> In the log dump of the changelog topic of the state store that stores the sum 
> (attached: 7-changelog-1.txt), the sum is also overwritten as in the output 
> topic.
> I attached the logs of the three Streams instances involved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9600) EndTxn handler should check strict epoch equality

2020-02-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-9600:
--

Assignee: Boyang Chen

> EndTxn handler should check strict epoch equality
> -
>
> Key: KAFKA-9600
> URL: https://issues.apache.org/jira/browse/KAFKA-9600
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Boyang Chen
>Priority: Major
>
> The EndTxn path in TransactionCoordinator is shared between direct calls to 
> EndTxn from the client and internal transaction abort logic. To support the 
> latter, the code is written to allow an epoch bump. However, if the client 
> bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the 
> internal invariants are violated which results in a hanging transaction. 
> Specifically, the transaction is left in a pending state because the epoch 
> following append to the log does not match what we expect.
> To fix this, we should ensure that an EndTxn from the client checks for 
> strict epoch equality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9595) Config tool should use admin client's incremental config API

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044003#comment-17044003
 ] 

ASF GitHub Bot commented on KAFKA-9595:
---

agam commented on pull request #8162: KAFKA-9595: switch usage of 
`alterConfigs` to `incrementalAlterConfigs` for kafka-configs tool
URL: https://github.com/apache/kafka/pull/8162
 
 
   - Also, some minor refactoring for common code
   - Test changes to `ConfigCommandTest`
   - Verifies builds, tests pass
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Config tool should use admin client's incremental config API
> 
>
> Key: KAFKA-9595
> URL: https://issues.apache.org/jira/browse/KAFKA-9595
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Agam Brahma
>Priority: Major
>  Labels: newbie
>
> The `alterConfigs` API is deprecated. We should convert `ConfigCommand` to 
> use `incrementalAlterConfigs`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9047) AdminClient group operations may not respect backoff

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043866#comment-17043866
 ] 

ASF GitHub Bot commented on KAFKA-9047:
---

skaundinya15 commented on pull request #8161: KAFKA-9047: Making AdminClient 
group operations respect retries and backoff
URL: https://github.com/apache/kafka/pull/8161
 
 
   https://issues.apache.org/jira/browse/KAFKA-9047
   
   Previously, `AdminClient` group operations did not respect a `Call`'s number 
of configured tries and retry backoff. This could lead to tight retry loops 
that put a lot of pressure on the broker. This PR introduces fixes that ensures 
for all group operations the `AdminClient` respects the number of tries and the 
backoff a given `Call` has.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AdminClient group operations may not respect backoff
> 
>
> Key: KAFKA-9047
> URL: https://issues.apache.org/jira/browse/KAFKA-9047
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Jason Gustafson
>Assignee: Sanjana Kaundinya
>Priority: Major
>
> The retry logic for consumer group operations in the admin client is 
> complicated by the need to find the coordinator. Instead of simply retry 
> loops which send the same request over and over, we can get more complex 
> retry loops like the following:
>  # Send FindCoordinator to B -> Coordinator is A
>  # Send DescribeGroup to A -> NOT_COORDINATOR
>  # Go back to 1
> Currently we construct a new Call object for each step in this loop, which 
> means we lose some of retry bookkeeping such as the last retry time and the 
> number of tries. This means it is possible to have tight retry loops which 
> bounce between steps 1 and 2 and do not respect the retry backoff. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9455) Consider using TreeMap for in-memory stores of Streams

2020-02-24 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043797#comment-17043797
 ] 

Guozhang Wang commented on KAFKA-9455:
--

I'm actually considering that we should use tree-map for all in-memory 
time-windowed stores, independent of what queries they may be accessed for.

What do you mean by `JoinWindowStore`? I think we do not have a specific 
store-type just for windowed stream-stream join?

> Consider using TreeMap for in-memory stores of Streams
> --
>
> Key: KAFKA-9455
> URL: https://issues.apache.org/jira/browse/KAFKA-9455
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: highluck
>Priority: Major
>  Labels: newbie++
>
> From [~ableegoldman]: It's worth noting that it might be a good idea to 
> switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap 
> allows us to safely perform range queries without copying over the entire 
> keyset, but the performance on point queries seems to scale noticeably worse 
> with the number of unique keys. Point queries are used by aggregations while 
> range queries are used by windowed joins, but of course both are available 
> within the PAPI and for interactive queries so it's hard to say which we 
> should prefer. Maybe rather than make that tradeoff we should have one 
> version for efficient range queries (a "JoinWindowStore") and one for 
> efficient point queries ("AggWindowStore") - or something. I know we've had 
> similar thoughts for a different RocksDB store layout for Joins (although I 
> can't find that ticket anywhere..), it seems like the in-memory stores could 
> benefit from a special "Join" version as well cc/ Guozhang Wang
> Here are some random thoughts:
> 1. For kafka streams processing logic (i.e. without IQ), it's better to make 
> all processing logic relying on point queries rather than range queries. 
> Right now the only processor that use range queries are, as mentioned above, 
> windowed stream-stream joins. I think we should consider using a different 
> window implementation for this (and as a result also get rid of the 
> retainDuplicate flags) to refactor the windowed stream-stream join operation.
> 2. With 1), range queries would only be exposed as IQ. Depending on its usage 
> frequency I think it makes lots of sense to optimize for single-point queries.
> Of course, even without step 1) we should still consider using tree-map for 
> windowed in-memory stores to have a better scaling effect.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9599) create unique sensor to record group rebalance

2020-02-24 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9599.
--
Fix Version/s: 2.4.1
   2.5.0
   Resolution: Fixed

> create unique sensor to record group rebalance
> --
>
> Key: KAFKA-9599
> URL: https://issues.apache.org/jira/browse/KAFKA-9599
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 2.5.0, 2.4.1
>
>
> {code:scala}
>   val offsetDeletionSensor = metrics.sensor("OffsetDeletions")
>   ...
>   val groupCompletedRebalanceSensor = metrics.sensor("OffsetDeletions")
> {code}
> the "offset deletion" and "group rebalance" should not be recorded by the 
> same sensor since they are totally different.
> the code is introduced by KAFKA-8730



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9600) EndTxn handler should check strict epoch equality

2020-02-24 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-9600:
---
Description: 
The EndTxn path in TransactionCoordinator is shared between direct calls to 
EndTxn from the client and internal transaction abort logic. To support the 
latter, the code is written to allow an epoch bump. However, if the client 
bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the 
internal invariants are violated which results in a hanging transaction. 
Specifically, the transaction is left in a pending state because the epoch 
following append to the log does not match what we expect.

To fix this, we should ensure that an EndTxn from the client checks for strict 
epoch equality.

  was:The EndTxn path in TransactionCoordinator is shared between direct calls 
to EndTxn from the client and internal transaction abort logic. To support the 
latter, the code is written to allow an epoch bump. However, if the client 
bumps the epoch unexpectedly (e.g. due to a buggy implementation), then we can 
be left with a hanging transaction. To fix this, we should ensure that an 
EndTxn from the client checks for strict epoch equality.


> EndTxn handler should check strict epoch equality
> -
>
> Key: KAFKA-9600
> URL: https://issues.apache.org/jira/browse/KAFKA-9600
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
>
> The EndTxn path in TransactionCoordinator is shared between direct calls to 
> EndTxn from the client and internal transaction abort logic. To support the 
> latter, the code is written to allow an epoch bump. However, if the client 
> bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the 
> internal invariants are violated which results in a hanging transaction. 
> Specifically, the transaction is left in a pending state because the epoch 
> following append to the log does not match what we expect.
> To fix this, we should ensure that an EndTxn from the client checks for 
> strict epoch equality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9600) EndTxn handler should check strict epoch equality

2020-02-24 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-9600:
--

 Summary: EndTxn handler should check strict epoch equality
 Key: KAFKA-9600
 URL: https://issues.apache.org/jira/browse/KAFKA-9600
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


The EndTxn path in TransactionCoordinator is shared between direct calls to 
EndTxn from the client and internal transaction abort logic. To support the 
latter, the code is written to allow an epoch bump. However, if the client 
bumps the epoch unexpectedly (e.g. due to a buggy implementation), then we can 
be left with a hanging transaction. To fix this, we should ensure that an 
EndTxn from the client checks for strict epoch equality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9562) Streams not making progress under heavy failures with EOS enabled on 2.5 branch

2020-02-24 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043717#comment-17043717
 ] 

Boyang Chen commented on KAFKA-9562:


A status update: we have fixed two sub tasks and one flush call try catch. The 
latest changes are deployed to soak. Will update this ticket once we believe 
2.5 is good to go on stream side.

> Streams not making progress under heavy failures with EOS enabled on 2.5 
> branch
> ---
>
> Key: KAFKA-9562
> URL: https://issues.apache.org/jira/browse/KAFKA-9562
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.5.0
>Reporter: John Roesler
>Assignee: Boyang Chen
>Priority: Blocker
> Fix For: 2.5.0
>
>
> During soak testing in preparation for the 2.5.0 release, we have discovered 
> a case in which Streams appears to stop making progress. Specifically, this 
> is a failure-resilience test in which we inject network faults separating the 
> instances from the brokers roughly every twenty minutes.
> On 2.4, Streams would obviously spend a lot of time rebalancing under this 
> scenario, but would still make progress. However, on the current 2.5 branch, 
> Streams effectively stops making progress except rarely.
> This appears to be a severe regression, so I'm filing this ticket as a 2.5.0 
> release blocker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9581) Deprecate rebalanceException on StreamThread to avoid infinite loop

2020-02-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-9581.

Resolution: Fixed

> Deprecate rebalanceException on StreamThread to avoid infinite loop
> ---
>
> Key: KAFKA-9581
> URL: https://issues.apache.org/jira/browse/KAFKA-9581
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9581) Deprecate rebalanceException on StreamThread to avoid infinite loop

2020-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043671#comment-17043671
 ] 

ASF GitHub Bot commented on KAFKA-9581:
---

guozhangwang commented on pull request #8145: KAFKA-9581: Remove rebalance 
exception withholding
URL: https://github.com/apache/kafka/pull/8145
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Deprecate rebalanceException on StreamThread to avoid infinite loop
> ---
>
> Key: KAFKA-9581
> URL: https://issues.apache.org/jira/browse/KAFKA-9581
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-6435) Application Reset Tool might delete incorrect internal topics

2020-02-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-6435:
---
Labels: bug help-wanted newbie  (was: bug)

> Application Reset Tool might delete incorrect internal topics
> -
>
> Key: KAFKA-6435
> URL: https://issues.apache.org/jira/browse/KAFKA-6435
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, tools
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Priority: Major
>  Labels: bug, help-wanted, newbie
>
> The streams application reset tool, deletes all topic that start with 
> {{-}}.
> If people have two versions of the same application and name them {{"app"}} 
> and {{"app-v2"}}, resetting {{"app"}} would also delete the internal topics 
> of {{"app-v2"}}.
> We either need to disallow the dash in the application ID, or improve the 
> topic identification logic in the reset tool to fix this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9599) create unique sensor to record group rebalance

2020-02-24 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043633#comment-17043633
 ] 

Guozhang Wang commented on KAFKA-9599:
--

Thanks for catching this bug [~chia7712]! Please ping me when you have a PR 
ready.

> create unique sensor to record group rebalance
> --
>
> Key: KAFKA-9599
> URL: https://issues.apache.org/jira/browse/KAFKA-9599
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>
> {code:scala}
>   val offsetDeletionSensor = metrics.sensor("OffsetDeletions")
>   ...
>   val groupCompletedRebalanceSensor = metrics.sensor("OffsetDeletions")
> {code}
> the "offset deletion" and "group rebalance" should not be recorded by the 
> same sensor since they are totally different.
> the code is introduced by KAFKA-8730



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed

2020-02-24 Thread Bill Bejeck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043607#comment-17043607
 ] 

Bill Bejeck commented on KAFKA-9398:


The PR for this ticket was closed as discussions are on-going to what is the 
proper fix.  I've updated the {{Fix Version}} field.

> Kafka Streams main thread may not exit even after close timeout has passed
> --
>
> Key: KAFKA-9398
> URL: https://issues.apache.org/jira/browse/KAFKA-9398
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Critical
>
> Kafka Streams offers the KafkaStreams.close() method when shutting down a 
> Kafka Streams application. There are two overloads to this method, one that 
> takes no parameters and another taking a Duration specifying how long the 
> close() method should block waiting for streams shut down operations to 
> complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE.
> The issue is that if a StreamThread is taking to long to complete or if one 
> of the Consumer or Producer clients is in a hung state, the Kafka Streams 
> application won't exit even after the specified timeout has expired.
> For example, consider this scenario:
>  # A sink topic gets deleted by accident 
>  # The user sets Producer max.block.ms config to a high value
> In this case, the Producer will issue a WARN logging statement and will 
> continue to make metadata requests looking for the expected topic. The 
> {{Producer}} will continue making metadata requests up until the max.block.ms 
> expires. If this value is high enough, calling close() with a timeout won't 
> fix the issue as when the timeout expires, the Kafka Streams application's 
> main thread won't exit.
> To prevent this type of issue, we should call Thread.interrupt() on all 
> StreamThread instances once the close() timeout has expired. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed

2020-02-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-9398:
---
Fix Version/s: (was: 2.5.0)

> Kafka Streams main thread may not exit even after close timeout has passed
> --
>
> Key: KAFKA-9398
> URL: https://issues.apache.org/jira/browse/KAFKA-9398
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Critical
>
> Kafka Streams offers the KafkaStreams.close() method when shutting down a 
> Kafka Streams application. There are two overloads to this method, one that 
> takes no parameters and another taking a Duration specifying how long the 
> close() method should block waiting for streams shut down operations to 
> complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE.
> The issue is that if a StreamThread is taking to long to complete or if one 
> of the Consumer or Producer clients is in a hung state, the Kafka Streams 
> application won't exit even after the specified timeout has expired.
> For example, consider this scenario:
>  # A sink topic gets deleted by accident 
>  # The user sets Producer max.block.ms config to a high value
> In this case, the Producer will issue a WARN logging statement and will 
> continue to make metadata requests looking for the expected topic. The 
> {{Producer}} will continue making metadata requests up until the max.block.ms 
> expires. If this value is high enough, calling close() with a timeout won't 
> fix the issue as when the timeout expires, the Kafka Streams application's 
> main thread won't exit.
> To prevent this type of issue, we should call Thread.interrupt() on all 
> StreamThread instances once the close() timeout has expired. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9599) create unique sensor to record group rebalance

2020-02-24 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-9599:
-

 Summary: create unique sensor to record group rebalance
 Key: KAFKA-9599
 URL: https://issues.apache.org/jira/browse/KAFKA-9599
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


{code:scala}
  val offsetDeletionSensor = metrics.sensor("OffsetDeletions")

  ...

  val groupCompletedRebalanceSensor = metrics.sensor("OffsetDeletions")
{code}

the "offset deletion" and "group rebalance" should not be recorded by the same 
sensor since they are totally different.

the code is introduced by KAFKA-8730



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9541) Flaky Test DescribeConsumerGroupTest#testDescribeGroupMembersWithShortInitializationTimeout

2020-02-24 Thread huxihx (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxihx reassigned KAFKA-9541:
-

Assignee: (was: huxihx)

> Flaky Test 
> DescribeConsumerGroupTest#testDescribeGroupMembersWithShortInitializationTimeout
> ---
>
> Key: KAFKA-9541
> URL: https://issues.apache.org/jira/browse/KAFKA-9541
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.4.0
>Reporter: huxihx
>Priority: Major
>
> h3. Error Message
> java.lang.AssertionError: assertion failed
> h3. Stacktrace
> java.lang.AssertionError: assertion failed at 
> scala.Predef$.assert(Predef.scala:267) at 
> kafka.admin.DescribeConsumerGroupTest.testDescribeGroupMembersWithShortInitializationTimeout(DescribeConsumerGroupTest.scala:630)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at jdk.internal.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>  at jdk.internal.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> 

[jira] [Updated] (KAFKA-9497) Brokers start up even if SASL provider is not loaded and throw NPE when clients connect

2020-02-24 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-9497:
--
Fix Version/s: (was: 2.5.0)
   2.6.0

> Brokers start up even if SASL provider is not loaded and throw NPE when 
> clients connect
> ---
>
> Key: KAFKA-9497
> URL: https://issues.apache.org/jira/browse/KAFKA-9497
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.2, 0.11.0.3, 1.1.1, 2.4.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.6.0
>
>
> Note: This is not a regression, this has been the behaviour since SASL was 
> first implemented in Kafka.
>  
> Sasl.createSaslServer and Sasl.createSaslClient may return null if a SASL 
> provider that works for the specified configs cannot be created. We don't 
> currently handle this case. As a result broker/client throws 
> NullPointerException if a provider has not been loaded. On the broker-side, 
> we allow brokers to start up successfully even if SASL provider for its 
> enabled mechanisms are not found. For SASL mechanisms 
> PLAIN/SCRAM-xx/OAUTHBEARER, the login module in Kafka loads the SASL 
> providers. If the login module is incorrectly configured, brokers startup and 
> then fail client connections when hitting NPE. Clients see disconnections 
> during authentication as a result. It is difficult to tell from the client or 
> broker logs why the failure occurred. We should fail during startup if SASL 
> providers are not found and provide better diagnostics for this case.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)