[jira] [Created] (KAFKA-9475) Replace transaction abortion scheduler with a delayed queue

2020-01-24 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9475:
--

 Summary: Replace transaction abortion scheduler with a delayed 
queue
 Key: KAFKA-9475
 URL: https://issues.apache.org/jira/browse/KAFKA-9475
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


Although we could try setting the txn timeout to 10 seconds, the purging 
scheduler only runs at one-minute intervals, so in the worst case we would 
still wait for 1 minute. We are considering several potential fixes:
 # Change the interval to 10 seconds: this means 6X more frequent checking and 
more read contention on the txn metadata. The benefit here is an easy one-line 
fix without correctness concerns.
 # Use an existing delayed queue, a.k.a. the purgatory. From what I heard, the 
purgatory needs at least 2 extra threads to work properly, plus some added 
memory and complexity overhead. The benefit here is a more precise timeout 
reaction, without a redundant full metadata read lock.
 # Create a new delayed queue. This could be done using the Scala delayed 
queue; the concern here is whether this approach is production-ready. The 
benefits are the same as option 2, potentially with less code complexity.

This ticket is to track the progress of option 2, if we decide to go down this 
path eventually.
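
For a sense of what option 3 (a new delayed queue) could look like, here is a 
minimal sketch on top of the JDK's java.util.concurrent.DelayQueue; the entry 
class and the abort hook are hypothetical, not actual coordinator code:

{code:java}
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class TxnAbortQueueSketch {

    // Hypothetical entry: a transaction to abort once its timeout elapses.
    static class ExpiringTxn implements Delayed {
        final String transactionalId;
        final long deadlineMs;

        ExpiringTxn(String transactionalId, long timeoutMs) {
            this.transactionalId = transactionalId;
            this.deadlineMs = System.currentTimeMillis() + timeoutMs;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(deadlineMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<ExpiringTxn> queue = new DelayQueue<>();
        queue.put(new ExpiringTxn("txn-1", 10_000L));
        // take() blocks until the head entry's delay has elapsed, so a reaper
        // thread wakes up at (roughly) the exact deadline instead of scanning
        // all transaction metadata once a minute under a read lock.
        ExpiringTxn expired = queue.take();
        System.out.println("would abort " + expired.transactionalId);
    }
}
{code}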



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9447) Add examples for EOS standalone and group mode under kafka/examples

2020-01-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-9447:
--

Assignee: Boyang Chen

> Add examples for EOS standalone and group mode under kafka/examples
> ---
>
> Key: KAFKA-9447
> URL: https://issues.apache.org/jira/browse/KAFKA-9447
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Although we have integration tests for the EOS model, it would be best to also 
> put examples under kafka/examples for people to use.
> Also, considering the comment here: 
> [https://github.com/apache/kafka/pull/7952#discussion_r368313968] it would be 
> important for us to exercise the API before jumping to conclusions about the 
> best option we have at hand.
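
For reference, a standalone-mode EOS example would look roughly like the 
sketch below; the topic name and bootstrap address are placeholders, not the 
eventual kafka/examples code:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EosStandaloneSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Setting a transactional.id enables idempotence and transactions.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "eos-example");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("output-topic", "key", "value"));
            // All records in the transaction become visible atomically.
            producer.commitTransaction();
        }
    }
}
{code}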



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9254) Updating Kafka Broker configuration dynamically twice reverts log configuration to default

2020-01-24 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-9254:
---
Affects Version/s: 2.1.1
   2.2.2
   2.4.0
   2.3.1

> Updating Kafka Broker configuration dynamically twice reverts log 
> configuration to default
> --
>
> Key: KAFKA-9254
> URL: https://issues.apache.org/jira/browse/KAFKA-9254
> Project: Kafka
>  Issue Type: Bug
>  Components: config, log, replication
>Affects Versions: 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
>Reporter: fenghong
>Assignee: huxihx
>Priority: Critical
> Fix For: 2.0.2, 2.1.2, 2.2.3, 2.3.2, 2.4.1
>
>
> We are engineers at Huobi and have encountered a Kafka bug: 
> modifying DynamicBrokerConfig more than two times invalidates unrelated 
> topic-level configuration.
> The bug can be reproduced as follows:
>  # Set the broker config min.insync.replicas=3 in server.properties
>  # Create topic test-1 and set the topic-level config min.insync.replicas=2
>  # Dynamically modify the configuration twice as shown below
> {code:java}
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.message.timestamp.type=LogAppendTime
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.retention.ms=60480
> {code}
>  # Stop a Kafka server and observe the exception shown below
>  org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync 
> replicas for partition test-1-0 is [2], below required minimum [3]
>  
>  
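
The same two default-broker updates can also be issued programmatically; a 
minimal sketch with the Java AdminClient (the bootstrap address is a 
placeholder, and the values mirror the commands above):

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DynamicDefaultConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // An empty resource name addresses the cluster-wide broker default,
            // the equivalent of --entity-default in kafka-configs.sh.
            ConfigResource brokerDefault =
                new ConfigResource(ConfigResource.Type.BROKER, "");
            admin.incrementalAlterConfigs(Map.of(brokerDefault, List.of(
                new AlterConfigOp(
                    new ConfigEntry("log.message.timestamp.type", "LogAppendTime"),
                    AlterConfigOp.OpType.SET)))).all().get();
            admin.incrementalAlterConfigs(Map.of(brokerDefault, List.of(
                new AlterConfigOp(
                    new ConfigEntry("log.retention.ms", "60480"),
                    AlterConfigOp.OpType.SET)))).all().get();
        }
    }
}
{code}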



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9254) Updating Kafka Broker configuration dynamically twice reverts log configuration to default

2020-01-24 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9254.

Fix Version/s: 2.4.1
   2.3.2
   2.2.3
   2.1.2
   2.0.2
   Resolution: Fixed

> Updating Kafka Broker configuration dynamically twice reverts log 
> configuration to default
> --
>
> Key: KAFKA-9254
> URL: https://issues.apache.org/jira/browse/KAFKA-9254
> Project: Kafka
>  Issue Type: Bug
>  Components: config, log, replication
>Affects Versions: 2.0.1
>Reporter: fenghong
>Assignee: huxihx
>Priority: Critical
> Fix For: 2.0.2, 2.1.2, 2.2.3, 2.3.2, 2.4.1
>
>
> We are engineers at Huobi and now encounter Kafka BUG 
> Modifying DynamicBrokerConfig more than 2 times will invalidate the topic 
> level unrelated configuration
> The bug reproduction method as follows:
>  # Set Kafka Broker config  server.properties min.insync.replicas=3
>  # Create topic test-1 and set topic‘s level config min.insync.replicas=2
>  # Dynamically modify the configuration twice as shown below
> {code:java}
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.message.timestamp.type=LogAppendTime
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.retention.ms=60480
> {code}
>  # stop a Kafka Server and found the Exception as shown below
>  org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync 
> replicas for partition test-1-0 is [2], below required minimum [3]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9473) A large number of core system tests failing due to Kafka server failed to start on trunk

2020-01-24 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023343#comment-17023343
 ] 

Ismael Juma commented on KAFKA-9473:


I think 
[https://github.com/apache/kafka/commit/a3509c0870230bcc1af4efbfafcf9f69d7cf55fd]
 may have fixed this.

> A large number of core system tests failing due to Kafka server failed to 
> start on trunk
> 
>
> Key: KAFKA-9473
> URL: https://issues.apache.org/jira/browse/KAFKA-9473
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Critical
>
> By running a full set of core system tests, we detected 38/166 test failures 
> which are due to 
> `FAIL: Kafka server didn't finish startup in 60 seconds`
> This needs further investigation.
> [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9472) Reducing number of tasks for connector causes deleted tasks to show as UNASSIGNED

2020-01-24 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023325#comment-17023325
 ] 

Konstantine Karantasis commented on KAFKA-9472:
---

Sounds good. Thanks for taking a closer look

> Reducing number of tasks for connector causes deleted tasks to show as 
> UNASSIGNED
> -
>
> Key: KAFKA-9472
> URL: https://issues.apache.org/jira/browse/KAFKA-9472
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Chris Egerton
>Priority: Major
>
> If a connector is successfully created with {{t1}} running tasks and then 
> reconfigured to use {{t1 - n}} tasks (where {{t1}} and {{n}} are both whole 
> numbers and {{n}} is strictly less than {{t1}}), the connector should then 
> list {{t1 - n}} total tasks in its status (which can be queried via the 
> {{/connectors/:name:/status}} endpoint or the {{/connectors}} endpoint with 
> the {{expand}} URL query parameter set to {{status}}).
> However, the connector will instead continue to list {{t1}} total tasks in 
> its status, with {{n}} of them being listed as {{UNASSIGNED}} and the 
> remaining {{t1 - n}} of them being listed as {{STARTED}}.
> This is because the only time a task status is removed from the status 
> backing store (as opposed to simply being updated to {{UNASSIGNED}}) is when 
> its connector is deleted. See relevant code snippets from the 
> [AbstractHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L187-L192]
>  and 
> [DistributedHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1511-L1520]
>  classes.
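
The symptom is easy to observe by querying the status endpoint after shrinking 
the task count; a minimal sketch with the JDK HTTP client (the worker URL and 
connector name are placeholders):

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorStatusCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/my-connector/status"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // After reducing tasks.max from t1 to t1 - n, the returned JSON still
        // lists t1 task entries, n of them stuck in state UNASSIGNED.
        System.out.println(response.body());
    }
}
{code}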



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8382) Add TimestampedSessionStore

2020-01-24 Thread highluck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

highluck reassigned KAFKA-8382:
---

Assignee: highluck

> Add TimestampedSessionStore
> ---
>
> Key: KAFKA-8382
> URL: https://issues.apache.org/jira/browse/KAFKA-8382
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: highluck
>Priority: Minor
>  Labels: kip
>
> Follow up to KIP-258, to complete the KIP by adding TimestampedSessionStores.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9254) Updating Kafka Broker configuration dynamically twice reverts log configuration to default

2020-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023315#comment-17023315
 ] 

ASF GitHub Bot commented on KAFKA-9254:
---

hachikuji commented on pull request #7870: KAFKA-9254: Overridden topic configs 
are reset after dynamic default change
URL: https://github.com/apache/kafka/pull/7870
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Updating Kafka Broker configuration dynamically twice reverts log 
> configuration to default
> --
>
> Key: KAFKA-9254
> URL: https://issues.apache.org/jira/browse/KAFKA-9254
> Project: Kafka
>  Issue Type: Bug
>  Components: config, log, replication
>Affects Versions: 2.0.1
>Reporter: fenghong
>Assignee: huxihx
>Priority: Critical
>
> We are engineers at Huobi and have encountered a Kafka bug: 
> modifying DynamicBrokerConfig more than two times invalidates unrelated 
> topic-level configuration.
> The bug can be reproduced as follows:
>  # Set the broker config min.insync.replicas=3 in server.properties
>  # Create topic test-1 and set the topic-level config min.insync.replicas=2
>  # Dynamically modify the configuration twice as shown below
> {code:java}
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.message.timestamp.type=LogAppendTime
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.retention.ms=60480
> {code}
>  # Stop a Kafka server and observe the exception shown below
>  org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync 
> replicas for partition test-1-0 is [2], below required minimum [3]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8843) Zookeeper migration tool support for TLS

2020-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023299#comment-17023299
 ] 

ASF GitHub Bot commented on KAFKA-8843:
---

rondagostino commented on pull request #8003: KAFKA-8843: KIP-515: Zookeeper 
TLS support
URL: https://github.com/apache/kafka/pull/8003
 
 
   Signed-off-by: Ron Dagostino 
   
   *More detailed description of your change,
   if necessary. The PR title and PR message become
   the squashed commit message, so use a separate
   comment to ping reviewers.*
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Zookeeper migration tool support for TLS
> 
>
> Key: KAFKA-8843
> URL: https://issues.apache.org/jira/browse/KAFKA-8843
> Project: Kafka
>  Issue Type: Bug
>Reporter: Pere Urbon-Bayes
>Assignee: Pere Urbon-Bayes
>Priority: Minor
>
> Currently the zookeeper-migration tool works based on SASL authentication, 
> which means only digest and Kerberos authentication are supported.
>  
> With the introduction of ZK 3.5, TLS is added, including a new X509 
> authentication provider. 
>  
> To support this great feature and utilise the TLS principals, the 
> zookeeper-migration-tool script should support X509 authentication as 
> well.
>  
> In my newbie view, this should mean adding a new parameter to allow other 
> ways of authentication around 
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ZkSecurityMigrator.scala#L65]
>  
> If I understand the process correctly, this will require a KIP, right?
>  
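
For context, the client-side TLS settings a migration run would likely need 
with ZooKeeper 3.5 look roughly like the sketch below; the property names 
follow ZooKeeper's TLS documentation, the paths and passwords are 
placeholders, and the migrator flag itself is what this ticket proposes:

{code:java}
public class ZkTlsClientSketch {
    public static void main(String[] args) {
        // ZooKeeper 3.5 client-side TLS is driven by system properties and
        // requires the Netty connection socket.
        System.setProperty("zookeeper.client.secure", "true");
        System.setProperty("zookeeper.clientCnxnSocket",
                           "org.apache.zookeeper.ClientCnxnSocketNetty");
        System.setProperty("zookeeper.ssl.keyStore.location",
                           "/path/to/client.keystore.jks");
        System.setProperty("zookeeper.ssl.keyStore.password", "changeit");
        System.setProperty("zookeeper.ssl.trustStore.location",
                           "/path/to/client.truststore.jks");
        System.setProperty("zookeeper.ssl.trustStore.password", "changeit");
        // With these set, the server-side X509 authentication provider can map
        // the client certificate to a principal for setting ACLs.
    }
}
{code}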



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9472) Reducing number of tasks for connector causes deleted tasks to show as UNASSIGNED

2020-01-24 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023297#comment-17023297
 ] 

Chris Egerton commented on KAFKA-9472:
--

After looking into this a bit I suspect the root cause is different... it looks 
like KAFKA-8869 is caused by not removing task configurations for deleted 
connectors from the task config map, whereas this issue is caused by not 
removing the statuses for no-longer-needed tasks from the status backing store 
when the number of tasks for a connector is reduced, but the connector is still 
running.

From what I can tell, this issue does not arise when a connector is deleted 
(the task statuses are correctly removed from the backing store), and 
KAFKA-8869 does not arise when the number of tasks for a connector is reduced 
(the task reduction is correctly reflected in the values for the task config 
map).

> Reducing number of tasks for connector causes deleted tasks to show as 
> UNASSIGNED
> -
>
> Key: KAFKA-9472
> URL: https://issues.apache.org/jira/browse/KAFKA-9472
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Chris Egerton
>Priority: Major
>
> If a connector is successfully created with {{t1}} running tasks and then 
> reconfigured to use {{t1 - n}} tasks (where {{t1}} and {{n}} are both whole 
> numbers and {{n}} is strictly less than {{t1}}), the connector should then 
> list {{t1 - n}} total tasks in its status (which can be queried via the 
> {{/connectors/:name:/status}} endpoint or the {{/connectors}} endpoint with 
> the {{expand}} URL query parameter set to {{status}}).
> However, the connector will instead continue to list {{t1}} total tasks in 
> its status, with {{n}} of them being listed as {{UNASSIGNED}} and the 
> remaining {{t1 - n}} of them being listed as {{STARTED}}.
> This is because the only time a task status is removed from the status 
> backing store (as opposed to simply being updated to {{UNASSIGNED}}) is when 
> its connector is deleted. See relevant code snippets from the 
> [AbstractHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L187-L192]
>  and 
> [DistributedHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1511-L1520]
>  classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9473) A large number of core system tests failing due to Kafka server failed to start on trunk

2020-01-24 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023287#comment-17023287
 ] 

Boyang Chen commented on KAFKA-9473:


```
[2020-01-24 07:53:17,594] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Connection refused (Connection refused)
        at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
        at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
        at org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:97)
        at kafka.network.Processor.<init>(SocketServer.scala:724)
        at kafka.network.SocketServer.newProcessor(SocketServer.scala:367)
        at kafka.network.SocketServer.$anonfun$addDataPlaneProcessors$1(SocketServer.scala:252)
        at kafka.network.SocketServer.addDataPlaneProcessors(SocketServer.scala:251)
        at kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1(SocketServer.scala:214)
        at kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1$adapted(SocketServer.scala:211)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at kafka.network.SocketServer.createDataPlaneAcceptorsAndProcessors(SocketServer.scala:211)
        at kafka.network.SocketServer.startup(SocketServer.scala:122)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:242)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
        at kafka.Kafka$.main(Kafka.scala:84)
        at kafka.Kafka.main(Kafka.scala)
Caused by: javax.security.auth.login.LoginException: Connection refused (Connection refused)
        at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
        at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
```

> A large number of core system tests failing due to Kafka server failed to 
> start on trunk
> 
>
> Key: KAFKA-9473
> URL: https://issues.apache.org/jira/browse/KAFKA-9473
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Critical
>
> By running a full set of core system tests, we detected 38/166 test failures 
> which are due to 
> `FAIL: Kafka server didn't finish startup in 60 seconds`
> This needs further investigation.
> [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9152) Improve Sensor Retrieval

2020-01-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck resolved KAFKA-9152.

Resolution: Fixed

> Improve Sensor Retrieval 
> -
>
> Key: KAFKA-9152
> URL: https://issues.apache.org/jira/browse/KAFKA-9152
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: highluck
>Priority: Minor
>  Labels: newbie, tech-debt
> Fix For: 2.5.0
>
>
> This ticket shall improve two aspects of the retrieval of sensors:
> 1. Currently, when a sensor is retrieved with {{*Metrics.*Sensor()}} (e.g. 
> {{ThreadMetrics.createTaskSensor()}}) after it was created with the same 
> method {{*Metrics.*Sensor()}}, the sensor is added again to the corresponding 
> queue in {{*Sensors}} (e.g. {{threadLevelSensors}}) in 
> {{StreamsMetricsImpl}}. Those queues are used to remove the sensors when 
> {{removeAll*LevelSensors()}} is called. Having the same sensor multiple 
> times in a queue is not an issue from a correctness point of view, but 
> storing each sensor only once would reduce the footprint of those queues.
> 2. When a sensor is retrieved, the current code attempts to create a new 
> sensor and to add the corresponding metrics to it again. This could be 
> avoided.
>  
> Both aspects could be improved by checking whether a sensor already exists by 
> calling {{getSensor()}} on the {{Metrics}} object and checking the return 
> value.
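
A minimal sketch of the suggested check, reusing an existing sensor via 
Metrics.getSensor() instead of re-registering it (the sensor name is 
illustrative):

{code:java}
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;

public class SensorRetrievalSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // getSensor() returns null when no sensor with this name exists yet.
        Sensor sensor = metrics.getSensor("task-created");
        if (sensor == null) {
            // Only on a miss do we create the sensor, attach its metrics, and
            // enqueue it in the level-specific removal queue.
            sensor = metrics.sensor("task-created");
        }
        // On a hit the sensor is returned as-is, avoiding both a duplicate
        // entry in the removal queue and re-adding the same metrics.
        System.out.println(sensor.name());
        metrics.close();
    }
}
{code}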



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9152) Improve Sensor Retrieval

2020-01-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-9152:
---
Fix Version/s: 2.5.0

> Improve Sensor Retrieval 
> -
>
> Key: KAFKA-9152
> URL: https://issues.apache.org/jira/browse/KAFKA-9152
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: highluck
>Priority: Minor
>  Labels: newbie, tech-debt
> Fix For: 2.5.0
>
>
> This ticket shall improve two aspects of the retrieval of sensors:
> 1. Currently, when a sensor is retrieved with {{*Metrics.*Sensor()}} (e.g. 
> {{ThreadMetrics.createTaskSensor()}}) after it was created with the same 
> method {{*Metrics.*Sensor()}}, the sensor is added again to the corresponding 
> queue in {{*Sensors}} (e.g. {{threadLevelSensors}}) in 
> {{StreamsMetricsImpl}}. Those queues are used to remove the sensors when 
> {{removeAll*LevelSensors()}} is called. Having the same sensor multiple 
> times in a queue is not an issue from a correctness point of view, but 
> storing each sensor only once would reduce the footprint of those queues.
> 2. When a sensor is retrieved, the current code attempts to create a new 
> sensor and to add the corresponding metrics to it again. This could be 
> avoided.
>  
> Both aspects could be improved by checking whether a sensor already exists by 
> calling {{getSensor()}} on the {{Metrics}} object and checking the return 
> value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9152) Improve Sensor Retrieval

2020-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023283#comment-17023283
 ] 

ASF GitHub Bot commented on KAFKA-9152:
---

bbejeck commented on pull request #7928: KAFKA-9152; Improve Sensor Retrieval
URL: https://github.com/apache/kafka/pull/7928
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Sensor Retrieval 
> -
>
> Key: KAFKA-9152
> URL: https://issues.apache.org/jira/browse/KAFKA-9152
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: highluck
>Priority: Minor
>  Labels: newbie, tech-debt
>
> This ticket shall improve two aspects of the retrieval of sensors:
> 1. Currently, when a sensor is retrieved with {{*Metrics.*Sensor()}} (e.g. 
> {{ThreadMetrics.createTaskSensor()}}) after it was created with the same 
> method {{*Metrics.*Sensor()}}, the sensor is added again to the corresponding 
> queue in {{*Sensors}} (e.g. {{threadLevelSensors}}) in 
> {{StreamsMetricsImpl}}. Those queues are used to remove the sensors when 
> {{removeAll*LevelSensors()}} is called. Having the same sensor multiple 
> times in a queue is not an issue from a correctness point of view, but 
> storing each sensor only once would reduce the footprint of those queues.
> 2. When a sensor is retrieved, the current code attempts to create a new 
> sensor and to add the corresponding metrics to it again. This could be 
> avoided.
>  
> Both aspects could be improved by checking whether a sensor already exists by 
> calling {{getSensor()}} on the {{Metrics}} object and checking the return 
> value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9472) Reducing number of tasks for connector causes deleted tasks to show as UNASSIGNED

2020-01-24 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023280#comment-17023280
 ] 

Konstantine Karantasis commented on KAFKA-9472:
---

I overlooked the backing stores. But yes, I'm suspecting a common root cause, 
which has to do with the fact that we don't remove the task from the config 
backing store, which would then be reflected in the status backing store. More 
investigation might be warranted; that's why I didn't close the ticket. But I 
wanted to bring up that this sounded relevant to KAFKA-8869. 

> Reducing number of tasks for connector causes deleted tasks to show as 
> UNASSIGNED
> -
>
> Key: KAFKA-9472
> URL: https://issues.apache.org/jira/browse/KAFKA-9472
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Chris Egerton
>Priority: Major
>
> If a connector is successfully created with {{t1}} running tasks and then 
> reconfigured to use {{t1 - n}} tasks (where {{t1}} and {{n}} are both whole 
> numbers and {{n}} is strictly less than {{t1}}), the connector should then 
> list {{t1 - n}} total tasks in its status (which can be queried via the 
> {{/connectors/:name:/status}} endpoint or the {{/connectors}} endpoint with 
> the {{expand}} URL query parameter set to {{status}}).
> However, the connector will instead continue to list {{t1}} total tasks in 
> its status, with {{n}} of them being listed as {{UNASSIGNED}} and the 
> remaining {{t1 - n}} of them being listed as {{STARTED}}.
> This is because the only time a task status is removed from the status 
> backing store (as opposed to simply being updated to {{UNASSIGNED}}) is when 
> its connector is deleted. See relevant code snippets from the 
> [AbstractHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L187-L192]
>  and 
> [DistributedHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1511-L1520]
>  classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9472) Reducing number of tasks for connector causes deleted tasks to show as UNASSIGNED

2020-01-24 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023271#comment-17023271
 ] 

Chris Egerton commented on KAFKA-9472:
--

[~kkonstantine] the issues may be similar but I don't see how this is a 
duplicate. KAFKA-8869 appears to be related to the config backing store whereas 
this issue is related to the status backing store. Do you suspect a shared root 
cause that is not explicitly mentioned in either ticket?

> Reducing number of tasks for connector causes deleted tasks to show as 
> UNASSIGNED
> -
>
> Key: KAFKA-9472
> URL: https://issues.apache.org/jira/browse/KAFKA-9472
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Chris Egerton
>Priority: Major
>
> If a connector is successfully created with {{t1}} running tasks and then 
> reconfigured to use {{t1 - n}} tasks (where {{t1}} and {{n}} are both whole 
> numbers and {{n}} is strictly less than {{t1}}), the connector should then 
> list {{t1 - n}} total tasks in its status (which can be queried via the 
> {{/connectors/:name:/status}} endpoint or the {{/connectors}} endpoint with 
> the {{expand}} URL query parameter set to {{status}}).
> However, the connector will instead continue to list {{t1}} total tasks in 
> its status, with {{n}} of them being listed as {{UNASSIGNED}} and the 
> remaining {{t1 - n}} of them being listed as {{STARTED}}.
> This is because the only time a task status is removed from the status 
> backing store (as opposed to simply being updated to {{UNASSIGNED}}) is when 
> its connector is deleted. See relevant code snippets from the 
> [AbstractHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L187-L192]
>  and 
> [DistributedHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1511-L1520]
>  classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-8803:
--

Assignee: Sophie Blee-Goldman  (was: Boyang Chen)

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Attachments: logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same process as the stream that consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.
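
For reference, raising the producer timeout the error message points at can be 
done from a Streams app roughly as below; the value is illustrative:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class MaxBlockMsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // producerPrefix() scopes the setting to the producers Streams creates:
        // wait up to 2 minutes (instead of the 60-second default) for calls
        // such as InitProducerId before timing out.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG),
                  120_000);
        // new KafkaStreams(topology, props) would pick this up.
    }
}
{code}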



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9472) Reducing number of tasks for connector causes deleted tasks to show as UNASSIGNED

2020-01-24 Thread Konstantine Karantasis (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023264#comment-17023264
 ] 

Konstantine Karantasis commented on KAFKA-9472:
---

Thanks for reporting [~ChrisEgerton]
I’m suspecting this ticket might be a duplicate of: 
https://issues.apache.org/jira/browse/KAFKA-8869

> Reducing number of tasks for connector causes deleted tasks to show as 
> UNASSIGNED
> -
>
> Key: KAFKA-9472
> URL: https://issues.apache.org/jira/browse/KAFKA-9472
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 
> 2.4.0, 2.3.1
>Reporter: Chris Egerton
>Priority: Major
>
> If a connector is successfully created with {{t1}} running tasks and then 
> reconfigured to use {{t1 - n}} tasks (where {{t1}} and {{n}} are both whole 
> numbers and {{n}} is strictly less than {{t1}}), the connector should then 
> list {{t1 - n}} total tasks in its status (which can be queried via the 
> {{/connectors/:name:/status}} endpoint or the {{/connectors}} endpoint with 
> the {{expand}} URL query parameter set to {{status}}).
> However, the connector will instead continue to list {{t1}} total tasks in 
> its status, with {{n}} of them being listed as {{UNASSIGNED}} and the 
> remaining {{t1 - n}} of them being listed as {{STARTED}}.
> This is because the only time a task status is removed from the status 
> backing store (as opposed to simply being updated to {{UNASSIGNED}}) is when 
> its connector is deleted. See relevant code snippets from the 
> [AbstractHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L187-L192]
>  and 
> [DistributedHerder|https://github.com/apache/kafka/blob/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1511-L1520]
>  classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9474) Kafka RPC protocol should support type 'double'

2020-01-24 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9474:
--

 Summary: Kafka RPC protocol should support type 'double'
 Key: KAFKA-9474
 URL: https://issues.apache.org/jira/browse/KAFKA-9474
 Project: Kafka
  Issue Type: Improvement
Reporter: Brian Byrne
Assignee: Brian Byrne


Should be fairly straightforward. Useful for KIP-546.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9254) Updating Kafka Broker configuration dynamically twice reverts log configuration to default

2020-01-24 Thread Vikas Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Singh updated KAFKA-9254:
---
Summary: Updating Kafka Broker configuration dynamically twice reverts log 
configuration to default  (was: Topic level configuration failed)

> Updating Kafka Broker configuration dynamically twice reverts log 
> configuration to default
> --
>
> Key: KAFKA-9254
> URL: https://issues.apache.org/jira/browse/KAFKA-9254
> Project: Kafka
>  Issue Type: Bug
>  Components: config, log, replication
>Affects Versions: 2.0.1
>Reporter: fenghong
>Assignee: huxihx
>Priority: Critical
>
> We are engineers at Huobi and have encountered a Kafka bug: 
> modifying DynamicBrokerConfig more than two times invalidates unrelated 
> topic-level configuration.
> The bug can be reproduced as follows:
>  # Set the broker config min.insync.replicas=3 in server.properties
>  # Create topic test-1 and set the topic-level config min.insync.replicas=2
>  # Dynamically modify the configuration twice as shown below
> {code:java}
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.message.timestamp.type=LogAppendTime
> bin/kafka-configs.sh --bootstrap-server xxx:9092 --entity-type brokers 
> --entity-default --alter --add-config log.retention.ms=60480
> {code}
>  # Stop a Kafka server and observe the exception shown below
>  org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync 
> replicas for partition test-1-0 is [2], below required minimum [3]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-9462.
--
  Reviewer: Randall Hauch
Resolution: Fixed

Thanks, [~yuzhih...@gmail.com]. Merged to the `trunk` branch. IMO backporting 
is not really warranted, since this is a very minor change to an exception 
message.

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Trivial
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9462:
-
Priority: Trivial  (was: Minor)

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Trivial
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9462:
-
Issue Type: Bug  (was: Improvement)

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reassigned KAFKA-9462:


Assignee: Ted Yu

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9462:
-
Issue Type: Improvement  (was: Bug)

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9462:
-
Affects Version/s: 2.4.0

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9462:
-
Fix Version/s: 2.5.0

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023221#comment-17023221
 ] 

ASF GitHub Bot commented on KAFKA-9462:
---

rhauch commented on pull request #7995: KAFKA-9462: Correct exception message 
in DistributedHerder
URL: https://github.com/apache/kafka/pull/7995
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception messages.
> This task corrects the messages to reflect the actual conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9462) Correct exception message in DistributedHerder

2020-01-24 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-9462:
-
Component/s: KafkaConnect

> Correct exception message in DistributedHerder
> --
>
> Key: KAFKA-9462
> URL: https://issues.apache.org/jira/browse/KAFKA-9462
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 2.5.0
>
>
> There are a few exception messages in DistributedHerder which were copied 
> from other exception message.
> This task corrects the messages to reflect actual condition



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9473) A large number of core system tests failing due to Kafka server failed to start on trunk

2020-01-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9473:
---
Description: 
By running a full set of core system tests, we detected 38/166 test failures 
which are due to 

`FAIL: Kafka server didn't finish startup in 60 seconds`

need further investigation on this.

[https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]

  was:
By running a full set of core system 
[tests|[https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]],
 we detected 38/166 test failures which are due to 

`FAIL: Kafka server didn't finish startup in 60 seconds`

need further investigation on this.


> A large number of core system tests failing due to Kafka server failed to 
> start on trunk
> 
>
> Key: KAFKA-9473
> URL: https://issues.apache.org/jira/browse/KAFKA-9473
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Critical
>
> By running a full set of core system tests, we detected 38/166 test failures 
> which are due to 
> `FAIL: Kafka server didn't finish startup in 60 seconds`
> need further investigation on this.
> [https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9473) A large number of core system tests failing due to Kafka server failed to start on trunk

2020-01-24 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9473:
---
Description: 
By running a full set of core system 
[tests|[https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]],
 we detected 38/166 test failures which are due to 

`FAIL: Kafka server didn't finish startup in 60 seconds`

need further investigation on this.

  was:
By running a full set of core system tests, we detected 38/166 test failures 
which are due to 

`FAIL: Kafka server didn't finish startup in 60 seconds`

need further investigation on this.


> A large number of core system tests failing due to Kafka server failed to 
> start on trunk
> 
>
> Key: KAFKA-9473
> URL: https://issues.apache.org/jira/browse/KAFKA-9473
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Critical
>
> By running a full set of core system 
> [tests|[https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3701/]],
>  we detected 38/166 test failures which are due to 
> `FAIL: Kafka server didn't finish startup in 60 seconds`
> need further investigation on this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9473) A large number of core system tests failing due to Kafka server failed to start on trunk

2020-01-24 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9473:
--

 Summary: A large number of core system tests failing due to Kafka 
server failed to start on trunk
 Key: KAFKA-9473
 URL: https://issues.apache.org/jira/browse/KAFKA-9473
 Project: Kafka
  Issue Type: Bug
Reporter: Boyang Chen


By running a full set of core system tests, we detected 38/166 test failures 
which are due to 

`FAIL: Kafka server didn't finish startup in 60 seconds`

This needs further investigation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7317) Use collections subscription for main consumer to reduce metadata

2020-01-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-7317:
---
Fix Version/s: 2.5.0

> Use collections subscription for main consumer to reduce metadata
> -
>
> Key: KAFKA-7317
> URL: https://issues.apache.org/jira/browse/KAFKA-7317
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Fix For: 2.5.0
>
>
> In KAFKA-4633 we switched from "collection subscription" to "pattern 
> subscription" for `Consumer#subscribe()` to avoid triggering auto topic 
> creation on the broker. In KAFKA-5291, the metadata request was extended to 
> overwrite the broker config within the request itself. However, this feature 
> is only used in `KafkaAdminClient`. KAFKA-7320 adds this feature for the 
> consumer client, too.
> This ticket proposes to use the new feature within Kafka Streams to allow the 
> usage of collection-based subscription in consumer and admin clients to 
> reduce the metadata response size, which can be very large for a large number 
> of partitions in the cluster.
> Note that Streams needs to be able to distinguish if it connects to older 
> brokers that do not support the new metadata request, and still use pattern 
> subscription in that case.
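
To illustrate the two subscription modes in question (topic names and configs 
are placeholders):

{code:java}
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SubscriptionModesSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "streams-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Pattern subscription: the consumer fetches metadata for all topics
            // and filters client side, so responses grow with cluster size.
            consumer.subscribe(Pattern.compile("input-.*"));

            // Collection subscription: metadata is requested only for the named
            // topics, but before KAFKA-7320 it could trigger broker-side
            // auto topic creation.
            consumer.subscribe(List.of("input-a", "input-b"));
        }
    }
}
{code}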



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7317) Use collections subscription for main consumer to reduce metadata

2020-01-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck resolved KAFKA-7317.

Resolution: Fixed

> Use collections subscription for main consumer to reduce metadata
> -
>
> Key: KAFKA-7317
> URL: https://issues.apache.org/jira/browse/KAFKA-7317
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Fix For: 2.5.0
>
>
> In KAFKA-4633 we switched from "collection subscription" to "pattern 
> subscription" for `Consumer#subscribe()` to avoid triggering auto topic 
> creation on the broker. In KAFKA-5291, the metadata request was extended to 
> overwrite the broker config within the request itself. However, this feature 
> is only used in `KafkaAdminClient`. KAFKA-7320 adds this feature for the 
> consumer client, too.
> This ticket proposes to use the new feature within Kafka Streams to allow the 
> usage of collection-based subscription in consumer and admin clients to 
> reduce the metadata response size, which can be very large for a large number 
> of partitions in the cluster.
> Note that Streams needs to be able to distinguish if it connects to older 
> brokers that do not support the new metadata request, and still use pattern 
> subscription in that case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8821) Avoid pattern subscription to allow for stricter ACL settings

2020-01-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck resolved KAFKA-8821.

Resolution: Fixed

Resolved via [https://github.com/apache/kafka/pull/7969]

> Avoid pattern subscription to allow for stricter ACL settings
> -
>
> Key: KAFKA-8821
> URL: https://issues.apache.org/jira/browse/KAFKA-8821
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sophie Blee-Goldman
>Priority: Minor
> Fix For: 2.5.0
>
>
> To avoid triggering auto topic creation (if `auto.create.topic.enable=true` 
> on the brokers), Kafka Streams uses consumer pattern subscription. For this 
> case, the consumer requests all metadata from the brokers and does 
> client-side filtering.
> However, if users want to set ACLs to restrict a Kafka Streams application, 
> this may result in broker-side ERROR logs that some metadata cannot be 
> provided. The only way to avoid those broker-side ERROR logs is to grant the 
> corresponding permissions.
> As of the 2.3 release it's possible to disable auto topic creation client 
> side (via https://issues.apache.org/jira/browse/KAFKA-7320). Kafka Streams 
> should use this new feature (note that broker version 0.11 is required) to 
> allow users to set strict ACLs without getting flooded with ERROR logs on the 
> broker.
> The proposal is that by default Kafka Streams disables auto topic creation 
> client side (optimistically) and uses regular subscription (not pattern 
> subscription). If an older broker is used, users need to explicitly enable 
> `allow.auto.create.topic` client side. If we detect this setting, we switch 
> back to pattern-based subscription.
> If users don't enable auto topic creation client side and run with an older 
> broker, we would just rethrow the exception to the user, adding some context 
> information on how to fix the issue. 
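
The client-side switch being referred to (shipped as allow.auto.create.topics 
in the consumer config) is used roughly as below; the configs are placeholders:

{code:java}
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NoAutoCreateSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "streams-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // With client-side auto-creation disabled (KIP-361, brokers >= 0.11),
        // plain collection subscription no longer risks creating missing
        // topics, so the broad metadata/ACL footprint of pattern subscription
        // can be avoided.
        props.put(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG, false);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("input-topic"));
        }
    }
}
{code}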



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8821) Avoid pattern subscription to allow for stricter ACL settings

2020-01-24 Thread Bill Bejeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-8821:
---
Fix Version/s: 2.5.0

> Avoid pattern subscription to allow for stricter ACL settings
> -
>
> Key: KAFKA-8821
> URL: https://issues.apache.org/jira/browse/KAFKA-8821
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sophie Blee-Goldman
>Priority: Minor
> Fix For: 2.5.0
>
>
> To avoid triggering auto topic creation (if `auto.create.topic.enable=true` 
> on the brokers), Kafka Streams uses consumer pattern subscription. For this 
> case, the consumer requests all metadata from the brokers and does client 
> side filtering.
> However, if users want to set ACLs to restrict a Kafka Streams application, 
> this may result in broker-side ERROR logs stating that some metadata cannot 
> be provided. The only way to avoid those broker-side ERROR logs is to grant 
> the corresponding permissions.
> As of the 2.3 release it's possible to disable auto topic creation client 
> side (via https://issues.apache.org/jira/browse/KAFKA-7320). Kafka Streams 
> should use this new feature (note that broker version 0.11 is required) to 
> allow users to set strict ACLs without getting flooded with ERROR logs on 
> the broker.
> The proposal is that by default Kafka Streams disables auto topic creation 
> client side (optimistically) and uses regular subscription (not pattern 
> subscription). If an older broker is used, users need to explicitly enable 
> `allow.auto.create.topic` client side. If we detect this setting, we switch 
> back to pattern-based subscription.
> If users don't enable auto topic creation client side and run with an older 
> broker, we would just rethrow the exception to the user, adding some context 
> information on how to fix the issue.
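To make the proposal concrete, here is a minimal sketch of the intended
default behaviour, assuming the client-side switch is the consumer's
`allow.auto.create.topics` config added by KAFKA-7320; the topic name, group
id, and deserializers are placeholders:

{code}
// Minimal sketch of the proposed Streams default (not the actual Streams
// code): disable client-side auto topic creation and subscribe by name,
// falling back to pattern subscription only if the user re-enabled
// auto-creation for older brokers.
import java.util.{Arrays, Properties}
import java.util.regex.Pattern
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app")
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.ByteArrayDeserializer")
// The optimistic default: never ask the broker to auto-create topics.
props.put(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG, "false")

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
val userEnabledAutoCreate = false  // would come from the user's config

if (userEnabledAutoCreate)
  consumer.subscribe(Pattern.compile(Pattern.quote("input-topic")))  // old fallback
else
  consumer.subscribe(Arrays.asList("input-topic"))                   // proposed default
{code}

With subscription by name, metadata requests name only the subscribed topics,
so strict ACLs only need to grant access to those topics and the broker no
longer logs ERRORs for unrelated metadata.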



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7317) Use collections subscription for main consumer to reduce metadata

2020-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023198#comment-17023198
 ] 

ASF GitHub Bot commented on KAFKA-7317:
---

bbejeck commented on pull request #7969: KAFKA-7317: Use collections 
subscription for main consumer to reduce metadata
URL: https://github.com/apache/kafka/pull/7969

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use collections subscription for main consumer to reduce metadata
> -
>
> Key: KAFKA-7317
> URL: https://issues.apache.org/jira/browse/KAFKA-7317
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sophie Blee-Goldman
>Priority: Major
>
> In KAFKA-4633 we switched from "collection subscription" to "pattern 
> subscription" for `Consumer#subscribe()` to avoid triggering auto topic 
> creation on the broker. In KAFKA-5291, the metadata request was extended to 
> overwrite the broker config within the request itself. However, this feature 
> is only used in `KafkaAdminClient`. KAFKA-7320 adds this feature for the 
> consumer client, too.
> This ticket proposes to use the new feature within Kafka Streams to allow 
> collection-based subscription in the consumer and admin clients, reducing 
> the metadata response size, which can be very large for a large number of 
> partitions in the cluster.
> Note that Streams needs to be able to detect whether it is connected to 
> older brokers that do not support the new metadata request, and keep using 
> pattern subscription in that case.
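A rough sketch of what the switch could look like inside Streams; the helper
name and the broker-capability flag are illustrative, not the actual
implementation:

{code}
// Illustrative helper (not the actual Streams code): pick the subscription
// mode based on whether the broker supports the newer metadata request
// that suppresses auto topic creation.
import java.util.regex.Pattern
import org.apache.kafka.clients.consumer.KafkaConsumer

def subscribeSourceTopics(consumer: KafkaConsumer[Array[Byte], Array[Byte]],
                          sourceTopics: Seq[String],
                          brokerSupportsNewMetadataRequest: Boolean): Unit =
  if (brokerSupportsNewMetadataRequest) {
    // Collection subscription: metadata requests carry only these topic
    // names, so responses stay small even with many partitions.
    consumer.subscribe(java.util.Arrays.asList(sourceTopics: _*))
  } else {
    // Older brokers: keep pattern subscription, which fetches the full
    // topic metadata and filters client side (avoiding broker-side auto
    // topic creation at the cost of large metadata responses).
    consumer.subscribe(Pattern.compile(sourceTopics.map(t => Pattern.quote(t)).mkString("|")))
  }
{code}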



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9471) Throw exception for DEAD StreamThread.State

2020-01-24 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved KAFKA-9471.
---
Resolution: Duplicate

> Throw exception for DEAD StreamThread.State
> ---
>
> Key: KAFKA-9471
> URL: https://issues.apache.org/jira/browse/KAFKA-9471
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>
> In StreamThreadStateStoreProvider we have:
> {code}
> if (streamThread.state() == StreamThread.State.DEAD) {
> return Collections.emptyList();
> {code}
> If the user cannot retry anymore, we should throw an exception, which is 
> handled in the else block.
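A minimal sketch of the proposed change (the state type is modeled locally
here; this is not the merged patch):

{code}
// Sketch: a DEAD thread can never serve interactive queries again, so
// surface a non-retriable error instead of an empty store list.
import java.util.Collections
import org.apache.kafka.streams.errors.InvalidStateStoreException

sealed trait ThreadState            // stand-in for StreamThread.State
case object Running extends ThreadState
case object PendingShutdown extends ThreadState
case object Dead extends ThreadState

def stores[T](state: ThreadState, lookup: () => java.util.List[T]): java.util.List[T] =
  state match {
    case Running =>
      lookup()                      // the thread can serve queries
    case Dead =>
      // Previously: Collections.emptyList(), which callers would retry
      // forever. Throwing tells the caller the stores will never return.
      throw new InvalidStateStoreException(
        "StreamThread is DEAD; its state stores cannot be queried anymore")
    case _ =>
      Collections.emptyList[T]()    // transient states: "retry later"
  }
{code}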



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9471) Throw exception for DEAD StreamThread.State

2020-01-24 Thread Vito Jeng (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022980#comment-17022980
 ] 

Vito Jeng commented on KAFKA-9471:
--

[~yuzhih...@gmail.com]

Yes, I would include this in KIP-216.

> Throw exception for DEAD StreamThread.State
> ---
>
> Key: KAFKA-9471
> URL: https://issues.apache.org/jira/browse/KAFKA-9471
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>
> In StreamThreadStateStoreProvider we have:
> {code}
> if (streamThread.state() == StreamThread.State.DEAD) {
> return Collections.emptyList();
> {code}
> If the user cannot retry anymore, we should throw an exception, which is 
> handled in the else block.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9471) Throw exception for DEAD StreamThread.State

2020-01-24 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022970#comment-17022970
 ] 

Ted Yu commented on KAFKA-9471:
---

[~vitojeng]
Please let me know if I should proceed with this, or do you plan to include 
the exception throwing in KIP-216?

> Throw exception for DEAD StreamThread.State
> ---
>
> Key: KAFKA-9471
> URL: https://issues.apache.org/jira/browse/KAFKA-9471
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>
> In StreamThreadStateStoreProvider we have:
> {code}
> if (streamThread.state() == StreamThread.State.DEAD) {
> return Collections.emptyList();
> {code}
> If the user cannot retry anymore, we should throw an exception, which is 
> handled in the else block.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9471) Throw exception for DEAD StreamThread.State

2020-01-24 Thread Vito Jeng (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022968#comment-17022968
 ] 

Vito Jeng commented on KAFKA-9471:
--

[~yuzhih...@gmail.com]

Yes. Currently, *shouldAllowToQueryAfterThreadDied()* would fail when an 
InvalidStateStoreException is thrown.

This test method should also be updated, IMO.

I noticed this during the KIP-216 implementation, too, but my PR is not yet 
completed.

> Throw exception for DEAD StreamThread.State
> ---
>
> Key: KAFKA-9471
> URL: https://issues.apache.org/jira/browse/KAFKA-9471
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
>
> In StreamThreadStateStoreProvider we have:
> {code}
> if (streamThread.state() == StreamThread.State.DEAD) {
> return Collections.emptyList();
> {code}
> If the user cannot retry anymore, we should throw an exception, which is 
> handled in the else block.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-01-24 Thread Oleksii Boiko (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021982#comment-17021982
 ] 

Oleksii Boiko edited comment on KAFKA-8803 at 1/24/20 11:10 AM:


Hi all,

We are facing the same issue. The frequency of reproduction increased after 
migrating to Kafka Streams 2.3.
 In our case the increase was caused by a change in commit behaviour in Kafka 
Streams. Previously (we used 2.0.1), "commit offsets" was executed only when 
a new record was consumed on a source node, but in version 2.3 "commit 
offsets" executes on every "punctuate" call even if no changes were made. We 
have a punctuator on a 1s wall-clock schedule, so the number of 
commit-offsets operations grew. In addition, we detected the broker-side 
error that causes the transactional id to get stuck:

{noformat}
java.lang.IllegalStateException: TransactionalId  failed transition to 
state TxnTransitMetadata(producerId=61432, producerEpoch=0, txnTimeoutMs=6, 
txnState=Ongoing, topicPartitions=Set(__consumer_offsets-10), 
txnStartTimestamp=1577934186261, txnLastUpdateTimestamp=1577934186261) due to 
unexpected metadata{noformat}
It appears almost every day at the same time, and the txnStartTimestamp of 
the previous transaction state is younger than the txnStartTimestamp of the 
target transaction state.

If I'm correct, that means the transaction cannot be transitioned to the 
"Ongoing" state, and as a result it never expires, because only "Ongoing" 
transactions can be expired (see 
kafka.coordinator.transaction.TransactionStateManager#timedOutTransactions):
{noformat}
def timedOutTransactions(): Iterable[TransactionalIdAndProducerIdEpoch] = {
val now = time.milliseconds()
inReadLock(stateLock) {
  transactionMetadataCache.filter { case (txnPartitionId, _) =>
!leavingPartitions.exists(_.txnPartitionId == txnPartitionId)
  }.flatMap { case (_, entry) =>
entry.metadataPerTransactionalId.filter { case (_, txnMetadata) =>
  if (txnMetadata.pendingTransitionInProgress) {
false
  } else {
txnMetadata.state match {
  case Ongoing =>
txnMetadata.txnStartTimestamp + txnMetadata.txnTimeoutMs < now
  case _ => false
}
  }
}.map { case (txnId, txnMetadata) =>
  TransactionalIdAndProducerIdEpoch(txnId, txnMetadata.producerId, 
txnMetadata.producerEpoch)
}
  }
}
  }{noformat}
Based on these inputs we found that the issue reproduces at the time of NTP 
synchronization.
 Our broker clock runs ahead of NTP by up to +2 seconds per 24 hours, and the 
NTP sync rolls this delta back.

After disabling NTP sync the issue was gone.
 We found that a similar issue was already fixed previously, but the fix does 
not cover all the cases: https://issues.apache.org/jira/browse/KAFKA-5415
 There is one more place where such a timestamp comparison exists:

kafka.coordinator.transaction.TransactionMetadata#completeTransitionTo
{noformat}
case Ongoing => // from addPartitions
  if (!validProducerEpoch(transitMetadata) ||
      !topicPartitions.subsetOf(transitMetadata.topicPartitions) ||
      txnTimeoutMs != transitMetadata.txnTimeoutMs ||
      txnStartTimestamp > transitMetadata.txnStartTimestamp) {
    throwStateTransitionFailure(transitMetadata)
  } else {
    txnStartTimestamp = transitMetadata.txnStartTimestamp
    addPartitions(transitMetadata.topicPartitions)
  }
{noformat}
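To make the failure mode concrete, here is a small self-contained
illustration (reusing the timestamp from the log above; this is not broker
code) of how a clock rollback trips that guard:

{code}
// Sketch: why an NTP rollback can strand a transaction. The
// Ongoing -> Ongoing guard rejects any transition whose start timestamp
// is older than the one already recorded in the metadata.
object ClockRollbackSketch extends App {
  // addPartitions stamped the current metadata just before the NTP sync:
  val recordedStartTimestamp = 1577934186261L
  // The wall clock had drifted ~2s ahead; the sync rolls it back, so the
  // next transition is stamped earlier than the recorded one:
  val transitStartTimestamp = recordedStartTimestamp - 2000L

  if (recordedStartTimestamp > transitStartTimestamp)
    // Mirrors throwStateTransitionFailure(...): the transition fails, the
    // transaction never (re-)enters Ongoing, and it can never be expired.
    println(s"rejected: $recordedStartTimestamp > $transitStartTimestamp")
  else
    println("transition accepted")
}
{code}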

[jira] [Resolved] (KAFKA-9181) Flaky test kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoConsumeWithoutDescribeAclViaSubscribe

2020-01-24 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-9181.
---
Fix Version/s: 2.5.0
 Reviewer: Jason Gustafson
   Resolution: Fixed

> Flaky test 
> kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoConsumeWithoutDescribeAclViaSubscribe
> ---
>
> Key: KAFKA-9181
> URL: https://issues.apache.org/jira/browse/KAFKA-9181
> Project: Kafka
>  Issue Type: Test
>  Components: core
>Reporter: Bill Bejeck
>Assignee: Rajini Sivaram
>Priority: Major
>  Labels: flaky-test, tests
> Fix For: 2.5.0
>
>
> Failed in 
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/26571/testReport/junit/kafka.api/SaslGssapiSslEndToEndAuthorizationTest/testNoConsumeWithoutDescribeAclViaSubscribe/]
>  
> {noformat}
> Error Messageorg.apache.kafka.common.errors.TopicAuthorizationException: Not 
> authorized to access topics: 
> [topic2]Stacktraceorg.apache.kafka.common.errors.TopicAuthorizationException: 
> Not authorized to access topics: [topic2]
> Standard OutputAdding ACLs for resource 
> `ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, 
> patternType=LITERAL)`: 
>   (principal=User:kafka, host=*, operation=CLUSTER_ACTION, 
> permissionType=ALLOW) 
> Current ACLs for resource `Cluster:LITERAL:kafka-cluster`: 
>   User:kafka has Allow permission for operations: ClusterAction from 
> hosts: * 
> Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, 
> patternType=LITERAL)`: 
>   (principal=User:kafka, host=*, operation=READ, permissionType=ALLOW) 
> Current ACLs for resource `Topic:LITERAL:*`: 
>   User:kafka has Allow permission for operations: Read from hosts: * 
> Debug is  true storeKey true useTicketCache false useKeyTab true doNotPrompt 
> false ticketCache is null isInitiator true KeyTab is 
> /tmp/kafka6494439724844851846.tmp refreshKrb5Config is false principal is 
> kafka/localh...@example.com tryFirstPass is false useFirstPass is false 
> storePass is false clearPass is false
> principal is kafka/localh...@example.com
> Will use keytab
> Commit Succeeded 
> [2019-11-13 04:43:16,187] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-11-13 04:43:16,191] ERROR [ReplicaFetcher replicaId=2, leaderId=0, 
> fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-11-13 04:43:16,384] ERROR [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Error for partition e2etopic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-11-13 04:43:16,384] ERROR [ReplicaFetcher replicaId=0, leaderId=2, 
> fetcherId=0] Error for partition e2etopic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, 
> patternType=LITERAL)`: 
>   (principal=User:client, host=*, operation=WRITE, permissionType=ALLOW)
>   (principal=User:client, host=*, operation=DESCRIBE, 
> permissionType=ALLOW)
>   (principal=User:client, host=*, operation=CREATE, permissionType=ALLOW) 
> Current ACLs for resource `Topic:LITERAL:e2etopic`: 
>   User:client has Allow permission for operations: Describe from hosts: *
>   User:client has Allow permission for operations: Write from hosts: *
>   User:client has Allow permission for operations: Create from hosts: * 
> Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, 
> patternType=LITERAL)`: 
>   (principal=User:client, host=*, operation=READ, permissionType=ALLOW)
>   (principal=User:client, host=*, operation=DESCRIBE, 
> permissionType=ALLOW) 
> Adding ACLs for resource `ResourcePattern(resourceType=GROUP, name=group, 
> patternType=LITERAL)`: 
>   (principal=User:client, host=*, operation=READ, permissionType=ALLOW) 
> Current ACLs for resource `Topic:LITERAL:e2etopic`: 
>   User:client has Allow permission for operations: Read from hosts: *
>   User:client has Allow permission for operations: Describe from hosts: *
>   User:client has Allow permission for operations: Write from hosts: *
>   User:client has 

[jira] [Commented] (KAFKA-9181) Flaky test kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoConsumeWithoutDescribeAclViaSubscribe

2020-01-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022848#comment-17022848
 ] 

ASF GitHub Bot commented on KAFKA-9181:
---

rajinisivaram commented on pull request #7941: KAFKA-9181; Maintain clean 
separation between local and group subscriptions in consumer's SubscriptionState
URL: https://github.com/apache/kafka/pull/7941

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky test 
> kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoConsumeWithoutDescribeAclViaSubscribe
> ---
>
> Key: KAFKA-9181
> URL: https://issues.apache.org/jira/browse/KAFKA-9181
> Project: Kafka
>  Issue Type: Test
>  Components: core
>Reporter: Bill Bejeck
>Assignee: Rajini Sivaram
>Priority: Major
>  Labels: flaky-test, tests
>
> Failed in 
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/26571/testReport/junit/kafka.api/SaslGssapiSslEndToEndAuthorizationTest/testNoConsumeWithoutDescribeAclViaSubscribe/]
>  
> {noformat}
> Error Messageorg.apache.kafka.common.errors.TopicAuthorizationException: Not 
> authorized to access topics: 
> [topic2]Stacktraceorg.apache.kafka.common.errors.TopicAuthorizationException: 
> Not authorized to access topics: [topic2]
> Standard OutputAdding ACLs for resource 
> `ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, 
> patternType=LITERAL)`: 
>   (principal=User:kafka, host=*, operation=CLUSTER_ACTION, 
> permissionType=ALLOW) 
> Current ACLs for resource `Cluster:LITERAL:kafka-cluster`: 
>   User:kafka has Allow permission for operations: ClusterAction from 
> hosts: * 
> Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, 
> patternType=LITERAL)`: 
>   (principal=User:kafka, host=*, operation=READ, permissionType=ALLOW) 
> Current ACLs for resource `Topic:LITERAL:*`: 
>   User:kafka has Allow permission for operations: Read from hosts: * 
> Debug is  true storeKey true useTicketCache false useKeyTab true doNotPrompt 
> false ticketCache is null isInitiator true KeyTab is 
> /tmp/kafka6494439724844851846.tmp refreshKrb5Config is false principal is 
> kafka/localh...@example.com tryFirstPass is false useFirstPass is false 
> storePass is false clearPass is false
> principal is kafka/localh...@example.com
> Will use keytab
> Commit Succeeded 
> [2019-11-13 04:43:16,187] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-11-13 04:43:16,191] ERROR [ReplicaFetcher replicaId=2, leaderId=0, 
> fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-11-13 04:43:16,384] ERROR [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Error for partition e2etopic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-11-13 04:43:16,384] ERROR [ReplicaFetcher replicaId=0, leaderId=2, 
> fetcherId=0] Error for partition e2etopic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, 
> patternType=LITERAL)`: 
>   (principal=User:client, host=*, operation=WRITE, permissionType=ALLOW)
>   (principal=User:client, host=*, operation=DESCRIBE, 
> permissionType=ALLOW)
>   (principal=User:client, host=*, operation=CREATE, permissionType=ALLOW) 
> Current ACLs for resource `Topic:LITERAL:e2etopic`: 
>   User:client has Allow permission for operations: Describe from hosts: *
>   User:client has Allow permission for operations: Write from hosts: *
>   User:client has Allow permission for operations: Create from hosts: * 
> Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, 
> patternType=LITERAL)`: 
>   (principal=User:client, host=*, operation=READ, permissionType=ALLOW)
>   (principal=User:client, host=*, operation=DESCRIBE, 
> permissionType=ALLOW) 
> Adding ACLs for resource