[jira] [Created] (KAFKA-8051) remove KafkaMbean when network close

2019-03-05 Thread limeng (JIRA)
limeng created KAFKA-8051:
-

 Summary: remove KafkaMbean when network close
 Key: KAFKA-8051
 URL: https://issues.apache.org/jira/browse/KAFKA-8051
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Affects Versions: 0.10.2.2, 0.10.2.1, 0.10.2.0
Reporter: limeng
 Fix For: 2.2.1


The broker will run out of memory (OOM) when:
 * a large number of clients frequently close and reconnect
 * the clientId changes on every reconnect, which gives rise to too many 
KafkaMbeans in the broker

The reason is that the broker forgets to remove the KafkaMbean when it detects 
that the connection has closed.
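A minimal sketch of the missing cleanup, assuming the broker unregisters the 
per-clientId MBeans via standard JMX when it detects a connection close 
(illustrative only; the hook and ObjectName pattern are hypothetical, not the 
actual broker code):
{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ClientMbeanCleaner {
    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    // Hypothetical hook, called when the broker detects that a connection closed.
    public static void onConnectionClose(String clientId) throws Exception {
        // Match every MBean registered for this clientId.
        ObjectName pattern = new ObjectName("kafka.server:client-id=" + clientId + ",*");
        for (ObjectName name : SERVER.queryNames(pattern, null)) {
            SERVER.unregisterMBean(name); // drop the stale KafkaMbean so it can be GC'ed
        }
    }
}
{code}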



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8050) remove KafkaMbean when network close

2019-03-05 Thread limeng (JIRA)
limeng created KAFKA-8050:
-

 Summary: remove KafkaMbean when network close
 Key: KAFKA-8050
 URL: https://issues.apache.org/jira/browse/KAFKA-8050
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Affects Versions: 0.10.2.2, 0.10.2.1, 0.10.2.0
Reporter: limeng
 Fix For: 2.2.1


The broker will run out of memory (OOM) when:
 * a large number of clients frequently close and reconnect
 * the clientId changes on every reconnect, which gives rise to too many 
KafkaMbeans in the broker

The reason is that the broker forgets to remove the KafkaMbean when it detects 
that the connection has closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8049) remove KafkaMbean when network close

2019-03-05 Thread limeng (JIRA)
limeng created KAFKA-8049:
-

 Summary: remove KafkaMbean when network close
 Key: KAFKA-8049
 URL: https://issues.apache.org/jira/browse/KAFKA-8049
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Affects Versions: 0.10.2.2, 0.10.2.1, 0.10.2.0
Reporter: limeng
 Fix For: 2.2.1


The broker will run out of memory (OOM) when:
 * a large number of clients frequently close and reconnect
 * the clientId changes on every reconnect, which gives rise to too many 
KafkaMbeans in the broker

The reason is that the broker forgets to remove the KafkaMbean when it detects 
that the connection has closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8048) remove KafkaMbean when network close

2019-03-05 Thread limeng (JIRA)
limeng created KAFKA-8048:
-

 Summary: remove KafkaMbean when network close
 Key: KAFKA-8048
 URL: https://issues.apache.org/jira/browse/KAFKA-8048
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Affects Versions: 0.10.2.2, 0.10.2.1, 0.10.2.0
Reporter: limeng
 Fix For: 2.2.1


The broker will run out of memory (OOM) when:
 * a large number of clients frequently close and reconnect
 * the clientId changes on every reconnect, which gives rise to too many 
KafkaMbeans in the broker

The reason is that the broker forgets to remove the KafkaMbean when it detects 
that the connection has closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8047) remove KafkaMbean when network close

2019-03-05 Thread limeng (JIRA)
limeng created KAFKA-8047:
-

 Summary: remove KafkaMbean when network close
 Key: KAFKA-8047
 URL: https://issues.apache.org/jira/browse/KAFKA-8047
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Affects Versions: 0.10.2.2, 0.10.2.1, 0.10.2.0
Reporter: limeng
 Fix For: 2.2.1


The broker will run out of memory (OOM) when:
 * a large number of clients frequently close and reconnect
 * the clientId changes on every reconnect, which gives rise to too many 
KafkaMbeans in the broker

The reason is that the broker forgets to remove the KafkaMbean when it detects 
that the connection has closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8020) Consider changing design of ThreadCache

2019-03-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785201#comment-16785201
 ] 

ASF GitHub Bot commented on KAFKA-8020:
---

ConcurrencyPractitioner commented on pull request #6376: [KAFKA-8020] Consider 
making ThreadCache a TLRUCache
URL: https://github.com/apache/kafka/pull/6376
 
 
   We are implementing a time-aware LRU cache to supplant the NamedCache 
currently used by Kafka Streams.
   
   The tests will be modified to check that entries whose lifetime has expired 
are removed first. 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider changing design of ThreadCache 
> 
>
> Key: KAFKA-8020
> URL: https://issues.apache.org/jira/browse/KAFKA-8020
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Richard Yu
>Priority: Major
>
> In distributed systems, time-aware LRU caches offer a superior eviction 
> policy to traditional LRU models, yielding more cache hits than misses. In 
> this new policy, if an item is stored beyond its useful lifespan, it is 
> removed. For example, in {{CachingWindowStore}}, a window usually is of 
> limited size. After it expires, it would no longer be queried for, but it 
> could stay in the ThreadCache for an unnecessary amount of time if it is not 
> evicted (i.e. when few new entries are being inserted). For better allocation 
> of memory, it would be better to implement a time-aware LRU cache which takes 
> into account the lifespan of an entry and removes it once it has expired.
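> 
> A minimal sketch of such a time-aware LRU cache, assuming each entry carries 
> an expiry deadline and expired entries are evicted before plain LRU order 
> applies (illustrative names, not the proposed implementation):
> {code:java}
> import java.util.Iterator;
> import java.util.LinkedHashMap;
> 
> public class TimeAwareLruCache<K, V> {
>     private static final class Entry<V> {
>         final V value;
>         final long expiresAtMs;
>         Entry(V value, long expiresAtMs) { this.value = value; this.expiresAtMs = expiresAtMs; }
>     }
> 
>     private final int maxEntries;
>     // accessOrder=true makes iteration order least-recently-used first
>     private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>(16, 0.75f, true);
> 
>     public TimeAwareLruCache(int maxEntries) { this.maxEntries = maxEntries; }
> 
>     public synchronized void put(K key, V value, long ttlMs) {
>         evictExpired();
>         map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMs));
>         if (map.size() > maxEntries) {          // still over capacity: evict the LRU entry
>             Iterator<K> it = map.keySet().iterator();
>             it.next();
>             it.remove();
>         }
>     }
> 
>     public synchronized V get(K key) {
>         evictExpired();
>         Entry<V> e = map.get(key);
>         return e == null ? null : e.value;
>     }
> 
>     // Entries past their lifespan are removed first, regardless of recency.
>     private void evictExpired() {
>         long now = System.currentTimeMillis();
>         map.values().removeIf(e -> e.expiresAtMs <= now);
>     }
> }
> {code}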



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7994) Improve Stream-Time for rebalances and restarts

2019-03-05 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785150#comment-16785150
 ] 

Guozhang Wang commented on KAFKA-7994:
--

| It wouldn't be terrible if we sometimes held on to records a little longer 
than necessary, but actually, I think this won't happen either.

Hmm, I think there's still a risk of this, depending on how users think of 
suppression semantics. Going back to my example in the first comment: consider 
a windowed count with length 10 and grace 0. Users would expect a final result 
of window (0,10) with value 2 to be sent when r3 is processed, but they would 
not see the result even after r4 is processed, since the stream-time is only 2 
at that sub-topology, right?

> Improve Stream-Time for rebalances and restarts
> ---
>
> Key: KAFKA-7994
> URL: https://issues.apache.org/jira/browse/KAFKA-7994
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> We compute a per-partition partition-time as the maximum timestamp over all 
> records processed so far. Furthermore, we use partition-time to compute 
> stream-time for each task as maximum over all partition-times (for all 
> corresponding task partitions). This stream-time is used to make decisions 
> about processing out-of-order records or drop them if they are late (ie, 
> timestamp < stream-time - grace-period).
> During rebalances and restarts, stream-time is initialized as UNKNOWN (ie, 
> -1) for tasks that are newly created (or migrated). In effect, we forget the 
> current stream-time in this case, which may lead to non-deterministic behavior 
> if we stop processing right before a late record that would be dropped if we 
> continued processing, but is not dropped after a rebalance/restart. Let's look 
> at an example with a grace period of 5ms for a tumbling window of 5ms, and 
> the following records (timestamps in parentheses):
>  
> {code:java}
> r1(0) r2(5) r3(11) r4(2){code}
> In the example, stream-time advances as 0, 5, 11, 11, and thus record `r4` is 
> dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or 
> rebalance after processing `r3` but before processing `r4`, we would 
> reinitialize stream-time as -1, and thus would process `r4` on restart/after 
> rebalance. The problem is that stream-time advances differently from a 
> global point of view: 0, 5, 11, 2.
>  
> Note, this is a corner case, because if we stopped processing one record 
> earlier, ie, after processing `r2` but before processing `r3`, stream-time 
> would advance correctly from a global point of view.
> A potential fix would be to store the latest observed partition-time in the 
> metadata of committed offsets. This way, on restart/rebalance we can 
> re-initialize time correctly.
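> 
> A minimal sketch of this bookkeeping for a single partition, using the r1..r4 
> example (hypothetical names, not the actual Streams internals):
> {code:java}
> public class StreamTimeTracker {
>     private long streamTime = -1L; // UNKNOWN; reset again after rebalance/restart
> 
>     /** Returns true if the record is processed, false if it is dropped as late. */
>     public boolean observe(long timestamp, long gracePeriodMs) {
>         if (streamTime >= 0 && timestamp < streamTime - gracePeriodMs) {
>             return false; // late: e.g. r4(2) when streamTime=11 and grace=5
>         }
>         streamTime = Math.max(streamTime, timestamp);
>         return true;
>     }
> }
> {code}
> With timestamps 0, 5, 11, 2 and grace 5, this processes r1-r3 and drops r4; a 
> fresh tracker created after a rebalance (streamTime = -1) would accept r4, 
> which is exactly the non-determinism described above.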



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7994) Improve Stream-Time for rebalances and restarts

2019-03-05 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785144#comment-16785144
 ] 

Guozhang Wang commented on KAFKA-7994:
--

Just to add: even in the KIP to add negative timestamp support, we will still 
reserve -1 for its special meaning, and accept the trade-off that we cannot 
represent that specific millisecond in history :)

> Improve Stream-Time for rebalances and restarts
> ---
>
> Key: KAFKA-7994
> URL: https://issues.apache.org/jira/browse/KAFKA-7994
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> We compute a per-partition partition-time as the maximum timestamp over all 
> records processed so far. Furthermore, we use partition-time to compute 
> stream-time for each task as maximum over all partition-times (for all 
> corresponding task partitions). This stream-time is used to make decisions 
> about processing out-of-order records or drop them if they are late (ie, 
> timestamp < stream-time - grace-period).
> During rebalances and restarts, stream-time is initialized as UNKNOWN (ie, 
> -1) for tasks that are newly created (or migrated). In effect, we forget the 
> current stream-time in this case, which may lead to non-deterministic behavior 
> if we stop processing right before a late record that would be dropped if we 
> continued processing, but is not dropped after a rebalance/restart. Let's look 
> at an example with a grace period of 5ms for a tumbling window of 5ms, and 
> the following records (timestamps in parentheses):
>  
> {code:java}
> r1(0) r2(5) r3(11) r4(2){code}
> In the example, stream-time advances as 0, 5, 11, 11, and thus record `r4` is 
> dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or 
> rebalance after processing `r3` but before processing `r4`, we would 
> reinitialize stream-time as -1, and thus would process `r4` on restart/after 
> rebalance. The problem is that stream-time advances differently from a 
> global point of view: 0, 5, 11, 2.
>  
> Note, this is a corner case, because if we stopped processing one record 
> earlier, ie, after processing `r2` but before processing `r3`, stream-time 
> would advance correctly from a global point of view.
> A potential fix would be to store the latest observed partition-time in the 
> metadata of committed offsets. This way, on restart/rebalance we can 
> re-initialize time correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8046) Shutdown broker because all log dirs in /tmp/kafka-logs have failed

2019-03-05 Thread jaren (JIRA)
jaren created KAFKA-8046:


 Summary: Shutdown broker because all log dirs in /tmp/kafka-logs 
have failed
 Key: KAFKA-8046
 URL: https://issues.apache.org/jira/browse/KAFKA-8046
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: centos 7
Reporter: jaren


Kafka stops working every few days. Here are some of the logs.

ERROR Error while reading checkpoint file 
/tmp/kafka-logs/cleaner-offset-checkpoint (kafka.server.LogDirFailureChannel)
java.io.FileNotFoundException: /tmp/kafka-logs/cleaner-offset-checkpoint (No 
such file or directory)
 at java.io.FileInputStream.open0(Native Method)
 at java.io.FileInputStream.open(FileInputStream.java:195)
 at java.io.FileInputStream.<init>(FileInputStream.java:138)
 at 
kafka.server.checkpoints.CheckpointFile.liftedTree2$1(CheckpointFile.scala:87)
 at kafka.server.checkpoints.CheckpointFile.read(CheckpointFile.scala:86)
 at 
kafka.server.checkpoints.OffsetCheckpointFile.read(OffsetCheckpointFile.scala:61)
 at 
kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1$$anonfun$apply$1.apply(LogCleanerManager.scala:89)
 at 
kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1$$anonfun$apply$1.apply(LogCleanerManager.scala:87)
 at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
 at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
 at scala.collection.Iterator$class.foreach(Iterator.scala:891)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
 at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
 at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
 at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
 at 
kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1.apply(LogCleanerManager.scala:87)
 at 
kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1.apply(LogCleanerManager.scala:95)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
 at 
kafka.log.LogCleanerManager.allCleanerCheckpoints(LogCleanerManager.scala:86)
 at 
kafka.log.LogCleanerManager$$anonfun$grabFilthiestCompactedLog$1.apply(LogCleanerManager.scala:126)
 at 
kafka.log.LogCleanerManager$$anonfun$grabFilthiestCompactedLog$1.apply(LogCleanerManager.scala:123)
 at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
 at 
kafka.log.LogCleanerManager.grabFilthiestCompactedLog(LogCleanerManager.scala:123)
 at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:296)
 at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:289)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-03-04 16:44:13,154] INFO [ReplicaManager broker=1] Stopping serving 
replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
[2019-03-04 16:44:13,189] INFO [ReplicaFetcherManager on broker 1] Removed 
fetcher for partitions 
__consumer_offsets-22,FOTA_PLAIN_FORCESTOP-0,__consumer_offsets-30,OBSERVE_DEVICE-
 
0,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,FOTA_DOWNLOAD_ERROR-0,__consumer_offsets-
 
25,DEVICE_DE_REGISTER-0,__consumer_offsets-35,DEVICE_REG_UPDATE-0,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-
 
16,__consumer_offsets-28,FOTA_IMEI_MONITOR-0,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,FOTA_IMEI_MONITOR-1-0,__consumer_offsets-3,__consumer_offsets-18,DATA_TO_DEVICE-
 
0,__consumer_offsets-37,emq_notify-0,__consumer_offsets-15,__consumer_offsets-24,FOTA_PLAIN_MONITOR_FORCE-0,DEVICE_REGISTER-0,springCloudBus-0,__consumer_offsets-38,__consumer_offsets-
 
17,DEVICE_REP-0,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,FOTA_STATICS_MONITOR-1-
 
0,__consumer_offsets-14,FOTA_STATICS_MONITOR-0,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,FOTA_STATE_CHANGE-0,__consumer_offsets-12,FOTA_UPGRADE_NOTIFY-
 
0,__consumer_offsets-45,__consumer_offsets-1,emq_message_down-0,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,emq_message-0,__consumer_offsets-34,__consumer_offsets-
 
10,__consumer_offsets-32,__consumer_offsets-40,REQUEST_DEVICE-0 
(kafka.server.ReplicaFetcherManager)
[2019-03-04 16:44:13,190] INFO [ReplicaAlterLogDirsManager on broker 1] Removed 
fetcher for partitions 
__consumer_offsets-22,FOTA_PLAIN_FORCESTOP-0,__consumer_offsets-30,OBSERVE_DEVICE-
 
0,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,FOTA_DOWNLOAD_ERROR-0,__consumer_offsets-
 

[jira] [Commented] (KAFKA-8037) KTable restore may load bad data

2019-03-05 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785127#comment-16785127
 ] 

Guozhang Wang commented on KAFKA-8037:
--

We use `offsetLimit` in ProcessorStateManager for the source table's committed 
offset: if there's a committed offset X, then we stop restoring data at X rather 
than at the log-end-offset; if there's no committed offset, then we will try to 
restore to the log-end-offset at startup (the log may keep growing, but we will 
only restore to the point we saw at startup). So suppose we start successfully 
without bad data, do normal processing that deserializes records, and commit at 
offset X, and suppose there's bad data at offset Y > X. If we fail and restart, 
the restoration will not reach Y but will stop at X.

However, this issue is still valid if there is bad data and the app is starting 
for the first time, in which case there's no committed offset and it will try 
to restore to the log-end-offset.
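
To illustrate, a minimal sketch of the limit logic described above (a 
hypothetical helper, not the actual ProcessorStateManager code):
{code:java}
import java.util.OptionalLong;

final class RestoreLimit {
    // End offset for restoration: stop at the committed offset if one exists,
    // otherwise restore up to the log-end-offset observed at startup.
    static long restoreEnd(OptionalLong committedOffset, long logEndOffset) {
        return committedOffset.isPresent()
            ? Math.min(committedOffset.getAsLong(), logEndOffset)
            : logEndOffset;
    }
}
{code}
A first-time start hits the second branch, which is why bad data anywhere in 
the changelog can still be copied during that restore.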

> KTable restore may load bad data
> 
>
> Key: KAFKA-8037
> URL: https://issues.apache.org/jira/browse/KAFKA-8037
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Minor
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be by-passed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to lookup the 
> record would fail with a deserialization exception if the value part cannot 
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for 
> GlobalKTable case). It's unclear to me atm, how this issue could be addressed 
> for KTables though.
> Note, that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic) and thus the 
> corrupted record would not be written into the changelog.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8046) Shutdown broker because all log dirs in /tmp/kafka-logs have failed

2019-03-05 Thread jaren (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785124#comment-16785124
 ] 

jaren commented on KAFKA-8046:
--

we use kafka_2.11_2.0.0

> Shutdown broker because all log dirs in /tmp/kafka-logs have failed
> ---
>
> Key: KAFKA-8046
> URL: https://issues.apache.org/jira/browse/KAFKA-8046
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: centos 7
>Reporter: jaren
>Priority: Major
>
> Kafka stops working every few days. Here are some of the logs.
> ERROR Error while reading checkpoint file 
> /tmp/kafka-logs/cleaner-offset-checkpoint (kafka.server.LogDirFailureChannel)
> java.io.FileNotFoundException: /tmp/kafka-logs/cleaner-offset-checkpoint (No 
> such file or directory)
>  at java.io.FileInputStream.open0(Native Method)
>  at java.io.FileInputStream.open(FileInputStream.java:195)
>  at java.io.FileInputStream.<init>(FileInputStream.java:138)
>  at 
> kafka.server.checkpoints.CheckpointFile.liftedTree2$1(CheckpointFile.scala:87)
>  at kafka.server.checkpoints.CheckpointFile.read(CheckpointFile.scala:86)
>  at 
> kafka.server.checkpoints.OffsetCheckpointFile.read(OffsetCheckpointFile.scala:61)
>  at 
> kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1$$anonfun$apply$1.apply(LogCleanerManager.scala:89)
>  at 
> kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1$$anonfun$apply$1.apply(LogCleanerManager.scala:87)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
>  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
>  at 
> kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1.apply(LogCleanerManager.scala:87)
>  at 
> kafka.log.LogCleanerManager$$anonfun$allCleanerCheckpoints$1.apply(LogCleanerManager.scala:95)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>  at 
> kafka.log.LogCleanerManager.allCleanerCheckpoints(LogCleanerManager.scala:86)
>  at 
> kafka.log.LogCleanerManager$$anonfun$grabFilthiestCompactedLog$1.apply(LogCleanerManager.scala:126)
>  at 
> kafka.log.LogCleanerManager$$anonfun$grabFilthiestCompactedLog$1.apply(LogCleanerManager.scala:123)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>  at 
> kafka.log.LogCleanerManager.grabFilthiestCompactedLog(LogCleanerManager.scala:123)
>  at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:296)
>  at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:289)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-03-04 16:44:13,154] INFO [ReplicaManager broker=1] Stopping serving 
> replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)
> [2019-03-04 16:44:13,189] INFO [ReplicaFetcherManager on broker 1] Removed 
> fetcher for partitions 
> __consumer_offsets-22,FOTA_PLAIN_FORCESTOP-0,__consumer_offsets-30,OBSERVE_DEVICE-
>  
> 0,__consumer_offsets-8,__consumer_offsets-21,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,FOTA_DOWNLOAD_ERROR-0,__consumer_offsets-
>  
> 25,DEVICE_DE_REGISTER-0,__consumer_offsets-35,DEVICE_REG_UPDATE-0,__consumer_offsets-41,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-47,__consumer_offsets-
>  
> 16,__consumer_offsets-28,FOTA_IMEI_MONITOR-0,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,FOTA_IMEI_MONITOR-1-0,__consumer_offsets-3,__consumer_offsets-18,DATA_TO_DEVICE-
>  
> 0,__consumer_offsets-37,emq_notify-0,__consumer_offsets-15,__consumer_offsets-24,FOTA_PLAIN_MONITOR_FORCE-0,DEVICE_REGISTER-0,springCloudBus-0,__consumer_offsets-38,__consumer_offsets-
>  
> 17,DEVICE_REP-0,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,FOTA_STATICS_MONITOR-1-
>  
> 0,__consumer_offsets-14,FOTA_STATICS_MONITOR-0,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,FOTA_STATE_CHANGE-0,__consumer_offsets-12,FOTA_UPGRADE_NOTIFY-
>  
> 0,__consumer_offsets-45,__consumer_offsets-1,emq_message_down-0,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,emq_message-0,__consumer_offsets-34,__consumer_offsets-
>  
> 10,__consumer_offsets-32,__consumer_offsets-40,REQUEST_DEVICE-0 
> (kafka.server.ReplicaFetcherManager)
> [2019-03-04 16:44:13,190] INFO 

[jira] [Commented] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2019-03-05 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785119#comment-16785119
 ] 

Ted Yu commented on KAFKA-3729:
---

Attached a tentative patch.
If it is on the right track, I can send out a PR.

>  Auto-configure non-default SerDes passed alongside the topology builder
> 
>
> Key: KAFKA-3729
> URL: https://issues.apache.org/jira/browse/KAFKA-3729
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Fred Patton
>Priority: Major
>  Labels: api, newbie
> Attachments: 3729.txt
>
>
> From Guozhang Wang:
> "Only default serdes provided through configs are auto-configured today. But 
> we could auto-configure other serdes passed alongside the topology builder as 
> well."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2019-03-05 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated KAFKA-3729:
--
Attachment: 3729.txt

>  Auto-configure non-default SerDes passed alongside the topology builder
> 
>
> Key: KAFKA-3729
> URL: https://issues.apache.org/jira/browse/KAFKA-3729
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Fred Patton
>Priority: Major
>  Labels: api, newbie
> Attachments: 3729.txt
>
>
> From Guozhang Wang:
> "Only default serdes provided through configs are auto-configured today. But 
> we could auto-configure other serdes passed alongside the topology builder as 
> well."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7928) Deprecate WindowStore.put(key, value)

2019-03-05 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-7928:
--

Assignee: Lee Dongjin

> Deprecate WindowStore.put(key, value)
> -
>
> Key: KAFKA-7928
> URL: https://issues.apache.org/jira/browse/KAFKA-7928
> Project: Kafka
>  Issue Type: Improvement
>Reporter: John Roesler
>Assignee: Lee Dongjin
>Priority: Major
>  Labels: beginner, easy-fix, needs-kip, newbie
>
> Specifically, `org.apache.kafka.streams.state.WindowStore#put(K, V)`
> This method is strange... A window store needs to have a timestamp associated 
> with the key, so if you do a put without a timestamp, it's up to the store to 
> just make one up.
> Even the javadoc on the method recommends not to use it, due to this 
> confusing behavior.
> We should just deprecate it.
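> 
> For illustration, the timestamped overload makes the association explicit (a 
> sketch; `windowStore` is assumed to be an open `WindowStore<String, Long>` 
> and `windowStartMs` a hypothetical variable):
> {code:java}
> // Instead of windowStore.put(key, value), where the store has to invent a
> // timestamp, pass the window start explicitly:
> windowStore.put("key", 42L, windowStartMs);
> {code}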



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8045) Possible Flaky Test LogOffsetTest.testGetOffsetsBeforeNow

2019-03-05 Thread Bill Bejeck (JIRA)
Bill Bejeck created KAFKA-8045:
--

 Summary: Possible Flaky Test LogOffsetTest.testGetOffsetsBeforeNow
 Key: KAFKA-8045
 URL: https://issues.apache.org/jira/browse/KAFKA-8045
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Bill Bejeck


Test failed on an unrelated change: 
https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2910/
{noformat}
java.lang.AssertionError: Partition [__consumer_offsets,0] metadata not 
propagated after 15000 ms
at kafka.utils.TestUtils$.fail(TestUtils.scala:360)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:770)
at 
kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:859)
at kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:307)
at 
kafka.utils.TestUtils$.$anonfun$createTopic$1$adapted(TestUtils.scala:306)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
at scala.collection.immutable.Range.foreach(Range.scala:158)
at scala.collection.TraversableLike.map(TraversableLike.scala:237)
at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at kafka.utils.TestUtils$.createTopic(TestUtils.scala:306)
at kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:354)
at 
kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:95)
at 
kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73)
at jdk.internal.reflect.GeneratedMethodAccessor155.invoke(Unknown 
Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at 

[jira] [Commented] (KAFKA-3522) Consider adding version information into rocksDB storage format

2019-03-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785078#comment-16785078
 ] 

ASF GitHub Bot commented on KAFKA-3522:
---

mjsax commented on pull request #6151: KAFKA-3522: Add in-memory 
TimestampedKeyValueStore
URL: https://github.com/apache/kafka/pull/6151
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consider adding version information into rocksDB storage format
> ---
>
> Key: KAFKA-3522
> URL: https://issues.apache.org/jira/browse/KAFKA-3522
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: architecture
>
> Kafka Streams does not introduce any modifications to the data format in the 
> underlying Kafka protocol, but it does use RocksDB for persistent state 
> storage, and currently its data format is fixed and hard-coded. We want to 
> consider the evolution path for the future when we change the data format, and 
> hence having some version info stored along with the storage file / directory 
> would be useful.
> And this information could even live outside the storage file; for example, we 
> can just use a small "version indicator" file in the rocksdb directory for 
> this purpose. Thoughts? [~enothereska] [~jkreps]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7978) Flaky Test SaslSslAdminClientIntegrationTest#testConsumerGroups

2019-03-05 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785044#comment-16785044
 ] 

Matthias J. Sax commented on KAFKA-7978:


One more: 
[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/19994/testReport/junit/kafka.api/AdminClientIntegrationTest/testConsumerGroups/]

> Flaky Test SaslSslAdminClientIntegrationTest#testConsumerGroups
> ---
>
> Key: KAFKA-7978
> URL: https://issues.apache.org/jira/browse/KAFKA-7978
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}java.lang.AssertionError: expected:<2> but was:<0> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.junit.Assert.assertEquals(Assert.java:631) at 
> kafka.api.AdminClientIntegrationTest.testConsumerGroups(AdminClientIntegrationTest.scala:1157)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7895) Ktable supress operator emitting more than one record for the same key per window

2019-03-05 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-7895:
---
Fix Version/s: 2.1.2

> Ktable supress operator emitting more than one record for the same key per 
> window
> -
>
> Key: KAFKA-7895
> URL: https://issues.apache.org/jira/browse/KAFKA-7895
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0, 2.1.1
>Reporter: prasanthi
>Assignee: John Roesler
>Priority: Major
> Fix For: 2.2.0, 2.1.2
>
>
> Hi, We are using kstreams to get the aggregated counts per vendor(key) within 
> a specified window.
> Here's how we configured the suppress operator to emit one final record per 
> key/window.
> {code:java}
> KTable<Windowed<Integer>, Long> windowedCount = groupedStream
>  .windowedBy(TimeWindows.of(Duration.ofMinutes(1)).grace(ofMillis(5L)))
>  .count(Materialized.with(Serdes.Integer(),Serdes.Long()))
>  .suppress(Suppressed.untilWindowCloses(unbounded()));
> {code}
> But we are getting more than one record for the same key/window as shown 
> below.
> {code:java}
> [KTABLE-TOSTREAM-10]: [131@154906704/154906710], 1039
> [KTABLE-TOSTREAM-10]: [131@154906704/154906710], 1162
> [KTABLE-TOSTREAM-10]: [9@154906704/154906710], 6584
> [KTABLE-TOSTREAM-10]: [88@154906704/154906710], 107
> [KTABLE-TOSTREAM-10]: [108@154906704/154906710], 315
> [KTABLE-TOSTREAM-10]: [119@154906704/154906710], 119
> [KTABLE-TOSTREAM-10]: [154@154906704/154906710], 746
> [KTABLE-TOSTREAM-10]: [154@154906704/154906710], 809{code}
> Could you please take a look?
> Thanks
>  
>  
> Added by John:
> Acceptance Criteria:
>  * add suppress to system tests, such that it's exercised with crash/shutdown 
> recovery, rebalance, etc.
>  ** [https://github.com/apache/kafka/pull/6278]
>  * make sure that there's some system test coverage with caching disabled.
>  ** Follow-on ticket: https://issues.apache.org/jira/browse/KAFKA-7943
>  * test with tighter time bounds with windows of say 30 seconds and use 
> system time without adding any extra time for verification
>  ** Follow-on ticket: https://issues.apache.org/jira/browse/KAFKA-7944



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2019-03-05 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784828#comment-16784828
 ] 

Guozhang Wang commented on KAFKA-3729:
--

[~mjsax] Yes, `configure()` is intended to be idempotent, and if a user's 
implementation is not idempotent, they should use a boolean flag, e.g. to make 
sure the logic is executed only once.

[~yuzhih...@gmail.com] It is from the interface `ProcessorContext#appConfigs`, 
and yes they can be used. Note one tricky thing is that the configs are only 
passed in at construction time of the KafkaStreams, whereas the topology is 
built beforehand, so we can only do the auto-configuration at KafkaStreams 
construction time, not at topology building time, since users pass in the 
serde objects.
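
A minimal sketch of what that auto-configuration could look like at 
KafkaStreams construction time (hypothetical helper, not the actual 
implementation):
{code:java}
import java.util.Map;
import org.apache.kafka.common.serialization.Serde;

final class SerdeAutoConfigurer {
    // Called once per serde captured by the topology, after configs are known.
    static void configure(Serde<?> serde, Map<String, Object> appConfigs, boolean isKey) {
        // Serde#configure should be idempotent; implementations that are not
        // can guard with a boolean flag as suggested above.
        serde.configure(appConfigs, isKey);
    }
}
{code}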

>  Auto-configure non-default SerDes passed alongside the topology builder
> 
>
> Key: KAFKA-3729
> URL: https://issues.apache.org/jira/browse/KAFKA-3729
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Fred Patton
>Priority: Major
>  Labels: api, newbie
>
> From Guozhang Wang:
> "Only default serdes provided through configs are auto-configured today. But 
> we could auto-configure other serdes passed alongside the topology builder as 
> well."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7895) Ktable supress operator emitting more than one record for the same key per window

2019-03-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784816#comment-16784816
 ] 

ASF GitHub Bot commented on KAFKA-7895:
---

bbejeck commented on pull request #6325: KAFKA-7895: fix stream-time reckoning 
for Suppress (2.1) (#6286)
URL: https://github.com/apache/kafka/pull/6325
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ktable supress operator emitting more than one record for the same key per 
> window
> -
>
> Key: KAFKA-7895
> URL: https://issues.apache.org/jira/browse/KAFKA-7895
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0, 2.1.1
>Reporter: prasanthi
>Assignee: John Roesler
>Priority: Major
> Fix For: 2.2.0
>
>
> Hi, We are using kstreams to get the aggregated counts per vendor(key) within 
> a specified window.
> Here's how we configured the suppress operator to emit one final record per 
> key/window.
> {code:java}
> KTable<Windowed<Integer>, Long> windowedCount = groupedStream
>  .windowedBy(TimeWindows.of(Duration.ofMinutes(1)).grace(ofMillis(5L)))
>  .count(Materialized.with(Serdes.Integer(),Serdes.Long()))
>  .suppress(Suppressed.untilWindowCloses(unbounded()));
> {code}
> But we are getting more than one record for the same key/window as shown 
> below.
> {code:java}
> [KTABLE-TOSTREAM-10]: [131@154906704/154906710], 1039
> [KTABLE-TOSTREAM-10]: [131@154906704/154906710], 1162
> [KTABLE-TOSTREAM-10]: [9@154906704/154906710], 6584
> [KTABLE-TOSTREAM-10]: [88@154906704/154906710], 107
> [KTABLE-TOSTREAM-10]: [108@154906704/154906710], 315
> [KTABLE-TOSTREAM-10]: [119@154906704/154906710], 119
> [KTABLE-TOSTREAM-10]: [154@154906704/154906710], 746
> [KTABLE-TOSTREAM-10]: [154@154906704/154906710], 809{code}
> Could you please take a look?
> Thanks
>  
>  
> Added by John:
> Acceptance Criteria:
>  * add suppress to system tests, such that it's exercised with crash/shutdown 
> recovery, rebalance, etc.
>  ** [https://github.com/apache/kafka/pull/6278]
>  * make sure that there's some system test coverage with caching disabled.
>  ** Follow-on ticket: https://issues.apache.org/jira/browse/KAFKA-7943
>  * test with tighter time bounds with windows of say 30 seconds and use 
> system time without adding any extra time for verification
>  ** Follow-on ticket: https://issues.apache.org/jira/browse/KAFKA-7944



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8044) System Test Failure: ReassignPartitionsTest.test_reassign_partitions

2019-03-05 Thread Manikumar (JIRA)
Manikumar created KAFKA-8044:


 Summary: System Test Failure: 
ReassignPartitionsTest.test_reassign_partitions
 Key: KAFKA-8044
 URL: https://issues.apache.org/jira/browse/KAFKA-8044
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Affects Versions: 2.1.0
Reporter: Manikumar
Assignee: Manikumar


{quote}
Node ubuntu@worker10: did not stop within the specified timeout of 150 seconds 
Traceback (most recent call last): File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
 line 132, in run data = self.run_test() File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
 line 189, in run_test return self.test_context.function(self.test) File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py",
 line 428, in wrapper return functools.partial(f, *args, **kwargs)(*w_args, 
**w_kwargs) File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 148, in test_reassign_partitions self.move_start_offset() File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/tests/core/reassign_partitions_test.py",
 line 121, in move_start_offset producer.stop() File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/services/background_thread.py",
 line 82, in stop super(BackgroundThreadService, self).stop() File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/venv/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/services/service.py",
 line 279, in stop self.stop_node(node) File 
"/home/jenkins/workspace/system-test-kafka_2.2-WCMO537TEDQFHHUF4SVK33SPUR5AG4KSITCCNJIYFPKDZQWBEJOA/kafka/tests/kafkatest/services/verifiable_producer.py",
 line 285, in stop_node (str(node.account), str(self.stop_timeout_sec)) 
AssertionError: Node ubuntu@worker10: did not stop within the specified timeout 
of 150 seconds
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (KAFKA-8043) Host

2019-03-05 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin closed KAFKA-8043.
-

> Host
> 
>
> Key: KAFKA-8043
> URL: https://issues.apache.org/jira/browse/KAFKA-8043
> Project: Kafka
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8043) Host

2019-03-05 Thread Bolke de Bruin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved KAFKA-8043.
---
Resolution: Incomplete

> Host
> 
>
> Key: KAFKA-8043
> URL: https://issues.apache.org/jira/browse/KAFKA-8043
> Project: Kafka
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2019-03-05 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784755#comment-16784755
 ] 

Ted Yu commented on KAFKA-3729:
---

Can AbstractProcessorContext#appConfigs() be used to obtain the Map which 
configure() uses?

>  Auto-configure non-default SerDes passed alongside the topology builder
> 
>
> Key: KAFKA-3729
> URL: https://issues.apache.org/jira/browse/KAFKA-3729
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Fred Patton
>Priority: Major
>  Labels: api, newbie
>
> From Guozhang Wang:
> "Only default serdes provided through configs are auto-configured today. But 
> we could auto-configure other serdes passed alongside the topology builder as 
> well."



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8042) Kafka Streams creates many segment stores on state restore

2019-03-05 Thread Adrian McCague (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian McCague updated KAFKA-8042:
--
Description: 
Note that this is from the perspective of one instance of an application, where 
there are 8 instances total, with partition count 8 for all topics and of 
course stores. Standby replicas = 1.

In the process there are multiple instances of {{KafkaStreams}} so the below 
detail is from one of these.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker since they preallocate 4MB each). As can be 
seen this topology has 5 joins - which is the extent of its state.
{code:java}
bash-4.2# pwd
/data/fooapp/0_7
bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find 
${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done
.: 8058
./KSTREAM-JOINOTHER-25-store: 851
./KSTREAM-JOINOTHER-40-store: 819
./KSTREAM-JOINTHIS-24-store: 851
./KSTREAM-JOINTHIS-29-store: 836
./KSTREAM-JOINOTHER-35-store: 819
./KSTREAM-JOINOTHER-30-store: 819
./KSTREAM-JOINOTHER-45-store: 745
./KSTREAM-JOINTHIS-39-store: 819
./KSTREAM-JOINTHIS-44-store: 685
./KSTREAM-JOINTHIS-34-store: 819

There are many (x800 as above) of these segment files:
./KSTREAM-JOINOTHER-25-store.155146629
./KSTREAM-JOINOTHER-25-store.155155902
./KSTREAM-JOINOTHER-25-store.155149269
./KSTREAM-JOINOTHER-25-store.155154879
./KSTREAM-JOINOTHER-25-store.155169861
./KSTREAM-JOINOTHER-25-store.155153064
./KSTREAM-JOINOTHER-25-store.155148444
./KSTREAM-JOINOTHER-25-store.155155671
./KSTREAM-JOINOTHER-25-store.155168673
./KSTREAM-JOINOTHER-25-store.155159565
./KSTREAM-JOINOTHER-25-store.155175735
./KSTREAM-JOINOTHER-25-store.155168574
./KSTREAM-JOINOTHER-25-store.155163525
./KSTREAM-JOINOTHER-25-store.155165241
./KSTREAM-JOINOTHER-25-store.155146662
./KSTREAM-JOINOTHER-25-store.155178177
./KSTREAM-JOINOTHER-25-store.155158740
./KSTREAM-JOINOTHER-25-store.155168145
./KSTREAM-JOINOTHER-25-store.155166231
./KSTREAM-JOINOTHER-25-store.155172171
./KSTREAM-JOINOTHER-25-store.155175075
./KSTREAM-JOINOTHER-25-store.155163096
./KSTREAM-JOINOTHER-25-store.155161512
./KSTREAM-JOINOTHER-25-store.155179233
./KSTREAM-JOINOTHER-25-store.155146266
./KSTREAM-JOINOTHER-25-store.155153691
./KSTREAM-JOINOTHER-25-store.155159235
./KSTREAM-JOINOTHER-25-store.155152734
./KSTREAM-JOINOTHER-25-store.155160687
./KSTREAM-JOINOTHER-25-store.155174415
./KSTREAM-JOINOTHER-25-store.155150820
./KSTREAM-JOINOTHER-25-store.155148642
... etc
{code}

Once re-balancing and state restoration are complete, the redundant segment 
files are deleted and the segment count drops to 508 total (where the 
above-mentioned state directory is one of many).

We have seen the number of these segment stores grow to as many as 15000 over 
the baseline 508 which can fill smaller volumes. *This means that a state 
volume that would normally have ~300MB total disk usage would use in excess of 
30GB during rebalancing*, mostly preallocated MANIFEST files.

h2. Expected Behaviour

For this particular application we expect 508 segment folders total to be 
active and existing throughout rebalancing. Give or take migrated tasks that 
are subject to the {{state.cleanup.delay.ms}}.

h2. Preliminary investigation

* This does not appear to happen in v1.1.0: with our application the 
number of state directories only grows to 670 (over the baseline 508).
* The MANIFEST files were not preallocated to 4MB in v1.1.0; they are in 
v2.1.x. This appears to be expected RocksDB behaviour, but it exacerbates the 
many segment stores.
* Suspect https://github.com/apache/kafka/pull/5253 to be the source of this 
change of behaviour.

A workaround is to use {{rocksdb.config.setter}} and set the preallocation 
amount for MANIFEST files to a lower value such as 64KB; however, the number of 
segment stores appears to be unbounded, so disk volumes may still fill up for a 
heavier application.
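
For example, a config setter along these lines (a sketch assuming the RocksDB 
Java option {{setManifestPreallocationSize}}; verify the option name against 
your RocksDB version), registered via the {{rocksdb.config.setter}} property:
{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class SmallManifestConfigSetter implements RocksDBConfigSetter {
    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        // Preallocate only 64KB per MANIFEST instead of the ~4MB default.
        options.setManifestPreallocationSize(64 * 1024L);
    }
}
{code}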

  was:
Note that this from the perspective of one instance of an application, where 
there are 8 instances total, with partition count 8 for all topics and of 
course stores. Standby replicas = 1.

In the process there are multiple instances of {{KafkaStreams}} so the below 
detail is from one of these.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker since they preallocate 4MB each). As can be 
seen this 

[jira] [Created] (KAFKA-8043) Host

2019-03-05 Thread Bolke de Bruin (JIRA)
Bolke de Bruin created KAFKA-8043:
-

 Summary: Host
 Key: KAFKA-8043
 URL: https://issues.apache.org/jira/browse/KAFKA-8043
 Project: Kafka
  Issue Type: Bug
Reporter: Bolke de Bruin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8042) Kafka Streams creates many segment stores on state restore

2019-03-05 Thread Adrian McCague (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian McCague updated KAFKA-8042:
--
Description: 
Note that this is from the perspective of one instance of an application, where 
there are 8 instances total, with partition count 8 for all topics and of 
course stores. Standby replicas = 1.

In the process there are multiple instances of {{KafkaStreams}} so the below 
detail is from one of these.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker since they preallocate 4MB each). As can be 
seen this topology has 5 joins - which is the extent of its state.
{code:java}
bash-4.2# pwd
/data/fooapp/0_7
bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find 
${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done
.: 8058
./KSTREAM-JOINOTHER-25-store: 851
./KSTREAM-JOINOTHER-40-store: 819
./KSTREAM-JOINTHIS-24-store: 851
./KSTREAM-JOINTHIS-29-store: 836
./KSTREAM-JOINOTHER-35-store: 819
./KSTREAM-JOINOTHER-30-store: 819
./KSTREAM-JOINOTHER-45-store: 745
./KSTREAM-JOINTHIS-39-store: 819
./KSTREAM-JOINTHIS-44-store: 685
./KSTREAM-JOINTHIS-34-store: 819

There are many (x800 as above) of these segment files:
./KSTREAM-JOINOTHER-25-store.155146629
./KSTREAM-JOINOTHER-25-store.155155902
./KSTREAM-JOINOTHER-25-store.155149269
./KSTREAM-JOINOTHER-25-store.155154879
./KSTREAM-JOINOTHER-25-store.155169861
./KSTREAM-JOINOTHER-25-store.155153064
./KSTREAM-JOINOTHER-25-store.155148444
./KSTREAM-JOINOTHER-25-store.155155671
./KSTREAM-JOINOTHER-25-store.155168673
./KSTREAM-JOINOTHER-25-store.155159565
./KSTREAM-JOINOTHER-25-store.155175735
./KSTREAM-JOINOTHER-25-store.155168574
./KSTREAM-JOINOTHER-25-store.155163525
./KSTREAM-JOINOTHER-25-store.155165241
./KSTREAM-JOINOTHER-25-store.155146662
./KSTREAM-JOINOTHER-25-store.155178177
./KSTREAM-JOINOTHER-25-store.155158740
./KSTREAM-JOINOTHER-25-store.155168145
./KSTREAM-JOINOTHER-25-store.155166231
./KSTREAM-JOINOTHER-25-store.155172171
./KSTREAM-JOINOTHER-25-store.155175075
./KSTREAM-JOINOTHER-25-store.155163096
./KSTREAM-JOINOTHER-25-store.155161512
./KSTREAM-JOINOTHER-25-store.155179233
./KSTREAM-JOINOTHER-25-store.155146266
./KSTREAM-JOINOTHER-25-store.155153691
./KSTREAM-JOINOTHER-25-store.155159235
./KSTREAM-JOINOTHER-25-store.155152734
./KSTREAM-JOINOTHER-25-store.155160687
./KSTREAM-JOINOTHER-25-store.155174415
./KSTREAM-JOINOTHER-25-store.155150820
./KSTREAM-JOINOTHER-25-store.155148642
... etc
{code}

Once re-balancing and state restoration is complete - the redundant segment 
files are deleted and the segment count drops to 508.

We have seen the number of these segment stores grow to as many as 15000 over 
the baseline 508 which can fill smaller volumes. *This means that a state 
volume that would normally have ~300MB total disk usage would use in excess of 
30GB during rebalancing*, mostly preallocated MANIFEST files.

h2. Expected Behaviour

For this particular application we expect 508 segment folders total to be 
active and existing throughout rebalancing. Give or take migrated tasks that 
are subject to the {{state.cleanup.delay.ms}}.

h2. Preliminary investigation

* This does not appear to be the case in v1.1.0. With our application the 
number of state directories only grows to 670 (over the base line 508)
* The MANIFEST files were not preallocated to 4MB in v1.1.0 they are now in 
v2.1.x, this appears to be expected RocksDB behaviour, but exacerbates the many 
segment stores.
* Suspect https://github.com/apache/kafka/pull/5253 to be the source of this 
change of behaviour.

A workaround is to use {{rocksdb.config.setter}} and set the preallocated 
amount for MANIFEST files to a lower value such as 64KB, however the number of 
segment stores appears to be unbounded so disk volumes may still fill up for a 
heavier application.

  was:
Note that this from the perspective of one instance of an application, where 
there are 8 instances total, with partition count 8 for all topics and of 
course stores. Standby replicas = 1.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker since they preallocate 4MB each):
{code:java}
bash-4.2# pwd
/data/fooapp/0_7
bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find 
${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done
.: 8058

[jira] [Updated] (KAFKA-8042) Kafka Streams creates many segment stores on state restore

2019-03-05 Thread Adrian McCague (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian McCague updated KAFKA-8042:
--
Description: 
Note that this is from the perspective of one instance of an application, where 
there are 8 instances total, with partition count 8 for all topics and of 
course stores. Standby replicas = 1.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker since they preallocate 4MB each):
{code:java}
bash-4.2# pwd
/data/fooapp/0_7
bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find 
${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done
.: 8058
./KSTREAM-JOINOTHER-25-store: 851
./KSTREAM-JOINOTHER-40-store: 819
./KSTREAM-JOINTHIS-24-store: 851
./KSTREAM-JOINTHIS-29-store: 836
./KSTREAM-JOINOTHER-35-store: 819
./KSTREAM-JOINOTHER-30-store: 819
./KSTREAM-JOINOTHER-45-store: 745
./KSTREAM-JOINTHIS-39-store: 819
./KSTREAM-JOINTHIS-44-store: 685
./KSTREAM-JOINTHIS-34-store: 819

There are many (x800 as above) of these segment files:
./KSTREAM-JOINOTHER-25-store.155146629
./KSTREAM-JOINOTHER-25-store.155155902
./KSTREAM-JOINOTHER-25-store.155149269
./KSTREAM-JOINOTHER-25-store.155154879
./KSTREAM-JOINOTHER-25-store.155169861
./KSTREAM-JOINOTHER-25-store.155153064
./KSTREAM-JOINOTHER-25-store.155148444
./KSTREAM-JOINOTHER-25-store.155155671
./KSTREAM-JOINOTHER-25-store.155168673
./KSTREAM-JOINOTHER-25-store.155159565
./KSTREAM-JOINOTHER-25-store.155175735
./KSTREAM-JOINOTHER-25-store.155168574
./KSTREAM-JOINOTHER-25-store.155163525
./KSTREAM-JOINOTHER-25-store.155165241
./KSTREAM-JOINOTHER-25-store.155146662
./KSTREAM-JOINOTHER-25-store.155178177
./KSTREAM-JOINOTHER-25-store.155158740
./KSTREAM-JOINOTHER-25-store.155168145
./KSTREAM-JOINOTHER-25-store.155166231
./KSTREAM-JOINOTHER-25-store.155172171
./KSTREAM-JOINOTHER-25-store.155175075
./KSTREAM-JOINOTHER-25-store.155163096
./KSTREAM-JOINOTHER-25-store.155161512
./KSTREAM-JOINOTHER-25-store.155179233
./KSTREAM-JOINOTHER-25-store.155146266
./KSTREAM-JOINOTHER-25-store.155153691
./KSTREAM-JOINOTHER-25-store.155159235
./KSTREAM-JOINOTHER-25-store.155152734
./KSTREAM-JOINOTHER-25-store.155160687
./KSTREAM-JOINOTHER-25-store.155174415
./KSTREAM-JOINOTHER-25-store.155150820
./KSTREAM-JOINOTHER-25-store.155148642
... etc
{code}

Once re-balancing and state restoration are complete, the redundant segment 
files are deleted and the segment count drops back to 508.

We have seen the number of these segment stores grow to as many as 15000 over 
the baseline 508, which can fill smaller volumes. *This means that a state 
volume that would normally have ~300MB total disk usage would use in excess of 
30GB during rebalancing*, mostly in preallocated MANIFEST files.

h2. Expected Behaviour

For this particular application we expect 508 segment folders in total to be 
active and present throughout rebalancing, give or take migrated tasks that 
are subject to {{state.cleanup.delay.ms}}.

h2. Preliminary investigation

* This does not appear to be the case in v1.1.0: with our application the 
number of state directories only grows to 670 (over the baseline 508).
* The MANIFEST files were not preallocated to 4MB in v1.1.0; they are in 
v2.1.x. This appears to be expected RocksDB behaviour, but it exacerbates the 
impact of the many segment stores.
* We suspect https://github.com/apache/kafka/pull/5253 to be the source of 
this change in behaviour.

A workaround is to use {{rocksdb.config.setter}} to set the preallocated 
amount for MANIFEST files to a lower value such as 64KB; however, the number 
of segment stores appears to be unbounded, so disk volumes may still fill up 
for a heavier application.


[jira] [Created] (KAFKA-8042) Kafka Streams creates many segment stores on state restore

2019-03-05 Thread Adrian McCague (JIRA)
Adrian McCague created KAFKA-8042:
-

 Summary: Kafka Streams creates many segment stores on state restore
 Key: KAFKA-8042
 URL: https://issues.apache.org/jira/browse/KAFKA-8042
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.1.1, 2.1.0
Reporter: Adrian McCague
 Attachments: StateStoreSegments-StreamsConfig.txt

Note that this is from the perspective of one instance of an application, 
where there are 8 instances in total, with partition count 8 for all topics 
and of course stores. Standby replicas = 1.

h2. Actual Behaviour

During state restore of an application, many segment stores are created (I am 
using MANIFEST files as a marker since they preallocate 4MB each):
{code:java}
bash-4.2# pwd
/data/fooapp/0_7
bash-4.2# for dir in $(find . -maxdepth 1 -type d); do echo "${dir}: $(find 
${dir} -type f -name 'MANIFEST-*' -printf x | wc -c)"; done
.: 8058
./KSTREAM-JOINOTHER-25-store: 851
./KSTREAM-JOINOTHER-40-store: 819
./KSTREAM-JOINTHIS-24-store: 851
./KSTREAM-JOINTHIS-29-store: 836
./KSTREAM-JOINOTHER-35-store: 819
./KSTREAM-JOINOTHER-30-store: 819
./KSTREAM-JOINOTHER-45-store: 745
./KSTREAM-JOINTHIS-39-store: 819
./KSTREAM-JOINTHIS-44-store: 685
./KSTREAM-JOINTHIS-34-store: 819

There are many of these segment files (roughly 800 per store, as above):
./KSTREAM-JOINOTHER-25-store.155146629
./KSTREAM-JOINOTHER-25-store.155155902
./KSTREAM-JOINOTHER-25-store.155149269
./KSTREAM-JOINOTHER-25-store.155154879
./KSTREAM-JOINOTHER-25-store.155169861
./KSTREAM-JOINOTHER-25-store.155153064
./KSTREAM-JOINOTHER-25-store.155148444
./KSTREAM-JOINOTHER-25-store.155155671
./KSTREAM-JOINOTHER-25-store.155168673
./KSTREAM-JOINOTHER-25-store.155159565
./KSTREAM-JOINOTHER-25-store.155175735
./KSTREAM-JOINOTHER-25-store.155168574
./KSTREAM-JOINOTHER-25-store.155163525
./KSTREAM-JOINOTHER-25-store.155165241
./KSTREAM-JOINOTHER-25-store.155146662
./KSTREAM-JOINOTHER-25-store.155178177
./KSTREAM-JOINOTHER-25-store.155158740
./KSTREAM-JOINOTHER-25-store.155168145
./KSTREAM-JOINOTHER-25-store.155166231
./KSTREAM-JOINOTHER-25-store.155172171
./KSTREAM-JOINOTHER-25-store.155175075
./KSTREAM-JOINOTHER-25-store.155163096
./KSTREAM-JOINOTHER-25-store.155161512
./KSTREAM-JOINOTHER-25-store.155179233
./KSTREAM-JOINOTHER-25-store.155146266
./KSTREAM-JOINOTHER-25-store.155153691
./KSTREAM-JOINOTHER-25-store.155159235
./KSTREAM-JOINOTHER-25-store.155152734
./KSTREAM-JOINOTHER-25-store.155160687
./KSTREAM-JOINOTHER-25-store.155174415
./KSTREAM-JOINOTHER-25-store.155150820
./KSTREAM-JOINOTHER-25-store.155148642
{code}

Once re-balancing and state restoration are complete, the redundant segment 
files are deleted and the segment count drops back to 508.

We have seen the number of these segment stores grow to as many as 15000 over 
the baseline 508, which can fill smaller volumes. *This means that a state 
volume that would normally have ~300MB total disk usage would use in excess of 
30GB during rebalancing*, mostly in preallocated MANIFEST files.

h2. Expected Behaviour

For this particular application we expect 508 segment folders in total to be 
active and present throughout rebalancing, give or take migrated tasks that 
are subject to {{state.cleanup.delay.ms}}.

h2. Preliminary investigation

* This does not appear to be the case in v1.1.0: with our application the 
number of state directories only grows to 670 (over the baseline 508).
* The MANIFEST files were not preallocated to 4MB in v1.1.0; they are in 
v2.1.x. This appears to be expected RocksDB behaviour, but it exacerbates the 
impact of the many segment stores.
* We suspect https://github.com/apache/kafka/pull/5253 to be the source of 
this change in behaviour.

A workaround is to use {{rocksdb.config.setter}} to set the preallocated 
amount for MANIFEST files to a lower value such as 64KB; however, the number 
of segment stores appears to be unbounded, so disk volumes may still fill up 
for a heavier application.





[jira] [Updated] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhi updated KAFKA-7925:

Attachment: jira-server.log-6
jira-server.log-5
jira-server.log-4
jira-server.log-3
jira-server.log-2
jira-server.log-1
jira_prod.producer.log

> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: jira-server.log-1, jira-server.log-2, jira-server.log-3, 
> jira-server.log-4, jira-server.log-5, jira-server.log-6, 
> jira_prod.producer.log, threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% cpu usage on all brokers in our kafka cluster even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our kafka setup that is not 
> yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the cpu time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP 
> mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP 
> mwkafka-prod-02.tbd:36252->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2041u IPv4 30012944 0t0 TCP 
> mwkafka-prod-02.tbd:36286->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2042u IPv4 30012973 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51924 (ESTABLISHED)
> java 144319 kafkagod 2043u sock 0,7 0t0 30012463 protocol: TCP
> java 144319 kafkagod 2044u IPv4 30012979 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39994 (ESTABLISHED)
> java 144319 kafkagod 2045u IPv4 30012899 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34548 (ESTABLISHED)
> java 144319 kafkagod 2046u sock 0,7 0t0 30003437 protocol: TCP
> java 144319 kafkagod 2047u IPv4 30012980 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38120 (ESTABLISHED)
> java 144319 

[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784669#comment-16784669
 ] 

Abhi commented on KAFKA-7925:
-

I tried the test again. I started the producer application to publish messages 
to 40 topics sequentially, using the same producer. I got the 
org.apache.kafka.common.errors.UnknownServerException again. This time all the 
servers were running properly, without any exceptions.
I have uploaded the producer log and all brokers' server.log files at DEBUG 
level to the issue.

I get the same exception when running the consumer-group command:
env KAFKA_LOG4J_CONFIG=/u/choudhab/kafka/log4j.properties 
KAFKA_OPTS="-Dsun.security.jgss.native=true 
-Dsun.security.jgss.lib=/usr/libexec/libgsswrap.so 
-Djavax.security.auth.useSubjectCredsOnly=false 
-Djava.security.auth.login.config=/u/bansalp/kafka/producer_jaas.conf" 
/proj/tools/infra/apache/kafka/kafka_2.12-2.1.1/bin/kafka-consumer-groups.sh 
--bootstrap-server mwkafka-prod-01.nyc:9092 --command-config 
/u/choudhab/kafka/command_config  --list
Error: Executing consumer group command failed due to 
org.apache.kafka.common.errors.UnknownServerException: Error listing groups on 
mwkafka-prod-01.dr.xxx.com:9092 (id: 3 rack: dr.xxx.com)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.UnknownServerException: Error listing groups on 
mwkafka-prod-01.dr.xxx.com:9092 (id: 3 rack: dr.xxx.com)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
at 
kafka.admin.ConsumerGroupCommand$ConsumerGroupService.listGroups(ConsumerGroupCommand.scala:132)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:58)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.UnknownServerException: Error listing 
groups on mwkafka-prod-01.dr.xxx.com:9092 (id: 3 rack: dr.xxx.com)
[2019-03-05 12:13:15,120] WARN [Principal=null]: TGT renewal thread has been 
interrupted and will exit. 
(org.apache.kafka.common.security.kerberos.KerberosLogin)



[jira] [Updated] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-03-05 Thread Kartik (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kartik updated KAFKA-8010:
--
Attachment: (was: image-2019-03-05-19-41-44-461.png)

> kafka-configs.sh does not allow setting config with an equal in the value
> -
>
> Key: KAFKA-8010
> URL: https://issues.apache.org/jira/browse/KAFKA-8010
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Mickael Maison
>Priority: Major
> Attachments: image-2019-03-05-19-45-47-168.png
>
>
> The sasl.jaas.config value typically includes equals signs. Unfortunately 
> the kafka-configs tool does not parse such values correctly and hits an error:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer \{\n  
> org.apache.kafka.common.security.plain.PlainLoginModule required\n  
> username=\"myuser\"\n  password=\"mypassword\";\n};\nClient \{\n  
> org.apache.zookeeper.server.auth.DigestLoginModule required\n  
> username=\"myuser2\"\n  password=\"mypassword2\;\n};"
> requirement failed: Invalid entity config: all configs to be added must be in 
> the format "key=val"





[jira] [Comment Edited] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-03-05 Thread Kartik (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784480#comment-16784480
 ] 

Kartik edited comment on KAFKA-8010 at 3/5/19 2:16 PM:
---

[~mimaison] For me it's working. Attaching the image; ignore the warning 
message. The value is parsed correctly when provided in single quotes.

!image-2019-03-05-19-45-47-168.png!

Are you still getting the "Invalid entity config: all configs to be added must 
be in the format key=val" error message even after providing the value in 
single quotes? Can you share the image?
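For reference, a sketch of the single-quoted form being described; the broker 
id and credentials are illustrative, and the exact JAAS value depends on the 
listener being configured:

{code}
./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
  --entity-name 59 --alter \
  --add-config 'sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="myuser" password="mypassword";'
{code}

The single quotes keep the shell from splitting the value, so the tool receives 
the embedded equals signs, quotes and semicolon intact.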


was (Author: kartikvk1996):
[~mimaison] For me it's working. Attaching the image; ignore the warning 
message. The value is parsed correctly when provided in single quotes.

!image-2019-03-05-19-41-44-461.png!

Are you still getting the "Invalid entity config: all configs to be added must 
be in the format key=val" error message even after providing the value in 
single quotes? Can you share the image?






[jira] [Commented] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-03-05 Thread Kartik (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784480#comment-16784480
 ] 

Kartik commented on KAFKA-8010:
---

[~mimaison] For me it's working. Attaching the image; ignore the warning 
message. The value is parsed correctly when provided in single quotes.

!image-2019-03-05-19-41-44-461.png!

Are you still getting the "Invalid entity config: all configs to be added must 
be in the format key=val" error message even after providing the value in 
single quotes? Can you share the image?






[jira] [Updated] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-03-05 Thread Kartik (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kartik updated KAFKA-8010:
--
Attachment: image-2019-03-05-19-41-44-461.png






[jira] [Comment Edited] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784456#comment-16784456
 ] 

Abhi edited comment on KAFKA-7925 at 3/5/19 1:54 PM:
-

[~rsivaram]
Did you run a clean build using the instructions Manikumar posted above? We 
want to make sure that all the jars you are running with came from that build 
to avoid NoClassDefFoundError. The client exceptions could be related to that.
>> All brokers were using JARs from [~omkreddy]'s build (using the 
>> instructions). There was no NoClassDefFoundError at startup. Note that 
>> only one broker saw these exceptions, and they all use the same 
>> configuration and jars.

>>Were all the brokers running with the build including the PR for a day? And 
>>during this time, were the clients always failing? Are the clients also 
>>running with the build including the PR?
Yes, the brokers ran fine with the PR build. The client doesn't always fail: 
when I run the producer with 1 or 2 topics for some time, it works as expected, 
but when I go to 40 topics, it fails after sending messages to the first 15 
topics (this happens sequentially).

The NoClassDefFoundError exceptions went away with a restart of that 
particular broker.

I will give this another go just to make sure no other issue is affecting the 
observations.



was (Author: xabhi):
[~rsivaram]
Did you run a clean build using the instructions Manikumar posted above? We 
want to make sure that all the jars you are running with came from that build 
to avoid NoClassDefFoundError. The client exceptions could be related to that.
>> All brokers were using JARs from [~omkreddy]'s build (using the 
>> instructions). There was no NoClassDefFoundError at startup. Note that 
>> only one broker saw these exceptions, and they all use the same 
>> configuration and jars.

>>Were all the brokers running with the build including the PR for a day? And 
>>during this time, were the clients always failing? Are the clients also 
>>running with the build including the PR?
Yes, the brokers ran fine with the PR build. The client doesn't always fail: 
when I run the producer with 1 or 2 topics for some time, it works as expected, 
but when I go to 40 topics, it fails after sending messages to the first 15 
topics (this happens sequentially).

I can give this another go if you want.



[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784456#comment-16784456
 ] 

Abhi commented on KAFKA-7925:
-

[~rsivaram]
Did you run a clean build using the instructions Manikumar posted above? We 
want to make sure that all the jars you are running with came from that build 
to avoid NoClassDefFoundError. The client exceptions could be related to that.
>> All brokers were using JARs from [~omkreddy]'s build (using the 
>> instructions). There was no NoClassDefFoundError at startup. Note that 
>> only one broker saw these exceptions, and they all use the same 
>> configuration and jars.

>>Were all the brokers running with the build including the PR for a day? And 
>>during this time, were the clients always failing? Are the clients also 
>>running with the build including the PR?
Yes, the brokers ran fine with the PR build. The client doesn't always fail: 
when I run the producer with 1 or 2 topics for some time, it works as expected, 
but when I go to 40 topics, it fails after sending messages to the first 15 
topics (this happens sequentially).

I can give this another go if you want.



[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784441#comment-16784441
 ] 

Rajini Sivaram commented on KAFKA-7925:
---

[~xabhi] Did you run a clean build using the instructions [~omkreddy] posted 
above? We want to make sure that all the jars you are running with came from 
that build to avoid NoClassDefFoundError. The client exceptions could be 
related to that.

_"These exceptions started coming after the server was running fine for a day 
and I don't see such exceptions in other servers."_

Were all the brokers running with the build including the PR for a day? And 
during this time, were the clients always failing? Are the clients also running 
with the build including the PR?

300MB would be too large to upload to the JIRA, so it will be better to 
extract client and server logs for a shorter, matching period of time.


[jira] [Commented] (KAFKA-6177) kafka-mirror-maker.sh RecordTooLargeException

2019-03-05 Thread Artem Chekunov (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784442#comment-16784442
 ] 

Artem Chekunov commented on KAFKA-6177:
---

Hello there,

I had the same issue.

In *producer.properties* I increased *max.request.size*, which solved the 
problem.
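For reference, a sketch of the kind of override being described in the mirror 
maker's {{producer.properties}}; the value is illustrative and must cover the 
largest expected record while staying within broker and topic limits:

{code}
# Raise the producer's maximum request size from the 1048576-byte default
# so that records of ~1MB (plus overhead) can be forwarded.
max.request.size=2097152
{code}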

> kafka-mirror-maker.sh RecordTooLargeException
> -
>
> Key: KAFKA-6177
> URL: https://issues.apache.org/jira/browse/KAFKA-6177
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.1.1
> Environment: centos 7
>Reporter: Rémi REY
>Priority: Minor
>  Labels: support
> Attachments: consumer.config, producer.config, server.properties
>
>
> Hi all,
> I am facing an issue with kafka-mirror-maker.sh.
> We have 2 kafka clusters with the same configuration and mirror maker 
> instances in charge of the mirroring between the clusters.
> We haven't changed the default configuration for the message size, so the 
> default (~1MB) message-size limitation is expected on both clusters.
> We are seeing the following error on the mirroring side:
> {code}
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,431] 
> ERROR Error when sending message to topic my_topic_name with key: 81 bytes, 
> value: 1000272 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
> org.apache.kafka.common.errors.RecordTooLargeException: The request included 
> a message larger than the max message size the server will accept.
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,511] 
> ERROR Error when sending message to topic my_topic_name with key: 81 bytes, 
> value: 13846 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
> java.lang.IllegalStateException: Producer is closed forcefully.
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:513)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:493)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:156)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> java.lang.Thread.run(Thread.java:745)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,511] 
> FATAL [mirrormaker-thread-0] Mirror maker thread failure due to  
> (kafka.tools.MirrorMaker$MirrorMakerThread)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
> java.lang.IllegalStateException: Cannot send after the producer is closed.
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:185)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:474)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:436)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> kafka.tools.MirrorMaker$MirrorMakerProducer.send(MirrorMaker.scala:657)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> kafka.tools.MirrorMaker$MirrorMakerThread$$anonfun$run$6.apply(MirrorMaker.scala:434)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> kafka.tools.MirrorMaker$MirrorMakerThread$$anonfun$run$6.apply(MirrorMaker.scala:434)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
> kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:434)
> {code}
> Why am I getting this error?
> {code}
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,431] 
> ERROR Error when sending message to topic my_topic_name with key: 81 bytes, 
> value: 1000272 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
> org.apache.kafka.common.errors.RecordTooLargeException: The request included 
> a message larger than the max message size the server will accept.
> {code}
> How can mirror maker encounter a 1000272 bytes 

[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784403#comment-16784403
 ] 

Abhi commented on KAFKA-7925:
-

[~rsivaram]
While looking at the broker logs, I found multiple 
java.lang.NoClassDefFoundError exceptions in one of the server logs. I am not 
sure whether this is related to this issue or a totally separate one. These 
exceptions started after the server had been running fine for a day, and I 
don't see such exceptions on the other servers.

[2019-03-03 03:25:33,167] ERROR [ReplicaFetcher replicaId=4, leaderId=5, 
fetcherId=3] Error due to (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.KafkaException: Error processing data for partition 
fps.pese.desim_se-0 offset 202742
at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:338)
at scala.Option.foreach(Option.scala:274)
at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:296)
at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:295)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:295)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:295)
at 
kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
at 
kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
at scala.Option.foreach(Option.scala:274)
at 
kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:89)
Caused by: java.lang.NoClassDefFoundError: kafka/log/BatchMetadata
at kafka.log.Log.$anonfun$append$2(Log.scala:925)
at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
at kafka.log.Log.append(Log.scala:827)
at kafka.log.Log.appendAsFollower(Log.scala:807)
at 
kafka.cluster.Partition.$anonfun$doAppendRecordsToFollowerOrFutureReplica$1(Partition.scala:708)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
at 
kafka.cluster.Partition.doAppendRecordsToFollowerOrFutureReplica(Partition.scala:699)
at 
kafka.cluster.Partition.appendRecordsToFollowerOrFutureReplica(Partition.scala:715)
at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:157)
at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:307)
... 16 more
Caused by: java.lang.ClassNotFoundException: kafka.log.BatchMetadata
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 27 more

Other such exceptions included java.lang.NoClassDefFoundError: 
kafka/common/LongRef, java.lang.NoClassDefFoundError: kafka/log/LogAppendInfo$, 
java.lang.NoClassDefFoundError: kafka/api/LeaderAndIsr, 
java.lang.NoClassDefFoundError: org/apache/zookeeper/proto/SetWatches etc.


Do you need server logs for all brokers? The logs are >300MB in size; is it 
okay to upload them here on JIRA? I can upload the client logs, as they are 
small. I will try to extract the logs for the test duration and upload them 
here soon.







[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784387#comment-16784387
 ] 

Rajini Sivaram commented on KAFKA-7925:
---

[~xabhi] Thank you for trying out the PR. It looks like the solution I was 
attempting may be failing authentication. Can you attach the full client and 
broker logs at DEBUG level to the JIRA? They only need to cover a short time, 
but the full set of logs and exceptions will be useful. I will have to see if I 
can set up a test environment before submitting a new PR.


[jira] [Resolved] (KAFKA-7966) Flaky Test DynamicBrokerReconfigurationTest#testLogCleanerConfig

2019-03-05 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-7966.
---
Resolution: Duplicate

> Flaky Test DynamicBrokerReconfigurationTest#testLogCleanerConfig
> 
>
> Key: KAFKA-7966
> URL: https://issues.apache.org/jira/browse/KAFKA-7966
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/22/]
> {quote}java.lang.AssertionError: Partition [__consumer_offsets,0] metadata 
> not propagated after 15000 ms at 
> kafka.utils.TestUtils$.fail(TestUtils.scala:356) at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:766) at 
> kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:855) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:303) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$1$adapted(TestUtils.scala:302) at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at 
> scala.collection.immutable.Range.foreach(Range.scala:158) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:237) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:230) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> kafka.utils.TestUtils$.createTopic(TestUtils.scala:302) at 
> kafka.server.DynamicBrokerReconfigurationTest.setUp(DynamicBrokerReconfigurationTest.scala:137){quote}





[jira] [Commented] (KAFKA-7966) Flaky Test DynamicBrokerReconfigurationTest#testLogCleanerConfig

2019-03-05 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784379#comment-16784379
 ] 

Rajini Sivaram commented on KAFKA-7966:
---

DynamicBrokerReconfigurationTest creates the __consumer_offsets topic with the 
default of 50 partitions. From the exception it looks like the test failed to 
create the topic and propagate its metadata within the 15-second timeout. This 
happens in the test setup, so there are no other timing issues here, just a 
slow test machine. I am changing the number of offsets partitions to the value 
used in other tests, which is 5. The change is included in the PR for 
KAFKA-7976 (https://github.com/apache/kafka/pull/6374).
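In broker-configuration terms, the change described above amounts to 
overriding the following property for the test brokers (shown here only for 
illustration):

{code}
# Create __consumer_offsets with 5 partitions in tests instead of the default 50
offsets.topic.num.partitions=5
{code}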




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7976) Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable

2019-03-05 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-7976:
-

Assignee: Rajini Sivaram

> Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable
> ---
>
> Key: KAFKA-7976
> URL: https://issues.apache.org/jira/browse/KAFKA-7976
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for
> all observed test failures.
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/28/]
> {quote}java.lang.AssertionError: Unclean leader not elected
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:488){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7966) Flaky Test DynamicBrokerReconfigurationTest#testLogCleanerConfig

2019-03-05 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-7966:
-

Assignee: Rajini Sivaram

> Flaky Test DynamicBrokerReconfigurationTest#testLogCleanerConfig
> 
>
> Key: KAFKA-7966
> URL: https://issues.apache.org/jira/browse/KAFKA-7966
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for
> all observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/22/]
> {quote}java.lang.AssertionError: Partition [__consumer_offsets,0] metadata 
> not propagated after 15000 ms at 
> kafka.utils.TestUtils$.fail(TestUtils.scala:356) at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:766) at 
> kafka.utils.TestUtils$.waitUntilMetadataIsPropagated(TestUtils.scala:855) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:303) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$1$adapted(TestUtils.scala:302) at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at 
> scala.collection.immutable.Range.foreach(Range.scala:158) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:237) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:230) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> kafka.utils.TestUtils$.createTopic(TestUtils.scala:302) at 
> kafka.server.DynamicBrokerReconfigurationTest.setUp(DynamicBrokerReconfigurationTest.scala:137){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7988) Flaky Test DynamicBrokerReconfigurationTest#testThreadPoolResize

2019-03-05 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-7988:
-

Assignee: Rajini Sivaram

> Flaky Test DynamicBrokerReconfigurationTest#testThreadPoolResize
> 
>
> Key: KAFKA-7988
> URL: https://issues.apache.org/jira/browse/KAFKA-7988
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for
> all observed test failures.
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/30/]
> {quote}kafka.server.DynamicBrokerReconfigurationTest > testThreadPoolResize 
> FAILED java.lang.AssertionError: Invalid threads: expected 6, got 5: 
> List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, 
> ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-1) 
> at org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1260)
>  at 
> kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:531)
>  at 
> kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:550)
>  at 
> kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:536)
>  at 
> kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:559)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) at 
> kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:558)
>  at 
> kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:572){quote}
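
For context, testThreadPoolResize exercises dynamic resizing of the replica 
fetcher pool (the duplicated ReplicaFetcherThread names above suggest a stale 
thread surviving a resize). A minimal sketch of that kind of dynamic change 
through the Java AdminClient, where "localhost:9092" and broker id "0" are 
placeholders:

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

// Sketch only: dynamically change num.replica.fetchers on a live broker,
// which is the operation the test verifies. Placeholders: bootstrap server
// and broker id.
public class ResizeFetcherPool {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config update = new Config(Collections.singleton(
                    new ConfigEntry("num.replica.fetchers", "3")));
            admin.alterConfigs(Collections.singletonMap(broker, update)).all().get();
        }
    }
}
{code}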



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7976) Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable

2019-03-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784373#comment-16784373
 ] 

ASF GitHub Bot commented on KAFKA-7976:
---

omkreddy commented on pull request #6358: [Do not merge]KAFKA-7976: Increase 
TestUtils DEFAULT_MAX_WAIT_MS to 20 seconds 
URL: https://github.com/apache/kafka/pull/6358
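
For readers unfamiliar with the constant being tuned: TestUtils.waitUntilTrue 
polls a condition until a maximum wait elapses, and the 15000 ms in the 
failures above is that default, which the PR raises to 20 seconds. A rough 
Java sketch of the shape of such a helper (illustrative only; Kafka's real 
implementation is the Scala kafka.utils.TestUtils):

{code:java}
import java.util.function.BooleanSupplier;

// Rough shape of a waitUntilTrue-style test helper; not Kafka's actual code.
public final class Waits {
    public static void waitUntilTrue(BooleanSupplier condition, String failMessage,
                                     long maxWaitMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError(failMessage);
            }
            Thread.sleep(100); // polling pause between condition checks
        }
    }
}
{code}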
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable
> ---
>
> Key: KAFKA-7976
> URL: https://issues.apache.org/jira/browse/KAFKA-7976
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for
> all observed test failures.
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/28/]
> {quote}java.lang.AssertionError: Unclean leader not elected
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:488){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784365#comment-16784365
 ] 

Abhi commented on KAFKA-7925:
-

[~rsivaram]

I am also seeing a similar exception when running the consumer group command 
(on the setup running your patch). I don't get this exception on the other 
Kafka setup, which runs v2.1.1:

env KAFKA_LOG4J_CONFIG=log4j.properties 
KAFKA_OPTS="-Dsun.security.jgss.native=true 
-Dsun.security.jgss.lib=/usr/libexec/libgsswrap.so 
-Djavax.security.auth.useSubjectCredsOnly=false 
-Djava.security.auth.login.config=jaas.conf" 
/proj/tools/infra/apache/kafka/kafka_2.12-2.1.1/bin/kafka-consumer-groups.sh 
--bootstrap-server mwkafka-prod-01.nyc:9092 --command-config command_config  
--list  
Error: Executing consumer group command failed due to 
org.apache.kafka.common.errors.UnknownServerException: Error listing groups on 
mwkafka-prod-01.dr.xxx.com:9092 (id: 3 rack: dr.xxx.com)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.UnknownServerException: Error listing groups on 
mwkafka-prod-01.dr.xxx.com:9092 (id: 3 rack: dr.xxx.com)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
at 
kafka.admin.ConsumerGroupCommand$ConsumerGroupService.listGroups(ConsumerGroupCommand.scala:132)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:58)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.UnknownServerException: Error listing 
groups on mwkafka-prod-01.dr.xxx.com:9092 (id: 3 rack: dr.xxx.com)
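
For reference, the stack trace shows the CLI delegating to the Java 
AdminClient's group-listing call, so the same path can be exercised directly. 
A minimal sketch, assuming the SASL/Kerberos settings from the command_config 
file are added as client properties:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ConsumerGroupListing;

// Sketch only: reproduce the listing path of kafka-consumer-groups.sh --list.
// The bootstrap server is taken from the command above; the SASL/Kerberos
// settings that live in command_config would need to be added to props.
public class ListGroups {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "mwkafka-prod-01.nyc:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            for (ConsumerGroupListing group : admin.listConsumerGroups().all().get()) {
                System.out.println(group.groupId());
            }
        }
    }
}
{code}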


> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% CPU usage on all brokers in our Kafka cluster, even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our Kafka setup, which is 
> not yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in the CLOSE_WAIT state (see 
> below). In thread usage, these threads:
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  take up more than 90% of the CPU time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 
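
As an aside, the per-thread CPU attribution quoted above (the 60 s interval 
measurement) can be approximated in-process with the JDK's ThreadMXBean. A 
hypothetical sketch, not the reporter's tooling:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: sample per-thread CPU over a 60 s window, similar to
// the measurement quoted in the report. Run inside the JVM being measured
// (e.g. via a java agent) or adapt to remote JMX.
public class ThreadCpuSampler {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // -1 if dead/unsupported
        }
        Thread.sleep(60_000); // 60 s sampling window
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            ThreadInfo info = mx.getThreadInfo(ids[i]);
            if (info == null || before[i] < 0 || after < 0) continue;
            double pct = 100.0 * (after - before[i]) / 60_000_000_000L; // ns -> % of one core
            if (pct > 0) {
                System.out.printf("%s: %.1f%% of one core%n", info.getThreadName(), pct);
            }
        }
    }
}
{code}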

[jira] [Commented] (KAFKA-7703) KafkaConsumer.position may return a wrong offset after "seekToEnd" is called

2019-03-05 Thread Nikita Glashenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784366#comment-16784366
 ] 

Nikita Glashenko commented on KAFKA-7703:
-

We stumbled upon this bug in production.

[~viktorsomogyi], are you still working on this issue?

If not, I can try to continue working on it.

 

> KafkaConsumer.position may return a wrong offset after "seekToEnd" is called
> 
>
> Key: KAFKA-7703
> URL: https://issues.apache.org/jira/browse/KAFKA-7703
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>
> After "seekToEnd" is called, "KafkaConsumer.position" may return a wrong 
> offset set by another reset request.
> Here is a reproducer: 
> https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246
> In this reproducer, "poll(0)" will send an "earliest" request in background. 
> However, after "seekToEnd" is called, due to a race condition in 
> "Fetcher.resetOffsetIfNeeded" (It's not atomic, "seekToEnd" could happen 
> between the check 
> https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246#diff-b45245913eaae46aa847d2615d62cde0R585
>  and the seek 
> https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246#diff-b45245913eaae46aa847d2615d62cde0R605),
>  "KafkaConsumer.position" may return an "earliest" offset.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7976) Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable

2019-03-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784356#comment-16784356
 ] 

ASF GitHub Bot commented on KAFKA-7976:
---

rajinisivaram commented on pull request #6374: KAFKA-7976 - Fix 
DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable
URL: https://github.com/apache/kafka/pull/6374
 
 
   Ensure that the controller is not shut down in the test.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test DynamicBrokerReconfigurationTest#testUncleanLeaderElectionEnable
> ---
>
> Key: KAFKA-7976
> URL: https://issues.apache.org/jira/browse/KAFKA-7976
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for
> all observed test failures.
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.2-jdk8/detail/kafka-2.2-jdk8/28/]
> {quote}java.lang.AssertionError: Unclean leader not elected
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> kafka.server.DynamicBrokerReconfigurationTest.testUncleanLeaderElectionEnable(DynamicBrokerReconfigurationTest.scala:488){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784298#comment-16784298
 ] 

Abhi commented on KAFKA-7925:
-

Hi, I deployed the patch in my setup, but now I am getting the below exception 
when trying to publish messages. The exception is received on the client side. 
I did not see any errors or warnings around this time (2019-03-05 04:16:25) in 
the server logs.

[2019-03-05 04:16:25,146] ERROR Uncaught exception in thread 
'kafka-producer-network-thread | test_prod': 
(org.apache.kafka.common.utils.KafkaThread)
deshaw.common.util.ApplicationDeath: 
org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request.
at 
deshaw.kafka.test.KafkaTestProducer$ProducerCallback.onCompletion(KafkaTestProducer.java:475)
at 
org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1304)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:685)
at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:635)
at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:557)
at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:786)
at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:557)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:549)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.UnknownServerException: The server 
experienced an unexpected error when processing the request.
[2019-03-05 04:17:25,144] DEBUG [Producer clientId=test_prod] Exception 
occurred during message send: (org.apache.kafka.clients.producer.KafkaProducer)

On a side note, after deploying this patch, I also observed a lot of connection 
disconnects:

[2019-03-05 01:23:39,553] DEBUG [SocketServer brokerId=1] Connection with 
/10.219.26.10 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:561)
at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
at kafka.network.Processor.poll(SocketServer.scala:830)
at kafka.network.Processor.run(SocketServer.scala:730)
at java.base/java.lang.Thread.run(Thread.java:834)
[2019-03-05 01:23:39,553] DEBUG [SocketServer brokerId=1] Connection with 
/10.219.26.10 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:561)
at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
at kafka.network.Processor.poll(SocketServer.scala:830)
at kafka.network.Processor.run(SocketServer.scala:730)
at java.base/java.lang.Thread.run(Thread.java:834)

[2019-03-05 01:27:57,386] DEBUG [Controller id=1, targetBrokerId=4] Connection 
with mwkafka-prod-02.dr.deshaw.com/10.218.247.23 disconnected 
(org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 

[jira] [Comment Edited] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784298#comment-16784298
 ] 

Abhi edited comment on KAFKA-7925 at 3/5/19 10:35 AM:
--

Hi, I deployed the patch in my setup, but now I am getting the below exception 
when trying to publish messages. The exception is received on the client side. 
I did not see any errors or warnings around this time (2019-03-05 04:16:25) in 
the server logs.

[2019-03-05 04:16:25,146] ERROR Uncaught exception in thread 
'kafka-producer-network-thread | test_prod': 
(org.apache.kafka.common.utils.KafkaThread)
common.util.ApplicationDeath: 
org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request.
at 
xxx.kafka.test.KafkaTestProducer$ProducerCallback.onCompletion(KafkaTestProducer.java:475)
at 
org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1304)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:685)
at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:635)
at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:557)
at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:786)
at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:557)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:549)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.UnknownServerException: The server 
experienced an unexpected error when processing the request.
[2019-03-05 04:17:25,144] DEBUG [Producer clientId=test_prod] Exception 
occurred during message send: (org.apache.kafka.clients.producer.KafkaProducer)

On a side note, after deploying this patch, I also observed a lot of connection 
disconnects:

[2019-03-05 01:23:39,553] DEBUG [SocketServer brokerId=1] Connection with 
/10.219.26.10 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:561)
at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
at kafka.network.Processor.poll(SocketServer.scala:830)
at kafka.network.Processor.run(SocketServer.scala:730)
at java.base/java.lang.Thread.run(Thread.java:834)
[2019-03-05 01:23:39,553] DEBUG [SocketServer brokerId=1] Connection with 
/10.219.26.10 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:561)
at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
at kafka.network.Processor.poll(SocketServer.scala:830)
at kafka.network.Processor.run(SocketServer.scala:730)
at java.base/java.lang.Thread.run(Thread.java:834)

[2019-03-05 01:27:57,386] DEBUG [Controller id=1, targetBrokerId=4] Connection 
with mwkafka-prod-02.dr.xxx.com/10.218.247.23 disconnected 
(org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
 

[jira] [Comment Edited] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-03-05 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784298#comment-16784298
 ] 

Abhi edited comment on KAFKA-7925 at 3/5/19 10:38 AM:
--

Hi, I deployed the patch in my setup, but now I am getting the below exception 
when trying to publish messages to 40 topics using the same producer. The 
exception is received on the client side. I did not see any errors or warnings 
around this time (2019-03-05 04:16:25) in the server logs.

[2019-03-05 04:16:25,146] ERROR Uncaught exception in thread 
'kafka-producer-network-thread | test_prod': 
(org.apache.kafka.common.utils.KafkaThread)
common.util.ApplicationDeath: 
org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request.
at 
xxx.kafka.test.KafkaTestProducer$ProducerCallback.onCompletion(KafkaTestProducer.java:475)
at 
org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1304)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
at 
org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:685)
at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:635)
at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:557)
at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:786)
at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:557)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:549)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.UnknownServerException: The server 
experienced an unexpected error when processing the request.
[2019-03-05 04:17:25,144] DEBUG [Producer clientId=test_prod] Exception 
occurred during message send: (org.apache.kafka.clients.producer.KafkaProducer)

On a side note, after deploying this patch, I also observed a lot of connection 
disconnects:

[2019-03-05 01:23:39,553] DEBUG [SocketServer brokerId=1] Connection with 
/10.219.26.10 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:561)
at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
at kafka.network.Processor.poll(SocketServer.scala:830)
at kafka.network.Processor.run(SocketServer.scala:730)
at java.base/java.lang.Thread.run(Thread.java:834)
[2019-03-05 01:23:39,553] DEBUG [SocketServer brokerId=1] Connection with 
/10.219.26.10 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 
org.apache.kafka.common.network.Selector.attemptRead(Selector.java:640)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:561)
at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
at kafka.network.Processor.poll(SocketServer.scala:830)
at kafka.network.Processor.run(SocketServer.scala:730)
at java.base/java.lang.Thread.run(Thread.java:834)

[2019-03-05 01:27:57,386] DEBUG [Controller id=1, targetBrokerId=4] Connection 
with mwkafka-prod-02.dr.xxx.com/10.218.247.23 disconnected 
(org.apache.kafka.common.network.Selector)
java.io.EOFException
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at 

[jira] [Updated] (KAFKA-7502) Cleanup KTable materialization logic in a single place

2019-03-05 Thread Lee Dongjin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lee Dongjin updated KAFKA-7502:
---
Priority: Major  (was: Minor)

> Cleanup KTable materialization logic in a single place
> --
>
> Key: KAFKA-7502
> URL: https://issues.apache.org/jira/browse/KAFKA-7502
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Lee Dongjin
>Priority: Major
>
> Today, since we pre-create all the `KTableXXX` operators along with the 
> logical nodes, we are effectively duplicating the logic that determines 
> whether the resulting KTable should be materialized. More specifically, the 
> materialization principle today is:
> 1) If users specify Materialized in the DSL and it contains a queryable name, 
> we always materialize.
> 2) If users specify Materialized in the DSL but it does not contain a 
> queryable name, or if users do not specify a Materialized object at all, 
> Streams may choose to materialize or not. In any case, even if the KTable is 
> materialized it will not be queryable, since there is no queryable name (i.e. 
> only storeName is non-null, while queryableName is null):
> 2.a) If the resulting KTable is from an aggregation, we always materialize, 
> since the store is needed to hold the aggregation result (i.e. we use the 
> MaterializedInternal constructor with nameProvider != null).
> 2.b) If the resulting KTable is from a source topic, we delay materialization 
> until a downstream operator requires this KTable to be materialized or to 
> send old values (see `KTableSourceNode` and `KTableSource`).
> 2.c) If the resulting KTable is from a join, we always materialize. This 
> could be optimized similarly to 2.b), but that is orthogonal to this ticket 
> (see `KTableImpl#buildJoin`, where we always use the constructor with 
> nameProvider != null).
> 2.d) If the resulting KTable is from a stateless operation like filter / 
> mapValues, we never materialize.
> 
> Now, in all of these cases we have logical nodes like "KTableKTableJoinNode" 
> as well as physical nodes like `ProcessorNode`. Ideally we should always 
> create the logical plan (i.e. the StreamsGraph) first, optimize it if 
> necessary, and then generate the physical plan (i.e. the Topology). However, 
> today we create some physical nodes beforehand, so the above logic is 
> duplicated in the creation of both physical and logical nodes. For example, 
> in `KTableKTableJoinNode` we check whether Materialized is null before adding 
> a state store, and in `KTableImpl#doJoin` we check whether materialized is 
> specified (case 2.c) above).
> Another example is TableProcessorNode, which is used for 2.d) above: it 
> contains the logic, while its caller, `KTableImpl#doFilter` for example, also 
> contains the logic for deciding whether to pass the `queryableName` parameter 
> to `KTableProcessorSupplier`.
> This is bug-prone, since we may update the logic in one class but forget to 
> update the other.
> --
> What we want is a cleaner code path, similar to what we have for 2.b): when 
> creating the logical nodes we keep track of whether 1) materialized is 
> specified and 2) a queryable name is provided. During the optimization phase 
> we may change the inner physical ProcessorBuilder's parameters, such as the 
> queryable name, and when it is time to generate the physical node we can 
> simply take those parameters as given.
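
To make the cases above concrete, a small DSL example (illustrative only, with 
placeholder topic and store names; it is not taken from the ticket):

{code:java}
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

// Illustrative sketch of the DSL calls behind the materialization cases.
public class MaterializationCases {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Case 1: Materialized with a queryable name -> always materialized,
        // and queryable via interactive queries.
        KTable<String, String> source = builder.table("input-topic",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("input-store"));

        // Case 2.d: stateless operation without Materialized -> never materialized.
        KTable<String, String> filtered = source.filter((key, value) -> value != null);

        // Case 2.b: a plain table() without Materialized may be materialized
        // lazily, only if a downstream operator needs it.
        KTable<String, String> lazy = builder.table("other-topic");

        builder.build(); // physical plan (Topology) generation happens here
    }
}
{code}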



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)