[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages
[ https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149068#comment-15149068 ]

Rekha Joshi edited comment on KAFKA-3238 at 2/17/16 3:46 AM:
-
anyone home? :) This is similar to KAFKA-2048, but in my investigation I did not see the IllegalMonitorStateException in the logs. The trace here suggests the underlying fetcher thread is stopped. Thankfully, for having the logging issue corrected, KAFKA-1891 is fixed in Kafka 0.9. With mirrormaker refactored in 0.9, along with a few other fixes, maybe Kafka 0.9 is one option? However, that requires an overhaul of a production system at my end, so I am wondering if there are any other suggestions. Would be great to have your inputs! thanks [~nehanarkhede] [~junrao] [~jkreps]

was (Author: rekhajoshm):
anyone home? :) This is similar to KAFKA-2048, but in my investigation I did not see the IllegalMonitorStateException in the logs; the trace shows the underlying fetcher thread to be completely stopped. It is not clearly mentioned, but was KAFKA-2048 fixed in 0.9? Would be great to have your inputs! thanks [~nehanarkhede] [~junrao] [~jkreps]

> Deadlock Mirrormaker consumer not fetching any messages
> ---
>
> Key: KAFKA-3238
> URL: https://issues.apache.org/jira/browse/KAFKA-3238
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Rekha Joshi
> Priority: Critical
>
> Hi,
> We have been seeing a consistent issue mirroring between our data centers, happening randomly. Below are the details.
> Thanks
> Rekha
>
> {code}
> Source: AWS (13 Brokers)
> Destination: OTHER-DC (20 Brokers)
> Topic: mirrortest (source: 260 partitions, destination: 200 partitions)
> Connectivity: AWS Direct Connect (max 6Gbps)
> Data details: Source is receiving 40,000 msg/sec, each message is around 5KB
>
> Mirroring
>
> Node: is at our OTHER-DC (24 Cores, 48 GB Memory)
> JVM Details: 1.8.0_51, -Xmx8G -Xms8G -XX:PermSize=96m -XX:MaxPermSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
> Launch script: kafka.tools.MirrorMaker --consumer.config consumer.properties --producer.config producer.properties --num.producers 1 --whitelist mirrortest --num.streams 1 --queue.size 10
>
> consumer.properties
> ---
> zookeeper.connect=
> group.id=KafkaMirror
> auto.offset.reset=smallest
> fetch.message.max.bytes=900
> zookeeper.connection.timeout.ms=6
> rebalance.max.retries=4
> rebalance.backoff.ms=5000
>
> producer.properties
> --
> metadata.broker.list=
> partitioner.class=
> producer.type=async
>
> When we start the mirroring job everything works fine as expected. Eventually we hit an issue where the job stops consuming.
> At this stage:
> 1. No error is seen in the mirrormaker logs
> 2.
> Consumer threads are not fetching any messages, and we see thread dumps as follows:
>
> "ConsumerFetcherThread-KafkaMirror_1454375262613-7b760d87-0-26" - Thread t@73
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <79b6d3ce> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
> at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
> at kafka.consumer.ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
> at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
> at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
>
> Locked ownable synchronizers:
> - locked <199dc92d> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {code}
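The dump above shows the classic shape of this stall: the fetcher thread is parked inside `LinkedBlockingQueue.put()` (the queue bounded by `--queue.size 10` is full) while it holds the fetcher's ReentrantLock via `Utils.inLock`, so once the downstream producer stops draining the queue, the fetcher can never make progress or release the lock. Below is a minimal standalone sketch of that failure mode - not MirrorMaker code, and all names are hypothetical. It uses a timed `offer()` instead of `put()` so the stall is observable rather than hanging the demo:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BlockedFetcherSketch {
    public static void main(String[] args) throws InterruptedException {
        // Small bounded queue, standing in for MirrorMaker's --queue.size data channel.
        LinkedBlockingQueue<String> chunkQueue = new LinkedBlockingQueue<>(2);
        ReentrantLock fetcherLock = new ReentrantLock();

        // The consumer side has stopped draining, so the queue fills up.
        chunkQueue.put("chunk-1");
        chunkQueue.put("chunk-2"); // queue is now at capacity

        fetcherLock.lock(); // fetcher enqueues while holding its lock, as in Utils.inLock
        try {
            // A real put() would park here forever, exactly as in the thread dump
            // (WAITING in LinkedBlockingQueue.put while "locked <...> ReentrantLock").
            // offer() with a timeout demonstrates the blocked enqueue without hanging.
            boolean enqueued = chunkQueue.offer("chunk-3", 100, TimeUnit.MILLISECONDS);
            System.out.println("enqueued=" + enqueued); // false while nothing drains the queue
        } finally {
            fetcherLock.unlock();
        }
    }
}
```

In this state no deadlock detector fires: the fetcher is merely WAITING on a queue condition, which matches observation 1 above (no errors in the logs) - the pipeline is simply wedged until the queue is drained or the process is restarted.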
[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages
[ https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149068#comment-15149068 ]

Rekha Joshi edited comment on KAFKA-3238 at 2/16/16 9:29 PM:
-
anyone home? :) This is similar to KAFKA-2048, but in my investigation I did not see the IllegalMonitorStateException in the logs; the trace shows the underlying fetcher thread to be completely stopped. It is not clearly mentioned, but was KAFKA-2048 fixed in 0.9? Would be great to have your inputs! thanks [~nehanarkhede] [~junrao] [~jkreps]

was (Author: rekhajoshm):
anyone home? :) would be great to have your inputs! thanks [~nehanarkhede] [~junrao] [~jkreps]
> 3. Producer stops producing; in trace mode we notice it is handling 0 events, and the thread dump is as follows:
>
> "ProducerSendThread--0" - Thread t@53
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
> at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
> at sun.nio.ch.IOUtil.write(IOUtil.java:148)
[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages
[ https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145786#comment-15145786 ]

Rekha Joshi edited comment on KAFKA-3238 at 2/13/16 7:29 AM:
-
Hi, [~nehanarkhede] [~junrao] [~jkreps]: I have seen a few similar deadlock-related issues on Kafka - KAFKA-914, KAFKA-702 - and a few open issues in a similar vein. Looking forward to hearing your analysis/thoughts on this mirrormaker issue soon. thanks! Thanks Rekha

was (Author: rekhajoshm):
Hi, [~nehanarkhede] [~junrao] [~jkreps]: I have seen a few similar deadlock-related issues on Kafka - KAFKA-914, KAFKA-702 - and was wondering if it relates to the data structures used, the locking pattern, or the JDK. For example: http://bugs.java.com/view_bug.do?bug_id=7011862 Please let me know your analysis/thoughts on the mirrormaker issue here? thanks! Thanks Rekha
[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages
[ https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145786#comment-15145786 ]

Rekha Joshi edited comment on KAFKA-3238 at 2/13/16 4:24 AM:
-
Hi, [~nehanarkhede] [~junrao] [~jkreps]: I have seen a few similar deadlock-related issues on Kafka - KAFKA-914, KAFKA-702 - and was wondering if it relates to the data structures used, the locking pattern, or the JDK. For example: http://bugs.java.com/view_bug.do?bug_id=7011862 Please let me know your analysis/thoughts on the mirrormaker issue here? thanks! Thanks Rekha

was (Author: rekhajoshm):
Hi, [~nehanarkhede] [~junrao] [~jkreps]: I have seen a few deadlock-related issues on Kafka - KAFKA-914, KAFKA-702 - and was wondering if it relates to the data structures used, the locking pattern, or the JDK. For example: http://bugs.java.com/view_bug.do?bug_id=7011862 Please let me know your analysis/thoughts on the mirrormaker issue here? thanks! Thanks Rekha