[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages

2016-02-16 Thread Rekha Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149068#comment-15149068
 ] 

Rekha Joshi edited comment on KAFKA-3238 at 2/17/16 3:46 AM:
-

anyone home? :) This has similarity to KAFKA-2048, but per my investigation I did 
not see the IllegalMonitorStateException in the logs. The trace here suggests the 
underlying fetcher thread is stopped. Thankfully the logging issue is addressed: 
KAFKA-1891 is fixed in Kafka 0.9. Maybe with MirrorMaker refactored in 0.9 and a 
few other fixes, moving to Kafka 0.9 is one option? However, that requires an 
overhaul of a production system at my end, so I am wondering if there are any other 
suggestions?
Would be great to have your inputs! Thanks [~nehanarkhede] [~junrao] [~jkreps]
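To illustrate what the WAITING state in the fetcher dump means, here is a minimal standalone sketch (plain Java, not Kafka code; the class and queue names are made up for illustration): with a bounded queue, once the consuming side stops draining, the producing thread parks indefinitely inside LinkedBlockingQueue.put, which is exactly the state the ConsumerFetcherThread shows in the trace below.

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Standalone illustration (not Kafka code): a bounded queue whose consumer
// stalls makes the producing thread park indefinitely in put(), matching the
// WAITING-at-LinkedBlockingQueue.put state in the fetcher thread dump.
public class BlockedFetcherSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical capacity, analogous to MirrorMaker's bounded chunk queue (--queue.size).
        LinkedBlockingQueue<byte[]> chunkQueue = new LinkedBlockingQueue<>(10);

        Thread fetcher = new Thread(() -> {
            try {
                while (true) {
                    // After 10 un-drained elements, this put() blocks until space frees up.
                    chunkQueue.put(new byte[5 * 1024]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fetcher-sketch");
        fetcher.start();

        // Nothing ever calls chunkQueue.take(), simulating a stalled producer side.
        TimeUnit.SECONDS.sleep(1);
        System.out.println("fetcher state: " + fetcher.getState()); // typically WAITING
    }
}
{code}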


was (Author: rekhajoshm):
anyone home? :) 
This has similarity to KAFKA-2048, but per my investigation I did not see the 
IllegalMonitorStateException in the logs. But the underlying fetcher thread appears 
to be completely stopped; the trace substantiates that. It is not clearly mentioned, 
but was KAFKA-2048 fixed in 0.9?
Would be great to have your inputs! Thanks [~nehanarkhede] [~junrao] [~jkreps]

[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages

2016-02-16 Thread Rekha Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149068#comment-15149068
 ] 

Rekha Joshi edited comment on KAFKA-3238 at 2/16/16 9:29 PM:
-

anyone home? :) 
This has similarity to KAFKA-2048, but per my investigation I did not see the 
IllegalMonitorStateException in the logs. But the underlying fetcher thread appears 
to be completely stopped; the trace substantiates that. It is not clearly mentioned, 
but was KAFKA-2048 fixed in 0.9?
Would be great to have your inputs! Thanks [~nehanarkhede] [~junrao] [~jkreps]


was (Author: rekhajoshm):
anyone home? :) Would be great to have your inputs! Thanks [~nehanarkhede] 
[~junrao] [~jkreps]

> Deadlock Mirrormaker consumer not fetching any messages
> ---
>
> Key: KAFKA-3238
> URL: https://issues.apache.org/jira/browse/KAFKA-3238
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Rekha Joshi
> Priority: Critical
>
> Hi,
> We have been seeing a consistent issue with mirroring between our data centers,
> happening randomly. Below are the details.
> Thanks
> Rekha
> {code}
> Source: AWS (13 Brokers)
> Destination: OTHER-DC (20 Brokers)
> Topic: mirrortest (source: 260 partitions, destination: 200 partitions)
> Connectivity: AWS Direct Connect (max 6Gbps)
> Data details: Source is receiving 40,000 msg/sec, each message is around
> 5KB
> Mirroring
> 
> Node: is at our OTHER-DC (24 Cores, 48 GB Memory)
> JVM Details: 1.8.0_51, -Xmx8G -Xms8G -XX:PermSize=96m -XX:MaxPermSize=96m
> -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> -XX:InitiatingHeapOccupancyPercent=35
> Launch script: kafka.tools.MirrorMaker --consumer.config
> consumer.properties --producer.config producer.properties --num.producers
> 1 --whitelist mirrortest --num.streams 1 --queue.size 10
> consumer.properties
> ---
> zookeeper.connect=
> group.id=KafkaMirror
> auto.offset.reset=smallest
> fetch.message.max.bytes=900
> zookeeper.connection.timeout.ms=6
> rebalance.max.retries=4
> rebalance.backoff.ms=5000
> producer.properties
> --
> metadata.broker.list=
> partitioner.class=
> producer.type=async
> When we start the mirroring job, everything works fine as expected.
> Eventually we hit an issue where the job stops consuming.
> At this stage:
> 1. No errors are seen in the MirrorMaker logs.
> 2. Consumer threads are not fetching any messages, and we see thread dumps
> as follows:
> "ConsumerFetcherThread-KafkaMirror_1454375262613-7b760d87-0-26" - Thread
> t@73
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <79b6d3ce> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awa
> i
> t(AbstractQueuedSynchronizer.java:2039)
> at
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350
> )
> at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
> at
> kafka.consumer.ConsumerFetcherThread.processPartitionData(ConsumerFetcher
> T
> hread.scala:49)
> at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfu
> n
> $apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
> at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfu
> n
> $apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
> at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
> at
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
> at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$m
> c
> V$sp(AbstractFetcherThread.scala:109)
> at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(A
> b
> stractFetcherThread.scala:109)
> at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(A
> b
> stractFetcherThread.scala:109)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThr
> e
> ad.scala:108)
> at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> Locked ownable synchronizers:
> - locked <199dc92d> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 3. The producer stops producing; in trace mode we notice it is handling 0
> events, and the thread dump is as follows:
> "ProducerSendThread--0" - Thread t@53
>   java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
> at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
> at sun.nio.ch.IOUtil.write(IOUtil.java:148)
> at 

[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages

2016-02-12 Thread Rekha Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145786#comment-15145786
 ] 

Rekha Joshi edited comment on KAFKA-3238 at 2/13/16 7:29 AM:
-

Hi,
 [~nehanarkhede] [~junrao] [~jkreps] :
I have seen a few similar deadlock-related issues on Kafka (KAFKA-914, 
KAFKA-702) and a few open issues in a similar vein. Looking forward to hearing 
your analysis/thoughts on this MirrorMaker issue soon. Thanks!
Thanks
Rekha
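
As a side note on chasing the suspected deadlocks mentioned above (KAFKA-914, KAFKA-702): besides jstack dumps, the JVM can be queried for deadlocked threads programmatically. Below is a minimal sketch using the standard ThreadMXBean API; it is a generic probe that I am assuming could be run inside (or attached via JMX to) the MirrorMaker JVM, not something shipped with Kafka.

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Generic deadlock probe: reports threads deadlocked on monitors or ownable
// synchronizers (e.g. ReentrantLock). Note it will NOT flag a thread that is
// merely parked in LinkedBlockingQueue.put waiting for queue space.
public class DeadlockProbe {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        if (ids == null) {
            System.out.println("No deadlocked threads detected.");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.printf("%s is waiting on %s held by %s%n",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
{code}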


was (Author: rekhajoshm):
Hi,
 [~nehanarkhede] [~junrao] [~jkreps] :
I have seen a few similar deadlock-related issues on Kafka (KAFKA-914, 
KAFKA-702) and was wondering if it relates to the data structures used, the 
locking pattern, or the JDK. For example: http://bugs.java.com/view_bug.do?bug_id=7011862 
Please let me know your analysis/thoughts on the MirrorMaker issue here? Thanks!
Thanks
Rekha

[jira] [Comment Edited] (KAFKA-3238) Deadlock Mirrormaker consumer not fetching any messages

2016-02-12 Thread Rekha Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145786#comment-15145786
 ] 

Rekha Joshi edited comment on KAFKA-3238 at 2/13/16 4:24 AM:
-

Hi,
 [~nehanarkhede] [~junrao] [~jkreps] :
I have seen a few similar deadlock-related issues on Kafka (KAFKA-914, 
KAFKA-702) and was wondering if it relates to the data structures used, the 
locking pattern, or the JDK. For example: http://bugs.java.com/view_bug.do?bug_id=7011862 
Please let me know your analysis/thoughts on the MirrorMaker issue here? Thanks!
Thanks
Rekha


was (Author: rekhajoshm):
Hi,
 [~nehanarkhede] [~junrao] [~jkreps] :
I have seen a few deadlock-related issues on Kafka (KAFKA-914, KAFKA-702) and 
was wondering if it relates to the data structures used, the locking pattern, 
or the JDK. For example: http://bugs.java.com/view_bug.do?bug_id=7011862 
Please let me know your analysis/thoughts on the MirrorMaker issue here? Thanks!
Thanks
Rekha
