[ https://issues.apache.org/jira/browse/SPARK-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cody Koeninger closed SPARK-15408.
----------------------------------
    Resolution: Cannot Reproduce

> Spark streaming app crashes with NotLeaderForPartitionException
> ----------------------------------------------------------------
>
>                 Key: SPARK-15408
>                 URL: https://issues.apache.org/jira/browse/SPARK-15408
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 64 bit
>            Reporter: Johny Mathew
>            Priority: Critical
>
> We have a Spark Streaming application reading from Kafka (with the Kafka
> Direct API), and it crashed with the exception shown below. We have a 5-node
> Kafka cluster with 19 partitions (replication factor 3). Even though the
> Spark application crashed, the other Kafka consumer apps were running fine.
> Only one of the 5 Kafka nodes was not working correctly (it did not go down).
>
> /opt/hadoop/bin/yarn application -status application_1463151451543_0007
> 16/05/13 20:09:56 INFO client.RMProxy: Connecting to ResourceManager at /172.16.130.189:8050
> Application Report :
>     Application-Id : application_1463151451543_0007
>     Application-Name : com.ibm.alchemy.eventgen.EventGenMetrics
>     Application-Type : SPARK
>     User : stack
>     Queue : default
>     Start-Time : 1463155034571
>     Finish-Time : 1463155310520
>     Progress : 100%
>     State : FINISHED
>     Final-State : FAILED
>     Tracking-URL : N/A
>     RPC Port : 0
>     AM Host : 172.16.130.188
>     Aggregate Resource Allocation : 9562329 MB-seconds, 2393 vcore-seconds
>     Diagnostics : User class threw exception: org.apache.spark.SparkException:
>     ArrayBuffer(kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     kafka.common.NotLeaderForPartitionException,
>     org.apache.spark.SparkException: Couldn't find leader offsets for
>     Set([alchemy-metrics,17], [alchemy-metrics,10], [alchemy-metrics,3],
>     [alchemy-metrics,4], [alchemy-metrics,9], [alchemy-metrics,15],
>     [alchemy-metrics,18], [alchemy-metrics,5]))
>
> We cleared the checkpoint and restarted the application, but it crashed
> again. Eventually we identified the misbehaving Kafka node and restarted it,
> which fixed the problem.
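
As a diagnostic aid for the report above, here is a minimal sketch (not part of the original report) that pulls the affected topic-partitions out of the exception text, so they can be compared against the current partition leaders. The helper name `failed_partitions` and the regular expression are illustrative assumptions; the sample message is copied from the diagnostics in the report.

```python
import re

# Hypothetical helper, not part of Spark or Kafka: matches "[topic,partition]"
# pairs as they appear in the "Couldn't find leader offsets" message.
_PAIR = re.compile(r"\[([^,\[\]]+),(\d+)\]")


def failed_partitions(message):
    """Return sorted (topic, partition) tuples named in the error message."""
    marker = "Couldn't find leader offsets for Set("
    start = message.find(marker)
    if start < 0:
        # The message does not contain the expected error text.
        return []
    body = message[start + len(marker):]
    return sorted((topic.strip(), int(part)) for topic, part in _PAIR.findall(body))


# Sample text taken from the Diagnostics field of the report.
sample = (
    "Couldn't find leader offsets for Set([alchemy-metrics,17], "
    "[alchemy-metrics,10], [alchemy-metrics,3], [alchemy-metrics,4], "
    "[alchemy-metrics,9], [alchemy-metrics,15], [alchemy-metrics,18], "
    "[alchemy-metrics,5]))"
)

for topic, partition in failed_partitions(sample):
    print(topic, partition)
```

The sample yields eight partitions, matching the eight NotLeaderForPartitionException entries in the diagnostics. Comparing them against the current partition leaders may help narrow down which broker is misbehaving.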