[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

2018-02-01 Thread Riccardo Vincelli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348996#comment-16348996
 ] 

Riccardo Vincelli commented on SPARK-19275:
---

Hi, I would like to point out that a timeout like this can also be a symptom of a 
long-running thread not getting enough CPU time. Have a look at the thread dump 
and be suspicious of threads sitting in RUNNABLE that are assigned to tasks which 
are not complex at all and usually take little time. Thanks,
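For anyone checking this, a minimal sketch (not from this ticket; the class name is 
hypothetical) of dumping all thread names, states and stacks with the standard 
management API, similar to what the Spark UI's per-executor thread dump page shows:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: print every thread's name, state and stack trace so that
// RUNNABLE threads parked on supposedly cheap tasks stand out.
public class ThreadDumpUtil {
    public static void dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            System.out.printf("\"%s\" %s%n", info.getThreadName(), info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}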

> Spark Streaming, Kafka receiver, "Failed to get records for ... after polling 
> for 512"
> --
>
> Key: SPARK-19275
> URL: https://issues.apache.org/jira/browse/SPARK-19275
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
> Environment: Apache Spark 2.0.0, Kafka 0.10 for Scala 2.11
>Reporter: Dmitry Ochnev
>Priority: Major
>
> We have a Spark Streaming application reading records from Kafka 0.10.
> Some tasks fail with the following error:
> "java.lang.AssertionError: assertion failed: Failed to get records for (...) 
> after polling for 512"
> The first attempt fails and the second attempt (the retry) completes 
> successfully; this is the pattern we see for many tasks in our logs. These 
> failures and retries consume resources.
> A similar case with a stack trace is described here:
> https://www.mail-archive.com/user@spark.apache.org/msg56564.html
> https://gist.github.com/SrikanthTati/c2e95c4ac689cd49aab817e24ec42767
> Here is the line from the stack trace where the error is raised:
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
> We tried several values for "spark.streaming.kafka.consumer.poll.ms" (2, 5, 
> 10, 30 and 60 seconds), but the error appeared in all cases except the last 
> one. Moreover, increasing the threshold also increased the total Spark stage 
> duration.
> In other words, increasing "spark.streaming.kafka.consumer.poll.ms" led to 
> fewer task failures, but at the cost of longer stage durations, which hurts 
> performance when processing data streams.
> We suspect there is a bug in CachedKafkaConsumer (and/or other related 
> classes) which inhibits the reading process.
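For reference (not part of the original report; the app name and the 60000 ms value 
are placeholders), a minimal sketch of where the poll timeout discussed above is 
set - it is a Spark property on the SparkConf, not a Kafka consumer property:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Placeholder app name and timeout value; the 5 s batch interval matches the
// snippets elsewhere in this thread.
SparkConf conf = new SparkConf()
    .setAppName("kafka-010-stream")
    .set("spark.streaming.kafka.consumer.poll.ms", "60000");
JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(5000));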






[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

2017-09-14 Thread Karan Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166112#comment-16166112
 ] 

Karan Singh commented on SPARK-19275:
-

Hi team,
My Spark Streaming batch duration is 5 seconds (5000 ms) and Kafka is at its 
default properties, yet I am still facing this issue. Can anyone tell me how to 
resolve it, or what I am doing wrong?
JavaStreamingContext ssc = new JavaStreamingContext(sc, new Duration(5000));


Exception in Spark Streaming:
Job aborted due to stage failure: Task 6 in stage 289.0 failed 4 times, most 
recent failure: Lost task 6.3 in stage 289.0 (xx, executor 2): 
java.lang.AssertionError: assertion failed: Failed to get records for 
spark-executor-xxx pulse 1 163684030 after polling for 512
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
at com.olacabs.analytics.engine.EngineManager$1$1.call(EngineManager.java:165)
at com.olacabs.analytics.engine.EngineManager$1$1.call(EngineManager.java:132)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
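Not part of the report above, but for context: a minimal sketch of the kafka-0-10 
direct stream that leads into the KafkaRDD/CachedKafkaConsumer path in the trace. 
The broker address and group id are placeholders; the topic name is taken from the 
error message above.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "broker1:9092");          // placeholder
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "example-group");                  // placeholder
kafkaParams.put("enable.auto.commit", false);

JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        ssc,                                     // the JavaStreamingContext above
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(
            Arrays.asList("pulse"),              // topic name as it appears in the error
            kafkaParams));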







[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

2017-05-18 Thread Aaquib Khwaja (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015591#comment-16015591
 ] 

Aaquib Khwaja commented on SPARK-19275:
---

Hi [~dmitry_iii], I also ran into a similar issue.
I've set the value of 'spark.streaming.kafka.consumer.poll.ms' to 60000, but 
I'm still running into issues.
Here is the stack trace and other details: 
http://stackoverflow.com/questions/44045323/sparkstreamingkafka-failed-to-get-records-after-polling-for-6

Also, below are some relevant configs:
batch.interval = 60s
spark.streaming.kafka.consumer.poll.ms = 60000
session.timeout.ms = 60000 (default: 30000)
heartbeat.interval.ms = 6000 (default: 3000)
request.timeout.ms = 90000 (default: 40000)

Any help would be great!
Thanks.
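In case it helps with comparing setups, a sketch (values simply mirror the list 
above; they are not a recommendation) of where each of those settings lives: the 
three Kafka timeouts go into the consumer parameters of the direct stream, while 
the poll timeout is a Spark property. Kafka's documentation generally expects 
heartbeat.interval.ms to sit well below session.timeout.ms and request.timeout.ms 
to sit above it, which the values above satisfy.

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;

// Spark-side poll timeout (matches the 60000 ms listed above).
SparkConf conf = new SparkConf()
    .set("spark.streaming.kafka.consumer.poll.ms", "60000");

// Kafka consumer timeouts, passed through the direct stream's kafkaParams map.
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("session.timeout.ms", 60000);
kafkaParams.put("heartbeat.interval.ms", 6000);
kafkaParams.put("request.timeout.ms", 90000);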







[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

2017-01-19 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830976#comment-15830976
 ] 

Shixiong Zhu commented on SPARK-19275:
--

This error usually means that Spark cannot fetch records from the Kafka brokers in 
time. Could you check the Kafka side to see if there are any long GC pauses?
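A small sketch (not from this thread) of reading cumulative GC counters with the 
standard management API; for the brokers themselves the same MXBeans are usually 
read remotely over JMX, or long pauses are spotted directly in the brokers' GC logs:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Approximate accumulated collection count and time per collector in this JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}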



