I'm using a fairly recent version of Spark (> 0.8) built from GitHub, and it's
failing with the following exception on a very simple task in the
spark-shell.
scala> val file = sc.textFile("hdfs://.......")
file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val errors = file.filter(line => line.contains("sometext"))
errors: org.apache.spark.rdd.RDD[String] = FilteredRDD[2] at filter at <console>:14

scala> errors.count()
org.apache.spark.SparkException: Job aborted: Task 0.0:32 failed more than 4 times
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:819)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:817)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:817)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:432)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:494)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:158)
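
In case it helps anyone reproduce this outside the shell, here is the same job as a
standalone snippet. This is just a sketch: the "local" master and the HDFS path are
placeholders I made up, since my real path is elided above.

import org.apache.spark.SparkContext

object CountMatches {
  def main(args: Array[String]): Unit = {
    // Placeholder master and app name; my real job runs against HDFS, not local mode.
    val sc = new SparkContext("local", "CountMatches")

    // Hypothetical path standing in for the elided "hdfs://......." above.
    val file = sc.textFile("hdfs://namenode:9000/path/to/file")
    val errors = file.filter(line => line.contains("sometext"))
    println(errors.count())

    sc.stop()
  }
}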