I have a very simple Spark application that looks like the following:
import org.apache.spark.rdd.RDD

val myRdd: RDD[Array[String]] = initMyRdd()
println(myRdd.first.mkString(", "))
println(myRdd.count)
myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")  // works
myRdd.saveAsTextFile("target/mydir/")               // fails with the exception below
The println statements work as expected. The first saveAsTextFile
statement also works as expected. The second saveAsTextFile statement
does not (even if the first is commented out), and I get the exception
pasted below. If I inspect "target/mydir", I see a directory called
_temporary/0/_temporary/attempt_201401020953_0000_m_000000_1, which
contains an empty part-00000 file. This is curious because the same
code worked with Spark 0.8.0, and I am now running Spark 0.8.1. I
happen to be running this on Windows in "local" mode at the moment.
Perhaps I should try running it on my Linux box.
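One other thing I may try (just a guess on my part, and the path below is made up) is writing to a fully qualified file: URI instead of a relative path, in case the problem is how the relative path gets resolved on Windows:

// Assumption: an absolute file: URI; substitute a real local path.
myRdd.saveAsTextFile("file:///C:/work/myproject/target/mydir")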
Thanks,
Philip
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed more than 0 times; aborting job java.lang.NullPointerException
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)