Do you want to write it to a local file on the machine? Try using "file:///path/to/target/mydir/" instead.
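Something like this minimal sketch (a toy RDD stands in for your initMyRdd(), and I'm assuming a "local" master as in your run):

    import org.apache.spark.SparkContext

    // Sketch only, not your exact app: the explicit file:// scheme tells the
    // Hadoop FileSystem layer to write to the local filesystem rather than
    // whatever fs.default.name points at.
    val sc = new SparkContext("local", "SaveToLocalFile")
    val myRdd = sc.parallelize(Seq("a, b, c", "d, e, f"))
    myRdd.saveAsTextFile("file:///path/to/target/mydir/")
    sc.stop()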
I'm not sure what the behavior would be if you did this on a multi-machine
cluster though -- you may get a bit of the data on each machine in that
local directory.

On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <[email protected]> wrote:

> I have a very simple Spark application that looks like the following:
>
> var myRdd: RDD[Array[String]] = initMyRdd()
> println(myRdd.first.mkString(", "))
> println(myRdd.count)
>
> myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
> myRdd.saveAsTextFile("target/mydir/")
>
> The println statements work as expected. The first saveAsTextFile
> statement also works as expected. The second saveAsTextFile statement
> does not (even if the first is commented out). I get the exception pasted
> below. If I inspect "target/mydir" I see that there is a directory called
> _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which
> contains an empty part-00000 file. It's curious because this code worked
> before with Spark 0.8.0 and now I am running on Spark 0.8.1. I happen to
> be running this on Windows in "local" mode at the moment. Perhaps I
> should try running it on my linux box.
>
> Thanks,
> Philip
>
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task 2.0:0 failed more than 0 times; aborting job
> java.lang.NullPointerException
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>   at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>   at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
