I just tried your suggestion and get the same results with the _temporary directory. Thanks though.

On 1/2/2014 10:28 AM, Andrew Ash wrote:
You want to write it to a local file on the machine? Try using "file:///path/to/target/mydir/" instead.

I'm not sure what the behavior would be if you did this on a multi-machine cluster, though -- you may end up with a bit of the data on each machine in that local directory.
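
Something like the following is what I had in mind -- an untested sketch, where the context setup, app name, and path are all placeholders:

import org.apache.spark.SparkContext

// minimal local-mode context, just to illustrate the call
val sc = new SparkContext("local", "SaveLocallyExample")
val myRdd = sc.parallelize(Seq("a", "b", "c"))

// an explicit file:// URI should make Hadoop's FileSystem layer write to
// the local filesystem rather than the configured default (e.g. HDFS)
myRdd.saveAsTextFile("file:///path/to/target/mydir/")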


On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <[email protected]> wrote:

    I have a very simple Spark application that looks like the following:


    import org.apache.spark.rdd.RDD

    // initMyRdd() is defined elsewhere in the application
    val myRdd: RDD[Array[String]] = initMyRdd()
    println(myRdd.first.mkString(", "))
    println(myRdd.count)

    // the HDFS write succeeds; the local relative-path write does not
    myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
    myRdd.saveAsTextFile("target/mydir/")


    The println statements work as expected.  The first saveAsTextFile
    statement also works as expected.  The second saveAsTextFile
    statement does not (even if the first is commented out).  I get
    the exception pasted below.  If I inspect "target/mydir" I see a
    directory called
    _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which
    contains an empty part-00000 file.  Curiously, this code worked
    with Spark 0.8.0; I am now running Spark 0.8.1, on Windows in
    "local" mode at the moment.  Perhaps I should try running it on
    my Linux box.
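
    To make that concrete, here is what I would expect on disk versus
    what is actually there (the expected layout is based on how
    saveAsTextFile normally behaves, so treat it as an assumption):

        expected: target/mydir/part-00000   (with content)
        actual:   target/mydir/_temporary/0/_temporary/attempt_201401020953_0000_m_000000_1/part-00000   (empty)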

    Thanks,
    Philip


    Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed more than 0 times; aborting job java.lang.NullPointerException
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)



