For testing, maybe try using .collect() and comparing expected and actual in memory rather than on disk?
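Something like this, roughly (untested sketch; I'm substituting sc.parallelize with sample data for your initMyRdd, and assuming the dataset is small enough to collect on the driver):

  import org.apache.spark.SparkContext

  object RddCompareTest {
    def main(args: Array[String]) {
      // local mode is enough for a test, and nothing touches the filesystem
      val sc = new SparkContext("local", "rdd-compare-test")

      // stand-in for initMyRdd(): a small, known dataset
      val myRdd = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))

      // pull everything back to the driver and compare in memory;
      // only safe when the test data is small
      val actual = myRdd.collect().map(_.mkString(", ")).toSeq
      val expected = Seq("a, b", "c, d")
      assert(actual == expected, "expected " + expected + " but got " + actual)

      sc.stop()
    }
  }

That sidesteps saveAsTextFile (and its _temporary directory handling) entirely for test assertions.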

On Thu, Jan 2, 2014 at 12:54 PM, Philip Ogren <[email protected]> wrote:

> I just tried your suggestion and get the same results with the _temporary
> directory. Thanks though.
>
>
> On 1/2/2014 10:28 AM, Andrew Ash wrote:
>
> You want to write it to a local file on the machine? Try using
> "file:///path/to/target/mydir/" instead.
>
> I'm not sure what the behavior would be if you did this on a multi-machine
> cluster though -- you may get a bit of data on each machine in that local
> directory.
>
>
> On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <[email protected]> wrote:
>
>> I have a very simple Spark application that looks like the following:
>>
>>   var myRdd: RDD[Array[String]] = initMyRdd()
>>   println(myRdd.first.mkString(", "))
>>   println(myRdd.count)
>>
>>   myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>>   myRdd.saveAsTextFile("target/mydir/")
>>
>> The println statements work as expected. The first saveAsTextFile
>> statement also works as expected. The second saveAsTextFile statement
>> does not (even if the first is commented out). I get the exception pasted
>> below. If I inspect "target/mydir" I see that there is a directory called
>> _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1, which
>> contains an empty part-00000 file. It's curious because this code worked
>> before with Spark 0.8.0, and now I am running on Spark 0.8.1. I happen to
>> be running this on Windows in "local" mode at the moment. Perhaps I
>> should try running it on my Linux box.
>>
>> Thanks,
>> Philip
>>
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
>> Task 2.0:0 failed more than 0 times; aborting job
>> java.lang.NullPointerException
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>>   at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>>   at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
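For reference, the file:/// form suggested upthread would look roughly like this (a sketch only, using the placeholder path from the earlier message; whether it avoids the _temporary problem on Windows is untested):

  myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")  // works, per above
  // explicit local-filesystem URI instead of a bare relative path;
  // on a multi-machine cluster each worker would write its own
  // partitions to its own local disk under this directory
  myRdd.saveAsTextFile("file:///path/to/target/mydir/")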
