Do you want the full file on all the machines, or just to write the
partitions that are already on each machine to disk?
If the latter, try rdd.saveAsTextFile("file:///tmp/mydata")
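
For example, in the spark-shell (untested sketch; the paths are just
examples):

  val rdd = sc.textFile("hdfs:///path/to/input")
  // With a file:// URI each executor writes the partitions it holds to
  // its own local filesystem, so the data ends up spread across the
  // cluster's local disks rather than back in HDFS.
  rdd.saveAsTextFile("file:///tmp/mydata")
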
On Tue, Feb 11, 2014 at 9:39 AM, David Thomas <[email protected]> wrote:
> I want it to be available on all machines in the cluster.
>
>
> On Tue, Feb 11, 2014 at 10:35 AM, Andrew Ash <[email protected]> wrote:
>
>> Do you want the files scattered across the local temp directories of all
>> your machines or just one of them? If just one, I'd recommend having your
>> driver program execute hadoop fs -getmerge /path/to/files... using Scala's
>> external process libraries.
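>>
>> Something along these lines, using scala.sys.process (untested sketch;
>> the paths are placeholders):
>>
>>   import scala.sys.process._
>>
>>   // Runs on the driver: merge all the part files under the HDFS path
>>   // into a single file on the driver's local disk.
>>   val exitCode = Seq("hadoop", "fs", "-getmerge",
>>     "/path/to/files", "/tmp/merged-output").!
>>   require(exitCode == 0, s"getmerge failed with exit code $exitCode")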
>>
>>
>> On Tue, Feb 11, 2014 at 9:18 AM, David Thomas <[email protected]> wrote:
>>
>>> I'm trying to copy a file from HDFS to a temp local directory within a
>>> map function, using a static method of FileUtil, and I get the error
>>> below. Is there a way to get around this?
>>>
>>> org.apache.spark.SparkException: Job aborted: Task not serializable:
>>> java.io.NotSerializableException: org.apache.hadoop.fs.Path
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>>>
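>>> For reference, this is roughly what the code looks like (simplified
>>> sketch; the names and paths are made up):
>>>
>>>   import org.apache.hadoop.conf.Configuration
>>>   import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
>>>
>>>   val srcPath = new Path("/some/hdfs/file")  // created on the driver
>>>   rdd.map { x =>
>>>     // srcPath is captured by this closure, and org.apache.hadoop.fs.Path
>>>     // is not serializable, hence the exception when Spark ships the task.
>>>     val conf = new Configuration()
>>>     val fs = FileSystem.get(conf)
>>>     FileUtil.copy(fs, srcPath, new java.io.File("/tmp/local"), false, conf)
>>>     x
>>>   }
>>>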
>>
>>
>