Just so I can close this thread (in case anyone else runs into this) -
I did sleep through the basics of Spark ;). The answer to why my job
was stuck in the waiting state (hanging) is here:
http://spark.incubator.apache.org/docs/latest/spark-standalone.html#resource-scheduling
Ognen
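For the archive: the linked page explains that in standalone mode an application requests all available cores by default, so a later application can sit waiting until cores free up. A minimal sketch of capping that request - `spark.cores.max` is the documented setting, while the app name and the value of 2 here are illustrative:

```scala
// Sketch: cap how many cores this app grabs in standalone mode so it
// doesn't starve other applications (the value 2 is illustrative).
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("example")               // illustrative name
  .set("spark.cores.max", "2")         // limit this app to 2 cores cluster-wide
val sc = new SparkContext(conf)
```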
On 3/24/14, 5:01 PM, Diana Carroll wrote:
Ognen:
I don't know why your process is hanging, sorry. But I do know that
the way saveAsTextFile works is that you give it a path to a
directory, not a file. The "file" is saved in multiple parts, one per
partition (part-00000, part-00001, etc.).
(Presumably it does this because it allows each partition to be saved
on the local disk, to minimize network traffic. It's how Hadoop
works, too.)
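That naming convention can be illustrated without a cluster. `partFileName` below is a hypothetical helper (not a Spark API) that mimics how Spark, via Hadoop's output format, names the saved partition files:

```scala
// Hypothetical helper mimicking the part-file naming convention that
// Spark (via Hadoop's output format) uses when saving partitions.
def partFileName(partition: Int): String = f"part-$partition%05d"

// A 4-partition RDD saved with saveAsTextFile would produce a directory
// containing part-00000 through part-00003 (plus a _SUCCESS marker file).
```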
On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski
<og...@nengoiksvelzud.com <mailto:og...@nengoiksvelzud.com>> wrote:
Is
someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
supposed to work? Meaning, can I save files to HDFS this way?
I tried:
val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
r.saveAsTextFile("hdfs://ip:port/path/file.txt")
and it is just hanging. At the same time, on my HDFS it created
file.txt - but as a directory containing subdirectories (the final
one is empty).
Thanks!
Ognen
--
"A distributed system is one in which the failure of a computer you didn't even know
existed can render your own computer unusable"
-- Leslie Lamport