Diana, thanks. I am not very well acquainted with HDFS. I use hdfs dfs -put
to put things as files into the filesystem (and sc.textFile to get them
back out in Spark), and I see that they appear to be saved as files
replicated across 3 of the 16 nodes in the HDFS cluster (which in my
case is also my Spark cluster) - hence I was puzzled why a directory
this time. What you are saying makes sense, I suppose. As for the
hanging - I am curious about that myself.
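For the record, the round trip I am describing looks roughly like this
(the host, port, and paths are just placeholders):

// first: hdfs dfs -put events.txt /user/ognen/events.txt  (placeholder paths)
// then, in the spark-shell:
val lines = sc.textFile("hdfs://namenode:8020/user/ognen/events.txt")
lines.take(5).foreach(println) // sanity check: print the first few lines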
Ognen
On 3/24/14, 5:01 PM, Diana Carroll wrote:
Ognen:
I don't know why your process is hanging, sorry. But I do know that
the way saveAsTextFile works is that you give it a path to a
directory, not a file. The "file" is saved in multiple parts, one per
partition (part-00000, part-00001, etc.).
(Presumably it does this because it allows each partition to be saved
on the local disk, to minimize network traffic. It's how Hadoop
works, too.)
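For instance (the host, port, and path below are made up), something
like this should leave a directory of part files, one per partition:

val r = sc.parallelize(1 to 8, 4) // 4 partitions, so expect 4 part files
r.saveAsTextFile("hdfs://namenode:8020/user/ognen/out") // "out" is a directory

Afterwards, hdfs dfs -ls /user/ognen/out should list part-00000 through
part-00003 (plus, I believe, a _SUCCESS marker, since Spark writes
through the Hadoop output committer). If you really need a single file,
you can coalesce to one partition first, e.g.
r.coalesce(1).saveAsTextFile(...), at the cost of funneling all the
data through a single task.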
On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski
<og...@nengoiksvelzud.com> wrote:
Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
supposed to work? Meaning, can I save files to HDFS this way?
I tried:
// in the spark-shell, where sc is the SparkContext:
val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
r.saveAsTextFile("hdfs://ip:port/path/file.txt")
and it is just hanging. Meanwhile, on HDFS it created file.txt, but as
a directory containing subdirectories (the final one is empty).
Thanks!
Ognen