Diana, thanks. I am not very well acquainted with HDFS. I use hdfs dfs -put to put things into the filesystem as files (and sc.textFile to get them back out in Spark), and I see that they appear to be saved as files replicated across 3 of the 16 nodes in the HDFS cluster (which in my case is also my Spark cluster), hence I was puzzled why I got a directory this time. What you are saying makes sense, I suppose. As for the hanging, I am curious about that myself.

Ognen

On 3/24/14, 5:01 PM, Diana Carroll wrote:
Ognen:

I don't know why your process is hanging, sorry. But I do know that saveAsTextFile takes a path to a directory, not a file. The "file" is saved in multiple parts, one per partition (part-00000, part-00001, etc.).

(Presumably it does this so that each partition can be written to local disk by the node that holds it, minimizing network traffic. It's how Hadoop works, too.)
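For example (a minimal sketch; the namenode host, port, and path below are made up), saving a small RDD produces a directory of part files, and you can read it back by pointing sc.textFile at that same directory:

    // assumes a running spark-shell with an HDFS cluster reachable at
    // hdfs://namenode:8020 -- substitute your own namenode host and port
    val r = sc.parallelize(1 to 8)
    r.saveAsTextFile("hdfs://namenode:8020/path/numbers")
    // creates /path/numbers/ containing part-00000, part-00001, ...
    // plus a _SUCCESS marker once the job finishes

    // reading it back: textFile accepts the directory path and reads
    // all the part files inside it
    val back = sc.textFile("hdfs://namenode:8020/path/numbers")
    back.collect().foreach(println)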




On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski <og...@nengoiksvelzud.com> wrote:

    Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
    supposed to work? Meaning, can I save files to HDFS this way?

    I tried:

    val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
    r.saveAsTextFile("hdfs://ip:port/path/file.txt")

    and it just hangs. At the same time, on HDFS it created file.txt,
    but as a directory containing subdirectories (the final one is
    empty).

    Thanks!
    Ognen

