Ognen, can you comment on whether you were actually able to run two jobs concurrently just by restricting spark.cores.max? I run Shark on the same cluster and was not able to get a standalone job in (since Shark is a "long-running" job) until I restricted both spark.cores.max _and_ spark.executor.memory. Just curious whether I did something wrong.
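In case it helps compare notes, this is roughly how I'm capping each application so the standalone scheduler has cores and memory left over for the other one. The master URL and the numbers are just placeholders, not the values from your cluster:

import org.apache.spark.{SparkConf, SparkContext}

// Cap both total cores and per-executor memory so this application
// does not grab the whole cluster under the standalone scheduler.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder master URL
  .setAppName("capped-app")
  .set("spark.cores.max", "8")             // total cores this app may use
  .set("spark.executor.memory", "4g")      // memory per executor

val sc = new SparkContext(conf)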
On Mon, Mar 24, 2014 at 7:48 PM, Ognen Duzlevski <og...@plainvanillagames.com> wrote:
> Just so I can close this thread (in case anyone else runs into this stuff) -
> I did sleep through the basics of Spark ;). The answer on why my job is in
> waiting state (hanging) is here:
> http://spark.incubator.apache.org/docs/latest/spark-standalone.html#resource-scheduling
>
> Ognen
>
> On 3/24/14, 5:01 PM, Diana Carroll wrote:
>
> Ognen:
>
> I don't know why your process is hanging, sorry. But I do know that the way
> saveAsTextFile works is that you give it a path to a directory, not a file.
> The "file" is saved in multiple parts, corresponding to the partitions
> (part-00000, part-00001, etc.).
>
> (Presumably it does this because it allows each partition to be saved on the
> local disk, to minimize network traffic. It's how Hadoop works, too.)
>
> On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski <og...@nengoiksvelzud.com>
> wrote:
>>
>> Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
>> supposed to work? Meaning, can I save files to the HDFS fs this way?
>>
>> I tried:
>>
>> val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
>> r.saveAsTextFile("hdfs://ip:port/path/file.txt")
>>
>> and it is just hanging. At the same time, on my HDFS it created file.txt,
>> but as a directory which has subdirectories (the final one is empty).
>>
>> Thanks!
>> Ognen
>
> --
> "A distributed system is one in which the failure of a computer you didn't
> even know existed can render your own computer unusable"
> -- Leslie Lamport
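For anyone who finds this thread later, here is a minimal sketch of the directory-style save Diana describes. The hdfs://ip:port/path pieces are placeholders, the same as in Ognen's original example:

val r = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8))
// Pass a directory path; Spark writes one part file per partition into it.
r.saveAsTextFile("hdfs://ip:port/path/output_dir")

// On HDFS you then end up with something like:
//   /path/output_dir/_SUCCESS
//   /path/output_dir/part-00000
//   /path/output_dir/part-00001
//   ...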