Well, my long-running app has 512M per executor on a 16-node cluster where each machine has 16G of RAM. I could not run a second application until I restricted spark.cores.max. As soon as I restricted the cores, I was able to run a second job at the same time.
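Roughly what that looks like, as a minimal sketch (the master URL, app name and core count below are placeholders, not my actual values):

import org.apache.spark.{SparkConf, SparkContext}

// Cap what this application may claim on the standalone cluster so a
// second application can be scheduled alongside it. The property names
// are standard Spark settings; the values here are only illustrative.
val conf = new SparkConf()
  .setAppName("long-running-app")           // illustrative name
  .setMaster("spark://master-host:7077")    // placeholder master URL
  .set("spark.cores.max", "64")             // leave cores free for other apps
  .set("spark.executor.memory", "512m")     // 512M per executor, as above
val sc = new SparkContext(conf)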

Ognen

On 3/24/14, 7:46 PM, Yana Kadiyska wrote:
Ognen, can you comment on whether you were actually able to run two jobs
concurrently by just restricting spark.cores.max? I run Shark on the
same cluster and was not able to see a standalone job get in (since
Shark is a "long-running" job) until I restricted both spark.cores.max
_and_ spark.executor.memory. Just curious if I did something wrong.

On Mon, Mar 24, 2014 at 7:48 PM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
Just so I can close this thread (in case anyone else runs into this) -
I did sleep through the basics of Spark ;). The answer to why my job is in
a waiting state (hanging) is here:
http://spark.incubator.apache.org/docs/latest/spark-standalone.html#resource-scheduling


Ognen

On 3/24/14, 5:01 PM, Diana Carroll wrote:

Ognen:

I don't know why your process is hanging, sorry.  But I do know that the way
saveAsTextFile works is that you give it a path to a directory, not a file.
The "file" is saved in multiple parts, corresponding to the partitions
(part-00000, part-00001, etc.).

(Presumably it does this so each partition can be written out to the local
disk of the node that holds it, minimizing network traffic.  It's how Hadoop
works, too.)




On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski <og...@nengoiksvelzud.com>
wrote:
Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
supposed to work? Meaning, can I save files to HDFS this way?

I tried:

val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
r.saveAsTextFile("hdfs://ip:port/path/file.txt")

and it just hangs. At the same time, on HDFS it created file.txt,
but as a directory containing subdirectories (the final one is empty).

Thanks!
Ognen
