Hi,

If your job gets stuck or fails, one best practice is to increase the
number of partitions.
Also, you'd be better off using DataFrames instead of RDDs, since
DataFrame joins benefit from join optimization.
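
As a rough illustration of why the partition count matters here, the
sketch below estimates how many partitions each input would get by
default and what a larger target might look like. It assumes the RDDs
are read from HDFS with roughly one partition per 128 MB block, that
all 32 cores on each of the 2 workers are available to executors
(this depends on the YARN settings), and it uses the 2-3 tasks per core
rule of thumb from the Spark tuning guide. The function names are
hypothetical, not part of any Spark API.

```python
import math


def estimated_input_partitions(size_mb, block_mb=128):
    """Roughly one partition per HDFS block (assumption: file-based input)."""
    return max(1, math.ceil(size_mb / block_mb))


def suggested_partitions(total_cores, tasks_per_core=3):
    """Rule of thumb: 2-3 tasks per CPU core available to the job."""
    return total_cores * tasks_per_core


# Sizes and cluster shape taken from the quoted message below.
left = estimated_input_partitions(800)   # the 800 MB RDD
right = estimated_input_partitions(190)  # the 190 MB RDD

# Assumption: both workers expose all 32 cores to Spark executors.
total_cores = 2 * 32
target = suggested_partitions(total_cores)

print("default partitions:", left, "and", right)
print("suggested join partitions:", target)
```

A number in that range can then be passed as the second argument to
the join, e.g. rdd1.join(rdd2, numPartitions), so the shuffle is spread
over more, smaller tasks. Treat it as a starting point to tune rather
than a fixed answer.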

// maropu


On Thu, May 26, 2016 at 11:40 PM, Priya Ch <learnings.chitt...@gmail.com>
wrote:

> Hello Team,
>
>
>  I am trying to join 2 RDDs, where one is 800 MB in size and the
> other is 190 MB. During the join step, my job halts and I don't see
> any progress in the execution.
>
> This is the message I see on the console:
>
> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
> locations for shuffle 0 to <hostname1>:40000
> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
> locations for shuffle 1 to <hostname2>:40000
>
> After these messages, I don't see any progress. I am using Spark 1.6.0
> with the YARN scheduler (running in YARN client mode). My cluster
> configuration is a 3-node cluster (1 master and 2 slaves). Each slave has
> 1 TB of hard disk space, 300 GB of memory, and 32 cores.
>
> HDFS block size is 128 MB.
>
> Thanks,
> Padma Ch
>



-- 
---
Takeshi Yamamuro
