Well, if your data is skewed, I don't think the skew can be avoided, only
mitigated using skew-handling techniques.
I'd recommend taking a look at the "salted join" approach.
On Tue, 26 Jan 2021 at 11:29, rajat kumar
wrote:
Hi,
Yes, I understand it's a skew-based problem, but how can it be avoided? Could
you please suggest something?
I am on Spark 2.4.
Thanks
Rajat
On Tue, Jan 26, 2021 at 3:58 PM German Schiavon
wrote:
Hi,
One word: SKEW
It seems to be the classic skew problem. You would have to apply skew
techniques to repartition your data properly or, if you are on Spark 3.0+,
try the skew join optimization.
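For reference, on Spark 3.0+ the skew join optimization is part of adaptive
query execution and is switched on through configuration, so it does not apply
to Spark 2.x. A minimal sketch, assuming an existing SparkSession is passed in;
the helper name enable_skew_join and the SKEW_JOIN_SETTINGS dict are made up
for this example, while the two configuration keys are the standard Spark ones.

```python
# Settings that turn on the skew join optimization on Spark 3.0+.
# It only takes effect when adaptive query execution itself is enabled.
SKEW_JOIN_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def enable_skew_join(spark):
    """Apply the settings above to a session (hypothetical helper)."""
    for key, value in SKEW_JOIN_SETTINGS.items():
        spark.conf.set(key, value)
```

The same keys can also be passed with --conf on spark-submit instead of being
set in code.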
On Tue, 26 Jan 2021 at 11:20, rajat kumar
wrote:
Hi Everyone,
I am running a Spark application in which I have applied two left joins. The
first join is a broadcast join and the other is a normal join.
Out of 200 tasks, the last one is stuck. It is running at the "ANY" locality
level. It seems to be a data skew issue.
It is doing too much spill and shuffle write is too