Re: Thread spilling sort issue with single task

2021-01-26 Thread German Schiavon
Well, if your data is skewed, I don't think the skew can be avoided, but it can be mitigated using skew techniques. I'd recommend taking a look at a "salted join".

On Tue, 26 Jan 2021 at 11:29, rajat kumar wrote:
> Hi,
>
> Yes I understand its skew based problem but how can it be avoided. Could
>
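A minimal sketch of the "salted join" idea mentioned above, written in plain Python so it runs without a cluster. It is an illustration, not the poster's code: the data, the `NUM_SALTS` value, and all function names are hypothetical. The rows of the skewed (large) side get a random salt appended to the join key, the small side is replicated once per salt value, and the join runs on the composite `(key, salt)` so that a single hot key is spread over several join keys. In Spark the same pattern would be expressed by adding a salt column to both DataFrames and joining on both columns.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # hypothetical; tune to how heavily the hot key dominates


def salt_large_side(rows, num_salts, rng):
    """Append a random salt to the key of each (key, value) row of the skewed side."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts):
    """Replicate each (key, value) row of the small side once per salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def left_join(left, right):
    """Plain hash-based left join on the first element of each row."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    out = []
    for key, value in left:
        # Unmatched left rows are kept with None on the right side.
        for match in index.get(key, [None]):
            out.append((key, value, match))
    return out


# Skewed left side: key 1 is the "hot" key.
large = [(1, f"event-{i}") for i in range(8)] + [(2, "event-8"), (3, "event-9")]
small = [(1, "alice"), (2, "bob")]

salted = left_join(
    salt_large_side(large, NUM_SALTS, random.Random(0)),
    explode_small_side(small, NUM_SALTS),
)

# Drop the salt again to recover the original key.
result = [(key, left_val, right_val) for (key, _salt), left_val, right_val in salted]
```

The salted join returns the same rows as the unsalted left join; the cost is that the small side grows by a factor of `NUM_SALTS`, so the salt count is a trade-off between spreading the hot key and duplicating data.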

Re: Thread spilling sort issue with single task

2021-01-26 Thread rajat kumar
Hi,

Yes, I understand it's a skew-based problem, but how can it be avoided? Could you please suggest something? I am on Spark 2.4.

Thanks,
Rajat

On Tue, Jan 26, 2021 at 3:58 PM German Schiavon wrote:
> Hi,
>
> One word: SKEW
>
> It seems the classic skew problem, you would have to apply skew techniques
>

Re: Thread spilling sort issue with single task

2021-01-26 Thread German Schiavon
Hi,

One word: SKEW

This looks like the classic skew problem. You would have to apply skew techniques to repartition your data properly or, if you are on Spark 3.0+, try the skewJoin optimization.

On Tue, 26 Jan 2021 at 11:20, rajat kumar wrote:
> Hi Everyone,
>
> I am running a spark application
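For the Spark 3.0+ route mentioned above, the skew-join optimization is part of adaptive query execution and is controlled by configuration. The sketch below only collects the two relevant settings and renders them as `spark-submit` arguments; the dict and helper names are hypothetical, and whether the optimization actually helps depends on the join type and data, so treat this as a starting point rather than a guaranteed fix. These settings are not available on Spark 2.4.

```python
# Spark 3.0+ settings that enable adaptive query execution (AQE) and its
# skew-join handling. The dict and helper names are illustrative only.
SKEW_JOIN_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def to_submit_args(conf):
    """Render settings as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(SKEW_JOIN_CONF)))
```

The same key/value pairs can instead be set on the session when it is built, which keeps the setting next to the job code.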

Thread spilling sort issue with single task

2021-01-26 Thread rajat kumar
Hi Everyone,

I am running a Spark application where I have applied 2 left joins. The 1st join is a broadcast join and the other one is a normal join. Out of 200 tasks, the last task is stuck. It is running at the "ANY" locality level. It seems to be a data skew issue. It is doing too much spill, and shuffle write is too