Re: spark sql data skew

Jean Georges Perrin Fri, 13 Jul 2018 03:26:12 -0700

Just thinking out loud… could you repartition by key, for example on a
composite key built from company and userId?
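
To make the composite-key idea concrete, here is a rough sketch in plain Python (not Spark, so it runs anywhere). It only simulates a hash partitioner: the sample rows, the partition count, the `events` table name and the helper names are all made up for illustration, and the SQL string at the end is one possible rewrite that I have not tried on your data.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed partition count, for illustration only


def partition_of(key: str) -> int:
    # Deterministic stand-in for a hash partitioner (not Spark's own hash).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(rows, key_fn):
    # Count how many rows land in each simulated partition for a given key.
    sizes = [0] * NUM_PARTITIONS
    for row in rows:
        sizes[partition_of(key_fn(row))] += 1
    return sizes


def count_distinct_two_stage(rows):
    # Stage 1: deduplicate on the composite (company, userId) key.
    unique_pairs = {(company, user) for company, user in rows if user is not None}
    # Stage 2: count the surviving pairs per company.
    counts = defaultdict(int)
    for company, _ in unique_pairs:
        counts[company] += 1
    return dict(counts)


# Hypothetical sample: "big" is the skewed company, with 50 distinct users.
rows = [("big", u % 50) for u in range(1000)] + [
    ("small", 1), ("small", 2), ("small", 2),
]

by_company = partition_sizes(rows, lambda r: r[0])          # company key only
by_pair = partition_sizes(rows, lambda r: f"{r[0]}|{r[1]}")  # composite key
distinct_users = count_distinct_two_stage(rows)

# Possible Spark SQL form of the same two stages (table name is hypothetical).
SPARK_SQL_SKETCH = """
SELECT company, COUNT(userId) AS distinct_users
FROM (SELECT DISTINCT company, userId FROM events) t
GROUP BY company
"""

print("rows per partition, keyed by company:  ", by_company)
print("rows per partition, keyed by composite:", by_pair)
print("distinct users per company:            ", distinct_users)
```

Keyed by company alone, every row of the large company lands in the same partition. Keyed by the composite, those rows are split across partitions before the final per-company count, which then only sees one row per distinct user.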


How big is your dataset?

> On Jul 13, 2018, at 06:20, 崔苗 <cuim...@danale.com> wrote:
> 
> Hi,
> When I run count(distinct userId) grouped by company, I hit data skew and the
> task takes far too long. How can I count distinct values by key on skewed
> data in Spark SQL?
> 
> Thanks for any reply.
>
