try divide and conquer: create a column x holding the first character of userId, then group by (company, x) and count distinct within each group. Since userIds starting with different characters can never be equal, the per-group distinct counts are disjoint and you can just sum them per company. If the groups are still too large, use the first two characters instead.
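A minimal sketch of the idea in plain Python (not Spark, just to show why summing the bucket counts is correct — the data and names are made up):

```python
# Prefix-split distinct count: userIds that start with different
# characters can never collide, so count(distinct userId) per
# (company, first-char) bucket sums to the true per-company count.
from collections import defaultdict

rows = [
    ("acme", "alice"), ("acme", "alice"), ("acme", "bob"),
    ("acme", "carol"), ("danale", "alice"), ("danale", "dave"),
]

# stage 1: collect distinct users per (company, prefix) bucket
buckets = defaultdict(set)
for company, user in rows:
    buckets[(company, user[0])].add(user)

# stage 2: sum the disjoint bucket counts per company
totals = defaultdict(int)
for (company, _prefix), users in buckets.items():
    totals[company] += len(users)

print(dict(totals))  # {'acme': 3, 'danale': 2}
```

In Spark SQL the same two stages would be an inner GROUP BY company, substr(userId, 1, 1) computing count(distinct userId), wrapped in an outer GROUP BY company that sums the partial counts — the inner aggregation spreads one hot company key across many smaller keys, which is what breaks the skew.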
On 17 July 2018 at 02:25, 崔苗 <cuim...@danale.com> wrote:
> 30G user data, how to get distinct users count after creating a composite
> key based on company and userid?
>
> On 2018-07-13 18:24:52, Jean Georges Perrin <j...@jgp.net> wrote:
>
> Just thinking out loud… repartition by key? create a composite key based
> on company and userid?
>
> How big is your dataset?
>
> On Jul 13, 2018, at 06:20, 崔苗 <cuim...@danale.com> wrote:
>
> Hi,
> when I want to count(distinct userId) by company, I hit data skew and
> the task takes too long. How to count distinct by keys on skewed data in
> Spark SQL?
>
> thanks for any reply