Re: Re: Re: spark sql data skew

2018-07-23 Thread Gourav Sengupta
https://docs.databricks.com/spark/latest/spark-sql/skew-join.html

The above might help, in case you are using a join.

On Mon, Jul 23, 2018 at 4:49 AM, 崔苗 wrote:
> but how to get count(distinct userId) group by company from count(distinct
> userId) group by company+x? count(userId) is

Re: Re: Re: spark sql data skew

2018-07-22 Thread 崔苗
but how to get count(distinct userId) group by company from count(distinct userId) group by company+x? count(userId) is different from count(distinct userId)

On 2018-07-21 00:49:58, Xiaomeng Wan wrote:
> try divide and conquer, create a column x for the first character of userid, and group by
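The per-bucket counts can simply be summed: each userId falls into exactly one first-character bucket, so the (company, x) groups partition the distinct userIds and their sizes add up to count(distinct userId) per company. A minimal pure-Python sketch of the two phases, using made-up toy data (not the poster's 30G dataset):

```python
from collections import defaultdict

# Toy records of (company, userId); the data is invented for illustration.
records = [
    ("acme", "u1"), ("acme", "u2"), ("acme", "u2"), ("acme", "v9"),
    ("beta", "u1"), ("beta", "w3"), ("beta", "w3"),
]

# Phase 1: distinct userIds per (company, first character of userId).
buckets = defaultdict(set)
for company, user in records:
    buckets[(company, user[0])].add(user)

# Phase 2: sum the bucket sizes per company. The buckets are disjoint
# (each userId has exactly one first character), so the sums equal
# count(distinct userId) per company.
totals = defaultdict(int)
for (company, _), users in buckets.items():
    totals[company] += len(users)

print(dict(totals))  # {'acme': 3, 'beta': 2}
```

Note this only works for a partitioning salt like a userId prefix; a random salt would require a second distinct pass rather than a plain sum.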

Re: Re: spark sql data skew

2018-07-20 Thread Xiaomeng Wan
Try divide and conquer: create a column x for the first character of userid, and group by company+x. If the groups are still too large, try the first two characters.

On 17 July 2018 at 02:25, 崔苗 wrote:
> 30G user data, how to get distinct users count after creating a composite
> key based on company and userid?
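The point of the extra column is that the one hot company group splits into many smaller (company, x) groups. A toy pure-Python illustration of the group-size effect; the company names, userIds, and sizes are all invented:

```python
from collections import Counter

# Hypothetical skewed data: one company dominates; its userIds are
# digit strings so their first characters vary.
records = [("megacorp", str(i)) for i in range(1000)] + [("tiny", "a1")]

# Grouping by company alone leaves one huge group (the skewed key).
plain = Counter(company for company, _ in records)
print(plain.most_common(1))  # [('megacorp', 1000)]

# Grouping by (company, first character of userId) splits the hot group.
salted = Counter((company, user[0]) for company, user in records)
print(max(salted.values()))  # largest salted group is far below 1000
```

If the largest salted group is still too big, lengthening the prefix (e.g. `user[:2]`) splits it further, exactly as suggested above.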

Re: spark sql data skew

2018-07-13 Thread Jean Georges Perrin
Just thinking out loud… repartition by key? Create a composite key based on company and userid? How big is your dataset?

> On Jul 13, 2018, at 06:20, 崔苗 wrote:
>
> Hi,
> when I want to count(distinct userId) by company, I hit data skew and the
> task takes too long; how to count
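Repartitioning by a composite key spreads a hot company's rows across many partitions, whereas partitioning on company alone pins them all to one. A rough pure-Python sketch of why; the partition count and records are made up, and the `part` helper only mimics hash partitioning, it is not Spark's partitioner:

```python
import hashlib

def part(key: str, n: int = 8) -> int:
    # Deterministic partition id from a stable hash of the key
    # (a hypothetical stand-in for a hash partitioner; n is arbitrary).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

records = [("megacorp", f"u{i}") for i in range(100)]

# Partitioning by company alone: the hot company's rows all land in a
# single partition, so one task does all the work.
parts_by_company = {part(c) for c, _ in records}

# Partitioning by the composite key company|userId spreads those same
# rows across many partitions.
parts_by_composite = {part(c + "|" + u) for c, u in records}

print(len(parts_by_company), len(parts_by_composite))
```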