The user data is 30 GB. How do I get the distinct user count after creating a composite key
based on company and userId?
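
One common answer is a two-stage aggregation. First deduplicate on the composite key (company, userId), then count the remaining rows per company. The sketch below simulates those two stages in plain Python rather than Spark, and the table name `events` in the SQL comment is hypothetical. The idea behind it: the first shuffle is keyed by the pair, so a large company's users are spread over many partitions. The second stage is a plain count, which can be partially aggregated before its shuffle.

```python
from collections import Counter


def distinct_users_per_company(rows):
    """Distinct userId count per company via a two-stage aggregation.

    Roughly the Spark SQL query (table name `events` is hypothetical):
        SELECT company, COUNT(*) AS distinct_users
        FROM (SELECT DISTINCT company, userId FROM events) t
        GROUP BY company
    """
    # Stage 1: deduplicate on the composite key (company, userId).
    # Shuffling on the pair spreads a large company over many partitions.
    distinct_pairs = set(rows)
    # Stage 2: count the surviving pairs per company. A plain count can be
    # partially aggregated before the shuffle, so the hot company receives
    # only one partial count per upstream partition.
    return dict(Counter(company for company, _user in distinct_pairs))


rows = [("A", "u1"), ("A", "u1"), ("A", "u2"),
        ("B", "u1"), ("B", "u3"), ("B", "u3")]
print(sorted(distinct_users_per_company(rows).items()))
# → [('A', 2), ('B', 2)]
```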

On 2018-07-13 18:24:52, Jean Georges Perrin <j...@jgp.net> wrote:
Just thinking out loud… Repartition by key? Create a composite key based on
company and userId?
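
To see why repartitioning on a composite key can help, the sketch below hashes the same skewed rows into a fixed number of partitions twice: once keyed by company alone and once keyed by (company, userId). The hash is an MD5-based stand-in, not Spark's own partitioner. The data and `NUM_PARTITIONS` are hypothetical. Keyed by company, every row of the dominant company lands in a single partition. Keyed by the pair, those rows spread across all partitions.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical shuffle partition count


def partition_of(key, num_partitions):
    # Deterministic hash partitioning; a stand-in for Spark's partitioner,
    # not the same hash function Spark uses.
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partition_sizes(rows, key_fn, num_partitions=NUM_PARTITIONS):
    # Number of rows each partition would receive when shuffling on key_fn(row).
    sizes = Counter(partition_of(key_fn(row), num_partitions) for row in rows)
    return [sizes[p] for p in range(num_partitions)]


# Hypothetical skewed data: company "A" owns 90% of the (company, userId) rows.
rows = ([("A", f"u{i}") for i in range(900)]
        + [("B", f"u{i}") for i in range(50)]
        + [("C", f"u{i}") for i in range(50)])

by_company = partition_sizes(rows, lambda r: r[0])
by_composite = partition_sizes(rows, lambda r: (r[0], r[1]))
print("largest partition, key = company:          ", max(by_company))
print("largest partition, key = (company, userId):", max(by_composite))
```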

How big is your dataset?

On Jul 13, 2018, at 06:20, 崔苗 <cuim...@danale.com> wrote:

Hi,
When I run count(distinct userId) grouped by company, I hit data skew and the
task takes too long. How can I count distinct values per key on skewed data in Spark
SQL?
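
Another widely used workaround for skewed distinct counts is salting. Add a bucket number derived from userId, count distinct users per (company, bucket), then sum the partial counts per company. The sum is exact only if each user falls into exactly one bucket, so the salt must be a deterministic hash of userId and not a random number. The sketch below simulates this in plain Python. `num_salts`, the MD5-based `salt_of`, and the table name `events` in the SQL comment are all hypothetical choices.

```python
import hashlib
from collections import defaultdict


def salt_of(user_id: str, num_salts: int) -> int:
    # The salt is derived from userId, not random, so every occurrence of a
    # user falls into the same bucket and no two buckets share a user.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_salts


def salted_distinct_count(rows, num_salts=4):
    """Exact distinct userId count per company using a salted two-stage plan.

    Roughly the Spark SQL query (table name `events` is hypothetical):
        SELECT company, SUM(cnt) AS distinct_users
        FROM (SELECT company, COUNT(DISTINCT userId) AS cnt
              FROM events
              GROUP BY company, pmod(hash(userId), 4)) t
        GROUP BY company
    """
    # Stage 1: distinct users per (company, salt). A hot company is split
    # into num_salts smaller groups that can be processed in parallel.
    buckets = defaultdict(set)
    for company, user_id in rows:
        buckets[(company, salt_of(user_id, num_salts))].add(user_id)
    # Stage 2: add up the per-bucket counts. Because the buckets are
    # disjoint, the sum equals the exact distinct count.
    totals = defaultdict(int)
    for (company, _salt), users in buckets.items():
        totals[company] += len(users)
    return dict(totals)


rows = [("A", f"u{i % 300}") for i in range(900)] + [("B", "u1"), ("B", "u1"), ("B", "u2")]
print(sorted(salted_distinct_count(rows, num_salts=4).items()))
# → [('A', 300), ('B', 2)]
```

A larger `num_salts` splits the hot company into more pieces at the cost of more partial rows to sum in the second stage.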


Thanks for any reply.