Just thinking out loud… could you repartition by key, or create a composite key based on company and userId, so the distinct step is spread across more tasks?
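
To make the composite-key idea concrete, here is a rough sketch in plain Python (not Spark) of the two-stage shape I have in mind. The function name, the partition count and the sample rows are made up for illustration, and I have not run this against your data. Stage 1 routes rows by (company, userId), so one large company is split across many partitions while duplicates are dropped; stage 2 only has to count the surviving pairs per company.

```python
from collections import Counter, defaultdict

# In Spark SQL the same shape would be roughly (table name `t` is hypothetical):
#   SELECT company, COUNT(*) FROM (SELECT DISTINCT company, userId FROM t) x
#   GROUP BY company

def count_distinct_two_stage(rows, num_partitions=4):
    """rows: iterable of (company, user_id) pairs -> {company: distinct users}."""
    # Stage 1: route each row by the composite key (company, user_id). A hot
    # company is spread over many partitions, and each set drops duplicates.
    partitions = defaultdict(set)
    for company, user_id in rows:
        bucket = hash((company, user_id)) % num_partitions
        partitions[bucket].add((company, user_id))

    # Stage 2: identical pairs always hash to the same partition, so each
    # distinct pair appears exactly once overall and can simply be counted.
    totals = Counter()
    for pairs in partitions.values():
        for company, _ in pairs:
            totals[company] += 1
    return dict(totals)

# Hypothetical sample: "acme" is the skewed company.
rows = [("acme", 1), ("acme", 2), ("acme", 1), ("acme", 3), ("tiny", 7), ("tiny", 7)]
result = count_distinct_two_stage(rows)
print(result)
```

Whether this actually helps depends on how skewed the (company, userId) pairs themselves are. If an approximate answer is acceptable, Spark's approx_count_distinct may be worth trying as well.
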
How big is your dataset?

> On Jul 13, 2018, at 06:20, 崔苗 <cuim...@danale.com> wrote:
>
> Hi,
> when I want to count(distinct userId) by company, I met the data skew and the
> task takes too long time, how to count distinct by keys on skew data in spark
> sql?
>
> thanks for any reply