
Re: Handling Skewness and Heterogeneity

2017-02-14 Thread Galen Marchetti

> However, I am looking for something for heterogeneous cluster for which
> the distribution is not known in prior.
>
> Cheers,
> Anis
>
> On Tue, 14 Feb 2017 at 20:19, Galen Marchetti wrote:
>
>> Anis,
>>
>> I've typically seen people

Re: Handling Skewness and Heterogeneity

2017-02-14 Thread Galen Marchetti

Anis,

I've typically seen people handle skew by seeding the keys corresponding to high volumes with random values, then partitioning the dataset based on the original key *and* the random value, then reducing.

Ex: ( , ) -> ( , , )

This transformation reduces the size of the huge partition, mak
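
A minimal sketch of the salting approach described in this message, written in plain Python rather than against any particular framework's API. The names `salted_sum`, `hot_keys`, and `num_salts` are hypothetical, and the final merge step is an assumption about how the per-salt partial results would be combined back into one result per original key.

```python
import random
from collections import defaultdict


def salted_sum(records, hot_keys, num_salts=4, seed=0):
    """Sum values per key, spreading known-hot keys over several salts.

    records   : iterable of (key, value) pairs
    hot_keys  : set of keys assumed to carry a high volume (hypothetical input)
    num_salts : number of random sub-partitions per hot key (assumption)
    """
    rng = random.Random(seed)

    # Stage 1: reduce by (original key, random salt).
    # Only hot keys get a random salt; all other keys use salt 0.
    partial = defaultdict(int)
    for key, value in records:
        salt = rng.randrange(num_salts) if key in hot_keys else 0
        partial[(key, salt)] += value

    # Stage 2 (assumed): drop the salt and merge the partial results
    # so that each original key ends up with a single total.
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal

    return dict(partial), dict(totals)


if __name__ == "__main__":
    data = [("a", 1)] * 1000 + [("b", 2)] * 3
    partial, totals = salted_sum(data, hot_keys={"a"})
    print(sorted(partial.items()))
    print(totals)
```

In the first stage the records for the hot key are split across at most `num_salts` groups, so no single group has to hold the whole hot key. The second stage only sees one small partial result per (key, salt) pair, which is why the extra pass is usually cheap.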