> However, I am looking for something for a heterogeneous cluster for which
> the distribution is not known a priori.
>
> Cheers,
> Anis
>
>
> On Tue, 14 Feb 2017 at 20:19, Galen Marchetti
> wrote:
>
>> Anis,
>>
>> I've typically seen people
Anis,
I've typically seen people handle skew by seeding the keys corresponding to
high volumes with random values, then partitioning the dataset based on the
original key *and* the random value, then reducing.
Ex: (key, value) -> (key, random value, value)
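
A minimal sketch of that salting idea in plain Python, assuming a sum is the reduction and that the heavy keys are already known. The names `salted_sum`, `hot_keys`, and `num_salts` are hypothetical and chosen only for illustration. In Spark the two stages would presumably map onto two `reduceByKey` passes, first on the (key, salt) pair and then on the key alone.

```python
import random
from collections import defaultdict


def salted_sum(records, hot_keys, num_salts=8, seed=0):
    """Two-stage sum: reduce per (key, salt), then reduce per original key."""
    rng = random.Random(seed)

    # Stage 1: (key, value) -> ((key, salt), value).
    # Only keys listed in hot_keys get a random salt; others keep salt 0.
    partial = defaultdict(int)
    for key, value in records:
        salt = rng.randrange(num_salts) if key in hot_keys else 0
        partial[(key, salt)] += value

    # Stage 2: drop the salt and combine the partial sums per original key.
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal

    return dict(totals), dict(partial)


# Hypothetical skewed input: one key carries almost all of the records.
records = [("hot", 1)] * 1000 + [("cold", 1)] * 10
totals, partial = salted_sum(records, hot_keys={"hot"})
```

Each (key, salt) group stands in for one partition's share of the work, so the single large group for the heavy key is split into at most `num_salts` smaller ones before the final combine. If the heavy keys are not known in advance, one option is to salt every key, at the cost of a second reduce for all of them.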
This transformation reduces the size of the huge partition, mak