You can also do something similar to what is mentioned in [1].
The basic idea is to use two hash functions for each key and assign it
to the less loaded of the two hashed workers.
Cheers,
Anis
[1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancin
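For what it's worth, here is a rough sketch of the power-of-two-choices idea in Python. The hash functions and the in-memory load counters are purely illustrative; in a real Spark deployment the load information would have to come from the workers themselves:

```python
import hashlib

def two_choices_assign(key, loads):
    """Route a key using the power of two choices: hash the key with two
    independent hash functions to pick two candidate workers, then send
    the tuple to whichever of the two is currently less loaded."""
    n = len(loads)
    # Two different hash functions give two independent candidate workers.
    h1 = int(hashlib.md5(key.encode()).hexdigest(), 16) % n
    h2 = int(hashlib.sha1(key.encode()).hexdigest(), 16) % n
    chosen = h1 if loads[h1] <= loads[h2] else h2
    loads[chosen] += 1  # illustrative load counter: one unit per tuple
    return chosen

# Example: distribute 20 keys over 4 workers.
loads = [0] * 4
for i in range(20):
    two_choices_assign("key-%d" % i, loads)
```

Because each key always hashes to the same two candidates, aggregating a key's partial results only requires merging two partitions, unlike routing to the globally least-loaded worker.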
> transformation reduces the size of the huge partition, making it
> tenable for Spark, as long as you can figure out logic for aggregating the
> results of the seeded partitions together again.
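The seeding/re-aggregation step described above can be sketched in plain Python, assuming a simple sum aggregation (the function name and `num_seeds` parameter are just illustrative):

```python
import random
from collections import defaultdict

def salted_aggregate(records, num_seeds=4):
    """Two-phase aggregation with key salting: spread a hot key across
    num_seeds sub-partitions, aggregate each, then merge the partials."""
    # Phase 1: aggregate per (key, seed). This is what shrinks the huge
    # partition, since each salted key holds roughly 1/num_seeds of the data.
    partial = defaultdict(int)
    for key, value in records:
        salt = random.randrange(num_seeds)
        partial[(key, salt)] += value
    # Phase 2: strip the salt and merge the partial results per key.
    final = defaultdict(int)
    for (key, _salt), value in partial.items():
        final[key] += value
    return dict(final)

# Example: one hot key dominating the input.
records = [("hot", 1)] * 100 + [("cold", 2)] * 3
totals = salted_aggregate(records)
```

This only works directly for aggregations that are associative and commutative (sums, counts, max); anything else needs custom merge logic, as the quoted reply warns.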
>
> On Tue, Feb 14, 2017 at 12:01 PM, Anis Nasir wrote:
Dear All,
I have a few use cases for Spark Streaming where the Spark cluster consists
of heterogeneous machines.
Additionally, there is skew present in both the input distribution (e.g.,
each tuple is drawn from a Zipf distribution) and the service time (e.g., the
service time required for each tuple comes f