Gianmarco De Francisci Morales created STORM-632:
----------------------------------------------------

             Summary: New grouping for better load balancing
                 Key: STORM-632
                 URL: https://issues.apache.org/jira/browse/STORM-632
             Project: Apache Storm
          Issue Type: New Feature
            Reporter: Gianmarco De Francisci Morales


Hi,

We have recently studied the problem of load balancing in Storm [1].
In particular, we focused on what happens under key grouping when the key 
distribution of the stream is skewed.
We developed a new stream partitioning scheme (which we call Partial Key 
Grouping). It achieves better load balancing than key grouping while being more 
scalable than shuffle grouping in terms of memory.

In the paper we show a number of mining algorithms that are easy to implement 
with partial key grouping, and whose performance can benefit from it. We think 
that it might also be useful for a larger class of algorithms.

We don't have experience with Clojure. However, partial key grouping is very 
easy to implement: as a custom grouping in Storm, it takes just a few lines of 
Java [2].
We believe it should be easy to port from Java.
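
To give a rough idea of what such a grouping does, here is a minimal sketch of 
the scheme as described in [1]: each key is hashed with two different hash 
functions, and the tuple goes to whichever of the two candidate tasks has 
received fewer tuples from this source so far. This is not the code from [2]. 
It does not use the Storm API, and the class name, method names, and the 
hash-mixing function are all assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical, Storm-independent sketch of partial key grouping.
// All names here are illustrative; this is not the implementation in [2].
final class PartialKeyGroupingSketch {
    // Local load estimate: number of tuples this source has sent to each task.
    private final long[] sentPerTask;

    public PartialKeyGroupingSketch(int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        this.sentPerTask = new long[numTasks];
    }

    // First candidate task for a key (seeded hash, chosen arbitrarily here).
    public int firstChoice(Object key) {
        return Math.floorMod(mix(key.hashCode(), 0x9E3779B9), sentPerTask.length);
    }

    // Second candidate task for a key, using a different seed.
    public int secondChoice(Object key) {
        return Math.floorMod(mix(key.hashCode(), 0x85EBCA6B), sentPerTask.length);
    }

    // Route one tuple: pick the less loaded of the two candidates and count it.
    public int chooseTask(Object key) {
        int a = firstChoice(key);
        int b = secondChoice(key);
        int chosen = sentPerTask[a] <= sentPerTask[b] ? a : b;
        sentPerTask[chosen]++;
        return chosen;
    }

    // Copy of the per-task tuple counts seen by this source.
    public long[] loads() {
        return sentPerTask.clone();
    }

    // Simple integer bit mixer; an assumption, any two independent hashes would do.
    private static int mix(int h, int seed) {
        h ^= seed;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        PartialKeyGroupingSketch grouping = new PartialKeyGroupingSketch(4);
        // Skewed toy stream: the key "hot" appears in half of the tuples.
        for (int i = 0; i < 1000; i++) {
            String key = (i % 2 == 0) ? "hot" : "key-" + (i % 50);
            grouping.chooseTask(key);
        }
        System.out.println(Arrays.toString(grouping.loads()));
    }
}
```

Because a key can land on at most two tasks, a downstream aggregation only has 
to merge two partial results per key, whereas with shuffle grouping any task 
may hold state for any key.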

For all these reasons, we believe it would be a nice addition to the standard 
groupings available in Storm. If the community thinks it's a good idea, we will 
be happy to help with the port.

References:
[1] https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
[2] https://github.com/gdfm/partial-key-grouping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
