Whether you can use "partial key grouping" depends on your use case. Think
carefully before you apply it.

You may want to read the research paper about it. It clearly describes
when you can use it and when you cannot:
https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
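
To make the trade-off concrete, here is a minimal sketch of the idea as I
understand it from the paper: each key gets two candidate tasks, and the
sender routes each tuple to whichever candidate it has sent less to so far.
This is not the code from the partial-key-grouping repository. The class
name, the mix() helper used as a second hash, and the per-sender counter
array are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and the second hash are hypothetical.
final class PartialKeyGroupingSketch {
    // Number of tuples this sender has routed to each task so far.
    private final long[] sentToTask;

    PartialKeyGroupingSketch(int numTasks) {
        this.sentToTask = new long[numTasks];
    }

    // Returns the less loaded of the two candidate tasks for these key fields.
    int chooseTask(List<?> keyFields) {
        int h = Arrays.deepHashCode(keyFields.toArray());
        int first = Math.floorMod(h, sentToTask.length);
        int second = Math.floorMod(mix(h), sentToTask.length);
        int chosen = sentToTask[first] <= sentToTask[second] ? first : second;
        sentToTask[chosen]++;
        return chosen;
    }

    // Assumed second hash: a bit mixer applied to the first hash.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    long[] loads() {
        return sentToTask.clone();
    }

    public static void main(String[] args) {
        PartialKeyGroupingSketch grouping = new PartialKeyGroupingSketch(32);
        // One hot key: its tuples are split over at most two tasks.
        for (int i = 0; i < 1000; i++) {
            grouping.chooseTask(Arrays.asList("first field value", "second field value"));
        }
        System.out.println(Arrays.toString(grouping.loads()));
    }
}
```

Because the tuples of one key can end up on two tasks, each task only sees a
partial result for that key. So this only fits operations whose partial
results can be merged downstream, such as counts or sums.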


-Matthias

On 09/30/2015 12:18 AM, Ken Danniswara wrote:
> Hi,
> 
> From what I read, the default FieldGrouping does not balance the load the
> way ShuffleGrouping does. There is a discussion about a custom grouping
> implementation called partial key grouping, which balances the load
> better. Maybe it helps:
> https://github.com/gdfm/partial-key-grouping
> 
> On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Thanks Derek. I use strings and I still end up with some bolts
>     having the maximum requests :(
> 
>     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit <[email protected]
>     <mailto:[email protected]>> wrote:
> 
>         The code that hashes the field values is here:
> 
>         
>         https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
> 
> 
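>         You can write a little Java program, something like:
> 
>         import java.util.ArrayList;
>         import java.util.Arrays;
>         import java.util.List;
> 
>         public class FieldHash {
>           public static void main(String[] args) {
>             List<String> myList = new ArrayList<String>();
>             myList.add("first field value");
>             myList.add("second field value");
> 
>             // as in tuple.clj
>             int hash = Arrays.deepHashCode(myList.toArray());
>             System.out.println("hash is " + hash);
> 
>             int numTasks = 32;
>             // floorMod (Java 8+) keeps the index non-negative even
>             // when the hash itself is negative
>             System.out.println("task index is " + Math.floorMod(hash, numTasks));
>           }
>         }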
> 
> 
>         There are certain types of values that may not hash
>         consistently.  If you are using String values, then it should be
>         fine. Other types may or may not, depending on how the class
>         implements hashCode().
> 
> 
>         --
>         Derek
> 
> 
>         ________________________________
>         From: Kashyap Mhaisekar <[email protected]
>         <mailto:[email protected]>>
>         To: [email protected] <mailto:[email protected]>
>         Sent: Tuesday, September 29, 2015 4:28 PM
>         Subject: Field Group Hash Computation
> 
> 
> 
>         Hi,
>         I have a field grouping based on 2 fields. I have 32 consumers
>         for the tuple, and most of the time I see that, out of 64 bolts,
>         the grouped tuples always land on 8 of them. Of those 8, 2 receive
>         more than 60% of the data. The data for the field grouping can
>         have 20 different combinations.
> 
>         Do you know how the hash of the grouping fields is computed?
>         One of the group's mails indicates that the approach is:
> 
>         It calls "hashCode" on the list of selected values and mods it
>         by the number of consumer tasks. You can play around with that
>         function to see if something about your data is causing
>         something degenerative to happen and cause skew
> 
>         I saw the Clojure code, but I am not sure how to read it.
> 
>         Thanks
>         Kashyap
> 
> 
> 
