The code that hashes the field values is here:

https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24


You can write a little Java program, something like:

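import java.util.ArrayList;
import java.util.Arrays;

public class FieldsHash {
  public static void main(String[] args) {
    // The values of the grouping fields, in declaration order
    ArrayList<String> myList = new ArrayList<String>();
    myList.add("first field value");
    myList.add("second field value");

    int hash = Arrays.deepHashCode(myList.toArray()); // as in tuple.clj
    System.out.println("hash is " + hash);

    int numTasks = 32;
    // floorMod keeps the index non-negative even when the hash is negative
    System.out.println("task index is " + Math.floorMod(hash, numTasks));
  }
}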
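
To check whether the data itself is causing the skew, the same deepHashCode computation can be run over every field combination and tallied per task index. This is only a rough sketch: the class name SkewCheck, its helper methods, and the field values in `combos` are all made up for illustration, so substitute the real combinations.

```java
import java.util.Arrays;
import java.util.List;

public class SkewCheck {
    // deepHashCode of the selected values, reduced with floorMod so the
    // result is always an index in [0, numTasks).
    static int taskIndex(List<Object> values, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(values.toArray()), numTasks);
    }

    // Counts how many of the given combinations land on each task index.
    static int[] tally(List<List<Object>> combos, int numTasks) {
        int[] counts = new int[numTasks];
        for (List<Object> combo : combos) {
            counts[taskIndex(combo, numTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical field values; replace with the real combinations.
        List<List<Object>> combos = Arrays.asList(
            Arrays.<Object>asList("typeA", "region1"),
            Arrays.<Object>asList("typeA", "region2"),
            Arrays.<Object>asList("typeB", "region1"),
            Arrays.<Object>asList("typeB", "region2"));
        int[] counts = tally(combos, 32);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println("task index " + i + ": "
                        + counts[i] + " combination(s)");
            }
        }
    }
}
```

Keep in mind that 20 distinct combinations can reach at most 20 task indexes, and fewer if several combinations collide on the same index.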


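Some types of values may not hash consistently. String values are fine,
because String.hashCode() is computed from the characters. Other types may or
may not be, depending on how the class implements hashCode(); a class that
only inherits the default Object.hashCode(), for example, can return a
different value for each object and in each JVM.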
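
A small sketch of the difference, in case it helps. PlainKey and StableKey are hypothetical classes invented for this example; they only illustrate a field value with and without a content-based hashCode().

```java
import java.util.Arrays;

public class HashStability {
    // Hypothetical key that does NOT override hashCode(): it inherits
    // Object.hashCode(), which is not derived from the field contents.
    static class PlainKey {
        final String id;
        PlainKey(String id) { this.id = id; }
    }

    // Hypothetical key with a content-based hashCode() and equals().
    static class StableKey {
        final String id;
        StableKey(String id) { this.id = id; }
        @Override public int hashCode() { return id.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof StableKey && ((StableKey) o).id.equals(id);
        }
    }

    // Hashes the given values the same way as Arrays.deepHashCode on a list.
    static int hashOf(Object... values) {
        return Arrays.deepHashCode(values);
    }

    public static void main(String[] args) {
        // Equal strings hash the same, even as distinct objects.
        System.out.println(hashOf(new String("abc")) == hashOf(new String("abc")));
        // Content-based hashCode(): equal keys hash the same.
        System.out.println(hashOf(new StableKey("abc")) == hashOf(new StableKey("abc")));
        // Inherited hashCode(): two keys with the same id are not
        // guaranteed to hash the same.
        System.out.println(hashOf(new PlainKey("abc")) == hashOf(new PlainKey("abc")));
    }
}
```

If a grouping field holds something like PlainKey, equal-looking tuples can end up on different tasks, so it is safer to group on Strings or on classes with a content-based hashCode().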

 
-- 
Derek


________________________________
From: Kashyap Mhaisekar <[email protected]>
To: [email protected] 
Sent: Tuesday, September 29, 2015 4:28 PM
Subject: Field Group Hash Computation



Hi,
I have a field grouping based on 2 fields. I have 32 consumers for the tuple 
and I see that most of the time, out of 64 bolts, the field grouping always 
sends to only 8 of them. Of the 8, 2 get more than 60% of the data. The data 
for the field grouping can have 20 different combinations.

Do you know how the hash of the grouping fields is computed? One of the 
group's mails indicates that the approach is -

It calls "hashCode" on the list of selected values and mods it by the 
number of consumer tasks. You can play around with that function to see if 
something about your data is causing something degenerative to happen and 
cause skew

I saw the Clojure code but I am not sure how to read it.

Thanks
Kashyap
