The code that hashes the field values is here:
https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
You can write a small Java program to reproduce the computation, something like:
import java.util.ArrayList;
import java.util.Arrays;

public class FieldHash {
    public static void main(String[] args) {
        ArrayList<String> myList = new ArrayList<String>();
        myList.add("first field value");
        myList.add("second field value");
        int hash = Arrays.deepHashCode(myList.toArray()); // as in tuple.clj
        System.out.println("hash is " + hash);
        int numTasks = 32;
        // floorMod keeps the index non-negative when hash is negative
        System.out.println("task index is " + Math.floorMod(hash, numTasks));
    }
}
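To check every field combination at once rather than a single tuple, a rough
sketch along the same lines might look like the following. The class name, the
helper names (taskIndex, combosPerTask) and the sample values are made up for
illustration; substitute your real field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FieldGroupingSkew {
    // Hash the list of selected field values with Arrays.deepHashCode,
    // then reduce it to a task index.
    static int taskIndex(List<?> fieldValues, int numTasks) {
        int hash = Arrays.deepHashCode(fieldValues.toArray());
        return Math.floorMod(hash, numTasks); // never negative
    }

    // Counts how many distinct combinations land on each task index.
    static Map<Integer, Integer> combosPerTask(List<? extends List<?>> combos, int numTasks) {
        Map<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (List<?> combo : combos) {
            counts.merge(taskIndex(combo, numTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder data: 4 x 5 = 20 combinations of two fields.
        List<List<String>> combos = new ArrayList<List<String>>();
        for (String first : new String[] {"A", "B", "C", "D"}) {
            for (String second : new String[] {"1", "2", "3", "4", "5"}) {
                combos.add(Arrays.asList(first, second));
            }
        }
        Map<Integer, Integer> counts = combosPerTask(combos, 32);
        System.out.println("tasks used: " + counts.size() + " of 32");
        System.out.println("combinations per task index: " + counts);
    }
}
```

Keep in mind that with only 20 distinct combinations, at most 20 tasks can ever
receive data, and fewer if some combinations map to the same index.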
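Some types of values may not hash consistently. If you are using String
values, it should be fine, since String.hashCode() is computed from the
characters. Other types may or may not, depending on how the class implements
hashCode(); a class that does not override it falls back to Object.hashCode(),
which can differ from one JVM to the next.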
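As a quick illustration of the difference, here is a small sketch. The Key
class and the listHash helper are hypothetical and exist only for this example;
Key deliberately does not override hashCode().

```java
import java.util.Arrays;

public class HashConsistency {
    // Hypothetical value type that does NOT override hashCode(),
    // so it inherits the identity-based Object.hashCode().
    static final class Key {
        final String id;
        Key(String id) { this.id = id; }
    }

    // Hashes the given field values the same way as the example program.
    static int listHash(Object... fieldValues) {
        return Arrays.deepHashCode(fieldValues);
    }

    public static void main(String[] args) {
        // Strings with the same characters always produce the same hash.
        System.out.println(listHash("a", "b") == listHash(new String("a"), "b"));

        // Two Key objects with the same id are different objects; their
        // hashes usually differ and can change between runs or JVMs.
        System.out.println(listHash(new Key("a")) + " vs " + listHash(new Key("a")));
    }
}
```

If your field values are of a type like Key, tuples that look identical could
be routed to different tasks.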
--
Derek
________________________________
From: Kashyap Mhaisekar <[email protected]>
To: [email protected]
Sent: Tuesday, September 29, 2015 4:28 PM
Subject: Field Group Hash Computation
Hi,
I have a field grouping based on 2 fields. I have 32 consumers for the tuple
and I see that most of the time, out of 64 bolts, the field grouping always
lands on 8 of them. Of those 8, 2 receive more than 60% of the data. The data
for the field grouping can have 20 different combinations.
Do you know how the hash of the fields used for the grouping is computed?
One of the group's mails indicates that the approach is:
It calls "hashCode" on the list of selected values and mods it by the
number of consumer tasks. You can play around with that function to see if
something about your data is causing something degenerative to happen and
cause skew
I saw the Clojure code but am not sure how to read it.
Thanks
Kashyap