Thanks, Derek. I am using String values, and I still end up with some bolts receiving most of the requests :(
On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit <[email protected]> wrote:

> The code that hashes the field values is here:
>
> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>
> You can write a little Java program, something like:
>
> public static void main(String[] args) {
>     ArrayList<String> myList = new ArrayList<String>();
>     myList.add("first field value");
>     myList.add("second field value");
>
>     int hash = Arrays.deepHashCode(myList.toArray()); // as in tuple.clj
>
>     System.out.println("hash is " + hash);
>     int numTasks = 32;
>
>     // floorMod keeps the index non-negative even when the hash is negative
>     System.out.println("task index is " + Math.floorMod(hash, numTasks));
> }
>
> There are certain types of values that may not hash consistently. If you
> are using String values, then it should be fine. Other types may or may
> not, depending on how the class implements hashCode().
>
> --
> Derek
>
> ________________________________
> From: Kashyap Mhaisekar <[email protected]>
> To: [email protected]
> Sent: Tuesday, September 29, 2015 4:28 PM
> Subject: Field Group Hash Computation
>
> Hi,
> I have a field grouping based on 2 fields. I have 32 consumers for the
> tuple and I see that, most of the time, out of 64 bolts, the field group
> is always on 8 of them. Of the 8, 2 have more than 60% of the data. The
> data for the field grouping can have 20 different combinations.
>
> Do you know how the hash of the fields used for grouping is computed?
> One of the group's mails indicates that the approach is -
>
> It calls "hashCode" on the list of selected values and mods it by the
> number of consumer tasks. You can play around with that function to see if
> something about your data is causing something degenerative to happen and
> cause skew
>
> I saw the Clojure code but I am not sure how to read it.
>
> Thanks
> Kashyap
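
To see where a whole set of keys ends up, the snippet quoted above can be extended into a small standalone program. This is only a sketch: the class name FieldGroupingSkew, the helper names taskIndex and keysPerTask, and the sample values are all made up for illustration. It also assumes that Storm takes a non-negative modulus of the hash (hence Math.floorMod, which needs Java 8). Note that with only 20 distinct combinations, at most 20 of the 32 tasks can ever receive data, and fewer if several combinations map to the same index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Storm.
class FieldGroupingSkew {

    // Same hash as the quoted snippet: deepHashCode over the selected field values.
    static int taskIndex(List<?> fieldValues, int numTasks) {
        int hash = Arrays.deepHashCode(fieldValues.toArray());
        // Assumption: the index is a non-negative modulus of the hash.
        return Math.floorMod(hash, numTasks);
    }

    // Counts how many distinct key combinations land on each task index.
    static Map<Integer, Integer> keysPerTask(List<List<String>> keys, int numTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<String> key : keys) {
            counts.merge(taskIndex(key, numTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the real field combinations.
        List<List<String>> keys = Arrays.asList(
                Arrays.asList("first field value", "second field value"),
                Arrays.asList("another value", "yet another value"));
        int numTasks = 32;
        System.out.println("keys per task index: " + keysPerTask(keys, numTasks));
    }
}
```

If a few task indexes collect most of the keys, or if the keys that carry the most traffic share an index, that would explain the skew regardless of the value types involved.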
