Hi,
From what I read, the default fields grouping does not balance load the way shuffle grouping does. On that point, there is a custom grouping implementation called partial key grouping that addresses this balancing problem. Maybe it helps: https://github.com/gdfm/partial-key-grouping
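For context, partial key grouping is essentially the "power of two choices" idea: each key hashes to two candidate tasks, and each tuple is sent to whichever candidate has received fewer tuples so far. A minimal sketch of the idea; the class name, seed constant, and tuple-count load metric are my own illustration, not taken from the linked repository:

```java
// Hypothetical sketch of partial key grouping ("power of two choices").
// Each key maps to two candidate tasks via two hash functions; tuples for
// that key go to whichever candidate has seen fewer tuples so far, so a
// hot key is split over (at most) two tasks instead of hammering one.
public class PartialKeyGrouping {
    private final int numTasks;
    private final long[] sent; // tuples sent to each task so far

    public PartialKeyGrouping(int numTasks) {
        this.numTasks = numTasks;
        this.sent = new long[numTasks];
    }

    private int candidate(Object key, int seed) {
        // Derive a second hash by mixing in a seed; floorMod keeps the
        // result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode() * 31 + seed, numTasks);
    }

    public int chooseTask(Object key) {
        int a = candidate(key, 0);
        int b = candidate(key, 0x9e3779b9); // arbitrary mixing constant
        int chosen = sent[a] <= sent[b] ? a : b;
        sent[chosen]++;
        return chosen;
    }
}
```

The trade-off is that downstream state for one key is split across two tasks, so it suits aggregations that can be merged later.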
On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar <[email protected]> wrote:
> Thanks Derek. I use strings and I still end up with some bolts having the
> maximum requests :(
>
> On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit <[email protected]> wrote:
>
>> The code that hashes the field values is here:
>>
>> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>>
>> You can write a little java program, something like:
>>
>> public static void main(String[] args) {
>>     ArrayList<String> myList = new ArrayList<String>();
>>     myList.add("first field value");
>>     myList.add("second field value");
>>
>>     int hash = Arrays.deepHashCode(myList.toArray()); // as in tuple.clj
>>     System.out.println("hash is " + hash);
>>
>>     int numTasks = 32;
>>     System.out.println("task index is " + hash % numTasks);
>> }
>>
>> There are certain types of values that may not hash consistently. If you
>> are using String values, then it should be fine. Other types may or may
>> not, depending on how the class implements hashCode().
>>
>> --
>> Derek
>>
>> ________________________________
>> From: Kashyap Mhaisekar <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, September 29, 2015 4:28 PM
>> Subject: Field Group Hash Computation
>>
>> Hi,
>> I have a field grouping based on 2 fields. I have 32 consumers for the
>> tuple, and I see that most of the time, out of 64 bolts, the field group
>> lands on only 8 of them. Of those 8, 2 have more than 60% of the data.
>> The data for the field grouping can have 20 different combinations.
>>
>> Do you know the way to compute the hash of the fields used for the
>> grouping? One of the group's mails indicates that the approach is:
>>
>> It calls "hashCode" on the list of selected values and mods it by the
>> number of consumer tasks. You can play around with that function to see
>> if something about your data is causing something degenerative to happen
>> and cause skew.
>>
>> I saw the clojure code but am not sure how to understand it.
>>
>> Thanks
>> Kashyap
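To "play around with that function" as suggested, here is a self-contained sketch of a skew check. It assumes the scheme described in the quoted message (`Arrays.deepHashCode` over the selected field values, mod the task count); the 4 x 5 synthetic field values stand in for your 20 real combinations, so substitute yours to see where they actually land:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: compute the task index for every key combination and count
// how many combinations land on each task. Field values are made up.
public class SkewCheck {
    static int taskFor(int numTasks, String... fieldValues) {
        ArrayList<String> values = new ArrayList<>(Arrays.asList(fieldValues));
        int hash = Arrays.deepHashCode(values.toArray()); // as in the quoted approach
        // Java's % can return a negative index for a negative hash;
        // floorMod matches the non-negative mod used in the Clojure code.
        return Math.floorMod(hash, numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 32;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int a = 0; a < 4; a++) {
            for (int b = 0; b < 5; b++) { // 4 x 5 = 20 combinations
                int task = taskFor(numTasks, "fieldA-" + a, "fieldB-" + b);
                counts.merge(task, 1, Integer::sum);
            }
        }
        // If a few tasks collect most of the 20 keys, the key space
        // itself hashes unevenly; no grouping tweak will fix that
        // without changing the key or the grouping strategy.
        counts.forEach((task, n) ->
            System.out.println("task " + task + " gets " + n + " key(s)"));
    }
}
```

Note that with only 20 distinct key combinations across 32 tasks, at most 20 tasks can ever receive data, and collisions make fewer than that likely, which is consistent with seeing traffic concentrated on 8 bolts.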
