I am going to load cache data into a set of worker bolts that will receive
tuples through a fields grouping on a single field, a string. To preload
each bolt with the right data, I need to reproduce the hash/mod calculation
that Storm's fields grouping uses. I tried taking the hashCode() of the
string and applying a mod with the number of workers I have, but the result
does not match Storm's calculation.
I searched and found the Storm 8 Clojure code that performs the mod/hash,
and I have a note from Nathan, but I would like the actual algorithm in
Java if possible.
Any help would be appreciated.
Clojure mod/hash code:
    (defn- mk-fields-grouper [^Fields out-fields ^Fields group-fields num-tasks]
      (fn [^List values]
        (mod (tuple/list-hash-code (.select out-fields group-fields values))
             num-tasks)))
Note from Nathan:
It calls "hashCode" on the list of selected values and mods it by the
number of consumer tasks. You can play around with that function to see if
something about your data is causing something degenerative to happen and
cause skew.
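
Going by Nathan's description, a rough Java equivalent might look like the sketch below. It is not taken from the Storm source. It assumes tuple/list-hash-code does nothing more than call hashCode() on the java.util.List of selected values, and the class and method names (FieldsGroupingSketch, taskIndex) are made up for illustration. Two details would explain why a plain string hashCode() mod gives different answers: a one-element List hashes to 31 + element.hashCode(), not to the element's own hash, and Clojure's mod never returns a negative result for a positive divisor, whereas Java's % operator can.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Storm: mirrors the Clojure snippet above
// under the assumption that list-hash-code is simply List.hashCode().
class FieldsGroupingSketch {

    // selectedValues: the values of the grouping fields, in order.
    // numTasks: the number of consumer tasks (not worker processes).
    static int taskIndex(List<?> selectedValues, int numTasks) {
        // Math.floorMod (Java 8+) matches Clojure's mod: the result is
        // always in [0, numTasks) when numTasks is positive.
        return Math.floorMod(selectedValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        String key = "a";
        int numTasks = 4;

        // Wrap the single grouping field in a list before hashing.
        int viaList = taskIndex(Arrays.asList(key), numTasks);

        // The naive calculation on the bare string, for comparison.
        int viaString = Math.floorMod(key.hashCode(), numTasks);

        System.out.println("list-based index:   " + viaList);
        System.out.println("string-based index: " + viaString);
    }
}
```

The value returned is an index in the range 0 to numTasks - 1. How that index maps to an actual task id depends on how Storm orders the consumer tasks, which the snippet and note above do not cover, so that part should be checked against the running topology.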
Rick Rankin