Thanks, Derek. I am using strings and I still end up with a few bolts getting
most of the requests :(
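String fields hash the same in every JVM, as Derek notes below, but skew can still appear for three reasons. First, with only 20 distinct key combinations over 32 tasks, even an ideal hash would leave on average 32 * (1 - (31/32)^20), roughly 15, tasks with any key at all. Second, both String.hashCode() and Arrays.deepHashCode() multiply by 31 at each step, and 31 is congruent to -1 modulo 32. With a power-of-two task count, a two-field key therefore lands on task (1 - h1 + h2) mod 32, where h1 and h2 are the two field hashes, so values that differ in one character tend to land on neighbouring tasks. Third, whichever task owns a frequent combination receives most of the tuples, however evenly the keys are spread. The sketch below assumes the deepHashCode-then-mod scheme from Derek's reply; the field values and the class and method names are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class GroupingSkew {
    // Task index for one combination of grouping-field values, assuming the
    // deepHashCode-then-mod scheme described in this thread.
    static int taskIndex(Object[] fieldValues, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(fieldValues), numTasks);
    }

    // Every (first, second) pair, standing in for the distinct key combinations.
    static Object[][] buildCombos(String[] first, String[] second) {
        Object[][] combos = new Object[first.length * second.length][];
        int i = 0;
        for (String f : first) {
            for (String s : second) {
                combos[i++] = new Object[] {f, s};
            }
        }
        return combos;
    }

    // Number of distinct key combinations that land on each task index.
    static Map<Integer, Integer> keysPerTask(Object[][] combos, int numTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Object[] combo : combos) {
            counts.merge(taskIndex(combo, numTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int numTasks = 32;
        // Hypothetical field values: 4 x 5 = 20 combinations whose values
        // differ from each other in only one character.
        Object[][] combos = buildCombos(
                new String[] {"A", "B", "C", "D"},
                new String[] {"v1", "v2", "v3", "v4", "v5"});
        Map<Integer, Integer> counts = keysPerTask(combos, numTasks);
        System.out.println(counts.size() + " of " + numTasks + " tasks receive any keys");
        counts.forEach((task, n) -> System.out.println("task " + task + ": " + n + " key(s)"));
    }
}
```

With these made-up values, all 20 combinations fall on tasks 24 through 31 (8 of 32), and tasks 27 and 28 hold 4 keys each. Swapping in the real field values shows whether the actual data clusters the same way.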

On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit <[email protected]> wrote:

> The code that hashes the field values is here:
>
>
> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>
>
> You can write a little Java program, something like:
>
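> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> public class FieldsHash {
>   public static void main(String[] args) {
>     // Values of the grouping fields, in field order.
>     List<String> myList = new ArrayList<String>();
>     myList.add("first field value");
>     myList.add("second field value");
>
>     int hash = Arrays.deepHashCode(myList.toArray()); // as in tuple.clj
>     System.out.println("hash is " + hash);
>
>     int numTasks = 32;
>     // Java's % returns a negative result for a negative hash; Math.floorMod
>     // matches Clojure's (mod hash numTasks) and stays in [0, numTasks).
>     System.out.println("task index is " + Math.floorMod(hash, numTasks));
>   }
> }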
>
>
> There are certain types of values that may not hash consistently.  If you
> are using String values, then it should be fine. Other types may or may
> not, depending on how the class implements hashCode().
>
>
> --
> Derek
>
>
> ________________________________
> From: Kashyap Mhaisekar <[email protected]>
> To: [email protected]
> Sent: Tuesday, September 29, 2015 4:28 PM
> Subject: Field Group Hash Computation
>
>
>
> Hi,
> I have a field grouping based on 2 fields. I have 32 consumers for the
> tuple, and most of the time, out of 64 bolts, the field group always lands
> on 8 of them. Of those 8, 2 get more than 60% of the data. The data for the
> field grouping can have 20 different combinations.
>
> Do you know how the hash of the grouping fields is computed? One of the
> group's mails indicates that the approach is:
>
> It calls "hashCode" on the list of selected values and mods it by the
> number of consumer tasks. You can play around with that function to see if
> something about your data is causing something degenerative to happen and
> cause skew.
>
> I saw the Clojure code but I'm not sure how to understand it.
>
> Thanks
> Kashyap
>
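Derek's remark that non-String values may not hash consistently can be made concrete with a short sketch. java.lang.Enum.hashCode() and any class that does not override Object.hashCode() return identity-based hashes, which can differ from one JVM to the next, so two worker processes may route the same logical value to different tasks. A type whose hashCode() depends only on its contents, as String's does, maps to the same task everywhere. RawKey, ValueKey, Color and taskIndex are hypothetical names, and taskIndex assumes the same deepHashCode-then-mod scheme as the program above.

```java
import java.util.Arrays;

public class HashConsistency {
    // Hypothetical key type that does NOT override hashCode(): it inherits
    // Object.hashCode(), which is identity-based.
    static final class RawKey {
        final String name;
        RawKey(String name) { this.name = name; }
    }

    // Hypothetical key type whose hashCode() depends only on its contents.
    static final class ValueKey {
        final String name;
        ValueKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).name.equals(name);
        }

        @Override
        public int hashCode() { return name.hashCode(); }
    }

    // Enum.hashCode() is final and identity-based.
    enum Color { RED, GREEN }

    // Task index for a single grouping value (deepHashCode, then floor mod).
    static int taskIndex(Object value, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(new Object[] {value}), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 32;
        // Equal contents give equal hashes, so the same task in every JVM.
        System.out.println("ValueKey: " + taskIndex(new ValueKey("x"), numTasks)
                + " / " + taskIndex(new ValueKey("x"), numTasks));
        // Equal contents but identity hashes: usually two different tasks.
        System.out.println("RawKey:   " + taskIndex(new RawKey("x"), numTasks)
                + " / " + taskIndex(new RawKey("x"), numTasks));
        // The enum constant's index is only stable within this JVM; its
        // name() is a String and hashes the same everywhere.
        System.out.println("enum:     " + taskIndex(Color.RED, numTasks)
                + " (this JVM only), by name: " + taskIndex(Color.RED.name(), numTasks));
    }
}
```

If a grouping field has to carry an enum or a custom object, one option is to emit a String (for example the enum's name()) in that field instead, so every worker computes the same task index.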
