Hi,

From what I read, the default FieldGrouping does not balance load as evenly
as ShuffleGrouping does. There is a discussion of a custom grouping
implementation called partial key grouping, which handles this balancing
problem better. Maybe it helps.
https://github.com/gdfm/partial-key-grouping
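To make the idea concrete, here is a minimal sketch of the "two choices" routing that partial key grouping is built around. Each key gets two candidate tasks, and the sender picks whichever of the two it has sent fewer tuples to so far. This is not the linked project's code or API: `TwoChoiceGrouper`, `chooseTask`, the mixing function and its seed constants are all hypothetical, and load is only tracked locally in one sender. Because one key can end up on two tasks, downstream bolts would have to merge partial results per key.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of two-choice key routing (the idea behind partial key grouping).
 * Not the API of the gdfm/partial-key-grouping project.
 */
public class TwoChoiceGrouper {
    // Local load estimate: how many tuples this sender has routed to each task
    private final long[] sentPerTask;

    public TwoChoiceGrouper(int numTasks) {
        this.sentPerTask = new long[numTasks];
    }

    /** Returns the index of the task that should receive a tuple with these grouping values. */
    public int chooseTask(List<?> values) {
        int n = sentPerTask.length;
        int h = Arrays.deepHashCode(values.toArray());
        // Two candidate tasks from two differently seeded mixes of the same hash
        // (assumption: any two reasonably independent hash functions would do)
        int first = Math.floorMod(mix(h, 0x9E3779B9), n);
        int second = Math.floorMod(mix(h, 0x85EBCA6B), n);
        // Send to whichever candidate has received fewer tuples from this sender
        int chosen = sentPerTask[first] <= sentPerTask[second] ? first : second;
        sentPerTask[chosen]++;
        return chosen;
    }

    // Murmur3-style finalizer applied to (h xor seed)
    private static int mix(int h, int seed) {
        int x = h ^ seed;
        x ^= x >>> 16;
        x *= 0x85EBCA6B;
        x ^= x >>> 13;
        x *= 0xC2B2AE35;
        x ^= x >>> 16;
        return x;
    }

    public static void main(String[] args) {
        TwoChoiceGrouper grouper = new TwoChoiceGrouper(32);
        List<String> key = Arrays.asList("first field value", "second field value");
        // A single hot key alternates between its (at most) two candidate tasks
        for (int i = 0; i < 6; i++) {
            System.out.println("tuple " + i + " -> task " + grouper.chooseTask(key));
        }
    }
}
```

The trade-off is that a hot key is spread over two tasks instead of one, at the cost of an extra merge step downstream.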

On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar <[email protected]>
wrote:

> Thanks Derek. I use strings and I still end up with a few bolts receiving
> most of the requests :(
>
> On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit <[email protected]> wrote:
>
>> The code that hashes the field values is here:
>>
>>
>> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>>
>>
>> You can write a little java program, something like:
>>
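>> import java.util.ArrayList;
>> import java.util.Arrays;
>> import java.util.List;
>>
>> public class FieldHash {
>>   public static void main(String[] args) {
>>     // Values of the grouping fields, in the order the fields are declared
>>     List<String> myList = new ArrayList<String>();
>>     myList.add("first field value");
>>     myList.add("second field value");
>>
>>     int hash = Arrays.deepHashCode(myList.toArray()); // as in tuple.clj
>>     System.out.println("hash is " + hash);
>>
>>     int numTasks = 32;
>>     // hash can be negative; floorMod keeps the index in [0, numTasks)
>>     System.out.println("task index is " + Math.floorMod(hash, numTasks));
>>   }
>> }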
>>
>>
>> There are certain types of values that may not hash consistently.  If you
>> are using String values, then it should be fine. Other types may or may
>> not, depending on how the class implements hashCode().
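A common example of a type that does not hash consistently is a Java enum. `Enum.hashCode()` is final and simply returns `Object`'s identity hash, so two worker JVMs can compute different hashes for the same constant, and the same key emitted from different workers may be routed to different tasks. Below is a minimal sketch of the usual workaround, grouping on the enum's `name()` (a String) instead. `EnumGroupingKey`, `Region` and the two helper methods are hypothetical, and the hash-then-mod step mirrors the approach described in this thread rather than Storm's own code.

```java
import java.util.Arrays;

public class EnumGroupingKey {
    // Hypothetical grouping-field type used only for this illustration
    enum Region { NORTH, SOUTH }

    // Stable across JVMs: String.hashCode() depends only on the characters
    static int taskForName(Region r, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(new Object[] { r.name() }), numTasks);
    }

    // Not stable across JVMs: Enum.hashCode() returns Object's identity-based hash
    static int taskForEnum(Region r, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(new Object[] { r }), numTasks);
    }

    public static void main(String[] args) {
        for (Region r : Region.values()) {
            System.out.println(r + ": by name -> task " + taskForName(r, 32)
                    + ", by enum -> task " + taskForEnum(r, 32) + " (may differ between JVMs)");
        }
    }
}
```

Running it twice will usually show the same "by name" tasks every time, while the "by enum" tasks may change from run to run.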
>>
>>
>> --
>> Derek
>>
>>
>> ________________________________
>> From: Kashyap Mhaisekar <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, September 29, 2015 4:28 PM
>> Subject: Field Group Hash Computation
>>
>>
>>
>> Hi,
>> I have a field grouping based on 2 fields. I have 32 consumers for the
>> tuple, and most of the time, out of 64 bolts, the grouped tuples go to only
>> 8 of them. Of those 8, 2 get more than 60% of the data. The data for the
>> field grouping can have 20 different combinations.
>>
>> Do you know how the hash of the grouping fields is computed? One of the
>> mails on this group indicates that the approach is:
>>
>> It calls "hashCode" on the list of selected values and mods it by the
>> number of consumer tasks. You can play around with that function to see if
>> something about your data is causing something degenerative to happen and
>> cause skew.
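To "play around with that function" over all combinations at once, a sketch like the following buckets every key the way described here: the list's `hashCode()` (which, for non-array elements such as Strings, equals `Arrays.deepHashCode` of the list's array), taken mod the number of tasks. `GroupingSkewCheck`, `assign` and the sample keys are hypothetical; substitute the real 20 field-value combinations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupingSkewCheck {
    /** Maps each task index to the keys that land on it (list hashCode, then mod numTasks). */
    static Map<Integer, List<List<String>>> assign(List<List<String>> keys, int numTasks) {
        Map<Integer, List<List<String>>> byTask = new TreeMap<>();
        for (List<String> key : keys) {
            int task = Math.floorMod(key.hashCode(), numTasks);
            byTask.computeIfAbsent(task, t -> new ArrayList<>()).add(key);
        }
        return byTask;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 5 x 4 = 20 two-field combinations
        List<List<String>> keys = new ArrayList<>();
        for (String a : Arrays.asList("A", "B", "C", "D", "E")) {
            for (String b : Arrays.asList("x", "y", "z", "w")) {
                keys.add(Arrays.asList(a, b));
            }
        }
        Map<Integer, List<List<String>>> byTask = assign(keys, 32);
        System.out.println(keys.size() + " keys landed on " + byTask.size() + " of 32 tasks");
        byTask.forEach((task, ks) -> System.out.println("task " + task + ": " + ks));
    }
}
```

With these made-up keys the first printed line is `20 keys landed on 8 of 32 tasks`. Because 32 is a power of two, the index is just the low 5 bits of the hash, and since 31 is congruent to -1 mod 32, the two-field list hash reduces to (1 - h1 + h2) mod 32, where h1 and h2 are the two field hashes. Short, similar strings can therefore collapse onto a handful of tasks. This is only an illustration; it does not show that the real data behaves the same way.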
>>
>> I looked at the Clojure code but am not sure how to interpret it.
>>
>> Thanks
>> Kashyap
>>
>
>
