Thanks Derek. Here are the code and the results.
When each string is added to an ArrayList and (hashCode % 64) is computed
on the list, many of the values come out the same. 64 is the number of
consumer tasks. The hash codes of the strings by themselves are all different.

My bolt emits as -
collector.emit(new Values(str1,str2,str3)) where str3 is the field-grouped
field and takes the string values in "arr" in the program below
---------------
package com.demo;

import java.util.ArrayList;

public class HashTest {

    public static void main(String[] args) {

        // Values of the field-grouped field (str3 in the emit above).
        String[] arr = { "0:499", "500:999", "1000:1499", "1500:1999",
                "2000:2499", "2500:2999", "3000:3499", "3500:3999",
                "4000:4499", "4500:4999", "5000:5499", "5500:5999",
                "6000:6499", "6500:6999", "7000:7499", "7500:7999",
                "8000:8499", "9500:9999" };

        int tasks = 64; // number of consumer tasks
        for (int i = 0; i < arr.length; i++) {
            // Wrap each value in a one-element list and hash the list.
            ArrayList<String> arl = new ArrayList<String>();
            arl.add(arr[i]);

            // Java's % keeps the sign of the dividend, so a negative list
            // hash prints a negative remainder.
            System.out.println("Hash: " + arr[i] + " -- (hash): "
                    + (arl.hashCode() % tasks) + " -- (String's hashcode): " + arr[i].hashCode());
        }
    }
}

Results:
Hash: 0:499 -- (hash): 41 -- (String's hashcode): 46108682
Hash: 500:999 -- (hash): 51 -- (String's hashcode): 1213367572
Hash: 1000:1499 -- (hash): 29 -- (String's hashcode): 464373438
Hash: 1500:1999 -- (hash): 61 -- (String's hashcode): 588495326
Hash: 2000:2499 -- (hash): -3 -- (String's hashcode): -1343051234
Hash: 2500:2999 -- (hash): -35 -- (String's hashcode): -1218929346
Hash: 3000:3499 -- (hash): 29 -- (String's hashcode): 1144491390
Hash: 3500:3999 -- (hash): 61 -- (String's hashcode): 1268613278
Hash: 4000:4499 -- (hash): -3 -- (String's hashcode): -662933282
Hash: 4500:4999 -- (hash): -35 -- (String's hashcode): -538811394
Hash: 5000:5499 -- (hash): 29 -- (String's hashcode): 1824609342
Hash: 5500:5999 -- (hash): 61 -- (String's hashcode): 1948731230
Hash: 6000:6499 -- (hash): 61 -- (String's hashcode): 17184670
Hash: 6500:6999 -- (hash): 29 -- (String's hashcode): 141306558
Hash: 7000:7499 -- (hash): -35 -- (String's hashcode): -1790240002
Hash: 7500:7999 -- (hash): -3 -- (String's hashcode): -1666118114
Hash: 8000:8499 -- (hash): 61 -- (String's hashcode): 697302622
Hash: 9500:9999 -- (hash): -3 -- (String's hashcode): -986000162

----------------------
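
A minimal sketch of the same check, counting how many distinct task indexes the keys actually use. It hashes the list with Arrays.deepHashCode(list.toArray()), the form from Derek's snippet, which for a list of plain strings gives the same value as ArrayList.hashCode(). Two things here are assumptions and not taken from the Storm source: Math.floorMod is used to get a non-negative index (Java's % keeps the sign of the hash), and the class and method names (FieldGroupingBuckets, listHash, taskIndex, bucketsUsed) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of Storm.
class FieldGroupingBuckets {

    // The same 18 values of the field-grouped field used in HashTest.
    static final String[] KEYS = {
        "0:499", "500:999", "1000:1499", "1500:1999",
        "2000:2499", "2500:2999", "3000:3499", "3500:3999",
        "4000:4499", "4500:4999", "5000:5499", "5500:5999",
        "6000:6499", "6500:6999", "7000:7499", "7500:7999",
        "8000:8499", "9500:9999"
    };

    // Hash of the list of grouping values, in the form shown in Derek's snippet.
    static int listHash(List<String> groupingValues) {
        return Arrays.deepHashCode(groupingValues.toArray());
    }

    // Assumption: the task index is the non-negative remainder of the hash.
    static int taskIndex(List<String> groupingValues, int numTasks) {
        return Math.floorMod(listHash(groupingValues), numTasks);
    }

    // Collects the distinct task indexes hit by a set of single-field keys.
    static Set<Integer> bucketsUsed(String[] keys, int numTasks) {
        Set<Integer> used = new TreeSet<Integer>();
        for (String key : keys) {
            List<String> values = new ArrayList<String>();
            values.add(key);
            used.add(taskIndex(values, numTasks));
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("tasks=64 -> " + bucketsUsed(KEYS, 64));
    }
}
```

With the 18 keys above and 64 tasks this reports only four distinct indexes (29, 41, 51, 61), because -3 and -35 fold onto 61 and 29. One likely contributor to the clustering: 31 * 31 = 961 = 15 * 64 + 1, so modulo 64 each character is weighted by either 1 or 31, which leaves little mixing for keys that differ in only a couple of digits. Calling bucketsUsed with other task counts is a quick way to see how the spread changes.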

Thanks
kashyap

On Wed, Sep 30, 2015 at 9:20 AM, Derek Dagit <[email protected]> wrote:

> > This hashcode is coming out same for different string combinations...
>
> As far as I understand, this can only happen with vanishingly small
> probability.
>
> Here is the hashCode implementation for String:
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/lang/String.java#String.hashCode%28%29
>
> Here is the Arrays code that combines the hashes of the individual Strings:
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/util/Arrays.java#Arrays.deepHashCode%28java.lang.Object[]%29
>
>
>
> Would you share an example of different combinations of String field
> values that hash to the same hashcode value?
> --
> Derek
>
>
> ________________________________
> From: Kashyap Mhaisekar <[email protected]>
> To: [email protected]
> Sent: Tuesday, September 29, 2015 6:04 PM
> Subject: Re: Field Group Hash Computation
>
>
>
> Thanks guys. From what I understand, partial key grouping is used when you
> know your grouping will create imbalance. In my case, most of my field
> groups go to one bolt, which makes it a bottleneck. Since I emit
> strings, I guess the hash is ArrayList(str1,str2...).hashCode(). This
> hashcode is coming out same for different string combinations...
> Thanks
> Kashyap
>
>
> On Sep 29, 2015 17:51, "Matthias J. Sax" <[email protected]> wrote:
>
> Whether you can use "partial key grouping" depends on your use case. Think
> >carefully before you apply it...
> >
> >Maybe you want to read the research paper about it. It clearly describes
> >when you can use it and when not:
> >
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> >
> >
> >-Matthias
> >
> >On 09/30/2015 12:18 AM, Ken Danniswara wrote:
> >> Hi,
> >>
> >> From what I read, the default FieldGrouping does not balance the load
> >> the way ShuffleGrouping does. There is a discussion about a
> >> custom grouping implementation called partial key grouping, which
> >> balances the load better. Maybe it
> >> helps. https://github.com/gdfm/partial-key-grouping
> >>
> >> On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar <
> [email protected]
> >> <mailto:[email protected]>> wrote:
> >>
> >>     Thanks Derek. I use strings and I still end up with some bolts
> >>     having the maximum requests :(
> >>
> >>     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit <[email protected]
> >>     <mailto:[email protected]>> wrote:
> >>
> >>         The code that hashes the field values is here:
> >>
> >>
> https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
> >>
> >>
> >>         You can write a little java program, something like:
> >>
> >>         // needs java.util.ArrayList and java.util.Arrays
> >>         public static void main(String[] args) {
> >>           ArrayList<String> myList = new ArrayList<String>();
> >>           myList.add("first field value");
> >>           myList.add("second field value");
> >>
> >>           // as in tuple.clj
> >>           int hash = Arrays.deepHashCode(myList.toArray());
> >>
> >>           System.out.println("hash is "+hash);
> >>           int numTasks = 32;
> >>
> >>           System.out.println("task index is " + hash % numTasks);
> >>
> >>         }
> >>
> >>
> >>         There are certain types of values that may not hash
> >>         consistently.  If you are using String values, then it should be
> >>         fine. Other types may or may not, depending on how the class
> >>         implements hashCode().
> >>
> >>
> >>         --
> >>         Derek
> >>
> >>
> >>         ________________________________
> >>         From: Kashyap Mhaisekar <[email protected]
> >>         <mailto:[email protected]>>
> >>         To: [email protected] <mailto:[email protected]>
> >>         Sent: Tuesday, September 29, 2015 4:28 PM
> >>         Subject: Field Group Hash Computation
> >>
> >>
> >>
> >>         Hi,
> >>         I have a field grouping based on 2 fields. I have 32 consumers
> >>         for the tuple and I see most of the times, out of 64 bolts, the
> >>         field group is always on 8 of them. Of the 8, 2 have more than
> >>         60% of the data. The data for the field grouping can have 20
> >>         different combinations.
> >>
> >>         Do you know how the hash of the grouping fields is
> >>         computed? One of the group's mails indicates that the
> >>         approach is -
> >>
> >>         It calls "hashCode" on the list of selected values and mods it
> >>         by the
> >>         number of consumer tasks. You can play around with that function
> >>         to see if
> >>         something about your data is causing something degenerative to
> >>         happen and
> >>         cause skew
> >>
> >>         I saw the Clojure code but am not sure how to read it.
> >>
> >>         Thanks
> >>         Kashyap
> >>
> >>
> >>
> >
> >
>
