Yes. That's right.

"Values" extends ArrayList and does not override .hashCode(), so it
inherits the ArrayList implementation.

-Matthias
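
The hash computation discussed in this thread can be checked outside of
Storm with a small sketch like the one below. It is not Storm code: the
class name FieldGroupingHashCheck, its methods and the sample key values
are made up for illustration, a plain ArrayList stands in for "Values",
and reducing the hash with Math.floorMod is an assumption about how the
task index is derived from the hash. For String fields the inherited
ArrayList.hashCode() and Arrays.deepHashCode() (the call referenced in
tuple.clj) should agree, so the first line printed should be "true".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for experimenting with grouping hashes; not part of Storm.
class FieldGroupingHashCheck {

    // Hash of the selected field values, using the call referenced in tuple.clj.
    static int listHash(List<?> values) {
        return Arrays.deepHashCode(values.toArray());
    }

    // Assumption: the task index is the non-negative remainder of the hash.
    static int taskIndex(List<?> values, int numTasks) {
        return Math.floorMod(listHash(values), numTasks);
    }

    // Counts how many of the given key combinations map to each task index.
    static int[] distribution(List<List<String>> keys, int numTasks) {
        int[] perTask = new int[numTasks];
        for (List<String> key : keys) {
            perTask[taskIndex(key, numTasks)]++;
        }
        return perTask;
    }

    public static void main(String[] args) {
        // A plain ArrayList stands in for Storm's Values here.
        List<Object> values = new ArrayList<Object>();
        values.add("first field value");
        values.add("second field value");

        // Compare the inherited ArrayList hash with the deepHashCode variant.
        System.out.println(values.hashCode() == listHash(values));

        // Made-up key combinations; replace them with the real field values.
        List<List<String>> keys = new ArrayList<List<String>>();
        keys.add(Arrays.asList("a", "x"));
        keys.add(Arrays.asList("a", "y"));
        keys.add(Arrays.asList("b", "x"));
        keys.add(Arrays.asList("b", "y"));

        // Show how the combinations spread over 32 tasks.
        System.out.println(Arrays.toString(distribution(keys, 32)));
    }
}
```

Feeding the 20 real field combinations into distribution() shows
directly how many distinct task indexes they can reach. With only 20
distinct keys, at most 20 tasks can ever receive data, whatever the
hash function looks like.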

On 09/30/2015 11:21 AM, Kashyap Mhaisekar wrote:
> Is this the right computation for the hash: ArrayList(str1, str2, ...).hashCode(),
> where str1, str2, etc. are the fields being grouped?
> 
> Thanks
> Kashyap
> 
> On Sep 29, 2015 18:04, "Kashyap Mhaisekar" wrote:
> 
>     Thanks guys. From what I understand, partial key grouping is used
>     when you know your grouping will create imbalance. In my case, most
>     of my field groups go to one bolt, making it a bottleneck. Since I
>     emit strings, I guess the hash is
>     ArrayList(str1, str2, ...).hashCode(). This hash code comes out the
>     same for different string combinations...
> 
>     Thanks
>     Kashyap
> 
>     On Sep 29, 2015 17:51, "Matthias J. Sax" wrote:
> 
>         Whether you can use "partial key grouping" depends on your use
>         case. Think carefully before you apply it...
> 
>         Maybe you want to read the research paper about it. It clearly
>         describes when you can use it and when you cannot:
>         
> https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> 
> 
>         -Matthias
> 
>         On 09/30/2015 12:18 AM, Ken Danniswara wrote:
>         > Hi,
>         >
>         > From what I read, the default FieldGrouping does not balance
>         > the load the way ShuffleGrouping does. There is a discussion
>         > of a custom grouping implementation called partial key
>         > grouping, which balances the load better. Maybe it helps:
>         > https://github.com/gdfm/partial-key-grouping
>         >
>         > On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar wrote:
>         >
>         >     Thanks Derek. I use strings and I still end up with some bolts
>         >     having the maximum requests :(
>         >
>         >     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit wrote:
>         >
>         >         The code that hashes the field values is here:
>         >
>         >         https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
>         >
>         >
>         >         You can write a little Java program, something like:
>         >
>         >         // needs java.util.ArrayList and java.util.Arrays
>         >         public static void main(String[] args) {
>         >           ArrayList<String> myList = new ArrayList<String>();
>         >           myList.add("first field value");
>         >           myList.add("second field value");
>         >
>         >           // as in tuple.clj
>         >           int hash = Arrays.deepHashCode(myList.toArray());
>         >
>         >           System.out.println("hash is " + hash);
>         >           int numTasks = 32;
>         >
>         >           // floorMod keeps the index non-negative when hash is negative
>         >           System.out.println("task index is " + Math.floorMod(hash, numTasks));
>         >
>         >         }
>         >
>         >
>         >         There are certain types of values that may not hash
>         >         consistently. If you are using String values, then it
>         >         should be fine. Other types may or may not, depending
>         >         on how the class implements hashCode().
>         >
>         >
>         >         --
>         >         Derek
>         >
>         >
>         >         ________________________________
>         >         From: Kashyap Mhaisekar
>         >         To: [email protected]
>         >         Sent: Tuesday, September 29, 2015 4:28 PM
>         >         Subject: Field Group Hash Computation
>         >
>         >
>         >
>         >         Hi,
>         >         I have a field grouping based on 2 fields. I have 32
>         >         consumers for the tuple, and most of the time, out of 64
>         >         bolts, the field group always lands on 8 of them. Of
>         >         those 8, 2 get more than 60% of the data. The data for
>         >         the field grouping can have 20 different combinations.
>         >
>         >         Do you know how the hash of the grouping fields is
>         >         computed? One of the group's mails indicates that the
>         >         approach is:
>         >
>         >         It calls "hashCode" on the list of selected values and
>         >         mods it by the number of consumer tasks. You can play
>         >         around with that function to see if something about
>         >         your data is causing something degenerative to happen
>         >         and cause skew.
>         >
>         >         I saw the Clojure code but I am not sure how to
>         >         understand it.
>         >
>         >         Thanks
>         >         Kashyap
>         >
>         >
>         >
> 
