Thanks, Matthias. My question was this:
If I am emitting str1, str2, str3 but fields-grouping on str2 and str3 only,
will the hash be computed on Values(str1, str2, str3) or on Values(str2, str3) alone?
In my case str1 and str2 are changing, but I see the values go to the same bolt
instance. Is there a way to debug which hash is generated?
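A minimal sketch for checking this locally. It assumes that the fields grouper first picks the values of the grouping fields out of the emitted tuple (in grouping-field order), hashes that list with `Arrays.deepHashCode`, and then takes a non-negative modulo of the task count. `GroupingDebug`, `select` and `taskIndex` are hypothetical names. This is not Storm's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: mimics hashing only the grouping fields of a tuple (assumed scheme).
public class GroupingDebug {

    // Pick the values of groupFields out of the emitted tuple, in groupFields order.
    static List<Object> select(List<String> outFields, List<String> groupFields, List<Object> values) {
        List<Object> selected = new ArrayList<>();
        for (String field : groupFields) {
            int pos = outFields.indexOf(field);
            if (pos < 0) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            selected.add(values.get(pos));
        }
        return selected;
    }

    // Assumption: deepHashCode over the selected values, then a floor modulo.
    static int taskIndex(List<String> outFields, List<String> groupFields,
                         List<Object> values, int numTasks) {
        int hash = Arrays.deepHashCode(select(outFields, groupFields, values).toArray());
        return Math.floorMod(hash, numTasks);
    }

    public static void main(String[] args) {
        List<String> out = Arrays.asList("str1", "str2", "str3");
        List<String> group = Arrays.asList("str2", "str3");
        // Only str1 differs, so the selected values (and the task index) are identical.
        System.out.println(taskIndex(out, group, Arrays.<Object>asList("a", "x", "y"), 32));
        System.out.println(taskIndex(out, group, Arrays.<Object>asList("b", "x", "y"), 32));
    }
}
```

Under this scheme a change in str1 never moves a tuple. A change in str2 does change the hash, but two different hashes can still end up on the same task when they agree modulo the task count.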

Thank you!

Kashyap
On Sep 30, 2015 5:14 AM, "Matthias J. Sax" <[email protected]> wrote:

> Yes. That's right.
>
> "Values" extends ArrayList and does not override .hashCode().
>
> -Matthias
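Here is what that means in practice, shown with a hypothetical stand-in class. `FakeValues` is not Storm's `Values`; it is just an `ArrayList` with a varargs constructor. Because it keeps the `List.hashCode()` contract, the hash depends only on the elements. For String elements, that hash matches `Arrays.deepHashCode` over the same array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Values-like class: an ArrayList that does not override hashCode().
class FakeValues extends ArrayList<Object> {
    FakeValues(Object... vals) {
        super(Arrays.asList(vals));
    }
}

public class ValuesHashDemo {
    public static void main(String[] args) {
        FakeValues v = new FakeValues("str2", "str3");
        List<Object> plain = new ArrayList<>(Arrays.asList("str2", "str3"));

        // List.hashCode() contract: start at 1, then h = 31 * h + element.hashCode().
        System.out.println(v.hashCode() == plain.hashCode());
        // For String elements, deepHashCode over the array yields the same number.
        System.out.println(v.hashCode() == Arrays.deepHashCode(v.toArray()));
    }
}
```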
>
> On 09/30/2015 11:21 AM, Kashyap Mhaisekar wrote:
> > Is this the right way to compute the hash: ArrayList(str1, str2, ...).hashCode(),
> > where str1, str2, etc. are the fields being grouped?
> >
> > Thanks
> > Kashyap
> >
> > On Sep 29, 2015 18:04, "Kashyap Mhaisekar" <[email protected]> wrote:
> >
> >     Thanks, guys. From what I understand, partial key grouping is used
> >     when you know your grouping will create an imbalance. In my case, most
> >     of my field groups go to one bolt, which becomes a bottleneck.
> >     Since I emit strings, I guess the hash is
> >     ArrayList(str1, str2, ...).hashCode(). This hash code is coming out the
> >     same for different string combinations...
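Different string combinations can share a hash. Different hashes can also share a task after the modulo. The sketch below shows both, assuming the list-hash-then-modulo scheme described in this thread. The keys are made-up examples. `"Aa"` and `"BB"` are a well-known pair with equal `String.hashCode()` values.

```java
import java.util.Arrays;
import java.util.List;

public class HashCollisionDemo {
    // Assumed scheme from this thread: list hash, then non-negative modulo.
    static int taskIndex(List<String> key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        // Full collision: "Aa" and "BB" have the same String.hashCode() (2112),
        // so these two different combinations have identical list hashes.
        List<String> k1 = Arrays.asList("Aa", "x");
        List<String> k2 = Arrays.asList("BB", "x");
        System.out.println(k1.hashCode() == k2.hashCode()); // prints true

        // Modulo collision: the hashes differ by 31 * 32, a multiple of 32,
        // so the two keys still map to the same task when numTasks = 32.
        List<String> k3 = Arrays.asList("A", "x");
        List<String> k4 = Arrays.asList("a", "x");
        System.out.println(k3.hashCode() + " vs " + k4.hashCode());
        System.out.println(taskIndex(k3, 32) + " vs " + taskIndex(k4, 32)); // prints 24 vs 24
    }
}
```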
> >
> >     Thanks
> >     Kashyap
> >
> >     On Sep 29, 2015 17:51, "Matthias J. Sax" <[email protected]> wrote:
> >
> >         Whether you can use "partial key grouping" depends on your use case.
> >         Think carefully before you apply it...
> >
> >         You may want to read the research paper about it. It clearly
> >         describes when you can use it and when you cannot:
> >
> >         https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf
> >
> >
> >         -Matthias
> >
> >         On 09/30/2015 12:18 AM, Ken Danniswara wrote:
> >         > Hi,
> >         >
> >         > From what I have read, the default fields grouping does not balance
> >         > the load as well as shuffle grouping does. There is a discussion
> >         > about a custom grouping implementation called partial key grouping,
> >         > which balances the load better. Maybe it
> >         > helps: https://github.com/gdfm/partial-key-grouping
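The core idea in the linked paper is the "power of two choices". Each key hashes to two candidate tasks, and each sender routes a tuple to whichever candidate it has sent fewer tuples to so far. Below is a simplified single-sender sketch. The class name, seeds and mixing function are hypothetical choices; this is not the code in the linked repository or in Storm. A key's state ends up split across two tasks, so downstream results generally have to be merged.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "power of two choices" routing using local load estimates.
public class TwoChoicesGrouper {
    private static final int SEED_1 = 0x9747b28c;
    private static final int SEED_2 = 0x5bd1e995;
    private final long[] sentPerTask; // tuples this sender has routed to each task

    public TwoChoicesGrouper(int numTasks) {
        this.sentPerTask = new long[numTasks];
    }

    // Finalizer-style 32-bit mixing of the key hash with a seed (illustrative choice).
    private static int mix(int h, int seed) {
        int x = h ^ seed;
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    // The two candidate tasks for a key.
    public int[] candidates(List<?> key) {
        int h = Arrays.deepHashCode(key.toArray());
        int n = sentPerTask.length;
        return new int[] { Math.floorMod(mix(h, SEED_1), n), Math.floorMod(mix(h, SEED_2), n) };
    }

    // Route to the less-loaded candidate (ties go to the first) and record the send.
    public int chooseTask(List<?> key) {
        int[] c = candidates(key);
        int task = sentPerTask[c[1]] < sentPerTask[c[0]] ? c[1] : c[0];
        sentPerTask[task]++;
        return task;
    }

    public long load(int task) {
        return sentPerTask[task];
    }

    public static void main(String[] args) {
        TwoChoicesGrouper g = new TwoChoicesGrouper(8);
        List<String> hotKey = Arrays.asList("hot", "key");
        for (int i = 0; i < 1000; i++) {
            g.chooseTask(hotKey);
        }
        int[] c = g.candidates(hotKey);
        System.out.println("candidates " + c[0] + ", " + c[1]
                + " loads " + g.load(c[0]) + ", " + g.load(c[1]));
    }
}
```

If the two candidates differ, a single hot key alternates between them, so each candidate carries half of that key's tuples instead of one task carrying all of them.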
> >         >
> >         > On Wed, Sep 30, 2015 at 12:11 AM, Kashyap Mhaisekar
> >         > <[email protected]> wrote:
> >         >
> >         >     Thanks, Derek. I use strings, and I still end up with some bolts
> >         >     receiving most of the requests :(
> >         >
> >         >     On Tue, Sep 29, 2015 at 5:03 PM, Derek Dagit
> >         >     <[email protected]> wrote:
> >         >
> >         >         The code that hashes the field values is here:
> >         >
> >         >         https://github.com/apache/storm/blob/9d911ec1b4f7b5aabe646a5d2cd31591fe4df1b0/storm-core/src/clj/backtype/storm/tuple.clj#L24
> >         >
> >         >
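> >         >         You can write a little Java program, something like:
> >         >
> >         >         import java.util.ArrayList;
> >         >         import java.util.Arrays;
> >         >         import java.util.List;
> >         >
> >         >         public class FieldHash {
> >         >           public static void main(String[] args) {
> >         >             List<String> myList = new ArrayList<String>();
> >         >             myList.add("first field value");
> >         >             myList.add("second field value");
> >         >
> >         >             // Same hash computation as in tuple.clj
> >         >             int hash = Arrays.deepHashCode(myList.toArray());
> >         >             System.out.println("hash is " + hash);
> >         >
> >         >             int numTasks = 32;
> >         >             // Java's % can be negative for a negative hash;
> >         >             // floorMod always returns an index in [0, numTasks)
> >         >             System.out.println("task index is " + Math.floorMod(hash, numTasks));
> >         >           }
> >         >         }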
> >         >
> >         >
> >         >         There are certain types of values that may not hash
> >         >         consistently. If you are using String values, it should be
> >         >         fine. Other types may or may not, depending on how the class
> >         >         implements hashCode().
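The sketch below shows what "hash consistently" requires for non-String values, using two hypothetical key classes. The grouping runs in every upstream task, often in different JVMs, so the same logical value must produce the same `hashCode()` everywhere. `Object.hashCode()` is identity-based, so a class without an override generally gives different hashes for equal-looking instances. A content-based override fixes that. Enum constants also inherit an identity-based `hashCode()`, which can differ between JVMs.

```java
import java.util.Arrays;
import java.util.Objects;

public class KeyHashDemo {
    // Hypothetical key WITHOUT hashCode(): falls back to identity-based Object.hashCode().
    static final class RawKey {
        final String a;
        final int b;
        RawKey(String a, int b) { this.a = a; this.b = b; }
    }

    // Hypothetical key WITH content-based equals()/hashCode().
    static final class GoodKey {
        final String a;
        final int b;
        GoodKey(String a, int b) { this.a = a; this.b = b; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GoodKey)) return false;
            GoodKey k = (GoodKey) o;
            return b == k.b && a.equals(k.a);
        }

        @Override
        public int hashCode() {
            return Objects.hash(a, b);
        }
    }

    // Assumed scheme from this thread: deepHashCode over the values, then floor modulo.
    static int taskIndex(Object value, int numTasks) {
        return Math.floorMod(Arrays.deepHashCode(new Object[] { value }), numTasks);
    }

    public static void main(String[] args) {
        // May well differ: two instances, two identity hash codes.
        System.out.println(taskIndex(new RawKey("x", 1), 32) + " / " + taskIndex(new RawKey("x", 1), 32));
        // Always equal: the hash depends only on the contents.
        System.out.println(taskIndex(new GoodKey("x", 1), 32) + " / " + taskIndex(new GoodKey("x", 1), 32));
    }
}
```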
> >         >
> >         >
> >         >         --
> >         >         Derek
> >         >
> >         >
> >         >         ________________________________
> >         >         From: Kashyap Mhaisekar <[email protected]>
> >         >         To: [email protected]
> >         >         Sent: Tuesday, September 29, 2015 4:28 PM
> >         >         Subject: Field Group Hash Computation
> >         >
> >         >
> >         >
> >         >         Hi,
> >         >         I have a fields grouping based on 2 fields. I have 32 consumers
> >         >         for the tuple, and I see that most of the time, out of 64 bolts,
> >         >         the grouped tuples always go to 8 of them. Of those 8, 2 get
> >         >         more than 60% of the data. The grouping fields can take 20
> >         >         different combinations.
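Here is a rough worked estimate of how many tasks 20 key combinations can reach at all. It uses the idealized assumption that each combination lands on a task uniformly at random and independently. With $k$ distinct keys and $n$ tasks, the expected number of tasks that receive at least one key is:

```latex
\begin{aligned}
E[\text{tasks used}] &= n\left(1 - \left(1 - \tfrac{1}{n}\right)^{k}\right) \\
n = 32,\ k = 20:&\quad 32\left(1 - (31/32)^{20}\right) \approx 32\,(1 - 0.530) \approx 15.0 \\
n = 64,\ k = 20:&\quad 64\left(1 - (63/64)^{20}\right) \approx 64\,(1 - 0.730) \approx 17.3
\end{aligned}
```

Even with a perfectly uniform hash, only about 15 of 32 (or about 17 of 64) tasks would receive any key. Each task's load then depends on how much data its keys carry. Seeing only 8 tasks in use is lower than this idealized estimate, so checking the actual key values against their task indices seems worthwhile.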
> >         >
> >         >         Do you know how the hash of the grouping fields is computed?
> >         >         One of the mailing-list posts indicates that the approach is:
> >         >
> >         >         It calls "hashCode" on the list of selected values and mods it
> >         >         by the number of consumer tasks. You can play around with that
> >         >         function to see if something about your data is causing
> >         >         something degenerative to happen and cause skew.
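Here is a small sketch for spotting that kind of degenerate behavior. Feed in the distinct grouping-key combinations you actually see, and it counts how many land on each task index. It assumes the same list-hash-then-modulo scheme described in this thread. `SkewCheck` is a hypothetical name, and the sample keys are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SkewCheck {
    // Count how many distinct key combinations map to each task index (assumed scheme).
    static Map<Integer, Integer> keysPerTask(List<List<String>> keys, int numTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<String> key : keys) {
            int task = Math.floorMod(Arrays.deepHashCode(key.toArray()), numTasks);
            counts.merge(task, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample combinations of the two grouping fields.
        List<List<String>> keys = Arrays.asList(
                Arrays.asList("US", "web"), Arrays.asList("US", "mobile"),
                Arrays.asList("EU", "web"), Arrays.asList("EU", "mobile"),
                Arrays.asList("APAC", "web"), Arrays.asList("APAC", "mobile"));
        // Tasks missing from the printed map receive no keys at all.
        System.out.println(keysPerTask(keys, 32));
    }
}
```

Weighting each key by its observed tuple count, instead of counting it once, would show the data skew rather than just the key spread.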
> >         >
> >         >         I looked at the Clojure code but am not sure how to read it.
> >         >
> >         >         Thanks
> >         >         Kashyap
> >         >
> >         >
> >         >
> >
>
>
