[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-30 Thread kevpeek
Github user kevpeek commented on the issue: https://github.com/apache/storm/pull/2379 new PR with this squashed to one commit: https://github.com/apache/storm/pull/2395 ---

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-30 Thread kevpeek
Github user kevpeek commented on the issue: https://github.com/apache/storm/pull/2379 I went ahead and replaced the original hash function based logic with something faster. This code now runs in about 60% of the time taken by the original. As for changing the hash logic, I r

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/2379 @kevpeek You can refer another implementation to see how we optimize it. https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGr

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-27 Thread kevpeek
Github user kevpeek commented on the issue: https://github.com/apache/storm/pull/2379 current performance is between 3% and 12% slower than the old version. @revans2 Would you elaborate on what you meant by, "pre-allocate the possible return values as lists?" Thanks.

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-26 Thread revans2
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2379 @kevpeek yes the JVM can be very dumb when it comes to arrays vs Lists. That sounds great if we can be within a few percentage points I would be happy to merge this in. ---

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-26 Thread kevpeek
Github user kevpeek commented on the issue: https://github.com/apache/storm/pull/2379 heh. It crossed my mind to change those to arrays, but I figured the jvm would render the difference negligible. A quick test proves that I was very wrong. I'll tweak that tomorrow and run some more

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-26 Thread revans2
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2379 @kevpeek I agree that it might not be worth merging it right now. But there is room for improvement. The first thing I would do is to pre-allocate the possible return values as lists. We s

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-23 Thread kevpeek
Github user kevpeek commented on the issue: https://github.com/apache/storm/pull/2379 @revans2 I ran some perf tests, and this refactor did slow things down considerably. I have made a small change that eliminates the main source of slowness, but it's still not great. The cur

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-20 Thread kevpeek
Github user kevpeek commented on the issue: https://github.com/apache/storm/pull/2379 I will go collect some performance numbers and get back to you. ---

[GitHub] storm issue #2379: STORM-2782 - refactor partial key grouping to make it mor...

2017-10-20 Thread revans2
Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2379 I am a conditional +1 on this change assuming that the performance numbers are within a few percent of the original. ---