[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 I'm going to manually close this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-19 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 Ping @a-roberts to resolve this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 @a-roberts let's either finish the thought and merge this as mostly a code cleanup and maybe marginal win, or just close it. --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 It does seem like nice cleanup in any event. I am not sure why the first commit was faster as this seems like a 'superset' of optimization. We can't use that one in any event. If you want to update

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-06 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 New data for us, inlined comparator scores here (code provided below to check I've not profiled something useless!): ``` ScalaSparkPagerank 2016-12-05 13:44:41 25992811548.149

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-12-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 I'd certainly be curious to see a benchmark of the 'final' version with inlined comparator. I would honestly be surprised if that's not fastest of all. --- If your project is set up for it, you can

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Passed on your question to our JIT developers > The sense* of the test can have an impact of the VM interpreter performance, but that is not usually much of a component of actual

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 Hm why does the order matter - maybe helps branch prediction? I doubt we even know how the bytecode orders this let alone how it is JITted and whether it will gather branching info on this one

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 I see, so doing the != comparison first which is likely to be true more of the time so we're not consistently failing this check then entering the else ``` def

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15736 (Particularly) as the number of partitions increase, "if (a._1 != b._1)" might be better for bpt reasons. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Good point, done, I can get profiling the below code then? Builds fine and no scalastyle problems ``` def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int,

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 Looks right except you just want to write ``` if (...) { ... } else { new Comparator... } ``` --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 @srowen how about this for profiling? ``` private[spark] object WritablePartitionedPairCollection { /** * Takes an optional parameter (keyComparator), use if provided

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-29 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 If that's true then again doesn't my suggestion to inline `partitionComparator` fix it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-28 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Before progressing I've discussed what I'm seeing with our JIT compiler team, with the refactoring to reduce code duplication, the following occurs which solves some of the mystery -- although

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 I've conducted a lot of performance tests and gathered .hcd files so I can investigate this next week, but it looks like either the first commit is the best for performance or my current

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69172/ Test FAILed. ---

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #69172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69172/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #69172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69172/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69165/ Test PASSed. ---

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #69165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69165/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69164/ Test PASSed. ---

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #69164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69164/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #69165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69165/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #69164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69164/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Back to working on the performance related JIRAs now, so based on the above helpful comments here's what I'll do Remove the .get.compare from the loop as suggested above - we'll do a .get

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-19 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 I'm resuming the work for all of these related PRs again this week after the London Spark meetup on Wednesday, if you are keen to take it on I'm more than happy to help out and will share some

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-19 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 I know you're busy but this does look like a good change to finish off. That it's a win is self-evident, just a question of how much, and benchmarks you have already show it is an improvement. I can

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-09 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Sean, they are great suggestions, thanks -- I'll find the time (like for the other outstanding pull requests) to get your feedback integrated, tested and profiled, currently caught up in

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-09 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 @a-roberts this does look like a worthy change, what do you think of the further simplifications here? --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15736 To recap some of my feedback here, I think this will be a fine change but it can be refactored further. I think we can refactor this logic that appears twice in one place, perhaps in

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67987/ Test PASSed. ---

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #67987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67987/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67986/ Test FAILed. ---

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #67986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67986/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #67987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67987/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Addressed the scalastyle comments and added the PartitionedAppendOnlyMap change here as per the above suggestions, will look at the review comments next Two unrelated asides 1) wary

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #67986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67986/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/15736 Will be adding the commit from #15735 here upon addressing the feedback --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #67984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67984/consoleFull)** for PR 15736 at commit

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67984/ Test FAILed. ---

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15736 **[Test build #67984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67984/consoleFull)** for PR 15736 at commit