Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
I'm going to manually close this
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
Ping @a-roberts to resolve this
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
@a-roberts let's either finish the thought and merge this as mostly a code
cleanup and maybe marginal win, or just close it.
---
If your project is set up for it, you can reply to this email and
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
It does seem like nice cleanup in any event. I am not sure why the first
commit was faster as this seems like a 'superset' of optimization. We can't use
that one in any event. If you want to update
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
New data for us, inlined comparator scores here (code provided below to
check I've not profiled something useless!):
```
ScalaSparkPagerank 2016-12-05 13:44:41 25992811548.149
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
I'd certainly be curious to see a benchmark of the 'final' version with
inlined comparator. I would honestly be surprised if that's not fastest of all.
---
If your project is set up for it, you can
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Passed on your question to our JIT developers
> The sense* of the test can have an impact of the VM interpreter
performance, but that is not usually much of a component of actual
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
Hm why does the order matter - maybe helps branch prediction? I doubt we
even know how the bytecode orders this let alone how it is JITted and whether
it will gather branching info on this one
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
I see, so doing the != comparison first which is likely to be true more of
the time so we're not consistently failing this check then entering the else
```
def
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/15736
(Particularly) as the number of partitions increase, "if (a._1 != b._1)"
might be better for bpt reasons.
---
If your project is set up for it, you can reply to this email and have your
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Good point, done, I can get profiling the below code then? Builds fine and
no scalastyle problems
```
def getComparator[K](keyComparator: Option[Comparator[K]]):
Comparator[(Int,
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
Looks right except you just want to write
```
if (...) {
...
} else {
new Comparator...
}
```
---
If your project is set up for it, you can reply to this email
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
@srowen how about this for profiling?
```
private[spark] object WritablePartitionedPairCollection {
/**
* Takes an optional parameter (keyComparator), use if provided
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
If that's true then again doesn't my suggestion to inline
`partitionComparator` fix it?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Before progressing I've discussed what I'm seeing with our JIT compiler
team, with the refactoring to reduce code duplication, the following occurs
which solves some of the mystery -- although
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
I've conducted a lot of performance tests and gathered .hcd files so I can
investigate this next week, but it looks like either the first commit is the
best for performance or my current
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69172/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #69172 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69172/consoleFull)**
for PR 15736 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #69172 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69172/consoleFull)**
for PR 15736 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69165/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #69165 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69165/consoleFull)**
for PR 15736 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69164/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #69164 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69164/consoleFull)**
for PR 15736 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #69165 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69165/consoleFull)**
for PR 15736 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #69164 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69164/consoleFull)**
for PR 15736 at commit
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Back to working on the performance related JIRAs now, so based on the above
helpful comments here's what I'll do
Remove the .get.compare from the loop as suggested above - we'll do a .get
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
I'm resuming the work for all of these related PRs again this week after
the London Spark meetup on Wednesday, if you are keen to take it on I'm more
than happy to help out and will share some
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
I know you're busy but this does look like a good change to finish off.
That it's a win is self-evident, just a question of how much, and benchmarks
you have already show it is an improvement. I can
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Sean, they are great suggestions, thanks -- I'll find the time (like for
the other outstanding pull requests) to get your feedback integrated, tested
and profiled, currently caught up in
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
@a-roberts this does look like a worthy change, what do you think of the
further simplifications here?
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/15736
To recap some of my feedback here, I think this will be a fine change but
it can be refactored further.
I think we can refactor this logic that appears twice in one place, perhaps
in
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67987/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #67987 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67987/consoleFull)**
for PR 15736 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67986/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #67986 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67986/consoleFull)**
for PR 15736 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #67987 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67987/consoleFull)**
for PR 15736 at commit
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Addressed the scalastyle comments and added the PartitionedAppendOnlyMap
change here as per the above suggestions, will look at the review comments next
Two unrelated asides
1) wary
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #67986 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67986/consoleFull)**
for PR 15736 at commit
Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
Will be adding the commit from #15735 here upon addressing the feedback
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #67984 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67984/consoleFull)**
for PR 15736 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15736
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67984/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15736
**[Test build #67984 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67984/consoleFull)**
for PR 15736 at commit
48 matches
Mail list logo