Github user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
New data for us: inlined comparator scores below (code provided so you can check I haven't profiled something useless!):
```
ScalaSparkPagerank 2016-12-05 13:44:41 259928115 48.149 5398411 5398411
ScalaSparkPagerank 2016-12-05 13:46:43 259928115 46.897 5542531 5542531
ScalaSparkPagerank 2016-12-05 13:48:46 259928115 49.130 5290619 5290619
ScalaSparkPagerank 2016-12-05 13:50:49 259928115 49.793 5220173 5220173
ScalaSparkPagerank 2016-12-05 13:52:50 259928115 48.061 5408296 5408296
ScalaSparkPagerank 2016-12-05 13:54:52 259928115 46.468 5593701 5593701
ScalaSparkPagerank 2016-12-05 13:56:56 259928115 51.385 5058443 5058443
ScalaSparkPagerank 2016-12-05 13:58:59 259928115 47.857 5431349 5431349
ScalaSparkPagerank 2016-12-05 14:00:59 259928115 46.515 5588049 5588049
ScalaSparkPagerank 2016-12-05 14:03:03 259928115 47.791 5438850 5438850
Avg 48.2046s
```
Remember that our "vanilla" average time is 47.752s and our first commit averaged 47.229s, so there's not much of a difference really.
I think we're splitting hairs here, and I've got another PR where I'm seeing good results that I plan to focus on instead: the SizeEstimator. Below is what I've benchmarked, PartitionedAppendOnlyMap first. Let me know if there are any further suggestions; otherwise I propose leaving this one for later, as against the Spark master codebase I'm not noticing anything exciting.
```
def partitionedDestructiveSortedIterator(keyComparator: Option[Comparator[K]])
    : Iterator[((Int, K), V)] = {
  val comparator = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        // We know we have a non-empty comparator here
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          if (a._1 != b._1) {
            a._1 - b._1
          } else {
            theKeyComp.compare(a._2, b._2)
          }
        }
      }
    } else {
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          a._1 - b._1
        }
      }
    }
  }
  destructiveSortedIterator(comparator)
}
```
And in PartitionedPairBuffer:
```
/** Iterate through the data in a given order. For this class this is not really destructive. */
override def partitionedDestructiveSortedIterator(keyComparator: Option[Comparator[K]])
    : Iterator[((Int, K), V)] = {
  val comparator = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        // We know we have a non-empty comparator here
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          if (a._1 != b._1) {
            a._1 - b._1
          } else {
            theKeyComp.compare(a._2, b._2)
          }
        }
      }
    } else {
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          a._1 - b._1
        }
      }
    }
  }
  new Sorter(new KVArraySortDataFormat[(Int, K), AnyRef]).sort(data, 0, curSize, comparator)
  iterator
}
```
WritablePartitionedPairCollection remains unchanged.
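For anyone who wants to play with this outside Spark, here's a minimal, self-contained sketch of the same inlined-comparator idea (my own standalone reduction, not the Spark code): the `Option` is unwrapped once, before the `Comparator` is constructed, so `compare()` does no Option handling per call.

```scala
import java.util.Comparator

object InlinedComparatorSketch {
  // Build a (partitionId, key) comparator; the key comparator, if any,
  // is extracted from the Option exactly once, outside compare().
  def partitionComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          // Partition ids are small and non-negative here, so subtraction is safe
          if (a._1 != b._1) a._1 - b._1
          else theKeyComp.compare(a._2, b._2)
        }
      }
    } else {
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = a._1 - b._1
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // scala.math.Ordering extends java.util.Comparator, so this plugs straight in
    val cmp = partitionComparator[String](Some(Ordering.String))
    println(cmp.compare((1, "apple"), (1, "banana")) < 0) // same partition: key order, prints true
    println(cmp.compare((2, "zzz"), (1, "aaa")) > 0)      // different partitions: id order, prints true
  }
}
```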