Github user a-roberts commented on the issue:

    https://github.com/apache/spark/pull/15736

New data for us, inlined comparator scores here (code provided below so you can check I've not profiled something useless!):

```
ScalaSparkPagerank 2016-12-05 13:44:41 259928115 48.149 5398411 5398411
ScalaSparkPagerank 2016-12-05 13:46:43 259928115 46.897 5542531 5542531
ScalaSparkPagerank 2016-12-05 13:48:46 259928115 49.130 5290619 5290619
ScalaSparkPagerank 2016-12-05 13:50:49 259928115 49.793 5220173 5220173
ScalaSparkPagerank 2016-12-05 13:52:50 259928115 48.061 5408296 5408296
ScalaSparkPagerank 2016-12-05 13:54:52 259928115 46.468 5593701 5593701
ScalaSparkPagerank 2016-12-05 13:56:56 259928115 51.385 5058443 5058443
ScalaSparkPagerank 2016-12-05 13:58:59 259928115 47.857 5431349 5431349
ScalaSparkPagerank 2016-12-05 14:00:59 259928115 46.515 5588049 5588049
ScalaSparkPagerank 2016-12-05 14:03:03 259928115 47.791 5438850 5438850

Avg 48.2046s
```

Remember our "vanilla" average time is 47.752s and our first commit averaged 47.229s, so there's not much of a difference really. I think we're splitting hairs, and I've got another PR I'm seeing good results on that I plan to focus on instead: the SizeEstimator. This is what I've benchmarked, PartitionedAppendOnlyMap first. Let me know if there are any further suggestions; otherwise I propose leaving this one for later, since against the Spark master codebase I'm not noticing anything exciting.

In PartitionedAppendOnlyMap:

```scala
def partitionedDestructiveSortedIterator(keyComparator: Option[Comparator[K]])
    : Iterator[((Int, K), V)] = {
  val comparator = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        // We know we have a non-empty key comparator here: order by partition
        // ID first, then break ties with the key comparator.
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          if (a._1 != b._1) {
            a._1 - b._1
          } else {
            theKeyComp.compare(a._2, b._2)
          }
        }
      }
    } else {
      new Comparator[(Int, K)] {
        // No key comparator supplied, so order by partition ID alone.
        // Partition IDs are non-negative, so the subtraction cannot overflow.
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          a._1 - b._1
        }
      }
    }
  }
  destructiveSortedIterator(comparator)
}
```

In PartitionedPairBuffer:

```scala
/** Iterate through the data in a given order. For this class this is not really destructive. */
override def partitionedDestructiveSortedIterator(keyComparator: Option[Comparator[K]])
    : Iterator[((Int, K), V)] = {
  val comparator = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        // We know we have a non-empty key comparator here: order by partition
        // ID first, then break ties with the key comparator.
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          if (a._1 != b._1) {
            a._1 - b._1
          } else {
            theKeyComp.compare(a._2, b._2)
          }
        }
      }
    } else {
      new Comparator[(Int, K)] {
        // No key comparator supplied, so order by partition ID alone.
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          a._1 - b._1
        }
      }
    }
  }
  new Sorter(new KVArraySortDataFormat[(Int, K), AnyRef]).sort(data, 0, curSize, comparator)
  iterator
}
```

WritablePartitionedPairCollection remains unchanged.
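Side note: the comparator construction above is duplicated verbatim in both classes. If the duplication ever becomes a maintenance concern, one option would be to hoist it into a small shared factory. Below is a minimal sketch of that idea; `PartitionComparators` and `partitionKeyComparator` are hypothetical names, not part of this patch, and whether hoisting preserves whatever inlining benefit we measured would need re-benchmarking.

```scala
import java.util.Comparator

// Hypothetical shared factory for the (partition ID, key) comparator used by
// both PartitionedAppendOnlyMap and PartitionedPairBuffer above. Assumes
// partition IDs are non-negative ints, so plain subtraction cannot overflow.
object PartitionComparators {
  def partitionKeyComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] =
    keyComparator match {
      case Some(keyComp) =>
        new Comparator[(Int, K)] {
          // Order by partition ID first; break ties with the key comparator.
          override def compare(a: (Int, K), b: (Int, K)): Int =
            if (a._1 != b._1) a._1 - b._1 else keyComp.compare(a._2, b._2)
        }
      case None =>
        new Comparator[(Int, K)] {
          // No key comparator supplied: order by partition ID alone.
          override def compare(a: (Int, K), b: (Int, K)): Int = a._1 - b._1
        }
    }
}
```

Each call site would then reduce to building the comparator via `PartitionComparators.partitionKeyComparator(keyComparator)` and passing it to `destructiveSortedIterator` or the `Sorter`, respectively.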