[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148562203 [Test build #43812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43812/console) for PR 9124 at commit [`7590b77`](https://github.com/apache/spark/commit/7590b7747750a053f3253656d97dbeb7f38b8f80). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148562385 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148562387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43812/ Test PASSed.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148526833 Merged build started.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148526820 Merged build triggered.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148531647 OK, merging.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9124#discussion_r42182171

    --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala ---
    @@ -18,535 +18,91 @@ package org.apache.spark.util.collection
     
     import scala.collection.mutable.ArrayBuffer
    -
     import scala.util.Random
     
     import org.apache.spark._
     import org.apache.spark.serializer.{JavaSerializer, KryoSerializer}
     
    -// TODO: some of these spilling tests probably aren't actually spilling (SPARK-11078)
     class ExternalSorterSuite extends SparkFunSuite with LocalSparkContext {
    -  private def createSparkConf(loadDefaults: Boolean, kryo: Boolean): SparkConf = {
    -    val conf = new SparkConf(loadDefaults)
    -    if (kryo) {
    -      conf.set("spark.serializer", classOf[KryoSerializer].getName)
    -    } else {
    -      // Make the Java serializer write a reset instruction (TC_RESET) after each object to test
    -      // for a bug we had with bytes written past the last object in a batch (SPARK-2792)
    -      conf.set("spark.serializer.objectStreamReset", "1")
    -      conf.set("spark.serializer", classOf[JavaSerializer].getName)
    -    }
    -    conf.set("spark.shuffle.sort.bypassMergeThreshold", "0")
    -    // Ensure that we actually have multiple batches per spill file
    -    conf.set("spark.shuffle.spill.batchSize", "10")
    -    conf.set("spark.testing.memory", "200")
    -    conf
    -  }
    -
    -  test("empty data stream with kryo ser") {
    -    emptyDataStream(createSparkConf(false, true))
    -  }
    -
    -  test("empty data stream with java ser") {
    -    emptyDataStream(createSparkConf(false, false))
    -  }
    -
    -  def emptyDataStream(conf: SparkConf) {
    -    conf.set("spark.shuffle.manager", "org.apache.spark.shuffle.sort.SortShuffleManager")
    -    sc = new SparkContext("local", "test", conf)
    -
    -    val agg = new Aggregator[Int, Int, Int](i => i, (i, j) => i + j, (i, j) => i + j)
    -    val ord = implicitly[Ordering[Int]]
    -
    -    // Both aggregator and ordering
    -    val sorter = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(3)), Some(ord), None)
    -    assert(sorter.iterator.toSeq === Seq())
    -    sorter.stop()
    -
    -    // Only aggregator
    -    val sorter2 = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(3)), None, None)
    -    assert(sorter2.iterator.toSeq === Seq())
    -    sorter2.stop()
    -
    -    // Only ordering
    -    val sorter3 = new ExternalSorter[Int, Int, Int](
    -      None, Some(new HashPartitioner(3)), Some(ord), None)
    -    assert(sorter3.iterator.toSeq === Seq())
    -    sorter3.stop()
    -
    -    // Neither aggregator nor ordering
    -    val sorter4 = new ExternalSorter[Int, Int, Int](
    -      None, Some(new HashPartitioner(3)), None, None)
    -    assert(sorter4.iterator.toSeq === Seq())
    -    sorter4.stop()
    -  }
    +  import TestUtils.{assertNotSpilled, assertSpilled}
     
    -  test("few elements per partition with kryo ser") {
    -    fewElementsPerPartition(createSparkConf(false, true))
    -  }
    +  testWithMultipleSer("empty data stream")(emptyDataStream)
     
    -  test("few elements per partition with java ser") {
    -    fewElementsPerPartition(createSparkConf(false, false))
    -  }
    +  testWithMultipleSer("few elements per partition")(fewElementsPerPartition)
     
    -  def fewElementsPerPartition(conf: SparkConf) {
    -    conf.set("spark.shuffle.manager", "org.apache.spark.shuffle.sort.SortShuffleManager")
    -    sc = new SparkContext("local", "test", conf)
    -
    -    val agg = new Aggregator[Int, Int, Int](i => i, (i, j) => i + j, (i, j) => i + j)
    -    val ord = implicitly[Ordering[Int]]
    -    val elements = Set((1, 1), (2, 2), (5, 5))
    -    val expected = Set(
    -      (0, Set()), (1, Set((1, 1))), (2, Set((2, 2))), (3, Set()), (4, Set()),
    -      (5, Set((5, 5))), (6, Set()))
    -
    -    // Both aggregator and ordering
    -    val sorter = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(7)), Some(ord), None)
    -    sorter.insertAll(elements.iterator)
    -    assert(sorter.partitionedIterator.map(p => (p._1, p._2.toSet)).toSet === expected)
    -    sorter.stop()
    -
    -    // Only aggregator
    -    val sorter2 = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(7)), None, None)
    -    sorter2.insertAll(elements.iterator)
    -    assert(sorter2.partitionedIterator.map(p => (p._1, p._2.toSet)).toSet === expected)
    -    sorter2.stop()
    +  testWithMultipleSer("empty partitions with spilling")(emptyPartitionsWithSpilling)
     
    -    // Only ordering
    -    val sorter3 = new ExternalSorter[Int, Int, Int](
    -      None, Some(new
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148528504 [Test build #43812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43812/consoleFull) for PR 9124 at commit [`7590b77`](https://github.com/apache/spark/commit/7590b7747750a053f3253656d97dbeb7f38b8f80).
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148524964 LGTM overall
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9124#discussion_r42182568

    --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala ---
    @@ -18,535 +18,91 @@ package org.apache.spark.util.collection
     
     import scala.collection.mutable.ArrayBuffer
    -
     import scala.util.Random
     
     import org.apache.spark._
     import org.apache.spark.serializer.{JavaSerializer, KryoSerializer}
     
    -// TODO: some of these spilling tests probably aren't actually spilling (SPARK-11078)
     class ExternalSorterSuite extends SparkFunSuite with LocalSparkContext {
    -  private def createSparkConf(loadDefaults: Boolean, kryo: Boolean): SparkConf = {
    -    val conf = new SparkConf(loadDefaults)
    -    if (kryo) {
    -      conf.set("spark.serializer", classOf[KryoSerializer].getName)
    -    } else {
    -      // Make the Java serializer write a reset instruction (TC_RESET) after each object to test
    -      // for a bug we had with bytes written past the last object in a batch (SPARK-2792)
    -      conf.set("spark.serializer.objectStreamReset", "1")
    -      conf.set("spark.serializer", classOf[JavaSerializer].getName)
    -    }
    -    conf.set("spark.shuffle.sort.bypassMergeThreshold", "0")
    -    // Ensure that we actually have multiple batches per spill file
    -    conf.set("spark.shuffle.spill.batchSize", "10")
    -    conf.set("spark.testing.memory", "200")
    -    conf
    -  }
    -
    -  test("empty data stream with kryo ser") {
    -    emptyDataStream(createSparkConf(false, true))
    -  }
    -
    -  test("empty data stream with java ser") {
    -    emptyDataStream(createSparkConf(false, false))
    -  }
    -
    -  def emptyDataStream(conf: SparkConf) {
    -    conf.set("spark.shuffle.manager", "org.apache.spark.shuffle.sort.SortShuffleManager")
    -    sc = new SparkContext("local", "test", conf)
    -
    -    val agg = new Aggregator[Int, Int, Int](i => i, (i, j) => i + j, (i, j) => i + j)
    -    val ord = implicitly[Ordering[Int]]
    -
    -    // Both aggregator and ordering
    -    val sorter = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(3)), Some(ord), None)
    -    assert(sorter.iterator.toSeq === Seq())
    -    sorter.stop()
    -
    -    // Only aggregator
    -    val sorter2 = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(3)), None, None)
    -    assert(sorter2.iterator.toSeq === Seq())
    -    sorter2.stop()
    -
    -    // Only ordering
    -    val sorter3 = new ExternalSorter[Int, Int, Int](
    -      None, Some(new HashPartitioner(3)), Some(ord), None)
    -    assert(sorter3.iterator.toSeq === Seq())
    -    sorter3.stop()
    -
    -    // Neither aggregator nor ordering
    -    val sorter4 = new ExternalSorter[Int, Int, Int](
    -      None, Some(new HashPartitioner(3)), None, None)
    -    assert(sorter4.iterator.toSeq === Seq())
    -    sorter4.stop()
    -  }
    +  import TestUtils.{assertNotSpilled, assertSpilled}
     
    -  test("few elements per partition with kryo ser") {
    -    fewElementsPerPartition(createSparkConf(false, true))
    -  }
    +  testWithMultipleSer("empty data stream")(emptyDataStream)
     
    -  test("few elements per partition with java ser") {
    -    fewElementsPerPartition(createSparkConf(false, false))
    -  }
    +  testWithMultipleSer("few elements per partition")(fewElementsPerPartition)
     
    -  def fewElementsPerPartition(conf: SparkConf) {
    -    conf.set("spark.shuffle.manager", "org.apache.spark.shuffle.sort.SortShuffleManager")
    -    sc = new SparkContext("local", "test", conf)
    -
    -    val agg = new Aggregator[Int, Int, Int](i => i, (i, j) => i + j, (i, j) => i + j)
    -    val ord = implicitly[Ordering[Int]]
    -    val elements = Set((1, 1), (2, 2), (5, 5))
    -    val expected = Set(
    -      (0, Set()), (1, Set((1, 1))), (2, Set((2, 2))), (3, Set()), (4, Set()),
    -      (5, Set((5, 5))), (6, Set()))
    -
    -    // Both aggregator and ordering
    -    val sorter = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(7)), Some(ord), None)
    -    sorter.insertAll(elements.iterator)
    -    assert(sorter.partitionedIterator.map(p => (p._1, p._2.toSet)).toSet === expected)
    -    sorter.stop()
    -
    -    // Only aggregator
    -    val sorter2 = new ExternalSorter[Int, Int, Int](
    -      Some(agg), Some(new HashPartitioner(7)), None, None)
    -    sorter2.insertAll(elements.iterator)
    -    assert(sorter2.partitionedIterator.map(p => (p._1, p._2.toSet)).toSet === expected)
    -    sorter2.stop()
    +  testWithMultipleSer("empty partitions with spilling")(emptyPartitionsWithSpilling)
     
    -    // Only ordering
    -    val sorter3 = new ExternalSorter[Int, Int, Int](
    -      None, Some(new
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9124
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/9124

    [SPARK-11078] Ensure spilling tests actually spill

    #9084 uncovered that many tests that test spilling don't actually spill. This follow-up patch fixes that so our unit tests actually catch potential bugs in spilling. The size of this patch is inflated by the refactoring of `ExternalSorterSuite`, which had a lot of duplicate code and logic.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark spilling-tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9124.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9124

commit fd34c25d7ab6d3506d42b2f0cd8c8f26b7726e50
Author: Andrew Or
Date: 2015-10-14T20:05:42Z

    Fix and clean up ExternalSorterSuite

    This commit does several things:
    - remove noisy warning in GrantEverythingMemoryManager
    - remove duplicate code in ExternalSorterSuite
    - add a force spill threshold to make it easier to verify spilling
    - ensure spilling tests actually spill in ExternalSorterSuite

commit 285c81c26b210bcbe88f055475b2a53ac61bb5c1
Author: Andrew Or
Date: 2015-10-14T21:33:51Z

    Fix spilling tests in ExternalAppendOnlyMapSuite

commit 7226933d2a40896b9c4d606eb2c0ab6437507431
Author: Andrew Or
Date: 2015-10-14T22:25:15Z

    Fix DistributedSuite

commit 1b7fa3d6d32b1e254a47706db7eccd915c7368aa
Author: Andrew Or
Date: 2015-10-14T22:28:25Z

    Merge branch 'master' of github.com:apache/spark into spilling-tests
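For readers skimming the thread, the "force spill threshold" idea from the first commit message can be illustrated with a small standalone sketch. This is not the code from the patch: `ToySpillableBuffer`, `forceSpillThreshold`, and `spillCount` are hypothetical names invented here, and the sketch does not use Spark's `ExternalSorter` or the `TestUtils.assertSpilled` helper that the diff imports. It only shows why a count-based threshold makes spilling deterministic enough for a test to assert on it.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a spillable collection; NOT Spark's ExternalSorter.
final class ToySpillableBuffer(forceSpillThreshold: Int) {
  require(forceSpillThreshold > 0, "threshold must be positive")

  private val inMemory = ArrayBuffer.empty[Int]
  // Each sorted run stands in for one spill file on disk.
  private val spilledRuns = ArrayBuffer.empty[Vector[Int]]

  /** Number of times the in-memory buffer was flushed. */
  def spillCount: Int = spilledRuns.size

  def insertAll(elements: Iterator[Int]): Unit = {
    elements.foreach { e =>
      inMemory += e
      // Spill purely on element count, independent of any memory estimate,
      // so a test can predict whether a spill happens.
      if (inMemory.size >= forceSpillThreshold) spill()
    }
  }

  private def spill(): Unit = {
    spilledRuns += inMemory.sorted.toVector
    inMemory.clear()
  }

  /** All inserted elements in sorted order, from spilled runs plus the remainder. */
  def sortedResult: Vector[Int] =
    (spilledRuns.flatten ++ inMemory).sorted.toVector
}

object ToySpillCheck {
  def main(args: Array[String]): Unit = {
    val size = 1000

    // Threshold well below the input size: the test can assert that spilling occurred.
    val spilling = new ToySpillableBuffer(forceSpillThreshold = size / 4)
    spilling.insertAll((0 until size).reverseIterator)
    assert(spilling.spillCount > 0, "expected at least one spill")
    assert(spilling.sortedResult == (0 until size).toVector)

    // Threshold above the input size: the test can assert that nothing spilled.
    val notSpilling = new ToySpillableBuffer(forceSpillThreshold = size * 2)
    notSpilling.insertAll((0 until size).iterator)
    assert(notSpilling.spillCount == 0, "expected no spill")
  }
}
```

The point of the design is that a test checks both the result and the fact that the spill path ran, instead of assuming that a small memory setting caused a spill.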
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148221886 Merged build started.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148221865 Merged build triggered.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148223975 [Test build #43747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43747/consoleFull) for PR 9124 at commit [`1b7fa3d`](https://github.com/apache/spark/commit/1b7fa3d6d32b1e254a47706db7eccd915c7368aa).
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148246056 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148246061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43747/ Test FAILed.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148245856 [Test build #43747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43747/console) for PR 9124 at commit [`1b7fa3d`](https://github.com/apache/spark/commit/1b7fa3d6d32b1e254a47706db7eccd915c7368aa). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148263547 [Test build #1904 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1904/consoleFull) for PR 9124 at commit [`1b7fa3d`](https://github.com/apache/spark/commit/1b7fa3d6d32b1e254a47706db7eccd915c7368aa).
[GitHub] spark pull request: [SPARK-11078] Ensure spilling tests actually s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9124#issuecomment-148282338 [Test build #1904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1904/console) for PR 9124 at commit [`1b7fa3d`](https://github.com/apache/spark/commit/1b7fa3d6d32b1e254a47706db7eccd915c7368aa). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.