Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21527#discussion_r196810539
--- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala ---
@@ -188,4 +188,37 @@ class MapStatusSuite extends SparkFunSuite {
       assert(count === 3000)
     }
   }
+
+  test("YSPARK-500 MapStatus has 2000 hardcoded") {
+    val conf = new SparkConf()
+      .setMaster("local")
+      .setAppName("YSPARK-500")
+    val sizes = Array.fill[Long](500)(150L)
+    // Test default value
+    withSpark(new SparkContext(conf)) { sc =>
+      val status = MapStatus(null, sizes)
+      assert(status.isInstanceOf[CompressedMapStatus])
+    }
+    // Test non-positive values
+    for (s <- -1 to 0) {
+      assertThrows[IllegalArgumentException] {
+        conf.set(config.SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS, s)
+        withSpark(new SparkContext(conf)) { sc =>
+          val status = MapStatus(null, sizes)
+        }
+      }
+    }
+    // Test positive values
+    for (s <- 1 to 3000) {
--- End diff ---
this is overkill, especially since you're creating a SparkContext in each
case. (Look at how long this takes compared to the other tests:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91850/testReport/org.apache.spark.scheduler/MapStatusSuite/)
you really don't need a SparkContext for these tests; I'd just set a
`SparkEnv` -- use mockito to create a mock which returns the conf, and be
sure to unset it at the end of the test in a `finally`. That should be enough
to keep the runtime small, but in any case, just a few cases are probably
enough:
```scala
Seq(1, 100, 499, 500, 501).foreach { s => ...
```
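to make that concrete, here's a rough, untested sketch of the mocked-`SparkEnv`
shape I have in mind (it assumes mockito is on the test classpath, that
`MapStatus.apply` re-reads the conf on each call rather than caching the
threshold, and that the boundary check is a strict `>`):
```scala
// rough sketch, not tested -- boundary semantics (strict >) are an assumption
import org.mockito.Mockito.{mock, when}

val conf = new SparkConf()
val sizes = Array.fill[Long](500)(150L)
// mock a SparkEnv that just hands back our conf
val env = mock(classOf[SparkEnv])
when(env.conf).thenReturn(conf)
SparkEnv.set(env)
try {
  Seq(1, 100, 499, 500, 501).foreach { s =>
    conf.set(config.SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS, s)
    val status = MapStatus(null, sizes)
    if (sizes.length > s) {
      assert(status.isInstanceOf[HighlyCompressedMapStatus])
    } else {
      assert(status.isInstanceOf[CompressedMapStatus])
    }
  }
} finally {
  // don't leak the mocked env into other tests
  SparkEnv.set(null)
}
```
the 499/500/501 values are there to hit both sides of the boundary against the
500-partition `sizes` array.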