srowen commented on a change in pull request #24373: [WIP][SPARK-27460][TESTS]
Running slowest test suites in their own forked JVMs for higher parallelism
URL: https://github.com/apache/spark/pull/24373#discussion_r275393391
##########
File path: project/SparkBuild.scala
##########
@@ -430,6 +430,81 @@ object SparkBuild extends PomBuild {
else x.settings(Seq[Setting[_]](): _*)
} ++ Seq[Project](OldDeps.project)
}
+
+ if (!sys.env.contains("SERIAL_SBT_TESTS")) {
+ allProjects.foreach(enable(SparkParallelTestGrouping.settings))
+ }
+}
+
+object SparkParallelTestGrouping {
+ // Settings for parallelizing tests. The basic strategy here is to run the slowest suites (or
+ // collections of suites) in their own forked JVMs, allowing us to gain parallelism within a
+ // SBT project. Here, we take a whitelisting approach where the default behavior is to run all
+ // tests sequentially in a single JVM, requiring us to manually opt-in to the extra parallelism.
+ //
+ // There are a few reasons why such a whitelist approach is good:
+ //
+ // 1. Launching one JVM per suite adds significant overhead for short-running suites. In
+ // addition to JVM startup time and JIT warmup, it appears that initialization of Derby
+ // metastores can be very slow so creating a fresh warehouse per suite is inefficient.
+ //
+ // 2. When parallelizing within a project we need to give each forked JVM a different tmpdir
Review comment:
I get it, but this won't help the Maven build and is kind of brittle. Is it
really that hard to just set different temp dirs for different suites?
Can a suite run other suites in ScalaTest, so that suites could be
parallelized that way?
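For illustration only, the per-suite temp dir idea raised above might look
roughly like the sketch below. `SuiteTempDirs` and `forSuite` are hypothetical
names, not part of Spark or of this PR, and the sketch assumes each suite can
be handed a directory explicitly rather than reading the JVM-wide
`java.io.tmpdir` system property (which is shared by every suite running in
the same JVM).

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper (not part of Spark): hands each suite its own scratch directory.
object SuiteTempDirs {
  // Files.createTempDirectory appends a random component to the prefix,
  // so two calls never return the same path, even for the same suite name.
  def forSuite(suiteName: String): Path =
    Files.createTempDirectory(s"$suiteName-")
}

object SuiteTempDirsDemo extends App {
  // Suite names here are made up for the example.
  val a = SuiteTempDirs.forSuite("ExampleSuiteA")
  val b = SuiteTempDirs.forSuite("ExampleSuiteB")
  assert(a != b)
  println(a)
  println(b)
}
```

Whether this would be enough depends on how many code paths pick up
`java.io.tmpdir` implicitly; any such path would still collide between suites
sharing a JVM.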