Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/19851#discussion_r154029237
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -50,14 +52,24 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
super.afterAll()
}
- private def downloadSpark(version: String): Unit = {
- import scala.sys.process._
+ private def tryDownloadSpark(version: String, path: String): Unit = {
+ // Try mirrors a few times until one succeeds
+ for (i <- 0 until 3) {
+ val preferredMirror =
+ Seq("wget", "https://www.apache.org/dyn/closer.lua?preferred=true", "-q", "-O", "-").!!.trim
+ val url = s"$preferredMirror/spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
+ logInfo(s"Downloading Spark $version from $url")
+ if (Seq("wget", url, "-q", "-P", path).! == 0) {
+ return
+ }
+ logWarning(s"Failed to download Spark $version from $url")
+ }
+ fail(s"Unable to download Spark $version")
--- End diff --
btw, I've also seen a mirror abruptly end a download without wget reporting
an error, resulting in an incomplete/corrupted tgz. It's possible the mirror
misreports the response byte size in that case, so a zero exit code alone
doesn't guarantee the archive is intact.
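
One way to guard against that, as a rough sketch only: after wget exits with 0, decompress the whole gzip stream once and treat any I/O error as a failed download. A truncated file normally fails with an EOFException, and a damaged one fails the CRC check in the gzip trailer. `TarballCheck` and `isCompleteGzip` are hypothetical names for illustration and are not part of this PR; the sketch only checks that the gzip stream is complete, not that the contents match the official release.

```scala
import java.io.{BufferedInputStream, File, FileInputStream, IOException}
import java.util.zip.GZIPInputStream

// Hypothetical helper, not part of the PR under review.
object TarballCheck {

  /**
   * Returns true only if `file` exists and its gzip stream can be read to the
   * end. GZIPInputStream throws an IOException subclass (EOFException or
   * ZipException) when the stream is truncated or the trailer CRC mismatches.
   */
  def isCompleteGzip(file: File): Boolean = {
    if (!file.isFile) {
      false
    } else {
      val raw = new BufferedInputStream(new FileInputStream(file))
      try {
        // The constructor reads the gzip header, so keep it inside the try.
        val gzip = new GZIPInputStream(raw)
        val buf = new Array[Byte](64 * 1024)
        // Drain the stream; the decompressed bytes themselves are discarded.
        while (gzip.read(buf) != -1) {}
        true
      } catch {
        case _: IOException => false
      } finally {
        raw.close()
      }
    }
  }
}
```

In the retry loop this could be combined with the exit-code check, e.g. only `return` when `Seq("wget", url, "-q", "-P", path).! == 0` and `TarballCheck.isCompleteGzip(new File(path, s"spark-$version-bin-hadoop2.7.tgz"))` both hold, so a silently truncated download falls through to the next attempt. A failed attempt would probably also need to delete the partial file first, since wget would otherwise save the retry under a different name.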
---