Repository: spark
Updated Branches:
  refs/heads/branch-2.2 154bbc959 -> 768d0b7ce


[SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite should 
verify the downloaded file

## What changes were proposed in this pull request?

This is a backport of #21210 because `branch-2.2` also faces the same failures.

Although [SPARK-22654](https://issues.apache.org/jira/browse/SPARK-22654) made 
`HiveExternalCatalogVersionsSuite` download from Apache mirrors up to three 
times, the suite has remained flaky because it never verified the downloaded 
file. Some Apache mirrors terminate the download abnormally, and the resulting 
*corrupted* file produces the following errors.

```
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
22:46:32.700 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite:

===== POSSIBLE THREAD LEAK IN SUITE 
o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer 
=====

*** RUN ABORTED ***
  java.io.IOException: Cannot run program "./bin/spark-submit" (in directory 
"/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory
```
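The `gzip`/`tar` errors above are easy to reproduce locally. A minimal sketch (file and directory names are illustrative; GNU tar assumed) that feeds a non-gzip file to `tar -xz` and observes the non-zero exit status:

```shell
# Create a "tgz" that is actually plain text, as a truncated or corrupted
# mirror download would be (file names here are illustrative).
workdir=$(mktemp -d)
printf 'not a gzip stream' > "$workdir/corrupt.tgz"

# GNU tar hands the stream to gzip, which rejects it ("gzip: stdin: not
# in gzip format"), and tar itself exits non-zero ("Child returned status 1").
status=0
tar -xzf "$workdir/corrupt.tgz" -C "$workdir" || status=$?
echo "tar exit status: $status"
```

This non-zero exit status is what the fix below checks; the subtler case is an *empty* file, for which `tar` exits 0, so an extra check is still needed.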

This failure has surfaced in Jenkins in two different ways. The case above, 
for example, is reported as Case 2, `no failures`.

- Case 1. [Test Result (1 failure / 
+1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/)
- Case 2. [Test Result (no 
failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/)

This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by 
verifying the downloaded `tgz` file: it extracts the archive and checks that 
`bin/spark-submit` exists. If the file turns out to be empty or corrupted, 
`HiveExternalCatalogVersionsSuite` retries, using the same logic as for a 
download failure.
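The extract-and-verify step can be sketched outside Scala as well. A minimal shell version of the same idea (the function name, arguments, and file names are illustrative, not Spark's; GNU tar assumed):

```shell
# Extract a downloaded tarball and confirm a known file really came out.
# `tar` exits non-zero for a corrupted archive, but exits 0 for an empty
# file, so the marker-file check is what catches truncated downloads.
verify_extract() {
  tarball=$1; target=$2; marker=$3
  mkdir -p "$target"
  if tar -xzf "$tarball" -C "$target" --strip-components=1 2>/dev/null &&
     [ -f "$target/$marker" ]; then
    return 0
  fi
  rm -rf "$target"   # discard the bad extraction so the caller can retry
  return 1
}
```

On success the caller proceeds; on failure it deletes the tarball and retries the download, which mirrors what the patched suite does inside its three-attempt loop.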

## How was this patch tested?

Passed the Jenkins tests.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #21232 from dongjoon-hyun/SPARK-23489-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/768d0b7c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/768d0b7c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/768d0b7c

Branch: refs/heads/branch-2.2
Commit: 768d0b7cecb8716d517279df2228a404681e4f95
Parents: 154bbc9
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Thu May 3 17:10:15 2018 -0700
Committer: gatorsmile <gatorsm...@gmail.com>
Committed: Thu May 3 17:10:15 2018 -0700

----------------------------------------------------------------------
 .../hive/HiveExternalCatalogVersionsSuite.scala | 35 ++++++++++----------
 1 file changed, 18 insertions(+), 17 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/768d0b7c/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index a3d5b94..2b37047 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -57,30 +57,31 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     for (i <- 0 until 3) {
       val preferredMirror =
         Seq("wget", "https://www.apache.org/dyn/closer.lua?preferred=true", "-q", "-O", "-").!!.trim
-      val url = s"$preferredMirror/spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
+      val filename = s"spark-$version-bin-hadoop2.7.tgz"
+      val url = s"$preferredMirror/spark/spark-$version/$filename"
       logInfo(s"Downloading Spark $version from $url")
       if (Seq("wget", url, "-q", "-P", path).! == 0) {
-        return
+        val downloaded = new File(sparkTestingDir, filename).getCanonicalPath
+        val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
+
+        Seq("mkdir", targetDir).!
+        val exitCode = Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
+        Seq("rm", downloaded).!
+
+        // For a corrupted file, `tar` returns non-zero values. However, we also need to check
+        // the extracted file because `tar` returns 0 for empty file.
+        val sparkSubmit = new File(sparkTestingDir, s"spark-$version/bin/spark-submit")
+        if (exitCode == 0 && sparkSubmit.exists()) {
+          return
+        } else {
+          Seq("rm", "-rf", targetDir).!
+        }
       }
       logWarning(s"Failed to download Spark $version from $url")
     }
     fail(s"Unable to download Spark $version")
   }
 
-
-  private def downloadSpark(version: String): Unit = {
-    tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
-
-    val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
-    val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
-
-    Seq("mkdir", targetDir).!
-
-    Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
-
-    Seq("rm", downloaded).!
-  }
-
   private def genDataDir(name: String): String = {
     new File(tmpDataDir, name).getCanonicalPath
   }
@@ -125,7 +126,7 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     PROCESS_TABLES.testingVersions.zipWithIndex.foreach { case (version, index) =>
       val sparkHome = new File(sparkTestingDir, s"spark-$version")
       if (!sparkHome.exists()) {
-        downloadSpark(version)
+        tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
       }
 
       val args = Seq(

