GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/21232
[SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite
should verify the downloaded file
## What changes were proposed in this pull request?
This is a backport of #21210 because `branch-2.2` also faces the same
failures.
Although [SPARK-22654](https://issues.apache.org/jira/browse/SPARK-22654)
made `HiveExternalCatalogVersionsSuite` try the download from Apache mirrors
three times, the suite has remained flaky because it didn't verify the
downloaded file. Some Apache mirrors terminate the download abnormally, and the
resulting *corrupted* file produces the following errors.
```
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
22:46:32.700 WARN
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite:
===== POSSIBLE THREAD LEAK IN SUITE
o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer
=====
*** RUN ABORTED ***
java.io.IOException: Cannot run program "./bin/spark-submit" (in
directory "/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory
```
Jenkins has reported this failure inconsistently, in two ways. For example, the
case above is reported as Case 2, `no failures`.
- Case 1. [Test Result (1 failure / +1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/)
- Case 2. [Test Result (no failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/)
This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by
verifying the downloaded `tgz` file: it extracts the archive and checks that
`bin/spark-submit` exists. If the file turns out to be empty or corrupted,
`HiveExternalCatalogVersionsSuite` retries, just as it does for a download
failure.
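The verify-then-retry idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Scala code in this patch. The names `is_valid_spark_tgz` and `fetch_with_retries`, the `download` callback, the `top_dir` parameter, and the default of three attempts are all assumptions made for the example.

```python
import os
import shutil
import tarfile


def is_valid_spark_tgz(tgz_path, target_dir, top_dir):
    """Extract the archive and report whether bin/spark-submit exists.

    top_dir is the (assumed) name of the archive's top-level directory.
    """
    try:
        with tarfile.open(tgz_path, "r:gz") as archive:
            archive.extractall(target_dir)
    except (tarfile.TarError, OSError, EOFError):
        # Empty, truncated, or non-gzip files end up here.
        return False
    submit = os.path.join(target_dir, top_dir, "bin", "spark-submit")
    return os.path.isfile(submit)


def fetch_with_retries(download, tgz_path, target_dir, top_dir, attempts=3):
    """Call download(tgz_path) until the file verifies or attempts run out.

    download is a hypothetical callback that returns True when the
    transfer itself reported success.
    """
    for _ in range(attempts):
        if download(tgz_path) and is_valid_spark_tgz(tgz_path, target_dir, top_dir):
            return True
        # Treat a corrupted file like a failed download: clean up and retry.
        shutil.rmtree(os.path.join(target_dir, top_dir), ignore_errors=True)
        if os.path.exists(tgz_path):
            os.remove(tgz_path)
    return False
```

The point of the sketch is that a transfer that "succeeds" but yields an unusable archive takes the same retry path as a failed transfer, so a broken mirror no longer aborts the run later at `./bin/spark-submit`.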
## How was this patch tested?
Pass the Jenkins.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-23489-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21232.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21232
----
commit 9408c0dad840f4eb7948a1c03eb66017e5a676fe
Author: Dongjoon Hyun <dongjoon@...>
Date: 2018-05-03T19:37:44Z
[SPARK-23489][SQL][TEST][BRANCH-2.2] HiveExternalCatalogVersionsSuite
should verify the downloaded file
----