Repository: spark
Updated Branches:
  refs/heads/master 704af4bd6 -> 17cdabb88

[SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

## What changes were proposed in this pull request?

Up to and including 2.2.1, Spark raises `NullPointerException` on zero-size ORC files. These zero-size ORC files are usually generated by 3rd-party apps like Flume.

```scala
scala> sql("create table empty_orc(a int) stored as orc location '/tmp/empty_orc'")

$ touch /tmp/empty_orc/zero.orc

scala> sql("select * from empty_orc").show
java.lang.RuntimeException: serious problem
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
...
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
```

After [SPARK-22279](https://github.com/apache/spark/pull/19499), Apache Spark with the default configuration doesn't have this bug. Although the Hive 1.2.1 library code path still has the problem, we should add test coverage for the current behavior to prevent future regressions.

## How was this patch tested?

Pass a newly added test case.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #19948 from dongjoon-hyun/SPARK-19809-EMPTY-FILE.

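For readers on an affected code path, the sketch below shows one way to spot zero-size files under a table location before querying it. It is not part of this patch: `zeroSizeFiles` is a hypothetical helper name, and it assumes the location is a directory on the local filesystem (a location on HDFS would need the Hadoop `FileSystem` API instead).

```scala
import java.io.File

// Hypothetical helper, not part of the patch: returns the zero-byte
// regular files directly under `dir`, sorted by file name.
// Assumes `dir` is a path on the local filesystem.
def zeroSizeFiles(dir: File): Seq[File] = {
  // listFiles() returns null when `dir` does not exist or is not a directory.
  val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
  children.filter(f => f.isFile && f.length() == 0L).sortBy(_.getName).toSeq
}
```

With the reproduction above, `zeroSizeFiles(new File("/tmp/empty_orc"))` would be expected to include `zero.orc`. The helper does not recurse into subdirectories, so partitioned tables would need a recursive variant.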
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17cdabb8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17cdabb8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17cdabb8

Branch: refs/heads/master
Commit: 17cdabb88761e67ca555299109f89afdf02a4280
Parents: 704af4b
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Wed Dec 13 07:42:24 2017 +0900
Committer: hyukjinkwon <gurwls...@gmail.com>
Committed: Wed Dec 13 07:42:24 2017 +0900

----------------------------------------------------------------------
 .../spark/sql/hive/execution/SQLQuerySuite.scala | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/17cdabb8/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index f2562c3..93c91d3 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -2172,4 +2172,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-19809 NullPointerException on zero-size ORC file") {
+    Seq("native", "hive").foreach { orcImpl =>
+      withSQLConf(SQLConf.ORC_IMPLEMENTATION.key -> orcImpl) {
+        withTempPath { dir =>
+          withTable("spark_19809") {
+            sql(s"CREATE TABLE spark_19809(a int) STORED AS ORC LOCATION '$dir'")
+            Files.touch(new File(s"${dir.getCanonicalPath}", "zero.orc"))
+
+            withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { // default since 2.3.0
+              checkAnswer(sql("SELECT * FROM spark_19809"), Seq.empty)
+            }
+          }
+        }
+      }
+    }
+  }
 }