spark git commit: [SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

gurwls223 Tue, 12 Dec 2017 14:42:56 -0800

Repository: spark
Updated Branches:
  refs/heads/master 704af4bd6 -> 17cdabb88



[SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file

## What changes were proposed in this pull request?

Through 2.2.1, Spark raises a `NullPointerException` when reading zero-size ORC files.
Such files are usually generated by third-party applications such as Flume.

```scala
scala> sql("create table empty_orc(a int) stored as orc location '/tmp/empty_orc'")

$ touch /tmp/empty_orc/zero.orc

scala> sql("select * from empty_orc").show
java.lang.RuntimeException: serious problem
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
...
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
```
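
To check whether a table location contains files of this kind, a minimal sketch along the following lines may help. It uses only `java.io.File`; the object `ZeroSizeFiles` and the helper `findZeroSizeFiles` are hypothetical names introduced for illustration and are not part of Spark or of this patch.

```scala
import java.io.File

// Hypothetical helper, not part of Spark: lists zero-size regular files
// directly under a directory (for example, a table location).
object ZeroSizeFiles {
  def findZeroSizeFiles(dir: File): List[File] = {
    // listFiles() returns null when `dir` does not exist or is not a directory.
    Option(dir.listFiles())
      .map(_.toList)
      .getOrElse(Nil)
      .filter(f => f.isFile && f.length() == 0L)
      .sortBy(_.getName)
  }
}
```

For the reproduction above, this could be called as `ZeroSizeFiles.findZeroSizeFiles(new File("/tmp/empty_orc"))`. It only inspects the local file system and does not recurse into partition subdirectories.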

After [SPARK-22279](https://github.com/apache/spark/pull/19499), Apache Spark
with the default configuration no longer has this bug. Although the Hive 1.2.1
library code path still has the problem, we should add test coverage for the
current behavior to prevent future regressions.

## How was this patch tested?

Passes the newly added test case.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #19948 from dongjoon-hyun/SPARK-19809-EMPTY-FILE.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17cdabb8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17cdabb8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17cdabb8

Branch: refs/heads/master
Commit: 17cdabb88761e67ca555299109f89afdf02a4280
Parents: 704af4b
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Wed Dec 13 07:42:24 2017 +0900
Committer: hyukjinkwon <gurwls...@gmail.com>
Committed: Wed Dec 13 07:42:24 2017 +0900

----------------------------------------------------------------------
 .../spark/sql/hive/execution/SQLQuerySuite.scala   | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/17cdabb8/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index f2562c3..93c91d3 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -2172,4 +2172,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       }
     }
   }
+
+  test("SPARK-19809 NullPointerException on zero-size ORC file") {
+    Seq("native", "hive").foreach { orcImpl =>
+      withSQLConf(SQLConf.ORC_IMPLEMENTATION.key -> orcImpl) {
+        withTempPath { dir =>
+          withTable("spark_19809") {
+            sql(s"CREATE TABLE spark_19809(a int) STORED AS ORC LOCATION '$dir'")
+            Files.touch(new File(s"${dir.getCanonicalPath}", "zero.orc"))
+
+            withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { // default since 2.3.0
+              checkAnswer(sql("SELECT * FROM spark_19809"), Seq.empty)
+            }
+          }
+        }
+      }
+    }
+  }
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
