GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19571
[SPARK-15474][SQL] Write and read back non-empty schema with empty dataframe

## What changes were proposed in this pull request?

Previously, the ORC file format could not write a correct schema for an empty dataframe. Instead, it created an empty ORC file with an **empty** schema, `struct<>`, so Spark users could not write and read back ORC files that have a non-empty schema but no rows. This PR uses the new Apache ORC 1.4.1 to create an empty ORC file with the correct schema. It also uses ORC 1.4.1 to infer the schema in all cases.

**BEFORE**

```scala
scala> val emptyDf = Seq((true, 1, "str")).toDF("a", "b", "c").limit(0)
scala> emptyDf.write.format("orc").mode("overwrite").save("/tmp/empty")
scala> spark.read.format("orc").load("/tmp/empty").printSchema
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
```

**AFTER**

```scala
scala> val emptyDf = Seq((true, 1, "str")).toDF("a", "b", "c").limit(0)
scala> emptyDf.write.format("orc").mode("overwrite").save("/tmp/empty")
scala> spark.read.format("orc").load("/tmp/empty").printSchema
root
 |-- a: boolean (nullable = true)
 |-- b: integer (nullable = true)
 |-- c: string (nullable = true)
```

## How was this patch tested?

Pass the Jenkins with newly added test cases.
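The fixed behavior can be checked with a small schema round-trip in the spirit of the test cases mentioned above. A minimal sketch, assuming a spark-shell session (`spark`) running with this patch; the output path and column names are illustrative, not part of the PR:

```scala
// Hedged sketch: requires an active spark-shell / SparkSession named `spark`
// with this patch applied. Path and column names are hypothetical.
import spark.implicits._

val df = Seq((true, 1, "str")).toDF("a", "b", "c").limit(0)
df.write.format("orc").mode("overwrite").save("/tmp/empty_orc")

val readBack = spark.read.format("orc").load("/tmp/empty_orc")
// With the fix, the non-empty schema survives even though no rows were written.
assert(readBack.schema == df.schema)
assert(readBack.count() == 0)
```

Before this patch, the `spark.read` call above would instead fail with `AnalysisException: Unable to infer schema for ORC`, because the written file carried the empty schema `struct<>`.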
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15474

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19571.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19571

----

commit be7ba9b5a9c70519a7fa1b0497955fbba763e2e6
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2017-10-25T02:50:33Z

    [SPARK-15474][SQL] Write and read back non-emtpy schema with empty dataframe