GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19571#discussion_r146959158
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -252,6 +253,13 @@ private[orc] class OrcOutputWriter(
override def close(): Unit = {
if (recordWriterInstantiated) {
recordWriter.close(Reporter.NULL)
+ } else {
+ // SPARK-15474 Write empty orc file with correct schema
+ val conf = context.getConfiguration()
+ val writer = org.apache.orc.OrcFile.createWriter(
+ new Path(path), org.apache.orc.mapred.OrcOutputFormat.buildOptions(conf))
+ new org.apache.orc.mapreduce.OrcMapreduceRecordWriter(writer)
+ writer.close()
--- End diff --
Yep, that's the correct understanding. This PR intentionally focuses only on
handling empty files and inferring the schema. This will help us transition
safely from the old Hive ORC to the new Apache ORC 1.4.1.
---