GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19571#discussion_r146783308
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -252,6 +253,13 @@ private[orc] class OrcOutputWriter(
override def close(): Unit = {
if (recordWriterInstantiated) {
recordWriter.close(Reporter.NULL)
+ } else {
+ // SPARK-15474 Write empty orc file with correct schema
+ val conf = context.getConfiguration()
+ val writer = org.apache.orc.OrcFile.createWriter(
+ new Path(path), org.apache.orc.mapred.OrcOutputFormat.buildOptions(conf))
+ new org.apache.orc.mapreduce.OrcMapreduceRecordWriter(writer)
+ writer.close()
--- End diff --
So, if I understand correctly, an empty output is written by
`org.apache.orc.mapreduce.OrcMapreduceRecordWriter`, while a non-empty output
is written by `org.apache.hadoop.hive.ql.io.orc.OrcRecordWriter`? I think we
should use the same writer for both paths if possible; this looks like a
band-aid fix. It won't block this PR, but I wonder whether this is the only
option we have for now.