[GitHub] spark pull request #19571: [SPARK-15474][SQL] Write and read back non-empty ...

HyukjinKwon Wed, 25 Oct 2017 01:37:38 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19571#discussion_r146783308
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -252,6 +253,13 @@ private[orc] class OrcOutputWriter(
       override def close(): Unit = {
         if (recordWriterInstantiated) {
           recordWriter.close(Reporter.NULL)
    +    } else {
    +      // SPARK-15474 Write empty orc file with correct schema
    +      val conf = context.getConfiguration()
    +      val writer = org.apache.orc.OrcFile.createWriter(
    +        new Path(path), org.apache.orc.mapred.OrcOutputFormat.buildOptions(conf))
    +      new org.apache.orc.mapreduce.OrcMapreduceRecordWriter(writer)
    +      writer.close()
    --- End diff --
    
    So, if I understood correctly, empty output is written out via
    `org.apache.orc.mapreduce.OrcMapreduceRecordWriter`, but non-empty output
    is written out via `org.apache.hadoop.hive.ql.io.orc.OrcRecordWriter`?
    I thought we should use the same writer for both paths if possible, and
    this one looks rather like a band-aid fix. It won't block this PR, but I
    wonder if this is the only option we have for now.
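    One way to read the "same writer for both paths" suggestion is to create the
    writer lazily on the first row, and have `close()` force its creation when no
    row was ever written. The following is a minimal plain-Scala sketch of that
    shape only. `FakeFileWriter` and `SingleWriterOutput` are hypothetical classes
    standing in for the ORC record writer and Spark's `OrcOutputWriter`; they are
    not Spark's or ORC's API, and this is not what the PR implements.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a format-specific file writer (e.g. an ORC writer).
// It only records the schema it was opened with and the rows it receives.
final class FakeFileWriter(val path: String, val schema: Seq[String]) {
  val rows: ArrayBuffer[Seq[Any]] = ArrayBuffer.empty[Seq[Any]]
  var closed: Boolean = false

  def addRow(row: Seq[Any]): Unit = {
    require(!closed, "writer already closed")
    rows += row
  }

  def close(): Unit = {
    closed = true
  }
}

// Hypothetical output writer that uses ONE code path for empty and non-empty
// output: the underlying writer is created on the first row, and close() creates
// it if write() was never called, so an empty task still gets a file with the schema.
final class SingleWriterOutput(path: String, schema: Seq[String]) {
  private var underlying: Option[FakeFileWriter] = None

  // Returns the existing writer, or instantiates it (same constructor either way).
  private def writer: FakeFileWriter = underlying.getOrElse {
    val w = new FakeFileWriter(path, schema)
    underlying = Some(w)
    w
  }

  def write(row: Seq[Any]): Unit = writer.addRow(row)

  // Closes the (possibly just-created) writer and returns it for inspection.
  def close(): FakeFileWriter = {
    val w = writer
    w.close()
    w
  }
}
```

    With this shape, a task that receives no rows still goes through the same
    writer construction, so the file carries the schema. The trade-off is that
    every task produces an output file even when it has nothing to write.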
