[GitHub] spark pull request #19571: [SPARK-15474][SQL] Write and read back non-emtpy ...

dongjoon-hyun Wed, 25 Oct 2017 12:15:29 -0700

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19571#discussion_r146959158
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
    @@ -252,6 +253,13 @@ private[orc] class OrcOutputWriter(
       override def close(): Unit = {
         if (recordWriterInstantiated) {
           recordWriter.close(Reporter.NULL)
    +    } else {
    +      // SPARK-15474 Write empty orc file with correct schema
    +      val conf = context.getConfiguration()
    +      val writer = org.apache.orc.OrcFile.createWriter(
    +        new Path(path), org.apache.orc.mapred.OrcOutputFormat.buildOptions(conf))
    +      new org.apache.orc.mapreduce.OrcMapreduceRecordWriter(writer)
    +      writer.close()
    --- End diff --
    
    Yep. That's the correct understanding. This PR intentionally focuses only on handling empty files and inferring the schema. This will help us transition safely from the old Hive ORC to the new Apache ORC 1.4.1.


