Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19943#discussion_r156241635
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -139,15 +146,25 @@ class OrcFileFormat
      }
    }
+    val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
+    val enableVectorizedReader = sparkSession.sessionState.conf.orcVectorizedReaderEnabled &&
+      supportBatch(sparkSession, resultSchema)
+
    val broadcastedConf =
      sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))
    val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
    (file: PartitionedFile) => {
      val conf = broadcastedConf.value.value
+      val filePath = new Path(new URI(file.filePath))
+
+      val fs = filePath.getFileSystem(conf)
+      val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
+      val reader = OrcFile.createReader(filePath, readerOptions)
--- End diff ---
Why extract the creation of `reader` from `requestedColumnIds` to here?
---
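[Editor's note] For readers who want to see the pattern in isolation, below is a minimal, self-contained sketch of the Apache ORC footer-reader creation used in the added lines above. The object and method names (OrcReaderSketch, openReader) are illustrative only and are not part of the Spark change.

    import java.net.URI

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.orc.{OrcFile, Reader}

    object OrcReaderSketch {
      // Opens an ORC reader for the given file path string, mirroring the diff above:
      // resolve the path, obtain its FileSystem, and pass both to OrcFile.createReader.
      def openReader(filePathString: String, conf: Configuration): Reader = {
        val filePath = new Path(new URI(filePathString))
        val fs = filePath.getFileSystem(conf)
        // ReaderOptions carries the Hadoop configuration; setting the FileSystem
        // explicitly avoids a second filesystem lookup inside the ORC library.
        val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
        OrcFile.createReader(filePath, readerOptions)
      }
    }

The resulting Reader exposes the file's schema from the ORC footer (via getSchema), so a single instance can serve both column-id resolution and record-reader construction; whether that sharing was the motivation for the move is what the comment above asks.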