Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19943#discussion_r158768353
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
    @@ -170,6 +171,8 @@ case class FileSourceScanExec(
     
       val needsUnsafeRowConversion: Boolean = if 
(relation.fileFormat.isInstanceOf[ParquetSource]) {
         
SparkSession.getActiveSession.get.sessionState.conf.parquetVectorizedReaderEnabled
    +  } else if (relation.fileFormat.isInstanceOf[OrcFileFormat]) {
    +    
SparkSession.getActiveSession.get.sessionState.conf.orcVectorizedReaderEnabled
    --- End diff ---
    
    Different from Parquet: for now we only enable the vectorized ORC reader 
when batch output is supported, so ORC never needs unsafe row conversion. 
When batch is supported, we go through the batch-based path; when it isn't, 
the vectorized ORC reader isn't enabled at all, so no conversion is needed 
either way.
    
    Once we can enable the vectorized ORC reader even when batch is not 
supported, we will need to add this.
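
    The decision described above can be sketched in a minimal, self-contained 
way. This is a hypothetical simplification, not Spark's actual code: 
`FileFormat`, `Conf`, and the `supportsBatch` parameter here are stand-ins for 
`relation.fileFormat`, the session conf, and the plan's batch-support check. 
The ORC branch models the *future* behavior the comment mentions (vectorized 
ORC without batch output); under today's behavior that branch never fires.

```scala
// Hypothetical, simplified stand-ins for Spark's internals.
sealed trait FileFormat
case object ParquetSource extends FileFormat
case object OrcFileFormat extends FileFormat
case object OtherFormat   extends FileFormat

final case class Conf(
    parquetVectorizedReaderEnabled: Boolean,
    orcVectorizedReaderEnabled: Boolean)

// Unsafe row conversion is only needed when a vectorized reader emits rows
// outside the batch path. Parquet needs it whenever its vectorized reader is
// on (matching the diff above). For ORC, the vectorized reader is currently
// enabled only together with batch output, so the `!supportsBatch` case is
// the future scenario the comment describes.
def needsUnsafeRowConversion(
    format: FileFormat,
    conf: Conf,
    supportsBatch: Boolean): Boolean = format match {
  case ParquetSource => conf.parquetVectorizedReaderEnabled
  case OrcFileFormat => conf.orcVectorizedReaderEnabled && !supportsBatch
  case _             => false
}
```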


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
