LuciferYang commented on a change in pull request #33748:
URL: https://github.com/apache/spark/pull/33748#discussion_r690882533
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
##########
@@ -60,6 +60,7 @@ case class OrcPartitionReaderFactory(
private val capacity = sqlConf.orcVectorizedReaderBatchSize
private val orcFilterPushDown = sqlConf.orcFilterPushDown
private val ignoreCorruptFiles = sqlConf.ignoreCorruptFiles
+ private val metaCacheEnabled = sqlConf.fileMetaCacheEnabled
Review comment:
> BTW, @viirya 's suggestion about the config is a list config like
spark.sql.sources.useV1SourceList.
@dongjoon-hyun @viirya
If a `useFileMetaCacheList` config is used, then without changing the V2 API it
seems the check can only be hard coded as
```
val metaCacheEnabled = useFileMetaCacheList.toLowerCase(Locale.ROOT)
  .split(",").map(_.trim).contains("orc")
```
The format name (`"ORC"`) is defined in `OrcTable`; we can't get it through the
API here at present.
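For illustration, a minimal Scala sketch of how the check could be generalized
if the V2 API exposed each format's short name; the `isMetaCacheEnabledFor`
helper and its `shortName` parameter are hypothetical, not existing Spark API:
```
import java.util.Locale

// Hypothetical helper: parses a comma-separated list config (e.g. "parquet,orc")
// and checks whether the given format is listed. Matching is case-insensitive.
def isMetaCacheEnabledFor(useFileMetaCacheList: String, shortName: String): Boolean = {
  useFileMetaCacheList.toLowerCase(Locale.ROOT)
    .split(",").map(_.trim)
    .contains(shortName.toLowerCase(Locale.ROOT))
}

// e.g. isMetaCacheEnabledFor("parquet,orc", "ORC") returns true
```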
On the other hand, unlike `spark.sql.sources.useV1SourceList`, this config will
not have corresponding implementations for all built-in data formats; if a new
data format is not considered, it may only work for `Parquet` and `Orc`. So are
we sure we need a list config?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]