LuciferYang commented on a change in pull request #33748:
URL: https://github.com/apache/spark/pull/33748#discussion_r690882533
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
##########
@@ -60,6 +60,7 @@ case class OrcPartitionReaderFactory(
private val capacity = sqlConf.orcVectorizedReaderBatchSize
private val orcFilterPushDown = sqlConf.orcFilterPushDown
private val ignoreCorruptFiles = sqlConf.ignoreCorruptFiles
+ private val metaCacheEnabled = sqlConf.fileMetaCacheEnabled
Review comment:
> BTW, @viirya 's suggestion about the config is a list config like
spark.sql.sources.useV1SourceList.
@dongjoon-hyun @viirya
If a `useFileMetaCacheList` config is used, then without changing the V2 API it
seems the check can only be hard coded as
```
val metaCacheEnabled = useFileMetaCacheList.toLowerCase(Locale.ROOT)
  .split(",").map(_.trim).contains("orc")
```
The format name (`"ORC"`) is defined in `OrcTable`; we can't get it through the
API here at present.
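For illustration, a minimal Scala sketch of how the check could be generalized
if the V2 API exposed each format's short name; the `isMetaCacheEnabledFor`
helper and its `shortName` parameter are hypothetical, not existing Spark API:
```
import java.util.Locale

// Hypothetical helper: parses a comma-separated list config (e.g. "parquet,orc")
// and checks whether the given format is listed. Matching is case-insensitive.
def isMetaCacheEnabledFor(useFileMetaCacheList: String, shortName: String): Boolean = {
  useFileMetaCacheList.toLowerCase(Locale.ROOT)
    .split(",").map(_.trim)
    .contains(shortName.toLowerCase(Locale.ROOT))
}

// e.g. isMetaCacheEnabledFor("parquet,orc", "ORC") returns true
```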
On the other hand, unlike `spark.sql.sources.useV1SourceList`, this config will
not have corresponding implementations for all built-in data formats; if a new
data format is not considered, it may only work for `Parquet` and `Orc`. So are
we sure we need a list config?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]