gatorsmile commented on a change in pull request #23943: [SPARK-27034][SQL] Nested schema pruning for ORC
URL: https://github.com/apache/spark/pull/23943#discussion_r358063793
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1541,8 +1541,8 @@ object SQLConf {
.internal()
.doc("Prune nested fields from a logical relation's output which are unnecessary in " +
"satisfying a query. This optimization allows columnar file format readers to avoid " +
- "reading unnecessary nested column data. Currently Parquet is the only data source that " +
- "implements this optimization.")
+ "reading unnecessary nested column data. Currently Parquet and ORC v1 are the " +
+ "data sources that implement this optimization.")
.booleanConf
.createWithDefault(false)
Review comment:
@dbtsai @dongjoon-hyun We turned this flag on by default for the upcoming 3.0
release because Apple has been running it in production for the last few
months. Does that statement also cover ORC nested schema pruning?