cloud-fan commented on a change in pull request #23387: [SPARK-26447][SQL]Allow
OrcColumnarBatchReader to return less partition columns
URL: https://github.com/apache/spark/pull/23387#discussion_r244036202
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java
##########
@@ -58,10 +58,16 @@
/**
* The column IDs of the physical ORC file schema which are required by this
reader.
- * -1 means this required column doesn't exist in the ORC file.
+ * -1 means this required column is partition column, or it doesn't exist in
the ORC file.
Review comment:
I think we need more comments here.
Ideally, partition columns should never appear in the physical file and
should only appear in the directory name. However, Spark tolerates partition
columns inside the physical file: it discards the values read from the file
and uses the partition value taken from the directory name.
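
To make the rule concrete, here is a minimal sketch of how the column IDs could be resolved. It is not the actual OrcColumnarBatchReader code: the class name, the method `resolveColumnIds`, and its parameters are hypothetical, and it assumes exact, case-sensitive name matching for simplicity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and signatures are assumptions.
class PartitionColumnIdsSketch {

    /**
     * Maps each required column to its index in the physical file schema.
     * Returns -1 for a partition column (even if the file also contains it)
     * and for a column that does not exist in the file.
     */
    static int[] resolveColumnIds(
            List<String> fileFields,
            List<String> requiredFields,
            Set<String> partitionColumns) {
        int[] ids = new int[requiredFields.size()];
        for (int i = 0; i < ids.length; i++) {
            String name = requiredFields.get(i);
            if (partitionColumns.contains(name)) {
                // The value comes from the directory name, so any copy
                // stored in the file is ignored.
                ids[i] = -1;
            } else {
                // indexOf returns -1 when the column is missing from the file.
                ids[i] = fileFields.indexOf(name);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // "dt" is a partition column that also happens to be in the file.
        List<String> fileFields = Arrays.asList("id", "dt", "name");
        List<String> required = Arrays.asList("id", "dt", "missing");
        Set<String> partitions = Collections.singleton("dt");
        // Prints [0, -1, -1]
        System.out.println(
            Arrays.toString(resolveColumnIds(fileFields, required, partitions)));
    }
}
```

In this sketch, "dt" maps to -1 even though the file contains it at index 1, which is the case the comment above should call out.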