dongjoon-hyun commented on a change in pull request #23387:
[SPARK-26447][SQL]Allow OrcColumnarBatchReader to return less partition columns
URL: https://github.com/apache/spark/pull/23387#discussion_r244266136
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java
##########
@@ -143,75 +147,75 @@ public void initialize(
/**
* Initialize columnar batch by setting required schema and partition information.
* With this information, this creates ColumnarBatch with the full schema.
+ *
+ * @param orcSchema Schema from ORC file reader.
+ * @param requiredFields All the fields that are required to return, including partition fields.
+ * @param requestedDataColIds Requested column ids from orcSchema. -1 if not existed.
+ * @param requestedPartitionColIds Requested column ids from partition schema. -1 if not existed.
+ * @param partitionValues Values of partition columns.
*/
public void initBatch(
TypeDescription orcSchema,
- int[] requestedColIds,
StructField[] requiredFields,
- StructType partitionSchema,
+ int[] requestedDataColIds,
+ int[] requestedPartitionColIds,
InternalRow partitionValues) {
batch = orcSchema.createRowBatch(capacity);
assert(!batch.selectedInUse); // `selectedInUse` should be initialized with `false`.
-
+ assert(requiredFields.length == requestedDataColIds.length);
+ assert(requiredFields.length == requestedPartitionColIds.length);
+ // If a required column is also partition column, use partition value and don't read from file.
+ for (int i = 0; i < requiredFields.length; i++) {
+ if (requestedPartitionColIds[i] != -1) {
+ requestedDataColIds[i] = -1;
+ }
+ }
Review comment:
Thanks.
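
For readers following the hunk above, here is a minimal standalone sketch of the masking step it adds: when a required field resolves to a partition column, its data column id is set to -1 so the value comes from the partition values rather than from the file. The class name `PartitionColumnMasking` and the helper `maskDataColIds` are hypothetical and not part of the PR. Unlike the loop in the diff, which updates `requestedDataColIds` in place, this sketch returns a copy to keep the example side-effect free.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from OrcColumnarBatchReader.
public class PartitionColumnMasking {

  // Returns a copy of requestedDataColIds in which every position that also
  // maps to a partition column (partition id != -1) is marked -1, meaning
  // "do not read this column from the file".
  static int[] maskDataColIds(int[] requestedDataColIds, int[] requestedPartitionColIds) {
    if (requestedDataColIds.length != requestedPartitionColIds.length) {
      throw new IllegalArgumentException("Both id arrays must have one entry per required field");
    }
    int[] masked = Arrays.copyOf(requestedDataColIds, requestedDataColIds.length);
    for (int i = 0; i < masked.length; i++) {
      if (requestedPartitionColIds[i] != -1) {
        masked[i] = -1; // partition value wins over the file column
      }
    }
    return masked;
  }

  public static void main(String[] args) {
    // Three required fields; the second one is also partition column 0.
    int[] dataIds = {0, 1, 2};
    int[] partitionIds = {-1, 0, -1};
    System.out.println(Arrays.toString(maskDataColIds(dataIds, partitionIds)));
    // prints [0, -1, 2]
  }
}
```

With this convention, each position in the required fields has at most one non-negative id across the two arrays after masking, so a reader can pick the source for every output column without further checks.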