huaxingao commented on code in PR #52522:
URL: https://github.com/apache/spark/pull/52522#discussion_r2417348154
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PushVariantIntoScan.scala:
##########
@@ -279,6 +280,8 @@ object PushVariantIntoScan extends Rule[LogicalPlan] {
        relation @ LogicalRelationWithTable(
          hadoopFsRelation@HadoopFsRelation(_, _, _, _, _: ParquetFileFormat, _),
          _)) =>
        rewritePlan(p, projectList, filters, relation, hadoopFsRelation)
+      case p@PhysicalOperation(projectList, filters, relation: DataSourceV2Relation) =>
##########
Review Comment:
Sorry for the confusion. I have updated the code.
The logic for transforming variant columns to structs is identical between
DSv1 and DSv2, and both now go through the same helper methods
(`collectAndRewriteVariants`, `buildAttributeMap`, `buildFilterAndProject`).
The only difference is how the transformed schema is communicated to the
data source: DSv1 stores the new schema in `HadoopFsRelation.dataSchema`,
which the file source reads directly, while DSv2 has no schema field to
update, so the schema is communicated later, when `V2ScanRelationPushDown`
calls `pruneColumns`.
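
To make the DSv2 path concrete, below is the general shape of the hook that ends up receiving the rewritten schema. This is a generic sketch of the connector API, not code from this PR; `ExampleScanBuilder` and `ExampleScan` are made-up names:

```scala
// Generic sketch of the DSv2 column-pruning hook (not from this PR).
// V2ScanRelationPushDown hands the required schema (with variant columns
// already rewritten to structs by this rule) to pruneColumns.
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

class ExampleScanBuilder(fullSchema: StructType) extends ScanBuilder
    with SupportsPushDownRequiredColumns {

  private var requiredSchema: StructType = fullSchema

  // Invoked by V2ScanRelationPushDown during column pruning.
  override def pruneColumns(requiredSchema: StructType): Unit = {
    this.requiredSchema = requiredSchema
  }

  // The scan only needs to expose the pruned schema it was given.
  override def build(): Scan = new ExampleScan(requiredSchema)
}

class ExampleScan(schema: StructType) extends Scan {
  override def readSchema(): StructType = schema
}
```

So on the DSv2 side there is nothing for `PushVariantIntoScan` to write into the relation itself; the rewritten struct schema simply flows through the normal pruning call above.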