huaxingao commented on code in PR #52522:
URL: https://github.com/apache/spark/pull/52522#discussion_r2417348154


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PushVariantIntoScan.scala:
##########
@@ -279,6 +280,8 @@ object PushVariantIntoScan extends Rule[LogicalPlan] {
       relation @ LogicalRelationWithTable(
      hadoopFsRelation@HadoopFsRelation(_, _, _, _, _: ParquetFileFormat, _), _)) =>
         rewritePlan(p, projectList, filters, relation, hadoopFsRelation)
+      case p@PhysicalOperation(projectList, filters, relation: DataSourceV2Relation) =>

Review Comment:
   Sorry for the confusion. I have updated the code.
   
   The logic for transforming variant columns into structs is identical between
DSv1 and DSv2. Both paths now use the same helper methods
(`collectAndRewriteVariants`, `buildAttributeMap`, `buildFilterAndProject`).
   
   The only difference is how the transformed schema is communicated to the
data source. DSv1 stores the new schema in `HadoopFsRelation.dataSchema`, and
the file source reads that field directly; DSv2 has no such schema field to
update. Instead, the schema is communicated later, when
`V2ScanRelationPushDown` calls `pruneColumns`.
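   As a rough sketch of that DSv2 hand-off (hypothetical `ExampleScanBuilder` /
`ExampleScan` names below, not code from this PR): a source whose `ScanBuilder`
implements `SupportsPushDownRequiredColumns` receives the required schema,
including any variant-to-struct rewrite, when `V2ScanRelationPushDown` calls
`pruneColumns`.

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Hypothetical DSv2 scan, for illustration only: it reports whatever schema the
// optimizer negotiated.
class ExampleScan(schema: StructType) extends Scan {
  override def readSchema(): StructType = schema
}

// Hypothetical ScanBuilder: V2ScanRelationPushDown hands it the required schema
// via pruneColumns, which is where a DSv2 source learns about the rewritten
// columns, rather than reading a mutable field like HadoopFsRelation.dataSchema.
class ExampleScanBuilder(fullSchema: StructType) extends ScanBuilder
    with SupportsPushDownRequiredColumns {

  // Starts as the source's full schema; replaced when pruneColumns is called.
  private var requiredSchema: StructType = fullSchema

  override def pruneColumns(requiredSchema: StructType): Unit = {
    this.requiredSchema = requiredSchema
  }

  override def build(): Scan = new ExampleScan(requiredSchema)
}
```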
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

