[GitHub] [spark] zzzzming95 commented on a diff in pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

GitBox Sat, 29 Oct 2022 02:40:42 -0700


zzzzming95 commented on code in PR #38358:
URL: https://github.com/apache/spark/pull/38358#discussion_r1008670694



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala:
##########
@@ -187,8 +188,17 @@ object FileFormatWriter extends Logging {
     // We should first sort by partition columns, then bucket id, and finally sorting columns.
     val requiredOrdering =
       partitionColumns ++ writerBucketSpec.map(_.bucketIdExpression) ++ sortColumns
+
+    // SPARK-40588: plan may contain an AdaptiveSparkPlanExec, which does not know
+    // its final plan's ordering, so we have to materialize that plan first
+    def materializeAdaptiveSparkPlan(plan: SparkPlan): SparkPlan = plan match {

Review Comment:
   I found a bug similar to this issue. The root cause of both problems is that `V1Writes#prepareQuery()` generates a new `Sort`.
   
https://github.com/apache/spark/blob/a2f3958a29ad16a1b3e372156d6c6ae4959d5e8c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala#L70
   
   I address it in this PR: https://github.com/apache/spark/pull/38356
   
   My solution is that `V1Writes#prepareQuery()` also carries over the ordering fields that are missing from the top-level `Sort`.
   
   If `AdaptiveSparkPlanExec` is removed and its plan is extracted, could that cause other unexpected behavior? For example, would removing `AdaptiveSparkPlanExec` prevent some AQE features from taking effect?
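
   To make the ordering concern concrete, here is a minimal toy sketch of why a wrapper that hides its final plan's ordering makes the writer think the required ordering is not met. It assumes the check amounts to "the required ordering is a prefix of the plan's output ordering". The types `Plan`, `Sort`, `Scan`, `Adaptive` and the object `OrderingCheck` are hypothetical stand-ins, not Spark's classes, and the unwrap step only mimics the idea of materializing the adaptive plan.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
sealed trait Plan { def outputOrdering: Seq[String] }

// A sort node reports the columns it sorts by.
final case class Sort(order: Seq[String], child: Plan) extends Plan {
  def outputOrdering: Seq[String] = order
}

// A leaf with no particular ordering.
case object Scan extends Plan {
  def outputOrdering: Seq[String] = Nil
}

// Stand-in for an adaptive wrapper: it hides the ordering of its final plan.
final case class Adaptive(finalPlan: Plan) extends Plan {
  def outputOrdering: Seq[String] = Nil
}

object OrderingCheck {
  // Assumed rule: the required ordering must be a prefix of the actual ordering.
  def satisfied(required: Seq[String], plan: Plan): Boolean =
    plan.outputOrdering.startsWith(required)

  // Unwrap the adaptive node so the final plan's ordering becomes visible.
  def materialize(plan: Plan): Plan = plan match {
    case Adaptive(p) => p
    case other       => other
  }

  def main(args: Array[String]): Unit = {
    val required = Seq("part", "bucket")
    val plan = Adaptive(Sort(Seq("part", "bucket", "value"), Scan))
    println(satisfied(required, plan))              // false: ordering is hidden
    println(satisfied(required, materialize(plan))) // true: ordering is visible
  }
}
```

   In this toy model the unwrap only changes what the ordering check sees. Whether doing the same in the real writer affects other AQE behavior is exactly the open question above.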
   
   


