cloud-fan commented on code in PR #52584:
URL: https://github.com/apache/spark/pull/52584#discussion_r2427702871
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala:
##########
@@ -157,6 +153,24 @@ object FileFormatWriter extends Logging {
val actualOrdering = writeFilesOpt.map(_.child)
.getOrElse(materializeAdaptiveSparkPlan(plan))
.outputOrdering
+
+ val requiredOrdering = {
+ // We should first sort by dynamic partition columns, then bucket id, and finally sorting
+ // columns.
+ val ordering = partitionColumns.drop(numStaticPartitionCols) ++
+ writerBucketSpec.map(_.bucketIdExpression) ++ sortColumns
+ plan.logicalLink match {
Review Comment:
I'm a bit worried about this. In AQE we have a fallback that looks for the logical link
in the children, which makes it more reliable. Now we risk a perf regression if the
logical link is not present and we add an extra sort. Shall we remove adding the sort
here completely if planned write is enabled (`WriteFiles` is present)?
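
To make the suggestion concrete, here is a minimal, self-contained sketch of the gating I have in mind. It uses simplified stand-in types (`Seq[String]` instead of `Expression`/`SortOrder`) and a hypothetical `needsExtraSort` helper, so it only illustrates the control flow, not the actual `FileFormatWriter` code:

```scala
// Sketch only: simplified stand-ins for the real Spark types and helpers.
object SortGateSketch {
  // Hypothetical simplification of the ordering check; real code compares
  // Expression / SortOrder semantics, not plain strings.
  def isOrderingMatched(required: Seq[String], actual: Seq[String]): Boolean =
    required.isEmpty ||
      (actual.length >= required.length && actual.take(required.length) == required)

  def needsExtraSort(
      writeFilesPresent: Boolean,      // i.e. writeFilesOpt.isDefined in FileFormatWriter
      requiredOrdering: Seq[String],
      actualOrdering: Seq[String]): Boolean = {
    if (writeFilesPresent) {
      // Planned write: the optimizer is responsible for the required sort,
      // so never add another one here, regardless of the logical link.
      false
    } else {
      // Non-planned write: keep the existing required-vs-actual ordering check.
      !isOrderingMatched(requiredOrdering, actualOrdering)
    }
  }

  def main(args: Array[String]): Unit = {
    // Planned write: no extra sort even if the orderings do not line up.
    assert(!needsExtraSort(writeFilesPresent = true, Seq("p1", "bucket"), Seq.empty))
    // Non-planned write with a mismatch: the extra sort is still added.
    assert(needsExtraSort(writeFilesPresent = false, Seq("p1", "bucket"), Seq.empty))
  }
}
```

The idea is that when `WriteFiles` is present, the planned-write path (the `V1Writes` rule) has already added the required sort below it, so this code path can skip the sort entirely and a missing logical link cannot cause a regression.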
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]