pan3793 opened a new pull request, #52697: URL: https://github.com/apache/spark/pull/52697
Backport #52584 to branch-3.5

### What changes were proposed in this pull request?

This is the second try of https://github.com/apache/spark/pull/52474, following [the suggestion from cloud-fan](https://github.com/apache/spark/pull/52474#issuecomment-3383971418).

This PR fixes a bug in `plannedWrite`, where the `query` has foldable orderings in the partition columns.

```
CREATE TABLE t (i INT, j INT, k STRING) USING PARQUET PARTITIONED BY (k);

INSERT OVERWRITE t SELECT j AS i, i AS j, '0' as k FROM t0 SORT BY k, i;
```

The evaluation of `FileFormatWriter.orderingMatched` fails because `SortOrder(Literal)` is eliminated by `EliminateSorts`.

### Why are the changes needed?

`V1Writes` overrides the custom sort order when the query output ordering does not satisfy the required ordering. Before SPARK-53707, when the query's output contains literals in partition columns, this check produces a false negative, so the custom sort order does not take effect.

SPARK-53707 partially fixes the issue on the logical plan by adding a `Project` of the query in `V1Writes`.

Before SPARK-53707

```
Sort [0 ASC NULLS FIRST, i#280 ASC NULLS FIRST], false
+- Project [j#287 AS i#280, i#286 AS j#281, 0 AS k#282]
   +- Relation spark_catalog.default.t0[i#286,j#287,k#288] parquet
```

After SPARK-53707

```
Project [i#284, j#285, 0 AS k#290]
+- Sort [0 ASC NULLS FIRST, i#284 ASC NULLS FIRST], false
   +- Project [i#284, j#285]
      +- Relation spark_catalog.default.t0[i#284,j#285,k#286] parquet
```

Note that the issue still exists after SPARK-53707, because the ordering match is checked a second time in `FileFormatWriter`.

This PR fixes the issue thoroughly, with new UTs added.

### Does this PR introduce _any_ user-facing change?

Yes, it's a bug fix.

### How was this patch tested?

New UTs are added.

### Was this patch authored or co-authored using generative AI tooling?

No.
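To make the false negative concrete, here is a minimal Python sketch of a toy model of the ordering check. It is not Spark code and not the change made in this PR: `SortKey`, `eliminate_foldable_sorts`, `naive_ordering_matched`, and `constant_aware_ordering_matched` are hypothetical names, and skipping constant partition columns is only one assumed way to avoid the mismatch. It shows why dropping `SortOrder(Literal)` makes a prefix comparison against the required ordering `[k]` fail, even though a constant `k` is trivially ordered.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class SortKey:
    """One entry of an ordering (hypothetical stand-in for a SortOrder)."""
    expr: str               # column name, or the literal text when foldable
    foldable: bool = False  # True when the expression is a constant


def eliminate_foldable_sorts(ordering: List[SortKey]) -> List[SortKey]:
    """Toy analogue of dropping sort keys on literals: they cannot affect row order."""
    return [key for key in ordering if not key.foldable]


def naive_ordering_matched(required: List[str], actual: List[SortKey]) -> bool:
    """Prefix check: every required column must line up with the actual ordering."""
    if len(required) > len(actual):
        return False
    return all(req == act.expr for req, act in zip(required, actual))


def constant_aware_ordering_matched(
    required: List[str], actual: List[SortKey], constant_columns: Set[str]
) -> bool:
    """Assumed variant: a required column that is constant is trivially ordered,
    so it is skipped before running the prefix check."""
    remaining = [col for col in required if col not in constant_columns]
    return naive_ordering_matched(remaining, actual)


# SORT BY k, i where k is the literal '0'; the table is partitioned by k.
user_sort = [SortKey("'0'", foldable=True), SortKey("i")]
actual = eliminate_foldable_sorts(user_sort)  # only the key on i survives
required = ["k"]                              # partition columns come first

naive_result = naive_ordering_matched(required, actual)
aware_result = constant_aware_ordering_matched(required, actual, {"k"})
print(naive_result, aware_result)
```

In this model the naive check compares `k` with `i` and reports a mismatch, which corresponds to the writer adding its own sort and overriding `SORT BY k, i`. The constant-aware variant has nothing left to require, so the user's ordering is kept.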
