[GitHub] [spark] EnricoMi opened a new pull request, #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

GitBox Sun, 23 Oct 2022 06:20:34 -0700


EnricoMi opened a new pull request, #38358:
URL: https://github.com/apache/spark/pull/38358


   ### What changes were proposed in this pull request?
   FileFormatWriter materializes an AQE (adaptive query execution) plan before 
accessing the plan's `outputOrdering`. This is required for Spark 3.0 to 3.3. 
Spark 3.4 does not need this because FileFormatWriter already receives the 
final plan.
   
   ### Why are the changes needed?
   FileFormatWriter enforces an ordering if the written plan does not provide 
that ordering. In Spark 3.0 to 3.3, an AQE plan does not know its final 
ordering until it is materialized, so FileFormatWriter always enforces the 
ordering, even if the plan already provides it. If spilling occurs during that 
enforced sort, the ordering the plan provided gets broken (see SPARK-40588).
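
   The decision described above can be illustrated with a small toy model. 
This is only a sketch and not Spark's code: `Plan`, `Sorted`, `Adaptive`, 
`needsSort` and `needsSortFixed` are hypothetical names, and an ordering is 
simplified to a list of column names.

```scala
// Toy model only: none of these types are Spark classes (all names are hypothetical).
object OrderingSketch {
  // A plan reports the ordering its output is guaranteed to have.
  trait Plan { def outputOrdering: Seq[String] }

  // A plan whose output is sorted by the given columns.
  final case class Sorted(keys: Seq[String]) extends Plan {
    def outputOrdering: Seq[String] = keys
  }

  // Stand-in for an adaptive plan: it reports no ordering until materialized.
  final class Adaptive(inner: Plan) extends Plan {
    def outputOrdering: Seq[String] = Nil
    def materialize(): Plan = inner
  }

  // The writer adds its own sort unless the plan's ordering starts with the required columns.
  def needsSort(plan: Plan, required: Seq[String]): Boolean =
    !plan.outputOrdering.startsWith(required)

  // Same check, but adaptive plans are materialized before their ordering is read.
  def needsSortFixed(plan: Plan, required: Seq[String]): Boolean = plan match {
    case a: Adaptive => needsSort(a.materialize(), required)
    case p           => needsSort(p, required)
  }
}
```

   In this model, an `Adaptive` plan wrapping `Sorted(Seq("part", "key"))` 
makes `needsSort` request a redundant sort on `part`, while `needsSortFixed` 
sees the final ordering and does not.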
   
   ### Does this PR introduce _any_ user-facing change?
   This fixes [SPARK-40588](https://issues.apache.org/jira/browse/SPARK-40588), 
which was introduced in 3.0.
   
   ### How was this patch tested?
   The final plan that is written to files cannot be extracted from 
FileFormatWriter. The bug explained in 
[SPARK-40588](https://issues.apache.org/jira/browse/SPARK-40588) can only be 
asserted on the result files when spilling occurs. This is very hard to control 
in a unit test scenario.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

