cloud-fan commented on code in PR #45234:
URL: https://github.com/apache/spark/pull/45234#discussion_r1684091860
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala:
##########
@@ -47,6 +47,15 @@ import org.apache.spark.util.random.XORShiftRandom
*/
trait ShuffleExchangeLike extends Exchange {
+ /**
+ * The asynchronous job that materializes the shuffle. It also does the preparations work,
+ * such as waiting for the subqueries.
+ */
+ @transient private lazy val shuffleFuture: Future[MapOutputStatistics] = executeQuery {
+ materializationStarted.set(true)
Review Comment:
After a closer look, I don't think this change works as we expect. We set
the `materializationStarted` flag before we return the `Future`, which means
we are still on the AQE loop's main thread at that point. In other words, once
we submit a query stage, its `materializationStarted` becomes true immediately,
so we can't really avoid the wasted query stage execution.
The test passed because `ShuffleExchangeExec` calls `child.execute()` before
returning the `Future`. Then we exit the AQE loop without cancelling other
stages.
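
To illustrate the timing issue, here is a minimal standalone sketch, not the actual Spark code. It assumes `executeQuery` evaluates its block synchronously on the caller's thread; the `executeQuery` below, `FlagTimingSketch`, `submit`, and `setFlagInsideJob` are all hypothetical stand-ins. It compares setting the flag before the `Future` is created with setting it inside the `Future` body:

```scala
import java.util.concurrent.{CountDownLatch, Executors}
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FlagTimingSketch {
  // Hypothetical stand-in for executeQuery: assumed to run the block
  // synchronously on the calling thread and return its result.
  def executeQuery[T](body: => T): T = body

  // Returns (flag as seen by the caller right after submission,
  //          flag after the asynchronous job has finished).
  def submit(setFlagInsideJob: Boolean): (Boolean, Boolean) = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val started = new AtomicBoolean(false)
    val gate = new CountDownLatch(1)
    try {
      val future: Future[Unit] = executeQuery {
        // Mirrors the diff: this runs on the caller's thread, before the Future exists.
        if (!setFlagInsideJob) started.set(true)
        Future {
          // The job is held here until the caller opens the gate.
          gate.await()
          // Alternative placement: the flag flips only once the job really runs.
          if (setFlagInsideJob) started.set(true)
        }
      }
      val seenAtSubmit = started.get()
      gate.countDown()
      Await.result(future, 10.seconds)
      (seenAtSubmit, started.get())
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    println(submit(setFlagInsideJob = false)) // (true,true)
    println(submit(setFlagInsideJob = true))  // (false,true)
  }
}
```

With the flag set outside the `Future` body, the caller already sees `true` right after submission even though the job has not run yet, which is the behavior described above.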
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]