Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

via GitHub Mon, 22 Apr 2024 22:54:53 -0700


cloud-fan commented on code in PR #45234:
URL: https://github.com/apache/spark/pull/45234#discussion_r1575677485



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala:
##########
@@ -51,13 +51,30 @@ abstract class QueryStageExec extends LeafExecNode {
    */
   val plan: SparkPlan
 
+  /**
+   * Name of this query stage which is unique in the entire query plan.
+   */
+  val name: String = s"${this.getClass.getSimpleName}-$id"
+
+  /**
+   * This flag aims to detect if the stage materialization is started. This helps
+   * to avoid unnecessary stage materialization when the stage is canceled.
+   */
+  private val materializationStarted = new AtomicBoolean()

Review Comment:
   Sorry for the last-minute proposal, but I'm wondering whether it would be
more efficient to push this cancellation optimization into the shuffle and
broadcast nodes.
   
   Operating on the `shuffleFuture` directly in `ShuffleQueryStageExec.cancel`
looks a bit fragile. I think `ShuffleExchangeLike` should provide a clear API
for this: today it provides `submitShuffleJob`, and it should also provide
`cancelShuffleJob`.
   
   Within `ShuffleExchangeLike`, we can do more optimizations. For example,
even if the shuffle stage is canceled after it has been submitted, we can
still avoid submitting the shuffle job, because the shuffle node might still
be doing other preparation work: generating the shuffle dependency, waiting
for subqueries to finish, etc. It's more efficient to check the `isCanceled`
flag at the last minute, right before submitting the shuffle job.
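
To make the proposal concrete, here is a rough, self-contained sketch of the "check the flag right before submitting" idea. It is not the actual Spark code: `CancellableShuffleSketch`, `prepare`, `doSubmit`, and `wasJobSubmitted` are hypothetical names invented for illustration, and only `submitShuffleJob`, `cancelShuffleJob`, and `isCanceled` echo names from the comment above. The sketch assumes cancellation only sets a flag; canceling a job that is already running is left out.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a shuffle node; NOT the real ShuffleExchangeLike.
trait CancellableShuffleSketch {
  private val canceled = new AtomicBoolean(false)
  private val jobSubmitted = new AtomicBoolean(false)

  // Preparation done before the job exists (e.g. building the shuffle
  // dependency, waiting for subqueries). Assumed to be potentially slow.
  protected def prepare(): Unit

  // Submits the actual job; the Int result is only a placeholder.
  protected def doSubmit(): Future[Int]

  def isCanceled: Boolean = canceled.get()
  def wasJobSubmitted: Boolean = jobSubmitted.get()

  // Flag-only cancellation. A real implementation would also have to cancel
  // a job that is already running; that part is omitted here.
  final def cancelShuffleJob(): Unit = canceled.set(true)

  // Runs the preparation, then checks the flag right before submitting.
  // Completes with None when the submission was skipped.
  final def submitShuffleJob()(implicit ec: ExecutionContext): Future[Option[Int]] =
    Future(prepare()).flatMap { _ =>
      if (isCanceled) Future.successful(Option.empty[Int])
      else {
        jobSubmitted.set(true)
        doSubmit().map(n => Option(n))
      }
    }
}

object CancellableShuffleDemo {
  // Returns (result, wasJobSubmitted) for one run.
  def runOnce(cancelDuringPrepare: Boolean): (Option[Int], Boolean) = {
    import ExecutionContext.Implicits.global
    val prepareGate = new CountDownLatch(1)
    val node = new CancellableShuffleSketch {
      // Block until the gate opens, simulating slow preparation work.
      protected def prepare(): Unit = prepareGate.await()
      protected def doSubmit(): Future[Int] = Future.successful(42)
    }
    val result = node.submitShuffleJob()
    if (cancelDuringPrepare) node.cancelShuffleJob()
    prepareGate.countDown() // let the preparation finish
    (Await.result(result, 5.seconds), node.wasJobSubmitted)
  }
}
```

In this sketch, `CancellableShuffleDemo.runOnce(cancelDuringPrepare = true)` cancels while the preparation is still blocked, so the job is never submitted even though `submitShuffleJob` had already been called. A check done only at the query-stage level, before calling `submitShuffleJob`, could not catch that case.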



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

