[jira] [Assigned] (SPARK-42753) ReusedExchange refers to non-existent node

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42753:


Assignee: (was: Apache Spark)

> ReusedExchange refers to non-existent node
> --
>
> Key: SPARK-42753
> URL: https://issues.apache.org/jira/browse/SPARK-42753
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.4.0
>Reporter: Steven Chen
>Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is being 
> reused can be replaced in the plan tree. As a result, when the query plan is 
> printed, the ReusedExchange refers to an "unknown" Exchange. An example is 
> shown below:
>  
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
>  Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>  
>  
> Below is an example to demonstrate the root cause:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>  
>  
> Step 1: Exchange B is materialized and the QueryStage is added to stage cache
> Step 2: Exchange D reuses Exchange B
> Step 3: Exchange C is materialized and the QueryStage is added to stage cache
> Step 4: Exchange A reuses Exchange C
>  
> Then the final plan looks like:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage 
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>  
>  
> As a result, the ReusedExchange (reuses Exchange B) refers to a non-existent 
> node, because Exchange B was dropped from the plan when Exchange A was 
> replaced. This *DOES NOT* affect query execution, but it causes the query 
> visualization to malfunction in the following ways:
>  # The ReusedExchange child subtree will still appear in the Spark UI graph 
> but will contain no node IDs.
>  # The ReusedExchange node details in the Explain plan will refer to an 
> unknown node. Example below:
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
>  # The child exchange and its subtree may be missing from the Explain text 
> completely, with no node details or tree string shown.
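
The four steps in the description can be replayed with a small toy model. This
is only a sketch and not Spark's implementation: the Node class, the canonical
keys "k1" and "k2", and the helpers reuse_or_register and explain are all
hypothetical names invented for illustration. It assumes that Exchange B and
Exchange D are interchangeable, and likewise Exchange A and Exchange C.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    canonical: Optional[str] = None   # hypothetical key: equal keys mean reusable
    reuses: Optional["Node"] = None   # set only on ReusedExchange nodes


def reuse_or_register(parent: Node, exchange: Node, cache: Dict[str, Node]) -> None:
    """Register a materialized exchange, or swap it for a ReusedExchange."""
    cached = cache.get(exchange.canonical)
    if cached is None:
        cache[exchange.canonical] = exchange
        return
    reused = Node(f"ReusedExchange (reuses {cached.name})", reuses=cached)
    # Replacing the exchange also drops its whole subtree from the plan.
    parent.children = [reused if c is exchange else c for c in parent.children]


def explain(roots: List[Node]) -> List[str]:
    """Number the nodes still in the plans, then resolve each reuse target."""
    ids: Dict[int, int] = {}
    ordered: List[Node] = []

    def visit(node: Node) -> None:
        ids[id(node)] = len(ids) + 1
        ordered.append(node)
        for child in node.children:
            visit(child)

    for root in roots:
        visit(root)
    lines = []
    for node in ordered:
        if node.reuses is not None:
            # A target that is no longer in any plan tree has no operator id.
            target = ids.get(id(node.reuses), "unknown")
            lines.append(
                f"({ids[id(node)]}) ReusedExchange [Reuses operator id: {target}]"
            )
    return lines


# Main plan: X -> A -> Y -> B.  Subquery plan: M -> C -> N -> D.
b = Node("Exchange B", canonical="k1")
y = Node("SomeNode Y", [b])
a = Node("Exchange A", [y], canonical="k2")
x = Node("SomeNode X", [a])
d = Node("Exchange D", canonical="k1")
n = Node("SomeNode N", [d])
c = Node("Exchange C", [n], canonical="k2")
m = Node("SomeNode M", [c])

cache: Dict[str, Node] = {}
reuse_or_register(y, b, cache)  # Step 1: B is materialized and cached
reuse_or_register(n, d, cache)  # Step 2: D reuses B
reuse_or_register(m, c, cache)  # Step 3: C is materialized and cached
reuse_or_register(x, a, cache)  # Step 4: A reuses C, so Y and B leave the plan

report = explain([x, m])
print("\n".join(report))
```

In this toy model the reuse of Exchange C resolves to a numbered operator,
while the reuse of Exchange B resolves to "unknown", because Exchange B is only
reachable through the stage cache and no longer through either plan tree.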



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42753) ReusedExchange refers to non-existent node

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42753:


Assignee: Apache Spark

