[GitHub] [spark] ChenMichael commented on a change in pull request #34036: [SPARK-36795][SQL] Explain Formatted has Duplicate Node IDs

GitBox Wed, 22 Sep 2021 22:04:10 -0700


ChenMichael commented on a change in pull request #34036:
URL: https://github.com/apache/spark/pull/34036#discussion_r714473203




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##########
@@ -704,6 +704,31 @@ class ExplainSuiteAE extends ExplainSuiteHelper with EnableAdaptiveExecutionSuit
         "Bucketed: false (bucket column(s) not read)")
     }
   }
+
+  test("SPARK-36795: Node IDs should not be duplicated when InMemoryRelation Present") {
+    withTempView("t1", "t2") {
+      Seq(1).toDF("k").write.saveAsTable("t1")
+      Seq(1).toDF("key").write.saveAsTable("t2")
+      spark.sql("SELECT * FROM t1").persist()
+      val query = "SELECT * FROM (SELECT * FROM t1) join t2 " +
+        "ON k = t2.key"
+      val df = sql(query).toDF()
+
+      df.collect()
+      checkKeywordsExistsInExplain(df, FormattedMode,
+        """   * BroadcastHashJoin Inner BuildLeft (12)
+          |   :- BroadcastQueryStage (8)
+          |   :  +- BroadcastExchange (7)
+          |   :     +- * Filter (6)
+          |   :        +- * ColumnarToRow (5)

Review comment:
       OK. Changed the test to use a regex that extracts the node IDs and asserts that they are different.
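The approach described in the review comment (extract node IDs with a regex, then compare them) can be sketched outside Spark as shown below. This is not the code from PR #34036. `ExplainNodeIds`, `treeNodeIds`, `duplicateIds` and `Samples` are hypothetical names, and the sketch assumes that each line in the tree section of a `FormattedMode` explain string ends with the operator's ID in parentheses, as the expected output in the diff does. Only the Scala standard library is used, so the sketch runs without a Spark session.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not taken from PR #34036: checks that every operator in the
// tree section of a FormattedMode explain string carries a distinct node ID.
object ExplainNodeIds {
  // Assumption: each tree line ends with the operator's ID, e.g. "+- * Filter (6)".
  private val TrailingId: Regex = """\((\d+)\)\s*$""".r

  /** Node IDs in the order they appear in the tree, one per matching line. */
  def treeNodeIds(tree: String): Seq[Int] =
    tree.linesIterator
      .flatMap(line => TrailingId.findFirstMatchIn(line).map(_.group(1).toInt))
      .toSeq

  /** IDs that label more than one operator; empty when all IDs are distinct. */
  def duplicateIds(tree: String): Seq[Int] = {
    val ids = treeNodeIds(tree)
    // diff removes one occurrence of each distinct ID, leaving only the extra copies.
    ids.diff(ids.distinct).distinct
  }
}

object Samples {
  // Tree lines copied from the expected output in the diff above.
  val fixedTree: String =
    """* BroadcastHashJoin Inner BuildLeft (12)
      |:- BroadcastQueryStage (8)
      |:  +- BroadcastExchange (7)
      |:     +- * Filter (6)
      |:        +- * ColumnarToRow (5)""".stripMargin

  // Invented for illustration only, not real Spark output: two operators share ID 2.
  val buggyTree: String =
    """InMemoryTableScan (1)
      |   +- InMemoryRelation (2)
      |         +- * ColumnarToRow (2)""".stripMargin
}
```

With these inputs, `duplicateIds(Samples.fixedTree)` is empty and `duplicateIds(Samples.buggyTree)` yields `Seq(2)`. Matching only the trailing `(N)` on each line, rather than an operator name followed by `(N)`, avoids mis-attributing IDs on lines such as `BroadcastHashJoin Inner BuildLeft (12)`, where the word immediately before the ID is a join argument rather than the operator name.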




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



