erenavsarogullari commented on code in PR #39037:
URL: https://github.com/apache/spark/pull/39037#discussion_r1067611746


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala:
##########
@@ -2693,6 +2694,21 @@ class AdaptiveQueryExecSuite
       assert(df.rdd.getNumPartitions == 3)
     }
   }
+
+  test("SPARK-41214: Fix AQE cache does not update plan and metrics") {

Review Comment:
   `AdaptiveSparkPlan` nodes are injected in the following cases:
   1- At the parent query level, as the root node of the `SparkPlan`,
   2- For AQE under `InMemoryRelation` (IMR),
   3- For subqueries.
   Does it make sense to also have a UT covering both the subquery and the AQE-under-IMR cases?
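
   A rough sketch of what such a test could look like, assuming it lives in `AdaptiveQueryExecSuite` (so `runAdaptiveAndVerifyResult`, `withSQLConf`, `withTempView` and the `toDF` implicits are in scope). The test title, the view name `v1`, and the column name `cnt` are only illustrative:

   ```scala
   // Sketch only: a cached DataFrame (AQE under IMR) that is read both by the
   // outer query and by a scalar subquery. All names here are hypothetical.
   test("SPARK-41214: AQE under IMR referenced from a subquery (sketch)") {
     withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "true") {
       withTempView("v1") {
         val df = Seq((1, "Employee_1"), (2, "Employee_2"))
           .toDF("id", "name")
           .groupBy($"name")
           .count()
           .toDF("name", "cnt") // rename `count` so the SQL below stays unambiguous
         df.cache().createOrReplaceTempView("v1")

         // Compares the adaptive result against the non-adaptive one.
         runAdaptiveAndVerifyResult(
           "SELECT * FROM v1 WHERE cnt = (SELECT max(cnt) FROM v1)")
       }
     }
   }
   ```

   This only checks result correctness for the combined case; plan and metric assertions would still need to be added on top.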



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala:
##########
@@ -2693,6 +2694,21 @@ class AdaptiveQueryExecSuite
       assert(df.rdd.getNumPartitions == 3)
     }
   }
+
+  test("SPARK-41214: Fix AQE cache does not update plan and metrics") {
+    withSQLConf(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING.key -> "true") {
+      val arr = Seq(
+        (1, "Employee_1", "Department_1"),
+        (2, "Employee_2", "Department_2"))
+      val df = arr.toDF("id", "name", "department").filter($"id" < 3).groupBy($"name").count()
+      df.cache().createOrReplaceTempView("v1")
+      val arr2 = Seq((1, "Employee_1", "Department_1"))
+      val df2 = arr2.toDF("id", "name", "department").filter($"id" > 0).groupBy($"name").count()
+      df2.cache().createOrReplaceTempView("v2")
+
+      runAdaptiveAndVerifyResult("SELECT * FROM v1 JOIN v2 on v1.name = v2.name")

Review Comment:
   The `HashAggregateExec` node metrics were empty before this fix:
   
https://issues.apache.org/jira/secure/attachment/13052914/DAG%20when%20AQE%3DON%20and%20AQECachedDFSupport%3DON%20without%20fix.png
   
   Does it make sense to also verify the `HashAggregateExec` metric(s) (for the nodes that come before the `InMemoryRelation` nodes) to make the test more robust? For example, `HashAggregateExec - number of output rows` does not change between test runs.
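
   A minimal sketch of such a check, assuming it replaces the bare `runAdaptiveAndVerifyResult` call in this test and that `collect` is the `AdaptiveSparkPlanHelper` method already available in `AdaptiveQueryExecSuite`. The `> 0` threshold is my assumption; the exact expected row counts would need to be confirmed against a real run:

   ```scala
   import org.apache.spark.sql.execution.aggregate.HashAggregateExec
   import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec

   // runAdaptiveAndVerifyResult returns the (initial plan, final adaptive plan) pair.
   val (_, adaptivePlan) = runAdaptiveAndVerifyResult(
     "SELECT * FROM v1 JOIN v2 on v1.name = v2.name")

   // Plans cached under each InMemoryRelation that the final plan scans.
   val cachedPlans = collect(adaptivePlan) {
     case scan: InMemoryTableScanExec => scan.relation.cachedPlan
   }
   assert(cachedPlans.nonEmpty)

   // Read "number of output rows" from every HashAggregateExec inside the cached plans.
   val aggOutputRows = cachedPlans.flatMap { plan =>
     collect(plan) {
       case agg: HashAggregateExec => agg.metrics("numOutputRows").value
     }
   }
   assert(aggOutputRows.nonEmpty)
   // Assumed lower bound only: the metrics should no longer be empty after the fix.
   assert(aggOutputRows.forall(_ > 0))
   ```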



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

