AngersZhuuuu commented on a change in pull request #26437: [SPARK-29800][SQL]
Rewrite non-correlated subquery use ScalaSubquery to optimize perf
URL: https://github.com/apache/spark/pull/26437#discussion_r362734856
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
##########
@@ -785,7 +785,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSparkSessi
assert(getNumInMemoryRelations(ds) == 2)
val cachedDs = sql(sqlText).cache()
-    assert(getNumInMemoryTablesRecursively(cachedDs.queryExecution.sparkPlan) == 3)
+    assert(getNumInMemoryTablesRecursively(cachedDs.queryExecution.sparkPlan) == 2)
Review comment:
> previously we also rewrite it (to join) in the optimizer, right?
**Before**
```
+- Filter NOT exists#230 []
: +- Project [c1#222]
: +- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Project [value#219 AS c1#222]
: +- *(1) LocalTableScan [value#219]
+- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [value#219 AS c1#222]
+- *(1) LocalTableScan [value#219]

InMemoryTableScan [c1#222]
+- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
+- BroadcastNestedLoopJoin BuildRight, LeftAnti
:- *(1) ColumnarToRow
: +- Scan In-memory table t1 [c1#222]
: +- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Project [value#219 AS c1#222]
: +- *(1) LocalTableScan [value#219]
+- BroadcastExchange IdentityBroadcastMode, [id=#103]
+- *(2) ColumnarToRow
+- Scan In-memory table t1
+- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [value#219 AS c1#222]
+- *(1) LocalTableScan [value#219]
```
**Now**
```
Project [c1#222]
+- Filter NOT exists#230 []
: +- Project [c1#222]
: +- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Project [value#219 AS c1#222]
: +- *(1) LocalTableScan [value#219]
+- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [value#219 AS c1#222]
+- *(1) LocalTableScan [value#219]

InMemoryTableScan [c1#222]
+- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Filter isnull(Subquery scalar-subquery#253, [id=#101])
: +- Subquery scalar-subquery#253, [id=#101]
: +- CollectLimit 1
: +- *(1) Project [1 AS col#270]
: +- *(1) ColumnarToRow
: +- Scan In-memory table t1
: +- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Project [value#219 AS c1#222]
: +- *(1) LocalTableScan [value#219]
+- *(1) ColumnarToRow
+- Scan In-memory table t1 [c1#222], [isnull(ReusedSubquery Subquery scalar-subquery#253, [id=#101])]
:- InMemoryRelation [c1#222], StorageLevel(disk, memory, deserialized, 1 replicas)
: +- *(1) Project [value#219 AS c1#222]
: +- *(1) LocalTableScan [value#219]
+- ReusedSubquery Subquery scalar-subquery#253, [id=#101]
```
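To make the change in the expected count easier to follow, here is a toy model of the two physical plans above. It is only a sketch: `Node`, `InMemoryScan`, `Op`, and `countScans` are hypothetical names, not Spark classes, and the counting rule (follow child operators and cached plans, but not subqueries held inside expressions) is an assumption about how the test helper behaves, not a copy of `getNumInMemoryTablesRecursively`.

```scala
// Toy plan model. These types are hypothetical and are NOT Spark classes.
sealed trait Node { def children: Seq[Node] }

// A scan over a cached relation; `cachedPlan` is the plan that was materialized.
final case class InMemoryScan(cachedPlan: Node) extends Node {
  val children: Seq[Node] = Nil
}

// Any other operator. `subqueries` hang off expressions, not off `children`.
final case class Op(
    name: String,
    children: Seq[Node] = Nil,
    subqueries: Seq[Node] = Nil) extends Node

object PlanCount {
  // Assumed rule: count scans reachable through children and cached plans only.
  def countScans(plan: Node): Int = plan match {
    case InMemoryScan(cached) => 1 + countScans(cached)
    case Op(_, children, _)   => children.map(countScans).sum
  }

  // Scan of cached table t1, whose cached plan contains no further scans.
  val t1Scan: Node = InMemoryScan(Op("Project", Seq(Op("LocalTableScan"))))

  // Before: NOT EXISTS is planned as a LeftAnti join, so t1 is scanned on both sides.
  val before: Node = InMemoryScan(
    Op("BroadcastNestedLoopJoin", Seq(t1Scan, Op("BroadcastExchange", Seq(t1Scan)))))

  // Now: one t1 scan remains as a child; the other sits inside the scalar subquery.
  val now: Node = InMemoryScan(
    Op("Filter", children = Seq(t1Scan), subqueries = Seq(Op("CollectLimit", Seq(t1Scan)))))
}
```

Under that assumption, `PlanCount.countScans(PlanCount.before)` evaluates to 3 and `PlanCount.countScans(PlanCount.now)` to 2, matching the old and new assertions in the diff.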