mcdull-zhang opened a new pull request, #36831:
URL: https://github.com/apache/spark/pull/36831
### What changes were proposed in this pull request?
This is a modified version of #36483.
`PropagateEmptyRelation` can simplify a join. For example, if the right side of
a LEFT JOIN is empty, the join can be eliminated and replaced by its left side.
If the left side contains a shuffle, that shuffle no longer serves the join, so
it can be read with a local shuffle read (`AQEShuffleRead local`) instead.
For example:
```sql
SELECT * FROM testData LEFT JOIN testData2 ON key = a and a > 10
```
Before this PR:
```tex
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Project [key#13, value#14, cast(null as int) AS a#23, cast(null as int) AS b#24]
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 0
         +- Exchange hashpartitioning(key#13, 5), ENSURE_REQUIREMENTS_MEANINGLESS, [id=#107]
            +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false, true) AS value#14]
               +- Scan[obj#12]
```
After this PR:
```tex
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Project [key#13, value#14, cast(null as int) AS a#23, cast(null as int) AS b#24]
   +- AQEShuffleRead local
      +- ShuffleQueryStage 0
         +- Exchange hashpartitioning(key#13, 5), ENSURE_REQUIREMENTS_MEANINGLESS, [id=#107]
            +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false, true) AS value#14]
               +- Scan[obj#12]
```
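The two steps described above (eliminate the join, then read the now-unneeded shuffle locally) can be sketched on a toy plan tree. This is only an illustration, not the code in this PR: every type and function here (`Scan`, `Shuffle`, `ShuffleRead`, `NullPaddedProject`, `propagateEmpty`, `useLocalRead`) is a hypothetical stand-in for Spark's real plan nodes and rules, and the row counts are arbitrary.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's actual plan classes.
sealed trait Plan
case class Scan(name: String, rows: Int) extends Plan
case class Shuffle(child: Plan) extends Plan
// `local = true` models "AQEShuffleRead local"; `false` models a regular (remote) read.
case class ShuffleRead(child: Shuffle, local: Boolean) extends Plan
case class LeftJoin(left: Plan, right: Plan) extends Plan
// Models the Project that pads the right-side columns with nulls.
case class NullPaddedProject(child: Plan) extends Plan

object EmptyJoinSketch {
  // True when a subtree is known to produce no rows.
  def isEmpty(p: Plan): Boolean = p match {
    case Scan(_, rows)        => rows == 0
    case Shuffle(c)           => isEmpty(c)
    case ShuffleRead(c, _)    => isEmpty(c)
    case NullPaddedProject(c) => isEmpty(c)
    case LeftJoin(l, _)       => isEmpty(l)
  }

  // Step 1: a LEFT JOIN whose right side is empty collapses to its left side.
  def propagateEmpty(p: Plan): Plan = p match {
    case LeftJoin(l, r) if isEmpty(r) => NullPaddedProject(propagateEmpty(l))
    case LeftJoin(l, r)               => LeftJoin(propagateEmpty(l), propagateEmpty(r))
    case NullPaddedProject(c)         => NullPaddedProject(propagateEmpty(c))
    case other                        => other
  }

  // Step 2: a shuffle read that no longer feeds a join has no partitioning
  // requirement left to satisfy, so it can be switched to a local read.
  def useLocalRead(p: Plan, underJoin: Boolean = false): Plan = p match {
    case ShuffleRead(s, _) if !underJoin => ShuffleRead(s, local = true)
    case LeftJoin(l, r) =>
      LeftJoin(useLocalRead(l, underJoin = true), useLocalRead(r, underJoin = true))
    case NullPaddedProject(c) => NullPaddedProject(useLocalRead(c, underJoin))
    case other                => other
  }

  // Example input: both join sides are shuffled; the right side has no rows.
  val before: Plan = LeftJoin(
    ShuffleRead(Shuffle(Scan("testData", 100)), local = false),
    ShuffleRead(Shuffle(Scan("testData2", 0)), local = false))

  def main(args: Array[String]): Unit =
    println(useLocalRead(propagateEmpty(before)))
}
```

In this sketch, a join that survives step 1 keeps its shuffle reads unchanged, which mirrors the intent that only a shuffle made meaningless by join elimination is switched to a local read.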
### Why are the changes needed?
To avoid a remote shuffle read once the join has been eliminated, which improves efficiency.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.