zml1206 opened a new pull request, #51451:
URL: https://github.com/apache/spark/pull/51451
### What changes were proposed in this pull request?
Make `Join` and `Union` evaluate `maxRows` of each child at most once when
computing their own `maxRows`.
### Why are the changes needed?
Improve performance, especially for queries with dozens of joins and unions.
Before this PR, the number of `maxRows` evaluations performed by joins/unions
grew exponentially with the number of joins/unions.
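A minimal sketch of how this kind of blow-up arises, assuming (hypothetically) that a join first checks `child.maxRows.isDefined` and then reads `child.maxRows.get`, so each child is evaluated twice per call. The `Plan`, `Leaf`, `NaiveJoin`, `CachedJoin`, `chain` and `countCalls` names below are an illustrative toy model, not Spark's `LogicalPlan` API. A union that sums its children's bounds with the same guard-then-read pattern would grow the same way.

```scala
object MaxRowsDemo {
  // Total number of maxRows evaluations across all nodes (toy counter).
  var calls: Long = 0L

  // Hypothetical, simplified stand-in for a logical plan node.
  sealed trait Plan { def maxRows: Option[Long] }

  final case class Leaf(rows: Long) extends Plan {
    def maxRows: Option[Long] = { calls += 1; Some(rows) }
  }

  // Naive join: each child is evaluated twice (guard, then value),
  // so a left-deep chain of joins doubles the work at every level.
  final case class NaiveJoin(left: Plan, right: Plan) extends Plan {
    def maxRows: Option[Long] = {
      calls += 1
      if (left.maxRows.isDefined && right.maxRows.isDefined)
        Some(left.maxRows.get * right.maxRows.get)
      else None
    }
  }

  // Fixed join: each child's maxRows is bound to a local value once.
  final case class CachedJoin(left: Plan, right: Plan) extends Plan {
    def maxRows: Option[Long] = {
      calls += 1
      val l = left.maxRows
      val r = right.maxRows
      for (lv <- l; rv <- r) yield lv * rv
    }
  }

  // Builds a left-deep chain of n joins over single-row leaves.
  def chain(n: Int, join: (Plan, Plan) => Plan): Plan =
    (1 to n).foldLeft(Leaf(1): Plan)((acc, _) => join(acc, Leaf(1)))

  // Returns the plan's maxRows together with the number of evaluations.
  def countCalls(plan: Plan): (Option[Long], Long) = {
    calls = 0L
    val result = plan.maxRows
    (result, calls)
  }

  def main(args: Array[String]): Unit = {
    val depth = 20
    println(countCalls(chain(depth, NaiveJoin(_, _))))  // (Some(1),4194301)
    println(countCalls(chain(depth, CachedJoin(_, _)))) // (Some(1),41)
  }
}
```

In this toy model a chain of `d` joins costs 2^(d+2) - 3 evaluations with the naive pattern but only 2d + 1 once each child's result is bound to a local value, so every additional join roughly doubles the naive cost.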
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Tested locally with the query below:

| Query | Before this PR | After this PR |
|---|---|---|
| 28-table join | 36s | 4s |
| 29-table join | 67s | 5s |

```scala
// Run in spark-shell, where `spark` and `spark.implicits._` are already in scope.
Seq(1).toDF("a").write.mode("overwrite").parquet("tmp/t1")
spark.read.parquet("tmp/t1").createOrReplaceTempView("t")
val start = System.currentTimeMillis()
spark.sql(
"""
|select a,count(1) from (
|select t1.a from (select distinct a from t) t1
|join t t2 on t1.a=t2.a
|join t t3 on t1.a=t3.a
|join t t4 on t1.a=t4.a
|join t t5 on t1.a=t5.a
|join t t6 on t1.a=t6.a
|join t t7 on t1.a=t7.a
|join t t8 on t1.a=t8.a
|join t t9 on t1.a=t9.a
|join t t10 on t1.a=t10.a
|join t t11 on t1.a=t11.a
|join t t12 on t1.a=t12.a
|join t t13 on t1.a=t13.a
|join t t14 on t1.a=t14.a
|join t t15 on t1.a=t15.a
|join t t16 on t1.a=t16.a
|join t t17 on t1.a=t17.a
|join t t18 on t1.a=t18.a
|join t t19 on t1.a=t19.a
|join t t20 on t1.a=t20.a
|join t t21 on t1.a=t21.a
|join t t22 on t1.a=t22.a
|join t t23 on t1.a=t23.a
|join t t24 on t1.a=t24.a
|join t t25 on t1.a=t25.a
|join t t26 on t1.a=t26.a
|join t t27 on t1.a=t27.a
|join t t28 on t1.a=t28.a
|) group by a
|""".stripMargin).show
// Elapsed wall-clock time covers analysis, optimization and execution.
println(s"Elapsed: ${System.currentTimeMillis() - start} ms")
```
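To reproduce the timings for other table counts without writing each join by hand, the same query shape can be generated. This is a minimal sketch; `JoinQueryBuilder.buildJoinQuery` is a hypothetical helper, not part of this PR, and it assumes the temp view `t` created above.

```scala
object JoinQueryBuilder {
  // Builds the benchmark query with n aliases of view `t` (t1 .. tn),
  // i.e. n - 1 joins on column `a`, followed by the outer aggregation.
  def buildJoinQuery(n: Int): String = {
    require(n >= 1, "need at least one table alias")
    val joins = (2 to n).map(i => s"join t t$i on t1.a=t$i.a").mkString("\n")
    // Generated join lines do not start with '|', so stripMargin leaves them as-is.
    s"""select a,count(1) from (
       |select t1.a from (select distinct a from t) t1
       |$joins
       |) group by a""".stripMargin
  }
}
```

In spark-shell, `spark.sql(JoinQueryBuilder.buildJoinQuery(29)).show` would then run the 29-table variant.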
### Was this patch authored or co-authored using generative AI tooling?
No.