c21 commented on pull request #34444:
URL: https://github.com/apache/spark/pull/34444#issuecomment-956587980
> Just out of curiosity, how much performance gain this code generation
> brings?

@Tagar - I ran a small micro-benchmark (similar to Spark's
`JoinBenchmark.scala`), and it shows a ~10-20% run-time improvement for the
given query. The improvement varies from run to run, and I don't expect real
production queries to gain as much. So please take the numbers as a rough
reference only.
```
val N: Long = 4 << 20
withSQLConf(
    SQLConf.SHUFFLE_PARTITIONS.key -> "2",
    SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "10000000",
    SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
  codegenBenchmark("shuffle hash join", N) {
    val df1 = spark.range(N).selectExpr(s"id as k1")
    val df2 = spark.range(N / 3).selectExpr(s"id * 3 as k2")
    val df = df1.join(df2, col("k1") === col("k2"), "full_outer")
    assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[ShuffledHashJoinExec]).isDefined)
    df.noop()
  }
}
Running benchmark: shuffle hash join
Running case: shuffle hash join wholestage off
Stopped after 2 iterations, 3051 ms
Running case: shuffle hash join wholestage on
Stopped after 5 iterations, 6638 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
shuffle hash join:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
shuffle hash join wholestage off                   1519           1526          10          2.8         362.2       1.0X
shuffle hash join wholestage on                    1273           1328          70          3.3         303.4       1.2X
```
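
As a rough cross-check of the ~10-20% figure, the derived columns can be recomputed from the two best times. The sketch below is plain Scala (no Spark needed). The object and method names are illustrative only and are not part of Spark's benchmark utilities. It assumes that Rate is rows divided by best time, Per Row is best time divided by rows, and Relative is the baseline best time divided by the case's best time. Since the printed times are rounded to whole milliseconds, recomputed values may differ from the table in the last digit.

```scala
// Illustrative cross-check; inputs are copied from the benchmark output above.
object ShuffleHashJoinNumbers {
  val numRows: Long = 4L << 20      // N in the benchmark: 4,194,304 rows
  val bestOffMs: Double = 1519.0    // best time, wholestage off
  val bestOnMs: Double = 1273.0     // best time, wholestage on

  // Millions of rows processed per second for a given best time.
  def rateMPerSec(bestMs: Double): Double = numRows / (bestMs * 1000.0)

  // Nanoseconds spent per row for a given best time.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / numRows

  // Speedup over the baseline (assumed meaning of the "Relative" column).
  val relative: Double = bestOffMs / bestOnMs

  // Fraction of run time saved by enabling whole-stage codegen.
  val runtimeReduction: Double = 1.0 - bestOnMs / bestOffMs

  def main(args: Array[String]): Unit = {
    println(f"rate off/on (M/s):   ${rateMPerSec(bestOffMs)}%.1f / ${rateMPerSec(bestOnMs)}%.1f")
    println(f"per row off/on (ns): ${perRowNs(bestOffMs)}%.1f / ${perRowNs(bestOnMs)}%.1f")
    println(f"relative speedup:    ${relative}%.1fX")
    println(f"run-time reduction:  ${runtimeReduction * 100}%.1f%%")
  }
}
```

On these inputs the run-time reduction comes out at roughly 16%, which sits inside the ~10-20% range quoted above.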