wangyum commented on issue #27252: [SPARK-29231][SQL] Constraints should be inferred from cast equality constraint
URL: https://github.com/apache/spark/pull/27252#issuecomment-584471609

Benchmark code:
```scala
import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.SaveMode.Overwrite

val numRows = 1024 * 1024 * 15
spark.range(numRows).selectExpr("cast(id as bigint) as c1", "cast(id as string) as c2").write.saveAsTable("t1")
spark.range(numRows).selectExpr("cast(id as int) as c1", "cast(id as string) as c2").write.saveAsTable("t2")

val title = "Constraints inferred from cast equality constraint"
val benchmark = new Benchmark(title, numRows, minNumIters = 5)
benchmark.addCase("t1.c1 > 100000") { _ =>
  spark.sql(s"select count(*) from t1 join t2 on (t1.c1 = t2.c1 and t1.c1 > ${numRows - 100})").write.format("noop").mode(Overwrite).save()
}
benchmark.addCase("t1.c1 = 100000") { _ =>
  spark.sql("select count(*) from t1 join t2 on (t1.c1 = t2.c1 and t1.c1 = 100000)").write.format("noop").mode(Overwrite).save()
}
benchmark.run()
```

Before this PR:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Constraints inferred from cast equality constraint:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------------
t1.c1 > 100000                                                3891          4189        462        4.0        247.4      1.0X
t1.c1 = 100000                                                2170          2255         74        7.2        138.0      1.8X
```

After this PR:
```
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Constraints inferred from cast equality constraint:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------------------
t1.c1 > 100000                                                 388           460         42       40.5         24.7      1.0X
t1.c1 = 100000                                                 306           342         21       51.4         19.5      1.3X
```
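
To put the two runs side by side: the "Relative" column only compares cases within a single run, so the before/after speedup has to be computed from the best times. A minimal sketch in plain Scala, using only the Best Time(ms) values reported above; the `Speedup` object and its `speedup` helper are hypothetical names for illustration and not part of Spark or the PR:

```scala
object Speedup {
  // Best Time(ms) values copied from the "Before this PR" and "After this PR" runs.
  val bestBefore: Map[String, Double] =
    Map("t1.c1 > 100000" -> 3891.0, "t1.c1 = 100000" -> 2170.0)
  val bestAfter: Map[String, Double] =
    Map("t1.c1 > 100000" -> 388.0, "t1.c1 = 100000" -> 306.0)

  // Speedup factor for one benchmark case: best time before / best time after.
  def speedup(caseName: String): Double =
    bestBefore(caseName) / bestAfter(caseName)

  def main(args: Array[String]): Unit =
    bestBefore.keys.foreach { name =>
      println(f"$name%-16s ${speedup(name)}%.1fx faster")
    }
}
```

By this measure the range-predicate case is roughly 10x faster and the equality case roughly 7x faster after the PR, on this machine and with these table sizes.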
