wangyum commented on code in PR #37104:
URL: https://github.com/apache/spark/pull/37104#discussion_r920722372
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala:
##########
@@ -130,8 +130,23 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
if limit < conf.topKSortFallbackThreshold =>
Some(TakeOrderedAndProjectExec(
limit, order, projectList, planLater(child)))
+ case Sort(order, true, child) if supportTakeOrdered(child) =>
+ Some(TakeOrderedAndProjectExec(
+ child.maxRows.get.toInt, order, child.output, planLater(child)))
+ case Project(projectList, Sort(order, true, child)) if supportTakeOrdered(child) =>
+ Some(TakeOrderedAndProjectExec(
+ child.maxRows.get.toInt, order, projectList, planLater(child)))
case _ => None
}
+
+ private def supportTakeOrdered(plan: LogicalPlan): Boolean = {
+ plan.maxRows.exists(_ < math.min(conf.topKSortFallbackThreshold, 655360)) &&
Review Comment:
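Raising this row limit further does not improve performance. In the
benchmark below, the speedup from TakeOrderedAndProjectExec is 1.5X to 1.7X
up to 100000 rows, drops to 1.2X at 500000 and 1.1X at 800000, and is gone
(1.0X) at 1000000: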
```
Benchmark SPARK-39698 with max rows = 10:         Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       156           203         38      100.8          9.9      1.0X
TakeOrderedAndProjectExec is Enabled                         92           100          8      171.1          5.8      1.7X

Benchmark SPARK-39698 with max rows = 100:        Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       117           129         11      134.8          7.4      1.0X
TakeOrderedAndProjectExec is Enabled                         77            86          7      203.6          4.9      1.5X

Benchmark SPARK-39698 with max rows = 1000:       Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       120           132         16      131.1          7.6      1.0X
TakeOrderedAndProjectExec is Enabled                         80            89         10      195.6          5.1      1.5X

Benchmark SPARK-39698 with max rows = 10000:      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       139           151         12      113.1          8.8      1.0X
TakeOrderedAndProjectExec is Enabled                         90           100         16      174.4          5.7      1.5X

Benchmark SPARK-39698 with max rows = 20000:      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       157           166          7      100.4         10.0      1.0X
TakeOrderedAndProjectExec is Enabled                         96           104          9      163.2          6.1      1.6X

Benchmark SPARK-39698 with max rows = 50000:      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       203           211          8       77.4         12.9      1.0X
TakeOrderedAndProjectExec is Enabled                        123           132          7      128.1          7.8      1.7X

Benchmark SPARK-39698 with max rows = 100000:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       282           288          5       55.8         17.9      1.0X
TakeOrderedAndProjectExec is Enabled                        164           173          8       95.8         10.4      1.7X

Benchmark SPARK-39698 with max rows = 500000:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                       748           753          6       21.0         47.5      1.0X
TakeOrderedAndProjectExec is Enabled                        649           662         11       24.2         41.2      1.2X

Benchmark SPARK-39698 with max rows = 800000:     Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                      1109          1160         73       14.2         70.5      1.0X
TakeOrderedAndProjectExec is Enabled                        985          1010         21       16.0         62.6      1.1X

Benchmark SPARK-39698 with max rows = 1000000:    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------
TakeOrderedAndProjectExec is Disabled                      1286          1329         37       12.2         81.8      1.0X
TakeOrderedAndProjectExec is Enabled                       1314          1336         26       12.0         83.6      1.0X
```
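For reference, here is a minimal standalone sketch of the size check in the
diff, with no Spark dependency. It models only the part of the condition
visible in the hunk (the clause after the trailing `&&` is not shown and is
not reproduced here). `SizeGuard`, `hardCap`, `belowCap`, and `speedup` are
illustrative names of my own, not Spark APIs, and `Int.MaxValue` in the usage
note simply stands in for a large configured threshold.

```scala
// Sketch only: mirrors the visible half of supportTakeOrdered from the diff.
// All names in this object are hypothetical and exist for illustration.
object SizeGuard {
  // Literal upper bound used in the diff.
  val hardCap: Long = 655360L

  // True when the plan has a known row bound that is strictly below both
  // the configured threshold and the hard cap. An unknown bound (None)
  // never qualifies.
  def belowCap(maxRows: Option[Long], topKSortFallbackThreshold: Int): Boolean =
    maxRows.exists(_ < math.min(topKSortFallbackThreshold.toLong, hardCap))

  // Best-time ratio (disabled / enabled), as in the "Relative" column
  // before rounding.
  def speedup(disabledMs: Double, enabledMs: Double): Double =
    disabledMs / enabledMs
}
```

Under this sketch, `SizeGuard.belowCap(Some(500000L), Int.MaxValue)` is true
and `SizeGuard.belowCap(Some(800000L), Int.MaxValue)` is false, so the cap
falls between the two benchmark sizes where the speedup drops from 1.2X to
1.1X.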
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]