wangyum commented on pull request #31691:
URL: https://github.com/apache/spark/pull/31691#issuecomment-789547222
> After: input -> global top k (shuffle to one partition) -> run rank function -> limit
`EliminateLimits` will remove the final limit.
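For context, a minimal sketch of the kind of query being discussed (the exact reproduction isn't shown in this thread, so the table setup and SQL below are assumptions inferred from the plans): a `row_number()` window ordered by `a`, followed by `LIMIT 5`.

```scala
// Hypothetical reproduction -- inferred from the plans below, not taken from the PR itself.
spark.sql("CREATE TABLE t1 (a BIGINT, b BIGINT) USING parquet")

val df = spark.sql(
  """SELECT a, b, row_number() OVER (ORDER BY a) AS rowId
    |FROM t1
    |LIMIT 5""".stripMargin)

// "cost" mode prints the optimized logical plan with the Statistics shown below.
df.explain("cost")
```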
Before:
```
== Optimized Logical Plan ==
GlobalLimit 5, Statistics(sizeInBytes=140.0 B, rowCount=5)
+- LocalLimit 5, Statistics(sizeInBytes=1731.0 B)
   +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST], Statistics(sizeInBytes=1731.0 B)
      +- Relation default.t1[a#10L,b#11L] parquet, Statistics(sizeInBytes=1484.0 B)

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- CollectLimit 5
   +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST]
      +- Sort [a#10L ASC NULLS FIRST], false, 0
         +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#25]
            +- FileScan parquet default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
```
After:
```
== Optimized Logical Plan ==
Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST], Statistics(sizeInBytes=140.0 B)
+- GlobalLimit 5, Statistics(sizeInBytes=120.0 B, rowCount=5)
   +- LocalLimit 5, Statistics(sizeInBytes=1484.0 B)
      +- Sort [a#10L ASC NULLS FIRST], true, Statistics(sizeInBytes=1484.0 B)
         +- Relation default.t1[a#10L,b#11L] parquet, Statistics(sizeInBytes=1484.0 B)

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST]
   +- TakeOrderedAndProject(limit=5, orderBy=[a#10L ASC NULLS FIRST], output=[a#10L,b#11L])
      +- FileScan parquet default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
```