wangyum commented on pull request #31691:
URL: https://github.com/apache/spark/pull/31691#issuecomment-789547222
> After: input -> global top k (shuffle to one partition) -> run rank function -> limit
`EliminateLimits` will remove the final limit.
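For context, a minimal sketch of the kind of query being discussed (the exact reproduction isn't shown in this thread, so the table setup and SQL below are assumptions inferred from the plans): a `row_number()` window ordered by `a`, followed by `LIMIT 5`.

```scala
// Hypothetical reproduction -- inferred from the plans below, not taken from the PR itself.
spark.sql("CREATE TABLE t1 (a BIGINT, b BIGINT) USING parquet")

val df = spark.sql(
  """SELECT a, b, row_number() OVER (ORDER BY a) AS rowId
    |FROM t1
    |LIMIT 5""".stripMargin)

// "cost" mode prints the optimized logical plan with the Statistics shown below.
df.explain("cost")
```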
Before:
```
== Optimized Logical Plan ==
GlobalLimit 5, Statistics(sizeInBytes=140.0 B, rowCount=5)
+- LocalLimit 5, Statistics(sizeInBytes=1731.0 B)
   +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST], Statistics(sizeInBytes=1731.0 B)
      +- Relation default.t1[a#10L,b#11L] parquet, Statistics(sizeInBytes=1484.0 B)

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- CollectLimit 5
   +- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST]
      +- Sort [a#10L ASC NULLS FIRST], false, 0
         +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#25]
            +- FileScan parquet default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
```
After:
```
== Optimized Logical Plan ==
Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST], Statistics(sizeInBytes=140.0 B)
+- GlobalLimit 5, Statistics(sizeInBytes=120.0 B, rowCount=5)
   +- LocalLimit 5, Statistics(sizeInBytes=1484.0 B)
      +- Sort [a#10L ASC NULLS FIRST], true, Statistics(sizeInBytes=1484.0 B)
         +- Relation default.t1[a#10L,b#11L] parquet, Statistics(sizeInBytes=1484.0 B)

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Window [row_number() windowspecdefinition(a#10L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rowId#8], [a#10L ASC NULLS FIRST]
   +- TakeOrderedAndProject(limit=5, orderBy=[a#10L ASC NULLS FIRST], output=[a#10L,b#11L])
      +- FileScan parquet default.t1[a#10L,b#11L] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
```