Re: [PR] [SPARK-55092][SQL] Disable partition grouping in `KeyGroupedPartitioning` when not needed [spark]

via GitHub Thu, 29 Jan 2026 10:05:23 -0800


peter-toth commented on PR #53859:
URL: https://github.com/apache/spark/pull/53859#issuecomment-3819334983


   > One advantage of this approach is that it allows us to avoid grouping for
   > the (presumably not uncommon) case of a simple scan from a partitioned table,
   > and it should still be safe for checkpointed scans (as the checkpointed scans
   > would have a `KeyGroupedPartitioning` w/ `disableGrouping=true`, which would
   > not satisfy most required distributions). This approach should also decrease
   > the complexity of the `EnsureRequirements` changes (since we wouldn't have to
   > catch all the cases in which a `KeyGroupedPartitioning` scan doesn't contribute
   > to the output partitioning of the plan).
   
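   My concern with this approach is that it can introduce an extra shuffle
above the checkpointed (ungrouped) data: if the checkpointed scan's
`KeyGroupedPartitioning` no longer satisfies the required distribution, an
exchange has to be added on top of it.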
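
To make the concern concrete, here is a minimal toy model in plain Python. It is only a sketch: the classes and the `plan_above` helper are hypothetical stand-ins, not Spark's actual `KeyGroupedPartitioning`, `ClusteredDistribution`, or `EnsureRequirements` code, and the `satisfies` rule is a simplifying assumption based on the description quoted above.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for a required distribution (not Spark's class).
@dataclass(frozen=True)
class ClusteredDistribution:
    keys: Tuple[str, ...]


# Hypothetical stand-in for the partitioning reported by a scan.
@dataclass(frozen=True)
class KeyGroupedPartitioning:
    keys: Tuple[str, ...]
    # Mirrors the `disableGrouping` flag discussed in this thread.
    disable_grouping: bool = False

    def satisfies(self, required: ClusteredDistribution) -> bool:
        # Assumption: with grouping disabled, rows sharing a key may be
        # spread over several partitions, so no clustering is guaranteed.
        if self.disable_grouping:
            return False
        # Simplified rule: partition keys must be a subset of the
        # required clustering keys.
        return set(self.keys) <= set(required.keys)


def plan_above(child: KeyGroupedPartitioning,
               required: ClusteredDistribution) -> List[str]:
    """Return the operators a toy planner would stack on the scan."""
    if child.satisfies(required):
        return ["Scan"]
    # The child does not satisfy the requirement, so a shuffle is added.
    return ["Scan", "Exchange"]


required = ClusteredDistribution(("id",))
grouped = KeyGroupedPartitioning(("id",))
ungrouped = KeyGroupedPartitioning(("id",), disable_grouping=True)

print(plan_above(grouped, required))
print(plan_above(ungrouped, required))
```

In this model the grouped scan needs no exchange, while the ungrouped (for example, checkpointed) scan gets one stacked on top, which is the extra shuffle in question.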
   



