peter-toth commented on PR #53859: URL: https://github.com/apache/spark/pull/53859#issuecomment-3819334983
> One advantage of this approach is that it allows us to avoid grouping for the (presumably not uncommon) case of a simple scan from a partitioned table, and it should still be safe for checkpointed scans (as the checkpointed scans would have a `KeyGroupedPartitioning` w/ `disableGrouping=true`, which would not satisfy most required distributions). This approach should also decrease the complexity of the EnsureRequirements changes (since we wouldn't have to catch all the cases in which a KeyGroupedPartitioning scan doesn't contribute to the output partitioning of the plan).

My concern with this approach is that we can introduce an extra shuffle above the checkpointed (ungrouped) data.
