sunchao commented on PR #45314: URL: https://github.com/apache/spark/pull/45314#issuecomment-2021927724
> @sunchao Aside from picking the side of partially clustered distribution, would we also be able to use it to group smaller partitions? For example, if a table is partitioned by date and older days do not have much data (on both sides), group many of the older days into the same task.

Yeah, I think that would be an interesting use case. If we know the partitions from both sides of the join AND the size of each partition, we can probably make some better decisions.

> Similar to AQE coalesce partitions, but it looks like that applies only after shuffle, so it looks like it doesn't apply to SPJ?

Right, this doesn't apply to SPJ.
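To make the grouping idea concrete, here is a rough sketch in plain Python. It is not Spark code and not what this PR implements. The function name `group_small_partitions`, the per-key size maps, and the `target_bytes` threshold are all hypothetical, and it assumes the per-partition sizes for both sides of the join are already known.

```python
from typing import Dict, List


def group_small_partitions(
    left_sizes: Dict[str, int],
    right_sizes: Dict[str, int],
    target_bytes: int,
) -> List[List[str]]:
    """Greedily pack partition keys into groups (hypothetical helper).

    A group is closed once adding the next key would push its combined
    (left + right) size past target_bytes. A single key larger than the
    target still gets a group of its own.
    """
    # Walk keys in sorted order so adjacent dates end up in the same group.
    keys = sorted(set(left_sizes) | set(right_sizes))

    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    for key in keys:
        # Combined size of this partition key across both join sides.
        size = left_sizes.get(key, 0) + right_sizes.get(key, 0)
        if current and current_bytes + size > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += size

    if current:
        groups.append(current)
    return groups


# Illustrative sizes only: two small older days and one large recent day.
left = {"2024-01-01": 10, "2024-01-02": 5, "2024-01-03": 200}
right = {"2024-01-01": 20, "2024-01-02": 5, "2024-01-03": 300}
groups = group_small_partitions(left, right, target_bytes=100)
```

With these made-up sizes, the two small older days land in one group and the large day stays alone. Both sides of the join would have to use the same grouping so that matching keys still meet in the same task.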
