szehon-ho commented on PR #45314: URL: https://github.com/apache/spark/pull/45314#issuecomment-2021228813
> cc @aokolnychyi @RussellSpitzer @rdblue do you think this could be useful for Iceberg to pass partition stats to Spark? SPJ could leverage this to make better decisions on how to combine partitions (like which side to choose during partially clustered distribution), but I'm not sure whether there are more use cases.

@sunchao Aside from picking the side for partially clustered distribution, would we also be able to use it to group smaller partitions? For example, if a table is partitioned by date and the older days have little data (on both sides), we could group many of the older days into the same task. This is similar to AQE's partition coalescing, but that appears to apply only after a shuffle, so it doesn't look like it applies to SPJ?
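
To make the grouping idea concrete, here is a rough sketch in plain Python. It is not Spark code and does not reflect any existing Spark or Iceberg API: `coalesce_partitions`, its parameters, and the byte counts are all hypothetical, and it assumes per-partition size stats are available for both sides of the join. It greedily packs adjacent partition keys into one group until the combined size of both sides would exceed a target.

```python
from typing import Dict, List


def coalesce_partitions(
    left_bytes: Dict[str, int],
    right_bytes: Dict[str, int],
    target_bytes: int,
) -> List[List[str]]:
    """Greedily pack sorted partition keys into groups of roughly target_bytes.

    Hypothetical helper: sizes are the reported stats for each partition key
    on the left and right side of the join; a key missing on one side counts
    as 0 bytes for that side.
    """
    keys = sorted(set(left_bytes) | set(right_bytes))
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for key in keys:
        # Combined size of this partition key across both join sides.
        size = left_bytes.get(key, 0) + right_bytes.get(key, 0)
        # Close the current group if adding this key would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Made-up stats: three small older days and one large recent day.
left = {"2024-01-01": 10, "2024-01-02": 20, "2024-01-03": 15, "2024-03-01": 900}
right = {"2024-01-01": 5, "2024-01-02": 10, "2024-01-03": 20, "2024-03-01": 700}
groups = coalesce_partitions(left, right, target_bytes=128)
```

With these made-up numbers the three older days end up in one group and the large day stays on its own. The same grouping would have to be applied to both sides so that matching partition keys still land in the same task.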
