sunchao commented on PR #45314:
URL: https://github.com/apache/spark/pull/45314#issuecomment-2021927724

   > @sunchao Aside from picking the side of partially clustered distribution, 
would we also be able to use it to group smaller partitions? For example, if a 
table is partitioned by date and older days have little data (on both sides), 
group many of the older days into the same task.
   
   Yeah, I think that would be an interesting use case. If we know the 
partitions from both sides of the join AND the size of each partition, we can 
probably make better decisions about how to group them into tasks.
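
   As a rough illustration of that idea (not something this PR implements), 
here is a minimal sketch that greedily packs small partitions into one task. 
The function name `group_small_partitions`, the `target_bytes` threshold, and 
the sample sizes are all hypothetical and are not part of Spark's API.

```python
from typing import Dict, List, Tuple


def group_small_partitions(
    sizes: Dict[str, Tuple[int, int]],
    target_bytes: int,
) -> List[List[str]]:
    """Greedily pack partition keys into groups whose combined size
    (left side + right side) stays at or below target_bytes.

    `sizes` maps a partition key to (left_bytes, right_bytes).
    A partition larger than target_bytes ends up in a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    # Visit keys in sorted order so neighbouring dates end up together.
    for key in sorted(sizes):
        left, right = sizes[key]
        total = left + right
        # Close the current group if adding this partition would overflow it.
        if current and current_bytes + total > target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += total
    if current:
        groups.append(current)
    return groups


# Hypothetical sizes: three small older days and two large recent days.
sizes = {
    "2024-01-01": (10, 5),
    "2024-01-02": (8, 4),
    "2024-01-03": (12, 6),
    "2024-03-25": (400, 350),
    "2024-03-26": (500, 450),
}
groups = group_small_partitions(sizes, target_bytes=100)
```

   With these made-up numbers the three older days share one group while each 
recent day gets its own, which is the behavior the question above asks about.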
   
   > Similar to AQE coalesce partitions, but it looks like that applies only 
after shuffle, so it looks like it doesn't apply to SPJ?
   
   Right, this doesn't apply to SPJ.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

