mridulm commented on pull request #35185: URL: https://github.com/apache/spark/pull/35185#issuecomment-1014966382
> > Took an initial pass through the PR and added some comments; overall looks good. We would need to make sure that skew join and partition coalescing in SQL interact well with this change.
>
> Thanks for your reply. I have tested partition coalescing in SQL with this change, and it works well.

What I would like @cloud-fan, @dongjoon-hyun, and others who are more familiar with SQL to look at is this: given that a single partition gets computed by multiple tasks, what is the expectation? Do multiple tasks end up with the same partition id? If yes, how do we differentiate between them in case of failures/recompute? If not, how do we identify them? (Or, if I am missing something, I would love to understand how this PR is compatible with SQL in partition coalescing/skew join scenarios.)
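To make the question concrete, here is a minimal, hypothetical model of the two SQL scenarios. It is loosely inspired by how adaptive query execution describes shuffle reads, but `CoalescedSpec`, `PartialSpec`, `split_skewed`, and `coalesce` are illustrative names, not Spark classes or APIs. With skew-join splitting, several tasks read the *same* reducer partition id (each reads a different range of map outputs). With coalescing, one task reads a *range* of reducer partition ids. So the reducer id alone cannot identify a task; something like `(reducer, start_map, end_map)` would be needed. For simplicity the sketch splits by equal map-index counts, whereas a real implementation would more likely split by per-map output sizes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, simplified stand-ins (NOT Spark's classes) for what a reduce task reads.

@dataclass(frozen=True)
class CoalescedSpec:
    """One task reads whole reducer partitions in [start_reducer, end_reducer)."""
    start_reducer: int
    end_reducer: int

    def key(self) -> Tuple[str, int, int]:
        return ("coalesced", self.start_reducer, self.end_reducer)


@dataclass(frozen=True)
class PartialSpec:
    """One task reads map outputs in [start_map, end_map) of a single reducer partition."""
    reducer: int
    start_map: int
    end_map: int

    def key(self) -> Tuple[str, int, int, int]:
        # The reducer id alone is ambiguous; the map range disambiguates split tasks.
        return ("partial", self.reducer, self.start_map, self.end_map)


def split_skewed(reducer: int, num_maps: int, num_splits: int) -> List[PartialSpec]:
    """Split one skewed reducer partition into contiguous map-index ranges."""
    if num_maps <= 0 or num_splits <= 0:
        raise ValueError("num_maps and num_splits must be positive")
    step = -(-num_maps // num_splits)  # ceiling division
    return [PartialSpec(reducer, s, min(s + step, num_maps))
            for s in range(0, num_maps, step)]


def coalesce(sizes: List[int], target: int) -> List[CoalescedSpec]:
    """Greedily merge adjacent reducer partitions until adding one would exceed `target`."""
    if not sizes:
        return []
    specs, start, acc = [], 0, 0
    for i, size in enumerate(sizes):
        if i > start and acc + size > target:
            specs.append(CoalescedSpec(start, i))
            start, acc = i, 0
        acc += size
    specs.append(CoalescedSpec(start, len(sizes)))
    return specs


tasks = split_skewed(reducer=3, num_maps=10, num_splits=3)
print([t.reducer for t in tasks])     # [3, 3, 3]: three tasks, same partition id
print(len({t.key() for t in tasks}))  # 3: distinct once the map range is included
```

In this model, any per-partition bookkeeping keyed only on the reducer id (for example, deciding what to recompute after a failure) would conflate the three split tasks, which is the ambiguity the question above is asking about.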
