[GitHub] [spark] mridulm commented on pull request #35185: [SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics

GitBox Mon, 17 Jan 2022 16:11:46 -0800


mridulm commented on pull request #35185:
URL: https://github.com/apache/spark/pull/35185#issuecomment-1014966382



   > > Took an initial pass through the PR and added some comments - overall 
looks good. We would need to make sure that skew join and partition coalescing 
in SQL interact well with this change.
   > 
   > Thanks for your reply. I have tested partition coalescing in SQL, and it 
works well with this change.
   
   What I want @cloud-fan, @dongjoon-hyun, and others who are more familiar 
with SQL to look at is this: given that a single partition gets computed by 
multiple tasks, what is the expectation?
   Do multiple tasks end up with the same partition id? If yes, how do we 
differentiate between them in case of failure or recompute? If not, how do we 
identify them? (Or, if I am missing something, I would love to understand how 
this PR is compatible with SQL in partition coalescing and skew join scenarios.)
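
   To make the question concrete, here is a minimal toy model of the concern. 
It is not Spark code: `TaskRecord`, its fields, and `ambiguous_partitions` are 
hypothetical names used only for illustration, and the scenario (two tasks 
reporting the same partition id) is the assumption being asked about, not a 
statement of how Spark behaves.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for illustration only; this is NOT Spark's TaskInfo.
@dataclass(frozen=True)
class TaskRecord:
    task_id: int       # assumed unique per task attempt
    partition_id: int  # partition id the task reports
    attempt: int       # 0 for the first attempt, 1+ for retries
    status: str


def ambiguous_partitions(records):
    """Return partition ids reported by more than one first-attempt task.

    For such ids, partition_id alone cannot say which original task a
    later retry belongs to.
    """
    first_attempts = defaultdict(int)
    for r in records:
        if r.attempt == 0:
            first_attempts[r.partition_id] += 1
    return sorted(p for p, n in first_attempts.items() if n > 1)


# Assumed scenario: partition 3 is computed by two tasks (10 and 11).
# Task 11 fails and task 12 is a retry, but of which one?
records = [
    TaskRecord(task_id=10, partition_id=3, attempt=0, status="SUCCESS"),
    TaskRecord(task_id=11, partition_id=3, attempt=0, status="FAILED"),
    TaskRecord(task_id=12, partition_id=3, attempt=1, status="SUCCESS"),
    TaskRecord(task_id=13, partition_id=4, attempt=0, status="SUCCESS"),
]

print(ambiguous_partitions(records))
```

   In this toy scenario the partition id identifies a group of tasks rather 
than a single task, so anything keyed on it alone would need an extra field 
to tell the tasks apart.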


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
