WeichenXu123 commented on issue #25235: [SPARK-28483][Core] Fix canceling a 
spark job using barrier mode but barrier tasks blocking on 
BarrierTaskContext.barrier()
URL: https://github.com/apache/spark/pull/25235#issuecomment-515901937
 
 
   @Ngone51 
   The approach above is doable. The current `barrier()` may block on a single
long-running RPC, and we have no way to cancel it. We can split that RPC into
two steps, so that the wait in step 2 can be interrupted when the job is cancelled:
   1) the task sends a new "barrier epoch" to the coordinator;
   2) the task waits for the coordinator to reply with a "sync OK" message.
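The two-step split can be sketched roughly as follows, in Python as a language-neutral stand-in for Spark's Scala code. All names here (`Coordinator`, `request_sync`, `barrier`, `cancelled`, `poll_interval`) are hypothetical and are not Spark APIs; an in-process `queue.Queue` stands in for the RPC layer. The idea is that step 1 returns immediately, and step 2 waits in short, bounded slices, so a cancellation request is noticed between slices instead of being stuck behind one long blocking RPC.

```python
import queue
import threading


class Coordinator:
    """Hypothetical stand-in for the barrier coordinator (not Spark's class)."""

    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.lock = threading.Lock()
        self.pending = {}  # epoch -> list of per-task reply queues

    def request_sync(self, epoch, reply_q):
        # Step 1 handler: record the request and return right away.
        # Once every task has reported this epoch, reply "sync_ok" to all.
        with self.lock:
            waiters = self.pending.setdefault(epoch, [])
            waiters.append(reply_q)
            if len(waiters) == self.num_tasks:
                for q in waiters:
                    q.put(("sync_ok", epoch))
                del self.pending[epoch]


def barrier(coordinator, epoch, cancelled, poll_interval=0.05):
    """Hypothetical two-step barrier: send the epoch, then wait interruptibly."""
    reply_q = queue.Queue()
    # Step 1: send the barrier epoch; this call does not block on other tasks.
    coordinator.request_sync(epoch, reply_q)
    # Step 2: wait for the reply in short slices, checking for cancellation.
    while True:
        if cancelled.is_set():
            raise RuntimeError(f"barrier for epoch {epoch} was cancelled")
        try:
            msg, reply_epoch = reply_q.get(timeout=poll_interval)
        except queue.Empty:
            continue
        if msg == "sync_ok" and reply_epoch == epoch:
            return
```

A real implementation would also need the coordinator to discard a cancelled task's pending request for that epoch; the sketch leaves out that cleanup.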
   
   However, this approach requires many code changes and would bring a heavy
review burden. I am thinking about whether there are simpler ways to fix it.
   CC @cloud-fan 
