Ngone51 commented on a change in pull request #28257:
URL: https://github.com/apache/spark/pull/28257#discussion_r414231747
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -675,11 +676,15 @@ private[spark] class TaskSchedulerImpl(
// Check whether the barrier tasks are partially launched.
// TODO SPARK-24818 handle the assert failure case (that can happen
when some locality
// requirements are not fulfilled, and we should revert the launched
tasks).
- require(addressesWithDescs.size == taskSet.numTasks,
- s"Skip current round of resource offers for barrier stage
${taskSet.stageId} " +
Review comment:
> Could we instead have a counter inside the taskSet or other mechanism to allow for X retries?

I believe barrier retry is the next step, planned for a future release, but not for 2.4.
> It seems like turning it off is a bit of a behaviour change from the point of view of considering backporting.

What's the behavior change? Previously, the application would hang; now it fails, as we expected in the first place.
> require would have ended up throwing an exception in this case - we should do the same after taskSet.abort to prevent change in behavior - particularly for backport

To be honest, I'm fine with keeping the exception there, but I disagree that throwing an exception is expected behavior that we cannot change. In practice, no one handles the exception thrown here, and I believe the expected behavior is to fail the application with a clear error message.
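
To make the point concrete, here is a minimal sketch (not the exact patch) of the abort-instead-of-require approach being discussed. The names `addressesWithDescs`, `taskSet`, `stageId`, and `numTasks` come from the diff above; the message wording and the optional re-throw are illustrative assumptions:

```scala
// Sketch only: replace the hard require() with an explicit check that aborts
// the barrier task set, so the application fails with a clear error message
// instead of hanging or surfacing an unhandled IllegalArgumentException.
if (addressesWithDescs.size != taskSet.numTasks) {
  val errorMsg = s"Fail resource offers for barrier stage ${taskSet.stageId} " +
    s"because only ${addressesWithDescs.size} out of a total number of " +
    s"${taskSet.numTasks} tasks got resource offers."
  // Abort the task set with the clear message, failing the application.
  taskSet.abort(errorMsg)
  // Optionally re-throw to preserve the old require() behavior for backports;
  // whether to keep this is the open question in this thread.
  throw new SparkException(errorMsg)
}
```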