Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22771

@markhamstra thanks for the reference, I was looking for some background on this. I agree those are still issues, as mentioned in SPARK-17064, but I don't think they directly impact this change. We can at least try to abort the tasks and still honor the interrupt-on-cancel flag. Best case, things actually get killed and we free up resources; worst case, the task ignores the interrupt and continues, just like now. If the user code spawns other threads, it's possible we clean up the main thread and leave other threads running, but short of killing the executor JVM I don't think there is a way around that. We also now have the task reaper functionality, which at least gives the user some options.

Do you have specific concerns where this would actually cause problems? There is a lot of discussion there, so I want to make sure I didn't miss something. In the JIRA, you mention the "possibility of nodes being marked dead when a Task thread is interrupted". What exactly do you mean by that? Do you mean user code handling the interrupt badly and exiting the JVM? Reynold mentions storage clients not handling interrupts well; do you know whether that was actually causing corruption, or whether they were just ignoring the interrupts? I haven't thoroughly gone through the Spark code to confirm this doesn't throw off our resource accounting.
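To illustrate the best-case/worst-case point above: `Thread.interrupt()` on the JVM is purely cooperative, so a task that observes the interrupt flag exits promptly, while one that swallows `InterruptedException` runs to completion regardless. A minimal sketch (plain Java, not Spark code; the method names are illustrative only):

```java
// Demonstrates that Thread.interrupt() is advisory: whether an interrupt
// actually stops a task depends entirely on how the task's code reacts.
public class InterruptDemo {

    // Cooperative task: sleep() notices the pending interrupt flag and
    // throws InterruptedException, letting the task clean up and exit.
    static String runCooperative() throws InterruptedException {
        final StringBuilder result = new StringBuilder();
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000); // interrupt wakes this up immediately
                result.append("finished uninterrupted");
            } catch (InterruptedException e) {
                result.append("interrupted, cleaned up");
            }
        });
        t.start();
        t.interrupt();
        t.join();
        return result.toString();
    }

    // Stubborn task: catches and swallows InterruptedException (which also
    // clears the interrupt flag), so it keeps working to completion --
    // the "worst case" where cancellation is effectively ignored.
    static String runStubborn() throws InterruptedException {
        final StringBuilder result = new StringBuilder();
        Thread t = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // deliberately ignore: task continues its loop
                }
            }
            result.append("finished despite interrupt");
        });
        t.start();
        t.interrupt();
        t.join();
        return result.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cooperative: " + runCooperative());
        System.out.println("stubborn:    " + runStubborn());
    }
}
```

Either way the interrupt is delivered; the difference is only in what the task's code does with it, which is why aborting tasks this way seems safe to attempt even when user code may not cooperate.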