[ 
https://issues.apache.org/jira/browse/SPARK-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout resolved SPARK-14958.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

> Failed task hangs if error is encountered when getting task result
> ------------------------------------------------------------------
>
>                 Key: SPARK-14958
>                 URL: https://issues.apache.org/jira/browse/SPARK-14958
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0, 2.0.0, 2.1.0
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: 2.2.0
>
>
> In {{TaskResultGetter}}, if an error occurs while deserializing 
> {{TaskEndReason}}, TaskScheduler never gets a chance to handle the failed 
> task, and the task just hangs.
> {code}
>   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
>     serializedData: ByteBuffer) {
>     var reason : TaskEndReason = UnknownReason
>     try {
>       getTaskResultExecutor.execute(new Runnable {
>         override def run(): Unit = Utils.logUncaughtExceptions {
>           val loader = Utils.getContextOrSparkClassLoader
>           try {
>             if (serializedData != null && serializedData.limit() > 0) {
>               reason = serializer.get().deserialize[TaskEndReason](
>                 serializedData, loader)
>             }
>           } catch {
>             case cnd: ClassNotFoundException =>
>               // Log an error but keep going here -- the task failed, so not catastrophic
>               // if we can't deserialize the reason.
>               logError(
>                 "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
>             case ex: Exception => {}
>           }
>           scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
>         }
>       })
>     } catch {
>       case e: RejectedExecutionException if sparkEnv.isStopped =>
>         // ignore it
>     }
>   }
> {code}
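> In my specific case, I got a NoClassDefFoundError. It is an Error rather 
> than an Exception, so neither catch clause handles it, 
> {{scheduler.handleFailedTask}} is never called, and the failed task hangs 
> forever.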
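
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Spark code and not necessarily the change that resolved this issue: `FailedTaskSketch`, `handleFailedTask`, `deserializeReason`, `reportAfterTry`, `reportInFinally`, and `errorEscaped` are all hypothetical stand-ins, and the reason is modelled as a plain String. It contrasts notifying the scheduler after the try/catch (the pattern in the report) with notifying it in a `finally` block, which is one possible remedy.

```scala
import scala.collection.mutable.ArrayBuffer

object FailedTaskSketch {
  // Hypothetical stand-in for scheduler.handleFailedTask: records each report.
  val reported = ArrayBuffer.empty[String]
  def handleFailedTask(tid: Long, reason: String): Unit =
    reported += s"$tid:$reason"

  // Hypothetical stand-in for deserialization that fails with an Error,
  // not an Exception.
  def deserializeReason(): String =
    throw new NoClassDefFoundError("com/example/Missing")

  // Pattern from the report: the notification follows the try/catch, so a
  // Throwable that is not an Exception skips it entirely.
  def reportAfterTry(tid: Long): Unit = {
    var reason = "UnknownReason"
    try {
      reason = deserializeReason()
    } catch {
      case _: Exception => // not reached for NoClassDefFoundError
    }
    handleFailedTask(tid, reason)
  }

  // Possible remedy (assumption): notify in finally so it runs on every path.
  def reportInFinally(tid: Long): Unit = {
    var reason = "UnknownReason"
    try {
      reason = deserializeReason()
    } catch {
      case _: Exception => // still not reached; the Error keeps propagating
    } finally {
      handleFailedTask(tid, reason)
    }
  }

  // Runs `body` and reports whether a java.lang.Error escaped from it.
  def errorEscaped(body: => Unit): Boolean =
    try { body; false } catch { case _: Error => true }
}
```

In both variants the Error still propagates to the caller; the only difference is whether the failure is reported first. With `reportAfterTry` nothing is recorded, which corresponds to the scheduler still believing the task is running. With `reportInFinally` the task is reported with the default reason.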



