Rui Li created SPARK-14958:
------------------------------

             Summary: Failed task hangs if error is encountered when getting task result
                 Key: SPARK-14958
                 URL: https://issues.apache.org/jira/browse/SPARK-14958
             Project: Spark
          Issue Type: Bug
            Reporter: Rui Li

In {{TaskResultGetter}}, if an error is thrown while deserializing the {{TaskEndReason}}, the TaskScheduler never gets a chance to handle the failed task, and the task just hangs.
{code}
def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
    serializedData: ByteBuffer) {
  var reason : TaskEndReason = UnknownReason
  try {
    getTaskResultExecutor.execute(new Runnable {
      override def run(): Unit = Utils.logUncaughtExceptions {
        val loader = Utils.getContextOrSparkClassLoader
        try {
          if (serializedData != null && serializedData.limit() > 0) {
            reason = serializer.get().deserialize[TaskEndReason](
              serializedData, loader)
          }
        } catch {
          case cnd: ClassNotFoundException =>
            // Log an error but keep going here -- the task failed, so not catastrophic
            // if we can't deserialize the reason.
            logError(
              "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
          case ex: Exception => {}
        }
        scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
      }
    })
  } catch {
    case e: RejectedExecutionException if sparkEnv.isStopped =>
      // ignore it
  }
}
{code}
In my specific case, I got a NoClassDefFoundError. It is an Error rather than an Exception, so neither catch clause matches it, {{scheduler.handleFailedTask}} is never called, and the failed task hangs forever.
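
To illustrate the failure mode outside of Spark, here is a minimal, self-contained sketch. It is not the actual fix for this issue, only one possible approach. All names in it ({{FailedTaskSketch}}, {{Reason}}, {{Known}}, {{reportBuggy}}, {{reportSafely}}) are hypothetical stand-ins and not Spark APIs. The first method mirrors the catch pattern in the snippet above; the second moves the handler call into a {{finally}} block so that it runs even when deserialization throws an Error.

```scala
object FailedTaskSketch {
  // Hypothetical stand-ins for Spark's TaskEndReason hierarchy.
  sealed trait Reason
  case object UnknownReason extends Reason
  final case class Known(message: String) extends Reason

  // Mirrors the reported pattern: only Exception subclasses are caught, so an
  // Error (e.g. NoClassDefFoundError) thrown by `deserialize` propagates and
  // the `handle` call below is skipped.
  def reportBuggy(deserialize: () => Reason, handle: Reason => Unit): Unit = {
    var reason: Reason = UnknownReason
    try {
      reason = deserialize()
    } catch {
      case _: Exception => // swallowed; reason stays UnknownReason
    }
    handle(reason)
  }

  // One possible alternative: call the handler in `finally`, so it runs with
  // whatever reason is available. The Error itself still propagates afterwards.
  def reportSafely(deserialize: () => Reason, handle: Reason => Unit): Unit = {
    var reason: Reason = UnknownReason
    try {
      reason = deserialize()
    } catch {
      case _: Exception => // swallowed; reason stays UnknownReason
    } finally {
      handle(reason)
    }
  }
}
```

With a {{deserialize}} function that throws NoClassDefFoundError, {{reportBuggy}} never invokes the handler, while {{reportSafely}} invokes it once with {{UnknownReason}} before the error propagates. Whether the error should be rethrown or only logged after the handler runs is a design choice this sketch leaves open.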