[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Rashid resolved SPARK-23053. ---------------------------------- Resolution: Fixed > taskBinarySerialization and task partitions calculate in > DagScheduler.submitMissingTasks should keep the same RDD checkpoint status > ----------------------------------------------------------------------------------------------------------------------------------- > > Key: SPARK-23053 > URL: https://issues.apache.org/jira/browse/SPARK-23053 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core > Affects Versions: 2.1.0 > Reporter: huangtengfei > Assignee: huangtengfei > Priority: Major > Fix For: 2.2.2, 2.3.1, 2.4.0 > > > When we run concurrent jobs using the same rdd which is marked to do > checkpoint. If one job has finished running the job, and start the process of > RDD.doCheckpoint, while another job is submitted, then submitStage and > submitMissingTasks will be called. In > [submitMissingTasks|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L961], > will serialize taskBinaryBytes and calculate task partitions which are both > affected by the status of checkpoint, if the former is calculated before > doCheckpoint finished, while the latter is calculated after doCheckpoint > finished, when run task, rdd.compute will be called, for some rdds with > particular partition type such as > [MapWithStateRDD|https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/rdd/MapWithStateRDD.scala] > who will do partition type cast, will get a ClassCastException because the > part params is actually a CheckpointRDDPartition. > This error occurs because rdd.doCheckpoint occurs in the same thread that > called sc.runJob, while the task serialization occurs in the DAGSchedulers > event loop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org