Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22112

@cloud-fan @tgravescs I was wondering if we could get an ETA on this landing. Also, I tried running something analogous to the example script from the description of https://issues.apache.org/jira/browse/SPARK-23207, but for RDDs. However, it did not manifest the correctness problem even before this patch was applied. Is there a way to reliably reproduce this with a minimal script?

The script below is run in my Spark shell against a single-node Spark standalone cluster with 2 workers, in client mode, with the external shuffle service enabled. It does not reproduce the issue.

```
import scala.sys.process._
import org.apache.spark.TaskContext

val res = sc.parallelize(0 until 1000 * 1000, 1)
  .coalesce(200, shuffle = true)
  .map { x => x }
  .coalesce(200, shuffle = true)
  .map { x =>
    if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) {
      // Kills the newest Java process, ideally an executor, to force a retry
      throw new Exception("pkill -f -n java".!!)
    }
    x
  }
res.distinct().count()
```
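For what it's worth, a bare `distinct().count()` can miss some failure modes: if a retry duplicates one row and drops another, the distinct count still comes out right. A variant I've been experimenting with also checks the sum of the values, so any substitution shows up. This is only a sketch of my own (the `n`/`distinctCount`/`total` names and the assertions are mine, not from the PR), and note that each action re-runs the lineage and so re-triggers the induced executor kill:

```
import scala.sys.process._
import org.apache.spark.TaskContext

val n = 1000 * 1000
val res = sc.parallelize(0 until n, 1)
  .repartition(200) // same as coalesce(200, shuffle = true)
  .map { x =>
    if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) {
      // Kill the newest JVM (ideally an executor) to induce a retry
      throw new Exception("pkill -f -n java".!!)
    }
    x
  }

val distinctCount = res.distinct().count()
val total = res.map(_.toLong).reduce(_ + _)
// If no rows were dropped or duplicated, both of these hold.
assert(distinctCount == n, s"expected $n distinct rows, got $distinctCount")
assert(total == n.toLong * (n - 1) / 2, s"unexpected sum: $total")
```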