[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910994#comment-16910994 ]

Dongjoon Hyun commented on SPARK-28699:
---------------------------------------

Thank you for the update, [~XuanYuan]!

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---------------------------------------------------------------------------
>
>                Key: SPARK-28699
>                URL: https://issues.apache.org/jira/browse/SPARK-28699
>            Project: Spark
>         Issue Type: Bug
>         Components: Spark Core
>   Affects Versions: 2.3.3, 2.4.3, 3.0.0
>           Reporter: Yuanjian Li
>           Priority: Blocker
>             Labels: correctness
>
> Related with SPARK-23207 and SPARK-23243.
> It's another case of an indeterminate stage/RDD returning a wrong answer when a stage rerun happens: in the CachedRDDBuilder, the `isOrderSensitive` characteristic is not propagated to the newly created MapPartitionsRDD.
> We can reproduce this with the following code; thanks to Tyson for reporting it!
>
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
>
> val res = spark.range(0, 1 * 1, 1).map { x => (x % 1000, x) }
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>   if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 &&
>       TaskContext.get.stageAttemptNumber == 0) {
>     throw new Exception("pkill -f -n java".!!)
>   }
>   x
> }
> val r2 = df.distinct.count()
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
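The failure mode in the reproduction above can be illustrated without Spark. Below is a minimal Scala sketch, assuming only the standard library; the `roundRobin` helper is a hypothetical stand-in for Spark's round-robin repartitioning, not real Spark code. It shows why repartition output is order-sensitive: the same rows arriving in a different order land in different partitions, so recomputing only some partitions after a failure can lose and duplicate rows.

```scala
// Minimal sketch: `roundRobin` is a hypothetical stand-in for Spark's
// round-robin repartitioning, which assigns rows to partitions by
// their position in the input stream.
def roundRobin[A](input: Seq[A], numPartitions: Int): Map[Int, Seq[A]] =
  input.zipWithIndex
    .groupBy { case (_, i) => i % numPartitions }
    .map { case (p, xs) => p -> xs.map(_._1) }

// First run: input arrives in one order.
val firstRun = roundRobin(Seq(1, 2, 3, 4, 5, 6), 2)

// Rerun after a task failure: the same rows, but a different arrival order.
val rerun = roundRobin(Seq(2, 1, 3, 4, 6, 5), 2)

// firstRun(0) == Seq(1, 3, 5) but rerun(0) == Seq(2, 3, 6).
// If partition 1 survives from the first run (Seq(2, 4, 6)) while only
// partition 0 is recomputed in the rerun, rows 1 and 5 vanish and rows
// 2 and 6 are duplicated -- which is how distinct.count() goes wrong.
```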
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910990#comment-16910990 ]

Yuanjian Li commented on SPARK-28699:
-------------------------------------

[~dongjoon] Sure. The affected versions are Spark 2.1 after 2.1.4, Spark 2.2 after 2.2.3, Spark 2.3, and Spark 2.4. The JIRA field update is done.
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910987#comment-16910987 ]

Dongjoon Hyun commented on SPARK-28699:
---------------------------------------

:) BTW, I updated this to `Blocker` according to [~smilegator]'s advice.
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910983#comment-16910983 ]

Kazuaki Ishizaki commented on SPARK-28699:
------------------------------------------

[~dongjoon] Thank you for pointing out my typo. You are right; I should have said {{2.3.4-rc1}}. Actually, while I was preparing the following, it has not been reflected in the repository yet! After this issue is fixed, let me restart the release process for {{2.3.4-rc1}}.

{code}
Release details:
BRANCH:  branch-2.3
VERSION: 2.3.4
TAG:     v2.3.4-rc1
{code}
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910974#comment-16910974 ]

Dongjoon Hyun commented on SPARK-28699:
---------------------------------------

? [~kiszk]. `2.4.4-rc1` is on `branch-2.4` and is mine; you should not remove that. I guess you wanted to say `2.3.4-rc1`, and `2.3.4-rc1` is not created yet.
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910907#comment-16910907 ]

Kazuaki Ishizaki commented on SPARK-28699:
------------------------------------------

[~smilegator] Thank you for the cc. I will wait for this fix. I was in the middle of releasing RC1, so there is already a {{2.4.4-rc1}} tag in [branch-2.3|https://github.com/apache/spark/tree/branch-2.3]. Should I remove this tag and release rc1, or should I leave this tag and release rc2 instead?
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910859#comment-16910859 ]

Xiao Li commented on SPARK-28699:
---------------------------------

Also cc [~kiszk]. Let us wait for this before starting RC1 for 2.3.
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910795#comment-16910795 ]

Dongjoon Hyun commented on SPARK-28699:
---------------------------------------

Hi, [~XuanYuan]. Could you check old Spark versions and update the `Affects Version/s:` field of this JIRA issue?
[ https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905250#comment-16905250 ]

Yuanjian Li commented on SPARK-28699:
-------------------------------------

The current [approach|https://github.com/apache/spark/pull/25420] is just a band-aid fix for the wrong-answer problem. After we finish the work on indeterminate stage rerunning (SPARK-25341), we can fix this properly by unpersisting the original RDD and rerunning the cached indeterminate stage. A preview codebase is available [here|https://github.com/xuanyuanking/spark/tree/SPARK-28699-RERUN].
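The rerun-based fix described in this comment can be sketched as a decision rule. The following is a hypothetical Scala sketch; the names (`DeterminismLevel`, `CachedRdd`, `partitionsToRecompute`) are illustrative and not Spark's actual scheduler API:

```scala
// Hypothetical sketch of the rerun rule: these types are illustrative
// stand-ins, not Spark internals.
sealed trait DeterminismLevel
case object Determinate extends DeterminismLevel
case object Indeterminate extends DeterminismLevel

final case class CachedRdd(numPartitions: Int, level: DeterminismLevel)

// A cached indeterminate RDD cannot be partially recomputed: freshly
// recomputed partitions may disagree with surviving cached blocks, so
// the whole cache must be dropped (unpersist) and every partition
// rerun together. A determinate RDD only needs its lost partitions.
def partitionsToRecompute(rdd: CachedRdd, lost: Set[Int]): Set[Int] =
  rdd.level match {
    case Indeterminate => (0 until rdd.numPartitions).toSet
    case Determinate   => lost
  }
```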