[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910994#comment-16910994
 ] 

Dongjoon Hyun commented on SPARK-28699:
---

Thank you for the update, [~XuanYuan]!

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.3, 3.0.0, 2.4.3
> Reporter: Yuanjian Li
> Priority: Blocker
> Labels: correctness
>
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>   
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}
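
A minimal sketch of why such a rerun can produce a wrong count, written in plain Scala with no Spark dependency. It assumes that `repartition(n)` without partitioning columns deals rows out round-robin, and that a rerun may see the same rows in a different order. `roundRobin`, `firstAttempt`, and `rerunAttempt` are hypothetical names used only for illustration; they are not Spark APIs.

```scala
object RoundRobinRerunSketch {
  // Hypothetical stand-in for a round-robin repartition of one input partition:
  // the record at position i goes to output partition i % numPartitions.
  def roundRobin[T](records: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    records.zipWithIndex
      .groupBy { case (_, i) => i % numPartitions }
      .map { case (p, rs) => p -> rs.map(_._1) }

  def main(args: Array[String]): Unit = {
    val firstAttempt = Seq(1, 2, 3, 4, 5, 6) // order seen by the first attempt
    val rerunAttempt = Seq(6, 5, 4, 3, 2, 1) // same records, different order on rerun

    val before = roundRobin(firstAttempt, 3)
    val after = roundRobin(rerunAttempt, 3)

    // Assume only output partition 0 is lost and recomputed from the reordered
    // input, while partitions 1 and 2 are reused from the first attempt.
    val mixed = after(0) ++ before(1) ++ before(2)

    // prints "distinct = 4, total = 6"
    println(s"distinct = ${mixed.distinct.size}, total = ${mixed.size}")
  }
}
```

In this toy run, records 1 and 4 are lost and records 3 and 6 appear twice, which is the kind of discrepancy a `distinct.count()` would expose.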



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Yuanjian Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910990#comment-16910990
 ] 

Yuanjian Li commented on SPARK-28699:
-

[~dongjoon] Sure. The affected versions are Spark 2.1 after 2.1.4, Spark 2.2 after 
2.2.3, Spark 2.3, and Spark 2.4. The Jira field has been updated.

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Blocker
> Labels: correctness
>
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>   
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910987#comment-16910987
 ] 

Dongjoon Hyun commented on SPARK-28699:
---

:)

BTW, I updated this to `Blocker` according to [~smilegator]'s advice.

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Blocker
> Labels: correctness
>
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>   
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Kazuaki Ishizaki (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910983#comment-16910983
 ] 

Kazuaki Ishizaki commented on SPARK-28699:
--

[~dongjoon] Thank you for pointing out my typo. You are right. I should have 
said {2.3.4-rc1}.

Actually, although I had started the following, it is not reflected in the 
repository yet. Once this issue is fixed, I will restart the release process for 
{2.3.4-rc1}.

{code}
Release details:
BRANCH: branch-2.3
VERSION: 2.3.4
TAG: v2.3.4-rc1
{code}

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
> Labels: correctness
>
> Related with SPARK-23207 SPARK-23243
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>   
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910974#comment-16910974
 ] 

Dongjoon Hyun commented on SPARK-28699:
---

[~kiszk], `2.4.4-rc1` belongs to `branch-2.4` and is mine, so you should not remove it.
I guess you meant `2.3.4-rc1`, and `2.3.4-rc1` has not been created yet.

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
> Labels: correctness
>
> Related with SPARK-23207 SPARK-23243
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder, we miss tracking the `isOrderSensitive` 
> characteristic to the newly created MapPartitionsRDD.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>  
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Kazuaki Ishizaki (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910907#comment-16910907
 ] 

Kazuaki Ishizaki commented on SPARK-28699:
--

[~smilegator] Thank you for the cc. I will wait for this to be fixed.

I was in the middle of releasing RC1, so there is already a {{2.4.4-rc1}} tag 
in [branch-2.3|https://github.com/apache/spark/tree/branch-2.3]. Should I 
remove this tag and release rc1, or leave this tag and release rc2 
first?


> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
> Labels: correctness
>
> Related with SPARK-23207 SPARK-23243
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder, we miss tracking the `isOrderSensitive` 
> characteristic to the newly created MapPartitionsRDD.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>  
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910859#comment-16910859
 ] 

Xiao Li commented on SPARK-28699:
-

Also cc [~kiszk]. Let us wait for this fix before starting RC1 for 2.3.

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
> Labels: correctness
>
> Related with SPARK-23207 SPARK-23243
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder, we miss tracking the `isOrderSensitive` 
> characteristic to the newly created MapPartitionsRDD.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>  
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-19 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910795#comment-16910795
 ] 

Dongjoon Hyun commented on SPARK-28699:
---

Hi, [~XuanYuan].
Could you check old Spark versions and update `Affects Version/s:` of this JIRA 
issue?

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
> Labels: correctness
>
> Related with SPARK-23207 SPARK-23243
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder, we miss tracking the `isOrderSensitive` 
> characteristic to the newly created MapPartitionsRDD.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>  
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}






[jira] [Commented] (SPARK-28699) Cache an indeterminate RDD could lead to incorrect result while stage rerun

2019-08-12 Thread Yuanjian Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905250#comment-16905250
 ] 

Yuanjian Li commented on SPARK-28699:
-

The current [approach|https://github.com/apache/spark/pull/25420] is just a 
bandage fix for the wrong answer being returned.

After we finish the indeterminate stage rerun work (SPARK-25341), we can 
fix this by unpersisting the original RDD and rerunning the cached 
indeterminate stage. A preview codebase is available 
[here|https://github.com/xuanyuanking/spark/tree/SPARK-28699-RERUN].
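
The idea of rerunning the whole cached stage, rather than only the lost partitions, can be illustrated with a toy model in plain Scala. This is not the code from the pull request or the preview branch: `roundRobin` and the `Vector`-based cache are hypothetical stand-ins, and the round-robin distribution of rows is an assumption made for the sketch.

```scala
object FullRerunSketch {
  // Hypothetical stand-in for a round-robin repartition: output partition p
  // receives every record whose position i satisfies i % n == p.
  def roundRobin(records: Seq[Int], n: Int): Vector[Seq[Int]] =
    Vector.tabulate(n) { p =>
      records.zipWithIndex.collect { case (r, i) if i % n == p => r }
    }

  def main(args: Array[String]): Unit = {
    val firstAttempt = Seq(1, 2, 3, 4, 5, 6) // order seen when the cache was built
    val rerunAttempt = Seq(6, 5, 4, 3, 2, 1) // same records, reordered on rerun

    // Toy "cache": the partitions materialized by the first attempt.
    val cached = roundRobin(firstAttempt, 3)

    // Partial recompute: only the lost partition 0 is rebuilt from the rerun,
    // so it is mixed with partitions derived from a different input order.
    val patched = cached.updated(0, roundRobin(rerunAttempt, 3)(0))

    // Full recompute: all cached partitions are discarded (the analogue of
    // unpersisting) and every partition is rebuilt from the same rerun.
    val rebuilt = roundRobin(rerunAttempt, 3)

    println(s"partial recompute: ${patched.flatten.distinct.size} distinct records")
    println(s"full recompute: ${rebuilt.flatten.distinct.size} distinct records")
  }
}
```

In this toy model the partial recompute keeps only 4 of the 6 distinct records, while the full recompute keeps all 6, at the cost of redoing work for partitions that were never lost.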

> Cache an indeterminate RDD could lead to incorrect result while stage rerun
> ---
>
> Key: SPARK-28699
> URL: https://issues.apache.org/jira/browse/SPARK-28699
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
>
> Related with SPARK-23207 SPARK-23243
> It's another case for the indeterminate stage/RDD rerun while stage rerun 
> happened. In the CachedRDDBuilder, we miss tracking the `isOrderSensitive` 
> characteristic to the newly created MapPartitionsRDD.
> We can reproduce this by the following code, thanks to Tyson for reporting 
> this!
>  
> {code:scala}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1 * 1, 1).map{ x => (x % 1000, x)}
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>  if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && 
> TaskContext.get.stageAttemptNumber == 0) {
>  throw new Exception("pkill -f -n java".!!)
>  }
>  x
> }
> val r2 = df.distinct.count()
> {code}


