[jira] [Updated] (SPARK-23948) Trigger mapstage's job listener in submitMissingTasks

2018-04-17 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated SPARK-23948:
---------------------------------
Fix Version/s: 2.3.1

> Trigger mapstage's job listener in submitMissingTasks
> -----------------------------------------------------
>
> Key: SPARK-23948
> URL: https://issues.apache.org/jira/browse/SPARK-23948
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.0
>Reporter: jin xing
>Assignee: jin xing
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> When SparkContext submits a map stage via "submitMapStage" to the 
> DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
> and 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).
> Now consider the following scenario:
> 1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
> 2. We submit stage1 via "submitMapStage";
> 3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
> resubmitted as stage0_1 and stage1_1;
> 4. While stage0_1 is running, speculative tasks from the old stage1 finish 
> successfully, but stage1 is no longer in "runningStages". So even though all 
> splits of stage1 (including those finished by the speculative tasks) have 
> succeeded, stage1's job listener is never called;
> 5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
> no missing tasks, but the current code still does not trigger the job 
> listener (see the fix sketch below).
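
Below is a self-contained toy sketch of the direction the fix takes. The names
ShuffleMapStage, JobListener, and submitMissingTasks mirror the real
DAGScheduler ones, but the bodies here are simplified stand-ins, not the actual
implementation: when submitMissingTasks finds no missing partitions for a
shuffle map stage, it should also notify the map-stage jobs waiting on that
stage, not only mark the stage itself as finished.

{code}
object MapStageListenerFixSketch {

  // Minimal stand-in for the scheduler's JobListener.
  trait JobListener { def jobSucceeded(): Unit }

  // Minimal stand-in for ShuffleMapStage: tracks how many map outputs are
  // registered and which map-stage jobs (from submitMapStage) wait on it.
  final class ShuffleMapStage(val id: Int, val numPartitions: Int) {
    var availableOutputs: Int = 0
    val mapStageJobs = scala.collection.mutable.Buffer.empty[JobListener]
    def isAvailable: Boolean = availableOutputs == numPartitions
  }

  // Sketch of the submitMissingTasks branch at issue: when nothing is
  // missing, finish the stage AND notify the waiting map-stage jobs.
  def submitMissingTasks(stage: ShuffleMapStage): Unit = {
    val missing = stage.numPartitions - stage.availableOutputs
    if (missing > 0) {
      println(s"stage ${stage.id}: submitting $missing missing tasks")
    } else {
      println(s"stage ${stage.id}: no missing tasks, marking stage finished")
      // The fix idea: without this call, a map-stage job's listener is never
      // triggered when all outputs came from an older, non-running attempt.
      if (stage.isAvailable) stage.mapStageJobs.foreach(_.jobSucceeded())
    }
  }

  def main(args: Array[String]): Unit = {
    val stage1 = new ShuffleMapStage(id = 1, numPartitions = 10)
    stage1.mapStageJobs += new JobListener {
      def jobSucceeded(): Unit = println("map-stage job listener notified")
    }
    stage1.availableOutputs = 10 // all splits already finished (step 4)
    submitMissingTasks(stage1)   // step 5: no missing tasks, listener fires
  }
}
{code}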



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23948) Trigger mapstage's job listener in submitMissingTasks

2018-04-11 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated SPARK-23948:
---------------------------------
Component/s: Scheduler

> Trigger mapstage's job listener in submitMissingTasks
> -----------------------------------------------------
>
> Key: SPARK-23948
> URL: https://issues.apache.org/jira/browse/SPARK-23948
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.0
>Reporter: jin xing
>Priority: Major
>
> When SparkContext submits a map stage via "submitMapStage" to the 
> DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
> and 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).
> Now consider the following scenario:
> 1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
> 2. We submit stage1 via "submitMapStage";
> 3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
> resubmitted as stage0_1 and stage1_1;
> 4. While stage0_1 is running, speculative tasks from the old stage1 finish 
> successfully, but stage1 is no longer in "runningStages". So even though all 
> splits of stage1 (including those finished by the speculative tasks) have 
> succeeded, stage1's job listener is never called (see the toy illustration 
> below);
> 5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
> no missing tasks, but the current code still does not trigger the job listener.
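
As a toy illustration of step 4 above (simplified stand-ins only; the real
logic lives in DAGScheduler's task-completion handling): completions from the
old stage1 attempt still register their map outputs, but because stage1 is no
longer in runningStages, the path that would notify its map-stage job listener
is never taken.

{code}
object MissedListenerSketch {

  final class Stage(val id: Int, val numPartitions: Int) {
    var finished: Set[Int] = Set.empty
    var listenerNotified = false
  }

  val runningStages = scala.collection.mutable.Set.empty[Stage]

  // Simplified task-completion handler: the map output is recorded
  // regardless, but listener bookkeeping is guarded by runningStages.
  def handleTaskCompletion(stage: Stage, partition: Int): Unit = {
    stage.finished += partition
    if (runningStages.contains(stage) &&
        stage.finished.size == stage.numPartitions) {
      stage.listenerNotified = true
    }
  }

  def main(args: Array[String]): Unit = {
    val stage1 = new Stage(id = 1, numPartitions = 2)
    // stage1 was resubmitted as stage1_1, so the old stage1 is not running.
    handleTaskCompletion(stage1, 0) // speculative copies of the old attempt
    handleTaskCompletion(stage1, 1) // finish every split of stage1 ...
    println(s"all splits succeeded: ${stage1.finished.size == stage1.numPartitions}")
    println(s"listener notified: ${stage1.listenerNotified}") // ... but false
  }
}
{code}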






[jira] [Updated] (SPARK-23948) Trigger mapstage's job listener in submitMissingTasks

2018-04-10 Thread jin xing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jin xing updated SPARK-23948:
-----------------------------
Description: 
When SparkContext submits a map stage via "submitMapStage" to the 
DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
and 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).

Now consider the following scenario:
1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
2. We submit stage1 via "submitMapStage";
3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
resubmitted as stage0_1 and stage1_1;
4. While stage0_1 is running, speculative tasks from the old stage1 finish 
successfully, but stage1 is no longer in "runningStages". So even though all 
splits of stage1 (including those finished by the speculative tasks) have 
succeeded, stage1's job listener is never called;
5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
no missing tasks, but the current code still does not trigger the job listener.

  was:
When SparkContext submits a map stage via "submitMapStage" to the 
DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
and 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).

Now consider the following scenario:
1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
2. We submit stage1 via "submitMapStage"; there are 10 missing tasks in stage1;
3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
resubmitted as stage0_1 and stage1_1;
4. While stage0_1 is running, speculative tasks from the old stage1 finish 
successfully, but stage1 is no longer in "runningStages". So even though all 
splits of stage1 (including those finished by the speculative tasks) have 
succeeded, stage1's job listener is never called;
5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
no missing tasks, but the current code still does not trigger the job listener.


> Trigger mapstage's job listener in submitMissingTasks
> -----------------------------------------------------
>
> Key: SPARK-23948
> URL: https://issues.apache.org/jira/browse/SPARK-23948
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: jin xing
>Priority: Major
>
> When SparkContext submits a map stage via "submitMapStage" to the 
> DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
> and 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).
> Now consider the following scenario:
> 1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
> 2. We submit stage1 via "submitMapStage";
> 3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
> resubmitted as stage0_1 and stage1_1;
> 4. While stage0_1 is running, speculative tasks from the old stage1 finish 
> successfully, but stage1 is no longer in "runningStages". So even though all 
> splits of stage1 (including those finished by the speculative tasks) have 
> succeeded, stage1's job listener is never called;
> 5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
> no missing tasks, but the current code still does not trigger the job listener.






[jira] [Updated] (SPARK-23948) Trigger mapstage's job listener in submitMissingTasks

2018-04-09 Thread jin xing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jin xing updated SPARK-23948:
-----------------------------
Description: 
When SparkContext submits a map stage via "submitMapStage" to the 
DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
and 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).

Now consider the following scenario:
1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
2. We submit stage1 via "submitMapStage"; there are 10 missing tasks in stage1;
3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
resubmitted as stage0_1 and stage1_1;
4. While stage0_1 is running, speculative tasks from the old stage1 finish 
successfully, but stage1 is no longer in "runningStages". So even though all 
splits of stage1 (including those finished by the speculative tasks) have 
succeeded, stage1's job listener is never called;
5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
no missing tasks, but the current code still does not trigger the job listener.

  was:
When SparkContext submits a map stage via "submitMapStage" to the 
DAGScheduler, "markMapStageJobAsFinished" is called only in ();

Now consider the following scenario:
1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
2. We submit stage1 via "submitMapStage"; there are 10 missing tasks in stage1;
3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
resubmitted as stage0_1 and stage1_1;
4. While stage0_1 is running, speculative tasks from the old stage1 finish 
successfully, but stage1 is no longer in "runningStages". So even though all 
splits of stage1 (including those finished by the speculative tasks) have 
succeeded, stage1's job listener is never called;
5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
no missing tasks, but the current code still does not trigger the job listener.


> Trigger mapstage's job listener in submitMissingTasks
> -----------------------------------------------------
>
> Key: SPARK-23948
> URL: https://issues.apache.org/jira/browse/SPARK-23948
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: jin xing
>Priority: Major
>
> When SparkContext submits a map stage via "submitMapStage" to the 
> DAGScheduler, "markMapStageJobAsFinished" is called in only two places 
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
> and 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).
> Now consider the following scenario:
> 1. stage0 and stage1 are both "ShuffleMapStage"s, and stage1 depends on stage0;
> 2. We submit stage1 via "submitMapStage"; there are 10 missing tasks in stage1;
> 3. While stage1 is running, a "FetchFailed" occurs, so stage0 and stage1 get 
> resubmitted as stage0_1 and stage1_1;
> 4. While stage0_1 is running, speculative tasks from the old stage1 finish 
> successfully, but stage1 is no longer in "runningStages". So even though all 
> splits of stage1 (including those finished by the speculative tasks) have 
> succeeded, stage1's job listener is never called;
> 5. stage0_1 finishes and stage1_1 starts running. "submitMissingTasks" finds 
> no missing tasks, but the current code still does not trigger the job listener.


