GitHub user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/12655#issuecomment-215250204
  
    Yes, I'd also like to avoid trying to fix all of these related issues in
one big PR. The concerns overlap, but I agree that a simple fix for the
duplicate-stage bug (SPARK-13902) is where I'd like to start. That means
holding off on merging any of the other related PRs [@rxin @andrewor14 @srowen
@JoshRosen ... hopefully that's enough to get the message out].
    
    I'd like to move the `shuffleToMapStage.getOrElse` inside
`newOrUsedShuffleStage`. That better matches the semantics I remember for
newOrUsedStage: if the desired Stage already exists, reuse it; otherwise,
create a new Stage. What we actually do is create a new Stage unconditionally.
Then, if some of the guts of the desired Stage already exist, we copy them over
into the new Stage; otherwise the new Stage starts from scratch. The way
https://github.com/apache/spark/pull/8923 works is to keep the copy-the-guts
approach in newOrUsed but put a guard in front of the newOrUsed call site so
that the call is skipped in the Used case. That seems like needlessly
convoluted logic, not that the DAGScheduler is otherwise free of that
particular sin.
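To make the three shapes concrete, here is a toy sketch. It is heavily simplified and hypothetical: `MapStage`, `ToyScheduler`, `trackedOutputs`, `newStageCopyingGuts`, and `guardedAtCallSite` are illustrative names, not Spark's real DAGScheduler classes or methods, and `guardedAtCallSite` is only a paraphrase of the call-site guard described for PR 8923, not its actual code. It shows that copy-the-guts produces a new Stage object on every call for the same shuffle, while get-or-create hands back the one already registered.

```scala
import scala.collection.mutable

// Hypothetical toy model for illustration only; these are NOT Spark's DAGScheduler classes.
class MapStage(val id: Int, val shuffleId: Int) {
  // Stand-in for the per-partition map output locations ("the guts") of a stage.
  val outputLocs: mutable.Map[Int, String] = mutable.Map.empty
}

class ToyScheduler {
  private var nextStageId = 0
  // shuffleId -> the stage registered as producing that shuffle's map output.
  val shuffleToMapStage: mutable.Map[Int, MapStage] = mutable.Map.empty
  // Stand-in for map outputs already known to a tracker: shuffleId -> (partition -> location).
  val trackedOutputs: mutable.Map[Int, Map[Int, String]] = mutable.Map.empty

  private def freshStage(shuffleId: Int): MapStage = {
    val stage = new MapStage(nextStageId, shuffleId)
    nextStageId += 1
    stage
  }

  // Copy-the-guts: always create a new stage, copy any known outputs into it,
  // and register it, replacing whatever stage was registered before.
  def newStageCopyingGuts(shuffleId: Int): MapStage = {
    val stage = freshStage(shuffleId)
    trackedOutputs.get(shuffleId).foreach(locs => stage.outputLocs ++= locs)
    shuffleToMapStage(shuffleId) = stage
    stage
  }

  // Guard at the call site: skip the copy-the-guts call when a stage is already registered.
  def guardedAtCallSite(shuffleId: Int): MapStage =
    shuffleToMapStage.getOrElse(shuffleId, newStageCopyingGuts(shuffleId))

  // New-or-used: reuse the registered stage if there is one, else create and register one.
  def newOrUsedStage(shuffleId: Int): MapStage =
    shuffleToMapStage.getOrElseUpdate(shuffleId, freshStage(shuffleId))
}

// Calling the copy-the-guts version twice for one shuffle yields two distinct stages.
val copying = new ToyScheduler
val firstCopy = copying.newStageCopyingGuts(7)
val secondCopy = copying.newStageCopyingGuts(7)
println(firstCopy eq secondCopy) // false

// The new-or-used version hands back the same stage both times.
val reusing = new ToyScheduler
println(reusing.newOrUsedStage(7) eq reusing.newOrUsedStage(7)) // true
```

In this toy, the guarded variant and `newOrUsedStage` behave the same from the caller's side. The difference is only where the reuse decision lives: inside the method, or at every call site.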
    
    What I have now is kind of belt-and-suspenders. I'm not sure the
copy-the-guts logic in the `if
(mapOutputTracker.containsShuffle(shuffleDep.shuffleId))` branch can ever do
anything useful inside the orElse (default) branch of
`shuffleToMapStage.getOrElse`, so maybe that branch can be dropped. I'm also
still trying to convince myself that I can't lose my pants by skipping all
map-output checking and copying when `shuffleToMapStage` already holds a
reference to the desired used Stage.
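A rough sketch of the belt-and-suspenders shape, assuming a toy model: `Stage`, `ToyTracker`, and `BeltAndSuspenders` are hypothetical stand-ins, not Spark's DAGScheduler or MapOutputTracker. `newOrUsedShuffleStage` only borrows the real method's name. The tracker check runs only inside the default branch of `getOrElse`. When a stage is already registered, it is returned with no map-output checking or copying, which is the case in question above.

```scala
import scala.collection.mutable

// Hypothetical toy model for illustration only; NOT Spark's DAGScheduler or MapOutputTracker.
class Stage(val id: Int, val shuffleId: Int) {
  val outputLocs: mutable.Map[Int, String] = mutable.Map.empty
}

// Stand-in for a map output tracker: shuffleId -> (partition -> location).
class ToyTracker {
  private val outputs = mutable.Map.empty[Int, Map[Int, String]]
  def register(shuffleId: Int, locs: Map[Int, String]): Unit = outputs(shuffleId) = locs
  def containsShuffle(shuffleId: Int): Boolean = outputs.contains(shuffleId)
  def locations(shuffleId: Int): Map[Int, String] = outputs.getOrElse(shuffleId, Map.empty)
}

class BeltAndSuspenders(tracker: ToyTracker) {
  private var nextStageId = 0
  val shuffleToMapStage: mutable.Map[Int, Stage] = mutable.Map.empty

  def newOrUsedShuffleStage(shuffleId: Int): Stage =
    // Belt: if a stage is already registered, return it with no output checking or copying.
    shuffleToMapStage.getOrElse(shuffleId, {
      val stage = new Stage(nextStageId, shuffleId)
      nextStageId += 1
      // Suspenders: copy outputs the tracker already knows about into the new stage.
      // This is the branch whose usefulness is in question.
      if (tracker.containsShuffle(shuffleId)) {
        stage.outputLocs ++= tracker.locations(shuffleId)
      }
      shuffleToMapStage(shuffleId) = stage
      stage
    })
}

val demoTracker = new ToyTracker
demoTracker.register(3, Map(0 -> "host-a"))
val demo = new BeltAndSuspenders(demoTracker)
val created = demo.newOrUsedShuffleStage(3) // new stage, outputs copied from the tracker
val reused = demo.newOrUsedShuffleStage(3)  // same stage, no tracker lookup
println(created eq reused) // true
```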

