Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/1545#issuecomment-52573795
OK, so if you're anxious to get this in, how about this simpler fix to make
it a little less ugly:
(1) Change the numTasks parameter of Stage so it's *not* a val -- that way it
isn't saved as part of the class, since it's incorrect for later attempts.
Then change StageInfo.fromStage to always accept a number of tasks. Also
update the docstring for Stage to note that a single Stage object is reused
across multiple stage attempts.
(2) Change the comment above Stage.info to say that it's a pointer to the
most recent StageInfo, which the DAGScheduler updates for each new stage
attempt. Maybe also rename it to latestInfo so it's abundantly clear that it
can be updated.
(3) Reset the info in resubmitFailedStages, rather than in the place you
currently have it. I think that makes it clearer what's going on and why
Stage.info needs to be reset.
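To make the shape of suggestions (1) and (2) concrete, here is a minimal,
self-contained Scala sketch. It is *not* the actual Spark code -- the fields
and constructors are simplified placeholders; only the names Stage, StageInfo,
fromStage, and latestInfo come from the discussion above, and everything else
is assumed for illustration:

```scala
// Hypothetical, simplified stand-ins for Spark's scheduler classes.
class StageInfo(val stageId: Int, val numTasks: Int)

object StageInfo {
  // Per suggestion (1): fromStage always takes the number of tasks
  // explicitly, since numTasks is no longer stored on Stage.
  def fromStage(stage: Stage, numTasks: Int): StageInfo =
    new StageInfo(stage.id, numTasks)
}

// A single Stage object is reused across multiple stage attempts, so
// numTasks is a plain constructor parameter (no `val`) and is not retained
// as a field.
class Stage(val id: Int, numTasks: Int) {
  // Per suggestion (2): a pointer to the most recent StageInfo, which the
  // scheduler overwrites when it starts a new attempt.
  var latestInfo: StageInfo = StageInfo.fromStage(this, numTasks)
}
```

On a resubmission, the scheduler would simply assign a fresh StageInfo to
latestInfo (with the task count for that attempt) instead of mutating or
reusing the old one.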