Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/6360#issuecomment-110862558
One higher level thing I'd like to mention is that the YARN code is getting
quite difficult to review. We cleaned up a lot of the duplicate code and added
documentation everywhere at one point, and that helped a lot. However, it is
still difficult to follow the code or ensure its correctness because of a few
things:
- Monstrous 100-200 lines methods like `setupLaunchEnv` and
`prepareLocalResources`. These will only grow over time.
- Lack of high level documentation of what important classes like `Client`,
`ApplicationMaster` and `ExecutorRunnable` do. Most of these don't even have
class-level java docs.
- There are many implicit assumptions (e.g. use URI fragments if the local
scheme is present) that are carried over from legacy code and undocumented
- Test coverage is poor. This is related to the first point; having large
methods makes it difficult to write tests at fine granularity.
Rather than suggesting that we fix all these at once, I'm just jotting down
my thoughts so we don't forget to fix these down the road. I'll add these
thoughts to the umbrella YARN clean up JIRA.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]