Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/6360#issuecomment-110862558
  
    One higher level thing I'd like to mention is that the YARN code is getting 
quite difficult to review. We cleaned up a lot of the duplicate code and added 
documentation everywhere at one point, and that helped a lot. However, it is 
still difficult to follow the code or ensure its correctness because of a few 
things:
    
    - Monstrous 100-200 lines methods like `setupLaunchEnv` and 
`prepareLocalResources`. These will only grow over time.
    - Lack of high level documentation of what important classes like `Client`, 
`ApplicationMaster` and `ExecutorRunnable` do. Most of these don't even have 
class-level java docs.
    - There are many implicit assumptions (e.g. use URI fragments if the local 
scheme is present) that are carried over from legacy code and undocumented
    - Test coverage is poor. This is related to the first point; having large 
methods makes it difficult to write tests at fine granularity.
    
    Rather than suggesting that we fix all these at once, I'm just jotting down 
my thoughts so we don't forget to fix these down the road. I'll add these 
thoughts to the umbrella YARN clean up JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to