[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...

andrewor14 Wed, 08 Apr 2015 18:29:00 -0700

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5144#issuecomment-91085082
  
    @tnachen Thanks for following up on the reviews. This is much better than 
before. However, I still find the code in `MesosClusterScheduler` very 
difficult to follow for a few reasons. I apologize in advance for the 
"slightly" long response, but I think these are important points that will 
improve the quality of the code significantly.
    
    ---
    
    **First, the naming is confusing.** It's strange to see a 
`MesosClusterSchedulerDriver` extend a `MesosClusterScheduler` (why is a driver 
a scheduler?). What's worse is that `MesosClusterSchedulerDriver` contains a 
`driver` field that it inherits from `MesosSchedulerUtils`, so we end up with a 
driver that has a driver that can submit drivers!
    
    My suggestion is to rename `MesosClusterScheduler` to 
`MesosSubmissionClient`, and `MesosClusterSchedulerDriver` to 
`MesosSubmissionClientImpl`. Since we are following the YARN pattern in code, 
we might as well also follow it in naming. (Not my favorite pattern, but there 
are benefits here that you pointed out previously.) In all of these cases we 
will be explicit in the java docs that it's intended for Mesos cluster mode 
only. Then maybe `MesosSchedulerUtils`'s `driver` field should be renamed to 
`mesosDriver`, or maybe even `mesosClient` to indicate that it's the interface 
to talk to the Mesos master with.
    
    **Second, the class structure is still confusing.** There are still many 
classes that seem somewhat related but not really. In particular, in 
`MesosClusterSchedulerDriver` there are three different data structures that 
contain "drivers", one is a `DriverQueue`, the other is a `LaunchedDrivers`, 
which is really a map, and then there's `finishedDrivers`, which is an actual 
list. This means someone who wants to understand how this code works needs to 
first understand all of these smaller classes, which is cumbersome even with an 
IDE, and virtually impossible without one. These classes aren't even for 
reducing duplicate logic. In `LaunchedDriver` for instance, the only methods 
with some logic in them (`states`, `remove`, `set`) are only used in one place.
    
    If you look at `o.a.s.deploy.master.Master`, we keep track of 
`waitingDrivers`, `completedDrivers`, and `drivers` (which are running 
drivers), and these are all `ArrayBuffer[DriverInfo]`. I'm not suggesting that 
we should follow exactly what is done there, but an effort towards simplifying 
the class structure further will be fruitful here.
    
    **Third, the code still lacks documentation.** The latest changes already 
added a lot of comments, which is great. However, certain unintuitive parts of 
the code are still largely uncommented. An example is 
`MesosClusterSchedulerDriver#resourceOffers`. This method contains two inner 
methods, neither of which is has a javadoc, and one takes in a closure that 
returns a 2-tuple where the second field is an `Option`. When is the closure 
invoked? When is the `Option` defined vs empty? Why is the `superviseRetryList` 
and `RetryState` involved even if supervise mode is not enabled by default? ... 
Personally I frequently find simpler ways to express the same logic in the 
process of writing comments.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...

Reply via email to