Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/5144#issuecomment-91085082
@tnachen Thanks for following up on the reviews. This is much better than
before. However, I still find the code in `MesosClusterScheduler` very
difficult to follow for a few reasons. I apologize in advance for the
"slightly" long response, but I think these are important points that will
improve the quality of the code significantly.
---
**First, the naming is confusing.** It's strange to see a
`MesosClusterSchedulerDriver` extend a `MesosClusterScheduler` (why is a driver
a scheduler?). What's worse is that `MesosClusterSchedulerDriver` contains a
`driver` field that it inherits from `MesosSchedulerUtils`, so we end up with a
driver that has a driver that can submit drivers!
My suggestion is to rename `MesosClusterScheduler` to
`MesosSubmissionClient`, and `MesosClusterSchedulerDriver` to
`MesosSubmissionClientImpl`. Since we are following the YARN pattern in code,
we might as well also follow it in naming. (Not my favorite pattern, but there
are benefits here that you pointed out previously.) In all of these cases we
will be explicit in the java docs that it's intended for Mesos cluster mode
only. Then maybe `MesosSchedulerUtils`'s `driver` field should be renamed to
`mesosDriver`, or maybe even `mesosClient` to indicate that it's the interface
to talk to the Mesos master with.
**Second, the class structure is still confusing.** There are still many
classes that seem somewhat related but not really. In particular, in
`MesosClusterSchedulerDriver` there are three different data structures that
contain "drivers", one is a `DriverQueue`, the other is a `LaunchedDrivers`,
which is really a map, and then there's `finishedDrivers`, which is an actual
list. This means someone who wants to understand how this code works needs to
first understand all of these smaller classes, which is cumbersome even with an
IDE, and virtually impossible without one. These classes aren't even for
reducing duplicate logic. In `LaunchedDriver` for instance, the only methods
with some logic in them (`states`, `remove`, `set`) are only used in one place.
If you look at `o.a.s.deploy.master.Master`, we keep track of
`waitingDrivers`, `completedDrivers`, and `drivers` (which are running
drivers), and these are all `ArrayBuffer[DriverInfo]`. I'm not suggesting that
we should follow exactly what is done there, but an effort towards simplifying
the class structure further will be fruitful here.
**Third, the code still lacks documentation.** The latest changes already
added a lot of comments, which is great. However, certain unintuitive parts of
the code are still largely uncommented. An example is
`MesosClusterSchedulerDriver#resourceOffers`. This method contains two inner
methods, neither of which is has a javadoc, and one takes in a closure that
returns a 2-tuple where the second field is an `Option`. When is the closure
invoked? When is the `Option` defined vs empty? Why is the `superviseRetryList`
and `RetryState` involved even if supervise mode is not enabled by default? ...
Personally I frequently find simpler ways to express the same logic in the
process of writing comments.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]