Github user mccheah commented on the issue:
https://github.com/apache/spark/pull/19954
> It's nice that there is some documentation somewhere, but that
documentation doesn't really seem to address my comments. For one example, it
only explicitly talks about the driver - which sort of makes sense because the
document is about submission. But why aren't orchestrators used when starting
executors too? It seems there's similar code baked into another class instead.
> What I'm asking is for this to be documented properly so that someone who
didn't write the code has enough information to know that it's working as it
should. Right now I don't see what some of these abstractions are for at all -
for example, as far as I can see, the orchestrator can be replaced by a method
call instead of being a completely separate type; it's not really abstracting
anything. Look at where it's used:
@vanzin Thanks for this feedback. I might suggest that documentation isn't
the main issue, but rather that it's possible the abstractions themselves are
not self-documenting. We're open to ideas for alternative representations of
the submission logic that are easier to parse for the outside reader, and this
dialogue is very productive.
> for example, as far as I can see, the orchestrator can be replaced by a
method call instead of being a completely separate type; it's not really
abstracting anything.
As with any abstraction, we can inline the orchestrator into the submission
client, and then consider if that makes the `Client` do too many things, and
also, what would the unit tests look like as a result? I like that we can test
the orchestrator and its selection of which steps to apply independently from
testing that the `Client` takes the API objects and creates them accordingly,
particularly with the hooked-up owner references. A goal we had with this
design is that every class has as few responsibilities as possible; ideally
every module is responsible for exactly one thing. That said, the
orchestrator's long constructor argument list does suggest it is coupled to
the `Client` fairly tightly.
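To make the testability argument concrete, here is a minimal, hypothetical sketch of the step/orchestrator shape. `DriverSpec`, `DriverConfigurationStep`, `LabelStep`, `ConfigMapStep`, and `Orchestrator` are toy stand-ins for illustration, not the actual submission-steps classes in this PR:

```scala
// A minimal "spec" that steps accumulate into; a stand-in for the real
// driver spec, which carries the pod, SparkConf, and extra resources.
case class DriverSpec(
    labels: Map[String, String] = Map.empty,
    resources: Seq[String] = Seq.empty)

// Each step owns exactly one responsibility.
trait DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec
}

class LabelStep(labels: Map[String, String]) extends DriverConfigurationStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(labels = spec.labels ++ labels)
}

class ConfigMapStep(name: String) extends DriverConfigurationStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(resources = spec.resources :+ s"ConfigMap/$name")
}

// The orchestrator's only job is selecting which steps apply given the
// submission configuration; it never talks to the API server.
class Orchestrator(useInitContainer: Boolean) {
  def steps: Seq[DriverConfigurationStep] = {
    val base = Seq(new LabelStep(Map("spark-role" -> "driver")))
    if (useInitContainer) base :+ new ConfigMapStep("init-config") else base
  }
}
```

Because `Orchestrator.steps` returns plain data, a unit test can assert on which steps were selected, independently of testing that the `Client` actually creates the resulting API objects.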
> But why aren't orchestrators used when starting executors too? It seems
there's similar code baked into another class instead.
A common situation is that the submission client must do strictly more work
than the driver: setting up a Spark application and creating its API objects
requires more than bootstrapping the executor pods with objects that already
exist. For example, the submission client has to create the config map that
sets up the init-container and mount the config map volume into the driver;
the driver does not create the config map but uses the pre-existing one,
although it still performs the same mounting into the executor pods that the
submission client does. These semantics are reflected in the code: we first
have a submodule that attaches the config map volumes to an arbitrary pod
(the "bootstrap"); the driver need only use that submodule in isolation,
while the submission client wraps it in a step that both uses the submodule
and creates the API objects in the API server.
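A rough sketch of that wiring, with toy placeholder types (`Pod`, `ConfigMap`, `InitContainerBootstrap` here are illustrative only, not the Kubernetes client model or the actual bootstrap class):

```scala
case class Pod(volumes: Seq[String] = Seq.empty)
case class ConfigMap(name: String)

// The shared submodule: attaches the init-container config map volume
// to an arbitrary pod, whether driver or executor.
class InitContainerBootstrap(configMapName: String) {
  def bootstrapPod(pod: Pod): Pod =
    pod.copy(volumes = pod.volumes :+ s"configmap-volume:$configMapName")
}

object SubmissionWiring {
  // The driver backend uses the submodule in isolation: the config map
  // already exists, so only the mounting happens here.
  def bootstrapExecutor(bootstrap: InitContainerBootstrap, pod: Pod): Pod =
    bootstrap.bootstrapPod(pod)

  // The submission client wraps the same submodule in a step that both
  // mounts the volume and creates the ConfigMap API object.
  def submissionStep(
      bootstrap: InitContainerBootstrap,
      configMapName: String,
      driverPod: Pod): (Pod, ConfigMap) =
    (bootstrap.bootstrapPod(driverPod), ConfigMap(configMapName))
}
```

The point of the shape is that the mounting logic lives in exactly one place, and only the submission-side wrapper takes on the extra object-creation work.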
It would be neat to explore the idea of using a step-based system that is
shared for creating the executors and creating the driver, but we have to think
carefully about a proper abstraction. For example, the `KubernetesDriverSpec`
has the notion of `additionalKubernetesResources` and the `driverSparkConf`,
which are constructs that do not apply when creating the executor pods. This
illustrates the observation above: constructing the application is strictly
more work than creating the executors. So then, does the submission client
have two orchestrators - one for configuring the driver pod itself, and one
for configuring everything else in the application, e.g. creating config
maps? The orchestration and step selection for the pod-configuration steps
could then be shared between the submission client and the driver, while the
orchestrator for strictly creating Kubernetes API objects and the driver
Spark conf would be used only by the submission client. There are some
directions we can explore here, and we are open to ideas.
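In toy form, the two-orchestrator split might look something like the following. All of these names (`PodConfigurationStep`, `ResourceCreationStep`, the two orchestrators) are hypothetical, sketching a possible direction rather than anything in the PR:

```scala
case class PodSpec(containers: Seq[String] = Seq.empty)

// Pod-configuration steps could apply to driver and executor pods alike.
trait PodConfigurationStep {
  def configure(pod: PodSpec): PodSpec
}

// Resource-creation steps (config maps, secrets, the driver SparkConf)
// only make sense at submission time.
trait ResourceCreationStep {
  def resources: Seq[String]
}

// Shared between the submission client and the driver backend.
class PodOrchestrator(steps: Seq[PodConfigurationStep]) {
  def run(pod: PodSpec): PodSpec =
    steps.foldLeft(pod)((p, s) => s.configure(p))
}

// Used only by the submission client.
class ResourceOrchestrator(steps: Seq[ResourceCreationStep]) {
  def run(): Seq[String] = steps.flatMap(_.resources)
}
```

The split keeps the "strictly more work" asymmetry explicit: the driver backend would depend only on `PodOrchestrator`, while the submission client would compose both.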
> So aside from not really being able to infer the structure of how these
things work, the current abstraction seems to be creating a lot of constructors
and methods with long lists of arguments, which is another thing that hurts the
readability of the code.
I think that regardless of the abstractions we choose, we will eventually
end up with many classes with unique constructors, simply due to the
complexity of the submission client itself. Creating a Spark application in
Kubernetes has many steps, and we want to put each step into a submodule -
regardless of whether the submodules share a trait as they do now, or are
independent classes with independent APIs. Each submodule will have an
argument list that reflects the properties it requires to configure the
driver. See
https://github.com/apache-spark-on-k8s/spark/tree/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/submitsteps
for all the different submodules the submission client will eventually need.
I do think that we should remove the `InitContainer` level of steps and
orchestration, as having two layers of steps and orchestration is pretty
confusing. There's only one code branch in the init container steps
orchestrator, which we apply only when we are also submitting local files -
and that isn't even done in this PR - so it shouldn't be too big a burden to
inline that logic into the top-level step when the time comes.
Again, the feedback is very much appreciated. What are some ways we can
improve and move forward here?