GitHub user mccheah commented on the issue:
https://github.com/apache/spark/pull/19954
> I disagree. To pick one example I spotted while glancing at the code
quickly: there are many steps that take a kubernetesResourceNamePrefix as a
constructor argument. The code that instantiates those steps then needs to
know each of those steps, and that each takes that prefix as a constructor
argument. I don't see that as abstracting much.
> Instead, for example, if you encapsulate the state of a "kubernetes
submission" into some helper type that wraps SparkConf and other common things
needed to submit an app, those steps could potentially just take that helper
type as an argument to the abstract method that does the processing (e.g.
configureDriver). There's less step-specific logic needed everywhere, and the
list of arguments to constructors and methods shrinks a lot.
@vanzin An excellent idea - this is very much in line with, for example,
the `SQLConf` abstraction in Spark SQL. @liyinan926 it would be great to
introduce this and swap out our parameters in various places. Our original
approach was to pass down parameters that would otherwise have to be
retrieved from the `SparkConf` by the same config key over and over again,
but we can make an abstraction for that. This essentially moves us from
parameter primitives to a parameter object, an elegant solution.
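To make the idea concrete, here is a minimal sketch of such a parameter
object in Scala. The type name, the fields, and the `fromSparkConf` helper
are all hypothetical, not the PR's actual API:

```scala
import org.apache.spark.SparkConf

// Hypothetical parameter object bundling the state of one Kubernetes
// submission, so steps take a single argument instead of many primitives.
case class KubernetesSubmissionContext(
    sparkConf: SparkConf,
    appId: String,
    resourceNamePrefix: String,
    namespace: String)

object KubernetesSubmissionContext {
  // Each config key is read exactly once, here, rather than by every
  // step that happens to need it.
  def fromSparkConf(sparkConf: SparkConf, appId: String): KubernetesSubmissionContext =
    KubernetesSubmissionContext(
      sparkConf = sparkConf,
      appId = appId,
      resourceNamePrefix = s"$appId-driver",
      namespace = sparkConf.get("spark.kubernetes.namespace", "default"))
}
```

A step's constructor and its configure method then shrink to taking this
one object.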
> That's the thing. That's not an abstraction, that's just code that lives
in a separate method. You can test the "method that creates a list of steps"
separately too without having to stash it in a completely separate type and
create more coupling in your code. Abstraction is meant to reduce coupling, not
increase it ...
> ... You just have code living in separate classes with the goal of making
testing easier, but you can achieve the same thing without the code living in
separate classes, and things would probably be simpler overall.
A design goal here was to make every class in the code do exactly one
thing, so that the test for that class can focus solely on testing that
single thing. Using the `@VisibleForTesting` annotation on public methods,
or similar tricks to test multiple behaviors in the same class, makes that
single unit test class difficult to read. Alternatively we could split the
unit tests for a given class across multiple test classes, but then it
becomes difficult to track down all of the test classes that together cover
that one class. So although I agree that we're not creating abstractions
per se, in the sense that we're not creating generic APIs that have
multiple implementations, we are creating helper objects and submodules
that decompose the submission process into smaller chunks we can test one
at a time.
A key reason the code is decomposed the way it is, therefore, is to make
the tests as readable as possible: each test class homes in on covering all
code paths in a single unit. Again - open to ideas on how to make the main
code more readable as well.
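As a self-contained illustration of that one-class, one-behavior, one-test
shape (all names here are hypothetical, not the PR's actual types):

```scala
case class DriverSpec(annotations: Map[String, String] = Map.empty)

trait DriverConfigurationStep {
  // Single responsibility: apply exactly one transformation to the spec.
  def configureDriver(spec: DriverSpec): DriverSpec
}

class AnnotationStep(key: String, value: String) extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(annotations = spec.annotations + (key -> value))
}

// The matching unit test then covers only this class's code paths:
//   val configured = new AnnotationStep("k", "v").configureDriver(DriverSpec())
//   assert(configured.annotations == Map("k" -> "v"))
```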
> An abstraction for an orchestrator would mean that there is a code path
that can take a different orchestrator to achieve different things. For
example, if you had code that started containers, and given a different
orchestrator, could start either a driver or an executor, or even be used by
tests to mock some behavior you want to verify.
I'm interested in exploring this idea - I will have to think a bit more
about what that would specifically look like.
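One rough sketch of the shape, with all names hypothetical: a single shared
code path that folds the steps supplied by whichever orchestrator it is
handed, be that a driver orchestrator, an executor orchestrator, or a test
double.

```scala
trait PodOrchestrator[Spec] {
  def initialSpec: Spec
  def steps: Seq[Spec => Spec]
}

object PodConstruction {
  // The same code path works whether the orchestrator configures a driver
  // pod, an executor pod, or is a test double verifying some behavior.
  def buildSpec[Spec](orchestrator: PodOrchestrator[Spec]): Spec =
    orchestrator.steps.foldLeft(orchestrator.initialSpec)((spec, step) => step(spec))
}
```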
> I think the biggest thing that makes reviewing this PR challenging is
the way the code is structured, not the amount of changes. This PR is
actually both shorter and conceptually simpler than the first two big ones,
IMO. Unless we do a thorough refactoring of the code, I personally don't
think splitting it into smaller PRs will help much in reducing the
confusion. Splitting also risks getting a half-baked feature into 2.3,
which I personally don't think is a good idea.
@liyinan926 the code structure is the main issue, yes, but we can tackle
the code structure more effectively by decomposing the review process as
well. We have three distinct components that are relatively independent, so
we can separate out the three pieces and consider the architecture of each
one individually. For example, here we may focus on how to change the
submission and the executor pod configuration, but in doing so we could
miss crucial issues in the design of the init container. Breaking up the PR
lets each PR focus more thoroughly on the architectural considerations of
one part, and it makes the discussion surrounding each part easier to
follow.