Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5694#issuecomment-111625224
After thinking about it a bit more, I think this PR's test-triggering logic
could be significantly easier to understand if we rewrote it in terms of a
job / dependency graph abstraction.
At a high level, we have Spark modules / components which
1. are affected by file changes (e.g. a module is associated with a set of
source files, so changes to those files change the module),
2. contain a set of tests to run, which are triggered via Maven, SBT, or
Python / R scripts, and
3. depend on other modules and have dependent modules: if we change core,
then every downstream module's tests should be run (see the sketch below).
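For illustration, here's a rough sketch of what such a module description
might look like; all of the names here (`Module`, `source_file_prefixes`,
`sbt_test_goals`, etc.) are hypothetical and not existing code in this PR:

```python
class Module(object):
    """Hypothetical description of one Spark module for test triggering.

    name                 -- short identifier, e.g. "core" or "pyspark"
    source_file_prefixes -- file path prefixes whose changes affect this module
    sbt_test_goals       -- SBT test goals to run when this module is triggered
    python_test_goals    -- Python / R test scripts to run, if any
    dependencies         -- upstream Module objects this module depends on
    """
    def __init__(self, name, source_file_prefixes, sbt_test_goals=(),
                 python_test_goals=(), dependencies=()):
        self.name = name
        self.source_file_prefixes = set(source_file_prefixes)
        self.sbt_test_goals = list(sbt_test_goals)
        self.python_test_goals = list(python_test_goals)
        self.dependencies = list(dependencies)

    def contains_file(self, filename):
        """Return True if a changed file belongs to this module."""
        return any(filename.startswith(p) for p in self.source_file_prefixes)
```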
Right now, the per-module logic is spread across a few different places: we
have one function that describes how to detect changes for all modules, another
function that (implicitly) deals with module dependencies, etc.
Instead, I propose that we introduce a class that describes a module, build a
dependency graph from instances of that class, and then phrase the "find
which tests to run" operations in terms of that graph. I think this will be
easier to understand and maintain.
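To make that concrete, here's a rough sketch of how the graph-based lookup
could work, assuming the hypothetical `Module` class above and an
`all_modules` registry (again, none of this is existing code, just one
possible shape for it):

```python
def modules_changed_by(changed_files, all_modules):
    """Map changed file paths to the set of directly affected modules."""
    return {m for m in all_modules
            for f in changed_files if m.contains_file(f)}


def transitive_dependents(modules, all_modules):
    """Expand a set of modules to include every downstream module."""
    to_test = set(modules)
    changed = True
    while changed:
        changed = False
        for m in all_modules:
            if m not in to_test and any(d in to_test for d in m.dependencies):
                to_test.add(m)
                changed = True
    return to_test


def tests_to_run(changed_files, all_modules):
    """Collect the test goals for every module affected by the change."""
    affected = transitive_dependents(
        modules_changed_by(changed_files, all_modules), all_modules)
    sbt_goals = [g for m in affected for g in m.sbt_test_goals]
    python_goals = [g for m in affected for g in m.python_test_goals]
    return sbt_goals, python_goals
```

With something like this, "changes to core trigger every downstream test"
falls out of the graph traversal rather than being encoded in several
separate functions.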