NealSun96 opened a new pull request #1508:
URL: https://github.com/apache/helix/pull/1508
### Issues
- [x] My PR addresses the following Helix issues and references them in the
PR description:
Fixes #1506, #1507
### Description
- [x] Here are some details about my PR, including screenshots of any UI
changes:
This PR fixes 2 issues:
1. On-demand rebalance flooding after the Task Framework (TF) IdealState (IS)
removal. This is caused by an old problem: a jobConfig can persist from a
previous iteration of a workflow even though the job no longer exists in the
job DAG. Because of the runtime DAG refresh logic, such a jobConfig causes the
runtime DAG to be refreshed every time the pipeline runs. A new runtime DAG
causes all jobs to be reprocessed, and during processing the cleanup logic
runs for every job that has already completed. Before the IS removal, the
cleanup logic attempted to delete the IS again, which had no effect; in the
new code, an on-demand rebalance is triggered instead.
This "dangling job" is never removed, so the job DAG refresh keeps happening,
which keeps the on-demand rebalances firing.
2. Log flooding for jobs that are missing target resources after the TF IS
removal. Similarly, "dangling jobs" can end up with missing target resources
once those resources are deleted. Before the IS removal this was not a problem,
because the log was emitted only when the job was first processed; now the
processing logic runs on every pipeline run (since it is config-based instead
of IS-based), so the log can keep firing.
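To make the two failure modes concrete, here is a minimal, self-contained
sketch. It is not the code in this PR and it does not use any Helix API: the
class, method names, and job names are all hypothetical, and the "naive"
refresh check is only an assumption about the shape of the problem described
above (a config with no DAG node keeps the refresh condition true forever).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; none of these names are Helix APIs. */
final class DanglingJobSketch {

  /** Jobs that still have a config but are not nodes of the workflow's job DAG. */
  static Set<String> findDanglingJobs(Set<String> jobConfigNames, Set<String> dagJobNames) {
    Set<String> dangling = new TreeSet<>(jobConfigNames);
    dangling.removeAll(dagJobNames);
    return dangling;
  }

  /**
   * Assumed naive check: refresh whenever some config is absent from the cached
   * runtime DAG. A dangling config is never added to the DAG, so this stays true
   * on every pipeline run.
   */
  static boolean naiveNeedsRefresh(Set<String> jobConfigNames, Set<String> cachedDagJobNames) {
    return !cachedDagJobNames.containsAll(jobConfigNames);
  }

  /** Guarded check: only configs that belong to the job DAG can force a refresh. */
  static boolean guardedNeedsRefresh(Set<String> jobConfigNames, Set<String> dagJobNames,
      Set<String> cachedDagJobNames) {
    Set<String> relevant = new HashSet<>(jobConfigNames);
    relevant.retainAll(dagJobNames);
    return !cachedDagJobNames.containsAll(relevant);
  }

  // Jobs already reported as missing a target resource.
  private final Set<String> warnedJobs = new HashSet<>();

  /** Returns true only the first time a job is reported, so the warning is logged once. */
  boolean shouldLogMissingTarget(String jobName) {
    return warnedJobs.add(jobName);
  }

  public static void main(String[] args) {
    Set<String> configs = Set.of("job_a", "job_b", "job_old"); // job_old is left over
    Set<String> dag = Set.of("job_a", "job_b");

    System.out.println("dangling: " + findDanglingJobs(configs, dag));
    System.out.println("naive refresh: " + naiveNeedsRefresh(configs, dag));
    System.out.println("guarded refresh: " + guardedNeedsRefresh(configs, dag, dag));

    DanglingJobSketch sketch = new DanglingJobSketch();
    for (int run = 0; run < 3; run++) { // three simulated pipeline runs
      if (sketch.shouldLogMissingTarget("job_old")) {
        System.out.println("WARN: target resource missing for job_old");
      }
    }
  }
}
```

In this sketch the naive check reports a refresh on every run because
`job_old` never appears in the DAG, while the guarded check ignores it; the
`warnedJobs` set shows one possible way to emit the missing-target warning a
single time instead of on every pipeline run.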
### Tests
- [x] The following is the result of the "mvn test" command on the
appropriate module:
```
[ERROR] Tests run: 1236, Failures: 1, Errors: 0, Skipped: 1, Time elapsed:
4,922.413 s <<< FAILURE! - in TestSuite
[ERROR] testDeactivateCluster(org.apache.helix.tools.TestHelixAdminCli)
Time elapsed: 2.383 s <<< FAILURE!
org.apache.helix.HelixException: There are still LEADER in the cluster, shut
them down first.
at
org.apache.helix.tools.TestHelixAdminCli.testDeactivateCluster(TestHelixAdminCli.java:604)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestHelixAdminCli.testDeactivateCluster:604 » Helix There are
still LEADER in ...
[INFO]
[ERROR] Tests run: 1236, Failures: 1, Errors: 0, Skipped: 1
[INFO]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 01:22 h
[INFO] Finished at: 2020-11-02T18:52:09-08:00
[INFO]
------------------------------------------------------------------------
```
Rerun
```
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
23.672 s - in org.apache.helix.tools.TestHelixAdminCli
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 28.917 s
[INFO] Finished at: 2020-11-02T19:44:51-08:00
[INFO]
------------------------------------------------------------------------
```
### Documentation (Optional)
- In case of new functionality, my PR adds documentation in the following
wiki page:
(Link the GitHub wiki you added)
### Commits
- My commits all reference appropriate Apache Helix GitHub issues in their
subject lines. In addition, my commits follow the guidelines from "[How to
write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Code Quality
- My diff has been formatted using helix-style.xml
(helix-style-intellij.xml if IntelliJ IDE is used)