-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65339/#review196227
-----------------------------------------------------------
Master (dbe7137) is red with this patch.
./build-support/jenkins/build.sh
:distZip
:assemble
:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
:processTestResources
:testClasses
:compileJmhJava
Note: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
:processJmhResources NO-SOURCE
:jmhClasses
:checkstyleJmh
:checkstyleMain
:checkstyleTest
:licenseJmh UP-TO-DATE
:licenseMain UP-TO-DATE
:licenseTest UP-TO-DATE
:license UP-TO-DATE
:pmdJmh
:pmdMain
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java:182: These nested if statements could be combined
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java:182: These nested if statements could be combined
:pmdMain FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':pmdMain'.
> 2 PMD rule violations were found. See the report at:
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/pmd/main.html
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug
option to get more log output.
* Get more help at https://help.gradle.org
BUILD FAILED in 4m 30s
38 actionable tasks: 29 executed, 9 up-to-date
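The PMD message above matches the wording of PMD's CollapsibleIfStatements rule. As a rough illustration only (the condition names below are hypothetical and are not the actual code at TaskStateMachine.java:182), the rule flags the first form and accepts the second:

```java
// Minimal sketch of the pattern PMD complains about and the usual fix.
// isTerminal and isUnknown are made-up flags for illustration.
final class CollapsibleIfExample {
  private CollapsibleIfExample() { }

  // Flagged: an if whose only statement is another if, with no else branches.
  static boolean nested(boolean isTerminal, boolean isUnknown) {
    if (isTerminal) {
      if (isUnknown) {
        return true;
      }
    }
    return false;
  }

  // Accepted: the two conditions are joined with &&.
  static boolean combined(boolean isTerminal, boolean isUnknown) {
    if (isTerminal && isUnknown) {
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Both forms agree for every input combination.
    boolean[] values = {false, true};
    for (boolean a : values) {
      for (boolean b : values) {
        System.out.println(a + ", " + b + " -> "
            + nested(a, b) + " / " + combined(a, b));
      }
    }
  }
}
```

The two methods behave identically, so combining the conditions should be enough to clear the violation without changing the logic.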
I will refresh this build result if you post a review containing "@ReviewBot
retry"
- Aurora ReviewBot
On Jan. 25, 2018, 9:03 a.m., David McLaughlin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65339/
> -----------------------------------------------------------
>
> (Updated Jan. 25, 2018, 9:03 a.m.)
>
>
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
>
>
> Bugs: AURORA-1966
> https://issues.apache.org/jira/browse/AURORA-1966
>
>
> Repository: aurora
>
>
> Description
> -------
>
> As reported in https://issues.apache.org/jira/browse/AURORA-1966, Mesos sends
> a TASK_UNKNOWN when we try to kill (or reconcile) tasks that are unknown. On
> master, this leads to an infinite loop. The sequence of events is:
>
> 1) We map TASK_UNKNOWN to PARTITIONED
> 2) We react to a restarting or terminal -> PARTITIONED transition by telling
> Mesos "that is a bad state transition, that task should be dead".
> 3) Mesos replies with: that task is TASK_UNKNOWN
> 4) GO TO 1
>
> AURORA-1966 describes just one case of this happening, but there are many
> other legitimate paths to the same loop.
>
> This patch cleans up the logic. The two main changes:
>
> 1) Do not allow ASSIGNED -> PARTITIONED. This is not really related to this
> bug, but I found this logic error during debugging. ASSIGNED is a transient
> state and is subject to the transient task timeout in the Scheduler, so we
> should not attempt to move to PARTITIONED during that window.
> 2) Do not try to kill tasks we think are terminal when Mesos tells us they
> are unknown. Originally we did this because "manageTerminalTasks" is also
> used for restarting tasks, but in neither case does it make sense to respond
> to "I don't know about that task" with a request to kill it.
>
>
> Diffs
> -----
>
> src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
> b8ba5da729fcf5965b577c23e3062e5607bd07e7
> src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java
> 3d98fe651ad2b89a03044e8a06953a0cea876321
>
>
> Diff: https://reviews.apache.org/r/65339/diff/1/
>
>
> Testing
> -------
>
> ./gradlew test
>
> Verified this fixes the issue reported in AURORA-1966 by forcing
> LaunchException in OfferManagerImpl in my vagrant image and viewing logs.
>
>
> Thanks,
>
> David McLaughlin
>
>
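For readers following the quoted description, the loop in steps 1-4 and the effect of change 2 can be modelled roughly as below. This is a toy sketch under stated assumptions, not the code in TaskStateMachine.java: the State and Action enums, onTaskUnknown, and killsSent are hypothetical names, and Mesos is reduced to "every kill of an unknown task is answered with TASK_UNKNOWN".

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the scheduler/Mesos exchange from the description.
// All names here are illustrative assumptions, not Aurora APIs.
final class UnknownTaskSketch {
  private UnknownTaskSketch() { }

  // Simplified subset of scheduler-side task states.
  enum State { RUNNING, RESTARTING, KILLED, FINISHED }

  // What the scheduler asks Mesos to do in response to a status update.
  enum Action { KILL, NONE }

  static final Set<State> TERMINAL = EnumSet.of(State.KILLED, State.FINISHED);

  // Reaction to TASK_UNKNOWN (which the scheduler maps to PARTITIONED).
  // Unpatched: a restarting or terminal task triggers a kill request.
  // Patched: "I don't know about that task" is never answered with a kill.
  static Action onTaskUnknown(State current, boolean patched) {
    if (patched) {
      return Action.NONE;
    }
    boolean deadOrDying = TERMINAL.contains(current) || current == State.RESTARTING;
    return deadOrDying ? Action.KILL : Action.NONE;
  }

  // Counts kill requests sent, assuming Mesos answers each one with another
  // TASK_UNKNOWN. maxRounds caps what would otherwise never terminate.
  static int killsSent(State current, boolean patched, int maxRounds) {
    int kills = 0;
    for (int round = 0; round < maxRounds; round++) {
      if (onTaskUnknown(current, patched) == Action.NONE) {
        break; // nothing sent, so Mesos has nothing to reply to
      }
      kills++; // kill sent; Mesos replies TASK_UNKNOWN and we go around again
    }
    return kills;
  }

  public static void main(String[] args) {
    System.out.println("unpatched: " + killsSent(State.KILLED, false, 5));
    System.out.println("patched:   " + killsSent(State.KILLED, true, 5));
  }
}
```

With the cap set to 5, the unpatched path sends 5 kill requests (it only stops because of the cap), while the patched path sends none.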