-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191229
-----------------------------------------------------------


Fix it, then Ship it!




Looks great overall!  The e2e test is very easy to follow.


src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
Lines 185 (patched)
<https://reviews.apache.org/r/63536/#comment268905>

    Nit: `isPartitionAware` to minutely improve readability.



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 91 (patched)
<https://reviews.apache.org/r/63536/#comment268912>

    A `LOG.info` on either side of the branch on task configuration would be 
nice to simplify production debugging.
    
    `Partitioned task will be rescheduled in %s seconds`
    
    `Partitioned task will not be rescheduled`



src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
Lines 209 (patched)
<https://reviews.apache.org/r/63536/#comment268953>

    I suggest keeping this as a valid transition and allowing the 
PartitionManager to deal with it.



src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
Lines 281 (patched)
<https://reviews.apache.org/r/63536/#comment268956>

    If `FAILED` is in this list, i would expect `FINISHED` as well.



src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
Line 266 (original), 307 (patched)
<https://reviews.apache.org/r/63536/#comment268955>

    Nit: order the states consistently where possible.  Makes it easier to scan 
for where states are handled differently.  The prevailing convention in this 
patch seems to be `PARTITIONED` last, which sounds fine to me.



src/main/python/apache/aurora/config/schema/base.py
Lines 157 (patched)
<https://reviews.apache.org/r/63536/#comment268963>

    I would prefer the name `RescheduleImmediately` (or similar) to make this 
self-documenting.



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 55 (patched)
<https://reviews.apache.org/r/63536/#comment268969>

    (disclaimer: see my longer comment about an integration test before trying 
to satisfy other comments in this file)
    
    Have a look at `FakeScheduledExecutor` and see if you find it helpful.  The 
API is a bit weird, but it aims to simplify this type of test where you need a 
`Clock` and a `ScheduledExecutor`.  At the very least, it eliminates the noise 
of using `Capture`s.



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 57 (patched)
<https://reviews.apache.org/r/63536/#comment268966>

    nit: rm line



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 60 (patched)
<https://reviews.apache.org/r/63536/#comment268967>

    nit: static; ditto elsewhere



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 74-75 (patched)
<https://reviews.apache.org/r/63536/#comment268971>

    can you swap old/new param order?  i was confused by some callers until i 
came up here :-)



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 87 (patched)
<https://reviews.apache.org/r/63536/#comment268968>

    nit: newline before



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 109-115 (patched)
<https://reviews.apache.org/r/63536/#comment268973>

    Test cases like this lead me to believe that an integration test is more 
appropriate.  This test case reads as "nothing happens when a task with 
reschedule=false transitions from RUNNING to PARTITIONED".  What we really 
mean, though, is that "a task with reschedule=false immediately moves to LOST 
upon transition attempt from RUNNING to PARTITIONED".
    
    With this in mind, i suggest you eliminate this test and move the coverage 
into `StateManagerImplTest`.  I think this is rather natural, since they work 
in unison to manage state transitions.
    
    I don't feel strongly about this, since it is not the convention in the 
codebase, and i don't know what hurdles you may encounter if you try to bundle 
this behavior in `StateManagerImplTest`.



src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java
Lines 431 (patched)
<https://reviews.apache.org/r/63536/#comment268975>

    Please phrase this check as:
    
    ```java
    assertEquals(
      ImmutableList.of(PENDING, ASSIGNED, ...),
      updatedTask.getTaskEvents().stream()
          .map(e -> e.getStatus())
          .collect(Collectors.toList()));
    ```
    
    This will give greater confidence that the compaction worked as intended.


- Bill Farner


On Nov. 15, 2017, 5:54 p.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2017, 5:54 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/6/
> 
> 
> Testing
> -------
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> ----------------
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Reply via email to