-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64940/#review194777
-----------------------------------------------------------




src/master/master.cpp
Lines 10058-10060 (patched)
<https://reviews.apache.org/r/64940/#comment273769>

    Perhaps we just need to comment on the logic we use to determine if the 
task is `unreachable` and not what `removeTask` is going to do about it 
(because it's hidden here)? i.e., remove the comment here and use something 
like what I suggested below?



src/master/master.cpp
Lines 10061 (patched)
<https://reviews.apache.org/r/64940/#comment273760>

    Perhaps move all the logic around determining the unreachable task down 
here.
    
    ```
    // We need to inform `removeTask` whether the task became unreachable due 
to the agent being removed. Note that terminal tasks are not considered 
unreachable. We can not rely on the task's state because it may have been set 
to `TASK_LOST` for backwards-compatibility.
    bool unreachable =
      unreachableTime.isSome() && !protobuf::isTerminalState(task->state());
    ```
    
    Inlining `unreachable` is probably fine too but separating it helps clarify 
it with comments?



src/master/master.cpp
Line 10273 (original), 10279-10280 (patched)
<https://reviews.apache.org/r/64940/#comment273766>

    So we had to use a boolean because we cannot rely on the task *state*, so 
"any of the unreachable states" is a bit misleading?
    
    I think the CHECK message is already clear enough for now? (Like you did in 
`removeTask`)
    
    I suggest we don't comment further on the removable-ness of the task 
because we might want to change it: 
https://issues.apache.org/jira/browse/MESOS-8389



src/tests/master_tests.cpp
Lines 7601-7602 (patched)
<https://reviews.apache.org/r/64940/#comment273805>

    This reads like "terminal tasks" are stored as "either finished or lost" 
when it's only the non-terminal task that is transitioned to lost.
    
    Can we just say that we are verifying "the terminal task is stored in its 
original state when parition happens"?



src/tests/master_tests.cpp
Lines 7603 (patched)
<https://reviews.apache.org/r/64940/#comment273804>

    Most agent partition related tests are put in partition_tests.cpp, can we 
put this one there as well for better grouping? 
    
    In particular there is a test named `TaskCompletedOnPartitionedAgent` there 
which is somewhat related. We can name this one 
`AgentWithCompletedTaskPartitioned` or `TaskCompletedBeforeAgentPartitioned`
    
    The main word I want to change is `Lost`. While I can understand you are 
using it interchangably with *partitioned*, in this context when it's call 
partitioned elsewhere and there's `TaskLost` to confuse with it, perhaps it's 
better to use the more precise word.



src/tests/master_tests.cpp
Lines 7621 (patched)
<https://reviews.apache.org/r/64940/#comment273811>

    We typically name this `offers` since it's a vector. You'll be using the 
first and only one so later you should
    
    ```
      AWAIT_READY(offers);
      ASSERT_FALSE(offers->empty());
      const Offer& offer = offers->at(0);
    ```



src/tests/master_tests.cpp
Lines 7637 (patched)
<https://reviews.apache.org/r/64940/#comment273812>

    You could just wait for the the status update (being droppped) instead of 
capaturing `Slave::executorTerminated` and risk the race condition of the 
master not receiving the status update before the agent is deemed partitioned?



src/tests/master_tests.cpp
Lines 7641 (patched)
<https://reviews.apache.org/r/64940/#comment273810>

    Instead of settling it's more direct to just wait for the offer before 
using it (like I mentioned above).



src/tests/master_tests.cpp
Lines 7675 (patched)
<https://reviews.apache.org/r/64940/#comment273783>

    "fast fast" to emphasize? :)



src/tests/master_tests.cpp
Lines 7681 (patched)
<https://reviews.apache.org/r/64940/#comment273784>

    If you wait for the ping, you don't need to settle?



src/tests/master_tests.cpp
Lines 7686 (patched)
<https://reviews.apache.org/r/64940/#comment273785>

    No need to settle?


- Jiang Yan Xu


On Jan. 3, 2018, 5:35 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64940/
> -----------------------------------------------------------
> 
> (Updated Jan. 3, 2018, 5:35 p.m.)
> 
> 
> Review request for mesos, Gaston Kleiman, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8337
>     https://issues.apache.org/jira/browse/MESOS-8337
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If an agent is lost, we try to remove all the tasks that might
> have been lost. However, if a task is already terminal, it hasn't
> really been lost so we should not be tracking it in the framework's
> unreachable tasks list.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 8fe9420dbe03ea2cefc6a40b0f64284aa9fe7915 
>   src/master/master.cpp 282fdf8ac38e3613c621c1c8e5c50f27bde9dafd 
>   src/tests/master_tests.cpp 5546fd937d078c1f757964d5163449ffd993388e 
> 
> 
> Diff: https://reviews.apache.org/r/64940/diff/1/
> 
> 
> Testing
> -------
> 
> make check (Fedora 27)
> 
> 
> Thanks,
> 
> James Peach
> 
>

Reply via email to