-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61473/#review188966
-----------------------------------------------------------




src/master/http.cpp
Line 317 (original), 317 (patched)
<https://reviews.apache.org/r/61473/#comment266299>

    This comment isn't about this line but it seems my previous comment at the 
top of my last review was overlooked so I am raising an issue here :)
    
    For this patch we should also update the comments above PARTITION_AWARE API 
to reflect the change.
    
https://github.com/apache/mesos/blob/6eefc685ccf304d0fb8ed4ff9bc314197d77f078/include/mesos/mesos.proto#L336
    
    Also please search the docs for necessary changes.



src/master/http.cpp
Lines 322-324 (patched)
<https://reviews.apache.org/r/61473/#comment266286>

    Seems like if the framework is NPA, we don't have to iterate through the 
list? i.e., put the check outside the `foreachvalue` loop?



src/master/http.cpp
Lines 345-346 (patched)
<https://reviews.apache.org/r/61473/#comment266288>

    Given that we now store all NPA tasks in TASK_LOST (instead of being 
converted to TASK_LOST on the fly here), is it more straightfoward to say 
    
    ```
    // Unreachable tasks belonging to a non-partition-aware framework have been 
stored as TASK_LOST for backward compatibility so we should export them as 
completed tasks.
    ```
    
    ?



src/master/http.cpp
Lines 348-350 (patched)
<https://reviews.apache.org/r/61473/#comment266285>

    The reason we are here is that tasks in `framework_->unreachableTasks` 
could be in `TASK_LOST` state and we should export those as "completed_tasks".
    
    So I think the condition check isn't right? `if 
(framework_->capabilities.partitionAware)` is going to skip all tasks 
previously stored as `TASK_LOST`.
    
    Like I suggested in another place, should we be checking against `if 
(task.get()->state() != TASK_LOST)` directly?



src/master/http.cpp
Lines 4225 (patched)
<https://reviews.apache.org/r/61473/#comment266293>

    Is is more straightfoward to do `if (task.get()->state() == 
TASK_UNREACHABLE)` than the inequality check?



src/master/http.cpp
Lines 4228-4229 (patched)
<https://reviews.apache.org/r/61473/#comment266295>

    Ditto on the update of the comment.



src/master/master.hpp
Lines 2591 (patched)
<https://reviews.apache.org/r/61473/#comment265908>

    You made a redundant copy here but I understand this line may go away 
anyways. :)



src/master/master.hpp
Line 2588 (original), 2594 (patched)
<https://reviews.apache.org/r/61473/#comment266296>

    Sorry we have gone back and forth but for now let's use this API:
    
    ```
    void removeTask(Task* task, bool unreachable)
    ```
    
    I originally hoped to solely rely on `task->state()` in the workflow but it 
turned out to be pretty difficult. So here let's add a comment above the method 
explaning the reason for the boolean.
    
    e.g.,
    
    ```
    // Removes the task. `unreachable` indicates whether the task is removed 
due to being unreachable. Note that we cannot rely on the task state because it 
may not reflect unreachability due to being set to TASK_LOST for backwards 
compatibility.
    ```



src/master/master.cpp
Lines 9683-9684 (original), 9637-9639 (patched)
<https://reviews.apache.org/r/61473/#comment266297>

    If the framework is PA, it can still contain tasks in `TASK_LOST`. For now 
let's follow the implementation in `Master::_tasks_killing()`: check the state 
of each task.



src/tests/partition_tests.cpp
Lines 2044-2045 (patched)
<https://reviews.apache.org/r/61473/#comment266298>

    A comment shouldn't split the chain. I think 
    
    ```
        .WillRepeatedly(Return()); // The agent may resend status updates.
    ```
    
    fits in one line?


- Jiang Yan Xu


On Oct. 16, 2017, 1:59 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61473/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2017, 1:59 a.m.)
> 
> 
> Review request for mesos, James Peach, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 42139bec519d36316e324ef921157c49cdf2d043 
>   src/master/master.hpp 0ddc98259f64b3921d08f5f4ec81543bb0826378 
>   src/master/master.cpp 3603878f02ae3dba82811a4a5770dd21ec790ef6 
>   src/tests/partition_tests.cpp 0597bd2afaa60121245e0d43b81ac223257e377a 
> 
> 
> Diff: https://reviews.apache.org/r/61473/diff/8/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Reply via email to