> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> > 
> > High-level comments:
> > 
> > (a) Can we improve the description of the problem in the commit summary? It 
> > took me quite a while to figure out what is actually going on here. My 
> > understanding is:
> > 
> > (1) Agent re-registers
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the 
> > agent
> > (3) Master offers agent resources to framework
> > (4) Framework launches new task on agent _before the agent has finished 
> > shutting down the framework_
> > (5) Agent ignores launch task because it believes the framework is still 
> > terminating.
> > 
> > This is a race condition, because if the agent finished shutting down the 
> > framework (in #4) before the launch task was received, there would not be a 
> > problem. Is my understanding correct?
> > 
> > (2) I'd prefer a new unit test that constructs exactly this scenario, 
> > rather than changing existing unit tests.
> > 
> > (3) The new behavior is that the framework will receive `TASK_KILLED` for 
> > any non-PA tasks running on a partitioned agent that re-registers. This 
> > does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a 
> > framework-initiated `killTask` operation.
> > 
> > (4) Thinking out loud, what if a custom executor ignores the `killTask` 
> > request? When shutting down a framework, it seems we forcibly destroy the 
> > container (via `containerizer->destroy()`), if the executor doesn't exit 
> > promptly upon receiving the framework shutdown request. AFAIK we don't have 
> > similar logic for `killTask` requests.
> > 
> > 3 + 4 suggests that maybe we want a different way to terminate the task on 
> > the agent -- let's discuss.
> 
> Megha Sharma wrote:
>     Summarizing the 2 approaches we talked about.
>     
>     Approach 1: ShutdownFrameworkMessage
>     
>     1. Upon agent re-registration, master will add tasks even for non-PA 
> frameworks on this agent. This is needed by the master to do correct resource 
> accounting and not offer resources already in use on this agent. We need to 
> mutate the TaskState on the Task before adding them to the master's data 
> structures since the TaskState might be non-terminal when the agent sends 
> these tasks with ReregisterSlaveMessage. And the master has already sent 
> TASK_LOST for these tasks to the frameworks so we need to set the TaskState 
> to TASK_LOST so that any future reconciliations with the framework doesn't 
> have this task transitioning from TASK_LOST to TASK_RUNNNG/TASK_STARTING. 
> This is to avoid unnecessary confusion about task state as observed by the 
> framework but indeed this could have happened with non-strict registry as 
> well where the framework can actually receive a non terminal task state 
> update after receiving a TASK_LOST for the same task in the past.
>     
>     2. When the agent re-registers, the master will continue to send a 
> ShutdownFrameworkMessage to the agent to kill the tasks pertaining to non-PA 
> frameworks on the agent as it does today. An additional optional field will 
> be added to the ShutdownFrameworkMessage to indicate whether or not the 
> shutdown was initiated internally.
>     
>     3. During framework shutdown the state of the framework is set to 
> Framework::TERMINATING which prevents it from launching new tasks. Here, 
> since the framework is not really terminating so in order to allow it to 
> launch new tasks, the agent will not set the state to terminating if the 
> ShutdownFrameworkMessage is generated internally.
>     
>     4. The framework shutdown today doesn't generate any status updates which 
> needs to change. The status updates will be sent if the framework shutdown is 
> triggered internally, this is needed to remove the tasks of non-PA frameworks 
> that got added when the agent re-registered (1).
>     
>     Approach 2: Do not shutdown non-PA framework when agent re-registers and 
> let the frameworks make the decision on what needs to be done when they 
> receive non-terminal status updates for tasks for which they have already 
> received a TASK_LOST. This hopefully won't break any frameworks since it 
> could have happened in the past with non-strict registry as well and 
> frameworks should be resilient enough to handle this scenario.
>     
>     Let me know if I have missed anything. Personally, I am in favor of 
> approach 1 as it seems less catastrophic for the frameworks. What do you 
> think?
> 
> Megha Sharma wrote:
>     The proposed solution to fix the race between new task launches and 
> shutdown framework on the agent, was to not kill the non-partition aware 
> tasks when an unreachable agent re-registers with the master. So now as far 
> as Mesos internals are concerned, the tasks from non-partition aware 
> frameworks are to be treated the same way as partition aware tasks and one 
> way to do it cleanly in Mesos was to transition such non-partition aware 
> tasks to TASK_UNREACHABLE state instead of TASK_LOST when the agent becomes 
> unreachable. Internally such tasks will be in Framework#unreachableTasks 
> cache instead of Framework#completedTasks and their state would be 
> TASK_UNREACHABLE but to be backwards compatible we will do some 
> transformations when the data is being exposed to the users so there is no 
> difference in how these tasks are presented before and after this patch. I 
> will post the patch soon but do let me know what do you think about the 
> approach.
> 
> Megha Sharma wrote:
>     Since we are now adding the tasks from non-partition aware frameworks to 
> unreachableTasks cache instead of completedTasks when the agent goes 
> unreachable so we needed to do some transformations when the completed_tasks 
> and unreachable_tasks metric are being presented to the user. To achieve this 
> we needed an additional cache nonParitionAwareTasks to remember the taskIDs 
> for non-partition aware tasks which have been added to unreachableTasks so 
> they can be filtered out when the unreachable_tasks are shown and added when 
> we present the completed_tasks (code snippet below).
>     ```    
>         writer->field("completed_tasks", [this](JSON::ArrayWriter* writer) {
>           foreach (const Owned<Task>& task, framework_->completedTasks) {
>             // Skip unauthorized tasks.
>             if (!authorizeTask_->accept(*task.get(), framework_->info)) {
>               continue;
>             }
>     
>             writer->element(*task.get());
>           }
>           foreach (const TaskID& taskID, framework_->nonParitionAwareTasks) {
>             if (framework_->unreachableTasks.contains(taskID)) {
>               Task task =
>                 framework_->transformNonPartitionAwareTask(
>                     *framework_->unreachableTasks.at(taskID).get());
>               // Skip unauthorized tasks.
>               if (!authorizeTask_->accept(task, framework_->info)) {
>                 continue;
>               }
>               writer->element(task);
>             }
>           }
>         });
>     ```

I have created a new Review Request https://reviews.apache.org/r/61473. We can 
continue discussing the approach here.


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On Aug. 6, 2017, 4:35 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2017, 4:35 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Reply via email to