[jira] [Created] (MESOS-9940) Framework removal may lead to inconsistent task states between master and agent.

2019-08-14 Thread Meng Zhu (JIRA)
Meng Zhu created MESOS-9940:
---

 Summary: Framework removal may lead to inconsistent task states 
between master and agent.
 Key: MESOS-9940
 URL: https://issues.apache.org/jira/browse/MESOS-9940
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Meng Zhu


When a framework is removed from the master (say, due to disconnection), the 
master sends a `ShutdownFrameworkMessage` to the agent. At the same time, the 
master transitions the status of the framework's tasks to a terminal state, 
e.g. KILLED. 
(https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11247-L11291)

When the agent receives the shutdown message, it tries to shut down all the 
executors and destroy all the containers. The tasks' status is updated only 
after all of this is done. 
(https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L7914-L7922)

However, if the executor shutdown gets stuck (e.g. due to a hanging docker 
daemon), the task status transition will never happen, and the master and 
agent will have diverged views of these tasks.

One consequence is that the master may try to schedule more workloads onto the 
problematic agent (because it thinks those task resources are freed up). Since 
we do not have an overcommit check on the agent, the agent will comply and 
launch those tasks. This will lead to over-allocation.
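
The sequence above can be illustrated with a small toy model. This is only a 
sketch, not Mesos code: the `Agent` and `Master` classes, their methods, and 
the `shutdown_stuck` flag are all hypothetical names, and resources are 
reduced to a single CPU count.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent (hypothetical): tracks tasks still holding resources."""
    total_cpus: float
    running: dict = field(default_factory=dict)  # task_id -> cpus
    shutdown_stuck: bool = False  # models e.g. a hanging docker daemon

    def shutdown_framework(self, task_ids):
        # Tasks are released only once executor shutdown completes.
        if self.shutdown_stuck:
            return
        for task_id in task_ids:
            self.running.pop(task_id, None)

    def launch(self, task_id, cpus):
        # No overcommit check: the agent launches whatever it is told to.
        self.running[task_id] = cpus

    def used_cpus(self):
        return sum(self.running.values())


@dataclass
class Master:
    """Toy master (hypothetical): keeps its own view of the agent's tasks."""
    agent: Agent
    view: dict = field(default_factory=dict)  # task_id -> cpus

    def launch(self, task_id, cpus):
        self.view[task_id] = cpus
        self.agent.launch(task_id, cpus)

    def remove_framework(self, task_ids):
        # The master drops the tasks from its view immediately,
        # without waiting for the agent to finish the shutdown.
        for task_id in task_ids:
            self.view.pop(task_id, None)
        self.agent.shutdown_framework(task_ids)

    def available_cpus(self):
        return self.agent.total_cpus - sum(self.view.values())


def simulate_over_allocation():
    agent = Agent(total_cpus=4.0, shutdown_stuck=True)
    master = Master(agent)
    master.launch("t1", 4.0)
    master.remove_framework(["t1"])
    # The master believes the CPUs are free and schedules a new task.
    free = master.available_cpus()
    master.launch("t2", free)
    return free, agent.used_cpus()
```

In this model the master's view reports the agent as fully free after the 
framework removal, while the agent ends up holding more CPUs than it has.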

One possible solution is to hold off on the master's task status update until 
the agent is done with the framework shutdown.
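
A minimal sketch of that idea, assuming the agent sends some kind of 
completion signal back to the master. The `DeferredMaster` class and its 
`on_agent_shutdown_complete` callback are hypothetical and do not correspond 
to existing Mesos messages or functions.

```python
class DeferredMaster:
    """Toy master (hypothetical) that frees task resources only after
    the agent confirms the framework shutdown has finished."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.tasks = {}                # task_id -> cpus
        self.pending_shutdown = set()  # tasks awaiting agent confirmation

    def add_task(self, task_id, cpus):
        self.tasks[task_id] = cpus

    def remove_framework(self, task_ids):
        # Record the intent, but keep the tasks (and their resources)
        # in the master's view until the agent reports back.
        self.pending_shutdown.update(t for t in task_ids if t in self.tasks)

    def on_agent_shutdown_complete(self, task_ids):
        # Assumed callback: only now are the tasks dropped from the view.
        for task_id in task_ids:
            if task_id in self.pending_shutdown:
                self.pending_shutdown.discard(task_id)
                self.tasks.pop(task_id, None)

    def available_cpus(self):
        return self.total_cpus - sum(self.tasks.values())
```

With this ordering, a stuck executor shutdown leaves the resources accounted 
for on the master, so no new workloads are scheduled against them.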



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-9123) Expose quota consumption metrics.

2019-08-14 Thread Benjamin Mahler (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907416#comment-16907416 ]

Benjamin Mahler commented on MESOS-9123:


An alternative approach to consider (rather than using the role tree in the 
master) is to enhance the allocator interface to let the allocator know when 
resources transition from offered to allocated, which would enable the 
allocator to expose quota consumption.
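
As a rough illustration of such an interface, here is a toy allocator that is 
told when offered resources become allocated. The `ToyAllocator` class and the 
`offer_accepted` hook are hypothetical and are not part of the Mesos allocator 
interface. Consumption follows the definition in the issue description 
(allocated quota plus unallocated reservations), and for simplicity the sketch 
ignores the case where reserved resources are themselves allocated.

```python
from collections import defaultdict


class ToyAllocator:
    """Toy allocator (hypothetical) tracking per-role CPU quantities."""

    def __init__(self):
        self.offered = defaultdict(float)
        self.allocated = defaultdict(float)
        self.unallocated_reservations = defaultdict(float)

    def reserve(self, role, cpus):
        self.unallocated_reservations[role] += cpus

    def offer(self, role, cpus):
        self.offered[role] += cpus

    def offer_accepted(self, role, cpus):
        # Assumed hook: the master notifies the allocator that offered
        # resources have transitioned to allocated.
        if cpus > self.offered[role]:
            raise ValueError("cannot accept more than was offered")
        self.offered[role] -= cpus
        self.allocated[role] += cpus

    def consumed(self, role):
        # Consumed = allocated + unallocated reservations.
        return self.allocated[role] + self.unallocated_reservations[role]
```

Resources that are merely offered do not count as consumed here. Only the 
transition reported through the hook moves them into the consumed quantity.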

> Expose quota consumption metrics.
> -
>
> Key: MESOS-9123
> URL: https://issues.apache.org/jira/browse/MESOS-9123
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Meng Zhu
>Assignee: Meng Zhu
>Priority: Major
>  Labels: allocator, mesosphere, metrics, resource-management
>
> Currently, quota-related metrics expose the quota guarantee and allocated 
> quota. We should also expose "consumed", which is allocated quota plus 
> unallocated reservations. We already have this info in the allocator as 
> `consumedQuotaScalarQuantities`; we just need to expose it.


