Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Klaus Ma
@Tom, one more question: what is your task run time? If tasks are very
short-lived, e.g. 100ms, their resources are returned to the allocator as
soon as each task finishes and are not offered out again until the next
allocation cycle.
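
For reference, one quick way to see which allocation interval a master is
actually running with is to query its state endpoint; a minimal sketch,
assuming jq is installed and that the master exposes its flags under
`flags` (field names can vary slightly across Mesos versions):

    curl -s "http://master-ip:5050/master/state.json" | jq '.flags.allocation_interval'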


Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Tue, Feb 23, 2016 at 10:25 AM, Guangya Liu  wrote:

> Hi Tom,
>
> I saw that the two frameworks with roles is consuming most of the
> resources, so I think that you can do more test by removing the two
> frameworks with roles.
>
> Another I want to mention is that the DRF allocator may have some issues
> when there are plenty of frameworks and the community is trying to improve
> this by some projects, such as 'Optimistic Offer MESOS-1607', 'Quota
> Enhancement MESOS-1791' etc.
>
> The issues for allocator include the following etc:
> https://issues.apache.org/jira/browse/MESOS-4302
> https://issues.apache.org/jira/browse/MESOS-3202 << You may take a look
> at this one in detail.
> https://issues.apache.org/jira/browse/MESOS-3078
>
> Hope this helps.
>
> Thanks,
>
> Guangya
>
>
> On Tue, Feb 23, 2016 at 1:53 AM, Tom Arnfeld  wrote:
>
>> Hi Guangya,
>>
>> Most of the agents do not have a role, so they use the default wildcard
>> role for resources. Also none of the frameworks have a role, therefore they
>> fall into the wildcard role too.
>>
>> Frameworks are being offered resources *up to a certain level of
>> fairness* but no further. The issue appears to be inside the allocator,
>> relating to how it is deciding how many resources each framework should get
>> within the role (wildcard ‘*') in relation to fairness.
>>
>> We seem to have circumvented the problem in the allocator by creating two 
>> *completely
>> new* roles and putting *one framework in each*. No agents have this role
>> assigned to any resources, but by doing this we seem to have got around the
>> bug in the allocator that’s causing strange fairness allocations, resulting
>> in no offers being sent.
>>
>> I’m going to look into defining a reproducible test case for this
>> scheduling situation to coax the allocator into behaving this way in a test
>> environment.
>>
>> Tom.
>>
>> On 22 Feb 2016, at 15:39, Guangya Liu  wrote:
>>
>> If non of the framework has role, then no framework can consume reserved
>> resources, so I think that at least the framework
>> 20160219-164457-67375276-5050-28802-0014 and
>> 20160219-164457-67375276-5050-28802-0015 should have role.
>>
>> Can you please show some detail for the following:
>> 1) Master start command or master http endpoint for flags.
>> 2) All slave start command or slave http endpoint for flags
>> 3) master http endpoint for state
>>
>> Thanks,
>>
>> Guangya
>>
>> On Mon, Feb 22, 2016 at 10:57 PM, Tom Arnfeld  wrote:
>>
>>> Ah yes sorry my mistake, there are a couple of agents with a *dev* role
>>> and only one or two frameworks connect to the cluster with that role, but
>>> not very often. Whether they’re connected or not doesn’t seem to cause any
>>> change in allocation behaviour.
>>>
>>> No other agents have roles.
>>>
>>> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
>>> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
>>> 20160112-174949-84152492-5050-19807-S316 to framework
>>> 20160219-164457-67375276-5050-28802-0014
>>>
>>> This agent should have another 9.5 cpus reserved by some role and no
>>> framework is configured using resources from this role, thus the resources
>>> on this role are wasting.  I think that the following agent may also have
>>> some reserved resources configured:
>>> 20160112-174949-84152492-5050-19807-S317,
>>> 20160112-174949-84152492-5050-19807-S322 and even more agents.
>>>
>>>
>>> I don’t think that’s correct, this is likely to be an offer for a slave
>>> where 9CPUs are currently allocated to an executor.
>>>
>>> I can verify via the agent configuration and HTTP endpoints that most of
>>> the agents do not have a role, and none of the frameworks do.
>>>
>>> Tom.
>>>
>>> On 22 Feb 2016, at 14:09, Guangya Liu  wrote:
>>>
>>> Hi Tom,
>>>
>>> I think that your cluster should have some role, weight configuration
>>> because I can see there are at least two agent has role with "dev"
>>> configured.
>>>
>>> 56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered
>>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>>> slave 20160112-165226-67375276-5050-22401-S300 for framework
>>> 20160219-164457-67375276-5050-28802-0015
>>> 57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating
>>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>>> slave 20160112-165226-67375276-5050-22401-S300 to framework
>>> 20160219-164457-67375276-5050-28802-0014
>>> 58 1365 I0219 18:08:26.286725 

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Guangya Liu
Hi Tom,

I saw that the two frameworks with roles are consuming most of the
resources, so I think you could run further tests after removing those two
frameworks.

Another thing I want to mention is that the DRF allocator may have some
issues when there are many frameworks; the community is trying to improve
this through projects such as 'Optimistic Offer' (MESOS-1607) and 'Quota
Enhancement' (MESOS-1791).

Known allocator issues include the following, among others:
https://issues.apache.org/jira/browse/MESOS-4302
https://issues.apache.org/jira/browse/MESOS-3202 << you may want to take a
look at this one in detail.
https://issues.apache.org/jira/browse/MESOS-3078

Hope this helps.

Thanks,

Guangya


On Tue, Feb 23, 2016 at 1:53 AM, Tom Arnfeld  wrote:

> Hi Guangya,
>
> Most of the agents do not have a role, so they use the default wildcard
> role for resources. Also none of the frameworks have a role, therefore they
> fall into the wildcard role too.
>
> Frameworks are being offered resources *up to a certain level of fairness* but
> no further. The issue appears to be inside the allocator, relating to how
> it is deciding how many resources each framework should get within the role
> (wildcard ‘*') in relation to fairness.
>
> We seem to have circumvented the problem in the allocator by creating two 
> *completely
> new* roles and putting *one framework in each*. No agents have this role
> assigned to any resources, but by doing this we seem to have got around the
> bug in the allocator that’s causing strange fairness allocations, resulting
> in no offers being sent.
>
> I’m going to look into defining a reproducible test case for this
> scheduling situation to coax the allocator into behaving this way in a test
> environment.
>
> Tom.
>
> On 22 Feb 2016, at 15:39, Guangya Liu  wrote:
>
> If non of the framework has role, then no framework can consume reserved
> resources, so I think that at least the framework
> 20160219-164457-67375276-5050-28802-0014 and
> 20160219-164457-67375276-5050-28802-0015 should have role.
>
> Can you please show some detail for the following:
> 1) Master start command or master http endpoint for flags.
> 2) All slave start command or slave http endpoint for flags
> 3) master http endpoint for state
>
> Thanks,
>
> Guangya
>
> On Mon, Feb 22, 2016 at 10:57 PM, Tom Arnfeld  wrote:
>
>> Ah yes sorry my mistake, there are a couple of agents with a *dev* role
>> and only one or two frameworks connect to the cluster with that role, but
>> not very often. Whether they’re connected or not doesn’t seem to cause any
>> change in allocation behaviour.
>>
>> No other agents have roles.
>>
>> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
>> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
>> 20160112-174949-84152492-5050-19807-S316 to framework
>> 20160219-164457-67375276-5050-28802-0014
>>
>> This agent should have another 9.5 cpus reserved by some role and no
>> framework is configured using resources from this role, thus the resources
>> on this role are wasting.  I think that the following agent may also have
>> some reserved resources configured:
>> 20160112-174949-84152492-5050-19807-S317,
>> 20160112-174949-84152492-5050-19807-S322 and even more agents.
>>
>>
>> I don’t think that’s correct, this is likely to be an offer for a slave
>> where 9CPUs are currently allocated to an executor.
>>
>> I can verify via the agent configuration and HTTP endpoints that most of
>> the agents do not have a role, and none of the frameworks do.
>>
>> Tom.
>>
>> On 22 Feb 2016, at 14:09, Guangya Liu  wrote:
>>
>> Hi Tom,
>>
>> I think that your cluster should have some role, weight configuration
>> because I can see there are at least two agent has role with "dev"
>> configured.
>>
>> 56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>> slave 20160112-165226-67375276-5050-22401-S300 for framework
>> 20160219-164457-67375276-5050-28802-0015
>> 57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>> slave 20160112-165226-67375276-5050-22401-S300 to framework
>> 20160219-164457-67375276-5050-28802-0014
>> 58 1365 I0219 18:08:26.286725 28810 hierarchical.hpp:1025] Filtered
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>> slave 20160112-165226-67375276-5050-22401-S303 for framework
>> 20160219-164457-67375276-5050-28802-0015
>> 59 1366 I0219 18:08:26.286875 28810 hierarchical.hpp:941] Allocating
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>> slave 20160112-165226-67375276-5050-22401-S303 to framework
>> 20160219-164457-67375276-5050-28802-0014
>>
>> Also I think that the framework 20160219-164457-67375276-5050-28802-0014
>> and 

Re: Safe update of agent attributes

2016-02-22 Thread Adam Bordelon
Currently, changing any --attributes or --resources requires draining the
agent and killing all running tasks.
See https://issues.apache.org/jira/browse/MESOS-1739
You could do a `mesos-slave --recover=cleanup`, which essentially kills all
the tasks and clears the work_dir, and then restart the agent with
`mesos-slave --attributes=new_attributes`.
Note that even adding a new attribute is the kind of change that could
cause a framework scheduler to no longer want its task on that node. For
example, if you add "public_ip=true", my scheduler may no longer want to
run private tasks there. As such, any attribute change needs to be
communicated to all schedulers.
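
A rough sketch of that sequence, for concreteness (hedged: the exact flag
names, e.g. --recover=cleanup, and the example attribute values should be
checked against your Mesos version's agent docs; paths are illustrative):

    # 1) Stop placing new work on the agent, then clean up the old run
    #    (kills the agent's tasks and clears its checkpointed state):
    mesos-slave --recover=cleanup --work_dir=/var/lib/mesos

    # 2) Restart the agent with the new attribute set:
    mesos-slave --attributes="rack:r1;public_ip:true" --work_dir=/var/lib/mesos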


On Mon, Feb 22, 2016 at 2:01 PM, Marco Massenzio 
wrote:

> IIRC you can avoid the issue by either using a different work_dir for the
> agent, or removing (and, possibly, re-creating) it.
>
> I'm afraid I don't have a running instance of Mesos on this machine and
> can't test it out.
>
> Also (and this is strictly my opinion :) I would consider a change of
> attribute a "material" change for the Agent and I would avoid trying to
> recover state from previous runs; but, again, there may be perfectly
> legitimate cases in which this is desirable.
>
> --
> *Marco Massenzio*
> http://codetrips.com
>
> On Mon, Feb 22, 2016 at 12:11 PM, Zhitao Li  wrote:
>
>> Hi,
>>
>> We recently discovered that updating attributes on Mesos agents is a very
>> risk operation, and has a potential to send agent(s) into a crash loop if
>> not done properly with errors like "Failed to perform recovery:
>> Incompatible slave info detected". This combined with --recovery_timeout
>> made the situation even worse.
>>
>> In our setup, some of the attributes are generated from automated
>> configuration management system, so this opens a possibility that "bad"
>> configuration could be left on the machine and causing big trouble on next
>> agent upgrade, if the USR1 signal was not sent on time.
>>
>> Some questions:
>>
>> 1. Does anyone have a good practice recommended on managing these
>> attributes safely?
>> 2. Has Mesos considered to fallback to old metadata if it detects
>> incompatibility, so agents would keep running with old attributes instead
>> of falling into crash loop?
>>
>> Thanks.
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


[RESULT][VOTE] Release Apache Mesos 0.27.1 (rc1)

2016-02-22 Thread Michael Park
Hi all,

The vote for Mesos 0.27.1 (rc1) has passed with the
following votes.

+1 (Binding)
--
Bernd Mathiske
Joris Van Remoortere
Vinod Kone

+1 (Non-binding)
--
Zhitao Li
Jörg Schad

There were no 0 or -1 votes.

Please find the release at:
https://dist.apache.org/repos/dist/release/mesos/0.27.1

It is recommended to use a mirror to download the release:
http://www.apache.org/dyn/closer.cgi

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.27.1

The mesos-0.27.1.jar has been released to:
https://repository.apache.org

The website (http://mesos.apache.org) will be updated shortly to reflect
this release.

Thanks,

Joris, MPark


Re: Safe update of agent attributes

2016-02-22 Thread Marco Massenzio
IIRC you can avoid the issue by either using a different work_dir for the
agent, or removing (and, possibly, re-creating) it.

I'm afraid I don't have a running instance of Mesos on this machine and
can't test it out.

Also (and this is strictly my opinion :) I would consider a change of
attribute a "material" change for the Agent and I would avoid trying to
recover state from previous runs; but, again, there may be perfectly
legitimate cases in which this is desirable.

-- 
*Marco Massenzio*
http://codetrips.com

On Mon, Feb 22, 2016 at 12:11 PM, Zhitao Li  wrote:

> Hi,
>
> We recently discovered that updating attributes on Mesos agents is a very
> risk operation, and has a potential to send agent(s) into a crash loop if
> not done properly with errors like "Failed to perform recovery:
> Incompatible slave info detected". This combined with --recovery_timeout
> made the situation even worse.
>
> In our setup, some of the attributes are generated from automated
> configuration management system, so this opens a possibility that "bad"
> configuration could be left on the machine and causing big trouble on next
> agent upgrade, if the USR1 signal was not sent on time.
>
> Some questions:
>
> 1. Does anyone have a good practice recommended on managing these
> attributes safely?
> 2. Has Mesos considered to fallback to old metadata if it detects
> incompatibility, so agents would keep running with old attributes instead
> of falling into crash loop?
>
> Thanks.
>
> --
> Cheers,
>
> Zhitao Li
>


Re: Reusing Task IDs

2016-02-22 Thread Erik Weathers
Thanks for the responses.  Filed a ticket for this:

   - https://issues.apache.org/jira/browse/MESOS-4737

- Erik

On Mon, Feb 22, 2016 at 1:23 PM, Sargun Dhillon  wrote:

> As someone who has been there and back again (Reusing task-IDs, and
> realizing it's a terrible idea), I'd put some advise in the docs +
> mesos.proto to compose task IDs from GUIDs, and add that it's
> dangerous to reuse them.
>
> I would advocate for a mechanism to prevent the usage of non-unique
> IDs for executors, tasks, and frameworks, but I feel that's a more
> complex, and larger problem.
>
> On Mon, Feb 22, 2016 at 1:19 PM, Vinod Kone  wrote:
> > I would vote for updating comments in mesos.proto to warn users to not
> > re-use task id for now.
> >
> > On Sun, Feb 21, 2016 at 9:05 PM, Klaus Ma 
> wrote:
> >>
> >> Yes, it's dangerous to reuse TaskID; there's a JIRA (MESOS-3070) that
> >> Master'll crash when Master failover with duplicated TaskID.
> >>
> >> Here's the case of MESOS-3070:
> >> T1: launch task (t1) on agent (agent_1)
> >> T2: master failover
> >> T3: launch another task (t1) on agent (agent_2) before agent_1
> >> re-registering back
> >> T4: agent_1 re-registered back; master'll crash because of `CHECK` when
> >> add task (t1) back to master
> >>
> >> Is there any special case that framework has to re-use the TaskID; if no
> >> special case, I think we should ask framework to avoid reuse TaskID.
> >>
> >> 
> >> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> >> Platform OpenSource Technology, STG, IBM GCG
> >> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
> >>
> >> On Mon, Feb 22, 2016 at 12:24 PM, Erik Weathers 
> >> wrote:
> >>>
> >>> tldr; Reusing TaskIDs clashes with the mesos-agent recovery feature.
> >>>
> >>> Adam Bordelon wrote:
> >>> > Reusing taskIds may work if you're guaranteed to never be running two
> >>> > instances of the same taskId simultaneously
> >>>
> >>> I've encountered another scenario where reusing TaskIDs is dangerous,
> >>> even if you meet the guarantee of never running 2 task instances with
> the
> >>> same TaskID simultaneously.
> >>>
> >>> Scenario leading to a problem:
> >>>
> >>> Say you have a task with ID "T1", which terminates for some reason, so
> >>> its terminal status update gets recorded into the agent's current
> "run" in
> >>> the task's updates file:
> >>>
> >>>
> >>>
> MESOS_WORK_DIR/meta/slaves/latest/frameworks/FRAMEWORK_ID/executors/EXECUTOR_ID/runs/latest/tasks/T1/task.updates
> >>>
> >>> Then say a new task is launched with the same ID of T1, and it gets
> >>> scheduled under the same Executor on the same agent host. In that
> case, the
> >>> task will be reusing the same work_dir path, and thus have the already
> >>> recorded "terminal status update" in its task.updates file.  So the
> updates
> >>> file has a stream of updates that might look like this:
> >>>
> >>> TASK_RUNNING
> >>> TASK_FINISHED
> >>> TASK_RUNNING
> >>>
> >>> Say you subsequently restart the mesos-slave/agent, expecting all tasks
> >>> to survive the restart via the recovery process.  Unfortunately, T1 is
> >>> terminated because the task recovery logic [1] looks at the current
> run's
> >>> tasks' task.updates files, searching for tasks with "terminal status
> >>> updates", and then terminating any such tasks.  So, even though T1 was
> >>> actually running just fine, it gets terminated because at some point
> in its
> >>> previous incarnation it had a "terminal status update" recorded.
> >>>
> >>> Leads to inconsistent state
> >>>
> >>> Compounding the problem, this termination is done without informing the
> >>> Executor, and thus the process underlying the task continues to run,
> even
> >>> though mesos thinks it's gone.  Which is really bad since it leaves
> the host
> >>> with a different state than mesos thinks exists. e.g., if the task had
> a
> >>> port resource, then mesos incorrectly thinks the port is now free, so a
> >>> framework might try to launch a task/executor that uses the port, but
> it
> >>> will fail because the process cannot bind to the port.
> >>>
> >>> Change recovery code or just update comments in mesos.proto?
> >>>
> >>> Perhaps this behavior could be considered a "bug" and the recovery
> logic
> >>> that processes tasks status updates could be modified to ignore
> "terminal
> >>> status updates" if there is a subsequent TASK_RUNNING update in the
> >>> task.updates file.  If that sounds like a desirable change, I'm happy
> to
> >>> file a JIRA issue for that and work on the fix myself.
> >>>
> >>> If we think the recovery logic is fine as it is, then we should update
> >>> these comments [2] in mesos.proto since they are incorrect given the
> >>> behavior I just encountered:
> >>>
>  A framework generated ID to distinguish a task. The ID must remain
>  unique while the task is active. However, a framework can reuse an
>  ID _only_ if a 

Re: Reusing Task IDs

2016-02-22 Thread Sargun Dhillon
As someone who has been there and back again (reusing task IDs, and
realizing it's a terrible idea), I'd put some advice in the docs and
mesos.proto to compose task IDs from GUIDs, and add that it's
dangerous to reuse them.

I would advocate for a mechanism to prevent the use of non-unique
IDs for executors, tasks, and frameworks, but I feel that's a more
complex and larger problem.
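
For what it's worth, composing such an ID is a one-liner; a sketch (the
name prefix here is made up purely for illustration):

    # A stable, human-readable prefix plus a GUID, so the ID is unique on
    # every launch even when the same logical worker is relaunched:
    TASK_ID="mytopology-worker-3-$(uuidgen)"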

On Mon, Feb 22, 2016 at 1:19 PM, Vinod Kone  wrote:
> I would vote for updating comments in mesos.proto to warn users to not
> re-use task id for now.
>
> On Sun, Feb 21, 2016 at 9:05 PM, Klaus Ma  wrote:
>>
>> Yes, it's dangerous to reuse TaskID; there's a JIRA (MESOS-3070) that
>> Master'll crash when Master failover with duplicated TaskID.
>>
>> Here's the case of MESOS-3070:
>> T1: launch task (t1) on agent (agent_1)
>> T2: master failover
>> T3: launch another task (t1) on agent (agent_2) before agent_1
>> re-registering back
>> T4: agent_1 re-registered back; master'll crash because of `CHECK` when
>> add task (t1) back to master
>>
>> Is there any special case that framework has to re-use the TaskID; if no
>> special case, I think we should ask framework to avoid reuse TaskID.
>>
>> 
>> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
>> Platform OpenSource Technology, STG, IBM GCG
>> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>>
>> On Mon, Feb 22, 2016 at 12:24 PM, Erik Weathers 
>> wrote:
>>>
>>> tldr; Reusing TaskIDs clashes with the mesos-agent recovery feature.
>>>
>>> Adam Bordelon wrote:
>>> > Reusing taskIds may work if you're guaranteed to never be running two
>>> > instances of the same taskId simultaneously
>>>
>>> I've encountered another scenario where reusing TaskIDs is dangerous,
>>> even if you meet the guarantee of never running 2 task instances with the
>>> same TaskID simultaneously.
>>>
>>> Scenario leading to a problem:
>>>
>>> Say you have a task with ID "T1", which terminates for some reason, so
>>> its terminal status update gets recorded into the agent's current "run" in
>>> the task's updates file:
>>>
>>>
>>> MESOS_WORK_DIR/meta/slaves/latest/frameworks/FRAMEWORK_ID/executors/EXECUTOR_ID/runs/latest/tasks/T1/task.updates
>>>
>>> Then say a new task is launched with the same ID of T1, and it gets
>>> scheduled under the same Executor on the same agent host. In that case, the
>>> task will be reusing the same work_dir path, and thus have the already
>>> recorded "terminal status update" in its task.updates file.  So the updates
>>> file has a stream of updates that might look like this:
>>>
>>> TASK_RUNNING
>>> TASK_FINISHED
>>> TASK_RUNNING
>>>
>>> Say you subsequently restart the mesos-slave/agent, expecting all tasks
>>> to survive the restart via the recovery process.  Unfortunately, T1 is
>>> terminated because the task recovery logic [1] looks at the current run's
>>> tasks' task.updates files, searching for tasks with "terminal status
>>> updates", and then terminating any such tasks.  So, even though T1 was
>>> actually running just fine, it gets terminated because at some point in its
>>> previous incarnation it had a "terminal status update" recorded.
>>>
>>> Leads to inconsistent state
>>>
>>> Compounding the problem, this termination is done without informing the
>>> Executor, and thus the process underlying the task continues to run, even
>>> though mesos thinks it's gone.  Which is really bad since it leaves the host
>>> with a different state than mesos thinks exists. e.g., if the task had a
>>> port resource, then mesos incorrectly thinks the port is now free, so a
>>> framework might try to launch a task/executor that uses the port, but it
>>> will fail because the process cannot bind to the port.
>>>
>>> Change recovery code or just update comments in mesos.proto?
>>>
>>> Perhaps this behavior could be considered a "bug" and the recovery logic
>>> that processes tasks status updates could be modified to ignore "terminal
>>> status updates" if there is a subsequent TASK_RUNNING update in the
>>> task.updates file.  If that sounds like a desirable change, I'm happy to
>>> file a JIRA issue for that and work on the fix myself.
>>>
>>> If we think the recovery logic is fine as it is, then we should update
>>> these comments [2] in mesos.proto since they are incorrect given the
>>> behavior I just encountered:
>>>
 A framework generated ID to distinguish a task. The ID must remain
 unique while the task is active. However, a framework can reuse an
 ID _only_ if a previous task with the same ID has reached a
 terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
>>>
>>>
>>> Conclusion
>>>
>>> It is dangerous indeed to reuse a TaskID for separate task runs, even if
>>> they are guaranteed to not be running concurrently.
>>>
>>> - Erik
>>>
>>>
>>> P.S., I encountered this problem while trying to use mesos-agent recovery
>>> with the storm-mesos framework [3].  

Re: Reusing Task IDs

2016-02-22 Thread Vinod Kone
I would vote for updating the comments in mesos.proto to warn users not to
re-use task IDs, for now.

On Sun, Feb 21, 2016 at 9:05 PM, Klaus Ma  wrote:

> Yes, it's dangerous to reuse TaskID; there's a JIRA (MESOS-3070) that
> Master'll crash when Master failover with duplicated TaskID.
>
> Here's the case of *MESOS-3070*:
> T1: launch task (t1) on agent (agent_1)
> T2: master failover
> T3: launch another task (t1) on agent (agent_2) before agent_1
> re-registering back
> T4: agent_1 re-registered back; master'll crash because of `CHECK` when
> add task (t1) back to master
>
> Is there any special case that framework has to re-use the TaskID; if no
> special case, I think we should ask framework to avoid reuse TaskID.
>
> 
> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> Platform OpenSource Technology, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>
> On Mon, Feb 22, 2016 at 12:24 PM, Erik Weathers 
> wrote:
>
>> tldr; *Reusing TaskIDs clashes with the mesos-agent recovery feature.*
>>
>> Adam Bordelon wrote:
>> > Reusing taskIds may work if you're guaranteed to never be running two
>> instances of the same taskId simultaneously
>>
>> I've encountered another scenario where reusing TaskIDs is dangerous,
>> even if you meet the guarantee of never running 2 task instances with the
>> same TaskID simultaneously.
>>
>> *Scenario leading to a problem:*
>>
>> Say you have a task with ID "T1", which terminates for some reason, so
>> its terminal status update gets recorded into the agent's current "run" in
>> the task's updates file:
>>
>>
>> MESOS_WORK_DIR/meta/slaves/latest/frameworks/FRAMEWORK_ID/executors/EXECUTOR_ID/runs/latest/tasks/T1/task.updates
>>
>> Then say a new task is launched with the same ID of T1, and it gets
>> scheduled under the same Executor on the same agent host. In that case, the
>> task will be reusing the same work_dir path, and thus have the already
>> recorded "terminal status update" in its task.updates file.  So the updates
>> file has a stream of updates that might look like this:
>>
>>- TASK_RUNNING
>>- TASK_FINISHED
>>- TASK_RUNNING
>>
>> Say you subsequently restart the mesos-slave/agent, expecting all tasks
>> to survive the restart via the recovery process.  Unfortunately, T1 is
>> terminated because the task recovery logic
>> 
>>  [1]
>> looks at the current run's tasks' task.updates files, searching for tasks
>> with "terminal status updates", and then terminating any such tasks.  So,
>> even though T1 was actually running just fine, it gets terminated because
>> at some point in its previous incarnation it had a "terminal status update"
>> recorded.
>>
>> *Leads to inconsistent state*
>>
>> Compounding the problem, this termination is done without informing the
>> Executor, and thus the process underlying the task continues to run, even
>> though mesos thinks it's gone.  Which is really bad since it leaves the
>> host with a different state than mesos thinks exists. e.g., if the task had
>> a port resource, then mesos incorrectly thinks the port is now free, so a
>> framework might try to launch a task/executor that uses the port, but it
>> will fail because the process cannot bind to the port.
>>
>> *Change recovery code or just update comments in mesos.proto?*
>>
>> Perhaps this behavior could be considered a "bug" and the recovery logic
>> that processes tasks status updates could be modified to ignore "terminal
>> status updates" if there is a subsequent TASK_RUNNING update in the
>> task.updates file.  If that sounds like a desirable change, I'm happy to
>> file a JIRA issue for that and work on the fix myself.
>>
>> If we think the recovery logic is fine as it is, then we should update these
>> comments
>> 
>>  [2]
>> in mesos.proto since they are incorrect given the behavior I just
>> encountered:
>>
>> A framework generated ID to distinguish a task. The ID must remain
>>> unique while the task is active. However, a framework can reuse an
>>> ID _only_ if a previous task with the same ID has reached a
>>> terminal state (e.g., TASK_FINISHED, TASK_LOST, TASK_KILLED, etc.).
>>
>>
>> *Conclusion*
>>
>> It is dangerous indeed to reuse a TaskID for separate task runs, even if
>> they are guaranteed to not be running concurrently.
>>
>> - Erik
>>
>>
>> P.S., I encountered this problem while trying to use mesos-agent recovery
>> with the storm-mesos framework  [3].
>> Notably, this framework sets the TaskID to
>> "-" for the storm worker tasks, so when a
>> storm worker dies and is reborn on that host, the TaskID gets reused.  But
>> then the task doesn't survive an agent restart (even though the worker
>> *process* does survive, putting us in an inconsistent state!).
>>
>> P.P.S., being able 

Re: Safe update of agent attributes

2016-02-22 Thread Zameer Manji
Zhitao,

In my experience the best way to manage these attributes is to keep each
change minimal (i.e. one attribute at a time) and roll it out slowly across
the cluster. That way you can catch unsafe mutations quickly and roll back
if needed.

I don't think there is a whitelist/blacklist of attributes to reference so
I think this is the safest way to go.
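
One quick sanity check during such a rollout is to compare the attributes
an agent is actually advertising before and after the change; a minimal
sketch, assuming jq and the default agent port (the exact endpoint and
field names may differ slightly by Mesos version):

    curl -s "http://agent-host:5051/state.json" | jq '.attributes'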

On Mon, Feb 22, 2016 at 12:11 PM, Zhitao Li  wrote:

> Hi,
>
> We recently discovered that updating attributes on Mesos agents is a very
> risk operation, and has a potential to send agent(s) into a crash loop if
> not done properly with errors like "Failed to perform recovery:
> Incompatible slave info detected". This combined with --recovery_timeout
> made the situation even worse.
>
> In our setup, some of the attributes are generated from automated
> configuration management system, so this opens a possibility that "bad"
> configuration could be left on the machine and causing big trouble on next
> agent upgrade, if the USR1 signal was not sent on time.
>
> Some questions:
>
> 1. Does anyone have a good practice recommended on managing these
> attributes safely?
> 2. Has Mesos considered to fallback to old metadata if it detects
> incompatibility, so agents would keep running with old attributes instead
> of falling into crash loop?
>
> Thanks.
>
> --
> Cheers,
>
> Zhitao Li
>
> --
> Zameer Manji
>
>


Safe update of agent attributes

2016-02-22 Thread Zhitao Li
Hi,

We recently discovered that updating attributes on Mesos agents is a very
risky operation, with the potential to send agent(s) into a crash loop if
not done properly, producing errors like "Failed to perform recovery:
Incompatible slave info detected". Combined with --recovery_timeout, this
made the situation even worse.

In our setup, some of the attributes are generated by an automated
configuration management system, so there is a possibility that a "bad"
configuration could be left on the machine and cause big trouble on the
next agent upgrade if the USR1 signal was not sent in time.

Some questions:

1. Does anyone have recommended good practices for managing these
attributes safely?
2. Has Mesos considered falling back to the old metadata if it detects an
incompatibility, so agents would keep running with the old attributes
instead of falling into a crash loop?

Thanks.

-- 
Cheers,

Zhitao Li


[proposal] Generalized Authorized Interface

2016-02-22 Thread Alexander Rojas
Hey guys,

After some extra thought, we came up with what we think is a nice interface for
the Mesos authorizer [1], which will allow users of Mesos to plug in their own
custom backends cleanly. Please share your thoughts with us in case we missed
something or there are improvements we can make to the interface.

[1] 
https://docs.google.com/document/d/1gCR6fpD_1wKbVUtj6iP2sqsARtMfgTz5YJpJ1nd7zBA/

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Tom Arnfeld
Hi Guangya,

Most of the agents do not have a role, so they use the default wildcard role 
for resources. Also none of the frameworks have a role, therefore they fall 
into the wildcard role too.

Frameworks are being offered resources up to a certain level of fairness but
no further. The issue appears to be inside the allocator, in how it decides how
many resources each framework should get within the role (wildcard '*') with
respect to fairness.

We seem to have circumvented the problem in the allocator by creating two
completely new roles and putting one framework in each. No agents have these
roles assigned to any resources, but by doing this we seem to have worked around
the bug in the allocator that’s causing strange fairness allocations, resulting
in no offers being sent.
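
In case it helps anyone else reproduce the workaround, its shape is roughly
the following (a sketch only: the role names are invented, and the flags
assume a Mesos 0.27-era master where roles still need to be whitelisted):

    # Whitelist the two new roles on the master (alongside your existing flags):
    mesos-master --roles=framework_a,framework_b ...

    # Each scheduler then registers under its own role by setting the `role`
    # field of its FrameworkInfo (e.g. "framework_a") before starting the driver.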

I’m going to look into defining a reproducible test case for this scheduling 
situation to coax the allocator into behaving this way in a test environment.

Tom.

> On 22 Feb 2016, at 15:39, Guangya Liu  wrote:
> 
> If non of the framework has role, then no framework can consume reserved 
> resources, so I think that at least the framework 
> 20160219-164457-67375276-5050-28802-0014 and 
> 20160219-164457-67375276-5050-28802-0015 should have role.
> 
> Can you please show some detail for the following:
> 1) Master start command or master http endpoint for flags.
> 2) All slave start command or slave http endpoint for flags
> 3) master http endpoint for state 
> 
> Thanks,
> 
> Guangya
> 
> On Mon, Feb 22, 2016 at 10:57 PM, Tom Arnfeld  > wrote:
> Ah yes sorry my mistake, there are a couple of agents with a dev role and 
> only one or two frameworks connect to the cluster with that role, but not 
> very often. Whether they’re connected or not doesn’t seem to cause any change 
> in allocation behaviour.
> 
> No other agents have roles.
> 
>> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating 
>> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave 
>> 20160112-174949-84152492-5050-19807-S316 to framework 
>> 20160219-164457-67375276-5050-28802-0014
>> 
>> This agent should have another 9.5 cpus reserved by some role and no 
>> framework is configured using resources from this role, thus the resources 
>> on this role are wasting.  I think that the following agent may also have 
>> some reserved resources configured: 
>> 20160112-174949-84152492-5050-19807-S317, 
>> 20160112-174949-84152492-5050-19807-S322 and even more agents.
> 
> 
> I don’t think that’s correct, this is likely to be an offer for a slave where 
> 9CPUs are currently allocated to an executor.
> 
> I can verify via the agent configuration and HTTP endpoints that most of the 
> agents do not have a role, and none of the frameworks do.
> 
> Tom.
> 
>> On 22 Feb 2016, at 14:09, Guangya Liu > > wrote:
>> 
>> Hi Tom,
>> 
>> I think that your cluster should have some role, weight configuration 
>> because I can see there are at least two agent has role with "dev" 
>> configured.
>> 
>> 56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered 
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
>> slave 20160112-165226-67375276-5050-22401-S300 for framework 
>> 20160219-164457-67375276-5050-28802-0015
>> 57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating 
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
>> slave 20160112-165226-67375276-5050-22401-S300 to framework 
>> 20160219-164457-67375276-5050-28802-0014
>> 58 1365 I0219 18:08:26.286725 28810 hierarchical.hpp:1025] Filtered 
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
>> slave 20160112-165226-67375276-5050-22401-S303 for framework 
>> 20160219-164457-67375276-5050-28802-0015
>> 59 1366 I0219 18:08:26.286875 28810 hierarchical.hpp:941] Allocating 
>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
>> slave 20160112-165226-67375276-5050-22401-S303 to framework 
>> 20160219-164457-67375276-5050-28802-0014
>> 
>> Also I think that the framework 20160219-164457-67375276-5050-28802-0014 and 
>> 20160219-164457-67375276-5050-28802-0015 may have a high weight cause I saw 
>> that framework  20160219-164457-67375276-5050-28802-0014 get 26 agents at 
>> 18:08:26.
>> 
>> Another is that some other agents may also have role configured but no 
>> frameworks are configured with the agent role and this caused some agents 
>> have some static reserved resources cannot be allocated.
>> 
>> I searched 20160112-174949-84152492-5050-19807-S316 in the log and found 
>> that it was allocating the following resources to frameworks:
>> 
>> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating 
>> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave 
>> 20160112-174949-84152492-5050-19807-S316 to 

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Guangya Liu
If none of the frameworks has a role, then no framework can consume reserved
resources, so I think that at least frameworks
20160219-164457-67375276-5050-28802-0014 and
20160219-164457-67375276-5050-28802-0015 should have a role.

Can you please share some details on the following:
1) The master start command, or the master HTTP endpoint for flags.
2) All slave start commands, or the slave HTTP endpoints for flags.
3) The master HTTP endpoint for state.

Thanks,

Guangya

On Mon, Feb 22, 2016 at 10:57 PM, Tom Arnfeld  wrote:

> Ah yes sorry my mistake, there are a couple of agents with a *dev* role
> and only one or two frameworks connect to the cluster with that role, but
> not very often. Whether they’re connected or not doesn’t seem to cause any
> change in allocation behaviour.
>
> No other agents have roles.
>
> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
> 20160112-174949-84152492-5050-19807-S316 to framework
> 20160219-164457-67375276-5050-28802-0014
>
> This agent should have another 9.5 cpus reserved by some role and no
> framework is configured using resources from this role, thus the resources
> on this role are wasting.  I think that the following agent may also have
> some reserved resources configured:
> 20160112-174949-84152492-5050-19807-S317,
> 20160112-174949-84152492-5050-19807-S322 and even more agents.
>
>
> I don’t think that’s correct, this is likely to be an offer for a slave
> where 9CPUs are currently allocated to an executor.
>
> I can verify via the agent configuration and HTTP endpoints that most of
> the agents do not have a role, and none of the frameworks do.
>
> Tom.
>
> On 22 Feb 2016, at 14:09, Guangya Liu  wrote:
>
> Hi Tom,
>
> I think that your cluster should have some role, weight configuration
> because I can see there are at least two agent has role with "dev"
> configured.
>
> 56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
> slave 20160112-165226-67375276-5050-22401-S300 for framework
> 20160219-164457-67375276-5050-28802-0015
> 57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
> slave 20160112-165226-67375276-5050-22401-S300 to framework
> 20160219-164457-67375276-5050-28802-0014
> 58 1365 I0219 18:08:26.286725 28810 hierarchical.hpp:1025] Filtered
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
> slave 20160112-165226-67375276-5050-22401-S303 for framework
> 20160219-164457-67375276-5050-28802-0015
> 59 1366 I0219 18:08:26.286875 28810 hierarchical.hpp:941] Allocating
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
> slave 20160112-165226-67375276-5050-22401-S303 to framework
> 20160219-164457-67375276-5050-28802-0014
>
> Also I think that the framework 20160219-164457-67375276-5050-28802-0014
> and 20160219-164457-67375276-5050-28802-0015 may have a high weight cause I
> saw that framework  20160219-164457-67375276-5050-28802-0014 get 26 agents
> at 18:08:26.
>
> Another is that some other agents may also have role configured but no
> frameworks are configured with the agent role and this caused some agents
> have some static reserved resources cannot be allocated.
>
> I searched 20160112-174949-84152492-5050-19807-S316 in the log and found
> that it was allocating the following resources to frameworks:
>
> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
> 20160112-174949-84152492-5050-19807-S316 to framework
> 20160219-164457-67375276-5050-28802-0014
>
> This agent should have another 9.5 cpus reserved by some role and no
> framework is configured using resources from this role, thus the resources
> on this role are wasting.  I think that the following agent may also have
> some reserved resources configured:
> 20160112-174949-84152492-5050-19807-S317,
> 20160112-174949-84152492-5050-19807-S322 and even more agents.
>
> So I would suggest that you check the master and each slave start command
> to see how roles are configured. You can also check this via the command:
> curl "http://master-ip:5050/master/state.json" 2>/dev/null | jq .
> (note the trailing dot in the command) to get all slave resources
> status: reserved, used, total resources etc.
>
> Thanks,
>
> Guangya
>
>
> On Mon, Feb 22, 2016 at 5:16 PM, Tom Arnfeld  wrote:
>
>> No roles, no reservations.
>>
>> We're using the default filter options with all frameworks and default
>> allocation interval.
>>
>> On 21 Feb 2016, at 08:10, Guangya Liu  wrote:
>>
>> Hi Tom,
>>
>> I traced the agent of "20160112-165226-67375276-5050-22401-S199" and
>> found that it is keeps declining by many frameworks: once a 

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Tom Arnfeld
Ah yes, sorry, my mistake: there are a couple of agents with a dev role, and
only one or two frameworks connect to the cluster with that role, but not very
often. Whether they’re connected or not doesn’t seem to cause any change in
allocation behaviour.

No other agents have roles.

> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating 
> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave 
> 20160112-174949-84152492-5050-19807-S316 to framework 
> 20160219-164457-67375276-5050-28802-0014
> 
> This agent should have another 9.5 cpus reserved by some role and no 
> framework is configured using resources from this role, thus the resources on 
> this role are wasting.  I think that the following agent may also have some 
> reserved resources configured: 20160112-174949-84152492-5050-19807-S317, 
> 20160112-174949-84152492-5050-19807-S322 and even more agents.

I don’t think that’s correct; this is likely to be an offer for a slave where
9 CPUs are currently allocated to an executor.

I can verify via the agent configuration and HTTP endpoints that most of the 
agents do not have a role, and none of the frameworks do.

Tom.

> On 22 Feb 2016, at 14:09, Guangya Liu  wrote:
> 
> Hi Tom,
> 
> I think that your cluster should have some role, weight configuration because 
> I can see there are at least two agent has role with "dev" configured.
> 
> 56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered 
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
> slave 20160112-165226-67375276-5050-22401-S300 for framework 
> 20160219-164457-67375276-5050-28802-0015
> 57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating 
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
> slave 20160112-165226-67375276-5050-22401-S300 to framework 
> 20160219-164457-67375276-5050-28802-0014
> 58 1365 I0219 18:08:26.286725 28810 hierarchical.hpp:1025] Filtered 
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
> slave 20160112-165226-67375276-5050-22401-S303 for framework 
> 20160219-164457-67375276-5050-28802-0015
> 59 1366 I0219 18:08:26.286875 28810 hierarchical.hpp:941] Allocating 
> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on 
> slave 20160112-165226-67375276-5050-22401-S303 to framework 
> 20160219-164457-67375276-5050-28802-0014
> 
> Also I think that the framework 20160219-164457-67375276-5050-28802-0014 and 
> 20160219-164457-67375276-5050-28802-0015 may have a high weight cause I saw 
> that framework  20160219-164457-67375276-5050-28802-0014 get 26 agents at 
> 18:08:26.
> 
> Another is that some other agents may also have role configured but no 
> frameworks are configured with the agent role and this caused some agents 
> have some static reserved resources cannot be allocated.
> 
> I searched 20160112-174949-84152492-5050-19807-S316 in the log and found that 
> it was allocating the following resources to frameworks:
> 
> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating 
> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave 
> 20160112-174949-84152492-5050-19807-S316 to framework 
> 20160219-164457-67375276-5050-28802-0014
> 
> This agent should have another 9.5 cpus reserved by some role and no 
> framework is configured using resources from this role, thus the resources on 
> this role are wasting.  I think that the following agent may also have some 
> reserved resources configured: 20160112-174949-84152492-5050-19807-S317, 
> 20160112-174949-84152492-5050-19807-S322 and even more agents.
>  
> So I would suggest that you check the master and each slave start command to
> see how roles are configured. You can also check this via the command:
> curl "http://master-ip:5050/master/state.json" 2>/dev/null | jq .
> (note the trailing dot in the command) to get all slave resources status:
> reserved, used, total resources etc.
> 
> Thanks,
> 
> Guangya
> 
> 
> On Mon, Feb 22, 2016 at 5:16 PM, Tom Arnfeld  > wrote:
> No roles, no reservations.
> 
> We're using the default filter options with all frameworks and default 
> allocation interval.
> 
> On 21 Feb 2016, at 08:10, Guangya Liu  > wrote:
> 
>> Hi Tom,
>> 
>> I traced the agent of "20160112-165226-67375276-5050-22401-S199" and found 
>> that it is keeps declining by many frameworks: once a framework got it, the 
>> framework will decline it immediately. Does some your framework has special 
>> offer filter logic?
>> 
>> Also I want to get more for your cluster:
>> 1) What is the role for each framework and what is the weight for each role?
>> 2) Do you start all agents without any reservation?
>> 
>> Thanks,
>> 
>> Guangya 
>> 
>> On Sun, Feb 21, 2016 at 9:23 AM, Klaus Ma 

Re: Mesos sometimes not allocating the entire cluster

2016-02-22 Thread Guangya Liu
Hi Tom,

I think that your cluster must have some role/weight configuration,
because I can see that at least two agents have the "dev" role
configured.

56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered
ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
slave 20160112-165226-67375276-5050-22401-S300 for framework
20160219-164457-67375276-5050-28802-0015
57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating
ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
slave 20160112-165226-67375276-5050-22401-S300 to framework
20160219-164457-67375276-5050-28802-0014
58 1365 I0219 18:08:26.286725 28810 hierarchical.hpp:1025] Filtered
ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
slave 20160112-165226-67375276-5050-22401-S303 for framework
20160219-164457-67375276-5050-28802-0015
59 1366 I0219 18:08:26.286875 28810 hierarchical.hpp:941] Allocating
ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
slave 20160112-165226-67375276-5050-22401-S303 to framework
20160219-164457-67375276-5050-28802-0014

Also, I think that frameworks 20160219-164457-67375276-5050-28802-0014
and 20160219-164457-67375276-5050-28802-0015 may have a high weight, because I
saw that framework 20160219-164457-67375276-5050-28802-0014 got 26 agents
at 18:08:26.

Another possibility is that some other agents also have a role configured
but no frameworks are registered with that role, which leaves the statically
reserved resources on those agents unable to be allocated.

I searched for 20160112-174949-84152492-5050-19807-S316 in the log and found
that the following resources were being allocated to frameworks:

974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
20160112-174949-84152492-5050-19807-S316 to framework
20160219-164457-67375276-5050-28802-0014

This agent should have another 9.5 CPUs reserved for some role, and no
framework is configured to use resources from that role, so those resources
are being wasted. I think that the following agents may also have some
reserved resources configured:
20160112-174949-84152492-5050-19807-S317,
20160112-174949-84152492-5050-19807-S322, and possibly more.

So I would suggest that you check the master and each slave start command
to see how roles are configured. You can also check this via the command:
curl "http://master-ip:5050/master/state.json" 2>/dev/null | jq .
(note the trailing dot in the command) to get the resource status of every
slave: reserved, used, total resources, etc.
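
To focus on just the reservation-related fields, something like the
following can help (field names as of roughly Mesos 0.26/0.27; adjust if
your version differs):

    curl -s "http://master-ip:5050/master/state.json" 2>/dev/null \
      | jq '.slaves[] | {id, resources, used_resources, reserved_resources}'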

Thanks,

Guangya


On Mon, Feb 22, 2016 at 5:16 PM, Tom Arnfeld  wrote:

> No roles, no reservations.
>
> We're using the default filter options with all frameworks and default
> allocation interval.
>
> On 21 Feb 2016, at 08:10, Guangya Liu  wrote:
>
> Hi Tom,
>
> I traced the agent of "20160112-165226-67375276-5050-22401-S199" and found
> that it is keeps declining by many frameworks: once a framework got it, the
> framework will decline it immediately. Does some your framework has special
> offer filter logic?
>
> Also I want to get more for your cluster:
> 1) What is the role for each framework and what is the weight for each
> role?
> 2) Do you start all agents without any reservation?
>
> Thanks,
>
> Guangya
>
> On Sun, Feb 21, 2016 at 9:23 AM, Klaus Ma  wrote:
>
>> Hi Tom,
>>
>> What's the allocation interval, can you try to reduce filter's timeout of
>> framework?
>>
>> According to the log, ~12 frameworks on cluster with ~42 agents; the
>> filter duration is 5sec, and there're ~60 times filtered in each seconds
>> (e.g. 65 in 18:08:34). For example, framework 
>> (20160219-164457-67375276-5050-28802-0015)
>> just get resources from 6 agents and filtered the other 36 agents at
>> 18:08:35 (egrep "Alloca|Filtered" mesos-master.log | grep
>> "20160219-164457-67375276-5050-28802-0015" | grep "18:08:35")
>>
>> Thanks
>> Klaus
>>
>> --
>> From: t...@duedil.com
>> Subject: Re: Mesos sometimes not allocating the entire cluster
>> Date: Sat, 20 Feb 2016 16:36:54 +
>> To: user@mesos.apache.org
>>
>> Hi Guangya,
>>
>> Indeed we have about ~45 agents. I’ve attached the log from the master…
>>
>>
>>
>> Hope there’s something here that highlights the issue, we can’t find
>> anything that we can’t explain.
>>
>> Cheers,
>>
>> Tom.
>>
>> On 19 Feb 2016, at 03:02, Guangya Liu  wrote:
>>
>> Hi Tom,
>>
>> After the patch was applied, there is no need to restart framework but
>> only mesos master.
>>
>> One question is that I saw from your log, seems your cluster has at least
>> 36 agents, right? I was asking this question because if there are more
>> frameworks than agents, frameworks with low weight may not able to get
>> resources sometimes.
>>
>> Can you please enable GLOG_v=2 for mesos master for a 

RE: AW: Feature request: move in-flight containers w/o stopping them

2016-02-22 Thread Aaron Carey
If this is of any use to anyone: there is also an outstanding branch of Docker
with checkpoint/restore functionality in it (based on CRIU, I believe), which
will hopefully be merged into experimental soon.
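
For anyone who wants to experiment with the underlying mechanism outside
Docker, plain CRIU can already checkpoint and restore an ordinary process
tree, roughly like this (a sketch only; the image directory is illustrative,
and it needs root plus a recent kernel):

    criu dump -t <pid> -D /tmp/ckpt --shell-job     # freeze the process and write image files
    criu restore -D /tmp/ckpt --shell-job           # later, recreate it from those images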


From: Sharma Podila [spod...@netflix.com]
Sent: 19 February 2016 14:49
To: user@mesos.apache.org
Subject: Re: AW: Feature request: move in-flight containers w/o stopping them

Moving stateless services can be trivial or a non problem, as others have 
suggested.
Migrating stateful services becomes a function of migrating the state, 
including any network conx, etc. To think aloud, from a bit of past 
considerations in hpc like systems, some systems relied upon the underlying 
systems to support migration (vMotion, etc.), to 3rd party libraries (was that 
Meiosys) that could work on existing application binaries, to libraries 
(BLCR) 
that need support from application developer. I was involved with providing 
support for BLCR based applications. One of the challenges was the time to 
checkpoint an application with large memory footprint, say, 100 GB or more, 
which isn't uncommon in hpc. Incremental checkpointing wasn't an option, at 
least at that point.
Regardless, Mesos' support for checkpoint-restore would have to consider the 
type of checkpoint-restore being used. I would imagine that the core part of 
the solution would be simple'ish, in providing a "workflow" for the 
checkpoint-restore system (sort of send signal to start checkpoint, wait 
certain time to complete or timeout). Relatively less simple would be the 
actual integration of the checkpoint-restore system and dealing with its 
constraints and idiosyncrasies.


On Fri, Feb 19, 2016 at 4:50 AM, Dick Davies 
> wrote:
Agreed, vMotion always struck me as something for those monolithic
apps with a lot of local state.

The industry seems to be moving away from that as fast as its little
legs will carry it.

On 19 February 2016 at 11:35, Jason Giedymin 
> wrote:
> Food for thought:
>
> One should refrain from monolithic apps. If they're small and stateless you
> should be doing rolling upgrades.
>
> If you find yourself with one container and you can't easily distribute that
> work load by just scaling and load balancing then you have a monolith. Time
> to enhance it.
>
> Containers should not be treated like VMs.
>
> -Jason
>
> On Feb 19, 2016, at 6:05 AM, Mike Michel 
> > wrote:
>
> Question is if you really need this when you are moving in the world of
> containers/microservices where it is about building stateless 12factor apps
> except databases. Why moving a service when you can just kill it and let the
> work be done by 10 other containers doing the same? I remember a talk on
> dockercon about containers and live migration. It was like: „And now where
> you know how to do it, dont’t do it!“
>
>
>
> From: Avinash Sridharan 
> [mailto:avin...@mesosphere.io]
> Sent: Friday, 19 February 2016 05:48
> To: user@mesos.apache.org
> Subject: Re: Feature request: move in-flight containers w/o stopping them
>
>
>
> One problem with implementing something like vMotion for Mesos is to address
> seamless movement of network connectivity as well. This effectively requires
> moving the IP address of the container across hosts. If the container shares
> host network stack, this won't be possible since this would imply moving the
> host IP address from one host to another. When a container has its network
> namespace, attached to the host, using a bridge, moving across L2 segments
> might be a possibility. To move across L3 segments you will need some form
> of overlay (VxLAN maybe ?) .
>
>
>
> On Thu, Feb 18, 2016 at 7:34 PM, Jay Taylor 
> > wrote:
>
> Is this theoretically feasible with Linux checkpoint and restore, perhaps
> via CRIU? http://criu.org/Main_Page
>
>
> On Feb 18, 2016, at 4:35 AM, Paul Bell 
> > wrote:
>
> Hello All,
>
>
>
> Has there ever been any consideration of the ability to move in-flight
> containers from one Mesos host node to another?
>
>
>
> I see this as analogous to VMware's "vMotion" facility wherein VMs can be
> moved from one ESXi host to another.
>
>
>
> I suppose something like this could be useful from a load-balancing
> perspective.
>
>
>
> Just curious if it's ever been considered and if so - and rejected - why
> rejected?
>
>
>
> Thanks.
>
>
>
> -Paul
>
>
>
>
>
>
>
>
>
> --
>
> Avinash Sridharan, Mesosphere
>
> +1 (323) 702 5245



RE: AW: Feature request: move in-flight containers w/o stopping them

2016-02-22 Thread Aaron Carey
Would you be able to elaborate a bit more on how you did this?


From: Mauricio Garavaglia [mauri...@medallia.com]
Sent: 19 February 2016 19:20
To: user@mesos.apache.org
Subject: Re: AW: Feature request: move in-flight containers w/o stopping them

Mesos is not only about running stateless microservices to handle http 
requests. There are long duration workloads that would benefit from being 
rescheduled to a different host and not being interrupted; i.e. to implement 
dynamic bin packing in the cluster.

CRIU has shown that the networking side can be handled even at the socket
level. Regarding IPs moving around, Project Calico offers a way to do that;
we tried a homemade modification using Docker and OSPF and it works very well.

On Fri, Feb 19, 2016 at 11:49 AM, Sharma Podila 
> wrote:
Moving stateless services can be trivial or a non problem, as others have 
suggested.
Migrating stateful services becomes a function of migrating the state, 
including any network conx, etc. To think aloud, from a bit of past 
considerations in hpc like systems, some systems relied upon the underlying 
systems to support migration (vMotion, etc.), to 3rd party libraries (was that 
Meiosys) that could work on existing application binaries, to libraries 
(BLCR) 
that need support from application developer. I was involved with providing 
support for BLCR based applications. One of the challenges was the time to 
checkpoint an application with large memory footprint, say, 100 GB or more, 
which isn't uncommon in hpc. Incremental checkpointing wasn't an option, at 
least at that point.
Regardless, Mesos' support for checkpoint-restore would have to consider the 
type of checkpoint-restore being used. I would imagine that the core part of 
the solution would be simple'ish, in providing a "workflow" for the 
checkpoint-restore system (sort of send signal to start checkpoint, wait 
certain time to complete or timeout). Relatively less simple would be the 
actual integration of the checkpoint-restore system and dealing with its 
constraints and idiosyncrasies.


On Fri, Feb 19, 2016 at 4:50 AM, Dick Davies 
> wrote:
Agreed, vMotion always struck me as something for those monolithic
apps with a lot of local state.

The industry seems to be moving away from that as fast as its little
legs will carry it.

On 19 February 2016 at 11:35, Jason Giedymin 
> wrote:
> Food for thought:
>
> One should refrain from monolithic apps. If they're small and stateless you
> should be doing rolling upgrades.
>
> If you find yourself with one container and you can't easily distribute that
> work load by just scaling and load balancing then you have a monolith. Time
> to enhance it.
>
> Containers should not be treated like VMs.
>
> -Jason
>
> On Feb 19, 2016, at 6:05 AM, Mike Michel 
> > wrote:
>
> Question is if you really need this when you are moving in the world of
> containers/microservices where it is about building stateless 12factor apps
> except databases. Why moving a service when you can just kill it and let the
> work be done by 10 other containers doing the same? I remember a talk on
> dockercon about containers and live migration. It was like: „And now where
> you know how to do it, dont’t do it!“
>
>
>
> From: Avinash Sridharan 
> [mailto:avin...@mesosphere.io]
> Sent: Friday, 19 February 2016 05:48
> To: user@mesos.apache.org
> Subject: Re: Feature request: move in-flight containers w/o stopping them
>
>
>
> One problem with implementing something like vMotion for Mesos is to address
> seamless movement of network connectivity as well. This effectively requires
> moving the IP address of the container across hosts. If the container shares
> host network stack, this won't be possible since this would imply moving the
> host IP address from one host to another. When a container has its network
> namespace, attached to the host, using a bridge, moving across L2 segments
> might be a possibility. To move across L3 segments you will need some form
> of overlay (VxLAN maybe ?) .
>
>
>
> On Thu, Feb 18, 2016 at 7:34 PM, Jay Taylor 
> > wrote:
>
> Is this theoretically feasible with Linux checkpoint and restore, perhaps
> via CRIU? http://criu.org/Main_Page
>
>
> On Feb 18, 2016, at 4:35 AM, Paul Bell 
> > wrote:
>
> Hello All,
>
>
>
> Has there ever been any consideration of the ability to move in-flight
> containers from one Mesos host node to another?
>
>
>
> I see this as analogous to VMware's "vMotion" facility