[
https://issues.apache.org/jira/browse/MESOS-9460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787423#comment-16787423
]
Greg Mann commented on MESOS-9460:
--
Sharing some learnings here after a couple of difficult attempts at resolving
this issue:

The most recent reviews attempt to add a {{Sequence}} to the master that
prevents problematic interleavings between multiple updates to the
allocator/master state. Due to recent changes which added the concept of
"orphan operations" (MESOS-9542), it has become difficult to accomplish this
with per-agent {{Sequence}}s without some messy refactoring of functions like
{{recoverFramework()}}. It's fairly straightforward to accomplish a fix with a
global {{Sequence}} in the master, but this seems undesirable since, for
example, all calls to {{updateOperationStatus()}} would need to be sequenced.

Since the orphan operation code is tech debt which can be removed once
MESOS-9556 and MESOS-8582 are resolved, I'm hesitant to add more complexity to
the code on top of the orphan operation handling. I would prefer to punt on
the issue described in this ticket until those other issues are resolved, at
which point we will be able to handle this one in a simpler way.
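To illustrate the kind of interleaving a {{Sequence}} is meant to rule out, here is a minimal, self-contained sketch. It does not use libprocess: `EventLoop`, `ToySequence`, and `runDemo` are hypothetical names invented for this example, and the single-threaded queue only approximates how an actor processes dispatched events. Update "A" stands in for a two-step operator API operation (allocator first, master state in a later continuation); update "B" stands in for any other update that reads master state.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an actor's event queue: callbacks run one at a time, FIFO.
struct EventLoop {
  std::deque<std::function<void()>> queue;

  void dispatch(std::function<void()> f) { queue.push_back(std::move(f)); }

  void run() {
    while (!queue.empty()) {
      std::function<void()> f = std::move(queue.front());
      queue.pop_front();
      f();  // May dispatch further events.
    }
  }
};

// Hypothetical sequencer: a task is started only after the previous task
// has signalled completion by invoking its `Done` callback.
class ToySequence {
public:
  using Done = std::function<void()>;
  using Task = std::function<void(Done)>;

  void add(Task task) {
    pending_.push_back(std::move(task));
    if (!running_) {
      startNext();
    }
  }

private:
  void startNext() {
    if (pending_.empty()) {
      running_ = false;
      return;
    }
    running_ = true;
    Task task = std::move(pending_.front());
    pending_.pop_front();
    task([this]() { startNext(); });
  }

  std::deque<Task> pending_;
  bool running_ = false;
};

// Returns the order in which the steps of updates "A" and "B" were applied.
std::vector<std::string> runDemo(bool sequenced) {
  EventLoop loop;
  ToySequence sequence;
  std::vector<std::string> log;

  // "A": updates the allocator now, the master state in a later continuation.
  auto twoStep = [&](ToySequence::Done done) {
    log.push_back("A: allocator updated");
    loop.dispatch([&log, done]() {
      log.push_back("A: master updated");
      done();
    });
  };

  // "B": a single-step update that pushes master state to the allocator.
  auto oneStep = [&](ToySequence::Done done) {
    log.push_back("B: allocator updated from master state");
    done();
  };

  if (sequenced) {
    loop.dispatch([&]() { sequence.add(twoStep); });
    loop.dispatch([&]() { sequence.add(oneStep); });
  } else {
    loop.dispatch([&]() { twoStep([]() {}); });
    loop.dispatch([&]() { oneStep([]() {}); });
  }

  loop.run();
  return log;
}
```

Calling `runDemo(false)` places "B" between the two steps of "A", while `runDemo(true)` lets "A" finish both steps before "B" starts. The sketch also hints at the cost mentioned above: with a single global sequence, every update that could conflict has to be routed through `add()`.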
> Speculative operations may make master and allocator resource views out of
> sync.
>
>
> Key: MESOS-9460
> URL: https://issues.apache.org/jira/browse/MESOS-9460
> Project: Mesos
> Issue Type: Bug
> Components: agent, master
> Affects Versions: 1.5.1, 1.6.1, 1.7.0
> Reporter: Meng Zhu
> Assignee: Greg Mann
> Priority: Major
> Labels: foundations
>
> When speculative operations (RESERVE, UNRESERVE, CREATE, DESTROY) are issued
> via the master operator API, the master updates the allocator state in
> {{Master::apply()}}, and then later updates its internal state in
> {{Master::_apply()}}. This means that other updates to the allocator may be
> interleaved between these two continuations, causing the master state to be
> out of sync with the allocator state.
>
> This bug could happen with the following sequence of events:
> - agent (re)registers with the master
> - multiple speculative operation calls are made to the master via the
> operator API
> - the allocator is speculatively updated in
> https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L11326
> - before the agent's resources are updated, the agent sends an
> `UpdateSlaveMessage` upon receiving the (re)registered message, provided it
> has the `RESOURCE_PROVIDER` capability or oversubscription is used
> (https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/slave/slave.cpp#L1560-L1566
> and
> https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/slave/slave.cpp#L1643-L1648)
> - as long as the first operation via the operator API has been added to the
> {{Slave}} struct at this point, the master won't hit [this block
> here|https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L7940-L7945],
> and the `UpdateSlaveMessage` triggers the allocator to update the total
> resources
> [here|https://github.com/apache/mesos/blob/1d1af190b0eb674beecf20646d0b6ce082db4ed0/src/master/master.cpp#L8207].
> Since the {{Slave}} struct has not yet been updated, the allocator update at
> that point uses STALE resources from {{slave->totalResources}}, so the
> update from the previous operation is overwritten and LOST.
> - the agent finishes the operation and informs the master through
> `UpdateOperationStatusMessage`, but for speculative operations we do not
> update the allocator
> (https://github.com/apache/mesos/blob/3badf7179992e61f30f5a79da9d481dd451c7c2f/src/master/master.cpp#L11187-L11189)
> - the resource views of the master/agent state and the allocator state are
> now inconsistent
>
> This caused MESOS-7971 and likely MESOS-9458 as well.
>
> It's unclear how this can be fixed in a reliable way. One possibility is to
> perform updates to the allocator state and the master state in a single
> synchronous block of code, but in the case of operator-initiated operations
> this is difficult. It may also be possible to ensure consistency by making
> sure that every time such updates are done in the master, the allocator is
> updated before the master state.
>
> This ticket will be Done when a comprehensive solution for this issue is
> designed. A subsequent ticket for actual implementation of that solution
> should be filed.
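The sequence of events in the description can be reproduced in a small standalone model. This is only a sketch under simplifying assumptions: `Resources`, `Views`, `reserve`, and `simulate` are hypothetical names made up for the example, resources are reduced to two CPU counters, and a plain FIFO queue stands in for the master's event processing. It contrasts the current two-continuation behavior with the "single synchronous block" idea mentioned in the description.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Simplified resource view: only counts of unreserved and reserved CPUs.
struct Resources {
  int unreservedCpus;
  int reservedCpus;

  bool operator==(const Resources& that) const {
    return unreservedCpus == that.unreservedCpus &&
           reservedCpus == that.reservedCpus;
  }
};

// The two views that are supposed to stay in sync.
struct Views {
  Resources master;     // Stand-in for the master's per-agent total resources.
  Resources allocator;  // Stand-in for the allocator's view of the agent.
};

// Models a RESERVE operation applied to one view.
Resources reserve(Resources r, int cpus) {
  r.unreservedCpus -= cpus;
  r.reservedCpus += cpus;
  return r;
}

// Runs one RESERVE followed by one agent update message and returns both
// views. If `synchronousApply` is false, the master state is updated in a
// later event, as in the scenario described above.
Views simulate(bool synchronousApply) {
  Views v{{4, 0}, {4, 0}};
  std::deque<std::function<void()>> events;

  // Event 1: speculative RESERVE issued via the operator API.
  events.push_back([&]() {
    v.allocator = reserve(v.allocator, 1);
    auto updateMaster = [&v]() { v.master = reserve(v.master, 1); };
    if (synchronousApply) {
      updateMaster();
    } else {
      events.push_back(updateMaster);  // Runs after already queued events.
    }
  });

  // Event 2: the agent's update message makes the master push its own
  // (possibly stale) view of the agent's total resources to the allocator.
  events.push_back([&]() { v.allocator = v.master; });

  while (!events.empty()) {
    std::function<void()> event = std::move(events.front());
    events.pop_front();
    event();
  }

  return v;
}
```

With `simulate(false)` the allocator ends up with the pre-reservation total while the master records the reservation, which is the inconsistency this ticket describes. With `simulate(true)` both views agree, because nothing can run between the allocator update and the master update.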