[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139745#comment-16139745 ]

Michael Park commented on MESOS-7714:
-------------------------------------

https://reviews.apache.org/r/61880/

> Fix agent downgrade for reservation refinement
> ----------------------------------------------
>
>                 Key: MESOS-7714
>                 URL: https://issues.apache.org/jira/browse/MESOS-7714
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>            Assignee: Michael Park
>            Priority: Critical
>
> The agent code only partially supports downgrading of an agent correctly.
> The checkpointed resources are done correctly, but the resources within
> the {{SlaveInfo}} message as well as tasks and executors also need to be downgraded
> correctly and converted back on recovery.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129728#comment-16129728 ]

Adam B commented on MESOS-7714:
-------------------------------

Downgrading from blocker because [~mcypark] says "I think we’ll have to ship without it".
Please retarget to 1.4.1 and/or 1.5.0 so [~karya] and [~anandmazumdar] can cut 1.4.0-rc1.
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120473#comment-16120473 ]

Yan Xu commented on MESOS-7714:
-------------------------------

[~mcypark] did you get a chance to work on this?

> Fix agent downgrade for reservation refinement
> ----------------------------------------------
>
>                 Key: MESOS-7714
>                 URL: https://issues.apache.org/jira/browse/MESOS-7714
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>            Assignee: Michael Park
>            Priority: Blocker
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107680#comment-16107680 ]

Yan Xu commented on MESOS-7714:
-------------------------------

Yes we are. Thanks! Hope we can prioritize this one (possibly over other 1.4 blockers) so we can promote dev versions of 1.4 further for more thorough testing.

> Fix agent downgrade for reservation refinement
> ----------------------------------------------
>
>                 Key: MESOS-7714
>                 URL: https://issues.apache.org/jira/browse/MESOS-7714
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>            Priority: Blocker
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105962#comment-16105962 ]

Michael Park commented on MESOS-7714:
-------------------------------------

Ah, yes, and this ticket is for (2). Seems like we're on the same page now?
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105960#comment-16105960 ]

Yan Xu commented on MESOS-7714:
-------------------------------

I mean the case where we are not using new features, so this appears to be 2). I didn't know the details until I read the design doc and saw that you mentioned the following about the agent: "On disk (checkpointing), it will also generally use the new Resources format, except for resources with a single dynamic reservation it will continue to checkpoint in the old Resource format."
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105893#comment-16105893 ]

Michael Park commented on MESOS-7714:
-------------------------------------

In order to support (1) I think we'd have to checkpoint resources with refined reservations in a different location. You're saying you wouldn't want to upgrade to 1.4 because you can't downgrade once people start using new features?

Just for comparison, we have the same limitation for multi-role support. That is, once you upgrade to 1.3 and frameworks start using multi-role, you can't downgrade.
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105874#comment-16105874 ]

Yan Xu commented on MESOS-7714:
-------------------------------

I see, but 1) is a real operational concern for upgrading to 1.4, right? I wouldn't want to upgrade my agents to 1.4 knowing I won't be able to roll them back once refined reservations are made (i.e., after they are used for a while)...

I think to support 1) we have to support 'pre-reservation-refinement' for a while (across 1.x versions?).

https://github.com/apache/mesos/blob/master/docs/versioning.md#upgrades mentions upgrades but not downgrades, but I don't see how it would work if downgrades are not implicitly covered by the same guarantee... Thoughts?
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105864#comment-16105864 ]

Michael Park commented on MESOS-7714:
-------------------------------------

Ah okay. First, the support is for downgrading a 1.4 agent to a <= 1.3.x agent as long as refined reservations have not been made yet. The way we achieve this is to "downgrade" all the resources that get checkpointed to the "pre-reservation-refinement" format as long as none of them have refined reservations.

The reason why that {{CHECK}} would fail would be either (1) there are refined reservations made on the 1.4 agent, or (2) there are resources that we didn't checkpoint in the "pre-reservation-refinement" format when we should have. The goal of this ticket is to fix (2).
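
The downgrade rule described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not Mesos code: the types `Reservation`, `NewResource`, and `OldResource` and the functions `has_refined_reservations`, `downgrade`, and `upgrade` are hypothetical names, and the sketch assumes that "refined" means more than one reservation is stacked on a resource. It shows only the decision itself: write the old format when nothing is refined, and convert back on recovery.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Reservation:
    role: str
    dynamic: bool  # True: dynamic reservation, False: static reservation


@dataclass
class NewResource:
    """Hypothetical post-refinement format: a stack of reservations."""
    name: str
    amount: float
    reservations: List[Reservation] = field(default_factory=list)


@dataclass
class OldResource:
    """Hypothetical pre-refinement format: one role, at most one reservation."""
    name: str
    amount: float
    role: str = "*"
    dynamic: bool = False


def has_refined_reservations(resource: NewResource) -> bool:
    # Assumption: a resource is "refined" when reservations are stacked.
    return len(resource.reservations) > 1


def downgrade(resources: List[NewResource]) -> Optional[List[OldResource]]:
    """Return the old-format view, or None if any resource is refined (case 1)."""
    if any(has_refined_reservations(r) for r in resources):
        return None
    old = []
    for r in resources:
        if r.reservations:
            res = r.reservations[0]
            old.append(OldResource(r.name, r.amount, res.role, res.dynamic))
        else:
            old.append(OldResource(r.name, r.amount))
    return old


def upgrade(resources: List[OldResource]) -> List[NewResource]:
    """Convert old-format resources back to the new format on recovery."""
    new = []
    for r in resources:
        reserved = r.role != "*"
        stack = [Reservation(r.role, r.dynamic)] if reserved else []
        new.append(NewResource(r.name, r.amount, stack))
    return new
```

In these terms, case (2) is any state written to disk without passing through something like `downgrade` first, even though none of its resources are refined. The sketch does not model where or how the agent actually checkpoints.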
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103626#comment-16103626 ]

Yan Xu commented on MESOS-7714:
-------------------------------

Great. I guess I am just not clear on the mechanism to achieve that. Downgrading currently fails [the CHECK here|https://github.com/apache/mesos/blob/1.3.0/src/slave/paths.cpp#L478] for me (from a 1.4 agent with persistent volumes). It looks like the proposal is to commit some downgrading logic to the 1.3.x branch? Sorry, it's not clear to me whether the case I am mentioning is covered.
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103520#comment-16103520 ]

Michael Park commented on MESOS-7714:
-------------------------------------

[~xujyan]: This *is* for the downgrade of 1.4 to <= 1.3.x.
[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement
[ https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103505#comment-16103505 ]

Yan Xu commented on MESOS-7714:
-------------------------------

[~mcypark] Just to be sure: this ticket is not for supporting downgrade of an agent from 1.4 to <= 1.3.x, right? Could you clarify?