[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-08-24 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139745#comment-16139745
 ] 

Michael Park commented on MESOS-7714:
-

https://reviews.apache.org/r/61880/

> Fix agent downgrade for reservation refinement
> --
>
> Key: MESOS-7714
> URL: https://issues.apache.org/jira/browse/MESOS-7714
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Michael Park
>Priority: Critical
>
> The agent code only partially supports downgrading an agent. The
> checkpointed resources are downgraded correctly, but the resources within
> the {{SlaveInfo}} message, as well as those of tasks and executors, also
> need to be downgraded correctly and converted back on recovery.
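
A minimal C++ sketch of the scope described above. It assumes a heavily simplified, hypothetical model rather than Mesos' real protobufs: each resource carries a stack of reservation roles plus a {{legacy}} flag standing in for "stored in the pre-reservation-refinement format", and the hypothetical {{AgentState}} groups the checkpointed resources, the {{SlaveInfo}} resources, tasks, and executors. The point it illustrates is that the downgrade pass before checkpointing and the upgrade pass on recovery must visit every one of these fields, not only the checkpointed resources.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model (NOT Mesos' real protobufs): a resource's
// reservations are a stack of roles, and `legacy` marks whether it is stored
// in the pre-reservation-refinement format.
struct Resource {
  std::vector<std::string> reservations;  // bottom-to-top reservation roles
  bool legacy = false;
};

struct Task { std::vector<Resource> resources; };
struct Executor { std::vector<Resource> resources; };

// Hypothetical stand-in for the state an agent checkpoints.
struct AgentState {
  std::vector<Resource> checkpointedResources;  // per the ticket, already OK
  std::vector<Resource> slaveInfoResources;     // resources in SlaveInfo
  std::vector<Task> tasks;
  std::vector<Executor> executors;
};

// Applies `f` to every resource-bearing field. Skipping any of them would
// leave new-format resources on disk for an older agent to trip over.
template <typename F>
void forEachResource(AgentState& state, F f) {
  for (Resource& r : state.checkpointedResources) f(r);
  for (Resource& r : state.slaveInfoResources) f(r);
  for (Task& t : state.tasks)
    for (Resource& r : t.resources) f(r);
  for (Executor& e : state.executors)
    for (Resource& r : e.resources) f(r);
}

// Before checkpointing: converts everything to the old format. Returns false
// and changes nothing if any resource has a refined (stacked) reservation,
// which the old format cannot represent.
bool downgradeAll(AgentState& state) {
  bool refined = false;
  forEachResource(state, [&refined](Resource& r) {
    if (r.reservations.size() > 1) refined = true;
  });
  if (refined) return false;
  forEachResource(state, [](Resource& r) { r.legacy = true; });
  return true;
}

// On recovery: converts everything back to the new in-memory format.
void upgradeAll(AgentState& state) {
  forEachResource(state, [](Resource& r) { r.legacy = false; });
}
```

Checking for refined reservations before converting anything keeps {{downgradeAll}} all-or-nothing, so a failed downgrade never leaves the state half-converted.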



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-08-16 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129728#comment-16129728
 ] 

Adam B commented on MESOS-7714:
---

Downgrading from Blocker because [~mcypark] says "I think we’ll have to ship
without it".
Please retarget to 1.4.1 and/or 1.5.0 so that [~karya] and [~anandmazumdar] can
cut 1.4.0-rc1.


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-08-09 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120473#comment-16120473
 ] 

Yan Xu commented on MESOS-7714:
---

[~mcypark] did you get a chance to work on this?

> Fix agent downgrade for reservation refinement
> --
>
> Key: MESOS-7714
> URL: https://issues.apache.org/jira/browse/MESOS-7714
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Michael Park
>Priority: Blocker
>
> The agent code only partially supports downgrading an agent. The
> checkpointed resources are downgraded correctly, but the resources within
> the {{SlaveInfo}} message, as well as those of tasks and executors, also
> need to be downgraded correctly and converted back on recovery.



[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-31 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107680#comment-16107680
 ] 

Yan Xu commented on MESOS-7714:
---

Yes, we are. Thanks! I hope we can prioritize this one (possibly over other 1.4
blockers) so that we can promote dev versions of 1.4 further for more thorough
testing.

> Fix agent downgrade for reservation refinement
> --
>
> Key: MESOS-7714
> URL: https://issues.apache.org/jira/browse/MESOS-7714
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Priority: Blocker
>
> The agent code only partially supports downgrading an agent. The
> checkpointed resources are downgraded correctly, but the resources within
> the {{SlaveInfo}} message, as well as those of tasks and executors, also
> need to be downgraded correctly and converted back on recovery.



[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-28 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105962#comment-16105962
 ] 

Michael Park commented on MESOS-7714:
-

Ah, yes, and this ticket is for (2). Seems like we're on the same page now?


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-28 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105960#comment-16105960
 ] 

Yan Xu commented on MESOS-7714:
---

I mean the case where we are not using new features, so this appears to be (2).
I didn't know the details until I read the design doc just now and saw that you
mention, about the agent: "On disk (checkpointing), it will also generally use
the new Resources format, except for resources with a single dynamic reservation
it will continue to checkpoint in the old Resource format."


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-28 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105893#comment-16105893
 ] 

Michael Park commented on MESOS-7714:
-

In order to support (1), I think we'd have to checkpoint resources with refined
reservations in a different location.
You're saying you wouldn't want to upgrade to 1.4 because you can't downgrade
once people start using new features?
Just for comparison, we have the same limitation for multi-role support. That
is, once you upgrade to 1.3 and frameworks start using multi-role, you can't
downgrade.


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-28 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105874#comment-16105874
 ] 

Yan Xu commented on MESOS-7714:
---

I see, but (1) is a real operational concern for upgrading to 1.4, right? I
wouldn't want to upgrade my agents to 1.4 knowing I won't be able to roll them
back once refined reservations are made (i.e., after they have been in use for
a while)...

I think that to support (1) we would have to keep supporting the
'pre-reservation-refinement' format for a while (across 1.x versions?).

https://github.com/apache/mesos/blob/master/docs/versioning.md#upgrades
mentions upgrades but not downgrades, but I don't see how it would work if
downgrades are not implicitly covered by the same guarantee...

Thoughts?


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-28 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105864#comment-16105864
 ] 

Michael Park commented on MESOS-7714:
-

Ah, okay. First, the support is for downgrading a 1.4 agent to a <= 1.3.x agent,
as long as refined reservations have not been made yet.
The way we achieve this is to "downgrade" all the resources that get
checkpointed to the "pre-reservation-refinement" format, as long as none of
them have refined reservations. That {{CHECK}} would fail either because (1)
refined reservations were made on the 1.4 agent, or (2) there are resources
that we didn't checkpoint in the "pre-reservation-refinement" format when we
should have. The goal of this ticket is to fix (2).
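
The two failure modes can be illustrated with a per-resource sketch in C++17. The types are hypothetical simplifications, not Mesos' {{Resource}} protobuf, and {{oldAgentCheckPasses}} is only an assumed stand-in for the kind of recovery-time check a <= 1.3.x agent performs, not the actual {{CHECK}} in {{paths.cpp}}. A resource with a single reservation is rewritten into the old role/reservation fields, a refined reservation (case (1)) is rejected, and a resource that was never downgraded (case (2)) still fails the check.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model; these are NOT the real Mesos types.
struct Reservation {
  std::string role;
  bool dynamic = false;  // false means a static reservation
};

struct Resource {
  // Pre-reservation-refinement fields (what a <= 1.3.x agent reads).
  std::string role = "*";                  // "*" means unreserved
  std::optional<Reservation> reservation;  // set only for dynamic reservations
  // Post-reservation-refinement field: a stack of reservations.
  std::vector<Reservation> reservations;
};

enum class Downgrade { OK, REFINED };

// Rewrites `r` into the pre-refinement format in place. A resource with more
// than one reservation (a refined reservation) cannot be expressed in the old
// format and is left unchanged.
Downgrade downgrade(Resource& r) {
  if (r.reservations.size() > 1) return Downgrade::REFINED;
  if (r.reservations.size() == 1) {
    const Reservation top = r.reservations.back();
    r.role = top.role;
    if (top.dynamic) r.reservation = top;  // static: role alone suffices
    r.reservations.clear();
  }
  return Downgrade::OK;  // unreserved resources need no change
}

// Assumed stand-in for an old agent's recovery-time check: it can only
// interpret resources whose post-refinement stack is empty.
bool oldAgentCheckPasses(const Resource& r) {
  return r.reservations.empty();
}
```

Leaving a refined resource untouched, rather than partially rewriting it, matches the rule above that the downgrade applies only as long as none of the resources have refined reservations.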


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-27 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103626#comment-16103626
 ] 

Yan Xu commented on MESOS-7714:
---

Great. I guess I am just not clear on the mechanism to achieve that.

Downgrading currently fails [the CHECK
here|https://github.com/apache/mesos/blob/1.3.0/src/slave/paths.cpp#L478] for
me (from a 1.4 agent with persistent volumes). It looks like the proposal is to
commit some downgrading logic to the 1.3.x branch? Sorry, it's not clear to me
whether the case I am describing is covered.


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-27 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103520#comment-16103520
 ] 

Michael Park commented on MESOS-7714:
-

[~xujyan]: This *is* for the downgrade of 1.4 to <= 1.3.x.


[jira] [Commented] (MESOS-7714) Fix agent downgrade for reservation refinement

2017-07-27 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103505#comment-16103505
 ] 

Yan Xu commented on MESOS-7714:
---

[~mcypark] Just to be sure: this ticket is not for supporting the downgrade of
an agent from 1.4 to <= 1.3.x, right? Could you clarify?
