Review Request 60417: Updated comments and help text in mesos-execute.

2017-06-24 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60417/
---

Review request for mesos and Adam B.


Repository: mesos


Description
---

Updated comments and help text in mesos-execute.


Diffs
-

  src/cli/execute.cpp 58eaa47bf8388424fd42f7830fdbb7cdecbebb7b 


Diff: https://reviews.apache.org/r/60417/diff/1/


Testing
---

Visual inspection of help output.


Thanks,

Neil Conway



Review Request 60416: Avoided master crash on re-registration of old agents.

2017-06-24 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60416/
---

Review request for mesos and Adam B.


Repository: mesos


Description
---

At present, agents that are refinement-capable send their task and
executor resources in the post-refinement format, whereas older agents
use the pre-refinement format. However, the master code only converted
agent resources to post-refinement format if the agent was not
multi-role capable. This leads to a subsequent `CHECK` failure for
agents that have multi-role but not reservation-refinement capabilities.

We should instead convert to post-refinement format for all
non-refinement capable agents.


Diffs
-

  src/master/master.cpp 33eca0d17459781fdc2ea915e8f40c78dd306962 


Diff: https://reviews.apache.org/r/60416/diff/1/


Testing
---

`make check`, manual testing.


Thanks,

Neil Conway



Review Request 60415: Added comment to master logic for downgrading checkpointed resources.

2017-06-24 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60415/
---

Review request for mesos and Adam B.


Repository: mesos


Description
---

Added comment to master logic for downgrading checkpointed resources.


Diffs
-

  src/master/master.cpp 33eca0d17459781fdc2ea915e8f40c78dd306962 


Diff: https://reviews.apache.org/r/60415/diff/1/


Testing
---


Thanks,

Neil Conway



Re: Review Request 60407: Avoided master crash on agent re-registration.

2017-06-24 Thread Neil Conway


> On June 24, 2017, 2:35 a.m., Adam B wrote:
> > src/master/validation.cpp
> > Line 370 (original), 370-372 (patched)
> > 
> >
> > Even if we can't `validateDynamicReservationInfo` (or  
> > `validateDiskInfo`?), would it be worthwhile to `validateGpus`? Maybe 
> > clone/parameterize `resource::validate` to validate what we can?

We could certainly do better, but since all this validation is best-effort 
anyway, I think the current approach is okay for now. I added `TODO`s to note 
this for future improvement.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60407/#review178836
---


On June 24, 2017, 7:28 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60407/
> ---
> 
> (Updated June 24, 2017, 7:28 p.m.)
> 
> 
> Review request for mesos, Adam B and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When validating the agent's ReregisterSlaveMessage, the master's
> validation code neglected to account for the fact that the task
> resources might not be in post-refinement format (e.g., if the agent
> does not support reservation refinement). This lead to a `CHECK` failure
> during validation.
> 
> Fix this by relaxing the validation of ReregisterSlaveMessage so that we
> do not depend on the task resources being in post-refinement
> format. This means validation of ReregisterSlaveMessage will be less
> effective, but since it is best-effort anyway, this seems tolerable.
> 
> 
> Diffs
> -
> 
>   src/master/validation.cpp 33e9ff7db9e2789cbb2d6dfd015288dfa1faa7c5 
> 
> 
> Diff: https://reviews.apache.org/r/60407/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 60407: Avoided master crash on agent re-registration.

2017-06-24 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60407/
---

(Updated June 24, 2017, 7:28 p.m.)


Review request for mesos, Adam B and Michael Park.


Changes
---

Add comments / TODOs.


Repository: mesos


Description
---

When validating the agent's ReregisterSlaveMessage, the master's
validation code neglected to account for the fact that the task
resources might not be in post-refinement format (e.g., if the agent
does not support reservation refinement). This lead to a `CHECK` failure
during validation.

Fix this by relaxing the validation of ReregisterSlaveMessage so that we
do not depend on the task resources being in post-refinement
format. This means validation of ReregisterSlaveMessage will be less
effective, but since it is best-effort anyway, this seems tolerable.


Diffs (updated)
-

  src/master/validation.cpp 33e9ff7db9e2789cbb2d6dfd015288dfa1faa7c5 


Diff: https://reviews.apache.org/r/60407/diff/3/

Changes: https://reviews.apache.org/r/60407/diff/2-3/


Testing
---

`make check`


Thanks,

Neil Conway



Re: Review Request 60405: Documented resource format in agent <-> master protocol.

2017-06-24 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60405/#review178847
---


Ship it!




Ship It!

- Adam B


On June 24, 2017, 10:30 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60405/
> ---
> 
> (Updated June 24, 2017, 10:30 a.m.)
> 
> 
> Review request for mesos, Adam B and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Documented resource format in agent <-> master protocol.
> 
> 
> Diffs
> -
> 
>   src/messages/messages.proto 2c086263fdcee4d54a76a61379c2d4dba5271d23 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60405/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 60405: Documented resource format in agent <-> master protocol.

2017-06-24 Thread Adam B


> On June 23, 2017, 7:21 p.m., Adam B wrote:
> > src/slave/slave.cpp
> > Lines 1405-1407 (original), 1405-1408 (patched)
> > 
> >
> > If this agent has refinements, and we send post format to an old 
> > master, will the old master safely reject the registration, crash and burn, 
> > or something in between?
> 
> Neil Conway wrote:
> The master will basically consider the resources to be unreserved; 
> because the master and agent will have inconsistent views of the resource 
> state at the agent, this will cause problems.
> 
> Since you need a new master to create reservation refinements in the 
> first place, you can only arrive in this situation by:
> 
> Upgrading master
> Upgrading agent
> Creating res refinement
> Downgrading master
> 
> Which arguably falls under the "don't downgrade if you are using new 
> featues" bucket. But yes, this is certainly unfortunate. Hard to prevent 
> without introducing something similar to master capabilities, which we 
> definitely need (MESOS-5675). I'll drop this issue for now, since AFAIK 
> there's not much we can do to improve this in the short term.

Sounds good. Thanks for the explanation. At least we don't crash..


> On June 23, 2017, 7:21 p.m., Adam B wrote:
> > src/slave/slave.cpp
> > Line 1408 (original), 1412-1414 (patched)
> > 
> >
> > We could at least log an INFO/WARN if we aren't able to downgrade, and 
> > still send it anyway.
> 
> Neil Conway wrote:
> Hmm, not sure a warning/log is warranted here. In the common case 
> (refined reservation, >= 1.4 master, >= 1.4 agent), we will fail to downgrade 
> the resources, but that is fine and expected. Should we really be cluttering 
> the logs with this information?

Good point. Don't want to over-log in the common case. Would be nice if we had 
master capabilities so we only had to log failure-to-downgrade when we're 
(re)registering with an old master.


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60405/#review178834
---


On June 24, 2017, 10:30 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60405/
> ---
> 
> (Updated June 24, 2017, 10:30 a.m.)
> 
> 
> Review request for mesos, Adam B and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Documented resource format in agent <-> master protocol.
> 
> 
> Diffs
> -
> 
>   src/messages/messages.proto 2c086263fdcee4d54a76a61379c2d4dba5271d23 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60405/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 60405: Documented resource format in agent <-> master protocol.

2017-06-24 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60405/
---

(Updated June 24, 2017, 5:30 p.m.)


Review request for mesos, Adam B and Michael Park.


Changes
---

Clarify comment.


Repository: mesos


Description
---

Documented resource format in agent <-> master protocol.


Diffs (updated)
-

  src/messages/messages.proto 2c086263fdcee4d54a76a61379c2d4dba5271d23 
  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60405/diff/3/

Changes: https://reviews.apache.org/r/60405/diff/2-3/


Testing
---


Thanks,

Neil Conway



Re: Review Request 60405: Documented resource format in agent <-> master protocol.

2017-06-24 Thread Neil Conway


> On June 24, 2017, 2:21 a.m., Adam B wrote:
> > src/slave/slave.cpp
> > Lines 1405-1407 (original), 1405-1408 (patched)
> > 
> >
> > If this agent has refinements, and we send post format to an old 
> > master, will the old master safely reject the registration, crash and burn, 
> > or something in between?

The master will basically consider the resources to be unreserved; because the 
master and agent will have inconsistent views of the resource state at the 
agent, this will cause problems.

Since you need a new master to create reservation refinements in the first 
place, you can only arrive in this situation by:

Upgrading master
Upgrading agent
Creating res refinement
Downgrading master

Which arguably falls under the "don't downgrade if you are using new featues" 
bucket. But yes, this is certainly unfortunate. Hard to prevent without 
introducing something similar to master capabilities, which we definitely need 
(MESOS-5675). I'll drop this issue for now, since AFAIK there's not much we can 
do to improve this in the short term.


> On June 24, 2017, 2:21 a.m., Adam B wrote:
> > src/slave/slave.cpp
> > Line 1408 (original), 1412-1414 (patched)
> > 
> >
> > We could at least log an INFO/WARN if we aren't able to downgrade, and 
> > still send it anyway.

Hmm, not sure a warning/log is warranted here. In the common case (refined 
reservation, >= 1.4 master, >= 1.4 agent), we will fail to downgrade the 
resources, but that is fine and expected. Should we really be cluttering the 
logs with this information?


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60405/#review178834
---


On June 24, 2017, 1:48 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60405/
> ---
> 
> (Updated June 24, 2017, 1:48 a.m.)
> 
> 
> Review request for mesos, Adam B and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Documented resource format in agent <-> master protocol.
> 
> 
> Diffs
> -
> 
>   src/messages/messages.proto 2c086263fdcee4d54a76a61379c2d4dba5271d23 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60405/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 60405: Documented resource format in agent <-> master protocol.

2017-06-24 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60405/#review178844
---




src/slave/slave.cpp
Lines 1405-1407 (original), 1405-1408 (patched)


The master will basically consider the resources to be unreserved; because 
the master and agent will have inconsistent views of the resource state at the 
agent, this will cause problems.

Since you need a new master to create reservation refinements in the first 
place, you can only arrive in this situation by:

1. Upgrading master
2. Upgrading agent
3. Creating res refinement
4. Downgrading master

Which arguably falls under the "don't downgrade if you are using new 
featues" bucket. But yes, this is certainly unfortunate. Hard to prevent 
without introducing something similar to master capabilities, which we 
definitely need (MESOS-5675). I'll drop this issue for now, since AFAIK there's 
not much we can do to improve this in the short term.



src/slave/slave.cpp
Line 1408 (original), 1412-1414 (patched)


Hmm, not sure a warning/log is warranted here. In the common case (refined 
reservation, >= 1.4 master, >= 1.4 agent), we will fail to downgrade the 
resources, but that is fine and expected. Should we really be cluttering the 
logs with this information?


- Neil Conway


On June 24, 2017, 1:48 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60405/
> ---
> 
> (Updated June 24, 2017, 1:48 a.m.)
> 
> 
> Review request for mesos, Adam B and Michael Park.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Documented resource format in agent <-> master protocol.
> 
> 
> Diffs
> -
> 
>   src/messages/messages.proto 2c086263fdcee4d54a76a61379c2d4dba5271d23 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60405/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Neil Conway
> 
>