Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-07-10 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review180073
---



Please comment with the result of a real reboot test.

- Jiang Yan Xu


On June 29, 2017, 4:42 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 29, 2017, 4:42 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/10/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-07-06 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review179817
---


Ship it!




We'll need to document this behavior change in CHANGELOG and upgrades.md.

- Jiang Yan Xu


On June 29, 2017, 4:42 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 29, 2017, 4:42 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/10/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-29 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 29, 2017, 11:42 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs (updated)
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/10/

Changes: https://reviews.apache.org/r/60105/diff/9-10/


Testing
---

make check done together with 60104 and 56895


Thanks,

Megha Sharma



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-29 Thread Megha Sharma


> On June 24, 2017, 6:19 a.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Lines 6014 (patched)
> > 
> >
> > s/INFO/WARNING/.

Changed to warning. I was debating earlier between warning and info and felt 
info was more apt since mesos is already doing the remediation here.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178830
---


On June 29, 2017, 11:35 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 29, 2017, 11:35 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/9/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-29 Thread Megha Sharma


> On June 24, 2017, 6:19 a.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Lines 6011-6020 (patched)
> > 
> >
> > Could you use the code in my comment:
> > 
> > ```
> >   if (!state->rebooted) {
> > return Failure(message);
> >   }
> > 
> >   LOG(WARNING) << "Falling back to recover as a new agent due to error: 
> > "
> ><< message;
> > 
> >   // Clean up the slave state to avoid any state recovery for the
> >   // old agent.
> >   slaveState = None();
> > ```
> > 
> > I'll comment on the specific lines for the reasoning.

Aah my bad, will get rid of the else.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178830
---


On June 29, 2017, 11:35 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 29, 2017, 11:35 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/9/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-29 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 29, 2017, 11:35 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs (updated)
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/9/

Changes: https://reviews.apache.org/r/60105/diff/8-9/


Testing
---

make check done together with 60104 and 56895


Thanks,

Megha Sharma



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-24 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178830
---


Fix it, then Ship it!





src/slave/slave.cpp
Lines 6011-6020 (patched)


Could you use the code in my comment:

```
  if (!state->rebooted) {
return Failure(message);
  }

  LOG(WARNING) << "Falling back to recover as a new agent due to error: "
   << message;

  // Clean up the slave state to avoid any state recovery for the
  // old agent.
  slaveState = None();
```

I'll comment on the specific lines for the reasoning.



src/slave/slave.cpp
Lines 6013 (patched)


`else`: I was suggesting pulling up the direct return case so we can avoid 
another `else` block. Normally one style is not necessary superior than the 
other but in this case it helps avoid three levels if `if ... else` branching.



src/slave/slave.cpp
Lines 6014 (patched)


s/INFO/WARNING/.



src/slave/slave.cpp
Lines 6015 (patched)


No `\n` necessary.



src/slave/slave.cpp
Lines 6016 (patched)


We normally in logs first state what action is taken and then attach the 
error after `: `. In this case since the error has multiple lines, let's state 
the action first so it's not buried by the error.


- Jiang Yan Xu


On June 23, 2017, 2:30 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 23, 2017, 2:30 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/8/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Jiang Yan Xu


> On June 23, 2017, 11:21 a.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Line 6003 (original), 6007 (patched)
> > 
> >
> > Empty line above.
> 
> Megha Sharma wrote:
> This is not possible I guess, git pre-commit hook complains: Redundant 
> blank line at the start of a code block should be deleted.

OK let's see the change. it's not the start of a code block in the verison I 
commented on. :)


- Jiang Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178750
---


On June 23, 2017, 2:30 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 23, 2017, 2:30 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/8/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Megha Sharma


> On June 23, 2017, 6:21 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Line 6003 (original), 6007 (patched)
> > 
> >
> > Empty line above.

This is not possible I guess, git pre-commit hook complains: Redundant blank 
line at the start of a code block should be deleted.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178750
---


On June 23, 2017, 9:30 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 23, 2017, 9:30 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/8/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 23, 2017, 9:30 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs (updated)
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/8/

Changes: https://reviews.apache.org/r/60105/diff/7-8/


Testing
---

make check done together with 60104 and 56895


Thanks,

Megha Sharma



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Megha Sharma


> On June 23, 2017, 6:21 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Lines 5994-5999 (patched)
> > 
> >
> > I tweaked it a little bit:
> > 
> > ```
> >   // Fail the recovery unless the agent is recovering for the first
> >   // time after host reboot.
> >   //
> >   // Prior to Mesos 1.4 we directly bypass the state recovery and
> >   // start as a new agent upon reboot (introduced in MESOS-844).
> >   // This unncessarily discards the existing agent ID (MESOS-6223).
> >   // Starting in Mesos 1.4 we'll attempt to recover the slave state
> >   // even after reboot but in case of slave info mismatch we'll fall
> >   // back to recovering as a new agent (existing behavior). This
> >   // prevents the agent from flapping if the slave info (resources,
> >   // attributes, etc.) change is due to host maintainance associated
> >   // with the reboot.
> > ```
> > 
> > What do you think? Feel free to improve on it.

+1, good and concise explanation about the changed behavior.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178750
---


On June 23, 2017, 5:19 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 23, 2017, 5:19 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/7/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178750
---




src/slave/slave.cpp
Lines 5980-5981 (original), 5980-5981 (patched)


```
SlaveInfo _info(info);
```

We typically add a prefix for two variations of the same thing.



src/slave/slave.cpp
Line 5985 (original), 5985 (patched)


Nit: s/errorMessage/message/

There should be no confusion here with a shorter name.



src/slave/slave.cpp
Lines 5994-5999 (patched)


I tweaked it a little bit:

```
  // Fail the recovery unless the agent is recovering for the first
  // time after host reboot.
  //
  // Prior to Mesos 1.4 we directly bypass the state recovery and
  // start as a new agent upon reboot (introduced in MESOS-844).
  // This unncessarily discards the existing agent ID (MESOS-6223).
  // Starting in Mesos 1.4 we'll attempt to recover the slave state
  // even after reboot but in case of slave info mismatch we'll fall
  // back to recovering as a new agent (existing behavior). This
  // prevents the agent from flapping if the slave info (resources,
  // attributes, etc.) change is due to host maintainance associated
  // with the reboot.
```

What do you think? Feel free to improve on it.



src/slave/slave.cpp
Lines 6000 (patched)


`if (state.rebooted)` instead? (Whitespace and variable usage)

The above examples

```
  Option resourcesState = state->resources;
  Option slaveState = state->slave;
```

help reduce line length and are used many times. It's not the same 
situation with `rebooted`.



src/slave/slave.cpp
Lines 5997-6003 (original), 6000-6016 (patched)


To avoid going to deep in the branching hierarchy perhaps the following is 
better?

```
  if (!state.rebooted) {
return Failure(message);
  }

  LOG(WARNING) << "Falling back to recover as a new agent due to error: 
"
   << message;

  // Cleaning up the slave state to avoid any state recovery for the
  // old agent.
  slaveState = None();
```



src/slave/slave.cpp
Lines 5997-6002 (original), 6001-6006 (patched)


This log message and metric increment used to occur outside

```
if (flags.recover == "reconnect" &&
!(infoCopy == slaveState->info.get())) {
```

but it's now changed.

How about we put it directly under 

```
if (slaveState.isSome() && slaveState->info.isSome()) {
```

You can find similar handling above with `resourcesState`:

```
  if (resourcesState.isSome()) {
if (resourcesState->errors > 0) {
  LOG(WARNING) << "Errors encountered during resources recovery: "
   << resourcesState->errors;

  metrics.recovery_errors += resourcesState->errors;
}
```



src/slave/slave.cpp
Line 6003 (original), 6007 (patched)


Empty line above.



src/slave/slave.cpp
Line 6004 (original), 6019 (patched)


Empty line above comments.


- Jiang Yan Xu


On June 23, 2017, 10:19 a.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 23, 2017, 10:19 a.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/7/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Megha Sharma


> On June 21, 2017, 4:39 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Lines 5798-5867 (original), 5798-5867 (patched)
> > 
> >
> > So all of this work is only useful for recovering frameworks, it looks 
> > like we don't need to do them before we decide we actually are going to 
> > recover the frameworks (i.e., not continue as a new agent)?
> > 
> > This reasoning certainly also applies to the current HEAD but since we 
> > are donig refactor here, we should probably address this (in a separate 
> > preceding review perhaps)
> 
> Megha Sharma wrote:
> I strongly agree we should defer doing this work until we start to 
> recover the frameworks. I was thinking of doing this may be as another review 
> for slave recovery improvements after this change is in.

Agreed to do in a separate refactoring/improvements patch may be after this one 
lands.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178388
---


On June 23, 2017, 5:19 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 23, 2017, 5:19 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/7/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-23 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 23, 2017, 5:19 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs (updated)
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/7/

Changes: https://reviews.apache.org/r/60105/diff/6-7/


Testing
---

make check done together with 60104 and 56895


Thanks,

Megha Sharma



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-22 Thread Megha Sharma


> On June 21, 2017, 4:39 p.m., Jiang Yan Xu wrote:
> > PTAL at these comments but I realize for this review it's probably easier 
> > if we sync on the high-level strategy first.

Rearranged the code as per our discussion.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178388
---


On June 22, 2017, 11:38 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 22, 2017, 11:38 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/6/
> 
> 
> Testing
> ---
> 
> make check done together with 60104 and 56895
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-22 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 22, 2017, 11:38 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs (updated)
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/6/

Changes: https://reviews.apache.org/r/60105/diff/5-6/


Testing
---

make check done together with 60104 and 56895


Thanks,

Megha Sharma



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-22 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 22, 2017, 9:16 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/5/


Testing (updated)
---

make check done together with 60104 and 56895


Thanks,

Megha Sharma



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-21 Thread Megha Sharma


> On June 21, 2017, 4:39 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp
> > Lines 5798-5867 (original), 5798-5867 (patched)
> > 
> >
> > So all of this work is only useful for recovering frameworks, it looks 
> > like we don't need to do them before we decide we actually are going to 
> > recover the frameworks (i.e., not continue as a new agent)?
> > 
> > This reasoning certainly also applies to the current HEAD but since we 
> > are donig refactor here, we should probably address this (in a separate 
> > preceding review perhaps)

I strongly agree we should defer doing this work until we start to recover the 
frameworks. I was thinking of doing this may be as another review for slave 
recovery improvements after this change is in.


- Megha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/#review178388
---


On June 21, 2017, 11:22 p.m., Megha Sharma wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60105/
> ---
> 
> (Updated June 21, 2017, 11:22 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Prior to Mesos 1.4 we bypass the state recovery and
> start as a new agent upon reboot. Starting in Mesos 1.4
> we'll attempt to recover the slave state even after reboot
> except for when there is a SlaveInfo mismatch.
> Here, we cleanup the slave state for a rebooted agent if
> there's been a SlaveInfo mismatch during recovery to
> ensure that no other state is recovered and the
> agent enventually registers as a new agent.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 
> 
> 
> Diff: https://reviews.apache.org/r/60105/diff/5/
> 
> 
> Testing
> ---
> 
> Latest make check results pending.
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>



Re: Review Request 60105: Clean rebooted slave's state if slaveInfo mismatches.

2017-06-21 Thread Megha Sharma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60105/
---

(Updated June 21, 2017, 11:22 p.m.)


Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.


Summary (updated)
-

Clean rebooted slave's state if slaveInfo mismatches.


Bugs: MESOS-6223
https://issues.apache.org/jira/browse/MESOS-6223


Repository: mesos


Description
---

Prior to Mesos 1.4 we bypass the state recovery and
start as a new agent upon reboot. Starting in Mesos 1.4
we'll attempt to recover the slave state even after reboot
except for when there is a SlaveInfo mismatch.
Here, we cleanup the slave state for a rebooted agent if
there's been a SlaveInfo mismatch during recovery to
ensure that no other state is recovered and the
agent enventually registers as a new agent.


Diffs (updated)
-

  src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c 


Diff: https://reviews.apache.org/r/60105/diff/5/

Changes: https://reviews.apache.org/r/60105/diff/4-5/


Testing (updated)
---

Latest make check results pending.


Thanks,

Megha Sharma