----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/60105/#review178750 -----------------------------------------------------------
src/slave/slave.cpp Lines 5980-5981 (original), 5980-5981 (patched) <https://reviews.apache.org/r/60105/#comment252968> ``` SlaveInfo _info(info); ``` We typically add a prefix for two variations of the same thing. src/slave/slave.cpp Line 5985 (original), 5985 (patched) <https://reviews.apache.org/r/60105/#comment252969> Nit: s/errorMessage/message/ There should be no confusion here with a shorter name. src/slave/slave.cpp Lines 5994-5999 (patched) <https://reviews.apache.org/r/60105/#comment253029> I tweaked it a little bit: ``` // Fail the recovery unless the agent is recovering for the first // time after host reboot. // // Prior to Mesos 1.4 we directly bypass the state recovery and // start as a new agent upon reboot (introduced in MESOS-844). // This unncessarily discards the existing agent ID (MESOS-6223). // Starting in Mesos 1.4 we'll attempt to recover the slave state // even after reboot but in case of slave info mismatch we'll fall // back to recovering as a new agent (existing behavior). This // prevents the agent from flapping if the slave info (resources, // attributes, etc.) change is due to host maintainance associated // with the reboot. ``` What do you think? Feel free to improve on it. src/slave/slave.cpp Lines 6000 (patched) <https://reviews.apache.org/r/60105/#comment253015> `if (state.rebooted)` instead? (Whitespace and variable usage) The above examples ``` Option<ResourcesState> resourcesState = state->resources; Option<SlaveState> slaveState = state->slave; ``` help reduce line length and are used many times. It's not the same situation with `rebooted`. src/slave/slave.cpp Lines 5997-6003 (original), 6000-6016 (patched) <https://reviews.apache.org/r/60105/#comment253031> To avoid going to deep in the branching hierarchy perhaps the following is better? ``` if (!state.rebooted) { return Failure(message); } LOG(WARNING) << "Falling back to recover as a new agent due to error: " << message; // Cleaning up the slave state to avoid any state recovery for the // old agent. slaveState = None(); ``` src/slave/slave.cpp Lines 5997-6002 (original), 6001-6006 (patched) <https://reviews.apache.org/r/60105/#comment253026> This log message and metric increment used to occur outside ``` if (flags.recover == "reconnect" && !(infoCopy == slaveState->info.get())) { ``` but it's now changed. How about we put it directly under ``` if (slaveState.isSome() && slaveState->info.isSome()) { ``` You can find similar handling above with `resourcesState`: ``` if (resourcesState.isSome()) { if (resourcesState->errors > 0) { LOG(WARNING) << "Errors encountered during resources recovery: " << resourcesState->errors; metrics.recovery_errors += resourcesState->errors; } ``` src/slave/slave.cpp Line 6003 (original), 6007 (patched) <https://reviews.apache.org/r/60105/#comment253017> Empty line above. src/slave/slave.cpp Line 6004 (original), 6019 (patched) <https://reviews.apache.org/r/60105/#comment253021> Empty line above comments. - Jiang Yan Xu On June 23, 2017, 10:19 a.m., Megha Sharma wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/60105/ > ----------------------------------------------------------- > > (Updated June 23, 2017, 10:19 a.m.) > > > Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu. > > > Bugs: MESOS-6223 > https://issues.apache.org/jira/browse/MESOS-6223 > > > Repository: mesos > > > Description > ------- > > Prior to Mesos 1.4 we bypass the state recovery and > start as a new agent upon reboot. Starting in Mesos 1.4 > we'll attempt to recover the slave state even after reboot > except for when there is a SlaveInfo mismatch. > Here, we cleanup the slave state for a rebooted agent if > there's been a SlaveInfo mismatch during recovery to > ensure that no other state is recovered and the > agent enventually registers as a new agent. > > > Diffs > ----- > > src/slave/slave.cpp f808458849bb9667a91abe18868751d377d36e0c > > > Diff: https://reviews.apache.org/r/60105/diff/7/ > > > Testing > ------- > > make check done together with 60104 and 56895 > > > Thanks, > > Megha Sharma > >
