Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-06 Thread Adam B


> On July 6, 2016, 11:21 a.m., Adam B wrote:
> > src/sched/sched.cpp, line 483
> > 
> >
> > Seems `failedAuthentications` is never 0 here (since you increment it 
> > just before), so you'll never delay by the `[0, b * 2^0]` amount you 
> > suggest in the docs. Should we make this `std::pow(2, 
> > failedAuthentications-1)` or update the doc?
> > Same issue on the agent.

I just updated the doc.
We also discussed adding an initial delay before authenticating, to prevent 
thundering herds. I added TODOs to that effect.


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/#review141041
---


On July 6, 2016, 6:58 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49308/
> ---
> 
> (Updated July 6, 2016, 6:58 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-2043
> https://issues.apache.org/jira/browse/MESOS-2043
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The backoff follows the existing pattern used during agent or
> scheduler registration, where we back off for some random time in an
> interval of increasing length, capped by
> `REGISTER_RETRY_INTERVAL_MAX`.
> 
> 
> Diffs
> -
> 
>   docs/configuration.md 8c8678c7e2251923298b90b7216a4e584faf6b26 
>   docs/endpoints/slave/state.json.md 0f82c1926404e79b281b2ea5f4d0ca21323aeded 
>   docs/endpoints/slave/state.md b34459e8624f0b29e927ff79be7fc845ac88080b 
>   src/sched/constants.hpp df8a1cc83ee3986400d633b2192b6da7fbe6b626 
>   src/sched/flags.hpp 989cebe40c6b4ecc7c4d47f8cf9d968cc795ad3f 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
>   src/slave/constants.hpp 668fc47e72d6f1b904aef5d0750b990fe162c9a3 
>   src/slave/flags.hpp ff45876a44ed00fdea36986f052f10e8b8031925 
>   src/slave/flags.cpp 010e78347f72edd5e60628b8bdda8de8b5feed21 
>   src/slave/slave.hpp 484ba758b4c87935aabd2f76a0e654a3c6d09167 
>   src/slave/slave.cpp 36f63bc54bec88f7e7b11ed0cde8bc78314908b2 
> 
> Diff: https://reviews.apache.org/r/49308/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X w/o optimizations).
> 
> Ran agent-related `AuthenticationTest`s in repetition (300 times).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-06 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/#review141041
---


Fix it, then Ship it!




Looks great, except for the mismatch between your math and the docs.


src/sched/sched.cpp (lines 479 - 480)


We generally prefer to wrap after the `=`, but I'll allow this since the 
RHS doesn't all fit on one line.



docs/configuration.md (line 974)


s/authentication/authenticate/



docs/configuration.md (line 975)


s/1str/1st/



src/sched/flags.hpp (line 44)


Did you want a `\n` at the end of the line?



src/sched/flags.hpp (lines 45 - 46)


These two lines fit together. I'm not sure why you wrapped.



src/sched/sched.cpp (line 480)


Seems `failedAuthentications` is never 0 here (since you increment it just 
before), so you'll never delay by the `[0, b * 2^0]` amount you suggest in the 
docs. Should we make this `std::pow(2, failedAuthentications-1)` or update the 
doc?
Same issue on the agent.
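
To make the off-by-one concrete, here is a minimal sketch of the two exponent choices. The helper names (`backoffUpperBound`, `boundAsWritten`, `boundWithOffset`) and the use of `std::chrono` are assumptions for illustration only; this is not the code under review.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>

using Seconds = std::chrono::duration<double>;

// Hypothetical helper: upper end of the backoff interval
// [0, b * 2^exponent], capped at `max`.
Seconds backoffUpperBound(Seconds b, int exponent, Seconds max)
{
  assert(exponent >= 0);
  return std::min(b * std::pow(2, exponent), max);
}

// `failedAuthentications` has already been incremented when the delay is
// computed, so the smallest value seen here is 1 and the first interval
// is [0, b * 2^1].
Seconds boundAsWritten(Seconds b, int failedAuthentications, Seconds max)
{
  return backoffUpperBound(b, failedAuthentications, max);
}

// Alternative: subtract 1 so the first failure yields [0, b * 2^0],
// which is what the docs described.
Seconds boundWithOffset(Seconds b, int failedAuthentications, Seconds max)
{
  assert(failedAuthentications >= 1);
  return backoffUpperBound(b, failedAuthentications - 1, max);
}
```

With `b` = 1s and a 60s cap (both values assumed), the first failure gives an interval of `[0, 2s]` as written and `[0, 1s]` with the `-1` offset, so either the exponent or the docs has to change.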



src/slave/flags.cpp (line 241)


s/authentication/authenticate/



src/slave/flags.cpp (line 242)


s/1str/1st/


- Adam B


On July 6, 2016, 6:58 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49308/
> ---
> 
> (Updated July 6, 2016, 6:58 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-2043
> https://issues.apache.org/jira/browse/MESOS-2043
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The backoff follows the existing pattern used during agent or
> scheduler registration, where we back off for some random time in an
> interval of increasing length, capped by
> `REGISTER_RETRY_INTERVAL_MAX`.
> 
> 
> Diffs
> -
> 
>   docs/configuration.md 8c8678c7e2251923298b90b7216a4e584faf6b26 
>   docs/endpoints/slave/state.json.md 0f82c1926404e79b281b2ea5f4d0ca21323aeded 
>   docs/endpoints/slave/state.md b34459e8624f0b29e927ff79be7fc845ac88080b 
>   src/sched/constants.hpp df8a1cc83ee3986400d633b2192b6da7fbe6b626 
>   src/sched/flags.hpp 989cebe40c6b4ecc7c4d47f8cf9d968cc795ad3f 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
>   src/slave/constants.hpp 668fc47e72d6f1b904aef5d0750b990fe162c9a3 
>   src/slave/flags.hpp ff45876a44ed00fdea36986f052f10e8b8031925 
>   src/slave/flags.cpp 010e78347f72edd5e60628b8bdda8de8b5feed21 
>   src/slave/slave.hpp 484ba758b4c87935aabd2f76a0e654a3c6d09167 
>   src/slave/slave.cpp 36f63bc54bec88f7e7b11ed0cde8bc78314908b2 
> 
> Diff: https://reviews.apache.org/r/49308/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X w/o optimizations).
> 
> Ran agent-related `AuthenticationTest`s in repetition (300 times).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-06 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/
---

(Updated July 6, 2016, 3:58 p.m.)


Review request for mesos, Adam B and Vinod Kone.


Bugs: MESOS-2043
https://issues.apache.org/jira/browse/MESOS-2043


Repository: mesos


Description
---

The backoff follows the existing pattern used during agent or
scheduler registration, where we back off for some random time in an
interval of increasing length, capped by
`REGISTER_RETRY_INTERVAL_MAX`.


Diffs
-

  docs/configuration.md 8c8678c7e2251923298b90b7216a4e584faf6b26 
  docs/endpoints/slave/state.json.md 0f82c1926404e79b281b2ea5f4d0ca21323aeded 
  docs/endpoints/slave/state.md b34459e8624f0b29e927ff79be7fc845ac88080b 
  src/sched/constants.hpp df8a1cc83ee3986400d633b2192b6da7fbe6b626 
  src/sched/flags.hpp 989cebe40c6b4ecc7c4d47f8cf9d968cc795ad3f 
  src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
  src/slave/constants.hpp 668fc47e72d6f1b904aef5d0750b990fe162c9a3 
  src/slave/flags.hpp ff45876a44ed00fdea36986f052f10e8b8031925 
  src/slave/flags.cpp 010e78347f72edd5e60628b8bdda8de8b5feed21 
  src/slave/slave.hpp 484ba758b4c87935aabd2f76a0e654a3c6d09167 
  src/slave/slave.cpp 36f63bc54bec88f7e7b11ed0cde8bc78314908b2 

Diff: https://reviews.apache.org/r/49308/diff/


Testing (updated)
---

make check (OS X w/o optimizations).

Ran agent-related `AuthenticationTest`s in repetition (300 times).


Thanks,

Benjamin Bannier



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-06 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/
---

(Updated July 6, 2016, 3:04 p.m.)


Review request for mesos, Adam B and Vinod Kone.


Changes
---

Addressed Adam's review comments: Use dedicated authentication backoff 
constants instead of re-using reregistration ones.


Bugs: MESOS-2043
https://issues.apache.org/jira/browse/MESOS-2043


Repository: mesos


Description
---

The backoff follows the existing pattern used during agent or
scheduler registration, where we back off for some random time in an
interval of increasing length, capped by
`REGISTER_RETRY_INTERVAL_MAX`.


Diffs (updated)
-

  docs/configuration.md 8c8678c7e2251923298b90b7216a4e584faf6b26 
  docs/endpoints/slave/state.json.md 0f82c1926404e79b281b2ea5f4d0ca21323aeded 
  docs/endpoints/slave/state.md b34459e8624f0b29e927ff79be7fc845ac88080b 
  src/sched/constants.hpp df8a1cc83ee3986400d633b2192b6da7fbe6b626 
  src/sched/flags.hpp 989cebe40c6b4ecc7c4d47f8cf9d968cc795ad3f 
  src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
  src/slave/constants.hpp 668fc47e72d6f1b904aef5d0750b990fe162c9a3 
  src/slave/flags.hpp ff45876a44ed00fdea36986f052f10e8b8031925 
  src/slave/flags.cpp 010e78347f72edd5e60628b8bdda8de8b5feed21 
  src/slave/slave.hpp 484ba758b4c87935aabd2f76a0e654a3c6d09167 
  src/slave/slave.cpp 36f63bc54bec88f7e7b11ed0cde8bc78314908b2 

Diff: https://reviews.apache.org/r/49308/diff/


Testing
---

make check (OS X w/o optimizations).


Thanks,

Benjamin Bannier



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-06 Thread Benjamin Bannier


> On July 5, 2016, 12:20 p.m., Adam B wrote:
> > src/slave/slave.cpp, lines 984-987
> > 
> >
> > Does your solution address this trickiness after a master failover?

AFAICT even the existing code exhibits that tricky situation when two 
`defer`'ed `Slave::detected`s are in the queue (e.g., after a master fails 
over before we finish authenticating). In that case the second one to execute 
`Slave::authenticate` will see `Some` `authenticating` and stop its attempt.

Now with the added backoff, where we `delay` retries, we might end up in a 
similar setup where two concurrent calls to `Slave::authenticate` are in the 
queue. Like before, the second one to execute `Slave::authenticate` should stop 
its attempt.
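
The guard described above can be sketched roughly as follows. `AgentSketch` and its members are hypothetical stand-ins, and a plain `bool` replaces the `Some`/`None` state of `authenticating`; the real agent runs in an actor with futures, which this sketch does not model.

```cpp
#include <cassert>

// Hypothetical stand-in for an agent that allows only one in-flight
// authentication attempt at a time.
class AgentSketch
{
public:
  // Returns true if this call started an attempt, false if an attempt
  // was already in flight and this call stopped without doing anything.
  bool authenticate()
  {
    if (authenticating) {
      return false; // An earlier queued call got here first.
    }
    authenticating = true;
    ++attempts;
    return true;
  }

  // Marks the in-flight attempt as finished (successfully or not), so
  // that a later retry may start a new one.
  void finished()
  {
    assert(authenticating);
    authenticating = false;
  }

  int started() const { return attempts; }

private:
  bool authenticating = false; // True while an attempt is in flight.
  int attempts = 0;            // Number of attempts actually started.
};
```

Whether the two queued calls come from two detections or from a detection plus a delayed retry, only the first one to run starts an attempt; the second sees the in-flight state and returns.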


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/#review140745
---


On July 6, 2016, 3:04 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49308/
> ---
> 
> (Updated July 6, 2016, 3:04 p.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-2043
> https://issues.apache.org/jira/browse/MESOS-2043
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The backoff follows the existing pattern used during agent or
> scheduler registration, where we back off for some random time in an
> interval of increasing length, capped by
> `REGISTER_RETRY_INTERVAL_MAX`.
> 
> 
> Diffs
> -
> 
>   docs/configuration.md 8c8678c7e2251923298b90b7216a4e584faf6b26 
>   docs/endpoints/slave/state.json.md 0f82c1926404e79b281b2ea5f4d0ca21323aeded 
>   docs/endpoints/slave/state.md b34459e8624f0b29e927ff79be7fc845ac88080b 
>   src/sched/constants.hpp df8a1cc83ee3986400d633b2192b6da7fbe6b626 
>   src/sched/flags.hpp 989cebe40c6b4ecc7c4d47f8cf9d968cc795ad3f 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
>   src/slave/constants.hpp 668fc47e72d6f1b904aef5d0750b990fe162c9a3 
>   src/slave/flags.hpp ff45876a44ed00fdea36986f052f10e8b8031925 
>   src/slave/flags.cpp 010e78347f72edd5e60628b8bdda8de8b5feed21 
>   src/slave/slave.hpp 484ba758b4c87935aabd2f76a0e654a3c6d09167 
>   src/slave/slave.cpp 36f63bc54bec88f7e7b11ed0cde8bc78314908b2 
> 
> Diff: https://reviews.apache.org/r/49308/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X w/o optimizations).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-05 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/#review140807
---



Patch looks great!

Reviews applied: [49308]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On June 29, 2016, 7:08 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49308/
> ---
> 
> (Updated June 29, 2016, 7:08 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-2043
> https://issues.apache.org/jira/browse/MESOS-2043
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The backoff follows the existing pattern used during agent or
> scheduler registration, where we back off for some random time in an
> interval of increasing length, capped by
> `REGISTER_RETRY_INTERVAL_MAX`.
> 
> 
> Diffs
> -
> 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
>   src/slave/slave.hpp 2afd7d152dcd2f5390014cd7bd4e926b62c292d1 
>   src/slave/slave.cpp da643e6e50b2f313705d2f862c961291aa5d2f22 
> 
> Diff: https://reviews.apache.org/r/49308/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X w/o optimizations).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-07-05 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/#review140745
---



Yes, this is more like what I had in mind.
- I would like to see you use authentication-specific flags/constants instead 
of registration_backoff_factor and REGISTER_RETRY_INTERVAL_MAX
- How did you test this?


src/slave/slave.cpp 


Does your solution address this trickiness after a master failover?


- Adam B


On June 29, 2016, 12:08 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49308/
> ---
> 
> (Updated June 29, 2016, 12:08 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-2043
> https://issues.apache.org/jira/browse/MESOS-2043
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The backoff follows the existing pattern used during agent or
> scheduler registration, where we back off for some random time in an
> interval of increasing length, capped by
> `REGISTER_RETRY_INTERVAL_MAX`.
> 
> 
> Diffs
> -
> 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
>   src/slave/slave.hpp 2afd7d152dcd2f5390014cd7bd4e926b62c292d1 
>   src/slave/slave.cpp da643e6e50b2f313705d2f862c961291aa5d2f22 
> 
> Diff: https://reviews.apache.org/r/49308/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X w/o optimizations).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-06-29 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/
---

(Updated June 29, 2016, 9:08 a.m.)


Review request for mesos, Adam B and Vinod Kone.


Changes
---

Removed stale comments.


Bugs: MESOS-2043
https://issues.apache.org/jira/browse/MESOS-2043


Repository: mesos


Description
---

The backoff follows the existing pattern used during agent or
scheduler registration, where we back off for some random time in an
interval of increasing length, capped by
`REGISTER_RETRY_INTERVAL_MAX`.


Diffs (updated)
-

  src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
  src/slave/slave.hpp 2afd7d152dcd2f5390014cd7bd4e926b62c292d1 
  src/slave/slave.cpp da643e6e50b2f313705d2f862c961291aa5d2f22 

Diff: https://reviews.apache.org/r/49308/diff/


Testing
---

make check (OS X w/o optimizations).


Thanks,

Benjamin Bannier



Re: Review Request 49308: Added agent and scheduler authentication backoff.

2016-06-28 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/#review139906
---



Patch looks great!

Reviews applied: [49308]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker_build.sh

- Mesos ReviewBot


On June 28, 2016, 1:16 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49308/
> ---
> 
> (Updated June 28, 2016, 1:16 p.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-2043
> https://issues.apache.org/jira/browse/MESOS-2043
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The backoff follows the existing pattern used during agent or
> scheduler registration, where we back off for some random time in an
> interval of increasing length, capped by
> `REGISTER_RETRY_INTERVAL_MAX`.
> 
> 
> Diffs
> -
> 
>   src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
>   src/slave/slave.hpp 2afd7d152dcd2f5390014cd7bd4e926b62c292d1 
>   src/slave/slave.cpp da643e6e50b2f313705d2f862c961291aa5d2f22 
> 
> Diff: https://reviews.apache.org/r/49308/diff/
> 
> 
> Testing
> ---
> 
> make check (OS X w/o optimizations).
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 49308: Added agent and scheduler authentication backoff.

2016-06-28 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49308/
---

Review request for mesos, Adam B and Vinod Kone.


Bugs: MESOS-2043
https://issues.apache.org/jira/browse/MESOS-2043


Repository: mesos


Description
---

The backoff follows the existing pattern used during agent or
scheduler registration, where we back off for some random time in an
interval of increasing length, capped by
`REGISTER_RETRY_INTERVAL_MAX`.
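
As a rough illustration of that pattern, the sketch below draws a random delay from an interval that doubles with each failure and is capped at a maximum. The function name `randomBackoff`, its parameters, and the use of the standard `<random>` and `<chrono>` facilities are assumptions made for this example; the patch itself relies on the project's own utilities.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <random>

using Seconds = std::chrono::duration<double>;

// Hypothetical sketch: pick a uniformly random delay from the interval
// [0, min(factor * 2^failures, max)).
template <typename Rng>
Seconds randomBackoff(Seconds factor, int failures, Seconds max, Rng& rng)
{
  assert(failures >= 0);

  // The interval grows exponentially with the number of failures, but
  // never beyond the configured cap.
  const Seconds upper = std::min(factor * std::pow(2, failures), max);

  std::uniform_real_distribution<double> dist(0.0, upper.count());
  return Seconds(dist(rng));
}
```

Randomizing within the interval, rather than always waiting for its full length, spreads out retries from many agents or schedulers that failed at about the same time.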


Diffs
-

  src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
  src/slave/slave.hpp 2afd7d152dcd2f5390014cd7bd4e926b62c292d1 
  src/slave/slave.cpp da643e6e50b2f313705d2f862c961291aa5d2f22 

Diff: https://reviews.apache.org/r/49308/diff/


Testing
---

make check (OS X w/o optimizations).


Thanks,

Benjamin Bannier