Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-16 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/#review207435
---


Ship it!




Ship It!

- Benjamin Mahler


On Aug. 16, 2018, 9:03 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68346/
> ---
> 
> (Updated Aug. 16, 2018, 9:03 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9147
> https://issues.apache.org/jira/browse/MESOS-9147
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixed the backoff time calculation
> overflow bug described in MESOS-9147.
> 
> The old approach times out an authentication request after
> `flags.authentication_timeout` and then retries after some
> backoff time. This is not optimal because, if the scheduler
> is going to backoff some time before retry, we might as well
> wait that long for the previous authentication request
> (instead of timeout early).
> 
> This patch combines the authentication timeout and
> authentication retry backoff interval into a single
> wait time interval. Now scheduler will timeout the previous
> authentication request after the wait time interval and
> then immediately retry.
> 
> 
> Diffs
> -
> 
>   src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
>   src/sched/flags.hpp 2492665d44c424ff9f4f73c796520ebc51abbdff 
>   src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 
> 
> 
> Diff: https://reviews.apache.org/r/68346/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-16 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/
---

(Updated Aug. 16, 2018, 2:03 p.m.)


Review request for mesos, Benjamin Mahler and Gastón Kleiman.


Changes
---

Added flag description about max timeout.


Bugs: MESOS-9147
https://issues.apache.org/jira/browse/MESOS-9147


Repository: mesos


Description
---

This patch fixed the backoff time calculation
overflow bug described in MESOS-9147.

The old approach times out an authentication request after
`flags.authentication_timeout` and then retries after some
backoff time. This is not optimal because, if the scheduler
is going to backoff some time before retry, we might as well
wait that long for the previous authentication request
(instead of timeout early).

This patch combines the authentication timeout and
authentication retry backoff interval into a single
wait time interval. Now scheduler will timeout the previous
authentication request after the wait time interval and
then immediately retry.


Diffs (updated)
-

  src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
  src/sched/flags.hpp 2492665d44c424ff9f4f73c796520ebc51abbdff 
  src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 


Diff: https://reviews.apache.org/r/68346/diff/5/

Changes: https://reviews.apache.org/r/68346/diff/4-5/


Testing
---

make check


Thanks,

Meng Zhu



Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-16 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/
---

(Updated Aug. 16, 2018, 1:46 p.m.)


Review request for mesos, Benjamin Mahler and Gastón Kleiman.


Bugs: MESOS-9147
https://issues.apache.org/jira/browse/MESOS-9147


Repository: mesos


Description
---

This patch fixed the backoff time calculation
overflow bug described in MESOS-9147.

The old approach times out an authentication request after
`flags.authentication_timeout` and then retries after some
backoff time. This is not optimal because, if the scheduler
is going to backoff some time before retry, we might as well
wait that long for the previous authentication request
(instead of timeout early).

This patch combines the authentication timeout and
authentication retry backoff interval into a single
wait time interval. Now scheduler will timeout the previous
authentication request after the wait time interval and
then immediately retry.


Diffs (updated)
-

  src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
  src/sched/flags.hpp 2492665d44c424ff9f4f73c796520ebc51abbdff 
  src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 


Diff: https://reviews.apache.org/r/68346/diff/4/

Changes: https://reviews.apache.org/r/68346/diff/3-4/


Testing
---

make check


Thanks,

Meng Zhu



Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-16 Thread Meng Zhu


> On Aug. 16, 2018, 12:24 p.m., Benjamin Mahler wrote:
> > src/sched/flags.hpp
> > Lines 122-125 (original), 122-125 (patched)
> > 
> >
> > Either in this review or the next one, we can rename to min and use a 
> > deprecated alias for this old name:
> > 
> > E.g.
> > 
> > https://github.com/apache/mesos/blob/1.6.1/src/slave/flags.cpp#L441

Will do in the next review.


> On Aug. 16, 2018, 12:24 p.m., Benjamin Mahler wrote:
> > src/sched/sched.cpp
> > Lines 53 (patched)
> > 
> >
> > Actually, Future::after doesn't need this include, did you use the 
> > free-standing after?

Removed.


> On Aug. 16, 2018, 12:24 p.m., Benjamin Mahler wrote:
> > src/sched/sched.cpp
> > Lines 425-426 (patched)
> > 
> >
> > Any reason not to use the same indentation as the agent logic here?

Both are formatted by the formater. Here we are more indented to start with 
comparing to the agent. Thus we can not have the same indentation.


- Meng


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/#review207424
---


On Aug. 16, 2018, 1:46 p.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68346/
> ---
> 
> (Updated Aug. 16, 2018, 1:46 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9147
> https://issues.apache.org/jira/browse/MESOS-9147
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixed the backoff time calculation
> overflow bug described in MESOS-9147.
> 
> The old approach times out an authentication request after
> `flags.authentication_timeout` and then retries after some
> backoff time. This is not optimal because, if the scheduler
> is going to backoff some time before retry, we might as well
> wait that long for the previous authentication request
> (instead of timeout early).
> 
> This patch combines the authentication timeout and
> authentication retry backoff interval into a single
> wait time interval. Now scheduler will timeout the previous
> authentication request after the wait time interval and
> then immediately retry.
> 
> 
> Diffs
> -
> 
>   src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
>   src/sched/flags.hpp 2492665d44c424ff9f4f73c796520ebc51abbdff 
>   src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 
> 
> 
> Diff: https://reviews.apache.org/r/68346/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-16 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/#review207424
---



Looks good! Just an issue in the future.after, as well as the flag description. 
We can also rename the flag here and use a deprecated alias (see below).


src/sched/constants.hpp
Lines 41-42 (original), 41-42 (patched)


is authenticating



src/sched/flags.hpp
Lines 41-45 (original), 41-45 (patched)


Looks like in both the agent and scheduler patches, this review should 
update the approach description here in the flag help message?



src/sched/flags.hpp
Lines 122-125 (original), 122-125 (patched)


Either in this review or the next one, we can rename to min and use a 
deprecated alias for this old name:

E.g.

https://github.com/apache/mesos/blob/1.6.1/src/slave/flags.cpp#L441



src/sched/sched.cpp
Lines 53 (patched)


Actually, Future::after doesn't need this include, did you use the 
free-standing after?



src/sched/sched.cpp
Lines 109 (patched)


I don't see a use of the free-standing after in your diff?



src/sched/sched.cpp
Lines 425-426 (patched)


Any reason not to use the same indentation as the agent logic here?



src/sched/sched.cpp
Lines 433-440 (original), 444-458 (patched)


accessing this->running here without a defer seems unsafe? looks like it 
will crash if this gets destructed before the timeout

Probably we can just do a future.discard here, it should be ok to discard 
the future if running is false and `_authenticate` will check running.


- Benjamin Mahler


On Aug. 16, 2018, 12:11 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68346/
> ---
> 
> (Updated Aug. 16, 2018, 12:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9147
> https://issues.apache.org/jira/browse/MESOS-9147
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixed the backoff time calculation
> overflow bug described in MESOS-9147.
> 
> The old approach times out an authentication request after
> `flags.authentication_timeout` and then retries after some
> backoff time. This is not optimal because, if the scheduler
> is going to backoff some time before retry, we might as well
> wait that long for the previous authentication request
> (instead of timeout early).
> 
> This patch combines the authentication timeout and
> authentication retry backoff interval into a single
> wait time interval. Now scheduler will timeout the previous
> authentication request after the wait time interval and
> then immediately retry.
> 
> 
> Diffs
> -
> 
>   src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
>   src/sched/flags.hpp 2492665d44c424ff9f4f73c796520ebc51abbdff 
>   src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 
> 
> 
> Diff: https://reviews.apache.org/r/68346/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-15 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/
---

(Updated Aug. 15, 2018, 5:11 p.m.)


Review request for mesos, Benjamin Mahler and Gastón Kleiman.


Bugs: MESOS-9147
https://issues.apache.org/jira/browse/MESOS-9147


Repository: mesos


Description
---

This patch fixed the backoff time calculation
overflow bug described in MESOS-9147.

The old approach times out an authentication request after
`flags.authentication_timeout` and then retries after some
backoff time. This is not optimal because, if the scheduler
is going to backoff some time before retry, we might as well
wait that long for the previous authentication request
(instead of timeout early).

This patch combines the authentication timeout and
authentication retry backoff interval into a single
wait time interval. Now scheduler will timeout the previous
authentication request after the wait time interval and
then immediately retry.


Diffs (updated)
-

  src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
  src/sched/flags.hpp 2492665d44c424ff9f4f73c796520ebc51abbdff 
  src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 


Diff: https://reviews.apache.org/r/68346/diff/3/

Changes: https://reviews.apache.org/r/68346/diff/2-3/


Testing
---

make check


Thanks,

Meng Zhu



Re: Review Request 68346: Fixed a backoff overflow bug in scheduler authentication retry logic.

2018-08-15 Thread Meng Zhu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68346/
---

(Updated Aug. 14, 2018, 11:10 p.m.)


Review request for mesos, Benjamin Mahler and Gastón Kleiman.


Bugs: MESOS-9147
https://issues.apache.org/jira/browse/MESOS-9147


Repository: mesos


Description
---

This patch fixed the backoff time calculation
overflow bug described in MESOS-9147.

The old approach times out an authentication request after
`flags.authentication_timeout` and then retries after some
backoff time. This is not optimal because, if the scheduler
is going to backoff some time before retry, we might as well
wait that long for the previous authentication request
(instead of timeout early).

This patch combines the authentication timeout and
authentication retry backoff interval into a single
wait time interval. Now scheduler will timeout the previous
authentication request after the wait time interval and
then immediately retry.


Diffs (updated)
-

  src/sched/constants.hpp 9edb25b38ba8e7e1dbbb4ce4c957bb6bd9f4af81 
  src/sched/sched.cpp 4de76225c73c9c17904512f5a72303d93ec915a7 


Diff: https://reviews.apache.org/r/68346/diff/2/

Changes: https://reviews.apache.org/r/68346/diff/1-2/


Testing
---

make check


Thanks,

Meng Zhu