Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-04 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204266
---


Ship it!




Ship It!

- Joseph Wu


On June 1, 2018, 7:17 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 7:17 a.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-04 Thread Joseph Wu


> On May 31, 2018, 9:41 a.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
> It's tricky because we need very precise control over the scheduling, and 
> I'm not sure our testing infrastructure provides it. But I'll look into it.
> 
> Vinod Kone wrote:
> I see.  The bug is in the allocator, so you cannot use a mock allocator 
> unfortunately for control. Consider pausing the clock to have better control 
> in the test.
> 
> Benno Evers wrote:
> After discussing with Benjamin Bannier, we came to the conclusion that 
> it's currently not possible to write a unit test for this scenario, because 
> we're lacking the capability to intercept a dispatch and re-insert it into 
> the event queue at a later time.
> 
> Joseph Wu wrote:
> I gave writing the test a shot... and I think it might be possible, but 
> the resulting test would be too fragile to be a regression test.
> 
> Here's my (not working yet) attempt: 
> https://github.com/kaysoky/mesos/commit/29c6a1807d65d01440b7c67a73062ae9af892afe
> 
> Benno Evers wrote:
> Do you plan to continue working on that, or should we go ahead and commit 
> the fix?

I'll commit this patch shortly.

The test is more of an experiment to see how bad a test for this scenario would 
look like :D


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On June 1, 2018, 7:17 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 7:17 a.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-04 Thread Benno Evers


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
> It's tricky because we need very precise control over the scheduling, and 
> I'm not sure our testing infrastructure provides it. But I'll look into it.
> 
> Vinod Kone wrote:
> I see.  The bug is in the allocator, so you cannot use a mock allocator 
> unfortunately for control. Consider pausing the clock to have better control 
> in the test.
> 
> Benno Evers wrote:
> After discussing with Benjamin Bannier, we came to the conclusion that 
> it's currently not possible to write a unit test for this scenario, because 
> we're lacking the capability to intercept a dispatch and re-insert it into 
> the event queue at a later time.
> 
> Joseph Wu wrote:
> I gave writing the test a shot... and I think it might be possible, but 
> the resulting test would be too fragile to be a regression test.
> 
> Here's my (not working yet) attempt: 
> https://github.com/kaysoky/mesos/commit/29c6a1807d65d01440b7c67a73062ae9af892afe

Do you plan to continue working on that, or should we go ahead and commit the 
fix?


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 2:17 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-01 Thread Joseph Wu


> On May 31, 2018, 9:41 a.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
> It's tricky because we need very precise control over the scheduling, and 
> I'm not sure our testing infrastructure provides it. But I'll look into it.
> 
> Vinod Kone wrote:
> I see.  The bug is in the allocator, so you cannot use a mock allocator 
> unfortunately for control. Consider pausing the clock to have better control 
> in the test.
> 
> Benno Evers wrote:
> After discussing with Benjamin Bannier, we came to the conclusion that 
> it's currently not possible to write a unit test for this scenario, because 
> we're lacking the capability to intercept a dispatch and re-insert it into 
> the event queue at a later time.

I gave writing the test a shot... and I think it might be possible, but the 
resulting test would be too fragile to be a regression test.

Here's my (not working yet) attempt: 
https://github.com/kaysoky/mesos/commit/29c6a1807d65d01440b7c67a73062ae9af892afe


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On June 1, 2018, 7:17 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 7:17 a.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204192
---



Patch looks great!

Reviews applied: [67403]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 2:17 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204186
---



PASS: Mesos patch 67403 was successfully built and tested.

Reviews applied: `['67403']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67403

- Mesos Reviewbot Windows


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 2:17 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-01 Thread Benno Evers


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
> It's tricky because we need very precise control over the scheduling, and 
> I'm not sure our testing infrastructure provides it. But I'll look into it.
> 
> Vinod Kone wrote:
> I see.  The bug is in the allocator, so you cannot use a mock allocator 
> unfortunately for control. Consider pausing the clock to have better control 
> in the test.

After discussing with Benjamin Bannier, we came to the conclusion that it's 
currently not possible to write a unit test for this scenario, because we're 
lacking the capability to intercept a dispatch and re-insert it into the event 
queue at a later time.


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 9466 (patched)
> > 
> >
> > s/allocator/master/ ? we care about master invariant here right?

I think both formulations are correct, depending on how you look at it.


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On June 1, 2018, 2:17 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated June 1, 2018, 2:17 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-01 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/
---

(Updated June 1, 2018, 2:17 p.m.)


Review request for mesos, Joseph Wu and Vinod Kone.


Bugs: MESOS-7966
https://issues.apache.org/jira/browse/MESOS-7966


Repository: mesos


Description
---

When executing the `Master::inverseOffers()` callback, it
could happen that the maintenance window the reverse offer
referred to was already removed by a concurrent call to
to the maintenance endpoint of Mesos.

In this case, we must not send out a reverse offer, because
having outstanding inverse offers for an agent without
any scheduled maintenance window will lead to a crash in
the allocator when attempting to remove this offer.


Diffs (updated)
-

  src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 


Diff: https://reviews.apache.org/r/67403/diff/2/

Changes: https://reviews.apache.org/r/67403/diff/1-2/


Testing
---

`make check`

Set up the reproduction environment locally and ran `while :; python call.py; 
done` for about a minute. (see linked ticket)


Thanks,

Benno Evers



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-06-01 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204174
---



Patch looks great!

Reviews applied: [67403]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204138
---



PASS: Mesos patch 67403 was successfully built and tested.

Reviews applied: `['67403']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67403

- Mesos Reviewbot Windows


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Vinod Kone


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?
> 
> Benno Evers wrote:
> It's tricky because we need very precise control over the scheduling, and 
> I'm not sure our testing infrastructure provides it. But I'll look into it.

I see.  The bug is in the allocator, so you cannot use a mock allocator 
unfortunately for control. Consider pausing the clock to have better control in 
the test.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Benno Evers


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > Can you add a unit test for this?

It's tricky because we need very precise control over the scheduling, and I'm 
not sure our testing infrastructure provides it. But I'll look into it.


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Vinod Kone


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Line 9459 (original), 9459 (patched)
> > 
> >
> > not yours, but can you log framework id here too?

s/framework id/*framework/


> On May 31, 2018, 4:41 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 9470 (patched)
> > 
> >
> > log framework id?

s/framework id/*framework/


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 67403: Handled race condition when removing maintenance windows.

2018-05-31 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67403/#review204121
---



Can you add a unit test for this?


src/master/master.cpp
Line 9459 (original), 9459 (patched)


not yours, but can you log framework id here too?



src/master/master.cpp
Lines 9466 (patched)


s/allocator/master/ ? we care about master invariant here right?



src/master/master.cpp
Lines 9470 (patched)


log framework id?


- Vinod Kone


On May 31, 2018, 4:32 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67403/
> ---
> 
> (Updated May 31, 2018, 4:32 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Bugs: MESOS-7966
> https://issues.apache.org/jira/browse/MESOS-7966
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When executing the `Master::inverseOffers()` callback, it
> could happen that the maintenance window the reverse offer
> referred to was already removed by a concurrent call to
> to the maintenance endpoint of Mesos.
> 
> In this case, we must not send out a reverse offer, because
> having outstanding inverse offers for an agent without
> any scheduled maintenance window will lead to a crash in
> the allocator when attempting to remove this offer.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ba3f8746ea393c8655fcd5ceaace099f68df0b19 
> 
> 
> Diff: https://reviews.apache.org/r/67403/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> Set up the reproduction environment locally and ran `while :; python call.py; 
> done` for about a minute. (see linked ticket)
> 
> 
> Thanks,
> 
> Benno Evers
> 
>