Re: Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.

2018-02-05 Thread Benjamin Bannier


> On Feb. 3, 2018, 12:05 a.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 7596 (patched)
> > 
> >
> > Does this mean we will not correctly account for resources when a 
> > framework has performed operations on an agent, but has not launched tasks 
> > there?

Yes, that is what would happen currently. To work around that we would need to 
add a channel to update the slave about framework infos before accepting 
operations.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review196753
---


On Feb. 2, 2018, 3:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 2, 2018, 3:07 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes a bug where pending operations on a resource provider
> resources where not properly accounted for in the allocator. This lead
> to assertion failures when the operation became terminal and we
> attempted to recover the used resources.
> 
> Since framework information is only remembered on agents if the
> framework launched a task, there exists the possibility that a master
> learns about an allocation to a framework unknown to it, yet. To
> accommodate that do not bookkeep allocations to unknown frameworks in
> the allocator and update code handling of terminal operation updates
> accordingly.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> f33ff767dcb93556beb696c96f8cfc17baccb05e 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.

2018-02-05 Thread Benjamin Bannier


> On Feb. 3, 2018, 12:06 a.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 7594-7597 (original), 7609-7612 (patched)
> > 
> >
> > Is this function now only called with resources from already-existing 
> > frameworks? Does that mean that we can update the code here? 
> > https://github.com/apache/mesos/blob/5a1433576eca20f15f1ea309fc202f4bbaf3b6c7/src/master/allocator/mesos/hierarchical.cpp#L690-L703

I am not sure we can make sure this will always happen in general. As a 
precaution I think it is still a good idea to leave this check in the 
allocator, if only to document allocator-local invariants.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review196752
---


On Feb. 2, 2018, 3:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 2, 2018, 3:07 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes a bug where pending operations on a resource provider
> resources where not properly accounted for in the allocator. This lead
> to assertion failures when the operation became terminal and we
> attempted to recover the used resources.
> 
> Since framework information is only remembered on agents if the
> framework launched a task, there exists the possibility that a master
> learns about an allocation to a framework unknown to it, yet. To
> accommodate that do not bookkeep allocations to unknown frameworks in
> the allocator and update code handling of terminal operation updates
> accordingly.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> f33ff767dcb93556beb696c96f8cfc17baccb05e 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.

2018-02-02 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review196752
---




src/master/master.cpp
Lines 7594-7597 (original), 7609-7612 (patched)


Is this function now only called with resources from already-existing 
frameworks? Does that mean that we can update the code here? 
https://github.com/apache/mesos/blob/5a1433576eca20f15f1ea309fc202f4bbaf3b6c7/src/master/allocator/mesos/hierarchical.cpp#L690-L703


- Greg Mann


On Feb. 2, 2018, 2:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 2, 2018, 2:07 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes a bug where pending operations on a resource provider
> resources where not properly accounted for in the allocator. This lead
> to assertion failures when the operation became terminal and we
> attempted to recover the used resources.
> 
> Since framework information is only remembered on agents if the
> framework launched a task, there exists the possibility that a master
> learns about an allocation to a framework unknown to it, yet. To
> accommodate that do not bookkeep allocations to unknown frameworks in
> the allocator and update code handling of terminal operation updates
> accordingly.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> f33ff767dcb93556beb696c96f8cfc17baccb05e 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.

2018-02-02 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review196753
---




src/master/master.cpp
Lines 7596 (patched)


Does this mean we will not correctly account for resources when a framework 
has performed operations on an agent, but has not launched tasks there?


- Greg Mann


On Feb. 2, 2018, 2:07 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 2, 2018, 2:07 p.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes a bug where pending operations on a resource provider
> resources where not properly accounted for in the allocator. This lead
> to assertion failures when the operation became terminal and we
> attempted to recover the used resources.
> 
> Since framework information is only remembered on agents if the
> framework launched a task, there exists the possibility that a master
> learns about an allocation to a framework unknown to it, yet. To
> accommodate that do not bookkeep allocations to unknown frameworks in
> the allocator and update code handling of terminal operation updates
> accordingly.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> f33ff767dcb93556beb696c96f8cfc17baccb05e 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.

2018-02-02 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/#review196744
---



FAIL: The file 'D:\DCOS\mesos\build-output\logs\apply-review-65482-stdout.log' 
already exists.

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/65482

- Mesos Reviewbot Windows


On Feb. 2, 2018, 6:07 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65482/
> ---
> 
> (Updated Feb. 2, 2018, 6:07 a.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8536
> https://issues.apache.org/jira/browse/MESOS-8536
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch fixes a bug where pending operations on a resource provider
> resources where not properly accounted for in the allocator. This lead
> to assertion failures when the operation became terminal and we
> attempted to recover the used resources.
> 
> Since framework information is only remembered on agents if the
> framework launched a task, there exists the possibility that a master
> learns about an allocation to a framework unknown to it, yet. To
> accommodate that do not bookkeep allocations to unknown frameworks in
> the allocator and update code handling of terminal operation updates
> accordingly.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> f33ff767dcb93556beb696c96f8cfc17baccb05e 
>   src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 
> 
> 
> Diff: https://reviews.apache.org/r/65482/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`, also tested with a version of the test added in r/65045 which 
> triggered this issue.
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 65482: Fixed allocator bookkeeping of pending operations on master failover.

2018-02-02 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65482/
---

Review request for mesos, Jie Yu and Jan Schlicht.


Bugs: MESOS-8536
https://issues.apache.org/jira/browse/MESOS-8536


Repository: mesos


Description
---

This patch fixes a bug where pending operations on a resource provider
resources where not properly accounted for in the allocator. This lead
to assertion failures when the operation became terminal and we
attempted to recover the used resources.

Since framework information is only remembered on agents if the
framework launched a task, there exists the possibility that a master
learns about an allocation to a framework unknown to it, yet. To
accommodate that do not bookkeep allocations to unknown frameworks in
the allocator and update code handling of terminal operation updates
accordingly.


Diffs
-

  src/master/allocator/mesos/hierarchical.cpp 
f33ff767dcb93556beb696c96f8cfc17baccb05e 
  src/master/master.cpp cc2685a6bc14103c639ce776cf1c912361e93381 


Diff: https://reviews.apache.org/r/65482/diff/1/


Testing
---

`make check`, also tested with a version of the test added in r/65045 which 
triggered this issue.


Thanks,

Benjamin Bannier