Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-06-18 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/#review204920
---


Fix it, then Ship it!





src/resource_provider/storage/provider.cpp
Lines 1312 (patched)


We could break after `return` for readability.



src/resource_provider/storage/provider.cpp
Line 2927 (original), 2974 (patched)


We can codify this comment,

CHECK(!protobuf::isSpeculativeOperation(operation.info())) << "Operation " 
<< operation << " is speculative";


- Benjamin Bannier


On June 14, 2018, 2:06 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65975/
> ---
> 
> (Updated June 14, 2018, 2:06 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jie Yu.
> 
> 
> Bugs: MESOS-8825
> https://issues.apache.org/jira/browse/MESOS-8825
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The storage pools needs to be reconciled in the following two scenarios:
> 
> 1. When there is a change in the set of known profiles.
> 2. When a volume/block of an unknown profile is destroyed, because the
>disk space being freed up may belong to a known profile.
> 
> This patch adds a sequence to coordinate the reconciliations for the
> above two cases.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> b90a4b81838fec410a97a10ce44a811bb81c87eb 
> 
> 
> Diff: https://reviews.apache.org/r/65975/diff/5/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-06-13 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/
---

(Updated June 14, 2018, 12:06 a.m.)


Review request for mesos, Benjamin Bannier and Jie Yu.


Changes
---

Simplified the reconciliation logic and replaced the `reconciliationCount` with 
a `reconciled` future for the latest reconciliation.


Bugs: MESOS-8825
https://issues.apache.org/jira/browse/MESOS-8825


Repository: mesos


Description
---

The storage pools needs to be reconciled in the following two scenarios:

1. When there is a change in the set of known profiles.
2. When a volume/block of an unknown profile is destroyed, because the
   disk space being freed up may belong to a known profile.

This patch adds a sequence to coordinate the reconciliations for the
above two cases.


Diffs (updated)
-

  src/resource_provider/storage/provider.cpp 
b90a4b81838fec410a97a10ce44a811bb81c87eb 


Diff: https://reviews.apache.org/r/65975/diff/5/

Changes: https://reviews.apache.org/r/65975/diff/4-5/


Testing
---

sudo make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-06-13 Thread Chun-Hung Hsiao


> On April 20, 2018, 1:27 p.m., Benjamin Bannier wrote:
> > First round of reviews.
> > 
> > I am not a big fan on how reconcilations are modelled here. The counting 
> > seems to lead to an incomplete encapsulation of correct behavior. I'd much 
> > rather see standard `libprocess` actor semantics and better use of data 
> > structures instead. Right now the code appears too complicated and 
> > potentially brittle to the touch to me.
> 
> Benjamin Bannier wrote:
> I am nto sure what a good example of a solution would, but have a look at 
> e.g., the master registrar in how it pushes updates (operations) into a 
> simple `deque` which it then works on sequentially. While it triggers work on 
> the updates manually the control flow is easier to see (still not easy enough 
> though). Maybe we have other examples as well.

A few notes here:

There are two concurrent sources that will trigger the reconciliation:
A. Changes in the set of known profiles.
B. Deletion of a volume/block with a gone profile (this is not supported before 
this patch).
The reconciliation triggered by B could be combined with that triggered by A, 
but not vice versa.

Also, I'd like to point out that the reconciliation must wait for all pending 
deletion to complete,
and during the wait, the resource provider should not accept new related 
operations.

So let's consider that such a deletion arrives: (i) before, (ii) during, or 
(iii) after a
reconciliation (either A or B):

For (i), if B is triggered for the deletion first, then A needs to wait for B;
if B is not tiggered yet, then B will wait for the deletion to be completed,
and B will be combined with the reconciliation (A/B).

For (ii), the deletion will be dropped. Note that we need to drop all operations
when a reconciliation is *enqueued*, not when it is executed.

For (iii), the deletion will be accepted, and B will be triggered after the 
reconciliation.

In other words, all A-type reconciliations need to be executed after the 
previous reconciliation,
and a B-type reconcilation can be combined with another A-type or B-type 
reconciliation that is in progress.

Meanwhile, whenever a reconciliation is enqueued to the actor, we need to 
*synchronously* mark the actor
in some "reconciliation" mode to drop related incoming operations.
The primary reason we want to do that is because the current code synchronously 
apply all speculative operations.
This decision is made to simplify the overall logic and maintain the invariant 
that
once an operation is accepted (and checkpointed), it must be able to be applied 
on the current total resources.
So, the only way to support both A and B and maintaining this invariant is to 
have ensure that
whenever a reconciliation is enqueued, no more related operation will be 
enqueued.

Hope this summary could help reviewing this patch.

I'll update this patch later with a simpler logic.


> On April 20, 2018, 1:27 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1092 (patched)
> > 
> >
> > Could you add a comment why this and the assertion in the continuation 
> > below are valid? The guaranteed execution paths of this `loop` with below 
> > `defer`s are not obvious to me.
> > 
> > Ideally we would not need this variable at all to ensure correct 
> > push-pop semantics, but defer to actor behavior and queues. This is hidden 
> > under very thick continuation layers at the moment.

As described above, whenever a reconciliation is enqueued, it is not allowed to 
enqueue more operations,
so the main purpose of this variable is to *synchronously* mark this actor in a 
special mode to drop all incoming operations.

But yes, this counter is hard to understand and kinda error prone. I'll update 
the patch with a different (and hopefully easier) logic.


> On April 20, 2018, 1:27 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1284 (patched)
> > 
> >
> > It would be great if this function would either return a value, or 
> > trigger `fatal()` to make sure we do our part to keep the agent up-to-date.

Will do.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/#review201629
---


On April 12, 2018, 3:36 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65975/
> ---
> 
> (Updated April 12, 2018, 3:36 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jie Yu.
> 
> 
> Bugs: MESOS-8825
> 

Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-04-20 Thread Benjamin Bannier


> On April 20, 2018, 3:27 p.m., Benjamin Bannier wrote:
> > First round of reviews.
> > 
> > I am not a big fan on how reconcilations are modelled here. The counting 
> > seems to lead to an incomplete encapsulation of correct behavior. I'd much 
> > rather see standard `libprocess` actor semantics and better use of data 
> > structures instead. Right now the code appears too complicated and 
> > potentially brittle to the touch to me.

I am nto sure what a good example of a solution would, but have a look at e.g., 
the master registrar in how it pushes updates (operations) into a simple 
`deque` which it then works on sequentially. While it triggers work on the 
updates manually the control flow is easier to see (still not easy enough 
though). Maybe we have other examples as well.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/#review201629
---


On April 12, 2018, 5:36 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65975/
> ---
> 
> (Updated April 12, 2018, 5:36 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jie Yu.
> 
> 
> Bugs: MESOS-8492
> https://issues.apache.org/jira/browse/MESOS-8492
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The storage pools needs to be reconciled in the following two scenarios:
> 
> 1. When there is a change in the set of known profiles.
> 2. When a volume/block of an unknown profile is destroyed, because the
>disk space being freed up may belong to a known profile.
> 
> This patch adds a sequence to coordinate the reconciliations for the
> above two cases.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> a07620d1c4bf618f669575b3e183bf598da905a6 
> 
> 
> Diff: https://reviews.apache.org/r/65975/diff/4/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-04-20 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/#review201629
---



First round of reviews.

I am not a big fan on how reconcilations are modelled here. The counting seems 
to lead to an incomplete encapsulation of correct behavior. I'd much rather see 
standard `libprocess` actor semantics and better use of data structures 
instead. Right now the code appears too complicated and potentially brittle to 
the touch to me.


src/resource_provider/storage/provider.cpp
Lines 1069-1109 (patched)


Could we move this to some named function/lambda? I feel with this change 
the code here becomes too hard to follow.



src/resource_provider/storage/provider.cpp
Lines 1092 (patched)


Could you add a comment why this and the assertion in the continuation 
below are valid? The guaranteed execution paths of this `loop` with below 
`defer`s are not obvious to me.

Ideally we would not need this variable at all to ensure correct push-pop 
semantics, but defer to actor behavior and queues. This is hidden under very 
thick continuation layers at the moment.



src/resource_provider/storage/provider.cpp
Lines 1105 (patched)


Please use a temporary, this does not help readability.



src/resource_provider/storage/provider.cpp
Lines 1284 (patched)


It would be great if this function would either return a value, or trigger 
`fatal()` to make sure we do our part to keep the agent up-to-date.


- Benjamin Bannier


On April 12, 2018, 5:36 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65975/
> ---
> 
> (Updated April 12, 2018, 5:36 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jie Yu.
> 
> 
> Bugs: MESOS-8492
> https://issues.apache.org/jira/browse/MESOS-8492
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The storage pools needs to be reconciled in the following two scenarios:
> 
> 1. When there is a change in the set of known profiles.
> 2. When a volume/block of an unknown profile is destroyed, because the
>disk space being freed up may belong to a known profile.
> 
> This patch adds a sequence to coordinate the reconciliations for the
> above two cases.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> a07620d1c4bf618f669575b3e183bf598da905a6 
> 
> 
> Diff: https://reviews.apache.org/r/65975/diff/4/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-04-11 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/
---

(Updated April 12, 2018, 3:36 a.m.)


Review request for mesos, Benjamin Bannier and Jie Yu.


Changes
---

Rebased and adopted CSI v0.2.


Bugs: MESOS-8492
https://issues.apache.org/jira/browse/MESOS-8492


Repository: mesos


Description
---

The storage pools needs to be reconciled in the following two scenarios:

1. When there is a change in the set of known profiles.
2. When a volume/block of an unknown profile is destroyed, because the
   disk space being freed up may belong to a known profile.

This patch adds a sequence to coordinate the reconciliations for the
above two cases.


Diffs (updated)
-

  src/resource_provider/storage/provider.cpp 
a07620d1c4bf618f669575b3e183bf598da905a6 


Diff: https://reviews.apache.org/r/65975/diff/3/

Changes: https://reviews.apache.org/r/65975/diff/2-3/


Testing
---

sudo make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 65975: Sequentialize reconciliations of storage pools in SLRP.

2018-03-08 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65975/
---

(Updated March 8, 2018, 11:41 p.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Joseph Wu.


Changes
---

Resloved test flakiness in r65995.


Bugs: MESOS-8492
https://issues.apache.org/jira/browse/MESOS-8492


Repository: mesos


Description
---

The storage pools needs to be reconciled in the following two scenarios:

1. When there is a change in the set of known profiles.
2. When a volume/block of an unknown profile is destroyed, because the
   disk space being freed up may belong to a known profile.

This patch adds a sequence to coordinate the reconciliations for the
above two cases.


Diffs
-

  src/resource_provider/storage/provider.cpp 
63dde512fd8cc9f68f5f48a96869eb09b23b6f4a 


Diff: https://reviews.apache.org/r/65975/diff/1/


Testing (updated)
---

sudo make check


Thanks,

Chun-Hung Hsiao