Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-29 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/
---

(Updated Oct. 29, 2017, 3:15 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.


Bugs: MESOS-7235 and MESOS-8054
https://issues.apache.org/jira/browse/MESOS-7235
https://issues.apache.org/jira/browse/MESOS-8054


Repository: mesos


Description
---

The resource version UUID is used to establish the relationship
between the operation and the resources that the operation is
operating on. Each resource provider will keep a resource version
UUID, and change it when it believes that the resources from this
resource provider are out of sync from the master's view.  The master
will keep track of the last known resource version UUID for each
resource provider, and attach the resource version UUID in each
operation it sends out. The resource provider should reject operations
that have a different resource version UUID than that it maintains,
because this means the operation is operating on resources that might
have already been invalidated.


Diffs
-

  include/mesos/resource_provider/resource_provider.proto 
f5a9073075327019fd133bd51265f695ef464845 
  include/mesos/v1/resource_provider/resource_provider.proto 
e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
  src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
  src/tests/resource_provider_manager_tests.cpp 
ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
  src/tests/resource_provider_validation_tests.cpp 
f182bff4670318e9de22c2915c5dbb423a74ad6c 


Diff: https://reviews.apache.org/r/63094/diff/5/


Testing
---

make check


Thanks,

Jie Yu



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-29 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/
---

(Updated Oct. 29, 2017, 10:37 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.


Changes
---

Addressed comments.


Repository: mesos


Description (updated)
---

The resource version UUID is used to establish the relationship
between the operation and the resources that the operation is
operating on. Each resource provider will keep a resource version
UUID, and change it when it believes that the resources from this
resource provider are out of sync from the master's view.  The master
will keep track of the last known resource version UUID for each
resource provider, and attach the resource version UUID in each
operation it sends out. The resource provider should reject operations
that have a different resource version UUID than that it maintains,
because this means the operation is operating on resources that might
have already been invalidated.


Diffs (updated)
-

  include/mesos/resource_provider/resource_provider.proto 
f5a9073075327019fd133bd51265f695ef464845 
  include/mesos/v1/resource_provider/resource_provider.proto 
e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
  src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
  src/tests/resource_provider_manager_tests.cpp 
ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
  src/tests/resource_provider_validation_tests.cpp 
f182bff4670318e9de22c2915c5dbb423a74ad6c 


Diff: https://reviews.apache.org/r/63094/diff/5/

Changes: https://reviews.apache.org/r/63094/diff/4-5/


Testing
---

make check


Thanks,

Jie Yu



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-27 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review189457
---


Fix it, then Ship it!





include/mesos/resource_provider/resource_provider.proto
Lines 58 (patched)


s/changes/change/

Here and elsewhere (also in the description :)



src/messages/messages.proto
Lines 716 (patched)


Maybe use `ResourceVersionUUID` here instead?


- Greg Mann


On Oct. 27, 2017, 5:06 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> ---
> 
> (Updated Oct. 27, 2017, 5:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The resource version UUID is used to establish the relationship
> between the operation and the resources that the operation is
> operating on. Each resource provider will keep a resource version
> UUID, and changes it when it believes that the resources from this
> resource provider are out of sync from the master's view.  The master
> will keep track of the last known resource version UUID for each
> resource provider, and attach the resource version UUID in each
> operation it sends out. The resource provider should reject operations
> that have a different resource version UUID than that it maintains,
> because this means the operation is operating on resources that might
> have already been invalidated.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_provider/resource_provider.proto 
> f5a9073075327019fd133bd51265f695ef464845 
>   include/mesos/v1/resource_provider/resource_provider.proto 
> e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/tests/resource_provider_manager_tests.cpp 
> ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
>   src/tests/resource_provider_validation_tests.cpp 
> f182bff4670318e9de22c2915c5dbb423a74ad6c 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-27 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/
---

(Updated Oct. 27, 2017, 5:06 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.


Changes
---

Addressed comments.


Repository: mesos


Description
---

The resource version UUID is used to establish the relationship
between the operation and the resources that the operation is
operating on. Each resource provider will keep a resource version
UUID, and changes it when it believes that the resources from this
resource provider are out of sync from the master's view.  The master
will keep track of the last known resource version UUID for each
resource provider, and attach the resource version UUID in each
operation it sends out. The resource provider should reject operations
that have a different resource version UUID than that it maintains,
because this means the operation is operating on resources that might
have already been invalidated.


Diffs (updated)
-

  include/mesos/resource_provider/resource_provider.proto 
f5a9073075327019fd133bd51265f695ef464845 
  include/mesos/v1/resource_provider/resource_provider.proto 
e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
  src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
  src/tests/resource_provider_manager_tests.cpp 
ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
  src/tests/resource_provider_validation_tests.cpp 
f182bff4670318e9de22c2915c5dbb423a74ad6c 


Diff: https://reviews.apache.org/r/63094/diff/4/

Changes: https://reviews.apache.org/r/63094/diff/3-4/


Testing
---

make check


Thanks,

Jie Yu



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-27 Thread Jie Yu


> On Oct. 26, 2017, 11:08 p.m., Greg Mann wrote:
> > src/messages/messages.proto
> > Lines 712 (patched)
> > 
> >
> > Do we need multiple ResourceVersionUUIDs here? Can the master just 
> > include the UUID for the relevant resource provider/agent?

Yeah, good point.


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review189371
---


On Oct. 19, 2017, 9:35 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> ---
> 
> (Updated Oct. 19, 2017, 9:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The resource version UUID is used to establish the relationship
> between the operation and the resources that the operation is
> operating on. Each resource provider will keep a resource version
> UUID, and changes it when it believes that the resources from this
> resource provider are out of sync from the master's view.  The master
> will keep track of the last known resource version UUID for each
> resource provider, and attach the resource version UUID in each
> operation it sends out. The resource provider should reject operations
> that have a different resource version UUID than that it maintains,
> because this means the operation is operating on resources that might
> have already been invalidated.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_provider/resource_provider.proto 
> f5a9073075327019fd133bd51265f695ef464845 
>   include/mesos/v1/resource_provider/resource_provider.proto 
> e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/tests/resource_provider_manager_tests.cpp 
> ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
>   src/tests/resource_provider_validation_tests.cpp 
> f182bff4670318e9de22c2915c5dbb423a74ad6c 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-27 Thread Jie Yu


> On Oct. 20, 2017, 12:44 p.m., Benjamin Bannier wrote:
> > src/messages/messages.proto
> > Lines 663 (patched)
> > 
> >
> > Even though currently operation on agent resources cannot exhibit 
> > operation speculation failures, I would still prefer would we include some 
> > version uuid for the agent resources here as well.
> > 
> > AFAICT this could happen either by adjusting `ResourceVersionUUID` so 
> > that `resource_provider_id` becomes `optional` or by introducing an 
> > explicit proto3 `map` here, or alternatively, by the agent maintaining its 
> > own uuid which changes whenever a RP uuid changes and using that uuid 
> > instead of a vector here (we discussed this offline).

I'll go with making `resource_provider_id` optional. proto3 map does not 
support a message type key.


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188829
---


On Oct. 19, 2017, 9:35 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> ---
> 
> (Updated Oct. 19, 2017, 9:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The resource version UUID is used to establish the relationship
> between the operation and the resources that the operation is
> operating on. Each resource provider will keep a resource version
> UUID, and changes it when it believes that the resources from this
> resource provider are out of sync from the master's view.  The master
> will keep track of the last known resource version UUID for each
> resource provider, and attach the resource version UUID in each
> operation it sends out. The resource provider should reject operations
> that have a different resource version UUID than that it maintains,
> because this means the operation is operating on resources that might
> have already been invalidated.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_provider/resource_provider.proto 
> f5a9073075327019fd133bd51265f695ef464845 
>   include/mesos/v1/resource_provider/resource_provider.proto 
> e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/tests/resource_provider_manager_tests.cpp 
> ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
>   src/tests/resource_provider_validation_tests.cpp 
> f182bff4670318e9de22c2915c5dbb423a74ad6c 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-26 Thread Greg Mann


> On Oct. 18, 2017, 9:46 a.m., Benjamin Bannier wrote:
> > include/mesos/resource_provider/resource_provider.proto
> > Lines 56-66 (patched)
> > 
> >
> > I would prefer if we would not assume that this number changes for 
> > _every_ resource change, but just for failures.
> > * * *
> > 
> > For one we do not want users to assume that this number has certain 
> > values so we can recognize failures. Imagine a user applied an operation to 
> > the resource provider resources (`sequence_id=0`), and the agent speculated 
> > that the new `sequence_id` would be `1`. Now the operation fails in the RP 
> > and its `sequence_id` would be incremented to `1`. We cannot distinguish a 
> > case where the agent assumed the operation succeeded from a case where it 
> > failed based on the clock value alone in this case. Would we just increase 
> > this number in case of failures it would be clear that any operation 
> > assuming `sequence_id=0` would be invalid after the failure since the agent 
> > would be unlikely to assume failures on a "normal" execution path. 
> > 
> > * * *
> > 
> > Another reason would be that it would be great to be able to use 
> > similar notions of `sequence_id` for the agent's view of its RPs, and the 
> > master's view of its agents. We plan to implement speculative application 
> > of "legacy" offer operations like e.g., `RESERVE`, `CREATE` in the master, 
> > and incrementing this number even in successful cases would require too 
> > much knowledge about how the agent applies these operations in the master.
> > 
> > Imagine an agent with resource provider resources `disk(*):1` and 
> > `disk(*):1` (each `disk` from a different resource provider). The agent 
> > will expose `disk(*):2` to the master. A framework now reserves, 
> > `disk(*):2->disk(A):2`. The agent would inform the resource providers about 
> > that operation. Let's say the first of these operations succeeds, but the 
> > second on fails (the argument works as well for the other order, or even 
> > concurrent application). Having the master increment this 
> > `resource_sequence_id` for speculated operations would require it to know 
> > that the single `RESERVE` operation did map onto two state changes, i.e., 
> > it should have incremented its agent `resource_sequence_id` by two instead 
> > of just by one.
> > 
> > This seems not only leaky, but also inflexible since it becomes hard to 
> > decompose an agent's resources without affecting the master's assumptions.
> > 
> > * * *
> > 
> > Having `sequence_id` just increase in case of failures would eliminate 
> > a large class of cases where users would need to make assumption about how 
> > it does change. Instead we would more clearly move into a direction where 
> > it is only changed by pushes from lower layers (where an operation either 
> > succeeded or failed) to user layers above. This should also simplify how we 
> > would project multiple `sequence_id`s onto a single `sequence_id` (multiple 
> > resource provider `sequence_id`s onto agent `sequence_id`); in that case we 
> > could just take the maximum of all RP IDs and the agent ID (e.g., last 
> > failure of an operation on any RP is equivalent to last failure of any 
> > operation on the agent). Counting all operations makes this harder, 
> > especially when thinking of master-speculated operations.
> 
> Greg Mann wrote:
> This confused me a bit: "Having the master increment this 
> resource_sequence_id for speculated operations"
> 
> IIUC, the master will only increment the sequence ID when it receives a 
> {Register,Reregister,Update}SlaveMessage from the agent containing a new ID. 
> So, I don't think the ID is incremented routinely during the successful 
> application of offer operations.
> 
> My understanding of the comment in this patch is that the sequence ID 
> will be incremented only when a legacy operation fails, or when the RP has 
> its total resources updated (i.e., storage backend has more/less storage to 
> offer and updates the CSI plugin/RP accordingly).
> 
> Jie Yu wrote:
> I had some offline discussion with Benjamin. It looks to us that using a 
> single scalar clock value for the agent does not make sense. It's more easy 
> to reason about the correctness if we maintain per RP resource version uuid. 
> I updated this review with the new thinking. PTAL.
> 
> Jie Yu wrote:
> See this notes to see why a single scalar clock for the agnet does not 
> work:
> 
> https://docs.google.com/document/d/1RrrLVATZUyaURpEOeGjgxA6ccshuLo94G678IbL-Yco/edit#bookmark=id.v893d0oct2wh

After some discussion, I'm fine with using a vector clock.


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188465

Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-26 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review189371
---




include/mesos/v1/resource_provider/resource_provider.proto
Lines 116 (patched)


s/The resource version UUID is used/Used/

Here and elsewhere.



src/messages/messages.proto
Lines 608 (patched)


s/Describe/Describes/



src/messages/messages.proto
Lines 712 (patched)


Do we need multiple ResourceVersionUUIDs here? Can the master just include 
the UUID for the relevant resource provider/agent?


- Greg Mann


On Oct. 19, 2017, 9:35 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> ---
> 
> (Updated Oct. 19, 2017, 9:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The resource version UUID is used to establish the relationship
> between the operation and the resources that the operation is
> operating on. Each resource provider will keep a resource version
> UUID, and changes it when it believes that the resources from this
> resource provider are out of sync from the master's view.  The master
> will keep track of the last known resource version UUID for each
> resource provider, and attach the resource version UUID in each
> operation it sends out. The resource provider should reject operations
> that have a different resource version UUID than that it maintains,
> because this means the operation is operating on resources that might
> have already been invalidated.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_provider/resource_provider.proto 
> f5a9073075327019fd133bd51265f695ef464845 
>   include/mesos/v1/resource_provider/resource_provider.proto 
> e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/tests/resource_provider_manager_tests.cpp 
> ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
>   src/tests/resource_provider_validation_tests.cpp 
> f182bff4670318e9de22c2915c5dbb423a74ad6c 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-20 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188829
---




src/messages/messages.proto
Lines 613 (patched)


See below.



src/messages/messages.proto
Lines 663 (patched)


Even though currently operation on agent resources cannot exhibit operation 
speculation failures, I would still prefer would we include some version uuid 
for the agent resources here as well.

AFAICT this could happen either by adjusting `ResourceVersionUUID` so that 
`resource_provider_id` becomes `optional` or by introducing an explicit proto3 
`map` here, or alternatively, by the agent maintaining its own uuid which 
changes whenever a RP uuid changes and using that uuid instead of a vector here 
(we discussed this offline).



src/messages/messages.proto
Lines 712 (patched)


See comment on `UpdateSlaveMessage`.


- Benjamin Bannier


On Oct. 19, 2017, 11:35 p.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> ---
> 
> (Updated Oct. 19, 2017, 11:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The resource version UUID is used to establish the relationship
> between the operation and the resources that the operation is
> operating on. Each resource provider will keep a resource version
> UUID, and changes it when it believes that the resources from this
> resource provider are out of sync from the master's view.  The master
> will keep track of the last known resource version UUID for each
> resource provider, and attach the resource version UUID in each
> operation it sends out. The resource provider should reject operations
> that have a different resource version UUID than that it maintains,
> because this means the operation is operating on resources that might
> have already been invalidated.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_provider/resource_provider.proto 
> f5a9073075327019fd133bd51265f695ef464845 
>   include/mesos/v1/resource_provider/resource_provider.proto 
> e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
>   src/tests/resource_provider_manager_tests.cpp 
> ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
>   src/tests/resource_provider_validation_tests.cpp 
> f182bff4670318e9de22c2915c5dbb423a74ad6c 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-19 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/
---

(Updated Oct. 19, 2017, 9:35 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.


Repository: mesos


Description
---

The resource version UUID is used to establish the relationship
between the operation and the resources that the operation is
operating on. Each resource provider will keep a resource version
UUID, and changes it when it believes that the resources from this
resource provider are out of sync from the master's view.  The master
will keep track of the last known resource version UUID for each
resource provider, and attach the resource version UUID in each
operation it sends out. The resource provider should reject operations
that have a different resource version UUID than that it maintains,
because this means the operation is operating on resources that might
have already been invalidated.


Diffs (updated)
-

  include/mesos/resource_provider/resource_provider.proto 
f5a9073075327019fd133bd51265f695ef464845 
  include/mesos/v1/resource_provider/resource_provider.proto 
e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
  src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
  src/tests/resource_provider_manager_tests.cpp 
ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
  src/tests/resource_provider_validation_tests.cpp 
f182bff4670318e9de22c2915c5dbb423a74ad6c 


Diff: https://reviews.apache.org/r/63094/diff/3/

Changes: https://reviews.apache.org/r/63094/diff/2-3/


Testing
---

make check


Thanks,

Jie Yu



Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-18 Thread Jie Yu


> On Oct. 18, 2017, 9:46 a.m., Benjamin Bannier wrote:
> > include/mesos/resource_provider/resource_provider.proto
> > Lines 56-66 (patched)
> > 
> >
> > I would prefer if we would not assume that this number changes for 
> > _every_ resource change, but just for failures.
> > * * *
> > 
> > For one we do not want users to assume that this number has certain 
> > values so we can recognize failures. Imagine a user applied an operation to 
> > the resource provider resources (`sequence_id=0`), and the agent speculated 
> > that the new `sequence_id` would be `1`. Now the operation fails in the RP 
> > and its `sequence_id` would be incremented to `1`. We cannot distinguish a 
> > case where the agent assumed the operation succeeded from a case where it 
> > failed based on the clock value alone in this case. Would we just increase 
> > this number in case of failures it would be clear that any operation 
> > assuming `sequence_id=0` would be invalid after the failure since the agent 
> > would be unlikely to assume failures on a "normal" execution path. 
> > 
> > * * *
> > 
> > Another reason would be that it would be great to be able to use 
> > similar notions of `sequence_id` for the agent's view of its RPs, and the 
> > master's view of its agents. We plan to implement speculative application 
> > of "legacy" offer operations like e.g., `RESERVE`, `CREATE` in the master, 
> > and incrementing this number even in successful cases would require too 
> > much knowledge about how the agent applies these operations in the master.
> > 
> > Imagine an agent with resource provider resources `disk(*):1` and 
> > `disk(*):1` (each `disk` from a different resource provider). The agent 
> > will expose `disk(*):2` to the master. A framework now reserves, 
> > `disk(*):2->disk(A):2`. The agent would inform the resource providers about 
> > that operation. Let's say the first of these operations succeeds, but the 
> > second on fails (the argument works as well for the other order, or even 
> > concurrent application). Having the master increment this 
> > `resource_sequence_id` for speculated operations would require it to know 
> > that the single `RESERVE` operation did map onto two state changes, i.e., 
> > it should have incremented its agent `resource_sequence_id` by two instead 
> > of just by one.
> > 
> > This seems not only leaky, but also inflexible since it becomes hard to 
> > decompose an agent's resources without affecting the master's assumptions.
> > 
> > * * *
> > 
> > Having `sequence_id` just increase in case of failures would eliminate 
> > a large class of cases where users would need to make assumption about how 
> > it does change. Instead we would more clearly move into a direction where 
> > it is only changed by pushes from lower layers (where an operation either 
> > succeeded or failed) to user layers above. This should also simplify how we 
> > would project multiple `sequence_id`s onto a single `sequence_id` (multiple 
> > resource provider `sequence_id`s onto agent `sequence_id`); in that case we 
> > could just take the maximum of all RP IDs and the agent ID (e.g., last 
> > failure of an operation on any RP is equivalent to last failure of any 
> > operation on the agent). Counting all operations makes this harder, 
> > especially when thinking of master-speculated operations.
> 
> Greg Mann wrote:
> This confused me a bit: "Having the master increment this 
> resource_sequence_id for speculated operations"
> 
> IIUC, the master will only increment the sequence ID when it receives a 
> {Register,Reregister,Update}SlaveMessage from the agent containing a new ID. 
> So, I don't think the ID is incremented routinely during the successful 
> application of offer operations.
> 
> My understanding of the comment in this patch is that the sequence ID 
> will be incremented only when a legacy operation fails, or when the RP has 
> its total resources updated (i.e., storage backend has more/less storage to 
> offer and updates the CSI plugin/RP accordingly).
> 
> Jie Yu wrote:
> I had some offline discussion with Benjamin. It looks to us that using a 
> single scalar clock value for the agent does not make sense. It's more easy 
> to reason about the correctness if we maintain per RP resource version uuid. 
> I updated this review with the new thinking. PTAL.

See this notes to see why a single scalar clock for the agnet does not work:
https://docs.google.com/document/d/1RrrLVATZUyaURpEOeGjgxA6ccshuLo94G678IbL-Yco/edit#bookmark=id.v893d0oct2wh


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188465
---


On Oct. 19, 2017, 12:10 a.m., Jie Yu wrote:
> 
> 

Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-18 Thread Jie Yu


> On Oct. 18, 2017, 9:46 a.m., Benjamin Bannier wrote:
> > include/mesos/resource_provider/resource_provider.proto
> > Lines 56-66 (patched)
> > 
> >
> > I would prefer if we would not assume that this number changes for 
> > _every_ resource change, but just for failures.
> > * * *
> > 
> > For one we do not want users to assume that this number has certain 
> > values so we can recognize failures. Imagine a user applied an operation to 
> > the resource provider resources (`sequence_id=0`), and the agent speculated 
> > that the new `sequence_id` would be `1`. Now the operation fails in the RP 
> > and its `sequence_id` would be incremented to `1`. We cannot distinguish a 
> > case where the agent assumed the operation succeeded from a case where it 
> > failed based on the clock value alone in this case. Would we just increase 
> > this number in case of failures it would be clear that any operation 
> > assuming `sequence_id=0` would be invalid after the failure since the agent 
> > would be unlikely to assume failures on a "normal" execution path. 
> > 
> > * * *
> > 
> > Another reason would be that it would be great to be able to use 
> > similar notions of `sequence_id` for the agent's view of its RPs, and the 
> > master's view of its agents. We plan to implement speculative application 
> > of "legacy" offer operations like e.g., `RESERVE`, `CREATE` in the master, 
> > and incrementing this number even in successful cases would require too 
> > much knowledge about how the agent applies these operations in the master.
> > 
> > Imagine an agent with resource provider resources `disk(*):1` and 
> > `disk(*):1` (each `disk` from a different resource provider). The agent 
> > will expose `disk(*):2` to the master. A framework now reserves, 
> > `disk(*):2->disk(A):2`. The agent would inform the resource providers about 
> > that operation. Let's say the first of these operations succeeds, but the 
> > second on fails (the argument works as well for the other order, or even 
> > concurrent application). Having the master increment this 
> > `resource_sequence_id` for speculated operations would require it to know 
> > that the single `RESERVE` operation did map onto two state changes, i.e., 
> > it should have incremented its agent `resource_sequence_id` by two instead 
> > of just by one.
> > 
> > This seems not only leaky, but also inflexible since it becomes hard to 
> > decompose an agent's resources without affecting the master's assumptions.
> > 
> > * * *
> > 
> > Having `sequence_id` just increase in case of failures would eliminate 
> > a large class of cases where users would need to make assumption about how 
> > it does change. Instead we would more clearly move into a direction where 
> > it is only changed by pushes from lower layers (where an operation either 
> > succeeded or failed) to user layers above. This should also simplify how we 
> > would project multiple `sequence_id`s onto a single `sequence_id` (multiple 
> > resource provider `sequence_id`s onto agent `sequence_id`); in that case we 
> > could just take the maximum of all RP IDs and the agent ID (e.g., last 
> > failure of an operation on any RP is equivalent to last failure of any 
> > operation on the agent). Counting all operations makes this harder, 
> > especially when thinking of master-speculated operations.
> 
> Greg Mann wrote:
> This confused me a bit: "Having the master increment this 
> resource_sequence_id for speculated operations"
> 
> IIUC, the master will only increment the sequence ID when it receives a 
> {Register,Reregister,Update}SlaveMessage from the agent containing a new ID. 
> So, I don't think the ID is incremented routinely during the successful 
> application of offer operations.
> 
> My understanding of the comment in this patch is that the sequence ID 
> will be incremented only when a legacy operation fails, or when the RP has 
> its total resources updated (i.e., storage backend has more/less storage to 
> offer and updates the CSI plugin/RP accordingly).

I had some offline discussion with Benjamin. It looks to us that using a single 
scalar clock value for the agent does not make sense. It's more easy to reason 
about the correctness if we maintain per RP resource version uuid. I updated 
this review with the new thinking. PTAL.


- Jie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188465
---


On Oct. 19, 2017, 12:10 a.m., Jie Yu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> 

Re: Review Request 63094: Added resource version uuid for offer operations.

2017-10-18 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/
---

(Updated Oct. 19, 2017, 12:10 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.


Changes
---

Used UUID and a vector instead. Singular clock idea for the agent does not 
work. See reasoning in this notes:
https://docs.google.com/document/d/1RrrLVATZUyaURpEOeGjgxA6ccshuLo94G678IbL-Yco/edit#bookmark=id.v893d0oct2wh

This patch is based on https://reviews.apache.org/r/62903/


Summary (updated)
-

Added resource version uuid for offer operations.


Repository: mesos


Description (updated)
---

The resource version UUID is used to establish the relationship
between the operation and the resources that the operation is
operating on. Each resource provider will keep a resource version
UUID, and changes it when it believes that the resources from this
resource provider are out of sync from the master's view.  The master
will keep track of the last known resource version UUID for each
resource provider, and attach the resource version UUID in each
operation it sends out. The resource provider should reject operations
that have a different resource version UUID than that it maintains,
because this means the operation is operating on resources that might
have already been invalidated.


Diffs (updated)
-

  include/mesos/resource_provider/resource_provider.proto 
f5a9073075327019fd133bd51265f695ef464845 
  include/mesos/v1/resource_provider/resource_provider.proto 
e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
  src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
  src/tests/resource_provider_manager_tests.cpp 
ca49e1f0203494fc8b4a4507c33e5a3885a14a59 
  src/tests/resource_provider_validation_tests.cpp 
f182bff4670318e9de22c2915c5dbb423a74ad6c 


Diff: https://reviews.apache.org/r/63094/diff/2/

Changes: https://reviews.apache.org/r/63094/diff/1-2/


Testing
---

make check


Thanks,

Jie Yu