Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-07 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review148064
---



Given the way this review has evolved how about the following summary for the 
commit?

```
Support sharing of persistent volumes via shared resources.

- One copy of a shared resource is made allocatable in each allocation 
  cycle. 
- All frameworks that are allocated the same shared resource are
  charged the scalar quantity of the resource in wDRF.
- Resource accounting for *individual* copies of shared resources is
  handled mostly in the same way as non-shared resources in the master 
  although they are grouped within the `Resources` abstraction for
  performance.
- DESTROY for shared persistent volumes is allowed only if there are
  no running or pending tasks using the volume. However, if the volume
  is in a pending offer to one or more frameworks, the master rescinds
  that offer before processing the DESTROY.
- Multiple tasks are allowed to be launched in the same ACCEPT call
  using the same shared resource. For this we update the allocator and
  sorter to allocate additional copies of these shared resources.
```

- Jiang Yan Xu


On Sept. 7, 2016, 10:56 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated Sept. 7, 2016, 10:56 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> o To allow multiple tasks to be launched in the same ACCEPT call
>   using the same shared resource, we update the allocator and
>   sorter with additional copies of shared resources to reflect the
>   true shared count of allocated shared resources.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> dd07ed221d2c1755d2478369641ffdc46ecc4471 
>   src/master/allocator/mesos/hierarchical.cpp 
> 9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 09aa685f2bd7197385959d7d70d5411d0fd72f06 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 1f2954c564eea5a3c649a5cc7181cb69329f9e35 
>   src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
>   src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
>   src/master/master.cpp b2a19a645528e8fc1fd48f5ac9929d38c9a76b49 
>   src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
>   src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-07 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Sept. 7, 2016, 5:56 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Rebase.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
dd07ed221d2c1755d2478369641ffdc46ecc4471 
  src/master/allocator/mesos/hierarchical.cpp 
9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
  src/master/allocator/sorter/drf/sorter.hpp 
09aa685f2bd7197385959d7d70d5411d0fd72f06 
  src/master/allocator/sorter/drf/sorter.cpp 
1f2954c564eea5a3c649a5cc7181cb69329f9e35 
  src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
  src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
  src/master/master.cpp b2a19a645528e8fc1fd48f5ac9929d38c9a76b49 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-07 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review148024
---


Ship it!




Minor things that I'll take care of when committing.


src/master/allocator/mesos/hierarchical.cpp (lines 675 - 679)


```
s/additional instances of shared resources/additional instances of the same 
shared resources/
```



src/master/allocator/mesos/hierarchical.cpp (line 1361)


s/cycle/stage/



src/master/allocator/sorter/drf/sorter.cpp (line 163)


s/total/allocated/



src/master/allocator/sorter/drf/sorter.cpp (lines 165 - 168)


Changed the indentation style to

```
  const Resources newShared = resources.shared()
.filter([this, name, slaveId](const Resource& resource) {
  return !allocations[name].resources[slaveId].contains(resource);
});

```

which I found more readable. Similarly in add/remove/unallocated.


- Jiang Yan Xu


On Sept. 6, 2016, 3:02 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated Sept. 6, 2016, 3:02 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> o To allow multiple tasks to be launched in the same ACCEPT call
>   using the same shared resource, we update the allocator and
>   sorter with additional copies of shared resources to reflect the
>   true shared count of allocated shared resources.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> dd07ed221d2c1755d2478369641ffdc46ecc4471 
>   src/master/allocator/mesos/hierarchical.cpp 
> 9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 09aa685f2bd7197385959d7d70d5411d0fd72f06 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 1f2954c564eea5a3c649a5cc7181cb69329f9e35 
>   src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
>   src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
>   src/master/master.cpp b2a19a645528e8fc1fd48f5ac9929d38c9a76b49 
>   src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
>   src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-06 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Sept. 6, 2016, 10:02 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Minor fix and rebase.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
dd07ed221d2c1755d2478369641ffdc46ecc4471 
  src/master/allocator/mesos/hierarchical.cpp 
9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
  src/master/allocator/sorter/drf/sorter.hpp 
09aa685f2bd7197385959d7d70d5411d0fd72f06 
  src/master/allocator/sorter/drf/sorter.cpp 
1f2954c564eea5a3c649a5cc7181cb69329f9e35 
  src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
  src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
  src/master/master.cpp b2a19a645528e8fc1fd48f5ac9929d38c9a76b49 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-06 Thread Anindya Sinha


> On Sept. 1, 2016, 5:43 p.m., Jiang Yan Xu wrote:
> > Updated the review with the following changes.
> > 
> > I have already committed all 'resources*.cpp's so they are removed from the 
> > review. 
> > I moved the tests to /r/45962/ so the remainder of the current set of tests 
> > are in one place.
> > 
> > Please take a look.

LGTM, with the small modification in master.cpp.


> On Sept. 1, 2016, 5:43 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3808-3809
> > 
> >
> > Why the change from `offeredSharedResources = 
> > _offeredResources.shared();`?
> > 
> > This won't work if `offeredSharedResources` has multiple copies.
> > 
> > I also changed the CREATE case to be more consistent.

Agreed. For the CREATE, I modified from:
```
offeredSharedResources += _offeredResources.shared();
```

to:

```
offeredSharedResources = _offeredResources.shared();
```


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review147218
---


On Sept. 6, 2016, 6:13 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated Sept. 6, 2016, 6:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> o To allow multiple tasks to be launched in the same ACCEPT call
>   using the same shared resource, we update the allocator and
>   sorter with additional copies of shared resources to reflect the
>   true shared count of allocated shared resources.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.hpp 
> dd07ed221d2c1755d2478369641ffdc46ecc4471 
>   src/master/allocator/mesos/hierarchical.cpp 
> 9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 09aa685f2bd7197385959d7d70d5411d0fd72f06 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 1f2954c564eea5a3c649a5cc7181cb69329f9e35 
>   src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
>   src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
>   src/master/master.cpp 192b2ce5eb24deb6e843f2f1a8c915114614b268 
>   src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
>   src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-06 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Sept. 6, 2016, 6:13 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/master/allocator/mesos/hierarchical.hpp 
dd07ed221d2c1755d2478369641ffdc46ecc4471 
  src/master/allocator/mesos/hierarchical.cpp 
9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
  src/master/allocator/sorter/drf/sorter.hpp 
09aa685f2bd7197385959d7d70d5411d0fd72f06 
  src/master/allocator/sorter/drf/sorter.cpp 
1f2954c564eea5a3c649a5cc7181cb69329f9e35 
  src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
  src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
  src/master/master.cpp 192b2ce5eb24deb6e843f2f1a8c915114614b268 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-09-01 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review147218
---



Updated the review with the following changes.

I have already committed all 'resources*.cpp's so they are removed from the 
review. 
I moved the tests to /r/45962/ so the remainder of the current set of tests are 
in one place.

Please take a look.


src/master/allocator/mesos/hierarchical.cpp (lines 620 - 621)






src/master/allocator/mesos/hierarchical.cpp (lines 622 - 630)


I am a bit uncomfortable with processing LAUNCHes after all other 
operations are applied. This may or may not be problematic but it's not the 
same as the original order. Implementation cleanliness aside, I think it's most 
intuitive if we structure it as "loop through the list of operations and handle 
them individually, we just added some logic for handling the LAUNCH operation".

In fact in my change I kept most of the method unchanged.



src/master/allocator/mesos/hierarchical.cpp (lines 651 - 653)






src/master/allocator/mesos/hierarchical.cpp (line 693)


`additional` could have more instances than `updatedFrameworkAllocation`.



src/master/allocator/sorter/drf/sorter.cpp (lines 163 - 178)


I changed it to the following which I think is cleaner and more direct 
translation of the comment.

```
  // Add shared resources to the total quantities when the same
  // resources don't already exist in the allocation.
  const Resources newShared = resources.shared().filter([this, name, 
slaveId](
  const Resource& resource) {
return !allocations[name].resources[slaveId].contains(resource);
  });

  allocations[name].resources[slaveId] += resources;
  allocations[name].scalarQuantities +=
(resources.nonShared() + newShared).createStrippedScalarQuantity();
```



src/master/allocator/sorter/drf/sorter.cpp (lines 285 - 296)


Similarly:

```
  // Remove shared resources from the allocated quantities when there
  // are no instances of same resources left in the allocation.
  const Resources absentShared = resources.shared().filter(
  [this, name, slaveId](const Resource& resource) {
return !allocations[name].resources[slaveId].contains(resource);
  });

  const Resources resourcesQuantity =
(resources.nonShared() + absentShared).createStrippedScalarQuantity();
```



src/master/allocator/sorter/drf/sorter.cpp (lines 314 - 328)


Similarly,

```
// Add shared resources to the total quantities when the same
// resources don't already exist in the total.
const Resources newShared = resources.shared().filter([this, slaveId](
const Resource& resource) {
  return !total_.resources[slaveId].contains(resource);
});

total_.resources[slaveId] += resources;
total_.scalarQuantities +=
  (resources.nonShared() + newShared).createStrippedScalarQuantity();

```



src/master/allocator/sorter/drf/sorter.cpp (lines 347 - 359)


Similarly,

```
// Remove shared resources from the total quantities when there
// are no instances of same resources left in the total.
const Resources absentShared = resources.shared().filter(
[this, slaveId](const Resource& resource) {
  return !total_.resources[slaveId].contains(resource);
});

const Resources resourcesQuantity =
  (resources.nonShared() + absentShared).createStrippedScalarQuantity();
```



src/master/master.cpp (lines 3807 - 3808)


Why the change from `offeredSharedResources = _offeredResources.shared();`?

This won't work if `offeredSharedResources` has multiple copies.

I also changed the CREATE case to be more consistent.



src/master/master.cpp (lines 3820 - 3821)


This is far away from its first usage. Plus the comments about why we are 
doing it is below. I moved it down.



src/master/master.cpp (line 3821)


Indentation.



src/master/master.cpp (line 6636)


Kill this redundant line.



src/master/validation.cpp (line 904)


I moved 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-31 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Aug. 31, 2016, 9:17 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Addressed review comments and migration to updated updateAllocation() api


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp a5f5902d8f7f2757e3aee35619bff5cc3a52f29b 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
dd07ed221d2c1755d2478369641ffdc46ecc4471 
  src/master/allocator/mesos/hierarchical.cpp 
9e5db2196c6a541dc1208ba8b9f13ef9a518bcc4 
  src/master/allocator/sorter/drf/sorter.hpp 
09aa685f2bd7197385959d7d70d5411d0fd72f06 
  src/master/allocator/sorter/drf/sorter.cpp 
1f2954c564eea5a3c649a5cc7181cb69329f9e35 
  src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
  src/master/master.hpp c32c7e9d859ef73216354e2c03ecc07d0009b12f 
  src/master/master.cpp 192b2ce5eb24deb6e843f2f1a8c915114614b268 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 
  src/tests/master_validation_tests.cpp 
e440ff3f6d04f797e65874b9f610ed63d9f28e0e 
  src/tests/sorter_tests.cpp 821e30d6574b045d25d4de4f7c3b8ac5346d3002 
  src/v1/resources.cpp 172217505d80d66cb7e10b3635dc273229313601 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-29 Thread Anindya Sinha


> On Aug. 25, 2016, 3:22 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 773-779
> > 
> >
> > Doesn't look like this block with `e` add much to the test coverage, 
> > remove it?

I removed 'd' and renamed 'e' as 'd'. I think we can keep 'd' (originally 'e') 
as it allows to show comparison between 3 clients when 1 of the clients is 
deactivated.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review146046
---


On Aug. 29, 2016, 7:21 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated Aug. 29, 2016, 7:21 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> o To allow multiple tasks to be launched in the same ACCEPT call
>   using the same shared resource, we update the allocator and
>   sorter with additional copies of shared resources to reflect the
>   true shared count of allocated shared resources.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp a5f5902d8f7f2757e3aee35619bff5cc3a52f29b 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp 
> bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
>   src/master/allocator/mesos/hierarchical.cpp 
> 234ef98529964a0b6d3f132426a6c8ccbb1263ee 
>   src/master/allocator/sorter/drf/sorter.hpp 
> bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
>   src/master/allocator/sorter/drf/sorter.cpp 
> ac85b327fc33d34246788e6a8c8bf5a486c61434 
>   src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
>   src/master/master.hpp b118293823d37bba9b751aac12e0403318ce5736 
>   src/master/master.cpp 2b4aff80385d02c71dc3eeff4ddda326c9824ede 
>   src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
>   src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 
>   src/tests/master_validation_tests.cpp 
> e440ff3f6d04f797e65874b9f610ed63d9f28e0e 
>   src/tests/sorter_tests.cpp 821e30d6574b045d25d4de4f7c3b8ac5346d3002 
>   src/v1/resources.cpp 172217505d80d66cb7e10b3635dc273229313601 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-29 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Aug. 29, 2016, 7:21 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

updateAllocation() for all offer operations.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp a5f5902d8f7f2757e3aee35619bff5cc3a52f29b 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp 525ef6cd7545d25f3ac89a6325ace6e6d768262a 
  src/master/master.hpp b118293823d37bba9b751aac12e0403318ce5736 
  src/master/master.cpp 2b4aff80385d02c71dc3eeff4ddda326c9824ede 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 15e2ecc2de99d6bed522f45f855ba686bf19c008 
  src/tests/master_validation_tests.cpp 
e440ff3f6d04f797e65874b9f610ed63d9f28e0e 
  src/tests/sorter_tests.cpp 821e30d6574b045d25d4de4f7c3b8ac5346d3002 
  src/v1/resources.cpp 172217505d80d66cb7e10b3635dc273229313601 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-25 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Aug. 25, 2016, 8:12 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp a5f5902d8f7f2757e3aee35619bff5cc3a52f29b 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp c6bdad64e6058ec63bfbacda736d174fd15f8a05 
  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 789fb9760a5ea60ce3b7b08f17e36cec69349c60 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 3d2965e4cfaae075f988fd43386bd00c7d807b2e 
  src/tests/master_validation_tests.cpp 
86b4b22350175af592876fd2f6d3fecca7acabce 
  src/tests/sorter_tests.cpp 821e30d6574b045d25d4de4f7c3b8ac5346d3002 
  src/v1/resources.cpp 172217505d80d66cb7e10b3635dc273229313601 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-24 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Aug. 25, 2016, 4:14 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Updated to handle Offer operations within allocator to determine additional 
copies of shared resources to be added in the allocator.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp a5f5902d8f7f2757e3aee35619bff5cc3a52f29b 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp c6bdad64e6058ec63bfbacda736d174fd15f8a05 
  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 789fb9760a5ea60ce3b7b08f17e36cec69349c60 
  src/master/validation.hpp 3205ee1e699d0fb7ca16ced96a07be3a07cf8956 
  src/master/validation.cpp 3d2965e4cfaae075f988fd43386bd00c7d807b2e 
  src/tests/master_validation_tests.cpp 
86b4b22350175af592876fd2f6d3fecca7acabce 
  src/tests/sorter_tests.cpp 821e30d6574b045d25d4de4f7c3b8ac5346d3002 
  src/v1/resources.cpp 172217505d80d66cb7e10b3635dc273229313601 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-24 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review146046
---



Comments mainly on tests (which I didn't look at earlier).


src/master/allocator/sorter/drf/sorter.cpp (line 165)


At this point we don't know if the resources are **scalars** or not, they 
could be of other types.

So `s/scalars/_resources/` ?



src/master/allocator/sorter/drf/sorter.cpp (line 285)


Ditto about the name `scalars`.



src/tests/master_validation_tests.cpp (line 609)


s/SharedPersistentVolumes/SharedPersistentVolumeInUse/



src/tests/master_validation_tests.cpp (lines 611 - 612)


s/cpus1/cpus/
s/mem1/mem/



src/tests/master_validation_tests.cpp (line 640)


We can just use `{}` here.



src/tests/master_validation_tests.cpp (line 649)


Ditto.



src/tests/master_validation_tests.cpp (lines 664 - 665)


Ditto.



src/tests/master_validation_tests.cpp (lines 670 - 671)


Ditto.



src/tests/sorter_tests.cpp (line 728)


I see that this is modeled after `SorterTest.DRFSorter` but in general 
shorter and more focused tests are preferred, especially the ones like this 
which require no setup.

So here at least let's try to make it shorter by removing bits that are not 
strictly increasing our test coverage, especially given there's already 
`SorterTest.DRFSorter`.



src/tests/sorter_tests.cpp (line 739)


```
// 1000 'disk' in total (shared + non-shared).
```



src/tests/sorter_tests.cpp (lines 747 - 748)


Let's be consistent with the sequence in a block. Same as with client a: 
first `sorter.add("b");` then prepare b's resources and call `allocated()`. 
Here and below for other clients as well.



src/tests/sorter_tests.cpp (line 751)


This would help me understand better. Here and below.

Also capitalize it.

```
// Shares: a = .1 (dominant: disk), b = .06 (dominant: cpus).
```



src/tests/sorter_tests.cpp (lines 759 - 761)






src/tests/sorter_tests.cpp (lines 759 - 761)


We already have a, b and c, d looks to be just another simple client with 
only non-shared allocation and not all that interesting? Can we keep it simple 
and remove `d`?



src/tests/sorter_tests.cpp (line 763)


Some shared resources specific comments?

```
// 'a' and 'c' share the same shared volume which are the the dominant 
resource for both.
```



src/tests/sorter_tests.cpp (lines 773 - 779)


Doesn't look like this block with `e` add much to the test coverage, remove 
it?



src/tests/sorter_tests.cpp (line 781)


The rest of the test doesn't seem to have much to do with sharedness of 
resources directly so let's just make it clear:

```
// Verify other basic allocator methods work when shared resources are in 
the allocations.
```

Correspondingly, the first half of the test can be summarized as:

```
// Verify sort() works when shared resources are in the allocations.
```


- Jiang Yan Xu


On Aug. 12, 2016, 5:57 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated Aug. 12, 2016, 5:57 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
> 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-12 Thread Anindya Sinha


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 628
> > 
> >
> > Move the check blow into the loop?
> > 
> > ```
> > // Master would only allow additional instances of shared resources to 
> > be allocated.
> > CHECK_EQ(Resources(task.resources()).shared(), 
> > Resources(task.resources()));
> > 
> > // Master only uses pass these resources through task resources.
> > CHECK_TRUE(Resources(task.executor().resources()).empty());
> > ```

As discussed, there is no way for master to *not* know the intention of the 
allocator pertaining to `LAUNCH` operations in `updateAllocation()`. Therefore, 
we only populate the additional shared resources in `TaskInfo::resources` and 
let allocator retrieve those additional resources. The rest of the fields are 
not important to the master and allocator.


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 668
> > 
> >
> > Below this line:
> > 
> > ```
> > // A shared resource must have already been allocated to this framework 
> > for it to be eligile for allocation for additional instances.
> > foreach (const Resource& resource, additional) {
> >   CHECK(frameworkAllocation.contains(resource));
> > }
> > ```

This should be equivalent right?
```
CHECK(frameworkAllocation.contains(additional));
```


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, line 163
> > 
> >
> > s/, and/; / is more grammatrically correct?

Well, that is debatable but updated nevertheless.


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3881
> > 
> >
> > s/needs/need/ (because of "copies")

Same comment as the one before?


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3886
> > 
> >
> > I suggested this paragraph but I feel the following is more clear, what 
> > do you think?
> > 
> > ```
> > TODO(anindya_sinha): The solution is temporary as it should be up to 
> > the allocator to accept the request and allocate these additional 
> > resources. However it will only be possible when the allocator becomes the 
> > entity that accepts and manages offers (MESOS-4553). Then we should move 
> > the logic that allocates additional shared resources when accepting the 
> > LAUNCH call to the allocator.
> > ```

Ok. Changed to this version.


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/validation.cpp, line 1170
> > 
> >
> > This would involve copying too. 
> > 
> > I think we can directly use these resources without copying.
> > 
> > ```
> > if (task.resources().contains(volume)) {
> >   return Error(...);
> > }
> > 
> > if (task.has_executor() && 
> > task.executor().resources().contains(volume)) {
> >   return Error(...);
> > }
> > ```
> > 
> > Note the singular volume here for the same reason as noted previously.

I do not think we can avoid a copy since `task.resources()` or 
`task.executor().resources()` returns a  
`google::protobuf::RepeatedPtrField`. We need a `Resources` 
object from `google::protobuf::RepeatedPtrField` to do a 
`contains()`.


> On Aug. 12, 2016, 5:44 p.m., Jiang Yan Xu wrote:
> > src/master/validation.cpp, lines 1166-1167
> > 
> >
> > This could result in copying.
> > 
> > The alternative should be able to avoid it.
> > 
> > ```
> > foreachvalue(const hashmap& tasks, pendingTasks) {
> >   foreachvalue (const TaskInfo& task, tasks) {
> >   ...
> >   }
> > }
> > ```

As discussed, we get into a compile issue which we need to further investigate. 
In the meantime, I will keep it as is.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review144723
---


On Aug. 13, 2016, 12:57 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated Aug. 13, 2016, 12:57 a.m.)
> 
> 
> Review request for mesos, 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-12 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Aug. 13, 2016, 12:57 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Addressed review comments and rebased.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp 6b7af9179121efbdc5c29484eb042778bcea8288 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
234ef98529964a0b6d3f132426a6c8ccbb1263ee 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp 52dd80b856cf2317c0b73ba54bf501696786088d 
  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 0bd1a3490a86fede86a3f5f62ce4745b65aae258 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp af18e5aef3be59830b0a7b0235bbc739dba1029c 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp 821e30d6574b045d25d4de4f7c3b8ac5346d3002 
  src/v1/resources.cpp 03ee0cb0bb5abe7fc1ae4cb47c5b6dbcd8d11998 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-12 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review144723
---



Skipped tests again but will pick them up soon as we are converging on the 
non-test code.


src/master/allocator/mesos/hierarchical.cpp (line 604)


Some comments in this change read like we have changed the flow of the 
entire method: for LAUNCH we do one thing, for non-LAUNCHes we do something 
else.

IMO we are merely adding a special processing step for LAUNCHes and then we 
continue with the common logic that **doesn't** distinguish the operation types.



src/master/allocator/mesos/hierarchical.cpp (lines 615 - 618)


Move this comment to above

```
Try updatedSlaveAllocation =
  slaves[slaveId].allocated.apply(operations);
```

The sentence `The available resources remain unchanged` is still true: 
LAUNCH doesn't change `total - allocated` and shared resources are always 
*available*.



src/master/allocator/mesos/hierarchical.cpp (lines 620 - 623)


It's helpful to explain the algorithm here:

```
For LAUNCH operations we support tasks requesting more instances of shared 
resources than those being offered so we may need to allocate additional 
instances of these shared resources (i.e., add them to the allocated resources 
in the allocator and in each of the sorters) after all operations are applied.

TODO: It should be up to the allocator to "accept" the request for these 
additional resources but the allocator cannot do it yet as it isn't managing 
offers (see MESOS-4553) so for now the master needs to inform the allocator the 
additional allocation that each task needs. We need to fix MESOS-4553 soon.
```



src/master/allocator/mesos/hierarchical.cpp (lines 621 - 622)


Not "multiple task launches against a single copy".



src/master/allocator/mesos/hierarchical.cpp (line 628)


Move the check blow into the loop?

```
// Master would only allow additional instances of shared resources to be 
allocated.
CHECK_EQ(Resources(task.resources()).shared(), Resources(task.resources()));

// Master only uses pass these resources through task resources.
CHECK_TRUE(Resources(task.executor().resources()).empty());
```



src/master/allocator/mesos/hierarchical.cpp (lines 637 - 638)


I find this sentence "non-LAUNCH operations followed by the LAUNCH 
operations" misleading: here you haven't removed any LAUNCH operations from the 
`operations` variable so `apply()` here is processing **all** of the 
operations. The fact that `apply()` is a no-op for LAUNCH is not the concern of 
this method.

We could instead directly comment on `slaves[slaveId].allocated += 
additional`.



src/master/allocator/mesos/hierarchical.cpp (lines 646 - 648)


No need for the if guard, Resources's `+=` is a noop if `additional` is 
empty.



src/master/allocator/mesos/hierarchical.cpp (line 650)


Ditto. This is not "for non-LAUNCH operations only".



src/master/allocator/mesos/hierarchical.cpp (lines 656 - 657)


Ditto, not "for non-LAUNCH followed by the LAUNCH operations."



src/master/allocator/mesos/hierarchical.cpp (line 668)


Below this line:

```
// A shared resource must have already been allocated to this framework for 
it to be eligile for allocation for additional instances.
foreach (const Resource& resource, additional) {
  CHECK(frameworkAllocation.contains(resource));
}
```



src/master/allocator/mesos/hierarchical.cpp (lines 669 - 671)


No need for the if guard, Resources's `+=` is a noop is additional is empty.



src/master/allocator/sorter/drf/sorter.cpp (line 163)


s/, and/; / is more grammatrically correct?



src/master/allocator/sorter/drf/sorter.cpp (line 166)


`const Resources`



src/master/allocator/sorter/drf/sorter.cpp (lines 167 - 171)


This is not going to remove the shared resources that have been already 
allocated to the client: there can be multpile insances of it in `resources`.

You have to start from `Resources scalars = resources.nonShared()` and add 
shared 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-08-04 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated Aug. 4, 2016, 8:26 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Rebased.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp 2470c0280db7d48d9484c42bc2150e53e7ce6e1c 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
bdbc6d3b5b959990538f4e3b7b1a3b031d9aea05 
  src/master/allocator/mesos/hierarchical.cpp 
7d4064535a20b93950f5a95eef1ad3f0d37d305b 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp e26dc2ff19fdfebc4d57009f355ebc92df3b62fd 
  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 17d5c5875647bb35e2518cc2bd9e134b020c05bf 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp af18e5aef3be59830b0a7b0235bbc739dba1029c 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp ade356cbbba5b93a6d3e5c9de30eefd3982d15c1 
  src/v1/resources.cpp 6c4e3b299e701d477947dd7427c31d2ae64c05ae 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-30 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 30, 2016, 7:03 a.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Rebased with master.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp 468581da550bcabf44fbaba8897d5fbbc330c2cb 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
b72ba16277a3210e4d309b616d185a10e2029a66 
  src/master/allocator/mesos/hierarchical.cpp 
7d4064535a20b93950f5a95eef1ad3f0d37d305b 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp e26dc2ff19fdfebc4d57009f355ebc92df3b62fd 
  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 52002beac29c371411348cb026a216e99ac96ab2 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp ade356cbbba5b93a6d3e5c9de30eefd3982d15c1 
  src/v1/resources.cpp 230d55b099511499eef111bddd1c552df0093b82 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-28 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review143893
---




src/master/master.cpp (line 3936)


We now replaces `slave->pendingResources` with `slave->pendingTasks`.


- Anindya Sinha


On July 28, 2016, 10:44 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated July 28, 2016, 10:44 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> o To allow multiple tasks to be launched in the same ACCEPT call
>   using the same shared resource, we update the allocator and
>   sorter with additional copies of shared resources to reflect the
>   true shared count of allocated shared resources.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp 3dbff24d6859d3b1ed8589cec50170a5202cfbcb 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp 
> b72ba16277a3210e4d309b616d185a10e2029a66 
>   src/master/allocator/mesos/hierarchical.cpp 
> 7d4064535a20b93950f5a95eef1ad3f0d37d305b 
>   src/master/allocator/sorter/drf/sorter.hpp 
> bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
>   src/master/allocator/sorter/drf/sorter.cpp 
> ac85b327fc33d34246788e6a8c8bf5a486c61434 
>   src/master/http.cpp 1cbec976ccedae81def388c62889624aaf9a794e 
>   src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
>   src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 52002beac29c371411348cb026a216e99ac96ab2 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp ade356cbbba5b93a6d3e5c9de30eefd3982d15c1 
>   src/v1/resources.cpp 3c85dc8aa8125962b44e60806ece83a7653d0dc7 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-28 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 28, 2016, 10:44 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Rebase with master and addressed review comments.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description (updated)
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, rescind that offer before processing the DESTROY.
o To allow multiple tasks to be launched in the same ACCEPT call
  using the same shared resource, we update the allocator and
  sorter with additional copies of shared resources to reflect the
  true shared count of allocated shared resources.


Diffs (updated)
-

  src/common/resources.cpp 3dbff24d6859d3b1ed8589cec50170a5202cfbcb 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
b72ba16277a3210e4d309b616d185a10e2029a66 
  src/master/allocator/mesos/hierarchical.cpp 
7d4064535a20b93950f5a95eef1ad3f0d37d305b 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp 1cbec976ccedae81def388c62889624aaf9a794e 
  src/master/master.hpp 6decff6f4b9c3434de030fd5c06df4c683a7abad 
  src/master/master.cpp 060dc7f9730808c7fd9b8f9ecdbde0aac14d135c 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 52002beac29c371411348cb026a216e99ac96ab2 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp ade356cbbba5b93a6d3e5c9de30eefd3982d15c1 
  src/v1/resources.cpp 3c85dc8aa8125962b44e60806ece83a7653d0dc7 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-28 Thread Anindya Sinha


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, lines 817-822
> > 
> >
> > As commented below, I realized that we may not need this. We shouldn't 
> > calculate total task resources as it's done in `Master::addTask()`.

Removed this method.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 331
> > 
> >
> > s/assigned/requested/ (assigned is a bit ambiguous, at this point 
> > resources are still not given to the task yet).

Removed `slave->pendingResources` and replaced with `slave->pendingTasks`.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 340
> > 
> >
> > s/accept/ACCEPT/

Removed `slave->pendingResources` and replaced with `slave->pendingTasks`.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 341
> > 
> >
> > s/removing it/removing them/

Removed `slave->pendingResources` and replaced with `slave->pendingTasks`.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 344
> > 
> >
> > With `hashmap pendingResources` we are 
> > preventing DESTROY from affecting pending tasks of the same framework but 
> > letting DESTROY to go through under pending tasks of other frameworks (of 
> > the same role).
> > 
> > If we use `pendingResources` it should be across frameworks right?
> > 
> > But because we already need to add
> > 
> > ```
> > multihashmap pendingTasks;
> > ```
> > 
> > for /r/47082/,
> > 
> > we can just do that instead in this review, the number of tasks pending 
> > for each slave should be small enough to loop through quickly.
> > 
> > Overall this should be more straightfoward than maintaining 
> > `pendingResources`.

Removed `slave->pendingResources` and replaced with `slave->pendingTasks`.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, lines 2501-2502
> > 
> >
> > Prbably can't store them here as we don't aggregate identity-based 
> > resources across agents.
> > 
> > Additionally it looks like we don't need them per comments below.

Removed these.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3672-3673
> > 
> >
> > Shouldn't need to keep track of used shared resources.

Not needed any more.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/validation.hpp, line 158
> > 
> >
> > This will not take pendingTasks on the agent as the last argument.

I think you meant s/not/now. Yes.


> On July 26, 2016, 4:06 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 604-607
> > 
> >
> > Now we have to modify this method to process the LAUNCH call and adjust 
> > the API docs accordingly.

As discussed, we are not going to comment in the base class regarding using 
`Offer::Operation::LAUNCH` in `updateAllocation()` since it is specific to our 
implementation. Appropriate comments added in `hierarchical.cpp`.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review143040
---


On July 19, 2016, 10:52 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated July 19, 2016, 10:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-26 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review143040
---



Haven't commented on tests. Will look at them together with other tests.

---

Chatted with BenM and Anindya. I think we are able to simplify the code and 
minimize the specially handling of shared resources by separating the 
responsibilities between master, allocator and sorter this way:

- Invariants: All usage of resources (shared or not) should have directly 
correponding allocations in the allocator and sorter.
- Allocator:
- The only thing special about shared resources in `allocate()` is that 
it's *available* once created regardless of past allocations (i.e., `total - 
allocated`), this adheres to the definition of shared resources and is easily 
explanable. When multiple instances of the same shared resources are allocated, 
all instances/copies are maintained in `slave.allocated` and other data 
structures.
- We just need to tweak `updateAllocation` for the master's special 
allocation.
- Sorter: Shared resources only needs a little special handling when creating 
`createStrippedScalarQuantity`, multiple instances can appear in a framework's 
allocation but they don't affect sorting.
- Master: The only special handlings are that 
- Task validation allows tasks that consume shared resources not in 
`_offeredResources` as long as they are in the original `offeredResources` or 
they are created in the same ACCEPT call. 
- Tasks may use more shared resources than being allocated, this leads to a 
special allocation for additional instances of these shared resources.

These semantics combined should result in simpler and smaller changes than 
maintaining other additional variables which are adjusted in special ways due 
to shared resources.


src/common/resources.cpp (line 1224)


Add comment.

// Because we currently only allow persistent volumes to be shared, the 
originally resource must be non-shared.



src/common/resources.cpp (line 1266)


Add a comment:

// Because we currently only allow persistent volumes to be shared, we 
return the resource to the non-shared state after destroy of the volume.



src/master/allocator/mesos/hierarchical.hpp (line 327)


s/each copy/the number of copies/



src/master/allocator/mesos/hierarchical.hpp (line 328)


s/allocated/allocated to/
s/recovered/recovered from/



src/master/allocator/mesos/hierarchical.cpp (lines 604 - 607)


Now we have to modify this method to process the LAUNCH call and adjust the 
API docs accordingly.



src/master/allocator/mesos/hierarchical.cpp (line 1429)


s/=/-/



src/master/allocator/sorter/drf/sorter.cpp (line 164)


s/no more/no existing/



src/master/allocator/sorter/drf/sorter.cpp (lines 165 - 166)


For this you want to exclude shared resources already allocated to the 
client but `resources` can have more instances of the same shared resources 
than `allocations[name].resources[slaveId].shared()`.



src/master/master.hpp (line 324)


s/shared resources/each shared resource/ (because later in this sentence 
you are referencing "this shared resource"



src/master/master.hpp (line 325)


Say "each copy corresponds to a running task using it" instead?



src/master/master.hpp (line 331)


s/assigned/requested/ (assigned is a bit ambiguous, at this point resources 
are still not given to the task yet).



src/master/master.hpp (line 340)


s/accept/ACCEPT/



src/master/master.hpp (line 341)


s/removing it/removing them/



src/master/master.hpp (line 344)


With `hashmap pendingResources` we are preventing 
DESTROY from affecting pending tasks of the same framework but letting DESTROY 
to go through under pending tasks of other frameworks (of the same role).

If we use `pendingResources` it should be across frameworks right?

But because we already need to add

```
multihashmap pendingTasks;
```

for /r/47082/,

we can just do that instead in this review, the number of tasks pending for 
each slave should 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-19 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 19, 2016, 10:52 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, resind that offer before processing the DESTROY.


Diffs (updated)
-

  src/common/resources.cpp b1bd2784aefdebf91892638b40b215373b998574 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
b72ba16277a3210e4d309b616d185a10e2029a66 
  src/master/allocator/mesos/hierarchical.cpp 
7d4064535a20b93950f5a95eef1ad3f0d37d305b 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp 0293e4956afb5968f84b516e3d4b5f6f858ef7a3 
  src/master/master.hpp bade8af69f567341667b9907368207189d0dab0e 
  src/master/master.cpp 370fd8712062dc75bb81824cb99ccc7920acbf78 
  src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
  src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
  src/v1/resources.cpp 6d4ec75fbb7edab013563142aaf13d5302cc50af 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-18 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 18, 2016, 2:31 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

rebased.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, resind that offer before processing the DESTROY.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
b72ba16277a3210e4d309b616d185a10e2029a66 
  src/master/allocator/mesos/hierarchical.cpp 
7d4064535a20b93950f5a95eef1ad3f0d37d305b 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp 0293e4956afb5968f84b516e3d4b5f6f858ef7a3 
  src/master/master.hpp bade8af69f567341667b9907368207189d0dab0e 
  src/master/master.cpp 370fd8712062dc75bb81824cb99ccc7920acbf78 
  src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
  src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-17 Thread Anindya Sinha


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141305
---


On July 17, 2016, 6:28 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated July 17, 2016, 6:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.hpp 
> b72ba16277a3210e4d309b616d185a10e2029a66 
>   src/master/allocator/mesos/hierarchical.cpp 
> 7d4064535a20b93950f5a95eef1ad3f0d37d305b 
>   src/master/allocator/sorter/drf/sorter.hpp 
> bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
>   src/master/allocator/sorter/drf/sorter.cpp 
> ac85b327fc33d34246788e6a8c8bf5a486c61434 
>   src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 
>   src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 
>   src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 
>   src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-17 Thread Anindya Sinha


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > Adding to the previous partial review after discussion.
> > 
> > ## Invariant checking and documentation
> > 
> > An overall comment is that I think since we keep track of shared resource 
> > across Master, allocator and sorter, each for a different aspect (allocated 
> > vs. used vs. available, etc.) and with different semantics and expactations 
> > on uniquesness/number of copies etc., we should very explicitly document 
> > these variables and method arguments and check their invariants.
> > 
> > Specifically, the following diagram helped me understand the relationship 
> > between these variables.
> > 
> > Sorter
> > |
> > |- allocations: hashmap
> >  |
> >  |- resources: hashmap > Resources>
> >  |- scalarQuantities 
> > 
> > `Allocation.resources` expects 1 copy of each shared resource because it's 
> > tracking whether a client has been allocated this resource and doesn't care 
> > about how many times it's been allocated or used by the client. Therefore 
> > distinct shared resources is an invariant that we should check in 
> > `allocated()`.
> > 
> > Allocator
> > |
> > |- slaves: hashmap
> > |
> > |- allocated: Resources
> > 
> > `Slave.allocated` expects 1 copy of each shared resource **per client** and 
> > it's grouped into one Resources object.
> > 
> > Master
> > |
> > |- Slave
> > |  |
> > |  |- usedResources: hashmap
> > |
> > |- Framework
> >|
> >|- usedResources: hashmap
> >
> > `Slave.usedResources` and `Framework.usedResources` expect 1 copy of each 
> > shared resource **per task** as they are tracking "usage".

Yes, I will update the comments to reflect scope of shared count (ie. copies) 
in the context of the `Resources` objects used in master, allocator and sorter. 
So, now with the current version:

1) Sorter=> allocations (resources, scalarQuantities): 1 copy of each shared 
resource per allocation (that has not been recovered) by a client.
2) Allocator=> slaves[].allocated: 1 copy of each shared resource per 
allocation (that has not been recovered) by a framework.
3) Master=> usedResources (in Slave and Framework): 1 copy of each shared 
resource per task as they are tracking "usage".


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 274
> > 
> >
> > Chatted offline. We should refactor this into a 
> > `Master::recoverResources()` call to encapsulate this logic and call 
> > `allocator->recoverResources()` inside with adjusted arguments.

Wrapped `Allocator::recoverResources()` within `Master::recoverResources()`.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 450-459
> > 
> >
> > Seems like in `slaves[slaveId].allocated` we should be maintaining one 
> > copy of each shared resource **per framework** but this if condition here 
> > assumes one copy of each shared resource across framework.

Yes, this was done when we maintained atmost a single copy of a shared 
resource. Now with 1 copy of shared resource per framework, we need to just sum 
up all resources.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/master_validation_tests.cpp, lines 630-632
> > 
> >
> > Here we are testing that can't destroy a  nonshared persistent volume 
> > `disk2` because it's in use?
> > 
> > Perhaps it's still reasonable to test it (even thought it can't happen 
> > right now) but it's not relevant to this test?

Ok. Since it is not a use case that happens in the `DESTROY` callflow, I think 
we can remove it.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 520-521
> > 
> >
> > Even if a and b both use 1/10 of the persistent volume, shouldn't they 
> > each get 0.1 share?

Yes, the shares is true based on when we were evenly distributing the shares 
across frameworks to which the shared resources were allocated to. I did not 
update the test in this commit so you do not see it. I will fix this to match 
the current behavior.


> On July 12, 2016, 7:26 a.m., Jiang Yan Xu wrote:
> > src/tests/sorter_tests.cpp, lines 499-500
> > 
> >
> > This is not a realistic example. "id1" is already a volume, you 
> > shouldn't claim 1/10 of a volume. (I don't see us validating this though)

Yes, a typo 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-17 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 17, 2016, 6:28 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Updates based on review comments. Also, fixed filtering based on shared 
resources by maintining n copies of shared resources (allocated but not 
recovered) in the allocator (for slaves) as well as the various sorters (for 
respective clients).


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, resind that offer before processing the DESTROY.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.hpp 
b72ba16277a3210e4d309b616d185a10e2029a66 
  src/master/allocator/mesos/hierarchical.cpp 
7d4064535a20b93950f5a95eef1ad3f0d37d305b 
  src/master/allocator/sorter/drf/sorter.hpp 
bc6bfb2d5d3b32d55be055a0514861b4e7d889bb 
  src/master/allocator/sorter/drf/sorter.cpp 
ac85b327fc33d34246788e6a8c8bf5a486c61434 
  src/master/http.cpp d1fe22bd2d9eb6bd82e32cf01720bbe3d8be26d5 
  src/master/master.hpp ac998b1f5b305a9bff9d9e5cd205a6c3481f9b38 
  src/master/master.cpp 61eaa4a92741a2d1e3f6624c555a40f6b9240d90 
  src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
  src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-17 Thread Anindya Sinha


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3499-3502
> > 
> >
> > `task.has_executor()` doesn't always lead to additional resources being 
> > used/charged against the framework. See `Master::addTask()`. It will suck a 
> > lot if we have to duplicate that logic here though so we might have to 
> > consider doing the pending task refactor first...

Refactored `Master::addTask()` and moved calculation of resources for the task 
into a separate function `Master::totalTaskResources()`, and used this function 
to determine the resources for pending tasks (in `Slave::pendingResources`).


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3913-3916
> > 
> >
> > Need to check whether this executor actually consumes resources.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 4291-4301
> > 
> >
> > Ditto about `task.has_executor()`.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3497
> > 
> >
> > Checking `task.has_executor()` is insufficient as a condition to add 
> > the executor's resources because they may not consume additional resources.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3911
> > 
> >
> > Ditto to my comment above, we don't know if the task's executor 
> > actually consumes resources by just checking `task.has_executor()`.

Please refer to earlier comment regarding `Master::totalTaskResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3601-3602
> > 
> >
> >

Wrapped `Allocator::recoverResources()` into a new function 
`Master::recoverResources()` with the same arguments. Inside 
`Master::recoverResources()`, we check for `slave == nullptr` to determine what 
resources are recoverable, and call `Allocator::recoverResources()` thereafter.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.hpp, line 274
> > 
> >
> > Add some comment to help explain why we need this method:
> > 
> > ```
> >   // Return a subset of `resources` offered to `frameworkId` that can 
> >   // be recovered. Right now we filter out shared resources that are
> >   // still used by `frameworkId`.
> >   // NOTE: A framework can reuse a shared resource to launch multiple
> >   // tasks and the allocator only recovers a shared resource allocated
> >   // to the framework if it's not used by any task of the framework.
> > ```

Refactored this within `Master::recoverResources()` and added comments within 
that function.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, line 3411
> > 
> >
> > Shouldn't this go through `recoverable()` as well?
> > 
> > It'll be helpful if add some comments on the `recoverResources` to 
> > spell out what exactly is expected.

Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3599-3600
> > 
> >
> > At least we need a comment on why it's safe to recover all 
> > `offeredResources` if `slave == nullptr`: if slave is nullptr, no 
> > `usedResources` is on it so all resources should be recoverable.
> > 
> > But honestly this looks odd and we need to invoke this conditional 
> > operator in multiple places. Adding the same comment to all places this is 
> > called is also very redundant.
> > 
> > Perhaps `recoverable` can be moved to Master and hide this detail about 
> > `slave == nullptr`? Perhaps we should just wrap around 
> > `allocator->recoverResources` in Master to insert this logic.

Wrapped `Allocator::recoverResources()` into `Master::recoverResources()`.


> On July 11, 2016, 6:17 p.m., Jiang Yan Xu wrote:
> > src/master/master.cpp, lines 3643-3644
> > 
> >
> > Ditto.

Wrapped `Allocator::recoverResources()` into 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-12 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141757
---



Adding to the previous partial review after discussion.

## Invariant checking and documentation

An overall comment is that I think since we keep track of shared resource 
across Master, allocator and sorter, each for a different aspect (allocated vs. 
used vs. available, etc.) and with different semantics and expactations on 
uniquesness/number of copies etc., we should very explicitly document these 
variables and method arguments and check their invariants.

Specifically, the following diagram helped me understand the relationship 
between these variables.

Sorter
|
|- allocations: hashmap
 |
 |- resources: hashmap
 |- scalarQuantities 

`Allocation.resources` expects 1 copy of each shared resource because it's 
tracking whether a client has been allocated this resource and doesn't care 
about how many times it's been allocated or used by the client. Therefore 
distinct shared resources is an invariant that we should check in `allocated()`.

Allocator
|
|- slaves: hashmap
|
|- allocated: Resources

`Slave.allocated` expects 1 copy of each shared resource **per client** and 
it's grouped into one Resources object.

Master
|
|- Slave
|  |
|  |- usedResources: hashmap
|
|- Framework
   |
   |- usedResources: hashmap
   
`Slave.usedResources` and `Framework.usedResources` expect 1 copy of each 
shared resource **per task** as they are tracking "usage".


src/master/allocator/mesos/hierarchical.cpp (lines 450 - 459)


Seems like in `slaves[slaveId].allocated` we should be maintaining one copy 
of each shared resource **per framework** but this if condition here assumes 
one copy of each shared resource across framework.



src/master/allocator/mesos/hierarchical.cpp (lines 1386 - 1400)


Ignore my previous comment on this. It wasn't clear to me what invariant 
for `slaves[slaveId].allocated` is. Now I have looked again, we define shared 
resources in `slaves[slaveId].allocated` as: We maintain one copy of shared 
resource in `slaves[slaveId].allocated` **for each framework it is allocated 
to**. 

The first question is why do we do this? I can understand that this way if 
a shared resource is allocated to N frameworks, when it's recovered from all N 
frameworks it disappears from `slaves[slaveId].allocated`.

We should document this where this variable is defined and check this 
invariant wherever appropriate.

And in here, is this equivalent to the block of code?

```
slaves[slaveId].allocated += resources - allocation.shared();
```



src/master/allocator/mesos/hierarchical.cpp (lines 1406 - 1409)


Since `allocated` above already excludes shared resources already allocated 
to this framework, why couldn't be just pass it along to 
`frameworkSorters[role]->add()|allocated()`?

For `roleSorter->allocated()` and `quotaRoleSorter->allocated()` we can use 
`resources - slaves[slaveId].allocated.shared()` right?



src/master/allocator/mesos/hierarchical.cpp (lines 1570 - 1571)


To expand on this: I understand that if shared resources are offered out in 
the 1st stage, it'll be exluded from `remainingClusterResources` so by 
excluding shared resources from `scalarQuantity` we are comparing apples to 
apples. 

On the other hand it's possible for some shared resources to be not offered 
in the 1st stage so it'll remain in `remainingClusterResources`, if we  exclude 
these shared resources from `scalarQuantity` we could be over-allocating 
nonshared resources.

Perhaps we should be consistently checking nonshared resources to prevent 
over-allocation in stage 2. i.e.,

```
// We exlude shared resources from over-allocation check because shared 
resources are alwas allocatable.
if (!remainingClusterResources.nonShared().contains(
(allocatedStage2 + scalarQuantity).nonShared())) {
  continue;
}
```



src/master/allocator/sorter/drf/sorter.hpp (line 168)


Some comments about shared resources would be nice.

```
// We maintain one copy of each shared resource allocated to a client here. 
A shared resource may be offered to the same client multiple times but here we 
are only concerned with whether it's in the 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-11 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review141305
---



Partial review. Let's discuss first.


src/master/master.cpp (lines 3495 - 3498)


`task.has_executor()` doesn't always lead to additional resources being 
used/charged against the framework. See `Master::addTask()`. It will suck a lot 
if we have to duplicate that logic here though so we might have to consider 
doing the pending task refactor first...



src/master/master.cpp (lines 3597 - 3598)






src/master/master.cpp (lines 3907 - 3910)


Need to check whether this executor actually consumes resources.



src/master/allocator/mesos/hierarchical.cpp (line 1271)


A comment on what it is used for?

```
Due to the two stages in the allocation algorithm and the nature of shared 
resources being re-offerable even if already allocated, the same shared 
resources can appear in two distinct offers in one allocation cycle. This would 
be bad because the allocator API contract shouldn't depend on its 
implementation details. For now we make sure a shared resource is only 
allocated once in one offer cycle. To achieve this we use 
`offeredSharedResources` to keep track of shared resources already allocated in 
the current cycle.
```



src/master/allocator/mesos/hierarchical.cpp (lines 1331 - 1340)


Given this is the 1st stage of the allocate cycle there's no chance a 
shared resource could be in `offeredSharedResources` already?



src/master/allocator/mesos/hierarchical.cpp (lines 1386 - 1400)


IIUC the comment still means "this is done to make sure shared resources 
are counted at most once in the framework's allocation".

The last revision had:

```
slaves[slaveId].allocated -= resources.shared();
slaves[slaveId].allocated += resources;
```

How is this longer version different?



src/master/allocator/mesos/hierarchical.cpp (lines 1499 - 1513)


Is this equivalent?

```
Resources available = slaves[slaveId].total.nonShared() - 
slaves[slaveId].allocated).nonShared();
available += slaves[slaveId].total.shared() - 
offeredSharedResources[slaveId];
```



src/master/allocator/mesos/hierarchical.cpp (lines 1570 - 1571)


Explain the `nonShared()` here.

`remainingClusterResources` doesn't exclude shared resources.



src/master/allocator/mesos/hierarchical.cpp (lines 1595 - 1605)


Ditto to my comment on the same code above.



src/master/allocator/sorter/drf/sorter.cpp (lines 163 - 170)


So `DRFSorter::allocated` makes sure there's no duplicate shared resources 
in `allocations` so the caller can call it without pruning out the redundant 
copies. However the `unallocated` counterpart doesn't do the same? Ideally 
usage of the pair of API should be symmetric.



src/master/allocator/sorter/drf/sorter.cpp (line 194)


No longer relevant?



src/master/master.hpp (line 274)


Add some comment to help explain why we need this method:

```
  // Return a subset of `resources` offered to `frameworkId` that can 
  // be recovered. Right now we filter out shared resources that are
  // still used by `frameworkId`.
  // NOTE: A framework can reuse a shared resource to launch multiple
  // tasks and the allocator only recovers a shared resource allocated
  // to the framework if it's not used by any task of the framework.
```



src/master/master.hpp (line 278)


s/updated/result/.



src/master/master.hpp (lines 280 - 286)


If `updated` has multple copies of a shared resource, the `-=` here can't 
remove it.

We have to add it back. The result would have a single copy of the shared 
resource that are unused (by this framework; I think this is what we want).

```
Resources recoverable(const Resources& resources)
{
  Resources recoverable = resources.nonShared();
  foreach (const Resource& resource, resources.shared()) {
if (!usedResources[frameworkId].contains(resource)) {
  recoverable += resource;
}
  }

  return recoverable;
}
```




Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-07 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 7, 2016, 3:44 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, resind that offer before processing the DESTROY.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.cpp 
c1e00039606164599e25ff5f76245e4d35ec3112 
  src/master/allocator/sorter/drf/sorter.hpp 
e29feebd70277c79f7c3f6fb233e7a36501cf220 
  src/master/allocator/sorter/drf/sorter.cpp 
7df4dd641b21ea0705368861bf4679fed1ef078d 
  src/master/http.cpp bff1fd53462bfec19e4a025c49a00e2855faf4f3 
  src/master/master.hpp 60efd22280c62801b080365986fe9773737ca563 
  src/master/master.cpp 79e3d78ba45060bc2f2532fdc3d105d1cc888d0f 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp 0a921347586808863e615ca3dcc23fae92b629f5 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-03 Thread Anindya Sinha


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 457-467
> > 
> >
> > For this we need to
> > 
> > 1. Run existing benchmarks to see how much performance degradation is 
> > there for clusters (large number of agents or frameworks) that are not 
> > using shared resources. Ideally there should be negligible performance 
> > impact for such clusters.
> > 2. Create a benchmark that uses shared resources in a realistic way.
> > 
> > In implementing this patch we do have the options to store shared 
> > resources separately in a few places (e.g., store shared resources outside 
> > of `allocations[name].resources`) that can help with performance but the 
> > trade off is that it may increase code complexity. Benchmarks can help us 
> > address these tradeoffs.
> 
> Anindya Sinha wrote:
> Verified that changes in allocator and sorter does not affect run time 
> performance adversely for regular resources. I will add a benchmark for 
> shared resources in the next update.

Added a new benchmark test in https://reviews.apache.org/r/49571/, and verifies 
allocation times over a large number of agents and frameworks is not impacted 
adversely with this change.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review136970
---


On July 1, 2016, 10:28 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated July 1, 2016, 10:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> o Each shared resource is accouted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, resind that offer before processing the DESTROY.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 
> eca949e07abb00423a2f35a56eedc5d4287d81f3 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 0aa1a71da4501a3b469d07538a043b4c1d74d688 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 967290d4d1100208900b4b724422c3218abc23cb 
>   src/master/http.cpp 7b2f77b9264242f31ab62eb9db7c621a1b8aa2fe 
>   src/master/master.hpp be7cd239c49a0710a29a8187c01484f9f6d615e1 
>   src/master/master.cpp 78a8889313179b4509a6657e6c61d9f6d3bb99c0 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp 6fc58829892dc0223140f1b47593a3e5853cace5 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-01 Thread Anindya Sinha


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 457-467
> > 
> >
> > For this we need to
> > 
> > 1. Run existing benchmarks to see how much performance degradation is 
> > there for clusters (large number of agents or frameworks) that are not 
> > using shared resources. Ideally there should be negligible performance 
> > impact for such clusters.
> > 2. Create a benchmark that uses shared resources in a realistic way.
> > 
> > In implementing this patch we do have the options to store shared 
> > resources separately in a few places (e.g., store shared resources outside 
> > of `allocations[name].resources`) that can help with performance but the 
> > trade off is that it may increase code complexity. Benchmarks can help us 
> > address these tradeoffs.

Verified that changes in allocator and sorter does not affect run time 
performance adversely for regular resources. I will add a benchmark for shared 
resources in the next update.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review136970
---


On June 23, 2016, 3:50 p.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated June 23, 2016, 3:50 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael 
> Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Each shared resource is accouted via its share count. This count is
> updated based on the resource operations (such as add and subtract)
> in various scenarios such as task launch and terminate at multiple
> modules such as master, allocator, sorter, etc.
> Only allow DESTROY if no task is using the volume, or has not been
> offered to any framework.
> Since shared resources are available to multiple frameworks of the
> same role simultaneously, we normalize it with a weight equivalent
> to the number of frameworks to which this shared resource has been
> allocated to in the sorter.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5b2331b3f99252fd16f2514123006dd4ba7f1c0d 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 35273b5dcf39938125a112c5e56b5a8a542157db 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 27d56f274c41bbabe6f2abbbcebd2c4eff52693e 
>   src/master/http.cpp 7daaf12a4086635bbc5aba5e3375c95e8899ac6e 
>   src/master/master.hpp fe57878dc59637459d5c5cdae0be2aa159133fa4 
>   src/master/master.cpp 8def7156f4a05b39590321ce7743f7385a68bed0 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 9120b71fc7725bdf7094aac6619d8aadcc352df5 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp 6fc58829892dc0223140f1b47593a3e5853cace5 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-07-01 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated July 1, 2016, 10:28 p.m.)


Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, 
and Jiang Yan Xu.


Changes
---

Following changes have been addressed for shared resources based on discussion 
in Mesos Working Group:
1) DESTROY on a persistent volume should not be allowed if there is a running 
or pending task with that persistent volume. However, if the volume to be 
DESTROYed has been offered to another framework, proceed with the DESTROY but 
rescind all such offers.
2) Give equal share to all clients in calculation of dominant resource if that 
client has a shared resource allocated (instead of any normalization).


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description (updated)
---

o Each shared resource is accouted via its share count. This count is
  updated based on the resource operations (such as add and subtract)
  in various scenarios such as task launch and terminate at multiple
  modules such as master, allocator, sorter, etc.
o Only allow DESTROY if there are no running or pending tasks using
  the volume. However, if the volume is in a pending offer to one or
  more frameworks, resind that offer before processing the DESTROY.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.cpp 
eca949e07abb00423a2f35a56eedc5d4287d81f3 
  src/master/allocator/sorter/drf/sorter.hpp 
0aa1a71da4501a3b469d07538a043b4c1d74d688 
  src/master/allocator/sorter/drf/sorter.cpp 
967290d4d1100208900b4b724422c3218abc23cb 
  src/master/http.cpp 7b2f77b9264242f31ab62eb9db7c621a1b8aa2fe 
  src/master/master.hpp be7cd239c49a0710a29a8187c01484f9f6d615e1 
  src/master/master.cpp 78a8889313179b4509a6657e6c61d9f6d3bb99c0 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp 6fc58829892dc0223140f1b47593a3e5853cace5 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-06-22 Thread Anindya Sinha


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > A high level comment is that when we are making significant changes to the 
> > allocator we have to measure the performance implication both when clusters 
> > don't use this feature and when clusters do in benchmarks.

+1. Will do.


- Anindya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review136970
---


On June 23, 2016, 1:27 a.m., Anindya Sinha wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> ---
> 
> (Updated June 23, 2016, 1:27 a.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
> https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Each shared resource is accouted via its share count. This count is
> updated based on the resource operations (such as add and subtract)
> in various scenarios such as task launch and terminate at multiple
> modules such as master, allocator, sorter, etc.
> Only allow DESTROY if no task is using the volume, or has not been
> offered to any framework.
> Since shared resources are available to multiple frameworks of the
> same role simultaneously, we normalize it with a weight equivalent
> to the number of frameworks to which this shared resource has been
> allocated to in the sorter.
> 
> 
> Diffs
> -
> 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5b2331b3f99252fd16f2514123006dd4ba7f1c0d 
>   src/master/allocator/sorter/drf/sorter.hpp 
> 35273b5dcf39938125a112c5e56b5a8a542157db 
>   src/master/allocator/sorter/drf/sorter.cpp 
> 27d56f274c41bbabe6f2abbbcebd2c4eff52693e 
>   src/master/http.cpp 7daaf12a4086635bbc5aba5e3375c95e8899ac6e 
>   src/master/master.hpp fe57878dc59637459d5c5cdae0be2aa159133fa4 
>   src/master/master.cpp 8def7156f4a05b39590321ce7743f7385a68bed0 
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
>   src/master/validation.cpp 9120b71fc7725bdf7094aac6619d8aadcc352df5 
>   src/tests/master_validation_tests.cpp 
> 9eb82a569f7c48caa96d4d54a93b199ccac74385 
>   src/tests/sorter_tests.cpp 6fc58829892dc0223140f1b47593a3e5853cace5 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> ---
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-06-22 Thread Anindya Sinha


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/common/resources.cpp, line 1773
> > 
> >
> > Make sure we add this to the tests.
> > 
> > So now the shared persistent volume could look like `disk(alice, 
> > hdfs-p, {foo: bar, foo})[hadoop:/hdfs:/data:rw]:1<6>`
> > 
> > Would `` be a little redundant given that we have `<6>`?

I think having the `` makes it clearer. I would want to keep the 
`` while logging the `Resource` object. Test case 
`ResourcesTest.PrintingSharedResources` added in 
https://reviews.apache.org/r/45964/.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.hpp, line 171
> > 
> >
> > Is `nonSharedScalarQuantities` a better name?

I would prefer to not change the name of this existing member variable and 
change at a whole bunch of places in the sorter. I think a comment should 
suffice.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.hpp, line 181
> > 
> >
> > I feel that we can find a better place/name for this field. 
> > `shareResources` doesn't say if it's the allocation or the total pool.
> > 
> > I feel like it should be in `total_` as it's reflects a state of the 
> > total pool. If we leave it here then perhaps give it a clearer name? What 
> > do you think?

I would rename it to `allocatedSharedResources`. I think putting it in total 
gives an inaccurate meaning since we are storing the allocated shared resources 
here (with n copies).


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 898-901
> > 
> >
> > Can't quite understand this comment as to providing an answer for the 
> > use of `.nonShared()` on `resources`?

We always keep one copy of a specific shared resource in 
`slaves[slaveId].allocated`. Assume 2 offers go out to 2 different frameworks 
with the same shared resource. Let us say Framework 1 returns it back, so we 
subtract the shared resource from `slaves[slaveId].allocated`, hence the shared 
resource is removed from `slaves[slaveId].allocated` (since its shared count is 
always 1 if present). When Framework 2 also returns it, the 
`CHECK(slaves[slaveId].allocated.contains(resources))` would fail. That is why 
we only check against non shared resources.

However, I think the following would be a better approach. We can move to a 
model of keeping n copies for shared resources in `slaves[slaveId].allocated` 
(where n is the number of frameworks who have been offered this shared 
resource). As a result, we can update the slave's allocated list without 
worrying about the shared count being 0 for shared resources.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 1321
> > 
> >
> > If a framework explicitly turns down the shared resources we should not 
> > send them back again right?

If we check for shared resource in `isFiltered()`, we should be able to filter 
resources correctly if the shared resource being offered in the current offer 
cycle was returned by the framework in the previous offer cycle (i.e. 
`OfferFilter` created in `recoverResources()` includes the shared resources.

However, let us say we offered shared resource in previous offer cycle and the 
framework launched a task in the `ACCEPT` call using that shared resource (or 
the other case is although the framework returned the shared resource back, but 
it already has 1 or more tasks running from a previous accept cycle using that 
shared resource). In that case, the `OfferFilter` would not contain the shared 
resource. In the next offer cycle, we will have the shared resource in call to 
`isFiltered()` [since the shared resource is always added] which would result 
`isFiltered()` to be `false` which means the offer shall be sent to this 
framework every seond (instead of `refuse_seconds`). Since shared resource is 
always added in the calculation, using a shared resource in a task by a 
framework would result in too frequent offers when a task is using that shared 
resource, which does not seem correct since the shared resource is always added 
in the offer to applicable frameworks.

On further discussion: Since we send shared resource only once, I think we 
should be fine with filtering on all resources since only 1 framework gets 
offered every 1 second and the remaining ones do not.


> On June 14, 2016, 10:50 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1426-1428
> > 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-06-22 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated June 23, 2016, 1:27 a.m.)


Review request for mesos and Jiang Yan Xu.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description (updated)
---

Each shared resource is accouted via its share count. This count is
updated based on the resource operations (such as add and subtract)
in various scenarios such as task launch and terminate at multiple
modules such as master, allocator, sorter, etc.
Only allow DESTROY if no task is using the volume, or has not been
offered to any framework.
Since shared resources are available to multiple frameworks of the
same role simultaneously, we normalize it with a weight equivalent
to the number of frameworks to which this shared resource has been
allocated to in the sorter.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.cpp 
5b2331b3f99252fd16f2514123006dd4ba7f1c0d 
  src/master/allocator/sorter/drf/sorter.hpp 
35273b5dcf39938125a112c5e56b5a8a542157db 
  src/master/allocator/sorter/drf/sorter.cpp 
27d56f274c41bbabe6f2abbbcebd2c4eff52693e 
  src/master/http.cpp 7daaf12a4086635bbc5aba5e3375c95e8899ac6e 
  src/master/master.hpp fe57878dc59637459d5c5cdae0be2aa159133fa4 
  src/master/master.cpp 8def7156f4a05b39590321ce7743f7385a68bed0 
  src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31 
  src/master/validation.cpp 9120b71fc7725bdf7094aac6619d8aadcc352df5 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp 6fc58829892dc0223140f1b47593a3e5853cace5 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-06-14 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review136970
---



A high level comment is that when we are making significant changes to the 
allocator we have to measure the performance implication both when clusters 
don't use this feature and when clusters do in benchmarks.


src/common/resources.cpp (line 1225)


A blank line after `}`.



src/common/resources.cpp (line 1266)


A blank line after `}`.



src/common/resources.cpp (line 1773)


Make sure we add this to the tests.

So now the shared persistent volume could look like `disk(alice, hdfs-p, 
{foo: bar, foo})[hadoop:/hdfs:/data:rw]:1<6>`

Would `` be a little redundant given that we have `<6>`?



src/master/allocator/mesos/hierarchical.cpp (lines 898 - 901)


Can't quite understand this comment as to providing an answer for the use 
of `.nonShared()` on `resources`?



src/master/allocator/mesos/hierarchical.cpp (line 1282)


"the difference in non-shared resources between total and allocated plus 
all shared resources on the slave"



src/master/allocator/mesos/hierarchical.cpp (lines 1283 - 1284)


To clarify "we always have one copy of any shared resource", how about:

```
Since shared resources are offerable even when they are in use, in stage 1 
of each allocation cycle we make one copy of the shared resources available 
regardless of the past allocations.
```



src/master/allocator/mesos/hierarchical.cpp (line 1321)


If a framework explicitly turns down the shared resources we should not 
send them back again right?



src/master/allocator/mesos/hierarchical.cpp (lines 1426 - 1428)


Thinking about this, we should discuss what the best behavior is for now:

In the current design two offers from the same agent can already be created 
due to the two stage allocation algorithm. However the same resource are never 
sent in two offers.

Here we have two options for shared resources for stage 2:

1. Replenish 'available' with a (potential) 2nd copy of the same shared 
resources, or
2. Don't add the 2nd copy if it's already in an offer in this cycle but DO 
add another copy if it's not in an offer.

Eventually we may be sending the same shared resources to all eligible 
frameworks anyways but for now it feels that 2) is more in line with the 
current behavior: one resource in one offer in one offer cycle.

Thoughts?



src/master/allocator/sorter/drf/sorter.hpp (line 157)


Here also add:

```
NOTE: Sharedness info is also stripped out when resource identities are 
omitted because shareness inherently refers to the identities of resources and 
not quantities.
```



src/master/allocator/sorter/drf/sorter.hpp (lines 164 - 165)


At the end of the 1st sentences, add `(for non-shared resources only)`.



src/master/allocator/sorter/drf/sorter.hpp (lines 167 - 169)


So far there is not a place that lays out the algorithm coherently about 
how DRF works with shared resources. We can describe it in detail in 
`calculateShare` and refer to it here.

Here we can emphasize on how we structure the data members to make the 
algorithm work:

```
We do not aggregate shared resources here since the client's "normalized 
allocation" of a shared resource can only be derived based on the its state in 
`total_.sharedResources` when `calculateShare()` is run. See `calculateShare()` 
for comments on the algorithm.
```



src/master/allocator/sorter/drf/sorter.hpp (line 171)


Is `nonSharedScalarQuantities` a better name?



src/master/allocator/sorter/drf/sorter.hpp (lines 177 - 180)






src/master/allocator/sorter/drf/sorter.hpp (line 181)


I feel that we can find a better place/name for this field. 
`shareResources` doesn't say if it's the allocation or the total pool.

I feel like it should be in `total_` as it's reflects a state of the total 
pool. If we leave it here then perhaps give it a clearer name? What do you 
think?



src/master/allocator/sorter/drf/sorter.cpp (lines 206 - 207)

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-06-13 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated June 13, 2016, 7:20 a.m.)


Review request for mesos and Jiang Yan Xu.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

Each shared resource is accouted via its share count. This count is
updated based on the resource operations (such as add and subtract)
in various scenarios such as task launch and terminate at multiple
modules such as master, allocator, sorter, etc.
Allow DESTROY for shared volumes only if share count is 0.
Since shared resources are available to multiple frameworks of the
same role simultaneously, we normalize it with a weight equivalent
to the number of frameworks to which this shared resource has been
allocated to in the sorter.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.cpp 
8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 
  src/master/allocator/sorter/drf/sorter.hpp 
88bd9233e345ff679051703a40f3be47d7a86e1a 
  src/master/allocator/sorter/drf/sorter.cpp 
65d473a5da0d846214c930c14d333040b2085b13 
  src/master/http.cpp cd089cf6afd654b07608548f441a0e225df6425e 
  src/master/master.hpp 2c45dab291a153b42809ab12e4252bf58559feeb 
  src/master/master.cpp dd80367768a1edbb6ff94d40819b9755a8b8135f 
  src/master/validation.hpp e1271bbaebaac52a078eedfdc2299c7c6f4474f8 
  src/master/validation.cpp 7b9c2281b2ab1295211f0dd385cd77947fbd63be 
  src/tests/master_validation_tests.cpp 
9eb82a569f7c48caa96d4d54a93b199ccac74385 
  src/tests/sorter_tests.cpp eb207a36c18198588cd8a98b3860a66c2ff7a641 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-05-22 Thread Anindya Sinha


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > Looks like the SHARE and UNSHARE operations are not in this patch?

Yes that is correct.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources.cpp, line 1328
> > 
> >
> > Should we require persistent volumes to be unshared before we destroy 
> > them?
> > 
> > Here the concern is that we are clearing things in DESTROY that are not 
> > added by its pairing operation CREATE.

Since SHARE and UNSHARE apis are not supported yet, the only way to mark a 
resource as shared is in CREATE so clearing it in DESTROY seems correct.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources_utils.cpp, lines 71-74
> > 
> >
> > Do we need `Resources::isShared(resource)`?
> > 
> > If we don't need that method, here we can actually just 
> > `stripped.clear_shared()`.

I will remove the check for `if (Resources::isShareable(resource))` [now 
`Resources::isShared(resource)`] and always do a `stripped.clear_shared()`. I 
would keep the `Resources::isShared()` method since that is the pattern being 
used for `Resources::shared()` and `Resources::nonShared()`, which is how it is 
done for revocable resources as well.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > include/mesos/resources.hpp, lines 424-430
> > 
> >
> > See my comment at the definition.

Refactored this within sorter.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources.cpp, lines 1464-1496
> > 
> >
> > This concept of weightedGet tightly couples with the sorter logic which 
> > is not used elsewhere so it shouldn't be pulled out.
> > 
> > Here to have something that's more easily explainable in a generic way, 
> > perhaps we can introduce something like
> > 
> > 
> > ```
> > // Returns the count of the target Resource in the Resources.
> > size_t Resources::count(const Resource& target) const
> > {
> >   // Search the list of resources in the collection.
> >   // For shared resources, return the Resource_::sharedCount but 
> > because this
> >   // is strictly an internal optimzation, the caller just needs to 
> > refer to this
> >   // as "the count".
> >   // For nonshared resources, given how we collapse addable resources, 
> > the result
> >   // is at most 1.
> > }
> > ```
> > 
> > I feel API is easiler to explain in a general sense for the callers and 
> > the implementation is easy to explain for readers of the cpp file without 
> > going to through the DRF logic.
> > 
> > Chatted with Anindya offline and we are still debating whether we can 
> > avoid even having `count` altogether but I am writing down my thoughts here.
> > 
> > As to whether we are leaking internal information, I don't feel like 
> > this is the case: we are not asking the caller to carefully manipulating 
> > the internal state in order to use share resource arithmetics correctly but 
> > rather this provides additional information for logic that doesn't deal 
> > with resource arithmetic. Therefore I don't think we are breaking 
> > abstractions but merely exposing a const `count` method.

Based on our f2f discussion, we decided to add `Resources::count()` which 
returns the shared count for shared resources, and 1 for non-shared resources. 
This API is being used within the sorter in calculation of resource share (in 
DRF).


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/common/resources_utils.cpp, lines 88-119
> > 
> >
> > In the most recent iteration we can just -= from the call sites right?

And hence this function is removed.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 252-254
> > 
> >
> > What does "included in non shareable resources already." mean?

Removed the erraneous comment.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 394-402
> > 
> >
> > `allocations[(*it).name].scalarQuantities` has only nonshared resources.

Refactored this to bring the calculation of total scalars within sorter. So, we 
no longer need to add a vector of scalars for each client.


> On May 11, 2016, 3:49 p.m., Jiang Yan Xu wrote:
> > src/master/allocator/sorter/drf/sorter.cpp, lines 427-432
> > 
> >
> > 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-05-22 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated May 22, 2016, 7:11 a.m.)


Review request for mesos and Jiang Yan Xu.


Changes
---

Updates to reflect share->shared, and renaming of other APIs in Resources class.
Share count now reflects actual reference count in usedResources, but presence 
(or absence) of shared resources in offer, allocator and sorter.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description (updated)
---

Each shared resource is accouted via its share count. This count is
updated based on the resource operations (such as add and subtract)
in various scenarios such as task launch and terminate at multiple
modules such as master, allocator, sorter, etc.
Allow DESTROY for shared volumes only if share count is 0.
Since shared resources are available to multiple frameworks of the
same role simultaneously, we normalize it with a weight equivalent
to the number of frameworks to which this shared resource has been
allocated to in the sorter.


Diffs (updated)
-

  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.cpp 
8b7b3afb5770c617918ec4864faaf8d8a7a81ef2 
  src/master/allocator/sorter/drf/sorter.hpp 
05d5205d29ad74e01e07c508d88b6f8371541513 
  src/master/allocator/sorter/drf/sorter.cpp 
4306973277b9d32356eed31ceabac09fb2a03e6c 
  src/master/http.cpp b36b439a1fa07c52146deff2b90728f92676ade3 
  src/master/master.hpp 1a875c32eddfb6d884e3d0dda7f5716ee53966c3 
  src/master/master.cpp 0c05938de3e4eaeea2187559e81559f85318fa73 
  src/master/validation.hpp f29f9753c75e5abc3402ed23381edcea26517ab3 
  src/master/validation.cpp f490b899758bdac9676a6f6939918efa6ac52781 
  src/tests/master_validation_tests.cpp 
ca4442aa1ef0087a7d058d1b3aa430a1dbc16960 
  src/tests/sorter_tests.cpp eb207a36c18198588cd8a98b3860a66c2ff7a641 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha



Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-05-11 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review129838
---



Looks like the SHARE and UNSHARE operations are not in this patch?


include/mesos/resources.hpp (lines 424 - 430)


See my comment at the definition.



include/mesos/resources.hpp (lines 513 - 518)


See my comment at the definition.



include/mesos/v1/resources.hpp (lines 424 - 431)


Ditto.



include/mesos/v1/resources.hpp (lines 513 - 518)


Ditto.



src/common/resources.cpp (line 1169)


Chatted offline but I think we do need to `clear_shared()` and 
`clear_disk()` for shared resources here. 

The problem here is that only quantities can be aggregated across slaves 
and if `clear_shared()` and `clear_disk()` are not called, the resources have 
identities and cannot be aggregated across slaves. e.g., different slaves can 
have persistent volumes with the same size and ID but are actually different.



src/common/resources.cpp (lines 1311 - 1316)


Chatted about removing `destroyAllowed` method but here an additional point 
is that: determining if a shared resource is no longer used and thus can be 
destroyed is the responsiblity of the master (more specifically the 
valication.cpp file in it and not the Resource abstraction. 

Here the concern should be: if the resource is in there, destroy it, return 
Error otherwise. Therefore the original code is still correct.



src/common/resources.cpp (line 1328)


Should we require persistent volumes to be unshared before we destroy them?

Here the concern is that we are clearing things in DESTROY that are not 
added by its pairing operation CREATE.



src/common/resources.cpp (lines 1464 - 1496)


This concept of weightedGet tightly couples with the sorter logic which is 
not used elsewhere so it shouldn't be pulled out.

Here to have something that's more easily explainable in a generic way, 
perhaps we can introduce something like

```
// Returns the count of the target Resource in the Resources.
size_t Resources::count(const Resource& target) const
{
  // Search the list of resources in the collection.
  // For shared resources, return the Resource_::sharedCount but because 
this
  // is strictly an internal optimzation, the caller just needs to refer to 
this
  // as "the count".
  // For nonshared resources, given how we collapse addable resources, the 
result
  // is at most 1.
}
```

I feel API is easiler to explain in a general sense for the callers and the 
implementation is easy to explain for readers of the cpp file without going to 
through the DRF logic.

Chatted with Anindya offline and we are still debating whether we can avoid 
even having `count` altogether but I am writing down my thoughts here.

As to whether we are leaking internal information, I don't feel like this 
is the case: we are not asking the caller to carefully manipulating the 
internal state in order to use share resource arithmetics correctly but rather 
this provides additional information for logic that doesn't deal with resource 
arithmetic. Therefore I don't think we are breaking abstractions but merely 
exposing a const `count` method.



src/common/resources.cpp (lines 1613 - 1635)


If we don't need `weightedGet` we don't need this either.



src/common/resources.cpp (line 1980)


I understand that there are cases where we call `<<` on the Resource 
directly so the `Resource_` version doesn't get called, but perhaps `` 
pairs better with the shared count `<6>`?



src/common/resources_utils.hpp (lines 41 - 45)


The comments at the definitions.



src/common/resources_utils.cpp (lines 71 - 74)


Do we need `Resources::isShared(resource)`?

If we don't need that method, here we can actually just 
`stripped.clear_shared()`.



src/common/resources_utils.cpp (lines 88 - 119)


In the most recent iteration we can just -= from the call sites right?



src/common/resources_utils.cpp (lines 122 - 139)


This method is basically `Resources& Resources::operator+=(const 

Re: Review Request 45961: Support sharing of resources through reference counting of resources.

2016-04-28 Thread Anindya Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/
---

(Updated April 29, 2016, 12:16 a.m.)


Review request for mesos and Jiang Yan Xu.


Changes
---

Refactored based on changes in public APIs in `Resource_`.


Bugs: MESOS-4431
https://issues.apache.org/jira/browse/MESOS-4431


Repository: mesos


Description
---

Number of consumers for each shared resource is tracked. The count is
incremented on task launch, and decremented on task termination whenever
the task uses a shared resource.
Allow DESTROY for shared volumes only if reference count is 0.
Since shared resources are available to multiple frameworks of the
same role simultaneously, we normalize it with a weight equivalent
to the number of frameworks to which this shared resource has been
allocated to in the sorter.


Diffs (updated)
-

  include/mesos/resources.hpp a557e97c65194d4aad879fb88d8edefd1c95b8d8 
  include/mesos/v1/resources.hpp a5ba8fec4c9c3643646308f75a4b28cefe0b3df3 
  src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
  src/common/resources_utils.hpp 2840f459288bbe8440eda08119d4f86a8be5a291 
  src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d 
  src/master/allocator/mesos/hierarchical.cpp 
0de03c7347e01fde2b42f5ec38a34a62edf642a1 
  src/master/allocator/sorter/drf/sorter.hpp 
05d5205d29ad74e01e07c508d88b6f8371541513 
  src/master/allocator/sorter/drf/sorter.cpp 
4306973277b9d32356eed31ceabac09fb2a03e6c 
  src/master/master.hpp 3e55114ee7866e06513071e86e15608099dae052 
  src/master/master.cpp ff41da3d077b65b44277e1bbae88c61b7bb88a3d 
  src/master/validation.cpp f458100d22ec1f9f10921c1c91b6931a5671e28f 
  src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 

Diff: https://reviews.apache.org/r/45961/diff/


Testing
---

Tests successful.


Thanks,

Anindya Sinha