Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/
---

(Updated Feb. 27, 2016, 1:57 a.m.)


Review request for mesos, Joris Van Remoortere and Michael Park.


Changes
---

Address review comments.


Bugs: MESOS-4687
https://issues.apache.org/jira/browse/MESOS-4687


Repository: mesos


Description
---

Scalar resource values are represented using floating point. As a result, users
could see unexpected results when accepting offers and making reservations for
fractional resources: values like "0.1" cannot be precisely represented using
standard floating point, and the resource values returned to frameworks might
contain an unpredictable amount of roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only supports
three decimal digits of precision: that is, fractional resource values like
"0.001" will be supported, but "0.0001" will not be.


Diffs (updated)
-

  docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
  docs/upgrades.md 21faea8a3c152b15023d6fa69cde9382dac80c18 
  include/mesos/mesos.proto 33f6b0838360b61db70a247e5d6dfb16af15aa06 
  include/mesos/v1/mesos.proto 1b0e709e76f3f6b44ab0434c649c064e8866c8a1 
  src/common/resources.cpp 529a1cd99707f8ce7bcc22889793d1eea04c3338 
  src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
  src/master/allocator/mesos/hierarchical.cpp 
5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
  src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
  src/v1/resources.cpp 49dc3429f3b4b18e24f80052ed8be830df53b59d 
  src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 

Diff: https://reviews.apache.org/r/43635/diff/


Testing
---

make check

Manually verified that some of the floating point oddities in 
https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
is applied, although I wasn't able to reproduce the crash described in that 
ticket.

REVIEW NOTES:

* We don't currently emit a warning when discarding additional digits of 
precision from input scalar resource values. Should we? That would require 
identifying all the points where a resource value is seemed to be 
"user-provided", and also runs the risk of generating a ton of log messages 
when an old framework is used.
* Similarly, if the user gives us a resource value and we don't do anything to 
it, we won't discard any additional precision that appears in the value -- the 
precision only gets discarded when we apply an operator like `+` or `-`. 
Unclear if we should trim additional precision from all scalar resource values 
more aggressively.

PERFORMANCE:

This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
applied (optimized build on my laptop):

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.192425secs to make 200 offers
round 1 allocate took 2.221243secs to make 200 offers
round 2 allocate took 2.236314secs to make 200 offers
round 3 allocate took 2.224045secs to make 200 offers
round 4 allocate took 2.232822secs to make 200 offers
round 5 allocate took 2.264807secs to make 200 offers
round 6 allocate took 2.224853secs to make 200 offers
round 7 allocate took 2.224829secs to make 200 offers
round 8 allocate took 2.24862secs to make 200 offers
round 9 allocate took 2.2556secs to make 200 offers
round 10 allocate took 2.256616secs to make 200 offers
```

And after applying this RR:

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.267919secs to make 200 offers
round 1 allocate took 2.202843secs to make 200 offers
round 2 allocate took 2.20426secs to make 200 offers
round 3 allocate took 2.263887secs to make 200 offers
round 4 allocate took 2.266237secs to make 200 offers
round 5 allocate took 2.276957secs to make 200 offers
round 6 allocate took 2.291821secs to make 200 offers
round 7 allocate took 2.261839secs to make 200 offers
round 8 allocate took 2.325696secs to make 200 offers
round 9 allocate took 2.310469secs to make 200 offers
round 10 allocate took 2.21802secs to make 200 offers
```

Which suggests to me that any performance hit is pretty minimal.


Thanks,

Neil Conway



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Michael Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review121043
---


Fix it, then Ship it!





src/common/values.cpp (lines 96 - 100)


Can we just leverage the `operator+=`?

```
Value::Scalar result = left;
result += right;
return result;
```

Same for `operator-` and the `v1` API.


- Michael Park


On Feb. 27, 2016, 12:10 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 27, 2016, 12:10 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 21faea8a3c152b15023d6fa69cde9382dac80c18 
>   include/mesos/mesos.proto 33f6b0838360b61db70a247e5d6dfb16af15aa06 
>   include/mesos/v1/mesos.proto 1b0e709e76f3f6b44ab0434c649c064e8866c8a1 
>   src/common/resources.cpp 529a1cd99707f8ce7bcc22889793d1eea04c3338 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 49dc3429f3b4b18e24f80052ed8be830df53b59d 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocate took 2.24862secs to make 200 offers
> round 9 allocate took 2.2556secs to make 200 offers
> round 10 allocate took 2.256616secs to make 200 offers
> ```
> 
> And after applying this RR:
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.267919secs to make 200 offers
> round 1 allocate took 2.202843secs to make 200 offers
> round 2 allocate took 2.20426secs to make 200 offers
> round 3 allocate took 2.263887secs to make 200 offers
> round 4 allocate took 2.266237secs to make 200 offers
> round 5 allocate took 2.276957secs to make 200 offers
> round 6 allocate took 2.291821secs to make 200 offers
> round 7 allocate took 2.261839secs to make 200 offers
>

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/
---

(Updated Feb. 27, 2016, 12:10 a.m.)


Review request for mesos, Joris Van Remoortere and Michael Park.


Changes
---

Add comments.


Bugs: MESOS-4687
https://issues.apache.org/jira/browse/MESOS-4687


Repository: mesos


Description
---

Scalar resource values are represented using floating point. As a result, users
could see unexpected results when accepting offers and making reservations for
fractional resources: values like "0.1" cannot be precisely represented using
standard floating point, and the resource values returned to frameworks might
contain an unpredictable amount of roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only supports
three decimal digits of precision: that is, fractional resource values like
"0.001" will be supported, but "0.0001" will not be.


Diffs (updated)
-

  docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
  docs/upgrades.md 21faea8a3c152b15023d6fa69cde9382dac80c18 
  include/mesos/mesos.proto 33f6b0838360b61db70a247e5d6dfb16af15aa06 
  include/mesos/v1/mesos.proto 1b0e709e76f3f6b44ab0434c649c064e8866c8a1 
  src/common/resources.cpp 529a1cd99707f8ce7bcc22889793d1eea04c3338 
  src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
  src/master/allocator/mesos/hierarchical.cpp 
5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
  src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
  src/v1/resources.cpp 49dc3429f3b4b18e24f80052ed8be830df53b59d 
  src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 

Diff: https://reviews.apache.org/r/43635/diff/


Testing
---

make check

Manually verified that some of the floating point oddities in 
https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
is applied, although I wasn't able to reproduce the crash described in that 
ticket.

REVIEW NOTES:

* We don't currently emit a warning when discarding additional digits of 
precision from input scalar resource values. Should we? That would require 
identifying all the points where a resource value is seemed to be 
"user-provided", and also runs the risk of generating a ton of log messages 
when an old framework is used.
* Similarly, if the user gives us a resource value and we don't do anything to 
it, we won't discard any additional precision that appears in the value -- the 
precision only gets discarded when we apply an operator like `+` or `-`. 
Unclear if we should trim additional precision from all scalar resource values 
more aggressively.

PERFORMANCE:

This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
applied (optimized build on my laptop):

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.192425secs to make 200 offers
round 1 allocate took 2.221243secs to make 200 offers
round 2 allocate took 2.236314secs to make 200 offers
round 3 allocate took 2.224045secs to make 200 offers
round 4 allocate took 2.232822secs to make 200 offers
round 5 allocate took 2.264807secs to make 200 offers
round 6 allocate took 2.224853secs to make 200 offers
round 7 allocate took 2.224829secs to make 200 offers
round 8 allocate took 2.24862secs to make 200 offers
round 9 allocate took 2.2556secs to make 200 offers
round 10 allocate took 2.256616secs to make 200 offers
```

And after applying this RR:

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.267919secs to make 200 offers
round 1 allocate took 2.202843secs to make 200 offers
round 2 allocate took 2.20426secs to make 200 offers
round 3 allocate took 2.263887secs to make 200 offers
round 4 allocate took 2.266237secs to make 200 offers
round 5 allocate took 2.276957secs to make 200 offers
round 6 allocate took 2.291821secs to make 200 offers
round 7 allocate took 2.261839secs to make 200 offers
round 8 allocate took 2.325696secs to make 200 offers
round 9 allocate took 2.310469secs to make 200 offers
round 10 allocate took 2.21802secs to make 200 offers
```

Which suggests to me that any performance hit is pretty minimal.


Thanks,

Neil Conway



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Joris Van Remoortere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review121007
---


Fix it, then Ship it!





src/common/values.cpp (line 62)


Can we add a comment explaining why we use this quotient, remainder 
approach?


- Joris Van Remoortere


On Feb. 26, 2016, 11:52 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 26, 2016, 11:52 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 21faea8a3c152b15023d6fa69cde9382dac80c18 
>   include/mesos/mesos.proto 33f6b0838360b61db70a247e5d6dfb16af15aa06 
>   include/mesos/v1/mesos.proto 1b0e709e76f3f6b44ab0434c649c064e8866c8a1 
>   src/common/resources.cpp 529a1cd99707f8ce7bcc22889793d1eea04c3338 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 49dc3429f3b4b18e24f80052ed8be830df53b59d 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocate took 2.24862secs to make 200 offers
> round 9 allocate took 2.2556secs to make 200 offers
> round 10 allocate took 2.256616secs to make 200 offers
> ```
> 
> And after applying this RR:
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.267919secs to make 200 offers
> round 1 allocate took 2.202843secs to make 200 offers
> round 2 allocate took 2.20426secs to make 200 offers
> round 3 allocate took 2.263887secs to make 200 offers
> round 4 allocate took 2.266237secs to make 200 offers
> round 5 allocate took 2.276957secs to make 200 offers
> round 6 allocate took 2.291821secs to make 200 offers
> round 7 allocate took 2.261839secs to make 200 offers
> round 8 allocate took 2.325696secs to make 200 offers
> round 9 allocate took 2.310469secs to make 2

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Neil Conway


> On Feb. 26, 2016, 9:40 p.m., Joris Van Remoortere wrote:
> > src/common/values.cpp, line 61
> > 
> >
> > let's pull out into a constant as discussed offline.

Per discussion, we aren't going to do this for now (stupid `constexpr`).


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review120961
---


On Feb. 26, 2016, 11:52 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 26, 2016, 11:52 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 21faea8a3c152b15023d6fa69cde9382dac80c18 
>   include/mesos/mesos.proto 33f6b0838360b61db70a247e5d6dfb16af15aa06 
>   include/mesos/v1/mesos.proto 1b0e709e76f3f6b44ab0434c649c064e8866c8a1 
>   src/common/resources.cpp 529a1cd99707f8ce7bcc22889793d1eea04c3338 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 49dc3429f3b4b18e24f80052ed8be830df53b59d 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocate took 2.24862secs to make 200 offers
> round 9 allocate took 2.2556secs to make 200 offers
> round 10 allocate took 2.256616secs to make 200 offers
> ```
> 
> And after applying this RR:
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.267919secs to make 200 offers
> round 1 allocate took 2.202843secs to make 200 offers
> round 2 allocate took 2.20426secs to make 200 offers
> round 3 allocate took 2.263887secs to make 200 offers
> round 4 allocate took 2.266237secs to make 200 offers
> round 5 allocate took 2.276957secs to make 200 offers
> round 6 allocate took 2.291821secs to make 200 offers
> round 7 allocate took 2.261839secs to make 200 offers

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/
---

(Updated Feb. 26, 2016, 11:52 p.m.)


Review request for mesos, Joris Van Remoortere and Michael Park.


Changes
---

Address code review comments.


Bugs: MESOS-4687
https://issues.apache.org/jira/browse/MESOS-4687


Repository: mesos


Description
---

Scalar resource values are represented using floating point. As a result, users
could see unexpected results when accepting offers and making reservations for
fractional resources: values like "0.1" cannot be precisely represented using
standard floating point, and the resource values returned to frameworks might
contain an unpredictable amount of roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only supports
three decimal digits of precision: that is, fractional resource values like
"0.001" will be supported, but "0.0001" will not be.


Diffs (updated)
-

  docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
  docs/upgrades.md 21faea8a3c152b15023d6fa69cde9382dac80c18 
  include/mesos/mesos.proto 33f6b0838360b61db70a247e5d6dfb16af15aa06 
  include/mesos/v1/mesos.proto 1b0e709e76f3f6b44ab0434c649c064e8866c8a1 
  src/common/resources.cpp 529a1cd99707f8ce7bcc22889793d1eea04c3338 
  src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
  src/master/allocator/mesos/hierarchical.cpp 
5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
  src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
  src/v1/resources.cpp 49dc3429f3b4b18e24f80052ed8be830df53b59d 
  src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 

Diff: https://reviews.apache.org/r/43635/diff/


Testing
---

make check

Manually verified that some of the floating point oddities in 
https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
is applied, although I wasn't able to reproduce the crash described in that 
ticket.

REVIEW NOTES:

* We don't currently emit a warning when discarding additional digits of 
precision from input scalar resource values. Should we? That would require 
identifying all the points where a resource value is seemed to be 
"user-provided", and also runs the risk of generating a ton of log messages 
when an old framework is used.
* Similarly, if the user gives us a resource value and we don't do anything to 
it, we won't discard any additional precision that appears in the value -- the 
precision only gets discarded when we apply an operator like `+` or `-`. 
Unclear if we should trim additional precision from all scalar resource values 
more aggressively.

PERFORMANCE:

This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
applied (optimized build on my laptop):

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.192425secs to make 200 offers
round 1 allocate took 2.221243secs to make 200 offers
round 2 allocate took 2.236314secs to make 200 offers
round 3 allocate took 2.224045secs to make 200 offers
round 4 allocate took 2.232822secs to make 200 offers
round 5 allocate took 2.264807secs to make 200 offers
round 6 allocate took 2.224853secs to make 200 offers
round 7 allocate took 2.224829secs to make 200 offers
round 8 allocate took 2.24862secs to make 200 offers
round 9 allocate took 2.2556secs to make 200 offers
round 10 allocate took 2.256616secs to make 200 offers
```

And after applying this RR:

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.267919secs to make 200 offers
round 1 allocate took 2.202843secs to make 200 offers
round 2 allocate took 2.20426secs to make 200 offers
round 3 allocate took 2.263887secs to make 200 offers
round 4 allocate took 2.266237secs to make 200 offers
round 5 allocate took 2.276957secs to make 200 offers
round 6 allocate took 2.291821secs to make 200 offers
round 7 allocate took 2.261839secs to make 200 offers
round 8 allocate took 2.325696secs to make 200 offers
round 9 allocate took 2.310469secs to make 200 offers
round 10 allocate took 2.21802secs to make 200 offers
```

Which suggests to me that any performance hit is pretty minimal.


Thanks,

Neil Conway



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Neil Conway


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 67
> > 
> >
> > Let's add check on overflow; it will be helpful if scalar value was 
> > big. Scalar is a general type, there maybe used with a big value, e.g. 
> > total size of distributed filesystem.
> 
> Neil Conway wrote:
> What should we do in case of overflow?
> 
> Note that we don't check for overflow in operator+ (or for negative 
> result values in operator-)...
> 
> Klaus Ma wrote:
> `CHECK` or warning log should be fine, it only improves the debugability; 
> it different with `operator+` because `max_double/2` should be big enough for 
> us; but for `double * 1000 -> long`, I'm not sure of that.
> 
> And I think we need to use `long long` or `int64` instead of `long`; 
> `long` is not garantee to be 64bit in c++.
> 
> Neil Conway wrote:
> re: `long`, good point -- we only officially support 64-bit OSX, Linux, 
> and Windows, but `long` is 32-bit on Windows.

After discussing this with a few folks offline, our feeling was that the range 
of resource values we can represent without overflow should be sufficient for 
any reasonable purpose for a considerable length of time. Hopefully we'll have 
switched to a fixed point format before people use Mesos to manage > 10s of 
exabytes as a single resource :)


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---


On Feb. 19, 2016, 10:27 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 19, 2016, 10:27 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
>   include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 of

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Joris Van Remoortere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review120982
---


Fix it, then Ship it!





src/common/values.cpp (lines 60 - 62)


How about if we re-construct the double from its whole and decimal parts 
using `/1000` and `%1000`?


- Joris Van Remoortere


On Feb. 19, 2016, 10:27 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 19, 2016, 10:27 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
>   include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocate took 2.24862secs to make 200 offers
> round 9 allocate took 2.2556secs to make 200 offers
> round 10 allocate took 2.256616secs to make 200 offers
> ```
> 
> And after applying this RR:
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.267919secs to make 200 offers
> round 1 allocate took 2.202843secs to make 200 offers
> round 2 allocate took 2.20426secs to make 200 offers
> round 3 allocate took 2.263887secs to make 200 offers
> round 4 allocate took 2.266237secs to make 200 offers
> round 5 allocate took 2.276957secs to make 200 offers
> round 6 allocate took 2.291821secs to make 200 offers
> round 7 allocate took 2.261839secs to make 200 offers
> round 8 allocate took 2.325696secs to make 200 offers
> round 9 allocate

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-26 Thread Joris Van Remoortere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review120961
---




src/common/values.cpp (line 55)


`std::llround`



src/common/values.cpp (line 61)


let's pull out into a constant as discussed offline.



src/tests/resources_tests.cpp (line 859)


expected, actual ?



src/tests/resources_tests.cpp (lines 1592 - 1605)


Can you add another test like this that adds 100K times, and then subtracts 
100K times? and checks between the top and bottom?


- Joris Van Remoortere


On Feb. 19, 2016, 10:27 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 19, 2016, 10:27 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
>   include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocate took 2.24862secs to make 200 offers
> round 9 allocate took 2.2556secs to make 200 offers
> round 10 allocate took 2.256616secs to make 200 offers
> ```
> 
> And after applying this RR:
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.267919secs to make 200 offers
> round 1 allocate took 2.202843secs to make 200 offers
> round 2 allocate took

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-20 Thread Klaus Ma


> On Feb. 17, 2016, 8:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 60
> > 
> >
> > Regarnding `lround`, it maybe overcommit resources, if framework keep 
> > launching task with smaller CPU usage; I'd suggest to discard the 
> > additional precission directly (e.g. 1.2345 -> 1.234). If there's any idle 
> > resources because of discarding, oversubscription will help.
> > 
> > And we need to reject operation request with small resources which is 
> > handled in MESOS-1807.
> 
> Neil Conway wrote:
> Truncation rather than rounding might also be reasonable behavior; I'm 
> curious what other people think.
> 
> Neil Conway wrote:
> Chatting with Joris, our feeling was that it is hard to do something that 
> is right for everyone, but that the risk of overcommitting resources due to 
> rounding is probably minimal. Overall rounding probably seems like a better 
> policy than either taking the floor or the ceiling.

I'm OK with that; thanks Neil.


- Klaus


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---


On Feb. 20, 2016, 6:27 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 20, 2016, 6:27 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
>   include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took 2.224045secs to make 200 offers
> round 4 allocate took 2.232822secs to make 200 offers
> round 5 allocate took 2.264807secs to make 200 offers
> round 6 allocate took 2.224853secs to make 200 offers
> round 7 allocate took 2.224829secs to make 200 offers
> round 8 allocat

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-19 Thread Neil Conway


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 59
> > 
> >
> > Replace 1000 with const integer

I refactored the code to use functions rather than repeating `1000` in a bunch 
of places, which I think addresses this concern.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 60
> > 
> >
> > Regarnding `lround`, it maybe overcommit resources, if framework keep 
> > launching task with smaller CPU usage; I'd suggest to discard the 
> > additional precission directly (e.g. 1.2345 -> 1.234). If there's any idle 
> > resources because of discarding, oversubscription will help.
> > 
> > And we need to reject operation request with small resources which is 
> > handled in MESOS-1807.
> 
> Neil Conway wrote:
> Truncation rather than rounding might also be reasonable behavior; I'm 
> curious what other people think.

Chatting with Joris, our feeling was that it is hard to do something that is 
right for everyone, but that the risk of overcommitting resources due to 
rounding is probably minimal. Overall rounding probably seems like a better 
policy than either taking the floor or the ceiling.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---


On Feb. 19, 2016, 10:27 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 19, 2016, 10:27 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
>   include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> PERFORMANCE:
> 
> This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
> applied (optimized build on my laptop):
> 
> ```
> [ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
> Using 2000 slaves and 200 frameworks
> round 0 allocate took 2.192425secs to make 200 offers
> round 1 allocate took 2.221243secs to make 200 offers
> round 2 allocate took 2.236314secs to make 200 offers
> round 3 allocate took

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-19 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/
---

(Updated Feb. 19, 2016, 10:27 p.m.)


Review request for mesos, Joris Van Remoortere and Michael Park.


Changes
---

Various fixes: introduce functions, replace "long" with "long long", etc.


Bugs: MESOS-4687
https://issues.apache.org/jira/browse/MESOS-4687


Repository: mesos


Description
---

Scalar resource values are represented using floating point. As a result, users
could see unexpected results when accepting offers and making reservations for
fractional resources: values like "0.1" cannot be precisely represented using
standard floating point, and the resource values returned to frameworks might
contain an unpredictable amount of roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only supports
three decimal digits of precision: that is, fractional resource values like
"0.001" will be supported, but "0.0001" will not be.


Diffs (updated)
-

  docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
  docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
  include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 
  include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 
  src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
  src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
  src/master/allocator/mesos/hierarchical.cpp 
5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d 
  src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
  src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
  src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 

Diff: https://reviews.apache.org/r/43635/diff/


Testing (updated)
---

make check

Manually verified that some of the floating point oddities in 
https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
is applied, although I wasn't able to reproduce the crash described in that 
ticket.

REVIEW NOTES:

* We don't currently emit a warning when discarding additional digits of 
precision from input scalar resource values. Should we? That would require 
identifying all the points where a resource value is seemed to be 
"user-provided", and also runs the risk of generating a ton of log messages 
when an old framework is used.
* Similarly, if the user gives us a resource value and we don't do anything to 
it, we won't discard any additional precision that appears in the value -- the 
precision only gets discarded when we apply an operator like `+` or `-`. 
Unclear if we should trim additional precision from all scalar resource values 
more aggressively.

PERFORMANCE:

This is the performance of the `DeclineOffers` benchmark WITHOUT this RR 
applied (optimized build on my laptop):

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.192425secs to make 200 offers
round 1 allocate took 2.221243secs to make 200 offers
round 2 allocate took 2.236314secs to make 200 offers
round 3 allocate took 2.224045secs to make 200 offers
round 4 allocate took 2.232822secs to make 200 offers
round 5 allocate took 2.264807secs to make 200 offers
round 6 allocate took 2.224853secs to make 200 offers
round 7 allocate took 2.224829secs to make 200 offers
round 8 allocate took 2.24862secs to make 200 offers
round 9 allocate took 2.2556secs to make 200 offers
round 10 allocate took 2.256616secs to make 200 offers
```

And after applying this RR:

```
[ RUN  ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers
Using 2000 slaves and 200 frameworks
round 0 allocate took 2.267919secs to make 200 offers
round 1 allocate took 2.202843secs to make 200 offers
round 2 allocate took 2.20426secs to make 200 offers
round 3 allocate took 2.263887secs to make 200 offers
round 4 allocate took 2.266237secs to make 200 offers
round 5 allocate took 2.276957secs to make 200 offers
round 6 allocate took 2.291821secs to make 200 offers
round 7 allocate took 2.261839secs to make 200 offers
round 8 allocate took 2.325696secs to make 200 offers
round 9 allocate took 2.310469secs to make 200 offers
round 10 allocate took 2.21802secs to make 200 offers
```

Which suggests to me that any performance hit is pretty minimal.


Thanks,

Neil Conway



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-17 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/
---

(Updated Feb. 17, 2016, 6:49 p.m.)


Review request for mesos, Joris Van Remoortere and Michael Park.


Changes
---

Address some more code review comments.


Bugs: MESOS-4687
https://issues.apache.org/jira/browse/MESOS-4687


Repository: mesos


Description
---

Scalar resource values are represented using floating point. As a result, users
could see unexpected results when accepting offers and making reservations for
fractional resources: values like "0.1" cannot be precisely represented using
standard floating point, and the resource values returned to frameworks might
contain an unpredictable amount of roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only supports
three decimal digits of precision: that is, fractional resource values like
"0.001" will be supported, but "0.0001" will not be.


Diffs (updated)
-

  docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
  docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
  include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
  include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
  src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
  src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
  src/master/allocator/mesos/hierarchical.cpp 
a9d2c23162892e0f97d89a076d2311091d91 
  src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
  src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
  src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 

Diff: https://reviews.apache.org/r/43635/diff/


Testing
---

make check

Manually verified that some of the floating point oddities in 
https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
is applied, although I wasn't able to reproduce the crash described in that 
ticket.

REVIEW NOTES:

* We don't currently emit a warning when discarding additional digits of 
precision from input scalar resource values. Should we? That would require 
identifying all the points where a resource value is seemed to be 
"user-provided", and also runs the risk of generating a ton of log messages 
when an old framework is used.
* Similarly, if the user gives us a resource value and we don't do anything to 
it, we won't discard any additional precision that appears in the value -- the 
precision only gets discarded when we apply an operator like `+` or `-`. 
Unclear if we should trim additional precision from all scalar resource values 
more aggressively.


Thanks,

Neil Conway



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-17 Thread Neil Conway


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 67
> > 
> >
> > Let's add check on overflow; it will be helpful if scalar value was 
> > big. Scalar is a general type, there maybe used with a big value, e.g. 
> > total size of distributed filesystem.
> 
> Neil Conway wrote:
> What should we do in case of overflow?
> 
> Note that we don't check for overflow in operator+ (or for negative 
> result values in operator-)...
> 
> Klaus Ma wrote:
> `CHECK` or warning log should be fine, it only improves the debugability; 
> it different with `operator+` because `max_double/2` should be big enough for 
> us; but for `double * 1000 -> long`, I'm not sure of that.
> 
> And I think we need to use `long long` or `int64` instead of `long`; 
> `long` is not garantee to be 64bit in c++.

re: `long`, good point -- we only officially support 64-bit OSX, Linux, and 
Windows, but `long` is 32-bit on Windows.


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---


On Feb. 17, 2016, 12:49 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 17, 2016, 12:49 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
>   include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> a9d2c23162892e0f97d89a076d2311091d91 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-16 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119400
---




docs/attributes-resources.md (line 39)


s/e.g., reserving "1.5001 CPUs" is considered equivalent to reserving "1.5 
CPUs"/e.g., reserving "1.5123 CPUs" is considered equivalent to reserving 
"1.512 CPUs" ?

This is more clear to reflect that three decimal digits of precision are 
supportted.

You may also want to mention the logic here:
1) Trailing zeros (1.500 -> 1.5).
2) Round up (1.5015 -> 1.502)



src/tests/resources_tests.cpp (line 856)


add a period to the end and same as following.



src/tests/resources_tests.cpp (line 1622)


s/We //


- Guangya Liu


On 二月 17, 2016, 12:49 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated 二月 17, 2016, 12:49 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
>   include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> a9d2c23162892e0f97d89a076d2311091d91 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-16 Thread Klaus Ma


> On Feb. 17, 2016, 8:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 67
> > 
> >
> > Let's add check on overflow; it will be helpful if scalar value was 
> > big. Scalar is a general type, there maybe used with a big value, e.g. 
> > total size of distributed filesystem.
> 
> Neil Conway wrote:
> What should we do in case of overflow?
> 
> Note that we don't check for overflow in operator+ (or for negative 
> result values in operator-)...

`CHECK` or warning log should be fine, it only improves the debugability; it 
different with `operator+` because `max_double/2` should be big enough for us; 
but for `double * 1000 -> long`, I'm not sure of that.

And I think we need to use `long long` or `int64` instead of `long`; `long` is 
not garantee to be 64bit in c++.


> On Feb. 17, 2016, 8:30 a.m., Klaus Ma wrote:
> > docs/attributes-resources.md, line 7
> > 
> >
> > Any related to this patches?
> 
> Neil Conway wrote:
> Not really, just a minor doc cleanup I made along the way. Happy to split 
> into a separate patch if you'd prefer.

It's OK to handle it in this patches.


> On Feb. 17, 2016, 8:30 a.m., Klaus Ma wrote:
> > docs/attributes-resources.md, line 39
> > 
> >
> > There seems two space before "Those are used to ..." in review board; 
> > but it's OK drop this issues if it's one in the code.
> 
> Neil Conway wrote:
> Two spaces after a period is considered good style in English prose 
> according to some people (this style is used in various places throughout the 
> comments and docs, but we aren't consistent). I don't have a strong view, but 
> it will require changing a lot more places than just here to adopt a single 
> style.

Yes, seems both styles are accepted in comments & document. I'm OK with that.


- Klaus


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---


On Feb. 17, 2016, 8:49 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 17, 2016, 8:49 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
>   include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> a9d2c23162892e0f97d89a076d2311091d91 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additi

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-16 Thread Neil Conway


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > docs/attributes-resources.md, line 7
> > 
> >
> > Any related to this patches?

Not really, just a minor doc cleanup I made along the way. Happy to split into 
a separate patch if you'd prefer.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > docs/attributes-resources.md, line 39
> > 
> >
> > There seems two space before "Those are used to ..." in review board; 
> > but it's OK drop this issues if it's one in the code.

Two spaces after a period is considered good style in English prose according 
to some people (this style is used in various places throughout the comments 
and docs, but we aren't consistent). I don't have a strong view, but it will 
require changing a lot more places than just here to adopt a single style.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 52
> > 
> >
> > remove empty line.

I'd prefer to keep the newline: without a newline, it suggests that the comment 
is specific to the function that follows (operator<<), which would be 
misleading: the comment applies to the following ~7 functions.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 60
> > 
> >
> > Regarnding `lround`, it maybe overcommit resources, if framework keep 
> > launching task with smaller CPU usage; I'd suggest to discard the 
> > additional precission directly (e.g. 1.2345 -> 1.234). If there's any idle 
> > resources because of discarding, oversubscription will help.
> > 
> > And we need to reject operation request with small resources which is 
> > handled in MESOS-1807.

Truncation rather than rounding might also be reasonable behavior; I'm curious 
what other people think.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 61
> > 
> >
> > Should we `setprecision`?

I didn't use setprecision, because:

1. There is value in making sure we use the same rounding method in this 
operator as elsewhere, e.g, for corner-cases like 1.2345
2. setprecision is side-effecting and modifying the caller's ostream seems 
problematic.


> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote:
> > src/common/values.cpp, line 67
> > 
> >
> > Let's add check on overflow; it will be helpful if scalar value was 
> > big. Scalar is a general type, there maybe used with a big value, e.g. 
> > total size of distributed filesystem.

What should we do in case of overflow?

Note that we don't check for overflow in operator+ (or for negative result 
values in operator-)...


- Neil


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---


On Feb. 16, 2016, 11:29 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 16, 2016, 11:29 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
>   include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> a9d2c23162892e0f97d89a076d2311091d91 
>   src

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-16 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/
---

(Updated Feb. 17, 2016, 12:49 a.m.)


Review request for mesos, Joris Van Remoortere and Michael Park.


Changes
---

Address some of Klaus' comments.


Bugs: MESOS-4687
https://issues.apache.org/jira/browse/MESOS-4687


Repository: mesos


Description
---

Scalar resource values are represented using floating point. As a result, users
could see unexpected results when accepting offers and making reservations for
fractional resources: values like "0.1" cannot be precisely represented using
standard floating point, and the resource values returned to frameworks might
contain an unpredictable amount of roundoff error.

This commit adjusts the master to use fixed-point when doing internal
computations on scalar resource values. The fixed-point format only supports
three decimal digits of precision: that is, fractional resource values like
"0.001" will be supported, but "0.0001" will not be.


Diffs (updated)
-

  docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
  docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
  include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
  include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
  src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
  src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
  src/master/allocator/mesos/hierarchical.cpp 
a9d2c23162892e0f97d89a076d2311091d91 
  src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
  src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
  src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 

Diff: https://reviews.apache.org/r/43635/diff/


Testing
---

make check

Manually verified that some of the floating point oddities in 
https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
is applied, although I wasn't able to reproduce the crash described in that 
ticket.

REVIEW NOTES:

* We don't currently emit a warning when discarding additional digits of 
precision from input scalar resource values. Should we? That would require 
identifying all the points where a resource value is seemed to be 
"user-provided", and also runs the risk of generating a ton of log messages 
when an old framework is used.
* Similarly, if the user gives us a resource value and we don't do anything to 
it, we won't discard any additional precision that appears in the value -- the 
precision only gets discarded when we apply an operator like `+` or `-`. 
Unclear if we should trim additional precision from all scalar resource values 
more aggressively.


Thanks,

Neil Conway



Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-16 Thread Neil Conway

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119393
---




docs/attributes-resources.md (line 7)


Not really, just a minor doc cleanup I made along the way. Happy to split 
into a separate patch if you'd prefer.



docs/attributes-resources.md (line 39)


Two spaces after a period is considered good style in English prose 
according to some people (this style is used in various places throughout the 
comments and docs, but we aren't consistent). I don't have a strong view, but 
it will require changing a lot more places than just here to adopt a single 
style.



src/common/values.cpp (line 52)


I'd prefer to keep the newline: without a newline, it suggests that the 
comment is specific to the function that follows (`operator<<`), which would be 
misleading: the comment applies to the following ~7 functions.



src/common/values.cpp (line 60)


Truncation rather than rounding might also be reasonable behavior; I'm 
curious what other people think.



src/common/values.cpp (line 61)


I didn't use `setprecision`, because:

1. There is value in making sure we use the same rounding method in this 
operator as elsewhere, e.g, for corner-cases like 1.2345
2. `setprecision` is a side-effecting operation, so modifying the caller's 
`ostream` seems problematic.



src/common/values.cpp (line 67)


What should we do in case of overflow?

Note that we don't check for overflow in `operator+` (or for negative 
result values in `operator-`)...


- Neil Conway


On Feb. 16, 2016, 11:29 p.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 16, 2016, 11:29 p.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
>   include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> a9d2c23162892e0f97d89a076d2311091d91 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> 
> Thanks,
> 
> Ne

Re: Review Request 43635: Changed scalar resources to use fixed-point internally.

2016-02-16 Thread Klaus Ma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43635/#review119382
---




docs/attributes-resources.md (line 7)


Any related to this patches?



docs/attributes-resources.md (line 39)


There seems two space before "Those are used to ..." in review board; but 
it's OK drop this issues if it's one in the code.



include/mesos/mesos.proto (line 513)


Remove one space before "Any additional precision"



src/common/values.cpp (line 52)


remove empty line.



src/common/values.cpp (line 59)


Replace 1000 with const integer



src/common/values.cpp (line 60)


Regarnding `lround`, it maybe overcommit resources, if framework keep 
launching task with smaller CPU usage; I'd suggest to discard the additional 
precission directly (e.g. 1.2345 -> 1.234). If there's any idle resources 
because of discarding, oversubscription will help.

And we need to reject operation request with small resources which is 
handled in MESOS-1807.



src/common/values.cpp (line 61)


Should we `setprecision`?



src/common/values.cpp (line 67)


Let's add check on overflow; it will be helpful if scalar value was big. 
Scalar is a general type, there maybe used with a big value, e.g. total size of 
distributed filesystem.



src/tests/resources_tests.cpp (line 23)


Seems those two headers are not necessary.


- Klaus Ma


On Feb. 17, 2016, 7:29 a.m., Neil Conway wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43635/
> ---
> 
> (Updated Feb. 17, 2016, 7:29 a.m.)
> 
> 
> Review request for mesos, Joris Van Remoortere and Michael Park.
> 
> 
> Bugs: MESOS-4687
> https://issues.apache.org/jira/browse/MESOS-4687
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Scalar resource values are represented using floating point. As a result, 
> users
> could see unexpected results when accepting offers and making reservations for
> fractional resources: values like "0.1" cannot be precisely represented using
> standard floating point, and the resource values returned to frameworks might
> contain an unpredictable amount of roundoff error.
> 
> This commit adjusts the master to use fixed-point when doing internal
> computations on scalar resource values. The fixed-point format only supports
> three decimal digits of precision: that is, fractional resource values like
> "0.001" will be supported, but "0.0001" will not be.
> 
> 
> Diffs
> -
> 
>   docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e 
>   docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 
>   include/mesos/mesos.proto e24d3e03a7dc7c6bfd07f34531cb593fe4925646 
>   include/mesos/v1/mesos.proto d909e60ddfd8e3ba2075f82c372edde04cd99d54 
>   src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc 
>   src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 
>   src/master/allocator/mesos/hierarchical.cpp 
> a9d2c23162892e0f97d89a076d2311091d91 
>   src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff 
>   src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af 
>   src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 
> 
> Diff: https://reviews.apache.org/r/43635/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> Manually verified that some of the floating point oddities in 
> https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch 
> is applied, although I wasn't able to reproduce the crash described in that 
> ticket.
> 
> REVIEW NOTES:
> 
> * We don't currently emit a warning when discarding additional digits of 
> precision from input scalar resource values. Should we? That would require 
> identifying all the points where a resource value is seemed to be 
> "user-provided", and also runs the risk of generating a ton of log messages 
> when an old framework is used.
> * Similarly, if the user gives us a resource value and we don't do anything 
> to it, we won't discard any additional precision that appears in the value -- 
> the precision only gets discarded when we apply an operator like `+` or `-`. 
> Unclear if we should trim additional precision from all scalar resource 
> values more aggressively.
> 
> 
> Thanks,
> 
> Neil Conway
> 
>