> On Feb. 17, 2016, 12:30 a.m., Klaus Ma wrote: > > src/common/values.cpp, line 67 > > <https://reviews.apache.org/r/43635/diff/1/?file=1252182#file1252182line67> > > > > Let's add check on overflow; it will be helpful if scalar value was > > big. Scalar is a general type, there maybe used with a big value, e.g. > > total size of distributed filesystem. > > Neil Conway wrote: > What should we do in case of overflow? > > Note that we don't check for overflow in operator+ (or for negative > result values in operator-)... > > Klaus Ma wrote: > `CHECK` or warning log should be fine, it only improves the debugability; > it different with `operator+` because `max_double/2` should be big enough for > us; but for `double * 1000 -> long`, I'm not sure of that. > > And I think we need to use `long long` or `int64` instead of `long`; > `long` is not garantee to be 64bit in c++. > > Neil Conway wrote: > re: `long`, good point -- we only officially support 64-bit OSX, Linux, > and Windows, but `long` is 32-bit on Windows.
After discussing this with a few folks offline, our feeling was that the range of resource values we can represent without overflow should be sufficient for any reasonable purpose for a considerable length of time. Hopefully we'll have switched to a fixed point format before people use Mesos to manage > 10s of exabytes as a single resource :) - Neil ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/43635/#review119382 ----------------------------------------------------------- On Feb. 19, 2016, 10:27 p.m., Neil Conway wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/43635/ > ----------------------------------------------------------- > > (Updated Feb. 19, 2016, 10:27 p.m.) > > > Review request for mesos, Joris Van Remoortere and Michael Park. > > > Bugs: MESOS-4687 > https://issues.apache.org/jira/browse/MESOS-4687 > > > Repository: mesos > > > Description > ------- > > Scalar resource values are represented using floating point. As a result, > users > could see unexpected results when accepting offers and making reservations for > fractional resources: values like "0.1" cannot be precisely represented using > standard floating point, and the resource values returned to frameworks might > contain an unpredictable amount of roundoff error. > > This commit adjusts the master to use fixed-point when doing internal > computations on scalar resource values. The fixed-point format only supports > three decimal digits of precision: that is, fractional resource values like > "0.001" will be supported, but "0.0001" will not be. > > > Diffs > ----- > > docs/attributes-resources.md 818da8ab0c672144b02f526b2b805cf0505d2c7e > docs/upgrades.md 4f30d725c6ed28c09a1c5528fd4193c3f06b2d93 > include/mesos/mesos.proto 636550f255a122d7f714dbd58f587bea8221b461 > include/mesos/v1/mesos.proto 1d5af88a343fe9d81688bb3e83aa997ccba7d030 > src/common/resources.cpp 5d731870542166cec11f9956ccdf16207b2d22cc > src/common/values.cpp c64407bc97ad858300f4661d616e0480920fc541 > src/master/allocator/mesos/hierarchical.cpp > 5ef29f26ec8071f79c2f4f78dbe2bb0a613cc92d > src/tests/resources_tests.cpp 96864c3945729fe33be8b243b9c826fb12e90eff > src/v1/resources.cpp 207eb61d6a6d03d314539d42751cac65fcffa9af > src/v1/values.cpp 58ea9875804bf0287855a1e9855855e5e54de4c4 > > Diff: https://reviews.apache.org/r/43635/diff/ > > > Testing > ------- > > make check > > Manually verified that some of the floating point oddities in > https://issues.apache.org/jira/browse/MESOS-4071 do not occur when this patch > is applied, although I wasn't able to reproduce the crash described in that > ticket. > > REVIEW NOTES: > > * We don't currently emit a warning when discarding additional digits of > precision from input scalar resource values. Should we? That would require > identifying all the points where a resource value is seemed to be > "user-provided", and also runs the risk of generating a ton of log messages > when an old framework is used. > * Similarly, if the user gives us a resource value and we don't do anything > to it, we won't discard any additional precision that appears in the value -- > the precision only gets discarded when we apply an operator like `+` or `-`. > Unclear if we should trim additional precision from all scalar resource > values more aggressively. > > PERFORMANCE: > > This is the performance of the `DeclineOffers` benchmark WITHOUT this RR > applied (optimized build on my laptop): > > ``` > [ RUN ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers > Using 2000 slaves and 200 frameworks > round 0 allocate took 2.192425secs to make 200 offers > round 1 allocate took 2.221243secs to make 200 offers > round 2 allocate took 2.236314secs to make 200 offers > round 3 allocate took 2.224045secs to make 200 offers > round 4 allocate took 2.232822secs to make 200 offers > round 5 allocate took 2.264807secs to make 200 offers > round 6 allocate took 2.224853secs to make 200 offers > round 7 allocate took 2.224829secs to make 200 offers > round 8 allocate took 2.24862secs to make 200 offers > round 9 allocate took 2.2556secs to make 200 offers > round 10 allocate took 2.256616secs to make 200 offers > ``` > > And after applying this RR: > > ``` > [ RUN ] HierarchicalAllocator_BENCHMARK_Test.DeclineOffers > Using 2000 slaves and 200 frameworks > round 0 allocate took 2.267919secs to make 200 offers > round 1 allocate took 2.202843secs to make 200 offers > round 2 allocate took 2.20426secs to make 200 offers > round 3 allocate took 2.263887secs to make 200 offers > round 4 allocate took 2.266237secs to make 200 offers > round 5 allocate took 2.276957secs to make 200 offers > round 6 allocate took 2.291821secs to make 200 offers > round 7 allocate took 2.261839secs to make 200 offers > round 8 allocate took 2.325696secs to make 200 offers > round 9 allocate took 2.310469secs to make 200 offers > round 10 allocate took 2.21802secs to make 200 offers > ``` > > Which suggests to me that any performance hit is pretty minimal. > > > Thanks, > > Neil Conway > >
