> On Jan. 13, 2016, 5:06 p.m., Dmitriy Shirchenko wrote:
> > src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java,
> >  lines 216-217
> > <https://reviews.apache.org/r/42126/diff/8/?file=1196146#file1196146line216>
> >
> >     Is this intentionally commented out?
> 
> Zhitao Li wrote:
>     Yes this is intentially removed. Old behavior of changing this `config` 
> here resulted in `resources` in static final object `OFFER` not sufficient to 
> be allocated, and I don't see any point to use a different executor for this 
> test case, so I've deleted this code.
>     
>     If any reviewer thinks there is a reason to test this with a different 
> executor, I can reconstruct an `Offer` object with correct resource and add 
> this back.
> 
> Bill Farner wrote:
>     This is slightly suspicious...are you sure resource accounting wasn't 
> impacted in some way?  I would expect old test cases to logically work with 
> this feature.
> 
> Zhitao Li wrote:
>     I recalled why I commented this out in the first pass now (this was a bit 
> tricky and I mistakenly thought it's logical bug while it's due to floating 
> point precision).
>     
>     In TaskExecutors.java, `SOME_OVERHEAD_EXECUTOR` uses `0.01` cpus:
>     ```
>       public static final ExecutorSettings SOME_OVERHEAD_EXECUTOR =
>           TestExecutorSettings.thermosOnlyWithOverhead(
>               new ResourceSlot(0.01, Amount.of(256L, Data.MB), Amount.of(0L, 
> Data.MB), 0));
>     ```
>     Unfortunately, `5.01 - 5 - 0.01` **is not zero** in floating point math 
> of Java, so the test keeps throwing `InsufficientResourcesException` even if 
> I constructed an `Offer` whose resource is correctly equivalent to 
> `ResourceSlot.from(TASK_CONFIG).add(TaskExecutors.SOME_OVERHEAD_EXECUTOR.getExecutorOverhead())`.
>     
>     **My proposal is to change all comparison with zero in `AcceptedOffer` 
> class (3 places in current patch) to compare with an `EPSILON` whose value is 
> `1e-6` or something even smaller.** I don't think there is any reasonable 
> usage of resource unit smaller than this granularity, and the floating point 
> math in the class would be more robust. We can document this behavior 
> somewhere if it sounds interesting.
>     
>     The default resource usage for `THERMOS_EXECUTOR` is 0.25 so I don't 
> encounter this elsewhere.
>     
>     Some alternatives which I liked less:
>     1. Change `TaskExecutors.SOME_OVERHEAD_EXECUTOR`'s cpus to a value which 
> could be precisely represented by floating point (0.125 or something similar);
>     2. Manually add a bit more cpu resource to the `Offer` object to trick 
> the test to pass.
>     
>     If the above proposed solution sounds fine, I can send an update to the 
> patch.

I'm in favor of using an EPSILON value.  Please be sure to extract a function 
to handle the equivalence check.

It's too bad mesos doesn't use fixed point for scalar resources to avoid these 
issues.  Looks like there's been recent momentum to make that change: 
https://issues.apache.org/jira/browse/MESOS-3997

They recently addressed an instance of this issue with this change:
https://github.com/apache/mesos/commit/5b5dd566aea71f654c51b2706a958ca1b5cb07a3

```
CHECK_NEAR(result.cpus().get(), cpus().get(), MIN_CPUS);
```

For reference, `MIN_CPUS` is 0.01.


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42126/#review114353
-----------------------------------------------------------


On Jan. 13, 2016, 5:39 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42126/
> -----------------------------------------------------------
> 
> (Updated Jan. 13, 2016, 5:39 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko, Dmitriy Shirchenko, and Bill 
> Farner.
> 
> 
> Bugs: AURORA-1109
>     https://issues.apache.org/jira/browse/AURORA-1109
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This review is a prototype for introducing multiple role support in Aurora.
> This creates a new class OfferAllocation, which allcoates resources to 
> resources field in TaskInfo and ExecutorInfo from an offer.
> 
> Current implementation prefers reserved resources over shared resources ('*' 
> role) if both are present
> 
> Several caveats:
> 1. This performs the allocate after scheduling decision in 
> TaskAssigner.maybeAssign is done, which leaves possibility of inconsistency 
> and late failure.
> 
> 
> Diffs
> -----
> 
>   NEWS acaff9eb2ab184b0ef750f8b8a00c20131997f6b 
>   src/main/java/org/apache/aurora/scheduler/AcceptedOffer.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/ResourceSlot.java 
> 7c3d681c216b78eeecebbe950186e5a79c6fe982 
>   src/main/java/org/apache/aurora/scheduler/Resources.java 
> db422a959ee7b982c2a44323de41ad75d1a40754 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  2255dd407cd1810c7df5baf17cfa85f79bfffeb8 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> 8fdadda67478bb3110aa442b7d78493cf9c3edb4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7e8e456e288986eb0ce92a123b294e1e25d8ed18 
>   src/test/java/org/apache/aurora/scheduler/AcceptedOfferTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/ResourceSlotTest.java 
> e4ae943303823ac4bfbe999ed22f5999484462d8 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  33149ab415292eff04f38b61f2b1d1eac79f347a 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> a5793bffabf4e5d6195b1b99f2363d241c0cecf9 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> 3cbe9acd75def14ae2e0986914ba621fb164b3e4 
> 
> Diff: https://reviews.apache.org/r/42126/diff/
> 
> 
> Testing
> -------
> 
> 1. Unit tested with old and new tests;
> 2. vagrant integration tests: I manually separate out the vagrant box's cpu 
> and memory between 'aurora-test' role and '*' and verified that jobs can 
> still be launched (I can post the vagrant change in another follow upon 
> request).
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>

Reply via email to