> On Jan. 14, 2016, 1:06 a.m., Dmitriy Shirchenko wrote:
> > src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java,
> >  lines 216-217
> > <https://reviews.apache.org/r/42126/diff/8/?file=1196146#file1196146line216>
> >
> >     Is this intentionally commented out?
> 
> Zhitao Li wrote:
>     Yes this is intentially removed. Old behavior of changing this `config` 
> here resulted in `resources` in static final object `OFFER` not sufficient to 
> be allocated, and I don't see any point to use a different executor for this 
> test case, so I've deleted this code.
>     
>     If any reviewer thinks there is a reason to test this with a different 
> executor, I can reconstruct an `Offer` object with correct resource and add 
> this back.
> 
> Bill Farner wrote:
>     This is slightly suspicious...are you sure resource accounting wasn't 
> impacted in some way?  I would expect old test cases to logically work with 
> this feature.

I recalled why I commented this out in the first pass now (this was a bit 
tricky and I mistakenly thought it's logical bug while it's due to floating 
point precision).

In TaskExecutors.java, `SOME_OVERHEAD_EXECUTOR` uses `0.01` cpus:
```
  public static final ExecutorSettings SOME_OVERHEAD_EXECUTOR =
      TestExecutorSettings.thermosOnlyWithOverhead(
          new ResourceSlot(0.01, Amount.of(256L, Data.MB), Amount.of(0L, 
Data.MB), 0));
```
Unfortunately, `5.01 - 5 - 0.01` **is not zero** in floating point math of 
Java, so the test keeps throwing `InsufficientResourcesException` even if I 
constructed an `Offer` whose resource is correctly equivalent to 
`ResourceSlot.from(TASK_CONFIG).add(TaskExecutors.SOME_OVERHEAD_EXECUTOR.getExecutorOverhead())`.

**My proposal is to change all comparison with zero in `AcceptedOffer` class (3 
places in current patch) to compare with an `EPSILON` whose value is `1e-6` or 
something even smaller.** I don't think there is any reasonable usage of 
resource unit smaller than this granularity, and the floating point math in the 
class would be more robust. We can document this behavior somewhere if it 
sounds interesting.

The default resource usage for `THERMOS_EXECUTOR` is 0.25 so I don't encounter 
this elsewhere.

Some alternatives which I liked less:
1. Change `TaskExecutors.SOME_OVERHEAD_EXECUTOR`'s cpus to a value which could 
be precisely represented by floating point (0.125 or something similar);
2. Manually add a bit more cpu resource to the `Offer` object to trick the test 
to pass.

If the above proposed solution sounds fine, I can send an update to the patch.


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42126/#review114353
-----------------------------------------------------------


On Jan. 14, 2016, 1:39 a.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42126/
> -----------------------------------------------------------
> 
> (Updated Jan. 14, 2016, 1:39 a.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko, Dmitriy Shirchenko, and Bill 
> Farner.
> 
> 
> Bugs: AURORA-1109
>     https://issues.apache.org/jira/browse/AURORA-1109
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This review is a prototype for introducing multiple role support in Aurora.
> This creates a new class OfferAllocation, which allcoates resources to 
> resources field in TaskInfo and ExecutorInfo from an offer.
> 
> Current implementation prefers reserved resources over shared resources ('*' 
> role) if both are present
> 
> Several caveats:
> 1. This performs the allocate after scheduling decision in 
> TaskAssigner.maybeAssign is done, which leaves possibility of inconsistency 
> and late failure.
> 
> 
> Diffs
> -----
> 
>   NEWS acaff9eb2ab184b0ef750f8b8a00c20131997f6b 
>   src/main/java/org/apache/aurora/scheduler/AcceptedOffer.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/ResourceSlot.java 
> 7c3d681c216b78eeecebbe950186e5a79c6fe982 
>   src/main/java/org/apache/aurora/scheduler/Resources.java 
> db422a959ee7b982c2a44323de41ad75d1a40754 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  2255dd407cd1810c7df5baf17cfa85f79bfffeb8 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> 8fdadda67478bb3110aa442b7d78493cf9c3edb4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7e8e456e288986eb0ce92a123b294e1e25d8ed18 
>   src/test/java/org/apache/aurora/scheduler/AcceptedOfferTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/ResourceSlotTest.java 
> e4ae943303823ac4bfbe999ed22f5999484462d8 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  33149ab415292eff04f38b61f2b1d1eac79f347a 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> a5793bffabf4e5d6195b1b99f2363d241c0cecf9 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> 3cbe9acd75def14ae2e0986914ba621fb164b3e4 
> 
> Diff: https://reviews.apache.org/r/42126/diff/
> 
> 
> Testing
> -------
> 
> 1. Unit tested with old and new tests;
> 2. vagrant integration tests: I manually separate out the vagrant box's cpu 
> and memory between 'aurora-test' role and '*' and verified that jobs can 
> still be launched (I can post the vagrant change in another follow upon 
> request).
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>

Reply via email to