> On March 7, 2018, 6:48 p.m., David McLaughlin wrote:
> > So what happens if there are two bad hosts? :)
> 
> Jordan Ly wrote:
>     This does not scale past n=1
>     
>     We can make this more generic by getting the list of hosts the task has 
> previously failed on and looking through offers for a host the task did not 
> fail on for some operator defined value (something like 
> `-failure_avoidance_factor`)
> 
> Santhosh Kumar Shanmugham wrote:
>     Note that making this more generic is still contingent on the amount of 
> task history we have on the scheduler.

Discussed offline:

Going to go a different route. This approach is very domain-specific and does 
not allow preemption to kick in: if only one host matches and it is bad, the 
task can still be scheduled on it repeatedly. Instead, going with a more 
generic solution that temporarily bans scheduling on a host, via 
`SchedulingFilter`, if the task fails on that host. This would be enabled 
through an operator-defined option.
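
A rough sketch of the bookkeeping such a temporary ban might need, just to make 
the idea concrete. Everything here is hypothetical: the class name 
`HostBanSketch`, its methods, and the ban duration are made up for 
illustration, and it does not use the real `SchedulingFilter` interface. The 
assumption is that a filter could consult something like `isBanned` and veto 
the offer while the ban is active.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: remembers where a task failed and bans that host for a while. */
public class HostBanSketch {
  // How long a host stays banned for a task after a failure (assumed operator-defined).
  private final Duration banDuration;

  // Task key -> (host -> instant at which the ban on that host expires).
  private final Map<String, Map<String, Instant>> banExpiry = new HashMap<>();

  public HostBanSketch(Duration banDuration) {
    this.banDuration = banDuration;
  }

  /** Records that the task failed on the host at the given time, starting (or renewing) a ban. */
  public void recordFailure(String taskKey, String host, Instant now) {
    banExpiry
        .computeIfAbsent(taskKey, k -> new HashMap<>())
        .put(host, now.plus(banDuration));
  }

  /** Returns true while the task is still banned from the host; drops expired bans. */
  public boolean isBanned(String taskKey, String host, Instant now) {
    Map<String, Instant> byHost = banExpiry.get(taskKey);
    if (byHost == null) {
      return false;
    }
    Instant expiry = byHost.get(host);
    if (expiry == null) {
      return false;
    }
    if (now.isBefore(expiry)) {
      return true;
    }
    // The ban has run out, so the host becomes eligible for this task again.
    byHost.remove(host);
    return false;
  }
}
```

Because the ban is keyed on the (task, host) pair and expires on its own, it 
covers any number of bad hosts, and a task whose only matching host is banned 
simply stays pending, which leaves room for preemption to act.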


- Jordan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65941/#review198803
-----------------------------------------------------------


On March 7, 2018, 5:50 a.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65941/
> -----------------------------------------------------------
> 
> (Updated March 7, 2018, 5:50 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> If a task fails on a host, we should try to avoid rescheduling the task on 
> the same host if possible. This is done in order to avoid a potentially bad 
> host. This issue generally comes up when you are bin-packing hosts (i.e. 
> using the `-offer_order` option).
> 
> If there are no other offers to schedule the task on, we will still use the 
> offer.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java fcafecf63040f9c410458dedfd3d87b0d669d205 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplTest.java 864538b6730d7318385494818276ba370124b8e9 
> 
> 
> Diff: https://reviews.apache.org/r/65941/diff/1/
> 
> 
> Testing
> -------
> 
> `./gradlew test`
> 
> Benchmarks and live-cluster testing coming soon.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>
