> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/RescheduleCalculator.java, 
> > line 117
> > <https://reviews.apache.org/r/20398/diff/2/?file=561774#file561774line117>
> >
> >     Pull this part up to the previous line:
> >     
> >       ITaskEvent activeEvent = Iterables.find(
> >           arg1,
> >           arg2);

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, line 
> > 141
> > <https://reviews.apache.org/r/20398/diff/2/?file=561776#file561776line141>
> >
> >     This is a lazy iterable - you probably want to call toList() at the end 
> > to avoid on-the-fly filtering each time you walk over it.

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 114
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line114>
> >
> >     Log message would be more useful with the task ID.

I have realized that errors were not reported from the executor thread. Added 
afterExecute() into the executor and converted this to explicit exception to 
avoid unneeded string concatenation.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 155
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line155>
> >
> >     Have you considered making MedianAlgorithm non-abstract, and have it 
> > accept a ScheduleStatus in the constructor?  That would eliminate the need 
> > for these two classes.

These were mostly placeholders for the documentation. I guess we don't lose 
much from collapsing them into one. Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 199
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line199>
> >
> >     Can you elaborate or clarify "relative to request time"?

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 200
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line200>
> >
> >     The decay comment threw me a bit - this might be improved by using 
> > consistent terminology "time frame".

Dropped it completely as it only adds confusion here.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 226
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line226>
> >
> >     Can you add a comment explaining the rationale here?  I would have 
> > expected that this would only look at tasks currently in RUNNING.

Added a comment. We consider states like KILLING or RESTARTING as alive for job 
uptime.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 233
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line233>
> >
> >     s/numElements/percentileIndex/?

percentileElements would be more accurate here as it is not the index yet. The 
next line yields index.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 235
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line235>
> >
> >     It would be good to extract a function to calculate the Nth percentile 
> > of a List of numbers.  As it stands, this is duplicated and also 
> > inconsistent with the median calculation (the other one averages when 
> > there's a non-even number of elements).
> >     
> >     You might even go so far to extract a PercentileAlgorithm, whose 
> > constructor accepts something like Predicate<IScheduledTask> and 
> > Function<IScheduledTask, Long>.  I'll leave that decision up to you, it 
> > might not make sense.

I see your point but don't think we should mix the two. The percentile 
calculation needs to be discrete to accurately reflect the uptime at the 
specified instance percentile (floor(percentileElements)), whereas the median 
algorithm tends to report the mid average (floor(average(n/2, n/2 - 1))). I 
would prefer to keep uptime calculation more conservative with respect to the 
reported uptime value.    


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 253
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line253>
> >
> >     The name and doc don't line up with the values returned.  You're 
> > returning a percentage [0, 100] (higher is better), but everything else 
> > suggests it's accumulated downtime (implying higher is worse).

Agreed. Changed it to AggregatePlatformUptime to better align with job uptime 
and exposed metric name.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 255
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line255>
> >
> >     please doc

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 328
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line328>
> >
> >     Given that some functions deal with collections of events that pertain 
> > to different tasks, it would be helpful to document places like this where 
> > the events must be for a single task.

It's actually the opposite here. List<ITaskEvent> represents a cross task 
unified view of the instance history. The TASKS_TO_TASK_EVENTS creates an 
instance timeline and it remains like that to the end.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 333
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line333>
> >
> >     Use ImmutableList.Builder, to make sure that an immutable list is 
> > returned.

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 339
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line339>
> >
> >     Please add comments in here, i suspect a newcomer would be pretty 
> > unclear on what's going on vs what's intended.

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 408
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line408>
> >
> >     Can you more thoroughly document the behavior here?

Sure, done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, line 413
> > <https://reviews.apache.org/r/20398/diff/2/?file=561777#file561777line413>
> >
> >     In your comment, please be sure to explain why the events from multiple 
> > tasks are merged and sorted.  Also, please rename TASKS_TO_TASK_EVENTS to 
> > indicate the sorting behavior.

Done.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java, line 103
> > <https://reviews.apache.org/r/20398/diff/2/?file=561778#file561778line103>
> >
> >     Please share instance size definitions with SlotSizeCounter to prevent 
> > drift.

Good idea. Converged on ResourceAggregates.


> On April 17, 2014, 11:41 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java, line 138
> > <https://reviews.apache.org/r/20398/diff/2/?file=561778#file561778line138>
> >
> >     These three classes seem to have a pretty big overlap.  Can you solve 
> > this with a single implementation that accepts:
> >     
> >       Map<String, Range<Double>>
> >       Function<IScheduledTask, Double>

I tried a similar approach before but did not like the upcasting to double 
everywhere (in the map init and within the function), so I would rather prefer 
the current approach as less clumsy. 


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20398/#review40705
-----------------------------------------------------------


On April 17, 2014, 11:39 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20398/
> -----------------------------------------------------------
> 
> (Updated April 17, 2014, 11:39 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-293
>     https://issues.apache.org/jira/browse/AURORA-293
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> High level overview:
> - MetricCalculator runs periodically (every minute), pulls all task history, 
> packages it into SlaInstance list and updates stats;
> - Stat calculation is handled by a pair of: SlaAlgorithm and a set of 
> applicable SlaGroups (logical groupings by job, cluster, resource and etc.);
> - Stat name is generated by combining group and algorithm name parts.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> eeafc784e915137cacd5f64df1252ccbaf6c0f6c 
>   src/main/java/org/apache/aurora/scheduler/async/RescheduleCalculator.java 
> bc77bf2e6fbc1ff4159d049b0c28afd6832499ef 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> fae2d237d18f945d4dd73ec56cd42981359bea46 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/20398/diff/
> 
> 
> Testing
> -------
> 
> gradle build
> gradle run
> 
> Sample from local scheduler:
> 
> sla_cluster_mtta 3271.493
> sla_cluster_mttr 3273.497
> sla_cluster_platform_uptime_percent 100.0
> sla_cpu_small_mtta 3271.493
> sla_cpu_small_mttr 3273.497
> sla_disk_small_mtta 3271.493
> sla_disk_small_mttr 3273.497
> sla_mesos_test_serviceJob0_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob0_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob0_mtta 0.0
> sla_mesos_test_serviceJob0_mttr 0.0
> sla_mesos_test_serviceJob0_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob10_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob10_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob10_mtta 0.0
> sla_mesos_test_serviceJob10_mttr 0.0
> sla_mesos_test_serviceJob10_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob12_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob12_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob12_mtta 0.0
> sla_mesos_test_serviceJob12_mttr 0.0
> sla_mesos_test_serviceJob12_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob14_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob14_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob14_mtta 0.0
> sla_mesos_test_serviceJob14_mttr 0.0
> sla_mesos_test_serviceJob14_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob16_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob16_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob16_mtta 0.0
> sla_mesos_test_serviceJob16_mttr 0.0
> sla_mesos_test_serviceJob16_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob18_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob18_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob18_mtta 0.0
> sla_mesos_test_serviceJob18_mttr 0.0
> sla_mesos_test_serviceJob18_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob2_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob2_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob2_mtta 0.0
> sla_mesos_test_serviceJob2_mttr 0.0
> sla_mesos_test_serviceJob2_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob4_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob4_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob4_mtta 0.0
> sla_mesos_test_serviceJob4_mttr 0.0
> sla_mesos_test_serviceJob4_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob6_job_uptime_75_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_90_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_95_00_sec 7125
> sla_mesos_test_serviceJob6_job_uptime_99_00_sec 7125
> sla_mesos_test_serviceJob6_mtta 3271.493
> sla_mesos_test_serviceJob6_mttr 3273.497
> sla_mesos_test_serviceJob6_platform_uptime_percent 100.0
> sla_mesos_test_serviceJob8_job_uptime_75_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_90_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_95_00_sec 0
> sla_mesos_test_serviceJob8_job_uptime_99_00_sec 0
> sla_mesos_test_serviceJob8_mtta 0.0
> sla_mesos_test_serviceJob8_mttr 0.0
> sla_mesos_test_serviceJob8_platform_uptime_percent 100.0
> sla_ram_small_mtta 3271.493
> sla_ram_small_mttr 3273.497
> 
> 
> File Attachments
> ----------------
> 
> Coverage report
>   
> https://reviews.apache.org/media/uploaded/files/2014/04/16/ffe00c63-bb3a-4b90-95f8-f23878b0fdab__SLA_coverage_report.png
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

Reply via email to