----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20398/#review40705 -----------------------------------------------------------
src/main/java/org/apache/aurora/scheduler/async/RescheduleCalculator.java <https://reviews.apache.org/r/20398/#comment73815> Pull this part up to the previous line: ITaskEvent activeEvent = Iterables.find( arg1, arg2); src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java <https://reviews.apache.org/r/20398/#comment73816> This is a lazy iterable - you probably want to call toList() at the end to avoid on-the-fly filtering each time you walk over it. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73821> Log message would be more useful with the task ID. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73834> Have you considered making MedianAlgorithm non-abstract, and have it accept a ScheduleStatus in the constructor? That would eliminate the need for these two classes. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73835> Can you elaborate or clarify "relative to request time"? src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73836> The decay comment threw me a bit - this might be improved by using consistent terminology "time frame". src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73839> Can you add a comment explaining the rationale here? I would have expected that this would only look at tasks currently in RUNNING. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73838> s/numElements/percentileIndex/? src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73837> It would be good to extract a function to calculate the Nth percentile of a List of numbers. As it stands, this is duplicated and also inconsistent with the median calculation (the other one averages when there's a non-even number of elements). You might even go so far to extract a PercentileAlgorithm, whose constructor accepts something like Predicate<IScheduledTask> and Function<IScheduledTask, Long>. I'll leave that decision up to you, it might not make sense. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73845> The name and doc don't line up with the values returned. You're returning a percentage [0, 100] (higher is better), but everything else suggests it's accumulated downtime (implying higher is worse). src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73844> please doc src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73840> Given that some functions deal with collections of events that pertain to different tasks, it would be helpful to document places like this where the events must be for a single task. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73829> Use ImmutableList.Builder, to make sure that an immutable list is returned. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73843> Please add comments in here, i suspect a newcomer would be pretty unclear on what's going on vs what's intended. src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73825> Can you more thoroughly document the behavior here? src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java <https://reviews.apache.org/r/20398/#comment73826> In your comment, please be sure to explain why the events from multiple tasks are merged and sorted. Also, please rename TASKS_TO_TASK_EVENTS to indicate the sorting behavior. src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java <https://reviews.apache.org/r/20398/#comment73846> Please share instance size definitions with SlotSizeCounter to prevent drift. src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java <https://reviews.apache.org/r/20398/#comment73847> These three classes seem to have a pretty big overlap. Can you solve this with a single implementation that accepts: Map<String, Range<Double>> Function<IScheduledTask, Double> - Bill Farner On April 17, 2014, 11:39 p.m., Maxim Khutornenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/20398/ > ----------------------------------------------------------- > > (Updated April 17, 2014, 11:39 p.m.) > > > Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman. > > > Bugs: AURORA-293 > https://issues.apache.org/jira/browse/AURORA-293 > > > Repository: aurora > > > Description > ------- > > High level overview: > - MetricCalculator runs periodically (every minute), pulls all task history, > packages it into SlaInstance list and updates stats; > - Stat calculation is handled by a pair of: SlaAlgorithm and a set of > applicable SlaGroups (logical groupings by job, cluster, resource and etc.); > - Stat name is generated by combining group and algorithm name parts. > > > Diffs > ----- > > src/main/java/org/apache/aurora/scheduler/app/AppModule.java > eeafc784e915137cacd5f64df1252ccbaf6c0f6c > src/main/java/org/apache/aurora/scheduler/async/RescheduleCalculator.java > bc77bf2e6fbc1ff4159d049b0c28afd6832499ef > src/main/java/org/apache/aurora/scheduler/base/Tasks.java > fae2d237d18f945d4dd73ec56cd42981359bea46 > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java PRE-CREATION > src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java PRE-CREATION > > Diff: https://reviews.apache.org/r/20398/diff/ > > > Testing > ------- > > gradle build > gradle run > > Sample from local scheduler: > > sla_cluster_mtta 3271.493 > sla_cluster_mttr 3273.497 > sla_cluster_platform_uptime_percent 100.0 > sla_cpu_small_mtta 3271.493 > sla_cpu_small_mttr 3273.497 > sla_disk_small_mtta 3271.493 > sla_disk_small_mttr 3273.497 > sla_mesos_test_serviceJob0_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob0_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob0_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob0_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob0_mtta 0.0 > sla_mesos_test_serviceJob0_mttr 0.0 > sla_mesos_test_serviceJob0_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob10_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob10_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob10_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob10_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob10_mtta 0.0 > sla_mesos_test_serviceJob10_mttr 0.0 > sla_mesos_test_serviceJob10_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob12_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob12_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob12_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob12_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob12_mtta 0.0 > sla_mesos_test_serviceJob12_mttr 0.0 > sla_mesos_test_serviceJob12_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob14_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob14_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob14_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob14_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob14_mtta 0.0 > sla_mesos_test_serviceJob14_mttr 0.0 > sla_mesos_test_serviceJob14_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob16_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob16_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob16_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob16_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob16_mtta 0.0 > sla_mesos_test_serviceJob16_mttr 0.0 > sla_mesos_test_serviceJob16_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob18_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob18_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob18_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob18_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob18_mtta 0.0 > sla_mesos_test_serviceJob18_mttr 0.0 > sla_mesos_test_serviceJob18_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob2_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob2_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob2_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob2_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob2_mtta 0.0 > sla_mesos_test_serviceJob2_mttr 0.0 > sla_mesos_test_serviceJob2_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob4_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob4_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob4_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob4_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob4_mtta 0.0 > sla_mesos_test_serviceJob4_mttr 0.0 > sla_mesos_test_serviceJob4_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob6_job_uptime_75_00_sec 7125 > sla_mesos_test_serviceJob6_job_uptime_90_00_sec 7125 > sla_mesos_test_serviceJob6_job_uptime_95_00_sec 7125 > sla_mesos_test_serviceJob6_job_uptime_99_00_sec 7125 > sla_mesos_test_serviceJob6_mtta 3271.493 > sla_mesos_test_serviceJob6_mttr 3273.497 > sla_mesos_test_serviceJob6_platform_uptime_percent 100.0 > sla_mesos_test_serviceJob8_job_uptime_75_00_sec 0 > sla_mesos_test_serviceJob8_job_uptime_90_00_sec 0 > sla_mesos_test_serviceJob8_job_uptime_95_00_sec 0 > sla_mesos_test_serviceJob8_job_uptime_99_00_sec 0 > sla_mesos_test_serviceJob8_mtta 0.0 > sla_mesos_test_serviceJob8_mttr 0.0 > sla_mesos_test_serviceJob8_platform_uptime_percent 100.0 > sla_ram_small_mtta 3271.493 > sla_ram_small_mttr 3273.497 > > > File Attachments > ---------------- > > Coverage report > > https://reviews.apache.org/media/uploaded/files/2014/04/16/ffe00c63-bb3a-4b90-95f8-f23878b0fdab__SLA_coverage_report.png > > > Thanks, > > Maxim Khutornenko > >