> On Sept. 16, 2016, 1:20 a.m., Aurora ReviewBot wrote:
> > Master (783baae) is red with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> >         # Create file stdout for capturing output. We can't use StringIO mock
> >         # because TestProcess is running fork.
> >         with open(os.path.join(td, 'sys_stdout'), 'w+') as stdout:
> >           with open(os.path.join(td, 'sys_stderr'), 'w+') as stderr:
> >             with mutable_sys():
> >               sys.stdout, sys.stderr = stdout, stderr
> >     
> >               p = TestProcess('process', 'echo hello world; echo >&2 hello stderr', 0,
> >                               taskpath, sandbox, logger_destination=LoggerDestination.BOTH)
> >               p.start()
> >               rc = wait_for_rc(taskpath.getpath('process_checkpoint'))
> >     
> >               assert rc == 0
> >               # Check log files were created in std path with correct content
> >     >         assert_log_content(taskpath, 'stdout', 'hello world\n')
> > 
> > src/test/python/apache/thermos/core/test_process.py:487: 
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> > 
> > taskpath = <apache.thermos.common.path.TaskPath object at 0x7fdd3cd73b10>
> > log_name = 'stdout'
> > expected_content = 'hello world\n'
> > 
> >         def assert_log_content(taskpath, log_name, expected_content):
> >           log = taskpath.with_filename(log_name).getpath('process_logdir')
> >           assert os.path.exists(log)
> >           with open(log, 'r') as fp:
> >     >       assert fp.read() == expected_content
> >     E       assert '' == 'hello world\n'
> >     E         + hello world
> > 
> > src/test/python/apache/thermos/core/test_process.py:313: AssertionError
> > generated xml file: /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
> > 
> > 1 failed, 710 passed, 6 skipped, 1 warnings in 226.09 seconds
> > 
> > FAILURE
> > 
> > 
> > 01:19:57 04:18   [complete]
> > FAILURE
> > 
> > 
> > I will refresh this build result if you post a review containing "@ReviewBot retry"

@ReviewBot retry


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149162
-----------------------------------------------------------


On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> -----------------------------------------------------------
> 
> (Updated Sept. 16, 2016, 12:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is phase 2 of the scheduling perf improvement effort started in
> https://reviews.apache.org/r/51759/.
> 
> We can now take a configurable number of task IDs from a given `TaskGroup`
> per scheduling attempt. The idea is to go deeper through the offer queue and
> assign more than one task if possible. This approach delivers substantially
> better MTTA and still ensures fairness across multiple `TaskGroups`. We have
> observed almost linear improvement in MTTA (4x+ with 5 tasks per round),
> which suggests that `max_tasks_per_schedule_attempt` can be set even higher
> if the majority of cluster jobs have a large number of instances and/or
> large update batch sizes.
> 
> As far as single-round perf goes, we can consider the following 2
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue +
>   `(N x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time
> as 1 task against M offers (as they all share the same `TaskGroup`), the only
> measurable difference comes from the additional
> `N x single_task_launch_latency` overhead.
> Based on real cluster observations, the `single_task_launch_latency` is
> less than 1% of a single task scheduling attempt, which is far smaller than
> the savings from the additional scheduling rounds avoided.
> 
> As far as jmh results go, the new approach (batching + multiple tasks per
> round) is only slightly more demanding (~8%). Both results, though, are MUCH
> higher than the real cluster perf, which just confirms we are not bound by
> CPU time here:
> 
> Master:
> ```
> Benchmark                                                                    Mode  Cnt      Score     Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark                                                                    Mode  Cnt      Score     Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> NOTE: this will not apply cleanly as it branched off of
> https://reviews.apache.org/r/51765, which itself depends on
> https://reviews.apache.org/r/51759/.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 9d0d40b82653fb923bed16d06546288a1576c21d
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 87b9e1928ab2d44668df1123f32ffdc4197c0c70
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 11e8033438ad0808e446e41bb26b3fa4c04136c7
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 5d319557057e27fd5fc6d3e553e9ca9139399c50
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java c044ebe6f72183a67462bbd8e5be983eb592c3e9
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java d266f6a25ae2360db2977c43768a19b1f1efe8ff
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9
>   src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java ece476b918e6f2c128039e561eea23a94d8ed396
>   src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 209f9298a1d55207b9b41159f2ab366f92c1eb70
>   src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java c2ceb4e7685a9301f8014a9183e02fbad65bca26
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java ee5c6528af89cc62a35fdb314358c489556d8131
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 98048fabc00f233925b6cca015c2525980556e2b
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 72562e6bd9a9860c834e6a9faa094c28600a8fed
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java b4d27f69ad5d4cce03da9f04424dc35d30e8af29
> 
> Diff: https://reviews.apache.org/r/51929/diff/
> 
> 
> Testing
> -------
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>
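
For anyone who wants to sanity-check the figures in the description, here is a rough Python sketch. It is not part of the patch. The helper names (`relative_drop`, `worst_case_cost`) are made up for illustration, and the cost model is an assumption distilled from the worst-case reasoning in the description: each scheduling round costs one full pass over the offer queue regardless of how many tasks it takes, and each launch adds at most 1% of a pass.

```python
import math


def relative_drop(baseline_ops, patched_ops):
    """Fractional throughput drop of the patched run relative to the baseline."""
    return (baseline_ops - patched_ops) / baseline_ops


def worst_case_cost(tasks, tasks_per_round, attempt_cost=1.0, launch_cost=0.01):
    """Worst-case cost of launching `tasks` tasks from one task group.

    Assumed model (not taken from the Aurora code): every round pays
    `attempt_cost` for one full offer-queue pass, and every launched task
    pays `launch_cost` (the description's "less than 1%" upper bound).
    """
    rounds = math.ceil(tasks / tasks_per_round)
    return rounds * attempt_cost + tasks * launch_cost


# jmh scores quoted in the description (ops/s).
drop = relative_drop(17126.183, 15838.051)
print(f"jmh throughput drop: {drop:.1%}")  # prints "jmh throughput drop: 7.5%"

# Launching 5 tasks: one task per round (master) vs. 5 tasks per round (patch).
master = worst_case_cost(tasks=5, tasks_per_round=1)
patch = worst_case_cost(tasks=5, tasks_per_round=5)
print(f"modelled speedup: {master / patch:.1f}x")  # prints "modelled speedup: 4.8x"
```

Under these assumptions the model lands close to the "~8%" jmh figure and the "4x+ with 5 tasks per round" MTTA observation, but it is only a back-of-the-envelope check; the real numbers come from the cluster measurements described above.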