Re: Review Request 31559: Split thermos cli into pieces. Add custom aurora thermos entry point.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31559/ --- (Updated March 4, 2015, 6:52 p.m.) Review request for Aurora, Joe Smith and Zameer Manji. Changes --- Update diff post master merge. Bugs: AURORA-1027 https://issues.apache.org/jira/browse/AURORA-1027 Repository: aurora Description --- This splits src/main/python/apache/thermos/bin/thermos.py into src/main/python/apache/thermos/cli/**. This also adds a path detector registry (AURORA-1027) for the client so that we can easily add a customized aurora entry point for thermos that is capable of finding checkpoint roots in executor sandboxes. Added said entry point in src/main/apache/aurora/tools/thermos.py -- open for other suggestions where this should live (possibly under aurora/executor/bin?) This is a drop-in replacement for the existing thermos binary. The cli/commands/*.py are mostly no-op splits with only a few lines changed in gc, status and tail to get_path_detector. Added a basic testing framework and an example test for simplerun but did not go with full coverage, instead opting to ship AURORA-1027 for now. Diffs (updated) - src/main/python/apache/aurora/tools/BUILD PRE-CREATION src/main/python/apache/aurora/tools/thermos.py PRE-CREATION src/main/python/apache/thermos/bin/BUILD 1a002da4f8294b3ecd43ab845eb15bc8ebd12b26 src/main/python/apache/thermos/bin/thermos.py b8a638822f44cfc596b16483a19efdeb4507758f src/main/python/apache/thermos/cli/BUILD PRE-CREATION src/main/python/apache/thermos/cli/bin/BUILD PRE-CREATION src/main/python/apache/thermos/cli/bin/thermos.py PRE-CREATION src/main/python/apache/thermos/cli/commands/BUILD PRE-CREATION src/main/python/apache/thermos/cli/commands/gc.py PRE-CREATION src/main/python/apache/thermos/cli/commands/help.py PRE-CREATION src/main/python/apache/thermos/cli/commands/inspect.py PRE-CREATION src/main/python/apache/thermos/cli/commands/kill.py PRE-CREATION src/main/python/apache/thermos/cli/commands/read.py PRE-CREATION src/main/python/apache/thermos/cli/commands/run.py PRE-CREATION src/main/python/apache/thermos/cli/commands/simplerun.py PRE-CREATION src/main/python/apache/thermos/cli/commands/status.py PRE-CREATION src/main/python/apache/thermos/cli/commands/tail.py PRE-CREATION src/main/python/apache/thermos/cli/common.py PRE-CREATION src/main/python/apache/thermos/cli/main.py PRE-CREATION src/test/python/apache/thermos/BUILD df50964851c6be89c646015edf91d07992743497 src/test/python/apache/thermos/bin/BUILD 596aeb200dc55b077ed73fc342b9187045732890 src/test/python/apache/thermos/bin/test_thermos.py ae15571948606f40bb360136311d22f52685517c src/test/python/apache/thermos/cli/commands/BUILD PRE-CREATION src/test/python/apache/thermos/cli/commands/test_import.py PRE-CREATION src/test/python/apache/thermos/cli/commands/test_simplerun.py PRE-CREATION src/test/python/apache/thermos/cli/test_common.py PRE-CREATION Diff: https://reviews.apache.org/r/31559/diff/ Testing --- ./pants test src/test/python/apache/thermos/cli:: Thanks, Brian Wickman
Re: Review Request 31350: Fix clusters.patch contextmanager cleanup
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31350/#review75203 --- Stephan: I'm ready to commit this but it looks like you don't have an email address in reviewboard. Can you set one so that we can attribute this patch and ping this review? ``` File /Users/ksweeney/workspace/aurora/build-support/rbt.venv/lib/python2.7/site-packages/rbtools/clients/git.py, line 740, in create_commit '--author=%s %s' % (author.fullname, author.email)]) File /Users/ksweeney/workspace/aurora/build-support/rbt.venv/lib/python2.7/site-packages/rbtools/api/resource.py, line 299, in __getattr__ raise AttributeError AttributeError ``` - Kevin Sweeney On Feb. 24, 2015, 2:12 p.m., Stephan Erb wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31350/ --- (Updated Feb. 24, 2015, 2:12 p.m.) Review request for Aurora, Kevin Sweeney and Maxim Khutornenko. Repository: aurora Description --- Fix clusters.patch contextmanager cleanup Otherwise one failing testcase can affect the entire suite. Diffs - src/main/python/apache/aurora/common/clusters.py c4b7fefca30313b281808478bf23158a9b7fbf85 src/test/python/apache/aurora/common/test_clusters.py 1bd696e9cd28d87d0cac68b33ab043407d796b61 Diff: https://reviews.apache.org/r/31350/diff/ Testing --- ./pants test.pytest --no-fast src/test/python/apache/aurora/common:: Thanks, Stephan Erb
Re: Review Request 31559: Split thermos cli into pieces. Add custom aurora thermos entry point.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31559/#review75210 --- Ship it! Ship It! - Joe Smith On March 4, 2015, 10:52 a.m., Brian Wickman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31559/ --- (Updated March 4, 2015, 10:52 a.m.) Review request for Aurora, Joe Smith and Zameer Manji. Bugs: AURORA-1027 https://issues.apache.org/jira/browse/AURORA-1027 Repository: aurora Description --- This splits src/main/python/apache/thermos/bin/thermos.py into src/main/python/apache/thermos/cli/**. This also adds a path detector registry (AURORA-1027) for the client so that we can easily add a customized aurora entry point for thermos that is capable of finding checkpoint roots in executor sandboxes. Added said entry point in src/main/apache/aurora/tools/thermos.py -- open for other suggestions where this should live (possibly under aurora/executor/bin?) This is a drop-in replacement for the existing thermos binary. The cli/commands/*.py are mostly no-op splits with only a few lines changed in gc, status and tail to get_path_detector. Added a basic testing framework and an example test for simplerun but did not go with full coverage, instead opting to ship AURORA-1027 for now. Diffs - src/main/python/apache/aurora/tools/BUILD PRE-CREATION src/main/python/apache/aurora/tools/thermos.py PRE-CREATION src/main/python/apache/thermos/bin/BUILD 1a002da4f8294b3ecd43ab845eb15bc8ebd12b26 src/main/python/apache/thermos/bin/thermos.py b8a638822f44cfc596b16483a19efdeb4507758f src/main/python/apache/thermos/cli/BUILD PRE-CREATION src/main/python/apache/thermos/cli/bin/BUILD PRE-CREATION src/main/python/apache/thermos/cli/bin/thermos.py PRE-CREATION src/main/python/apache/thermos/cli/commands/BUILD PRE-CREATION src/main/python/apache/thermos/cli/commands/gc.py PRE-CREATION src/main/python/apache/thermos/cli/commands/help.py PRE-CREATION src/main/python/apache/thermos/cli/commands/inspect.py PRE-CREATION src/main/python/apache/thermos/cli/commands/kill.py PRE-CREATION src/main/python/apache/thermos/cli/commands/read.py PRE-CREATION src/main/python/apache/thermos/cli/commands/run.py PRE-CREATION src/main/python/apache/thermos/cli/commands/simplerun.py PRE-CREATION src/main/python/apache/thermos/cli/commands/status.py PRE-CREATION src/main/python/apache/thermos/cli/commands/tail.py PRE-CREATION src/main/python/apache/thermos/cli/common.py PRE-CREATION src/main/python/apache/thermos/cli/main.py PRE-CREATION src/test/python/apache/thermos/BUILD df50964851c6be89c646015edf91d07992743497 src/test/python/apache/thermos/bin/BUILD 596aeb200dc55b077ed73fc342b9187045732890 src/test/python/apache/thermos/bin/test_thermos.py ae15571948606f40bb360136311d22f52685517c src/test/python/apache/thermos/cli/commands/BUILD PRE-CREATION src/test/python/apache/thermos/cli/commands/test_import.py PRE-CREATION src/test/python/apache/thermos/cli/commands/test_simplerun.py PRE-CREATION src/test/python/apache/thermos/cli/test_common.py PRE-CREATION Diff: https://reviews.apache.org/r/31559/diff/ Testing --- ./pants test src/test/python/apache/thermos/cli:: Thanks, Brian Wickman
Re: Review Request 31739: Making task preemption asynchronous.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/#review75222 --- src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java https://reviews.apache.org/r/31739/#comment122186 I think a comment would help explain why 30 Days is the desired value to use for the benchmarks. src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java https://reviews.apache.org/r/31739/#comment122189 Can you explain the reason for the delay? We already wait for a task to be pending for 10 minutes before we do preemption, I don't see the reason for another delay until we look for a victim. - Zameer Manji On March 4, 2015, 11:30 a.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/ --- (Updated March 4, 2015, 11:30 a.m.) Review request for Aurora, Bill Farner and Zameer Manji. Bugs: AURORA-1158 https://issues.apache.org/jira/browse/AURORA-1158 Repository: aurora Description --- Reservations now happen asynchronously with a configurable delay between a failed task scheduling and a preemption attempt. Added a new `PreemptorBenchmark` to measure preemption perf as it now happens off the main scheduling loop and thus unreachable by earlier benchmarks. Benchmark results are unsurprisingly great. The biggest winner is the PreemptorFallbackForLargeClusterBenchmark (now ClusterFullUtilizationBenchmark). Without the preemptor fallback and thanks to static veto offer filtering it's now 99.995% faster :) The lowest gain is for the limit constraint benchmark. It's the only dynamic veto type and thus is not subjected to offer filtering. Still ~71% improvement is nothing to complain about. Before: Benchmark Mode Cnt ScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100781243.004 ± 9308.450 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 1205278.826 ± 19800.452 ns/op SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 77048458.974 ± 918593.702 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100769919.326 ± 18963.264 ns/op After: Benchmark Mode CntScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 10028117.603 ±243.556 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 348667.808 ± 2956.521 ns/op SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark avgt 100 3978.828 ±351.186 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 10026096.782 ±412.138 ns/op SchedulingBenchmarks.PreemptorBenchmark.runBenchmark avgt 100 6054216.773 ± 105428.318 ns/op Perf gain summary: InsufficientResourcesSchedulingBenchmark - 96.4% LimitConstraintMismatchSchedulingBenchmark - 71% PreemptorFallbackForLargeClusterBenchmark- 99.995% ValueConstraintMismatchSchedulingBenchmark - 96.6% Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 3239eaa139e35e8c3acdacf6375f492de2b5bfee src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java e87dda47a355654c66f6f54fb25a4d9a7f68422d src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java d0fe3e133cbec2418f31160bf8ab8adaa45bb958 src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 4ee13c8e5d46ba863f4d9871884c7d494d07758d src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java 87bc531d2a72f21c36ddd0c1bd3b2367826cc422 Diff: https://reviews.apache.org/r/31739/diff/ Testing --- ./gradlew -Pq build Manual testing in vagrant. Thanks, Maxim Khutornenko
Re: Review Request 31570: Suppressing duplicate update instance events.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31570/#review75224 --- Ship it! Ship It! - Bill Farner On Feb. 28, 2015, 1:37 a.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31570/ --- (Updated Feb. 28, 2015, 1:37 a.m.) Review request for Aurora, David McLaughlin and Bill Farner. Bugs: AURORA-1097 https://issues.apache.org/jira/browse/AURORA-1097 Repository: aurora Description --- All instance update actions are supposed to be unique. Suppressing duplicats due to pause/resume cycle. Diffs - src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java acdade3dca807a221b4da975d0310c91884ee752 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java e24d6bde5f3479a75522e825cce4ec6c30c117aa Diff: https://reviews.apache.org/r/31570/diff/ Testing --- ./gradlew -Pq build Manual testing in vagrant. Thanks, Maxim Khutornenko
Re: Review Request 31716: Add an end-to-end test for scheduler-driven job updates.
On March 4, 2015, 2:46 a.m., Maxim Khutornenko wrote: src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh, line 145 https://reviews.apache.org/r/31716/diff/1/?file=884042#file884042line145 kill newline This was to follow the convention we have for continued statements in java, e.g. ``` if (longStatement otherLongStatement) { // beginning of body } ``` However, a refactor to satisfy pause/resume obviated this. On March 4, 2015, 2:46 a.m., Maxim Khutornenko wrote: src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh, line 159 https://reviews.apache.org/r/31716/diff/1/?file=884042#file884042line159 Should it also include a pause/resume cycle? Done. - Bill --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31716/#review75141 --- On March 4, 2015, 1:21 a.m., Bill Farner wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31716/ --- (Updated March 4, 2015, 1:21 a.m.) Review request for Aurora and Maxim Khutornenko. Bugs: AURORA-1157 https://issues.apache.org/jira/browse/AURORA-1157 Repository: aurora Description --- Add an end-to-end test for scheduler-driven job updates. Diffs - src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 6a75528aeb62c119e4f8ab602d0e2081dc07fe49 Diff: https://reviews.apache.org/r/31716/diff/ Testing --- Ran end-to-end tests. Thanks, Bill Farner
Re: Review Request 31716: Add an end-to-end test for scheduler-driven job updates.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31716/ --- (Updated March 4, 2015, 7:01 p.m.) Review request for Aurora and Maxim Khutornenko. Bugs: AURORA-1157 https://issues.apache.org/jira/browse/AURORA-1157 Repository: aurora Description --- Add an end-to-end test for scheduler-driven job updates. Diffs (updated) - src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 6a75528aeb62c119e4f8ab602d0e2081dc07fe49 Diff: https://reviews.apache.org/r/31716/diff/ Testing --- Ran end-to-end tests. Thanks, Bill Farner
Review Request 31739: Making task preemption asynchronous.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/ --- Review request for Aurora, Bill Farner and Zameer Manji. Bugs: AURORA-1158 https://issues.apache.org/jira/browse/AURORA-1158 Repository: aurora Description --- Reservations are now happen asynchronously with a configurable delay between a failed task scheduling and a preemption attempt. Added a new `PreemptorBenchmark` to measure preemption perf as it now happens off the main scheduling loop and thus unreachable by earlier benchmarks. Benchmark results are unsurprisingly great. The biggest winner is the PreemptorFallbackForLargeClusterBenchmark (now ClusterFullUtilizationBenchmark). Without the preemptor fallback and thanks to static veto offer filtering it's now 99.995% faster :) The lowest gain is for the limit constraint benchmark. It's the only dynamic veto type and thus is not subjected to offer filtering. Still ~82% improvement is nothing to complain about. Before: Benchmark Mode Cnt ScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100781243.004 ± 9308.450 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 1205278.826 ± 19800.452 ns/op SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 77048458.974 ± 918593.702 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100769919.326 ± 18963.264 ns/op After: Benchmark Mode CntScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 10028117.603 ±243.556 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 348667.808 ± 2956.521 ns/op SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark avgt 100 3978.828 ±351.186 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 10026096.782 ±412.138 ns/op SchedulingBenchmarks.PreemptorBenchmark.runBenchmark avgt 100 6054216.773 ± 105428.318 ns/op Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 3239eaa139e35e8c3acdacf6375f492de2b5bfee src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java e87dda47a355654c66f6f54fb25a4d9a7f68422d src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java d0fe3e133cbec2418f31160bf8ab8adaa45bb958 src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 4ee13c8e5d46ba863f4d9871884c7d494d07758d src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java 87bc531d2a72f21c36ddd0c1bd3b2367826cc422 Diff: https://reviews.apache.org/r/31739/diff/ Testing --- ./gradlew -Pq build Manual testing in vagrant. Thanks, Maxim Khutornenko
Re: Review Request 31739: Making task preemption asynchronous.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/ --- (Updated March 4, 2015, 7:30 p.m.) Review request for Aurora, Bill Farner and Zameer Manji. Bugs: AURORA-1158 https://issues.apache.org/jira/browse/AURORA-1158 Repository: aurora Description (updated) --- Reservations now happen asynchronously with a configurable delay between a failed task scheduling and a preemption attempt. Added a new `PreemptorBenchmark` to measure preemption perf as it now happens off the main scheduling loop and thus unreachable by earlier benchmarks. Benchmark results are unsurprisingly great. The biggest winner is the PreemptorFallbackForLargeClusterBenchmark (now ClusterFullUtilizationBenchmark). Without the preemptor fallback and thanks to static veto offer filtering it's now 99.995% faster :) The lowest gain is for the limit constraint benchmark. It's the only dynamic veto type and thus is not subjected to offer filtering. Still ~71% improvement is nothing to complain about. Before: Benchmark Mode Cnt ScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100781243.004 ± 9308.450 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 1205278.826 ± 19800.452 ns/op SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 77048458.974 ± 918593.702 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100769919.326 ± 18963.264 ns/op After: Benchmark Mode CntScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 10028117.603 ±243.556 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 348667.808 ± 2956.521 ns/op SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark avgt 100 3978.828 ±351.186 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 10026096.782 ±412.138 ns/op SchedulingBenchmarks.PreemptorBenchmark.runBenchmark avgt 100 6054216.773 ± 105428.318 ns/op Perf gain summary: InsufficientResourcesSchedulingBenchmark - 96.4% LimitConstraintMismatchSchedulingBenchmark - 71% PreemptorFallbackForLargeClusterBenchmark- 99.995% ValueConstraintMismatchSchedulingBenchmark - 96.6% Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 3239eaa139e35e8c3acdacf6375f492de2b5bfee src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java e87dda47a355654c66f6f54fb25a4d9a7f68422d src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java d0fe3e133cbec2418f31160bf8ab8adaa45bb958 src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 4ee13c8e5d46ba863f4d9871884c7d494d07758d src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java 87bc531d2a72f21c36ddd0c1bd3b2367826cc422 Diff: https://reviews.apache.org/r/31739/diff/ Testing --- ./gradlew -Pq build Manual testing in vagrant. Thanks, Maxim Khutornenko
Re: Review Request 31739: Making task preemption asynchronous.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/#review75247 --- src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java https://reviews.apache.org/r/31739/#comment122213 By only using one task per host we don't benchmark the costly O(tasks per slave) candidate search. The latter can probably be improved significantly, I therefore think it is wise to extend the benchmark accordingly. src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java https://reviews.apache.org/r/31739/#comment122217 Is this the same single threaded scheduler used for ordinary scheduling? (Otherwise I would expect races) - Stephan Erb On March 4, 2015, 8:30 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/ --- (Updated March 4, 2015, 8:30 p.m.) Review request for Aurora, Bill Farner and Zameer Manji. Bugs: AURORA-1158 https://issues.apache.org/jira/browse/AURORA-1158 Repository: aurora Description --- Reservations now happen asynchronously with a configurable delay between a failed task scheduling and a preemption attempt. Added a new `PreemptorBenchmark` to measure preemption perf as it now happens off the main scheduling loop and thus unreachable by earlier benchmarks. Benchmark results are unsurprisingly great. The biggest winner is the PreemptorFallbackForLargeClusterBenchmark (now ClusterFullUtilizationBenchmark). Without the preemptor fallback and thanks to static veto offer filtering it's now 99.995% faster :) The lowest gain is for the limit constraint benchmark. It's the only dynamic veto type and thus is not subjected to offer filtering. Still ~71% improvement is nothing to complain about. Before: Benchmark Mode Cnt ScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100781243.004 ± 9308.450 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 1205278.826 ± 19800.452 ns/op SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 77048458.974 ± 918593.702 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100769919.326 ± 18963.264 ns/op After: Benchmark Mode CntScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 10028117.603 ±243.556 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 348667.808 ± 2956.521 ns/op SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark avgt 100 3978.828 ±351.186 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 10026096.782 ±412.138 ns/op SchedulingBenchmarks.PreemptorBenchmark.runBenchmark avgt 100 6054216.773 ± 105428.318 ns/op Perf gain summary: InsufficientResourcesSchedulingBenchmark - 96.4% LimitConstraintMismatchSchedulingBenchmark - 71% PreemptorFallbackForLargeClusterBenchmark- 99.995% ValueConstraintMismatchSchedulingBenchmark - 96.6% Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 3239eaa139e35e8c3acdacf6375f492de2b5bfee src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java e87dda47a355654c66f6f54fb25a4d9a7f68422d src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java d0fe3e133cbec2418f31160bf8ab8adaa45bb958 src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 4ee13c8e5d46ba863f4d9871884c7d494d07758d src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java 87bc531d2a72f21c36ddd0c1bd3b2367826cc422 Diff: https://reviews.apache.org/r/31739/diff/ Testing --- ./gradlew -Pq build Manual testing in vagrant. Thanks, Maxim Khutornenko
Re: Review Request 31350: Fix clusters.patch contextmanager cleanup
On March 4, 2015, 7:30 p.m., Kevin Sweeney wrote: Stephan: I'm ready to commit this but it looks like you don't have an email address in reviewboard. Can you set one so that we can attribute this patch and ping this review? ``` File /Users/ksweeney/workspace/aurora/build-support/rbt.venv/lib/python2.7/site-packages/rbtools/clients/git.py, line 740, in create_commit '--author=%s %s' % (author.fullname, author.email)]) File /Users/ksweeney/workspace/aurora/build-support/rbt.venv/lib/python2.7/site-packages/rbtools/api/resource.py, line 299, in __getattr__ raise AttributeError AttributeError ``` It should now be public! - Stephan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31350/#review75203 --- On Feb. 24, 2015, 11:12 p.m., Stephan Erb wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31350/ --- (Updated Feb. 24, 2015, 11:12 p.m.) Review request for Aurora, Kevin Sweeney and Maxim Khutornenko. Repository: aurora Description --- Fix clusters.patch contextmanager cleanup Otherwise one failing testcase can affect the entire suite. Diffs - src/main/python/apache/aurora/common/clusters.py c4b7fefca30313b281808478bf23158a9b7fbf85 src/test/python/apache/aurora/common/test_clusters.py 1bd696e9cd28d87d0cac68b33ab043407d796b61 Diff: https://reviews.apache.org/r/31350/diff/ Testing --- ./pants test.pytest --no-fast src/test/python/apache/aurora/common:: Thanks, Stephan Erb
Review Request 31751: Adding missing service binding into UpdaterModule.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/ --- Review request for Aurora and Bill Farner. Bugs: AURORA-1173 https://issues.apache.org/jira/browse/AURORA-1173 Repository: aurora Description --- Adding missing binding and a e2e test. Diffs - src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687 src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 216849145fe681d05ed54ac7fbf385bb31d8cdea Diff: https://reviews.apache.org/r/31751/diff/ Testing --- ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh Thanks, Maxim Khutornenko
Re: Review Request 31751: Adding missing service binding into UpdaterModule.
On March 5, 2015, 1:01 a.m., Bill Farner wrote: It would be great to have coverage for this within the scheduler's tests. `JobUpdaterIT` already consumes `UpdaterModule`, pehaps this fix could be proven there? AFAICT, to truly test that binding would require wiring up the SchedulerModule and all its dependent modules to simulate a DriverRegistered event. That would trigger SchedulerLifecycle handleRegistered() and all services annotated with @SchedulerActive. IMO, not worth the increased test complexity. On March 5, 2015, 1:01 a.m., Bill Farner wrote: src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh, line 169 https://reviews.apache.org/r/31751/diff/1/?file=885238#file885238line169 Any reason you chose `stop` and `start` over `restart`? Did not notice restart command. Changed. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/#review75272 --- On March 5, 2015, 12:48 a.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/ --- (Updated March 5, 2015, 12:48 a.m.) Review request for Aurora and Bill Farner. Bugs: AURORA-1173 https://issues.apache.org/jira/browse/AURORA-1173 Repository: aurora Description --- Adding missing binding and a e2e test. Diffs - src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687 src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 216849145fe681d05ed54ac7fbf385bb31d8cdea Diff: https://reviews.apache.org/r/31751/diff/ Testing --- ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh Thanks, Maxim Khutornenko
Re: Review Request 31751: Adding missing service binding into UpdaterModule.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/#review75272 --- It would be great to have coverage for this within the scheduler's tests. `JobUpdaterIT` already consumes `UpdaterModule`, pehaps this fix could be proven there? src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh https://reviews.apache.org/r/31751/#comment122240 Any reason you chose `stop` and `start` over `restart`? - Bill Farner On March 5, 2015, 12:48 a.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/ --- (Updated March 5, 2015, 12:48 a.m.) Review request for Aurora and Bill Farner. Bugs: AURORA-1173 https://issues.apache.org/jira/browse/AURORA-1173 Repository: aurora Description --- Adding missing binding and a e2e test. Diffs - src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687 src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 216849145fe681d05ed54ac7fbf385bb31d8cdea Diff: https://reviews.apache.org/r/31751/diff/ Testing --- ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh Thanks, Maxim Khutornenko
Review Request 31754: Break out API servlet configuration into its own module.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31754/ --- Review request for Aurora. Repository: aurora Description --- Break out API servlet configuration into its own module. This is necessary to make a follow-up patch introducing HTTP Basic auth testable. Diffs - src/main/java/org/apache/aurora/scheduler/app/AppModule.java 5f6a019e4d6401e1efd075b72c049fa245cc0d0a src/main/java/org/apache/aurora/scheduler/http/JettyServerModule.java 8a59d89c07b406ce98076ca7ee51b958599a39ec src/main/java/org/apache/aurora/scheduler/http/SchedulerAPIServlet.java 33ad43b3202e5e9ef5be919b6abc5cbc7f62b660 src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java d1ab9b18394ad37fe9dcb131816fcfb2952bf8b6 src/main/java/org/apache/aurora/scheduler/http/api/ApiModule.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/http/JettyServerModuleTest.java 80beb258d9f2786668d29db85b1295163a402d42 src/test/java/org/apache/aurora/scheduler/http/ServletFilterTest.java 47d54e3c3bb1ba5e0fb26379792f515f25316480 src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java 5019094333f9807c64a49c29569ade191ee61824 src/test/java/org/apache/aurora/scheduler/http/api/ApiTServletTest.java PRE-CREATION Diff: https://reviews.apache.org/r/31754/diff/ Testing --- Thanks, Kevin Sweeney
Re: Review Request 31739: Making task preemption asynchronous.
On March 4, 2015, 10:25 p.m., Stephan Erb wrote: src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java, line 269 https://reviews.apache.org/r/31739/diff/1/?file=884469#file884469line269 Is this the same single threaded scheduler used for ordinary scheduling? (Otherwise I would expect races) Yes, it is. On March 4, 2015, 10:25 p.m., Stephan Erb wrote: src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java, line 331 https://reviews.apache.org/r/31739/diff/1/?file=884467#file884467line331 By only using one task per host we don't benchmark the costly O(tasks per slave) candidate search. The latter can probably be improved significantly, I therefore think it is wise to extend the benchmark accordingly. This line is actually referring to a benchmark tester task rather than preemption candidates, which are populated in the base `buildClusterTasks()`. However, you are correct about one task per slave observation. We could potentially add a multiple tasks per slave preemption benchmark but we should probably do so when we try to optimize the preemptor algorithm itself. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/#review75247 --- On March 4, 2015, 7:30 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/ --- (Updated March 4, 2015, 7:30 p.m.) Review request for Aurora, Bill Farner and Zameer Manji. Bugs: AURORA-1158 https://issues.apache.org/jira/browse/AURORA-1158 Repository: aurora Description --- Reservations now happen asynchronously with a configurable delay between a failed task scheduling and a preemption attempt. Added a new `PreemptorBenchmark` to measure preemption perf as it now happens off the main scheduling loop and thus unreachable by earlier benchmarks. Benchmark results are unsurprisingly great. The biggest winner is the PreemptorFallbackForLargeClusterBenchmark (now ClusterFullUtilizationBenchmark). Without the preemptor fallback and thanks to static veto offer filtering it's now 99.995% faster :) The lowest gain is for the limit constraint benchmark. It's the only dynamic veto type and thus is not subjected to offer filtering. Still ~71% improvement is nothing to complain about. Before: Benchmark Mode Cnt ScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100781243.004 ± 9308.450 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 1205278.826 ± 19800.452 ns/op SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 77048458.974 ± 918593.702 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100769919.326 ± 18963.264 ns/op After: Benchmark Mode CntScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 10028117.603 ±243.556 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 348667.808 ± 2956.521 ns/op SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark avgt 100 3978.828 ±351.186 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 10026096.782 ±412.138 ns/op SchedulingBenchmarks.PreemptorBenchmark.runBenchmark avgt 100 6054216.773 ± 105428.318 ns/op Perf gain summary: InsufficientResourcesSchedulingBenchmark - 96.4% LimitConstraintMismatchSchedulingBenchmark - 71% PreemptorFallbackForLargeClusterBenchmark- 99.995% ValueConstraintMismatchSchedulingBenchmark - 96.6% Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 3239eaa139e35e8c3acdacf6375f492de2b5bfee src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java e87dda47a355654c66f6f54fb25a4d9a7f68422d src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java d0fe3e133cbec2418f31160bf8ab8adaa45bb958 src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 4ee13c8e5d46ba863f4d9871884c7d494d07758d src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java 87bc531d2a72f21c36ddd0c1bd3b2367826cc422 Diff: https://reviews.apache.org/r/31739/diff/ Testing --- ./gradlew -Pq build Manual testing in
Re: Review Request 31751: Adding missing service binding into UpdaterModule.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/#review75279 --- Ship it! Ship It! - Bill Farner On March 5, 2015, 2:22 a.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31751/ --- (Updated March 5, 2015, 2:22 a.m.) Review request for Aurora and Bill Farner. Bugs: AURORA-1173 https://issues.apache.org/jira/browse/AURORA-1173 Repository: aurora Description --- Adding missing binding and a e2e test. Diffs - src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687 src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 216849145fe681d05ed54ac7fbf385bb31d8cdea Diff: https://reviews.apache.org/r/31751/diff/ Testing --- ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh Thanks, Maxim Khutornenko
Re: Review Request 31739: Making task preemption asynchronous.
On March 4, 2015, 8:17 p.m., Bill Farner wrote: Is there a reason you did not opt to implement this behind the `Preemptor` interface? Seems like if you went with that approach, `TaskScheduler` can be oblivious to the background operations. Maxim Khutornenko wrote: Trying to keep things simple. Moving it behind `Preemptor` would require sharing `Reservations` (or some equivalent feedback notificaiton) between TaskScheduler and Preemptor. Bill Farner wrote: I don't see why the data structure would need to be shared. On one call you could asynchronously kick off the work, and a subsequent call could report back the result of the previous. Maxim Khutornenko wrote: Perhaps I am missing the point but how does it correlate with TaskScheduler can be oblivious to the background operations? If there is no immediate response back from the preemptor what is responsible for getting the reservation data and when? Where does that reservation data live in this case? Chatted with Bill offline and we agreed that we should start moving towards a standalone (background worker) preemptor. That would require moving async decision making into the preemptor itself. I am going to discard this RB and start working towards what benefits us longer term. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/#review75226 --- On March 4, 2015, 7:30 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31739/ --- (Updated March 4, 2015, 7:30 p.m.) Review request for Aurora, Bill Farner and Zameer Manji. Bugs: AURORA-1158 https://issues.apache.org/jira/browse/AURORA-1158 Repository: aurora Description --- Reservations now happen asynchronously with a configurable delay between a failed task scheduling and a preemption attempt. Added a new `PreemptorBenchmark` to measure preemption perf as it now happens off the main scheduling loop and thus unreachable by earlier benchmarks. Benchmark results are unsurprisingly great. The biggest winner is the PreemptorFallbackForLargeClusterBenchmark (now ClusterFullUtilizationBenchmark). Without the preemptor fallback and thanks to static veto offer filtering it's now 99.995% faster :) The lowest gain is for the limit constraint benchmark. It's the only dynamic veto type and thus is not subjected to offer filtering. Still ~71% improvement is nothing to complain about. Before: Benchmark Mode Cnt ScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100781243.004 ± 9308.450 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 1205278.826 ± 19800.452 ns/op SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 77048458.974 ± 918593.702 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100769919.326 ± 18963.264 ns/op After: Benchmark Mode CntScoreError Units SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 10028117.603 ±243.556 ns/op SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark avgt 100 348667.808 ± 2956.521 ns/op SchedulingBenchmarks.ClusterFullUtilizationBenchmark.runBenchmark avgt 100 3978.828 ±351.186 ns/op SchedulingBenchmarks.ValueConstraintMismatchSchedulingBenchmark.runBenchmark avgt 10026096.782 ±412.138 ns/op SchedulingBenchmarks.PreemptorBenchmark.runBenchmark avgt 100 6054216.773 ± 105428.318 ns/op Perf gain summary: InsufficientResourcesSchedulingBenchmark - 96.4% LimitConstraintMismatchSchedulingBenchmark - 71% PreemptorFallbackForLargeClusterBenchmark- 99.995% ValueConstraintMismatchSchedulingBenchmark - 96.6% Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 3239eaa139e35e8c3acdacf6375f492de2b5bfee src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java e87dda47a355654c66f6f54fb25a4d9a7f68422d src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java d0fe3e133cbec2418f31160bf8ab8adaa45bb958 src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 4ee13c8e5d46ba863f4d9871884c7d494d07758d src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java
Re: Review Request 31754: Break out API servlet configuration into its own module.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31754/#review75275 --- Looks like my IDE was a little overzealous applying a different styleguide - updated diff to follow. - Kevin Sweeney On March 4, 2015, 5:55 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31754/ --- (Updated March 4, 2015, 5:55 p.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Repository: aurora Description --- Break out API servlet configuration into its own module. This is necessary to make a follow-up patch introducing HTTP Basic auth testable. Diffs - src/main/java/org/apache/aurora/scheduler/app/AppModule.java 5f6a019e4d6401e1efd075b72c049fa245cc0d0a src/main/java/org/apache/aurora/scheduler/http/JettyServerModule.java 8a59d89c07b406ce98076ca7ee51b958599a39ec src/main/java/org/apache/aurora/scheduler/http/SchedulerAPIServlet.java 33ad43b3202e5e9ef5be919b6abc5cbc7f62b660 src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java d1ab9b18394ad37fe9dcb131816fcfb2952bf8b6 src/main/java/org/apache/aurora/scheduler/http/api/ApiModule.java PRE-CREATION src/test/java/org/apache/aurora/scheduler/http/JettyServerModuleTest.java 80beb258d9f2786668d29db85b1295163a402d42 src/test/java/org/apache/aurora/scheduler/http/ServletFilterTest.java 47d54e3c3bb1ba5e0fb26379792f515f25316480 src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java 5019094333f9807c64a49c29569ade191ee61824 src/test/java/org/apache/aurora/scheduler/http/api/ApiTServletTest.java PRE-CREATION Diff: https://reviews.apache.org/r/31754/diff/ Testing --- Thanks, Kevin Sweeney