-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199418
-----------------------------------------------------------
Master (aaadad7) is red with this patch.
./build-support/jenkins/build.sh
at org.easymock.internal.ReplayState.invoke(ReplayState.java:46)
at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:40)
at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:94)
at com.sun.proxy.$Proxy21.changeState(Unknown Source)
at org.apache.aurora.scheduler.TaskStatusHandlerImpl.lambda$run$0(TaskStatusHandlerImpl.java:158)
at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:144)
at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:139)
at org.apache.aurora.scheduler.storage.testing.StorageTestUtil.lambda$expectWrite$1(StorageTestUtil.java:83)
at org.easymock.internal.Result.answer(Result.java:106)
at org.easymock.internal.ReplayState.invokeInner(ReplayState.java:60)
at org.easymock.internal.ReplayState.invoke(ReplayState.java:46)
at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:40)
at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:94)
at com.sun.proxy.$Proxy20.write(Unknown Source)
at org.apache.aurora.scheduler.TaskStatusHandlerImpl.run(TaskStatusHandlerImpl.java:154)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:66)
at com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
at java.lang.Thread.run(Thread.java:748)
I0319 16:19:42.150 [ShutdownHook, SchedulerMain] Stopping scheduler services.
1081 tests completed, 1 failed, 1 skipped
:test FAILED
:jacocoTestReport
Coverage report generated:
file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/jacoco/test/html/index.html
:jacocoTestCoverageVerification
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':test'.
> There were failing tests. See the report at:
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/tests/test/index.html
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug
option to get more log output.
* Get more help at https://help.gradle.org
BUILD FAILED in 7m 40s
45 actionable tasks: 36 executed, 9 up-to-date
I will refresh this build result if you post a review containing
"@ReviewBot retry"
- Aurora ReviewBot
On March 19, 2018, 2:58 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> -----------------------------------------------------------
>
> (Updated March 19, 2018, 2:58 p.m.)
>
>
> Review request for Aurora, David McLaughlin, Daniel Knightly, Jordan Ly,
> Santhosh Kumar Shanmugham, and Stephan Erb.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> When disk isolation is enabled on a Mesos agent, the agent calculates the
> disk usage for each container.
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`,
> essentially repeating the work already done by the agent. In practice, we see
> that disk monitoring is one of the most expensive resource monitoring tasks.
> For instance, when there are deeply nested directories, the CPU utilization
> of the observer process can easily reach 1.5 CPUs. It would be ideal to
> delegate the disk monitoring task to the agent so that it is done only once.
> With this approach, when disk collection improves in the agent (for instance
> by implementing XFS isolation), we benefit from it without any code change.
> More information about the problem is provided in AURORA-1918.
>
> This patch introduces `MesosDiskCollector`, which queries the agent's API
> endpoint to look up disk_used_bytes. Note that there is also resource
> monitoring in the Thermos executor. For now, I left the disk collector there
> using the `du` implementation. That can be changed in a later patch.
>
> I modified some vagrant config files including `aurora-executor.service` and
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included
> them in this patch to show how this would work e2e.
>
>
> Diffs
> -----
>
> 3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11
> examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a
> examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
> examples/vagrant/systemd/aurora-executor.service 5a1a9082ecd7b1367ec677d760a5c375b6db9076
> src/main/python/apache/aurora/tools/thermos_observer.py dd9f0c46ceac9e939b1b763073314161de0ea614
> src/main/python/apache/thermos/monitoring/BUILD 65ba7088f65e7baa5d30744736ba456b46a55e86
> src/main/python/apache/thermos/monitoring/disk.py 52c5d74fd70b5942ea3ef5101ba3f27bfc98fc21
> src/main/python/apache/thermos/monitoring/resource.py f5e3849ca6682c6d4720698be869ca6b9f703b94
> src/main/python/apache/thermos/observer/task_observer.py 4bb5d239e81fe4659397f899760c0e8853e93786
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py fe74bd1d36666ecd89fca1b5b2251202cbbc0f24
> src/test/python/apache/thermos/monitoring/BUILD 8f2b39336dce6c7b580e6ba0009f60afdcb89179
> src/test/python/apache/thermos/monitoring/test_disk.py 362393bfd1facf3198e2d438d0596b16700b72b8
>
>
> Diff: https://reviews.apache.org/r/66103/diff/1/
>
>
> Testing
> -------
>
> I added unit tests.
> Tested in Vagrant and it works as intended.
> I also built and deployed in our test environment. To measure the
> performance improvement, I created jobs with nested folders and observed a
> reduction of at least 60% in the CPU utilization of the Observer process
> (from 1.5 CPU cores to 0.4 CPU cores).
>
>
> Thanks,
>
> Reza Motamedi
>
>
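
For illustration only, the agent lookup described in the quoted review request could be sketched roughly as follows. This is not the code from the patch. It assumes the agent's `/containers` endpoint returns a JSON list whose entries carry an `executor_id` and a `statistics` object containing `disk_used_bytes`; the function names, the agent URL parameter, and the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen


def disk_used_bytes(containers, executor_id):
    """Return disk_used_bytes for the given executor, or None if not reported.

    `containers` is the parsed JSON list assumed to come from the agent's
    /containers endpoint (an assumption, not taken from the patch).
    """
    for container in containers:
        if container.get("executor_id") == executor_id:
            return container.get("statistics", {}).get("disk_used_bytes")
    return None


def fetch_containers(agent_url, timeout=5.0):
    """Fetch and parse the container list from a (hypothetical) agent URL."""
    with urlopen(agent_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical sample payload shaped like the assumed agent response.
sample = [
    {"executor_id": "thermos-1", "statistics": {"disk_used_bytes": 4096}},
    {"executor_id": "thermos-2", "statistics": {}},
]
```

With the sample payload, `disk_used_bytes(sample, "thermos-1")` reads the value the agent already computed, so no directory walk is needed in the observer. A missing executor or a missing statistic yields `None`, which a caller could treat as "fall back to the `du` implementation".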