-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199448
-----------------------------------------------------------



The approach looks good. A few comments on improving the interface and 
code structure.

Instead of plumbing the new arguments all the way through to 
`TaskResourceMonitor`, can we build a partial factory method in `ThermoObserver`, 
since we already have all the information at that point?


3rdparty/python/requirements.txt
Lines 23 (patched)
<https://reviews.apache.org/r/66103/#comment279767>

    Any reason for not using the more widely used `jq`?



examples/vagrant/systemd/aurora-executor.service
Lines 22-25 (patched)
<https://reviews.apache.org/r/66103/#comment279768>

    Please use snake-case arguments, like `log_to_disk`.



src/main/python/apache/thermos/monitoring/disk.py
Lines 153 (patched)
<https://reviews.apache.org/r/66103/#comment279772>

    Calling it API_URL is misleading, since the HTTP endpoints include both the 
regular status, flags, and metrics endpoints and the new HTTP API.
    
    s/API_URL/AGENT_HTTP_ENDPOINT/
    
    The APIs live under /api, and those are not the ones we are calling here.



src/main/python/apache/thermos/monitoring/disk.py
Lines 154 (patched)
<https://reviews.apache.org/r/66103/#comment279773>

    Can we use the `/containers` endpoint, which should list all containers 
(AFAIK), and get rid of the `API_PATH` configuration parameter?
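    
    A minimal sketch of that lookup, assuming the `/containers` response is a 
JSON list whose entries carry an `executor_id` and a `statistics` object holding 
`disk_used_bytes`. The function name and the sample payload are hypothetical and 
not taken from the patch:

```python
import json


def disk_used_bytes(containers_json, executor_id):
    """Return disk_used_bytes for one executor, or None if it is not reported.

    The payload shape (a list of dicts with "executor_id" and "statistics")
    is an assumption about the agent's /containers response.
    """
    for container in json.loads(containers_json):
        if container.get("executor_id") == executor_id:
            return container.get("statistics", {}).get("disk_used_bytes")
    return None


# Hand-written sample payload, trimmed to the fields used above.
sample = json.dumps([
    {"executor_id": "thermos-a", "statistics": {"disk_used_bytes": 4096}},
    {"executor_id": "thermos-b", "statistics": {}},
])

usage = disk_used_bytes(sample, "thermos-a")
```

    Filtering the full list on the client side is what would make a separate 
path parameter unnecessary, at the cost of fetching every container on each poll.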



src/main/python/apache/thermos/monitoring/resource.py
Line 158 (original), 159 (patched)
<https://reviews.apache.org/r/66103/#comment279779>

    Call this `disk_collector_class`? It reads a little weird when we call this 
`disk_collector`, since that implies it is the actual object to be used.



src/main/python/apache/thermos/monitoring/resource.py
Lines 164 (patched)
<https://reviews.apache.org/r/66103/#comment279780>

    Can we combine this with the `disk_collector_class` argument via a partial 
function? This will keep the constructor more idiomatic.
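    
    A rough sketch of the idea using `functools.partial`. The classes below are 
simplified stand-ins with hypothetical constructor signatures, and the endpoint 
and executor id are placeholder values; the real classes in the patch will differ:

```python
import functools


class DiskCollector(object):
    """Stand-in for the existing du-based collector (hypothetical signature)."""

    def __init__(self, root):
        self.root = root


class MesosDiskCollector(DiskCollector):
    """Stand-in for the agent-backed collector (hypothetical signature)."""

    def __init__(self, root, agent_http_endpoint, executor_id):
        super(MesosDiskCollector, self).__init__(root)
        self.agent_http_endpoint = agent_http_endpoint
        self.executor_id = executor_id


class TaskResourceMonitor(object):
    """Simplified monitor: it receives a factory and no agent settings."""

    def __init__(self, task_id, disk_collector_class=DiskCollector):
        self.task_id = task_id
        self._disk_collector_class = disk_collector_class

    def make_disk_collector(self, sandbox):
        # The monitor only supplies what it knows itself: the sandbox path.
        return self._disk_collector_class(sandbox)


# In the observer, where the agent settings are already known, bind them once.
collector_factory = functools.partial(
    MesosDiskCollector,
    agent_http_endpoint="http://localhost:5051",  # placeholder value
    executor_id="thermos-example",  # placeholder value
)

monitor = TaskResourceMonitor("task-1", disk_collector_class=collector_factory)
collector = monitor.make_disk_collector("/tmp/sandbox")
```

    With this shape the monitor's constructor keeps a single collector argument, 
and the agent-specific values never pass through it.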


- Santhosh Kumar Shanmugham


On March 19, 2018, 7:58 a.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> -----------------------------------------------------------
> 
> (Updated March 19, 2018, 7:58 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Jordan Ly, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegated the disk monitoring task to the agent and did it only once. With this 
> approach, when disk collection improves in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch introduces `MesosDiskCollector`, which queries the agent's API 
> endpoint to look up disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -----
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/aurora-executor.service 
> 5a1a9082ecd7b1367ec677d760a5c375b6db9076 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 52c5d74fd70b5942ea3ef5101ba3f27bfc98fc21 
>   src/main/python/apache/thermos/monitoring/resource.py 
> f5e3849ca6682c6d4720698be869ca6b9f703b94 
>   src/main/python/apache/thermos/observer/task_observer.py 
> 4bb5d239e81fe4659397f899760c0e8853e93786 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d36666ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/1/
> 
> 
> Testing
> -------
> 
> I added unit tests.
> Tested in vagrant and it works as intended.
> I also built and deployed in our test environment. To measure the 
> performance improvement, I created jobs with nested folders and saw the 
> CPU utilization of the Observer process drop by at least 60% (from 1.5 
> CPU cores to 0.4 CPU cores).
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>
