> On March 19, 2018, 9 p.m., Santhosh Kumar Shanmugham wrote:
> > 3rdparty/python/requirements.txt
> > Lines 23 (patched)
> > <https://reviews.apache.org/r/66103/diff/1/?file=1982409#file1982409line23>
> >
> >     Any reason for not using the more widely used `jq`?

There are two Python libraries for jq:
1) https://pypi.python.org/pypi/jq
2) https://pypi.python.org/pypi/pyjq

Neither of these libs has been updated recently, and we don't need all the
functions of jq. I am open to suggestions if either of the above or another
lib is preferred.


> On March 19, 2018, 9 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 158 (original), 159 (patched)
> > <https://reviews.apache.org/r/66103/diff/1/?file=1982416#file1982416line159>
> >
> >     Call this `disk_collector_class`? It reads a little weird when we call
> >     this `disk_collector`, which suggests it is the actual object to be used.

I agree. Addressed.


> On March 19, 2018, 9 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 164 (patched)
> > <https://reviews.apache.org/r/66103/diff/1/?file=1982416#file1982416line164>
> >
> >     Can we combine this with the `disk_collector_class` argument via a
> >     partial function? This will keep the constructor more idiomatic.

I added `DiskCollectorProvider` to address this.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199448
-----------------------------------------------------------


On March 20, 2018, 5:37 a.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> -----------------------------------------------------------
>
> (Updated March 20, 2018, 5:37 a.m.)
>
>
> Review request for Aurora, David McLaughlin, Daniel Knightly, Jordan Ly,
> Santhosh Kumar Shanmugham, and Stephan Erb.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> When disk isolation is enabled in a Mesos agent, the agent calculates the
> disk usage for each container.
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`,
> essentially repeating the work already done by the agent. In practice, we see
> that disk monitoring is one of the most expensive resource monitoring tasks.
> For instance, when there are deeply nested directories, the CPU utilization
> of the observer process can easily reach 1.5 CPUs. It would be ideal to
> delegate the disk monitoring task to the agent so that it is done only once.
> With this approach, when disk collection improves in the agent (for instance
> by implementing XFS isolation), we simply benefit from it without any code
> change. Some more information about the problem is provided in AURORA-1918.
>
> This patch introduces `MesosDiskCollector`, which queries the agent's API
> endpoint to look up disk_used_bytes. Note that there is also resource
> monitoring in the Thermos executor. For now, I left the disk collector there
> using the `du` implementation. That can be changed in a later patch.
>
> I modified some vagrant config files, including `aurora-executor.service` and
> `etc_mesos-slave/isolation`, for testing. They can be left as is. I included
> them in this patch to show how this would work end to end.
>
>
> Diffs
> -----
>
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 1a7028ffc70116b104ef3ad22b7388f637707a0f
>   examples/vagrant/systemd/aurora-executor.service 5a1a9082ecd7b1367ec677d760a5c375b6db9076
>   src/main/python/apache/aurora/tools/thermos_observer.py dd9f0c46ceac9e939b1b763073314161de0ea614
>   src/main/python/apache/thermos/monitoring/BUILD 65ba7088f65e7baa5d30744736ba456b46a55e86
>   src/main/python/apache/thermos/monitoring/disk.py 52c5d74fd70b5942ea3ef5101ba3f27bfc98fc21
>   src/main/python/apache/thermos/monitoring/resource.py f5e3849ca6682c6d4720698be869ca6b9f703b94
>   src/main/python/apache/thermos/observer/task_observer.py 4bb5d239e81fe4659397f899760c0e8853e93786
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py fe74bd1d36666ecd89fca1b5b2251202cbbc0f24
>   src/test/python/apache/thermos/monitoring/BUILD 8f2b39336dce6c7b580e6ba0009f60afdcb89179
>   src/test/python/apache/thermos/monitoring/test_disk.py 362393bfd1facf3198e2d438d0596b16700b72b8
>   src/test/python/apache/thermos/monitoring/test_resource.py e577e552d4ee1807096a15401851bb9fd95fa426
>
>
> Diff: https://reviews.apache.org/r/66103/diff/2/
>
>
> Testing
> -------
>
> I added unit tests.
> Tested in vagrant and it works as intended.
> I also built and deployed it in our test environment. To measure the
> performance improvement, I created jobs with nested folders and saw the CPU
> utilization of the Observer process drop by at least 60% (from 1.5 CPU cores
> to 0.4 CPU cores).
>
>
> Thanks,
>
> Reza Motamedi
>
>
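
To make the partial-function suggestion and the agent lookup from the description concrete, here is a minimal sketch. It is not the code in the patch: `AgentDiskCollector`, `make_collector`, the `fetch` callable, and the sample payload are hypothetical names, and the payload shape (a list of containers, each carrying `executor_id` and `statistics.disk_used_bytes`) is an assumption about what the agent returns.

```python
import functools
import json

# Hypothetical sample of an agent response; the real payload may differ.
SAMPLE_RESPONSE = json.dumps([
    {"executor_id": "thermos-task-a", "statistics": {"disk_used_bytes": 4096}},
    {"executor_id": "thermos-task-b", "statistics": {"disk_used_bytes": 1048576}},
])


def disk_used_bytes(containers, executor_id):
    """Return disk_used_bytes for the given executor, or 0 if it is not listed."""
    for container in containers:
        if container.get("executor_id") == executor_id:
            return container.get("statistics", {}).get("disk_used_bytes", 0)
    return 0


class AgentDiskCollector(object):
    """Illustrative collector that reads disk usage from the agent's statistics."""

    def __init__(self, executor_id, fetch):
        self._executor_id = executor_id
        self._fetch = fetch  # callable returning the raw JSON response body
        self._value = 0

    def sample(self):
        # One request to the agent replaces walking the sandbox directory tree.
        containers = json.loads(self._fetch())
        self._value = disk_used_bytes(containers, self._executor_id)

    @property
    def value(self):
        return self._value


# Bind the agent-specific settings once; callers then pass only per-task
# arguments, so the consumer's constructor takes a single factory argument.
make_collector = functools.partial(AgentDiskCollector, fetch=lambda: SAMPLE_RESPONSE)

collector = make_collector("thermos-task-b")
collector.sample()
print(collector.value)
```

In a real deployment `fetch` would issue an HTTP request to the agent instead of returning a canned string; keeping it injectable also makes the collector easy to unit test.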
