[jira] [Commented] (MESOS-9806) Address allocator performance regression due to the addition of quota limits.
[ https://issues.apache.org/jira/browse/MESOS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914674#comment-16914674 ]

Meng Zhu commented on MESOS-9806:
---------------------------------

As of now, the performance is close to 1.8.1 even with the addition of limits enforcement. There will be more improvement as we deprecate the framework sorter and optimize the role sorter (MESOS-9942 and MESOS-9943).

> Address allocator performance regression due to the addition of quota limits.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-9806
>                 URL: https://issues.apache.org/jira/browse/MESOS-9806
>             Project: Mesos
>          Issue Type: Improvement
>          Components: allocation
>            Reporter: Meng Zhu
>            Assignee: Meng Zhu
>            Priority: Critical
>              Labels: resource-management
>
> In MESOS-9802, we removed the quota role sorter, which was tech debt.
> However, this slows down the allocator. The problem is that in the first
> stage, even though a cluster might have no active roles with non-default
> quota, the allocator now has to sort and go through each and every role
> in the cluster. Benchmark results show that for 1k roles with 2k frameworks,
> the allocator could experience ~50% performance degradation.
>
> There are a couple of ways to address this issue. For example, we could make
> the sorter aware of quota and add a method, say `sortQuotaRoles`, that returns
> all the roles with non-default quota. Alternatively, an even better approach
> would be to deprecate the sorter concept and just have two standalone
> functions, e.g. sortRoles() and sortQuotaRoles(), that take in the role tree
> structure (which does not yet exist in the allocator) and return the sorted roles.
>
> In addition, when implementing MESOS-8068, we need to do more during the
> allocation cycle. In particular, we need to call shrink many more times than
> before. These all contribute to the performance slowdown. Specifically, for
> the quota-oriented benchmark
> `HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2` we can observe a
> 2-3x slowdown compared to the previous release (1.8.1):
>
> Current master:
> QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 32.051382735secs
> Made 0 allocation in 27.976022773secs
>
> 1.8.1:
> HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
> Made 3500 allocations in 13.810811063secs
> Made 0 allocation in 9.885972984secs
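To illustrate the standalone-function idea proposed in the description, here is a minimal C++ sketch of what a `sortQuotaRoles()` helper over a role tree might look like. The `RoleTree`/`Role` types and the DRF-style ordering are hypothetical placeholders for illustration, not the actual allocator code:

{noformat}
// Illustrative only: a standalone helper that walks a (hypothetical) role
// tree and returns just the roles with non-default quota, already sorted,
// so the first allocation stage never has to touch the remaining roles.
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Role
{
  double quotaGuarantee = 0.0;  // 0.0 == default (no) quota.
  double dominantShare = 0.0;   // Share used for DRF-style ordering.
};

// Hypothetical flat "role tree": role name -> role state.
using RoleTree = std::map<std::string, Role>;

std::vector<std::string> sortQuotaRoles(const RoleTree& roles)
{
  std::vector<std::string> result;

  // Only roles with a non-default quota participate in the first stage.
  for (const auto& [name, role] : roles) {
    if (role.quotaGuarantee > 0.0) {
      result.push_back(name);
    }
  }

  // Order by ascending dominant share, mirroring DRF sorting.
  std::sort(result.begin(), result.end(),
            [&](const std::string& a, const std::string& b) {
              return roles.at(a).dominantShare < roles.at(b).dominantShare;
            });

  return result;
}
{noformat}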
[jira] [Commented] (MESOS-9806) Address allocator performance regression due to the addition of quota limits.
[ https://issues.apache.org/jira/browse/MESOS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914673#comment-16914673 ]

Meng Zhu commented on MESOS-9806:
---------------------------------

All the optimizations improved the performance by ~50%:

1.8.1:
HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
Made 3500 allocations in 13.810811063secs
Made 0 allocation in 9.885972984secs

Before the optimizations:
QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
Made 3500 allocations in 32.051382735secs
Made 0 allocation in 27.976022773secs

After the optimizations:
HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
Made 3500 allocations in 15.385276405secs
Made 0 allocation in 13.718502414secs
[jira] [Commented] (MESOS-9806) Address allocator performance regression due to the addition of quota limits.
[ https://issues.apache.org/jira/browse/MESOS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914672#comment-16914672 ]

Meng Zhu commented on MESOS-9806:
---------------------------------

Small vector optimization for ResourceQuantities, ResourceLimits and Resources:

{noformat}
commit 73033130de7872c6f240b9b05dced039d7666138
Author: Meng Zhu
Date:   Thu Aug 22 17:19:30 2019 -0700

    Used boost `small_vector` in `Resources`.

    Master + previous patch:
    *HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
    Made 3500 allocations in 16.307044003secs
    Made 0 allocation in 14.948262599secs

    Master + previous patch + this patch:
    *HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
    Made 3500 allocations in 15.385276405secs
    Made 0 allocation in 13.718502414secs

    Review: https://reviews.apache.org/r/71357

commit 95201cbe4dc87eae2fde5754d16f5effbb6c1974
Author: Meng Zhu
Date:   Thu Aug 22 16:55:34 2019 -0700

    Used boost `small_vector` in Resource Quantities and Limits.

    Master + previous patch:
    *HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
    Made 3500 allocations in 16.831380548secs
    Made 0 allocation in 15.102885644secs

    Master + previous patch + this patch:
    *HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
    Made 3500 allocations in 16.307044003secs
    Made 0 allocation in 14.948262599secs

    Review: https://reviews.apache.org/r/71355

commit 25070f232a9bb97d1b78f8a7e5b774bbd50654f9
Author: Meng Zhu
Date:   Thu Aug 22 16:54:42 2019 -0700

    Updated the boost library.

    This update includes adding `container/small_vector.hpp`.

    Review: https://reviews.apache.org/r/71356
{noformat}
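For context on why `small_vector` helps here: it stores a small, fixed number of elements inline inside the object and only falls back to heap allocation once that capacity is exceeded, which suits resource objects that typically hold just a few entries. A minimal usage sketch (the name/quantity pairs are illustrative, not the actual `ResourceQuantities` layout):

{noformat}
#include <boost/container/small_vector.hpp>

#include <iostream>
#include <string>
#include <utility>

int main()
{
  // Up to 4 name/quantity pairs are stored inline (no heap allocation);
  // larger collections transparently spill over to the heap.
  boost::container::small_vector<std::pair<std::string, double>, 4> quantities;

  quantities.emplace_back("cpus", 2.0);
  quantities.emplace_back("mem", 1024.0);

  for (const auto& [name, value] : quantities) {
    std::cout << name << ":" << value << std::endl;
  }

  return 0;
}
{noformat}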
[jira] [Commented] (MESOS-9806) Address allocator performance regression due to the addition of quota limits.
[ https://issues.apache.org/jira/browse/MESOS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914670#comment-16914670 ]

Meng Zhu commented on MESOS-9806:
---------------------------------

Optimized the allocation loop:

{noformat}
commit ec6b7b34215e821a63cb79e7d52d94ff08c1e110
Author: Meng Zhu
Date:   Thu Aug 22 17:54:25 2019 -0700

    Optimized the allocation loop.

    Master:
    HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
    Made 3500 allocations in 23.37 secs
    Made 0 allocation in 19.72 secs

    Master + this patch:
    HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota/2
    Made 3500 allocations in 16.831380548secs
    Made 0 allocation in 15.102885644secs

    Review: https://reviews.apache.org/r/71359
{noformat}
[jira] [Commented] (MESOS-9836) Docker containerizer overwrites `/mesos/slave` cgroups.
[ https://issues.apache.org/jira/browse/MESOS-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914034#comment-16914034 ]

Qian Zhang commented on MESOS-9836:
-----------------------------------

Master:
commit 5db1835531edd16b8a7b550a944c27b0dacafbf7
Author: Qian Zhang
Date:   Wed Aug 21 16:50:06 2019 +0800

    Used cached cgroups for updating resources in Docker containerizer.

    Review: https://reviews.apache.org/r/71335

1.8.x:
commit ee01c8d479b34ced35ee6bd172108a128086277e
Author: Qian Zhang
Date:   Wed Aug 21 16:50:06 2019 +0800

    Used cached cgroups for updating resources in Docker containerizer.

    Review: https://reviews.apache.org/r/71335

1.7.x:
commit 7b01d0d35e8e17434d6f8e8840c8586565fe8d6c
Author: Qian Zhang
Date:   Wed Aug 21 16:50:06 2019 +0800

    Used cached cgroups for updating resources in Docker containerizer.

    Review: https://reviews.apache.org/r/71335

1.6.x:
commit 1c70f29bdd270cdff8ce55bdfaab56581829017c (HEAD -> ci/qzhang/bp_9836_1.6.x)
Author: Qian Zhang
Date:   Wed Aug 21 16:50:06 2019 +0800

    Used cached cgroups for updating resources in Docker containerizer.

    Review: https://reviews.apache.org/r/71335

> Docker containerizer overwrites `/mesos/slave` cgroups.
> --------------------------------------------------------
>
>                 Key: MESOS-9836
>                 URL: https://issues.apache.org/jira/browse/MESOS-9836
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Chun-Hung Hsiao
>            Assignee: Qian Zhang
>            Priority: Critical
>              Labels: docker, mesosphere
>
> The following bug was observed on our internal testing cluster.
> The docker containerizer launched a container on an agent:
> {noformat}
> I0523 06:00:53.888579 21815 docker.cpp:1195] Starting container
> 'f69c8a8c-eba4-4494-a305-0956a44a6ad2' for task
> 'apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1' (and executor
> 'apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1') of framework
> 415284b7-2967-407d-b66f-f445e93f064e-0011
> I0523 06:00:54.524171 21815 docker.cpp:783] Checkpointing pid 13716 to
> '/var/lib/mesos/slave/meta/slaves/60c42ab7-eb1a-4cec-b03d-ea06bff00c3f-S2/frameworks/415284b7-2967-407d-b66f-f445e93f064e-0011/executors/apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1/runs/f69c8a8c-eba4-4494-a305-0956a44a6ad2/pids/forked.pid'
> {noformat}
> After the container was launched, the docker containerizer did a {{docker
> inspect}} on the container and cached the pid:
> [https://github.com/apache/mesos/blob/0c431dd60ae39138cc7e8b099d41ad794c02c9a9/src/slave/containerizer/docker.cpp#L1764]
> The pid should be slightly greater than 13716.
>
> The docker executor sent a {{TASK_FINISHED}} status update around 16 minutes
> later:
> {noformat}
> I0523 06:16:17.287595 21809 slave.cpp:5566] Handling status update
> TASK_FINISHED (Status UUID: 4e00b786-b773-46cd-8327-c7deb08f1de9) for task
> apps_docker-sleep-app.1fda5b8e-7d20-11e9-9717-7aa030269ee1 of framework
> 415284b7-2967-407d-b66f-f445e93f064e-0011 from executor(1)@172.31.1.7:36244
> {noformat}
> After receiving the terminal status update, the agent asked the docker
> containerizer to update {{cpu.cfs_period_us}}, {{cpu.cfs_quota_us}} and
> {{memory.soft_limit_in_bytes}} of the container through the cached pid:
> [https://github.com/apache/mesos/blob/0c431dd60ae39138cc7e8b099d41ad794c02c9a9/src/slave/containerizer/docker.cpp#L1696]
> {noformat}
> I0523 06:16:17.290447 21815 docker.cpp:1868] Updated 'cpu.shares' to 102 at
> /sys/fs/cgroup/cpu,cpuacct/mesos/slave for container
> f69c8a8c-eba4-4494-a305-0956a44a6ad2
> I0523 06:16:17.290660 21815 docker.cpp:1895] Updated 'cpu.cfs_period_us' to
> 100ms and 'cpu.cfs_quota_us' to 10ms (cpus 0.1) for container
> f69c8a8c-eba4-4494-a305-0956a44a6ad2
> I0523 06:16:17.889816 21815 docker.cpp:1937] Updated
> 'memory.soft_limit_in_bytes' to 32MB for container
> f69c8a8c-eba4-4494-a305-0956a44a6ad2
> {noformat}
> Note that the cgroup of {{cpu.shares}} was {{/mesos/slave}}. This was
> possibly because, over the 16 minutes, the pid got reused:
> {noformat}
> # zgrep 'systemd.cpp:98\]' /var/log/mesos/archive/mesos-agent.log.12.gz
> ...
> I0523 06:00:54.525178 21815 systemd.cpp:98] Assigned child process '13716' to
> 'mesos_executors.slice'
> I0523 06:00:55.078546 21808 systemd.cpp:98] Assigned child process '13798' to
> 'mesos_executors.slice'
> I0523 06:00:55.134096 21808 systemd.cpp:98] Assigned child process '13799' to
> 'mesos_executors.slice'
> ...
> I0523 06:06:30.997439 21808 systemd.cpp:98] Assigned child process '32689' to
> 'mesos_executors.slice'
> I0523 06:06:31.050976 21808 systemd.cpp:98] Assigned child process '32690' to
> 'mesos_executors.slice'
> I0523 06:06:31.110514 21815 systemd.cpp:98] Assigned child process '32692' to
> 'mesos_executors.slice'
> I0523 06:06:33.143726 21818 systemd.cpp:98
> {noformat}
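A minimal sketch of the idea behind the fix committed above (caching the container's cgroup when the container is launched and reusing it for later resource updates, rather than re-resolving it from a pid that may have been reused by an unrelated process). The class and member names are illustrative, not the actual Docker containerizer code:

{noformat}
#include <sys/types.h>

#include <map>
#include <string>

// Illustrative container state: the cgroup is resolved once, right after
// launch, while the checkpointed pid is still guaranteed to belong to the
// container, and is cached alongside the pid.
struct Container
{
  pid_t pid;
  std::string cpuCgroup;  // e.g. "/mesos/<containerId>", cached at launch.
};

class DockerContainerizerSketch
{
public:
  void launched(const std::string& containerId, pid_t pid, const std::string& cgroup)
  {
    containers_[containerId] = Container{pid, cgroup};
  }

  // Later resource updates use the cached cgroup; they no longer re-derive
  // it from /proc/<pid>/cgroup, so a reused pid can no longer redirect the
  // update to an unrelated cgroup such as '/mesos/slave'.
  std::string cgroupForUpdate(const std::string& containerId) const
  {
    return containers_.at(containerId).cpuCgroup;
  }

private:
  std::map<std::string, Container> containers_;
};

int main()
{
  DockerContainerizerSketch containerizer;
  containerizer.launched("f69c8a8c", 13716, "/mesos/f69c8a8c");

  // Even if pid 13716 is later reused by another process, the update still
  // targets the cgroup recorded at launch time.
  return containerizer.cgroupForUpdate("f69c8a8c") == "/mesos/f69c8a8c" ? 0 : 1;
}
{noformat}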