[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
On Mon, Mar 15, 2010 at 11:36:12AM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 15 Mar 2010 00:26:37 +0100 Andrea Righi <ari...@develer.com> wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> >
> > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
> >
> > The overall design is the following:
> >
> >  - account dirty pages per cgroup
> >  - limit the number of dirty pages via memory.dirty_ratio / memory.dirty_bytes and memory.dirty_background_ratio / memory.dirty_background_bytes in cgroupfs
> >  - start to write-out (background or actively) when the cgroup limits are exceeded
> >
> > This feature is supposed to be strictly connected to any underlying IO controller implementation, so we can stop increasing dirty pages in the VM layer and enforce a write-out before any cgroup will consume the global amount of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes and /proc/sys/vm/dirty_background_ratio|dirty_background_bytes limits.
> >
> > Changelog (v6 -> v7)
> > ~~~~~~~~~~~~~~~~~~~~
> >  * introduce trylock_page_cgroup() to guarantee that lock_page_cgroup() is never called under tree_lock (no strict accounting, but better overall performance)
> >  * do not account file cache statistics for the root cgroup (zero overhead for the root cgroup)
> >  * fix: evaluate cgroup free pages as the minimum free pages of all its parents
> >
> > Results
> > ~~~~~~~
> > The testcase is a kernel build (2.6.33 x86_64_defconfig) on an Intel Core 2 @ 1.2GHz:
> >
> > before
> >  - root cgroup:  11m51.983s
> >  - child cgroup: 11m56.596s
> >
> > after
> >  - root cgroup:  11m51.742s
> >  - child cgroup: 12m5.016s
> >
> > In the previous version of this patchset, using the complex locking scheme with the _locked and _unlocked version of mem_cgroup_update_page_stat(), the child cgroup required 11m57.896s and 12m9.920s with lock_page_cgroup()+irq_disabled.
> >
> > With this version there's no overhead for the root cgroup (the small difference is in error range). I expected to see less overhead for the child cgroup; I'll do more testing and try to figure out better what's happening.
>
> Okay, thanks. This seems a good result. Optimization for children can be done under the -mm tree, I think. (If no nack, this seems ready for test in -mm.)

OK, I'll wait a bit to see if someone has other fixes or issues and post a new version soon including these small changes.

Thanks,
-Andrea
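For readers following along, here is a minimal sketch of how the knobs described in the cover letter would be used. Only the memory.dirty_* file names come from the series; the mount point, the values, and the exact meaning of the ratio base are assumptions on my side (the authoritative description is the memory.txt update in the patchset):

    # Assumed layout: memory controller mounted at /cgroup, child cgroup "test1".
    mkdir -p /cgroup
    mount -t cgroup -o memory none /cgroup
    mkdir /cgroup/test1

    # Example values only: a 10% dirty limit plus a 32MB background threshold,
    # mirroring the global dirty_ratio / dirty_background_bytes pairs in /proc/sys/vm.
    echo 10 > /cgroup/test1/memory.dirty_ratio
    echo $((32 * 1024 * 1024)) > /cgroup/test1/memory.dirty_background_bytes

    # Move the current shell into the cgroup and generate dirty page cache.
    echo $$ > /cgroup/test1/tasks
    dd if=/dev/zero of=/tmp/dirty-test bs=4K count=262144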
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
* Andrea Righi <ari...@develer.com> [2010-03-15 00:26:37]:

> Control the maximum amount of dirty pages a cgroup can have at any given time.
>
> Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
>
> The overall design is the following:
>
>  - account dirty pages per cgroup
>  - limit the number of dirty pages via memory.dirty_ratio / memory.dirty_bytes and memory.dirty_background_ratio / memory.dirty_background_bytes in cgroupfs
>  - start to write-out (background or actively) when the cgroup limits are exceeded
>
> This feature is supposed to be strictly connected to any underlying IO controller implementation, so we can stop increasing dirty pages in the VM layer and enforce a write-out before any cgroup will consume the global amount of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes and /proc/sys/vm/dirty_background_ratio|dirty_background_bytes limits.
>
> Changelog (v6 -> v7)
> ~~~~~~~~~~~~~~~~~~~~
>  * introduce trylock_page_cgroup() to guarantee that lock_page_cgroup() is never called under tree_lock (no strict accounting, but better overall performance)
>  * do not account file cache statistics for the root cgroup (zero overhead for the root cgroup)
>  * fix: evaluate cgroup free pages as the minimum free pages of all its parents
>
> Results
> ~~~~~~~
> The testcase is a kernel build (2.6.33 x86_64_defconfig) on an Intel Core 2 @ 1.2GHz:
>
> before
>  - root cgroup:  11m51.983s
>  - child cgroup: 11m56.596s
>
> after
>  - root cgroup:  11m51.742s
>  - child cgroup: 12m5.016s
>
> In the previous version of this patchset, using the complex locking scheme with the _locked and _unlocked version of mem_cgroup_update_page_stat(), the child cgroup required 11m57.896s and 12m9.920s with lock_page_cgroup()+irq_disabled.
>
> With this version there's no overhead for the root cgroup (the small difference is in error range). I expected to see less overhead for the child cgroup; I'll do more testing and try to figure out better what's happening.

I like that the root overhead is going away.

> In the while, it would be great if someone could perform some tests on a larger system... unfortunately at the moment I don't have a big system available for this kind of tests...

I'll test this. I have a small machine to test on at the moment; I'll report back with data.

-- 
Three Cheers,
Balbir
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
* Vivek Goyal <vgo...@redhat.com> [2010-03-15 13:19:21]:

> On Mon, Mar 15, 2010 at 01:12:09PM -0400, Vivek Goyal wrote:
> > On Mon, Mar 15, 2010 at 12:26:37AM +0100, Andrea Righi wrote:
> > > Control the maximum amount of dirty pages a cgroup can have at any given time.
> > >
> > > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
> >
> > For me, even with this version I see that the group with the 100M limit is getting much more BW.
> >
> > root cgroup
> > ===========
> > # time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
> > 4294967296 bytes (4.3 GB) copied, 55.7979 s, 77.0 MB/s
> > real    0m56.209s
> >
> > test1 cgroup with memory limit of 100M
> > ======================================
> > # time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
> > 4294967296 bytes (4.3 GB) copied, 20.9252 s, 205 MB/s
> > real    0m21.096s
> >
> > Note, these two jobs are not running in parallel. These are running one after the other.
>
> Ok, here is the strange part. I am seeing similar behavior even without your patches applied.
>
> root cgroup
> ===========
> # time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
> 4294967296 bytes (4.3 GB) copied, 56.098 s, 76.6 MB/s
> real    0m56.614s
>
> test1 cgroup with memory limit 100M
> ===================================
> # time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
> 4294967296 bytes (4.3 GB) copied, 19.8097 s, 217 MB/s
> real    0m19.992s

This is strange. Did you flush the cache between the two runs?

NOTE: Since the files are the same, we reuse page cache from the other cgroup.

-- 
Three Cheers,
Balbir
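For reference, a hedged sketch of what flushing the cache between the two runs could look like; drop_caches only discards clean cache, so a sync first is needed. The paths mirror the commands above, everything else is illustrative:

    # Run as root; repeat before each timed dd so neither run can reuse
    # page cache left behind by the previous one.
    sync                                   # write out dirty pages first
    echo 3 > /proc/sys/vm/drop_caches      # then drop clean pagecache, dentries, inodes

    time dd if=/dev/zero of=/root/zerofile  bs=4K count=1M    # root cgroup run
    sync; echo 3 > /proc/sys/vm/drop_caches
    time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M    # test1 cgroup run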
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
On Thu, Mar 18, 2010 at 12:23:27AM +0530, Balbir Singh wrote:
> * Vivek Goyal <vgo...@redhat.com> [2010-03-17 09:34:07]:
>
> > > > root cgroup
> > > > ===========
> > > > # time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
> > > > 4294967296 bytes (4.3 GB) copied, 56.098 s, 76.6 MB/s
> > > > real    0m56.614s
> > > >
> > > > test1 cgroup with memory limit 100M
> > > > ===================================
> > > > # time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
> > > > 4294967296 bytes (4.3 GB) copied, 19.8097 s, 217 MB/s
> > > > real    0m19.992s
> > >
> > > This is strange. Did you flush the cache between the two runs?
> > >
> > > NOTE: Since the files are the same, we reuse page cache from the other cgroup.
> >
> > Files are different. Note the suffix 1.
>
> Thanks, I'll get the perf output and see what I get.

One more thing I noticed: it happens only if we limit the memory of the cgroup to 100M. If the same cgroup test1 is left with unlimited memory, then it did not happen. I also did not notice this happening on another system where I have 4G of memory. So it also seems to be related only to bigger configurations.

Thanks
Vivek

> -- 
> Three Cheers,
> Balbir
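To make the comparison less sensitive to how much of the 4.3 GB is still sitting dirty in the page cache when dd exits, a possible variant of the test is to set the 100M limit explicitly and time through fdatasync. This is my own suggestion rather than something used in the thread, and the mount point is assumed:

    # Assumes the memory controller is mounted at /cgroup.
    mkdir -p /cgroup/test1
    echo 100M > /cgroup/test1/memory.limit_in_bytes    # the 100M case discussed above
    echo $$   > /cgroup/test1/tasks                    # move this shell into test1

    # conv=fdatasync makes dd wait for the data to reach disk before the
    # timing stops, so the reported MB/s reflects actual write-out.
    time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M conv=fdatasync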
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
* Vivek Goyal <vgo...@redhat.com> [2010-03-17 09:34:07]:

> On Wed, Mar 17, 2010 at 05:24:28PM +0530, Balbir Singh wrote:
> > * Vivek Goyal <vgo...@redhat.com> [2010-03-15 13:19:21]:
> > > On Mon, Mar 15, 2010 at 01:12:09PM -0400, Vivek Goyal wrote:
> > > > On Mon, Mar 15, 2010 at 12:26:37AM +0100, Andrea Righi wrote:
> > > > > Control the maximum amount of dirty pages a cgroup can have at any given time.
> > > > >
> > > > > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
> > > >
> > > > For me, even with this version I see that the group with the 100M limit is getting much more BW.
> > > >
> > > > root cgroup
> > > > ===========
> > > > # time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
> > > > 4294967296 bytes (4.3 GB) copied, 55.7979 s, 77.0 MB/s
> > > > real    0m56.209s
> > > >
> > > > test1 cgroup with memory limit of 100M
> > > > ======================================
> > > > # time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
> > > > 4294967296 bytes (4.3 GB) copied, 20.9252 s, 205 MB/s
> > > > real    0m21.096s
> > > >
> > > > Note, these two jobs are not running in parallel. These are running one after the other.

The data is not always repeatable at my end. Are you able to modify the order and get repeatable results?

In fact, I saw

for cgroup != root -- 4294967296 bytes (4.3 GB) copied, 120.359 s, 35.7 MB/s
for cgroup  = root -- 4294967296 bytes (4.3 GB) copied, 84.504 s, 50.8 MB/s

This is without the patches applied.

-- 
Three Cheers,
Balbir
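A sketch of what an order-swapped repeat might look like, with a cache drop before each run so the two passes are comparable. The /cgroup mount point and the helper function are illustrative, not taken from the thread:

    # run_in CGROUP_DIR TAG: drop caches, move this shell into the cgroup,
    # then time a 4GB write to a per-run file.
    run_in() {
        sync; echo 3 > /proc/sys/vm/drop_caches
        echo $$ > "$1/tasks"
        time dd if=/dev/zero of="/root/zerofile.$2" bs=4K count=1M
    }

    run_in /cgroup       root     # pass 1: root cgroup first
    run_in /cgroup/test1 test1
    run_in /cgroup/test1 test1    # pass 2: reverse the order
    run_in /cgroup       root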
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
On Thu, Mar 18, 2010 at 12:47:43AM +0530, Balbir Singh wrote:
> * Vivek Goyal <vgo...@redhat.com> [2010-03-17 09:34:07]:
>
> > On Wed, Mar 17, 2010 at 05:24:28PM +0530, Balbir Singh wrote:
> > > * Vivek Goyal <vgo...@redhat.com> [2010-03-15 13:19:21]:
> > > > On Mon, Mar 15, 2010 at 01:12:09PM -0400, Vivek Goyal wrote:
> > > > > On Mon, Mar 15, 2010 at 12:26:37AM +0100, Andrea Righi wrote:
> > > > > > Control the maximum amount of dirty pages a cgroup can have at any given time.
> > > > > >
> > > > > > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
> > > > >
> > > > > For me, even with this version I see that the group with the 100M limit is getting much more BW.
> > > > >
> > > > > root cgroup
> > > > > ===========
> > > > > # time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
> > > > > 4294967296 bytes (4.3 GB) copied, 55.7979 s, 77.0 MB/s
> > > > > real    0m56.209s
> > > > >
> > > > > test1 cgroup with memory limit of 100M
> > > > > ======================================
> > > > > # time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
> > > > > 4294967296 bytes (4.3 GB) copied, 20.9252 s, 205 MB/s
> > > > > real    0m21.096s
> > > > >
> > > > > Note, these two jobs are not running in parallel. These are running one after the other.
>
> The data is not always repeatable at my end. Are you able to modify the order and get repeatable results?
>
> In fact, I saw
>
> for cgroup != root -- 4294967296 bytes (4.3 GB) copied, 120.359 s, 35.7 MB/s
> for cgroup  = root -- 4294967296 bytes (4.3 GB) copied, 84.504 s, 50.8 MB/s
>
> This is without the patches applied.

I lost access to the big-configuration machine, but on that machine I could reproduce it all the time. On the smaller machine (4 cores, 4G), I could not. I will do some more tests later.

Vivek
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
On Mon, Mar 15, 2010 at 12:26:37AM +0100, Andrea Righi wrote:
> Control the maximum amount of dirty pages a cgroup can have at any given time.
>
> Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.

For me, even with this version I see that the group with the 100M limit is getting much more BW.

root cgroup
===========
# time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
4294967296 bytes (4.3 GB) copied, 55.7979 s, 77.0 MB/s
real    0m56.209s

test1 cgroup with memory limit of 100M
======================================
# time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
4294967296 bytes (4.3 GB) copied, 20.9252 s, 205 MB/s
real    0m21.096s

Note, these two jobs are not running in parallel. These are running one after the other.

Vivek

> The overall design is the following:
>
>  - account dirty pages per cgroup
>  - limit the number of dirty pages via memory.dirty_ratio / memory.dirty_bytes and memory.dirty_background_ratio / memory.dirty_background_bytes in cgroupfs
>  - start to write-out (background or actively) when the cgroup limits are exceeded
>
> This feature is supposed to be strictly connected to any underlying IO controller implementation, so we can stop increasing dirty pages in the VM layer and enforce a write-out before any cgroup will consume the global amount of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes and /proc/sys/vm/dirty_background_ratio|dirty_background_bytes limits.
>
> Changelog (v6 -> v7)
> ~~~~~~~~~~~~~~~~~~~~
>  * introduce trylock_page_cgroup() to guarantee that lock_page_cgroup() is never called under tree_lock (no strict accounting, but better overall performance)
>  * do not account file cache statistics for the root cgroup (zero overhead for the root cgroup)
>  * fix: evaluate cgroup free pages as the minimum free pages of all its parents
>
> Results
> ~~~~~~~
> The testcase is a kernel build (2.6.33 x86_64_defconfig) on an Intel Core 2 @ 1.2GHz:
>
> before
>  - root cgroup:  11m51.983s
>  - child cgroup: 11m56.596s
>
> after
>  - root cgroup:  11m51.742s
>  - child cgroup: 12m5.016s
>
> In the previous version of this patchset, using the complex locking scheme with the _locked and _unlocked version of mem_cgroup_update_page_stat(), the child cgroup required 11m57.896s and 12m9.920s with lock_page_cgroup()+irq_disabled.
>
> With this version there's no overhead for the root cgroup (the small difference is in error range). I expected to see less overhead for the child cgroup; I'll do more testing and try to figure out better what's happening.
>
> In the while, it would be great if someone could perform some tests on a larger system... unfortunately at the moment I don't have a big system available for this kind of tests...
>
> Thanks,
> -Andrea
>
>  Documentation/cgroups/memory.txt |   36 +++
>  fs/nfs/write.c                   |    4 +
>  include/linux/memcontrol.h       |   87 ++-
>  include/linux/page_cgroup.h      |   35 +++
>  include/linux/writeback.h        |    2 -
>  mm/filemap.c                     |    1 +
>  mm/memcontrol.c                  |  542 +++---
>  mm/page-writeback.c              |  215 ++--
>  mm/rmap.c                        |    4 +-
>  mm/truncate.c                    |    1 +
>  10 files changed, 806 insertions(+), 121 deletions(-)
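One quick way to see whether the per-cgroup dirty accounting that this series adds is actually moving during such a dd run is to watch the cgroup's stat file. The exact key names are not shown in this thread (they live in the memory.txt update in the series), so the grep below deliberately avoids guessing them; the /cgroup/test1 path is an assumption:

    # While the dd in test1 runs, watch the per-cgroup counters; grepping
    # case-insensitively avoids depending on the exact key name.
    watch -n1 'grep -i dirty /cgroup/test1/memory.stat'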
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
On Mon, Mar 15, 2010 at 01:12:09PM -0400, Vivek Goyal wrote:
> On Mon, Mar 15, 2010 at 12:26:37AM +0100, Andrea Righi wrote:
> > Control the maximum amount of dirty pages a cgroup can have at any given time.
> >
> > Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
>
> For me, even with this version I see that the group with the 100M limit is getting much more BW.
>
> root cgroup
> ===========
> # time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
> 4294967296 bytes (4.3 GB) copied, 55.7979 s, 77.0 MB/s
> real    0m56.209s
>
> test1 cgroup with memory limit of 100M
> ======================================
> # time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
> 4294967296 bytes (4.3 GB) copied, 20.9252 s, 205 MB/s
> real    0m21.096s
>
> Note, these two jobs are not running in parallel. These are running one after the other.

Ok, here is the strange part. I am seeing similar behavior even without your patches applied.

root cgroup
===========
# time dd if=/dev/zero of=/root/zerofile bs=4K count=1M
4294967296 bytes (4.3 GB) copied, 56.098 s, 76.6 MB/s
real    0m56.614s

test1 cgroup with memory limit 100M
===================================
# time dd if=/dev/zero of=/root/zerofile1 bs=4K count=1M
4294967296 bytes (4.3 GB) copied, 19.8097 s, 217 MB/s
real    0m19.992s

Vivek

> > The overall design is the following:
> >
> >  - account dirty pages per cgroup
> >  - limit the number of dirty pages via memory.dirty_ratio / memory.dirty_bytes and memory.dirty_background_ratio / memory.dirty_background_bytes in cgroupfs
> >  - start to write-out (background or actively) when the cgroup limits are exceeded
> >
> > This feature is supposed to be strictly connected to any underlying IO controller implementation, so we can stop increasing dirty pages in the VM layer and enforce a write-out before any cgroup will consume the global amount of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes and /proc/sys/vm/dirty_background_ratio|dirty_background_bytes limits.
> >
> > Changelog (v6 -> v7)
> > ~~~~~~~~~~~~~~~~~~~~
> >  * introduce trylock_page_cgroup() to guarantee that lock_page_cgroup() is never called under tree_lock (no strict accounting, but better overall performance)
> >  * do not account file cache statistics for the root cgroup (zero overhead for the root cgroup)
> >  * fix: evaluate cgroup free pages as the minimum free pages of all its parents
> >
> > Results
> > ~~~~~~~
> > The testcase is a kernel build (2.6.33 x86_64_defconfig) on an Intel Core 2 @ 1.2GHz:
> >
> > before
> >  - root cgroup:  11m51.983s
> >  - child cgroup: 11m56.596s
> >
> > after
> >  - root cgroup:  11m51.742s
> >  - child cgroup: 12m5.016s
> >
> > In the previous version of this patchset, using the complex locking scheme with the _locked and _unlocked version of mem_cgroup_update_page_stat(), the child cgroup required 11m57.896s and 12m9.920s with lock_page_cgroup()+irq_disabled.
> >
> > With this version there's no overhead for the root cgroup (the small difference is in error range). I expected to see less overhead for the child cgroup; I'll do more testing and try to figure out better what's happening.
> >
> > In the while, it would be great if someone could perform some tests on a larger system... unfortunately at the moment I don't have a big system available for this kind of tests...
> >
> > Thanks,
> > -Andrea
> >
> >  Documentation/cgroups/memory.txt |   36 +++
> >  fs/nfs/write.c                   |    4 +
> >  include/linux/memcontrol.h       |   87 ++-
> >  include/linux/page_cgroup.h      |   35 +++
> >  include/linux/writeback.h        |    2 -
> >  mm/filemap.c                     |    1 +
> >  mm/memcontrol.c                  |  542 +++---
> >  mm/page-writeback.c              |  215 ++--
> >  mm/rmap.c                        |    4 +-
> >  mm/truncate.c                    |    1 +
> >  10 files changed, 806 insertions(+), 121 deletions(-)
[Devel] Re: [PATCH -mmotm 0/5] memcg: per cgroup dirty limit (v7)
On Mon, 15 Mar 2010 00:26:37 +0100 Andrea Righi <ari...@develer.com> wrote:
> Control the maximum amount of dirty pages a cgroup can have at any given time.
>
> Per cgroup dirty limit is like fixing the max amount of dirty (hard to reclaim) page cache used by any cgroup. So, in case of multiple cgroup writers, they will not be able to consume more than their designated share of dirty pages and will be forced to perform write-out if they cross that limit.
>
> The overall design is the following:
>
>  - account dirty pages per cgroup
>  - limit the number of dirty pages via memory.dirty_ratio / memory.dirty_bytes and memory.dirty_background_ratio / memory.dirty_background_bytes in cgroupfs
>  - start to write-out (background or actively) when the cgroup limits are exceeded
>
> This feature is supposed to be strictly connected to any underlying IO controller implementation, so we can stop increasing dirty pages in the VM layer and enforce a write-out before any cgroup will consume the global amount of dirty pages defined by the /proc/sys/vm/dirty_ratio|dirty_bytes and /proc/sys/vm/dirty_background_ratio|dirty_background_bytes limits.
>
> Changelog (v6 -> v7)
> ~~~~~~~~~~~~~~~~~~~~
>  * introduce trylock_page_cgroup() to guarantee that lock_page_cgroup() is never called under tree_lock (no strict accounting, but better overall performance)
>  * do not account file cache statistics for the root cgroup (zero overhead for the root cgroup)
>  * fix: evaluate cgroup free pages as the minimum free pages of all its parents
>
> Results
> ~~~~~~~
> The testcase is a kernel build (2.6.33 x86_64_defconfig) on an Intel Core 2 @ 1.2GHz:
>
> before
>  - root cgroup:  11m51.983s
>  - child cgroup: 11m56.596s
>
> after
>  - root cgroup:  11m51.742s
>  - child cgroup: 12m5.016s
>
> In the previous version of this patchset, using the complex locking scheme with the _locked and _unlocked version of mem_cgroup_update_page_stat(), the child cgroup required 11m57.896s and 12m9.920s with lock_page_cgroup()+irq_disabled.
>
> With this version there's no overhead for the root cgroup (the small difference is in error range). I expected to see less overhead for the child cgroup; I'll do more testing and try to figure out better what's happening.

Okay, thanks. This seems a good result. Optimization for children can be done under the -mm tree, I think. (If no nack, this seems ready for test in -mm.)

> In the while, it would be great if someone could perform some tests on a larger system... unfortunately at the moment I don't have a big system available for this kind of tests...

I hope so, too.

Thanks,
-Kame