Re: [PATCH 0/8] Reduce system disruption due to kswapd followup V3
On 2013/5/30 7:17, Mel Gorman wrote:
> tldr; Overall the system is getting less kicked in the face. Scan rates
> between zones are often more balanced than they used to be. There are
> now fewer writes from reclaim context and a reduction in IO wait times.
>
> This series replaces all of the previous follow-up series. It was clear
> that more of the stall logic needed to be in the same place so it is
> comprehensible and easier to predict.
>
> Changelog since V2
> o Consolidate stall decisions into one place
> o Add is_dirty_writeback for NFS
> o Move accounting around
>
> Further testing of the "Reduce system disruption due to kswapd" series
> discovered a few problems. First and foremost, it's possible for pages
> under writeback to be freed, which will lead to badness. Second, as
> pages were not being swapped, the file LRU was being scanned faster and
> clean file pages were being reclaimed. In some cases this resulted in
> increased read IO to re-read data from disk. Third, more pages were
> being written from kswapd context, which can adversely affect IO
> performance. Lastly, it was observed that PageDirty pages are not
> necessarily dirty on all filesystems (buffers can be clean while
> PageDirty is set and ->writepage generates no IO) and not all
> filesystems set PageWriteback when the page is being written (e.g.
> ext3). This disconnect confuses the reclaim stalling logic. This
> follow-up series is aimed at these problems.
>
> The tests were based on three kernels:
>
> vanilla:           kernel 3.9, as that is what the current mmotm uses
>                    as a baseline
> mmotm-20130522:    mmotm as of 22nd May with "Reduce system disruption
>                    due to kswapd" applied on top, as per what should be
>                    in Andrew's tree right now
> lessdisrupt-v7r10: this follow-up series on top of the mmotm kernel
>
> The first test used memcached+memcachetest while some background IO was
> in progress, as implemented by the parallel IO tests in MM Tests.
> memcachetest benchmarks how many operations/second memcached can
> service. It starts with no background IO on a freshly created ext4
> filesystem and then re-runs the test with larger amounts of IO in the
> background to roughly simulate a large copy in progress. The
> expectation is that the IO should have little or no impact on
> memcachetest, which is running entirely in memory.
>
> parallelio                             3.9.0                 3.9.0                 3.9.0
>                                      vanilla    mm1-mmotm-20130522 mm1-lessdisrupt-v7r10
> Ops memcachetest-0M        23117.00 (  0.00%)    22780.00 ( -1.46%)    22763.00 ( -1.53%)
> Ops memcachetest-715M      23774.00 (  0.00%)    23299.00 ( -2.00%)    22934.00 ( -3.53%)
> Ops memcachetest-2385M      4208.00 (  0.00%)    24154.00 (474.00%)    23765.00 (464.76%)
> Ops memcachetest-4055M      4104.00 (  0.00%)    25130.00 (512.33%)    24614.00 (499.76%)
> Ops io-duration-0M             0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops io-duration-715M          12.00 (  0.00%)        7.00 ( 41.67%)        6.00 ( 50.00%)
> Ops io-duration-2385M        116.00 (  0.00%)       21.00 ( 81.90%)       21.00 ( 81.90%)
> Ops io-duration-4055M        160.00 (  0.00%)       36.00 ( 77.50%)       35.00 ( 78.12%)
> Ops swaptotal-0M               0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops swaptotal-715M        140138.00 (  0.00%)       18.00 ( 99.99%)       18.00 ( 99.99%)
> Ops swaptotal-2385M       385682.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops swaptotal-4055M       418029.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops swapin-0M                  0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops swapin-715M              144.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops swapin-2385M          134227.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops swapin-4055M          125618.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Ops minorfaults-0M       1536429.00 (  0.00%)  1531632.00 (  0.31%)  1533541.00 (  0.19%)
> Ops minorfaults-715M     1786996.00 (  0.00%)  1612148.00 (  9.78%)  1608832.00 (  9.97%)
> Ops minorfaults-2385M    1757952.00 (  0.00%)  1614874.00 (  8.14%)  1613541.00 (  8.21%)
> Ops minorfaults-4055M    1774460.00 (  0.00%)  1633400.00 (  7.95%)  1630881.00 (  8.09%)
> Ops majorfaults-0M             1.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Re: [PATCH 0/8] Reduce system disruption due to kswapd followup V3
On Thu, May 30, 2013 at 12:17:29AM +0100, Mel Gorman wrote:
> tldr; Overall the system is getting less kicked in the face. Scan rates
> between zones are often more balanced than they used to be. There are
> now fewer writes from reclaim context and a reduction in IO wait times.
>
> This series replaces all of the previous follow-up series. It was clear
> that more of the stall logic needed to be in the same place so it is
> comprehensible and easier to predict.

There was some unfortunate crossover in timing as I see mmotm has pulled
in the previous follow-up series. It would probably be easiest to replace
these patches

mm-vmscan-stall-page-reclaim-and-writeback-pages-based-on-dirty-writepage-pages-encountered.patch
mm-vmscan-stall-page-reclaim-after-a-list-of-pages-have-been-processed.patch
mm-vmscan-take-page-buffers-dirty-and-locked-state-into-account.patch
mm-vmscan-stall-page-reclaim-and-writeback-pages-based-on-dirty-writepage-pages-encountered.patch

with patches 2-8 of this series. The fixup patch

mm-vmscan-block-kswapd-if-it-is-encountering-pages-under-writeback-fix-2.patch

is still the same. Sorry for the inconvenience.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/8] Reduce system disruption due to kswapd followup V3
tldr; Overall the system is getting less kicked in the face. Scan rates
between zones are often more balanced than they used to be. There are
now fewer writes from reclaim context and a reduction in IO wait times.

This series replaces all of the previous follow-up series. It was clear
that more of the stall logic needed to be in the same place so it is
comprehensible and easier to predict.

Changelog since V2
o Consolidate stall decisions into one place
o Add is_dirty_writeback for NFS
o Move accounting around

Further testing of the "Reduce system disruption due to kswapd" series
discovered a few problems. First and foremost, it's possible for pages
under writeback to be freed, which will lead to badness. Second, as pages
were not being swapped, the file LRU was being scanned faster and clean
file pages were being reclaimed. In some cases this resulted in increased
read IO to re-read data from disk. Third, more pages were being written
from kswapd context, which can adversely affect IO performance. Lastly,
it was observed that PageDirty pages are not necessarily dirty on all
filesystems (buffers can be clean while PageDirty is set and ->writepage
generates no IO) and not all filesystems set PageWriteback when the page
is being written (e.g. ext3). This disconnect confuses the reclaim
stalling logic. This follow-up series is aimed at these problems.

The tests were based on three kernels:

vanilla:           kernel 3.9, as that is what the current mmotm uses as
                   a baseline
mmotm-20130522:    mmotm as of 22nd May with "Reduce system disruption
                   due to kswapd" applied on top, as per what should be
                   in Andrew's tree right now
lessdisrupt-v7r10: this follow-up series on top of the mmotm kernel

The first test used memcached+memcachetest while some background IO was
in progress, as implemented by the parallel IO tests in MM Tests.
memcachetest benchmarks how many operations/second memcached can service.
It starts with no background IO on a freshly created ext4 filesystem and
then re-runs the test with larger amounts of IO in the background to
roughly simulate a large copy in progress. The expectation is that the IO
should have little or no impact on memcachetest, which is running
entirely in memory.

parallelio                             3.9.0                 3.9.0                 3.9.0
                                     vanilla    mm1-mmotm-20130522 mm1-lessdisrupt-v7r10
Ops memcachetest-0M        23117.00 (  0.00%)    22780.00 ( -1.46%)    22763.00 ( -1.53%)
Ops memcachetest-715M      23774.00 (  0.00%)    23299.00 ( -2.00%)    22934.00 ( -3.53%)
Ops memcachetest-2385M      4208.00 (  0.00%)    24154.00 (474.00%)    23765.00 (464.76%)
Ops memcachetest-4055M      4104.00 (  0.00%)    25130.00 (512.33%)    24614.00 (499.76%)
Ops io-duration-0M             0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops io-duration-715M          12.00 (  0.00%)        7.00 ( 41.67%)        6.00 ( 50.00%)
Ops io-duration-2385M        116.00 (  0.00%)       21.00 ( 81.90%)       21.00 ( 81.90%)
Ops io-duration-4055M        160.00 (  0.00%)       36.00 ( 77.50%)       35.00 ( 78.12%)
Ops swaptotal-0M               0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops swaptotal-715M        140138.00 (  0.00%)       18.00 ( 99.99%)       18.00 ( 99.99%)
Ops swaptotal-2385M       385682.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops swaptotal-4055M       418029.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops swapin-0M                  0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops swapin-715M              144.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops swapin-2385M          134227.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops swapin-4055M          125618.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops minorfaults-0M       1536429.00 (  0.00%)  1531632.00 (  0.31%)  1533541.00 (  0.19%)
Ops minorfaults-715M     1786996.00 (  0.00%)  1612148.00 (  9.78%)  1608832.00 (  9.97%)
Ops minorfaults-2385M    1757952.00 (  0.00%)  1614874.00 (  8.14%)  1613541.00 (  8.21%)
Ops minorfaults-4055M    1774460.00 (  0.00%)  1633400.00 (  7.95%)  1630881.00 (  8.09%)
Ops majorfaults-0M             1.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
Ops majorfaults-715M         184.00 (  0.00%)      167.00 (  9.24%)      166.00 (  9.78%)
Ops majorfaults-2385M          2.00 (  0.00%)      155.00 ( 99.37%)       93.00 ( 99.62%)