Re: [PATCH 0/8] Reduce system disruption due to kswapd followup V3

2013-07-15 Thread Hush Bensen
On 2013/5/30 7:17, Mel Gorman wrote:
> tldr; Overall the system is getting less kicked in the face. Scan rates
>   between zones are often more balanced than they used to be. There are
>   now fewer writes from reclaim context and a reduction in IO wait
>   times.
>
> This series replaces all of the previous follow-up series. It was clear
> that more of the stall logic needed to be in the same place so it is
> comprehensible and easier to predict.
>
> Changelog since V2
> o Consolidate stall decisions into one place
> o Add is_dirty_writeback for NFS
> o Move accounting around
>
> Further testing of the "Reduce system disruption due to kswapd" series
> discovered a few problems. First and foremost, it's possible for pages under
> writeback to be freed which will lead to badness. Second, as pages were not
> being swapped the file LRU was being scanned faster and clean file pages were
> being reclaimed. In some cases this results in increased read IO to re-read
> data from disk. Third, more pages were being written from kswapd context
> which can adversely affect IO performance. Lastly, it was observed that
> PageDirty pages are not necessarily dirty on all filesystems (buffers can be
> clean while PageDirty is set and ->writepage generates no IO) and not all
> filesystems set PageWriteback when the page is being written (e.g. ext3).
> This disconnect confuses the reclaim stalling logic. This follow-up series
> is aimed at these problems.
>
> The tests were based on three kernels
>
> vanilla:          kernel 3.9 as that is what the current mmotm uses as a baseline
> mmotm-20130522    is mmotm as of 22nd May with "Reduce system disruption due to
>                   kswapd" applied on top as per what should be in Andrew's tree
>                   right now
> lessdisrupt-v7r10 is this follow-up series on top of the mmotm kernel
>
> The first test used memcached+memcachetest while some background IO
> was in progress, as implemented by the parallel IO tests in MM
> Tests. memcachetest benchmarks how many operations/second memcached
> can service. It starts with no background IO on a freshly created ext4
> filesystem and then re-runs the test with larger amounts of IO in the
> background to roughly simulate a large copy in progress. The expectation
> is that the IO should have little or no impact on memcachetest which is
> running entirely in memory.
>
> parallelio
>                                     3.9.0                3.9.0                 3.9.0
>                                   vanilla   mm1-mmotm-20130522 mm1-lessdisrupt-v7r10
> Ops memcachetest-0M      23117.00 (  0.00%)   22780.00 ( -1.46%)   22763.00 ( -1.53%)
> Ops memcachetest-715M    23774.00 (  0.00%)   23299.00 ( -2.00%)   22934.00 ( -3.53%)
> Ops memcachetest-2385M    4208.00 (  0.00%)   24154.00 (474.00%)   23765.00 (464.76%)
> Ops memcachetest-4055M    4104.00 (  0.00%)   25130.00 (512.33%)   24614.00 (499.76%)
> Ops io-duration-0M           0.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops io-duration-715M        12.00 (  0.00%)       7.00 ( 41.67%)       6.00 ( 50.00%)
> Ops io-duration-2385M      116.00 (  0.00%)      21.00 ( 81.90%)      21.00 ( 81.90%)
> Ops io-duration-4055M      160.00 (  0.00%)      36.00 ( 77.50%)      35.00 ( 78.12%)
> Ops swaptotal-0M             0.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops swaptotal-715M      140138.00 (  0.00%)      18.00 ( 99.99%)      18.00 ( 99.99%)
> Ops swaptotal-2385M     385682.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops swaptotal-4055M     418029.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops swapin-0M                0.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops swapin-715M            144.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops swapin-2385M        134227.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops swapin-4055M        125618.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops minorfaults-0M     1536429.00 (  0.00%) 1531632.00 (  0.31%) 1533541.00 (  0.19%)
> Ops minorfaults-715M   1786996.00 (  0.00%) 1612148.00 (  9.78%) 1608832.00 (  9.97%)
> Ops minorfaults-2385M  1757952.00 (  0.00%) 1614874.00 (  8.14%) 1613541.00 (  8.21%)
> Ops minorfaults-4055M  1774460.00 (  0.00%) 1633400.00 (  7.95%) 1630881.00 (  8.09%)
> Ops majorfaults-0M           1.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
> Ops 

Re: [PATCH 0/8] Reduce system disruption due to kswapd followup V3

2013-05-30 Thread Mel Gorman
On Thu, May 30, 2013 at 12:17:29AM +0100, Mel Gorman wrote:
> tldr; Overall the system is getting less kicked in the face. Scan rates
>   between zones are often more balanced than they used to be. There are
>   now fewer writes from reclaim context and a reduction in IO wait
>   times.
> 
> This series replaces all of the previous follow-up series. It was clear
> that more of the stall logic needed to be in the same place so it is
> comprehensible and easier to predict.
> 

There was some unfortunate crossover in timing as I see mmotm has pulled
in the previous follow-up series. It would probably be easiest to replace
these patches:

mm-vmscan-stall-page-reclaim-and-writeback-pages-based-on-dirty-writepage-pages-encountered.patch
mm-vmscan-stall-page-reclaim-after-a-list-of-pages-have-been-processed.patch
mm-vmscan-take-page-buffers-dirty-and-locked-state-into-account.patch
mm-vmscan-stall-page-reclaim-and-writeback-pages-based-on-dirty-writepage-pages-encountered.patch

with patches 2-8 of this series. The fixup patch
mm-vmscan-block-kswapd-if-it-is-encountering-pages-under-writeback-fix-2.patch
is still the same.

Sorry for the inconvenience.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/8] Reduce system disruption due to kswapd followup V3

2013-05-29 Thread Mel Gorman
tldr; Overall the system is getting less kicked in the face. Scan rates
between zones are often more balanced than they used to be. There are
now fewer writes from reclaim context and a reduction in IO wait
times.

This series replaces all of the previous follow-up series. It was clear
that more of the stall logic needed to be in the same place so it is
comprehensible and easier to predict.

Changelog since V2
o Consolidate stall decisions into one place
o Add is_dirty_writeback for NFS
o Move accounting around

Further testing of the "Reduce system disruption due to kswapd" series
discovered a few problems. First and foremost, it's possible for pages under
writeback to be freed which will lead to badness. Second, as pages were not
being swapped the file LRU was being scanned faster and clean file pages were
being reclaimed. In some cases this results in increased read IO to re-read
data from disk. Third, more pages were being written from kswapd context
which can adversely affect IO performance. Lastly, it was observed that
PageDirty pages are not necessarily dirty on all filesystems (buffers can be
clean while PageDirty is set and ->writepage generates no IO) and not all
filesystems set PageWriteback when the page is being written (e.g. ext3).
This disconnect confuses the reclaim stalling logic. This follow-up series
is aimed at these problems.
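
To illustrate the idea behind the is_dirty_writeback hook mentioned in the
changelog, here is a minimal userspace sketch. It is not the kernel's actual
code and every struct, field and function name below is illustrative: when a
filesystem can report a page's real dirty/writeback state, reclaim uses that
answer and only falls back to the generic page flags otherwise.

#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model of the idea: a page carries generic dirty/writeback
 * flags, and a filesystem may optionally provide a callback that
 * reports the page's real state (e.g. buffers already clean even
 * though the dirty flag is set).
 */
struct page_state {
        bool page_dirty;        /* PageDirty-style flag */
        bool page_writeback;    /* PageWriteback-style flag */
        /* Optional per-filesystem reporter; NULL means "trust the flags". */
        void (*is_dirty_writeback)(const struct page_state *, bool *, bool *);
};

/* Reclaim-side check: prefer the filesystem's answer when available. */
static void check_dirty_writeback(const struct page_state *p,
                                  bool *dirty, bool *writeback)
{
        if (p->is_dirty_writeback) {
                p->is_dirty_writeback(p, dirty, writeback);
                return;
        }
        *dirty = p->page_dirty;
        *writeback = p->page_writeback;
}

/* ext3-like behaviour: the dirty flag is set but the buffers are clean. */
static void buffers_clean_cb(const struct page_state *p, bool *dirty, bool *wb)
{
        *dirty = false;
        *wb = p->page_writeback;
}

int main(void)
{
        struct page_state generic = { .page_dirty = true };
        struct page_state buffers_clean = {
                .page_dirty = true,
                .is_dirty_writeback = buffers_clean_cb,
        };
        bool d, w;

        check_dirty_writeback(&generic, &d, &w);
        printf("generic fs:       dirty=%d writeback=%d\n", d, w);

        check_dirty_writeback(&buffers_clean, &d, &w);
        printf("buffers-clean fs: dirty=%d writeback=%d\n", d, w);
        return 0;
}

With a callback like this, a page whose buffers are already clean no longer
looks dirty to the stall logic, so reclaim does not stall on it or try to
write it again.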

The tests were based on three kernels

vanilla:          kernel 3.9 as that is what the current mmotm uses as a baseline
mmotm-20130522    is mmotm as of 22nd May with "Reduce system disruption due to
                  kswapd" applied on top as per what should be in Andrew's tree
                  right now
lessdisrupt-v7r10 is this follow-up series on top of the mmotm kernel

The first test used memcached+memcachetest while some background IO
was in progress, as implemented by the parallel IO tests in MM
Tests. memcachetest benchmarks how many operations/second memcached
can service. It starts with no background IO on a freshly created ext4
filesystem and then re-runs the test with larger amounts of IO in the
background to roughly simulate a large copy in progress. The expectation
is that the IO should have little or no impact on memcachetest which is
running entirely in memory.
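
For a rough picture of the test shape only (this is not MM Tests' actual
driver; the file path, sizes and constants below are made up for
illustration), the following sketch times a fixed number of in-memory "cache"
operations while a forked child streams background writes to a file, and
reports operations/second:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define OPS         (50UL * 1000 * 1000)  /* operations to time */
#define VALUE_SZ    4096
#define BG_WRITE_MB 715L                  /* 0, 715, 2385 or 4055 as in the table */

/* Child: stream BG_WRITE_MB megabytes to a file to generate background IO. */
static void background_writer(const char *path, long megabytes)
{
        static char buf[1 << 20];
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        memset(buf, 'x', sizeof(buf));
        for (long i = 0; fd >= 0 && i < megabytes; i++)
                if (write(fd, buf, sizeof(buf)) < 0)
                        break;
        if (fd >= 0)
                close(fd);
}

int main(void)
{
        static char store[1024][VALUE_SZ];  /* stand-in for the in-memory cache */
        struct timespec start, end;
        unsigned long sum = 0;
        pid_t bg = fork();

        if (bg == 0) {
                background_writer("/tmp/parallelio.dat", BG_WRITE_MB);
                _exit(0);
        }

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (unsigned long i = 0; i < OPS; i++) {
                store[i % 1024][i % VALUE_SZ] = (char)i;           /* "set" */
                sum += store[(i * 7) % 1024][(i * 13) % VALUE_SZ]; /* "get" */
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double secs = (end.tv_sec - start.tv_sec) +
                      (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("ops/sec: %.0f (checksum %lu)\n", OPS / secs, sum);

        if (bg > 0)
                waitpid(bg, NULL, 0);
        return 0;
}

If the in-memory work is unaffected by the background writer, the reported
ops/sec should stay roughly flat as the background write size grows, which is
the expectation the table below tests.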

parallelio
                                    3.9.0                3.9.0                 3.9.0
                                  vanilla   mm1-mmotm-20130522 mm1-lessdisrupt-v7r10
Ops memcachetest-0M      23117.00 (  0.00%)   22780.00 ( -1.46%)   22763.00 ( -1.53%)
Ops memcachetest-715M    23774.00 (  0.00%)   23299.00 ( -2.00%)   22934.00 ( -3.53%)
Ops memcachetest-2385M    4208.00 (  0.00%)   24154.00 (474.00%)   23765.00 (464.76%)
Ops memcachetest-4055M    4104.00 (  0.00%)   25130.00 (512.33%)   24614.00 (499.76%)
Ops io-duration-0M           0.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops io-duration-715M        12.00 (  0.00%)       7.00 ( 41.67%)       6.00 ( 50.00%)
Ops io-duration-2385M      116.00 (  0.00%)      21.00 ( 81.90%)      21.00 ( 81.90%)
Ops io-duration-4055M      160.00 (  0.00%)      36.00 ( 77.50%)      35.00 ( 78.12%)
Ops swaptotal-0M             0.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops swaptotal-715M      140138.00 (  0.00%)      18.00 ( 99.99%)      18.00 ( 99.99%)
Ops swaptotal-2385M     385682.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops swaptotal-4055M     418029.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-0M                0.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-715M            144.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-2385M        134227.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops swapin-4055M        125618.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops minorfaults-0M     1536429.00 (  0.00%) 1531632.00 (  0.31%) 1533541.00 (  0.19%)
Ops minorfaults-715M   1786996.00 (  0.00%) 1612148.00 (  9.78%) 1608832.00 (  9.97%)
Ops minorfaults-2385M  1757952.00 (  0.00%) 1614874.00 (  8.14%) 1613541.00 (  8.21%)
Ops minorfaults-4055M  1774460.00 (  0.00%) 1633400.00 (  7.95%) 1630881.00 (  8.09%)
Ops majorfaults-0M           1.00 (  0.00%)       0.00 (  0.00%)       0.00 (  0.00%)
Ops majorfaults-715M       184.00 (  0.00%)     167.00 (  9.24%)     166.00 (  9.78%)
Ops majorfaults-2385M        2.00 (  0.00%)     155.00 ( 99.37%)      93.00 ( 99.62%)
Ops
