Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-24 Thread David Rientjes
On Tue, 24 Jul 2018, Kirill A. Shutemov wrote:

> > use_zero_page is currently a simple thp flag, meaning it rejects writes 
> > where val != !!val, so perhaps it would be best to overload it with 
> > additional options?  I can imagine 0x2 defining persistent allocation so 
> > that the hzp is not freed when the refcount goes to 0 and 0x4 defining if 
> > the hzp should be per node.  Implementing persistent allocation fixes our 
> > concern with it, so I'd like to start there.  Comments?
> 
> Why not separate files?
> 

That works as well.  I'll write a patch for persistent allocation first to 
address our most immediate need.


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-24 Thread Kirill A. Shutemov
On Mon, Jul 23, 2018 at 02:33:08PM -0700, David Rientjes wrote:
> On Mon, 23 Jul 2018, David Rientjes wrote:
> 
> > > > The huge zero page can be reclaimed under memory pressure and, if it
> > > > is, an attempt is made to allocate it again with gfp flags that attempt
> > > > memory compaction, which can become expensive.  If we are constantly
> > > > under memory pressure, it gets freed and reallocated millions of times,
> > > > always trying to compact memory both directly and by kicking kcompactd
> > > > in the background.
> > > > 
> > > > It likely should also be per node.
> > > 
> > > Have you benchmarked making the non-huge zero page per-node?
> > > 
> > 
> > Not since we disable it :)  I will, though.  The more concerning issue for 
> > us, modulo CVE-2017-1000405, is the cpu cost of constantly directly 
> > compacting memory for allocating the hzp in real time after it has been 
> > reclaimed.  We've observed this happening tens or hundreds of thousands 
> > of times on some systems.  It will be 2MB per node on x86 if the data 
> > suggests we should make it NUMA aware; I don't think the cost is too high 
> > to leave it persistently available even under memory pressure if 
> > use_zero_page is enabled.
> > 
> 
> Measuring access latency to 4GB of memory on Naples I observe ~6.7% 
> slower access latency intrasocket and ~14% slower intersocket.
> 
> use_zero_page is currently a simple thp flag, meaning it rejects writes 
> where val != !!val, so perhaps it would be best to overload it with 
> additional options?  I can imagine 0x2 defining persistent allocation so 
> that the hzp is not freed when the refcount goes to 0 and 0x4 defining if 
> the hzp should be per node.  Implementing persistent allocation fixes our 
> concern with it, so I'd like to start there.  Comments?

Why not separate files?

-- 
 Kirill A. Shutemov


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes
On Mon, 23 Jul 2018, Yang Shi wrote:

> > > I agree with keeping it for a while to let that security bug cool down;
> > > however, if there are no users anymore, it seems pointless to keep a
> > > dead knob around.
> > > 
> > It's not a dead knob.  We use it, and for reasons other than
> > CVE-2017-1000405.  To mitigate the cost of constantly compacting memory to
> > allocate it after it has been freed due to memory pressure, we can either
> > continue to disable it, allow it to be persistently available, or use a
> > new value for use_zero_page to specify it should be persistently
> > available.
> 
> My understanding is that the cost of memory compaction is *not* unique to
> the huge zero page, right? It is expected whenever we are under memory
> pressure, even with the huge zero page disabled.
> 

It's caused by fragmentation, not necessarily memory pressure.  We've 
disabled it because compacting for tens of thousands of huge zero pages in 
the background has a noticeable impact on cpu.  Additionally, if the hzp 
cannot be allocated at runtime it increases the rss of applications that 
map it, making it unpredictable.  Making it persistent, as I've been 
suggesting, fixes these issues.
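
For reference, the "persistent" behaviour described above amounts to the
shrinker simply declining to hand the page back.  A minimal userspace sketch
of that idea follows; the flag and function names are illustrative
assumptions, not code from mm/huge_memory.c:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static void *hzp;                       /* the lazily allocated 2MB page */
static int hzp_users;                   /* users currently mapping it */
static bool hzp_persistent = true;      /* proposed: keep it once allocated */

static void *get_huge_zero_page(void)
{
        if (!hzp)
                hzp = calloc(1, 2u << 20);  /* the compaction-heavy step */
        hzp_users++;
        return hzp;
}

static void put_huge_zero_page(void)
{
        hzp_users--;
}

/* Shrinker callback: with persistence enabled the page is never handed
 * back, so a later fault never re-enters direct compaction and the rss
 * of applications mapping it stays predictable. */
static void shrink_huge_zero_page(void)
{
        if (hzp_persistent || hzp_users > 0)
                return;
        free(hzp);
        hzp = NULL;
}

int main(void)
{
        void *before = get_huge_zero_page();

        put_huge_zero_page();
        shrink_huge_zero_page();        /* no-op while hzp_persistent is set */
        printf("same page after memory pressure: %s\n",
               get_huge_zero_page() == before ? "yes" : "no");
        return 0;
}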


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread Yang Shi




On 7/23/18 1:31 PM, David Rientjes wrote:

On Fri, 20 Jul 2018, Yang Shi wrote:


I agree with keeping it for a while to let that security bug cool down; however,
if there are no users anymore, it seems pointless to keep a dead knob around.


It's not a dead knob.  We use it, and for reasons other than
CVE-2017-1000405.  To mitigate the cost of constantly compacting memory to
allocate it after it has been freed due to memory pressure, we can either
continue to disable it, allow it to be persistently available, or use a
new value for use_zero_page to specify it should be persistently
available.


My understanding is that the cost of memory compaction is *not* unique to 
the huge zero page, right? It is expected whenever we are under memory 
pressure, even with the huge zero page disabled.




Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread Yang Shi




On 7/23/18 2:33 PM, David Rientjes wrote:

On Mon, 23 Jul 2018, David Rientjes wrote:


The huge zero page can be reclaimed under memory pressure and, if it is,
an attempt is made to allocate it again with gfp flags that attempt memory
compaction, which can become expensive.  If we are constantly under memory
pressure, it gets freed and reallocated millions of times, always trying to
compact memory both directly and by kicking kcompactd in the background.

It likely should also be per node.

Have you benchmarked making the non-huge zero page per-node?


Not since we disable it :)  I will, though.  The more concerning issue for
us, modulo CVE-2017-1000405, is the cpu cost of constantly directly
compacting memory for allocating the hzp in real time after it has been
reclaimed.  We've observed this happening tens or hundreds of thousands
of times on some systems.  It will be 2MB per node on x86 if the data
suggests we should make it NUMA aware; I don't think the cost is too high
to leave it persistently available even under memory pressure if
use_zero_page is enabled.


Measuring access latency to 4GB of memory on Naples I observe ~6.7%
slower access latency intrasocket and ~14% slower intersocket.

use_zero_page is currently a simple thp flag, meaning it rejects writes
where val != !!val, so perhaps it would be best to overload it with
additional options?  I can imagine 0x2 defining persistent allocation so
that the hzp is not freed when the refcount goes to 0 and 0x4 defining if
the hzp should be per node.  Implementing persistent allocation fixes our
concern with it, so I'd like to start there.  Comments?


Sounds worth trying to me :-)  It might be worth making it persistent by 
default.  Keeping 2MB of memory unreclaimable does not sound harmful for 
use cases that prefer THP.





Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes
On Mon, 23 Jul 2018, David Rientjes wrote:

> > > The huge zero page can be reclaimed under memory pressure and, if it is,
> > > an attempt is made to allocate it again with gfp flags that attempt memory
> > > compaction, which can become expensive.  If we are constantly under memory
> > > pressure, it gets freed and reallocated millions of times, always trying to
> > > compact memory both directly and by kicking kcompactd in the background.
> > > 
> > > It likely should also be per node.
> > 
> > Have you benchmarked making the non-huge zero page per-node?
> > 
> 
> Not since we disable it :)  I will, though.  The more concerning issue for 
> us, modulo CVE-2017-1000405, is the cpu cost of constantly directly 
> compacting memory for allocating the hzp in real time after it has been 
> reclaimed.  We've observed this happening tens or hundreds of thousands 
> of times on some systems.  It will be 2MB per node on x86 if the data 
> suggests we should make it NUMA aware; I don't think the cost is too high 
> to leave it persistently available even under memory pressure if 
> use_zero_page is enabled.
> 

Measuring access latency to 4GB of memory on Naples I observe ~6.7% 
slower access latency intrasocket and ~14% slower intersocket.

use_zero_page is currently a simple thp flag, meaning it rejects writes 
where val != !!val, so perhaps it would be best to overload it with 
additional options?  I can imagine 0x2 defining persistent allocation so 
that the hzp is not freed when the refcount goes to 0 and 0x4 defining if 
the hzp should be per node.  Implementing persistent allocation fixes our 
concern with it, so I'd like to start there.  Comments?
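
To make the bit layout sketched above concrete, here is a small userspace
model of how such an overloaded knob could validate writes; the bit names are
hypothetical (this was only a proposal), and today's handler effectively
reduces to the val != !!val check noted in the comment:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define USE_ZERO_PAGE           0x1     /* existing boolean behaviour */
#define ZERO_PAGE_PERSISTENT    0x2     /* proposed: never free once allocated */
#define ZERO_PAGE_PER_NODE      0x4     /* proposed: one hzp per NUMA node */

static unsigned long use_zero_page;

/* Models the sysfs store: reject anything outside the known bits. */
static int use_zero_page_store(const char *buf)
{
        char *end;
        unsigned long val = strtoul(buf, &end, 10);

        if (end == buf)
                return -EINVAL;
        /* The current handler is stricter: it rejects any val != !!val. */
        if (val & ~(USE_ZERO_PAGE | ZERO_PAGE_PERSISTENT | ZERO_PAGE_PER_NODE))
                return -EINVAL;

        use_zero_page = val;
        return 0;
}

int main(void)
{
        printf("write 3 -> %d\n", use_zero_page_store("3"));    /* enabled + persistent */
        printf("write 8 -> %d\n", use_zero_page_store("8"));    /* unknown bit: -EINVAL */
        printf("flags now: %#lx\n", use_zero_page);
        return 0;
}

The separate-files alternative raised in the thread would instead make each of
these its own boolean sysfs attribute rather than a multiplexed value.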


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes
On Fri, 20 Jul 2018, Yang Shi wrote:

> I agree with keeping it for a while to let that security bug cool down;
> however, if there are no users anymore, it seems pointless to keep a dead
> knob around.
> 

It's not a dead knob.  We use it, and for reasons other than 
CVE-2017-1000405.  To mitigate the cost of constantly compacting memory to 
allocate it after it has been freed due to memory pressure, we can either 
continue to disable it, allow it to be persistently available, or use a 
new value for use_zero_page to specify it should be persistently 
available.


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-23 Thread David Rientjes
On Sat, 21 Jul 2018, Matthew Wilcox wrote:

> > The huge zero page can be reclaimed under memory pressure and, if it is, an
> > attempt is made to allocate it again with gfp flags that attempt memory
> > compaction, which can become expensive.  If we are constantly under memory
> > pressure, it gets freed and reallocated millions of times, always trying to
> > compact memory both directly and by kicking kcompactd in the background.
> > 
> > It likely should also be per node.
> 
> Have you benchmarked making the non-huge zero page per-node?
> 

Not since we disable it :)  I will, though.  The more concerning issue for 
us, modulo CVE-2017-1000405, is the cpu cost of constantly directly 
compacting memory for allocating the hzp in real time after it has been 
reclaimed.  We've observed this happening tens or hundreds of thousands 
of times on some systems.  It will be 2MB per node on x86 if the data 
suggests we should make it NUMA aware; I don't think the cost is too high 
to leave it persistently available even under memory pressure if 
use_zero_page is enabled.
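
A per-node variant, likewise only sketched here as a hypothetical userspace
model (MAX_NUMNODES and the lookup are stand-ins for the kernel's NUMA APIs),
would trade that 2MB-per-node cost for mapping a node-local page on read
faults:

#include <stdio.h>
#include <stdlib.h>

#define MAX_NUMNODES    2               /* stand-in for the real node count */
#define HZP_SIZE        (2u << 20)      /* 2MB on x86 */

static void *huge_zero_pages[MAX_NUMNODES];

/* One zero page per node, allocated on first use, so a read fault can map
 * a node-local page instead of paying the remote-access penalty measured
 * elsewhere in the thread. */
static void *get_huge_zero_page_node(int nid)
{
        if (!huge_zero_pages[nid])
                huge_zero_pages[nid] = calloc(1, HZP_SIZE);
        return huge_zero_pages[nid];
}

int main(void)
{
        for (int nid = 0; nid < MAX_NUMNODES; nid++)
                printf("node %d hzp at %p\n", nid, get_huge_zero_page_node(nid));
        return 0;
}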


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-21 Thread Matthew Wilcox
On Fri, Jul 20, 2018 at 02:05:52PM -0700, David Rientjes wrote:
> The huge zero page can be reclaimed under memory pressure and, if it is, an
> attempt is made to allocate it again with gfp flags that attempt memory
> compaction, which can become expensive.  If we are constantly under memory
> pressure, it gets freed and reallocated millions of times, always trying to
> compact memory both directly and by kicking kcompactd in the background.
> 
> It likely should also be per node.

Have you benchmarked making the non-huge zero page per-node?


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread Yang Shi




On 7/20/18 2:06 PM, Kirill A. Shutemov wrote:

On Sat, Jul 21, 2018 at 02:13:50AM +0800, Yang Shi wrote:

Digging into the original review, it looks like the use_zero_page sysfs
knob was added to help ease of testing and to give users a way to
mitigate refcounting overhead.

It has been a few years since the knob was added in the first place, so
I think we are confident that it is stable enough.  And, since commit
6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
it looks like the refcounting overhead has been reduced significantly.

Other than the above, the value of the knob is always 1 (enabled by
default); I suppose very few people turn it off.

So it does not seem worth keeping this knob around.

I don't think that having the knob around is a huge maintenance burden.
And since it helped to work around a security bug relatively recently, I
would rather keep it.


I agree with keeping it for a while to let that security bug cool down; 
however, if there are no users anymore, it seems pointless to keep a 
dead knob around.








Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread Yang Shi




On 7/20/18 2:05 PM, David Rientjes wrote:

On Fri, 20 Jul 2018, Yang Shi wrote:


We disable the huge zero page through this interface; there were issues
related to the huge zero page shrinker (probably best to never free a
per-node huge zero page once allocated) and CVE-2017-1000405 for huge
dirty COW.

Thanks for the information. It looks like the CVE has been resolved by commit
a8f97366452ed491d13cf1e44241bc0b5740b1f0 ("mm, thp: Do not make page table
dirty unconditionally in touch_p[mu]d()"), which is in 4.15 already.


Users who run kernels earlier than 4.15 may choose to mitigate the CVE by
using this tunable.  It's not something we permanently need to have, but it
is likely too early to remove it.


Yes, it might be good to keep it around for a while.




What was the shrinker-related issue? I suppose it has been resolved, right?


The huge zero page can be reclaimed under memory pressure and, if it is,
an attempt is made to allocate it again with gfp flags that attempt memory
compaction, which can become expensive.  If we are constantly under memory
pressure, it gets freed and reallocated millions of times, always trying to
compact memory both directly and by kicking kcompactd in the background.


Even if we don't use the huge zero page, we may still run into a similar 
issue under memory pressure.  Disabling it just saves the cost of calling 
the huge zero page shrinker, and that cost does not sound high anyway.




It likely should also be per node.





Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread Kirill A. Shutemov
On Sat, Jul 21, 2018 at 02:13:50AM +0800, Yang Shi wrote:
> Digging into the original review, it looks like the use_zero_page sysfs
> knob was added to help ease of testing and to give users a way to
> mitigate refcounting overhead.
> 
> It has been a few years since the knob was added in the first place, so
> I think we are confident that it is stable enough.  And, since commit
> 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
> it looks like the refcounting overhead has been reduced significantly.
> 
> Other than the above, the value of the knob is always 1 (enabled by
> default); I suppose very few people turn it off.
> 
> So it does not seem worth keeping this knob around.

I don't think that having the knob around is a huge maintenance burden.
And since it helped to work around a security bug relatively recently, I
would rather keep it.

-- 
 Kirill A. Shutemov


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread David Rientjes
On Fri, 20 Jul 2018, Yang Shi wrote:

> > We disable the huge zero page through this interface; there were issues
> > related to the huge zero page shrinker (probably best to never free a
> > per-node huge zero page once allocated) and CVE-2017-1000405 for huge
> > dirty COW.
> 
> Thanks for the information. It looks like the CVE has been resolved by commit
> a8f97366452ed491d13cf1e44241bc0b5740b1f0 ("mm, thp: Do not make page table
> dirty unconditionally in touch_p[mu]d()"), which is in 4.15 already.
> 

Users who run kernels earlier than 4.15 may choose to mitigate the CVE by 
using this tunable.  It's not something we permanently need to have, but it 
is likely too early to remove it.

> What was the shrinker-related issue? I suppose it has been resolved, right?
> 

The huge zero page can be reclaimed under memory pressure and, if it is, 
an attempt is made to allocate it again with gfp flags that attempt memory 
compaction, which can become expensive.  If we are constantly under memory 
pressure, it gets freed and reallocated millions of times, always trying to 
compact memory both directly and by kicking kcompactd in the background.

It likely should also be per node.
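
The free/reallocate cycle described above can be modelled roughly as follows.
This is an illustrative userspace sketch: the refcount scheme is simplified,
the names are not the exact mm/huge_memory.c symbols, and calloc() stands in
for the 2MB allocation whose gfp mask permits direct compaction:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define HZP_SIZE (2u << 20)     /* 2MB on x86 */

static void *hzp;               /* lazily allocated huge zero page */
static atomic_int hzp_refcount; /* 0 = not allocated, 1 = allocated but unused */

static void *get_huge_zero_page(void)
{
        int old = atomic_load(&hzp_refcount);

        /* Fast path: already allocated, just take a reference. */
        while (old > 0) {
                if (atomic_compare_exchange_weak(&hzp_refcount, &old, old + 1))
                        return hzp;
        }

        /* Slow path: this is where the kernel pays for direct compaction
         * (and kicks kcompactd) when memory is fragmented. */
        hzp = calloc(1, HZP_SIZE);
        atomic_store(&hzp_refcount, 2); /* one for the page, one for the caller */
        return hzp;
}

static void put_huge_zero_page(void)
{
        atomic_fetch_sub(&hzp_refcount, 1);
}

/* Shrinker: under memory pressure, free the page if only the implicit
 * reference remains, forcing the next user to reallocate it. */
static void shrink_huge_zero_page(void)
{
        int one = 1;

        if (atomic_compare_exchange_strong(&hzp_refcount, &one, 0)) {
                free(hzp);
                hzp = NULL;
        }
}

int main(void)
{
        /* Constant memory pressure: every round frees and reallocates. */
        for (int i = 0; i < 5; i++) {
                void *p = get_huge_zero_page();

                printf("round %d: hzp at %p\n", i, p);
                put_huge_zero_page();
                shrink_huge_zero_page();
        }
        return 0;
}

Making the page persistent, as proposed elsewhere in the thread, amounts to
never letting that final compare-and-swap free the page.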


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread Yang Shi




On 7/20/18 1:02 PM, David Rientjes wrote:

On Fri, 20 Jul 2018, Andrew Morton wrote:


Digging into the original review, it looks like the use_zero_page sysfs
knob was added to help ease of testing and to give users a way to
mitigate refcounting overhead.

It has been a few years since the knob was added in the first place, so
I think we are confident that it is stable enough.  And, since commit
6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
it looks like the refcounting overhead has been reduced significantly.

Other than the above, the value of the knob is always 1 (enabled by
default); I suppose very few people turn it off.

So it does not seem worth keeping this knob around.

Probably OK.  Might not be OK, nobody knows.

It's been there for seven years so another six months won't kill us.
How about as an intermediate step we add a printk("use_zero_page is
scheduled for removal.  Please contact linux...@kvack.org if you need
it").


We disable the huge zero page through this interface; there were issues
related to the huge zero page shrinker (probably best to never free a
per-node huge zero page once allocated) and CVE-2017-1000405 for huge
dirty COW.


Thanks for the information. It looks like the CVE has been resolved by commit 
a8f97366452ed491d13cf1e44241bc0b5740b1f0 ("mm, thp: Do not make page 
table dirty unconditionally in touch_p[mu]d()"), which is in 4.15 already.


What was the shrinker-related issue? I suppose it has been resolved, 
right?





Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread David Rientjes
On Fri, 20 Jul 2018, Andrew Morton wrote:

> > Digging into the original review, it looks like the use_zero_page sysfs
> > knob was added to help ease of testing and to give users a way to
> > mitigate refcounting overhead.
> > 
> > It has been a few years since the knob was added in the first place, so
> > I think we are confident that it is stable enough.  And, since commit
> > 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
> > it looks like the refcounting overhead has been reduced significantly.
> > 
> > Other than the above, the value of the knob is always 1 (enabled by
> > default); I suppose very few people turn it off.
> > 
> > So it does not seem worth keeping this knob around.
> 
> Probably OK.  Might not be OK, nobody knows.
> 
> It's been there for seven years so another six months won't kill us. 
> How about as an intermediate step we add a printk("use_zero_page is
> scheduled for removal.  Please contact linux...@kvack.org if you need
> it").
> 

We disable the huge zero page through this interface; there were issues 
related to the huge zero page shrinker (probably best to never free a 
per-node huge zero page once allocated) and CVE-2017-1000405 for huge 
dirty COW.


Re: [PATCH] mm: thp: remove use_zero_page sysfs knob

2018-07-20 Thread Andrew Morton
On Sat, 21 Jul 2018 02:13:50 +0800 Yang Shi  wrote:

> Digging into the original review, it looks like the use_zero_page sysfs
> knob was added to help ease of testing and to give users a way to
> mitigate refcounting overhead.
> 
> It has been a few years since the knob was added in the first place, so
> I think we are confident that it is stable enough.  And, since commit
> 6fcb52a56ff60 ("thp: reduce usage of huge zero page's atomic counter"),
> it looks like the refcounting overhead has been reduced significantly.
> 
> Other than the above, the value of the knob is always 1 (enabled by
> default); I suppose very few people turn it off.
> 
> So it does not seem worth keeping this knob around.

Probably OK.  Might not be OK, nobody knows.

It's been there for seven years so another six months won't kill us. 
How about as an intermediate step we add a printk("use_zero_page is
scheduled for removal.  Please contact linux...@kvack.org if you need
it").

