Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-21 Thread Michal Hocko
It seems that this email didn't get delivered due to some stupid gmail
spam policy. Let me try to repost via a different relay. Sorry to those
who have seen the original message and get a duplicate now.

On Wed 21-12-16 08:03:53, Michal Hocko wrote:
> On Tue 20-12-16 14:13:41, Andrew Morton wrote:
> > On Tue, 20 Dec 2016 09:38:22 -0800 Joe Perches  wrote:
> > 
> > > > So what are we going to do about this patch?
> > > 
> > > Well if Andrew doesn't object again, it should probably be applied.
> > > Unless his silence here acts like a pocket-veto.
> > > 
> > > Andrew?  Anything to add?
> > 
> > I guess we should give in to reality and do this, or something like it.
> > But Al said he was going to dig out some patches for us to consider?
> 
> Al wanted to cover vmalloc GFP_NOFS context _inside_ the vmalloc
> code.  This is mostly orthogonal to this patch I believe. Besides
> that I _think_ that it would be better to convert those vmalloc(NOFS)
> users to the scope api rather than tweak the vmalloc. One reason to go
> that way is that those vmalloc(NOFS) users need to be checked anyway
> and something tells me that some of them can really be changed to
> GFP_KERNEL.
> 
> This helper is clear about its gfp mask expectation and complain loudly
> if somebody wants something unexpected which is a good start I believe.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-21 Thread Michal Hocko
It seems that this email didn't get delivered due to some stupid gmail
spam policy. Let me try to repost via a different relay. Sorry to those
who have seen the original message and get a duplicate now.

On Wed 21-12-16 08:03:53, Michal Hocko wrote:
> On Tue 20-12-16 14:13:41, Andrew Morton wrote:
> > On Tue, 20 Dec 2016 09:38:22 -0800 Joe Perches  wrote:
> > 
> > > > So what are we going to do about this patch?
> > > 
> > > Well if Andrew doesn't object again, it should probably be applied.
> > > Unless his silence here acts like a pocket-veto.
> > > 
> > > Andrew?  Anything to add?
> > 
> > I guess we should give in to reality and do this, or something like it.
> > But Al said he was going to dig out some patches for us to consider?
> 
> Al wanted to cover vmalloc GFP_NOFS context _inside_ the vmalloc
> code.  This is mostly orthogonal to this patch I believe. Besides
> that I _think_ that it would be better to convert those vmalloc(NOFS)
> users to the scope api rather than tweak the vmalloc. One reason to go
> that way is that those vmalloc(NOFS) users need to be checked anyway
> and something tells me that some of them can really be changed to
> GFP_KERNEL.
> 
> This helper is clear about its gfp mask expectation and complain loudly
> if somebody wants something unexpected which is a good start I believe.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-20 Thread Andrew Morton
On Tue, 20 Dec 2016 09:38:22 -0800 Joe Perches  wrote:

> > So what are we going to do about this patch?
> 
> Well if Andrew doesn't object again, it should probably be applied.
> Unless his silence here acts like a pocket-veto.
> 
> Andrew?  Anything to add?

I guess we should give in to reality and do this, or something like it.
But Al said he was going to dig out some patches for us to consider?




Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-20 Thread Andrew Morton
On Tue, 20 Dec 2016 09:38:22 -0800 Joe Perches  wrote:

> > So what are we going to do about this patch?
> 
> Well if Andrew doesn't object again, it should probably be applied.
> Unless his silence here acts like a pocket-veto.
> 
> Andrew?  Anything to add?

I guess we should give in to reality and do this, or something like it.
But Al said he was going to dig out some patches for us to consider?




Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-20 Thread Joe Perches
On Tue, 2016-12-20 at 14:50 +0100, Michal Hocko wrote:
> On Wed 14-12-16 09:59:16, Michal Hocko wrote:
> > On Tue 13-12-16 14:07:33, Joe Perches wrote:
> > > On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> > > > Are there any more comments or objections to this patch? Is this a good
> > > > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > > > well in the initial version.
> > > 
> > > Did Andrew Morton ever comment on this?
> > > I believe he was the primary objector in the past.
> > > 
> > > Last I recollect was over a year ago:
> > > 
> > > https://lkml.org/lkml/2015/7/7/1050
> > 
> > Let me quote:
> > : Sigh.  We've resisted doing this because vmalloc() is somewhat of a bad
> > : thing, and we don't want to make it easy for people to do bad things.
> > : 
> > : And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL
> > : allocations for page tables and c) it is susceptible to arena
> > : fragmentation.
> > : 
> > : We'd prefer that people fix their junk so it doesn't depend upon large
> > : contiguous allocations.  This isn't userspace - kernel space is hostile
> > : and kernel code should be robust.
> > : 
> > : So I dunno.  Should we continue to make it a bit more awkward to use
> > : vmalloc()?  Probably that tactic isn't being very successful - people
> > : will just go ahead and open-code it.  And given the surprising amount
> > : of stuff you've placed in kvmalloc_node(), they'll implement it
> > : incorrectly...
> > : 
> > : How about we compromise: add kvmalloc_node(), but include a BUG_ON("you
> > : suck") to it?
> > 
> > While I agree with some of those points, the reality really sucks,
> > though. We have tried the same tactic with __GFP_NOFAIL and failed as
> > well. I guess we should just bite the bullet and provide an api which is
> > so common that people keep reinventing their own ways around that, many
> > times wrongly or suboptimally. BUG_ON("you suck") is just not going to
> > help much I am afraid.
> > 
> > What do you think Andrew?
> 
> So what are we going to do about this patch?

Well if Andrew doesn't object again, it should probably be applied.
Unless his silence here acts like a pocket-veto.

Andrew?  Anything to add?



Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-20 Thread Joe Perches
On Tue, 2016-12-20 at 14:50 +0100, Michal Hocko wrote:
> On Wed 14-12-16 09:59:16, Michal Hocko wrote:
> > On Tue 13-12-16 14:07:33, Joe Perches wrote:
> > > On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> > > > Are there any more comments or objections to this patch? Is this a good
> > > > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > > > well in the initial version.
> > > 
> > > Did Andrew Morton ever comment on this?
> > > I believe he was the primary objector in the past.
> > > 
> > > Last I recollect was over a year ago:
> > > 
> > > https://lkml.org/lkml/2015/7/7/1050
> > 
> > Let me quote:
> > : Sigh.  We've resisted doing this because vmalloc() is somewhat of a bad
> > : thing, and we don't want to make it easy for people to do bad things.
> > : 
> > : And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL
> > : allocations for page tables and c) it is susceptible to arena
> > : fragmentation.
> > : 
> > : We'd prefer that people fix their junk so it doesn't depend upon large
> > : contiguous allocations.  This isn't userspace - kernel space is hostile
> > : and kernel code should be robust.
> > : 
> > : So I dunno.  Should we continue to make it a bit more awkward to use
> > : vmalloc()?  Probably that tactic isn't being very successful - people
> > : will just go ahead and open-code it.  And given the surprising amount
> > : of stuff you've placed in kvmalloc_node(), they'll implement it
> > : incorrectly...
> > : 
> > : How about we compromise: add kvmalloc_node(), but include a BUG_ON("you
> > : suck") to it?
> > 
> > While I agree with some of those points, the reality really sucks,
> > though. We have tried the same tactic with __GFP_NOFAIL and failed as
> > well. I guess we should just bite the bullet and provide an api which is
> > so common that people keep reinventing their own ways around that, many
> > times wrongly or suboptimally. BUG_ON("you suck") is just not going to
> > help much I am afraid.
> > 
> > What do you think Andrew?
> 
> So what are we going to do about this patch?

Well if Andrew doesn't object again, it should probably be applied.
Unless his silence here acts like a pocket-veto.

Andrew?  Anything to add?



Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-20 Thread Michal Hocko
On Wed 14-12-16 09:59:16, Michal Hocko wrote:
> On Tue 13-12-16 14:07:33, Joe Perches wrote:
> > On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> > > Are there any more comments or objections to this patch? Is this a good
> > > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > > well in the initial version.
> > 
> > Did Andrew Morton ever comment on this?
> > I believe he was the primary objector in the past.
> > 
> > Last I recollect was over a year ago:
> > 
> > https://lkml.org/lkml/2015/7/7/1050
> 
> Let me quote:
> : Sigh.  We've resisted doing this because vmalloc() is somewhat of a bad
> : thing, and we don't want to make it easy for people to do bad things.
> : 
> : And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL
> : allocations for page tables and c) it is susceptible to arena
> : fragmentation.
> : 
> : We'd prefer that people fix their junk so it doesn't depend upon large
> : contiguous allocations.  This isn't userspace - kernel space is hostile
> : and kernel code should be robust.
> : 
> : So I dunno.  Should we continue to make it a bit more awkward to use
> : vmalloc()?  Probably that tactic isn't being very successful - people
> : will just go ahead and open-code it.  And given the surprising amount
> : of stuff you've placed in kvmalloc_node(), they'll implement it
> : incorrectly...
> : 
> : How about we compromise: add kvmalloc_node(), but include a BUG_ON("you
> : suck") to it?
> 
> While I agree with some of those points, the reality really sucks,
> though. We have tried the same tactic with __GFP_NOFAIL and failed as
> well. I guess we should just bite the bullet and provide an api which is
> so common that people keep reinventing their own ways around that, many
> times wrongly or suboptimally. BUG_ON("you suck") is just not going to
> help much I am afraid.
> 
> What do you think Andrew?

So what are we going to do about this patch?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-20 Thread Michal Hocko
On Wed 14-12-16 09:59:16, Michal Hocko wrote:
> On Tue 13-12-16 14:07:33, Joe Perches wrote:
> > On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> > > Are there any more comments or objections to this patch? Is this a good
> > > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > > well in the initial version.
> > 
> > Did Andrew Morton ever comment on this?
> > I believe he was the primary objector in the past.
> > 
> > Last I recollect was over a year ago:
> > 
> > https://lkml.org/lkml/2015/7/7/1050
> 
> Let me quote:
> : Sigh.  We've resisted doing this because vmalloc() is somewhat of a bad
> : thing, and we don't want to make it easy for people to do bad things.
> : 
> : And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL
> : allocations for page tables and c) it is susceptible to arena
> : fragmentation.
> : 
> : We'd prefer that people fix their junk so it doesn't depend upon large
> : contiguous allocations.  This isn't userspace - kernel space is hostile
> : and kernel code should be robust.
> : 
> : So I dunno.  Should we continue to make it a bit more awkward to use
> : vmalloc()?  Probably that tactic isn't being very successful - people
> : will just go ahead and open-code it.  And given the surprising amount
> : of stuff you've placed in kvmalloc_node(), they'll implement it
> : incorrectly...
> : 
> : How about we compromise: add kvmalloc_node(), but include a BUG_ON("you
> : suck") to it?
> 
> While I agree with some of those points, the reality really sucks,
> though. We have tried the same tactic with __GFP_NOFAIL and failed as
> well. I guess we should just bite the bullet and provide an api which is
> so common that people keep reinventing their own ways around that, many
> times wrongly or suboptimally. BUG_ON("you suck") is just not going to
> help much I am afraid.
> 
> What do you think Andrew?

So what are we going to do about this patch?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-14 Thread Michal Hocko
On Tue 13-12-16 13:55:46, Andreas Dilger wrote:
> On Dec 13, 2016, at 3:14 AM, Michal Hocko  wrote:
> > 
> > Are there any more comments or objections to this patch? Is this a good
> > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > well in the initial version.
> 
> I'm in favour of this cleanup as a starting point.  I definitely agree
> that this same functionality is in use in a number of places and should
> be consolidated.
> 
> The vmalloc() from GFP_NOFS can be addressed separately in later patches.
> That is an issue for several filesystems, and while XFS works around this,
> it would be better to lift that out of the filesystem code into the VM.

Well, my longer term plan is to change how GFP_NOFS is used from the fs
code rather than tweak the VM layer. The current situation with the nofs
is messy and confusing. In many contexts it is used without a good
reason - just to be sure that nothing will break. I strongly believe
that we should use a scope api [1] which marks whole regions of
potentially reclaim dangerous code paths and all the allocations within
that region will inherit the nofs protection automatically. That would
solve the vmalloc(GFP_NOFS) problem as well. The route to get there is
no short or easy. I am planning to repost the scope patchset hopefully
soon with ext4 converted.

[1] http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org

> Really, there are several of things about vmalloc() that could improve
> if we decided to move it out of the dog house and allow it to become a
> first class citizen, but that needs a larger discussion, and you can
> already do a lot of cleanup with just the introduction of kvmalloc().
> 
> Since this is changing the ext4 code, you can add my:
> 
> Reviewed-by: Andreas Dilger 

thanks!
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-14 Thread Michal Hocko
On Tue 13-12-16 13:55:46, Andreas Dilger wrote:
> On Dec 13, 2016, at 3:14 AM, Michal Hocko  wrote:
> > 
> > Are there any more comments or objections to this patch? Is this a good
> > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > well in the initial version.
> 
> I'm in favour of this cleanup as a starting point.  I definitely agree
> that this same functionality is in use in a number of places and should
> be consolidated.
> 
> The vmalloc() from GFP_NOFS can be addressed separately in later patches.
> That is an issue for several filesystems, and while XFS works around this,
> it would be better to lift that out of the filesystem code into the VM.

Well, my longer term plan is to change how GFP_NOFS is used from the fs
code rather than tweak the VM layer. The current situation with the nofs
is messy and confusing. In many contexts it is used without a good
reason - just to be sure that nothing will break. I strongly believe
that we should use a scope api [1] which marks whole regions of
potentially reclaim dangerous code paths and all the allocations within
that region will inherit the nofs protection automatically. That would
solve the vmalloc(GFP_NOFS) problem as well. The route to get there is
no short or easy. I am planning to repost the scope patchset hopefully
soon with ext4 converted.

[1] http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org

> Really, there are several of things about vmalloc() that could improve
> if we decided to move it out of the dog house and allow it to become a
> first class citizen, but that needs a larger discussion, and you can
> already do a lot of cleanup with just the introduction of kvmalloc().
> 
> Since this is changing the ext4 code, you can add my:
> 
> Reviewed-by: Andreas Dilger 

thanks!
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-14 Thread Michal Hocko
On Tue 13-12-16 14:07:33, Joe Perches wrote:
> On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> > Are there any more comments or objections to this patch? Is this a good
> > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > well in the initial version.
> 
> Did Andrew Morton ever comment on this?
> I believe he was the primary objector in the past.
> 
> Last I recollect was over a year ago:
> 
> https://lkml.org/lkml/2015/7/7/1050

Let me quote:
: Sigh.  We've resisted doing this because vmalloc() is somewhat of a bad
: thing, and we don't want to make it easy for people to do bad things.
: 
: And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL
: allocations for page tables and c) it is susceptible to arena
: fragmentation.
: 
: We'd prefer that people fix their junk so it doesn't depend upon large
: contiguous allocations.  This isn't userspace - kernel space is hostile
: and kernel code should be robust.
: 
: So I dunno.  Should we continue to make it a bit more awkward to use
: vmalloc()?  Probably that tactic isn't being very successful - people
: will just go ahead and open-code it.  And given the surprising amount
: of stuff you've placed in kvmalloc_node(), they'll implement it
: incorrectly...
: 
: How about we compromise: add kvmalloc_node(), but include a BUG_ON("you
: suck") to it?

While I agree with some of those points, the reality really sucks,
though. We have tried the same tactic with __GFP_NOFAIL and failed as
well. I guess we should just bite the bullet and provide an api which is
so common that people keep reinventing their own ways around that, many
times wrongly or suboptimally. BUG_ON("you suck") is just not going to
help much I am afraid.

What do you think Andrew?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-14 Thread Michal Hocko
On Tue 13-12-16 14:07:33, Joe Perches wrote:
> On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> > Are there any more comments or objections to this patch? Is this a good
> > start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> > well in the initial version.
> 
> Did Andrew Morton ever comment on this?
> I believe he was the primary objector in the past.
> 
> Last I recollect was over a year ago:
> 
> https://lkml.org/lkml/2015/7/7/1050

Let me quote:
: Sigh.  We've resisted doing this because vmalloc() is somewhat of a bad
: thing, and we don't want to make it easy for people to do bad things.
: 
: And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL
: allocations for page tables and c) it is susceptible to arena
: fragmentation.
: 
: We'd prefer that people fix their junk so it doesn't depend upon large
: contiguous allocations.  This isn't userspace - kernel space is hostile
: and kernel code should be robust.
: 
: So I dunno.  Should we continue to make it a bit more awkward to use
: vmalloc()?  Probably that tactic isn't being very successful - people
: will just go ahead and open-code it.  And given the surprising amount
: of stuff you've placed in kvmalloc_node(), they'll implement it
: incorrectly...
: 
: How about we compromise: add kvmalloc_node(), but include a BUG_ON("you
: suck") to it?

While I agree with some of those points, the reality really sucks,
though. We have tried the same tactic with __GFP_NOFAIL and failed as
well. I guess we should just bite the bullet and provide an api which is
so common that people keep reinventing their own ways around that, many
times wrongly or suboptimally. BUG_ON("you suck") is just not going to
help much I am afraid.

What do you think Andrew?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-13 Thread Joe Perches
On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> Are there any more comments or objections to this patch? Is this a good
> start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> well in the initial version.

Did Andrew Morton ever comment on this?
I believe he was the primary objector in the past.

Last I recollect was over a year ago:

https://lkml.org/lkml/2015/7/7/1050




Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-13 Thread Joe Perches
On Tue, 2016-12-13 at 11:14 +0100, Michal Hocko wrote:
> Are there any more comments or objections to this patch? Is this a good
> start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> well in the initial version.

Did Andrew Morton ever comment on this?
I believe he was the primary objector in the past.

Last I recollect was over a year ago:

https://lkml.org/lkml/2015/7/7/1050




Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-13 Thread Andreas Dilger
On Dec 13, 2016, at 3:14 AM, Michal Hocko  wrote:
> 
> Are there any more comments or objections to this patch? Is this a good
> start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> well in the initial version.

I'm in favour of this cleanup as a starting point.  I definitely agree
that this same functionality is in use in a number of places and should
be consolidated.

The vmalloc() from GFP_NOFS can be addressed separately in later patches.
That is an issue for several filesystems, and while XFS works around this,
it would be better to lift that out of the filesystem code into the VM.
Really, there are several of things about vmalloc() that could improve
if we decided to move it out of the dog house and allow it to become a
first class citizen, but that needs a larger discussion, and you can
already do a lot of cleanup with just the introduction of kvmalloc().

Since this is changing the ext4 code, you can add my:

Reviewed-by: Andreas Dilger 

Cheers, Andreas

> On Thu 08-12-16 11:33:00, Michal Hocko wrote:
>> From: Michal Hocko 
>> 
>> Using kmalloc with the vmalloc fallback for larger allocations is a
>> common pattern in the kernel code. Yet we do not have any common helper
>> for that and so users have invented their own helpers. Some of them are
>> really creative when doing so. Let's just add kv[mz]alloc and make sure
>> it is implemented properly. This implementation makes sure to not make
>> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
>> to not warn about allocation failures. This also rules out the OOM
>> killer as the vmalloc is a more approapriate fallback than a disruptive
>> user visible action.
>> 
>> This patch also changes some existing users and removes helpers which
>> are specific for them. In some cases this is not possible (e.g.
>> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
>> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
>> in general (note that the page table allocation is GFP_KERNEL). Those
>> need to be fixed separately.
>> 
>> apparmor has already claimed kv[mz]alloc so remove those and use
>> __aa_kvmalloc instead to prevent from the naming clashes.
>> 
>> Cc: Paolo Bonzini 
>> Cc: Mike Snitzer 
>> Cc: dm-de...@redhat.com
>> Cc: "Michael S. Tsirkin" 
>> Cc: "Theodore Ts'o" 
>> Cc: k...@vger.kernel.org
>> Cc: linux-e...@vger.kernel.org
>> Cc: linux-f2fs-de...@lists.sourceforge.net
>> Cc: linux-security-mod...@vger.kernel.org
>> Signed-off-by: Michal Hocko 
>> ---
>> 
>> Hi,
>> this has been brought up during [1] discussion. I think we are long
>> overdue with kvmalloc helpers provided by the core mm code. There are
>> so many users out there. This patch doesn't try to convert all existing
>> users. I have just tried to identified those who have invented their
>> own helpers. There are many others who are openconding that. This is
>> something for a coccinelle script to automate.
>> 
>> While looking into this I have encountered many (as noted in the
>> changelog) users who are broken. Especially GFP_NOFS users which might
>> go down the vmalloc path are worrying. Those need to be fixed but that
>> is out of scope of this patch. I have simply left them in the place.
>> A proper fix for them is to not use GFP_NOFS and rather move over to a
>> scope gfp_nofs api [2]. This will take quite some time though.
>> 
>> One thing I haven't considered in this patch - but I can if there is a
>> demand - is that the current callers of kv[mz]alloc cannot really
>> override GFP_NORETRY for larger requests. This flag is implicit. I can
>> imagine some users would rather prefer to retry hard before falling back
>> to vmalloc though. There doesn't seem to be any such user in the tree
>> right now AFAICS. vhost_kvzalloc used __GFP_REPEAT but git history
>> doesn't show any sign there would be a strong reason for that. I might
>> be wrong here. If that is the case then it is not a problem to do
>> 
>>  /*
>>   * Make sure that larger requests are not too disruptive as long
>>   * as the caller doesn't insist by giving __GFP_REPEAT. No OOM
>>   * killer and no allocation failure warnings as we have a fallback
>>   * is done by default.
>>   */
>>  if (size > PAGE_SZE) {
>>  kmalloc_flags |= __GFP_NOWARN;
>> 
>>  if (!(flags & __GFP_REPEAT))
>>  flags |= __GFP_NORETRY;
>>  }
>> 
>> [1] 
>> http://lkml.kernel.org/r/1480554981-195198-1-git-send-email-astepa...@cloudlinux.com
>> [2] 
>> http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org
>> 
>> arch/x86/kvm/lapic.c |  4 ++--
>> arch/x86/kvm/page_track.c|  4 ++--
>> arch/x86/kvm/x86.c   |  4 ++--
>> drivers/md/dm-stats.c|  7 +--
>> 

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-13 Thread Andreas Dilger
On Dec 13, 2016, at 3:14 AM, Michal Hocko  wrote:
> 
> Are there any more comments or objections to this patch? Is this a good
> start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
> well in the initial version.

I'm in favour of this cleanup as a starting point.  I definitely agree
that this same functionality is in use in a number of places and should
be consolidated.

The vmalloc() from GFP_NOFS can be addressed separately in later patches.
That is an issue for several filesystems, and while XFS works around this,
it would be better to lift that out of the filesystem code into the VM.
Really, there are several of things about vmalloc() that could improve
if we decided to move it out of the dog house and allow it to become a
first class citizen, but that needs a larger discussion, and you can
already do a lot of cleanup with just the introduction of kvmalloc().

Since this is changing the ext4 code, you can add my:

Reviewed-by: Andreas Dilger 

Cheers, Andreas

> On Thu 08-12-16 11:33:00, Michal Hocko wrote:
>> From: Michal Hocko 
>> 
>> Using kmalloc with the vmalloc fallback for larger allocations is a
>> common pattern in the kernel code. Yet we do not have any common helper
>> for that and so users have invented their own helpers. Some of them are
>> really creative when doing so. Let's just add kv[mz]alloc and make sure
>> it is implemented properly. This implementation makes sure to not make
>> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
>> to not warn about allocation failures. This also rules out the OOM
>> killer as the vmalloc is a more approapriate fallback than a disruptive
>> user visible action.
>> 
>> This patch also changes some existing users and removes helpers which
>> are specific for them. In some cases this is not possible (e.g.
>> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
>> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
>> in general (note that the page table allocation is GFP_KERNEL). Those
>> need to be fixed separately.
>> 
>> apparmor has already claimed kv[mz]alloc so remove those and use
>> __aa_kvmalloc instead to prevent from the naming clashes.
>> 
>> Cc: Paolo Bonzini 
>> Cc: Mike Snitzer 
>> Cc: dm-de...@redhat.com
>> Cc: "Michael S. Tsirkin" 
>> Cc: "Theodore Ts'o" 
>> Cc: k...@vger.kernel.org
>> Cc: linux-e...@vger.kernel.org
>> Cc: linux-f2fs-de...@lists.sourceforge.net
>> Cc: linux-security-mod...@vger.kernel.org
>> Signed-off-by: Michal Hocko 
>> ---
>> 
>> Hi,
>> this has been brought up during [1] discussion. I think we are long
>> overdue with kvmalloc helpers provided by the core mm code. There are
>> so many users out there. This patch doesn't try to convert all existing
>> users. I have just tried to identified those who have invented their
>> own helpers. There are many others who are openconding that. This is
>> something for a coccinelle script to automate.
>> 
>> While looking into this I have encountered many (as noted in the
>> changelog) users who are broken. Especially GFP_NOFS users which might
>> go down the vmalloc path are worrying. Those need to be fixed but that
>> is out of scope of this patch. I have simply left them in the place.
>> A proper fix for them is to not use GFP_NOFS and rather move over to a
>> scope gfp_nofs api [2]. This will take quite some time though.
>> 
>> One thing I haven't considered in this patch - but I can if there is a
>> demand - is that the current callers of kv[mz]alloc cannot really
>> override GFP_NORETRY for larger requests. This flag is implicit. I can
>> imagine some users would rather prefer to retry hard before falling back
>> to vmalloc though. There doesn't seem to be any such user in the tree
>> right now AFAICS. vhost_kvzalloc used __GFP_REPEAT but git history
>> doesn't show any sign there would be a strong reason for that. I might
>> be wrong here. If that is the case then it is not a problem to do
>> 
>>  /*
>>   * Make sure that larger requests are not too disruptive as long
>>   * as the caller doesn't insist by giving __GFP_REPEAT. No OOM
>>   * killer and no allocation failure warnings as we have a fallback
>>   * is done by default.
>>   */
>>  if (size > PAGE_SZE) {
>>  kmalloc_flags |= __GFP_NOWARN;
>> 
>>  if (!(flags & __GFP_REPEAT))
>>  flags |= __GFP_NORETRY;
>>  }
>> 
>> [1] 
>> http://lkml.kernel.org/r/1480554981-195198-1-git-send-email-astepa...@cloudlinux.com
>> [2] 
>> http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org
>> 
>> arch/x86/kvm/lapic.c |  4 ++--
>> arch/x86/kvm/page_track.c|  4 ++--
>> arch/x86/kvm/x86.c   |  4 ++--
>> drivers/md/dm-stats.c|  7 +--
>> drivers/vhost/vhost.c| 15 +++---
>> fs/ext4/mballoc.c|  2 +-
>> fs/ext4/super.c  |  4 ++--
>> 

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-13 Thread Michal Hocko
Are there any more comments or objections to this patch? Is this a good
start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
well in the initial version.

On Thu 08-12-16 11:33:00, Michal Hocko wrote:
> From: Michal Hocko 
> 
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure to not make
> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> to not warn about allocation failures. This also rules out the OOM
> killer as the vmalloc is a more approapriate fallback than a disruptive
> user visible action.
> 
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.
> 
> apparmor has already claimed kv[mz]alloc so remove those and use
> __aa_kvmalloc instead to prevent from the naming clashes.
> 
> Cc: Paolo Bonzini 
> Cc: Mike Snitzer 
> Cc: dm-de...@redhat.com
> Cc: "Michael S. Tsirkin" 
> Cc: "Theodore Ts'o" 
> Cc: k...@vger.kernel.org
> Cc: linux-e...@vger.kernel.org
> Cc: linux-f2fs-de...@lists.sourceforge.net
> Cc: linux-security-mod...@vger.kernel.org
> Signed-off-by: Michal Hocko 
> ---
> 
> Hi,
> this has been brought up during [1] discussion. I think we are long overdue
> with kvmalloc helpers provided by the core mm code. There are so many users
> out there. This patch doesn't try to convert all existing users. I have just
> tried to identified those who have invented their own helpers. There are many
> others who are openconding that. This is something for a coccinelle script to
> automate.
> 
> While looking into this I have encountered many (as noted in the
> changelog) users who are broken. Especially GFP_NOFS users which might
> go down the vmalloc path are worrying. Those need to be fixed but that
> is out of scope of this patch. I have simply left them in the place. A proper
> fix for them is to not use GFP_NOFS and rather move over to a scope gfp_nofs
> api [2]. This will take quite some time though.
> 
> One thing I haven't considered in this patch - but I can if there is a demand 
> -
> is that the current callers of kv[mz]alloc cannot really override GFP_NORETRY
> for larger requests. This flag is implicit. I can imagine some users would
> rather prefer to retry hard before falling back to vmalloc though. There 
> doesn't
> seem to be any such user in the tree right now AFAICS. vhost_kvzalloc
> used __GFP_REPEAT but git history doesn't show any sign there would be a 
> strong
> reason for that. I might be wrong here. If that is the case then it is not a 
> problem
> to do
> 
>   /*
>* Make sure that larger requests are not too disruptive as long as
>* the caller doesn't insist by giving __GFP_REPEAT. No OOM
>* killer and no allocation failure warnings as we have a fallback
>* is done by default.
>*/
>   if (size > PAGE_SZE) {
>   kmalloc_flags |= __GFP_NOWARN;
> 
>   if (!(flags & __GFP_REPEAT))
>   flags |= __GFP_NORETRY;
>   }
> 
> [1] 
> http://lkml.kernel.org/r/1480554981-195198-1-git-send-email-astepa...@cloudlinux.com
> [2] 
> http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org
> 
>  arch/x86/kvm/lapic.c |  4 ++--
>  arch/x86/kvm/page_track.c|  4 ++--
>  arch/x86/kvm/x86.c   |  4 ++--
>  drivers/md/dm-stats.c|  7 +--
>  drivers/vhost/vhost.c| 15 +++---
>  fs/ext4/mballoc.c|  2 +-
>  fs/ext4/super.c  |  4 ++--
>  fs/f2fs/f2fs.h   | 20 --
>  fs/f2fs/file.c   |  4 ++--
>  fs/f2fs/segment.c| 14 ++---
>  fs/seq_file.c| 16 +--
>  include/linux/kvm_host.h |  2 --
>  include/linux/mm.h   | 14 +
>  include/linux/vmalloc.h  |  1 +
>  mm/util.c| 40 
> 
>  mm/vmalloc.c |  2 +-
>  security/apparmor/apparmorfs.c   |  2 +-
>  security/apparmor/include/apparmor.h | 10 -
>  security/apparmor/match.c|  2 +-
>  virt/kvm/kvm_main.c  | 18 +++-
>  20 files changed, 84 insertions(+), 101 

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-13 Thread Michal Hocko
Are there any more comments or objections to this patch? Is this a good
start or kv[mz]alloc has to provide a way to cover GFP_NOFS users as
well in the initial version.

On Thu 08-12-16 11:33:00, Michal Hocko wrote:
> From: Michal Hocko 
> 
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure to not make
> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> to not warn about allocation failures. This also rules out the OOM
> killer as the vmalloc is a more approapriate fallback than a disruptive
> user visible action.
> 
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.
> 
> apparmor has already claimed kv[mz]alloc so remove those and use
> __aa_kvmalloc instead to prevent from the naming clashes.
> 
> Cc: Paolo Bonzini 
> Cc: Mike Snitzer 
> Cc: dm-de...@redhat.com
> Cc: "Michael S. Tsirkin" 
> Cc: "Theodore Ts'o" 
> Cc: k...@vger.kernel.org
> Cc: linux-e...@vger.kernel.org
> Cc: linux-f2fs-de...@lists.sourceforge.net
> Cc: linux-security-mod...@vger.kernel.org
> Signed-off-by: Michal Hocko 
> ---
> 
> Hi,
> this has been brought up during [1] discussion. I think we are long overdue
> with kvmalloc helpers provided by the core mm code. There are so many users
> out there. This patch doesn't try to convert all existing users. I have just
> tried to identified those who have invented their own helpers. There are many
> others who are openconding that. This is something for a coccinelle script to
> automate.
> 
> While looking into this I have encountered many (as noted in the
> changelog) users who are broken. Especially GFP_NOFS users which might
> go down the vmalloc path are worrying. Those need to be fixed but that
> is out of scope of this patch. I have simply left them in the place. A proper
> fix for them is to not use GFP_NOFS and rather move over to a scope gfp_nofs
> api [2]. This will take quite some time though.
> 
> One thing I haven't considered in this patch - but I can if there is a demand 
> -
> is that the current callers of kv[mz]alloc cannot really override GFP_NORETRY
> for larger requests. This flag is implicit. I can imagine some users would
> rather prefer to retry hard before falling back to vmalloc though. There 
> doesn't
> seem to be any such user in the tree right now AFAICS. vhost_kvzalloc
> used __GFP_REPEAT but git history doesn't show any sign there would be a 
> strong
> reason for that. I might be wrong here. If that is the case then it is not a 
> problem
> to do
> 
>   /*
>* Make sure that larger requests are not too disruptive as long as
>* the caller doesn't insist by giving __GFP_REPEAT. No OOM
>* killer and no allocation failure warnings as we have a fallback
>* is done by default.
>*/
>   if (size > PAGE_SZE) {
>   kmalloc_flags |= __GFP_NOWARN;
> 
>   if (!(flags & __GFP_REPEAT))
>   flags |= __GFP_NORETRY;
>   }
> 
> [1] 
> http://lkml.kernel.org/r/1480554981-195198-1-git-send-email-astepa...@cloudlinux.com
> [2] 
> http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org
> 
>  arch/x86/kvm/lapic.c |  4 ++--
>  arch/x86/kvm/page_track.c|  4 ++--
>  arch/x86/kvm/x86.c   |  4 ++--
>  drivers/md/dm-stats.c|  7 +--
>  drivers/vhost/vhost.c| 15 +++---
>  fs/ext4/mballoc.c|  2 +-
>  fs/ext4/super.c  |  4 ++--
>  fs/f2fs/f2fs.h   | 20 --
>  fs/f2fs/file.c   |  4 ++--
>  fs/f2fs/segment.c| 14 ++---
>  fs/seq_file.c| 16 +--
>  include/linux/kvm_host.h |  2 --
>  include/linux/mm.h   | 14 +
>  include/linux/vmalloc.h  |  1 +
>  mm/util.c| 40 
> 
>  mm/vmalloc.c |  2 +-
>  security/apparmor/apparmorfs.c   |  2 +-
>  security/apparmor/include/apparmor.h | 10 -
>  security/apparmor/match.c|  2 +-
>  virt/kvm/kvm_main.c  | 18 +++-
>  20 files changed, 84 insertions(+), 101 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b62c85229711..465e5ff4c304 

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Fri 09-12-16 06:38:04, Al Viro wrote:
> On Fri, Dec 09, 2016 at 07:22:25AM +0100, Michal Hocko wrote:
> 
> > > Easier to handle those in vmalloc() itself.
> > 
> > I think there were some attempts in the past but some of the code paths
> > are burried too deep and adding gfp_mask all the way down there seemed
> > like a major surgery.
> 
> No need to propagate gfp_mask - the same trick XFS is doing right now can
> be done in vmalloc.c in a couple of places and that's it; I'll resurrect the
> patches and post them tomorrow after I get some sleep.

That would work as an immediate mitigation. No question about that but
what I've tried to point out in the reply to Dave is that longerm we
shouldn't hide this trickiness inside the vmalloc and rather handle
those users who are requesting NOFS/NOIO context from vmalloc. We
already have a scope api for NOIO and I want to add the same for NOFS.
I believe that much more sane approach is to use the API at those places
which really start/stop reclaim recursion dangerous scope (e.g. the
transaction context) rather than using GFP_NOFS randomly because this
approach has proven to not work properly over years. We have so many
place using GFP_NOFS just because nobody bothered to think whether it is
needed but it must be safe for sure that it is not funny.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Fri 09-12-16 06:38:04, Al Viro wrote:
> On Fri, Dec 09, 2016 at 07:22:25AM +0100, Michal Hocko wrote:
> 
> > > Easier to handle those in vmalloc() itself.
> > 
> > I think there were some attempts in the past but some of the code paths
> > are burried too deep and adding gfp_mask all the way down there seemed
> > like a major surgery.
> 
> No need to propagate gfp_mask - the same trick XFS is doing right now can
> be done in vmalloc.c in a couple of places and that's it; I'll resurrect the
> patches and post them tomorrow after I get some sleep.

That would work as an immediate mitigation. No question about that but
what I've tried to point out in the reply to Dave is that longerm we
shouldn't hide this trickiness inside the vmalloc and rather handle
those users who are requesting NOFS/NOIO context from vmalloc. We
already have a scope api for NOIO and I want to add the same for NOFS.
I believe that much more sane approach is to use the API at those places
which really start/stop reclaim recursion dangerous scope (e.g. the
transaction context) rather than using GFP_NOFS randomly because this
approach has proven to not work properly over years. We have so many
place using GFP_NOFS just because nobody bothered to think whether it is
needed but it must be safe for sure that it is not funny.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Al Viro
On Fri, Dec 09, 2016 at 07:22:25AM +0100, Michal Hocko wrote:

> > Easier to handle those in vmalloc() itself.
> 
> I think there were some attempts in the past but some of the code paths
> are burried too deep and adding gfp_mask all the way down there seemed
> like a major surgery.

No need to propagate gfp_mask - the same trick XFS is doing right now can
be done in vmalloc.c in a couple of places and that's it; I'll resurrect the
patches and post them tomorrow after I get some sleep.


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Al Viro
On Fri, Dec 09, 2016 at 07:22:25AM +0100, Michal Hocko wrote:

> > Easier to handle those in vmalloc() itself.
> 
> I think there were some attempts in the past but some of the code paths
> are burried too deep and adding gfp_mask all the way down there seemed
> like a major surgery.

No need to propagate gfp_mask - the same trick XFS is doing right now can
be done in vmalloc.c in a couple of places and that's it; I'll resurrect the
patches and post them tomorrow after I get some sleep.


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Fri 09-12-16 02:00:17, Al Viro wrote:
> On Fri, Dec 09, 2016 at 12:44:17PM +1100, Dave Chinner wrote:
> > On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > > From: Michal Hocko 
> > > 
> > > Using kmalloc with the vmalloc fallback for larger allocations is a
> > > common pattern in the kernel code. Yet we do not have any common helper
> > > for that and so users have invented their own helpers. Some of them are
> > > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > > it is implemented properly. This implementation makes sure to not make
> > > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > > to not warn about allocation failures. This also rules out the OOM
> > > killer as the vmalloc is a more approapriate fallback than a disruptive
> > > user visible action.
> > > 
> > > This patch also changes some existing users and removes helpers which
> > > are specific for them. In some cases this is not possible (e.g.
> > > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > > in general (note that the page table allocation is GFP_KERNEL). Those
> > > need to be fixed separately.
> > 
> > See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> > kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> > functionality will have to play these memalloc_noio_save/
> > memalloc_noio_restore games to ensure they are GFP_NOFS safe
> 
> Easier to handle those in vmalloc() itself.

I think there were some attempts in the past but some of the code paths
are burried too deep and adding gfp_mask all the way down there seemed
like a major surgery.

> The problem I have with these
> helpers is that different places have different cutoff thresholds for
> switch from kmalloc to vmalloc; has anyone done an analysis of those?

Yes, I have noticed some creativity as well. Some of them didn't bother
to kmalloc at all for size > PAGE_SIZE. Some where playing tricks with
PAGE_ALLOC_COSTLY_ORDER. I believe the right thing to do is to simply do
not hammer the system with size > PAGE_SZE which means __GFP_NORETRY for
them and fallback to vmalloc on the failure (basically what
seq_buf_alloc did). I cannot offer any numbers but at least
seq_buf_alloc has proven to do the right thing over time.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Fri 09-12-16 02:00:17, Al Viro wrote:
> On Fri, Dec 09, 2016 at 12:44:17PM +1100, Dave Chinner wrote:
> > On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > > From: Michal Hocko 
> > > 
> > > Using kmalloc with the vmalloc fallback for larger allocations is a
> > > common pattern in the kernel code. Yet we do not have any common helper
> > > for that and so users have invented their own helpers. Some of them are
> > > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > > it is implemented properly. This implementation makes sure to not make
> > > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > > to not warn about allocation failures. This also rules out the OOM
> > > killer as the vmalloc is a more approapriate fallback than a disruptive
> > > user visible action.
> > > 
> > > This patch also changes some existing users and removes helpers which
> > > are specific for them. In some cases this is not possible (e.g.
> > > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > > in general (note that the page table allocation is GFP_KERNEL). Those
> > > need to be fixed separately.
> > 
> > See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> > kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> > functionality will have to play these memalloc_noio_save/
> > memalloc_noio_restore games to ensure they are GFP_NOFS safe
> 
> Easier to handle those in vmalloc() itself.

I think there were some attempts in the past but some of the code paths
are burried too deep and adding gfp_mask all the way down there seemed
like a major surgery.

> The problem I have with these
> helpers is that different places have different cutoff thresholds for
> switch from kmalloc to vmalloc; has anyone done an analysis of those?

Yes, I have noticed some creativity as well. Some of them didn't bother
to kmalloc at all for size > PAGE_SIZE. Some where playing tricks with
PAGE_ALLOC_COSTLY_ORDER. I believe the right thing to do is to simply do
not hammer the system with size > PAGE_SZE which means __GFP_NORETRY for
them and fallback to vmalloc on the failure (basically what
seq_buf_alloc did). I cannot offer any numbers but at least
seq_buf_alloc has proven to do the right thing over time.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Fri 09-12-16 12:44:17, Dave Chinner wrote:
> On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> 
> See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> functionality will have to play these memalloc_noio_save/
> memalloc_noio_restore games to ensure they are GFP_NOFS safe

Well, I didn't want to play this games in the generic kvmalloc, at least
not now, because all the converted users didn't really need it so far
and I believe that the existing users need a) inspection to check
whether NO{FS,IO} context is really needed and b) I still believe that
the scope nofs api should be used longterm rather than an explicit
GFP_NOFS. I am already working on ext[34] code.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Fri 09-12-16 12:44:17, Dave Chinner wrote:
> On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> 
> See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> functionality will have to play these memalloc_noio_save/
> memalloc_noio_restore games to ensure they are GFP_NOFS safe

Well, I didn't want to play this games in the generic kvmalloc, at least
not now, because all the converted users didn't really need it so far
and I believe that the existing users need a) inspection to check
whether NO{FS,IO} context is really needed and b) I still believe that
the scope nofs api should be used longterm rather than an explicit
GFP_NOFS. I am already working on ext[34] code.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Al Viro
On Fri, Dec 09, 2016 at 12:44:17PM +1100, Dave Chinner wrote:
> On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> 
> See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> functionality will have to play these memalloc_noio_save/
> memalloc_noio_restore games to ensure they are GFP_NOFS safe

Easier to handle those in vmalloc() itself.  The problem I have with these
helpers is that different places have different cutoff thresholds for
switch from kmalloc to vmalloc; has anyone done an analysis of those?


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Al Viro
On Fri, Dec 09, 2016 at 12:44:17PM +1100, Dave Chinner wrote:
> On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> 
> See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> functionality will have to play these memalloc_noio_save/
> memalloc_noio_restore games to ensure they are GFP_NOFS safe

Easier to handle those in vmalloc() itself.  The problem I have with these
helpers is that different places have different cutoff thresholds for
switch from kmalloc to vmalloc; has anyone done an analysis of those?


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Dave Chinner
On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure to not make
> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> to not warn about allocation failures. This also rules out the OOM
> killer as the vmalloc is a more approapriate fallback than a disruptive
> user visible action.
> 
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.

See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
functionality will have to play these memalloc_noio_save/
memalloc_noio_restore games to ensure they are GFP_NOFS safe

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Dave Chinner
On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> Using kmalloc with the vmalloc fallback for larger allocations is a
> common pattern in the kernel code. Yet we do not have any common helper
> for that and so users have invented their own helpers. Some of them are
> really creative when doing so. Let's just add kv[mz]alloc and make sure
> it is implemented properly. This implementation makes sure to not make
> a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> to not warn about allocation failures. This also rules out the OOM
> killer as the vmalloc is a more approapriate fallback than a disruptive
> user visible action.
> 
> This patch also changes some existing users and removes helpers which
> are specific for them. In some cases this is not possible (e.g.
> ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> in general (note that the page table allocation is GFP_KERNEL). Those
> need to be fixed separately.

See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
functionality will have to play these memalloc_noio_save/
memalloc_noio_restore games to ensure they are GFP_NOFS safe

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Thu 08-12-16 14:00:20, David Hildenbrand wrote:
> Am 08.12.2016 um 11:33 schrieb Michal Hocko:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> > 
> > apparmor has already claimed kv[mz]alloc so remove those and use
> > __aa_kvmalloc instead to prevent from the naming clashes.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: "Michael S. Tsirkin" 
> > Cc: "Theodore Ts'o" 
> > Cc: k...@vger.kernel.org
> > Cc: linux-e...@vger.kernel.org
> > Cc: linux-f2fs-de...@lists.sourceforge.net
> > Cc: linux-security-mod...@vger.kernel.org
> > Signed-off-by: Michal Hocko 
> 
> I remember yet another similar user in arch/s390/kvm/kvm-s390.c
> -> kvm_s390_set_skeys()
> 
> ...
> keys = kmalloc_array(args->count, sizeof(uint8_t),
>  GFP_KERNEL | __GFP_NOWARN);
> if (!keys)
> vmalloc(sizeof(uint8_t) * args->count);
> ...
> 
> would kvmalloc_array make sense? (it would even make the code here
> less error prone and better to read)

Well, if there are more users like that then why not. I just do not want
to duplicate the whole kmalloc API right now. The above could be
trivially changed to kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL)
so a special API might not be really needed. 

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko
On Thu 08-12-16 14:00:20, David Hildenbrand wrote:
> Am 08.12.2016 um 11:33 schrieb Michal Hocko:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more approapriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> > 
> > apparmor has already claimed kv[mz]alloc so remove those and use
> > __aa_kvmalloc instead to prevent from the naming clashes.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: "Michael S. Tsirkin" 
> > Cc: "Theodore Ts'o" 
> > Cc: k...@vger.kernel.org
> > Cc: linux-e...@vger.kernel.org
> > Cc: linux-f2fs-de...@lists.sourceforge.net
> > Cc: linux-security-mod...@vger.kernel.org
> > Signed-off-by: Michal Hocko 
> 
> I remember yet another similar user in arch/s390/kvm/kvm-s390.c
> -> kvm_s390_set_skeys()
> 
> ...
> keys = kmalloc_array(args->count, sizeof(uint8_t),
>  GFP_KERNEL | __GFP_NOWARN);
> if (!keys)
> vmalloc(sizeof(uint8_t) * args->count);
> ...
> 
> would kvmalloc_array make sense? (it would even make the code here
> less error prone and better to read)

Well, if there are more users like that then why not. I just do not want
to duplicate the whole kmalloc API right now. The above could be
trivially changed to kvmalloc(args->count * sizeof(uint8_t), GFP_KERNEL)
so a special API might not be really needed. 

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread David Hildenbrand

Am 08.12.2016 um 11:33 schrieb Michal Hocko:

From: Michal Hocko 

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
in general (note that the page table allocation is GFP_KERNEL). Those
need to be fixed separately.

apparmor has already claimed kv[mz]alloc so remove those and use
__aa_kvmalloc instead to prevent from the naming clashes.

Cc: Paolo Bonzini 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: "Michael S. Tsirkin" 
Cc: "Theodore Ts'o" 
Cc: k...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-security-mod...@vger.kernel.org
Signed-off-by: Michal Hocko 


I remember yet another similar user in arch/s390/kvm/kvm-s390.c
-> kvm_s390_set_skeys()

...
keys = kmalloc_array(args->count, sizeof(uint8_t),
 GFP_KERNEL | __GFP_NOWARN);
if (!keys)
vmalloc(sizeof(uint8_t) * args->count);
...

would kvmalloc_array make sense? (it would even make the code here
less error prone and better to read)

--

David


Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread David Hildenbrand

Am 08.12.2016 um 11:33 schrieb Michal Hocko:

From: Michal Hocko 

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
in general (note that the page table allocation is GFP_KERNEL). Those
need to be fixed separately.

apparmor has already claimed kv[mz]alloc so remove those and use
__aa_kvmalloc instead to prevent from the naming clashes.

Cc: Paolo Bonzini 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: "Michael S. Tsirkin" 
Cc: "Theodore Ts'o" 
Cc: k...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-security-mod...@vger.kernel.org
Signed-off-by: Michal Hocko 


I remember yet another similar user in arch/s390/kvm/kvm-s390.c
-> kvm_s390_set_skeys()

...
keys = kmalloc_array(args->count, sizeof(uint8_t),
 GFP_KERNEL | __GFP_NOWARN);
if (!keys)
vmalloc(sizeof(uint8_t) * args->count);
...

would kvmalloc_array make sense? (it would even make the code here
less error prone and better to read)

--

David