Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-26 Thread Dave Chinner
On Wed, Aug 25, 2010 at 09:48:47PM -0400, Ted Ts'o wrote: On Wed, Aug 25, 2010 at 05:30:42PM -0700, David Rientjes wrote: We certainly hope that nobody will reimplement the same function without the __deprecated warning, especially for order PAGE_ALLOC_COSTLY_ORDER where there's no

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-26 Thread Dave Chinner
On Wed, Aug 25, 2010 at 08:09:21PM -0700, David Rientjes wrote: On Wed, 25 Aug 2010, Ted Ts'o wrote: I think it's really sad that the caller can't know what the upper bounds of its memory requirement are ahead of time or at least be able to implement a memory freeing function when

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-26 Thread Peter Zijlstra
On Wed, 2010-08-25 at 20:09 -0700, David Rientjes wrote: Oh, we can determine an upper bound. You might just not like it. Actually ext3/ext4 shouldn't be as bad as XFS, which Dave estimated to be around 400k for a transaction. My guess is that the worst case for ext3/ext4 is probably

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-26 Thread Christoph Lameter
On Wed, 25 Aug 2010, David Rientjes wrote: Because we can remove the flag, remove branches from the page allocator slowpath, and none of these allocations actually are dependent on __GFP_NOFAIL since they are all under PAGE_ALLOC_COSTLY_ORDER. Then we can simply remove __GFP_NOFAIL? Functions

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-26 Thread David Rientjes
On Thu, 26 Aug 2010, Christoph Lameter wrote: Because we can remove the flag, remove branches from the page allocator slowpath, and none of these allocations actually are dependent on __GFP_NOFAIL since they are all under PAGE_ALLOC_COSTLY_ORDER. Then we can simply remove __GFP_NOFAIL?

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Tue, Aug 24, 2010 at 01:11:26PM -0700, David Rientjes wrote: On Tue, 24 Aug 2010, Jens Axboe wrote: Should be possible to warn at build time for anyone using __GFP_NOFAIL without wrapping it in a function. We could make this __deprecated functions as Peter suggested if you think

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: Part of the problem is that we have a few places in the kernel where failure is really not an option --- or rather, if we're going to fail while we're in the middle of doing a commit, our choices really are (a) retry the loop in the jbd layer

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 01:35:32PM +0200, Peter Zijlstra wrote: On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: Part of the problem is that we have a few places in the kernel where failure is really not an option --- or rather, if we're going to fail while we're in the middle of doing a

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 07:57 -0400, Ted Ts'o wrote: On Wed, Aug 25, 2010 at 01:35:32PM +0200, Peter Zijlstra wrote: On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: Part of the problem is that we have a few places in the kernel where failure is really not an option --- or rather, if

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 14:48 +0200, Peter Zijlstra wrote: On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: (a) retry the loop in the jbd layer (which Andrew really doesn't like), (b) keep our own private cache of free memory so we don't fail and/or loop, Well (b) sounds a lot

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Theodore Tso
On Aug 25, 2010, at 8:52 AM, Peter Zijlstra wrote: Also, there's a good reason for disliking (a), its a deadlock scenario, suppose we need to write out data to get free pages, but the writing out is blocked on requiring free pages. There's really nothing the page allocator can do to help

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Dave Chinner
On Wed, Aug 25, 2010 at 02:48:36PM +0200, Peter Zijlstra wrote: On Wed, 2010-08-25 at 07:57 -0400, Ted Ts'o wrote: On Wed, Aug 25, 2010 at 01:35:32PM +0200, Peter Zijlstra wrote: On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: Part of the problem is that we have a few places in the

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 09:20 -0400, Theodore Tso wrote: On Aug 25, 2010, at 8:52 AM, Peter Zijlstra wrote: Also, there's a good reason for disliking (a), its a deadlock scenario, suppose we need to write out data to get free pages, but the writing out is blocked on requiring free pages.

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 09:20 -0400, Theodore Tso wrote: Well, if all of these users start having their own private pools of emergency memory, I'm not sure that's such a great idea either. That's a secondary problem, and could be reduced by something like the memory reservation system I provided

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 23:24 +1000, Dave Chinner wrote: That is, the guarantee that we will always make progress simply does not exist in filesystems, so a mempool-like concept seems to me to be doomed from the start While I appreciate that it might be somewhat (a lot) harder for a

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 07:24 -0400, Ted Ts'o wrote: There are other places where we can fail safely (for example, in jbd's start_this_handle, although that just pushes the layer up the stack, and ultimately, to userspace where most userspace programs don't really expect ENOMEM to get returned

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Peter Zijlstra wrote: I'm not sure, but I think the cgroup thing doesn't account kernel allocations, in which case the above problem doesn't exist. Right, the only cgroup that does is cpusets because it binds the memory allocations to a set of nodes. For the cpuset

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 03:35:42PM +0200, Peter Zijlstra wrote: While I appreciate that it might be somewhat (a lot) harder for a filesystem to provide that guarantee, I'd be deeply worried about your claim that its impossible. It would render a system without swap very prone to deadlocks.

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Ted Ts'o wrote: A pool of free pages which is reserved for routines that are doing page cleaning would probably also be a good idea. Maybe that's just retrying with GFP_ATOMIC if a normal allocation fails, or maybe we need our own special pool, or maybe we need to

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Peter Zijlstra wrote: The cpusets case is actually the easiest to fix: use GFP_ATOMIC. I don't think that's a valid usage of GFP_ATOMIC, I think we should fallback to outside the cpuset for kernel allocations by default. Cpusets doesn't enforce isolation for only

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 16:11 -0500, Christoph Lameter wrote: We already have __GFP_NOFAIL behavior for slab allocations since a __GFP_NOFAIL flag is passed through to the page allocator if no objects are available. The page allocator will do the same as when called directly with __GFP_NOFAIL

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Christoph Lameter
On Wed, 25 Aug 2010, David Rientjes wrote: It all depends on what flags are passed to kmalloc(), slab nor slub enforce __GFP_NOFAIL behavior themselves. In slab, cache_grow() will return NULL depending on whether the page allocator returns NULL, and that would only happen for __GFP_NORETRY

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Peter Zijlstra
On Wed, 2010-08-25 at 16:53 -0400, Ted Ts'o wrote: On Wed, Aug 25, 2010 at 03:35:42PM +0200, Peter Zijlstra wrote: While I appreciate that it might be somewhat (a lot) harder for a filesystem to provide that guarantee, I'd be deeply worried about your claim that its impossible. It

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Christoph Lameter wrote: If the higher order fails in slub then an order 0 alloc is attempted without __GFP_NORETRY. In both cases the nofail behavior of the page allocator determines the outcode. Right, I thought you said the slab layer passes __GFP_NOFAIL when there's

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Peter Zijlstra wrote: There's still no hard guarantee that the memory will allocatable (GFP_KERNEL, the compaction, then GFP_ATOMIC could all still fail), but I don't see how continuously looping the page allocator is possibly supposed to help in these situations.

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Dave Chinner
On Wed, Aug 25, 2010 at 03:35:42PM +0200, Peter Zijlstra wrote: On Wed, 2010-08-25 at 23:24 +1000, Dave Chinner wrote: That is, the guarantee that we will always make progress simply does not exist in filesystems, so a mempool-like concept seems to me to be doomed from the start

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 04:11:38PM -0700, David Rientjes wrote: I'll repropose the patchset with __deprecated as you suggested. Thanks! And what Dave and I are saying is that we'll either need to do our on loop to avoid the deprecation warning, or the use of the deprecated function will

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Ted Ts'o wrote: I'll repropose the patchset with __deprecated as you suggested. Thanks! And what Dave and I are saying is that we'll either need to do our on loop to avoid the deprecation warning, or the use of the deprecated function will probably be used forever...

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Christoph Lameter
On Wed, 25 Aug 2010, David Rientjes wrote: On Wed, 25 Aug 2010, Christoph Lameter wrote: If the higher order fails in slub then an order 0 alloc is attempted without __GFP_NORETRY. In both cases the nofail behavior of the page allocator determines the outcode. Right, I thought you

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread Ted Ts'o
On Wed, Aug 25, 2010 at 05:30:42PM -0700, David Rientjes wrote: We certainly hope that nobody will reimplement the same function without the __deprecated warning, especially for order PAGE_ALLOC_COSTLY_ORDER where there's no looping at a higher level. So perhaps the best alternative is

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Ted Ts'o wrote: We certainly hope that nobody will reimplement the same function without the __deprecated warning, especially for order PAGE_ALLOC_COSTLY_ORDER where there's no looping at a higher level. So perhaps the best alternative is to implement the same

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-25 Thread David Rientjes
On Wed, 25 Aug 2010, Christoph Lameter wrote: Right, I thought you said the slab layer passes __GFP_NOFAIL when there's no objects available. Yes, the slab layer calls the page allocator when there are no objects available and passes the __GFP_NOFAIL that the user may have set in the

[patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-24 Thread David Rientjes
Add kmalloc_nofail(), kcalloc_nofail(), and kzalloc_nofail(). These functions are equivalent to kmalloc(), kcalloc(), and kzalloc(), respectively, except that they will never return NULL and instead loop forever trying to allocate memory. If the first allocation attempt fails, a warning will be

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-24 Thread Peter Zijlstra
On Tue, 2010-08-24 at 03:50 -0700, David Rientjes wrote: These were added as helper functions for documentation and auditability. No future callers should be added. git grep GFP_NOFAIL isn't auditable enough? might as well declare these functions depricated if you really want to do this. FWIW

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-24 Thread Jens Axboe
On 2010-08-24 15:29, Peter Zijlstra wrote: On Tue, 2010-08-24 at 03:50 -0700, David Rientjes wrote: These were added as helper functions for documentation and auditability. No future callers should be added. git grep GFP_NOFAIL isn't auditable enough? might as well declare these functions

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-24 Thread Dave Chinner
On Tue, Aug 24, 2010 at 03:29:18PM +0200, Peter Zijlstra wrote: On Tue, 2010-08-24 at 03:50 -0700, David Rientjes wrote: These were added as helper functions for documentation and auditability. No future callers should be added. git grep GFP_NOFAIL isn't auditable enough? might as well

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-24 Thread David Rientjes
On Tue, 24 Aug 2010, Jens Axboe wrote: Should be possible to warn at build time for anyone using __GFP_NOFAIL without wrapping it in a function. We could make this __deprecated functions as Peter suggested if you think build time warnings for existing users would be helpful. -- To

Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc

2010-08-24 Thread David Rientjes
On Tue, 24 Aug 2010, Dave Chinner wrote: git grep GFP_NOFAIL isn't auditable enough? might as well declare these functions depricated if you really want to do this. Also, if you are going to add tight loops, you might want to put a backoff in the loops like