Re: Crash kernel with 256 MB reserved memory runs into OOM condition

2019-08-12 Thread Michal Hocko
On Mon 12-08-19 11:42:33, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On a Dell PowerEdge R7425 with two AMD EPYC 7601 (total 128 threads) and
> 1 TB RAM, the crash kernel with 256 MB of space reserved crashes.
> 
> Please find the messages of the normal and the crash kernel attached.

You will need more memory to reserve for the crash kernel because ...

> [4.548703] Node 0 DMA free:484kB min:4kB low:4kB high:4kB active_anon:0kB 
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB 
> writepending:0kB present:568kB managed:484kB mlocked:0kB kernel_stack:0kB 
> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [4.573612] lowmem_reserve[]: 0 125 125 125
> [4.577799] Node 0 DMA32 free:1404kB min:1428kB low:1784kB high:2140kB 
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
> unevictable:15720kB writepending:0kB present:261560kB managed:133752kB 
> mlocked:0kB kernel_stack:2496kB pagetables:0kB bounce:0kB free_pcp:212kB 
> local_pcp:212kB free_cma:0kB

... the memory is really depleted and there is nothing to be reclaimed (no
anon/file pages). Look how the free memory is below the min watermark (and
the node 0 DMA zone additionally has lowmem protection against GFP_KERNEL
allocations).

[...]
> [4.923156] Out of memory and no killable processes...

and there is no remaining task to kill, so we panic.
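For reference, the reservation size is controlled by the crashkernel= kernel
command-line parameter. The values below are illustrative only, not a
recommendation from this thread; the right size depends on the machine:

```
# Fixed reservation (a 1 TB box typically needs more than 256M):
crashkernel=512M

# Or scale the reservation with installed memory using the range syntax:
crashkernel=0M-2G:128M,2G-64G:256M,64G-:512M
```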
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Michal Hocko
On Wed 05-12-18 19:01:03, Nicolas Boichat wrote:
[...]
> > Secondly, why do we need a new sysfs file? Who is going to consume it?
> 
> We have cache_dma, so it seems consistent to add cache_dma32.

I wouldn't copy a pattern unless there is an explicit usecase for it.
We expose way too much to userspace already and that keeps kicking us
later. Not that I am aware of any specific example for cache_dma, but
seeing other examples I would rather be more careful.

> I wasn't aware of tools/vm/slabinfo.c, so I can add support for
> cache_dma32 in a follow-up patch. Any other user I should take care
> of?

In general zones are internal MM implementation details and the less we
export to userspace the better.

> > Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well?
> 
> SLAB_MERGE_SAME tells us which flags _need_ to be the same for the
> slabs to be merged. We don't want slab caches with GFP_DMA32 and
> ~GFP_DMA32 to be merged, so it should be in there.
> (https://elixir.bootlin.com/linux/v4.19.6/source/mm/slab_common.c#L342).

Ohh, my bad, I have misread the change. Sure, we definitely do not want
to allow merging here.
-- 
Michal Hocko
SUSE Labs
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Michal Hocko
On Wed 05-12-18 13:48:27, Nicolas Boichat wrote:
> In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> data structures smaller than a page with GFP_DMA32 flag.
> 
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> 
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> ensures that such calls still fail (as they do before this change).

The changelog should be much more specific about decisions made here.
First of all it would be nice to mention the usecase.

Secondly, why do we need a new sysfs file? Who is going to consume it?

Then why do we need SLAB_MERGE_SAME to cover GFP_DMA32 as well? I
thought the whole point is to use dedicated slab cache. Who is this
going to merge with?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-23 Thread Michal Hocko
On Fri 23-11-18 13:23:41, Vlastimil Babka wrote:
> On 11/22/18 9:23 AM, Christoph Hellwig wrote:
[...]
> > But I do agree with the sentiment of not wanting to spread GFP_DMA32
> > futher into the slab allocator.
> 
> I don't see a problem with GFP_DMA32 for custom caches. Generic
> kmalloc() would be worse, since it would have to create a new array of
> kmalloc caches. But that's already ruled out due to the alignment.

Yes, that makes quite a lot of sense to me. We do not really need
generic support. Just make sure that if somebody creates a GFP_DMA32
restricted cache then allow allocating restricted memory from it.

Is there any fundamental reason that this wouldn't be possible?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-21 Thread Michal Hocko
On Wed 21-11-18 16:46:38, Will Deacon wrote:
> On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
> > For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> > is defined (e.g. on arm64 platforms).
> > 
> > For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> > 
> > Also, print an error when the physical address does not fit in
> > 32-bit, to make debugging easier in the future.
> > 
> > Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> > Signed-off-by: Nicolas Boichat 
> > ---
> > 
> > Changes since v1:
> >  - Changed approach to use SLAB_CACHE_DMA32 added by the previous
> >commit.
> >  - Use DMA or DMA32 depending on the architecture (DMA for arm,
> >DMA32 for arm64).
> > 
> > drivers/iommu/io-pgtable-arm-v7s.c | 20 
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> > b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 445c3bde04800c..996f7b6d00b44a 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -161,6 +161,14 @@
> >  
> >  #define ARM_V7S_TCR_PD1BIT(5)
> >  
> > +#ifdef CONFIG_ZONE_DMA32
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> > +#else
> > +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> > +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> > +#endif
> 
> It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
> architectures, since then we wouldn't need this #ifdeffery afaict.

But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
going on here?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-11 Thread Michal Hocko
On Wed 11-07-18 16:35:28, Joonsoo Kim wrote:
> 2018-07-10 18:50 GMT+09:00 Michal Hocko :
> > On Tue 10-07-18 16:19:32, Joonsoo Kim wrote:
> >> Hello, Marek.
> >>
> >> 2018-07-09 21:19 GMT+09:00 Marek Szyprowski :
> >> > cma_alloc() function doesn't really support gfp flags other than
> >> > __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
> >>
> >> Although gfp_mask isn't used in cma_alloc() except no_warn, it can be used
> >> in alloc_contig_range(). For example, if passed gfp mask has no __GFP_FS,
> >> compaction (isolation) would work differently. Have you considered
> >> such a case?
> >
> > Does any of cma_alloc users actually care about GFP_NO{FS,IO}?
> 
> I don't know. My guess is that cma_alloc() is used for DMA allocation so
> block devices would use it, too. If the fs/block subsystem initiates
> the request for the device, it would be possible that cma_alloc() is
> called with such a flag. Again, I don't know much about those
> subsystems, so I may be wrong.

The patch converts existing users and none of them really tries to use
anything other than GFP_KERNEL [|__GFP_NOWARN] so this doesn't seem to
be the case. Should there be a new user requiring a more restricted
gfp_mask we should carefully re-evaluate and think about how to support it.

Until then I would simply stick with the proposed approach because my
experience tells me that wrong gfp mask usage is way too easy, so the
simpler the API is, the less likely we are to see abuse.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-10 Thread Michal Hocko
On Tue 10-07-18 16:19:32, Joonsoo Kim wrote:
> Hello, Marek.
> 
> 2018-07-09 21:19 GMT+09:00 Marek Szyprowski :
> > cma_alloc() function doesn't really support gfp flags other than
> > __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
> 
> Although gfp_mask isn't used in cma_alloc() except no_warn, it can be used
> in alloc_contig_range(). For example, if passed gfp mask has no __GFP_FS,
> compaction (isolation) would work differently. Have you considered
> such a case?

Does any of cma_alloc users actually care about GFP_NO{FS,IO}?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-09 Thread Michal Hocko
On Mon 09-07-18 14:19:55, Marek Szyprowski wrote:
> cma_alloc() function doesn't really support gfp flags other than
> __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
> 
> This will help to avoid giving false feeling that this function supports
> standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
> what has already been an issue: see commit dd65a941f6ba ("arm64:
> dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
> 
> Signed-off-by: Marek Szyprowski 

Thanks! This makes perfect sense to me. If there is a real need for the
gfp_mask then we should start by defining the semantic first.

Acked-by: Michal Hocko 

> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c   | 2 +-
>  drivers/s390/char/vmcp.c   | 2 +-
>  drivers/staging/android/ion/ion_cma_heap.c | 2 +-
>  include/linux/cma.h| 2 +-
>  kernel/dma/contiguous.c| 3 ++-
>  mm/cma.c   | 8 
>  mm/cma_debug.c | 2 +-
>  7 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index d4a3f4da409b..fc6bb9630a9c 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -77,7 +77,7 @@ struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
>   VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>  
>   return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES),
> -  GFP_KERNEL);
> +  false);
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);
>  
> diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c
> index 948ce82a7725..0fa1b6b1491a 100644
> --- a/drivers/s390/char/vmcp.c
> +++ b/drivers/s390/char/vmcp.c
> @@ -68,7 +68,7 @@ static void vmcp_response_alloc(struct vmcp_session 
> *session)
>* anymore the system won't work anyway.
>*/
>   if (order > 2)
> - page = cma_alloc(vmcp_cma, nr_pages, 0, GFP_KERNEL);
> + page = cma_alloc(vmcp_cma, nr_pages, 0, false);
>   if (page) {
>   session->response = (char *)page_to_phys(page);
>   session->cma_alloc = 1;
> diff --git a/drivers/staging/android/ion/ion_cma_heap.c 
> b/drivers/staging/android/ion/ion_cma_heap.c
> index 49718c96bf9e..3fafd013d80a 100644
> --- a/drivers/staging/android/ion/ion_cma_heap.c
> +++ b/drivers/staging/android/ion/ion_cma_heap.c
> @@ -39,7 +39,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct 
> ion_buffer *buffer,
>   if (align > CONFIG_CMA_ALIGNMENT)
>   align = CONFIG_CMA_ALIGNMENT;
>  
> - pages = cma_alloc(cma_heap->cma, nr_pages, align, GFP_KERNEL);
> + pages = cma_alloc(cma_heap->cma, nr_pages, align, false);
>   if (!pages)
>   return -ENOMEM;
>  
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index bf90f0bb42bd..190184b5ff32 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -33,7 +33,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, 
> phys_addr_t size,
>   const char *name,
>   struct cma **res_cma);
>  extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int 
> align,
> -   gfp_t gfp_mask);
> +   bool no_warn);
>  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned 
> int count);
>  
>  extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void 
> *data);
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index d987dcd1bd56..19ea5d70150c 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -191,7 +191,8 @@ struct page *dma_alloc_from_contiguous(struct device 
> *dev, size_t count,
>   if (align > CONFIG_CMA_ALIGNMENT)
>   align = CONFIG_CMA_ALIGNMENT;
>  
> - return cma_alloc(dev_get_cma_area(dev), count, align, gfp_mask);
> + return cma_alloc(dev_get_cma_area(dev), count, align,
> +  gfp_mask & __GFP_NOWARN);
>  }
>  
>  /**
> diff --git a/mm/cma.c b/mm/cma.c
> index 5809bbe360d7..4cb76121a3ab 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -395,13 +395,13 @@ static inline void cma_debug_show_areas(struct cma 
> *cma) { }
>   * @cma:   Contiguous memory region for which the allocation is performed.
>   * @count: Requested number of pages.
>   * @align: Requested alignment of pages (in PAGE_SIZE order).
> - * @gfp_mask:  GFP mask 

Re: [External] Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-30 Thread Michal Hocko
On Wed 30-05-18 09:02:13, Huaisheng HS1 Ye wrote:
> From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On Behalf Of 
> Michal Hocko
> Sent: Monday, May 28, 2018 9:38 PM
> > > In my opinion, originally there shouldn't be so many wrong
> > > combinations of these bottom 3 bits. Any user, whether a driver
> > > or fs, should decide which zone it prefers. Matthew's idea is
> > > great, because with it the user must offer an unambiguous flag
> > > in the gfp zone bits.
> > 
> > Well, I would argue that those shouldn't really care about any zones at
> > all. All they should care about is whether they really need a low mem
> > zone (aka directly accessible to the kernel), highmem, or whether the
> > allocation is generally movable. Mixing zones into the picture just
> > makes the whole thing more complicated and error prone.
> 
> Dear Michal,
> 
> I don't quite understand that. I think those, mostly drivers, need to
> get the correct zone they want. ZONE_DMA32 is an example, if drivers can be
> satisfied with a low mem zone, why they mark the gfp flags as
> 'GFP_KERNEL|__GFP_DMA32'?
> GFP_KERNEL is enough to make sure a directly accessible low mem, but it is
> obvious that they want to get a DMA accessible zone below 4G.

They want a specific pfn range. Not a _zone_. Zone is an MM abstraction
to manage memory. And not a great one, as time has shown. We have
moved away from the per-zone reclaim because it just turned out to be
problematic. Leaking this abstraction to users was a mistake IMHO. It
was surely convenient but we can clearly see it was just confusing and
many users just got it wrong.

I do agree with Christoph in the other email that the proper way for DMA
users is to use the existing DMA API, which is closer to what they
need: set a restriction on DMA-able memory ranges.
-- 
Michal Hocko
SUSE Labs


Re: [External] Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-28 Thread Michal Hocko
On Fri 25-05-18 09:43:09, Huaisheng HS1 Ye wrote:
> From: Michal Hocko [mailto:mho...@kernel.org]
> Sent: Thursday, May 24, 2018 8:19 PM> 
> > > Let me try to reply your questions.
> > > Exactly, GFP_ZONE_TABLE is too complicated. I think there are two 
> > > advantages
> > > from the series of patches.
> > >
> > > 1. XOR operation is simple and efficient, GFP_ZONE_TABLE/BAD need to do 
> > > twice
> > > shift operations, the first is for getting a zone_type and the second is 
> > > for
> > > checking the to be returned type is a correct or not. But with these 
> > > patch XOR
> > > operation just needs to use once. Because the bottom 3 bits of GFP 
> > > bitmask have
> > > been used to represent the encoded zone number, we can say there is no 
> > > bad zone
> > > number if all callers could use it without buggy way. Of course, the 
> > > returned
> > > zone type in gfp_zone needs to be no more than ZONE_MOVABLE.
> > 
> > But you are losing the ability to check for wrong usage. And it seems
> > that the sad reality is that the existing code do screw up.
> 
> In my opinion, originally there shouldn't be so many wrong
> combinations of these bottom 3 bits. Any user, whether a driver
> or fs, should decide which zone it prefers. Matthew's idea is
> great, because with it the user must offer an unambiguous flag
> in the gfp zone bits.

Well, I would argue that those shouldn't really care about any zones at
all. All they should care about is whether they really need a low mem
zone (aka directly accessible to the kernel), highmem, or whether the
allocation is generally movable. Mixing zones into the picture just
makes the whole thing more complicated and error prone.
[...]
> > That being said. I am not saying that I am in love with GFP_ZONE_TABLE.
> > It always makes my head explode when I look there but it seems to work
> > with the current code and it is optimized for it. If you want to change
> > this then you should make sure you describe reasons _why_ this is an
> > improvement. And I would argue that "we can have more zones" is a
> > relevant one.
> 
> Yes, GFP_ZONE_TABLE is too complicated. The patches have 4 advantages as 
> below.
> 
> * The address zone modifiers have new operation method, that is, user should 
> decide which zone is preferred at first, then give the encoded zone number to 
> bottom 3 bits in GFP mask. That is much direct and clear than before.
> 
> * No bad zone combination, because user should choose just one address zone 
> modifier always.
> * Better performance and efficiency, current gfp_zone has to take shifting 
> operation twice for GFP_ZONE_TABLE and GFP_ZONE_BAD. With these patches, 
> gfp_zone() just needs one XOR.
> * Up to 8 zones can be used. At least it isn't a disadvantage, right?

This should be a part of the changelog. Please note that you should
provide some number if you claim performance benefits. The complexity
will always be subjective.
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-28 Thread Michal Hocko
On Fri 25-05-18 05:00:44, Matthew Wilcox wrote:
> On Thu, May 24, 2018 at 05:29:43PM +0200, Michal Hocko wrote:
> > > ie if we had more,
> > > could we solve our pain by making them more generic?
> > 
> > Well, if you have more you will consume more bits in the struct pages,
> > right?
> 
> Not necessarily ... the zone number is stored in the struct page
> currently, so either two or three bits are used right now.  In my
> proposal, one can infer the zone of a page from its PFN, except for
> ZONE_MOVABLE.  So we could trim down to just one bit per struct page
> for 32-bit machines while using 3 bits on 64-bit machines, where there
> is plenty of space.

Just be warned that page_zone is called from many hot paths. I am not
sure adding something more complex there is going to fly.

> > > it more-or-less sucks that the devices with 28-bit DMA limits are forced
> > > to allocate from the low 16MB when they're perfectly capable of using the
> > > low 256MB.
> > 
> > Do we actually care all that much about those? If yes then we should
> > probably follow the ZONE_DMA (x86) path and use a CMA region for them.
> > I mean most devices should be good with very limited addressability or
> > below 4G, no?
> 
> Sure.  One other thing I meant to mention was the media devices
> (TV capture cards and so on) which want a vmalloc_32() allocation.
> On 32-bit machines right now, we allocate from LOWMEM, when we really
> should be allocating from the 1GB-4GB region.  32-bit machines generally
> don't have a ZONE_DMA32 today.

Well, _I_ think that vmalloc on 32b is just a lost case...

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-24 Thread Michal Hocko
On Thu 24-05-18 08:18:18, Matthew Wilcox wrote:
> On Thu, May 24, 2018 at 02:23:23PM +0200, Michal Hocko wrote:
> > > If we had eight ZONEs, we could offer:
> > 
> > No, please no more zones. What we have is quite a maint. burden on its
> > own. Ideally we should only have lowmem, highmem and special/device
> > zones for directly kernel accessible memory, the one that the kernel
> > cannot or must not use and completely special memory managed out of
> > the page allocator. All the remaining constrains should better be
> > implemented on top.
> 
> I believe you when you say that they're a maintenance pain.  Is that
> maintenance pain because they're so specialised?

Well, it used to be LRU balancing, which is gone with the move to node
reclaim, but that brings new challenges. Now, as you say, their meaning
is not really clear to users and that leads to bugs left and right.

> ie if we had more,
> could we solve our pain by making them more generic?

Well, if you have more you will consume more bits in the struct pages,
right?

[...]

> > But those already do have a proper API, IIUC. So do we really need to
> > make our GFP_*/Zone API more complicated than it already is?
> 
> I don't want to change the driver API (setting the DMA mask, etc),
> but we don't actually have a good API to the page allocator for the
> implementation of dma_alloc_foo() to request pages.  More or less,
> architectures do:
> 
>   if (mask < 4GB)
>   alloc_page(GFP_DMA)
>   else if (mask < 64EB)
>   alloc_page(GFP_DMA32)
>   else
>   alloc_page(GFP_HIGHMEM)
> 
> it more-or-less sucks that the devices with 28-bit DMA limits are forced
> to allocate from the low 16MB when they're perfectly capable of using the
> low 256MB.

Do we actually care all that much about those? If yes then we should
probably follow the ZONE_DMA (x86) path and use a CMA region for them.
I mean most devices should be good with very limited addressability or
below 4G, no?
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-24 Thread Michal Hocko
On Wed 23-05-18 22:19:19, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 08:37:28PM +0200, Michal Hocko wrote:
> > So why is this any better than the current code. Sure I am not a great
> > fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> > doesn't look too much better, yet we are losing a check for incompatible
> > gfp flags. The diffstat looks really sound but then you just look and
> > see that the large part is the comment that at least explained the gfp
> > zone modifiers somehow and the debugging code. So what is the selling
> > point?
> 
> I have a plan, but it's not exactly fully-formed yet.
> 
> One of the big problems we have today is that we have a lot of users
> who have constraints on the physical memory they want to allocate,
> but we have very limited abilities to provide them with what they're
> asking for.  The various different ZONEs have different meanings on
> different architectures and are generally a mess.

Agreed.

> If we had eight ZONEs, we could offer:

No, please no more zones. What we have is quite a maint. burden on its
own. Ideally we should only have lowmem, highmem and special/device
zones for directly kernel accessible memory, the one that the kernel
cannot or must not use and completely special memory managed out of
the page allocator. All the remaining constrains should better be
implemented on top.

> ZONE_16M  // 24 bit
> ZONE_256M // 28 bit
> ZONE_LOWMEM   // CONFIG_32BIT only
> ZONE_4G   // 32 bit
> ZONE_64G  // 36 bit
> ZONE_1T   // 40 bit
> ZONE_ALL  // everything larger
> ZONE_MOVABLE  // movable allocations; no physical address guarantees
> 
> #ifdef CONFIG_64BIT
> #define ZONE_NORMAL   ZONE_ALL
> #else
> #define ZONE_NORMAL   ZONE_LOWMEM
> #endif
> 
> This would cover most driver DMA mask allocations; we could tweak the
> offered zones based on analysis of what people need.

But those already do have a proper API, IIUC. So do we really need to
make our GFP_*/Zone API more complicated than it already is?

> #define GFP_HIGHUSER  (GFP_USER | ZONE_ALL)
> #define GFP_HIGHUSER_MOVABLE  (GFP_USER | ZONE_MOVABLE)
> 
> One other thing I want to see is that fallback from zones happens from
> highest to lowest normally (ie if you fail to allocate in 1T, then you
> try to allocate from 64G), but movable allocations hapen from lowest
> to highest.  So ZONE_16M ends up full of page cache pages which are
> readily evictable for the rare occasions when we need to allocate memory
> below 16MB.
> 
> I'm sure there are lots of good reasons why this won't work, which is
> why I've been hesitant to propose it before now.

I am worried you are playing with a can of worms...
-- 
Michal Hocko
SUSE Labs


Re: [External] Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-24 Thread Michal Hocko
On Wed 23-05-18 16:07:16, Huaisheng HS1 Ye wrote:
> From: Michal Hocko [mailto:mho...@kernel.org]
> Sent: Wednesday, May 23, 2018 2:37 AM
> > 
> > On Mon 21-05-18 23:20:21, Huaisheng Ye wrote:
> > > From: Huaisheng Ye <ye...@lenovo.com>
> > >
> > > Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.
> > >
> > > Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks,
> > > the bottom three bits of GFP mask is reserved for storing encoded
> > > zone number.
> > >
> > > The encoding method is XOR. Get zone number from enum zone_type,
> > > then encode the number with ZONE_NORMAL by XOR operation.
> > > The goal is to make sure ZONE_NORMAL can be encoded to zero. So,
> > > the compatibility can be guaranteed, such as GFP_KERNEL and GFP_ATOMIC
> > > can be used as before.
> > >
> > > Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
> > > a flag. Same as before, __GFP_MOVABLE represents movable migrate type
> > > for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is enabled with
> > > __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
> > > __GFP_ZONE_MOVABLE is created to realize it.
> > >
> > > With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
> > > enough to get ZONE_MOVABLE from gfp_zone. All callers should use
> > > GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
> > >
> > > Decode zone number directly from bottom three bits of flags in gfp_zone.
> > > The theory of encoding and decoding is,
> > > A ^ B ^ B = A
> > 
> > So why is this any better than the current code. Sure I am not a great
> > fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
> > doesn't look too much better, yet we are losing a check for incompatible
> > gfp flags. The diffstat looks really sound but then you just look and
> > see that the large part is the comment that at least explained the gfp
> > zone modifiers somehow and the debugging code. So what is the selling
> > point?
> 
> Dear Michal,
> 
> Let me try to reply your questions.
> Exactly, GFP_ZONE_TABLE is too complicated. I think there are two advantages
> from the series of patches.
> 
> 1. XOR operation is simple and efficient, GFP_ZONE_TABLE/BAD need to do twice
> shift operations, the first is for getting a zone_type and the second is for
> checking the to be returned type is a correct or not. But with these patch XOR
> operation just needs to use once. Because the bottom 3 bits of GFP bitmask 
> have
> been used to represent the encoded zone number, we can say there is no bad 
> zone
> number if all callers could use it without buggy way. Of course, the returned
> zone type in gfp_zone needs to be no more than ZONE_MOVABLE.

But you are losing the ability to check for wrong usage. And it seems
that the sad reality is that the existing code do screw up.

> 2. GFP_ZONE_TABLE has a limit on the number of zone types. The current
> GFP_ZONE_TABLE is 32 bits; in general there are 4 zone types on most
> X86_64 platforms: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE.
> If we want to expand the amount of zone types to larger than 4, the
> zone shift should be 3.

But we do not want to expand the number of zones IMHO. The existing zoo
is quite a maint. pain.
 
That being said. I am not saying that I am in love with GFP_ZONE_TABLE.
It always makes my head explode when I look there but it seems to work
with the current code and it is optimized for it. If you want to change
this then you should make sure you describe reasons _why_ this is an
improvement. And I would argue that "we can have more zones" is a
relevant one.
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH v2 00/12] get rid of GFP_ZONE_TABLE/BAD

2018-05-22 Thread Michal Hocko
On Mon 21-05-18 23:20:21, Huaisheng Ye wrote:
> From: Huaisheng Ye <ye...@lenovo.com>
> 
> Replace GFP_ZONE_TABLE and GFP_ZONE_BAD with encoded zone number.
> 
> Delete ___GFP_DMA, ___GFP_HIGHMEM and ___GFP_DMA32 from GFP bitmasks,
> the bottom three bits of GFP mask is reserved for storing encoded
> zone number.
> 
> The encoding method is XOR. Get zone number from enum zone_type,
> then encode the number with ZONE_NORMAL by XOR operation.
> The goal is to make sure ZONE_NORMAL can be encoded to zero. So,
> the compatibility can be guaranteed, such as GFP_KERNEL and GFP_ATOMIC
> can be used as before.
> 
> Reserve __GFP_MOVABLE in bit 3, so that it can continue to be used as
> a flag. Same as before, __GFP_MOVABLE represents movable migrate type
> for ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. But when it is enabled with
> __GFP_HIGHMEM, ZONE_MOVABLE shall be returned instead of ZONE_HIGHMEM.
> __GFP_ZONE_MOVABLE is created to realize it.
> 
> With this patch, just enabling __GFP_MOVABLE and __GFP_HIGHMEM is not
> enough to get ZONE_MOVABLE from gfp_zone. All callers should use
> GFP_HIGHUSER_MOVABLE or __GFP_ZONE_MOVABLE directly to achieve that.
> 
> Decode zone number directly from bottom three bits of flags in gfp_zone.
> The theory of encoding and decoding is,
> A ^ B ^ B = A

So why is this any better than the current code. Sure I am not a great
fan of GFP_ZONE_TABLE because of how it is incomprehensible but this
doesn't look too much better, yet we are losing a check for incompatible
gfp flags. The diffstat looks really sound but then you just look and
see that the large part is the comment that at least explained the gfp
zone modifiers somehow and the debugging code. So what is the selling
point?

> Changes since v1,
> 
> v2: Add __GFP_ZONE_MOVABLE and modify GFP_HIGHUSER_MOVABLE to help
> callers to get ZONE_MOVABLE. Add __GFP_ZONE_MASK to mask lowest 3
> bits of GFP bitmasks.
> Modify some callers' gfp flag to update usage of address zone
> modifiers.
> Modify inline function gfp_zone to get better performance according
> to Matthew's suggestion.
> 
> Link: https://marc.info/?l=linux-mm=152596791931266=2
> 
> Huaisheng Ye (12):
>   include/linux/gfp.h: get rid of GFP_ZONE_TABLE/BAD
>   arch/x86/kernel/amd_gart_64: update usage of address zone modifiers
>   arch/x86/kernel/pci-calgary_64: update usage of address zone modifiers
>   drivers/iommu/amd_iommu: update usage of address zone modifiers
>   include/linux/dma-mapping: update usage of address zone modifiers
>   drivers/xen/swiotlb-xen: update usage of address zone modifiers
>   fs/btrfs/extent_io: update usage of address zone modifiers
>   drivers/block/zram/zram_drv: update usage of address zone modifiers
>   mm/vmpressure: update usage of address zone modifiers
>   mm/zsmalloc: update usage of address zone modifiers
>   include/linux/highmem: update usage of movableflags
>   arch/x86/include/asm/page.h: update usage of movableflags
> 
>  arch/x86/include/asm/page.h  |  3 +-
>  arch/x86/kernel/amd_gart_64.c|  2 +-
>  arch/x86/kernel/pci-calgary_64.c |  2 +-
>  drivers/block/zram/zram_drv.c|  6 +--
>  drivers/iommu/amd_iommu.c|  2 +-
>  drivers/xen/swiotlb-xen.c|  2 +-
>  fs/btrfs/extent_io.c |  2 +-
>  include/linux/dma-mapping.h  |  2 +-
>  include/linux/gfp.h  | 98 +---
>  include/linux/highmem.h      |  4 +-
>  mm/vmpressure.c  |  2 +-
>  mm/zsmalloc.c|  4 +-
>  12 files changed, 26 insertions(+), 103 deletions(-)
> 
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] lib/test: Delete five error messages for a failed memory allocation

2017-10-09 Thread Michal Hocko
On Sat 07-10-17 19:48:37, SF Markus Elfring wrote:
> From: Markus Elfring <elfr...@users.sourceforge.net>
> Date: Sat, 7 Oct 2017 17:34:23 +0200
> 
> Omit extra messages for a memory allocation failure in these functions.
> 
> This issue was detected by using the Coccinelle software.

Yes, this makes sense. None of those messages really explains what the
effect of the allocation failure is, and the allocation path already
tells us enough about the failure.

> Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>

Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  lib/test_kasan.c | 5 ++---
>  lib/test_kmod.c  | 8 ++--
>  lib/test_list_sort.c | 9 +++--
>  3 files changed, 7 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/test_kasan.c b/lib/test_kasan.c
> index a25c9763fce1..ef1a3ac1397e 100644
> --- a/lib/test_kasan.c
> +++ b/lib/test_kasan.c
> @@ -353,10 +353,9 @@ static noinline void __init memcg_accounted_kmem_cache(void)
>*/
>   for (i = 0; i < 5; i++) {
>   p = kmem_cache_alloc(cache, GFP_KERNEL);
> - if (!p) {
> - pr_err("Allocation failed\n");
> + if (!p)
>   goto free_cache;
> - }
> +
>   kmem_cache_free(cache, p);
>   msleep(100);
>   }
> diff --git a/lib/test_kmod.c b/lib/test_kmod.c
> index fba78d25e825..337f408b4de6 100644
> --- a/lib/test_kmod.c
> +++ b/lib/test_kmod.c
> @@ -783,10 +783,8 @@ static int kmod_config_sync_info(struct kmod_test_device *test_dev)
>   free_test_dev_info(test_dev);
>   test_dev->info = vzalloc(config->num_threads *
>    sizeof(struct kmod_test_device_info));
> - if (!test_dev->info) {
> - dev_err(test_dev->dev, "Cannot alloc test_dev info\n");
> + if (!test_dev->info)
>   return -ENOMEM;
> - }
>  
>   return 0;
>  }
> @@ -1089,10 +1087,8 @@ static struct kmod_test_device *alloc_test_dev_kmod(int idx)
>   struct miscdevice *misc_dev;
>  
>   test_dev = vzalloc(sizeof(struct kmod_test_device));
> - if (!test_dev) {
> - pr_err("Cannot alloc test_dev\n");
> + if (!test_dev)
>   goto err_out;
> - }
>  
>   mutex_init(&test_dev->config_mutex);
>   mutex_init(&test_dev->trigger_mutex);
> diff --git a/lib/test_list_sort.c b/lib/test_list_sort.c
> index 28e817387b04..5474f3f3e41d 100644
> --- a/lib/test_list_sort.c
> +++ b/lib/test_list_sort.c
> @@ -76,17 +76,14 @@ static int __init list_sort_test(void)
>   pr_debug("start testing list_sort()\n");
>  
>   elts = kcalloc(TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL);
> - if (!elts) {
> - pr_err("error: cannot allocate memory\n");
> + if (!elts)
>   return err;
> - }
>  
>   for (i = 0; i < TEST_LIST_LEN; i++) {
>   el = kmalloc(sizeof(*el), GFP_KERNEL);
> - if (!el) {
> - pr_err("error: cannot allocate memory\n");
> + if (!el)
>   goto exit;
> - }
> +
>/* force some equivalencies */
>   el->value = prandom_u32() % (TEST_LIST_LEN / 3);
>   el->serial = i;
> -- 
> 2.14.2

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/4] mm: move function alloc_pages_exact_nid out of __meminit

2017-09-26 Thread Michal Hocko
On Thu 21-09-17 14:29:19, Ganapatrao Kulkarni wrote:
> This function can be used on NUMA systems in place of alloc_pages_exact.
> Add code to export it and to remove the __meminit section tagging.

It is usually better to fold such a change into a patch which adds a new
user. Other than that I do not have any objections.

> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulka...@cavium.com>

Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  include/linux/gfp.h | 2 +-
>  mm/page_alloc.c | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index f780718..a4bd234 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -528,7 +528,7 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask);
>  
>  void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
>  void free_pages_exact(void *virt, size_t size);
> -void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
> +void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
>  
>  #define __get_free_page(gfp_mask) \
>   __get_free_pages((gfp_mask), 0)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c841af8..7975870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4442,7 +4442,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
>   * Like alloc_pages_exact(), but try to allocate on node nid first before 
> falling
>   * back.
>   */
> -void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
> +void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
>  {
>   unsigned int order = get_order(size);
>   struct page *p = alloc_pages_node(nid, gfp_mask, order);
> @@ -4450,6 +4450,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t 
> size, gfp_t gfp_mask)
>   return NULL;
>   return make_alloc_exact((unsigned long)page_address(p), order, size);
>  }
> +EXPORT_SYMBOL(alloc_pages_exact_nid);
>  
>  /**
>   * free_pages_exact - release memory allocated via alloc_pages_exact()
> -- 
> 2.9.4
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 3/3] mm: wire up GFP flag passing in dma_alloc_from_contiguous

2017-01-27 Thread Michal Hocko
On Thu 19-01-17 18:07:07, Lucas Stach wrote:
> The callers of the DMA alloc functions already provide the proper
> context GFP flags. Make sure to pass them through to the CMA
> allocator, to make the CMA compaction context aware.
> 
> Signed-off-by: Lucas Stach <l.st...@pengutronix.de>

Looks good to me
Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  arch/arm/mm/dma-mapping.c  | 16 +---
>  arch/arm64/mm/dma-mapping.c|  4 ++--
>  arch/mips/mm/dma-default.c |  4 ++--
>  arch/x86/kernel/pci-dma.c  |  3 ++-
>  arch/xtensa/kernel/pci-dma.c   |  3 ++-
>  drivers/base/dma-contiguous.c  |  5 +++--
>  drivers/iommu/amd_iommu.c  |  2 +-
>  drivers/iommu/intel-iommu.c|  2 +-
>  include/linux/dma-contiguous.h |  4 ++--
>  9 files changed, 24 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index ab7710002ba6..4d6ec7d821c8 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -349,7 +349,7 @@ static void __dma_free_buffer(struct page *page, size_t size)
>  static void *__alloc_from_contiguous(struct device *dev, size_t size,
>pgprot_t prot, struct page **ret_page,
>const void *caller, bool want_vaddr,
> -  int coherent_flag);
> +  int coherent_flag, gfp_t gfp);
>  
>  static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
>pgprot_t prot, struct page **ret_page,
> @@ -420,7 +420,8 @@ static int __init atomic_pool_init(void)
>*/
>   if (dev_get_cma_area(NULL))
>   ptr = __alloc_from_contiguous(NULL, atomic_pool_size, prot,
> -   &page, atomic_pool_init, true, NORMAL);
> +   &page, atomic_pool_init, true, NORMAL,
> +   GFP_KERNEL);
>   else
>   ptr = __alloc_remap_buffer(NULL, atomic_pool_size, gfp, prot,
>  &page, atomic_pool_init, true);
> @@ -594,14 +595,14 @@ static int __free_from_pool(void *start, size_t size)
>  static void *__alloc_from_contiguous(struct device *dev, size_t size,
>pgprot_t prot, struct page **ret_page,
>const void *caller, bool want_vaddr,
> -  int coherent_flag)
> +  int coherent_flag, gfp_t gfp)
>  {
>   unsigned long order = get_order(size);
>   size_t count = size >> PAGE_SHIFT;
>   struct page *page;
>   void *ptr = NULL;
>  
> - page = dma_alloc_from_contiguous(dev, count, order);
> + page = dma_alloc_from_contiguous(dev, count, order, gfp);
>   if (!page)
>   return NULL;
>  
> @@ -655,7 +656,7 @@ static inline pgprot_t __get_dma_pgprot(unsigned long attrs, pgprot_t prot)
>  #define __get_dma_pgprot(attrs, prot) __pgprot(0)
>  #define __alloc_remap_buffer(dev, size, gfp, prot, ret, c, wv)   NULL
>  #define __alloc_from_pool(size, ret_page) NULL
> -#define __alloc_from_contiguous(dev, size, prot, ret, c, wv, coherent_flag) NULL
> +#define __alloc_from_contiguous(dev, size, prot, ret, c, wv, coherent_flag, gfp) NULL
>  #define __free_from_pool(cpu_addr, size) do { } while (0)
>  #define __free_from_contiguous(dev, page, cpu_addr, size, wv) do { } while (0)
>  #define __dma_free_remap(cpu_addr, size) do { } while (0)
> @@ -697,7 +698,8 @@ static void *cma_allocator_alloc(struct arm_dma_alloc_args *args,
>  {
>   return __alloc_from_contiguous(args->dev, args->size, args->prot,
>  ret_page, args->caller,
> -args->want_vaddr, args->coherent_flag);
> +args->want_vaddr, args->coherent_flag,
> +args->gfp);
>  }
>  
>  static void cma_allocator_free(struct arm_dma_free_args *args)
> @@ -1293,7 +1295,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   unsigned long order = get_order(size);
>   struct page *page;
>  
> - page = dma_alloc_from_contiguous(dev, count, order);
> + page = dma_alloc_from_contiguous(dev, count, order, gfp);
>   if (!page)
>   goto error;
>  
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/

Re: [PATCH 1/3] mm: alloc_contig_range: allow to specify GFP mask

2017-01-27 Thread Michal Hocko
On Fri 20-01-17 13:35:40, Vlastimil Babka wrote:
> On 01/19/2017 06:07 PM, Lucas Stach wrote:
[...]
> > @@ -7255,7 +7256,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
> > .zone = page_zone(pfn_to_page(start)),
> > .mode = MIGRATE_SYNC,
> > .ignore_skip_hint = true,
> > -   .gfp_mask = GFP_KERNEL,
> > +   .gfp_mask = gfp_mask,
> 
> I think you should apply memalloc_noio_flags() here (and Michal should
> then convert it to the new name in his scoped gfp_nofs series). Note
> that then it's technically a functional change, but it's needed.
> Otherwise looks good.

yes, with that added, feel free to add
Acked-by: Michal Hocko <mho...@suse.com>

> 
> > };
> > INIT_LIST_HEAD(&cc.migratepages);
> >  
> > 

-- 
Michal Hocko
SUSE Labs