Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-22 Thread Andrea Arcangeli
Hello,

On Mon, Jun 22, 2020 at 04:30:41PM +0100, Robin Murphy wrote:
> On 2020-06-22 13:46, Joerg Roedel wrote:
> > + Robin
> > 
> > Robin, any idea on this?
> 
> After a bit of archaeology, this dates back to the original review:
> 
> https://lore.kernel.org/linux-arm-kernel/54c285d4.3070...@arm.com/
> https://lore.kernel.org/linux-arm-kernel/54da2666.9030...@arm.com/
> 
> In summary: originally this inherited from other arch code that did 
> simply strip __GFP_COMP; that was deemed questionable because of the 
> nonsensical comment about CONFIG_HUGETLBFS that was stuck to it; the 
> current code is like it is because in 5 and a half years nobody said 
> that it's wrong :)
> 
> If there actually *are* good reasons for stripping __GFP_COMP, then I've 
> certainly no objection to doing so.

The main question is whether there are any good reasons for not forbidding
__GFP_COMP to be specified in the callers. The reason given in the
comment isn't convincing.

I don't see how a caller that gets a pointer can care about what the
page structure looks like and in turn why it's asking for __GFP_COMP.

As far as I can tell there are two orthogonal issues in play here:

1) The comment about __GFP_COMP facilitating the sound driver to do
   partial mapping doesn't make much sense. It's probably best to
   WARN_ON immediately in dma_alloc_coherent if __GFP_COMP is
   specified, not only down the call stack in the
   __iommu_dma_alloc_pages() path.

   Note: the CMA paths would already ignore __GFP_COMP if it's
   specified so that __GFP_COMP request can already be ignored. It
   sounds preferable to warn the caller it's asking something it can't
   get, than to silently ignore __GFP_COMP.

   On a side note: hugetlbfs/THP pages can only be allocated with
   __GFP_COMP because for example put_page() must work on all tail
   pages (you can't call compound_head() unless the tail page is part
   of a compound page). But for private driver pages mapped by
   remap_pfn_range, any full or partial mapping is done manually and
   nobody can call GUP on VM_PFNMAP|VM_IO anyway (there's not even the
   requirement of a page struct backing those mappings in fact).

2) __iommu_dma_alloc_pages cannot use __GFP_COMP if it intends to
   return an array of small pages, which is the only thing that the
   current sg_alloc_table_from_pages() supports as input. split_page
   will work as expected to generate small pages from non-compound
   order>0 pages; incidentally it's implemented in mm/page_alloc.c, not
   in huge_memory.c.

   split_huge_page, by contrast, is not intended to be used on a newly
   allocated compound page. Maybe we should rename it to
   split_trans_huge_page to make it more explicit, since it won't even
   work on hugetlbfs (compound) pages.
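
To make point 1 concrete, here is a minimal userspace sketch of the
"warn, then strip" behaviour. It is only a model under stated
assumptions: the flag values, the gfp_model_t type and the
sanitize_dma_gfp() helper are all hypothetical and are not the kernel's
definitions or an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits for this userspace model; NOT the kernel's values. */
typedef unsigned int gfp_model_t;
#define MODEL_GFP_KERNEL  0x01u
#define MODEL_GFP_NOWARN  0x02u
#define MODEL_GFP_COMP    0x04u

/*
 * Strip the compound-page request before it would reach the page
 * allocator. *warned is set when the caller asked for something it
 * cannot get, so the request is reported instead of silently ignored.
 */
static gfp_model_t sanitize_dma_gfp(gfp_model_t gfp, bool *warned)
{
	assert(warned != NULL);

	*warned = (gfp & MODEL_GFP_COMP) != 0;
	if (*warned) {
		fprintf(stderr, "dma alloc: compound flag requested, ignoring it\n");
		gfp &= ~MODEL_GFP_COMP;
	}
	return gfp;
}
```

The remaining flags pass through untouched, so a caller that never set
the compound flag sees no change in behaviour.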

Thanks,
Andrea

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-22 Thread Robin Murphy

On 2020-06-22 13:46, Joerg Roedel wrote:

+ Robin

Robin, any idea on this?


After a bit of archaeology, this dates back to the original review:

https://lore.kernel.org/linux-arm-kernel/54c285d4.3070...@arm.com/
https://lore.kernel.org/linux-arm-kernel/54da2666.9030...@arm.com/

In summary: originally this inherited from other arch code that did 
simply strip __GFP_COMP; that was deemed questionable because of the 
nonsensical comment about CONFIG_HUGETLBFS that was stuck to it; the 
current code is like it is because in 5 and a half years nobody said 
that it's wrong :)


If there actually *are* good reasons for stripping __GFP_COMP, then I've 
certainly no objection to doing so.


Robin.


On Thu, Jun 18, 2020 at 10:40:26PM -0400, Andrea Arcangeli wrote:

Hello,

On Thu, Jun 18, 2020 at 06:14:49PM -0700, Roman Gushchin wrote:

I agree. The whole

page = alloc_pages_node(nid, alloc_flags, order);
if (!page)
continue;
if (!order)
break;
if (!PageCompound(page)) {
split_page(page, order);
break;
} else if (!split_huge_page(page)) {
break;
}

looks very suspicious to me.
My wild guess is that gfp flags changed somewhere above, so we hit
the branch which was never hit before.


Right to be suspicious about the above: split_huge_page on a regular
page allocated by a driver was never meant to work.

The PageLocked BUG_ON is just a symptom of a bigger issue, basically
split_huge_page it may survive, but it'll stay compound and in turn it
must be freed as compound.

The respective free method doesn't even contemplate freeing compound
pages, the only way the free method can survive, is by removing
__GFP_COMP forcefully in the allocation that was perhaps set here
(there are that many __GFP_COMP in that directory):

static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
{
gfp_t gfp_flags;

gfp_flags = GFP_KERNEL
| __GFP_COMP/* compound page lets parts be mapped */

And I'm not sure what the comment means here, compound or non compound
doesn't make a difference when you map it, it's not a THP, the
mappings must be handled manually so nothing should check PG_compound
anyway in the mapping code.

Something like this may improve things, it's an untested quick hack,
but this assumes it's always a bug to setup a compound page for these
DMA allocations and given the API it's probably a correct
assumption.. Compound is slower, unless you need it, you can avoid it
and then split_page will give contiguous memory page granular. Ideally
the code shouldn't call split_page at all and it should free it all at
once by keeping track of the order and by returning the order to the
caller, something the API can't do right now as it returns a plain
array that can only represent individual small pages.

Once this is resolved, you may want to check your config, iommu passthrough
sounds more optimal for a soundcard.

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f68a62c3c32b..3dfbc010fa83 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -499,6 +499,10 @@ static struct page **__iommu_dma_alloc_pages(struct device 
*dev,
  
  	/* IOMMU can map any pages, so himem can also be used here */

gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+   if (unlikely(gfp & __GFP_COMP)) {
+   WARN();
+   gfp &= ~__GFP_COMP;
+   }
  
  	while (count) {

struct page *page = NULL;
@@ -522,13 +526,8 @@ static struct page **__iommu_dma_alloc_pages(struct device 
*dev,
continue;
if (!order)
break;
-   if (!PageCompound(page)) {
-   split_page(page, order);
-   break;
-   } else if (!split_huge_page(page)) {
-   break;
-   }
-   __free_pages(page, order);
+   split_page(page, order);
+   break;
}
if (!page) {
__iommu_dma_free_pages(pages, i);
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
index 6850d13aa98c..378f5a36ec5f 100644
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -28,7 +28,6 @@ static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, 
size_t size)
gfp_t gfp_flags;
  
  	gfp_flags = GFP_KERNEL

-   | __GFP_COMP/* compound page lets parts be mapped */
| __GFP_NORETRY /* don't trigger OOM-killer */
| __GFP_NOWARN; /* no stack trace print - this call is 
non-critical */
dmab->area = dma_alloc_coherent(dmab->dev.dev, size, &dmab->addr,


Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-22 Thread Joerg Roedel
+ Robin

Robin, any idea on this?

On Thu, Jun 18, 2020 at 10:40:26PM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> On Thu, Jun 18, 2020 at 06:14:49PM -0700, Roman Gushchin wrote:
> > I agree. The whole
> > 
> > page = alloc_pages_node(nid, alloc_flags, order);
> > if (!page)
> > continue;
> > if (!order)
> > break;
> > if (!PageCompound(page)) {
> > split_page(page, order);
> > break;
> > } else if (!split_huge_page(page)) {
> > break;
> > }
> > 
> > looks very suspicious to me.
> > My wild guess is that gfp flags changed somewhere above, so we hit
> > the branch which was never hit before.
> 
> Right to be suspicious about the above: split_huge_page on a regular
> page allocated by a driver was never meant to work.
> 
> The PageLocked BUG_ON is just a symptom of a bigger issue, basically
> split_huge_page it may survive, but it'll stay compound and in turn it
> must be freed as compound.
> 
> The respective free method doesn't even contemplate freeing compound
> pages, the only way the free method can survive, is by removing
> __GFP_COMP forcefully in the allocation that was perhaps set here
> (there are that many __GFP_COMP in that directory):
> 
> static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
> {
>   gfp_t gfp_flags;
> 
>   gfp_flags = GFP_KERNEL
>   | __GFP_COMP/* compound page lets parts be mapped */
> 
> And I'm not sure what the comment means here, compound or non compound
> doesn't make a difference when you map it, it's not a THP, the
> mappings must be handled manually so nothing should check PG_compound
> anyway in the mapping code.
> 
> Something like this may improve things, it's an untested quick hack,
> but this assumes it's always a bug to setup a compound page for these
> DMA allocations and given the API it's probably a correct
> assumption.. Compound is slower, unless you need it, you can avoid it
> and then split_page will give contiguous memory page granular. Ideally
> the code shouldn't call split_page at all and it should free it all at
> once by keeping track of the order and by returning the order to the
> caller, something the API can't do right now as it returns a plain
> array that can only represent individual small pages.
> 
> Once this is resolved, you may want to check your config, iommu passthrough
> sounds more optimal for a soundcard.
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f68a62c3c32b..3dfbc010fa83 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -499,6 +499,10 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
>  
>   /* IOMMU can map any pages, so himem can also be used here */
>   gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
> + if (unlikely(gfp & __GFP_COMP)) {
> + WARN();
> + gfp &= ~__GFP_COMP;
> + }
>  
>   while (count) {
>   struct page *page = NULL;
> @@ -522,13 +526,8 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
>   continue;
>   if (!order)
>   break;
> - if (!PageCompound(page)) {
> - split_page(page, order);
> - break;
> - } else if (!split_huge_page(page)) {
> - break;
> - }
> - __free_pages(page, order);
> + split_page(page, order);
> + break;
>   }
>   if (!page) {
>   __iommu_dma_free_pages(pages, i);
> diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
> index 6850d13aa98c..378f5a36ec5f 100644
> --- a/sound/core/memalloc.c
> +++ b/sound/core/memalloc.c
> @@ -28,7 +28,6 @@ static void snd_malloc_dev_pages(struct snd_dma_buffer 
> *dmab, size_t size)
>   gfp_t gfp_flags;
>  
>   gfp_flags = GFP_KERNEL
> - | __GFP_COMP/* compound page lets parts be mapped */
>   | __GFP_NORETRY /* don't trigger OOM-killer */
>   | __GFP_NOWARN; /* no stack trace print - this call is 
> non-critical */
>   dmab->area = dma_alloc_coherent(dmab->dev.dev, size, &dmab->addr,


Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-21 Thread David Rientjes via iommu
On Fri, 19 Jun 2020, Roman Gushchin wrote:

> > > [   40.287524] BUG: unable to handle page fault for address: 
> > > a77b833df000
> > > [   40.287529] #PF: supervisor write access in kernel mode
> > > [   40.287531] #PF: error_code(0x000b) - reserved bit violation
> > > [   40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067
> > > PTE 80001688033d9163
> > > [   40.287538] Oops: 000b [#2] SMP NOPTI
> > > [   40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G  D
> > >   5.8.0-rc1+ #697
> > > [   40.287544] Hardware name: Gigabyte Technology Co., Ltd.
> > > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > > [   40.287550] RIP: 0010:__memset+0x24/0x30
> > > [   40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89
> > > d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48
> > > 0f af c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89
> > > d1 f3
> > > [   40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216
> > > [   40.287558] RAX:  RBX: 90f77dced800 RCX: 
> > > 08a0
> > > [   40.287560] RDX:  RSI:  RDI: 
> > > a77b833df000
> > > [   40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: 
> > > a77b833df000
> > > [   40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: 
> > > 
> > > [   40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: 
> > > 90f770f4e700
> > > [   40.287567] FS:  7f3d8e8df880() GS:90f78ee4()
> > > knlGS:
> > > [   40.287569] CS:  0010 DS:  ES:  CR0: 80050033
> > > [   40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: 
> > > 003406e0
> > > [   40.287572] Call Trace:
> > > [   40.287584]  snd_pcm_hw_params+0x3fd/0x490 [snd_pcm]
> > > [   40.287593]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > > [   40.287601]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > > [   40.287608]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > > [   40.287613]  ksys_ioctl+0x82/0xc0
> > > [   40.287617]  __x64_sys_ioctl+0x16/0x20
> > > [   40.287622]  do_syscall_64+0x4d/0x90
> > > [   40.287627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > Hi Roman,
> > 
> > If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by
> > 
> > commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e
> > Author: David Rientjes 
> > Date:   Thu Jun 11 00:25:57 2020 -0700
> > 
> > dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
> > 
> > Or you might want to wait for 5.8-rc2 instead which includes this fix.
> > 
> 
> Hello, David!
> 
> Thank you for pointing at it! Unfortunately, there must be something wrong
> with drivers, your patch didn't help much. I still see the same stacktrace.
> 

Wow, I'm very surprised.  Do you have CONFIG_AMD_MEM_ENCRYPT enabled?

Adding Takashi.



Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-19 Thread Roman Gushchin via iommu
On Fri, Jun 19, 2020 at 01:56:28PM -0700, David Rientjes wrote:
> On Fri, 19 Jun 2020, Roman Gushchin wrote:
> 
> > [   40.287524] BUG: unable to handle page fault for address: 
> > a77b833df000
> > [   40.287529] #PF: supervisor write access in kernel mode
> > [   40.287531] #PF: error_code(0x000b) - reserved bit violation
> > [   40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067
> > PTE 80001688033d9163
> > [   40.287538] Oops: 000b [#2] SMP NOPTI
> > [   40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G  D
> >   5.8.0-rc1+ #697
> > [   40.287544] Hardware name: Gigabyte Technology Co., Ltd.
> > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > [   40.287550] RIP: 0010:__memset+0x24/0x30
> > [   40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89
> > d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48
> > 0f af c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89
> > d1 f3
> > [   40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216
> > [   40.287558] RAX:  RBX: 90f77dced800 RCX: 
> > 08a0
> > [   40.287560] RDX:  RSI:  RDI: 
> > a77b833df000
> > [   40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: 
> > a77b833df000
> > [   40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: 
> > 
> > [   40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: 
> > 90f770f4e700
> > [   40.287567] FS:  7f3d8e8df880() GS:90f78ee4()
> > knlGS:
> > [   40.287569] CS:  0010 DS:  ES:  CR0: 80050033
> > [   40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: 
> > 003406e0
> > [   40.287572] Call Trace:
> > [   40.287584]  snd_pcm_hw_params+0x3fd/0x490 [snd_pcm]
> > [   40.287593]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > [   40.287601]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > [   40.287608]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > [   40.287613]  ksys_ioctl+0x82/0xc0
> > [   40.287617]  __x64_sys_ioctl+0x16/0x20
> > [   40.287622]  do_syscall_64+0x4d/0x90
> > [   40.287627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Hi Roman,
> 
> If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by
> 
> commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e
> Author: David Rientjes 
> Date:   Thu Jun 11 00:25:57 2020 -0700
> 
> dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
> 
> Or you might want to wait for 5.8-rc2 instead which includes this fix.
> 

Hello, David!

Thank you for pointing at it! Unfortunately, there must be something wrong
with drivers, your patch didn't help much. I still see the same stacktrace.

I'll try again once 5.8-rc2 is out.

Thanks!


Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-19 Thread David Rientjes via iommu
On Fri, 19 Jun 2020, Roman Gushchin wrote:

> [   40.287524] BUG: unable to handle page fault for address: a77b833df000
> [   40.287529] #PF: supervisor write access in kernel mode
> [   40.287531] #PF: error_code(0x000b) - reserved bit violation
> [   40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067
> PTE 80001688033d9163
> [   40.287538] Oops: 000b [#2] SMP NOPTI
> [   40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G  D
>   5.8.0-rc1+ #697
> [   40.287544] Hardware name: Gigabyte Technology Co., Ltd.
> AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> [   40.287550] RIP: 0010:__memset+0x24/0x30
> [   40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89
> d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48
> 0f af c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89
> d1 f3
> [   40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216
> [   40.287558] RAX:  RBX: 90f77dced800 RCX: 
> 08a0
> [   40.287560] RDX:  RSI:  RDI: 
> a77b833df000
> [   40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: 
> a77b833df000
> [   40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: 
> 
> [   40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: 
> 90f770f4e700
> [   40.287567] FS:  7f3d8e8df880() GS:90f78ee4()
> knlGS:
> [   40.287569] CS:  0010 DS:  ES:  CR0: 80050033
> [   40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: 
> 003406e0
> [   40.287572] Call Trace:
> [   40.287584]  snd_pcm_hw_params+0x3fd/0x490 [snd_pcm]
> [   40.287593]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> [   40.287601]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> [   40.287608]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> [   40.287613]  ksys_ioctl+0x82/0xc0
> [   40.287617]  __x64_sys_ioctl+0x16/0x20
> [   40.287622]  do_syscall_64+0x4d/0x90
> [   40.287627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Hi Roman,

If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by

commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e
Author: David Rientjes 
Date:   Thu Jun 11 00:25:57 2020 -0700

dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL

Or you might want to wait for 5.8-rc2 instead which includes this fix.


Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-19 Thread Roman Gushchin via iommu
On Thu, Jun 18, 2020 at 10:40:26PM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> On Thu, Jun 18, 2020 at 06:14:49PM -0700, Roman Gushchin wrote:
> > I agree. The whole
> > 
> > page = alloc_pages_node(nid, alloc_flags, order);
> > if (!page)
> > continue;
> > if (!order)
> > break;
> > if (!PageCompound(page)) {
> > split_page(page, order);
> > break;
> > } else if (!split_huge_page(page)) {
> > break;
> > }
> > 
> > looks very suspicious to me.
> > My wild guess is that gfp flags changed somewhere above, so we hit
> > the branch which was never hit before.
> 
> Right to be suspicious about the above: split_huge_page on a regular
> page allocated by a driver was never meant to work.
> 
> The PageLocked BUG_ON is just a symptom of a bigger issue, basically
> split_huge_page it may survive, but it'll stay compound and in turn it
> must be freed as compound.
> 
> The respective free method doesn't even contemplate freeing compound
> pages, the only way the free method can survive, is by removing
> __GFP_COMP forcefully in the allocation that was perhaps set here
> (there are that many __GFP_COMP in that directory):
> 
> static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
> {
>   gfp_t gfp_flags;
> 
>   gfp_flags = GFP_KERNEL
>   | __GFP_COMP/* compound page lets parts be mapped */
> 
> And I'm not sure what the comment means here, compound or non compound
> doesn't make a difference when you map it, it's not a THP, the
> mappings must be handled manually so nothing should check PG_compound
> anyway in the mapping code.
> 
> Something like this may improve things, it's an untested quick hack,
> but this assumes it's always a bug to setup a compound page for these
> DMA allocations and given the API it's probably a correct
> assumption.. Compound is slower, unless you need it, you can avoid it
> and then split_page will give contiguous memory page granular. Ideally
> the code shouldn't call split_page at all and it should free it all at
> once by keeping track of the order and by returning the order to the
> caller, something the API can't do right now as it returns a plain
> array that can only represent individual small pages.
> 
> Once this is resolved, you may want to check your config, iommu passthrough
> sounds more optimal for a soundcard.

It's based on the default Fedora 32 config + all defaults for the 5.6..5.8-rc1
difference. But thanks for the advice!

> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f68a62c3c32b..3dfbc010fa83 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -499,6 +499,10 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
>  
>   /* IOMMU can map any pages, so himem can also be used here */
>   gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
> + if (unlikely(gfp & __GFP_COMP)) {
> + WARN();
> + gfp &= ~__GFP_COMP;
> + }
>  
>   while (count) {
>   struct page *page = NULL;
> @@ -522,13 +526,8 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
>   continue;
>   if (!order)
>   break;
> - if (!PageCompound(page)) {
> - split_page(page, order);
> - break;
> - } else if (!split_huge_page(page)) {
> - break;
> - }
> - __free_pages(page, order);
> + split_page(page, order);
> + break;
>   }
>   if (!page) {
>   __iommu_dma_free_pages(pages, i);
> diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
> index 6850d13aa98c..378f5a36ec5f 100644
> --- a/sound/core/memalloc.c
> +++ b/sound/core/memalloc.c
> @@ -28,7 +28,6 @@ static void snd_malloc_dev_pages(struct snd_dma_buffer 
> *dmab, size_t size)
>   gfp_t gfp_flags;
>  
>   gfp_flags = GFP_KERNEL
> - | __GFP_COMP/* compound page lets parts be mapped */
>   | __GFP_NORETRY /* don't trigger OOM-killer */
>   | __GFP_NOWARN; /* no stack trace print - this call is 
> non-critical */
>   dmab->area = dma_alloc_coherent(dmab->dev.dev, size, &dmab->addr,
> 
> 

The patch looks very good to me, but unfortunately it seems to reveal
the next layer of problems:

[   23.864671] page:cd0a8fc3a3c0 refcount:0 mapcount:0
mapping: index:0x0
[   23.864674] flags: 0x17c800(arch_1)
[   23.864678] raw: 0017c800  cd0a8fc3a388

[   23.864680] raw:   

[   23.864705] page dumped because: VM_BUG_ON_PAGE(((unsigned int)
page_ref_count(page) + 127u <= 127u))
[   23.864771] [ cut 

Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-18 Thread Andrea Arcangeli
Hello,

On Thu, Jun 18, 2020 at 06:14:49PM -0700, Roman Gushchin wrote:
> I agree. The whole
> 
>   page = alloc_pages_node(nid, alloc_flags, order);
>   if (!page)
>   continue;
>   if (!order)
>   break;
>   if (!PageCompound(page)) {
>   split_page(page, order);
>   break;
>   } else if (!split_huge_page(page)) {
>   break;
>   }
> 
> looks very suspicious to me.
> My wild guess is that gfp flags changed somewhere above, so we hit
> the branch which was never hit before.

Right to be suspicious about the above: split_huge_page on a regular
page allocated by a driver was never meant to work.

The PageLocked BUG_ON is just a symptom of a bigger issue: basically,
even if split_huge_page survives, the page will stay compound and in
turn it must be freed as compound.
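
To illustrate the difference, here is a toy userspace model of the
reference counts involved. It is a sketch under stated assumptions:
struct model_page and the model_*() helpers are hypothetical stand-ins,
not kernel code. The model only captures that a fresh order>0
allocation holds a single reference on its first page, and that
split_page() gives every sub-page its own reference but is only valid
on non-compound pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; field names are illustrative only. */
struct model_page {
	int refcount;
	bool tail;	/* part of a compound page, but not its head */
};

/* Model a fresh order-N allocation: only the first page holds a reference. */
static void model_alloc(struct model_page *p, unsigned int order, bool compound)
{
	size_t n = (size_t)1 << order;

	for (size_t i = 0; i < n; i++) {
		p[i].refcount = (i == 0) ? 1 : 0;
		p[i].tail = compound && i > 0;
	}
}

/*
 * Model of split_page(): gives each sub-page its own reference so it
 * can be freed on its own. Refuses compound pages and leaves them
 * untouched, returning false.
 */
static bool model_split_page(struct model_page *p, unsigned int order)
{
	size_t n = (size_t)1 << order;

	for (size_t i = 1; i < n; i++)
		if (p[i].tail)
			return false;
	for (size_t i = 1; i < n; i++)
		p[i].refcount = 1;
	return true;
}

/* Count the sub-pages that could be released one by one. */
static size_t model_individually_freeable(const struct model_page *p,
					  unsigned int order)
{
	size_t n = (size_t)1 << order, ok = 0;

	for (size_t i = 0; i < n; i++)
		if (p[i].refcount == 1)
			ok++;
	return ok;
}
```

In this model a free path that walks an array of small pages only works
after a successful split; for the compound case only the head holds a
reference, so the whole thing has to be released through the head.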

The respective free method doesn't even contemplate freeing compound
pages. The only way the free method can survive is by forcefully
removing __GFP_COMP in the allocation, which was perhaps set here
(there aren't that many __GFP_COMP in that directory):

static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
{
	gfp_t gfp_flags;

	gfp_flags = GFP_KERNEL
		| __GFP_COMP	/* compound page lets parts be mapped */

And I'm not sure what the comment means here: compound or non-compound
doesn't make a difference when you map it. It's not a THP, so the
mappings must be handled manually and nothing should check PG_compound
in the mapping code anyway.

Something like this may improve things. It's an untested quick hack,
and it assumes it's always a bug to set up a compound page for these
DMA allocations; given the API, that's probably a correct
assumption. Compound is slower, so unless you need it you can avoid it,
and then split_page will give contiguous memory at page granularity.
Ideally the code shouldn't call split_page at all and it should free it
all at once by keeping track of the order and by returning the order to
the caller, something the API can't do right now as it returns a plain
array that can only represent individual small pages.

Once this is resolved, you may want to check your config, iommu passthrough
sounds more optimal for a soundcard.
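
As a rough illustration of the "keep track of the order" idea, the
userspace sketch below records one descriptor per allocation instead of
one entry per small page. Everything in it is hypothetical (struct
model_chunk, its first_pfn field used as a plain index, and both
helpers); it is not the current dma-iommu API, only a model of what
such an interface could carry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: one entry per allocation, not per small page. */
struct model_chunk {
	size_t first_pfn;	/* index of the first small page in the buffer */
	unsigned int order;	/* chunk covers 1 << order small pages */
};

/*
 * Cover 'count' small pages with power-of-two chunks, largest first,
 * never exceeding max_order. Returns the number of chunks written.
 * If 'cap' is too small the coverage is incomplete.
 */
static size_t model_build_chunks(size_t count, unsigned int max_order,
				 struct model_chunk *out, size_t cap)
{
	size_t nr = 0, pfn = 0;

	while (count && nr < cap) {
		unsigned int order = 0;

		/* Largest order that still fits in what is left. */
		while (order < max_order && ((size_t)2 << order) <= count)
			order++;
		out[nr].first_pfn = pfn;
		out[nr].order = order;
		nr++;
		pfn += (size_t)1 << order;
		count -= (size_t)1 << order;
	}
	return nr;
}

/* Total number of small pages described by the chunk list. */
static size_t model_chunk_pages(const struct model_chunk *c, size_t nr)
{
	size_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += (size_t)1 << c[i].order;
	return total;
}
```

With the order remembered, a free path needs one call per chunk rather
than one per small page, and no split is required at allocation time.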

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f68a62c3c32b..3dfbc010fa83 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -499,6 +499,10 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 
 	/* IOMMU can map any pages, so himem can also be used here */
 	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+	if (unlikely(gfp & __GFP_COMP)) {
+		WARN();
+		gfp &= ~__GFP_COMP;
+	}
 
 	while (count) {
 		struct page *page = NULL;
@@ -522,13 +526,8 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 				continue;
 			if (!order)
 				break;
-			if (!PageCompound(page)) {
-				split_page(page, order);
-				break;
-			} else if (!split_huge_page(page)) {
-				break;
-			}
-			__free_pages(page, order);
+			split_page(page, order);
+			break;
 		}
 		if (!page) {
 			__iommu_dma_free_pages(pages, i);
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
index 6850d13aa98c..378f5a36ec5f 100644
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -28,7 +28,6 @@ static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
 	gfp_t gfp_flags;
 
 	gfp_flags = GFP_KERNEL
-		| __GFP_COMP	/* compound page lets parts be mapped */
 		| __GFP_NORETRY /* don't trigger OOM-killer */
 		| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
 	dmab->area = dma_alloc_coherent(dmab->dev.dev, size, &dmab->addr,



Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-18 Thread Yang Shi
On Thu, Jun 18, 2020 at 5:19 PM Roman Gushchin  wrote:
>
> Hi!
>
> I was consistently hitting a VM_BUG_ON_PAGE() in split_huge_page_to_list()
> when running vanilla 5.8-rc1 on my desktop. It was happening on every boot
> during the system start. I haven't seen this issue on 5.7.
>
> It looks like split_huge_page() expects the page to be locked,
> but it hasn't been changed from 5.7. I do not see any suspicious
> commits around the call side either.
>
> I've tried the following patch:
>
> --
> From 4af38fbf06a9354fadf22a78f1a42dfbb24fbc3a Mon Sep 17 00:00:00 2001
> From: Roman Gushchin 
> Date: Thu, 18 Jun 2020 16:33:47 -0700
> Subject: [PATCH] iommu/dma: lock page before calling split_huge_page()
>
> split_huge_page() expects a locked page. The following stacktrace
> is generated if debug is on. Fix this by locking the page before
> passing it to split_huge_page().
>
> [   24.861385] page:ef044fb1fa00 refcount:1 mapcount:0 
> mapping: index:0x0 head:ef044fb1fa00 order:2 
> compound_mapcount:0 compound_pincount:0
> [   24.861389] flags: 0x17c001(head)
> [   24.861393] raw: 0017c001 dead0100 dead0122 
> 
> [   24.861395] raw:   0001 
> 
> [   24.861396] page dumped because: VM_BUG_ON_PAGE(!PageLocked(head))
> [   24.861411] [ cut here ]
> [   24.861413] kernel BUG at mm/huge_memory.c:2613!
> [   24.861428] invalid opcode:  [#1] SMP NOPTI
> [   24.861432] CPU: 10 PID: 1505 Comm: pulseaudio Not tainted 5.8.0-rc1+ #689
> [   24.861433] Hardware name: Gigabyte Technology Co., Ltd. 
> AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> [   24.861441] RIP: 0010:split_huge_page_to_list+0x731/0xae0
> [   24.861444] Code: 44 00 00 8b 47 34 85 c0 0f 84 b4 02 00 00 f0 ff 4f 34 75 
> c2 e8 e0 12 f7 ff eb bb 48 c7 c6 d0 16 39 ba 4c 89 c7 e8 ef 85 f9 ff <0f> 0b 
> 48 c7 44 24 10 ff ff ff ff 31 db e9 bb fa ff ff 48 8b 7c 24
> [   24.861446] RSP: 0018:c1030254bb50 EFLAGS: 00010286
> [   24.861449] RAX:  RBX: 0002 RCX: 
> 9b54cee98d08
> [   24.861451] RDX: ffd8 RSI:  RDI: 
> 9b54cee98d00
> [   24.861452] RBP: ef044fb1fa00 R08: 0547 R09: 
> 0003
> [   24.861454] R10:  R11: 0001 R12: 
> 9b54df37f188
> [   24.861455] R13: 9b54df355000 R14: ef044fb1fa00 R15: 
> ef044fb1fa00
> [   24.861458] FS:  7fd2dc132880() GS:9b54cee8() 
> knlGS:
> [   24.861460] CS:  0010 DS:  ES:  CR0: 80050033
> [   24.861461] CR2: 7fd2cb10 CR3: 0003feb16000 CR4: 
> 003406e0
> [   24.861464] Call Trace:
> [   24.861473]  ? __mod_lruvec_state+0x41/0xf0
> [   24.861478]  ? __alloc_pages_nodemask+0x15c/0x320
> [   24.861483]  iommu_dma_alloc+0x316/0x580
> [   24.861496]  snd_dma_alloc_pages+0xdf/0x160 [snd_pcm]
> [   24.861508]  snd_dma_alloc_pages_fallback+0x5d/0x80 [snd_pcm]
> [   24.861516]  snd_malloc_sgbuf_pages+0x166/0x380 [snd_pcm]
> [   24.861523]  ? snd_pcm_hw_refine+0x29d/0x310 [snd_pcm]
> [   24.861529]  ? _cond_resched+0x16/0x40
> [   24.861535]  snd_dma_alloc_pages+0x64/0x160 [snd_pcm]
> [   24.861542]  snd_pcm_lib_malloc_pages+0x136/0x1d0 [snd_pcm]
> [   24.861550]  ? snd_pcm_lib_ioctl+0x167/0x210 [snd_pcm]
> [   24.861556]  snd_pcm_hw_params+0x3c0/0x490 [snd_pcm]
> [   24.861563]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> [   24.861571]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> [   24.861578]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> [   24.861583]  ksys_ioctl+0x82/0xc0
> [   24.861587]  __x64_sys_ioctl+0x16/0x20
> [   24.861593]  do_syscall_64+0x4d/0x90
> [   24.861597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Roman Gushchin 
> ---
>  drivers/iommu/dma-iommu.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 4959f5df21bd..31e4e305d8d5 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct iommu_dma_msi_page {
> struct list_headlist;
> @@ -549,8 +550,15 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
> if (!PageCompound(page)) {
> split_page(page, order);
> break;
> -   } else if (!split_huge_page(page)) {
> -   break;
> +   } else {
> +  

Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-18 Thread Yang Shi
On Thu, Jun 18, 2020 at 6:15 PM Roman Gushchin  wrote:
>
> On Thu, Jun 18, 2020 at 05:46:24PM -0700, Yang Shi wrote:
> > On Thu, Jun 18, 2020 at 5:19 PM Roman Gushchin  wrote:
> > >
> > > Hi!
> > >
> > > I was consistently hitting a VM_BUG_ON_PAGE() in split_huge_page_to_list()
> > > when running vanilla 5.8-rc1 on my desktop. It was happening on every boot
> > > during the system start. I haven't seen this issue on 5.7.
> > >
> > > It looks like split_huge_page() expects the page to be locked,
> > > > but that code hasn't changed since 5.7. I do not see any suspicious
> > > > commits around the call site either.
> > >
> > > I've tried the following patch:
> > >
> > > --
> > > From 4af38fbf06a9354fadf22a78f1a42dfbb24fbc3a Mon Sep 17 00:00:00 2001
> > > From: Roman Gushchin 
> > > Date: Thu, 18 Jun 2020 16:33:47 -0700
> > > Subject: [PATCH] iommu/dma: lock page before calling split_huge_page()
> > >
> > > split_huge_page() expects a locked page. The following stacktrace
> > > is generated if debug is on. Fix this by locking the page before
> > > passing it to split_huge_page().
> > >
> > > [   24.861385] page:ef044fb1fa00 refcount:1 mapcount:0 
> > > mapping: index:0x0 head:ef044fb1fa00 order:2 
> > > compound_mapcount:0 compound_pincount:0
> > > [   24.861389] flags: 0x17c001(head)
> > > [   24.861393] raw: 0017c001 dead0100 dead0122 
> > > 00000000
> > > [   24.861395] raw:   0001 
> > > 
> > > [   24.861396] page dumped because: VM_BUG_ON_PAGE(!PageLocked(head))
> > > [   24.861411] [ cut here ]
> > > [   24.861413] kernel BUG at mm/huge_memory.c:2613!
> > > [   24.861428] invalid opcode:  [#1] SMP NOPTI
> > > [   24.861432] CPU: 10 PID: 1505 Comm: pulseaudio Not tainted 5.8.0-rc1+ 
> > > #689
> > > [   24.861433] Hardware name: Gigabyte Technology Co., Ltd. 
> > > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > > [   24.861441] RIP: 0010:split_huge_page_to_list+0x731/0xae0
> > > [   24.861444] Code: 44 00 00 8b 47 34 85 c0 0f 84 b4 02 00 00 f0 ff 4f 
> > > 34 75 c2 e8 e0 12 f7 ff eb bb 48 c7 c6 d0 16 39 ba 4c 89 c7 e8 ef 85 f9 
> > > ff <0f> 0b 48 c7 44 24 10 ff ff ff ff 31 db e9 bb fa ff ff 48 8b 7c 24
> > > [   24.861446] RSP: 0018:c1030254bb50 EFLAGS: 00010286
> > > [   24.861449] RAX:  RBX: 0002 RCX: 
> > > 9b54cee98d08
> > > [   24.861451] RDX: ffd8 RSI:  RDI: 
> > > 9b54cee98d00
> > > [   24.861452] RBP: ef044fb1fa00 R08: 0547 R09: 
> > > 0003
> > > [   24.861454] R10:  R11: 0001 R12: 
> > > 9b54df37f188
> > > [   24.861455] R13: 9b54df355000 R14: ef044fb1fa00 R15: 
> > > ef044fb1fa00
> > > [   24.861458] FS:  7fd2dc132880() GS:9b54cee8() 
> > > knlGS:
> > > [   24.861460] CS:  0010 DS:  ES:  CR0: 80050033
> > > [   24.861461] CR2: 7fd2cb10 CR3: 0003feb16000 CR4: 
> > > 003406e0
> > > [   24.861464] Call Trace:
> > > [   24.861473]  ? __mod_lruvec_state+0x41/0xf0
> > > [   24.861478]  ? __alloc_pages_nodemask+0x15c/0x320
> > > [   24.861483]  iommu_dma_alloc+0x316/0x580
> > > [   24.861496]  snd_dma_alloc_pages+0xdf/0x160 [snd_pcm]
> > > [   24.861508]  snd_dma_alloc_pages_fallback+0x5d/0x80 [snd_pcm]
> > > [   24.861516]  snd_malloc_sgbuf_pages+0x166/0x380 [snd_pcm]
> > > [   24.861523]  ? snd_pcm_hw_refine+0x29d/0x310 [snd_pcm]
> > > [   24.861529]  ? _cond_resched+0x16/0x40
> > > [   24.861535]  snd_dma_alloc_pages+0x64/0x160 [snd_pcm]
> > > [   24.861542]  snd_pcm_lib_malloc_pages+0x136/0x1d0 [snd_pcm]
> > > [   24.861550]  ? snd_pcm_lib_ioctl+0x167/0x210 [snd_pcm]
> > > [   24.861556]  snd_pcm_hw_params+0x3c0/0x490 [snd_pcm]
> > > [   24.861563]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > > [   24.861571]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > > [   24.861578]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > > [   24.861583]  ksys_ioctl+0x82/0xc0
> > > [   24.861587]  __x64_sys_ioctl+0x16/0x20
> > > [   24.861593]  do_syscall_64+0x4d/0x90
> > > [   24.861597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
>

Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-18 Thread Roman Gushchin via iommu
On Thu, Jun 18, 2020 at 05:46:24PM -0700, Yang Shi wrote:
> On Thu, Jun 18, 2020 at 5:19 PM Roman Gushchin  wrote:
> >
> > Hi!
> >
> > I was consistently hitting a VM_BUG_ON_PAGE() in split_huge_page_to_list()
> > when running vanilla 5.8-rc1 on my desktop. It was happening on every boot
> > during the system start. I haven't seen this issue on 5.7.
> >
> > It looks like split_huge_page() expects the page to be locked,
> > > but that code hasn't changed since 5.7. I do not see any suspicious
> > > commits around the call site either.
> >
> > I've tried the following patch:
> >
> > --
> > From 4af38fbf06a9354fadf22a78f1a42dfbb24fbc3a Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin 
> > Date: Thu, 18 Jun 2020 16:33:47 -0700
> > Subject: [PATCH] iommu/dma: lock page before calling split_huge_page()
> >
> > split_huge_page() expects a locked page. The following stacktrace
> > is generated if debug is on. Fix this by locking the page before
> > passing it to split_huge_page().
> >
> > [   24.861385] page:ef044fb1fa00 refcount:1 mapcount:0 
> > mapping: index:0x0 head:ef044fb1fa00 order:2 
> > compound_mapcount:0 compound_pincount:0
> > [   24.861389] flags: 0x17c001(head)
> > [   24.861393] raw: 0017c001 dead0100 dead0122 
> > 
> > [   24.861395] raw:   0001ffffffff 
> > 0000
> > [   24.861396] page dumped because: VM_BUG_ON_PAGE(!PageLocked(head))
> > [   24.861411] [ cut here ]
> > [   24.861413] kernel BUG at mm/huge_memory.c:2613!
> > [   24.861428] invalid opcode:  [#1] SMP NOPTI
> > [   24.861432] CPU: 10 PID: 1505 Comm: pulseaudio Not tainted 5.8.0-rc1+ 
> > #689
> > [   24.861433] Hardware name: Gigabyte Technology Co., Ltd. 
> > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > [   24.861441] RIP: 0010:split_huge_page_to_list+0x731/0xae0
> > [   24.861444] Code: 44 00 00 8b 47 34 85 c0 0f 84 b4 02 00 00 f0 ff 4f 34 
> > 75 c2 e8 e0 12 f7 ff eb bb 48 c7 c6 d0 16 39 ba 4c 89 c7 e8 ef 85 f9 ff 
> > <0f> 0b 48 c7 44 24 10 ff ff ff ff 31 db e9 bb fa ff ff 48 8b 7c 24
> > [   24.861446] RSP: 0018:c1030254bb50 EFLAGS: 00010286
> > [   24.861449] RAX:  RBX: 0002 RCX: 
> > 9b54cee98d08
> > [   24.861451] RDX: ffd8 RSI:  RDI: 
> > 9b54cee98d00
> > [   24.861452] RBP: ef044fb1fa00 R08: 0547 R09: 
> > 0003
> > [   24.861454] R10:  R11: 0001 R12: 
> > 9b54df37f188
> > [   24.861455] R13: 9b54df355000 R14: ef044fb1fa00 R15: 
> > ef044fb1fa00
> > [   24.861458] FS:  7fd2dc132880() GS:9b54cee8() 
> > knlGS:
> > [   24.861460] CS:  0010 DS:  ES:  CR0: 80050033
> > [   24.861461] CR2: 7fd2cb10 CR3: 0003feb16000 CR4: 
> > 003406e0
> > [   24.861464] Call Trace:
> > [   24.861473]  ? __mod_lruvec_state+0x41/0xf0
> > [   24.861478]  ? __alloc_pages_nodemask+0x15c/0x320
> > [   24.861483]  iommu_dma_alloc+0x316/0x580
> > [   24.861496]  snd_dma_alloc_pages+0xdf/0x160 [snd_pcm]
> > [   24.861508]  snd_dma_alloc_pages_fallback+0x5d/0x80 [snd_pcm]
> > [   24.861516]  snd_malloc_sgbuf_pages+0x166/0x380 [snd_pcm]
> > [   24.861523]  ? snd_pcm_hw_refine+0x29d/0x310 [snd_pcm]
> > [   24.861529]  ? _cond_resched+0x16/0x40
> > [   24.861535]  snd_dma_alloc_pages+0x64/0x160 [snd_pcm]
> > [   24.861542]  snd_pcm_lib_malloc_pages+0x136/0x1d0 [snd_pcm]
> > [   24.861550]  ? snd_pcm_lib_ioctl+0x167/0x210 [snd_pcm]
> > [   24.861556]  snd_pcm_hw_params+0x3c0/0x490 [snd_pcm]
> > [   24.861563]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > [   24.861571]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > [   24.861578]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > [   24.861583]  ksys_ioctl+0x82/0xc0
> > [   24.861587]  __x64_sys_ioctl+0x16/0x20
> > [   24.861593]  do_syscall_64+0x4d/0x90
> > [   24.861597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Signed-off-by: Roman Gushchin 
> > ---
> >  drivers/iommu/dma-iommu.c | 12 ++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 4959f5df21bd..31e4e305d8d5 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -24,6 +24,7 @@
> >  #include 
> >  

kernel BUG at mm/huge_memory.c:2613!

2020-06-18 Thread Roman Gushchin via iommu
Hi!

I was consistently hitting a VM_BUG_ON_PAGE() in split_huge_page_to_list()
when running vanilla 5.8-rc1 on my desktop. It was happening on every boot
during the system start. I haven't seen this issue on 5.7.

It looks like split_huge_page() expects the page to be locked,
but that code hasn't changed since 5.7. I do not see any suspicious
commits around the call site either.

I've tried the following patch:

--
From 4af38fbf06a9354fadf22a78f1a42dfbb24fbc3a Mon Sep 17 00:00:00 2001
From: Roman Gushchin 
Date: Thu, 18 Jun 2020 16:33:47 -0700
Subject: [PATCH] iommu/dma: lock page before calling split_huge_page()

split_huge_page() expects a locked page. The following stacktrace
is generated if debug is on. Fix this by locking the page before
passing it to split_huge_page().

[   24.861385] page:ef044fb1fa00 refcount:1 mapcount:0 mapping: index:0x0 head:ef044fb1fa00 order:2 compound_mapcount:0 compound_pincount:0
[   24.861389] flags: 0x17c001(head)
[   24.861393] raw: 0017c001 dead0100 dead0122
[   24.861395] raw:   0001
[   24.861396] page dumped because: VM_BUG_ON_PAGE(!PageLocked(head))
[   24.861411] [ cut here ]
[   24.861413] kernel BUG at mm/huge_memory.c:2613!
[   24.861428] invalid opcode:  [#1] SMP NOPTI
[   24.861432] CPU: 10 PID: 1505 Comm: pulseaudio Not tainted 5.8.0-rc1+ #689
[   24.861433] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
[   24.861441] RIP: 0010:split_huge_page_to_list+0x731/0xae0
[   24.861444] Code: 44 00 00 8b 47 34 85 c0 0f 84 b4 02 00 00 f0 ff 4f 34 75 c2 e8 e0 12 f7 ff eb bb 48 c7 c6 d0 16 39 ba 4c 89 c7 e8 ef 85 f9 ff <0f> 0b 48 c7 44 24 10 ff ff ff ff 31 db e9 bb fa ff ff 48 8b 7c 24
[   24.861446] RSP: 0018:c1030254bb50 EFLAGS: 00010286
[   24.861449] RAX:  RBX: 0002 RCX: 9b54cee98d08
[   24.861451] RDX: ffd8 RSI:  RDI: 9b54cee98d00
[   24.861452] RBP: ef044fb1fa00 R08: 0547 R09: 0003
[   24.861454] R10:  R11: 0001 R12: 9b54df37f188
[   24.861455] R13: 9b54df355000 R14: ef044fb1fa00 R15: ef044fb1fa00
[   24.861458] FS:  7fd2dc132880() GS:9b54cee8() knlGS:
[   24.861460] CS:  0010 DS:  ES:  CR0: 80050033
[   24.861461] CR2: 7fd2cb10 CR3: 0003feb16000 CR4: 003406e0
[   24.861464] Call Trace:
[   24.861473]  ? __mod_lruvec_state+0x41/0xf0
[   24.861478]  ? __alloc_pages_nodemask+0x15c/0x320
[   24.861483]  iommu_dma_alloc+0x316/0x580
[   24.861496]  snd_dma_alloc_pages+0xdf/0x160 [snd_pcm]
[   24.861508]  snd_dma_alloc_pages_fallback+0x5d/0x80 [snd_pcm]
[   24.861516]  snd_malloc_sgbuf_pages+0x166/0x380 [snd_pcm]
[   24.861523]  ? snd_pcm_hw_refine+0x29d/0x310 [snd_pcm]
[   24.861529]  ? _cond_resched+0x16/0x40
[   24.861535]  snd_dma_alloc_pages+0x64/0x160 [snd_pcm]
[   24.861542]  snd_pcm_lib_malloc_pages+0x136/0x1d0 [snd_pcm]
[   24.861550]  ? snd_pcm_lib_ioctl+0x167/0x210 [snd_pcm]
[   24.861556]  snd_pcm_hw_params+0x3c0/0x490 [snd_pcm]
[   24.861563]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
[   24.861571]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
[   24.861578]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
[   24.861583]  ksys_ioctl+0x82/0xc0
[   24.861587]  __x64_sys_ioctl+0x16/0x20
[   24.861593]  do_syscall_64+0x4d/0x90
[   24.861597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Roman Gushchin 
---
 drivers/iommu/dma-iommu.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..31e4e305d8d5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct iommu_dma_msi_page {
 	struct list_head	list;
@@ -549,8 +550,15 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 			if (!PageCompound(page)) {
 				split_page(page, order);
 				break;
-			} else if (!split_huge_page(page)) {
-				break;
+			} else {
+				int err;
+
+				lock_page(page);
+				err = split_huge_page(page);
+				unlock_page(page);
+
+				if (!err)
+					break;
 			}
 			__free_pages(page, order);
 		}
-- 
2.26.2


--

But applying it made the kernel panic somewhere else:

[   25.148419] BUG: unable to handle page fault for address: b1a9c2429000
[   25.148424] #PF: supervisor write access