Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Sibren Vasse
On Mon, 14 Jan 2019 at 19:10, Christoph Hellwig  wrote:
>
> On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> > On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig  wrote:
> > >
> > > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > > >>  From the trace it looks like we hit the case where swiotlb tries
> > > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > > >> pointer.  So a couple questions for the submitter:
> > > >>
> > > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > > >> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > > >>   - does the device this happens on have a DMA mask smaller than
> > > >> the available memory, that is should swiotlb be used here to start
> > > >> with?
> > > >
> > > > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > > > 1TB.
> > >
> > > So we probably somehow got a false positive.
> > >
> > > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > > backtrace really is in the swiotlb code (I can't think of anything else,
> > > but I'd rather be sure).
> > I'm not sure what you want me to confirm. Could you elaborate?
>
> Please open the vmlinux file for which this happened in gdb,
> then send the output from this command
>
> l *(dma_direct_unmap_page+0x92)
>
> to this thread.
My call trace contained:
Jan 10 16:34:51  kernel:  dma_direct_unmap_page+0x7a/0x80

(gdb) list *(dma_direct_unmap_page+0x7a)
0x810fa28a is in dma_direct_unmap_page (kernel/dma/direct.c:291).
286		size_t size, enum dma_data_direction dir, unsigned long attrs)
287	{
288		phys_addr_t phys = dma_to_phys(dev, addr);
289
290		if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
291			dma_direct_sync_single_for_cpu(dev, addr, size, dir);
292
293		if (unlikely(is_swiotlb_buffer(phys)))
294			swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
295	}
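
A call trace entry such as the one above has the shape symbol+offset/size,
where the second number is the length of the function in that particular
build. The sketch below is a user-space illustration only (the struct and
function names are made up for this example, not kernel API). It parses such
an entry and checks whether a given offset can fall inside the function at
all, which may be why the +0x92 offset from the other report had to be
replaced by the +0x7a from this build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One "symbol+0xOFFSET/0xSIZE" entry from a kernel call trace. */
struct trace_entry {
	char symbol[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total length of the function in bytes */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
static int parse_trace_entry(const char *text, struct trace_entry *out)
{
	assert(text != NULL && out != NULL);
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}

/* An offset only makes sense for this build if it lies inside the function. */
static bool offset_is_inside(const struct trace_entry *e, unsigned long offset)
{
	assert(e != NULL);
	return offset < e->size;
}
```

For "dma_direct_unmap_page+0x7a/0x80" the function is 0x80 bytes long, so
0x7a is a valid offset for this vmlinux while 0x92 is not.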
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Sibren Vasse
On Mon, 14 Jan 2019 at 19:13, Christoph Hellwig  wrote:
>
> Hmm, I wonder if we are not actually using swiotlb in the end,
> can you check if your dmesg contains this line or not?
>
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
This line does not appear in my dmesg.

>
> If not I guess we found a bug in swiotlb exit vs is_swiotlb_buffer,
> and you can try this patch:
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d6361776dc5c..1fb6fd68b9c7 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -378,6 +378,8 @@ void __init swiotlb_exit(void)
>  		memblock_free_late(io_tlb_start,
>  				   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
>  	}
> +	io_tlb_start = 0;
> +	io_tlb_end = 0;
>  	io_tlb_nslabs = 0;
>  	max_segment = 0;
>  }
With the patch applied to v5.0-rc2 I can no longer reproduce the issue.
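
For readers following along, here is a small user-space model of what the
patch appears to address. It is a sketch under assumptions, not the kernel
code: the model_* functions and the reset_bounds flag are invented for
illustration, and only is_swiotlb_buffer() mirrors the check quoted elsewhere
in this thread. If the bounce buffer is torn down but io_tlb_start and
io_tlb_end keep their old values, the range check still matches addresses in
the freed region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* stand-in for the kernel type */

#define IO_TLB_SHIFT 11		/* slab size of 2 KiB, as in the kernel */

/* Simplified copies of the globals touched by the patch. */
static phys_addr_t io_tlb_start, io_tlb_end;
static unsigned long io_tlb_nslabs;

/* Same range check as the is_swiotlb_buffer() helper quoted in this thread. */
static bool is_swiotlb_buffer(phys_addr_t paddr)
{
	return paddr >= io_tlb_start && paddr < io_tlb_end;
}

/* Hypothetical setup: record where the bounce buffer lives. */
static void model_swiotlb_init(phys_addr_t start, phys_addr_t size)
{
	io_tlb_start = start;
	io_tlb_end = start + size;
	io_tlb_nslabs = (unsigned long)(size >> IO_TLB_SHIFT);
}

/*
 * Hypothetical teardown. reset_bounds == false models the code before the
 * patch, reset_bounds == true models it with the two added assignments.
 */
static void model_swiotlb_exit(bool reset_bounds)
{
	assert(io_tlb_start <= io_tlb_end);
	if (reset_bounds) {
		io_tlb_start = 0;
		io_tlb_end = 0;
	}
	io_tlb_nslabs = 0;
}
```

Without the reset, an address inside the old region is still reported as a
bounce buffer after teardown; with it, the range is empty and the check can
no longer give a false positive.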


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Christoph Hellwig
Hmm, I wonder if we are not actually using swiotlb in the end,
can you check if your dmesg contains this line or not?

PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

If not I guess we found a bug in swiotlb exit vs is_swiotlb_buffer,
and you can try this patch:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d6361776dc5c..1fb6fd68b9c7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -378,6 +378,8 @@ void __init swiotlb_exit(void)
 		memblock_free_late(io_tlb_start,
 				   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
 	}
+	io_tlb_start = 0;
+	io_tlb_end = 0;
 	io_tlb_nslabs = 0;
 	max_segment = 0;
 }


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig  wrote:
> >
> > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > >>  From the trace it looks like we hit the case where swiotlb tries
> > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > >> pointer.  So a couple questions for the submitter:
> > >>
> > >>   - does the system have more than 4GB memory and thus use swiotlb?
> > >> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > >>   - does the device this happens on have a DMA mask smaller than
> > >> the available memory, that is should swiotlb be used here to start
> > >> with?
> > >
> > > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > > 1TB.
> >
> > So we probably somehow got a false positive.
> >
> > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > backtrace really is in the swiotlb code (I can't think of anything else,
> > but I'd rather be sure).
> I'm not sure what you want me to confirm. Could you elaborate?

Please open the vmlinux file for which this happened in gdb,
then send the output from this command

l *(dma_direct_unmap_page+0x92)

to this thread.

> > Second it would be great to print what the contents of io_tlb_start
> > and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> > maybe that gives a clue why we are hitting the swiotlb code here.
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 7c007ed7505f..042246dbae00 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -69,6 +69,7 @@ extern phys_addr_t io_tlb_start, io_tlb_end;
> 
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
> +	printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu",
> +		    io_tlb_start, io_tlb_end);
>  	return paddr >= io_tlb_start && paddr < io_tlb_end;
>  }
> 
> Result on boot:
> [   11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

So this is a normal swiotlb location, and it does definitely exist.
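
As a quick sanity check of the two values printed above, the arithmetic can
be sketched in user-space C. The helper names are invented for this example.
The region is 64 MiB long, which as far as I know is the default swiotlb
size, and it ends below the 4 GiB boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/* Length of the half-open range [start, end). */
static uint64_t region_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start;
}

/* True if the whole region is reachable with 32-bit addresses. */
static bool ends_below_4gib(uint64_t end)
{
	return end <= 4 * GIB;
}
```

With io_tlb_start = 3782983680 (0xE17BC000) and io_tlb_end = 3850092544
(0xE57BC000), region_size() gives 67108864 bytes, i.e. exactly 64 MiB.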


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-14 Thread Michel Dänzer
On 2019-01-10 3:48 p.m., Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
>>>  From the trace it looks like we hit the case where swiotlb tries
>>> to copy back data from a bounce buffer, but hits a dangling or NULL
>>> pointer.  So a couple questions for the submitter:
>>>
>>>   - does the system have more than 4GB memory and thus use swiotlb?
>>> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
>>>   - does the device this happens on have a DMA mask smaller than
>>> the available memory, that is should swiotlb be used here to start
>>> with?
>>
>> Rather unlikely. The device is an AMD GPU, so we can address memory up to 
>> 1TB.
> 
> So we probably somehow got a false positive.
> 
> For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> backtrace really is in the swiotlb code (I can't think of anything else,
> but I'd rather be sure).
> 
> Second it would be great to print what the contents of io_tlb_start
> and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> maybe that gives a clue why we are hitting the swiotlb code here.

Any progress? https://bugzilla.kernel.org/show_bug.cgi?id=202261 was
also filed about this.

I hope everyone's clear that this needs to be resolved one way or
another by 5.0 final (though the sooner, the better :).


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-11 Thread Sibren Vasse
On Thu, 10 Jan 2019 at 18:06, Konrad Rzeszutek Wilk
 wrote:
>
> On Thu, Jan 10, 2019 at 04:26:43PM +0100, Sibren Vasse wrote:
> > On Thu, 10 Jan 2019 at 14:57, Christoph Hellwig  wrote:
> > >
> > > On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> > > >
> > > > Hi Christoph,
> > > >
> > > >
> > > > https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> > > > bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> > > > into the dma_direct code". Any ideas?
> > >
> > > From the trace it looks like we hit the case where swiotlb tries
> > > to copy back data from a bounce buffer, but hits a dangling or NULL
> > > pointer.  So a couple questions for the submitter:
> > My apologies if I misunderstand something, this subject matter is new to me.
> >
> > >
> > >  - does the system have more than 4GB memory and thus use swiotlb?
> > My system has 8GB memory. The other report on the bug tracker had 16GB.
> >
> > >(check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > /proc/meminfo: https://ptpb.pw/4rxI
> > Can I grep dmesg for a string?
>
> Can you attach the 'dmesg'?
Dmesg attached.


dmesg
Description: Binary data


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-11 Thread Sibren Vasse
On Thu, 10 Jan 2019 at 14:57, Christoph Hellwig  wrote:
>
> On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> >
> > Hi Christoph,
> >
> >
> > https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> > bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> > into the dma_direct code". Any ideas?
>
> From the trace it looks like we hit the case where swiotlb tries
> to copy back data from a bounce buffer, but hits a dangling or NULL
> pointer.  So a couple questions for the submitter:
My apologies if I misunderstand something, this subject matter is new to me.

>
>  - does the system have more than 4GB memory and thus use swiotlb?
My system has 8GB memory. The other report on the bug tracker had 16GB.

>(check /proc/meminfo, and if something SWIOTLB appears in dmesg)
/proc/meminfo: https://ptpb.pw/4rxI
Can I grep dmesg for a string?

>  - does the device this happens on have a DMA mask smaller than
>the available memory, that is should swiotlb be used here to start
>with?
It's a MSI Radeon RX 570 Gaming X 4GB. The other report was a RX 580.
lshw output: https://ptpb.pw/6s0H


Regards,

Sibren


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-11 Thread Sibren Vasse
On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig  wrote:
>
> On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> >>  From the trace it looks like we hit the case where swiotlb tries
> >> to copy back data from a bounce buffer, but hits a dangling or NULL
> >> pointer.  So a couple questions for the submitter:
> >>
> >>   - does the system have more than 4GB memory and thus use swiotlb?
> >> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> >>   - does the device this happens on have a DMA mask smaller than
> >> the available memory, that is should swiotlb be used here to start
> >> with?
> >
> > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > 1TB.
>
> So we probably somehow got a false positive.
>
> For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> backtrace really is in the swiotlb code (I can't think of anything else,
> but I'd rather be sure).
I'm not sure what you want me to confirm. Could you elaborate?

>
> Second it would be great to print what the contents of io_tlb_start
> and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> maybe that gives a clue why we are hitting the swiotlb code here.

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7c007ed7505f..042246dbae00 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -69,6 +69,7 @@ extern phys_addr_t io_tlb_start, io_tlb_end;

 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
+	printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu",
+		    io_tlb_start, io_tlb_end);
 	return paddr >= io_tlb_start && paddr < io_tlb_end;
 }

Result on boot:
[   11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

Regards,

Sibren


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-10 Thread Konrad Rzeszutek Wilk
On Thu, Jan 10, 2019 at 04:26:43PM +0100, Sibren Vasse wrote:
> On Thu, 10 Jan 2019 at 14:57, Christoph Hellwig  wrote:
> >
> > On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> > >
> > > Hi Christoph,
> > >
> > >
> > > https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> > > bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> > > into the dma_direct code". Any ideas?
> >
> > From the trace it looks like we hit the case where swiotlb tries
> > to copy back data from a bounce buffer, but hits a dangling or NULL
> > pointer.  So a couple questions for the submitter:
> My apologies if I misunderstand something, this subject matter is new to me.
> 
> >
> >  - does the system have more than 4GB memory and thus use swiotlb?
> My system has 8GB memory. The other report on the bug tracker had 16GB.
> 
> >(check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> /proc/meminfo: https://ptpb.pw/4rxI
> Can I grep dmesg for a string?

Can you attach the 'dmesg'? 

> 
> >  - does the device this happens on have a DMA mask smaller than
> >the available memory, that is should swiotlb be used here to start
> >with?
> It's a MSI Radeon RX 570 Gaming X 4GB. The other report was a RX 580.
> lshw output: https://ptpb.pw/6s0H
> 
> 
> Regards,
> 
> Sibren


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-10 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
>>  From the trace it looks like we hit the case where swiotlb tries
>> to copy back data from a bounce buffer, but hits a dangling or NULL
>> pointer.  So a couple questions for the submitter:
>>
>>   - does the system have more than 4GB memory and thus use swiotlb?
>> (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
>>   - does the device this happens on have a DMA mask smaller than
>> the available memory, that is should swiotlb be used here to start
>> with?
>
> Rather unlikely. The device is an AMD GPU, so we can address memory up to 
> 1TB.

So we probably somehow got a false positive.

For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
backtrace really is in the swiotlb code (I can't think of anything else,
but I'd rather be sure).

Second it would be great to print what the contents of io_tlb_start
and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
maybe that gives a clue why we are hitting the swiotlb code here.


Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-10 Thread Christian König

On 10.01.19 at 14:57, Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
>> Hi Christoph,
>>
>> https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
>> bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
>> into the dma_direct code". Any ideas?
>
> From the trace it looks like we hit the case where swiotlb tries
> to copy back data from a bounce buffer, but hits a dangling or NULL
> pointer.  So a couple questions for the submitter:
>
>  - does the system have more than 4GB memory and thus use swiotlb?
>    (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
>  - does the device this happens on have a DMA mask smaller than
>    the available memory, that is should swiotlb be used here to start
>    with?

Rather unlikely. The device is an AMD GPU, so we can address memory up
to 1TB.


Christian.
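
To illustrate the DMA mask question, here is a rough user-space sketch. It
assumes that "up to 1TB" corresponds to a 40-bit DMA mask; needs_bounce() is
a made-up helper, and the macro only mirrors the shape of the kernel's
DMA_BIT_MASK(). A buffer needs bouncing only if its last byte lies above the
highest address the device can reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* True if [addr, addr + size) is not fully reachable under the given mask. */
static bool needs_bounce(uint64_t addr, uint64_t size, uint64_t mask)
{
	assert(size > 0);
	return addr + size - 1 > mask;
}
```

A page just below 8 GiB is reachable with a 40-bit mask, so no bounce buffer
would be expected for it, whereas a device limited to a 32-bit mask could not
address it directly.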




Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code

2019-01-10 Thread Christoph Hellwig
On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> 
> Hi Christoph,
> 
> 
> https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> into the dma_direct code". Any ideas?

From the trace it looks like we hit the case where swiotlb tries
to copy back data from a bounce buffer, but hits a dangling or NULL
pointer.  So a couple questions for the submitter:

 - does the system have more than 4GB memory and thus use swiotlb?
   (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
 - does the device this happens on have a DMA mask smaller than
   the available memory, that is should swiotlb be used here to start
   with?