Re: two KASANs in TTM logic

2018-09-08 Thread Tom St Denis



On 09/08/2018 05:23 AM, Huang Rui wrote:
> On Fri, Sep 07, 2018 at 04:59:11PM +0800, Christian König wrote:
>> Hi Ray,
>>
>> In the meantime, can we disable the feature once more in the kernel until
>> we have hammered out all possible corner cases?
>
> That's fine. For now, we have to disable it again. I will do more testing
> and reproduce Tom's issue first.
>
>> As Tom figured out, commenting out the line that sets "bulk_moveable" to
>> true should be enough.
>
> I saw you already removed the "bulk_moveable = true" in amdgpu_vm_init().
> Do you mean we should also comment out the one in
> amdgpu_vm_move_to_lru_tail() to disable bulk_move completely for the moment?

Hi Ray,

I just commented out the assignment of true.

Tom

>
> Thanks,
> Ray
>
>> Thanks,
>> Christian.
>>
>> On 07.09.2018 at 08:51, Huang, Ray wrote:
>>> Hi Tom,
>>>
>>> Thanks for tracing this issue. I am trying to reproduce it on
>>> amd-staging-drm-next with piglit.
>>> What steps/configuration do you use to reproduce it?
>>>
>>> Thanks,
>>> Ray
>>>
>>> -----Original Message-----
>>> From: amd-gfx  On Behalf Of Tom St Denis
>>> Sent: Wednesday, September 5, 2018 9:27 PM
>>> To: Koenig, Christian ; Daenzer, Michel ; amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>> Subject: Re: two KASANs in TTM logic
>>>
>>> Logs attached.
>>>
>>> Tom
>>>
>>> On 09/05/2018 08:02 AM, Christian König wrote:
>>>> Still not the slightest idea what is causing this, and the patch
>>>> definitely fixes a lot.
>>>>
>>>> Can you try to enable list debugging in your kernel?
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> On 04.09.2018 at 19:18, Tom St Denis wrote:
>>>>> Sure:
>>>>>
>>>>> d2917f399e0b250f47d07da551a335843a24f835 is the first bad commit
>>>>> commit d2917f399e0b250f47d07da551a335843a24f835
>>>>> Author: Christian König
>>>>> Date:   Thu Aug 30 10:04:53 2018 +0200
>>>>>
>>>>>     drm/amdgpu: fix "use bulk moves for efficient VM LRU handling" v2
>>>>>
>>>>>     First step to fix the LRU corruption, we accidentially tried to move things
>>>>>     on the LRU after dropping the lock.
>>>>>
>>>>>     Signed-off-by: Christian König
>>>>>     Tested-by: Michel Dänzer
>>>>>
>>>>> :040000 040000 ed5be1ad4da129c4154b2b43acf7ef349a470700 0008c4e2fb56512f41559618dd474c916fc09a37 M  drivers
>>>>>
>>>>> With the commit before that, I can run xonotic-glx and piglit on my
>>>>> Carrizo without a KASAN report.
>>>>>
>>>>> Tom
>>>>>
>>>>> On 09/04/2018 10:05 AM, Christian König wrote:
>>>>>> The first one should already be fixed.
>>>>>>
>>>>>> Not sure where the second comes from. Can you narrow that down further?
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> On 04.09.2018 at 15:46, Tom St Denis wrote:
>>>>>>> The first is caused by this commit, while running a GL-heavy application.
>>>>>>>
>>>>>>> d78c1fa0c9f815fe951fd57001acca3d35262a17 is the first bad commit
>>>>>>> commit d78c1fa0c9f815fe951fd57001acca3d35262a17
>>>>>>> Author: Michel Dänzer
>>>>>>> Date:   Wed Aug 29 11:59:38 2018 +0200
>>>>>>>
>>>>>>>     Revert "drm/amdgpu: move PD/PT bos on LRU again"
>>>>>>>
>>>>>>>     This reverts commit 31625ccae4464b61ec8cdb9740df848bbc857a5b.
>>>>>>>
>>>>>>>     It triggered various badness on my development machine when running the
>>>>>>>     piglit gpu profile with radeonsi on Bonaire, looks like memory
>>>>>>>     corruption due to insufficiently protected list manipulations.
>>>>>>>
>>>>>>>     Signed-off-by: Michel Dänzer
>>>>>>>     Signed-off-by: Alex Deucher
>>>>>>>
>>>>>>> :040000 040000 b7169f0cf0c7decec631751a9896a92badb67f9d 42ea58f43199d26fc0c7ddcc655e6d0964b81817 M  drivers
>>>>>>>
>>>>>>> The second is caused by something between that commit and the tip of
>>>>>>> the 4.19-rc1 amd-staging-drm-next (I haven't pinned it down yet), while
>>>>>>> loading GNOME.
>>>>>>>
>>>>>>> Tom



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: two KASANs in TTM logic

2018-09-08 Thread Huang Rui
On Fri, Sep 07, 2018 at 04:59:11PM +0800, Christian König wrote:
> Hi Ray,
> 
> In the meantime, can we disable the feature once more in the kernel until 
> we have hammered out all possible corner cases?

That's fine. For now, we have to disable it again. I will do more testing
and reproduce Tom's issue first.

> 
> As Tom figured out, commenting out the line that sets "bulk_moveable" to
> true should be enough.

I saw you already removed the "bulk_moveable = true" in amdgpu_vm_init().
Do you mean we should also comment out the one in
amdgpu_vm_move_to_lru_tail() to disable bulk_move completely for the moment?

Thanks,
Ray



Re: two KASANs in TTM logic

2018-09-07 Thread Christian König

Hi Ray,

In the meantime, can we disable the feature once more in the kernel until 
we have hammered out all possible corner cases?

As Tom figured out, commenting out the line that sets "bulk_moveable" to
true should be enough.

Thanks,
Christian.
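Christian's suggestion is to disable the feature by skipping the assignment that sets `bulk_moveable` to true. Assuming, as the thread implies, that this flag selects between a cached bulk LRU move and a per-buffer fallback inside amdgpu_vm_move_to_lru_tail(), the effect can be sketched as follows. The struct, its fields and the counters are hypothetical illustrations, not amdgpu's data structures; `allow_bulk` stands in for "the assignment is commented out".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one VM's buffers on the LRU; illustration only. */
struct vm_model {
	size_t nr_bos;        /* buffers owned by this VM */
	bool allow_bulk;      /* false models "assignment commented out" */
	bool bulk_moveable;   /* cached: the bulk range is believed valid */
	size_t per_bo_moves;  /* counters that make each path observable */
	size_t bulk_moves;
};

void vm_move_to_lru_tail(struct vm_model *vm)
{
	if (vm->bulk_moveable) {
		/* Fast path: move the whole cached range in one operation. */
		vm->bulk_moves++;
		return;
	}

	/* Slow path: move each buffer on its own (stand-in for the real move). */
	for (size_t i = 0; i < vm->nr_bos; i++)
		vm->per_bo_moves++;

	/* The assignment discussed in the thread. When it is skipped, the
	 * flag never becomes true and every call takes the slow path. */
	if (vm->allow_bulk)
		vm->bulk_moveable = true;
}
```

With `allow_bulk` set, the first call walks the buffers and later calls take the bulk path; with it cleared, every call walks the buffers, trading some speed for never relying on the cached range.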



RE: two KASANs in TTM logic

2018-09-07 Thread Huang, Ray
Hi Tom,

Thanks for tracing this issue. I am trying to reproduce it on 
amd-staging-drm-next with piglit.
What steps/configuration do you use to reproduce it?

Thanks,
Ray



Re: two KASANs in TTM logic

2018-09-05 Thread Tom St Denis

Logs attached.

Tom







[0.00] Linux version 4.19.0-rc1+ (root@raven) (gcc version 8.1.1 20180712 (Red Hat 8.1.1-5) (GCC)) #24 SMP Wed Sep 5 08:59:20 EDT 2018
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-rc1+ root=UUID=66163c80-0ca1-4beb-aeba-5cc130b813e6 ro rhgb quiet modprobe.blacklist=amdgpu,radeon LANG=en_CA.UTF-8
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009d3ff] usable
[0.00] BIOS-e820: [mem 0x0009d400-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x03ff] usable
[0.00] BIOS-e820: [mem 0x0400-0x04009fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x0400a000-0x09bf] usable
[0.00] BIOS-e820: [mem 0x09c0-0x09ff] reserved
[0.00] BIOS-e820: [mem 0x0a00-0x0aff] usable
[0.00] BIOS-e820: [mem 0x0b00-0x0b01] reserved
[0.00] BIOS-e820: [mem 0x0b02-0x73963fff] usable
[0.00] BIOS-e820: [mem 0x73964000-0x7397cfff] ACPI data
[0.00] BIOS-e820: [mem 0x7397d000-0x7a5aafff] usable
[0.00] BIOS-e820: [mem 0x7a5ab000-0x7a6c2fff] reserved
[0.00] BIOS-e820: [mem 0x7a6c3000-0x7a6cefff] ACPI data
[0.00] BIOS-e820: [mem 0x7a6cf000-0x7a7d1fff] usable
[0.00] BIOS-e820: [mem 0x7a7d2000-0x7ab89fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7ab8a000-0x7b942fff] reserved
[0.00] BIOS-e820: [mem 0x7b943000-0x7dff] usable
[0.00] BIOS-e820: [mem 0x7e00-0xbfff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfd80-0xfdff] reserved
[0.00] BIOS-e820: [mem 0xfea0-0xfea0] reserved
[0.00] BIOS-e820: [mem 0xfeb8-0xfec01fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfec3-0xfec30fff] reserved
[0.00] 

Re: two KASANs in TTM logic

2018-09-05 Thread Tom St Denis

Hi Christian,

Will do in a sec. I'm doing a piglit run with Felix's KFD patch on top of 
HEAD~ just to verify that everything before that is peachy on my 
Raven+Polaris rig.


Tom



Re: two KASANs in TTM logic

2018-09-05 Thread Christian König
Still not the slightest idea what is causing this, and the patch 
definitely fixes a lot.

Can you try to enable list debugging in your kernel?

Thanks,
Christian.
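"List debugging" here refers to the kernel's CONFIG_DEBUG_LIST option, which makes list add/delete operations check that adjacent pointers are consistent and report an inconsistency when the next add or delete touches it, rather than letting it surface later as a KASAN report. The user-space sketch below mimics that kind of check. Function names, messages and poison values are made up for illustration (the kernel uses its own LIST_POISON1/LIST_POISON2), and the checks are simplified compared with the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct dnode {
	struct dnode *prev, *next;
};

/* Made-up poison values for this sketch: addresses that are never valid
 * nodes, written into a deleted entry so later misuse is recognizable. */
#define POISON_NEXT ((struct dnode *)(uintptr_t)0x100)
#define POISON_PREV ((struct dnode *)(uintptr_t)0x122)

void dlist_init(struct dnode *head)
{
	head->prev = head;
	head->next = head;
}

/* Before inserting n between prev and next, those two must still point at
 * each other, and n must not already be one of them (a double add). */
bool dlist_add_valid(const struct dnode *n, const struct dnode *prev,
		     const struct dnode *next)
{
	if (next->prev != prev || prev->next != next) {
		fprintf(stderr, "list_add corruption: neighbors disagree\n");
		return false;
	}
	if (n == prev || n == next) {
		fprintf(stderr, "list_add double add\n");
		return false;
	}
	return true;
}

bool dlist_add_tail_checked(struct dnode *n, struct dnode *head)
{
	struct dnode *prev = head->prev;

	if (!dlist_add_valid(n, prev, head))
		return false;
	n->prev = prev;
	n->next = head;
	prev->next = n;
	head->prev = n;
	return true;
}

/* Reject poisoned (already deleted) entries first so a poison value is
 * never dereferenced, then check that both neighbors point back at n. */
bool dlist_del_valid(const struct dnode *n)
{
	if (n->next == POISON_NEXT || n->prev == POISON_PREV) {
		fprintf(stderr, "list_del on an already deleted entry\n");
		return false;
	}
	if (n->prev->next != n || n->next->prev != n) {
		fprintf(stderr, "list_del corruption: neighbors do not point back\n");
		return false;
	}
	return true;
}

bool dlist_del_checked(struct dnode *n)
{
	if (!dlist_del_valid(n))
		return false;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = POISON_NEXT;
	n->prev = POISON_PREV;
	return true;
}
```

A double delete of the same entry, or a delete whose neighbor no longer points back at it, is refused and reported instead of silently rewriting unrelated memory.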



Re: two KASANs in TTM logic

2018-09-04 Thread Tom St Denis

Sure:

d2917f399e0b250f47d07da551a335843a24f835 is the first bad commit
commit d2917f399e0b250f47d07da551a335843a24f835
Author: Christian König 
Date:   Thu Aug 30 10:04:53 2018 +0200

    drm/amdgpu: fix "use bulk moves for efficient VM LRU handling" v2

    First step to fix the LRU corruption, we accidentially tried to move things
    on the LRU after dropping the lock.

    Signed-off-by: Christian König 
    Tested-by: Michel Dänzer 

:040000 040000 ed5be1ad4da129c4154b2b43acf7ef349a470700 0008c4e2fb56512f41559618dd474c916fc09a37 M  drivers



With the commit before that, I can run xonotic-glx and piglit on my Carrizo 
without a KASAN report.


Tom
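The commit message above says the code "tried to move things on the LRU after dropping the lock". Below is a minimal user-space sketch of the rule that fix restores: every unlink and relink of an LRU entry happens inside one critical section. It assumes a pthread mutex as a stand-in for the TTM LRU lock (a spinlock in the kernel) and a hand-rolled doubly linked list as a stand-in for `struct list_head`; all names are hypothetical and this is not the amdgpu or TTM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, loosely modeled on the kernel's
 * struct list_head. This is an illustration, not the kernel code. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head;
	head->next = head;
}

static void list_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_link_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical LRU: the mutex protects every prev/next pointer in the list. */
struct lru {
	pthread_mutex_t lock;
	struct node head;
};

void lru_init(struct lru *lru)
{
	pthread_mutex_init(&lru->lock, NULL);
	list_init(&lru->head);
}

void lru_add(struct lru *lru, struct node *n)
{
	pthread_mutex_lock(&lru->lock);
	list_link_tail(n, &lru->head);
	pthread_mutex_unlock(&lru->lock);
}

/* Move n to the tail (most recently used end). Unlocking first and moving
 * afterwards, the ordering the commit message describes, would let another
 * thread rewrite the same neighbor pointers concurrently and corrupt the list. */
void lru_move_to_tail(struct lru *lru, struct node *n)
{
	pthread_mutex_lock(&lru->lock);
	list_unlink(n);
	list_link_tail(n, &lru->head);
	pthread_mutex_unlock(&lru->lock);
}
```

Adding entries a, b and c and then moving a leaves the order b, c, a, with all pointer updates done under the lock.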



Re: two KASANs in TTM logic

2018-09-04 Thread Christian König

The first one should already be fixed.

Not sure where the second comes from. Can you narrow that down further?

Christian.
