Re: Regression with kernel 4.20 on armhf

2019-01-05 Thread Luís Mendes
Replies inline.

On Fri, Jan 4, 2019 at 4:22 PM Michel Dänzer  wrote:
>
> On 2019-01-04 4:32 p.m., Luís Mendes wrote:
> > Hi Alex, Christian,
> >
> > I've tested amd-staging-drm-next at commit
> > 9698024e8a191481321574bec1fe886bbce797cf - drm/amdgpu: Cleanup 2
> > compiler warnings,
> > and now RX 550 Polaris 12 still hangs in ring gfx with XOrg, but a gpu
> > recovery is now performed and works, except that VRAM contents are
> > lost and screen image becomes corrupted.
>
> This is because OpenGL contexts become unusable when a full GPU reset is
> performed. Making it possible to fully recover from this without
> restarting Xorg (which uses OpenGL via glamor) or at least the
> compositor / other apps using OpenGL would be tricky and require a lot
> of effort, so I wouldn't bet on it ever happening I'm afraid.

Makes sense.

>
>
> > Maybe this is not a driver issue, but rather a mesa or XOrg issue,
> > since something is sent to the compute/gfx unit that causes the GPU to
> > hang, so it is not only timing sensitive, but is mainly because of
> > wrong openGL commands, that drive the GPU into an invalid state
>
> In that case, the problem would be expected to happen the same way on
> x86 as well. There seems to be some kind of platform specific aspect
> affecting it.

It is quite possible this also affects x86. I just haven't tried
running Ubuntu MATE 18.04 on x86.

>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Regression with kernel 4.20 on armhf

2019-01-04 Thread Michel Dänzer
On 2019-01-04 4:32 p.m., Luís Mendes wrote:
> Hi Alex, Christian,
> 
> I've tested amd-staging-drm-next at commit
> 9698024e8a191481321574bec1fe886bbce797cf - drm/amdgpu: Cleanup 2
> compiler warnings,
> and now RX 550 Polaris 12 still hangs in ring gfx with XOrg, but a gpu
> recovery is now performed and works, except that VRAM contents are
> lost and screen image becomes corrupted.

This is because OpenGL contexts become unusable when a full GPU reset is
performed. Making it possible to fully recover from this without
restarting Xorg (which uses OpenGL via glamor) or at least the
compositor / other apps using OpenGL would be tricky and require a lot
of effort, so I wouldn't bet on it ever happening I'm afraid.


> Maybe this is not a driver issue, but rather a mesa or XOrg issue,
> since something is sent to the compute/gfx unit that causes the GPU to
> hang, so it is not only timing sensitive, but is mainly because of
> wrong openGL commands, that drive the GPU into an invalid state

In that case, the problem would be expected to happen the same way on
x86 as well. There seems to be some kind of platform specific aspect
affecting it.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: Regression with kernel 4.20 on armhf

2019-01-03 Thread Luís Mendes
Hi Alex,

I've backported that patch to Linux 4.20, but didn't notice any
improvement. Judging from the logs with drm debug messages, the issue
happens after the ring tests, while entering the X session.
Thanks for your suggestion, anyway. I'll try that again once
linux-4.21-rc1 comes out.

Regards,
Luís

On Thu, Jan 3, 2019 at 2:12 PM Alex Deucher  wrote:
>
> Does this patch help by any chance?
>
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=5e01c09ce3b7263d88873105f21a82eda904664b
>
> Alex
>
> On Thu, Jan 3, 2019 at 7:14 AM Luís Mendes  wrote:
> >
> > Hi Christian, Alex,
> >
> > I've set the kernel command line with drm.debug=0xf, and I see what
> > could be a race condition that triggers the failure. From what I
> > see, the critical path is well after the ring tests. This happens on
> > ARM, but it may also be what is affecting my TYAN S7002 and S7025, as
> > the failure symptom seems similar, except that it fails every time on
> > the TYANs, while on an AsRock Rack EP2C602 with Xeon E5 v2 it works
> > fine.
> >
> > Below follow the two log excerpts, the first from a working
> > initialization attempt, and the second from a failed initialization
> > attempt. Both attempts were made with vanilla kernel 4.20.0 on the
> > same armhf system. Full dmesg logs attached. Please ignore the EDID
> > errors, as I'm having a problem with this particular CROWN TV. The
> > EDID gets overwritten at every boot when connected to any Radeon RX
> > card that I have tried, while with Radeon R7 240 the EDID is not
> > corrupted on boot, but that's another story.
> >
> > Meanwhile I will try to find the concrete race condition. It is
> > noticeable that for some reason the kernel thread
> > [drm:amdgpu_ih_process [amdgpu]] doesn't receive updates due to the
> > gpu hang and only one EOP irq is received on the bad boot attempt,
> > while on the good attempt 3 EOP irqs are triggered.
> >
> > Good attempt (critical log excerpt from kern_good.log):
> > Jan  3 11:28:03 picolo kernel: [   39.845747] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16032, wptr 16048
> > Jan  3 11:28:03 picolo kernel: [   39.845987]
> > [drm:drm_calc_vbltimestamp_from_scanoutpos [drm]] crtc 0: Noisy
> > timestamp 26 us > 20 us [3 reps].
> > Jan  3 11:28:03 picolo kernel: [   39.850430] [drm:drm_ioctl [drm]]
> > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > Jan  3 11:28:03 picolo kernel: [   39.850489] [drm:drm_ioctl [drm]]
> > pid=627, dev=0xe200, auth=1, AMDGPU_CS
> > Jan  3 11:28:03 picolo kernel: [   39.850697] [drm:drm_ioctl [drm]]
> > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > Jan  3 11:28:03 picolo kernel: [   39.850943] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16048, wptr 16080
> > Jan  3 11:28:03 picolo kernel: [   39.850973] [drm:drm_ioctl [drm]]
> > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > Jan  3 11:28:03 picolo kernel: [   39.851133]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.851159] [drm:drm_ioctl [drm]]
> > pid=627, dev=0xe200, auth=1, AMDGPU_CS
> > Jan  3 11:28:03 picolo kernel: [   39.851333]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.851360] [drm:drm_ioctl [drm]]
> > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > Jan  3 11:28:03 picolo kernel: [   39.851513] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
> > Jan  3 11:28:03 picolo kernel: [   39.851657]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.851810] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16096
> > Jan  3 11:28:03 picolo kernel: [   39.851950] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16128
> > Jan  3 11:28:03 picolo kernel: [   39.852091]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.852239]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.852265] [drm:drm_ioctl [drm]]
> > pid=605, dev=0xe200, auth=1, AMDGPU_WAIT_CS
> > Jan  3 11:28:03 picolo kernel: [   39.852411] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16128
> > Jan  3 11:28:03 picolo kernel: [   39.852605] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16144
> > Jan  3 11:28:03 picolo kernel: [   39.852754]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.852905] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16144, wptr 16160
> > Jan  3 11:28:03 picolo kernel: [   39.853049]
> > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > Jan  3 11:28:03 picolo kernel: [   39.853210] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16160
> > Jan  3 11:28:03 picolo kernel: [   39.853418] 
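
The amdgpu_ih_process lines in the excerpt above each print an rptr and a
wptr value, presumably the read and write pointers of the interrupt handler
(IH) ring. A minimal Python sketch for pulling those pairs out of mail-quoted,
line-wrapped log text follows, so that a working and a failing boot can be
compared side by side. The function name ih_pointer_trace is hypothetical,
and the sketch assumes the pointers do not wrap around within the excerpt.

```python
import re

# Matches the amdgpu_ih_process debug message once wrapping is undone.
IH_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:amdgpu_ih_process \[amdgpu\]\]\s+"
    r"amdgpu_ih_process: rptr (\d+), wptr (\d+)"
)


def ih_pointer_trace(text):
    """Return (timestamp, rptr, wptr, wptr - rptr) for each IH message."""
    # Strip leading '>' quote markers, then collapse the mail line wrapping.
    lines = [re.sub(r"^(>\s?)+", "", ln) for ln in text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    trace = []
    for ts, rptr, wptr in IH_RE.findall(flat):
        rptr, wptr = int(rptr), int(wptr)
        # A delta of 0 means the handler found nothing new to process.
        # Assumption: no ring wrap-around inside the excerpt.
        trace.append((float(ts), rptr, wptr, wptr - rptr))
    return trace


# Three amdgpu_ih_process entries selected from the good-boot excerpt.
sample = """\
> > Jan  3 11:28:03 picolo kernel: [   39.851513] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
> > Jan  3 11:28:03 picolo kernel: [   39.851810] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16096
> > Jan  3 11:28:03 picolo kernel: [   39.851950] [drm:amdgpu_ih_process
> > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16128
"""

for ts, rptr, wptr, delta in ih_pointer_trace(sample):
    print(f"{ts:.6f} rptr={rptr} wptr={wptr} delta={delta}")
```

Running the same function over kern_good.log and the log of the failed boot
should show where the pointers stop advancing on the failed attempt.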

Re: Regression with kernel 4.20 on armhf

2019-01-03 Thread Alex Deucher
Does this patch help by any chance?

https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=5e01c09ce3b7263d88873105f21a82eda904664b

Alex

On Thu, Jan 3, 2019 at 7:14 AM Luís Mendes  wrote:
>
> Hi Christian, Alex,
>
> I've set the kernel command line with drm.debug=0xf, and I see what
> could be a race condition that triggers the failure. From what I
> see, the critical path is well after the ring tests. This happens on
> ARM, but it may also be what is affecting my TYAN S7002 and S7025, as
> the failure symptom seems similar, except that it fails every time on
> the TYANs, while on an AsRock Rack EP2C602 with Xeon E5 v2 it works
> fine.
>
> Below follow the two log excerpts, the first from a working
> initialization attempt, and the second from a failed initialization
> attempt. Both attempts were made with vanilla kernel 4.20.0 on the
> same armhf system. Full dmesg logs attached. Please ignore the EDID
> errors, as I'm having a problem with this particular CROWN TV. The
> EDID gets overwritten at every boot when connected to any Radeon RX
> card that I have tried, while with Radeon R7 240 the EDID is not
> corrupted on boot, but that's another story.
>
> Meanwhile I will try to find the concrete race condition. It is
> noticeable that for some reason the kernel thread
> [drm:amdgpu_ih_process [amdgpu]] doesn't receive updates due to the
> gpu hang and only one EOP irq is received on the bad boot attempt,
> while on the good attempt 3 EOP irqs are triggered.
>
> Good attempt (critical log excerpt from kern_good.log):
> Jan  3 11:28:03 picolo kernel: [   39.845747] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16032, wptr 16048
> Jan  3 11:28:03 picolo kernel: [   39.845987]
> [drm:drm_calc_vbltimestamp_from_scanoutpos [drm]] crtc 0: Noisy
> timestamp 26 us > 20 us [3 reps].
> Jan  3 11:28:03 picolo kernel: [   39.850430] [drm:drm_ioctl [drm]]
> pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> Jan  3 11:28:03 picolo kernel: [   39.850489] [drm:drm_ioctl [drm]]
> pid=627, dev=0xe200, auth=1, AMDGPU_CS
> Jan  3 11:28:03 picolo kernel: [   39.850697] [drm:drm_ioctl [drm]]
> pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> Jan  3 11:28:03 picolo kernel: [   39.850943] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16048, wptr 16080
> Jan  3 11:28:03 picolo kernel: [   39.850973] [drm:drm_ioctl [drm]]
> pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> Jan  3 11:28:03 picolo kernel: [   39.851133]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.851159] [drm:drm_ioctl [drm]]
> pid=627, dev=0xe200, auth=1, AMDGPU_CS
> Jan  3 11:28:03 picolo kernel: [   39.851333]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.851360] [drm:drm_ioctl [drm]]
> pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> Jan  3 11:28:03 picolo kernel: [   39.851513] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
> Jan  3 11:28:03 picolo kernel: [   39.851657]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.851810] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16096
> Jan  3 11:28:03 picolo kernel: [   39.851950] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16128
> Jan  3 11:28:03 picolo kernel: [   39.852091]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.852239]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.852265] [drm:drm_ioctl [drm]]
> pid=605, dev=0xe200, auth=1, AMDGPU_WAIT_CS
> Jan  3 11:28:03 picolo kernel: [   39.852411] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16128
> Jan  3 11:28:03 picolo kernel: [   39.852605] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16144
> Jan  3 11:28:03 picolo kernel: [   39.852754]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.852905] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16144, wptr 16160
> Jan  3 11:28:03 picolo kernel: [   39.853049]
> [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> Jan  3 11:28:03 picolo kernel: [   39.853210] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16160
> Jan  3 11:28:03 picolo kernel: [   39.853418] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16176
> Jan  3 11:28:03 picolo kernel: [   39.853582] [drm:gfx_v8_0_eop_irq
> [amdgpu]] IH: CP EOP
> Jan  3 11:28:03 picolo kernel: [   39.853752] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16176, wptr 16208
> Jan  3 11:28:03 picolo kernel: [   39.853901] [drm:gfx_v8_0_eop_irq
> [amdgpu]] IH: CP EOP
> Jan  3 11:28:03 picolo kernel: [   39.854044] [drm:gfx_v8_0_eop_irq
> [amdgpu]] IH: CP EOP
> Jan  3 11:28:03 picolo kernel: [   39.854205] [drm:amdgpu_ih_process
> 
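
The quoted report contrasts one CP EOP interrupt on the failing boot with
three on the working one. That count can be checked mechanically instead of
by eye. Below is a minimal Python sketch, assuming the log text is pasted
from the mail with its '>' quote markers and line wrapping intact. The helper
names split_entries and count_matching are hypothetical and not part of any
amdgpu tooling.

```python
import re

# Syslog prefix as seen in the excerpt (after whitespace is collapsed), e.g.
# "Jan 3 11:28:03 picolo kernel:".
ENTRY_START = re.compile(
    r"[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:"
)


def split_entries(text):
    """Re-join mail-wrapped, '>'-quoted log text into one string per entry."""
    # Drop leading quote markers, then collapse all runs of whitespace.
    lines = [re.sub(r"^(>\s?)+", "", ln) for ln in text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    # Every syslog prefix marks the beginning of a new log entry.
    starts = [m.start() for m in ENTRY_START.finditer(flat)]
    ends = starts[1:] + [len(flat)]
    return [flat[a:b].strip() for a, b in zip(starts, ends)]


def count_matching(entries, marker):
    """Count the entries whose text contains the given marker string."""
    return sum(1 for entry in entries if marker in entry)


# The last complete entries of the good-boot excerpt quoted above.
sample = """\
> Jan  3 11:28:03 picolo kernel: [   39.853418] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16176
> Jan  3 11:28:03 picolo kernel: [   39.853582] [drm:gfx_v8_0_eop_irq
> [amdgpu]] IH: CP EOP
> Jan  3 11:28:03 picolo kernel: [   39.853752] [drm:amdgpu_ih_process
> [amdgpu]] amdgpu_ih_process: rptr 16176, wptr 16208
> Jan  3 11:28:03 picolo kernel: [   39.853901] [drm:gfx_v8_0_eop_irq
> [amdgpu]] IH: CP EOP
> Jan  3 11:28:03 picolo kernel: [   39.854044] [drm:gfx_v8_0_eop_irq
> [amdgpu]] IH: CP EOP
"""

entries = split_entries(sample)
print("entries:", len(entries))
print("CP EOP irqs:", count_matching(entries, "IH: CP EOP"))
print("SDMA traps:", count_matching(entries, "IH: SDMA trap"))
```

The same two calls can be applied to the full attached logs to compare the
good and the bad attempt over an identical time window.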

Re: Regression with kernel 4.20 on armhf

2019-01-02 Thread Christian König

Hi Luis,

mhm, sounds like a timing issue. We have probably made something faster
during bootup in 4.20 and because of this you now see this issue more often.

If the bisection doesn't show any result, can you try adding some
msleep(10) calls at critical places in the driver code to narrow this down?

Officially we don't test/support ARM with the driver code, but in this
particular case we should probably investigate since it sounds like it
just doesn't happen on x86 because of different timing.


Thanks,
Christian.

On 28.12.18 at 15:05, Luís Mendes wrote:

> Hi Alex,
>
> First of all... Happy holidays! Happy new year!!
>
> - Okay, so it looks like sometimes the driver is able to enter
> graphical mode with the Polaris card, but most of the time it fails
> before that with:
> [   49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, signaled seq=2, emitted seq=3
>
> - This is something that has been happening sporadically, though less
> frequently, in the 4.17, 4.18 and 4.19 kernels, so this is actually not
> a regression, but rather an existing issue, which maybe the patch
> "drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq ring tests
> sequence" solves. I tried to backport it to 4.20, but had no
> improvement. Need to try with the git version, or rc1.
>
> - This hang happens after the console is displayed on the screen, but
> before switching to graphical mode with X.
>
> - However, if X is entered then the driver is stable and can be used
> for long periods.
>
> Regards,
> Luís Mendes
>
> On Tue, Dec 18, 2018 at 11:16 PM Luís Mendes  wrote:
> >
> > Hi Alex,
> >
> > I am already using drm_arch_can_wc_memory() set to false.
> > I will try to bisect...
> >
> > Regards,
> > Luís
> >
> > On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher  wrote:
> > >
> > > On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes  wrote:
> > > >
> > > > Hi Christian,
> > > >
> > > > I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> > > > armhf board that runs well with Linux 4.19.9 at least, but now,
> > > > starting with Linux kernel 4.20, I'm getting a gpu hang right after
> > > > the console is displayed, but before entering graphical mode,
> > > > when starting the X session.
> > > > I'm only reporting this now because there was a PCI commit for mvebu
> > > > that also went into linux-4.20 and caused a kernel oops during the
> > > > pci_map_rom call in the amdgpu initialization code. I've reverted that
> > > > patch, but now amdgpu is hanging.
> > >
> > > It would be useful if you could bisect.  This is the first I've heard
> > > of amdgpu working on an ARM board without write combining (WC)
> > > disabled.  You might check to see if disabling WC helps.  Return false
> > > in drm_arch_can_wc_memory().
> > >
> > > Alex
> > >
> > > >
> > > >
> > > > [   24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > > > timeout, signaled seq=2, emitted seq=3
> > > >
> > > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > > > [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> > > > Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> > > > Flags: bus master, fast devsel, latency 0, IRQ 51
> > > > Memory at d000 (64-bit, prefetchable) [size=256M]
> > > > Memory at e000 (64-bit, prefetchable) [size=2M]
> > > > I/O ports at 1 [size=256]
> > > > Memory at e020 (32-bit, non-prefetchable) [size=256K]
> > > > Expansion ROM at e024 [disabled] [size=128K]
> > > > Capabilities: 
> > > > Kernel driver in use: amdgpu
> > > > Kernel modules: amdgpu
> > > >
> > > > dmesg follows in attachment.
> > > >
> > > > Regards,
> > > > Luís

Re: Regression with kernel 4.20 on armhf

2018-12-28 Thread Luís Mendes
Hi Alex,

First of all... Happy holidays! Happy new year!!

- Okay, so it looks like sometimes the driver is able to enter
graphical mode with the Polaris card, but most of the time it fails
before that with:
[   49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=2, emitted seq=3

- This is something that has been happening sporadically, though less
frequently, in the 4.17, 4.18 and 4.19 kernels, so this is actually not
a regression, but rather an existing issue, which maybe the patch
"drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq ring tests
sequence" solves. I tried to backport it to 4.20, but had no
improvement. Need to try with the git version, or rc1.

- This hang happens after the console is displayed on the screen, but
before switching to graphical mode with X.

- However if X is entered then the driver is stable and can be used
for long periods.
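
To see how often the hang occurs across kernels and boots, the timeout
message quoted above can be extracted from saved dmesg output. What follows
is a minimal Python sketch under that assumption. The function name
parse_timeouts and the "pending" field are hypothetical, and the pattern is
written only against the message text shown in this thread, so other kernel
versions may format it differently.

```python
import re

# Pattern for the job-timeout message exactly as quoted in this thread.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\]\s+"
    r"\*ERROR\*\s+ring\s+(?P<ring>\S+)\s+timeout,\s+"
    r"signaled seq=(?P<signaled>\d+),\s+emitted seq=(?P<emitted>\d+)"
)


def parse_timeouts(text):
    """Return one dict per ring-timeout message found in text."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    results = []
    for m in TIMEOUT_RE.finditer(flat):
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        results.append({
            "timestamp": float(m.group("ts")),
            "ring": m.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Fences emitted but not yet signaled when the timeout fired.
            "pending": emitted - signaled,
        })
    return results


sample = """[   49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=2, emitted seq=3"""

for entry in parse_timeouts(sample):
    print(entry)
```

In both reports in this thread the message shows signaled seq=2 and emitted
seq=3, so a single submission on the gfx ring was outstanding at the time of
the timeout.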

Regards,
Luís Mendes

On Tue, Dec 18, 2018 at 11:16 PM Luís Mendes  wrote:
>
> Hi Alex,
>
> I am already using drm_arch_can_wc_memory() set to false.
> I will try to bisect...
>
> Regards,
> Luís
>
> On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher  wrote:
> >
> > On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes  wrote:
> > >
> > > Hi Christian,
> > >
> > > I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> > > armhf board that runs well with Linux 4.19.9 at least, but now,
> > > starting with Linux kernel 4.20, I'm getting a gpu hang right after
> > > the console is displayed, but before entering graphical mode,
> > > when starting the X session.
> > > I'm only reporting this now because there was a PCI commit for mvebu
> > > that also went into linux-4.20 and caused a kernel oops during the
> > > pci_map_rom call in the amdgpu initialization code. I've reverted that
> > > patch, but now amdgpu is hanging.
> >
> > It would be useful if you could bisect.  This is the first I've heard
> > of amdgpu working on an ARM board without write combining (WC)
> > disabled.  You might check to see if disabling WC helps.  Return false
> > in drm_arch_can_wc_memory().
> >
> > Alex
> >
> > >
> > >
> > > [   24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > > timeout, signaled seq=2, emitted seq=3
> > >
> > > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > > [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> > > Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> > > Flags: bus master, fast devsel, latency 0, IRQ 51
> > > Memory at d000 (64-bit, prefetchable) [size=256M]
> > > Memory at e000 (64-bit, prefetchable) [size=2M]
> > > I/O ports at 1 [size=256]
> > > Memory at e020 (32-bit, non-prefetchable) [size=256K]
> > > Expansion ROM at e024 [disabled] [size=128K]
> > > Capabilities: 
> > > Kernel driver in use: amdgpu
> > > Kernel modules: amdgpu
> > >
> > > dmesg follows in attachment.
> > >
> > > Regards,
> > > Luís


Re: Regression with kernel 4.20 on armhf

2018-12-18 Thread Luís Mendes
Hi Alex,

I am already using drm_arch_can_wc_memory() set to false.
I will try to bisect...

Regards,
Luís

On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher  wrote:
>
> On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes  wrote:
> >
> > Hi Christian,
> >
> > I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> > armhf board that runs well with Linux 4.19.9 at least, but now,
> > starting with Linux kernel 4.20, I'm getting a gpu hang right after
> > the console is displayed, but before entering graphical mode,
> > when starting the X session.
> > I'm only reporting this now because there was a PCI commit for mvebu
> > that also went into linux-4.20 and caused a kernel oops during the
> > pci_map_rom call in the amdgpu initialization code. I've reverted that
> > patch, but now amdgpu is hanging.
>
> It would be useful if you could bisect.  This is the first I've heard
> of amdgpu working on an ARM board without write combining (WC)
> disabled.  You might check to see if disabling WC helps.  Return false
> in drm_arch_can_wc_memory().
>
> Alex
>
> >
> >
> > [   24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > timeout, signaled seq=2, emitted seq=3
> >
> > 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> > Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> > Flags: bus master, fast devsel, latency 0, IRQ 51
> > Memory at d000 (64-bit, prefetchable) [size=256M]
> > Memory at e000 (64-bit, prefetchable) [size=2M]
> > I/O ports at 1 [size=256]
> > Memory at e020 (32-bit, non-prefetchable) [size=256K]
> > Expansion ROM at e024 [disabled] [size=128K]
> > Capabilities: 
> > Kernel driver in use: amdgpu
> > Kernel modules: amdgpu
> >
> > dmesg follows in attachment.
> >
> > Regards,
> > Luís


Re: Regression with kernel 4.20 on armhf

2018-12-18 Thread Alex Deucher
On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes  wrote:
>
> Hi Christian,
>
> I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> armhf board that runs well with Linux 4.19.9 at least, but now,
> starting with Linux kernel 4.20, I'm getting a gpu hang right after
> the console is displayed, but before entering graphical mode,
> when starting the X session.
> I'm only reporting this now because there was a PCI commit for mvebu
> that also went into linux-4.20 and caused a kernel oops during the
> pci_map_rom call in the amdgpu initialization code. I've reverted that
> patch, but now amdgpu is hanging.

It would be useful if you could bisect.  This is the first I've heard
of amdgpu working on an ARM board without write combining (WC)
disabled.  You might check to see if disabling WC helps.  Return false
in drm_arch_can_wc_memory().

Alex

>
>
> [   24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, signaled seq=2, emitted seq=3
>
> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> Flags: bus master, fast devsel, latency 0, IRQ 51
> Memory at d000 (64-bit, prefetchable) [size=256M]
> Memory at e000 (64-bit, prefetchable) [size=2M]
> I/O ports at 1 [size=256]
> Memory at e020 (32-bit, non-prefetchable) [size=256K]
> Expansion ROM at e024 [disabled] [size=128K]
> Capabilities: 
> Kernel driver in use: amdgpu
> Kernel modules: amdgpu
>
> dmesg follows in attachment.
>
> Regards,
> Luís