Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Dan Moulding
> There is a pretty obvious typo in there:
> 
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -359,7 +359,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
>  		fobj = dma_resv_shared_list(resv);
>  	}
>  
> -	for (i = 0; (i < fobj ? fobj->shared_count : 0) && !ret; ++i) {
> +	for (i = 0; i < (fobj ? fobj->shared_count : 0) && !ret; ++i) {
>  		struct nouveau_channel *prev = NULL;
>  		bool must_wait = true;
> 
> 
> With that it works and I don't see the flickering in a short test. I 
> will do more testing, but maybe Dan can test, too.
> 
> Cheers,
> Stefan

After fixing the typo, the patch works for me as well, and dmesg is
clean. I will continue running the patched kernel and report back here
if I see any issues.

Cheers,

-- Dan


[Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Stefan Fritsch

Hi,

when updating from 5.14 to 5.15 on a system with NVIDIA GP108 [GeForce 
GT 1030] (NV138) and Ryzen 9 3900XT using kde/plasma on X (not wayland), 
there is a regression: There is now some annoying black flickering in 
some applications, for example thunderbird, firefox, or mpv. It mostly 
happens when scrolling or when playing video. Only the window of the 
application flickers, not the whole screen. But the flickering is not 
limited to the scrolled area: for example in firefox the url and 
bookmark bars flicker, too, not only the web site. I have bisected the 
issue to this commit:


commit 3e1ad79bf66165bdb2baca3989f9227939241f11 (HEAD)
Author: Christian König 
Date:   Sun Jun 6 11:50:15 2021 +0200

drm/nouveau: always wait for the exclusive fence

Drivers also need to sync to the exclusive fence when
a shared one is present.

Signed-off-by: Christian König 
Reviewed-by: Daniel Vetter 
Link: 
https://patchwork.freedesktop.org/patch/msgid/20210702111642.17259-4-christian.koe...@amd.com



This sounds like performance is impacted severely by that commit. Can 
this be fixed somehow? A partial dmesg is below.


Cheers,
Stefan


dmesg |grep -i -e drm -e dri -e nvidia -e nouveau -e fb
[0.00] BIOS-e820: [mem 0xbc552000-0xbc8fbfff] 
reserved
[0.004971] ACPI: XSDT 0xBCFB0728 CC (v01 ALASKA A M I 
01072009 AMI  0113)
[0.010838] PM: hibernation: Registered nosave memory: [mem 
0xbc552000-0xbc8fbfff]

[0.204873] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
[0.292761] Registering PCC driver as Mailbox controller
[0.292761] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[0.518295] pci :06:00.0: reg 0x10: [mem 0xfb00-0xfbff]
[0.519132] pci :06:00.1: [10de:0fb8] type 00 class 0x040300
[0.519653] pci :00:03.1:   bridge window [mem 0xfb00-0xfc0f]
[0.549101] pci :00:03.1:   bridge window [mem 0xfb00-0xfc0f]
[0.550994] pci_bus :06: resource 1 [mem 0xfb00-0xfc0f]
[0.561285] Block layer SCSI generic (bsg) driver version 0.4 loaded 
(major 250)

[0.564152] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[0.570870] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[0.571531] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel 
[0.988343] microcode: Microcode Update Driver: v2.2.
[1.112435] ACPI: OSL: Resource conflict; ACPI support missing from 
driver?

[1.114174] usbcore: registered new interface driver usbfs
[1.114331] usbcore: registered new interface driver hub
[1.114599] usbcore: registered new device driver usb
[2.373857] hid: raw HID events driver (C) Jiri Kosina
[2.378553] usbcore: registered new interface driver usbhid
[2.378641] usbhid: USB HID core driver
[2.581069] ata3.00: supports DRM functions and may not be fully 
accessible
[2.582388] ata3.00: supports DRM functions and may not be fully 
accessible
[3.371574] ata5.00: supports DRM functions and may not be fully 
accessible
[3.396636] ata5.00: supports DRM functions and may not be fully 
accessible
[4.159005] sr 1:0:0:0: [sr0] scsi3-mmc drive: 48x/48x writer dvd-ram 
cd/rw xa/form2 cdda tray

[4.159120] cdrom: Uniform CD-ROM driver Revision: 3.20
[5.936017] systemd[1]: Starting Load Kernel Module drm...
[5.957038] systemd[1]: modprobe@drm.service: Deactivated successfully.
[5.957238] systemd[1]: Finished Load Kernel Module drm.
[6.104901] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[6.122007] usbcore: registered new device driver apple-mfi-fastcharge
[6.213866] input: HDA NVidia HDMI/DP,pcm=3 as 
/devices/pci:00/:00:03.1/:06:00.1/sound/card0/input8

[6.236581] AMD64 EDAC driver v3.5.0
[6.259473] input: HDA NVidia HDMI/DP,pcm=7 as 
/devices/pci:00/:00:03.1/:06:00.1/sound/card0/input9
[6.259631] input: HDA NVidia HDMI/DP,pcm=8 as 
/devices/pci:00/:00:03.1/:06:00.1/sound/card0/input10
[6.260559] input: HDA NVidia HDMI/DP,pcm=9 as 
/devices/pci:00/:00:03.1/:06:00.1/sound/card0/input11
[6.260913] input: HDA NVidia HDMI/DP,pcm=10 as 
/devices/pci:00/:00:03.1/:06:00.1/sound/card0/input12

[6.485220] nouveau :06:00.0: vgaarb: deactivate vga console
[6.486484] nouveau :06:00.0: NVIDIA GP108 (138000a1)
[6.612994] nouveau :06:00.0: bios: version 86.08.24.00.23
[6.617303] nouveau :06:00.0: pmu: firmware unavailable
[6.621410] nouveau :06:00.0: fb: 2048 MiB GDDR5
[6.653892] nouveau :06:00.0: DRM: VRAM: 2048 MiB
[6.653895] nouveau :06:00.0: DRM: GART: 536870912 MiB
[6.653897] nouveau :06:00.0: DRM: BIT table 'A' not found
[6.653899] nouveau :06:00.0: DRM: BIT table 'L' not found
[6.653900] nouveau :06:00.0: DRM: TMDS table version 2.0
[6.653902] nouveau :06:00.0: DRM: DCB version 4.1
[6.653904] nouveau 

Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Dan Moulding
On 04.12.21 17:40, Stefan Fritsch wrote:
> Hi,
> 
> when updating from 5.14 to 5.15 on a system with NVIDIA GP108 [GeForce
> GT 1030] (NV138) and Ryzen 9 3900XT using kde/plasma on X (not wayland),
> there is a regression: There is now some annoying black flickering in
> some applications, for example thunderbird, firefox, or mpv. It mostly
> happens when scrolling or when playing video. Only the window of the
> application flickers, not the whole screen. But the flickering is not
> limited to the scrolled area: for example in firefox the url and
> bookmark bars flicker, too, not only the web site. I have bisected the
> issue to this commit:
> 
> commit 3e1ad79bf66165bdb2baca3989f9227939241f11 (HEAD)

I have been experiencing this same issue since switching to 5.15. I
can confirm that reverting the above mentioned commit fixes the issue
for me. I'm on GP104 hardware (GeForce GTX 1070), also running KDE
Plasma on X.

Cheers,

-- Dan


Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Dan Moulding
> Please test if that patch changes anything.

Looks like the driver is not functional after applying that patch. As
soon as the display manager is supposed to start I get a black screen
with just a (working) mouse pointer. VT switching doesn't work after
that point.

I got the following warning when compiling with that patch applied:

drivers/gpu/drm/nouveau/nouveau_fence.c: In function ‘nouveau_fence_sync’:
drivers/gpu/drm/nouveau/nouveau_fence.c:362:24: warning: comparison between 
pointer and integer
  362 | for (i = 0; (i < fobj ? fobj->shared_count : 0) && !ret; ++i) {
  |^

Below are the relevant portions from dmesg after attempting to run
with the patch applied.

Cheers,

-- Dan

dmesg:
=



[0.269958] nouveau :01:00.0: NVIDIA GP104 (134000a1)
[0.377100] nouveau :01:00.0: bios: version 86.04.50.80.13
[0.377210] nouveau :01:00.0: pmu: firmware unavailable
[0.377711] nouveau :01:00.0: fb: 8192 MiB GDDR5
[0.391160] nouveau :01:00.0: DRM: VRAM: 8192 MiB
[0.391164] nouveau :01:00.0: DRM: GART: 536870912 MiB
[0.391166] nouveau :01:00.0: DRM: BIT table 'A' not found
[0.391168] nouveau :01:00.0: DRM: BIT table 'L' not found
[0.391170] nouveau :01:00.0: DRM: TMDS table version 2.0
[0.391172] nouveau :01:00.0: DRM: DCB version 4.1
[0.391174] nouveau :01:00.0: DRM: DCB outp 00: 01000f42 00020030
[0.391176] nouveau :01:00.0: DRM: DCB outp 01: 04811f96 04600020
[0.391178] nouveau :01:00.0: DRM: DCB outp 02: 04011f92 00020020
[0.391180] nouveau :01:00.0: DRM: DCB outp 03: 04822f86 04600010
[0.391182] nouveau :01:00.0: DRM: DCB outp 04: 04022f82 00020010
[0.391184] nouveau :01:00.0: DRM: DCB outp 06: 02033f62 00020010
[0.391186] nouveau :01:00.0: DRM: DCB outp 07: 02844f76 04600020
[0.391188] nouveau :01:00.0: DRM: DCB outp 08: 02044f72 00020020
[0.391190] nouveau :01:00.0: DRM: DCB conn 00: 1031
[0.391191] nouveau :01:00.0: DRM: DCB conn 01: 02000146
[0.391193] nouveau :01:00.0: DRM: DCB conn 02: 01000246
[0.391194] nouveau :01:00.0: DRM: DCB conn 03: 00010361
[0.391196] nouveau :01:00.0: DRM: DCB conn 04: 00020446
[0.391489] nouveau :01:00.0: DRM: MM: using COPY for buffer copies
[0.891103] nouveau :01:00.0: DRM: allocated 1920x1080 fb: 0x20, bo 
bba11dd4
[0.892559] fbcon: nouveau (fb0) is primary device
[1.298487] tsc: Refined TSC clocksource calibration: 2999.999 MHz
[1.298492] clocksource: tsc: mask: 0x max_cycles: 
0x2b3e44b2357, max_idle_ns: 440795324996 ns
[1.298555] clocksource: Switched to clocksource tsc
[1.340790] Console: switching to colour frame buffer device 240x67
[1.341249] nouveau :01:00.0: [drm] fb0: nouveau frame buffer device
[1.341412] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 on 
minor 0
[1.341420] nouveau :01:00.0: DRM: Disabling PCI power management to 
avoid bug



[   19.742986] general protection fault, probably for non-canonical address 
0x3e40c25cd2e657bd:  [#1] SMP
[   19.742989] CPU: 0 PID: 3588 Comm: X Tainted: GT 5.15.6p2+ #1
[   19.742991] Hardware name: Dell Inc. XPS 8930/0T2HR0, BIOS 1.1.17 06/22/2021
[   19.742992] RIP: 0010:nouveau_fence_sync+0x6f/0x240
[   19.742995] Code: 00 8b 7b 10 85 ff 0f 84 a7 00 00 00 41 89 cd 31 ed 31 d2 
49 be ff ff ff ff ff ff ff 7f 4c 8b 7c d3 18 49 8b 94 24 90 00 00 00 <49> 8b 47 
08 48 3d 00 37 0f a0 74 0c 48 3d 60 37 0f a0 0f 85 16 01
[   19.742996] RSP: 0018:b3bc413e7c10 EFLAGS: 00010202
[   19.742998] RAX: 1001 RBX: 897383c07980 RCX: 0003
[   19.742999] RDX: 897380cce000 RSI: 0001 RDI: 0001
[   19.743000] RBP: 0001 R08: 897382af6c00 R09: 8973868641b8
[   19.743001] R10: 0002 R11: 8973868641dc R12: 897386862400
[   19.743001] R13: 0001 R14: 7fff R15: 3e40c25cd2e657b5
[   19.743002] FS:  7f5e661be8c0() GS:897ae0c0() 
knlGS:
[   19.743003] CS:  0010 DS:  ES:  CR0: 80050033
[   19.743004] CR2: 55c9e8e33d50 CR3: 0001152b7001 CR4: 003706f0
[   19.743005] DR0:  DR1:  DR2: 
[   19.743006] DR3:  DR6: fffe0ff0 DR7: 0400
[   19.743006] Call Trace:
[   19.743008]  
[   19.743009]  nouveau_gem_ioctl_pushbuf+0x6ba/0x11a0
[   19.743011]  ? nouveau_gem_ioctl_new+0x100/0x100
[   19.743012]  drm_ioctl_kernel+0x9f/0xe0
[   19.743015]  drm_ioctl+0x214/0x3f0
[   19.743016]  ? nouveau_gem_ioctl_new+0x100/0x100
[   19.743017]  ? syscall_exit_to_user_mode+0x1d/0x40
[   19.743019]  ? do_syscall_64+0x63/0x80
[   19.743021]  __x64_sys_ioctl+0x80/0xa0
[   19.743023]  do_syscall_64+0x56/0x80
[   19.743025]  ? exc_page_fault+0x18c/0x4e0
[   19.743026]  ? 

Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Stefan Fritsch

There is a pretty obvious typo in there:

--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -359,7 +359,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 		fobj = dma_resv_shared_list(resv);
 	}
 
-	for (i = 0; (i < fobj ? fobj->shared_count : 0) && !ret; ++i) {
+	for (i = 0; i < (fobj ? fobj->shared_count : 0) && !ret; ++i) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;


With that it works and I don't see the flickering in a short test. I 
will do more testing, but maybe Dan can test, too.


Cheers,
Stefan

On 07.12.21 20:01, Dan Moulding wrote:
>> Please test if that patch changes anything.
>
> Looks like the driver is not functional after applying that patch. As
> soon as the display manager is supposed to start I get a black screen
> with just a (working) mouse pointer. VT switching doesn't work after
> that point.
>
> I got the following warning when compiling with that patch applied:
>
> drivers/gpu/drm/nouveau/nouveau_fence.c: In function ‘nouveau_fence_sync’:
> drivers/gpu/drm/nouveau/nouveau_fence.c:362:24: warning: comparison between
> pointer and integer
>    362 | for (i = 0; (i < fobj ? fobj->shared_count : 0) && !ret; ++i) {
>    |^
>
> Below are the relevant portions from dmesg after attempting to run
> with the patch applied.
>
> Cheers,
>
> -- Dan


Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Daniel Vetter
On Tue, Dec 07, 2021 at 06:32:06PM +0100, Karol Herbst wrote:
> On Tue, Dec 7, 2021 at 10:52 AM Christian König
>  wrote:
> >
> > On 06.12.21 at 19:37, Dan Moulding wrote:
> > > On 04.12.21 17:40, Stefan Fritsch wrote:
> > >> Hi,
> > >>
> > >> when updating from 5.14 to 5.15 on a system with NVIDIA GP108 [GeForce
> > >> GT 1030] (NV138) and Ryzen 9 3900XT using kde/plasma on X (not wayland),
> > >> there is a regression: There is now some annoying black flickering in
> > >> some applications, for example thunderbird, firefox, or mpv. It mostly
> > >> happens when scrolling or when playing video. Only the window of the
> > >> application flickers, not the whole screen. But the flickering is not
> > >> limited to the scrolled area: for example in firefox the url and
> > >> bookmark bars flicker, too, not only the web site. I have bisected the
> > >> issue to this commit:
> > >>
> > >> commit 3e1ad79bf66165bdb2baca3989f9227939241f11 (HEAD)
> > > I have been experiencing this same issue since switching to 5.15. I
> > > can confirm that reverting the above mentioned commit fixes the issue
> > > for me. I'm on GP104 hardware (GeForce GTX 1070), also running KDE
> > > Plasma on X.
> >
> > I'm still scratching my head over what's going wrong here.
> >
> > Either we trigger some performance problem because we now wait twice for
> > submissions, or nouveau is doing something very nasty and not syncing
> > its memory accesses correctly.
> >
> > Attached is a patch, compile-tested only, which might mitigate the first
> > problem.
> >
> > But if it's the second, then nouveau has a really nasty design issue here
> > and somebody with more background on that driver design needs to take a
> > look.
> >
> 
> Ben mentioned a few times that fences might be busted but we all have
> no idea what's actually wrong. So it might be that your change is
> indeed triggering something which was always broken or something else.

The description sounds a bit like we're doing a clear before Xorg has had a
chance to copy the pixmap to the frontbuffer, perhaps? That would point to
a fencing issue in userspace, and somehow ignoring fences ensures that the
Xorg copy/blt completes before we get around to clearing stuff.

I'm assuming we're rendering with glamor, so is nouveau relying on kernel
implicit sync or doing its own fencing in userspace?
-Daniel

> 
> > Please test if that patch changes anything.
> >
> > Thanks,
> > Christian.
> >
> > >
> > > Cheers,
> > >
> > > -- Dan
> >
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Karol Herbst
On Tue, Dec 7, 2021 at 10:52 AM Christian König
 wrote:
>
> On 06.12.21 at 19:37, Dan Moulding wrote:
> > On 04.12.21 17:40, Stefan Fritsch wrote:
> >> Hi,
> >>
> >> when updating from 5.14 to 5.15 on a system with NVIDIA GP108 [GeForce
> >> GT 1030] (NV138) and Ryzen 9 3900XT using kde/plasma on X (not wayland),
> >> there is a regression: There is now some annoying black flickering in
> >> some applications, for example thunderbird, firefox, or mpv. It mostly
> >> happens when scrolling or when playing video. Only the window of the
> >> application flickers, not the whole screen. But the flickering is not
> >> limited to the scrolled area: for example in firefox the url and
> >> bookmark bars flicker, too, not only the web site. I have bisected the
> >> issue to this commit:
> >>
> >> commit 3e1ad79bf66165bdb2baca3989f9227939241f11 (HEAD)
> > I have been experiencing this same issue since switching to 5.15. I
> > can confirm that reverting the above mentioned commit fixes the issue
> > for me. I'm on GP104 hardware (GeForce GTX 1070), also running KDE
> > Plasma on X.
>
> I'm still scratching my head over what's going wrong here.
>
> Either we trigger some performance problem because we now wait twice for
> submissions, or nouveau is doing something very nasty and not syncing
> its memory accesses correctly.
>
> Attached is a patch, compile-tested only, which might mitigate the first
> problem.
>
> But if it's the second, then nouveau has a really nasty design issue here
> and somebody with more background on that driver design needs to take a
> look.
>

Ben mentioned a few times that fences might be busted but we all have
no idea what's actually wrong. So it might be that your change is
indeed triggering something which was always broken or something else.

> Please test if that patch changes anything.
>
> Thanks,
> Christian.
>
> >
> > Cheers,
> >
> > -- Dan
>



Re: [Nouveau] Regression in 5.15 in nouveau

2021-12-07 Thread Christian König

On 06.12.21 at 19:37, Dan Moulding wrote:
> On 04.12.21 17:40, Stefan Fritsch wrote:
>> Hi,
>>
>> when updating from 5.14 to 5.15 on a system with NVIDIA GP108 [GeForce
>> GT 1030] (NV138) and Ryzen 9 3900XT using kde/plasma on X (not wayland),
>> there is a regression: There is now some annoying black flickering in
>> some applications, for example thunderbird, firefox, or mpv. It mostly
>> happens when scrolling or when playing video. Only the window of the
>> application flickers, not the whole screen. But the flickering is not
>> limited to the scrolled area: for example in firefox the url and
>> bookmark bars flicker, too, not only the web site. I have bisected the
>> issue to this commit:
>>
>> commit 3e1ad79bf66165bdb2baca3989f9227939241f11 (HEAD)
> I have been experiencing this same issue since switching to 5.15. I
> can confirm that reverting the above mentioned commit fixes the issue
> for me. I'm on GP104 hardware (GeForce GTX 1070), also running KDE
> Plasma on X.

I'm still scratching my head over what's going wrong here.

Either we trigger some performance problem because we now wait twice for
submissions, or nouveau is doing something very nasty and not syncing
its memory accesses correctly.

Attached is a patch, compile-tested only, which might mitigate the first
problem.

But if it's the second, then nouveau has a really nasty design issue here
and somebody with more background on that driver design needs to take a
look.

Please test if that patch changes anything.

Thanks,
Christian.

> Cheers,
>
> -- Dan


From bcb86d62569c0131288c8b032f848f28f0178648 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= 
Date: Tue, 7 Dec 2021 10:10:15 +0100
Subject: [PATCH] drm/nouveau: wait for the exclusive fence after the shared
 ones
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Always waiting for the exclusive fence resulted in some performance
regressions. So try to wait for the shared fences first, then the
exclusive fence should always be signaled already.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/nouveau/nouveau_fence.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 05d0b3eb3690..0947e332371b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -353,15 +353,19 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 
 		if (ret)
 			return ret;
-	}
 
-	fobj = dma_resv_shared_list(resv);
-	fence = dma_resv_excl_fence(resv);
+		fobj = NULL;
+	} else {
+		fobj = dma_resv_shared_list(resv);
+	}
 
-	if (fence) {
+	for (i = 0; (i < fobj ? fobj->shared_count : 0) && !ret; ++i) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
 
+		fence = rcu_dereference_protected(fobj->shared[i],
+		dma_resv_held(resv));
+
 		f = nouveau_local_fence(fence, chan->drm);
 		if (f) {
 			rcu_read_lock();
@@ -373,20 +377,13 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 
 		if (must_wait)
 			ret = dma_fence_wait(fence, intr);
-
-		return ret;
 	}
 
-	if (!exclusive || !fobj)
-		return ret;
-
-	for (i = 0; i < fobj->shared_count && !ret; ++i) {
+	fence = dma_resv_excl_fence(resv);
+	if (fence) {
 		struct nouveau_channel *prev = NULL;
 		bool must_wait = true;
 
-		fence = rcu_dereference_protected(fobj->shared[i],
-		dma_resv_held(resv));
-
 		f = nouveau_local_fence(fence, chan->drm);
 		if (f) {
 			rcu_read_lock();
@@ -398,6 +395,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, bool e
 
 		if (must_wait)
 			ret = dma_fence_wait(fence, intr);
+
+		return ret;
 	}
 
 	return ret;
-- 
2.25.1