[Nouveau] [PATCH 1/2] drmmode: make event handler leave a note that there are stuck events

2020-08-15 Thread Ilia Mirkin
We don't really expect to have too many events in the queue. If there
are, then the algorithm we use isn't appropriate. Add a warning when the
queue gets very long, as it's an indication of something having gone
wrong.

Signed-off-by: Ilia Mirkin 
---
 src/drmmode_display.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/drmmode_display.c b/src/drmmode_display.c
index 2d3229c..45292c4 100644
--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -159,6 +159,8 @@ drmmode_events = {
 	.prev = &drmmode_events,
 };
 
+static bool warned = false;
+
 static void
 drmmode_event_handler(int fd, unsigned int frame, unsigned int tv_sec,
 		      unsigned int tv_usec, void *event_data)
@@ -166,7 +168,10 @@ drmmode_event_handler(int fd, unsigned int frame, unsigned int tv_sec,
 	const uint64_t ust = (uint64_t)tv_sec * 1000000 + tv_usec;
 	struct drmmode_event *e = event_data;
 
+	int counter = 0;
+
 	xorg_list_for_each_entry(e, &drmmode_events, head) {
+		counter++;
 		if (e == event_data) {
 			xorg_list_del(&e->head);
 			e->func((void *)(e + 1), e->name, ust, frame);
@@ -174,6 +179,12 @@ drmmode_event_handler(int fd, unsigned int frame, unsigned int tv_sec,
 			break;
 		}
 	}
+
+	if (counter > 100 && !warned) {
+		xf86DrvMsg(0, X_WARNING,
+			   "Event handler iterated %d times\n", counter);
+		warned = true;
+	}
 }
 
 void
-- 
2.26.2

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
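
For readers following along outside the driver tree, the warn-once idea in this patch can be tried in isolation. The sketch below is not the driver code: it assumes a plain singly linked list in place of xorg_list, prints with fprintf instead of xf86DrvMsg, and the names struct event, complete_event and WALK_WARN_THRESHOLD are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's queued event record. */
struct event {
    struct event *next;
    int id;
};

/* Same threshold as the patch; the constant name is an assumption. */
#define WALK_WARN_THRESHOLD 100

static bool warned = false;

/*
 * Unlink `target` from the queue at *head and return how many nodes were
 * visited to find it. Warn at most once if the walk was suspiciously long.
 */
static int complete_event(struct event **head, struct event *target)
{
    int counter = 0;

    assert(head != NULL);
    for (struct event **pp = head; *pp != NULL; pp = &(*pp)->next) {
        counter++;
        if (*pp == target) {
            *pp = target->next;   /* unlink the completed event */
            target->next = NULL;
            break;
        }
    }

    if (counter > WALK_WARN_THRESHOLD && !warned) {
        fprintf(stderr, "Event handler iterated %d times\n", counter);
        warned = true;
    }
    return counter;
}
```

The walk length is a cheap proxy for queue depth, and the static flag keeps the log from being flooded once the queue has already grown.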


[Nouveau] [PATCH 2/2] present: fix handling of drmWaitVBlank failures

2020-08-15 Thread Ilia Mirkin
When drmWaitVBlank fails, make sure to remove the event from the queue.

Signed-off-by: Ilia Mirkin 
---

Note this needs a bit more testing, and also double-checking what the
"correct" way of dealing with these errors is. I was able to trigger
errors with "xset dpms force off", but perhaps there are also other
conditions.

 src/nouveau_present.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/nouveau_present.c b/src/nouveau_present.c
index 8167fd8..15516d6 100644
--- a/src/nouveau_present.c
+++ b/src/nouveau_present.c
@@ -113,8 +113,19 @@ nouveau_present_vblank_queue(RRCrtcPtr rrcrtc, uint64_t event_id, uint64_t msc)
 	args.request.signal = (unsigned long)token;
 
 	while ((ret = drmWaitVBlank(pNv->dev->fd, &args)) != 0) {
-		if (errno != EBUSY || drmmode_event_flush(crtc->scrn) < 0)
+		if (errno != EBUSY) {
+			xf86DrvMsg(crtc->scrn->scrnIndex, X_DEBUG,
+				   "PRESENT: Wait for VBlank failed: %s\n", strerror(errno));
+			drmmode_event_abort(crtc->scrn, event_id, false);
 			return BadAlloc;
+		}
+		ret = drmmode_event_flush(crtc->scrn);
+		if (ret < 0) {
+			xf86DrvMsg(crtc->scrn->scrnIndex, X_DEBUG,
+				   "PRESENT: Event flush failed\n");
+			drmmode_event_abort(crtc->scrn, event_id, false);
+			return BadAlloc;
+		}
 	}
 
 	return Success;
-- 
2.26.2
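
The control flow this patch aims for (retry while the wait reports EBUSY, and drop the queued event on any other failure) can be exercised without a GPU. This is only a sketch under stated assumptions: fake_wait, fake_flush and fake_abort are hypothetical stand-ins for drmWaitVBlank, drmmode_event_flush and drmmode_event_abort, and struct fake_dev exists only to script the failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Fake device state; every name here is hypothetical, for illustration only. */
struct fake_dev {
    int busy_left;     /* how many times the wait reports EBUSY first */
    int hard_errno;    /* non-zero: the wait then fails with this errno */
    bool flush_fails;  /* make the flush step fail */
    int aborted;       /* how many times the queued event was dropped */
};

/* Stand-in for the vblank wait: returns 0, or -1 with errno set. */
static int fake_wait(struct fake_dev *d)
{
    if (d->busy_left > 0) {
        d->busy_left--;
        errno = EBUSY;
        return -1;
    }
    if (d->hard_errno != 0) {
        errno = d->hard_errno;
        return -1;
    }
    return 0;
}

/* Stand-in for flushing pending events: negative on failure. */
static int fake_flush(struct fake_dev *d)
{
    return d->flush_fails ? -1 : 0;
}

/* Stand-in for removing the event that was queued for this request. */
static void fake_abort(struct fake_dev *d)
{
    d->aborted++;
}

/* Returns 0 on success, -1 on failure. On failure the event is aborted. */
static int queue_vblank(struct fake_dev *d)
{
    assert(d != NULL);
    while (fake_wait(d) != 0) {
        if (errno != EBUSY || fake_flush(d) < 0) {
            fake_abort(d);   /* never leave the event behind in the queue */
            return -1;
        }
    }
    return 0;
}
```

Both failure paths end in the same cleanup, so no error return can leave a stale entry queued.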



Re: [Nouveau] Accumulating CPU load from Xorg process with DRI3

2020-08-15 Thread Ilia Mirkin
I've tracked down at least one source of these, which is that we don't
handle drmWaitVBlank errors properly in the PRESENT logic (which would
be used in conjunction with DRI3). These errors, broadly, will happen
while screens are turned off and/or in DPMS sleep. Did your monitors
go to sleep while a video was playing? If not, there's another path
for it to happen...

Cheers,

  -ilia

On Thu, Aug 13, 2020 at 6:47 PM Ilia Mirkin  wrote:
>
> I'm aware of this issue, and am experiencing it myself.
>
> The issue is that drmmode_event_handler takes up more and more CPU
> time. It seems like some events are being "left behind". I haven't had
> time to debug it further yet though.
>
> I also have DRI3 enabled, but only very rarely do I make use of my
> secondary GPUs, and I'm pretty sure I've seen the problem happen
> without any PRIME usage.
>
> Cheers,
>
>   -ilia
>
> On Thu, Aug 13, 2020 at 6:45 PM Andrew Randrianasulu
>  wrote:
> >
> > > I have observed this bug for quite some time, but so far I worked around it
> > > by just setting DRI2 (the default) in xorg.conf.d/20-nouveau.conf
> > >
> > > Now, with two GPUs, I wish to use DRI3, so right now it is set up like this:
> >
> > cat /etc/X11/xorg.conf.d/20-nouveau.conf
> > > Section "Device"
> > >     Identifier "Card0"
> > >     Driver "nouveau"
> > >     Option "PageFlip" "1"
> > >     #Option "AccelMethod" "glamor"
> > >     Option "DRI" "3"
> >
> > > But after just two hours of uptime, X is already eating some CPU:
> >
> >
> > > top - 01:30:49 up  2:45,  1 user,  load average: 1,12, 0,93, 0,84
> > > Tasks: 210 total,   1 running, 209 sleeping,   0 stopped,   0 zombie
> > > %Cpu(s): 12,1 us,  3,9 sy,  0,0 ni, 81,7 id,  0,7 wa,  0,0 hi,  1,6 si,  0,0 st
> > > MiB Mem :  11875,3 total,   6416,4 free,   1634,8 used,   3824,1 buff/cache
> > > MiB Swap:   1145,0 total,   1145,0 free,      0,0 used.   9969,7 avail Mem
> > >
> > >   PID USER   PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> > >  1198 root   20   0  146160  78828  28160 S  35,8   0,6  30:41.37 Xorg
> > >  1285 guest  20   0   59776  17332  13756 S  11,6   0,1  16:12.83 xmms
> > >  4006 guest  20   0 1743952 919312 120628 S  10,9   7,6  20:51.01 seamonkey
> > >  1278 guest  20   0  101508  48528  30496 S   3,0   0,4   4:03.21 ktorrent
> > >  1274 guest  20   0   43368  31784  23684 S   2,0   0,3   0:29.43 konsole
> > >  1259 guest  20   0   43092  28232  23640 S   1,3   0,2   0:21.53 kicker
> > >  1255 guest  20   0    6560   4160   2720 S   1,0   0,0   1:00.90 kompmgr
> > >  1293 guest  20   0   40164  21328  18636 S   1,0   0,2   1:30.50 gkrellm
> > >  1254 guest  20   0   31616  21832  18944 S   0,7   0,2   0:06.49 kwin
> >
> > > In ~1 day it will eat a full core of my AMD FX-4300 and X will become sluggish ...
> >
> > I tried to trace it with operf 1.2.0:
> >
> > operf --pid 1198
> >
> > operf: Press Ctl-c or 'kill -SIGINT 7787' to stop profiling
> > operf: Profiler started
> > ^C
> > Profiling done.
> >
> > root@slax:~# opreport
> > Using /root/oprofile_data/samples/ for samples directory.
> > CPU: AMD64 family15h, speed 3800 MHz (estimated)
> > > Counted CPU_CLK_UNHALTED events (CPU Clocks not Halted) with a unit mask of 0x00 (No unit mask) count 10
> > > CPU_CLK_UNHALT...|
> > >   samples|      %|
> > > ------------------
> > >     78166 100.000 Xorg
> > > CPU_CLK_UNHALT...|
> > >   samples|      %|
> > > ------------------
> > >     62905 80.4762 nouveau_drv.so
> > >      5648  7.2256 kallsyms
> > >      4186  5.3553 Xorg
> > >      1419  1.8154 libpixman-1.so.0.38.0
> > >      1038  1.3279 nouveau
> > >       687  0.8789 libc-2.30.so
> > >       632  0.8085 libexa.so
> > >       510  0.6525 libdrm_nouveau.so.2.0.0
> > >       402  0.5143 libfb.so
> > >       259  0.3313 drm
> > >       230  0.2942 ttm
> > >       108  0.1382 libpthread-2.30.so
> > >        47  0.0601 libdrm.so.2.4.0
> > >        34  0.0435 [vdso] (tgid:1198 range:0xf7fbf000-0xf7fb)
> > >        27  0.0345 evdev_drv.so
> > >         7  0.0090 snd_hda_codec
> > >         5  0.0064 r8169
> > >         5  0.0064 snd_pcm
> > >         5  0.0064 libXfont2.so.2.0.0
> > >         3  0.0038 snd_aloop
> > >         3  0.0038 libglx.so
> > >         2  0.0026 kvm
> > >         2  0.0026 snd_timer
> > >         1  0.0013 snd_hda_core
> > >         1  0.0013 snd_hda_intel
> >
> > > So nouveau_drv itself is the major CPU eater.
> > >
> > > I'll try to rebuild it with debug symbols enabled; hopefully that will be enough
> > > to at least see what eats all those cycles.
> > >
> > > Sorry for so many emails, I just keep discovering new bugs as I try new things!