[Intel-gfx] intel_mei_pxp: needs better help text
CONFIG_INTEL_MEI_PXP:

MEI Support for PXP Services on Intel platforms.

Enables the ME FW services required for PXP support through
I915 display driver of Intel.

That's ... very useless help text. According to
https://www.phoronix.com/scan.php?page=news_item&px=Intel-PXP-Protected-Xe-Path
this is some kind of DRM. Help text should probably say it has to do
with i915 video, and explain the acronyms, and probably its usecases.

--
http://www.livejournal.com/~pavelmachek

signature.asc
Description: PGP signature
Re: [Intel-gfx] intel_mei_pxp: needs better help text
Hi!

Extended Cc list. Should I attempt to prepare a patch?

Best regards,
Pavel

On Thu 2021-10-14 12:53:34, Pavel Machek wrote:
>
> CONFIG_INTEL_MEI_PXP:
>
> MEI Support for PXP Services on Intel platforms.
>
> Enables the ME FW services required for PXP support through
> I915 display driver of Intel.
>
>
> That's ... very useless help text. According to
> https://www.phoronix.com/scan.php?page=news_item&px=Intel-PXP-Protected-Xe-Path
> this is some kind of DRM. Help text should probably say it has to do
> with i915 video, and explain the acronyms, and probably its usecases.
>
> --
> http://www.livejournal.com/~pavelmachek

--
http://www.livejournal.com/~pavelmachek
Re: [Intel-gfx] 5.13-rc6 on thinkpad X220: graphics hangs with recent mainline
Hi!

> > I'm getting graphics problems with 5.13-rc:
> >
> > Debian 10.9, X, chromium and flightgear is in use. Things were more
> > stable than this with previous kernels.
> >
> > Any ideas?
>
> The error you are seeing:
>
> > [185300.784992] i915 :00:02.0: [drm] Resetting chip for stopped
> > heartbeat on rcs0
> > [185300.888694] i915 :00:02.0: [drm] fgfs[27370] context reset due to
> > GPU hang
>
> That just indicates that the rendering took too long. It could be caused
> by a change in how the application renders, userspace driver or i915. So
> a previously on-the-edge-of-timeout operation may have got pushed beyond
> the timeout, or the rendering genuinely got completely stuck.
>
> If you only updated the kernel, not the application or userspace, could
> you bisect the commit that introduced the behavior and report:
>
> https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs
>
> We have changes around this area, so would be helpful if you can bisect
> the commit that started the behavior.

So with more recent kernels, problem went away.

Is it possible it was one of those "aborted fence aborts both
application and X" problems?

Best regards,
Pavel
--
http://www.livejournal.com/~pavelmachek
[Intel-gfx] 5.13-rc6 on thinkpad X220: graphics hangs with recent mainline
Hi! I'm getting graphics problems with 5.13-rc: Debian 10.9, X, chromium and flightgear is in use. Things were more stable than this with previous kernels. Any ideas? Best regards, Pavel [185233.329693] wlp3s0: deauthenticated from 5c:f4:ab:10:d2:bb (Reason: 16=GROUP_KEY_HANDSHAKE_TIMEOUT) [185234.040352] wlp3s0: authenticate with 5c:f4:ab:10:d2:bb [185234.043836] wlp3s0: send auth to 5c:f4:ab:10:d2:bb (try 1/3) [185234.046652] wlp3s0: authenticated [185234.049087] wlp3s0: associate with 5c:f4:ab:10:d2:bb (try 1/3) [185234.052667] wlp3s0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=1) [185234.055398] wlp3s0: associated [185300.784992] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0 [185300.888694] i915 :00:02.0: [drm] fgfs[27370] context reset due to GPU hang [185472.274563] usb 2-1.1: USB disconnect, device number 3 [185472.274578] usb 2-1.1.2: USB disconnect, device number 5 [185472.281518] hid-generic 0003:04F2:0111.0003: usb_submit_urb(ctrl) failed: -19 [185472.299837] hid-generic 0003:04F2:0111.0003: usb_submit_urb(ctrl) failed: -19 [185472.305986] hid-generic 0003:04F2:0111.0003: usb_submit_urb(ctrl) failed: -19 [185472.328012] hid-generic 0003:04F2:0111.0003: usb_submit_urb(ctrl) failed: -19 [185472.333738] usb 2-1.1.3: USB disconnect, device number 6 [185673.454821] usb 2-1.1: new high-speed USB device number 7 using ehci-pci [185673.563486] usb 2-1.1: New USB device found, idVendor=1a40, idProduct=0101, bcdDevice= 1.11 [185673.563502] usb 2-1.1: New USB device strings: Mfr=0, Product=1, SerialNumber=0 [185673.563509] usb 2-1.1: Product: USB 2.0 Hub [185673.564488] hub 2-1.1:1.0: USB hub found [185673.564595] hub 2-1.1:1.0: 4 ports detected ... 
[207277.385543] wlp3s0: deauthenticated from 5c:f4:ab:10:d2:bb (Reason: 16=GROUP_KEY_HANDSHAKE_TIMEOUT) [207278.062061] wlp3s0: authenticate with 5c:f4:ab:10:d2:bb [207278.068175] wlp3s0: send auth to 5c:f4:ab:10:d2:bb (try 1/3) [207278.070985] wlp3s0: authenticated [207278.075545] wlp3s0: associate with 5c:f4:ab:10:d2:bb (try 1/3) [207278.080793] wlp3s0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=1) [207278.084081] wlp3s0: associated [207564.046469] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0 [207564.150293] i915 :00:02.0: [drm] fgfs[25729] context reset due to GPU hang [209075.178776] wlp3s0: deauthenticated from 5c:f4:ab:10:d2:bb (Reason: 16=GROUP_KEY_HANDSHAKE_TIMEOUT) [209075.841872] wlp3s0: authenticate with 5c:f4:ab:10:d2:bb [209075.845305] wlp3s0: send auth to 5c:f4:ab:10:d2:bb (try 1/3) [209075.851186] wlp3s0: authenticated [209075.852537] wlp3s0: associate with 5c:f4:ab:10:d2:bb (try 1/3) [209075.855972] wlp3s0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=1) [209075.858522] wlp3s0: associated [210159.723726] PM: suspend entry (deep) [210159.741497] Filesystems sync: 0.017 seconds [210159.743585] Freezing user space processes ... (elapsed 0.009 seconds) done. [210159.753345] OOM killer disabled. [210159.753349] Freezing remaining freezable tasks ... (elapsed 0.003 seconds) done. 
[210159.757357] printk: Suspending console(s) (use no_console_suspend to debug) [210159.945365] sd 2:0:0:0: [sdb] Synchronizing SCSI cache [210159.945443] sd 0:0:0:0: [sda] Synchronizing SCSI cache [210159.945651] sd 0:0:0:0: [sda] Stopping disk [210159.947225] sd 2:0:0:0: [sdb] Stopping disk [210160.019791] wlp3s0: deauthenticating from 5c:f4:ab:10:d2:bb by local choice (Reason: 3=DEAUTH_LEAVING) [210160.021158] e1000e: EEE TX LPI TIMER: 0011 [210161.245106] PM: suspend devices took 1.488 seconds [210161.266601] ACPI: EC: interrupt blocked [210161.305431] ACPI: Preparing to enter system sleep state S3 [210161.313532] ACPI: EC: event blocked [210161.313535] ACPI: EC: EC stopped [210161.313537] PM: Saving platform NVS memory [210161.313548] Disabling non-boot CPUs ... ... [224698.957159] wlp3s0: associated [229707.724067] wlp3s0: deauthenticated from 5c:f4:ab:10:d2:bb (Reason: 16=GROUP_KEY_HANDSHAKE_TIMEOUT) [229708.370607] wlp3s0: authenticate with 5c:f4:ab:10:d2:bb [229708.373732] wlp3s0: send auth to 5c:f4:ab:10:d2:bb (try 1/3) [229708.376501] wlp3s0: authenticated [229708.379997] wlp3s0: associate with 5c:f4:ab:10:d2:bb (try 1/3) [229708.383773] wlp3s0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=1) [229708.386423] wlp3s0: associated [229756.518759] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0 [229756.622596] i915 :00:02.0: [drm] fgfs[2648] context reset due to GPU hang -- http://www.livejournal.com/~pavelmachek signature.asc Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
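For anyone wanting to see how often the hang recurs in a long dmesg like the one above, a minimal Python sketch follows. It assumes the message format shown in this report ("<comm>[<pid>] context reset due to GPU hang"); the names find_gpu_hangs and SAMPLE_LOG are made up for illustration, and the sample contains only lines copied from the log above.

```python
import re

# Matches lines such as:
#   [185300.888694] i915 :00:02.0: [drm] fgfs[27370] context reset due to GPU hang
# The device field is matched loosely (\S+) because the archive shows it
# as ":00:02.0:"; a full PCI address would match as well.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915\s+\S+\s+\[drm\]\s+"
    r"(?P<comm>\S+?)\[(?P<pid>\d+)\]\s+context reset due to GPU hang"
)

# Lines copied from the report; a real run would read `dmesg` output instead.
SAMPLE_LOG = """\
[185300.784992] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[185300.888694] i915 :00:02.0: [drm] fgfs[27370] context reset due to GPU hang
[207564.046469] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[207564.150293] i915 :00:02.0: [drm] fgfs[25729] context reset due to GPU hang
[229756.518759] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[229756.622596] i915 :00:02.0: [drm] fgfs[2648] context reset due to GPU hang
"""


def find_gpu_hangs(log_text):
    """Return (timestamp, command, pid) for every context reset in log_text."""
    hangs = []
    for match in HANG_RE.finditer(log_text):
        hangs.append(
            (float(match.group("ts")), match.group("comm"), int(match.group("pid")))
        )
    return hangs


if __name__ == "__main__":
    for ts, comm, pid in find_gpu_hangs(SAMPLE_LOG):
        print(f"{ts:14.6f}  {comm}[{pid}]")
```

On the sample, every reset is attributed to fgfs (flightgear), which matches the description in the report.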
Re: [Intel-gfx] [PATCH 000/190] Revertion of all of the umn.edu commits
Hi!

> > Revert "drm/radeon: Fix reference count leaks caused by
> > pm_runtime_get_sync"
> > Revert "drm/radeon: fix multiple reference count leak"
> > Revert "drm/amdkfd: Fix reference count leaks."
>
> I didn't review these carefully, but from a quick look they all seem
> rather inconsequental. Either error paths that are very unlikely, or
> drivers which are very dead (looking at the entire list, not just what
> you reverted here).
>
> Acked-by: Daniel Vetter

So you are knowingly acking patch re-introducing bugs into kernel,
because the bugs are minor?

I don't believe that's an okay thing to do. Maybe something needs
reverting, but let's not introduce bugs into kernel because they are
"minor".

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] Kernel stability on baytrail machines
On Tue 2016-07-12 16:41:58, Ezequiel Garcia wrote:
> Hi Alan,
>
> (Adding interested people to this thread)
>
> On 09 Apr 08:14 PM, One Thousand Gnomes wrote:
> > > > I do feel that the importance of the mentioned bug is currently
> > > > underestimated. Can anyone here give a note, how much current linux
> > > > kernel is supposed to be stable on general baytrail machines?
> > >
> > > If you did not get any replies... you might want to check MAINTAINERS
> > > file, and
> > > put Intel x86 maintainers on Cc list.
> > >
> > > I'm sure someone cares :-).
> >
> > Yes we care, and there are people looking at the various reports.
> >
>
> Are there any updates on the status of this issue?
>
> The current bugzilla report [1] marks this as a power management
> issue. However, many reports indicate that it would only freeze
> when running X, so it's not completely clear if it's related to
> the gfx driver too.

Does

"intel_idle.max_cstate=1"

fix it for you?

If you feel it is X-only problem, you may want to provide details
about your graphics subsystem (DRM enabled? framebuffer only?) and
probably cc.

...actually... you may want to verify if it happens in unaccelerated X.

INTEL DRM DRIVERS (excluding Poulsbo, Moorestown and derivative chipsets)
M: Daniel Vetter
M: Jani Nikula
L: intel-gfx@lists.freedesktop.org
L: dri-de...@lists.freedesktop.org
W: https://01.org/linuxgraphics/
Q: http://patchwork.freedesktop.org/project/intel-gfx/
T: git git://anongit.freedesktop.org/drm-intel
S: Supported
F: drivers/gpu/drm/i915/
F: include/drm/i915*
F: include/uapi/drm/i915_drm.h

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] Kernel stability on baytrail machines
> On Tue 2016-07-12 16:41:58, Ezequiel Garcia wrote: > >>Hi Alan, > >> > >>(Adding interested people to this thread) > >> > >>On 09 Apr 08:14 PM, One Thousand Gnomes wrote: > >I do feel that the importance of the mentioned bug is currently > >underestimated. Can anyone here give a note, how much current linux > >kernel is supposed to be stable on general baytrail machines? > If you did not get any replies... you might want to check MAINTAINERS > file, and > put Intel x86 maintainers on Cc list. > > I'm sure someone cares :-). > >>>Yes we care, and there are people looking at the various reports. > >>> > >>Are there any updates on the status of this issue? > >> > >>The current bugzilla report [1] marks this as a power management > >>issue. However, many reports indicate that it would only freeze > >>when running X, so it's not completely clear if it's related to > >>the gfx driver too. > >Does > > > >"intel_idle.max_cstate=1" > > > >fix it for you? > Yes, it does. > >If you feel it is X-only problem, you may want to provide details > >about your graphics subsystem (DRM enabled? framebuffer only?) and > >probably cc. > It's not X-only problem. Happens even in console mode, which is KMS > switched during boot though. > >...actually... you may want to verify if it happens in unaccelerated X. > As it happens even in console mode, is this relevant test? No, no need to test with X. Would it be possible to test in good old VGA mode? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
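When testing a workaround such as intel_idle.max_cstate=1, it is worth confirming that the parameter actually reached the running kernel. A minimal Python sketch, under the assumption that a plain whitespace split is good enough (it does not handle quoted values containing spaces); parse_cmdline and has_param are illustrative names, not an existing tool:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Parameters without '=' map to None. Quoted values with embedded
    spaces are NOT handled; this is a deliberate simplification.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def has_param(cmdline, key, value):
    """True if `key=value` appears on the command line."""
    return parse_cmdline(cmdline).get(key) == value


if __name__ == "__main__":
    # On a running Linux system the command line is exposed in /proc/cmdline.
    with open("/proc/cmdline") as f:
        running = f.read()
    print("max_cstate=1 active:", has_param(running, "intel_idle.max_cstate", "1"))
```

A result of False means the bootloader entry was not updated, so a test run under that boot says nothing about the workaround.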
Re: [Intel-gfx] v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
Hi! > > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine > > > > memory corruption (1 bit flipped): > > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880 > > > > > > > > Any extra information would be of use :) > > > > > > > > Regards, Joonas > > > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the > > > > information in one consolidated place: > > > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs > > > > > > I prefer email... certainly for bugs that can't be reproduced. > > > > By adding it to the Bugzilla it may be recognized by somebody else > > who is experiencing a similar issue. Internet points are not deducted > > for submitting bugs in good faith, even if they get closed as > > NOTABUG. Well, your documentation suggests you'll deduce my internet points: Before filing the bug, please try to reproduce your issue with the latest kernel. Use the latest drm-tip branch from http://cgit.freedesktop.org/drm-tip and build as instructed on our Build Guide. :-) > Feel free to copy from email to bugzilla :-). Hmm, so it seems it happened again today: Dec 8 11:45:01 duo CRON[29325]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) Dec 8 11:46:42 duo org.mate.panel.applet.MateWeatherAppletFactory[3983]: (mateweather-applet-2:4242): GLib-CRITICAL **: Source ID 14603 was not found when attempting to remove it Dec 8 11:54:59 duo kernel: list_del corruption. prev->next should be 88019283ea28, but was 8801411a1c68 Dec 8 11:54:59 duo kernel: [ cut here ] Dec 8 11:54:59 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53! 
Dec 8 11:54:59 duo kernel: invalid opcode: [#1] SMP PTI Dec 8 11:54:59 duo kernel: CPU: 1 PID: 3428 Comm: Xorg Not tainted 4.20.0-rc1+ #4 Dec 8 11:54:59 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018 Dec 8 11:54:59 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90 Dec 8 11:54:59 duo kernel: Code: 16 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 08 75 5e 85 e8 03 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 40 75 5e 85 e8 f0 87 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48 Dec 8 11:54:59 duo kernel: RSP: :c9223ac0 EFLAGS: 00213282 Dec 8 11:54:59 duo kernel: RAX: 0054 RBX: 880115a07c40 RCX: Dec 8 11:54:59 duo kernel: RDX: RSI: 88019e2653d8 RDI: 88019e2653d8 Dec 8 11:54:59 duo kernel: RBP: c9223ac0 R08: 880193a2ad10 R09: Dec 8 11:54:59 duo kernel: R10: 008e9088 R11: 2e6e6f6974707501 R12: 8801960cb240 Dec 8 11:54:59 duo kernel: R13: 88019283e900 R14: 880115a07ec0 R15: 88019283ea28 Dec 8 11:54:59 duo kernel: FS: () GS:88019e24(0063) knlGS:f79c4880 Dec 8 11:54:59 duo kernel: CS: 0010 DS: 002b ES: 002b CR0: 80050033 Dec 8 11:54:59 duo kernel: CR2: 086b0df8 CR3: 0001939f6004 CR4: 000606a0 Dec 8 11:54:59 duo kernel: Call Trace: Dec 8 11:54:59 duo kernel: i915_vma_move_to_active+0x1c3/0x510 Dec 8 11:54:59 duo kernel: ? i915_request_await_object+0xf4/0x280 Dec 8 11:54:59 duo kernel: i915_gem_do_execbuffer+0xe2f/0x10a0 Dec 8 11:54:59 duo kernel: ? find_held_lock+0x39/0xb0 Dec 8 11:54:59 duo kernel: ? kvmalloc_node+0x26/0x70 Dec 8 11:54:59 duo kernel: i915_gem_execbuffer2_ioctl+0x1b4/0x360 Dec 8 11:54:59 duo kernel: ? i915_gem_execbuffer_ioctl+0x290/0x290 Dec 8 11:54:59 duo kernel: drm_ioctl_kernel+0xaa/0xf0 Dec 8 11:54:59 duo kernel: drm_ioctl+0x323/0x3d0 Dec 8 11:54:59 duo kernel: ? i915_gem_execbuffer_ioctl+0x290/0x290 Dec 8 11:54:59 duo kernel: ? posix_ktime_get_ts+0xc/0x10 Dec 8 11:54:59 duo kernel: i915_compat_ioctl+0x37/0x40 Dec 8 11:54:59 duo kernel: __ia32_compat_sys_ioctl+0x429/0xe90 Dec 8 11:54:59 duo kernel: ? 
put_old_timespec32+0x9/0x10 Dec 8 11:54:59 duo kernel: ? __ia32_compat_sys_clock_gettime+0x67/0x90 Dec 8 11:54:59 duo kernel: do_int80_syscall_32+0x50/0x100 Dec 8 11:54:59 duo kernel: entry_INT80_compat+0x7d/0x82 Dec 8 11:54:59 duo kernel: RIP: 0023:0xf7fd5c42 Dec 8 11:54:59 duo kernel: Code: 65 8b 15 04 00 00 00 8b 0e 8b 0c ca 83 f9 ff 75 0c 89 04 24 89 f0 e8 b3 fe ff ff eb 05 8b 46 04 01 c8 83 c4 14 5b 5e c3 cd 80 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00 Dec 8 11:54:59 duo kernel: RSP: 002b:fff1a014 EFLAGS: 00203292 ORIG_RAX: 0036 Dec 8 11:54:59 duo kernel: RAX: ffda RBX: 000a RCX: 40406469 Dec 8 11:54:59 duo kernel: RDX: fff1a0bc RSI: RDI: 40406469 Dec 8 11:54:59 duo kernel: RBP: 000a R08: R09: Dec 8 11:54:59 duo kernel: R10: R11:
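The nouveau report mentioned above was a single flipped bit. A quick way to check whether the same explanation could fit the list_del message here is to XOR the expected and actual pointer values and count the differing bits. The sketch below uses the two values exactly as they appear in this archive (leading digits look stripped, but a shared prefix cancels out in the XOR anyway); differing_bits is a name invented for this example.

```python
def differing_bits(expected_hex, actual_hex):
    """Number of bit positions in which two hex values differ."""
    diff = int(expected_hex, 16) ^ int(actual_hex, 16)
    return bin(diff).count("1")


def looks_like_single_bit_flip(expected_hex, actual_hex):
    """True if the values differ in exactly one bit."""
    return differing_bits(expected_hex, actual_hex) == 1


if __name__ == "__main__":
    # "prev->next should be 88019283ea28, but was 8801411a1c68"
    expected, actual = "88019283ea28", "8801411a1c68"
    print(differing_bits(expected, actual), "bits differ")
    print("single bit flip:", looks_like_single_bit_flip(expected, actual))
```

The two values differ in 16 bits, so a one-bit memory error does not explain this particular corruption; the stale value looks like a different, otherwise plausible pointer.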
Re: [Intel-gfx] v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
On Sat 2018-12-08 12:13:46, Pavel Machek wrote:
> Hi!
>
> > > > > There's one similar for nouveau in Bugzilla, but it seems like a
> > > > > genuine
> > > > > memory corruption (1 bit flipped):
> > > > >
> > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880
> > > > >
> > > > > Any extra information would be of use :)
> > > > >
> > > > > Regards, Joonas
> > > > >
> > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the
> > > > > information in one consolidated place:
> > > > >
> > > > > https://01.org/linuxgraphics/documentation/how-report-bugs
> > > >
> > > > I prefer email... certainly for bugs that can't be reproduced.
> > >
> > > By adding it to the Bugzilla it may be recognized by somebody else
> > > who is experiencing a similar issue. Internet points are not deducted
> > > for submitting bugs in good faith, even if they get closed as
> > > NOTABUG.
>
> Well, your documentation suggests you'll deduce my internet points:
>
> Before filing the bug, please try to reproduce your issue with the
> latest kernel. Use the latest drm-tip branch from
> http://cgit.freedesktop.org/drm-tip and build as instructed on our
> Build Guide.
>
> :-)

I'd prefer not to run drm-tip. I'll update to 2.6.20-rc5+ and see if
it re-appears (but it takes long time to reproduce :-().

If you think it is useful, I can try to update my machine to
linux-next.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[Intel-gfx] v4.20-rc5+ on x220: Resetting chip for hang on rcs0
Hi!

Another day, another problem... but this one is different from the
previous hang, as machine survives. Chromium was running with youtube
video playing.

[31850.666274] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[31850.666277] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[31850.666279] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[31850.666282] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[31850.666285] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[31850.666394] i915 :00:02.0: Resetting chip for hang on rcs0
[31850.668474] WARNING: CPU: 0 PID: 13675 at /data/fast/l/k/include/linux/dma-fence.h:503 i915_request_skip+0x71/0x80
[31850.668478] Modules linked in:
[31850.668484] CPU: 0 PID: 13675 Comm: kworker/0:3 Not tainted 4.20.0-rc5+ #5
[31850.668487] Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018

Dmesg and /sys/class/drm/card0/error are attached.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

delme.gz
Description: application/gzip
delme2.gz
Description: application/gzip
[Intel-gfx] 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
Hi! > > > > > > > There's one similar for nouveau in Bugzilla, but it seems like a > > > > > > > genuine > > > > > > > memory corruption (1 bit flipped): > > > > > > > > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880 > > > > > > > > > > > > > > Any extra information would be of use :) > > > > > > > > > > > > > > Regards, Joonas > > > > > > > > > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the > > > > > > > information in one consolidated place: > > > > > > > > > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs > > > > > > > > > > > > I prefer email... certainly for bugs that can't be reproduced. > > > > > > > > > > By adding it to the Bugzilla it may be recognized by somebody else > > > > > who is experiencing a similar issue. Internet points are not deducted > > > > > for submitting bugs in good faith, even if they get closed as > > > > > NOTABUG. > > > > > > Well, your documentation suggests you'll deduce my internet points: > > > > > > Before filing the bug, please try to reproduce your issue with the > > > latest kernel. Use the latest drm-tip branch from > > > http://cgit.freedesktop.org/drm-tip and build as instructed on our > > > Build Guide. > > > > > > :-) > > > > I'd prefer not to run drm-tip. I'll update to 2.6.20-rc5+ and see if > > it re-appears (but it takes long time to reproduce :-(). > > If we can or can not reproduce the issue with drm-tip, is a very useful > datapoint for us. If we can not reproduce, it'll be possible to bisect > which commit fixed it, and backport that. On the other hand, if it's > still reproducible, we know we're not spending time on something we > already fixed, and the priority gets a bump. bisect ... is not practical on something that takes 2 days to reproduce. > > If you think it is useful, I can try to update my machine to > > linux-next. > > linux-next is closer to drm-tip, so it's better. 
Do you have some > specific reason for not wanting to run drm-tip (but linux-next is still > ok)? I already have build/update scripts for -next, and I trust -next not to store screenshots of my desktop in my master boot record :-). Anyway, it does happen with -next. This time, chromiums were running, and crash happened minute? after I exited flightgear. It can be seen in the logs. Oh and I might want to mention -- machine was rather deep in swap this time, as in "mouse jumping when starting fgfs" and "could feel the chromium being swapped back in". I might have had this situation before, and just powercycled the machine "because it is so deep in swap that it will not recover". top says: top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45, 3.21 Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie %Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9 si, 0.0 st KiB Mem: 5967968 total, 663244 used, 5304724 free,48876 buffers KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280 cached Mem but of course that memory is free once everything died. Any ideas? Should I go back to v4.19 to see if it happens there, too? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html delme.gz Description: application/gzip signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
Hi! > > > > If you think it is useful, I can try to update my machine to > > > > linux-next. > > > > > > linux-next is closer to drm-tip, so it's better. Do you have some > > > specific reason for not wanting to run drm-tip (but linux-next is still > > > ok)? > > > > I already have build/update scripts for -next, and I trust -next not > > to store screenshots of my desktop in my master boot record :-). > > > > Anyway, it does happen with -next. This time, chromiums were running, > > and crash happened minute? after I exited flightgear. It can be seen > > in the logs. > > > > Oh and I might want to mention -- machine was rather deep in swap this > > time, as in "mouse jumping when starting fgfs" and "could feel the > > chromium being swapped back in". I might have had this situation > > before, and just powercycled the machine "because it is so deep in > > swap that it will not recover". > > > > top says: > > > > top - 19:18:24 up 2 days, 8:03, 2 users, load average: 3.02, 3.45, > > 3.21 > > Tasks: 141 total, 1 running, 86 sleeping, 0 stopped, 2 zombie > > %Cpu(s): 18.8 us, 7.6 sy, 3.0 ni, 68.4 id, 1.3 wa, 0.0 hi, 0.9 > > si, 0.0 st > > KiB Mem: 5967968 total, 663244 used, 5304724 free,48876 > > buffers > > KiB Swap: 1681428 total, 170904 used, 1510524 free. 446280 > > cached Mem > > > > but of course that memory is free once everything died. > > > > Any ideas? Should I go back to v4.19 to see if it happens there, too? > > linux-next includes very much the same code as drm-tip. There's nobody > magically reviewing the code more than it is reviewed for inclusion into > drm-tip, when it is fed into linux-next. So thinking linux-next would be > some way safer is an illusion. > > It sounds like having memory pressure expedites the corruption, which > should make it easier to reproduce and thus fix. > > So if you could please try drm-tip reproducing AND open a bug in Bugzilla. > If you are unwilling to do that, it is very difficult to help you > more. 
Website says I have to read and agree to two different pieces of
legalese, and I'd need to keep track of yet another password... so
you can "communicate" with me.

But you can already communicate with me, over email.

I verified v4.19 is stable -- it worked ok for way more than the two
days it usually takes to crash.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] [regression from v4.19] Re: 4.20.0-rc6-next-20181210, v4.20-rc1: list_del corruption on thinkpad x220, graphics related?
Hi!

> > > So if you could please try drm-tip reproducing AND open a bug in Bugzilla.
> > > If you are unwilling to do that, it is very difficult to help you
> > > more.
> >
> > Website says I have to read and agree to two different pieces of
> > legalesee, and I'd need to keep track of yet another password... so
> > you can "communicate" with me.
> >
> > But you can already communicate with me, over email.
>
> I've listed all the reasons why our bug handling process is what it is.
>
> If registering to the Bugzilla is too much of an effort for you, then I
> won't be able to help you further on this.

Actually I did register at the bugzilla. Only useful help there was
that CONFIG_DRM_I915_DEBUG_GEM might be useful. Unfortunately that one
seems to make it panic() and impossible to get anything useful.

https://bugs.freedesktop.org/show_bug.cgi?id=109175

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
5.2: display corruption on X60, X220
Hi!

In recent kernels (5.2.0-rc1-next-20190522, 5.2-rc2) I'm getting
display corruption in X. Usually in terminals, but also in title bars
etc. Black areas with white lines in them, usually...

Same configuration worked properly in ... probably 4.19? Then I got
some graphics-crashes on X220 that prevented me from testing :-(.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 5.2: display corruption on X60, X220
On Thu 2019-06-06 11:32:18, Jani Nikula wrote:
> On Mon, 03 Jun 2019, Pavel Machek wrote:
> > In recent kernels (5.2.0-rc1-next-20190522, 5.2-rc2) I'm getting
> > display corruption in X. Usually in terminals, but also in title bars
> > etc. Black areas with white lines in them, usually...
> >
> > Same configuration worked properly in ... probably 4.19? Then I got
> > some graphics-crashes on X220 that prevented me from testing :-(.
>
> It's pretty hard to say anything based on the above.
>
> Anything in the logs with drm.debug=14 added?

I see. It looks like a hard-to-debug issue.

Oh, interesting part is that corruption _is_ visible if I make a
screenshot.

Will try with drm.debug=...

Do you do some kind of testing that would catch similar issues?

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
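As background on the drm.debug=14 suggestion: the value is a bitmask selecting DRM debug message categories. The Python sketch below decodes such a mask. The bit assignments are reproduced from memory of the kernel's drm_print.h for kernels of this era and should be treated as an assumption to verify against the running kernel's sources; decode_drm_debug is an illustrative name.

```python
# DRM debug categories; values are assumed from include/drm/drm_print.h
# and may differ between kernel versions.
DRM_DEBUG_BITS = [
    (0x01, "CORE"),
    (0x02, "DRIVER"),
    (0x04, "KMS"),
    (0x08, "PRIME"),
    (0x10, "ATOMIC"),
    (0x20, "VBL"),
    (0x40, "STATE"),
    (0x80, "LEASE"),
    (0x100, "DP"),
]


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled in `mask`."""
    return [name for bit, name in DRM_DEBUG_BITS if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(14))
```

Under that assumption, 14 (0x0e) turns on driver, modesetting and PRIME messages while leaving the very chatty core category off.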
[Intel-gfx] DDC on Thinkpad x220
Hi!

Thinkpad X220 should be new enough machine to talk DDC to the
monitors, right? And my monitor has DDC enable/disable in the menu, so
it should support it, too...

But I don't have /dev/i2c* and did not figure out how to talk to the
monitor. Is the support there in the kernel? What do I need to enable
it?

lspci says:

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation
Core Processor Family Integrated Graphics Controller (rev 09)

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
5.4-rc1 on Thinkpad x220: graphics regression, it "snows" on digital output
Hi!

When 5.4-rc1 is booted on thinkpad X220 I get "snow" and other
artefacts on digital output.

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation
Core Processor Family Integrated Graphics Controller (rev 09)

It already snows when kernel is booting, snow continues in X.

HDMI1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 478mm x 268mm
   1920x1080 60.00*+

Snow continues in other video modes:

pavel@duo:~$ xrandr --output HDMI1 --mode 1024x768
pavel@duo:~$

VGA output appears normal.

Any ideas?

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 5.4-rc1 on Thinkpad x220: graphics regression, it "snows" on digital output
Hi!

> When 5.4-rc1 is booted on thinkpad X220 I get "snow" and other
> artefacts on digital output.
>
> 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation
> Core Processor Family Integrated Graphics Controller (rev 09)
>
> It already snows when kernel is booting, snow continues in X.

Sorry, false alarm. I seem to have a hardware problem: it persisted
across a reboot to an older kernel, and went away after I wiggled
cables.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: DDC on Thinkpad x220
On Tue 2019-10-01 12:39:34, Jani Nikula wrote:
> On Mon, 30 Sep 2019, Pavel Machek wrote:
> > Hi!
> >
> > Thinkpad X220 should be new enough machine to talk DDC to the
> > monitors, right? And my monitor has DDC enable/disable in the menu, so
> > it should support it, too...
> >
> > But I don't have /dev/i2c* and did not figure out how to talk to the
> > monitor. Is the support there in the kernel? What do I need to enable
> > it?
>
> # modprobe i2c-dev

Thanks!

I enabled I2C_CHARDEV, and installed ddccontrol:

c ddccontrol - program to control monitor

I can read parameters of Dell monitor on VGA:

sudo ddccontrol dev:/dev/i2c-1 -c -d /usr/share/ddccontrol-db/monitor/DELA013.xml

Control 0x10: +/79/100 [???] -- brightness
Control 0x12: +/63/100 [???] -- contrast

Unfortunately the Fujitsu monitor does not seem to communicate.
Fujitsu is my main monitor :-(.

pavel@duo:~$ sudo ddccontrol dev:/dev/i2c-4 -c -d
ddccontrol version 0.4.2
Copyright 2004-2005 Oleg I. Vdovikin (o...@cs.msu.su)
Copyright 2004-2006 Nicolas Boichat (nico...@boichat.ch)
This program comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of this program under the terms of the GNU General Public License.

Reading EDID and initializing DDC/CI at bus dev:/dev/i2c-4...
ioctl(): No such device or address
ioctl returned -1
ioctl(): No such device or address
ioctl returned -1
ioctl(): No such device or address
ioctl returned -1
I/O warning : failed to load external entity "/usr/share/ddccontrol-db/monitor/FUS080A.xml"
Document not parsed successfully.
ioctl(): No such device or address
ioctl returned -1
DDC/CI at dev:/dev/i2c-4 is unusable (-1).
If your graphics card need it, please check all the required kernel modules are loaded (i2c-dev, and your framebuffer driver).

Any further hints?

Thanks and best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
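For scripting against the working Dell monitor, the "Control" lines that ddccontrol printed above can be picked apart with a short Python sketch. It assumes only the line shape visible in this mail ("Control 0x10: +/79/100 ..."); other ddccontrol versions may print something different, and parse_controls is a name made up for this example.

```python
import re

# Shape taken from the output quoted above:
#   Control 0x10: +/79/100 [???]
CONTROL_RE = re.compile(r"Control (0x[0-9a-fA-F]+): \+/(\d+)/(\d+)")


def parse_controls(output):
    """Map control address -> (current value, maximum) from ddccontrol output."""
    controls = {}
    for address, value, maximum in CONTROL_RE.findall(output):
        controls[int(address, 16)] = (int(value), int(maximum))
    return controls


if __name__ == "__main__":
    sample = (
        "Control 0x10: +/79/100 [???]\n"
        "Control 0x12: +/63/100 [???]\n"
    )
    for address, (value, maximum) in sorted(parse_controls(sample).items()):
        print(f"0x{address:02x}: {value}/{maximum}")
```

In the mail above, 0x10 is labelled brightness and 0x12 contrast. Output from the Fujitsu monitor would yield an empty dict, since no Control lines are printed when DDC/CI is unusable.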
Re: [Intel-gfx] [PATCH] drm: Add support for integrated privacy screens
On Tue 2019-10-22 17:12:06, Rajat Jain wrote: > Certain laptops now come with panels that have integrated privacy > screens on them. This patch adds support for such panels by adding > a privacy-screen property to the drm_connector for the panel, that > the userspace can then use to control and check the status. The idea > was discussed here: Much better than separate /sys interface, thanks! Pavel -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] xorg hang in 5.5-rc1 -- use after free?
Hi! I got an X hang and there seems to be something useful in dmesg... Any ideas? Pavel [0.00] Linux version 5.5.0-rc1+ (pavel@amd) (gcc version 4.9.2 (Debian 4.9.2-10+deb8u2)) #73 SMP PREEMPT Fri Dec 13 00:46:17 CET 2019 [0.00] Command line: BOOT_IMAGE=(hd1,2)/l/k/o/64/arch/x86/boot/bzImage root=/dev/sda3 resume=/dev/sda1 [0.00] Disabled fast string operations [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009d7ff] usable [0.00] BIOS-e820: [mem 0x0009d800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x1fff] usable [0.00] BIOS-e820: [mem 0x2000-0x201f] reserved [0.00] BIOS-e820: [mem 0x2020-0x3fff] usable [0.00] BIOS-e820: [mem 0x4000-0x401f] reserved [0.00] BIOS-e820: [mem 0x4020-0xda99efff] usable [0.00] BIOS-e820: [mem 0xda99f000-0xdae9efff] reserved [0.00] BIOS-e820: [mem 0xdae9f000-0xdaf9efff] ACPI NVS [0.00] BIOS-e820: [mem 0xdaf9f000-0xdaffefff] ACPI data [0.00] BIOS-e820: [mem 0xdafff000-0xdaff] usable [0.00] BIOS-e820: [mem 0xdb00-0xdf9f] reserved [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfed08000-0xfed08fff] reserved [0.00] BIOS-e820: [mem 0xfed1-0xfed19fff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xffd2-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00019e5f] usable [0.00] BIOS-e820: [mem 0x00019e60-0x00019e7f] reserved [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.6 present. 
[0.00] DMI: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018 [0.00] tsc: Fast TSC calibration using PIT [0.00] tsc: Detected 2492.091 MHz processor [0.001588] e820: update [mem 0x-0x0fff] usable ==> reserved [0.001591] e820: remove [mem 0x000a-0x000f] usable [0.001599] last_pfn = 0x19e600 max_arch_pfn = 0x4 [0.001605] MTRR default type: uncachable [0.001606] MTRR fixed ranges enabled: [0.001608] 0-9 write-back [0.001610] A-B uncachable [0.001611] C-F write-protect [0.001612] MTRR variable ranges enabled: [0.001614] 0 base 0FFC0 mask FFFC0 write-protect [0.001616] 1 base 0 mask F8000 write-back [0.001618] 2 base 08000 mask FC000 write-back [0.001619] 3 base 0C000 mask FE000 write-back [0.001621] 4 base 0DC00 mask FFC00 uncachable [0.001622] 5 base 0DB00 mask FFF00 uncachable [0.001624] 6 base 1 mask F8000 write-back [0.001625] 7 base 18000 mask FE000 write-back [0.001626] 8 base 19F00 mask FFF00 uncachable [0.001628] 9 base 19E80 mask FFF80 uncachable [0.002410] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [0.003810] last_pfn = 0xdb000 max_arch_pfn = 0x4 [0.003831] reserving inaccessible SNB gfx pages [0.003837] BRK [0x06001000, 0x06001fff] PGTABLE [0.003840] BRK [0x06002000, 0x06002fff] PGTABLE [0.003842] BRK [0x06003000, 0x06003fff] PGTABLE [0.003904] BRK [0x06004000, 0x06004fff] PGTABLE [0.004060] BRK [0x06005000, 0x06005fff] PGTABLE [0.004332] BRK [0x06006000, 0x06006fff] PGTABLE [0.004839] ACPI: Early table checksum verification disabled [0.004846] ACPI: RSDP 0x000F00E0 24 (v02 LENOVO) [0.004851] ACPI: XSDT 0xDAFFE120 AC (v01 LENOVO TP-8D 1440 PTEC 0002) [0.004858] ACPI: FACP 0xDAFE7000 F4 (v04 LENOVO TP-8D 1440 PTL 0002) [0.004865] ACPI: DSDT 0xDAFEA000 00FA89 (v01 LENOVO TP-8D 1440 INTL 20061109) [0.004870] ACPI: FACS 0xDAF2D000 40 [0.004874] ACPI: FACS 0xDAF2D000
[Intel-gfx] 5.6-rc6: Xorg hangs
Hi! Hardware is thinkpad x220. I had this crash few days ago. And today I have similar-looking one, with slightly newer kernel. (Will post as a follow-up). Any idea what can be wrong? Pavel [171953.828956] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [171953.965936] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [172269.832635] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [172269.964645] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [172585.837116] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [172585.973091] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [172901.836180] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [172901.909705] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [173216.838138] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [173216.998141] iwlwifi :03:00.0: Radio type=0x0-0x0-0x3 [173394.002295] INFO: task Xorg:3074 blocked for more than 120 seconds. [173394.002326] Not tainted 5.6.0-rc6+ #83 [173394.002348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [173394.002370] XorgD0 3074 3067 0x00404000 [173394.002397] Call Trace: [173394.002430] __schedule+0x350/0x6b0 [173394.002445] schedule+0x3b/0xf0 [173394.002457] schedule_preempt_disabled+0x13/0x20 [173394.002468] __mutex_lock+0x3e0/0x8a0 [173394.002480] ? i915_vma_pin+0xb4/0x750 [173394.002492] mutex_lock_nested+0x16/0x20 [173394.002503] ? mutex_lock_nested+0x16/0x20 [173394.002511] i915_vma_pin+0xb4/0x750 [173394.002526] eb_lookup_vmas+0x1c2/0xd10 [173394.002539] i915_gem_do_execbuffer+0x6a7/0x1ef0 [173394.002556] ? __lock_acquire.isra.33+0x297/0x550 [173394.002566] ? find_held_lock+0x35/0xa0 [173394.002579] ? kvmalloc_node+0x67/0x70 [173394.002593] ? i915_gem_execbuffer_ioctl+0x270/0x270 [173394.002604] i915_gem_execbuffer2_ioctl+0x1bc/0x390 [173394.002616] ? i915_gem_execbuffer_ioctl+0x270/0x270 [173394.002628] drm_ioctl_kernel+0xab/0xf0 [173394.002639] drm_ioctl+0x205/0x3e0 [173394.002650] ? i915_gem_execbuffer_ioctl+0x270/0x270 [173394.002665] ? 
__fget_files+0x9d/0xd0 [173394.002677] ksys_ioctl+0x73/0xb0 [173394.002688] __x64_sys_ioctl+0x15/0x20 [173394.002698] do_syscall_64+0x48/0x110 [173394.002709] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [173394.002720] RIP: 0033:0x7f207a42b427 [173394.002737] Code: Bad RIP value. [173394.002749] RSP: 002b:7ffc7ed2af88 EFLAGS: 0246 ORIG_RAX: 0010 [173394.002764] RAX: ffda RBX: 56121c1436d0 RCX: 7f207a42b427 [173394.002776] RDX: 7ffc7ed2afd0 RSI: 40406469 RDI: 000e [173394.002788] RBP: 7ffc7ed2afd0 R08: 56121c1810a0 R09: 7ffc7edd1090 [173394.002801] R10: 7ffc7ed2b070 R11: 0246 R12: 40406469 [173394.002813] R13: 000e R14: R15: [173394.002833] INFO: task InputThread:3377 blocked for more than 120 seconds. [173394.002845] Not tainted 5.6.0-rc6+ #83 [173394.002857] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [173394.002869] InputThread D0 3377 3067 0x0040 [173394.002886] Call Trace: [173394.002903] __schedule+0x350/0x6b0 [173394.002920] schedule+0x3b/0xf0 [173394.002935] schedule_preempt_disabled+0x13/0x20 [173394.002951] __mutex_lock+0x3e0/0x8a0 [173394.002971] ? i915_gem_object_bump_inactive_ggtt+0x3f/0x210 [173394.002984] mutex_lock_nested+0x16/0x20 [173394.002995] ? mutex_lock_nested+0x16/0x20 [173394.003006] i915_gem_object_bump_inactive_ggtt+0x3f/0x210 [173394.003018] i915_gem_object_unpin_from_display_plane+0x23/0x60 [173394.003033] intel_unpin_fb_vma+0x40/0xb0 [173394.003045] intel_legacy_cursor_update+0x2ae/0x320 [173394.003058] __setplane_atomic+0xce/0x110 [173394.003069] drm_mode_cursor_universal+0x13d/0x260 [173394.003082] drm_mode_cursor_common+0xd5/0x240 [173394.003093] ? drm_mode_setplane+0x1b0/0x1b0 [173394.003103] drm_mode_cursor_ioctl+0x45/0x60 [173394.003113] drm_ioctl_kernel+0xab/0xf0 [173394.003124] drm_ioctl+0x205/0x3e0 [173394.003133] ? drm_mode_setplane+0x1b0/0x1b0 [173394.003146] ? 
__fget_files+0x9d/0xd0 [173394.003158] ksys_ioctl+0x73/0xb0 [173394.003169] __x64_sys_ioctl+0x15/0x20 [173394.003178] do_syscall_64+0x48/0x110 [173394.003187] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [173394.003196] RIP: 0033:0x7f207a42b427 [173394.003208] Code: Bad RIP value. [173394.003216] RSP: 002b:7f2076b142d8 EFLAGS: 0246 ORIG_RAX: 0010 [173394.003225] RAX: ffda RBX: 56121c190180 RCX: 7f207a42b427 [173394.003233] RDX: 7f2076b14310 RSI: c01c64a3 RDI: 000e [173394.003241] RBP: 7f2076b14310 R08: 0001 R09: 0001 [173394.003249] R10: 0780 R11: 0246 R12: c01c64a3 [173394.003256] R13: 000e R14: 043b R15: 01e7 [173394
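When comparing several such dmesg captures, it can help to pull out just the hung-task reports. The following is a small sketch, assuming the "INFO: task NAME:PID blocked for more than N seconds." wording seen in the log above; the function name is hypothetical and the parser makes no attempt to interpret the call traces.

```python
import re

# Matches the hung-task detector line as it appears in the log above.
_HUNG_TASK = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(dmesg_text: str) -> list:
    """Return (task name, pid, timeout in seconds) for each hung-task report."""
    return [
        (name, int(pid), int(seconds))
        for name, pid, seconds in _HUNG_TASK.findall(dmesg_text)
    ]


if __name__ == "__main__":
    sample = (
        "[173394.002295] INFO: task Xorg:3074 blocked for more than 120 seconds. "
        "[173394.002833] INFO: task InputThread:3377 blocked for more than 120 seconds."
    )
    for name, pid, seconds in find_hung_tasks(sample):
        print(name, pid, seconds)
```

Because the pattern does not rely on line breaks, it also works on logs that were flattened into a single line by an archive.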
[Intel-gfx] 5.7-rc0: hangs while attempting to run X
Hi!

> > Hardware is thinkpad x220. I had this crash few days ago. And today I
> > have similar-looking one, with slightly newer kernel. (Will post
> > as a follow-up).

As part of quest for working system, I tried 5.7-rc0, based on

Merge: 50a5de895dbe b4d8ddf8356d
Author: Linus Torvalds
Date: Wed Apr 1 18:18:18 2020 -0700

It hangs in userspace, at a time when X should be starting, and I'm
looking at blinking cursor.

5.6-rcs worked, I'll test 5.6-final.

Best regards,
Pavel
Re: [Intel-gfx] 5.7-rc0: hangs while attempting to run X
Hi!

> > > Hardware is thinkpad x220. I had this crash few days ago. And today I
> > > have similar-looking one, with slightly newer kernel. (Will post
> > > as a follow-up).
>
> As part of quest for working system, I tried 5.7-rc0, based on
>
> Merge: 50a5de895dbe b4d8ddf8356d
> Author: Linus Torvalds
> Date: Wed Apr 1 18:18:18 2020 -0700
>
> It hangs in userspace, at a time when X should be starting, and I'm
> looking at blinking cursor.
>
> 5.6-rcs worked, I'll test 5.6-final.

5.6-final works.

Hmm...

commit f365ab31efacb70bed1e821f7435626e0b2528a6
Merge: 4646de87d325 59e7a8cc2dcf
Author: Linus Torvalds
Date: Wed Apr 1 15:24:20 2020 -0700

    Merge tag 'drm-next-2020-04-01' of git://anongit.freedesktop.org/drm/drm

Let me test 4646de87d32526ee87b46c2e0130413367fb5362...that one works.

Ok, so obviously... I should
test... f365ab31efacb70bed1e821f7435626e0b2528a6

Now, this is anti-social:

Busywait for request completion limit (ns) (DRM_I915_MAX_REQUEST_BUSYWAIT) [8000] (NEW)

How should I know what to answer here (or the others)

Interval between heartbeat pulses (ms) (DRM_I915_HEARTBEAT_INTERVAL) [2500] 2500
Preempt timeout (ms, jiffy granularity) (DRM_I915_PREEMPT_TIMEOUT) [640] 640

I just took the defaults.. but...

Best regards,
Pavel
[Intel-gfx] 5.7-rc0: regression caused by drm tree, hangs while attempting to run X
Hi!

> > > > Hardware is thinkpad x220. I had this crash few days ago. And today I
> > > > have similar-looking one, with slightly newer kernel. (Will post
> > > > as a follow-up).
> >
> > As part of quest for working system, I tried 5.7-rc0, based on
> >
> > Merge: 50a5de895dbe b4d8ddf8356d
> > Author: Linus Torvalds
> > Date: Wed Apr 1 18:18:18 2020 -0700
> >
> > It hangs in userspace, at a time when X should be starting, and I'm
> > looking at blinking cursor.
> >
> > 5.6-rcs worked, I'll test 5.6-final.
>
> 5.6-final works.
>
> Hmm...
>
> commit f365ab31efacb70bed1e821f7435626e0b2528a6
> Merge: 4646de87d325 59e7a8cc2dcf
> Author: Linus Torvalds
> Date: Wed Apr 1 15:24:20 2020 -0700
>
>     Merge tag 'drm-next-2020-04-01' of git://anongit.freedesktop.org/drm/drm
>
> Let me test 4646de87d32526ee87b46c2e0130413367fb5362...that one works.
>
> Ok, so obviously... I should
> test... f365ab31efacb70bed1e821f7435626e0b2528a6

f365ab31efacb70bed1e821f7435626e0b2528a6 is broken, and it is the
first broken merge. next-0403 is also broken.

Any ideas, besides the b-word?

Would c0ca be good commit for testing?

commit 700d6ab987f3b5e28b13b5993e5a9a975c5604e2
Merge: c0ca5437c509 2bdd4c28baff
Author: Dave Airlie
Date: Mon Mar 30 15:56:03 2020 +1000

    Merge tag 'drm-intel-next-fixes-2020-03-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

Best regards,
Pavel
Re: [Intel-gfx] 5.7-rc0: regression caused by drm tree, hangs while attempting to run X
Hi!

> > commit f365ab31efacb70bed1e821f7435626e0b2528a6
> > Merge: 4646de87d325 59e7a8cc2dcf
> > Author: Linus Torvalds
> > Date: Wed Apr 1 15:24:20 2020 -0700
> >
> > Merge tag 'drm-next-2020-04-01' of
> > git://anongit.freedesktop.org/drm/drm
>
> Any ideas, besides the b-word?
>
> Would c0ca be good commit for testing?
>
> commit 700d6ab987f3b5e28b13b5993e5a9a975c5604e2
> Merge: c0ca5437c509 2bdd4c28baff

c0ca is broken.

commit 9001b17698d86f842e2b13e0cafe8021d43209e9
Merge: bda1fb0ed000 217a485c8399

    Merge tag 'drm-intel-next-2020-03-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

    UAPI Changes:

So bda1fb0ed000 looks like test candidate... and that one works.

I guess 217a485c8399 is reasonable next step... and that
11a48a5a18c63fd7621bb050228cebf13566e4d8 should work ok.

Best regards,
Pavel
[Intel-gfx] [bisected] Re: 5.7-rc0: regression caused by drm tree, hangs while attempting to run X
Hi!

> > > commit f365ab31efacb70bed1e821f7435626e0b2528a6
> > > Merge: 4646de87d325 59e7a8cc2dcf
> > > Author: Linus Torvalds
> > > Date: Wed Apr 1 15:24:20 2020 -0700
> > >
> > > Merge tag 'drm-next-2020-04-01' of
> > > git://anongit.freedesktop.org/drm/drm
> >
> > Any ideas, besides the b-word?
> >
> > Would c0ca be good commit for testing?
> >
> > commit 700d6ab987f3b5e28b13b5993e5a9a975c5604e2
> > Merge: c0ca5437c509 2bdd4c28baff
>
> c0ca is broken.
>
> commit 9001b17698d86f842e2b13e0cafe8021d43209e9
> Merge: bda1fb0ed000 217a485c8399
>
> Merge tag 'drm-intel-next-2020-03-13' of
> git://anongit.freedesktop.org/drm/drm-intel into drm-next
>
> UAPI Changes:
>
> So bda1fb0ed000 looks like test candidate... and that one works.
>
> I guess 217a485c8399 is reasonable next step... and that
> 11a48a5a18c63fd7621bb050228cebf13566e4d8 should work ok.

# bad: [217a485c8399634abacd2f138b3524d2e78e8aad] drm/i915: Update DRIVER_DATE to 20200313
# good: [11a48a5a18c63fd7621bb050228cebf13566e4d8] Linux 5.6-rc2
git bisect start '217a485c8399' '11a48a5a18c63fd7621bb050228cebf13566e4d8'
# good: [837b63e6087838d0f1e612d448405419199d8033] drm/i915: Program MBUS with rmw during initialization
git bisect good 837b63e6087838d0f1e612d448405419199d8033
# good: [3a1b82a19ff91cfef9b5d9d9faabb0ebcac15df0] drm/i915/tgl: Allow DC5/DC6 entry while PG2 is active
git bisect good 3a1b82a19ff91cfef9b5d9d9faabb0ebcac15df0
# good: [ba518bbd3f3c265419fa8c3702940cb7c642c6a5] drm/i915: Force DPCD backlight mode for some Dell CML 2020 panels
git bisect good ba518bbd3f3c265419fa8c3702940cb7c642c6a5
# good: [73ce0969d1d0bc2cb53370017923640db72e70ec] drm/i915: Clean up integer types in color code
git bisect good 73ce0969d1d0bc2cb53370017923640db72e70ec
# bad: [aa64f8e1cf235f2e36615dba57c2c50d06181f84] drm/i915: Add Wa_1209644611:icl,ehl
git bisect bad aa64f8e1cf235f2e36615dba57c2c50d06181f84
# good: [32fc2849a3d59dc10efda38ef88e8f9052f711be] drm/i915/dsb: convert to drm_device based logging macros.
git bisect good 32fc2849a3d59dc10efda38ef88e8f9052f711be
# good: [4aea5a9e6521c1ad484992d490f1cefa7d73d1ec] drm/i915/gem: Mark up the racy read of the mmap_singleton
git bisect good 4aea5a9e6521c1ad484992d490f1cefa7d73d1ec
# bad: [7dc8f1143778a35b190f9413f228b3cf28f67f8d] drm/i915/gem: Drop relocation slowpath
git bisect bad 7dc8f1143778a35b190f9413f228b3cf28f67f8d
# good: [c02aac25f150d1b7215b9481f8cdd30cc607bedf] drm/i915/gem: Mark up sw-fence notify function
git bisect good c02aac25f150d1b7215b9481f8cdd30cc607bedf
# good: [07bcfd1291de77ffa9b627b4442783aba1335229] drm/i915/gen12: Disable preemption timeout
git bisect good 07bcfd1291de77ffa9b627b4442783aba1335229
# first bad commit: [7dc8f1143778a35b190f9413f228b3cf28f67f8d] drm/i915/gem: Drop relocation slowpath

Any ideas?

Pavel
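A bisect log like the one in this message can be summarized mechanically before it is pasted into a bug report. The sketch below assumes the "# good:", "# bad:" and "# first bad commit:" comment lines that git bisect log prints, with one entry per line; the function name and the returned dictionary layout are hypothetical.

```python
import re

# One comment line of a bisect log: verdict, commit hash in brackets, subject.
_ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\] (.*)$")


def summarize_bisect_log(text: str) -> dict:
    """Collect good/bad verdicts and the first bad commit from a bisect log."""
    summary = {"good": [], "bad": [], "first_bad": None}
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match is None:
            continue  # skips the replayable "git bisect ..." command lines
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            summary["first_bad"] = (sha, subject)
        else:
            summary[verdict].append((sha, subject))
    return summary


if __name__ == "__main__":
    log = """\
# good: [c02aac25f150d1b7215b9481f8cdd30cc607bedf] drm/i915/gem: Mark up sw-fence notify function
git bisect good c02aac25f150d1b7215b9481f8cdd30cc607bedf
# bad: [7dc8f1143778a35b190f9413f228b3cf28f67f8d] drm/i915/gem: Drop relocation slowpath
git bisect bad 7dc8f1143778a35b190f9413f228b3cf28f67f8d
# first bad commit: [7dc8f1143778a35b190f9413f228b3cf28f67f8d] drm/i915/gem: Drop relocation slowpath
"""
    result = summarize_bisect_log(log)
    print(result["first_bad"])
    print(len(result["good"]), "good,", len(result["bad"]), "bad")
```

The "git bisect ..." command lines are deliberately ignored here, since the comment lines already carry the same verdicts together with the commit subjects.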
Re: [Intel-gfx] [bisected] Re: 7dc8f11437: regression in 5.7-rc0, hangs while attempting to run X
Hi!

7dc8f1143778a35b190f9413f228b3cf28f67f8d

    drm/i915/gem: Drop relocation slowpath

    Since the relocations are no longer performed under a global
    struct_mutex, or any other lock, that is also held by pagefault handlers,
    we can relax and allow our fast path to take a fault. As we no longer
    need to abort the fast path for lock avoidance, we no longer need the
    slow path handling at all.

causes regression on thinkpad x220: instead of starting X, I'm looking
at blinking cursor.

Reverting the patch on top of 919dce24701f7b3 fixes things for me.

Best regards,
Pavel
[Intel-gfx] Linus, please revert 7dc8f11437: regression in 5.7-rc0, hangs while attempting to run X
On Fri 2020-04-03 15:00:31, Pavel Machek wrote:
> Hi!
>
> 7dc8f1143778a35b190f9413f228b3cf28f67f8d
>
> drm/i915/gem: Drop relocation slowpath
>
> Since the relocations are no longer performed under a global
> struct_mutex, or any other lock, that is also held by pagefault handlers,
> we can relax and allow our fast path to take a fault. As we no longer
> need to abort the fast path for lock avoidance, we no longer need the
> slow path handling at all.
>
> causes regression on thinkpad x220: instead of starting X, I'm looking
> at blinking cursor.
>
> Reverting the patch on top of 919dce24701f7b3 fixes things for me.

I have received no feedback from patch authors, and I believe we don't
want to break boot in -rc1 on Intel hardware... so the commit should
be simply reverted.

Best regards,
Pavel
Re: [Intel-gfx] Linus, please revert 7dc8f11437: regression in 5.7-rc0, hangs while attempting to run X
Hi!

> > > 7dc8f1143778a35b190f9413f228b3cf28f67f8d
> > >
> > > drm/i915/gem: Drop relocation slowpath
> > >
> > > Since the relocations are no longer performed under a global
> > > struct_mutex, or any other lock, that is also held by pagefault
> > > handlers,
> > > we can relax and allow our fast path to take a fault. As we no longer
> > > need to abort the fast path for lock avoidance, we no longer need the
> > > slow path handling at all.
> > >
> > > causes regression on thinkpad x220: instead of starting X, I'm looking
> > > at blinking cursor.
> > >
> > > Reverting the patch on top of 919dce24701f7b3 fixes things for me.
> >
> > I have received no feedback from patch authors, and I believe we don't
> > want to break boot in -rc1 on Intel hardware... so the commit should
> > be simply reverted.
>
> Beyond the fix already submitted?

I did not get that one, can I have a pointer?

Pavel
Re: [Intel-gfx] Linus, please revert 7dc8f11437: regression in 5.7-rc0, hangs while attempting to run X
Hi!

> > > Beyond the fix already submitted?
> >
> > I did not get that one, can I have a pointer?
>
> What's the status of this one?

I tried updating my kernel on April 3, that one did not work, but it
did not include 721017cf4bd8.

> I'm assuming the fix is commit 721017cf4bd8 ("drm/i915/gem: Ignore
> readonly failures when updating relics"), but didn't see a reply to
> the query or a confirmation of things working..

I pulled latest tree from Linus, and this one has 721017cf4bd8. Let me
try to revert my revert, and test... yes, this one seems okay.

Something changed in the X, now it seems that only one monitor is used
for login, not both... but it now works.

Best regards,
Pavel

PS: Hmm. This is not helpful. I guess this is "N".

*
* VDPA drivers
*
VDPA drivers (VDPA_MENU) [N/y/?] (NEW) ?

There is no help available for this option.
Symbol: VDPA_MENU [=n]
Type : bool
Defined at drivers/vdpa/Kconfig:9
Prompt: VDPA drivers
Location:
-> Device Drivers

*
* VHOST drivers
*
VHOST drivers (VHOST_MENU) [Y/n/?] (NEW) ?

There is no help available for this option.
Symbol: VHOST_MENU [=y]
Type : bool
Defined at drivers/vhost/Kconfig:21
Prompt: VHOST drivers
Location:
-> Device Drivers

> Btw, Chris, that __put_user() not testing the error should at least
> have a comment. We don't have a working "__must_check" for those
> things (because they are subtle macros, not functions), but if we did,
> we'd get a compiler warning for not checking the error value.

Best regards,
Pavel
[Intel-gfx] 4.19-stable: Re: [PATCH 2/3] drm/i915: Break up error capture compression loops with cond_resched()
Hi!

> As the error capture will compress user buffers as directed to by the
> user, it can take an arbitrary amount of time and space. Break up the
> compression loops with a call to cond_resched(), that will allow other
> processes to schedule (avoiding the soft lockups) and also serve as a
> warning should we try to make this loop atomic in the future.
>
> Signed-off-by: Chris Wilson
> Cc: Mika Kuoppala
> Cc: sta...@vger.kernel.org
> Reviewed-by: Mika Kuoppala

This was queued for 4.19-stable, but is very likely wrong.

> @@ -397,6 +399,7 @@ static int compress_page(struct i915_vma_compress *c,
> 	if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE)))
> 		memcpy(ptr, src, PAGE_SIZE);
> 	dst->pages[dst->page_count++] = ptr;
> +	cond_resched();
>
> 	return 0;
> }

4.19 compress_page begins with

static int compress_page(struct compress *c,
...
	page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);

and likely may not sleep. That changed with commit
a42f45a2a85998453078, but that one is not present in 4.19..

I believe we don't need this in stable: dumping of error file will not
take so long to trigger softlockup detectors... and if userland access
blocked, we would be able to reschedule, anyway.

Best regards,
Pavel
Re: [Intel-gfx] [PATCH] kernel: Expose SYS_kcmp by default
Hi!

> Userspace has discovered the functionality offered by SYS_kcmp and has
> started to depend upon it. In particular, Mesa uses SYS_kcmp for
> os_same_file_description() in order to identify when two fd (e.g. device
> or dmabuf) point to the same struct file. Since they depend on it for
> core functionality, lift SYS_kcmp out of the non-default
> CONFIG_CHECKPOINT_RESTORE into the selectable syscall category.

Is it good idea to enable everything because Mesa uses it for file
descriptors?

This is really interesting syscall...

Best regards,
Pavel
[Intel-gfx] udldrmfb: causes WARN in i915 on X60 (x86-32)
Hi! This is in -next, but I get same behaviour on 5.11; and no, udl does not work, but monitor is detected: pavel@amd:~/g/tui/crashled$ xrandr Screen 0: minimum 320 x 200, current 1024 x 768, maximum 4096 x 4096 LVDS1 connected 1024x768+0+0 (normal left inverted right x axis y axis) 246mm x 185mm 1024x768 50.00*+ 60.0040.00 800x600 60.3256.25 640x480 59.94 VGA1 disconnected (normal left inverted right x axis y axis) DVI-1-0 connected 1024x768+0+0 304mm x 228mm 1024x768 60.00*+ 75.03 800x600 75.0060.32 640x480 75.0059.94 720x400 70.08 1024x768 (0x45) 65.000MHz -HSync -VSync h: width 1024 start 1048 end 1184 total 1344 skew0 clock 48.36KHz v: height 768 start 771 end 777 total 806 clock 60.00Hz 800x600 (0x47) 40.000MHz +HSync +VSync h: width 800 start 840 end 968 total 1056 skew0 clock 37.88KHz v: height 600 start 601 end 605 total 628 clock 60.32Hz 640x480 (0x49) 25.175MHz -HSync -VSync h: width 640 start 656 end 752 total 800 skew0 clock 31.47KHz v: height 480 start 490 end 492 total 525 clock 59.94Hz pavel@amd:~/g/tui/crashled$ [13957.499755] wlan0: associated [13962.906368] udl 1-5:1.0: [drm] fb1: udldrmfb frame buffer device [13972.585101] [ cut here ] [13972.585117] WARNING: CPU: 0 PID: 3159 at kernel/dma/mapping.c:192 dma_map_sg_attrs+0x38/0x50 [13972.585137] Modules linked in: [13972.585149] CPU: 0 PID: 3159 Comm: Xorg Not tainted 5.11.0-next-20210223+ #176 [13972.585158] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011 [13972.585166] EIP: dma_map_sg_attrs+0x38/0x50 [13972.585176] Code: f0 01 00 00 00 74 23 ff 75 0c 53 e8 72 1b 00 00 5a 59 85 c0 78 1c 8b 5d fc c9 c3 8d b4 26 00 00 00 00 0f 0b 8d b6 00 00 00 00 <0f> 0b 31 c0 eb e6 66 90 0f 0b 8d b4 26 00 00 00 00 8d b4 26 00 00 [13972.585186] EAX: c296c41c EBX: ECX: 0055 EDX: dbbc4800 [13972.585194] ESI: c69f9ea0 EDI: d2c313c0 EBP: c5cbdda8 ESP: c5cbdda4 [13972.585202] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 [13972.585211] CR0: 80050033 CR2: b6b99000 CR3: 05d42000 
CR4: 06b0 [13972.585219] Call Trace: [13972.585227] i915_gem_map_dma_buf+0xee/0x160 [13972.585240] dma_buf_map_attachment+0xb8/0x140 [13972.585251] drm_gem_prime_import_dev.part.0+0x33/0xc0 [13972.585262] ? drm_gem_shmem_create+0x10/0x10 [13972.585271] drm_gem_prime_import_dev+0x22/0x70 [13972.585280] drm_gem_prime_fd_to_handle+0x186/0x1c0 [13972.585289] ? drm_gem_prime_import_dev+0x70/0x70 [13972.585298] ? drm_prime_destroy_file_private+0x20/0x20 [13972.585307] drm_prime_fd_to_handle_ioctl+0x1c/0x30 [13972.585315] drm_ioctl_kernel+0x8e/0xe0 [13972.585325] ? drm_prime_destroy_file_private+0x20/0x20 [13972.585334] drm_ioctl+0x1fd/0x380 [13972.585343] ? drm_prime_destroy_file_private+0x20/0x20 [13972.585352] ? ksys_write+0x5c/0xd0 [13972.585363] ? vfs_write+0xeb/0x3f0 [13972.585371] ? drm_ioctl_kernel+0xe0/0xe0 [13972.585380] __ia32_sys_ioctl+0x369/0x7d0 [13972.585389] ? exit_to_user_mode_prepare+0x4e/0x170 [13972.585398] do_int80_syscall_32+0x2c/0x40 [13972.585409] entry_INT80_32+0x111/0x111 [13972.585419] EIP: 0xb7f68092 [13972.585427] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00 [13972.585436] EAX: ffda EBX: 0030 ECX: c00c642e EDX: bfaeda30 [13972.585444] ESI: 00915790 EDI: c00c642e EBP: 0030 ESP: bfaed9e4 [13972.585452] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200296 [13972.585461] ? asm_exc_nmi+0xcc/0x2bc [13972.585470] ---[ end trace 46a21fad0595bc89 ]--- pavel@amd:~/g/tui/crashled$ Any ideas? Best regards, Pavel -- http://www.livejournal.com/~pavelmachek signature.asc Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] udldrm does not recover from powersave? Re: udldrmfb: causes WARN in i915 on X60 (x86-32)
Hi!

> >This is in -next, but I get same behaviour on 5.11; and no, udl does
>
> Thanks for reporting. We are in the process of fixing the issue. The latest
> patch is at [1].
>

Thank you, that fixes the DMA issue, and I can use the udl.

...for a while. Then screensaver blanks laptop screen, udl screen
blanks too. Upon hitting a key, internal screen shows up, udl does
not.

I try rerunning xrandr ... --auto, but could not recover it.

Any ideas?

Best regards,
Pavel
Re: [Intel-gfx] udldrm does not recover from powersave? Re: udldrmfb: causes WARN in i915 on X60 (x86-32)
Hi!

> > Thank you, that fixes the DMA issue, and I can use the udl.
> >
> > ...for a while. Then screensaver blanks laptop screen, udl screen
> > blanks too. Upon hitting a key, internal screen shows up, udl does
> > not.
> >
> > I try rerunning xrandr ... --auto, but could not recover it.
> >
> > Any ideas?
>
> Did it work before the regression?

I don't know. I'm trying to get it to work, I basically did not use it
before.

> For testing, could you please remove the fix and then do
>
> git revert 6eb0233ec2d0
>
> This would restore the old version. Please report back on the results.

I doubt this is related, but I can try.

Best regards,
Pavel
Re: [Intel-gfx] udldrm does not recover from powersave? Re: udldrmfb: causes WARN in i915 on X60 (x86-32)
Hi!

> > > > This is in -next, but I get same behaviour on 5.11; and no, udl does
> > >
> > > Thanks for reporting. We are in the process of fixing the issue. The
> > > latest
> > > patch is at [1].
> > >
> >
> > Thank you, that fixes the DMA issue, and I can use the udl.
> >
> > ...for a while. Then screensaver blanks laptop screen, udl screen
> > blanks too. Upon hitting a key, internal screen shows up, udl does
> > not.
> >
> > I try rerunning xrandr ... --auto, but could not recover it.
> >
> > Any ideas?
>
> Did it work before the regression?
>
> For testing, could you please remove the fix and then do
>
> git revert 6eb0233ec2d0
>
> This would restore the old version. Please report back on the
> results.

Ok, I went to 7f206cf3ec2b with 6eb0233ec2d0 reverted. That fails to
build:

drivers/usb/core/message.c: In function ‘usb_set_configuration’:
drivers/usb/core/message.c:2100:12: error: ‘struct device’ has no member named ‘dma_pfn_offset’
 2100 | intf->dev.dma_pfn_offset = dev->dev.dma_pfn_offset;
      |^
drivers/usb/core/message.c:2100:38: error: ‘struct device’ has no member named ‘dma_pfn_offset’
 2100 | intf->dev.dma_pfn_offset = dev->dev.dma_pfn_offset;
      | ^
  CC drivers/net/ethernet/intel/e1000e/param.o
make[3]: *** [scripts/Makefile.build:271: drivers/usb/core/message.o] Error 1

So I tried to go to bad commit's parent:

git checkout 6eb0233ec2d0^
git log

commit cf141ae989e2ff119cd320326da5923b480d1641

    ARM/keystone: move the DMA offset handling under ifdef CONFIG_ARM_LPAE

But that resulted in lockup soon after "--setprovidersource" command
was issued.

Best regards,
Pavel
[Intel-gfx] next-20200618: oops in eb_relocate_vma in Xorg process, making machine unusable
Hi! On thinkpad X60 (x86-32): I got this: Had to reboot Best regards, Pavel Jun 18 23:16:28 amd kernel: BUG: unable to handle page fault for address: f860 Jun 18 23:16:28 amd kernel: #PF: supervisor write access in kernel mode Jun 18 23:16:28 amd kernel: #PF: error_code(0x0002) - not-present page Jun 18 23:16:28 amd kernel: *pdpt = 319d7001 *pde = Jun 18 23:16:28 amd kernel: Oops: 0002 [#1] PREEMPT SMP PTI Jun 18 23:16:28 amd kernel: CPU: 0 PID: 2951 Comm: Xorg Not tainted 5.8.0-rc1-next-20200618+ #125 Jun 18 23:16:28 amd kernel: Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011 Jun 18 23:16:28 amd kernel: EIP: eb_relocate_vma+0xdee/0xf50 Jun 18 23:16:28 amd kernel: Code: 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 33 f7 ff ff 8d b6 00 00 00 00 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 2c f6 ff Jun 18 23:16:28 amd kernel: EAX: 003095c8 EBX: f860 ECX: 012c8000 EDX: Jun 18 23:16:28 amd kernel: ESI: f1ad7cbc EDI: f1ad7b04 EBP: f1ad7c54 ESP: f1ad79ec Jun 18 23:16:28 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 Jun 18 23:16:28 amd kernel: CR0: 80050033 CR2: f860 CR3: 31ada000 CR4: 06b0 Jun 18 23:16:28 amd kernel: Call Trace: Jun 18 23:16:28 amd kernel: ? __lock_acquire.isra.0+0x223/0x500 Jun 18 23:16:28 amd kernel: i915_gem_do_execbuffer+0x9a1/0x2a70 Jun 18 23:16:28 amd kernel: ? intel_runtime_pm_put_unchecked+0xd/0x10 Jun 18 23:16:28 amd kernel: ? i915_gem_gtt_pwrite_fast+0xf6/0x520 Jun 18 23:16:28 amd kernel: ? __lock_acquire.isra.0+0x223/0x500 Jun 18 23:16:28 amd kernel: ? cache_alloc_debugcheck_after+0x151/0x180 Jun 18 23:16:28 amd kernel: ? kvmalloc_node+0x69/0x80 Jun 18 23:16:28 amd kernel: ? __kmalloc+0x92/0x120 Jun 18 23:16:28 amd kernel: ? kvmalloc_node+0x69/0x80 Jun 18 23:16:28 amd kernel: i915_gem_execbuffer2_ioctl+0x1b9/0x3a0 Jun 18 23:16:28 amd kernel: ? drm_dev_exit+0xb/0x40 Jun 18 23:16:28 amd kernel: ? 
i915_gem_execbuffer_ioctl+0x2a0/0x2a0 Jun 18 23:16:28 amd kernel: drm_ioctl_kernel+0x91/0xe0 Jun 18 23:16:28 amd kernel: ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0 Jun 18 23:16:28 amd kernel: drm_ioctl+0x1fd/0x371 Jun 18 23:16:28 amd kernel: ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0 Jun 18 23:16:28 amd kernel: ? posix_get_monotonic_timespec+0x1d/0x80 Jun 18 23:16:28 amd kernel: ? drm_ioctl_kernel+0xe0/0xe0 Jun 18 23:16:28 amd kernel: ksys_ioctl+0x143/0x7d0 Jun 18 23:16:28 amd kernel: ? ktime_get_ts64+0x77/0x1d0 Jun 18 23:16:28 amd kernel: ? _copy_to_user+0x21/0x30 Jun 18 23:16:28 amd kernel: ? __prepare_exit_to_usermode+0xe5/0x110 Jun 18 23:16:28 amd kernel: __ia32_sys_ioctl+0x10/0x12 Jun 18 23:16:28 amd kernel: do_syscall_32_irqs_on+0x3a/0xf0 Jun 18 23:16:28 amd kernel: do_int80_syscall_32+0x9/0x20 Jun 18 23:16:28 amd kernel: entry_INT80_32+0x116/0x116 Jun 18 23:16:28 amd kernel: EIP: 0xb7f1c092 Jun 18 23:16:28 amd kernel: Code: Bad RIP value. Jun 18 23:16:28 amd kernel: EAX: ffda EBX: 000a ECX: c0406469 EDX: bf97792c Jun 18 23:16:28 amd kernel: ESI: b730a000 EDI: c0406469 EBP: 000a ESP: bf9778a4 Jun 18 23:16:28 amd kernel: DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200292 Jun 18 23:16:28 amd kernel: ? 
asm_exc_nmi+0xcc/0x2bc Jun 18 23:16:28 amd kernel: Modules linked in: Jun 18 23:16:28 amd kernel: CR2: f860 Jun 18 23:16:28 amd kernel: ---[ end trace 216ff69b99738a0d ]--- Jun 18 23:16:28 amd kernel: EIP: eb_relocate_vma+0xdee/0xf50 Jun 18 23:16:28 amd kernel: Code: 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 33 f7 ff ff 8d b6 00 00 00 00 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 2c f6 ff Jun 18 23:16:28 amd kernel: EAX: 003095c8 EBX: f860 ECX: 012c8000 EDX: Jun 18 23:16:28 amd kernel: ESI: f1ad7cbc EDI: f1ad7b04 EBP: f1ad7c54 ESP: f1ad79ec Jun 18 23:16:28 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 Jun 18 23:16:28 amd kernel: CR0: 80050033 CR2: f860 CR3: 31ada000 CR4: 06b0 Jun 18 23:16:28 amd kernel: BUG: unable to handle page fault for address: f8602038 Jun 18 23:16:28 amd kernel: #PF: supervisor write access in kernel mode Jun 18 23:16:28 amd kernel: #PF: error_code(0x0002) - not-present page Jun 18 23:16:28 amd kernel: *pdpt = 2e39f001 *pde = Jun 18 23:16:28 amd kernel: Oops: 0002 [#2] PREEMPT SMP PTI Jun 18 23:16:28 amd kernel: CPU: 0 PID: 2951 Comm: Xorg Tainted: G D 5.8.0-rc1-next-20200618+ #125 Jun 18 23:16:28 amd kernel: Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011 Jun 18 23:16:28 amd kernel: EIP: n_tty_open+0x26/0x80 Jun 18 23:16:28 amd kernel: Code: 00 00 00 90 55 89 e5 56 53 89 c3 b8 f0 22 00 00 e8 0f 6a cb ff 85 c0 74 62 89 c6 a1 00 cd 25 c5 b9 c8 66 6b c5 ba a9 3b 11 c5 <89> 46 38 8d 86 58 22 00 00 e8 9c 66 c0 ff 8d 86 a4 22 00 00 b9 c0 Jun 18 23:16:28 a
[Intel-gfx] v5.8-rc1 on thinkpad x220, intel graphics: interface frozen, can still switch to text console
Hi!

Linux duo 5.8.0-rc1+ #117 SMP PREEMPT Mon Jun 15 16:13:54 CEST 2020 x86_64 GNU/Linux

[133747.719711] [ 17456] 0 17456 4166 271655360 0 sshd
[133747.719718] [ 17466] 1000 17466 4166 289655360 0 sshd
[133747.719724] [ 17468] 1000 17468 433587 303033 25886720 0 unison
[133747.719730] [ 18023] 1000 18023 1316 16409600 0 sleep
[133747.719737] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=chromium,pid=27368,uid=1000
[133747.719795] Out of memory: Killed process 27368 (chromium) total-vm:6686908kB, anon-rss:647056kB, file-rss:0kB, shmem-rss:7452kB, UID:1000 pgtables:5304kB oom_score_adj:300
[133747.799893] oom_reaper: reaped process 27368 (chromium), now anon-rss:0kB, file-rss:0kB, shmem-rss:6836kB
[136841.820558] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[136841.924333] i915 :00:02.0: [drm] Xorg[3016] context reset due to GPU hang

Kernel is v5.8-rc1.

Any ideas?

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] v5.8-rc1 on thinkpad x220, intel graphics: interface frozen, can still switch to text console
On Mon 2020-06-22 10:13:13, Chris Wilson wrote:
> Quoting Pavel Machek (2020-06-22 09:52:59)
> > Hi!
> >
> > Linux duo 5.8.0-rc1+ #117 SMP PREEMPT Mon Jun 15 16:13:54 CEST 2020 x86_64 GNU/Linux
> >
> > [133747.719711] [ 17456] 0 17456 4166 271655360 0 sshd
> > [133747.719718] [ 17466] 1000 17466 4166 289655360 0 sshd
> > [133747.719724] [ 17468] 1000 17468 433587 303033 25886720 0 unison
> > [133747.719730] [ 18023] 1000 18023 1316 16409600 0 sleep
> > [133747.719737] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=chromium,pid=27368,uid=1000
> > [133747.719795] Out of memory: Killed process 27368 (chromium) total-vm:6686908kB, anon-rss:647056kB, file-rss:0kB, shmem-rss:7452kB, UID:1000 pgtables:5304kB oom_score_adj:300
> > [133747.799893] oom_reaper: reaped process 27368 (chromium), now anon-rss:0kB, file-rss:0kB, shmem-rss:6836kB
> > [136841.820558] i915 :00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
> > [136841.924333] i915 :00:02.0: [drm] Xorg[3016] context reset due to GPU hang
>
> If that was the first occurrence it would have pointed to the error
> state containing more information on the cause of the hang.
> Attach /sys/class/drm/card0/error

I rebooted in the meantime (I need this machine). I updated to
5.8-rc2, let me see if it appears again.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
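The error state requested above lives in sysfs and does not survive a reboot, so it has to be copied out first. A minimal sketch of doing that, assuming the path quoted in the thread; the function name `save_gpu_error` and the output file name are made up for illustration and are not part of any kernel tooling:

```shell
# Hypothetical helper: copy the i915 error state to a regular file so it
# can be attached to a bug report later.
save_gpu_error() {
    # Default source is the path quoted in the thread; both arguments are optional.
    src="${1:-/sys/class/drm/card0/error}"
    dest="${2:-./i915-error-$(date +%Y%m%d-%H%M%S).txt}"   # illustrative name

    if [ ! -r "$src" ]; then
        echo "cannot read $src (may need root)" >&2
        return 1
    fi

    # Stream the contents into the destination file.
    cat "$src" > "$dest" || return 1

    # Print where the copy went so the caller can attach it.
    echo "$dest"
}
```

Running `save_gpu_error` with no arguments (probably as root) right after the "GPU hang" message appears, and before rebooting, should leave a file that can be attached to the report.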
[Intel-gfx] thinkpad x60: oops in eb_relocate_vma in next-20200710
Hi! I attempted to suspend x60, but it did not work well... Machine is too messed up to pull more debug info from it :-(. Best regards, Pavel [11645.369495] wlan0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=2) [11645.373180] wlan0: associated [12366.990398] BUG: unable to handle page fault for address: f8e01000 [12366.990406] #PF: supervisor write access in kernel mode [12366.990409] #PF: error_code(0x0002) - not-present page [12366.990412] *pdpt = 2a497001 *pde = [12366.990418] Oops: 0002 [#1] PREEMPT SMP PTI [12366.990424] CPU: 0 PID: 3016 Comm: Xorg Not tainted 5.8.0-rc4-next-20200710+ #129 [12366.990427] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011 [12366.990436] EIP: eb_relocate_vma+0xdee/0xf50 [12366.990441] Code: 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 33 f7 ff ff 8d b6 00 00 00 00 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 2c f6 ff [12366.990445] EAX: 01246134 EBX: f8e01000 ECX: 013b9000 EDX: [12366.990448] ESI: eee57cbc EDI: eee57aa4 EBP: eee57c54 ESP: eee579ec [12366.990452] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 [12366.990456] CR0: 80050033 CR2: f8e01000 CR3: 3023 CR4: 06b0 [12366.990459] Call Trace: [12366.990469] ? shmem_getpage_gfp.isra.0+0x3ba/0x820 [12366.990477] i915_gem_do_execbuffer+0xa7b/0x2730 [12366.990479] ? intel_runtime_pm_put_unchecked+0xd/0x10 [12366.990479] ? i915_gem_gtt_pwrite_fast+0xf6/0x520 [12366.990479] ? __lock_acquire.isra.0+0x223/0x500 [12366.990479] ? cache_alloc_debugcheck_after+0x151/0x180 [12366.990479] ? kvmalloc_node+0x69/0x80 [12366.990479] ? __kmalloc+0x92/0x120 [12366.990479] ? kvmalloc_node+0x69/0x80 [12366.990479] i915_gem_execbuffer2_ioctl+0x1b9/0x3a0 [12366.990479] ? drm_dev_exit+0xb/0x40 [12366.990479] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0 [12366.990479] drm_ioctl_kernel+0x91/0xe0 [12366.990479] ? 
i915_gem_execbuffer_ioctl+0x2a0/0x2a0 [12366.990479] drm_ioctl+0x1fd/0x371 [12366.990479] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0 [12366.990479] ? posix_get_monotonic_timespec+0x1d/0x80 [12366.990479] ? drm_ioctl_kernel+0xe0/0xe0 [12366.990479] ksys_ioctl+0x143/0x7d0 [12366.990479] ? ktime_get_ts64+0x77/0x1d0 [12366.990479] ? _copy_to_user+0x21/0x30 [12366.990479] ? __prepare_exit_to_usermode+0xe5/0x110 [12366.990479] __ia32_sys_ioctl+0x10/0x12 [12366.990479] do_syscall_32_irqs_on+0x3a/0xf0 [12366.990479] do_int80_syscall_32+0x9/0x20 [12366.990479] entry_INT80_32+0x116/0x116 [12366.990479] EIP: 0xb7f94092 [12366.990479] Code: Bad RIP value. [12366.990479] EAX: ffda EBX: 000a ECX: c0406469 EDX: bf82313c [12366.990479] ESI: b7382000 EDI: c0406469 EBP: 000a ESP: bf8230b4 [12366.990479] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200296 [12366.990479] ? dev_proc_net_exit+0x10/0x40 [12366.990479] ? asm_exc_nmi+0xcc/0x2bc [12366.990479] Modules linked in: [12366.990479] CR2: f8e01000 [12366.990479] ---[ end trace d1eedfdf3b328098 ]--- [12366.990479] EIP: eb_relocate_vma+0xdee/0xf50 [12366.990479] Code: 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 33 f7 ff ff 8d b6 00 00 00 00 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 2c f6 ff [12366.990479] EAX: 01246134 EBX: f8e01000 ECX: 013b9000 EDX: [12366.990479] ESI: eee57cbc EDI: eee57aa4 EBP: eee57c54 ESP: eee579ec [12366.990479] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 [12366.990479] CR0: 80050033 CR2: f8e01000 CR3: 3023 CR4: 06b0 [12366.996393] BUG: unable to handle page fault for address: f8e03038 [12366.996399] #PF: supervisor write access in kernel mode [12366.996402] #PF: error_code(0x0002) - not-present page [12366.996405] *pdpt = 339a4001 *pde = [12366.996411] Oops: 0002 [#2] PREEMPT SMP PTI [12366.996417] CPU: 0 PID: 3016 Comm: Xorg Tainted: G D 5.8.0-rc4-next-20200710+ #129 [12366.996420] Hardware name: LENOVO 17097HU/17097HU, BIOS 
7BETD8WW (2.19 ) 03/31/2011 [12366.996429] EIP: n_tty_open+0x26/0x80 [12366.996434] Code: 00 00 00 90 55 89 e5 56 53 89 c3 b8 f0 22 00 00 e8 ef 68 cb ff 85 c0 74 62 89 c6 a1 00 2d 26 c5 b9 88 e7 6b c5 ba bd 9c 11 c5 <89> 46 38 8d 86 58 22 00 00 e8 9c 5e c0 ff 8d 86 a4 22 00 00 b9 80 [12366.996438] EAX: 002e07b0 EBX: f4a4bc00 ECX: c56be788 EDX: c5119cbd [12366.996441] ESI: f8e03000 EDI: EBP: eee57ee4 ESP: eee57edc [12366.996444] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210286 [12366.996448] CR0: 80050033 CR2: f8e03038 CR3: 33b98000 CR4: 06b0 [12366.996451] Call Trace: [12366.996457] tty_ldisc_open.isra.0+0x23/0x40 [12366.996461] tty_ldisc_reinit+0x99/0xe0 [12366.996465] tty_ldisc_hangup+0xc4/0x1e0 [12366.996470] __tty_hangup.part.0+0x13f/0x250 [12366.996476] tty_vhangup_session+0x11/0x20 [12366.996481] dis
[Intel-gfx] -next on 32-bit thinkpad x60: blinking screen, intel DRM responsible?
Hi!

Next has been unusable for a while, but today I got dmesg. The screen is
blinking, the machine is very unhappy, and ssh is slow/hangs, but I got this:

This is a recurring pattern; usually the machine dies like this within 30
minutes of boot.

[ 455.019838] perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 752.720607] perf: interrupt took too long (3153 > 3136), lowering kernel.perf_event_max_sample_rate to 63250
[ 1235.055394] BUG: unable to handle page fault for address: f8801000
[ 1235.055408] #PF: supervisor write access in kernel mode
[ 1235.055414] #PF: error_code(0x0002) - not-present page
[ 1235.055420] *pdpt = 31ff2001 *pde =
[ 1235.055436] Oops: 0002 [#1] PREEMPT SMP PTI
[ 1235.055446] CPU: 1 PID: 3013 Comm: Xorg Not tainted 5.8.0-next-20200807+ #132
[ 1235.055453] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 1235.055466] EIP: eb_relocate_vma+0xdee/0xf50
[ 1235.055475] Code: 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 33 f7 ff ff 8d b6 00 00 00 00 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 2c f6 ff
[ 1235.055483] EAX: 009a706c EBX: f8801000 ECX: 0034e000 EDX:
[ 1235.055490] ESI: f1fd5cd4 EDI: f1fd5a5c EBP: f1fd5c6c ESP: f1fd5a04
[ 1235.055498] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 1235.055505] CR0: 80050033 CR2: f8801000 CR3: 31856000 CR4: 06b0
[ 1235.055511] Call Trace:
[ 1235.055529] ? i915_vma_pin+0x2f4/0x850
[ 1235.055540] ? __mutex_unlock_slowpath+0x2b/0x2c0
[ 1235.055549] ? __active_retire+0x7e/0xd0
[ 1235.07] ? mutex_unlock+0xb/0x10
[ 1235.055564] ? i915_vma_pin+0x2f4/0x850
[ 1235.055573] ? eb_lookup_vmas+0x272/0x9f0
[ 1235.055581] i915_gem_do_execbuffer+0xa7b/0x2730
[ 1235.055595] ? intel_runtime_pm_put_unchecked+0xd/0x10
[ 1235.055602] ? i915_gem_gtt_pwrite_fast+0xf6/0x520
[ 1235.055613] ? __lock_acquire.isra.0+0x223/0x500
[ 1235.055624] ? cache_alloc_debugcheck_after+0x151/0x180
[ 1235.055632] ? kvmalloc_node+0x69/0x80
[ 1235.055639] ? __kmalloc+0x92/0x120
[ 1235.055646] ? kvmalloc_node+0x69/0x80
[ 1235.055654] i915_gem_execbuffer2_ioctl+0xdd/0x350
[ 1235.055662] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0
[ 1235.055671] drm_ioctl_kernel+0x91/0xe0
[ 1235.055679] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0
[ 1235.055686] drm_ioctl+0x1fd/0x371
[ 1235.055694] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0
[ 1235.055706] ? posix_get_monotonic_timespec+0x1d/0x80
[ 1235.055714] ? drm_ioctl_kernel+0xe0/0xe0
[ 1235.055723] __ia32_sys_ioctl+0x14b/0x7c6
[ 1235.055732] ? _copy_to_user+0x21/0x30
[ 1235.055742] ? exit_to_user_mode_prepare+0x53/0x100
[ 1235.055752] do_int80_syscall_32+0x2c/0x40
[ 1235.055761] entry_INT80_32+0x116/0x116
[ 1235.055768] EIP: 0xb7f08092
[ 1235.055776] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
[ 1235.055784] EAX: ffda EBX: 000a ECX: c0406469 EDX: bfe9cd4c
[ 1235.055791] ESI: b72f6000 EDI: c0406469 EBP: 000a ESP: bfe9ccc4
[ 1235.055798] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200292
[ 1235.055808] ? asm_exc_nmi+0xcc/0x2bc
[ 1235.055813] Modules linked in:
[ 1235.055823] CR2: f8801000
[ 1235.055833] ---[ end trace f487886b697d29e8 ]---
[ 1235.055840] EIP: eb_relocate_vma+0xdee/0xf50
[ 1235.055848] Code: 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 33 f7 ff ff 8d b6 00 00 00 00 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 2c f6 ff
[ 1235.055855] EAX: 009a706c EBX: f8801000 ECX: 0034e000 EDX:
[ 1235.055862] ESI: f1fd5cd4 EDI: f1fd5a5c EBP: f1fd5c6c ESP: f1fd5a04
[ 1235.055870] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 1235.055877] CR0: 80050033 CR2: f8801000 CR3: 31856000 CR4: 06b0
[ 1235.062533] BUG: unable to handle page fault for address: f8803038

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi! After about half an hour of uptime, screen starts blinking on thinkpad x60 and machine becomes unusable. I already reported this in -next, and now it is in mainline. It is 32-bit x86 system. Pavel Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link local (bound): [undef] Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link remote: [AF_INET]87.138.219.28:1194 Aug 17 17:36:23 amd kernel: BUG: unable to handle page fault for address: f8601000 Aug 17 17:36:23 amd kernel: #PF: supervisor write access in kernel mode Aug 17 17:36:23 amd kernel: #PF: error_code(0x0002) - not-present page Aug 17 17:36:23 amd kernel: *pdpt = 318f2001 *pde = Aug 17 17:36:23 amd kernel: Oops: 0002 [#1] PREEMPT SMP PTI Aug 17 17:36:23 amd kernel: CPU: 1 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1+ #86 Aug 17 17:36:23 amd kernel: Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31 /2011 Aug 17 17:36:23 amd kernel: EIP: eb_relocate_vma+0xcf6/0xf20 Aug 17 17:36:23 amd kernel: Code: e9 ff f7 ff ff c7 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 a5 f8 ff ff 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 b4 fd ff ff 89 43 08 e9 9f f7 ff Aug 17 17:36:23 amd kernel: EAX: 003c306c EBX: f8601000 ECX: 00847000 EDX: Aug 17 17:36:23 amd kernel: ESI: 00847000 EDI: EBP: f1947c68 ESP: f19479fc Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 Aug 17 17:36:23 amd kernel: CR0: 80050033 CR2: f8601000 CR3: 31a1e000 CR4: 06b0 Aug 17 17:36:23 amd kernel: Call Trace: Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0 Aug 17 17:36:23 amd kernel: ? __mutex_unlock_slowpath+0x2b/0x280 Aug 17 17:36:23 amd kernel: ? __active_retire+0x7e/0xd0 Aug 17 17:36:23 amd kernel: ? mutex_unlock+0xb/0x10 Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0 Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530 Aug 17 17:36:23 amd kernel: ? 
eb_lookup_vmas+0x1f5/0x9e0 Aug 17 17:36:23 amd kernel: i915_gem_do_execbuffer+0xaab/0x2780 Aug 17 17:36:23 amd kernel: ? _raw_spin_unlock_irqrestore+0x27/0x40 Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530 Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530 Aug 17 17:36:23 amd kernel: ? kvmalloc_node+0x69/0x70 Aug 17 17:36:23 amd kernel: i915_gem_execbuffer2_ioctl+0xdd/0x360 Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0 Aug 17 17:36:23 amd kernel: drm_ioctl_kernel+0x87/0xd0 Aug 17 17:36:23 amd kernel: drm_ioctl+0x1f4/0x38b Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0 Aug 17 17:36:23 amd kernel: ? posix_get_monotonic_timespec+0x1c/0x90 Aug 17 17:36:23 amd kernel: ? ktime_get_ts64+0x7a/0x1e0 Aug 17 17:36:23 amd kernel: ? drm_ioctl_kernel+0xd0/0xd0 Aug 17 17:36:23 amd kernel: __ia32_sys_ioctl+0x1ad/0x799 Aug 17 17:36:23 amd kernel: ? debug_smp_processor_id+0x12/0x20 Aug 17 17:36:23 amd kernel: ? exit_to_user_mode_prepare+0x4f/0x100 Aug 17 17:36:23 amd kernel: do_int80_syscall_32+0x2c/0x40 Aug 17 17:36:23 amd kernel: entry_INT80_32+0x111/0x111 Aug 17 17:36:23 amd kernel: EIP: 0xb7fbc092 Aug 17 17:36:23 amd kernel: Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00 Aug 17 17:36:23 amd kernel: EAX: ffda EBX: 000a ECX: c0406469 EDX: bff0ae3c Aug 17 17:36:23 amd kernel: ESI: b73aa000 EDI: c0406469 EBP: 000a ESP: bff0adb4 Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200296 Aug 17 17:36:23 amd kernel: ? 
asm_exc_nmi+0xcc/0x2bc Aug 17 17:36:23 amd kernel: Modules linked in: Aug 17 17:36:23 amd kernel: CR2: f8601000 Aug 17 17:36:23 amd kernel: ---[ end trace 2ca9775068bbac06 ]--- -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi!

> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
>    763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
>    7ac2d2536dfa ("drm/i915/gem: Delete unused code")
>    9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Thanks. I got "[PATCH 1/2] drm/i915/gem: Replace reloc chain with
terminator on..." in my inbox; I believe that's related. Let me try
those, first.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] [PATCH 1/2] drm/i915/gem: Replace reloc chain with terminator on error unwind
Hi!

> If we hit an error during construction of the reloc chain, we need to
> replace the chain into the next batch with the terminator so that upon
> flushing the relocations so far, we do not execute a hanging batch.

Thanks for the patches. I assume this should fix the problem from the
"5.9-rc1: graphics regression moved from -next to mainline" thread.

I have applied them over current -next, and my machine seems to be
working so far (but uptime is less than 30 minutes).

If the machine still works tomorrow, I'll assume the problem is solved.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] [PATCH 1/2] drm/i915/gem: Replace reloc chain with terminator on error unwind
Hi! > > > If we hit an error during construction of the reloc chain, we need to > > > replace the chain into the next batch with the terminator so that upon > > > flushing the relocations so far, we do not execute a hanging batch. > > > > Thanks for the patches. I assume this should fix problem from > > "5.9-rc1: graphics regression moved from -next to mainline" thread. > > > > I have applied them over current -next, and my machine seems to be > > working so far (but uptime is less than 30 minutes). > > > > If the machine still works tommorow, I'll assume problem is solved. > > Aye, best wait until we have to start competing with Chromium for > memory... The suspicion is that it was the resource allocation failure > path. Yep, my machines are low on memory. But ... test did not work that well. I have dead X and blinking screen. Machine still works reasonably well over ssh, so I guess that's an improvement. Best regards, Pavel [ 5604.909393] ACPI: EC: event unblocked [ 5604.913590] usb usb2: root hub lost power or was reset [ 5604.913812] usb usb3: root hub lost power or was reset [ 5604.914046] usb usb4: root hub lost power or was reset [ 5604.918812] ata6: port disabled--ignoring [ 5604.925353] sd 0:0:0:0: [sda] Starting disk [ 5605.150042] thinkpad_acpi: ACPI backlight control delay disabled [ 5605.204955] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 5605.205931] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded [ 5605.205941] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out [ 5605.205949] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out [ 5605.207748] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded [ 5605.207757] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out [ 5605.207765] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out [ 5605.208227] ata1.00: configured for UDMA/133 [ 5605.281913] usb 5-2: reset full-speed USB device number 3 
using uhci_hcd [ 5605.569752] usb 5-1: reset full-speed USB device number 2 using uhci_hcd [ 5609.082771] PM: resume devices took 4.192 seconds [ 5609.083380] OOM killer enabled. [ 5609.083387] Restarting tasks ... done. [ 5609.103164] video LNXVIDEO:00: Restoring backlight state [ 5609.150144] PM: suspend exit [ 5609.190535] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it. [ 5609.239495] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it. [ 5609.287144] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it. [ 5609.344497] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it. [ 5611.426855] wlan0: authenticate with 5c:f4:ab:10:d2:bb [ 5611.430609] wlan0: send auth to 5c:f4:ab:10:d2:bb (try 1/3) [ 5611.432552] wlan0: authenticated [ 5611.433705] wlan0: associate with 5c:f4:ab:10:d2:bb (try 1/3) [ 5611.436440] wlan0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=1) [ 5611.439083] wlan0: associated [ 7744.718473] BUG: unable to handle page fault for address: f8c0 [ 7744.718484] #PF: supervisor write access in kernel mode [ 7744.718487] #PF: error_code(0x0002) - not-present page [ 7744.718491] *pdpt = 31b0b001 *pde = [ 7744.718500] Oops: 0002 [#1] PREEMPT SMP PTI [ 7744.718506] CPU: 0 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1-next-20200819+ #134 [ 7744.718509] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011 [ 7744.718518] EIP: eb_relocate_vma+0xdbf/0xf20 [ 7744.718523] Code: 48 74 8b 41 08 89 41 0c 8b 85 a4 fd ff ff 89 95 a0 fd ff ff e8 c2 12 6c 00 8b 95 a0 fd ff ff e9 03 fc ff ff 8b 85 d0 fd ff ff 03 01 00 40 10 89 43 04 8b 85 dc fd ff ff 89 43 08 e9 4a f6 ff [ 7744.718527] EAX: 01397010 EBX: f8c0 ECX: 01247000 EDX: [ 7744.718531] ESI: f519cd80 EDI: f1ac1cd4 EBP: f1ac1c6c ESP: f1ac1a04 [ 7744.718535] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 [ 7744.718539] CR0: 
80050033 CR2: f8c0 CR3: 31ac2000 CR4: 06b0 [ 7744.718543] Call Trace: [ 7744.718553] ? shmem_read_mapping_page_gfp+0x32/0x70 [ 7744.718560] ? eb_lookup_vmas+0x272/0x9f0 [ 7744.718565] i915_gem_do_execbuffer+0xa7b/0x2730 [ 7744.718573] ? intel_runtime_pm_put_unchecked+0xd/0x10 [ 7744.718578] ? i915_gem_gtt_pwrite_fast+0xf6/0x520 [ 7744.718586] ? __lock_acquire.isra.0+0x223/0x500 [ 7744.718592] ? cache_alloc_debugcheck_after+0x151/0x180 [ 7744.718596] ? kvmalloc_node+0x69/0x80 [ 7744.718600] ? __kmalloc+0x92/0x120 [ 7744.718604] ? kvmalloc_node+0x69/0x80 [ 7744.718608] i915_gem_execbuffer2_ioctl+0xdd/0x350 [ 7744.718613] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0 [ 7744.718619] drm_ioctl_kernel+0x91/0xe0 [ 7744.718623] ? i915_gem_execbuffer_ioctl+0x2a0/0x2a0 [ 7744.718627] drm_ioctl+0x1fd/0x371 [ 7744.718631] ? i915_gem_execbuffer_ioctl+0x2a0
Re: [Intel-gfx] [PATCH 1/2] drm/i915/gem: Replace reloc chain with terminator on error unwind
Hi!

> > Yep, my machines are low on memory.
> >
> > But ... test did not work that well. I have dead X and blinking
> > screen. Machine still works reasonably well over ssh, so I guess
> > that's an improvement.
>
> > [ 7744.718473] BUG: unable to handle page fault for address: f8c0
> > [ 7744.718484] #PF: supervisor write access in kernel mode
> > [ 7744.718487] #PF: error_code(0x0002) - not-present page
> > [ 7744.718491] *pdpt = 31b0b001 *pde =
> > [ 7744.718500] Oops: 0002 [#1] PREEMPT SMP PTI
> > [ 7744.718506] CPU: 0 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1-next-20200819+ #134
> > [ 7744.718509] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> > [ 7744.718518] EIP: eb_relocate_vma+0xdbf/0xf20
>
> To save me guessing, paste the above location into
> ./scripts/decode_stacktrace.sh ./vmlinux . ./drivers/gpu/drm/i915
>
> The f8c0 is something running off the end of a kmap, but I didn't
> spot a path were we would ignore an error and keep on writing.
> Nevertheless it must exist.

Like this?

$ ./scripts/decode_stacktrace.sh ./vmlinux . ./drivers/gpu/drm/i915
f8c0
f8c0
eb_relocate_vma+0xdbf/0xf20
eb_relocate_vma (i915_gem_execbuffer.c:?)

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
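The decoder invoked above reads the log from standard input, so a saved oops can be piped through it rather than typed in line by line. A minimal sketch, assuming the dmesg output was saved to a file; the helper name `oops_symbol_lines` and the file name `oops.txt` are hypothetical. The `:?` in the output above probably means the vmlinux was built without debug info, though that is a guess from the output alone.

```shell
# Hypothetical helper: keep only the log lines that carry a
# symbol+0xOFFSET/0xSIZE reference, which are the ones the decoder resolves.
oops_symbol_lines() {
    grep -E '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' "$1"
}

# Usage, run from the top of the kernel build tree (script path and
# arguments as quoted in the thread; oops.txt is an assumed file name):
#   oops_symbol_lines oops.txt | \
#       ./scripts/decode_stacktrace.sh ./vmlinux . ./drivers/gpu/drm/i915
```

Filtering first is optional, since the whole log can be fed to the script unchanged. It only keeps the decoded output short.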
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Tue 2020-08-18 18:59:27, Linus Torvalds wrote:
> On Tue, Aug 18, 2020 at 6:13 PM Dave Airlie wrote:
> >
> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
>    763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
>    7ac2d2536dfa ("drm/i915/gem: Delete unused code")
>    9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Ok, so Chris' patches resulted in a (less severe?) crash, let me try
this.

pavel@amd:/data/l/linux-next-32$ git reset --hard 8eb858df0a5f6bcd371b5d5637255c987278b8c9
HEAD is now at 8eb858df0a5f Add linux-next specific files for 20200819
pavel@amd:/data/l/linux-next-32$ git revert 763fedd6a216
Performing inexact rename detection: 100% (1212316/1212316), done.
hint: Waiting for your editor to close the file...
Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 261cbba627b7] Revert "drm/i915: Remove i915_gem_object_get_dirty_page()"
 2 files changed, 18 insertions(+)
pavel@amd:/data/l/linux-next-32$ git revert 7ac2d2536dfa
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 3877 and retry the command.
hint: Waiting for your editor to close the file...
Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 526af90ea811] Revert "drm/i915/gem: Delete unused code"
 1 file changed, 19 insertions(+)
pavel@amd:/data/l/linux-next-32$ git revert 9e0f9464e2ab
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 3877 and retry the command.
hint: Waiting for your editor to close the file...
Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 173e46213949] Revert "drm/i915/gem: Async GPU relocations only"
 2 files changed, 289 insertions(+), 27 deletions(-)
pavel@amd:/data/l/linux-next-32$

It is now running. It seems unison is the thing that usually triggers
this (due to memory pressure?). This time it survived unison (but
without chromium). I'll really know if it works in a day or two.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
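The session above trips over the commit-message editor on each revert. A minimal sketch of applying the same series without an editor, using `git revert --no-edit`; the wrapper name `revert_series` is hypothetical and not something used in the thread:

```shell
# Hypothetical wrapper: revert the given commits one by one, in the order
# given, keeping git's default "Revert ..." message instead of opening an editor.
revert_series() {
    for commit in "$@"; do
        git revert --no-edit "$commit" || {
            echo "revert of $commit failed; resolve it or run 'git revert --abort'" >&2
            return 1
        }
    done
}

# Usage with the commits named in the thread, in the order they were
# applied in the session above:
#   revert_series 763fedd6a216 7ac2d2536dfa 9e0f9464e2ab
```

Stopping at the first failed revert leaves the tree in git's normal conflict state, so the remaining commits can be retried by hand.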
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi! > > I think there's been some discussion about reverting that change for > > other reasons, but it's quite likely the culprit. > > Hmm. It reverts cleanly, but the end result doesn't work, because of > other changes. > > Reverting all of > > 763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()") > 7ac2d2536dfa ("drm/i915/gem: Delete unused code") > 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only") > > seems to at least build. > > Pavel, does doing those three reverts make things work for you? Yes, it seems they make things work. (Chris asked for a new patch to be tested, so I am switching to his kernel, but it survived longer than it usually does.) Thanks and best regards, Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Thu 2020-08-20 09:16:18, Linus Torvalds wrote: > On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek wrote: > > > > Yes, it seems they make things work. (Chris asked for new patch to be > > tested, so I am switching to his kernel, but it survived longer than > > it usually does.) > > Ok, so at worst we know how to solve it, at best the reverts won't be > needed because Chris' patch will fix the issue properly. > > So I'll archive this thread, but remind me if this hasn't gotten > sorted out in the later rc's. Yes, thank you, it seems we have a solution w/o the revert. Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: PGP signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] mm: Track page table modifications in __apply_to_page_range()
Hi! > > > The __apply_to_page_range() function is also used to change and/or > > > allocate page-table pages in the vmalloc area of the address space. > > > Make sure these changes get synchronized to other page-tables in the > > > system by calling arch_sync_kernel_mappings() when necessary. > > > > There's no description here of the user-visible effects of the bug. > > Please always provide this, especially when proposing a -stable > > backport. Take pity upon all the downstream kernel maintainers who are > > staring at this wondering whether they should risk adding it to their > > kernels. > > The impact appears limited to x86-32, where apply_to_page_range may miss > updating the PMD. That leads to explosions in drivers like > > [ 24.227844] BUG: unable to handle page fault for address: fe036000 > [ 24.228076] #PF: supervisor write access in kernel mode > [ 24.228294] #PF: error_code(0x0002) - not-present page > [ 24.228494] *pde = > [ 24.228640] Oops: 0002 [#1] SMP > [ 24.228788] CPU: 3 PID: 1300 Comm: gem_concurrent_ Not tainted 5.9.0-rc1+ > #16 > [ 24.228957] Hardware name: /NUC6i3SYB, BIOS > SYSKLi35.86A.0024.2015.1027.2142 10/27/2015 > [ 24.229297] EIP: __execlists_context_alloc+0x132/0x2d0 [i915] > [ 24.229462] Code: 31 d2 89 f0 e8 2f 55 02 00 89 45 e8 3d 00 f0 ff ff 0f 87 > 11 01 00 00 8b 4d e8 03 4b 30 b8 5a 5a 5a 5a ba 01 00 00 00 8d 79 04 01 > 5a 5a 5a 5a c7 81 fc 0f 00 00 5a 5a 5a 5a 83 e7 fc 29 f9 81 > [ 24.229759] EAX: 5a5a5a5a EBX: f60ca000 ECX: fe036000 EDX: 0001 > [ 24.229915] ESI: f43b7340 EDI: fe036004 EBP: f6389cb8 ESP: f6389c9c > [ 24.230072] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010286 > [ 24.230229] CR0: 80050033 CR2: fe036000 CR3: 2d361000 CR4: 001506d0 > [ 24.230385] DR0: DR1: DR2: DR3: > [ 24.230539] DR6: fffe0ff0 DR7: 0400 > [ 24.230675] Call Trace: > [ 24.230957] execlists_context_alloc+0x10/0x20 [i915] > [ 24.231266] intel_context_alloc_state+0x3f/0x70 [i915] > [ 24.231547] __intel_context_do_pin+0x117/0x170 [i915] > 
[ 24.231850] i915_gem_do_execbuffer+0xcc7/0x2500 [i915] > [ 24.232024] ? __kmalloc_track_caller+0x54/0x230 > [ 24.232181] ? ktime_get+0x3e/0x120 > [ 24.232333] ? dma_fence_signal+0x34/0x50 > [ 24.232617] i915_gem_execbuffer2_ioctl+0xcd/0x1f0 [i915] > [ 24.232912] ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915] > [ 24.233084] drm_ioctl_kernel+0x8f/0xd0 > [ 24.233236] drm_ioctl+0x223/0x3d0 > [ 24.233505] ? i915_gem_execbuffer_ioctl+0x2e0/0x2e0 [i915] > [ 24.233684] ? pick_next_task_fair+0x1b5/0x3d0 > [ 24.233873] ? __switch_to_asm+0x36/0x50 > [ 24.234021] ? drm_ioctl_kernel+0xd0/0xd0 > [ 24.234167] __ia32_sys_ioctl+0x1ab/0x760 > [ 24.234313] ? exit_to_user_mode_prepare+0xe5/0x110 > [ 24.234453] ? syscall_exit_to_user_mode+0x23/0x130 > [ 24.234601] __do_fast_syscall_32+0x3f/0x70 > [ 24.234744] do_fast_syscall_32+0x29/0x60 > [ 24.234885] do_SYSENTER_32+0x15/0x20 > [ 24.235021] entry_SYSENTER_32+0x9f/0xf2 > [ 24.235157] EIP: 0xb7f28559 > [ 24.235288] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 > 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a > 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 > [ 24.235576] EAX: ffda EBX: 0005 ECX: c0406469 EDX: bf95556c > [ 24.235722] ESI: b7e68000 EDI: c0406469 EBP: 0005 ESP: bf9554d8 > [ 24.235869] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 0296 > [ 24.236018] Modules linked in: i915 x86_pkg_temp_thermal intel_powerclamp > crc32_pclmul crc32c_intel intel_cstate intel_uncore intel_gtt drm_kms_helper > intel_pch_thermal video button autofs4 i2c_i801 i2c_smbus fan > [ 24.236336] CR2: fe036000 > > It looks like kasan, xen and i915 are vulnerable. And the actual impact is "on thinkpad X60 in 5.9-rc1, screen starts blinking after 30-or-so minutes, and machine is unusable"... that is assuming we are talking about the same bug. Best regards, Pavel
Re: [Intel-gfx] [PATCH v2] mm: Track page table modifications in __apply_to_page_range()
Hi! > > > The __apply_to_page_range() function is also used to change and/or > > > allocate page-table pages in the vmalloc area of the address space. > > > Make sure these changes get synchronized to other page-tables in the > > > system by calling arch_sync_kernel_mappings() when necessary. > > > > There's no description here of the user-visible effects of the bug. > > Please always provide this, especially when proposing a -stable > > backport. Take pity upon all the downstream kernel maintainers who are > > staring at this wondering whether they should risk adding it to their > > kernels. > > The impact appears limited to x86-32, where apply_to_page_range may miss > updating the PMD. That leads to explosions in drivers like Is this alone supposed to fix my problems with graphics on Thinkpad X60? Let me try... Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] mm: Track page table modifications in __apply_to_page_range()
Hi! > The __apply_to_page_range() function is also used to change and/or > allocate page-table pages in the vmalloc area of the address space. > Make sure these changes get synchronized to other page-tables in the > system by calling arch_sync_kernel_mappings() when necessary. > > Tested-by: Chris Wilson #x86-32 > Cc: # v5.8+ > Signed-off-by: Joerg Roedel This seems to solve screen blinking problems on Thinkpad X60. (It already survived a few unison runs, which would usually kill it.) Tested-by: Pavel Machek Thanks and best regards, Pavel
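The failure mode described in this thread (a page-table change that is not propagated to other page tables, followed by a "not-present page" fault) can be illustrated with a toy model. This is only a sketch and not kernel code: it assumes, as the patch description suggests, that other page tables hold their own copies of the kernel's upper-level entries, and every name in it (`ToyPageTables`, `map_page`, `sync_kernel_mappings`, `access`) is hypothetical. `sync_kernel_mappings` merely stands in for the role the thread attributes to `arch_sync_kernel_mappings()`.

```python
class ToyPageTables:
    """Reference kernel table plus per-process copies of its top-level entries."""

    def __init__(self, n_processes):
        self.reference = {}  # top-level slot -> set of mapped pages (lower table)
        self.copies = [dict() for _ in range(n_processes)]

    def map_page(self, slot, page, sync):
        """Map `page` under top-level `slot` in the reference table.

        Populating a previously empty slot changes the top level, which is
        the case that needs to be propagated to the copies.
        """
        top_level_changed = slot not in self.reference
        self.reference.setdefault(slot, set()).add(page)
        if top_level_changed and sync:
            self.sync_kernel_mappings(slot)

    def sync_kernel_mappings(self, slot):
        """Toy stand-in for the sync step: copy the new entry to every table."""
        for table in self.copies:
            # The lower-level table is shared, so later pages mapped in the
            # same slot become visible without another sync.
            table[slot] = self.reference[slot]

    def access(self, process, slot, page):
        """Return True if the access succeeds, False for a not-present fault."""
        lower = self.copies[process].get(slot)
        return lower is not None and page in lower


if __name__ == "__main__":
    buggy = ToyPageTables(n_processes=2)
    buggy.map_page(slot=5, page=1, sync=False)   # change never propagated
    print("without sync:", buggy.access(process=1, slot=5, page=1))

    fixed = ToyPageTables(n_processes=2)
    fixed.map_page(slot=5, page=1, sync=True)
    print("with sync:", fixed.access(process=1, slot=5, page=1))
```

In the model, the access without the sync step fails even though the mapping exists in the reference table, which mirrors the kind of fault reported on x86-32 in this thread.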
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi! > >> It's a Thinkpad T520. > > > > Oh, so this is a 64-bit machine? Yeah, that patch to flush vmalloc > > ranges won't make any difference on x86-64. > > > > Or are you for some reason running a 32-bit kernel on that thing? Have > > you tried building a 64-bit one (user-space can be 32-bit, it should > > all just work. Knock wood). > > No, I run a 64-bit kernel with 64-bit userspace (Void Linux). > Config is attached, in case anything is obvious from that. For the record, I'm running 5.9.0-rc2-next-20200825 w/o further patches, and it behaves okay on that 32-bit thinkpad x60. BTW... could we get the test farms to occasionally boot in 32-bit mode? Those modern CPUs can still do that :-). Best regards, Pavel
Re: [Intel-gfx] [PATCH 1/2] drm/i915/gem: Replace reloc chain with terminator on error unwind
Hi! > > > > Thanks for the patches. I assume this should fix problem from > > > > "5.9-rc1: graphics regression moved from -next to mainline" thread. > > > > > > > > I have applied them over current -next, and my machine seems to be > > > > working so far (but uptime is less than 30 minutes). > > > > > > > > If the machine still works tomorrow, I'll assume problem is solved. > > > > > > Aye, best wait until we have to start competing with Chromium for > > > memory... The suspicion is that it was the resource allocation failure > > > path. > > > > Yep, my machines are low on memory. > > > > But ... test did not work that well. I have dead X and blinking > > screen. Machine still works reasonably well over ssh, so I guess > > that's an improvement. > > Well my last remaining 32bit gen3 device is currently pushing up the > daisies, so could you try removing the attempt to use WC? Something like > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > @@ -955,10 +955,7 @@ static u32 *__reloc_gpu_map(struct reloc_cache *cache, > { > u32 *map; > > - map = i915_gem_object_pin_map(pool->obj, > - cache->has_llc ? > - I915_MAP_FORCE_WB : > - I915_MAP_FORCE_WC); > + map = i915_gem_object_pin_map(pool->obj, I915_MAP_FORCE_WB); > > on top of the previous patch. Faultinjection didn't turn up anything in > eb_relocate_vma, so we need to dig deeper. With this on top of other patches, it works. Tested-by: Pavel Machek Best regards, Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Tue 2020-09-01 13:57:55, Harald Arnesen wrote: > Still (rc3) doesn't work without the three reverts. > > I'm not sure how to proceed, I cannot capture any oops, and see nothing > obvious in any logs. I believe this is the point where you ask Linus for reverts... Best regards, Pavel
Re: [Intel-gfx] [RFC PATCH 1/6] drm: Add Content Protection property
On Wed 2017-11-29 22:08:56, Sean Paul wrote: > This patch adds a new optional connector property to allow userspace to enable > protection over the content it is displaying. This will typically be > implemented > by the driver using HDCP. > > The property is a tri-state with the following values: > - OFF: Self explanatory, no content protection > - DESIRED: Userspace requests that the driver enable protection > - ENABLED: Once the driver has authenticated the link, it sets this value > > The driver is responsible for downgrading ENABLED to DESIRED if the link > becomes > unprotected. The driver should also maintain the desiredness of protection > across hotplug/dpms/suspend. Why would user of the machine want this to be something else than 'OFF'? If kernel implements this, will it mean hardware vendors will have to prevent user from updating kernel on machines they own? If this is merged, does it open kernel developers to DMCA threats if they try to change it? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
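The tri-state described in the patch text quoted here can be sketched as a small state machine. This is a toy model written for illustration and not the DRM implementation: the class and method names (`ToyConnector`, `userspace_set`, `link_authenticated`, `link_lost`) are hypothetical, and two behaviours are assumptions that the quoted text only implies, namely that userspace cannot set ENABLED directly and that re-requesting DESIRED on an already ENABLED link leaves it ENABLED.

```python
from enum import Enum


class ContentProtection(Enum):
    OFF = "off"          # no content protection
    DESIRED = "desired"  # userspace asked the driver to enable protection
    ENABLED = "enabled"  # driver has authenticated the link


class ToyConnector:
    """Toy model of the property's transitions as described in the patch text."""

    def __init__(self):
        self.state = ContentProtection.OFF

    def userspace_set(self, value):
        """Userspace may request OFF or DESIRED (assumption: never ENABLED)."""
        if value is ContentProtection.ENABLED:
            raise ValueError("only the driver may set ENABLED")
        if (value is ContentProtection.DESIRED
                and self.state is ContentProtection.ENABLED):
            return  # assumption: already protected, nothing to change
        self.state = value

    def link_authenticated(self):
        """Driver side: authentication succeeded, so DESIRED becomes ENABLED."""
        if self.state is ContentProtection.DESIRED:
            self.state = ContentProtection.ENABLED

    def link_lost(self):
        """Driver side: link became unprotected, downgrade ENABLED to DESIRED."""
        if self.state is ContentProtection.ENABLED:
            self.state = ContentProtection.DESIRED


if __name__ == "__main__":
    connector = ToyConnector()
    connector.userspace_set(ContentProtection.DESIRED)
    connector.link_authenticated()
    print(connector.state)   # ENABLED after authentication
    connector.link_lost()
    print(connector.state)   # back to DESIRED, the request is remembered
```

Keeping DESIRED as a separate state is what lets the request survive a hotplug or suspend cycle, as the patch text asks of the driver.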
Re: [Intel-gfx] [RFC PATCH 1/6] drm: Add Content Protection property
On Tue 2017-12-05 11:45:38, Daniel Vetter wrote: > On Tue, Dec 05, 2017 at 11:28:40AM +0100, Pavel Machek wrote: > > On Wed 2017-11-29 22:08:56, Sean Paul wrote: > > > This patch adds a new optional connector property to allow userspace to > > > enable > > > protection over the content it is displaying. This will typically be > > > implemented > > > by the driver using HDCP. > > > > > > The property is a tri-state with the following values: > > > - OFF: Self explanatory, no content protection > > > - DESIRED: Userspace requests that the driver enable protection > > > - ENABLED: Once the driver has authenticated the link, it sets this value > > > > > > The driver is responsible for downgrading ENABLED to DESIRED if the link > > > becomes > > > unprotected. The driver should also maintain the desiredness of protection > > > across hotplug/dpms/suspend. > > > > Why would user of the machine want this to be something else than > > 'OFF'? > > > > If kernel implements this, will it mean hardware vendors will have to > > prevent user from updating kernel on machines they own? > > > > If this is merged, does it open kernel developers to DMCA threats if > > they try to change it? > > Because this just implements one part of the content protection scheme. > This only gives you an option to enable HDCP (aka encryption, it's really > nothing else) on the cable. Just because it has Content Protection in the > name does _not_ mean it is (stand-alone) an effective nor complete content > protection scheme. It's simply encrypting data, that's all. Yep. So my first question was: why would user of the machine ever want encryption "ENABLED" or "DESIRED"? Could you answer it? > If you want to actually lock down a machine to implement content > protection, then you need secure boot without unlockable boot-loader and a > pile more bits in userspace. If you do all that, only then do you have > full content protection. 
And yes, then you don't really own the machine > fully, and I think users who are concerned with being able to update > their Yes, so... This patch makes it more likely to see machines with locked down kernels, preventing developers from working with systems they own, running hardware. That is evil, and a direct threat to the Free software movement. Users compiling their own kernels get no benefit from it. Actually it looks like this only benefits Intel and Disney. We don't want that. > kernels and be able to exercise their software freedoms already know to > avoid such locked down systems. > > So yeah it would be better to call this the "HDMI/DP cable encryption > support", but well, it's not what it's called really. Well, it does not belong in the kernel, no matter what the name is. Pavel
Re: [Intel-gfx] [RFC PATCH 1/6] drm: Add Content Protection property
Hi! > >> > Why would user of the machine want this to be something else than > >> > 'OFF'? > >> > > >> > If kernel implements this, will it mean hardware vendors will have to > >> > prevent user from updating kernel on machines they own? > >> > > >> > If this is merged, does it open kernel developers to DMCA threats if > >> > they try to change it? > >> > >> Because this just implements one part of the content protection scheme. > >> This only gives you an option to enable HDCP (aka encryption, it's really > >> nothing else) on the cable. Just because it has Content Protection in the > >> name does _not_ mean it is (stand-alone) an effective nor complete content > >> protection scheme. It's simply encrypting data, that's all. > > > > Yep. So my first question was: why would user of the machine ever want > > encryption "ENABLED" or "DESIRED"? Could you answer it? > > How about for sensitive video streams in government offices where you > want to avoid a spy potentially tapping the cable to see the video > stream? Except that spies already have the keys, as every monitor manufacturer has them? > >> kernels and be able to exercise their software freedoms already know to > >> avoid such locked down systems. > >> > >> So yeah it would be better to call this the "HDMI/DP cable encryption > >> support", but well, it's not what it's called really. > > > > Well, it does not belong in kernel, no matter what is the name. > > Should we remove support for encrypted file systems and encrypted > virtual machines? Just like them the option is there if you want to > use it. If you don't want to, you don't have to. Encrypted file systems benefit users. Encrypted video is designed to work against users. In particular, users don't have encryption keys for video they generate. I'd have nothing against a feature that would let users encrypt video with keys they control. Pavel
[Intel-gfx] v4.20-rc1: list_del corruption on thinkpad x220
Hi! My machine locked hard (thinkpad x220). After reboot, I found this in syslog: Sounds like memory corruption..? Does not sound easy to debug. ...otoh, it still looks like an address, so maybe it is "just" a race in GPU drivers? Any ideas? Pavel Nov 8 18:35:01 duo CRON[28511]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa 1 1 1) Nov 8 18:42:57 duo kernel: list_del corruption. prev->next should be 8801742b8178, but was c9000192fec8 Nov 8 18:42:57 duo kernel: [ cut here ] Nov 8 18:42:57 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53! Nov 8 18:42:57 duo kernel: invalid opcode: [#1] SMP PTI Nov 8 18:42:57 duo kernel: CPU: 2 PID: 1082 Comm: i915/signal:1 Not tainted 4.20.0-rc1+ #3 Nov 8 18:42:57 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03 /13/2018 Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90 Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48 Nov 8 18:42:57 duo kernel: RSP: :c9000196be78 EFLAGS: 00210086 Nov 8 18:42:57 duo kernel: RAX: 0054 RBX: 8801742b8178 RCX: 00 00 Nov 8 18:42:57 duo kernel: RDX: RSI: 88019e2a53d8 RDI: 88019e2a53 d8 Nov 8 18:42:57 duo kernel: RBP: c9000196be78 R08: 880196e2cd10 R09: 00 00 Nov 8 18:42:57 duo kernel: R10: e7684eb9 R11: 3863656632393101 R12: c9000196be c8 Nov 8 18:42:57 duo kernel: R13: 88019707e000 R14: 8801742b8080 R15: c9000192fd d0 Nov 8 18:42:57 duo kernel: FS: () GS:88019e28() knlGS:000 0 Nov 8 18:42:57 duo kernel: CS: 0010 DS: ES: CR0: 80050033 Nov 8 18:42:57 duo kernel: CR2: ed2bf000 CR3: 0581e001 CR4: 000606a0 Nov 8 18:42:57 duo kernel: Call Trace: Nov 8 18:42:57 duo kernel: intel_breadcrumbs_signaler+0x162/0x330 Nov 8 18:42:57 duo kernel: kthread+0x116/0x150 Nov 8 18:42:57 duo kernel: ? intel_engine_wakeup+0x40/0x40 Nov 8 18:42:57 duo kernel: ?
kthread_park+0x90/0x90 Nov 8 18:42:57 duo kernel: ret_from_fork+0x35/0x40 Nov 8 18:42:57 duo kernel: Modules linked in: Nov 8 18:42:57 duo kernel: ---[ end trace 2f8da183a56f80f6 ]--- Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90 Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48 Nov 8 18:42:57 duo kernel: RSP: :c9000196be78 EFLAGS: 00210086 Nov 8 18:42:57 duo kernel: RAX: 0054 RBX: 8801742b8178 RCX: Nov 8 18:42:57 duo kernel: RDX: RSI: 88019e2a53d8 RDI: 88019e2a53d8 Nov 8 18:42:57 duo kernel: RBP: c9000196be78 R08: 880196e2cd10 R09: Nov 8 18:42:57 duo kernel: R10: e7684eb9 R11: 3863656632393101 R12: c9000196bec8 Nov 8 18:42:57 duo kernel: R13: 88019707e000 R14: 8801742b8080 R15: c9000192fdd0 Nov 8 18:42:57 duo kernel: FS: () GS:88019e28() knlGS: Nov 8 18:42:57 duo kernel: CS: 0010 DS: ES: CR0: 80050033 Nov 8 18:42:57 duo kernel: CR2: ed2bf000 CR3: 0581e001 CR4: 000606a0 -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
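The "list_del corruption. prev->next should be ..., but was ..." message in this log comes from a consistency check performed before a list entry is unlinked: the entry's neighbours must still point back at it. The idea can be sketched as follows. This is an illustrative model and not the kernel's code: the names `Node`, `list_add` and `list_del_checked` are chosen here for the example, and unlike the kernel, the sketch resets the removed node to point at itself instead of poisoning its pointers.

```python
class Node:
    """Element of a circular doubly linked list; a lone node points at itself."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert `new` right after `head`."""
    new.prev, new.next = head, head.next
    head.next.prev = new
    head.next = new


def list_del_checked(entry):
    """Unlink `entry`, refusing if its neighbours no longer point back at it."""
    if entry.prev.next is not entry:
        raise RuntimeError(
            f"list_del corruption. prev->next should be {entry.name}, "
            f"but was {entry.prev.next.name}")
    if entry.next.prev is not entry:
        raise RuntimeError(
            f"list_del corruption. next->prev should be {entry.name}, "
            f"but was {entry.next.prev.name}")
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = entry.next = entry  # sketch only: the kernel poisons instead


if __name__ == "__main__":
    head, a, b = Node("head"), Node("a"), Node("b")
    list_add(a, head)
    list_add(b, head)          # list is now head -> b -> a -> head
    b.next = head              # simulate another writer bypassing `a`
    try:
        list_del_checked(a)
    except RuntimeError as err:
        print(err)
```

A check like this fires whether the pointers were damaged by stray memory corruption or by two paths modifying the list without proper locking, which is why the log alone does not distinguish the two possibilities raised in the message.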
Re: [Intel-gfx] v4.20-rc1: list_del corruption on thinkpad x220
Hi! > > My machine locked hard (thinkpad x220). After reboot, I found this in > > syslog: > > > > Sounds like memory corruption..? Does not sound like easy to debug. > > Were you doing something GPU intense when you experienced the hard hang? > > And if so, have you been able to hit the issue more than once? At this > point it doesn't look like anything we've hit previously, so would be > great to have some more insight into how we could reproduce. I have seen another crash since then, but I don't think it counts as "easily reproducible". I may have been running flightgear at that point. That's fairly GPU intensive. > There's one similar for nouveau in Bugzilla, but it seems like a genuine > memory corruption (1 bit flipped): > > https://bugs.freedesktop.org/show_bug.cgi?id=84880 > > Any extra information would be of use :) > > Regards, Joonas > > PS. Could you open a bug to Bugzilla, it'll help to collect the > information in one consolidated place: > > https://01.org/linuxgraphics/documentation/how-report-bugs I prefer email... certainly for bugs that can't be reproduced. Best regards, Pavel
Re: [Intel-gfx] v4.20-rc1: list_del corruption on thinkpad x220
Hi! > > > There's one similar for nouveau in Bugzilla, but it seems like a genuine > > > memory corruption (1 bit flipped): > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=84880 > > > > > > Any extra information would be of use :) > > > > > > Regards, Joonas > > > > > > PS. Could you open a bug to Bugzilla, it'll help to collect the > > > information in one consolidated place: > > > > > > https://01.org/linuxgraphics/documentation/how-report-bugs > > > > I prefer email... certainly for bugs that can't be reproduced. > > By adding it to the Bugzilla it may be recognized by somebody else > who is experiencing a similar issue. Internet points are not deducted > for submitting bugs in good faith, even if they get closed as > NOTABUG. Feel free to copy from email to bugzilla :-). > It sounds like you've hit the same signature twice, so it may very well > be reproducible. Does flightgear have some demo mode where you could > leave it running a heavy scene overnight? I'm not sure if it was same signature twice. I had two lockups, but IIRC only investigated one. Not really a demo mode. I can put plane on autopilot, but eventually gas runs out. (And I guess window needs to be visible for test to be effective.) I tried today, but it did not crash. Do you have something else I could run to do the testing? > Were you running 4.19 kernel previously, distro one or vanilla? A full > dmesg from a boot would be appreciated (from kernel where you didn't > experience issues, and from one where you do). Recent kernels I'm running are self-compiled. > We actually have a well defined process and personnel to look into the > Bugzilla entries, so it'd still be helpful to have this logged to > Bugzilla. If I can reproduce it, it makes sense to create bugzilla entry. 
Best regards, Pavel
Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
On Mon 2017-03-06 12:23:41, Chris Wilson wrote: > On Mon, Mar 06, 2017 at 01:10:48PM +0100, Pavel Machek wrote: > > On Mon 2017-03-06 11:15:28, Chris Wilson wrote: > > > On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote: > > > > Hi! > > > > > > > > > > mplayer stopped working after a while. Dmesg says: > > > > > > > > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at > > > > > > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to > > > > try? Bisect will be slow and nasty :-(. > > > > > > I came the conclusion that #99671 is the ring HEAD overtaking the TAIL, > > > and under the presumption that your bug matches (as the symptoms do): > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > index 4ffa35faff49..62e31a7438ac 100644 > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct > > > drm_i915_gem_request *request) > > > { > > > struct drm_i915_private *dev_priv = request->i915; > > > > > > - i915_gem_request_submit(request); > > > - > > > GEM_BUG_ON(!IS_ALIGNED(request->tail, 8)); > > > I915_WRITE_TAIL(request->engine, request->tail); > > > + > > > + i915_gem_request_submit(request); > > > } > > > > > > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 > > > *cs) > > > > I applied it as: > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index 91bc4ab..9c49c7a 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct > > drm_i915_gem_request *request) > > { > > struct drm_i915_private *dev_priv = request->i915; > > > > - i915_gem_request_submit(request); > > - > > I915_WRITE_TAIL(request->engine, request->tail); > > + > > + 
i915_gem_request_submit(request); > > } > > > > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, > > > > Hmm. But your next mail suggest that it may not be smart to try to > > boot it? :-). > > Don't bother, it'll promptly hang. Any news here? 4.11-rc5 is actually usable on the hardware (unlike -rc1), not sure what changed. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] GPU hang with kernel 4.10rc3
On Mon 2017-01-23 10:39:27, Juergen Gross wrote:
> On 13/01/17 15:41, Juergen Gross wrote:
> > On 12/01/17 10:21, Chris Wilson wrote:
> >> On Thu, Jan 12, 2017 at 07:03:25AM +0100, Juergen Gross wrote:
> >>> On 11/01/17 18:08, Chris Wilson wrote:
> >>>> On Wed, Jan 11, 2017 at 05:33:34PM +0100, Juergen Gross wrote:
> >>>>> With kernel 4.10rc3 running as Xen dom0 I get at each boot:
> >>>>>
> >>>>> [ 49.213697] [drm] GPU HANG: ecode 7:0:0x3d1d3d3d, in gnome-shell
> >>>>> [1431], reason: Hang on render ring, action: reset
> >>>>> [ 49.213699] [drm] GPU hangs can indicate a bug anywhere in the entire
> >>>>> gfx stack, including userspace.
> >>>>> [ 49.213700] [drm] Please file a _new_ bug report on
> >>>>> bugs.freedesktop.org against DRI -> DRM/Intel
> >>>>> [ 49.213700] [drm] drm/i915 developers can then reassign to the right
> >>>>> component if it's not a kernel issue.
> >>>>> [ 49.213700] [drm] The gpu crash dump is required to analyze gpu
> >>>>> hangs, so please always attach it.
> >>>>> [ 49.213701] [drm] GPU crash dump saved to /sys/class/drm/card0/error
> >>>>> [ 49.213755] drm/i915: Resetting chip after gpu hang
> >>>>> [ 60.213769] drm/i915: Resetting chip after gpu hang
> >>>>> [ 71.189737] drm/i915: Resetting chip after gpu hang
> >>>>> [ 82.165747] drm/i915: Resetting chip after gpu hang
> >>>>> [ 93.205727] drm/i915: Resetting chip after gpu hang
> >>>>>
> >>>>> The dump is attached.
> >>>>
> >>>> That's a nasty one. The first couple of pages of the batchbuffer appear
> >>>> to be overwritten. (Full of 0xc2c2c2c2, i.e. probably pixel data.) That
> >>>> may be a concurrent write by either the GPU or CPU, or we may have
> >>>> incorrectly mapped a set of pages. That it doesn't recover suggests
> >>>> that the corruption occurs frequently, probably on every request/batch.
> >>>
> >>> I hoped someone would have an idea already.
> >>
> >> Sorry, first report of something like this in a long time (that I can
> >> remember at least). And the problem is that it can be anything from a
> >> coherency to a concurrency issue, so no one patch springs to mind.
> >> Thankfully it appears to be kernel related.
> >> -Chris
> >>
> >
> > Bisecting took longer than I thought, but I had to cherry pick some
> > patches and rebase one of them multiple times...
> >
> > Finally I found the commit to blame: 920cf4194954ec ("drm/i915:
> > Introduce an internal allocator for disposable private objects")
> >
> > In case you need me to produce some more data or test a patch
> > feel free to reach out.
>
> Anything new for this severe regression?
>
> Without a fix 4.10 will be unusable with Xen on a machine with i915
> graphics!

Did this get solved?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] 4.7-rc0: redshift stopped working on intel display
Hi!

It looks like redshift stopped working. Even pretty crazy settings
have no visible effect:

pavel@amd:~$ redshift -O 1500 -g 6.6:6.6:6.6
Using method `randr'.
pavel@amd:~$ redshift -x
Using method `randr'.
pavel@amd:~$ uname -a
Linux amd 4.6.0 #47 SMP Fri May 27 12:07:10 CEST 2016 x86_64 GNU/Linux
pavel@amd:~$ redshift -O 5500 -g 6.6:6.6:6.6
Using method `randr'.
pavel@amd:~$ redshift -O 5500 -g 6.6:6.6:6.6 -b .3
Using method `randr'.
pavel@amd:~$
pavel@amd:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
pavel@amd:~$

Any ideas?

Thanks,
Pavel
Re: [Intel-gfx] 4.7-rc0: redshift stopped working on intel display
On Sat 2016-05-28 12:12:06, Pavel Machek wrote:
> Hi!
>
> It looks like redshift stopped working. Even pretty crazy settings
> have no visible effect:
>
> pavel@amd:~$ redshift -O 1500 -g 6.6:6.6:6.6
> Using method `randr'.
> pavel@amd:~$ redshift -x
> Using method `randr'.
> pavel@amd:~$ uname -a
> Linux amd 4.6.0 #47 SMP Fri May 27 12:07:10 CEST 2016 x86_64 GNU/Linux
> pavel@amd:~$ redshift -O 5500 -g 6.6:6.6:6.6
> Using method `randr'.
> pavel@amd:~$ redshift -O 5500 -g 6.6:6.6:6.6 -b .3
> Using method `randr'.
> pavel@amd:~$
> pavel@amd:~$ lspci | grep VGA
> 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
> Integrated Graphics Controller (rev 03)
> pavel@amd:~$

A suspend-to-RAM + resume cycle updates the display to match the
settings. Not convenient, but...

Pavel
Re: [Intel-gfx] 4.7-rc0: redshift stopped working on intel display
Hi!

> Could you try to apply the following patch [1], hopefully this fixes
> the issue for you.
>
> [1] https://patchwork.freedesktop.org/patch/89111/

I updated the kernel, applied the patch and yes, that helped.

Thanks!
Pavel
[Intel-gfx] 4.0 -> 4.1 regression : after resume from s2ram both internal and external display of a docked ThinkPad are black
> >>> commit e7d6f7d708290da1b7c92f533444b042c79412e0
> >>> Author: Dave Airlie
> >>> Date: Mon Dec 8 13:23:37 2014 +1000
> >>>
> >>> drm/i915: resume MST after reading back hw state
> >> Is there anything else what I can do ?
> >>
> >> Current kernels up to 4.2.3 and 4.3-rc3 (not hardened) shows this issue
> >> here at my system.
> >
> > Yes. Now you ask Dave Airlie to fix it. If that
>
> Dear Dave,
>
> please fix it.
>
> Here's a work around which works for me since kernel 4.1.x :

Dave. You broke it. You fix it. Don't make me less polite?

Daniel? Jani? Can you apply the patch below, or comment what's wrong
with that? This is a regression, so it should not require much
thinking.

Pavel

> diff --git a/drivers/gpu/drm/i915/i915_drv.c
> b/drivers/gpu/drm/i915/i915_drv.c
> index ab64d68..3aeead2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -740,6 +740,8 @@ static int i915_drm_resume(struct drm_device *dev)
>          if (dev_priv->display.hpd_irq_setup)
>                  dev_priv->display.hpd_irq_setup(dev);
>          spin_unlock_irq(&dev_priv->irq_lock);
> +
> +        intel_dp_mst_resume(dev);
>
>          drm_modeset_lock_all(dev);
>          intel_display_resume(dev);
>
> >
> > does not work, you ask him to fix it, in less polite words. If that
> > does not work, you verify that reverting
> > e7d6f7d708290da1b7c92f533444b042c79412e0 fixes it for you, then ask
> > Daniel Vetter and Jani Nikula to revert it. If they fail to do that,
> > you go all the way up to Linus.
> >
> > Good luck ;-),
> > Pavel
> >
>
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms!
Hi!

> Reported-by: Jens Axboe
> Link: https://lkml.org/lkml/2015/11/12/621
> Cc: Jens Axboe
> Cc: "Rogozhkin, Dmitry V"
> Cc: Daniel Vetter
> Cc: Tvrtko Ursulin
> Cc: Eero Tamminen
> Cc: "Rantala, Valtteri"
> Cc: sta...@kernel.vger.org
> ---
> drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++++---
> 1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 740530c571d1..2a88158bd1f7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1146,14 +1146,36 @@ static bool missed_irq(struct drm_i915_private
> *dev_priv,
>          return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
>  }
>
> +static u64 local_clock_us(unsigned *cpu)
> +{
> +        u64 t;
> +
> +        *cpu = get_cpu();
> +        t = local_clock() >> 10;
> +        put_cpu();
> +
> +        return t;
> +}
> +
> +static bool busywait_stop(u64 timeout, unsigned cpu)
> +{
> +        unsigned this_cpu;
> +
> +        if (time_after64(local_clock_us(&this_cpu), timeout))
> +                return true;
> +
> +        return this_cpu != cpu;
> +}

Perhaps you want to ask the timekeeping people for the right
primitive?

I guess you are not the only one needing this..

Pavel
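For readers following the quoted patch: `local_clock() >> 10` turns a nanosecond clock into approximate microseconds by dividing by 1024 rather than 1000. Below is a minimal userspace sketch of that approximation and of a bounded busy-wait built on it. It is Python, not kernel code; the names `ns_to_us_shift`, `ns_to_us_exact` and `busy_wait` are made up for illustration, `time.monotonic_ns()` merely stands in for `local_clock()`, and the CPU-migration check done by `busywait_stop()` is not modelled.

```python
import time


def ns_to_us_shift(ns: int) -> int:
    """Cheap approximation from the quoted patch: divide by 1024."""
    return ns >> 10


def ns_to_us_exact(ns: int) -> int:
    """Exact nanosecond-to-microsecond conversion, for comparison."""
    return ns // 1000


def busy_wait(done, timeout_us: int) -> bool:
    """Spin until done() is true or roughly timeout_us has elapsed.

    Hypothetical helper; returns True if done() succeeded in time.
    The deadline is kept in shifted units, as in the patch.
    """
    deadline = ns_to_us_shift(time.monotonic_ns()) + timeout_us
    while True:
        if done():
            return True
        if ns_to_us_shift(time.monotonic_ns()) > deadline:
            return False
```

With these definitions 2,000,000 ns comes out as 1953 shifted units versus 2000 exact microseconds. One shifted unit is 1024 ns, so a timeout counted in shifted units runs about 2.4% long, which hardly matters for a spin of a couple of microseconds.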
Re: [Intel-gfx] 4.8-rc1: it is now common that machine needs re-run of xrandr after resume
Hi!

On Wed 2016-09-14 14:14:35, Jani Nikula wrote:
> On Wed, 14 Sep 2016, Jani Nikula wrote:
> > On Wed, 14 Sep 2016, Pavel Machek wrote:
> >> For the "sometimes need xrandr after resume": I don't think I can
> >> bisect that. It only happens sometimes :-(. But there's something
> >> helpful in the logs:
> >
> >> [ 1856.218863] [drm:drm_edid_block_valid] *ERROR* EDID checksum is
> >> invalid, remainder is 130
> >> [ 1856.218863] Raw EDID:
> >> [ 1856.218863] 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] [drm:drm_edid_block_valid] *ERROR* EDID checksum is
> >> invalid, remainder is 130
> >> [ 1856.218863] Raw EDID:
> >> [ 1856.218863] 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] [drm:drm_edid_block_valid] *ERROR* EDID checksum is
> >> invalid, remainder is 130
> >> [ 1856.218863] Raw EDID:
> >> [ 1856.218863] 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] [drm:drm_edid_block_valid] *ERROR* EDID checksum is
> >> invalid, remainder is 130
> >> [ 1856.218863] Raw EDID:
> >> [ 1856.218863] 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> [ 1856.218863] i915 0000:00:02.0: HDMI-A-1: EDID block 0 invalid.
> >
> > Pavel, Martin, do you always see this when the display fails to resume?
> > Is it HDMI/DVI for both of you?
>
> Please try this patch, backported from our next.

Sorry, the spam filter hid your emails.

I believe I still see the issue on v4.9-rc1.

... does it still make sense to retry the patch?

What I also see is rearranged windows. So I get resume, I get both
monitors, but I also see that X windows lost connection with the big
monitor (and rearranged my windows for me).

Oh and I guess I should mention:

1) Yes, I only see the issue on the DVI output. VGA seems to work.

2) I do have a power switch on the monitors, so it is possible that
during resume, the monitors have no AC power. (Not merely turned off
by the soft switch. No AC power.)

Thanks,
Pavel
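An aside on the "remainder is 130" message quoted above: an EDID block is 128 bytes and is considered valid only when all of its bytes sum to 0 modulo 256. The sketch below is Python and purely illustrative; `edid_block_remainder` is a hypothetical name, not the kernel's function. It rebuilds the block exactly as dumped and reproduces the reported remainder.

```python
EDID_BLOCK_SIZE = 128


def edid_block_remainder(block: bytes) -> int:
    """Return the sum of all bytes modulo 256; 0 means the checksum is valid."""
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError("an EDID block must be exactly 128 bytes")
    return sum(block) % 256


# Block as shown in the dump: 00, six ff, 00, then 0xff for the remaining 120 bytes.
dumped = bytes([0x00] + [0xFF] * 6 + [0x00] + [0xFF] * 120)
remainder = edid_block_remainder(dumped)
print(remainder)  # 130, matching the kernel message
```

A payload of nothing but 0xff after the header is what a read tends to look like when nothing answers on the DDC lines, which would fit the monitor having no AC power at resume. That reading is an inference from the dump, not something confirmed in the thread.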
[Intel-gfx] v4.9-rc3: graphical artefacts in X
Hi!

With v4.9, if I maximize "nowcast -x" application, I get broken
display (as if someone split the window into rectangles and shuffled
them a bit). Switching virtual desktops either fixes it or breaks it,
depending on how fast I switch. (Yes, strange).

pavel@amd:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
pavel@amd:~$

Recompile, reboot, and graphics is okay.

(On thinkpad x60, I have seen other strange problems, mostly when I
maximize video, but not as easily reproducible).

Pavel
Re: [Intel-gfx] v4.9-rc3: graphical artefacts in X
On Fri 2016-11-18 11:14:16, Chris Wilson wrote:
> On Fri, Nov 18, 2016 at 12:02:56PM +0100, Pavel Machek wrote:
> > Hi!
> >
> > With v4.9, if I maximize "nowcast -x" application, I get broken
> > display (as if someone split the window into rectangles and shuffled
> > them a bit). Switching virtual desktops either fixes it or breaks it,
> > depending on how fast I switch. (Yes, strange).
>
> The fix should have landed in v4.9-rc5

Yep, I just tested -rc5, and the problem seems gone.

Thanks!
Pavel
[Intel-gfx] 4.8-rc1: it is now common that machine needs re-run of xrandr after resume
Hi!

I have

00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
Integrated Graphics Controller (rev 03)

In previous kernels, resume worked ok. With 4.8-rc1, I quite often (1
in 10 resumes?) get in state where primary monitor (DVI) is dead (in
powersave) and all windows move to secondary monitor (VGA). Running
"xrandr" fixes that.

I'll update to newer rc and see if it happens again, but if you have
any ideas, now would be good time.

Best regards,
Pavel
Re: [Intel-gfx] 4.8-rc1: it is now common that machine needs re-run of xrandr after resume
Hi!

> I have
>
> 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
> Integrated Graphics Controller (rev 03)
>
> In previous kernels, resume worked ok. With 4.8-rc1, I quite often (1
> in 10 resumes?) get in state where primary monitor (DVI) is dead (in
> powersave) and all windows move to secondary monitor (VGA). Running
> "xrandr" fixes that.
>
> I'll update to newer rc and see if it happens again, but if you have
> any ideas, now would be good time.

Ok. With -rc6, X is completely broken. I got notification "could not
restore CRTC config for screen 63" or something like that, and the
window manager just does not start.

X log is attached as delme, kernel log as delme2. Nothing too
suspicious :-(.

Pavel

[ 227.978] X.Org X Server 1.16.4
           Release Date: 2014-12-20
[ 227.978] X Protocol Version 11, Revision 0
[ 227.978] Build Operating System: Linux 3.2.0-4-amd64 i686 Debian
[ 227.978] Current Operating System: Linux amd 4.8.0-rc6 #59 SMP Tue Sep 13 22:55:13 CEST 2016 x86_64
[ 227.978] Kernel command line: BOOT_IMAGE=(hd0,2)/l/linux-64/arch/x86/boot/bzImage root=PARTUUID=bdb19d30-04 resume=PARTUUID=bdb19d30-01
[ 227.978] Build Date: 11 February 2015 01:14:26AM
[ 227.978] xorg-server 2:1.16.4-1 (http://www.debian.org/support)
[ 227.978] Current version of pixman: 0.32.6
[ 227.979] Before reporting problems, check http://wiki.x.org
           to make sure that you have the latest version.
[ 227.979] Markers: (--) probed, (**) from config file, (==) default setting,
           (++) from command line, (!!) notice, (II) informational,
           (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 227.979] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Sep 13 23:01:42 2016
[ 227.979] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 227.979] (==) No Layout section. Using the first Screen section.
[ 227.979] (==) No screen section available. Using defaults.
[ 227.979] (**) |-->Screen "Default Screen Section" (0)
[ 227.979] (**) | |-->Monitor "<default monitor>"
[ 227.979] (==) No monitor specified for screen "Default Screen Section".
           Using a default monitor configuration.
[ 227.979] (==) Automatically adding devices
[ 227.979] (==) Automatically enabling devices
[ 227.979] (==) Automatically adding GPU devices
[ 227.979] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[ 227.979]      Entry deleted from font path.
[ 227.980] (==) FontPath set to:
           /usr/share/fonts/X11/misc,
           /usr/share/fonts/X11/100dpi/:unscaled,
           /usr/share/fonts/X11/75dpi/:unscaled,
           /usr/share/fonts/X11/Type1,
           /usr/share/fonts/X11/100dpi,
           /usr/share/fonts/X11/75dpi,
           built-ins
[ 227.980] (==) ModulePath set to "/usr/lib/xorg/modules"
[ 227.980] (II) The server relies on udev to provide the list of input devices.
           If no devices become available, reconfigure udev or disable AutoAddDevices.
[ 227.980] (II) Loader magic: 0x5683b700
[ 227.980] (II) Module ABI versions:
[ 227.980]      X.Org ANSI C Emulation: 0.4
[ 227.980]      X.Org Video Driver: 18.0
[ 227.980]      X.Org XInput driver : 21.0
[ 227.980]      X.Org Server Extension : 8.0
[ 227.980] (II) xfree86: Adding drm device (/dev/dri/card0)
[ 227.995] (--) PCI:*(0:0:2:0) 8086:2e32:8086:d614 rev 3, Mem @ 0xd000/4194304, 0xc000/268435456, I/O @ 0xf140/8, BIOS @ 0x/131072
[ 227.995] (--) PCI: (0:0:2:1) 8086:2e33:8086:d614 rev 3, Mem @ 0xd040/1048576
[ 227.995] (II) LoadModule: "glx"
[ 227.997] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[ 227.999] (II) Module glx: vendor="X.Org Foundation"
[ 227.999]      compiled for 1.16.4, module version = 1.0.0
[ 227.999]      ABI class: X.Org Server Extension, version 8.0
[ 227.999] (==) AIGLX enabled
[ 227.999] (==) Matched intel as autoconfigured driver 0
[ 227.999] (==) Matched intel as autoconfigured driver 1
[ 227.999] (==) Matched modesetting as autoconfigured driver 2
[ 227.999] (==) Matched fbdev as autoconfigured driver 3
[ 227.999] (==) Matched vesa as autoconfigured driver 4
[ 227.999] (==) Assigned the driver to the xf86ConfigLayout
[ 227.999] (II) LoadModule: "intel"
[ 227.999] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[ 228.000] (II) Module intel: vendor="X.Org Foundation"
[ 228.000]      compiled for 1.15.99.904, module version = 2.21.15
[ 228.000]      Module class: X.Org Video Driver
[ 228.000]      ABI class: X.Org Video Driver, version 18.0
[ 228.000] (II) LoadModule: "modesetting"
[ 228.000] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[ 228.001] (II) Module modesetting: vendor="X.Org Foundation"
[ 228.001]      compiled for 1.16.4, module version = 0.9.0
Re: [Intel-gfx] 4.8-rc1: it is now common that machine needs re-run of xrandr after resume
On Tue 2016-09-13 22:38:45, Martin Steigerwald wrote:
> Hi.
>
> On Tuesday, 13 September 2016, 22:23:50 CEST, Pavel Machek wrote:
> > I have
> >
> > 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
> > Integrated Graphics Controller (rev 03)
>
> 00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation
> Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09)
>
> Phoronix Test Suite system-info:

...

> > In previous kernels, resume worked ok. With 4.8-rc1, I quite often (1
> > in 10 resumes?) get in state where primary monitor (DVI) is dead (in
> > powersave) and all windows move to secondary monitor (VGA). Running
> > "xrandr" fixes that.
>
> I have seen this in 4.8 up to rc5 as well. I am not sure yet about rc6 which I
> am currently running.

Ok, it happened again today, with yesterday's version of 4.8-rc6.

I'm glad I'm not the only one. Intel folks, any ideas? Can you
reproduce it?

Best regards,
Pavel
Re: [Intel-gfx] 4.8-rc1: it is now common that machine needs re-run of xrandr after resume
On Wed 2016-09-14 10:38:18, Jani Nikula wrote:
> On Wed, 14 Sep 2016, Pavel Machek wrote:
> > Hi!
> >
> >> I have
> >>
> >> 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
> >> Integrated Graphics Controller (rev 03)
> >>
> >> In previous kernels, resume worked ok. With 4.8-rc1, I quite often (1
> >> in 10 resumes?) get in state where primary monitor (DVI) is dead (in
> >> powersave) and all windows move to secondary monitor (VGA). Running
> >> "xrandr" fixes that.
> >>
> >> I'll update to newer rc and see if it happens again, but if you have
> >> any ideas, now would be good time.
> >
> > Ok. With -rc6, X are completely broken. I got notification "could not
> > restore CRTC config for screen 63" or something like that, and window
> > manager just does not start.
>
> Ugh. Can you bisect from v4.7, assuming it worked? That's probably the
> fastest way to resolve this.

The "completely broken" part -- something broke in my userland, as
booting to the old kernel does not fix it. I'll have to figure it out.

For the "sometimes need xrandr after resume": I don't think I can
bisect that. It only happens sometimes :-(.

But there's something helpful in the logs:

Best regards,
Pavel

[ 1856.213154] CPU1 is up
[ 1856.213167] ACPI: Waking up from system sleep state S3
[ 1856.217998] clocksource: Switched to clocksource hpet
[ 1856.218170] uhci_hcd 0000:00:1d.0: System wakeup disabled by ACPI
[ 1856.218470] uhci_hcd 0000:00:1d.2: System wakeup disabled by ACPI
[ 1856.218656] uhci_hcd 0000:00:1d.1: System wakeup disabled by ACPI
[ 1856.218665] uhci_hcd 0000:00:1d.3: System wakeup disabled by ACPI
[ 1856.218863] ehci-pci 0000:00:1d.7: System wakeup disabled by ACPI
[ 1856.218863] PM: noirq resume of devices complete after 19.597 msecs
[ 1856.218863] PM: early resume of devices complete after 1.092 msecs
[ 1856.218863] usb usb2: root hub lost power or was reset
[ 1856.218863] usb usb3: root hub lost power or was reset
[ 1856.218863] usb usb4: root hub lost power or was reset
[ 1856.218863] usb usb5: root hub lost power or was reset
[ 1856.218863] pcieport 0000:00:1c.1: System wakeup disabled by ACPI
[ 1856.218863] serial 00:03: activated
[ 1856.218863] parport_pc 00:04: activated
[ 1856.218863] rtc_cmos 00:05: System wakeup disabled by ACPI
[ 1856.218863] ata2: port disabled--ignoring
[ 1856.218863] r8169 0000:03:00.0 eth0: link down
[ 1856.218863] sd 2:0:0:0: [sda] Starting disk
[ 1856.218863] sd 2:0:1:0: [sdb] Starting disk
[ 1856.218863] ata4.01: NODEV after polling detection
[ 1856.218863] ata3.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
[ 1856.218863] ata3.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
[ 1856.218863] ata3.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1856.218863] ata3.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
[ 1856.218863] ata3.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
[ 1856.218863] ata3.00: ACPI cmd c6/00:10:00:00:00:a0 (SET MULTIPLE MODE) succeeded
[ 1856.218863] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1856.218863] ata3.00: configured for UDMA/133
[ 1856.218863] ata4.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
[ 1856.218863] ata4.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
[ 1856.218863] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1856.218863] ata3.01: configured for UDMA/133
[ 1856.218863] ata4.00: configured for UDMA/133
[ 1856.218863] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 130
[ 1856.218863] Raw EDID:
[ 1856.218863] 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 130
[ 1856.218863] Raw EDID:
[ 1856.218863] 00 ff ff ff ff ff ff 00 ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1856.218863] ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[Intel-gfx] 4.11-rc0, thinkpad x220: GPU hang
Hi!

mplayer stopped working after a while. Dmesg says:

[ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at usb-0000:00:1d.0-1.2, CDC Ethernet Device, 22:1b:e4:4e:56:f5
[ 3190.767227] [drm] GPU HANG: ecode 6:0:0xbb409fff, in chromium [4597], reason: Hang on render ring, action: reset
[ 3190.767311] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3190.767313] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3190.767315] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3190.767317] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 3190.767320] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 3190.767427] drm/i915: Resetting chip after gpu hang
[ 3228.329384] cdc_ether 2-1.2:1.0 usb0: kevent 12 may have been dropped
[ 3228.329604] cdc_ether 2-1.2:1.0 usb0: kevent 12 may have been dropped
[ 3877.246261] perf: interrupt took too long (3142 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
[ 4802.784478] drm/i915: Resetting chip after gpu hang
[ 4810.784851] drm/i915: Resetting chip after gpu hang
[ 4829.829795] drm/i915: Resetting chip after gpu hang
[ 4837.826154] drm/i915: Resetting chip after gpu hang
[ 5125.026814] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308257 end=308258) time 203 us, min 763, max 767, scanline start 761, end 771
[ 5125.192602] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe B (start=307385 end=307386) time 204 us, min 1073, max 1079, scanline start 1071, end 1086
[ 5125.309992] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308274 end=308275) time 203 us, min 763, max 767, scanline start 758, end 768
[ 5125.460013] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308283 end=308284) time 204 us, min 763, max 767, scanline start 761, end 771
[ 5125.493340] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308285 end=308286) time 202 us, min 763, max 767, scanline start 761, end 771
[ 5125.526684] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308287 end=308288) time 204 us, min 763, max 767, scanline start 762, end 772
[ 5125.593245] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308291 end=308292) time 203 us, min 763, max 767, scanline start 758, end 768
[ 5125.676636] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308296 end=308297) time 202 us, min 763, max 767, scanline start 762, end 772
[ 5125.709960] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308298 end=308299) time 203 us, min 763, max 767, scanline start 762, end 772
[ 5126.093109] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe A (start=308321 end=308322) time 204 us, min 763, max 767, scanline start 759, end 770
[ 5647.879171] drm/i915: Resetting chip after gpu hang
[ 5655.879507] drm/i915: Resetting chip after gpu hang
[ 5850.864464] drm/i915: Resetting chip after gpu hang
[ 5858.864853] drm/i915: Resetting chip after gpu hang
[ 5904.850879] drm/i915: Resetting chip after gpu hang
[ 5912.851252] drm/i915: Resetting chip after gpu hang
[ 5949.876973] drm/i915: Resetting chip after gpu hang
[ 5957.877460] drm/i915: Resetting chip after gpu hang
[ 6018.872153] drm/i915: Resetting chip after gpu hang
[ 6030.872646] drm/i915: Resetting chip after gpu hang
[ 7108.362610] perf: interrupt took too long (3935 > 3927), lowering kernel.perf_event_max_sample_rate to 50750
[ 9670.047072] drm/i915: Resetting chip after gpu hang
[ 9678.047415] drm/i915: Resetting chip after gpu hang
[10408.064806] drm/i915: Resetting chip after gpu hang
[10416.097168] drm/i915: Resetting chip after gpu hang
[10416.097181] [drm:i915_reset] *ERROR* GPU recovery failed
pavel@duo:/data/film$

Umm. Dmesg wants me to attach card0/error, but it looks like it
contains quite a lot of data. If it contains actual framebuffer
content, it may not be wise to post it to the mailing list.

Best regards,
Pavel
Re: [Intel-gfx] 4.0.8->4.1.3 : after resume from s2ram both internal and external display of a docked ThinkPad are black
On Sun 2015-10-04 18:30:14, Toralf Förster wrote:
> On 08/04/2015 02:29 PM, Toralf Förster wrote:
> > On 08/02/2015 09:43 AM, Pavel Machek wrote:
> >> Any chance to bisect it?
> > Did it.
> >
> > FWIW: the mentioned commit was introduced between 3.18 and 3.19.
> > But my system (hardened 64 bit Gentoo) did not suffer from it till version
> > 4.0.8.
> > The hardened kernel 4.1.x was the first where the bug was visible at my
> > docked environment too.
> >
> >
> >
> > commit e7d6f7d708290da1b7c92f533444b042c79412e0
> > Author: Dave Airlie
> > Date: Mon Dec 8 13:23:37 2014 +1000
> >
> > drm/i915: resume MST after reading back hw state
> >
> > Otherwise the MST resume paths can hit DPMS paths
> > which hit state checker paths, which hit WARN_ON,
> > because the state checker is inconsistent with the
> > hw.
> >
> > This fixes a bunch of WARN_ON's on resume after
> > undocking.
> >
> > Signed-off-by: Dave Airlie
> > Reviewed-by: Daniel Vetter
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Jani Nikula
> >
>
> Is there anything else what I can do ?
>
> Current kernels up to 4.2.3 and 4.3-rc3 (not hardened) shows this issue here
> at my system.

Yes. Now you ask Dave Airlie to fix it. If that does not work, you ask
him to fix it, in less polite words. If that does not work, you verify
that reverting e7d6f7d708290da1b7c92f533444b042c79412e0 fixes it for
you, then ask Daniel Vetter and Jani Nikula to revert it. If they fail
to do that, you go all the way up to Linus.

Good luck ;-),
Pavel
Re: [Intel-gfx] Suspend To RAM failure in >= 4.1 - bisected to "drm/i915: Track GEN6 page table usage"
Hi!

> > I then ran a git bisect between v4.0 and v4.1 from Linus's tree and
> > found the "guilty" commit was
> >
> > commit 317b4e903636305cfe702ab3e5b3d68547a69e72
> > Author: Ben Widawsky
> > Date: Mon Mar 16 16:00:55 2015 +0000
> >
> > drm/i915: Extract context switch skip and add pd load logic
>
> Damnit, paste fail.
>
> I meant to paste :
>
> commit 678d96fbb3b5995a2fdff2bca5e1ab4a40b7e968
> Author: Ben Widawsky
> Date: Mon Mar 16 16:00:56 2015 +0000
>
> drm/i915: Track GEN6 page table usage
>
> (as indicated in the title and in the git bisect log)

Can you verify that reverting this patch (on top of 4.4?) fixes it?

If so, is it time to revert it?

Thanks,
Pavel
[Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
Hi!

> > mplayer stopped working after a while. Dmesg says:
> >
> > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at

Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
try? Bisect will be slow and nasty :-(.

Pavel
Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > mplayer stopped working after a while. Dmesg says:
> > > >
> > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> >
> > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > try? Bisect will be slow and nasty :-(.
>
> I came the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> and under the presumption that your bug matches (as the symptoms do):
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4ffa35faff49..62e31a7438ac 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
>  {
>  	struct drm_i915_private *dev_priv = request->i915;
>  
> -	i915_gem_request_submit(request);
> -
>  	GEM_BUG_ON(!IS_ALIGNED(request->tail, 8));
>  	I915_WRITE_TAIL(request->engine, request->tail);
> +
> +	i915_gem_request_submit(request);
>  }
>  
>  static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 *cs)

I applied it as:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 91bc4ab..9c49c7a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;
 
-	i915_gem_request_submit(request);
-
 	I915_WRITE_TAIL(request->engine, request->tail);
+
+	i915_gem_request_submit(request);
 }
 
 static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,

Hmm. But your next mail suggests that it may not be smart to try to
boot it? :-).

	Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
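A toy illustration of the HEAD/TAIL terminology used in this thread, not i915 code: in a circular command ring the driver publishes commands by advancing TAIL and the hardware consumes them by advancing HEAD, so HEAD should never pass TAIL. The names toy_ring, RING_SIZE and the toy_ring_*() helpers are hypothetical and exist only in this sketch. The real driver's ring handling is considerably more involved, and the sketch makes no claim about whether the reordering patch quoted above is correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size in bytes; must be a power of two for the masking below. */
#define RING_SIZE 4096u

/* Toy model only: two byte offsets into a circular buffer. */
struct toy_ring {
	uint32_t head; /* next byte the consumer (the GPU) will read */
	uint32_t tail; /* end of the commands the producer (the driver) has published */
};

/*
 * Bytes that were published but not yet consumed.
 * Note: head == tail means "empty" here; a completely full ring is not modelled.
 */
static uint32_t toy_ring_pending(const struct toy_ring *r)
{
	return (r->tail - r->head) & (RING_SIZE - 1);
}

/* Producer side: publish a new tail. Mirrors the 8-byte alignment check in the quoted diff. */
static void toy_ring_write_tail(struct toy_ring *r, uint32_t tail)
{
	assert((tail & 7u) == 0);
	r->tail = tail & (RING_SIZE - 1);
}

/*
 * Consumer side: advance head by n bytes.
 * Returns false (and changes nothing) if that would make head overtake tail,
 * i.e. the consumer would read bytes that were never published.
 */
static bool toy_ring_consume(struct toy_ring *r, uint32_t n)
{
	if (n > toy_ring_pending(r))
		return false;
	r->head = (r->head + n) & (RING_SIZE - 1);
	return true;
}
```

In this model, after toy_ring_write_tail(&r, 64) a call to toy_ring_consume(&r, 64) succeeds, and a further toy_ring_consume(&r, 8) is refused because nothing is pending. The failure discussed in the thread corresponds to the case where the hardware has no such check and simply keeps reading.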
Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
On Tue 2017-03-14 10:08:23, Thorsten Leemhuis wrote: > On 06.03.2017 00:01, Pavel Machek wrote: > >>> mplayer stopped working after a while. Dmesg says: > >>> > >>> [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to > > try? Bisect will be slow and nasty :-(. > > @Pavel, @Chris: What's the status of this? > > I added this report to the list of regressions for Linux 4.11. I'll try > to watch this thread for further updates on this issue to document > progress in my weekly reports. Please let me know in case the discussion > moves to a different place (bugzilla or another mail thread for > example). tia! We know where the bug is, but there's no fix for it. There was one patch, but it was quickly withdrawn. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
Hi! > > > > > > mplayer stopped working after a while. Dmesg says: > > > > > > > > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at > > > > > > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to > > > > try? Bisect will be slow and nasty :-(. > > > > > > I came the conclusion that #99671 is the ring HEAD overtaking the TAIL, > > > and under the presumption that your bug matches (as the symptoms do): > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > index 4ffa35faff49..62e31a7438ac 100644 > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > @@ -782,10 +782,10 @@ static void i9xx_submit_request(struct > > > drm_i915_gem_request *request) > > > { > > > struct drm_i915_private *dev_priv = request->i915; > > > > > > - i915_gem_request_submit(request); > > > - > > > GEM_BUG_ON(!IS_ALIGNED(request->tail, 8)); > > > I915_WRITE_TAIL(request->engine, request->tail); > > > + > > > + i915_gem_request_submit(request); > > > } > > > > > > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, u32 > > > *cs) > > > > I applied it as: > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index 91bc4ab..9c49c7a 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -1338,9 +1338,9 @@ static void i9xx_submit_request(struct > > drm_i915_gem_request *request) > > { > > struct drm_i915_private *dev_priv = request->i915; > > > > - i915_gem_request_submit(request); > > - > > I915_WRITE_TAIL(request->engine, request->tail); > > + > > + i915_gem_request_submit(request); > > } > > > > static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req, > > > > Hmm. But your next mail suggest that it may not be smart to try to > > boot it? :-). > > Don't bother, it'll promptly hang. Any news here? 
Is there something I can revert to get back to working system? Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang
On Mon 2017-03-06 12:23:41, Chris Wilson wrote:
> On Mon, Mar 06, 2017 at 01:10:48PM +0100, Pavel Machek wrote:
> > On Mon 2017-03-06 11:15:28, Chris Wilson wrote:
> > > On Mon, Mar 06, 2017 at 12:01:51AM +0100, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > > > mplayer stopped working after a while. Dmesg says:
> > > > > >
> > > > > > [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> > > >
> > > > Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> > > > try? Bisect will be slow and nasty :-(.
> > >
> > > I came the conclusion that #99671 is the ring HEAD overtaking the TAIL,
> > > and under the presumption that your bug matches (as the symptoms do):
> > >
...
> >  static void i9xx_emit_breadcrumb(struct drm_i915_gem_request *req,
> >
> > Hmm. But your next mail suggest that it may not be smart to try to
> > boot it? :-).
>
> Don't bother, it'll promptly hang.

Any news here? Is there a chance this is fixed in -rc4?

	Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] [Linux v4.10.0-rc1] call-traces after suspend-resume (pm? i915? cpu/hotplug?)
Hi! > [ Add some pm | i915 | x86 folks ] > > Hi, > > I have built Linux v4.10-rc1 today on my Ubuntu/precise AMD64 system > and I see some call-traces. > It is reproducible on suspend and resume. > > I cannot say which area touches the problem or if these are several > independent problems. > > For a full dmesg-log see attachments (my linux-config is attached, too). > > Here some hunks... > > [ 29.003601] BUG: sleeping function called from invalid context at > drivers/base/power/runtime.c:1032 > [ 29.003608] in_atomic(): 1, irqs_disabled(): 0, pid: 1469, name: Xorg > [ 29.003610] 1 lock held by Xorg/1469: > [ 29.003611] #0: (&dev->struct_mutex){+.+.+.}, at: > [] i915_mutex_lock_interruptible+0x43/0x140 [i915] > [ 29.003653] CPU: 0 PID: 1469 Comm: Xorg Not tainted > 4.10.0-rc1-1-iniza-small #1 > [ 29.003655] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > [ 29.003656] Call Trace: Just a note, at least 2 machines here refuse to resume with v4.10-rc1. One has intel graphics, one has AMD. It may or may not have common cause... Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] treewide: Convert del_timer*() to timer_shutdown*()
On Tue 2022-12-20 13:45:19, Steven Rostedt wrote:
> [
> Linus,
>
> I ran the script against your latest master branch:
> commit b6bb9676f2165d518b35ba3bea5f1fcfc0d969bf
>
> As the timer_shutdown*() code is now in your tree, I figured
> we can start doing the conversions. At least add the trivial ones
> now as Thomas suggested that this gets applied at the end of the
> merge window, to avoid conflicts with linux-next during the
> development cycle. I can wait to Friday to run it again, and
> resubmit.
>
> What is the best way to handle this?
> ]
>
> From: "Steven Rostedt (Google)"
>
> Due to several bugs caused by timers being re-armed after they are
> shutdown and just before they are freed, a new state of timers was added
> called "shutdown". After a timer is set to this state, then it can no
> longer be re-armed.
>
> The following script was run to find all the trivial locations where
> del_timer() or del_timer_sync() is called in the same function that the
> object holding the timer is freed. It also ignores any locations where the
> timer->function is modified between the del_timer*() and the free(), as
> that is not considered a "trivial" case.
>
> This was created by using a coccinelle script and the following commands:

The LED parts look good to me. Getting it in just before -rc1 would be
the best solution for me.

Best regards,
	Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.
Re: Linux 6.10-rc1
Hi! > > Let's bring in the actual gpu people.. Dave/Jani/others - does any of > > this sound familiar? Pavel says things have gotten much slower in > > 6.10: "something was very wrong with the performance, likely to do > > with graphics" > > Actually, maybe it's not graphics at all. Rafael just sent me a pull > request that fixes a "turbo is disabled at boot, but magically enabled > at runtime by firmware" issue. > > The 6.10-rc1 kernel would notice that turbo was disabled, and stopped > noticing that it magically got re-enabled. > > Pavel, that was with a very different laptop, but who knows... That > would match the "laptop is much slower" thing. > > So current -git might be worth checking. Ok, let me check. That sounds like something that could make machine hotter. My problem seems to be that machine seems to run way hotter with 6.10, and when it hovers around the 97C limit, it is unusable with all the throttling. It gets unusable with 6.9 at 97C, too, it is just that it is harder to make it so hot with 6.9. (And yes, I'm running Chromium, and yes, that means websites influence this. Media playback also does, 1080p video pushes thermals close to the limits even on good kernels.) Thanks and best regards, Pavel -- People of Russia, stop Putin before his war on Ukraine escalates. signature.asc Description: PGP signature
Re: Linux 6.10-rc1
Hi!

> > Let's bring in the actual gpu people.. Dave/Jani/others - does any of
> > this sound familiar? Pavel says things have gotten much slower in
> > 6.10: "something was very wrong with the performance, likely to do
> > with graphics"
>
> Actually, maybe it's not graphics at all. Rafael just sent me a pull
> request that fixes a "turbo is disabled at boot, but magically enabled
> at runtime by firmware" issue.
>
> The 6.10-rc1 kernel would notice that turbo was disabled, and stopped
> noticing that it magically got re-enabled.
>
> Pavel, that was with a very different laptop, but who knows... That
> would match the "laptop is much slower" thing.
>
> So current -git might be worth checking.

Is that:

commit 0cac73eb3875f6ecb6105e533218dba1868d04c9
Merge: 94df82fe5bfd 350cbb5d2f67
Author: Linus Torvalds
Date:   Fri Jun 14 09:52:51 2024 -0700

    Merge tag 'pm-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

    Pull power management fix from Rafael Wysocki:
     "Restore the behavior of the no_turbo sysfs attribute in the
      intel_pstate driver which allowed users to make the driver start
      using turbo P-states if they have been enabled on the fly by the
      firmware after OS initialization (Rafael Wysocki)"

    * tag 'pm-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()

? I don't think I'm tweaking no_turbo in sysfs.

But the thermal stuff looks important:

commit cee84c0b003f2e0f486f200a72eca2bcdb3a49a7
Merge: d20f6b3d747c b6846826982b
Author: Linus Torvalds
Date:   Fri Jun 14 09:28:56 2024 -0700

    Merge tag 'thermal-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

So I guess I'll have to try again.

Thanks and best regards,
	Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.
Re: Linux 6.10-rc1
Hi! > > Let's bring in the actual gpu people.. Dave/Jani/others - does any of > > this sound familiar? Pavel says things have gotten much slower in > > 6.10: "something was very wrong with the performance, likely to do > > with graphics" > > Actually, maybe it's not graphics at all. Rafael just sent me a pull > request that fixes a "turbo is disabled at boot, but magically enabled > at runtime by firmware" issue. > > The 6.10-rc1 kernel would notice that turbo was disabled, and stopped > noticing that it magically got re-enabled. > > Pavel, that was with a very different laptop, but who knows... That > would match the "laptop is much slower" thing. > > So current -git might be worth checking. So... I went to (then) current -git and I don't want to replace my machine any more. So the problem should not exist in current mainline. (I did not have good objective data, so I'm not 100% sure problem was real in the first place. More like 90% sure.) Best regards, Pavel -- People of Russia, stop Putin before his war on Ukraine escalates. signature.asc Description: PGP signature
Re: Linux 6.10-rc1
Hi!

> > > Let's bring in the actual gpu people.. Dave/Jani/others - does any of
> > > this sound familiar? Pavel says things have gotten much slower in
> > > 6.10: "something was very wrong with the performance, likely to do
> > > with graphics"
> >
> > Actually, maybe it's not graphics at all. Rafael just sent me a pull
> > request that fixes a "turbo is disabled at boot, but magically enabled
> > at runtime by firmware" issue.
> >
> > The 6.10-rc1 kernel would notice that turbo was disabled, and stopped
> > noticing that it magically got re-enabled.
> >
> > Pavel, that was with a very different laptop, but who knows... That
> > would match the "laptop is much slower" thing.
> >
> > So current -git might be worth checking.
>
> So... I went to (then) current -git and I don't want to replace my
> machine any more. So the problem should not exist in current mainline.
>
> (I did not have good objective data, so I'm not 100% sure problem was
> real in the first place. More like 90% sure.)

Ok, so the machine is ready to be thrown out of the window, again. Trying to
play a 29C3 video should not make the machine completely unusable ... as in
the keyboard loses keystrokes in the terminal.

https://media.ccc.de/v/29c3-5333-en-gsm_cell_phone_network_review_h264#t=340

dmesg is kind-of unhappy:

[130729.891961] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
[130733.311644] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
[130736.534601] i915 :00:02.0: [drm] *ERROR* Atomic update failure on pipe A (start=617818 end=617819) time 159 us, min 1017, max 1023, scanline start 1012, end 1024
[130738.625131] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
[130745.451785] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
...
[131631.941091] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
[131634.817628] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
[131639.536918] usb 2-1.2.3: reset low-speed USB device number 13 using ehci-pci
[131790.153952] i915 :00:02.0: [drm] GPU HANG: ecode 6:1:95bc, in Xorg [3043]
[131790.154245] i915 :00:02.0: [drm] GT0: Resetting chip for stopped heartbeat on rcs0
[131790.255994] i915 :00:02.0: [drm] Xorg[3043] context reset due to GPU hang

Wifi is a bit too active, even on a fairly idle system:

  430 root -51 0 0 0 0 S 0.3 0.0 8:48.74 irq/17-iwlwifi

Ideas welcome, especially some way to see what the graphics is doing.

Best regards,
	Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.
Re: Linux 6.10-rc1
Hi!

> > Ok, so machine is ready to be thrown out of window, again. Trying to
> > play 29C3 video should not make machine completely unusable ... as in
> > keyboard loses keystrokes in terminal.
>
> Well, that at least sounds like you can bisect it with a very clear test-case?
>
> Even if you don't bisect all the way, just doing a handful of
> bisections tends to narrow things down enough that we can at least
> guess at what general kind of area it is...

So... I guess I now know what went on. We got summer here, and I was
running the notebook with the lid closed. Apparently that affects cooling a
_lot_. An open lid means more dust, but also better cooling...

Best regards (and sorry for the noise),
	Pavel
-- 
People of Russia, stop Putin before his war on Ukraine escalates.
[Intel-gfx] 3.15-rc2: i915 regression: only top 20% of screen works in X
Hi! After update to 3.15-rc2, only top 20% of screen works on X. 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) 00:02.1 Display controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) Subsystem: Intel Corporation Device d614 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- This worked before. I believe it worked in 3.14. It definitely works in 3.11-rc2. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] 3.15-rc2: i915 regression: only top 20% of screen works in X
On Wed 2014-04-23 23:09:45, Daniel Vetter wrote:
> On Wed, Apr 23, 2014 at 10:22 PM, Pavel Machek wrote:
> > After update to 3.15-rc2, only top 20% of screen works on X.
> >
> > 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
> > Integrated Graphics Controller (rev 03)
> >
> > 00:02.1 Display controller: Intel Corporation 4 Series Chipset
> > Integrated Graphics Controller (rev 03)
> >Subsystem: Intel Corporation Device d614
> >Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >ParErr- Stepping- SERR- FastB2B- DisINTx-
> >Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast
> >>TAbort- SERR- >Latency: 0
> >Region 0: Memory at d040 (64-bit, non-prefetchable)
> >[size=1M]
> >Capabilities:
> >
> > This worked before. I believe it worked in 3.14. It definitely works
> > in 3.11-rc2.
>
> Screenshot or more detailed description of what "only top 20% of
> screen works in X" means?

Well, roughly the top 20% is OK, then I get a repeated pattern of some
part of the screen.

> Anything in dmesg?

I'll take a look.

> bisect result presuming that it reproduces reliably?

If there's no other option, I guess I could bisect. But first: do
you have similar hardware? Does it work for you? Are there any
experimental changes that went in recently and that I should try
reverting first?

	Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] 3.15-rc2: i915 regression: only top 20% of screen works in X
Hi! > > After update to 3.15-rc2, only top 20% of screen works on X. > > > > 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset > > Integrated Graphics Controller (rev 03) > > > > 00:02.1 Display controller: Intel Corporation 4 Series Chipset > > Integrated Graphics Controller (rev 03) > >Subsystem: Intel Corporation Device d614 > >Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > >ParErr- Stepping- SERR- FastB2B- DisINTx- > >Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast > >>TAbort- SERR- >Latency: 0 > >Region 0: Memory at d040 (64-bit, non-prefetchable) > >[size=1M] > >Capabilities: > > > > This worked before. I believe it worked in 3.14. It definitely works > > in 3.11-rc2. > > Screenshot or more detailed description of what "only top 20% of > screen works in X" means? > Anything in dmesg? Actually yes, dmesg suggests it is quite sick. drivers/gpu/drm/drm_mm.c:767 warning triggered repeatedly. Also.. initial framebuffer does not work ; I don't seem to see anything before X start up. 
(This is Debian 6.0.9) Best regards, Pavel Initializing cgroup subsys cpu Linux version 3.15.0-rc1+ (pavel@amd) (gcc version 4.4.5 (Debian 4.4.5-8) ) #335 SMP Sat Apr 19 17:58:01 CEST 2014 e820: BIOS-provided physical RAM map: BIOS-e820: [mem 0x-0x0009ebff] usable BIOS-e820: [mem 0x0009ec00-0x0009] reserved BIOS-e820: [mem 0x000e-0x000f] reserved BIOS-e820: [mem 0x0010-0xbd87dfff] usable BIOS-e820: [mem 0xbd87e000-0xbd900fff] ACPI NVS BIOS-e820: [mem 0xbd901000-0xbda42fff] reserved BIOS-e820: [mem 0xbda43000-0xbda56fff] ACPI NVS BIOS-e820: [mem 0xbda57000-0xbdb54fff] reserved BIOS-e820: [mem 0xbdb55000-0xbdb5dfff] ACPI data BIOS-e820: [mem 0xbdb5e000-0xbdb67fff] ACPI NVS BIOS-e820: [mem 0xbdb68000-0xbdb88fff] reserved BIOS-e820: [mem 0xbdb89000-0xbdb8efff] ACPI NVS BIOS-e820: [mem 0xbdb8f000-0xbdcf] usable BIOS-e820: [mem 0xfec0-0xfec00fff] reserved BIOS-e820: [mem 0xfed0-0xfed00fff] reserved BIOS-e820: [mem 0xfed1c000-0xfed8] reserved BIOS-e820: [mem 0xfff0-0x] reserved BIOS-e820: [mem 0x0001-0x00013fff] usable Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel! SMBIOS 2.4 present. 
DMI: /DG41MJ, BIOS MJG4110H.86A.0006.2009.1223.1155 12/23/2009 e820: update [mem 0x-0x0fff] usable ==> reserved e820: remove [mem 0x000a-0x000f] usable e820: last_pfn = 0xbdd00 max_arch_pfn = 0x10 MTRR default type: uncachable MTRR fixed ranges enabled: 0-9 write-back A-E7FFF uncachable E8000-F write-protect MTRR variable ranges enabled: 0 base 0 mask F write-back 1 base 1 mask FC000 write-back 2 base 0BDD0 mask 0 write-through 3 base 0BDE0 mask FFFE0 uncachable 4 base 0BE00 mask FFE00 uncachable 5 base 0C000 mask FC000 uncachable 6 disabled x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 initial memory mapped: [mem 0x-0x053f] Base memory trampoline at [c009a000] 9a000 size 16384 init_memory_mapping: [mem 0x-0x000f] [mem 0x-0x000f] page 4k init_memory_mapping: [mem 0x3700-0x373f] [mem 0x3700-0x373f] page 4k BRK [0x05109000, 0x05109fff] PGTABLE init_memory_mapping: [mem 0x3000-0x36ff] [mem 0x3000-0x36ff] page 4k BRK [0x0510a000, 0x0510afff] PGTABLE BRK [0x0510b000, 0x0510bfff] PGTABLE BRK [0x0510c000, 0x0510cfff] PGTABLE BRK [0x0510d000, 0x0510dfff] PGTABLE BRK [0x0510e000, 0x0510efff] PGTABLE init_memory_mapping: [mem 0x0010-0x2fff] [mem 0x0010-0x2fff] page 4k init_memory_mapping: [mem 0x3740-0x377fdfff] [mem 0x3740-0x377fdfff] page 4k ACPI: RSDP 0x000F03C0 24 (v02 INTEL ) ACPI: XSDT 0xBDB5CE18 44 (v01 INTEL DG41MJ 0006 MSFT 00010013) ACPI: FACP 0xBDB5BD98 F4 (v04 INTEL DG41RQ 0006 MSFT 00010013) ACPI BIOS Warning (bug): 32/64X FACS address mismatch in FADT: 0xBDB5FF40/0xBDB64F40, using 64-bit address (20140214/tbfadt-271) ACPI: DSDT 0xBDB55018 005983 (v01 INTEL DG41MJ 0006 INTL 20051117) ACPI: FACS 0xBDB64F40 40 ACPI: APIC 0xBDB5BF18 6C (v02 INTEL DG41MJ 0006 MSFT 00010013) ACPI: MCFG 0xBDB66E18 3C (v01 INTEL DG41MJ 0006 MSFT 0097) ACPI: HPET 0xBDB66D98 38 (v01 INTEL DG41MJ 0006 AMI. 0003) ACPI: Local APIC address 0xfee0 2149MB HIGHMEM available. 887MB LOWMEM available. mapped low ram: 0 - 377fe000 low ram: 0 - 377fe000 Zone
Re: [Intel-gfx] 3.15-rc2: i915 regression: only top 20% of screen works in X
Hi!

> > > > After update to 3.15-rc2, only top 20% of screen works on X.
> > > >
> > > > 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset
> > > > Integrated Graphics Controller (rev 03)
> > > >
> > > > 00:02.1 Display controller: Intel Corporation 4 Series Chipset
> > > > Integrated Graphics Controller (rev 03)
> > > >Subsystem: Intel Corporation Device d614
> > > >Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > >ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > >Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast
> > > >>TAbort- SERR- > > >Latency: 0
> > > >Region 0: Memory at d040 (64-bit, non-prefetchable)
> > > >[size=1M]
> > > >Capabilities:
> > > >
> > > > This worked before. I believe it worked in 3.14. It definitely works
> > > > in 3.11-rc2.
> > >
> > > Screenshot or more detailed description of what "only top 20% of
> > > screen works in X" means?
> > > Anything in dmesg?
> >
> > Actually yes, dmesg suggests it is quite
> > sick. drivers/gpu/drm/drm_mm.c:767 warning triggered
> > repeatedly. Also.. initial framebuffer does not work ; I don't seem to
> > see anything before X start up. (This is Debian 6.0.9)
>
> That says that i915.ko failed to initialise the GPU (or rather the GPU
> wasn't responding) and bailed during module load. The key line here is
> [drm:init_ring_common] *ERROR* render ring initialization failed ctl
> 0001f001 head 2034 tail start 0012f000

Actually, I'm not using modules -- everything is built-in. Can you try
that config? Perhaps you can then reproduce the failure.

> Jiri has been seeing a similar issue creep in during resume, but it is
> not reliable enough to bisect. Is your boot failure reliable enough to
> bisect? Also drm-intel-nightly should mitigate this failure and allow
> i915.ko to continue to load and run X, which would be worth testing to
> make sure that works as intended.

So far it has failed 100% of the time, but this is my main machine, so
bringing it down for extended periods is a no-no.

Greetings to Prague :-). Jiri, do you have i915 as a module or built-in?

Best regards,
	Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [Intel-gfx] 3.15-rc2: i915 regression: only top 20% of screen works in X
On Thu 2014-04-24 08:03:04, Daniel Vetter wrote: > On Thu, Apr 24, 2014 at 7:50 AM, Chris Wilson > wrote: > > That says that i915.ko failed to initialise the GPU (or rather the GPU > > wasn't responding) and bailed during module load. The key line here is > > > > [drm:init_ring_common] *ERROR* render ring initialization failed ctl > > 0001f001 head 2034 tail start 0012f000 > > > > Jiri has been seeing a similar issue creep in during resume, but it is > > not reliable enough to bisect. Is your boot failure reliable enough to > > bisect? Also drm-intel-nightly should mitigate this failure and allow > > i915.ko to continue to load and run X, which would be worth testing to > > make sure that works as intended. > > Oh right, g4x going beserk :( Apparently something changed in 3.15 > somewhere which made this much more likely, but like Chris said in > Jiri's case it's too unreliable to reproduce for a bisect. We've had > this come&go pretty much ever since kms support was merged and never > tracked it down. So far I went back to 3.14, and yes, graphics works there. Back to 3.15, put it to smaller monitor. This time, top 30% of screen works, and last working scanline is copied to the rest 60% of screen. [With the exception of mouse cursor, that somehow affects that magically.] > The bug is https://bugs.freedesktop.org/show_bug.cgi?id=76554 > > Like Chris said please test latest drm-intel-nighlty from > http://cgit.freedesktop.org/drm-intel to make sure that the recently > merged mitigation measures work properly. But those won't get your gpu > back, only the display and it's only for 3.16. We're still hunting a > proper fix for 3.15. So you know where the bug is or not? > And if you can indeed reliably reproduce this a bisect could be really useful. I'm confused. And yes, it seems reliable. But I'm not sure if I see same bug you are talking about. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] 3.15-rc2: i915 regression: only top 20% of screen works in X
Hi!

> Like Chris said please test latest drm-intel-nighlty from
> http://cgit.freedesktop.org/drm-intel to make sure that the recently
> merged mitigation measures work properly. But those won't get your gpu
> back, only the display and it's only for 3.16. We're still hunting a

What does "won't get my gpu back, just my display" mean? The GPU is
the thing driving the display... no?

I checked out drm-intel-nightly. Now I can see some kernel messages
scrolling by on the vga console (improvement?), but then I end up with a
completely blank screen and a dead machine, AFAICT. I can't seem to ping it.

BTW you might want to fix these...

drivers/gpu/drm/i915/i915_cmd_parser.c: In function ‘i915_parse_cmds’:
drivers/gpu/drm/i915/i915_cmd_parser.c:919: warning: format ‘%td’ expects type ‘ptrdiff_t’, but argument 5 has type ‘long unsigned int’
  CC drivers/gpu/drm/i915/i915_gem_context.o
  LD [M] drivers/gpu/drm/udl/udl.o
  CC drivers/gpu/drm/i915/i915_gem_debug.o
  CC drivers/gpu/drm/i915/i915_gem_dmabuf.o
  CC drivers/gpu/drm/i915/i915_gem_evict.o
  CC drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC drivers/gpu/drm/i915/i915_gem_gtt.o
drivers/gpu/drm/i915/i915_gem_gtt.c: In function ‘gen8_ggtt_insert_entries’:
drivers/gpu/drm/i915/i915_gem_gtt.c:1389: warning: ‘addr’ may be used uninitialized in this function
drivers/gpu/drm/i915/i915_gem_gtt.c: In function ‘gen6_ggtt_insert_entries’:
drivers/gpu/drm/i915/i915_gem_gtt.c:1435: warning: ‘addr’ may be used uninitialized in this function

Thanks,
	Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[Intel-gfx] [partial bisect] Re: 3.15-rc2: i915 regression: only top 20% of screen works in X
Hi!

> And if you can indeed reliably reproduce this a bisect could be
> really useful

It seems reliable, yes. So far I got:

									Pavel

git bisect start
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
# bad: [c39b06951f1dc2e384650288676c5b7dcc0ec92c] DRM: armada: fix corruption while loading cursors
git bisect bad c39b06951f1dc2e384650288676c5b7dcc0ec92c
# good: [5fb6b953bb7aa86a9c8ea760934982cedc45c52b] include/linux/syscalls.h: add sys_renameat2() prototype
git bisect good 5fb6b953bb7aa86a9c8ea760934982cedc45c52b
# good: [8ad2bc9796994ecba9f4ba2fc9abca27ee9d193d] Merge branch 'drm-intel-next' of git://git.freedesktop.org/git/drm-intel into drm-next
git bisect good 8ad2bc9796994ecba9f4ba2fc9abca27ee9d193d
# good: [1287aa903f1b9248f34fb215e3b875d2ae243425] drm: Remove the prefix argument of drm_ut_debug_printk()
git bisect good 1287aa903f1b9248f34fb215e3b875d2ae243425
# bad: [c8431fda9f9f3c3b7490cb44bd5720b494a2421e] drm/i915: don't get/put runtime PM at the debugfs forcewake file
git bisect bad c8431fda9f9f3c3b7490cb44bd5720b494a2421e
# bad: [db8384f2e07bfa8cc607914dfa0b3cee81f59839] drm/i915: don't get/put PC8 reference on freeze/thaw
git bisect bad db8384f2e07bfa8cc607914dfa0b3cee81f59839
# bad: [b2040f6fed736ccd2319768bc59833abe74148b8] drm/i915: Remove erronous WARN in the vlv pipe crc code
git bisect bad b2040f6fed736ccd2319768bc59833abe74148b8
# good: [301ea74a57851c19e1438ceeaffab663f402f79f] drm/i915: Allow HDMI+VGA cloning
git bisect good 301ea74a57851c19e1438ceeaffab663f402f79f
# bad: [02f6a1e750df8201561171c47472435557a65864] drm/i915: warn if ring is active before sync flush
git bisect bad 02f6a1e750df8201561171c47472435557a65864
# good: [8407bb9129da95fc4099b84cdbbc23e6d4f66aee] drm/i915/bdw: Use scratch page table for GEN8 PPGTT
git bisect good 8407bb9129da95fc4099b84cdbbc23e6d4f66aee

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
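For readers unfamiliar with the workflow behind a log like the one above, here is a minimal sketch of a bisect session on a throwaway repository. It is not the kernel tree and not what was run in this thread: the ten toy commits, the `known-good` tag, and the `broken.flag` marker file are all made up for illustration, and the "test" is a one-line file check instead of booting a kernel and looking at the screen.

```shell
#!/bin/sh
# Toy git bisect session on a throwaway repository (not the kernel tree).
# All names here (known-good, broken.flag, history.txt) are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Create 10 commits; commit 7 plants the marker file that stands in
# for the regression.
i=1
while [ "$i" -le 10 ]; do
    echo "change $i" >> history.txt
    if [ "$i" -eq 7 ]; then
        echo "regressed" > broken.flag
    fi
    git add -A
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then
        git tag known-good
    fi
    if [ "$i" -eq 7 ]; then
        expected=$(git rev-parse HEAD)
    fi
    i=$((i + 1))
done

# Mark the endpoints: HEAD is bad, the tagged first commit is good.
git bisect start HEAD known-good >/dev/null

# The test command exits 0 (good) while the marker file is absent and
# 1 (bad) once it exists; git bisect run repeats the halving for us.
git bisect run sh -c '! test -f broken.flag' >/dev/null

# After a finished bisect, refs/bisect/bad names the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)

# Print the same kind of good/bad log as quoted in the mail above.
git bisect log
echo "first bad commit: $first_bad"

git bisect reset >/dev/null 2>&1
```

In a real hunt each step means building and booting the checked-out kernel and then running `git bisect good` or `git bisect bad` by hand. A saved log can be fed back with `git bisect replay <logfile>` to resume from the same point.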
[Intel-gfx] [bisect result] Re: 3.15-rc2: i915 regression: only top 20% of screen works in X
Hi!

> And if you can indeed reliably reproduce this a bisect could be really useful.

And we have a winner here :-) Ok, it was not as painful as I feared. It does not revert cleanly, but doing it by hand was not that bad.

Best regards,
									Pavel

a51435a3137ad8ae75c288c39bd2d8b2696bae8f is the first bad commit
commit a51435a3137ad8ae75c288c39bd2d8b2696bae8f
Author: Naresh Kumar Kachhi
Date:   Wed Mar 12 16:39:40 2014 +0530

    drm/i915: disable rings before HW status page setup

    Rings should be idle before issuing sync_flush (in intel_ring_setup_status_page). This patch moves the ring disabling before doing the HW status page setup.

    Signed-off-by: Naresh Kumar Kachhi
    Reviewed-by: Chris Wilson
    Signed-off-by: Daniel Vetter

:040000 040000 23f2763fe684882008ca395f38d0da1cf197a85e f1052c959b35a99ceae9886603086c09edee5df8 M	drivers
pavel@amd:/data/l/linux-good$

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[Intel-gfx] [PATCH] fix i915 regression: only top 20% of screen works in X
Fix regression where only 20% of screen works in X. This is a manual revert of a51435a3137ad8ae75c288c39bd2d8b2696bae8f.

Signed-off-by: Pavel Machek

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6bc68bd..cf63c67 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -447,6 +447,11 @@ static int init_ring_common(struct intel_ring_buffer *ring)
 
 	gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
 
+	if (I915_NEED_GFX_HWS(dev))
+		intel_ring_setup_status_page(ring);
+	else
+		ring_setup_phys_status_page(ring);
+
 	/* Stop the ring if it's running. */
 	I915_WRITE_CTL(ring, 0);
 	I915_WRITE_HEAD(ring, 0);
@@ -454,11 +459,6 @@ static int init_ring_common(struct intel_ring_buffer *ring)
 	if (wait_for_atomic((I915_READ_MODE(ring) & MODE_IDLE) != 0, 1000))
 		DRM_ERROR("%s :timed out trying to stop ring\n", ring->name);
 
-	if (I915_NEED_GFX_HWS(dev))
-		intel_ring_setup_status_page(ring);
-	else
-		ring_setup_phys_status_page(ring);
-
 	head = I915_READ_HEAD(ring) & HEAD_ADDR;
 
 	/* G45 ring initialization fails to reset head to zero */

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html