Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Tue 2020-09-01 13:57:55, Harald Arnesen wrote:
> Still (rc3) doesn't work without the three reverts.
>
> I'm not sure how to proceed, I cannot capture any oops, and see nothing
> obvious in any logs.

I believe this is the place where you ask Linus for reverts...

Best regards,
                                                                Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Still (rc3) doesn't work without the three reverts.

I'm not sure how to proceed, I cannot capture any oops, and see nothing
obvious in any logs.
--
Hilsen Harald
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi!

> >> It's a Thinkpad T520.
> >
> > Oh, so this is a 64-bit machine? Yeah, that patch to flush vmalloc
> > ranges won't make any difference on x86-64.
> >
> > Or are you for some reason running a 32-bit kernel on that thing? Have
> > you tried building a 64-bit one (user-space can be 32-bit, it should
> > all just work. Knock wood).
>
> No, I run a 64-bit kernel with 64-bit userspace (Void Linux).
> Config is attached, in case anything is obvious from that.

For the record, I'm running 5.9.0-rc2-next-20200825 w/o further
patches, and it behaves okay on that 32-bit thinkpad x60.

BTW... could we get the test farms to occasionally boot in 32-bit
mode? Those modern CPUs can still do that :-).

Best regards,
                                                                Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Wed, Aug 26, 2020 at 1:53 PM Harald Arnesen wrote:
>
> It's a Thinkpad T520.

Oh, so this is a 64-bit machine? Yeah, that patch to flush vmalloc
ranges won't make any difference on x86-64.

Or are you for some reason running a 32-bit kernel on that thing? Have
you tried building a 64-bit one (user-space can be 32-bit, it should
all just work. Knock wood).

                Linus
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Dave Airlie [26.08.2020 22:47]:
> On Thu, 27 Aug 2020 at 06:44, Harald Arnesen wrote:
>>
>> Linus Torvalds [26.08.2020 20:04]:
>>
>> > On Wed, Aug 26, 2020 at 2:30 AM Harald Arnesen wrote:
>> >> Somehow related to lightdm or xfce4? However, it is a regression, since
>> >> kernel 5.8 works.
>> > Yeah, apparently there's something else wrong with the relocation changes
>> > too.
>> >
>> > That said, does that patch at
>> >
>> > https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/
>> >
>> > change things at all? If there are two independent bugs, maybe
>> > applying that patch might at least give you an oops that gets saved in
>> > the logs?
>> >
>> > (it might be worth waiting a bit after the machine locks up in case
>> > the machine is alive enough to sync logs after a bit.. If ssh works,
>> > that's obviously better yet)
>>
>> No, doesn't help. And I was wrong, ssh does not work at all when the
>> display locks up.
>
> Did you say what hw you had? is it the same hw as Pavel or different?
>
> Dave.
>

It's a Thinkpad T520. Output from 'lspci' attached.
--
Hilsen Harald

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (Lewisville) (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b4)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation QM67 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Mobile SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)
0d:00.0 System peripheral: Ricoh Co Ltd PCIe SDXC/MMC Host Controller (rev 08)
0d:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 04)
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Thu, 27 Aug 2020 at 06:44, Harald Arnesen wrote:
>
> Linus Torvalds [26.08.2020 20:04]:
>
> > On Wed, Aug 26, 2020 at 2:30 AM Harald Arnesen wrote:
> >> Somehow related to lightdm or xfce4? However, it is a regression, since
> >> kernel 5.8 works.
> > Yeah, apparently there's something else wrong with the relocation changes
> > too.
> >
> > That said, does that patch at
> >
> > https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/
> >
> > change things at all? If there are two independent bugs, maybe
> > applying that patch might at least give you an oops that gets saved in
> > the logs?
> >
> > (it might be worth waiting a bit after the machine locks up in case
> > the machine is alive enough to sync logs after a bit.. If ssh works,
> > that's obviously better yet)
>
> No, doesn't help. And I was wrong, ssh does not work at all when the
> display locks up.

Did you say what hw you had? is it the same hw as Pavel or different?

Dave.
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Linus Torvalds [26.08.2020 20:04]:

> On Wed, Aug 26, 2020 at 2:30 AM Harald Arnesen wrote:
>> Somehow related to lightdm or xfce4? However, it is a regression, since
>> kernel 5.8 works.
> Yeah, apparently there's something else wrong with the relocation changes too.
>
> That said, does that patch at
>
> https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/
>
> change things at all? If there are two independent bugs, maybe
> applying that patch might at least give you an oops that gets saved in
> the logs?
>
> (it might be worth waiting a bit after the machine locks up in case
> the machine is alive enough to sync logs after a bit.. If ssh works,
> that's obviously better yet)

No, doesn't help. And I was wrong, ssh does not work at all when the
display locks up.
--
Hilsen Harald
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Wed, Aug 26, 2020 at 2:30 AM Harald Arnesen wrote:
>
> Somehow related to lightdm or xfce4? However, it is a regression, since
> kernel 5.8 works.

Yeah, apparently there's something else wrong with the relocation changes too.

That said, does that patch at

  https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/

change things at all? If there are two independent bugs, maybe
applying that patch might at least give you an oops that gets saved in
the logs?

(it might be worth waiting a bit after the machine locks up in case
the machine is alive enough to sync logs after a bit.. If ssh works,
that's obviously better yet)

                Linus
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Harald Arnesen [26.08.2020 10:36]:

> I was wrong about ssh working. The whole machine locks up when X starts.
>
> A strange thing, sometimes I can log in from lightdm before it locks up,
> sometimes I cannot even use the login screen. Timing related?
>
> If I don't start X, console login seems to work fine, and I see nothing
> obvious in the logs or kernel messages.
>
> I will try to start just a window manager with startx instead of going
> through lightdm.

Disabled lightdm, started DE or WM from .xinitrc:

xfce4-session: Machine locks up
enlightenment: Machine works

Somehow related to lightdm or xfce4? However, it is a regression, since
kernel 5.8 works.
--
Hilsen Harald
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Linus Torvalds [25.08.2020 20:19]:

> On Tue, Aug 25, 2020 at 9:32 AM Harald Arnesen wrote:
>>
>> > For posterity, I'm told the fix is [1].
>> >
>> > [1]
>> > https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/
>>
>> Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
>> freezes. I can still ssh into the machine
>>
>> The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
>> the bug for me.
>
> Do you get any oops or other indication of what ends up going wrong?
> Since ssh works that should be fairly easy to see.

I was wrong about ssh working. The whole machine locks up when X starts.

A strange thing, sometimes I can log in from lightdm before it locks up,
sometimes I cannot even use the login screen. Timing related?

If I don't start X, console login seems to work fine, and I see nothing
obvious in the logs or kernel messages.

I will try to start just a window manager with startx instead of going
through lightdm.
--
Hilsen Harald
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Linus Torvalds [25.08.2020 20:19]:

>> Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
>> freezes. I can still ssh into the machine
>>
>> The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
>> the bug for me.

> Do you get any oops or other indication of what ends up going wrong?
> Since ssh works that should be fairly easy to see.

Away from the machine now, will check tomorrow morning (CET).
--
Hilsen Harald
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Tue, Aug 25, 2020 at 9:32 AM Harald Arnesen wrote:
>
> > For posterity, I'm told the fix is [1].
> >
> > [1]
> > https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/
>
> Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
> freezes. I can still ssh into the machine
>
> The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
> the bug for me.

Do you get any oops or other indication of what ends up going wrong?
Since ssh works that should be fairly easy to see.

                Linus
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Jani Nikula [25.08.2020 11:55]:
> On Fri, 21 Aug 2020, Pavel Machek wrote:
>> On Thu 2020-08-20 09:16:18, Linus Torvalds wrote:
>>> On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek wrote:
>>> >
>>> > Yes, it seems they make things work. (Chris asked for new patch to be
>>> > tested, so I am switching to his kernel, but it survived longer than
>>> > it usually does.)
>>>
>>> Ok, so at worst we know how to solve it, at best the reverts won't be
>>> needed because Chris' patch will fix the issue properly.
>>>
>>> So I'll archive this thread, but remind me if this hasn't gotten
>>> sorted out in the later rc's.
>>
>> Yes, thank you, it seems we have a solution w/o the revert.
>
> For posterity, I'm told the fix is [1].
>
> BR,
> Jani.
>
>
> [1] https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/

Doesn't fix it for me. As soon as I start XFCE, the mouse and keyboard
freeze. I can still ssh into the machine.

The three reverts (763fedd6a216, 7ac2d2536dfa and 9e0f9464e2ab) fix
the bug for me.
--
Hilsen Harald
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Fri, 21 Aug 2020, Pavel Machek wrote:
> On Thu 2020-08-20 09:16:18, Linus Torvalds wrote:
>> On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek wrote:
>> >
>> > Yes, it seems they make things work. (Chris asked for new patch to be
>> > tested, so I am switching to his kernel, but it survived longer than
>> > it usually does.)
>>
>> Ok, so at worst we know how to solve it, at best the reverts won't be
>> needed because Chris' patch will fix the issue properly.
>>
>> So I'll archive this thread, but remind me if this hasn't gotten
>> sorted out in the later rc's.
>
> Yes, thank you, it seems we have a solution w/o the revert.

For posterity, I'm told the fix is [1].

BR,
Jani.


[1] https://lore.kernel.org/intel-gfx/20200821123746.16904-1-j...@8bytes.org/

--
Jani Nikula, Intel Open Source Graphics Center
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Thu 2020-08-20 09:16:18, Linus Torvalds wrote:
> On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek wrote:
> >
> > Yes, it seems they make things work. (Chris asked for new patch to be
> > tested, so I am switching to his kernel, but it survived longer than
> > it usually does.)
>
> Ok, so at worst we know how to solve it, at best the reverts won't be
> needed because Chris' patch will fix the issue properly.
>
> So I'll archive this thread, but remind me if this hasn't gotten
> sorted out in the later rc's.

Yes, thank you, it seems we have a solution w/o the revert.

Best regards,
                                                                Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Thu, Aug 20, 2020 at 2:23 AM Pavel Machek wrote:
>
> Yes, it seems they make things work. (Chris asked for new patch to be
> tested, so I am switching to his kernel, but it survived longer than
> it usually does.)

Ok, so at worst we know how to solve it, at best the reverts won't be
needed because Chris' patch will fix the issue properly.

So I'll archive this thread, but remind me if this hasn't gotten
sorted out in the later rc's.

                Linus
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi!

> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
>    763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
>    7ac2d2536dfa ("drm/i915/gem: Delete unused code")
>    9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Yes, it seems they make things work. (Chris asked for new patch to be
tested, so I am switching to his kernel, but it survived longer than
it usually does.)

Thanks and best regards,
                                                                Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Tue 2020-08-18 18:59:27, Linus Torvalds wrote:
> On Tue, Aug 18, 2020 at 6:13 PM Dave Airlie wrote:
> >
> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
>    763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
>    7ac2d2536dfa ("drm/i915/gem: Delete unused code")
>    9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Ok, so Chris' patches resulted in a (less severe?) crash, let me try
this.

pavel@amd:/data/l/linux-next-32$ git reset --hard 8eb858df0a5f6bcd371b5d5637255c987278b8c9
HEAD is now at 8eb858df0a5f Add linux-next specific files for 20200819
pavel@amd:/data/l/linux-next-32$ git revert 763fedd6a216
Performing inexact rename detection: 100% (1212316/1212316), done.
hint: Waiting for your editor to close the file...
Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 261cbba627b7] Revert "drm/i915: Remove i915_gem_object_get_dirty_page()"
 2 files changed, 18 insertions(+)
pavel@amd:/data/l/linux-next-32$ git revert 7ac2d2536dfa
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 3877 and retry the command.
hint: Waiting for your editor to close the file...
Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 526af90ea811] Revert "drm/i915/gem: Delete unused code"
 1 file changed, 19 insertions(+)
pavel@amd:/data/l/linux-next-32$ git revert 9e0f9464e2ab
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at least 3877 and retry the command.
hint: Waiting for your editor to close the file...
Editing file: /data/fast/l/linux-next-32/.git/COMMIT_EDITMSG
/home/pavel/bin/emacsf: line 3: ed: command not found
[detached HEAD 173e46213949] Revert "drm/i915/gem: Async GPU relocations only"
 2 files changed, 289 insertions(+), 27 deletions(-)
pavel@amd:/data/l/linux-next-32$

It is now running, it seems unison is the thing that usually triggers
this (due to memory pressure?). This time it survived unison (but
without chromium). I'll really know if it works in a day or two.

Best regards,
                                                                Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi!

> > I think there's been some discussion about reverting that change for
> > other reasons, but it's quite likely the culprit.
>
> Hmm. It reverts cleanly, but the end result doesn't work, because of
> other changes.
>
> Reverting all of
>
>    763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
>    7ac2d2536dfa ("drm/i915/gem: Delete unused code")
>    9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")
>
> seems to at least build.
>
> Pavel, does doing those three reverts make things work for you?

Thanks. I got "[PATCH 1/2] drm/i915/gem: Replace reloc chain with
terminator on..." in my inbox; I believe that's related. Let me try
those, first.

Best regards,
                                                                Pavel
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Tue, Aug 18, 2020 at 6:13 PM Dave Airlie wrote:
>
> I think there's been some discussion about reverting that change for
> other reasons, but it's quite likely the culprit.

Hmm. It reverts cleanly, but the end result doesn't work, because of
other changes.

Reverting all of

   763fedd6a216 ("drm/i915: Remove i915_gem_object_get_dirty_page()")
   7ac2d2536dfa ("drm/i915/gem: Delete unused code")
   9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only")

seems to at least build.

Pavel, does doing those three reverts make things work for you?

                Linus
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
On Wed, 19 Aug 2020 at 10:38, Linus Torvalds wrote:
>
> Ping on this?
>
> The code disassembles to
>
>   24: 8b 85 d0 fd ff ff    mov    -0x230(%ebp),%eax
>   2a:* c7 03 01 00 40 10   movl   $0x10400001,(%ebx) <-- trapping instruction
>   30: 89 43 04             mov    %eax,0x4(%ebx)
>   33: 8b 85 b4 fd ff ff    mov    -0x24c(%ebp),%eax
>   39: 89 43 08             mov    %eax,0x8(%ebx)
>   3c: e9                   jmp    ...
>
> which looks like it is one of the cases in __reloc_entry_gpu(). I *think*
> it's this one:
>
>         } else if (gen >= 3 &&
>                    !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
>                 *batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
>                 *batch++ = addr;
>                 *batch++ = target_addr;
>
> where that "batch" pointer is 0xf8601000, so it looks like it just
> overflowed into the next page that isn't there.
>
> The cleaned-up call trace is
>
>   drm_ioctl+0x1f4/0x38b ->
>     drm_ioctl_kernel+0x87/0xd0 ->
>       i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
>         i915_gem_do_execbuffer+0xaab/0x2780 ->
>           eb_relocate_vma
>
> but there's a lot of inlining going on, so..
>
> The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
> relocations only") but that's going purely by "that seems to be the
> main relocation change this merge window".

I think there's been some discussion about reverting that change for
other reasons, but it's quite likely the culprit.

Maybe we can push for a revert sooner, (cc'ing more of i915 team).

Dave.
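
A minimal sketch of why the trapping instruction bytes `c7 03 01 00 40 10` fit the `MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL` store named above. The macro values are written from memory of the i915 command headers and should be treated as assumptions, not as a quotation of the driver; `mi_instr` and `imm32_from_bytes` are hypothetical helper names. The last four bytes of the instruction are a little-endian 32-bit immediate, and the sketch compares that immediate with the combined command word.

```python
import struct


def mi_instr(opcode: int, flags: int) -> int:
    """Assumed layout of an MI command dword: opcode in bits 23 and up, flags below."""
    return (opcode << 23) | flags


# Assumed values, recalled from the i915 headers rather than taken from this thread.
MI_STORE_DWORD_IMM = mi_instr(0x20, 1)
MI_MEM_VIRTUAL = 1 << 22


def imm32_from_bytes(hex_bytes: str) -> int:
    """Decode four hex bytes as a little-endian unsigned 32-bit value."""
    (value,) = struct.unpack("<I", bytes.fromhex(hex_bytes))
    return value


# Immediate operand of "c7 03 01 00 40 10" (movl $imm32,(%ebx)).
imm = imm32_from_bytes("01 00 40 10")
print(hex(imm))
print(imm == (MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL))
```

If the assumed macro values are right, the comparison holds, which would support the guess about which branch of __reloc_entry_gpu() was executing.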
Re: [Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Ping on this?

The code disassembles to

  24: 8b 85 d0 fd ff ff    mov    -0x230(%ebp),%eax
  2a:* c7 03 01 00 40 10   movl   $0x10400001,(%ebx) <-- trapping instruction
  30: 89 43 04             mov    %eax,0x4(%ebx)
  33: 8b 85 b4 fd ff ff    mov    -0x24c(%ebp),%eax
  39: 89 43 08             mov    %eax,0x8(%ebx)
  3c: e9                   jmp    ...

which looks like it is one of the cases in __reloc_entry_gpu(). I *think*
it's this one:

        } else if (gen >= 3 &&
                   !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
                *batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
                *batch++ = addr;
                *batch++ = target_addr;

where that "batch" pointer is 0xf8601000, so it looks like it just
overflowed into the next page that isn't there.

The cleaned-up call trace is

  drm_ioctl+0x1f4/0x38b ->
    drm_ioctl_kernel+0x87/0xd0 ->
      i915_gem_execbuffer2_ioctl+0xdd/0x360 ->
        i915_gem_do_execbuffer+0xaab/0x2780 ->
          eb_relocate_vma

but there's a lot of inlining going on, so..

The obvious suspect is commit 9e0f9464e2ab ("drm/i915/gem: Async GPU
relocations only") but that's going purely by "that seems to be the
main relocation change this merge window".

                Linus

On Mon, Aug 17, 2020 at 9:11 AM Pavel Machek wrote:
>
> Hi!
>
> After about half an hour of uptime, screen starts blinking on thinkpad
> x60 and machine becomes unusable.
>
> I already reported this in -next, and now it is in mainline. It is
> a 32-bit x86 system.
>
>
> Pavel
>
>
> Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link local (bound):
> [undef]
> Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link remote:
> [AF_INET]87.138.219.28:1194
> Aug 17 17:36:23 amd kernel: BUG: unable to handle page fault for
> address: f8601000
> Aug 17 17:36:23 amd kernel: #PF: supervisor write access in kernel
> mode
> Aug 17 17:36:23 amd kernel: #PF: error_code(0x0002) - not-present page
> Aug 17 17:36:23 amd kernel: *pdpt = 318f2001 *pde =
>
> Aug 17 17:36:23 amd kernel: Oops: 0002 [#1] PREEMPT SMP PTI
> Aug 17 17:36:23 amd kernel: CPU: 1 PID: 3004 Comm: Xorg Not tainted
> 5.9.0-rc1+ #86
> Aug 17 17:36:23 amd kernel: Hardware name: LENOVO 17097HU/17097HU,
> BIOS 7BETD8WW (2.19 ) 03/31/2011
> Aug 17 17:36:23 amd kernel: EIP: eb_relocate_vma+0xcf6/0xf20
> Aug 17 17:36:23 amd kernel: Code: e9 ff f7 ff ff c7 85 c0 fd ff ff ed
> ff ff ff c7 85 c4 fd ff
> ff ff ff ff ff 8b 85 c0 fd ff ff e9 a5 f8 ff ff 8b 85 d0 fd ff ff <c7>
> 03 01 00 40 10 89 43 04
> 8b 85 b4 fd ff ff 89 43 08 e9 9f f7 ff
> Aug 17 17:36:23 amd kernel: EAX: 003c306c EBX: f8601000 ECX: 00847000
> EDX:
> Aug 17 17:36:23 amd kernel: ESI: 00847000 EDI: EBP: f1947c68
> ESP: f19479fc
> Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS:
> 0068 EFLAGS: 00210246
> Aug 17 17:36:23 amd kernel: CR0: 80050033 CR2: f8601000 CR3: 31a1e000
> CR4: 06b0
> Aug 17 17:36:23 amd kernel: Call Trace:
> Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
> Aug 17 17:36:23 amd kernel: ? __mutex_unlock_slowpath+0x2b/0x280
> Aug 17 17:36:23 amd kernel: ? __active_retire+0x7e/0xd0
> Aug 17 17:36:23 amd kernel: ? mutex_unlock+0xb/0x10
> Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
> Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
> Aug 17 17:36:23 amd kernel: ? eb_lookup_vmas+0x1f5/0x9e0
> Aug 17 17:36:23 amd kernel: i915_gem_do_execbuffer+0xaab/0x2780
> Aug 17 17:36:23 amd kernel: ? _raw_spin_unlock_irqrestore+0x27/0x40
> Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
> Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
> Aug 17 17:36:23 amd kernel: ? kvmalloc_node+0x69/0x70
> Aug 17 17:36:23 amd kernel: i915_gem_execbuffer2_ioctl+0xdd/0x360
> Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
> Aug 17 17:36:23 amd kernel: drm_ioctl_kernel+0x87/0xd0
> Aug 17 17:36:23 amd kernel: drm_ioctl+0x1f4/0x38b
> Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
> Aug 17 17:36:23 amd kernel: ? posix_get_monotonic_timespec+0x1c/0x90
> Aug 17 17:36:23 amd kernel: ? ktime_get_ts64+0x7a/0x1e0
> Aug 17 17:36:23 amd kernel: ? drm_ioctl_kernel+0xd0/0xd0
> Aug 17 17:36:23 amd kernel: __ia32_sys_ioctl+0x1ad/0x799
> Aug 17 17:36:23 amd kernel: ? debug_smp_processor_id+0x12/0x20
> Aug 17 17:36:23 amd kernel: ? exit_to_user_mode_prepare+0x4f/0x100
> Aug 17 17:36:23 amd kernel: do_int80_syscall_32+0x2c/0x40
> Aug 17 17:36:23 amd kernel: entry_INT80_32+0x111/0x111
> Aug 17 17:36:23 amd kernel: EIP: 0xb7fbc092
> Aug 17 17:36:23 amd kernel: Code: 00 00 00 e9 90 ff ff ff ff a3 24 00
> 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00
> 00 00 00 00 00 cd 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b
> 1c 24 c3 8d b4 26 00
> Aug 17 17:36:23 amd kernel: EAX: ffda EBX: 000a ECX: c0406469
>
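
A minimal sketch of the arithmetic behind the "overflowed into the next page" reading, assuming 4 KiB pages and 32-bit batch entries. The one-page buffer base used below is purely hypothetical, chosen so that the page after it starts at the faulting address; nothing in the oops states how large the mapped batch actually was, and the helper names are illustrative only.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages, as on 32-bit x86
DWORD_SIZE = 4     # batch entries are 32-bit words


def page_offset(addr: int) -> int:
    """Byte offset of addr within its page."""
    return addr & (PAGE_SIZE - 1)


def dword_address(base: int, index: int) -> int:
    """Address of the index-th dword in a buffer starting at base."""
    return base + index * DWORD_SIZE


def in_mapping(addr: int, base: int, npages: int = 1) -> bool:
    """True if addr lies inside a mapping of npages pages starting at base."""
    return base <= addr < base + npages * PAGE_SIZE


fault_addr = 0xF8601000                 # CR2 / EBX from the oops
assumed_base = fault_addr - PAGE_SIZE   # hypothetical one-page batch mapping

# The faulting write hits the very first byte of a page.
print(page_offset(fault_addr))
# A one-page buffer holds 1024 dwords (indices 0..1023); index 1024 is past the end.
print(hex(dword_address(assumed_base, PAGE_SIZE // DWORD_SIZE)))
print(in_mapping(fault_addr, assumed_base))
```

Under these assumptions the faulting store is exactly the first dword beyond the buffer, which is what one would expect if the relocation code kept emitting commands without checking the remaining space.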
[Intel-gfx] 5.9-rc1: graphics regression moved from -next to mainline
Hi!

After about half an hour of uptime, screen starts blinking on thinkpad
x60 and machine becomes unusable.

I already reported this in -next, and now it is in mainline. It is
a 32-bit x86 system.

                                                                Pavel

Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link local (bound): [undef]
Aug 17 17:36:04 amd ovpn-castor[2828]: UDPv4 link remote: [AF_INET]87.138.219.28:1194
Aug 17 17:36:23 amd kernel: BUG: unable to handle page fault for address: f8601000
Aug 17 17:36:23 amd kernel: #PF: supervisor write access in kernel mode
Aug 17 17:36:23 amd kernel: #PF: error_code(0x0002) - not-present page
Aug 17 17:36:23 amd kernel: *pdpt = 318f2001 *pde =
Aug 17 17:36:23 amd kernel: Oops: 0002 [#1] PREEMPT SMP PTI
Aug 17 17:36:23 amd kernel: CPU: 1 PID: 3004 Comm: Xorg Not tainted 5.9.0-rc1+ #86
Aug 17 17:36:23 amd kernel: Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
Aug 17 17:36:23 amd kernel: EIP: eb_relocate_vma+0xcf6/0xf20
Aug 17 17:36:23 amd kernel: Code: e9 ff f7 ff ff c7 85 c0 fd ff ff ed ff ff ff c7 85 c4 fd ff ff ff ff ff ff 8b 85 c0 fd ff ff e9 a5 f8 ff ff 8b 85 d0 fd ff ff <c7> 03 01 00 40 10 89 43 04 8b 85 b4 fd ff ff 89 43 08 e9 9f f7 ff
Aug 17 17:36:23 amd kernel: EAX: 003c306c EBX: f8601000 ECX: 00847000 EDX:
Aug 17 17:36:23 amd kernel: ESI: 00847000 EDI: EBP: f1947c68 ESP: f19479fc
Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
Aug 17 17:36:23 amd kernel: CR0: 80050033 CR2: f8601000 CR3: 31a1e000 CR4: 06b0
Aug 17 17:36:23 amd kernel: Call Trace:
Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
Aug 17 17:36:23 amd kernel: ? __mutex_unlock_slowpath+0x2b/0x280
Aug 17 17:36:23 amd kernel: ? __active_retire+0x7e/0xd0
Aug 17 17:36:23 amd kernel: ? mutex_unlock+0xb/0x10
Aug 17 17:36:23 amd kernel: ? i915_vma_pin+0xc5/0x8c0
Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
Aug 17 17:36:23 amd kernel: ? eb_lookup_vmas+0x1f5/0x9e0
Aug 17 17:36:23 amd kernel: i915_gem_do_execbuffer+0xaab/0x2780
Aug 17 17:36:23 amd kernel: ? _raw_spin_unlock_irqrestore+0x27/0x40
Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
Aug 17 17:36:23 amd kernel: ? __lock_acquire.isra.31+0x261/0x530
Aug 17 17:36:23 amd kernel: ? kvmalloc_node+0x69/0x70
Aug 17 17:36:23 amd kernel: i915_gem_execbuffer2_ioctl+0xdd/0x360
Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
Aug 17 17:36:23 amd kernel: drm_ioctl_kernel+0x87/0xd0
Aug 17 17:36:23 amd kernel: drm_ioctl+0x1f4/0x38b
Aug 17 17:36:23 amd kernel: ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0
Aug 17 17:36:23 amd kernel: ? posix_get_monotonic_timespec+0x1c/0x90
Aug 17 17:36:23 amd kernel: ? ktime_get_ts64+0x7a/0x1e0
Aug 17 17:36:23 amd kernel: ? drm_ioctl_kernel+0xd0/0xd0
Aug 17 17:36:23 amd kernel: __ia32_sys_ioctl+0x1ad/0x799
Aug 17 17:36:23 amd kernel: ? debug_smp_processor_id+0x12/0x20
Aug 17 17:36:23 amd kernel: ? exit_to_user_mode_prepare+0x4f/0x100
Aug 17 17:36:23 amd kernel: do_int80_syscall_32+0x2c/0x40
Aug 17 17:36:23 amd kernel: entry_INT80_32+0x111/0x111
Aug 17 17:36:23 amd kernel: EIP: 0xb7fbc092
Aug 17 17:36:23 amd kernel: Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
Aug 17 17:36:23 amd kernel: EAX: ffda EBX: 000a ECX: c0406469 EDX: bff0ae3c
Aug 17 17:36:23 amd kernel: ESI: b73aa000 EDI: c0406469 EBP: 000a ESP: bff0adb4
Aug 17 17:36:23 amd kernel: DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200296
Aug 17 17:36:23 amd kernel: ? asm_exc_nmi+0xcc/0x2bc
Aug 17 17:36:23 amd kernel: Modules linked in:
Aug 17 17:36:23 amd kernel: CR2: f8601000
Aug 17 17:36:23 amd kernel: ---[ end trace 2ca9775068bbac06 ]---
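
A minimal sketch of how the `error_code(0x0002)` value in the oops maps onto the "#PF: supervisor write access" and "not-present page" lines, using the architectural x86 page-fault error-code bits. The constant and function names are illustrative, not kernel identifiers.

```python
# x86 page-fault error-code bits (architectural meaning).
PF_PROT = 1 << 0    # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1   # 0: read access, 1: write access
PF_USER = 1 << 2    # 0: supervisor mode, 1: user mode
PF_RSVD = 1 << 3    # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4   # fault was caused by an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Split a page-fault error code into human-readable fields."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
    }


print(decode_pf_error_code(0x0002))
```

For 0x0002 only the write bit is set, so the fault is a supervisor-mode write to a page that is not mapped, consistent with the faulting address in CR2.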