[PATCH] drm/amd/display: Add NULL check for panel_cntl in dce110_edp_backlight_control

2024-09-15 Thread Mikhail Arkhipov
_cntl is not properly set, preventing any attempts to dereference a NULL pointer and avoiding potential crashes. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 06ddcee49a35 ("drm/amd/display: Added multi instance support for panel control") Signed-off-by: Mikhail Arkh

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-10 Thread Mikhail Gavrilov
On Tue, Sep 10, 2024 at 8:47 PM Leo Li wrote: > > Thanks Mikhail, I think I know what's going on now. > > The `scale-monitor-framebuffer` experimental setting is what puts us down the > bad code path. It seems VRR has nothing to do with this issue, just setting > `scale-

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-08 Thread Mikhail Gavrilov
On Sat, Sep 7, 2024 at 12:47 AM Leo Li wrote: > > > Hi Mikhail, > > I've tried to align my system with yours as best as I can, but so far, I've had > no luck reproducing the hang. A video of what I'm doing: > https://youtu.be/VeD-LPCnfWM?si=b2baF8MyDB

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-04 Thread Mikhail Gavrilov
tch was definitely not enough. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-04 Thread Mikhail Gavrilov
d jobs CC [M] drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxnv50.o *** make[5]: *** [scripts/Makefile.build:485: drivers/gpu/drm/amd/amdgpu] Error 2 make[4]: *** [scripts/Makefile.build:485: drivers/gpu/drm] Error 2 make[3]: *** [scripts/Makefile.build:485: drivers/gpu] Error 2 make[2]: *** [scri

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-09-02 Thread Mikhail Gavrilov
On Sun, Aug 25, 2024 at 2:12 AM Mikhail Gavrilov wrote: > > Hi, > Is anyone trying to look into it? > I continue to reproduce this issue on fresh kernel builds 6.11-rc4+. > In addition to the RenPy engine, the problem also reproduces on games > from Ubisoft, such as Far Cry 4.

Re: 6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-08-24 Thread Mikhail Gavrilov
On Mon, Aug 5, 2024 at 11:05 PM Mikhail Gavrilov wrote: > > Hi, > After commit 1b04dcca4fb1, launching some RenPy games causes computer hang. > After the hang, even Alt + sysrq + REISUB can't reboot the computer! > And no trace in the kernel log! > For demonstration, I

6.11/regression/bisected - after commit 1b04dcca4fb1, launching some RenPy games causes computer hang

2024-08-05 Thread Mikhail Gavrilov
s 100% reproducibility for this issue. You can find it in the Steam Store: https://store.steampowered.com/app/2946010/Find_the_Orange_Narwhal/ I uploaded a demonstration video to YouTube: https://youtu.be/yVW6rImRpXw Unfortunately, I can't check the revert of commit 1541d63c5fe2 because of conflicts.

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-08-02 Thread Mikhail Gavrilov
On Wed, Jul 24, 2024 at 10:16 PM Mikhail Gavrilov wrote: > > https://patchwork.freedesktop.org/patch/605201/ > For which kernel is this patch intended? The patch is not applied on > top of d67978318827. I am able to apply this patch on top of e4fc196f5ba3 and the issue is gone

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-07-24 Thread Mikhail Gavrilov
is this patch intended? The patch is not applied on top of d67978318827. mikhail@primary-ws ~/p/g/linux-3 (master)> git reset d67978318827 --hard HEAD is now at d67978318827 Merge tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip mikhail@primary-ws

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-16 Thread Mikhail Gavrilov
On Tue, Jul 16, 2024 at 10:10 PM Alex Deucher wrote: > > Does the attached partial revert fix it? > > Alex > Yes, thanks. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-10 Thread Mikhail Gavrilov
On Wed, Jul 10, 2024 at 12:01 PM Mikhail Gavrilov wrote: > > On Tue, Jul 9, 2024 at 7:48 PM Rodrigo Siqueira Jordao > wrote: > > Hi, > > > > I also tried it with 6900XT. I got the same results on my side. > > This is weird. > > > Anyway, I could not rep

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-10 Thread Mikhail Gavrilov
mware information: sudo cat > /sys/kernel/debug/dri/0/amdgpu_firmware_info > sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info [sudo] password for mikhail: VCE feature version: 0, firmware version: 0x UVD feature version: 0, firmware version: 0x MC feature version: 0, firmwa

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-29 Thread Mikhail Gavrilov
On Sat, Jun 29, 2024 at 9:46 PM Rodrigo Siqueira Jordao wrote: > Hi Mikhail, > > I'm trying to reproduce this issue, but until now, I've been unable to > reproduce it. I tried some different scenarios with the following > components: > > 1. Displays: I tried with on

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-21 Thread Mikhail Gavrilov
On Fri, Jun 21, 2024 at 12:56 PM Linux regression tracking (Thorsten Leemhuis) wrote: > Hmmm, I might have missed something, but it looks like nothing happened > here since then. What's the status? Is the issue still happening? Yes. Tested on e5b3efbe1ab1. I spotted that the problem disappears a

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-10 Thread Mikhail Gavrilov
On Fri, Jun 7, 2024 at 5:29 PM Linux regression tracking (Thorsten Leemhuis) wrote: > > [CCing the other amd drm maintainers] > > Mikhail: are those details in any way relevant? Then in the future best > leave them out (or make things easier to follow), they make the bug > re

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-09 Thread Mikhail Gavrilov
On Fri, Jun 7, 2024 at 6:39 PM Alex Deucher wrote: > > --- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c > +++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c > @@ -944,7 +944,7 @@ void optc1_set_drr( > OTG_V_TOTAL_MAX_SEL, 1, >

Re: 6.10/regression/bisected - commit a68c7eaa7a8f cause *ERROR* Trying to clear memory with ring turned off in amdgpu_fill_buffer.

2024-06-09 Thread Mikhail Gavrilov
On Fri, May 17, 2024 at 8:59 PM Mikhail Gavrilov wrote: > > Thanks, Arun. > With the patch this error did not appear anymore. > Tested-by: Mikhail Gavrilov on 7900XTX > hardware. > I see that this patch do the same but more correctly: https://gitlab.freedesktop.org

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-05 Thread Mikhail Gavrilov
On Sun, May 26, 2024 at 7:06 PM Mikhail Gavrilov wrote: > > Hi, > Day before yesterday I replaced 7900XTX to 6900XT for got clear in > which kernel first time appeared warning message "DMA-API: amdgpu > :0f:00.0: cacheline tracking EEXIST, overlapping mappings aren't

6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-05-26 Thread Mikhail Gavrilov
Hi, Day before yesterday I replaced 7900XTX to 6900XT for got clear in which kernel first time appeared warning message "DMA-API: amdgpu :0f:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported". The kernel 6.3 and older won't boot on a computer with Radeon 7900XTX. When I boo

Re: 6.10/regression/bisected - commit a68c7eaa7a8f cause *ERROR* Trying to clear memory with ring turned off in amdgpu_fill_buffer.

2024-05-17 Thread Mikhail Gavrilov
Thanks, Arun. With the patch this error did not appear anymore. Tested-by: Mikhail Gavrilov on 7900XTX hardware. -- Best Regards, Mike Gavrilov.

Re: regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 leads to GPU hang when I open GNOME activities

2024-01-24 Thread Mikhail Gavrilov
On Wed, Jan 24, 2024 at 7:19 AM Mikhail Gavrilov wrote: > > Who could dig into it, please? You decided to revert it? https://lkml.org/lkml/2024/1/22/1866 Also I forgot to attach the kernel build .config in the previous message. I'm going to fix it here. It may be useful for reprodu

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-12-19 Thread Mikhail Gavrilov
On Fri, Dec 15, 2023 at 5:37 PM Christian König wrote: > > I have no idea :) > > From the logs I can see that the AMDGPU now has the proper BARs assigned: > > [5.722015] pci :03:00.0: [1002:73df] type 00 class 0x038000 > [5.722051] pci :03:00.0: reg 0x10: [mem > 0xf8-0xfbf

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-12-18 Thread Mikhail Gavrilov
On Fri, Dec 15, 2023 at 9:14 PM Hamza Mahfooz wrote: > > Can you try the following patch with old fw (version 0x07002100 should > be fine)?: https://patchwork.freedesktop.org/patch/572298/ > Tested-by: Mikhail Gavrilov on 7900XTX hardware. Can I ask? What does SubVP actually d

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-12-15 Thread Mikhail Gavrilov
On Tue, Feb 28, 2023 at 5:43 PM Christian König wrote: > > The point is it doesn't need to talk to the amdgpu hardware. What it > does is that it talks to the good old VGA/VESA emulation and that just > happens to be still enabled by the BIOS/GRUB. > > And that VGA/VESA emulation doesn't need any

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-17 Thread Mikhail Gavrilov
he first one patch is enough. Tested-on: 7900XTX, 6900XT and 6800M. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-15 Thread Mikhail Gavrilov
On Wed, Nov 15, 2023 at 11:39 PM Lee, Alvin wrote: > > This change has a DMCUB dependency - are you able to update your DMCUB > version as well? > I can confirm this issue was gone after updating firmware. ❯ dmesg | grep DMUB [ 11.496679] [drm] Loading DMUB firmware via PSP: version=0x0700230

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-15 Thread Mikhail Gavrilov
On Wed, Nov 15, 2023 at 11:14 PM Hamza Mahfooz wrote: > > What version of DMUB firmware are you on? > The easiest way to find out would be using the following: > > # dmesg | grep DMUB > Sapphire AMD Radeon RX 7900 XTX PULSE OC: ❯ dmesg | grep DMUB [ 14.341362] [drm] Loading DMUB firmware via PS

Re: regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-15 Thread Mikhail Gavrilov
On Tue, Nov 14, 2023 at 11:03 PM Mikhail Gavrilov wrote: > > On Tue, Nov 14, 2023 at 3:55 PM Mikhail Gavrilov > wrote: > > > > Hi, > > Yesterday came the 6.7-rc1 kernel. > > And surprisingly it turned out it is not working with my LG C3. > > I use this O

regression/bisected/6.7rc1: Instead of desktop I see a horizontal flashing bar with a picture of the desktop background on white screen

2023-11-14 Thread Mikhail Gavrilov
Hi, Yesterday came the 6.7-rc1 kernel. And surprisingly it turned out it is not working with my LG C3. I use this OLED TV as my primary monitor. After login to GNOME I see a horizontal flashing bar with a picture of the desktop background on white screen. Demonstration: https://youtu.be/7F76VfRkrVo

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-07 Thread Mikhail Gavrilov
On Wed, Nov 8, 2023 at 12:12 AM Alex Deucher wrote: > > The attached patch should fix it. Not sure why your GPU shows up as > busy. The AGP aperture was just disabled. Tested-by: Mikhail Gavrilov Thanks, after applying the patch GPU loading meets expectations. Games are working so ov

Re: 6.7/regression/KASAN: null-ptr-deref in amdgpu_ras_reset_error_count+0x2d6

2023-11-07 Thread Mikhail Gavrilov
On Mon, Nov 6, 2023 at 8:29 PM Alex Deucher wrote: > > Already fixed in this commit: > https://gitlab.freedesktop.org/agd5f/linux/-/commit/d1d4c0b7b65b7fab2bc6f97af9e823b1c42ccdb0 > Which is included in last week's PR. > Thanks, it fixed the issue above. But, unfortunately this is not the only

Re: [bug/bisected] commit a2848d08742c8e8494675892c02c0d22acbe3cf8 cause general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI

2023-07-18 Thread Mikhail Gavrilov
On Tue, Jul 18, 2023 at 7:13 AM Chen, Guchun wrote: > > [Public] > > Hello Mike, > > I guess this patch can resolve your problem. > https://patchwork.freedesktop.org/patch/547897/ > > Regards, > Guchun > Tested-by: Mikhail Gavrilov Thanks, the issue was go

Re: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

2023-07-16 Thread Mikhail Gavrilov
On Fri, Jul 14, 2023 at 4:09 PM Chen, Guchun wrote: > > Thanks for your patience on this, Mike. I think > https://patchwork.freedesktop.org/patch/547592/ can help this, please take a > try. Tested-by: Mikhail Gavrilov Thanks it looks good. I spent the whole weekend with these pa

Re: [regression][6.5] KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

2023-07-07 Thread Mikhail Gavrilov
On Fri, Jul 7, 2023 at 6:01 AM Chen, Guchun wrote: > > [Public] > > Hi Mike, > > Yes, we are aware of this problem, and we are working on that. The problem is > caused by recent code stores xcp_id to amdgpu bo for accounting memory usage > and so on. However, not all VMs are attached to that lik

Re: [6.4-rc7][regression] slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]

2023-06-21 Thread Mikhail Gavrilov
On Wed, Jun 21, 2023 at 12:47 PM Zhu, Jiadong wrote: > > [AMD Official Use Only - General] > > Hi, > > It is fixed on > https://patchwork.freedesktop.org/patch/542647/?series=119384&rev=2 > > Could you make sure if this patch is included. > I confirm this patch fixes the issue. But this patch i

[6.4-rc7][regression] slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]

2023-06-21 Thread Mikhail Gavrilov
Hi, after commit 5b711e7f9c73e5ff44d6ac865711d9a05c2a0360 I see KASAN sanitizer bug message at every boot: Backtrace: [ 18.600551] == [ 18.600558] BUG: KASAN: slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgp

Re: [PATCH 2/2] drm/amdgpu: make sure that BOs have a backing store

2023-06-06 Thread Mikhail Gavrilov
n range [0x0000000000000010-0x0000000000000017]" finally fixed. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] - RIP: 0010:amdgpu_bo_get_memory+0x80/0x360 [amdgpu]

2023-05-18 Thread Mikhail Gavrilov
On Mon, May 8, 2023 at 3:40 PM Mikhail Gavrilov wrote: > > No one can reproduce this? > I prepared a video instruction which can helps: > https://youtu.be/0ipQnMpZG1Y > > 1. Run script which would calculate watchers: > $ for i in {1..9}; do sudo curl -s > https://r

Re: KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] - RIP: 0010:amdgpu_bo_get_memory+0x80/0x360 [amdgpu]

2023-05-08 Thread Mikhail Gavrilov
On Fri, May 5, 2023 at 6:44 PM Mikhail Gavrilov wrote: > I need to say that it may not be easy to reproduce this bug. > For helping reproduce: > 1. I looped script above: > $ for i in {1..9}; do sudo curl -s > https://raw.githubusercontent.com/fatso83/dotfiles/master/utils/

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-25 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov wrote: > > Important don't give up. > https://youtu.be/25zhHBGIHJ8 [40 min] > https://youtu.be/utnDR26eYBY [50 min] > https://youtu.be/DJQ_tiimW6g [12 min] > https://youtu.be/Y6AH1oJKivA [6 min] > Yes the issue is everyth

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-20 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 2:59 PM Christian König wrote: > Could you try drm-misc-next as well? If as I assume I cloned right repo $ git clone -b drm-misc-next git://anongit.freedesktop.org/drm/drm-misc linux-drm-misc-next for my hardware last commit on this branch is turned out completely unworkin

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-20 Thread Mikhail Gavrilov
On Thu, Apr 20, 2023 at 2:59 PM Christian König wrote: > > Could you try drm-misc-next as well? > > Going to give drm-fixes another round of testing. > > Thanks, > Christian. Important don't give up. https://youtu.be/25zhHBGIHJ8 [40 min] https://youtu.be/utnDR26eYBY [50 min] https://youtu.be/DJQ_

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-19 Thread Mikhail Gavrilov
On Wed, Apr 19, 2023 at 1:12 PM Christian König wrote: > > I'm already looking into this, but can't figure out why we run into > problems here. > > What happens is that a CS is aborted without sending the job to the > scheduler and in this case the cleanup function doesn't seem to work. > > Christ

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-19 Thread Mikhail Gavrilov
Christian? ❯ /usr/src/kernels/6.3.0-0.rc7.56.fc39.x86_64/scripts/faddr2line /lib/debug/lib/modules/6.3.0-0.rc7.56.fc39.x86_64/kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.debug drm_sched_job_cleanup+0x9a drm_sched_job_cleanup+0x9a/0x130: drm_sched_job_cleanup at /usr/src/debug/kernel-6.3-rc7/linu

Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]

2023-04-14 Thread Mikhail Gavrilov
On Tue, Apr 11, 2023 at 10:40 PM Mikhail Gavrilov wrote: > > Hi, > KASAN continues to find problems in the drm_sched_job_cleanup code in 6.3-rc6. > I have not got any feedback in the thread > https://lore.kernel.org/lkml/cabxgcsmvub2ra4d+k5cna0_2521tox++d4nmoukki4x2-q_...@mail.gmail.com/

Re: BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0 [gpu_sched]

2023-04-04 Thread Mikhail Gavrilov
On Fri, Mar 24, 2023 at 7:37 PM Christian König wrote: > > Yeah, that one > > Thanks for the info, looks like this isn't fixed. > > Christian. > Hi, glad to see that "BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0" was fixed in 6.3-rc5. For history it would be good to kn

Re: BUG: KASAN: slab-use-after-free in drm_sched_get_cleanup_job+0x47b/0x5c0 [gpu_sched]

2023-03-23 Thread Mikhail Gavrilov
On Tue, Mar 21, 2023 at 11:47 PM Christian König wrote: > > Hi Mikhail, > > That looks like a reference counting issue to me. > > I'm going to take a look, but we have already fixed one of those recently. > > Probably best that you try this on drm-fixes, just to doubl

[6.3][regression] commit a4e771729a51168bc36317effaa9962e336d4f5e lead to flood kernel logs with warning messages "at kernel/workqueue.c:3167 __flush_work+0x472/0x500"

2023-03-08 Thread Mikhail Gavrilov
Hi, I haven't faced the drm_bridge_hpd_enable+0x94/0x9c [drm] issue myself, but the fix for it leads to warning messages on my laptop, an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007, which has two AMD GPUs: a discrete Radeon 6800M and a Vega 8 integrated in the Cezanne CPU. I found the bad commit by bisecting: ❯ git

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-28 Thread Mikhail Gavrilov
On Mon, Feb 27, 2023 at 3:22 PM Christian König wrote: > > Unfortunately yes. We could clean that up a bit more so that you don't > run into a BUG() assertion, but what essentially happens here is that we > completely fail to talk to the hardware. > > In this situation we can't even re-enable vesa or text

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-24 Thread Mikhail Gavrilov
On Fri, Feb 24, 2023 at 8:31 PM Christian König wrote: > > Sorry I totally missed that you attached the full dmesg to your original > mail. > > Yeah, the driver did fail gracefully. But then X doesn't come up and > then gdm just dies. Are you sure that these messages should be present when the dr

Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-24 Thread Mikhail Gavrilov
On Fri, Feb 24, 2023 at 12:13 PM Christian König wrote: > > Hi Mikhail, > > this is pretty clearly a problem with the system and/or its BIOS and > not the GPU hw or the driver. > > The option pci=nocrs makes the kernel ignore additional resource windows > the BIOS re

amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

2023-02-23 Thread Mikhail Gavrilov
Hi, I have an ASUS ROG Strix G15 Advantage Edition G513QY-HQ007 laptop. But it is impossible to use without AC power because the system loses the NVMe drive when I disconnect the power adapter. Messages from the kernel log when it happens: nvme nvme0: controller is down; will reset: CSTS=0x, PCI_STATUS=0

Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.

2023-02-17 Thread Mikhail Gavrilov
On Fri, Feb 17, 2023 at 8:30 PM Alex Deucher wrote: > > On Fri, Feb 17, 2023 at 1:10 AM Mikhail Gavrilov > wrote: > > > > On Fri, Dec 9, 2022 at 7:37 PM Leo Liu wrote: > > > > > > Please try the latest AMDGPU driver: > > > > > > https:/

Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.

2023-02-16 Thread Mikhail Gavrilov
On Fri, Dec 9, 2022 at 7:37 PM Leo Liu wrote: > > Please try the latest AMDGPU driver: > > https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next/ > Sorry Leo, I missed your message. This issue is still present in 6.2-rc8. In my first message I was mistaken. > Before kernel 5.1

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2023-02-13 Thread Mikhail Gavrilov
On Thu, Feb 9, 2023 at 10:17 PM Leo Li wrote: > > Hi Mikhail, seems like your report flew past me, thanks for the ping. > > This might be a simple issue of not backing off when deadlock was hit. > drm_atomic_normalize_zpos() can return an error code, and I ignored it > (oops!

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2023-02-09 Thread Mikhail Gavrilov
9be6ebc61e will stop these warnings. I also attached fresh logs from 6.2.0-0.rc6. 6.2-rc7 I started to build without commit b261509952bc19d1012cf732f853659be6ebc61e to avoid these warnings. On Thu, Oct 13, 2022 at 6:36 PM Mikhail Gavrilov wrote: > > Hi! > I bisected an issue of the 6.0 kernel whic

Re: [PATCH] drm/amd: fix memory leak in amdgpu_cs_sync_rings

2023-02-03 Thread Mikhail Gavrilov
if (r) > return r; > } > -- > 2.39.1 > As a bug reporter I can confirm this patch fixes a memory leak. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: [PATCH] drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency

2023-01-06 Thread Mikhail Gavrilov
submitted in 6.2! Why? It will close my questions about amdgpu right now. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

[6.2][regression] looks like commit aab9cf7b6954136f4339136a1a7fc0602a2c4d8b leads to use-after-free and random computer hangs

2022-12-18 Thread Mikhail Gavrilov
Hi, The kernel 6.2 preparation cycle has begun. And after the kernel was updated on my Fedora Rawhide I started receiving use-after-free errors with complete computer hangs. At least a good reproducer of this behaviour is launch of the game "Marvel's Avengers". The backtrace of the issue looks lik

Re: Screen corruption using radeon kernel driver

2022-12-10 Thread Mikhail Krylov
On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote: > On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy wrote: > > > > On 2022-11-30 14:28, Alex Deucher wrote: > > > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy wrote: > > >> > > >> On 2022-1

Re: [bug][vaapi][h264] The commit 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 on certain video files leads to problems with VAAPI hardware decoding.

2022-12-07 Thread Mikhail Gavrilov
On Wed, Dec 7, 2022 at 7:58 PM Alex Deucher wrote: > > > What GPU do you have and what entries do you have in > sys/class/drm/card0/device/ip_discovery/die/0/UVD for the device? I bisected the issue on the Radeon 6800M. Parent commit for 7cbe08a930a132d84b4cf79953b00b074ec7a2a7 is 46dd2965bdd1c5

Re: Screen corruption using radeon kernel driver

2022-12-01 Thread Mikhail Krylov
On Thu, Dec 01, 2022 at 02:00:58PM +, Robin Murphy wrote: > On 2022-11-30 19:59, Mikhail Krylov wrote: > > On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote: > > > On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy > > > wrote: > > > > > >

Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Mikhail Krylov
On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote: > On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy wrote: > > > > On 2022-11-30 14:28, Alex Deucher wrote: > > > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy wrote: > > >> > > >> On 2022-1

Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Mikhail Krylov
On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote: > On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov wrote: > > > > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote: > > > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov wrote: > > > >

Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Mikhail Krylov
On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote: > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov wrote: > > > > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote: > > > > >>> [excessive quoting removed] > > > > >> So

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-28 Thread Mikhail Gavrilov
On Tue, Nov 22, 2022 at 12:16 PM Christian König wrote: > > Ah, thanks a lot for this. I've already pushed the patches into our > internal branch, but getting this confirmation is still great! > > This was quite some fundamental bug in the handling and I hope to get > this completely reworked at s

Re: Screen corruption using radeon kernel driver

2022-11-28 Thread Mikhail Krylov
> >> >> after the New Year for that. >> >> >> >> >> >> Is it at all possible that such a patch will be merged into kernel? >>

Re: Screen corruption using radeon kernel driver

2022-11-28 Thread Mikhail Krylov
On Mon, Apr 25, 2022 at 01:22:04PM -0400, Alex Deucher wrote: > + dri-devel > > On Mon, Apr 25, 2022 at 3:33 AM Krylov Michael wrote: > > > > Hello! > > > > After updating my Linux kernel from version 4.19 (Debian 10 version) to > > 5.10 (packaged with Debian 11), I've noticed that the image > >

Re: [regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2022-11-22 Thread Mikhail Gavrilov
On Thu, Oct 13, 2022 at 6:36 PM Mikhail Gavrilov wrote: > > Hi! > I bisected an issue of the 6.0 kernel which started happening after > 6.0-rc7 on all my machines. > > Backtrace of this issue looks like as: > > [ 2807.339439] [ cut here ] > [ 28

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-21 Thread Mikhail Gavrilov
letely gone. All known broken games working. Tested-by: Mikhail Gavrilov The only thing I don't like is the flood in the kernel logs of the message "WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70", but this is not related to the patches bei

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-11-02 Thread Mikhail Gavrilov
On Tue, Nov 1, 2022 at 10:52 PM Christian König wrote: > > Let's focus on one problem at a time. > > The issue here is that somehow userptr handling became racy after we > removed the lock, but I don't see why. > > We need to fix this ASAP since it is probably a much wider problem and > the additi

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-30 Thread Mikhail Gavrilov
On Wed, Oct 26, 2022 at 12:29 PM Christian König wrote: > > Attached is the original test patch rebased on current amd-staging-drm-next. > > Can you test if this is enough to make sure that the games start without > crashing by fetching the userptrs? 1. Over the past week the list of games affect

Re: [6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-21 Thread Mikhail Gavrilov
On Fri, Oct 21, 2022 at 1:33 PM Christian König wrote: > > Hi, > > yes Bas already reported this issue, but I couldn't reproduce it. Need > to come up with a patch to narrow this down further. > > Can I send you something to test? I would appreciate to test any patches and ideas. -- Best Regard

[6.1][regression] after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start

2022-10-21 Thread Mikhail Gavrilov
Hi! I found that some games (Cyberpunk 2077, Forza Horizon 4/5) hang at start after commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6. dd80d9c8eecac8c516da5b240d01a35660ba6cb6 is the first bad commit commit dd80d9c8eecac8c516da5b240d01a35660ba6cb6 Author: Christian König Date: Thu Jul 14 10:23:38

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-10-17 Thread Mikhail Gavrilov
On Wed, May 11, 2022 at 5:01 PM Christian König wrote: > > > We have implemented a workaround, but still don't know the exact root cause. > > If anybody wants to look into this it would be rather helpful to be able > to reproduce the issue. > > Regards, > Christian. I see that issue was returned

[regression][6.0] After commit b261509952bc19d1012cf732f853659be6ebc61e I see WARNING message at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks+0x63/0x70

2022-10-13 Thread Mikhail Gavrilov
Hi! I bisected an issue of the 6.0 kernel which started happening after 6.0-rc7 on all my machines. Backtrace of this issue looks like as: [ 2807.339439] [ cut here ] [ 2807.339445] WARNING: CPU: 11 PID: 2061 at drivers/gpu/drm/drm_modeset_lock.c:276 drm_modeset_drop_locks

[regression][6.1] After commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86 system randomly hungs

2022-10-11 Thread Mikhail Gavrilov
Hi! The hungs occurs randomly, but I found good reproductive scenario (This is running the campaign in the game Halo Infinite) The backtrace is look like this: [ 147.260971] BUG: kernel NULL pointer dereference, address: 0088 [ 147.260987] [ cut here ] [ 147.

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-09-19 Thread Mikhail Gavrilov
Hi! Unfortunately the use-after-free issue still happens on the 6.0-rc5 kernel. The issue became hard to repeat. I spent the whole day at the computer when use-after-free again happened, I was playing the game Tiny Tina's Wonderlands. Therefore, forget about repeatability. It remains only to hope f

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-24 Thread Mikhail Gavrilov
On Fri, Aug 19, 2022 at 5:13 PM Maíra Canal wrote: > > Hi Mikhail, > > Could you please specify the steps to reproduce this use-after-free? I > will try to reproduce it on the RX5700 XT and bisect the issue. > Hi Maíra, thanks for help. I'm afraid that it will be u

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal wrote: > > Hi Mikhail, > > Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial > revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the > error. Try reverting it and check if

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen wrote: > > Hi Mikhail, > > IIUC, you got this second user-after-free by applying the first version > of Maíra's patch, right? So, that version was adding another unbalanced > unlock to the cs ioctl flow, but it was solved in the

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-16 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov wrote: > > Thanks, I tested this patch. > But with this patch use-after-free problem happening in another place: Does anyone have an idea why the second use-after-free happened? From the trace I don't understand which code is

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 5:20 AM Maíra Canal wrote: > > Hi Mikhail > > Looks like this use-after-free problem was introduced on > 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems > like: if amdgpu_cs_vm_handling return r != 0, then it will unlock > bo_

[BUG][5.20] refcount_t: underflow; use-after-free

2022-08-14 Thread Mikhail Gavrilov
Hi folks. Joined testing 5.20 today (7ebfc85e2cd7). I encountered a frequently GPU freeze, after which a message appears in the kernel logs: [ 220.280990] [ cut here ] [ 220.281000] refcount_t: underflow; use-after-free. [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcoun

Re: Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu

2022-07-19 Thread Mikhail Gavrilov
On Tue, Jul 19, 2022 at 4:26 PM Mikhail Gavrilov wrote: > In the kernel log there is no error so it is most likely a user space issue , > but I am not > sure about it. But I am confused by the message in the kernel log: [ 1962.000909] amdgpu: HIQ MQD's queue_doorbell_id0 i

Re: Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu

2022-07-19 Thread Mikhail Gavrilov
On Tue, Jul 19, 2022 at 1:40 PM Mike Lothian wrote: > > I was told that this patch replaces the patch you mentioned > https://patchwork.freedesktop.org/series/106078/ and it the one > that'll hopefully land in Linus's tree > Great, I confirm that both patches solve the issue. As I understand the

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-18 Thread Mikhail Gavrilov
On Wed, Jul 13, 2022 at 5:38 PM Mikhail Gavrilov wrote: > # first bad commit: [9cbbd694a58bdf24def2462276514c90cab7cf80] Merge > drm/drm-next into drm-misc-next > Don't know who to thank but the issue disappeared in 5.19 rc7. -- Best Regards, Mike Gavrilov.

Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu

2022-07-18 Thread Mikhail Gavrilov
Hi guys I continue testing 5.19 rc7 and found the bug. Command "clinfo" causes BUG: kernel NULL pointer dereference, address: 0000000000000008 on driver amdgpu. Here is trace: [ 1320.203332] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 1320.203338] #PF: supervisor read access

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-13 Thread Mikhail Gavrilov
On Sat, Jul 9, 2022 at 5:10 PM Mikhail Gavrilov wrote: > Hi Christian, > if you read my initial post. You should see that I tried to bisect the issue. > But it is very problematic because on each step I see different symptomes. > And if mark different symptoms with skip step we got a

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-09 Thread Mikhail Gavrilov
On Thu, Jul 7, 2022 at 2:50 PM Christian König wrote: > > Am 07.07.22 um 02:20 schrieb Mikhail Gavrilov: > > On Tue, Jun 28, 2022 at 2:21 PM Mikhail Gavrilov > > wrote: > > Christian can you look why > > drm_aperture_remove_conflicting_pci_framebuffers cause thi

Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-07-06 Thread Mikhail Gavrilov
On Tue, Jun 28, 2022 at 2:21 PM Mikhail Gavrilov wrote: > Christian can you look why drm_aperture_remove_conflicting_pci_framebuffers cause this kernel bug on my machine? [6.822385] amdgpu: Ignoring ACPI CRAT on non-APU system [6.822462] amdgpu: Virtual CRAT table created for

[Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

2022-06-28 Thread Mikhail Gavrilov
Hi guys. Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode instead I see black screen with constantly glowing cursor. Demonstration: https://youtu.be/rGL4LsHMae4 In the kernel logs there are references to hung processes: [ 149.363465] rfkill: input handler disabled

Re: Screen corruption using radeon kernel driver

2022-05-16 Thread Mikhail Krylov
On Mon, Apr 25, 2022 at 01:22:04PM -0400, Alex Deucher wrote: > + dri-devel > > On Mon, Apr 25, 2022 at 3:33 AM Krylov Michael wrote: > > > > Hello! > > > > After updating my Linux kernel from version 4.19 (Debian 10 version) to > > 5.10 (packaged with Debian 11), I've noticed that the image > >

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-05-11 Thread Mikhail Gavrilov
On Fri, Apr 15, 2022 at 1:04 PM Christian König wrote: > > No, I just couldn't find time during all that bug fixing :) > > Sorry for the delay, going to take a look after the eastern holiday here. > > Christian. The message is just for history. The issue was fixed between b253435746d9a4a and 5.18

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-14 Thread Mikhail Gavrilov
On Sat, Apr 9, 2022 at 7:27 PM Christian König wrote: > > That's unfortunately not the end of the story. > > This is fixing your problem, but reintroducing the original problem that > we call the syncobj with a lock held which can crash badly as well. > > Going to take a closer look on Monday. I h

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-08 Thread Mikhail Gavrilov
ers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" has gone. Thanks. Tested-by: Mikhail Gavrilov -- Best Regards, Mike Gavrilov.

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-08 Thread Mikhail Gavrilov
On Fri, 8 Apr 2022 at 16:13, Christian König wrote: > I own you a beer. > > I still don't know what happens here, but that makes at least a bit more > sense than a patch which only changes comments :) > > Looks like we are missing something here. Can I send you a patch to try > something later to

Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some ga

2022-04-08 Thread Mikhail Gavrilov
Hi Christian > those are two independent and already known problems. > > The warning triggered from the sync_file is already fixed in > drm-misc-next-fixes, but so far I couldn't figure out why the games > suddenly doesn't work any more. I thought that these warnings are related to the stuck of t

[Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games

2022-04-03 Thread Mikhail Gavrilov
Hi, Between commits ed4643521e6a and 34af78c4e616 something was broken. I noted that kernel log flooded with warning message "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" when some games are running: "Resident Evil Village", "Marvel's Aven
