
[PATCH v2 0/4] drm/xe: Support PCIe FLR

2024-04-17 Thread Aravind Iddamsetty

Resending with the r-b's added and to get an ack from the DRM maintainers
for the first patch in the series.

The PCI subsystem provides callbacks to inform the driver about a function
level reset (FLR) requested by the user, initiated by writing to the sysfs
entry /sys/bus/pci/devices/.../reset. This allows the driver to handle FLR
without an unbind and rebind, since the driver needs to reinitialize the
device afresh after FLR.
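
For readers who want to trigger the reset from a test program rather than a shell, the sysfs write described above could be sketched in userspace C roughly as follows. This is only an illustration: the helper names pci_reset_path and pci_request_reset are made up for this sketch and are not part of the series, and the device address passed in is whatever entry exists under /sys/bus/pci/devices/ on the test machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs reset attribute path for a PCI device.
 * `bdf` is the device address as listed under /sys/bus/pci/devices/.
 * Returns 0 on success, -1 if `bdf` is empty or the path does not fit.
 * (Hypothetical helper, for illustration only.)
 */
static int pci_reset_path(const char *bdf, char *buf, size_t len)
{
	int n;

	assert(bdf != NULL && buf != NULL);
	if (strlen(bdf) == 0)
		return -1;

	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/reset", bdf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Request a reset by writing "1" to the attribute, as `echo 1 > reset` does.
 * Needs root. Returns 0 on success, -1 on any I/O error.
 * (Hypothetical helper, for illustration only.)
 */
static int pci_request_reset(const char *path)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs("1", f) == EOF) ? -1 : 0;
	if (fclose(f) == EOF)	/* the write may only fail at flush time */
		rc = -1;
	return rc;
}
```

A caller would build the path with pci_reset_path() and pass it to pci_request_reset(). Checking the fclose() result matters here because stdio buffers the write, so an error returned by the kernel may only become visible at flush time.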

v2:
1. Directly expose the devm_drm_dev_release_action instead of introducing
a helper (Rodrigo)
2. Separate out gt idle and pci save/restore into a separate patch (Lucas)
3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini

Cc: Rodrigo Vivi 
Cc: Lucas De Marchi 

dmesg snip showing FLR recovery:

[  590.486336] xe :4d:00.0: enabling device (0140 -> 0142)
[  590.506933] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.542355] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.578532] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
0x0020
[  590.578556] xe :4d:00.0: [drm] VRAM[0, 0]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.578560] xe :4d:00.0: [drm] VRAM[0, 0]: DPA range:
[0x-10], io range:
[0x2020-202fff00]
[  590.578585] xe :4d:00.0: [drm] VRAM[1, 1]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.578589] xe :4d:00.0: [drm] VRAM[1, 1]: DPA range:
[0x0010-20], io range:
[0x2030-203fff00]
[  590.578592] xe :4d:00.0: [drm] Total VRAM: 0x2020,
0x0020
[  590.578594] xe :4d:00.0: [drm] Available VRAM:
0x2020, 0x001ffe00
[  590.738899] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  590.889991] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  590.892835] [drm] Initialized xe 1.1.0 20201103 for :4d:00.0 on
minor 1
[  590.900215] xe :9a:00.0: enabling device (0140 -> 0142)
[  590.915991] xe :9a:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.957450] xe :9a:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.989863] xe :9a:00.0: [drm] VISIBLE VRAM: 0x20e0,
0x0020
[  590.989888] xe :9a:00.0: [drm] VRAM[0, 0]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.989893] xe :9a:00.0: [drm] VRAM[0, 0]: DPA range:
[0x-10], io range:
[0x20e0-20efff00]
[  590.989918] xe :9a:00.0: [drm] VRAM[1, 1]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.989921] xe :9a:00.0: [drm] VRAM[1, 1]: DPA range:
[0x0010-20], io range:
[0x20f0-2000]
[  590.989924] xe :9a:00.0: [drm] Total VRAM: 0x20e0,
0x0020
[  590.989927] xe :9a:00.0: [drm] Available VRAM:
0x20e0, 0x001ffe00
[  591.142061] xe :9a:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  591.293505] xe :9a:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  591.295487] [drm] Initialized xe 1.1.0 20201103 for :9a:00.0 on
minor 2
[  610.685993] Console: switching to colour dummy device 80x25
[  610.686118] [IGT] xe_exec_basic: executing
[  610.755398] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  610.771783] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  610.773542] [IGT] xe_exec_basic: starting subtest once-basic
[  610.960251] [IGT] xe_exec_basic: finished subtest once-basic, SUCCESS
[  610.962741] [IGT] xe_exec_basic: exiting, ret=0
[  610.977203] Console: switching to colour frame buffer device 128x48
[  611.006675] xe_exec_basic (3237) used greatest stack depth: 11128
bytes left
[  644.682201] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  644.699060] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  644.699118] xe :4d:00.0: preparing for PCIe FLR reset
[  644.699149] xe :4d:00.0: [drm] removing device access to
userspace
[  644.928577] xe :4d:00.0: PCI device went through FLR, reenabling
the device
[  656.104233] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  656.149525] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  656.182711] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
0x0020
[  656.182737]

Re: [PATCH v2 0/4] drm/xe: Support PCIe FLR

2024-04-07 Thread Aravind Iddamsetty



On 05/04/24 10:30, Aravind Iddamsetty wrote:
> On 05/04/24 03:55, Rodrigo Vivi wrote:
>> On Tue, Apr 02, 2024 at 02:28:55PM +0530, Aravind Iddamsetty wrote:
>>> PCI subsystem provides callbacks to inform the driver about a request to
>>> do function level reset by user, initiated by writing to sysfs entry
>>> /sys/bus/pci/devices/.../reset. This will allow the driver to handle FLR
>>> without the need to do unbind and rebind as the driver needs to
>>> reinitialize the device afresh post FLR.
>>>
>>> v2:
>> all the patches look good to me here. feel free to use
>>
>> Reviewed-by: Rodrigo Vivi 
>>
>> on them.
> Thank you!
>
>> but I do have some concerns (below)
>>
>>> 1. Directly expose the devm_drm_dev_release_action instead of introducing
>>> a helper (Rodrigo)
>>> 2. separate out gt idle and pci save/restore to a separate patch (Lucas)
>>> 3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini
>> On this I also had to fight to get something working on the wedged_mode=2:
>> lore.kernel.org/all/20240403150732.102678-4-rodrigo.v...@intel.com
>>
>> perhaps we can unify things here.
> I guess we are dealing with different scenarios. Here the warning in
> xe_guc_submit_stop was because xe_uc_reset_prepare was not invoked first,
> and we needn't invoke xe_guc_pc_fini as the GuC is already stopped.
>>> Cc: Rodrigo Vivi 
>>> Cc: Lucas De Marchi 
>>>
>>> dmesg snip showing FLR recovery:
>> things came out differently on my DG2 here, with display working and all:
> After you mentioned this I tested on DG2. I got warnings but no
> segmentation fault or NPD. I tested my own branch, which might not be up
> to date, so I will retest with the latest branch.

While I check on this, is it OK for this version of the series to be merged?
As I see it, even with the display warnings the device and driver are
recoverable.

Thanks,
Aravind.
>
>
> Thanks,
> Aravind.
>> root@rdvivi-desk:/sys/module/xe/drivers/pci:xe/:03:00.0# echo 1 > reset
>> Segmentation fault
>>
>> and many kernel warnings
>>  WARNING: CPU: 8 PID: 2389 at 
>> drivers/gpu/drm/i915/display/intel_display_power_well.c:281 
>> hsw_wait_for_power_well_enable+0x3e7/0x570 [xe]
>>  WARNING: CPU: 9 PID: 1700 at drivers/gpu/drm/drm_mm.c:999 
>> drm_mm_takedown+0x41/0x60
>>
>> [  117.128330] KASAN: null-ptr-deref in range 
>> [0x04e8-0x04ef]
>> [  117.128332] CPU: 13 PID: 2389 Comm: bash Tainted: GW  
>> 6.9.0-rc1+ #9
>> [  117.135501]  ? exc_invalid_op+0x13/0x40
>> [  117.143626] Hardware name: iBUYPOWER INTEL/B660 DS3H AC DDR4-Y1, BIOS F5 
>> 12/17/2021
>> [  117.143627] RIP: 0010:__mutex_lock+0x124/0x14a0
>> [  117.143631] Code: d0 7c 08 84 d2 0f 85 62 0f 00 00 8b 0d 85 c8 8f 04 85 
>> c9 75 29 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 
>> 3c 02 00 0f 85 46 0f 00 00 4d 3b 7f 68 0f 85 aa 0e 00 00 bf 01
>> [  117.150630]  ? asm_exc_invalid_op+0x16/0x20
>> [  117.156401] RSP: 0018:c90005a37690 EFLAGS: 00010202
>> [  117.156403] RAX: dc00 RBX:  RCX: 
>> 
>> [  117.163571]  ? drm_buddy_fini+0x181/0x220
>>
>>
>> and more issues.
>>
>> so it looks like we are still missing some parts of the puzzle here...
>>
>>
>>> [  590.486336] xe :4d:00.0: enabling device (0140 -> 0142)
>>> [  590.506933] xe :4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.542355] xe :4d:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.578532] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
>>> 0x0020
>>> [  590.578556] xe :4d:00.0: [drm] VRAM[0, 0]: Actual physical size
>>> 0x0010, usable size exclude stolen 0x000fff00, CPU
>>> accessible size 0x000fff00
>>> [  590.578560] xe :4d:00.0: [drm] VRAM[0, 0]: DPA range:
>>> [0x-10], io range:
>>> [0x2020-202fff00]
>>> [  590.578585] xe :4d:00.0: [drm] VRAM[1, 1]: Actual physical size
>>> 0x0010, usable size exclude stolen 0x000fff00, CPU
>>> accessible size 0x000fff00
>>> [  590.578589] xe :4d:00.0: [drm] VRAM[1, 1]: DPA range:
>>> [0x0010-20], io range:
>>> [0x2030-203fff00]
>>> [  590.578592] xe :4d:00.0: [drm] Total VRAM: 0x2020,
>>> 0x0020
>>> [  590.578594] xe :4d:00.0: [drm] Available VRAM:
>>> 0x2020, 0x001ffe00
>>> [  590.738899] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
>>> num_engines:1, num_slices:4
>>> [  590.889991] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
>>> num_engines:1, num_slices:4
>>> [  590.892835] [drm] Initialized xe 1.1.0 20201103 for :4d:00.0 on
>>> minor 1
>>> [  590.900215] xe :9a:00.0: enabling device (0140 -> 0142)
>>> [  590.915991] xe :9a:00.0: [drm] Using GuC firmware from
>>> xe/pvc_guc_70.20.0.bin version 70.20.0
>>> [  590.957450] xe

Re: [PATCH v2 0/4] drm/xe: Support PCIe FLR

2024-04-04 Thread Aravind Iddamsetty



On 05/04/24 03:55, Rodrigo Vivi wrote:
> On Tue, Apr 02, 2024 at 02:28:55PM +0530, Aravind Iddamsetty wrote:
>> PCI subsystem provides callbacks to inform the driver about a request to
>> do function level reset by user, initiated by writing to sysfs entry
>> /sys/bus/pci/devices/.../reset. This will allow the driver to handle FLR
>> without the need to do unbind and rebind as the driver needs to
>> reinitialize the device afresh post FLR.
>>
>> v2:
> all the patches look good to me here. feel free to use
>
> Reviewed-by: Rodrigo Vivi 
>
> on them.

Thank you!

>
> but I do have some concerns (below)
>
>> 1. Directly expose the devm_drm_dev_release_action instead of introducing
>> a helper (Rodrigo)
>> 2. separate out gt idle and pci save/restore to a separate patch (Lucas)
>> 3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini
> On this I also had to fight to get something working on the wedged_mode=2:
> lore.kernel.org/all/20240403150732.102678-4-rodrigo.v...@intel.com
>
> perhaps we can unify things here.
I guess we are dealing with different scenarios. Here the warning in
xe_guc_submit_stop was because xe_uc_reset_prepare was not invoked first,
and we needn't invoke xe_guc_pc_fini as the GuC is already stopped.
>
>> Cc: Rodrigo Vivi 
>> Cc: Lucas De Marchi 
>>
>> dmesg snip showing FLR recovery:
> things came out differently on my DG2 here, with display working and all:
After you mentioned this I tested on DG2. I got warnings but no
segmentation fault or NPD. I tested my own branch, which might not be up
to date, so I will retest with the latest branch.


Thanks,
Aravind.
>
> root@rdvivi-desk:/sys/module/xe/drivers/pci:xe/:03:00.0# echo 1 > reset
> Segmentation fault
>
> and many kernel warnings
>  WARNING: CPU: 8 PID: 2389 at 
> drivers/gpu/drm/i915/display/intel_display_power_well.c:281 
> hsw_wait_for_power_well_enable+0x3e7/0x570 [xe]
>  WARNING: CPU: 9 PID: 1700 at drivers/gpu/drm/drm_mm.c:999 
> drm_mm_takedown+0x41/0x60
>
> [  117.128330] KASAN: null-ptr-deref in range 
> [0x04e8-0x04ef]
> [  117.128332] CPU: 13 PID: 2389 Comm: bash Tainted: GW  
> 6.9.0-rc1+ #9
> [  117.135501]  ? exc_invalid_op+0x13/0x40
> [  117.143626] Hardware name: iBUYPOWER INTEL/B660 DS3H AC DDR4-Y1, BIOS F5 
> 12/17/2021
> [  117.143627] RIP: 0010:__mutex_lock+0x124/0x14a0
> [  117.143631] Code: d0 7c 08 84 d2 0f 85 62 0f 00 00 8b 0d 85 c8 8f 04 85 c9 
> 75 29 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 
> 02 00 0f 85 46 0f 00 00 4d 3b 7f 68 0f 85 aa 0e 00 00 bf 01
> [  117.150630]  ? asm_exc_invalid_op+0x16/0x20
> [  117.156401] RSP: 0018:c90005a37690 EFLAGS: 00010202
> [  117.156403] RAX: dc00 RBX:  RCX: 
> 
> [  117.163571]  ? drm_buddy_fini+0x181/0x220
>
>
> and more issues.
>
> so it looks like we are still missing some parts of the puzzle here...
>
>
>> [  590.486336] xe :4d:00.0: enabling device (0140 -> 0142)
>> [  590.506933] xe :4d:00.0: [drm] Using GuC firmware from
>> xe/pvc_guc_70.20.0.bin version 70.20.0
>> [  590.542355] xe :4d:00.0: [drm] Using GuC firmware from
>> xe/pvc_guc_70.20.0.bin version 70.20.0
>> [  590.578532] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
>> 0x0020
>> [  590.578556] xe :4d:00.0: [drm] VRAM[0, 0]: Actual physical size
>> 0x0010, usable size exclude stolen 0x000fff00, CPU
>> accessible size 0x000fff00
>> [  590.578560] xe :4d:00.0: [drm] VRAM[0, 0]: DPA range:
>> [0x-10], io range:
>> [0x2020-202fff00]
>> [  590.578585] xe :4d:00.0: [drm] VRAM[1, 1]: Actual physical size
>> 0x0010, usable size exclude stolen 0x000fff00, CPU
>> accessible size 0x000fff00
>> [  590.578589] xe :4d:00.0: [drm] VRAM[1, 1]: DPA range:
>> [0x0010-20], io range:
>> [0x2030-203fff00]
>> [  590.578592] xe :4d:00.0: [drm] Total VRAM: 0x2020,
>> 0x0020
>> [  590.578594] xe :4d:00.0: [drm] Available VRAM:
>> 0x2020, 0x001ffe00
>> [  590.738899] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
>> num_engines:1, num_slices:4
>> [  590.889991] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
>> num_engines:1, num_slices:4
>> [  590.892835] [drm] Initialized xe 1.1.0 20201103 for :4d:00.0 on
>> minor 1
>> [  590.900215] xe :9a:00.0: enabling device (0140 -> 0142)
>> [  590.915991] xe :9a:00.0: [drm] Using GuC firmware from
>> xe/pvc_guc_70.20.0.bin version 70.20.0
>> [  590.957450] xe :9a:00.0: [drm] Using GuC firmware from
>> xe/pvc_guc_70.20.0.bin version 70.20.0
>> [  590.989863] xe :9a:00.0: [drm] VISIBLE VRAM: 0x20e0,
>> 0x0020
>> [  590.989888] xe :9a:00.0: [drm] VRAM[0, 0]: Actual physical size
>> 0x0010, usable size exclude stolen 0x000fff00, CPU
>> accessible

Re: [PATCH v2 0/4] drm/xe: Support PCIe FLR

2024-04-04 Thread Rodrigo Vivi

On Tue, Apr 02, 2024 at 02:28:55PM +0530, Aravind Iddamsetty wrote:
> PCI subsystem provides callbacks to inform the driver about a request to
> do function level reset by user, initiated by writing to sysfs entry
> /sys/bus/pci/devices/.../reset. This will allow the driver to handle FLR
> without the need to do unbind and rebind as the driver needs to
> reinitialize the device afresh post FLR.
> 
> v2:

all the patches look good to me here. feel free to use

Reviewed-by: Rodrigo Vivi 

on them.

but I do have some concerns (below)

> 1. Directly expose the devm_drm_dev_release_action instead of introducing
> a helper (Rodrigo)
> 2. separate out gt idle and pci save/restore to a separate patch (Lucas)
> 3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini

On this I also had to fight to get something working on the wedged_mode=2:
lore.kernel.org/all/20240403150732.102678-4-rodrigo.v...@intel.com

perhaps we can unify things here.

> 
> Cc: Rodrigo Vivi 
> Cc: Lucas De Marchi 
> 
> dmesg snip showing FLR recovery:

things came out differently on my DG2 here, with display working and all:

root@rdvivi-desk:/sys/module/xe/drivers/pci:xe/:03:00.0# echo 1 > reset
Segmentation fault

and many kernel warnings
 WARNING: CPU: 8 PID: 2389 at 
drivers/gpu/drm/i915/display/intel_display_power_well.c:281 
hsw_wait_for_power_well_enable+0x3e7/0x570 [xe]
 WARNING: CPU: 9 PID: 1700 at drivers/gpu/drm/drm_mm.c:999 
drm_mm_takedown+0x41/0x60

[  117.128330] KASAN: null-ptr-deref in range 
[0x04e8-0x04ef]
[  117.128332] CPU: 13 PID: 2389 Comm: bash Tainted: GW  
6.9.0-rc1+ #9
[  117.135501]  ? exc_invalid_op+0x13/0x40
[  117.143626] Hardware name: iBUYPOWER INTEL/B660 DS3H AC DDR4-Y1, BIOS F5 
12/17/2021
[  117.143627] RIP: 0010:__mutex_lock+0x124/0x14a0
[  117.143631] Code: d0 7c 08 84 d2 0f 85 62 0f 00 00 8b 0d 85 c8 8f 04 85 c9 
75 29 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02 
00 0f 85 46 0f 00 00 4d 3b 7f 68 0f 85 aa 0e 00 00 bf 01
[  117.150630]  ? asm_exc_invalid_op+0x16/0x20
[  117.156401] RSP: 0018:c90005a37690 EFLAGS: 00010202
[  117.156403] RAX: dc00 RBX:  RCX: 
[  117.163571]  ? drm_buddy_fini+0x181/0x220


and more issues.

so it looks like we are still missing some parts of the puzzle here...


> 
> [  590.486336] xe :4d:00.0: enabling device (0140 -> 0142)
> [  590.506933] xe :4d:00.0: [drm] Using GuC firmware from
> xe/pvc_guc_70.20.0.bin version 70.20.0
> [  590.542355] xe :4d:00.0: [drm] Using GuC firmware from
> xe/pvc_guc_70.20.0.bin version 70.20.0
> [  590.578532] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
> 0x0020
> [  590.578556] xe :4d:00.0: [drm] VRAM[0, 0]: Actual physical size
> 0x0010, usable size exclude stolen 0x000fff00, CPU
> accessible size 0x000fff00
> [  590.578560] xe :4d:00.0: [drm] VRAM[0, 0]: DPA range:
> [0x-10], io range:
> [0x2020-202fff00]
> [  590.578585] xe :4d:00.0: [drm] VRAM[1, 1]: Actual physical size
> 0x0010, usable size exclude stolen 0x000fff00, CPU
> accessible size 0x000fff00
> [  590.578589] xe :4d:00.0: [drm] VRAM[1, 1]: DPA range:
> [0x0010-20], io range:
> [0x2030-203fff00]
> [  590.578592] xe :4d:00.0: [drm] Total VRAM: 0x2020,
> 0x0020
> [  590.578594] xe :4d:00.0: [drm] Available VRAM:
> 0x2020, 0x001ffe00
> [  590.738899] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
> num_engines:1, num_slices:4
> [  590.889991] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
> num_engines:1, num_slices:4
> [  590.892835] [drm] Initialized xe 1.1.0 20201103 for :4d:00.0 on
> minor 1
> [  590.900215] xe :9a:00.0: enabling device (0140 -> 0142)
> [  590.915991] xe :9a:00.0: [drm] Using GuC firmware from
> xe/pvc_guc_70.20.0.bin version 70.20.0
> [  590.957450] xe :9a:00.0: [drm] Using GuC firmware from
> xe/pvc_guc_70.20.0.bin version 70.20.0
> [  590.989863] xe :9a:00.0: [drm] VISIBLE VRAM: 0x20e0,
> 0x0020
> [  590.989888] xe :9a:00.0: [drm] VRAM[0, 0]: Actual physical size
> 0x0010, usable size exclude stolen 0x000fff00, CPU
> accessible size 0x000fff00
> [  590.989893] xe :9a:00.0: [drm] VRAM[0, 0]: DPA range:
> [0x-10], io range:
> [0x20e0-20efff00]
> [  590.989918] xe :9a:00.0: [drm] VRAM[1, 1]: Actual physical size
> 0x0010, usable size exclude stolen 0x000fff00, CPU
> accessible size 0x000fff00
> [  590.989921] xe :9a:00.0: [drm] VRAM[1, 1]: DPA range:
> [0x0010-20], io range:
> [0x20f0-2000]
> [  590.989924] xe :9a:00.0: [drm] Total VRAM: 0x20e0,
> 0x0020
> [

[PATCH v2 0/4] drm/xe: Support PCIe FLR

2024-04-02 Thread Aravind Iddamsetty

The PCI subsystem provides callbacks to inform the driver about a function
level reset (FLR) requested by the user, initiated by writing to the sysfs
entry /sys/bus/pci/devices/.../reset. This allows the driver to handle FLR
without an unbind and rebind, since the driver needs to reinitialize the
device afresh after FLR.
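
As a rough mental model of the callback flow: in the kernel, the hooks in question are the reset_prepare and reset_done members of struct pci_error_handlers, called before and after the function is reset. The userspace mock below only imitates that ordering so it can be compiled and run anywhere. Every mock_* name and the state fields are invented for this sketch and do not come from the xe driver or the patches.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev plus driver state. */
struct mock_dev {
	int userspace_access;	/* 1 while userspace may reach the device */
	int initialized;	/* 1 once the driver has (re)initialized it */
	int flr_count;		/* number of resets the mock "hardware" did */
};

/* Mirrors only the two reset members of struct pci_error_handlers. */
struct mock_reset_handlers {
	void (*reset_prepare)(struct mock_dev *dev);
	void (*reset_done)(struct mock_dev *dev);
};

/* Before the reset: cut off userspace and quiesce the device. */
static void mock_reset_prepare(struct mock_dev *dev)
{
	puts("preparing for reset: removing userspace access");
	dev->userspace_access = 0;
	dev->initialized = 0;
}

/* After the reset: bring the device up again from scratch. */
static void mock_reset_done(struct mock_dev *dev)
{
	puts("reset done: reinitializing device");
	dev->initialized = 1;
	dev->userspace_access = 1;
}

/* Models the ordering: prepare, reset the function, then done. */
static void mock_function_level_reset(struct mock_dev *dev,
				      const struct mock_reset_handlers *h)
{
	assert(dev != NULL && h != NULL);

	if (h->reset_prepare)
		h->reset_prepare(dev);
	dev->flr_count++;	/* the hardware reset would happen here */
	if (h->reset_done)
		h->reset_done(dev);
}
```

The point of the sketch is that the device stays bound to the driver throughout: the driver tears down and rebuilds its own state around the reset instead of going through remove and probe.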

v2:
1. Directly expose the devm_drm_dev_release_action instead of introducing
a helper (Rodrigo)
2. Separate out gt idle and pci save/restore into a separate patch (Lucas)
3. Fixed the warnings seen around xe_guc_submit_stop, xe_guc_pc_fini

Cc: Rodrigo Vivi 
Cc: Lucas De Marchi 

dmesg snip showing FLR recovery:

[  590.486336] xe :4d:00.0: enabling device (0140 -> 0142)
[  590.506933] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.542355] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.578532] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
0x0020
[  590.578556] xe :4d:00.0: [drm] VRAM[0, 0]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.578560] xe :4d:00.0: [drm] VRAM[0, 0]: DPA range:
[0x-10], io range:
[0x2020-202fff00]
[  590.578585] xe :4d:00.0: [drm] VRAM[1, 1]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.578589] xe :4d:00.0: [drm] VRAM[1, 1]: DPA range:
[0x0010-20], io range:
[0x2030-203fff00]
[  590.578592] xe :4d:00.0: [drm] Total VRAM: 0x2020,
0x0020
[  590.578594] xe :4d:00.0: [drm] Available VRAM:
0x2020, 0x001ffe00
[  590.738899] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  590.889991] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  590.892835] [drm] Initialized xe 1.1.0 20201103 for :4d:00.0 on
minor 1
[  590.900215] xe :9a:00.0: enabling device (0140 -> 0142)
[  590.915991] xe :9a:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.957450] xe :9a:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  590.989863] xe :9a:00.0: [drm] VISIBLE VRAM: 0x20e0,
0x0020
[  590.989888] xe :9a:00.0: [drm] VRAM[0, 0]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.989893] xe :9a:00.0: [drm] VRAM[0, 0]: DPA range:
[0x-10], io range:
[0x20e0-20efff00]
[  590.989918] xe :9a:00.0: [drm] VRAM[1, 1]: Actual physical size
0x0010, usable size exclude stolen 0x000fff00, CPU
accessible size 0x000fff00
[  590.989921] xe :9a:00.0: [drm] VRAM[1, 1]: DPA range:
[0x0010-20], io range:
[0x20f0-2000]
[  590.989924] xe :9a:00.0: [drm] Total VRAM: 0x20e0,
0x0020
[  590.989927] xe :9a:00.0: [drm] Available VRAM:
0x20e0, 0x001ffe00
[  591.142061] xe :9a:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  591.293505] xe :9a:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  591.295487] [drm] Initialized xe 1.1.0 20201103 for :9a:00.0 on
minor 2
[  610.685993] Console: switching to colour dummy device 80x25
[  610.686118] [IGT] xe_exec_basic: executing
[  610.755398] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  610.771783] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  610.773542] [IGT] xe_exec_basic: starting subtest once-basic
[  610.960251] [IGT] xe_exec_basic: finished subtest once-basic, SUCCESS
[  610.962741] [IGT] xe_exec_basic: exiting, ret=0
[  610.977203] Console: switching to colour frame buffer device 128x48
[  611.006675] xe_exec_basic (3237) used greatest stack depth: 11128
bytes left
[  644.682201] xe :4d:00.0: [drm] GT0: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  644.699060] xe :4d:00.0: [drm] GT1: CCS_MODE=0 config:0040,
num_engines:1, num_slices:4
[  644.699118] xe :4d:00.0: preparing for PCIe FLR reset
[  644.699149] xe :4d:00.0: [drm] removing device access to
userspace
[  644.928577] xe :4d:00.0: PCI device went through FLR, reenabling
the device
[  656.104233] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  656.149525] xe :4d:00.0: [drm] Using GuC firmware from
xe/pvc_guc_70.20.0.bin version 70.20.0
[  656.182711] xe :4d:00.0: [drm] VISIBLE VRAM: 0x2020,
0x0020
[  656.182737] xe :4d:00.0: [drm] VRAM[0, 0]: Actual physical size
0x0010, usable size exclude stolen
