Re: eGPU failed to initialize

2020-01-02 Thread Christian König

Hi Qu,

that problem is completely unrelated to amdgpu. See, your Thunderbolt
bridge fails to assign the necessary I/O resources to the PCI device
long before amdgpu even loads:


From your dmesg:


Jan 01 07:22:22 thinkpad kernel: pci_bus :06: Allocating resources
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: bridge window [io  0x1000-0x0fff] to [bus 0b-3a] add_size 1000
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: bridge window [io  0x1000-0x1fff] to [bus 09-3a] add_size 1000
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: no space for [io  size 0x2000]
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: failed to assign [io  size 0x2000]
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: assigned [io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: [io  0x2000-0x2fff] (failed to expand by 0x1000)
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: failed to add 1000 res[13]=[io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:01.0: BAR 13: assigned [io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: no space for [io  size 0x1000]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: failed to assign [io  size 0x1000]
Jan 01 07:22:22 thinkpad kernel: pci :09:01.0: BAR 13: assigned [io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: no space for [io  size 0x1000]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: failed to assign [io  size 0x1000]


This is a rather unusual problem and I have no idea how you ended up 
with that. But with this setup it is impossible for the driver to access 
the device.
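
If you want to pick such failures out of a longer dmesg quickly, something like the following rough sketch may help. It is only an illustration: the function name find_failed_assignments and the regular expression are my own assumptions, written against the message wording in the excerpt above, not an existing tool. The device address is matched loosely because the archived log has lost part of the PCI address prefix.

```python
import re

# Matches kernel messages such as:
#   pci :08:00.0: BAR 13: failed to assign [io  size 0x2000]
# The pattern is an assumption based on the log excerpt in this thread.
FAILED_ASSIGN = re.compile(
    r"pci\s+(?P<dev>\S+):\s+BAR\s+(?P<bar>\d+):\s+failed to assign\s+"
    r"\[(?P<kind>io|mem)\s+size\s+(?P<size>0x[0-9a-fA-F]+)"
)


def find_failed_assignments(log_text):
    """Return (device, BAR index, resource kind, size) for each failed assignment."""
    # Collapse whitespace so messages wrapped across lines by a mailer still match.
    flat = " ".join(log_text.split())
    return [
        (m.group("dev"), int(m.group("bar")), m.group("kind"), int(m.group("size"), 16))
        for m in FAILED_ASSIGN.finditer(flat)
    ]


# Hypothetical input: a few lines copied from the excerpt above, still wrapped.
sample = """\
Jan 01 07:22:22 thinkpad kernel: pci :08:00.0: BAR 13: failed to
assign [io  size 0x2000]
Jan 01 07:22:22 thinkpad kernel: pci :09:01.0: BAR 13: assigned
[io  0x2000-0x2fff]
Jan 01 07:22:22 thinkpad kernel: pci :09:04.0: BAR 13: failed to
assign [io  size 0x1000]
"""

for dev, bar, kind, size in find_failed_assignments(sample):
    print(f"{dev} BAR {bar}: could not assign {kind} window of {size:#x} bytes")
```

Any device listed by this has a window the kernel could not place, so a driver bound to something behind it cannot be expected to reach the hardware.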


Regards,
Christian.

On 01.01.20 at 10:31, Qu Wenruo wrote:


eGPU failed to initialize

2020-01-01 Thread Qu Wenruo
Hi,

Not sure if this has been reported before, but amdgpu is initialized for an
external GPU (Thunderbolt 3), which is not accessible at boot, only
after boltctl has initialized the tb3 subsystem.

Then amdgpu reports a timeout and fails to really initialize the GPU.
At this stage, one of my monitors (U2414H, DP) reports an unsupported
framerate, while the other monitor (HP 24mh, HDMI) just reports no signal.

The GPU involved is an RX580. The tb3 enclosure is an AORUS GAMING BOX.

And obviously, this eGPU works pretty fine under Windows.
So my normal boot routine is to boot into Windows, then reboot into
Linux without unplugging the tb3 connector, to make the eGPU work under Linux.

The kernel warning is:
Jan 01 07:22:25 thinkpad kernel: [drm] REG_WAIT timeout 10us * 3500
tries - dce_mi_free_dmif line:634
Jan 01 07:22:25 thinkpad kernel: [ cut here ]
Jan 01 07:22:25 thinkpad kernel: WARNING: CPU: 6 PID: 804 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332
generic_reg_wait.cold+0x25/0x2c [amdgpu]
Jan 01 07:22:25 thinkpad kernel: Modules linked in: xt_CHECKSUM
xt_MASQUERADE xt_conntrack ipt_REJECT tun bridge stp llc nf_tables_set
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct amdgpu nft_chain_nat msr
nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle
gpu_sched iptable_raw ttm iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 xt_tcpudp ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter cmac algif_hash algif_skcipher
af_alg bnep joydev mousedev btrfs xor rmi_smbus rmi_core
snd_hda_codec_hdmi iTCO_wdt mei_wdt mei_hdcp iTCO_vendor_support
snd_hda_codec_realtek intel_rapl_msr wmi_bmof raid6_pq
intel_wmi_thunderbolt iwlmvm snd_hda_codec_generic x86_pkg_temp_thermal
intel_powerclamp snd_hda_intel coretemp snd_intel_nhlt mac80211
kvm_intel snd_hda_codec nls_iso8859_1 libarc4 uvcvideo nls_cp437
intel_cstate btusb snd_hda_core
Jan 01 07:22:25 thinkpad kernel:  vfat videobuf2_vmalloc intel_uncore
btrtl snd_hwdep btbcm iwlwifi videobuf2_memops intel_rapl_perf fat
videobuf2_v4l2 btintel snd_pcm pcspkr psmouse input_leds
videobuf2_common mei_me e1000e i2c_i801 snd_timer thunderbolt bluetooth
cfg80211 videodev mei thinkpad_acpi intel_xhci_usb_role_switch
processor_thermal_device ucsi_acpi ecdh_generic mc nvram ecc
intel_rapl_common intel_soc_dts_iosf crc16 intel_pch_thermal roles
typec_ucsi ledtrig_audio rfkill typec snd int3403_thermal wmi soundcore
battery ac int340x_thermal_zone i2c_hid hid evdev int3400_thermal
mac_hid acpi_thermal_rel crypto_user acpi_call(OE) kvmgt i915 vfio_mdev
mdev vfio_iommu_type1 vfio i2c_algo_bit drm_kms_helper drm intel_gtt
agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops kvm irqbypass
ip_tables x_tables xfs libcrc32c crc32c_generic sd_mod uas usb_storage
scsi_mod dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel dm_mod serio_raw atkbd libps2 aesni_intel
crypto_simd xhci_pci cryptd
Jan 01 07:22:25 thinkpad kernel:  glue_helper xhci_hcd i8042 serio
Jan 01 07:22:25 thinkpad kernel: CPU: 6 PID: 804 Comm: Xorg Tainted: G
  U OE 5.4.6-arch3-1 #1
Jan 01 07:22:25 thinkpad kernel: Hardware name: LENOVO
20KHCTO1WW/20KHCTO1WW, BIOS N23ET68W (1.43 ) 10/16/2019
Jan 01 07:22:25 thinkpad kernel: RIP:
0010:generic_reg_wait.cold+0x25/0x2c [amdgpu]
Jan 01 07:22:25 thinkpad kernel: Code: e9 82 23 fe ff 44 8b 44 24 24 48
8b 4c 24 18 44 89 fa 89 ee 48 c7 c7 50 73 ab c1 e8 96 5d 92 ef 83 7b 20
01 0f 84 48 31 fe ff <0f> 0b e9 41 31 fe ff e8 b2 16 e7 ff 48 c7 c7 00
50 b7 c1 e8 e6 8e
Jan 01 07:22:25 thinkpad kernel: RSP: 0018:9e61c147b5c8 EFLAGS: 00010297
Jan 01 07:22:25 thinkpad kernel: RAX: 0044 RBX:
95854881e200 RCX: 
Jan 01 07:22:25 thinkpad kernel: RDX:  RSI:
95854e397708 RDI: 
Jan 01 07:22:25 thinkpad kernel: RBP: 000a R08:
0507 R09: 0004
Jan 01 07:22:25 thinkpad kernel: R10:  R11:
0001 R12: 0322
Jan 01 07:22:25 thinkpad kernel: R13: 0dad R14:
0001 R15: 0dac
Jan 01 07:22:25 thinkpad kernel: FS:  7f9e87acbdc0()
GS:95854e38() knlGS:
Jan 01 07:22:25 thinkpad kernel: CS:  0010 DS:  ES:  CR0:
80050033
Jan 01 07:22:25 thinkpad kernel: CR2: 7f9e831de5b0 CR3:
0004893f4006 CR4: 003606e0
Jan 01 07:22:25 thinkpad kernel: DR0:  DR1:
 DR2: 
Jan 01 07:22:25 thinkpad kernel: DR3:  DR6:
fffe0ff0 DR7: 0400
Jan 01 07:22:25 thinkpad kernel: Call Trace:
Jan 01 07:22:25 thinkpad kernel:  dce_mi_free_dmif+0xf7/0x160 [amdgpu]
Jan 01 07:22:25 thinkpad kernel:  dce110_reset_hw_ctx_wrap+0x193/0x260
[amdgpu]
Jan 01 07:22:25 thinkpad kernel:  dce110_apply_ctx_to_hw+0x51