Bug#986312: NVRM: request_mem_region failed for 0M @ 0x0.
I also have this information in the syslog:

Mar 29 08:01:51 re-grades-01 kernel: [1.762583] pci :42:04.0: BAR 15: no space for [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762584] pci :42:04.0: BAR 15: failed to assign [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762586] pci :42:08.0: BAR 15: no space for [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762587] pci :42:08.0: BAR 15: failed to assign [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762589] pci :42:0c.0: BAR 15: no space for [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762590] pci :42:0c.0: BAR 15: failed to assign [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762592] pci :43:00.0: BAR 1: no space for [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762593] pci :43:00.0: BAR 1: trying firmware assignment [mem 0x2e08000-0x2e08fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762594] pci :43:00.0: BAR 1: [mem 0x2e08000-0x2e08fff 64bit pref] conflicts with PCI Bus :40 [mem 0x260c020-0x2e0c01\ f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762595] pci :43:00.0: BAR 1: failed to assign [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762596] pci :43:00.0: BAR 8: no space for [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762597] pci :43:00.0: BAR 8: trying firmware assignment [mem 0x2df8000-0x2e07fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762598] pci :43:00.0: BAR 8: [mem 0x2df8000-0x2e07fff 64bit pref] conflicts with PCI Bus :40 [mem 0x260c020-0x2e0c01\ f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762599] pci :43:00.0: BAR 8: failed to assign [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762600] pci :43:00.0: BAR 3: no space for [mem size 0x0200 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762601] pci :43:00.0: BAR 3: trying firmware assignment [mem 0x2e0b000-0x2e0b1ff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762602] pci :43:00.0: BAR 3: [mem 0x2e0b000-0x2e0b1ff 64bit pref] conflicts with PCI Bus :40 [mem 0x260c020-0x2e0c01\ f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762603] pci :43:00.0: BAR 3: failed to assign [mem size 0x0200 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762604] pci :43:00.0: BAR 10: no space for [mem size 0x2000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762604] pci :43:00.0: BAR 10: trying firmware assignment [mem 0x2e09000-0x2e0afff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762606] pci :43:00.0: BAR 10: [mem 0x2e09000-0x2e0afff 64bit pref] conflicts with PCI Bus :40 [mem 0x260c020-0x2e0c01fff\ ff window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762606] pci :43:00.0: BAR 10: failed to assign [mem size 0x2000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762607] pci :42:04.0: PCI bridge to [bus 43]
Mar 29 08:01:51 re-grades-01 kernel: [1.762611] pci :42:04.0: bridge window [mem 0xca00-0xcb3f]
Mar 29 08:01:51 re-grades-01 kernel: [1.762619] pci :44:00.0: BAR 1: no space for [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762620] pci :44:00.0: BAR 1: trying firmware assignment [mem 0x2df4000-0x2df4fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762621] pci :44:00.0: BAR 1: [mem 0x2df4000-0x2df4fff 64bit pref] conflicts with PCI Bus :40 [mem 0x260c020-0x2e0c01\ f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762622] pci :44:00.0: BAR 1: failed to assign [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762622] pci :44:00.0: BAR 8: no space for [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762623] pci :44:00.0: BAR 8: trying firmware assignment [mem 0x2de4000-0x2df3fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762624] pci :44:00.0: BAR 8: [mem 0x2de4000-0x2df3fff 64bit pref] conflicts with PCI Bus :40 [mem 0x260c020-0x2e0c01\

Maybe the conflict is responsible for this issue. Do you have an idea of what could cause this conflict?
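To see at a glance which devices and BARs the kernel gave up on, the "failed to assign" lines can be summarized with a short script. This is a minimal sketch, not part of the original report: it assumes the log is available as plain text (for example from the syslog or `dmesg`) and accepts the device address both in the full kernel form (`0000:43:00.0`) and in the domain-stripped form shown in the archive above. The hypothetical helper `fits_in_window` illustrates one plausible reading of the "conflicts with PCI Bus ... window" lines: a device BAR has to lie entirely inside the memory window of the bridge above it, so a firmware address that runs past that window would be reported as conflicting with the window itself. The window and BAR values in the example are made up, because the addresses in the archived log look truncated; the sample sizes are copied from the archive as-is.

```python
import re
from collections import defaultdict

# Matches kernel lines such as
#   pci 0000:43:00.0: BAR 1: failed to assign [mem size 0x1000 64bit pref]
# The 4-digit domain prefix is optional because the archived log shows it stripped.
FAILED_RE = re.compile(
    r"pci (?P<dev>(?:[0-9a-f]{4})?:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign \[mem size (?P<size>0x[0-9a-f]+)"
)


def failed_bars(log_text):
    """Map each device address to the list of (BAR number, size in bytes) it could not get."""
    result = defaultdict(list)
    for m in FAILED_RE.finditer(log_text):
        result[m.group("dev")].append((int(m.group("bar")), int(m.group("size"), 16)))
    return dict(result)


def fits_in_window(bar, window):
    """True if the inclusive range bar=(start, end) lies entirely inside window=(start, end)."""
    return window[0] <= bar[0] and bar[1] <= window[1]


if __name__ == "__main__":
    sample = (
        "pci 0000:42:04.0: BAR 15: failed to assign [mem size 0x13800 64bit pref]\n"
        "pci 0000:43:00.0: BAR 1: failed to assign [mem size 0x1000 64bit pref]\n"
        "pci 0000:43:00.0: BAR 3: failed to assign [mem size 0x0200 64bit pref]\n"
    )
    for dev, bars in failed_bars(sample).items():
        print(dev, [f"BAR {n}: {size:#x}" for n, size in bars])

    # Hypothetical addresses: a BAR that starts inside the window but ends past it.
    print(fits_in_window((0x1800, 0x27FF), (0x1000, 0x1FFF)))  # False
```

Grouping by device makes it easy to check whether the failures are confined to the devices behind one bridge (here the `:40` bus) or spread across the machine.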
Bug#986312: NVRM: request_mem_region failed for 0M @ 0x0.
Hello Andreas,

> What's the NUMA layout of the machine? What CPUs do you have in there?

I attached the lstopo output.

> Great. Half of the bus ids are in hex, half in decimal.
> How many cards do you have in there? And how many PCI devices are there
> per card?

The list is in the reportbug output :) 3 x RTX 3090 and 6 x T4.

> Have you tried different drivers? (tesla-450, tesla-460)

Only the 460.

> Have you tried with only one kind of card (Tesla or GeForce) in there?

No.

> Have you tried with only a single card in there?

No.

Fred

Attachment: re-grades-01.pdf
Bug#986312: NVRM: request_mem_region failed for 0M @ 0x0.
On 02/04/2021 22.06, PICCA Frederic-Emmanuel wrote:
> All the graphics cards on NUMA node 0 are unavailable on my computer.

What's the NUMA layout of the machine? What CPUs do you have in there?

> We are using this computer for data processing in a scientific facility.
> During boot we get these error messages:
>
> Mar 29 13:08:12 re-grades-01 kernel: [6.726771] NVRM: request_mem_region failed for 0M @ 0x0. This can
> Mar 29 13:08:12 re-grades-01 kernel: [6.726771] NVRM: occur when a driver such as rivatv is loaded and claims
> Mar 29 13:08:12 re-grades-01 kernel: [6.726771] NVRM: ownership of the device's registers.
> Mar 29 13:08:12 re-grades-01 kernel: [6.726792] nvidia: probe of :43:00.0 failed with error -1
> ...
> I would like to help debug this issue, but I do not know where to start.
>
> Thanks for considering.
>
> [22.870] (--) PCI: (67@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x260d000/268435456, 0x261e000/33554432
> [22.870] (--) PCI: (68@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x2633000/268435456, 0x2644000/33554432
> [22.870] (--) PCI: (69@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x2659000/268435456, 0x266a000/33554432
> [22.870] (--) PCI: (70@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x267f000/268435456, 0x268/33554432, I/O @ 0x6000/128
> [22.870] (--) PCI:*(101@0:0:0) 1a03:2000:1458:1000 rev 65, Mem @ 0xd200/33554432, 0xd400/131072, I/O @ 0x7000/128, BIOS @ 0x/131072
> [22.870] (--) PCI: (131@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x9600/16777216, 0x6616000/268435456, 0x6619000/33554432
> [22.871] (--) PCI: (132@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x9400/16777216, 0x6602000/268435456, 0x6605000/33554432
> [22.871] (--) PCI: (133@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x9a00/16777216, 0x661c000/268435456, 0x661d000/33554432, I/O @ 0xd000/128, BIOS @ 0x/524288
> [22.871] (--) PCI: (134@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x9800/16777216, 0x661a000/268435456, 0x661b000/33554432, I/O @ 0xc000/128, BIOS @ 0x/524288
> [22.871] (--) PCI: (135@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x9200/16777216, 0x65ee000/268435456, 0x65f1000/33554432

Great. Half of the bus ids are in hex, half in decimal.

How many cards do you have in there? And how many PCI devices are there per card?

Have you tried different drivers? (tesla-450, tesla-460)

Have you tried with only one kind of card (Tesla or GeForce) in there?

Have you tried with only a single card in there?

Andreas
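The remark about hex and decimal bus ids can be made concrete. Xorg prints a PCI address as `bus@domain:device:function` in decimal (the same layout used by the `BusID` option in xorg.conf), while the kernel prints `domain:bus:device.function` in hex. A minimal conversion sketch, assuming that Xorg layout; the helper name `xorg_to_kernel` is hypothetical:

```python
import re

# Xorg log form, all decimal: "(bus@domain:device:function)", e.g. "(67@0:0:0)".
XORG_RE = re.compile(r"\((\d+)@(\d+):(\d+):(\d+)\)")


def xorg_to_kernel(xorg_id):
    """Convert an Xorg decimal PCI id to the kernel's hex 'domain:bus:device.function'."""
    m = XORG_RE.search(xorg_id)
    if m is None:
        raise ValueError(f"not an Xorg PCI id: {xorg_id!r}")
    bus, domain, dev, func = (int(g) for g in m.groups())
    return f"{domain:04x}:{bus:02x}:{dev:02x}.{func:x}"


if __name__ == "__main__":
    for xid in ("(67@0:0:0)", "(101@0:0:0)", "(131@0:0:0)"):
        print(xid, "->", xorg_to_kernel(xid))
```

With this mapping, Xorg's `(67@0:0:0)` (decimal 67 = 0x43) is the kernel's `0000:43:00.0`, the same device whose `nvidia` probe failed in the quoted log.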