Bug#986312: NVRM: request_mem_region failed for 0M @ 0x0.

2021-04-02 Thread PICCA Frederic-Emmanuel
I also have this information in the syslog:

Mar 29 08:01:51 re-grades-01 kernel: [1.762583] pci :42:04.0: BAR 15: 
no space for [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762584] pci :42:04.0: BAR 15: 
failed to assign [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762586] pci :42:08.0: BAR 15: 
no space for [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762587] pci :42:08.0: BAR 15: 
failed to assign [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762589] pci :42:0c.0: BAR 15: 
no space for [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762590] pci :42:0c.0: BAR 15: 
failed to assign [mem size 0x13800 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762592] pci :43:00.0: BAR 1: no 
space for [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762593] pci :43:00.0: BAR 1: 
trying firmware assignment [mem 0x2e08000-0x2e08fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762594] pci :43:00.0: BAR 1: 
[mem 0x2e08000-0x2e08fff 64bit pref] conflicts with PCI Bus :40 
[mem 0x260c020-0x2e0c01\
f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762595] pci :43:00.0: BAR 1: 
failed to assign [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762596] pci :43:00.0: BAR 8: no 
space for [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762597] pci :43:00.0: BAR 8: 
trying firmware assignment [mem 0x2df8000-0x2e07fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762598] pci :43:00.0: BAR 8: 
[mem 0x2df8000-0x2e07fff 64bit pref] conflicts with PCI Bus :40 
[mem 0x260c020-0x2e0c01\
f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762599] pci :43:00.0: BAR 8: 
failed to assign [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762600] pci :43:00.0: BAR 3: no 
space for [mem size 0x0200 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762601] pci :43:00.0: BAR 3: 
trying firmware assignment [mem 0x2e0b000-0x2e0b1ff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762602] pci :43:00.0: BAR 3: 
[mem 0x2e0b000-0x2e0b1ff 64bit pref] conflicts with PCI Bus :40 
[mem 0x260c020-0x2e0c01\
f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762603] pci :43:00.0: BAR 3: 
failed to assign [mem size 0x0200 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762604] pci :43:00.0: BAR 10: 
no space for [mem size 0x2000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762604] pci :43:00.0: BAR 10: 
trying firmware assignment [mem 0x2e09000-0x2e0afff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762606] pci :43:00.0: BAR 10: 
[mem 0x2e09000-0x2e0afff 64bit pref] conflicts with PCI Bus :40 
[mem 0x260c020-0x2e0c01fff\
ff window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762606] pci :43:00.0: BAR 10: 
failed to assign [mem size 0x2000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762607] pci :42:04.0: PCI 
bridge to [bus 43]
Mar 29 08:01:51 re-grades-01 kernel: [1.762611] pci :42:04.0:   bridge 
window [mem 0xca00-0xcb3f]
Mar 29 08:01:51 re-grades-01 kernel: [1.762619] pci :44:00.0: BAR 1: no 
space for [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762620] pci :44:00.0: BAR 1: 
trying firmware assignment [mem 0x2df4000-0x2df4fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762621] pci :44:00.0: BAR 1: 
[mem 0x2df4000-0x2df4fff 64bit pref] conflicts with PCI Bus :40 
[mem 0x260c020-0x2e0c01\
f window]
Mar 29 08:01:51 re-grades-01 kernel: [1.762622] pci :44:00.0: BAR 1: 
failed to assign [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762622] pci :44:00.0: BAR 8: no 
space for [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762623] pci :44:00.0: BAR 8: 
trying firmware assignment [mem 0x2de4000-0x2df3fff 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762624] pci :44:00.0: BAR 8: 
[mem 0x2de4000-0x2df3fff 64bit pref] conflicts with PCI Bus :40 
[mem 0x260c020-0x2e0c01\


Maybe the conflict is responsible for this issue.

Do you have any idea what could cause this conflict?
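
To get an overview of which devices are affected, the "failed to assign" messages can be tallied per device. This is only a rough sketch, assuming the syslog excerpt was saved as text with the same line wrapping as above; the function name summarize_bar_failures is made up for illustration and is not part of any tool.

```python
import re
from collections import defaultdict

# Matches e.g. "pci :43:00.0: BAR 1: failed to assign [mem size ...]".
# The device address is taken as it appears in the archived log.
FAILED = re.compile(r"pci (\S+): BAR (\d+): failed to assign")


def summarize_bar_failures(log_text):
    """Return {device: [BAR numbers]} for every 'failed to assign' message."""
    # Long log lines were wrapped by the mailer, so collapse all
    # whitespace (including newlines) into single spaces before matching.
    flat = " ".join(log_text.split())
    failures = defaultdict(list)
    for device, bar in FAILED.findall(flat):
        failures[device].append(int(bar))
    return dict(failures)


# Three messages copied from the syslog excerpt above.
sample = """\
Mar 29 08:01:51 re-grades-01 kernel: [1.762595] pci :43:00.0: BAR 1:
failed to assign [mem size 0x1000 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762599] pci :43:00.0: BAR 8:
failed to assign [mem size 0x1 64bit pref]
Mar 29 08:01:51 re-grades-01 kernel: [1.762622] pci :44:00.0: BAR 1:
failed to assign [mem size 0x1000 64bit pref]
"""

print(summarize_bar_failures(sample))
```

Run against the full excerpt, this would list every BAR the kernel could not place, grouped by device.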




2021-04-02 Thread PICCA Frederic-Emmanuel
Hello Andreas,

> what's the NUMA layout of the machine? what cpus do you have in there?

I attached the lstopo output.

> Great. Half of the bus ids are in hex, half in decimal.

> How many cards do you have in there? And how many pci devices are there
> per card?

The list is in the reportbug output :)

3 x RTX 3090 and 6 x T4

> Have you tried different drivers? (tesla-450, tesla-460)

only the 460

> Have you tried with only one kind of cards (Tesla or Geforce) in there?

no

> Have you tried with only a single card in there?

no


Fred

re-grades-01.pdf



2021-04-02 Thread Andreas Beckmann

On 02/04/2021 22.06, PICCA Frederic-Emmanuel wrote:

> None of the graphics cards on NUMA node 0 are available on my computer.


what's the NUMA layout of the machine? what cpus do you have in there?


> We are using this computer for data processing in a scientific 
> facility.

> During boot we get these error messages:

> Mar 29 13:08:12 re-grades-01 kernel: [6.726771] NVRM: request_mem_region 
> failed for 0M @ 0x0. This can
> Mar 29 13:08:12 re-grades-01 kernel: [6.726771] NVRM: occur when a driver 
> such as rivatv is loaded and claims
> Mar 29 13:08:12 re-grades-01 kernel: [6.726771] NVRM: ownership of the 
> device's registers.
> Mar 29 13:08:12 re-grades-01 kernel: [6.726792] nvidia: probe of 
> :43:00.0 failed with error -1

> ...

> I would like to help debug this issue, but I do not know where to start.

> Thanks for considering.




> [22.870] (--) PCI: (67@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 
> 0x260d000/268435456, 0x261e000/33554432
> [22.870] (--) PCI: (68@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 
> 0x2633000/268435456, 0x2644000/33554432
> [22.870] (--) PCI: (69@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 
> 0x2659000/268435456, 0x266a000/33554432
> [22.870] (--) PCI: (70@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 
> 0x267f000/268435456, 0x268/33554432, I/O @ 0x6000/128
> [22.870] (--) PCI:*(101@0:0:0) 1a03:2000:1458:1000 rev 65, Mem @ 
> 0xd200/33554432, 0xd400/131072, I/O @ 0x7000/128, BIOS @ 
> 0x/131072
> [22.870] (--) PCI: (131@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 
> 0x9600/16777216, 0x6616000/268435456, 0x6619000/33554432
> [22.871] (--) PCI: (132@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 
> 0x9400/16777216, 0x6602000/268435456, 0x6605000/33554432
> [22.871] (--) PCI: (133@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 
> 0x9a00/16777216, 0x661c000/268435456, 0x661d000/33554432, I/O @ 
> 0xd000/128, BIOS @ 0x/524288
> [22.871] (--) PCI: (134@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 
> 0x9800/16777216, 0x661a000/268435456, 0x661b000/33554432, I/O @ 
> 0xc000/128, BIOS @ 0x/524288
> [22.871] (--) PCI: (135@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 
> 0x9200/16777216, 0x65ee000/268435456, 0x65f1000/33554432


Great. Half of the bus ids are in hex, half in decimal.
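
The decimal/hex mismatch can be bridged mechanically. This is a minimal sketch, assuming the Xorg log prints the address as bus@domain:device:function in decimal and the kernel prints domain:bus:device.function in hex; the helper name xorg_to_kernel is made up for illustration.

```python
import re

# Xorg notation, e.g. "(67@0:0:0)": bus@domain:device:function, all decimal.
# That field order is an assumption stated in the text above.
XORG_PCI = re.compile(r"\((\d+)@(\d+):(\d+):(\d+)\)")


def xorg_to_kernel(xorg_id):
    """Convert an Xorg-style PCI address to kernel-style hex notation."""
    match = XORG_PCI.search(xorg_id)
    if match is None:
        raise ValueError(f"not an Xorg PCI address: {xorg_id!r}")
    bus, domain, device, function = (int(g) for g in match.groups())
    # Kernel style: 4 hex digits of domain, 2 of bus, 2 of device, function.
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:d}"


for xorg_id in ["(67@0:0:0)", "(68@0:0:0)", "(131@0:0:0)"]:
    print(xorg_id, "->", xorg_to_kernel(xorg_id))
```

With this mapping, 67 and 68 in the Xorg log correspond to buses 43 and 44, the two devices whose BARs failed to assign in the kernel messages quoted earlier in the thread.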

How many cards do you have in there? And how many pci devices are there 
per card?


Have you tried different drivers? (tesla-450, tesla-460)
Have you tried with only one kind of cards (Tesla or Geforce) in there?
Have you tried with only a single card in there?

Andreas