Hi,

With the 2x T4 we haven't seen any trouble, but we have no NVLink there.

Did you try disabling NVLink?
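
In case it helps narrow things down, here is a minimal Python sketch that pulls the failing devices and BAR numbers out of dmesg text like the log quoted below. It is my own illustration, not part of the NVIDIA driver or oVirt tooling: the function name `unassigned_bars` and the regular expression are assumptions based only on the NVRM line format visible in that log, so other driver versions may print something different.

```python
import re

# Assumed line format, taken from the quoted log, e.g.:
#   NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<dev>[0-9a-fA-F:.]+)\)"
)


def unassigned_bars(dmesg_text):
    """Return (pci_address, bar_index) pairs reported with size 0 at address 0."""
    found = []
    for match in BAR_LINE.finditer(dmesg_text):
        size_mb = int(match.group("size"))
        address = int(match.group("addr"), 16)
        if size_mb == 0 and address == 0:
            found.append((match.group("dev"), int(match.group("bar"))))
    return found


# Two lines copied from the quoted dmesg output, used as sample input.
sample = (
    "               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)\n"
    "               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)\n"
)
for dev, bar in unassigned_bars(sample):
    print(f"{dev}: BAR{bar} has no address assigned")
```

To use it on a real guest, pass the text of `dmesg` (which may require root) to `unassigned_bars` instead of the sample string. A BAR of 0M at 0x0 suggests the guest firmware did not map that region, which is worth comparing between the one working GPU and the three failing ones.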



Vinícius Ferrão via Users <users@ovirt.org> wrote on Fri, 4 Sep 2020,
08:39:

> Hello, here we go again.
>
> I’m trying to pass through 4x NVIDIA Tesla V100 GPUs (with NVLink) to a
> single VM, but only one GPU is usable inside the VM. lspci lists all four
> GPUs, but three of them are unusable:
>
> 08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
> 09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
> 0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
> 0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
>
> There are some errors in dmesg pointing to a misconfigured BIOS:
>
> [   27.295972] nvidia: loading out-of-tree module taints kernel.
> [   27.295980] nvidia: module license 'NVIDIA' taints kernel.
> [   27.295981] Disabling lock debugging due to kernel taint
> [   27.304180] nvidia: module verification failed: signature and/or required key missing - tainting kernel
> [   27.364244] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
> [   27.579261] nvidia 0000:09:00.0: enabling device (0000 -> 0002)
> [   27.579560] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
>                NVRM: BAR1 is 0M @ 0x0 (PCI:0000:09:00.0)
> [   27.579560] NVRM: The system BIOS may have misconfigured your GPU.
> [   27.579566] nvidia: probe of 0000:09:00.0 failed with error -1
> [   27.580727] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
>                NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0a:00.0)
> [   27.580729] NVRM: The system BIOS may have misconfigured your GPU.
> [   27.580734] nvidia: probe of 0000:0a:00.0 failed with error -1
> [   27.581299] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
>                NVRM: BAR0 is 0M @ 0x0 (PCI:0000:0b:00.0)
> [   27.581300] NVRM: The system BIOS may have misconfigured your GPU.
> [   27.581305] nvidia: probe of 0000:0b:00.0 failed with error -1
> [   27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
> [   27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.51.06  Sun Jul 19 20:02:54 UTC 2020
> [   27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  450.51.06  Sun Jul 19 20:06:42 UTC 2020
>
> The host is Secure Intel Skylake (x86_64). The VM is running the Q35
> chipset with UEFI (pc-q35-rhel8.2.0).
>
> I’ve tried changing the I/O mapping options on the host, with both 56TB
> and 12TB, without success: same results. I didn’t try 512GB, since the
> machine has 768GB of system RAM.
>
> I tried blacklisting nouveau on the host: no change.
> I installed the NVIDIA drivers on the host: no change.
>
> In the host I can use the 4x V100, but inside a single VM it’s impossible.
>
> Any suggestions?
>
>
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/73CXU27AX6ND6EXUJKBKKRWM6DJH7UL7/
>
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PIO4DIVUU4JWG5FXYW3NQSVXCFZWYV26/
