Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> Some screens attached (hopefully not too heavy).
> Didn't have time to do better. Select your favourite ones.
>
> I upgraded Linux to newer version (Ubuntu 16.10, kernel 4.8),
> and it broke the driver. OpenCL does not work at all anymore.
> The screens were made on newer system -- nothing seemed to be
> changed in Xorg.

Actually, OpenCL still works, after I rebooted the host and reinstalled AMD's Pro driver and OpenCL SDK. I did upgrade to an even newer Linux; I guess it's the latest development Ubuntu version (I don't know how to find the OS version [`uname -a` doesn't tell], but the kernel is 4.9.0-11-generic). Messages like 'Warning: LLVM emitted unknown config register: 0x4' seem to be gone.

I'm getting strange numbers with the cl-mem test now, I think stranger than before the upgrade (but I'm not sure, as I did not do many cl-mem tests back then):

# ~/cl-mem/cl-mem
Running write test. 128 GB in 688.6 ms (185.9 GB/s)
Running read test. 128 GB in 596.1 ms (214.7 GB/s)
Running copy test. 128 GB in 715.3 ms (179.0 GB/s)

# ~/cl-mem/cl-mem
Running write test. 128 GB in 684.8 ms (186.9 GB/s)
Running read test. 128 GB in 596.8 ms (214.5 GB/s)
Running copy test. 128 GB in 715.1 ms (179.0 GB/s)

After a `glxgears -fullscreen` run:

# ~/cl-mem/cl-mem
Running write test. 128 GB in 868.3 ms (147.4 GB/s)
Running read test. 128 GB in 275.4 ms (464.8 GB/s)
Running copy test. 128 GB in 3878.0 ms (33.0 GB/s)

# ~/cl-mem/cl-mem
Running write test. 128 GB in 878.8 ms (145.7 GB/s)
Running read test. 128 GB in 293.1 ms (436.7 GB/s)
Running copy test. 128 GB in 3659.9 ms (35.0 GB/s)

[after a couple of minutes]

# ~/cl-mem/cl-mem
Running write test. 128 GB in 687.4 ms (186.2 GB/s)
Running read test. 128 GB in 596.8 ms (214.5 GB/s)
Running copy test. 128 GB in 715.0 ms (179.0 GB/s)

The copy test is slow because there are _lots_ of kernel messages printed like this:

[ 1780.569388] amdgpu :00:04.0: GPU fault detected: 147 0x0fba4402
[ 1780.569830] amdgpu :00:04.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x1073
[ 1780.570357] amdgpu :00:04.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B044002

or more generally:

[ 1780.569388] amdgpu :00:04.0: GPU fault detected: 147 0x0x02
[ 1780.569830] amdgpu :00:04.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x
[ 1780.570357] amdgpu :00:04.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B04x002

The read and write test results are way too high, assuming the test is transferring data over PCIe. The copy test at 180 GB/s is reasonable, and matches hardware expectations and others' tests [0].

Also, 'mixbench' seems to produce reasonable results that are order-of-2-magnitude comparable to others' tests, like [0]: 5 TFLOPS single precision, 350 MFLOPS double precision (should be single_precision/16 for ATI cards), and 1 TIOPS for integer (32-bit).

More captures of the Xorg screen are attached. The behaviour is very strange, but may give some hints of what might be going wrong to those familiar with X, video memory, framebuffers, DRI, GFX and all that weird and wonderful stuff.

When `glxgears` is run in fullscreen mode, what's on screen differs from run to run. The framerate also varies from run to run; it is mostly stable within one run but can change abruptly by hundreds of FPS. When the screen is blank, the framerate is slowest (750~1500 FPS). When only parts of the gears are rendered, the framerate is usually higher, up to 2500 FPS. Rotation is smooth, but sometimes with brightness flicker in some parts of the gears.

I should mention these numbers are for a 1600x1200 screen. Also, with the VNC session closed (but 'vino' still running), the frame rate goes up to a far more reasonable 6400 FPS.
[0] http://cdn.videocardz.com/1/2016/06/Radeon-RX-480-vs-GTX-970-AIDA-GPU-2.png CC'ing to freebsd-virtualization@, as this is likely to be of general interest (the screenshot attachment has been removed). -- [SorAlx] ridin' VN2000 Classic LT ___ freebsd-virtualization@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization To unsubscribe, send any mail to "freebsd-virtualization-unsubscr...@freebsd.org"
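As a rough cross-check of the cl-mem figures in the message above, the reported rates can be recomputed from size and time and set against the theoretical peak of a PCIe link. This is only a sketch and not part of cl-mem: the function names are made up for illustration, and the per-lane rates and encodings are the standard PCIe figures (2.5 GT/s and 5 GT/s with 8b/10b for gen1 and gen2, 8 GT/s with 128b/130b for gen3).

```python
# Hypothetical helper names; the cl-mem numbers are copied from the first run above.

# (transfer rate in GT/s, encoding efficiency) per PCIe generation
PCIE_GEN = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_peak_gb_s(gen, lanes):
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    rate_gt_s, efficiency = PCIE_GEN[gen]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


def reported_gb_s(gigabytes, milliseconds):
    """Bandwidth implied by a 'N GB in T ms' line."""
    return gigabytes / (milliseconds / 1000.0)


if __name__ == "__main__":
    runs = {"write": 688.6, "read": 596.1, "copy": 715.3}  # ms per 128 GB
    for gen in (1, 3):
        print(f"PCIe gen{gen} x16 peak: {pcie_peak_gb_s(gen, 16):.2f} GB/s")
    for name, ms in runs.items():
        print(f"{name}: {reported_gb_s(128, ms):.1f} GB/s")
```

Even a gen3 x16 link stays below 16 GB/s in this estimate, so rates near 200 GB/s cannot be bus transfers. One possible reading, and only a guess, is that the read and write tests operate on buffers in on-card memory.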
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> When `glxgears` is run in fullscreen mode, what's on screen depends
> on each run. Framerate varies from run to run, is mostly stable within
> one run but can change abruptly by 100's of FPS. When the screen is
> blank, the framerate is slowest (750~1500 FPS). When only parts of
> gears are rendered, framerate is usually higher, up to 2500 FPS.
> Rotation is smooth, but with brightness flicker of some parts of
> gears sometimes.

I should mention these numbers are for a 1600x1200 screen. Also, with the VNC session closed (but vino still running), the frame rate goes up to a far more reasonable 6400 FPS.

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

> Does bhyve not execute peripheral cards' option ROMs?

Not yet.

> I guess it doesn't. This could explain a lot of strange behaviour
> seen resulting from running in a VM.

Yes.

> How does UEFI work in this regard? My guess is that cards have to
> explicitly support the new boot method (UEFI)?

Yes - an additional section in the option ROM is needed, but as mentioned in an earlier email, that support is now widespread thanks to Windows.

> So passthrough with newer cards may be easier? This could explain why
> the newer RX 480 worked right away, and the older Quadro 2000 (and a
> lot of other nVidia cards without manufacturer's support for VMs) had
> no chance -- UEFI cards are somehow more "autonomous".

Possibly, though it might also be the card itself not requiring as much initialization from the option ROM.

> It all is just speculation on my side, I know nothing about this UEFI
> stuff. Could you summarize in couple sentences what's the deal between
> bhyve and UEFI (if there is any), or future plans?

UEFI is the ROM firmware for bhyve (and most modern PCs). bhyve has a custom build of the standard Intel EDK2 distribution:

https://github.com/freebsd/uefi-edk2/tree/bhyve/UDK2014.SP1

The changes are to support running as a hypervisor guest, where a lot of what is in a normal boot ROM isn't required (e.g. DRAM controller setup, CPU microcode update), and it also contains drivers for device emulations supported by bhyve. Currently, the ability to process an option ROM has been disabled.

later,

Peter.
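The "additional section in the option ROM" mentioned above can be looked for in a ROM dump. The following is a minimal sketch, assuming the standard PCI expansion ROM layout (0x55 0xAA signature, a pointer at offset 0x18 to a "PCIR" data structure, code type 0x03 for EFI). The function names and the synthetic test image are made up for illustration; this is not code from bhyve or from any tool referenced in this thread.

```python
import struct
import sys

CODE_TYPES = {
    0x00: "x86 legacy BIOS",
    0x01: "Open Firmware",
    0x02: "HP PA-RISC",
    0x03: "EFI",
}


def list_rom_images(rom):
    """Return the code type of each image found in a PCI expansion ROM dump."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        # Every image starts with the 0x55 0xAA signature.
        if rom[offset:offset + 2] != b"\x55\xaa":
            break
        # Little-endian pointer to the PCI data structure, at offset 0x18.
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        (image_len,) = struct.unpack_from("<H", rom, pcir + 0x10)  # 512-byte units
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        images.append(CODE_TYPES.get(code_type, f"unknown (0x{code_type:02x})"))
        # Bit 7 of the indicator marks the last image in the ROM.
        if indicator & 0x80 or image_len == 0:
            break
        offset += image_len * 512
    return images


def make_test_image(code_type, last):
    """Build a minimal 512-byte synthetic image (for demonstration only)."""
    img = bytearray(512)
    img[0:2] = b"\x55\xaa"
    struct.pack_into("<H", img, 0x18, 0x1C)      # PCIR placed at offset 0x1C
    img[0x1C:0x20] = b"PCIR"
    struct.pack_into("<H", img, 0x1C + 0x10, 1)  # image length: 1 * 512 bytes
    img[0x1C + 0x14] = code_type
    img[0x1C + 0x15] = 0x80 if last else 0x00
    return bytes(img)


if __name__ == "__main__":
    # Usage: pass the path of a ROM dump as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            print(list_rom_images(f.read()))
    else:
        print(list_rom_images(make_test_image(0x00, False) + make_test_image(0x03, True)))
```

A card whose dump lists only a legacy BIOS image would have nothing for a UEFI firmware to run, independent of whether the firmware processes option ROMs at all.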
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

>> That is extremely likely. bhyve itself doesn't have a BIOS, though
>> bhyve/UEFI could be modified to handle options ROMs (see
>> http://awilliam.github.io/presentations/KVM-Forum-2014/#/)
>
> Hm, interesting. I wonder if a card that's not designed for use with
> UEFI is destined not to work well/at_all with bhyve... I'll read the
> presentation later.

I think in general almost all cards have UEFI ROM support these days since it has been mandated by Microsoft. However, as Rod mentioned, the bhyve UEFI implementation does not run PCI device option ROMs.

(see http://vfio.blogspot.com/2014/08/does-my-graphics-card-rom-support-efi.html)

>>> -GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
>>> +GPU UUID: Unknown Error
>>
>> That implies some type of h/w access isn't working, either MMIO
>> registers or response from a DMA command.
>
> I have a feeling it's something to do with DMA that's not getting
> configured correctly for data transfers, and returns wrong data (or
> good data to wrong location).

Yes.

>> A general issue with PCI passthrough is that often MMIO from the
>> guest works, since that is just VT-x remapping, but DMA doesn't work
>> due to issues with IOMMU programming (or incorrect mappings being
>> used). This gives a device that partially works in that registers can
>> be read, but data transfer doesn't work.
>
> Didn't we verify that the BARs are programmed correctly?

The BAR values you see are fictional and are created by bhyve. The actual physical BAR values are those set up by the host BIOS. bhyve uses EPT mappings to translate between the 'fake' value and the real value.

> So you're saying that bhyve has a bug in that it doesn't program the
> IOMMU right to match guest's memory-mapped address regions to host's
> addresses?

There isn't a known bug, but the 64-bit BAR region hasn't been tested for a long time so it's possible there is an issue with it.

>>> BTW, is it [generally] safe to decrease the BAR base address further?
>>> My workstation has a CPU with just 36 address bits...
>>
>> Yes. The only potential conflict is with the top of guest RAM, and 36
>> bits is a lot of RAM :)
>
> 64G of RAM isn't that much these days, how incredible is that :)
> But you're saying there's nothing else inbetween the top of guest's
> RAM and the BAR base? In that case it's nothing to worry about at all,
> as a guest will always have less RAM that the host's CPU can address.

Right - the 64-bit PCI decode region would be set dynamically based on the phys address bits, rather than being a hard-coded value.

later,

Peter.
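To make the address arithmetic in this exchange concrete: 36 physical address bits cover 64 GiB, and a 64-bit decode window has to sit below that limit and above the top of guest RAM. The sketch below shows one possible placement policy (an aligned window at the top of the addressable range). All names, sizes, and the policy itself are assumptions for illustration; this is not how bhyve chooses the base.

```python
GIB = 1 << 30


def window_fits(base, size, phys_bits):
    """True if [base, base + size) lies within the CPU's physical address range."""
    return base + size <= (1 << phys_bits)


def pick_bar64_base(guest_ram_top, window_size, phys_bits):
    """Place a naturally aligned window at the top of the addressable range.

    window_size is assumed to be a power of two (needed for the alignment mask).
    Raises ValueError if the window would overlap guest RAM.
    """
    limit = 1 << phys_bits
    base = (limit - window_size) & ~(window_size - 1)
    if base < guest_ram_top:
        raise ValueError("no room between guest RAM and the address limit")
    return base


if __name__ == "__main__":
    phys_bits = 36                      # the workstation CPU mentioned above
    print((1 << phys_bits) // GIB, "GiB addressable")
    # Hypothetical guest: 8 GiB of RAM, 4 GiB 64-bit decode window.
    base = pick_bar64_base(8 * GIB, 4 * GIB, phys_bits)
    print(hex(base), window_fits(base, 4 * GIB, phys_bits))
    # An illustrative fixed base above the 36-bit limit does not fit.
    print(window_fits(1 << 38, 4 * GIB, phys_bits))
```

The design choice here, top-of-range placement, keeps the window as far from guest RAM as possible; a real implementation might equally well place it just above RAM.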
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> I suspect this is a failure to run the BIOS code that
> enables the secondary power connector so you can come
> out of slot only power mode.

Well, that Quadro does not have a power connector, but I imagine card BIOS routines would be similar between all cards in a family, including those that require the extra power.

Does bhyve not execute peripheral cards' option ROMs? I guess it doesn't. This could explain a lot of strange behaviour seen resulting from running in a VM.

How does UEFI work in this regard? My guess is that cards have to explicitly support the new boot method (UEFI)? So passthrough with newer cards may be easier? This could explain why the newer RX 480 worked right away, and the older Quadro 2000 (and a lot of other nVidia cards without manufacturer's support for VMs) had no chance -- UEFI cards are somehow more "autonomous".

This is all just speculation on my side; I know nothing about this UEFI stuff. Could you summarize in a couple of sentences what the deal is between bhyve and UEFI (if there is any), or future plans?

> The general rule on other platforms is that ATI/AMD cards tend to just
> work, where as the NVidia cards are very picky and unless official
> listed as known to work when passed through you well fight problems.
> Very few cards are listed as known to work, most populars being the
> Quadro 2000, and the Quadro FX3800. Many cards are listed as known to
> NOT work.

Yeah, messing with nVidia for this reason (and because of the closed driver) seems to me like a huge time sink. I don't have the time, so I decided to try with AMD for now. I am only interested in nVidia because it's the best choice for a Windows VM I hope to run (SolidWorks, Altium, etc). OpenCL on AMD is a priority for me personally right now anyway.

> GOOD WORK on getting as far as you have as quickly as you have!
> Note that the https://wiki.freebsd.org/bhyve/pci_passthru has
> had a small update to reflect we know that VGA passthrough is
> not working at this time. Also a note about AMD IOmmu/AMD-Vi
> was added, hopefully saving someone from duplicate work.

Perhaps a note could be added about ATI/AMD cards partially working, to encourage others to play?

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
...
> > > -Performance State : P0
> > > +Performance State : P8
> >
> > Note sure what's happening here.
>
> Driver not kicking the card's BIOS into the right mode
> to switch to dynamic power state selection?

I suspect this is a failure to run the BIOS code that enables the secondary power connector so you can come out of slot only power mode.

> > > Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0:
> > > Display engine push buffer channel allocation failed Jan 11 11:34:49
> > > fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate
> > > display engine core DMA push buffer
> >
> > Not sure what's happening with those.
> >
> > Would it be possible to try the nouveau driver ? At least the source
> > is available, so it may be easier to determine what is broken.
>
> I could, but for now I'd like to focus more on AMD card
> (which also has an open-source driver).

The general rule on other platforms is that ATI/AMD cards tend to just work, whereas the NVidia cards are very picky, and unless a card is officially listed as known to work when passed through, you will fight problems. Very few cards are listed as known to work, the most popular being the Quadro 2000 and the Quadro FX3800. Many cards are listed as known to NOT work.

...

GOOD WORK on getting as far as you have as quickly as you have! Note that https://wiki.freebsd.org/bhyve/pci_passthru has had a small update to reflect that we know VGA passthrough is not working at this time. Also a note about AMD IOmmu/AMD-Vi was added, hopefully saving someone from duplicate work.

--
Rod Grimes rgri...@freebsd.org
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> -- VDPAU works, but I suspect it's not using the GPU [3][4];
> I haven't figure a way how to force the use of GPU. Also,
> the main window with text looks OK most of the time (when
> doing the video test and in the end, in particular), but
> show a smaller black rectangle in top left corner of the
> screen instead of the video samples;
> -- it almost feels like the DMA and framebuffers aren't always
> correctly configured, but still are transferring data [from
> somewhere to somewhere sometimes].

It is possible to run a VNC server, vino, on top of the Xorg session. I can send screenshots of what the display looks like to anyone interested.

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> > First, `nvidia-smi -q` output diff [0] is interesting. It suggests > > that the card may be in some incompletely initialized state: notice > > the "Unknown Error" instead of real UUID, and the P8 power state. > > Could it be that the driver doesn't put the card's BIOS in the right > > state? > > That is extremely likely. bhyve itself doesn't have a BIOS, though > bhyve/UEFI could be modified to handle options ROMs (see > http://awilliam.github.io/presentations/KVM-Forum-2014/#/) Hm, interesting. I wonder if a card that's not designed for use with UEFI is destined not to work well/at_all with bhyve... I'll read the presentation later. > > -GPU UUID: > > GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2 > > +GPU UUID: Unknown Error > > That implies some type of h/w access isn't working, either MMIO > registers or response from a DMA command. I have a feeling it's something to do with DMA that's not getting configured correctly for data transfers, and returns wrong data (or good data to wrong location). > > -Board ID: 0x100 > > +Board ID: 0x4 > > The same ? I'm quite sure it was the same card. > > PCIe Generation > > Max : 2 > > -Current : 2 > > +Current : 1 > > bhyve's emulated PCI hostbridge only advertises gen-1 - that could be > easily changed to gen2. That could make a difference for some of the > clock issues below > (source is pci_emul.c:pci_emul_add_pciecap()) I doubt the generation number matters. But yeah, wouldn't hurt to change it to '2'. > > Link Width > > Max : 16x > > Current : 16x > That's a bit unexpected since the hostbridge only advertises 1x, but > the driver is probably exporting the host value here. Yeah, nVidia is known to like talking directly to the card in its own, proprietary way. > > -Performance State : P0 > > +Performance State : P8 > > Note sure what's happening here. Driver not kicking the card's BIOS into the right mode to switch to dynamic power state selection? 
> > Clocks > > -Graphics: 625 MHz > > -SM : 1251 MHz > > -Memory : 1304 MHz > > -Video : 540 MHz > > +Graphics: 405 MHz > > +SM : 810 MHz > > +Memory : 324 MHz > > +Video : 405 MHz > > This may be related to the gen1 vs gen2 issue above. I doubt it's related to PCIe gen. Most likely because the card seems to remain in P8 (low power) mode, according to the same SMI tool. But the frequencies don't look right anyway; well, I didn't bother to look up what P8 is supposed to run at. > > When rebooting, I get this: > > nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: > > 0x857d:0:0:0x0040 > > This may be DMA not working. Yes, I strongly suspect DMA too, especially when it comes to DRI stuff. > A general issue with PCI passthrough is that often MMIO from the > guest works, since that is just VT-x remapping, but DMA doesn't work > due to issues with IOMMU programming (or incorrect mappings being > used). This gives a device that partially works in that registers can > be read, but data transfer doesn't work. Didn't we verify that the BARs are programmed correctly? So you're saying that bhyve has a bug in that it doesn't program the IOMMU right to match guest's memory-mapped address regions to host's addresses? > > Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: > > Display engine push buffer channel allocation failed Jan 11 11:34:49 > > fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate > > display engine core DMA push buffer > > Not sure what's happening with those. > > Would it be possible to try the nouveau driver ? At least the source > is available, so it may be easier to determine what is broken. I could, but for now I'd like to focus more on AMD card (which also has an open-source driver). > > BTW, is it [generally] safe to decrease the BAR base address further? > > My workstation has a CPU with just 36 address bits... > Yes. 
The only potential conflict is with the top of guest RAM, and 36 > bits is a lot of RAM :) 64G of RAM isn't that much these days, how incredible is that :) But you're saying there's nothing else inbetween the top of guest's RAM and the BAR base? In that case it's nothing to worry about at all, as a guest will always have less RAM that the host's CPU can address. > later, > Peter. -- [SorAlx] ridin' VN2000 Classic LT ___ freebsd-virtualization@freebsd.org mail
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Good news, everyone!

I tried an AMD card, and it is almost working. I have a lot of logs and info, but I will try to restrain the length of this message.

There was no need to do anything special to get the card to work, other than figuring out how to deal with Linux, setting up the drivers and OpenCL SDK, linking libraries to the right places, compiling software, etc.

First, PCI info [0] and some dmesg bits [1].

AMD drivers:
amdgpu-pro-16.50-362463.tar.xz
AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2

Next, the good news: Xorg starts, and display works!.. Kind of:
-- the `glxgears` window is flickery, has parts of gears missing, and does not look good in general;
-- the xterm window has rectangular cursor shapes plastered all over, in random places;
-- full-screen (1600x1200) `glxgears` is slower than expected, and the performance varies suddenly [2];
-- VDPAU works, but I suspect it's not using the GPU [3][4]; I haven't figured out how to force the use of the GPU. Also, the main window with text looks OK most of the time (when doing the video test and at the end, in particular), but shows a smaller black rectangle in the top left corner of the screen instead of the video samples;
-- it almost feels like the DMA and framebuffers aren't always correctly configured, but still are transferring data [from somewhere to somewhere sometimes].

I'm getting lots of messages like [5][6], among others, in various cases.

Of some 3 OpenCL applications I tested, one appeared to complete successfully [7]. Running it also produces messages as in [6]. But the numbers make sense, compared to e.g. tests of the R9 Nano (~/mixbench/results/OpenCL/alt_R9-Nano_d1912.5.log) and expectations of the GPU chip. This is exciting! I don't know whether the benchmark checks that the computations are correct, though.

The `clinfo` result [8] is a bit mixed... e.g., "Max clock frequency: 555Mhz" is wrong.

Some more tests [9].
[0] 00:04.0 0300: 1002:67df (rev c7) (prog-if 00 [VGA controller]) Subsystem: 1682:9480 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <64ns, L1 <1us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Address: fee03000 Data: 4022 Kernel driver in use: amdgpu Kernel modules: amdgpu 00:05.0 0300: 10de:0dd8 (rev a1) (prog-if 00 [VGA controller]) Subsystem: 10de:084a Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Kernel driver in use: nvidia Kernel modules: nvidiafb, nouveau, nvidia_367, nvidia_367_drm [1] [0.617109] pci :00:04.0: can't claim BAR 6 
[mem 0xf684-0xf685 pref]: no compatible bridge window [0.617806] pci :00:05.0: can't claim BAR 6 [mem 0xf600-0xf607 pref]: no compatible bridge window [0.618496] pci :00:05.0: BAR 6: assigned [mem 0xc008-0xc00f pref] [0.619000] pci :00:04.0: BAR 6: assigned [mem 0xc002-0xc003 pref] [0.619508] pci :00:01.0: BAR 6: assigned [mem 0xc0004000-0xc00047ff pref] [0.620011] pci :00:02.0: BAR 6: assigned [mem 0xc0004800-0xc0004fff pref] [0.620513] pci :
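The lspci output in [0] reports a link capability of 8GT/s x16 but a link status of 2.5GT/s x16 for the AMD card. A small sketch of how those two fields could be pulled out of `lspci -vv` text and compared follows; the function names are made up, and the regular expression is written against the exact field layout quoted in this message, so treat it as an assumption for other lspci versions.

```python
import re

# Matches e.g. "LnkCap: Port #0, Speed 8GT/s, Width x16" and
# "LnkSta: Speed 2.5GT/s, Width x16"; "LnkSta2:" is deliberately not matched.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s, Width x(\d+)")


def parse_links(lspci_text):
    """Return {'LnkCap': (speed_gt_s, width), 'LnkSta': (speed_gt_s, width)}."""
    links = {}
    for kind, speed, width in LINK_RE.findall(lspci_text):
        # Keep the first occurrence of each field (one device per call).
        links.setdefault(kind, (float(speed), int(width)))
    return links


def is_downgraded(links):
    """True if the negotiated link is slower or narrower than the capability."""
    cap_speed, cap_width = links["LnkCap"]
    sta_speed, sta_width = links["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


SAMPLE = (
    "LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <64ns, L1 <1us "
    "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+ "
    "LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ "
    "LnkSta2: Current De-emphasis Level: -3.5dB"
)

if __name__ == "__main__":
    links = parse_links(SAMPLE)
    print(links, "downgraded:", is_downgraded(links))
```

Inside a guest the reported status may reflect what the emulated host bridge advertises rather than the physical link, so a "downgraded" result here is a hint to investigate, not proof of a slow link.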
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

> BTW, is it [generally] safe to decrease the BAR base address further?
> My workstation has a CPU with just 36 address bits...

Yes. The only potential conflict is with the top of guest RAM, and 36 bits is a lot of RAM :)

later,

Peter.
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

> First, `nvidia-smi -q` output diff [0] is interesting. It suggests
> that the card may be in some incompletely initialized state: notice
> the "Unknown Error" instead of real UUID, and the P8 power state.
> Could it be that the driver doesn't put the card's BIOS in the right
> state?

That is extremely likely. bhyve itself doesn't have a BIOS, though bhyve/UEFI could be modified to handle option ROMs (see http://awilliam.github.io/presentations/KVM-Forum-2014/#/)

> The command was run in both host and guest without Xorg loaded.

Thanks for the diff; this is very useful.

> -GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
> +GPU UUID: Unknown Error

That implies some type of h/w access isn't working, either MMIO registers or response from a DMA command.

> -Board ID: 0x100
> +Board ID: 0x4

The same ?

> PCIe Generation
> Max : 2
> -Current : 2
> +Current : 1

bhyve's emulated PCI hostbridge only advertises gen-1 - that could be easily changed to gen2. That could make a difference for some of the clock issues below (source is pci_emul.c:pci_emul_add_pciecap())

> Link Width
> Max : 16x
> Current : 16x

That's a bit unexpected since the hostbridge only advertises 1x, but the driver is probably exporting the host value here.

> -Performance State : P0
> +Performance State : P8

Not sure what's happening here.

> Clocks
> -Graphics: 625 MHz
> -SM : 1251 MHz
> -Memory : 1304 MHz
> -Video : 540 MHz
> +Graphics: 405 MHz
> +SM : 810 MHz
> +Memory : 324 MHz
> +Video : 405 MHz

This may be related to the gen1 vs gen2 issue above.

> When rebooting, I get this:
> nvidia-modeset: ERROR: GPU:0: Idling display engine timed out:
> 0x857d:0:0:0x0040

This may be DMA not working.

A general issue with PCI passthrough is that often MMIO from the guest works, since that is just VT-x remapping, but DMA doesn't work due to issues with IOMMU programming (or incorrect mappings being used). This gives a device that partially works in that registers can be read, but data transfer doesn't work.

> Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed
> Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

Not sure what's happening with those.

Would it be possible to try the nouveau driver ? At least the source is available, so it may be easier to determine what is broken.

later,

Peter.
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
...
>
> Incidentally, could someone put a note about that hardcoded BAR base
> on the bhyve PCI passthrough page [0] if it won't be fixed soon, so
> many others can play with VGA passthrough meanwhile?

I am working with Michael Dexter to get changes made to this wiki page to reflect your work here in getting a step closer to working VGA passthrough, along with a note that we know it does not work at the present time.

> [0] https://wiki.freebsd.org/bhyve/pci_passthru
>
> --
> [SorAlx] ridin' VN2000 Classic LT

--
Rod Grimes rgri...@freebsd.org
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> This gives me the idea to try a different driver version in Linux... Tried the same driver version in Linux as in FreeBSD. The driver seems to talk to the card now, but not sure whether I can call this progress: [0.536988] PCI host bridge to bus :00 [0.537291] pci_bus :00: root bus resource [io 0x-0x0cf7 window] [0.537776] pci_bus :00: root bus resource [io 0x0d00-0x1fff window] [0.538248] pci_bus :00: root bus resource [io 0x2000-0x211f window] [0.538722] pci_bus :00: root bus resource [mem 0xc000-0xc40f window] [0.539244] pci_bus :00: root bus resource [mem 0x34-0x340c0f window] [0.539791] pci_bus :00: root bus resource [bus 00] [0.540204] pci :00:00.0: [1275:1275] type 00 class 0x06 [0.540402] pci :00:00.0: reg 0x30: [mem 0x-0x07ff pref] [0.540557] pci :00:01.0: [8086:7000] type 00 class 0x060100 [0.540826] pci :00:01.0: reg 0x30: [mem 0x-0x07ff pref] [0.540923] pci :00:02.0: [1af4:1001] type 00 class 0x01 [0.541052] pci :00:02.0: reg 0x10: [io 0x2000-0x203f] [0.541090] pci :00:02.0: reg 0x14: [mem 0xc000-0xc0001fff] [0.541273] pci :00:02.0: reg 0x30: [mem 0x-0x07ff pref] [0.541442] pci :00:03.0: [1af4:1000] type 00 class 0x02 [0.541568] pci :00:03.0: reg 0x10: [io 0x2040-0x205f] [0.541605] pci :00:03.0: reg 0x14: [mem 0xc0002000-0xc0003fff] [0.541786] pci :00:03.0: reg 0x30: [mem 0x-0x07ff pref] [0.541992] pci :00:04.0: [10de:0dd8] type 00 class 0x03 [0.542136] pci :00:04.0: reg 0x10: [mem 0xc200-0xc3ff] [0.542198] pci :00:04.0: reg 0x14: [mem 0x34-0x3407ff 64bit pref] [0.542259] pci :00:04.0: reg 0x1c: [mem 0x340800-0x340bff 64bit pref] [0.542302] pci :00:04.0: reg 0x24: [io 0x2080-0x20ff] [0.542346] pci :00:04.0: reg 0x30: [mem 0xf600-0xf607 pref] [0.549031] vgaarb: setting as boot device: PCI::00:04.0 [0.549430] vgaarb: device added: PCI::00:04.0,decodes=io+mem,owns=io+mem,locks=none [0.549995] vgaarb: loaded [0.550190] vgaarb: bridge control possible :00:04.0 [0.616082] pci :00:04.0: can't claim BAR 6 [mem 0xf600-0xf607 pref]: no compatible bridge window 
[0.616775] pci :00:04.0: BAR 6: assigned [mem 0xc008-0xc00f pref] [0.617281] pci :00:01.0: BAR 6: assigned [mem 0xc0004000-0xc00047ff pref] [0.617789] pci :00:02.0: BAR 6: assigned [mem 0xc0004800-0xc0004fff pref] [0.618303] pci :00:03.0: BAR 6: assigned [mem 0xc0005000-0xc00057ff pref] [0.618807] pci_bus :00: resource 4 [io 0x-0x0cf7 window] [0.618808] pci_bus :00: resource 5 [io 0x0d00-0x1fff window] [0.618809] pci_bus :00: resource 6 [io 0x2000-0x211f window] [0.618810] pci_bus :00: resource 7 [mem 0xc000-0xc40f window] [0.618811] pci_bus :00: resource 8 [mem 0x34-0x340c0f window] [1.669308] nvidia: module verification failed: signature and/or required key missing - tainting kernel [1.676499] AVX2 version of gcm_enc/dec engaged. [1.676844] nvidia :00:04.0: can't derive routing for PCI INT A [1.676845] nvidia :00:04.0: PCI INT A: no GSI [1.676904] vgaarb: device changed decodes: PCI::00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem [1.676983] nvidia-nvlink: Nvlink Core is being initialized, major device number 248 [1.676991] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016 [1.683125] AES CTR mode by8 optimization enabled [1.687576] [drm] Initialized drm 1.1.0 20060810 [1.706991] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.57 Mon Oct 3 20:32:57 PDT 2016 [1.708732] [drm] [nvidia-drm] [GPU ID 0x0004] Loading driver After starting Xorg: [ 23.762260] divide error: [#1] SMP [ 23.762271] Modules linked in: nvidia_uvm(POE) mac_hid 8250_fintek ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nvidia_drm(POE) nvidia_modeset(POE) drm_kms_helper crct10dif_pclmul syscopyarea sysfillrect crc32_pclmul ghash_clmulni_intel sysimgblt fb_sys_fops drm aesni_intel aes_x86_64 lrw nvidia(POE) gf128mul glue_helper 
ablk_helper cryptd fjes [ 23.762273] CPU: 2 PID: 1423 Comm: Xorg Tainted: P OE 4.4.0-59-generic #80-Ubuntu [ 23.762273] Hardware name: BHYVE, BIOS 1.00 03/14/2014 [ 23.762274] task: 880005129c00 ti: 880006b08000 task.ti: 880006b08000 [ 23.762373] RIP: 0010:[] [] _nv008359rm+0xdb/0x150 [nvidia] [ 23.762374] RSP: 0018:880006b0b990 EFLAGS: 00010246 [ 23.762374] RAX: 00
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> That's a different issue - it's unlikely, if not impossible, to
> configure bhyve with enough RAM to hit 37 bits worth where that would
> become a problem. No need to worry about that.

Well, there may be peripheral cards that have fewer bits... Anyway, I
see what you mean: the memory manager can always remap the DMA regions
to lower virtual addresses. Good to know it's not an issue.

> There are lots of BIOS/UEFI implementations out there that have the
> same restriction. In general, there should be no need for a guest to
> reprogram device BARs.

There are all sorts of situations... But anyway, if it is too much
work, then we can forget about this until someone needs it again.

> After changing the 64-bit BAR base address, did you still need the
> pci=nocrs option for Linux ? I'd hope this would be no longer necessary.

No. As expected, the option is no longer needed.

BTW, is it [generally] safe to decrease the BAR base address further?
My workstation has a CPU with just 36 address bits...

> The problem is the knowledge set of graphics/GPU knowhow and
> equipment access, and bhyve/PCI programming, are disjoint. The time
> I've spent on it has been the inverse, where I feel that I've spent a
> half-day doing things that anyone who knew about graphics could get
> done in a half-hour :)
>
> For these type of issues, joint work is best to leverage the
> knowledge of both sides. From my point-of-view, the work you've done
> has been very helpful.

Yeah, we could benefit from more information exchange, I agree. I am
trying to do my part to share what I learned; I started with no
knowledge of bhyve, PCI, or GPU passthrough a week ago -- so for now my
efforts appear useful, but soon the progress will stop, and we will
need someone with real knowledge to step in.

Incidentally, could someone put a note about that hardcoded BAR base on
the bhyve PCI passthrough page [0] if it won't be fixed soon, so that
many others can play with VGA passthrough meanwhile?

[0] https://wiki.freebsd.org/bhyve/pci_passthru > later, > Peter. -- [SorAlx] ridin' VN2000 Classic LT ___ freebsd-virtualization@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization To unsubscribe, send any mail to "freebsd-virtualization-unsubscr...@freebsd.org"
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> > Removing another signature of detecting virtualization and increasing
> > compatibility would be negligible gain? Just asking...
> I don't think we are going to try and defeat the NVidia virtualization
> checks, and I can probably assure you that they would patch them as
> fast as we bypassed them.

No-no, that's not what I meant. It is not about nVidia, just the
general quality of the hypervisor. Ideally, running in the VM should be
indistinguishable from running on bare metal, except for things like
CPUID, peripheral ID strings, etc. (which can be easily changed by
recompiling).

For instance, suppose someone is doing security research -- studying
clever malware, for example; you'd want to run it in a VM, but want the
subject to have as few ways as possible to find out that it is running
in a VM. Or Internet security... Just some examples.

FreeBSD was always (as far as I remember, anyway) in the lead in
research (as a platform) and quality, so I figured it'd be nice to keep
it this way.

> This is officially an item on my plate. Functional hardware that works
> doing VGA passthrough under ESXi was brought up about 2 weeks ago so I
> am past that stage. I have Quadro FX3800 on a Supermicro X9DAi.

Great! I'm glad to see there is interest in GPU passthrough on FreeBSD.

> You have helped some in uncovering the next set of issues, but my
> plate is very large, and seems to have grown appendages that are
> holding all sorts of things :-)

Yeah, it's a dream to have nice, flat plates for projects :)

> I am still coming up to speed on the bhyve code, so it won't be
> a half hour fix by me, but it will get fixed.

Count me in for help. I haven't got lots of time to play with this (or
at least shouldn't be spending so much time), but I can aid with
testing, and can also provide access to a machine with an AMD card (or
both nVidia & AMD).

> Rod Grimes

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> I hope you keep it up or at least figure out what the driver is doing.

Not in my plans at the moment. I prefer AMD GPUs over nV for OpenCL.

nVidia has served me well for the last 10 years with their excellent
FreeBSD graphics driver: I had very few problems with it; it's stable
and well documented, and has the performance. Huge thanks and respect
to them for that [I just bought another low-end nVidia card for my home
computer].

But times a-changin'. We have things like OpenCL and virtualization
these days that are of interest and available to everyone (not just the
"professionals").

> If they haven't explicitly put in the license terms that virtualization
> is forbidden for consumer cards, there's nothing wrong with hot
> patching the driver ...

Why do you care about some license? It's your card, your computer, so
you're free to do what _you_ want with it. How can one "forbid"
virtualization? It's impossible. The only way is to design hardware and
firmware that is not amenable to VT -- but nobody would do that
nowadays.

> assuming that they don't do things like Skype
> does where it repeatedly checksums the memory image.

There are ways around things. But it is easier to buy AMD than to fight
someone's stupidity or incompetence.

> Good hunting.
>
> -M

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> Xorg log bits [2] show that X is up. But the monitor stays in sleep

BTW, this is what happens to Xorg:

USER  PID  %CPU %MEM      VSZ   RSS TT  STAT STARTED    TIME COMMAND
root  638 100.0  0.8 50481584 50612 u0  R<   12:21   4:33.05 |-- /usr/local/bin/X :0 -auth /root/.serverauth.624 (Xorg)

It cannot be killed, and gdb hangs when trying to attach to the process.

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
I hope you keep it up or at least figure out what the driver is doing. If they haven't explicitly put in the license terms that virtualization is forbidden for consumer cards, there's nothing wrong with hot patching the driver ... assuming that they don't do things like Skype does where it repeatedly checksums the memory image. Good hunting. -M On Wed, Jan 11, 2017 at 19:54 wrote: > > > I had a bit more play with nVidia and FreeBSD guest. > > > > First, `nvidia-smi -q` output diff [0] is interesting. It suggests that > > the card may be in some incompletely initialized state: notice the > > "Unknown Error" instead of real UUID, and the P8 power state. Could it > > be that the driver doesn't put the card's BIOS in the right state? > > The command was run in both host and guest without Xorg loaded. > > > > Second, I was able to start Xorg by disabling rendering acceleration > > (Option "NoAccel"). Now nVidia's Xorg module does not fail to allocate > > DMA (I guess it does not try?), but oddly, reserves 48 GB (!?) of virtual > > memory instead. Sadly there is still no display for some reason. > > > > Relevant dmesg bits are below [1]. Of particular interest is the line > > "nvidia-modeset: Allocated GPU:0 () @ PCI::00:00.0" -- the PCI > > address is obviously incorrect. > > > > Xorg log bits [2] show that X is up. But the monitor stays in sleep mode. > > With more options [3], I get this: [4]. Edit: actually, host reboot > > made it behave the same as just with "NoAccel", maybe. > > > > Clearly the driver is able to talk to the card: e.g., it attaches and > > responds to `nvidia-smi` [with the exception of UUID], reads EDID from > > the monitor. But some channel of communication is clearly missing or > > not working right. Any ideas how to go about finding out which one? 
> > > > [0] > > ==NVSMI LOG== > > > > -Timestamp : Wed Jan 11 19:40:54 2017 > > +Timestamp : Wed Jan 11 11:08:40 2017 > > Driver Version : 367.44 > > > > Attached GPUs : 1 > > -GPU :01:00.0 > > +GPU :00:04.0 > > Product Name: Quadro 2000 > > Product Brand : Quadro > > Display Mode: Enabled > > @@ -17,11 +17,11 @@ > > Current : N/A > > Pending : N/A > > Serial Number : N/A > > -GPU UUID: > GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2 > > +GPU UUID: Unknown Error > > Minor Number: 0 > > VBIOS Version : 70.06.0D.00.02 > > MultiGPU Board : No > > -Board ID: 0x100 > > +Board ID: 0x4 > > GPU Part Number : N/A > > Inforom Version > > Image Version : N/A > > @@ -34,16 +34,16 @@ > > GPU Virtualization Mode > > Virtualization mode : None > > PCI > > -Bus : 0x01 > > -Device : 0x00 > > +Bus : 0x00 > > +Device : 0x04 > > Domain : 0x > > Device Id : 0x0DD810DE > > -Bus Id : :01:00.0 > > +Bus Id : :00:04.0 > > Sub System Id : 0x084A10DE > > GPU Link Info > > PCIe Generation > > Max : 2 > > -Current : 2 > > +Current : 1 > > Link Width > > Max : 16x > > Current : 16x > > @@ -54,7 +54,7 @@ > > Tx Throughput : N/A > > Rx Throughput : N/A > > Fan Speed : 30 % > > -Performance State : P0 > > +Performance State : P8 > > Clocks Throttle Reasons : N/A > > FB Memory Usage > > Total : 963 MiB > > @@ -113,7 +113,7 @@ > > Double Bit ECC : N/A > > Pending : N/A > > Temperature > > -GPU Current Temp: 38 C > > +GPU Current Temp: 35 C > > GPU Shutdown Temp : N/A > > GPU Slowdown Temp : N/A > > Power Readings > > @@ -125,10 +125,10 @@ > > Min Power Limit : N/A > > Max Power Limit : N/A > > Clocks > > -Graphics: 625 MHz > > -SM : 1251 MHz > > -Memory : 1304 MHz > > -Video : 540 MHz > > +Graphics
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
I had a bit more play with nVidia and FreeBSD guest. First, `nvidia-smi -q` output diff [0] is interesting. It suggests that the card may be in some incompletely initialized state: notice the "Unknown Error" instead of real UUID, and the P8 power state. Could it be that the driver doesn't put the card's BIOS in the right state? The command was run in both host and guest without Xorg loaded. Second, I was able to start Xorg by disabling rendering acceleration (Option "NoAccel"). Now nVidia's Xorg module does not fail to allocate DMA (I guess it does not try?), but oddly, reserves 48 GB (!?) of virtual memory instead. Sadly there is still no display for some reason. Relevant dmesg bits are below [1]. Of particular interest is the line "nvidia-modeset: Allocated GPU:0 () @ PCI::00:00.0" -- the PCI address is obviously incorrect. Xorg log bits [2] show that X is up. But the monitor stays in sleep mode. With more options [3], I get this: [4]. Edit: actually, host reboot made it behave the same as just with "NoAccel", maybe. Clearly the driver is able to talk to the card: e.g., it attaches and responds to `nvidia-smi` [with the exception of UUID], reads EDID from the monitor. But some channel of communication is clearly missing or not working right. Any ideas how to go about finding out which one? 
[0] ==NVSMI LOG== -Timestamp : Wed Jan 11 19:40:54 2017 +Timestamp : Wed Jan 11 11:08:40 2017 Driver Version : 367.44 Attached GPUs : 1 -GPU :01:00.0 +GPU :00:04.0 Product Name: Quadro 2000 Product Brand : Quadro Display Mode: Enabled @@ -17,11 +17,11 @@ Current : N/A Pending : N/A Serial Number : N/A -GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2 +GPU UUID: Unknown Error Minor Number: 0 VBIOS Version : 70.06.0D.00.02 MultiGPU Board : No -Board ID: 0x100 +Board ID: 0x4 GPU Part Number : N/A Inforom Version Image Version : N/A @@ -34,16 +34,16 @@ GPU Virtualization Mode Virtualization mode : None PCI -Bus : 0x01 -Device : 0x00 +Bus : 0x00 +Device : 0x04 Domain : 0x Device Id : 0x0DD810DE -Bus Id : :01:00.0 +Bus Id : :00:04.0 Sub System Id : 0x084A10DE GPU Link Info PCIe Generation Max : 2 -Current : 2 +Current : 1 Link Width Max : 16x Current : 16x @@ -54,7 +54,7 @@ Tx Throughput : N/A Rx Throughput : N/A Fan Speed : 30 % -Performance State : P0 +Performance State : P8 Clocks Throttle Reasons : N/A FB Memory Usage Total : 963 MiB @@ -113,7 +113,7 @@ Double Bit ECC : N/A Pending : N/A Temperature -GPU Current Temp: 38 C +GPU Current Temp: 35 C GPU Shutdown Temp : N/A GPU Slowdown Temp : N/A Power Readings @@ -125,10 +125,10 @@ Min Power Limit : N/A Max Power Limit : N/A Clocks -Graphics: 625 MHz -SM : 1251 MHz -Memory : 1304 MHz -Video : 540 MHz +Graphics: 405 MHz +SM : 810 MHz +Memory : 324 MHz +Video : 405 MHz Applications Clocks Graphics: N/A Memory : N/A [1] nvidia0: on vgapci0 vgapci0: child nvidia0 requested pci_enable_io vgapci0: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 269 to local APIC 3 vector 51 vgapci0: using IRQ 269 for MSI vgapci0: child nvidia0 requested pci_enable_io nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.44 Wed Aug 17 22:05:09 PDT 2016 acquiring duplicate lock of same type: "os.lock_sx" 1st os.lock_sx @ nvidia_os.c:599 2nd os.lock_sx @ nvidia_os.c:599 stack backtrace: #
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> > > Hi,
> > >
> > >> The problem appears to be in the area of assigning memory-mapped
> > >> I/O ranges by bhyve for the VGA card to a region outside of the
> > >> CPU's addressable space; i.e., bhyve does not check CPUID's
> > >> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> > >> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> > >> chunks, which requires 40 address bits).
> >
> > That's correct - it's a bug in bhyve.
>
> Baking a proper fix will be more complicated by the fact that PCIe
> cards themselves may have limitations. For example, most nVidia GPUs
> have 40 bits DMA addressing capability, some 39, and a few (still
> modern) ones -- just 37 [ref. nVidia "README" in the driver package].
>
> > PCI passthru doesn't allow the BAR values to be modified (this could
> > be changed, but it's a lot of work for little gain).
>
> Removing another signature of detecting virtualization and increasing
> compatibility would be negligible gain? Just asking...

I don't think we are going to try and defeat the NVidia virtualization
checks, and I can probably assure you that they would patch them as
fast as we bypassed them.

> > > But:
> > > # ./nvidia-smi
> > > No devices were found
> > > dmesg:
> > > [ 173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
> > > [ 173.499115] NVRM: rm_init_adapter failed for device bearing
> > > minor number 0
> >
> > Looks like you're getting close :)
>
> Hmm, I'm not seeing myself getting much closer here. Do you know
> something I don't? ;) I really hope bhyve developers can spare a
> bit of time on getting GPU passthrough to work... I know nothing
> about these things, and where I waste half a day messing around,
> the problem could be fixed in half an hour by someone who knows.

This is officially an item on my plate. Functional hardware that works
doing VGA passthrough under ESXi was brought up about 2 weeks ago so I
am past that stage. I have Quadro FX3800 on a Supermicro X9DAi.

You have helped some in uncovering the next set of issues, but my
plate is very large, and seems to have grown appendages that are
holding all sorts of things :-)

I am still coming up to speed on the bhyve code, so it won't be a
half-hour fix by me, but it will get fixed.

> > later,
> > Peter.
>
> --
> [SorAlx] ridin' VN2000 Classic LT

--
Rod Grimes rgri...@freebsd.org
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

> > That's correct - it's a bug in bhyve.
>
> Baking a proper fix will be more complicated by the fact that PCIe
> cards themselves may have limitations. For example, most nVidia GPUs
> have 40 bits DMA addressing capability, some 39, and a few (still
> modern) ones -- just 37 [ref. nVidia "README" in the driver package].

That's a different issue - it's unlikely, if not impossible, to
configure bhyve with enough RAM to hit 37 bits worth where that would
become a problem. No need to worry about that.

> > PCI passthru doesn't allow the BAR values to be modified (this could
> > be changed, but it's a lot of work for little gain).
>
> Removing another signature of detecting virtualization and increasing
> compatibility would be negligible gain? Just asking...

There are lots of BIOS/UEFI implementations out there that have the
same restriction. In general, there should be no need for a guest to
reprogram device BARs.

After changing the 64-bit BAR base address, did you still need the
pci=nocrs option for Linux? I'd hope this would be no longer necessary.

> > > But:
> > > # ./nvidia-smi
> > > No devices were found
> > > dmesg:
> > > [ 173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
> > > [ 173.499115] NVRM: rm_init_adapter failed for device bearing
> > > minor number 0
> >
> > Looks like you're getting close :)
>
> Hmm, I'm not seeing myself getting much closer here. Do you know
> something I don't? ;) I really hope bhyve developers can spare a
> bit of time on getting GPU passthrough to work... I know nothing
> about these things, and where I waste half a day messing around,
> the problem could be fixed in half an hour by someone who knows.

The problem is the knowledge set of graphics/GPU knowhow and equipment
access, and bhyve/PCI programming, are disjoint. The time I've spent on
it has been the inverse, where I feel that I've spent a half-day doing
things that anyone who knew about graphics could get done in a
half-hour :)

For these type of issues, joint work is best to leverage the knowledge
of both sides. From my point-of-view, the work you've done has been
very helpful.

later,

Peter.
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> Is the VM checking documented in the driver notes somewhere? I have a

It's not in their driver's "README" file.

> Titan X that I need to run CUDA on and would be much happier if I
> didn't have to actually switch back and forth between FreeBSD and
> Ubuntu on my desktop. Are we now fairly certain that this won't work?

Not certain. The idea that nVidia's drivers artificially limit the use
of the non-pro cards in VMs is only speculation. It is also possible
that certain BIOS and/or hardware features are missing in the gaming
cards.

> (Yet another reason to go with AMD if they ever deliver on ROCm)

Yeah, AMD are pretty good for computing. And they don't seem to limit
floating-point performance as severely as nVidia for
non-"professional" cards.

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> Hi,
>
> >> The problem appears to be in the area of assigning memory-mapped
> >> I/O ranges by bhyve for the VGA card to a region outside of the
> >> CPU's addressable space; i.e., bhyve does not check CPUID's
> >> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> >> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> >> chunks, which requires 40 address bits).
>
> That's correct - it's a bug in bhyve.

Baking a proper fix will be more complicated by the fact that PCIe
cards themselves may have limitations. For example, most nVidia GPUs
have 40 bits DMA addressing capability, some 39, and a few (still
modern) ones -- just 37 [ref. nVidia "README" in the driver package].

> PCI passthru doesn't allow the BAR values to be modified (this could
> be changed, but it's a lot of work for little gain).

Removing another signature of detecting virtualization and increasing
compatibility would be negligible gain? Just asking...

> > But:
> > # ./nvidia-smi
> > No devices were found
> > dmesg:
> > [ 173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
> > [ 173.499115] NVRM: rm_init_adapter failed for device bearing
> > minor number 0
>
> Looks like you're getting close :)

Hmm, I'm not seeing myself getting much closer here. Do you know
something I don't? ;) I really hope bhyve developers can spare a bit of
time on getting GPU passthrough to work... I know nothing about these
things, and where I waste half a day messing around, the problem could
be fixed in half an hour by someone who knows.

> later,
> Peter.

--
[SorAlx] ridin' VN2000 Classic LT
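
The address-width argument in this message can be checked with a few lines of arithmetic. What follows is only a rough sketch, not anything taken from bhyve: the helper names are made up for illustration, the two base addresses (0xD000000000 for the stock PCI_EMUL_MEMBASE64 and 0x3400000000 for the patched one) are my reading of the values discussed elsewhere in this thread, and the window size is simply the 128 MiB + 64 MiB prefetchable BARs reported for the Quadro 2000.

```python
# Sketch: does a 64-bit BAR window fit inside the CPU's physical address space?
# Helper names are hypothetical; the constants are assumptions from the thread.

MIB = 1 << 20


def bits_needed(last_addr: int) -> int:
    """Number of address bits required to reach last_addr."""
    return last_addr.bit_length()


def window_fits(base: int, size: int, phys_bits: int) -> bool:
    """True if the whole range [base, base + size) lies below 2**phys_bits."""
    return bits_needed(base + size - 1) <= phys_bits


# The two prefetchable BARs of the card: 128 MiB + 64 MiB.
WINDOW_SIZE = 128 * MIB + 64 * MIB

STOCK_BASE = 0xD000000000    # assumed stock PCI_EMUL_MEMBASE64
PATCHED_BASE = 0x3400000000  # assumed patched value

# CPUID leaf 0x80000008, EAX[7:0]: physical address bits (0x27 == 39 here).
CPU_PHYS_BITS = 0x27

for name, base in (("stock", STOCK_BASE), ("patched", PATCHED_BASE)):
    last = base + WINDOW_SIZE - 1
    print(f"{name}: last address {last:#x} needs {bits_needed(last)} bits, "
          f"fits in {CPU_PHYS_BITS} bits: "
          f"{window_fits(base, WINDOW_SIZE, CPU_PHYS_BITS)}")
```

Calling `window_fits(PATCHED_BASE, WINDOW_SIZE, 36)` is a quick way to see whether the same base would also suit a CPU with only 36 physical address bits.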
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> IIRC the 367.44 version of the nvidia drivers do NOT support the
> Quadro 2000, you need to be using the 340.xx version of them. I
> ran into problems on native hardware.

I pulled the Quadro 2000 out of my workstation [and put the 600 in],
which is running fine with the latest driver from ports (367.44).

> Also before you attempt to get VGA passthrough working it is best
> to make sure you can run native, have you tried running your guest
> on the host in a native configuration?

Yes, I just installed the nVidia driver on the host, and it works fine.

> I have fought this on other platforms many times only to find out
> that what I was trying would not ever run native, let alone in a
> virtualized environment.

This gives me the idea to try a different driver version in Linux...

--
[SorAlx] ridin' VN2000 Classic LT
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> As far as I can tell it's the Hypervisor extension flags list. The lack
> of these extensions/optimisations might explain why your FreeBSD VM
> runs slow

The guest isn't slow, actually -- just the `nvidia-smi` tool was much
slower than normal to produce output. CPU speed in the guest is less
than 2% slower than bare hardware. Memory bandwidth is over 2.5 times
slower. [0] Still, compiling world and ports feels quite snappy. Disk
I/O is not bad, either (but latencies are a bit high). [1]

> However, even with reapplying the changes to vmm.ko to hide/remove the
> 0x40000000 CPUID support and CPUID2_HV, I still have the same
> "RmInitAdapter failed" issue.

Well, I, too, couldn't get the nVidia driver to work at all in Linux.
The driver loaded, but when trying to use it, gave the "RmInitAdapter
failed! (0x53:0x3:1856)" errors. Is it the same in your case? Or does
the driver not load at all (like I experienced with the Quadro 600)?

Maybe I should try the same driver version in Linux as in FreeBSD...

> Allegedly[0] nVidia VM checking came in with driver version 337.88,
> with more checking after version 344.11. I couldn't install version 319
> as it failed to build the Linux kernel module. I currently have 370.28
> installed which supports both my GT610 and my GTX960.

`nvidia-smi -q` thinks it's running on bare hardware [when in VM]:

    GPU Virtualization Mode
        Virtualization mode : None

Yet we know the driver refuses to load for the Quadro 600 and your GTX
(that two-faced bastard nVidia!). So there must be multiple checks.

> Maybe the next thing for me to try is to replicate your tests with a
> FreeBSD VM.

Yes, give it a try. I'm about to give up, for the nVidia card at least
(I was only using it for testing, as I didn't have an AMD GPU handy
until recently).

[0]
# Host with idle FreeBSD VM guest running
# ubench
Unix Benchmark Utility v.0.3
FreeBSD 10.3-STABLE FreeBSD 10.3-STABLE #0 r311343M: Thu Jan 5 02:31:50 PST 2017 xxx@yyy:/usr/obj/usr/src/sys/SORALX amd64
Ubench CPU: 2076542
Ubench MEM: 1296221
Ubench AVG: 1686381

# Guest
# ubench
Unix Benchmark Utility v.0.3
FreeBSD 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r311659: Tue Jan 10 14:18:44 PST 2017 xxx@zzz:/usr/obj/usr/src/sys/GENERIC amd64
Ubench CPU: 2053983
Ubench MEM: 489953
Ubench AVG: 1271968

[1]
# Host
# diskinfo -tv /dev/ada0
/dev/ada0
        512             # sectorsize
        250059350016    # mediasize in bytes (233G)
        488397168       # mediasize in sectors
        4096            # stripesize

Seek times:
        Full stroke:      250 iter in   0.029146 sec =    0.117 msec
        Half stroke:      250 iter in   0.029950 sec =    0.120 msec
        Quarter stroke:   500 iter in   0.057272 sec =    0.115 msec
        Short forward:    400 iter in   0.045711 sec =    0.114 msec
        Short backward:   400 iter in   0.045748 sec =    0.114 msec
        Seq outer:       2048 iter in   0.072799 sec =    0.036 msec
        Seq inner:       2048 iter in   0.072256 sec =    0.035 msec

Transfer rates:
        outside:       102400 kbytes in   0.222412 sec =   460407 kbytes/sec
        middle:        102400 kbytes in   0.222026 sec =   461207 kbytes/sec
        inside:        102400 kbytes in   0.222842 sec =   459518 kbytes/sec

# Guest
# diskinfo -tv /dev/vtbd0
/dev/vtbd0
        512             # sectorsize
        22548644864     # mediasize in bytes (21G)
        44040322        # mediasize in sectors
        32768           # stripesize

Seek times:
        Full stroke:      250 iter in   0.112714 sec =    0.451 msec
        Half stroke:      250 iter in   0.100388 sec =    0.402 msec
        Quarter stroke:   500 iter in   0.132808 sec =    0.266 msec
        Short forward:    400 iter in   0.074504 sec =    0.186 msec
        Short backward:   400 iter in   0.096154 sec =    0.240 msec
        Seq outer:       2048 iter in   0.208239 sec =    0.102 msec
        Seq inner:       2048 iter in   0.061853 sec =    0.030 msec

Transfer rates:
        outside:       102400 kbytes in   0.201375 sec =   508504 kbytes/sec
        middle:        102400 kbytes in   0.196044 sec =   522332 kbytes/sec
        inside:        102400 kbytes in   0.197697 sec =   517964 kbytes/sec

--
[SorAlx] ridin' VN2000 Classic LT
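
For anyone wanting to double-check the "less than 2% slower" and "over 2.5 times slower" figures, the ratios can be recomputed from the ubench scores in this message. This is just a small sketch; the function and variable names are made up for illustration, and it assumes that a higher ubench score means a faster machine.

```python
# Sketch: host-vs-guest ratios from the ubench scores quoted in this message.
# Names are hypothetical; higher ubench score is assumed to mean faster.

host = {"cpu": 2076542, "mem": 1296221}
guest = {"cpu": 2053983, "mem": 489953}


def percent_slower(host_score: int, guest_score: int) -> float:
    """How much lower the guest score is, as a percentage of the host score."""
    return (1 - guest_score / host_score) * 100


def times_slower(host_score: int, guest_score: int) -> float:
    """Host score divided by guest score."""
    return host_score / guest_score


cpu_loss = percent_slower(host["cpu"], guest["cpu"])
mem_factor = times_slower(host["mem"], guest["mem"])

print(f"CPU: guest is {cpu_loss:.2f}% slower than host")
print(f"MEM: guest is {mem_factor:.2f}x slower than host")
```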
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

> > > There doesn't seem to be support for CPUID 0x40000001 in bhyve either.
> > What is it supposed to do?
>
> As far as I can tell it's the Hypervisor extension flags list. The lack
> of these extensions/optimisations might explain why your FreeBSD VM
> runs slow but their presence also causes the nVidia driver to refuse
> to run.

That leaf is KVM-only. bhyve doesn't have any additional hypervisor
leaves beyond 0x40000000 (the spec for this is
https://lwn.net/Articles/301888/)

later,

Peter.
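
As background for the leaf numbers above: under the convention described in the linked spec, leaf 0x40000000 returns a 12-byte vendor signature spread over EBX, ECX and EDX, and that signature is one thing a guest driver can look at. The sketch below does not execute CPUID at all; it only shows how the three registers map to the string. The function names are made up, and the "bhyve bhyve " signature is what I believe bhyve reports, so treat it as an assumption.

```python
# Sketch: how a 12-byte hypervisor signature maps onto EBX/ECX/EDX of
# CPUID leaf 0x40000000. No CPUID is executed; register values are
# synthesized from example strings. Function names are hypothetical.
import struct


def pack_signature(sig: bytes) -> tuple:
    """Split a 12-byte signature into (EBX, ECX, EDX), little-endian."""
    if len(sig) != 12:
        raise ValueError("hypervisor signature must be exactly 12 bytes")
    return struct.unpack("<3I", sig)


def decode_signature(ebx: int, ecx: int, edx: int) -> str:
    """Rebuild the signature string from the three register values."""
    raw = struct.pack("<3I", ebx, ecx, edx)
    return raw.rstrip(b"\0").decode("ascii")


# "KVMKVMKVM" padded with NULs is KVM's signature; "bhyve bhyve " is an
# assumption about what bhyve reports.
for sig in (b"KVMKVMKVM\0\0\0", b"bhyve bhyve "):
    regs = pack_signature(sig)
    print([f"{r:#010x}" for r in regs], "->", repr(decode_signature(*regs)))
```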
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Is the VM checking documented in the driver notes somewhere? I have a
Titan X that I need to run CUDA on and would be much happier if I
didn't have to actually switch back and forth between FreeBSD and
Ubuntu on my desktop. Are we now fairly certain that this won't work?

(Yet another reason to go with AMD if they ever deliver on ROCm)

On Wed, Jan 11, 2017 at 05:53 Dom wrote:

> On 11/01/2017 02:01, sor...@cydem.org wrote:
> > > Dom wrote:
> > >> There doesn't seem to be support for CPUID 0x40000001 in bhyve either.
> > > What is it supposed to do?
> >
> > As far as I can tell it's the Hypervisor extension flags list. The lack
> > of these extensions/optimisations might explain why your FreeBSD VM runs
> > slow but their presence also causes the nVidia driver to refuse to run.
> > (Can't remember where I read this, sorry)
> >
> > With your change to PCI_EMUL_MEMBASE64 I can boot a CentOS VM without
> > the "pci=nocrs" kernel option and nVidia card is assigned BARs without
> > issue.
> >
> > However, even with reapplying the changes to vmm.ko to hide/remove the
> > 0x40000000 CPUID support and CPUID2_HV, I still have the same
> > "RmInitAdapter failed" issue.
> >
> > Allegedly[0] nVidia VM checking came in with driver version 337.88, with
> > more checking after version 344.11. I couldn't install version 319 as it
> > failed to build the Linux kernel module. I currently have 370.28
> > installed which supports both my GT610 and my GTX960.
> >
> > Maybe the next thing for me to try is to replicate your tests with a
> > FreeBSD VM.
> >
> > [0] https://ubuntuforums.org/showthread.php?t=2266916 search for
> > "337.88"
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
On 11/01/2017 02:01, sor...@cydem.org wrote:
> Dom wrote:
> > There doesn't seem to be support for CPUID 0x40000001 in bhyve either.
> What is it supposed to do?

As far as I can tell it's the Hypervisor extension flags list. The lack
of these extensions/optimisations might explain why your FreeBSD VM
runs slow but their presence also causes the nVidia driver to refuse to
run. (Can't remember where I read this, sorry)

With your change to PCI_EMUL_MEMBASE64 I can boot a CentOS VM without
the "pci=nocrs" kernel option and the nVidia card is assigned BARs
without issue.

However, even with reapplying the changes to vmm.ko to hide/remove the
0x40000000 CPUID support and CPUID2_HV, I still have the same
"RmInitAdapter failed" issue.

Allegedly[0] nVidia VM checking came in with driver version 337.88,
with more checking after version 344.11. I couldn't install version 319
as it failed to build the Linux kernel module. I currently have 370.28
installed which supports both my GT610 and my GTX960.

Maybe the next thing for me to try is to replicate your tests with a
FreeBSD VM.

[0] https://ubuntuforums.org/showthread.php?t=2266916 search for "337.88"
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
IIRC the 367.44 version of the nvidia drivers do NOT support the
Quadro 2000, you need to be using the 340.xx version of them. I ran
into problems on native hardware.

Also before you attempt to get VGA passthrough working it is best to
make sure you can run native. Have you tried running your guest on the
host in a native configuration?

I have fought this on other platforms many times only to find out that
what I was trying would not ever run native, let alone in a virtualized
environment.

> > The problem appears to be in the area of assigning memory-mapped
> > I/O ranges by bhyve for the VGA card to a region outside of the
> > CPU's addressable space; i.e., bhyve does not check CPUID's
> > 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> > bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> > chunks, which requires 40 address bits). At least this is my
> > understanding of why VGA passthrough does not work.
>
> To test this, I tried writing to PCI BARs in FreeBSD guest using
> `pciconf -w`. Not much use that was: I could read back the values
> written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
> but `pciconf -lvb` still showed the same huge base addresses --
> they did not want to change.
>
> OK, I had enough of that. So I went to dig in the source, and
> changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL'
> to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c. Recompiled
> bhyve, booted up FreeBSD, and:
> # pciconf -lvb
> [...]
> vgapci0@pci0:0:4:0: class=0x03 card=0x084a10de chip=0x0dd810de > rev=0xa1 hdr=0x00 > vendor = 'NVIDIA Corporation' > device = 'GF106GL [Quadro 2000]' > class = display > subclass = VGA > bar [10] = type Memory, range 32, base 0xc200, size 33554432, > enabled > bar [14] = type Prefetchable Memory, range 64, base 0x34, > size 134217728, enabled > bar [1c] = type Prefetchable Memory, range 64, base 0x340800, > size 67108864, enabled > bar [24] = type I/O Port, range 32, base 0x2080, size 128, enabled > > ...a-a-and: > # kldload nvidia-modeset > Linux ELF exec handler installed > nvidia0: on vgapci0 > vgapci0: child nvidia0 requested pci_enable_io > vgapci0: attempting to allocate 1 MSI vectors (1 supported) > msi: routing MSI IRQ 269 to local APIC 3 vector 51 > vgapci0: using IRQ 269 for MSI > vgapci0: child nvidia0 requested pci_enable_io > random: harvesting attach, 8 bytes (4 bits) from nvidia0 > # nvidia-smi > acquiring duplicate lock of same type: "os.lock_sx" >1st os.lock_sx @ nvidia_os.c:599 >2nd os.lock_sx @ nvidia_os.c:599 > stack backtrace: > #0 0x80aa6780 at witness_debugger+0x70 > #1 0x80aa6683 at witness_checkorder+0xde3 > #2 0x80a4fac2 at _sx_xlock+0x72 > #3 0x82a515c2 at os_acquire_mutex+0x32 > #4 0x82a21068 at _nv016673rm+0x18 > Tue Jan 10 17:06:48 2017 > > +-+ > | NVIDIA-SMI 367.44 Driver Version: 367.44 > | > > |---+--+--+ > | GPU NamePersistence-M| Bus-IdDisp.A | Volatile Uncorr. > ECC | > | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute > M. | > > |===+==+==| > | 0 Quadro 2000 Off | :00:04.0 Off | > N/A | > | 30% 35CP8N/A / N/A | 0MiB / 963MiB | 0% > Default | > > +---+--+--+ > > > > +-+ > | Processes: GPU > Memory | > | GPU PID Type Process name Usage > | > > |=| > | No running processes found > | > > +-+ > > Beauty! It's very slow to execute, though. 
And Xorg is not in a hurry > to start working: > [ 204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ > 0xc200/33554432, 0x34/134217728, 0x340800/67108864, I/O @ > 0x2080/128, BIOS @ 0x/65536 > [...] > [ 204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32 > [ 204.736] (==) NVIDIA(0): RGB weight 888 > [ 204.736] (==) NVIDIA(0): Default visual is TrueColor > [ 204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0) > [ 204.738] (**) NVIDIA(0): Enabling 2D acceleration >
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi,

> The problem appears to be in the area of assigning memory-mapped
> I/O ranges by bhyve for the VGA card to a region outside of the
> CPU's addressable space; i.e., bhyve does not check CPUID's
> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> chunks, which requires 40 address bits).

That's correct - it's a bug in bhyve.

> To test this, I tried writing to PCI BARs in FreeBSD guest using
> `pciconf -w`. Not much use that was: I could read back the values
> written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
> but `pciconf -lvb` still showed the same huge base addresses --
> they did not want to change.

PCI passthru doesn't allow the BAR values to be modified (this could be changed, but it's a lot of work for little gain).

> OK, I had enough of that. So I went to dig in the source, and
> changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL'
> to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c.

Yep, that's a good way to test.

> But:
> # ./nvidia-smi
> No devices were found
>
> dmesg:
> [ 173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
> [ 173.499115] NVRM: rm_init_adapter failed for device bearing minor number 0

Looks like you're getting close :)

later,

Peter.
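The address-width argument in this exchange can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and it assumes the stock PCI_EMUL_MEMBASE64 is 0xD000000000, the patched value is 0x3400000000, the CPU reports 39 physical address bits (AL = 0x27), and the BAR is the 128 MiB prefetchable one from the pciconf output.

```python
def bits_required(addr: int) -> int:
    """Number of address bits needed to represent addr."""
    return addr.bit_length()


def bar_fits(base: int, size: int, phys_bits: int) -> bool:
    """True if the range [base, base + size) lies below 2**phys_bits."""
    return base + size <= (1 << phys_bits)


PHYS_BITS = 39          # assumed: CPUID 0x80000008 AL = 0x27, as reported in the thread
BAR_SIZE = 134217728    # 128 MiB prefetchable BAR, from the pciconf output

# Assumed stock and patched values of PCI_EMUL_MEMBASE64.
for base in (0xD000000000, 0x3400000000):
    print(hex(base), bits_required(base), bar_fits(base, BAR_SIZE, PHYS_BITS))
```

With those assumptions the stock base needs 40 bits and does not fit, while the patched base needs 38 bits and does.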
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> The problem appears to be in the area of assigning memory-mapped
> I/O ranges by bhyve for the VGA card to a region outside of the
> CPU's addressable space; i.e., bhyve does not check CPUID's
> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> chunks, which requires 40 address bits). At least this is my
> understanding of why VGA passthrough does not work.

To test this, I tried writing to PCI BARs in FreeBSD guest using `pciconf -w`. Not much use that was: I could read back the values written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`), but `pciconf -lvb` still showed the same huge base addresses -- they did not want to change.

OK, I had enough of that. So I went to dig in the source, and changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL' to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c. Recompiled bhyve, booted up FreeBSD, and:

# pciconf -lvb
[...]
vgapci0@pci0:0:4:0: class=0x03 card=0x084a10de chip=0x0dd810de rev=0xa1 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'GF106GL [Quadro 2000]'
    class = display
    subclass = VGA
    bar [10] = type Memory, range 32, base 0xc200, size 33554432, enabled
    bar [14] = type Prefetchable Memory, range 64, base 0x34, size 134217728, enabled
    bar [1c] = type Prefetchable Memory, range 64, base 0x340800, size 67108864, enabled
    bar [24] = type I/O Port, range 32, base 0x2080, size 128, enabled

...a-a-and:

# kldload nvidia-modeset
Linux ELF exec handler installed
nvidia0: on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 269 to local APIC 3 vector 51
vgapci0: using IRQ 269 for MSI
vgapci0: child nvidia0 requested pci_enable_io
random: harvesting attach, 8 bytes (4 bits) from nvidia0
# nvidia-smi
acquiring duplicate lock of same type: "os.lock_sx"
 1st os.lock_sx @ nvidia_os.c:599
 2nd os.lock_sx @ nvidia_os.c:599
stack backtrace:
#0 0x80aa6780 at witness_debugger+0x70
#1 0x80aa6683 at witness_checkorder+0xde3 #2 0x80a4fac2 at _sx_xlock+0x72 #3 0x82a515c2 at os_acquire_mutex+0x32 #4 0x82a21068 at _nv016673rm+0x18 Tue Jan 10 17:06:48 2017 +-+ | NVIDIA-SMI 367.44 Driver Version: 367.44 | |---+--+--+ | GPU NamePersistence-M| Bus-IdDisp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===+==+==| | 0 Quadro 2000 Off | :00:04.0 Off | N/A | | 30% 35CP8N/A / N/A | 0MiB / 963MiB | 0% Default | +---+--+--+ +-+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=| | No running processes found | +-+ Beauty! It's very slow to execute, though. And Xorg is not in a hurry to start working: [ 204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 0xc200/33554432, 0x34/134217728, 0x340800/67108864, I/O @ 0x2080/128, BIOS @ 0x/65536 [...] [ 204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32 [ 204.736] (==) NVIDIA(0): RGB weight 888 [ 204.736] (==) NVIDIA(0): Default visual is TrueColor [ 204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0) [ 204.738] (**) NVIDIA(0): Enabling 2D acceleration [ 213.674] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0 [ 213.674] (--) NVIDIA(0): CRT-0 [ 213.674] (--) NVIDIA(0): DFP-0 (boot) [ 213.674] (--) NVIDIA(0): DFP-1 [ 213.674] (--) NVIDIA(0): DFP-2 [ 213.674] (--) NVIDIA(0): DFP-3 [ 213.675] (--) NVIDIA(0): DFP-4 [ 213.698] (--) NVIDIA(0): CRT-0: disconnected [ 213.698] (--) NVIDIA(0): CRT-0: 400.0 MHz maximum pixel clock [ 213.698] (--) NVIDIA(0): [ 213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): connected [ 213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): Internal TMDS [ 213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): 330.0 MHz maximum pixel clock [...
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
> Found my original attempt by modifying /usr/src/sys/amd64/vmm/x86.c
> Unified diff follows, but this didn't work for me.
> ("bhyve_id[]" commented out to prevent compiler complaints)

Who knows what sort of trickery nVidia's driver is up to besides CPUID when determining the presence of virtualization.

Regardless of that, VGA PCIe passthrough does not work in bhyve even with Quadro 2000 (which Xen people have had success with).

The problem appears to be in the area of assigning memory-mapped I/O ranges by bhyve for the VGA card to a region outside of the CPU's addressable space; i.e., bhyve does not check CPUID's 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while bhyve assigns 0xd000000000 & above for the large Prefetch Memory chunks, which requires 40 address bits). At least this is my understanding of why VGA passthrough does not work.

This seems easy to fix. Could someone who knows better have a look?

Unlike Linux, FreeBSD has no problem assigning a BAR range outside the addressable range, and then panics when trying to write to these virtual memory addresses. See [0] below.

> There doesn't seem to be support for CPUID 0x40000001 in bhyve either.

What is it supposed to do?
[0] Linux dmesg: [0.204799] PCI: MMCONFIG for domain [bus 00-ff] at [mem 0xe000-0xefff] (base 0xe000) [0.205474] PCI: MMCONFIG at [mem 0xe000-0xefff] reserved in ACPI motherboard resources [0.206080] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug [0.207306] ACPI: PCI Root Bridge [PC00] (domain [bus 00]) [0.207724] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] [0.208291] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM [0.208759] acpi PNP0A03:00: host bridge window [mem 0xd0-0xd00c0f window] (ignored, not CPU addressable) [0.209517] PCI host bridge to bus :00 [0.209808] pci_bus :00: root bus resource [io 0x-0x0cf7 window] [0.210281] pci_bus :00: root bus resource [io 0x0d00-0x1fff window] [0.210752] pci_bus :00: root bus resource [io 0x2000-0x211f window] [0.211224] pci_bus :00: root bus resource [mem 0xc000-0xc40f window] [0.211743] pci_bus :00: root bus resource [bus 00] [...] [0.223902] PCI: Using ACPI for IRQ routing [0.265987] pci :00:03.0: can't claim BAR 1 [mem 0xd0-0xd007ff 64bit pref]: no compatible bridge window [0.266735] pci :00:03.0: can't claim BAR 3 [mem 0xd00800-0xd00bff 64bit pref]: no compatible bridge window [0.284717] pci :00:03.0: can't claim BAR 6 [mem 0xf600-0xf607 pref]: no compatible bridge window [...] 
[0.285407] pci :00:03.0: BAR 1: no space for [mem size 0x0800 64bit pref] [0.285933] pci :00:03.0: BAR 1: trying firmware assignment [mem 0xd0-0xd007ff 64bit pref] [0.286599] pci :00:03.0: BAR 1: [mem 0xd0-0xd007ff 64bit pref] conflicts with PCI mem [mem 0x-0x7f] [0.287419] pci :00:03.0: BAR 1: failed to assign [mem size 0x0800 64bit pref] [0.287968] pci :00:03.0: BAR 3: no space for [mem size 0x0400 64bit pref] [0.288506] pci :00:03.0: BAR 3: trying firmware assignment [mem 0xd00800-0xd00bff 64bit pref] [0.289173] pci :00:03.0: BAR 3: [mem 0xd00800-0xd00bff 64bit pref] conflicts with PCI mem [mem 0x-0x7f] [0.289992] pci :00:03.0: BAR 3: failed to assign [mem size 0x0400 64bit pref] [0.290539] pci :00:03.0: BAR 6: assigned [mem 0xc008-0xc00f pref] [0.291039] pci :00:01.0: BAR 6: assigned [mem 0xc0002000-0xc00027ff pref] [0.291540] pci :00:02.0: BAR 6: assigned [mem 0xc0002800-0xc0002fff pref] Cannot get output from Linux's `lspci -vvn` booted with "pci=nocrs" kernel option, as it hangs now close to the end of boot process (not sure why, was able to finish booting before). Another machine: vgapci0@pci0:1:0:0: class=0x03 card=0x083510de chip=0x0df810de rev=0xa1 hdr=0x00 vendor = 'nVidia Corporation' device = 'GF108 [Quadro 600]' class = display subclass = VGA bar [10] = type Memory, range 32, base 0xfa00, size 16777216, enabled bar [14] = type Prefetchable Memory, range 64, base 0xe800, size 134217728, enabled bar [1c] = type Prefetchable Memory, range 64, base 0xf000, size 33554432, enabled bar [24] = type I/O Port, range 32, base 0xe000, size 128, enabled hdac0@pci0:1:0:1: class=0x040300 card=0x083510de chip=0x0bea10de rev=0xa1 hdr=0x00 vendor = 'nVidia Corporation' device = 'GF108 High Definition Audio Controller' class = multimedia subclass = HDA bar [10] = type Memory, range 32, base 0xfb08, size 16384, enabled Host: ppt0@pci0:1:0:0:class=0x03 card=0x084a10de chip=0x0dd810de rev=0xa
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Found my original attempt by modifying /usr/src/sys/amd64/vmm/x86.c
Unified diff follows, but this didn't work for me.
("bhyve_id[]" commented out to prevent compiler complaints)

There doesn't seem to be support for CPUID 0x40000001 in bhyve either.

--- x86.c.orig	2016-09-11 14:40:22.410462000 +0100
+++ x86.c	2016-09-11 15:53:14.182186000 +0100
@@ -52,7 +52,7 @@

 #define	CPUID_VM_HIGH		0x40000000

-static const char bhyve_id[12] = "bhyve bhyve ";
+/* static const char bhyve_id[12] = "bhyve bhyve "; */

 static uint64_t bhyve_xcpuids;
 SYSCTL_ULONG(_hw_vmm, OID_AUTO, bhyve_xcpuids, CTLFLAG_RW, &bhyve_xcpuids, 0,
@@ -236,7 +236,7 @@
 			regs[2] &= ~(CPUID2_VMX | CPUID2_EST | CPUID2_TM2);
 			regs[2] &= ~(CPUID2_SMX);

-			regs[2] |= CPUID2_HV;
+			/* regs[2] |= CPUID2_HV; */

 			if (x2apic_state != X2APIC_DISABLED)
 				regs[2] |= CPUID2_X2APIC;
@@ -463,12 +463,15 @@
 			}
 			break;

+		/*
+		 * Don't expose KVM to guest
 		case 0x40000000:
 			regs[0] = CPUID_VM_HIGH;
 			bcopy(bhyve_id, &regs[1], 4);
 			bcopy(bhyve_id + 4, &regs[2], 4);
 			bcopy(bhyve_id + 8, &regs[3], 4);
 			break;
+		*/

 		default:
 			/*
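For readers unfamiliar with what the two hunks above hide from the guest, here is a small Python sketch of the two detection paths. It is an illustration only, not bhyve code: the function names are invented, and the only facts relied on are that the hypervisor-present flag is bit 31 of ECX for CPUID leaf 1, and that the diff copies the 12-byte "bhyve bhyve " string into three 32-bit registers, four bytes at a time.

```python
import struct

HYPERVISOR_BIT = 1 << 31  # CPUID leaf 1, ECX bit 31 (CPUID2_HV in the diff)


def hypervisor_flag_set(leaf1_ecx: int) -> bool:
    """True if the guest-visible ECX of CPUID leaf 1 advertises a hypervisor."""
    return bool(leaf1_ecx & HYPERVISOR_BIT)


def pack_signature(signature: bytes):
    """Split a 12-byte vendor string into (EBX, ECX, EDX), little-endian,
    mirroring the three 4-byte bcopy() calls in the diff."""
    if len(signature) != 12:
        raise ValueError("hypervisor signature must be exactly 12 bytes")
    return struct.unpack("<3I", signature)


def unpack_signature(ebx: int, ecx: int, edx: int) -> bytes:
    """What a guest does to recover the vendor string from the registers."""
    return struct.pack("<3I", ebx, ecx, edx)


regs = pack_signature(b"bhyve bhyve ")
print([hex(r) for r in regs])
print(unpack_signature(*regs))
```

A guest driver that checks either the flag or the signature would need both hidden, which is what the diff attempts; whether the nVidia driver looks at anything else is not established in this thread.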
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
With QEMU, they have the "kvm=off" option, which hides hypervisor info from the guest. See:
https://www.redhat.com/archives/libvir-list/2014-August/msg00512.html

I did try to replicate this a while back but didn't have much success - maybe I missed a flag? The QEMU diff seems relatively small, see:
http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00302.html

Having another go at doing this is on my to-do list, but not very near the top!
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Howdy, virtualization zealots!

This is in reply to maillist thread [0].

It so happens that I have to get GPU-accelerated OpenCL working on my machine, so I had a play with bhyve & PCI-e passthrough for VGA. I was using nVidia Quadro 600 (GF108) for testing (planning to use AMD/ATI for OpenCL, of course).

I tried a Linux guest with the proprietary nVidia driver, and the result was that the driver couldn't init the VGA during boot:

[1.394726] nvidia: module license 'NVIDIA' taints kernel.
[1.395140] Disabling lock debugging due to kernel taint
[1.412132] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[1.419359] nvidia :00:04.0: can't derive routing for PCI INT A
[1.419807] nvidia :00:04.0: PCI INT A: no GSI
[1.420157] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[1.420157] NVRM: BAR1 is 0M @ 0x0 (PCI::00:04.0)
[1.421023] NVRM: The system BIOS may have misconfigured your GPU.
[1.421476] nvidia: probe of :00:04.0 failed with error -1
[1.437301] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[1.440094] NVRM: The NVIDIA probe routine failed for 1 device(s).
[1.440530] NVRM: None of the NVIDIA graphics adapters were initialized!

After adding the "pci=nocrs" Linux boot option (which, from what I understand, magically helps to [partially] work around bhyve assigning addresses beyond the host CPU's physically addressable space for PCIe memory-mapped registers), the guest couldn't finish booting, because bhyve would segfault. It turns out that which peripherals are used, and their order on the command line, are important.

Edit: actually, it looks like it's the number of CPUs (the '-c' flag's argument) that makes the difference; the machine has a CPU with 4 cores, no multithreading.
This didn't work (segfault):

`bhyve -A -H -P -s 0:0,hostbridge -s 1:0,lpc -s 2:0,virtio-net,tap0 \
  -s 3:0,virtio-blk,./bhyve_lunix.img \
  -s 4:0,ahci-cd,./ubuntu-16.04.1-server-amd64.iso \
  -s 5:0,passthru,1/0/0 -l com1,stdio -c 4 -m 1024M -S lunixguest`

[...]
[ OK ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[ OK ] Reached target Swap.
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 850.
Abort (core dumped)

But this worked, finally:

`bhyve -c 1 -m 1024M -S -A -H -P -s 0:0,hostbridge -s 1:0,lpc \
  -s 2:0,virtio-net,tap0 -s 3:0,virtio-blk,./bhyve_lunix.img \
  -s 4:0,passthru,1/0/0 -l com1,stdio lunixguest`

So, the guest booted, and didn't complain about non-addressable-by-CPU BARs anymore. However, the same fate befell me as Dom had in this thread -- the driver loaded:

[1.691216] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[1.696641] nvidia :00:04.0: can't derive routing for PCI INT A
[1.698093] nvidia :00:04.0: PCI INT A: no GSI
[1.699277] vgaarb: device changed decodes: PCI::00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
[1.701461] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[1.702649] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 375.26 Thu Dec 8 18:36:43 PST 2016 (using threaded interrupts)
[1.705481] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 375.26 Thu Dec 8 18:04:14 PST 2016
[1.708941] [drm] [nvidia-drm] [GPU ID 0x0004] Loading driver

but couldn't talk to the card: [lost the log, but it was the same as Dom's: "NVRM: rm_init_adapter failed"].

So I decided to try a test in a FreeBSD 10.3-STABLE guest. With an older driver, or just loading 'nvidia' without modesetting, I got guest kernel panics [1].
I loaded 'nvidia-modeset', there was more success: Linux ELF exec handler installed Linux x86-64 ELF exec handler installed nvidia0: on vgapci0 vgapci0: child nvidia0 requested pci_enable_io vgapci0: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 269 to local APIC 2 vector 51 vgapci0: using IRQ 269 for MSI vgapci0: child nvidia0 requested pci_enable_io nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.44 Wed Aug 17 22:05:09 PDT 2016 But: # nvidia-smi NVRM: Xid (PCI::00:04): 62, !2369() NVRM: RmInitAdapter failed! (0x26:0x65:1072) nvidia0: NVRM: rm_init_adapter() failed! No devices were found It also panicked after starting Xorg. After stumbling upon some Xen forums, I found the solution: nVidia crippled the driver so that it detects virtualization environment, and refuses to attach to anything but high-end pro cards! Those bastards [if the speculation is true]! GTX960 didn't work. Quadro 600 didn't work. So I tried with a Quadro 2000: root@fbsd12tst:~
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
It looks like there may not be an issue with MSI after all. The nvidia driver is issued an IRQ when first used, not at boot time. If I run the CUDA "deviceQuery" sample, then this appears in dmesg:

[ 67.207929] nvidia :00:06.0: irq 29 for MSI/MSI-X
[ 67.646207] NVRM: RmInitAdapter failed! (0x24:0x1f:1356)
[ 67.646570] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 67.647214] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5

Maybe the IRQ is deallocated immediately, so it doesn't appear in the output of /proc/interrupts?

I guess I'll need to research the NVRM error above now, but at least this thread might be useful regarding the BAR allocation.
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi Peter,

Thanks for getting back to me. Here's the info you requested:

> > [0.163085] acpi PNP0A03:00: host bridge window [0xd0-0xd0100f] (ignored, not CPU addressable)
>
> That one is most likely a bug in bhyve, where the space used for
> 64-bit BAR placement isn't tested against the max physaddr width
> of the host CPU. To confirm, would you be able to report on this
> value on your system ?
>
> # sudo pkg install cpuid
> # cpuid | grep ^8008

On my Intel i7-4790K CPU:

# cpuid | grep ^8008
8008 3027

> The device has an MSI capability, but the nvidia driver may not
> use it. bhyve PCI passthrough requires the use of MSI/MSI-x
> interrupts, and doesn't support using legacy interrupts. This
> could be confirmed from the output of /proc/interrupts when
> booting Linux on the system.

Output of /proc/interrupts:

           CPU0
  0:        137   IO-APIC-edge      timer
  1:          9   IO-APIC-edge      i8042
  4:        965   IO-APIC-edge      serial
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:        138   IO-APIC-edge      i8042
 17:          0   IO-APIC-fasteoi   snd_hda_intel
 24:          0   PCI-MSI-edge      virtio0-config
 25:       8535   PCI-MSI-edge      virtio0-req.0
 26:          0   PCI-MSI-edge      virtio1-config
 27:        123   PCI-MSI-edge      virtio1-input.0
 28:          1   PCI-MSI-edge      virtio1-output.0
NMI:          0   Non-maskable interrupts
LOC:       6050   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:       2484   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0

I guess the lack of an nvidia line containing PCI-MSI-* here indicates the nvidia driver isn't using an MSI/MSI-x interrupt? However, searching the web suggests the Linux nvidia driver does use MSI interrupts.
This is taken from a working non-VM Linux dmesg:

[4.330536] nvidia :05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[4.330542] nvidia :05:00.0: setting latency timer to 64

Source: https://bugzilla.kernel.org/show_bug.cgi?id=20432#c2 (the thread also mentions disabling MSI)

I'll try some Linux boot options and reordering the devices when calling bhyve to see if that changes anything.

Thanks,
Dom
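The check Peter suggested (does any /proc/interrupts line show the nvidia driver on a PCI-MSI vector?) can be scripted. The sketch below is an illustration with invented helper names; it assumes the single-CPU column layout shown in this message, and the sample text is a shortened copy of that output, not new data.

```python
from typing import Dict


def msi_handlers(proc_interrupts: str) -> Dict[str, str]:
    """Map IRQ label -> handler name for every line that uses PCI-MSI."""
    handlers = {}
    for line in proc_interrupts.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue  # header line ("CPU0") or blank line
        for i, field in enumerate(fields):
            if field.startswith("PCI-MSI"):
                handlers[fields[0].rstrip(":")] = " ".join(fields[i + 1:])
                break
    return handlers


def driver_uses_msi(proc_interrupts: str, driver: str) -> bool:
    """True if any PCI-MSI line names the given driver."""
    return any(driver in name for name in msi_handlers(proc_interrupts).values())


# Shortened sample modelled on the output quoted in this message.
SAMPLE = """\
           CPU0
 17:          0   IO-APIC-fasteoi   snd_hda_intel
 24:          0   PCI-MSI-edge      virtio0-config
 25:       8535   PCI-MSI-edge      virtio0-req.0
NMI:          0   Non-maskable interrupts
"""

print(msi_handlers(SAMPLE))
print(driver_uses_msi(SAMPLE, "nvidia"))
```

On a live guest the same functions could be fed the contents of /proc/interrupts read after the driver has actually been used, since a later message in this thread notes the IRQ is only requested on first use.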
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hi Dom,

> Bhyve's ACPI table produces this error linux-side regardless of "pci=" setting:
> [0.163085] acpi PNP0A03:00: host bridge window [0xd0-0xd0100f] (ignored, not CPU addressable)

That one is most likely a bug in bhyve, where the space used for 64-bit BAR placement isn't tested against the max physaddr width of the host CPU. To confirm, would you be able to report on this value on your system ?

# sudo pkg install cpuid
# cpuid | grep ^8008
8008 3027
       ^^

This is the output from a Xeon E3-1220 v3: 0x27 == 39 bits of phys address (0x8000000000 max). 0xd000000000 requires >= 40 bits.

> 2. "can't derive routing for PCI INT"
> Linux-side dmesg related output:
> [1.677168] nvidia :00:06.0: can't derive routing for PCI INT A
> [1.677600] nvidia :00:06.0: PCI INT A: no GSI
> ...
> Host-side info (when GTX960 is NOT configured as a pass-thru dev):
> ...
> cap 05[68] = MSI supports 1 message, 64 bit enabled with 1 message

The device has an MSI capability, but the nvidia driver may not use it. bhyve PCI passthrough requires the use of MSI/MSI-x interrupts, and doesn't support using legacy interrupts. This could be confirmed from the output of /proc/interrupts when booting Linux on the system.

later,

Peter.
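For anyone wanting to decode the cpuid value themselves: the address-size leaf (CPUID 0x80000008) returns the physical address width in EAX bits 7:0 and the linear address width in bits 15:8. A minimal Python sketch, with an invented function name and assuming the EAX value is the 0x3027 shown in the output above:

```python
def decode_address_sizes(eax: int):
    """Return (physical_bits, linear_bits) from EAX of the address-size leaf."""
    physical_bits = eax & 0xFF         # EAX[7:0]
    linear_bits = (eax >> 8) & 0xFF    # EAX[15:8]
    return physical_bits, linear_bits


# Assumed input: the EAX value from the cpuid output quoted in this message.
physical_bits, linear_bits = decode_address_sizes(0x3027)
limit = 1 << physical_bits             # first address that is NOT addressable

print(physical_bits, linear_bits, hex(limit))
```

Any 64-bit BAR window that bhyve places at or above that limit is what Linux reports as "not CPU addressable".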
Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Hello,

Setup:
  nvidia GTX960 in PCIe slot
  intel i7-4790K CPU
  FreeBSD 11-RC2 host
  CentOS 7 guest with kernel 3.10.0-327.28.3.el7.x86_64
  Using vm-bhyve port

I've hit two issues:

1. BAR allocation

Workaround (for me) is adding "pci=nocrs" to linux guest's kernel command line. Without "pci=nocrs" (or with "pci=use_crs"), the GTX960 doesn't get its 256MB block allocated.

Bhyve's ACPI table produces this error linux-side regardless of "pci=" setting:

[0.163085] acpi PNP0A03:00: host bridge window [0xd0-0xd0100f] (ignored, not CPU addressable)

which then leads to this:

[0.215369] pci :00:06.0: can't claim BAR 1 [mem 0xd0-0xd00fff 64bit pref]: no compatible bridge window

and then, with "pci=use_crs" (i.e. use ACPI host bridge windows):

[0.164030] pci_bus :00: root bus resource [bus 00]
[0.164379] pci_bus :00: root bus resource [io 0x-0x0cf7]
[0.164799] pci_bus :00: root bus resource [io 0x0d00-0x1fff]
[0.165206] pci_bus :00: root bus resource [io 0x2000-0x211f]
[0.165623] pci_bus :00: root bus resource [mem 0xc000-0xc41f]
...
[0.231762] pci :00:06.0: BAR 1: no space for [mem size 0x1000 64bit pref]
[0.232263] pci :00:06.0: BAR 1: trying firmware assignment [mem size 0x1000 64bit pref]
[0.232855] pci :00:06.0: BAR 1: [mem size 0x1000 64bit pref] conflicts with PCI mem [mem 0x-0x7f]
[0.233579] pci :00:06.0: BAR 1: failed to assign [mem size 0x1000 64bit pref]

but with "pci=nocrs" (i.e. ignore ACPI host bridge windows):

[0.163967] pci_bus :00: root bus resource [bus 00]
[0.164323] pci_bus :00: root bus resource [io 0x-0x]
[0.164745] pci_bus :00: root bus resource [mem 0x-0x7f]
...
[0.230203] pci :00:06.0: BAR 1: assigned [mem 0x14000-0x14fff 64bit pref]

2.
"can't derive routing for PCI INT" Linux-side dmesg related output: [1.677168] nvidia :00:06.0: can't derive routing for PCI INT A [1.677600] nvidia :00:06.0: PCI INT A: no GSI Host-side info (when GTX960 is NOT configured as a pass-thru dev): vgapci0@pci0:1:0:0: class=0x03 card=0x19623842 chip=0x140110de rev=0xa1 hdr=0x00 vendor = 'NVIDIA Corporation' device = 'GM206 [GeForce GTX 960]' class = display subclass = VGA bar [10] = type Memory, range 32, base 0xf600, size 16777216, enabled bar [14] = type Prefetchable Memory, range 64, base 0xe000, size 268435456, enabled bar [1c] = type Prefetchable Memory, range 64, base 0xf000, size 33554432, enabled bar [24] = type I/O Port, range 32, base 0xe000, size 128, enabled cap 01[60] = powerspec 3 supports D0 D3 current D0 cap 05[68] = MSI supports 1 message, 64 bit enabled with 1 message cap 10[78] = PCI-Express 2 legacy endpoint max data 128(256) RO link x4(x16) speed 2.5(8.0) ecap 0002[100] = VC 1 max VC0 ecap 001e[258] = unknown 1 ecap 0004[128] = Power Budgeting 1 ecap 0001[420] = AER 2 0 fatal 0 non-fatal 4 corrected ecap 000b[600] = Vendor 1 ID 1 ecap 0019[900] = PCIe Sec 1 lane errors 0xf Linux-side info (when GTX960 IS configured as a pass-thru dev, also with "pci=nocrs"): 00:06.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1) (prog-if 00 [VGA controller]) Subsystem: eVga.com. Corp. 
Device 1962 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at c100 (32-bit, non-prefetchable) [size=16M] Region 1: Memory at 14000 (64-bit, prefetchable) [size=256M] Region 3: Memory at c200 (64-bit, prefetchable) [size=32M] Region 5: I/O ports at 2080 [size=128] Expansion ROM at f700 [disabled] [size=512K] Capabilities: [60] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: Data: Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend- LnkCap: Port