
Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-15 Thread soralx


> Some screens attached (hopefully not too heavy).
> Didn't have time to do better. Select your favourite ones.
>
> I upgraded Linux to newer version (Ubuntu 16.10, kernel 4.8),
> and it broke the driver. OpenCL does not work at all anymore.
> The screens were made on newer system -- nothing seemed to be
> changed in Xorg.

Actually, OpenCL still works after I rebooted the host and reinstalled
AMD's Pro driver and OpenCL SDK. I also upgraded to an even newer Linux,
I guess the latest development version of Ubuntu (I don't know how
to find the OS version [`uname -a` doesn't tell], but the kernel is
4.9.0-11-generic). Messages like 'Warning: LLVM emitted unknown
config register: 0x4' seem to be gone.

I'm getting strange numbers with the cl-mem test now, I think stranger
than before the upgrade (but I'm not sure, as I did not do many cl-mem
tests back then):
  # ~/cl-mem/cl-mem 
Running write test.
128 GB in 688.6 ms (185.9 GB/s)
Running read test.
128 GB in 596.1 ms (214.7 GB/s)
Running copy test.
128 GB in 715.3 ms (179.0 GB/s)
  # ~/cl-mem/cl-mem 
Running write test.
128 GB in 684.8 ms (186.9 GB/s)
Running read test.
128 GB in 596.8 ms (214.5 GB/s)
Running copy test.
128 GB in 715.1 ms (179.0 GB/s)

After `glxgears -fullscreen` run:
  # ~/cl-mem/cl-mem 
Running write test.
128 GB in 868.3 ms (147.4 GB/s)
Running read test.
128 GB in 275.4 ms (464.8 GB/s)
Running copy test.
128 GB in 3878.0 ms (33.0 GB/s)
  # ~/cl-mem/cl-mem 
Running write test.
128 GB in 878.8 ms (145.7 GB/s)
Running read test.
128 GB in 293.1 ms (436.7 GB/s)
Running copy test.
128 GB in 3659.9 ms (35.0 GB/s)
[after a couple of minutes]
  # ~/cl-mem/cl-mem 
Running write test.
128 GB in 687.4 ms (186.2 GB/s)
Running read test.
128 GB in 596.8 ms (214.5 GB/s)
Running copy test.
128 GB in 715.0 ms (179.0 GB/s)

The copy test is slow, because there are _lots_ of kernel messages
printed like this:
  [ 1780.569388] amdgpu :00:04.0: GPU fault detected: 147 0x0fba4402
  [ 1780.569830] amdgpu :00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x1073
  [ 1780.570357] amdgpu :00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B044002
or more generally:
  [ 1780.569388] amdgpu :00:04.0: GPU fault detected: 147 0x0x02
  [ 1780.569830] amdgpu :00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x
  [ 1780.570357] amdgpu :00:04.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B04x002
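
With this many faults it may be easier to summarise them than to read them one by one. A minimal sketch, assuming the message layout shown above; the function names are mine and not part of any amdgpu tooling. It tolerates the status value being wrapped onto the next line, as it is in a mail client:

```python
import re
from collections import Counter

# Matches the status line of each fault report; \s+ also spans a line wrap
# between the register name and its value.
STATUS_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_STATUS\s+0x([0-9A-Fa-f]+)")

def count_fault_statuses(log_text: str) -> Counter:
    """Count how often each fault status value occurs in a dmesg capture."""
    return Counter(int(m.group(1), 16) for m in STATUS_RE.finditer(log_text))

def format_summary(counts: Counter) -> list:
    """Render the counts as 'status: occurrences' lines, most frequent first."""
    return [f"0x{status:08X}: {n}" for status, n in counts.most_common()]
```

Feeding it the text of a saved `dmesg` capture gives one line per distinct status value, which makes it easier to see whether the faults are all of one kind.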

The read and write test results are way too high, assuming
the test transfers data over PCIe. The copy test at 180 GB/s
is reasonable, and matches hardware expectations and others' tests [0].
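
A quick sanity check of that reasoning, as a sketch: it recomputes throughput from the cl-mem figures above and compares it with the theoretical PCIe bandwidth. The per-lane rates and line encodings are the standard PCIe values; the helper names are mine, and whether cl-mem's read and write tests actually cross PCIe remains an assumption.

```python
# Per-lane raw rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}

def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt, efficiency = PCIE_GEN[gen]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte

def throughput_gbs(gigabytes: float, milliseconds: float) -> float:
    """Throughput the way cl-mem reports it: volume divided by elapsed time."""
    return gigabytes / (milliseconds / 1000.0)

if __name__ == "__main__":
    for gen in (1, 3):
        print(f"PCIe gen{gen} x16 limit: {pcie_bandwidth_gbs(gen, 16):.2f} GB/s")
    # Elapsed times (ms) of the first run quoted above.
    for name, ms in {"write": 688.6, "read": 596.1}.items():
        print(f"{name}: {throughput_gbs(128, ms):.1f} GB/s")
```

Even the gen3 x16 figure is more than ten times below the reported read and write rates, so those numbers are hard to explain as host-to-device transfers.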

Also, 'mixbench' seems to produce reasonable results that are
comparable, to within an order of magnitude or two, to others' tests
like [0]: 5 TFLOPS single precision, 350 MFLOPS double precision (should
be single_precision/16 for ATI cards), and 1 TIOPS for integer (32-bit).

More captures of Xorg screen attached. Behaviour is very strange,
but may give some hints of what might be going wrong to those familiar
with X, video memory, framebuffers, DRI, GFX and all that weird and
wonderful stuff.

When `glxgears` is run in fullscreen mode, what's on screen differs
from run to run. The framerate also varies from run to run; it is mostly
stable within one run but can change abruptly by hundreds of FPS. When
the screen is blank, the framerate is slowest (750~1500 FPS). When only
parts of the gears are rendered, the framerate is usually higher, up to
2500 FPS. Rotation is smooth, but some parts of the gears sometimes
flicker in brightness.

I should mention these numbers are for a 1600x1200 screen.
Also, with the VNC session closed (but 'vino' still running),
the frame rate goes up to a far more reasonable 6400 FPS.

[0] http://cdn.videocardz.com/1/2016/06/Radeon-RX-480-vs-GTX-970-AIDA-GPU-2.png

CC'ing to freebsd-virtualization@, as this is likely to be of
general interest (the screenshot attachment has been removed).

-- 
[SorAlx]  ridin' VN2000 Classic LT
___
freebsd-virtualization@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-15 Thread soralx


> When `glxgears` is run in fullscreen mode, what's on screen depends
> on each run. Framerate varies from run to run, is mostly stable within
> one run but can change abruptly by 100's of FPS. When the screen is
> blank, the framerate is slowest (750~1500 FPS). When only parts of
> gears are rendered, framerate is usually higher, up to 2500 FPS.
> Rotation is smooth, but with brightness flicker of some parts of
> gears sometimes.

I should mention these numbers are for a 1600x1200 screen.
Also, with the VNC session closed (but vino still running),
the frame rate goes up to a far more reasonable 6400 FPS.

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-13 Thread Peter Grehan


Hi,

> Does bhyve not execute peripheral cards' option ROMs?

 Not yet.

> I guess it doesn't. This could explain a lot of strange
> behaviour seen resulting from running in a VM.

 Yes.

> How does UEFI work in this regard? My guess is that cards
> have to explicitly support the new boot method (UEFI)?

 Yes - an additional section in the option ROM is needed, but as 
mentioned in an earlier email, that support is now widespread thanks to 
Windows.

> So passthrough with newer cards may be easier? This could
> explain why the newer RX 480 worked right away, and the
> older Quadro 2000 (and a lot of other nVidia cards without
> manufacturer's support for VMs) had no chance -- UEFI cards
> are somehow more "autonomous".

 Possibly, though it might also be the card itself not requiring as 
much initialization from the option ROM.

> It all is just speculation on my side, I know nothing about
> this UEFI stuff.
>
> Could you summarize in couple sentences what's the deal between
> bhyve and UEFI (if there is any), or future plans?


 UEFI is the ROM firmware for bhyve (and most modern PCs). bhyve has a 
custom build of the standard Intel EDK2 distribution:

   https://github.com/freebsd/uefi-edk2/tree/bhyve/UDK2014.SP1

 The changes are to support running as a hypervisor guest, where a lot 
of what is in a normal boot ROM isn't required (e.g. DRAM controller 
setup, CPU microcode update), and it also contains drivers for device 
emulations supported by bhyve.


 Currently, the ability to process an option ROM has been disabled.

later,

Peter.

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-13 Thread Peter Grehan


Hi,

> >   That is extremely likely. bhyve itself doesn't have a BIOS, though
> > bhyve/UEFI could be modified to handle option ROMs (see
> > http://awilliam.github.io/presentations/KVM-Forum-2014/#/)
>
> Hm, interesting. I wonder if a card that's not designed for use
> with UEFI is destined not to work well/at_all with bhyve...
> I'll read the presentation later.

 I think in general almost all cards have UEFI ROM support these days 
since it has been mandated by Microsoft. However, as Rod mentioned, the 
bhyve UEFI implementation does not run PCI device option ROMs.

 (see http://vfio.blogspot.com/2014/08/does-my-graphics-card-rom-support-efi.html)
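
To make the "does the ROM support EFI" question concrete, here is a minimal sketch that walks the images in a ROM dump and reports their code types. It assumes the usual PCI expansion ROM layout (0x55 0xAA signature, a pointer at offset 0x18 to a "PCIR" structure, code type 0x03 for EFI); the function names are mine, and how the dump is obtained is outside the sketch.

```python
import struct

CODE_TYPES = {0x00: "x86 legacy BIOS", 0x01: "Open Firmware",
              0x02: "HP PA-RISC", 0x03: "EFI"}

def list_rom_images(rom: bytes) -> list:
    """Return the code type of each image found in a PCI expansion ROM dump."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        # Every image starts with the 0x55 0xAA signature.
        if rom[offset:offset + 2] != b"\x55\xaa":
            break
        # Little-endian pointer at 0x18 locates the "PCIR" data structure.
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if rom[pcir:pcir + 4] != b"PCIR" or pcir + 0x16 > len(rom):
            break
        (image_len_blocks,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        images.append(CODE_TYPES.get(code_type, f"unknown (0x{code_type:02x})"))
        # Bit 7 of the indicator marks the last image; a zero length would loop forever.
        if indicator & 0x80 or image_len_blocks == 0:
            break
        offset += image_len_blocks * 512  # image length is in 512-byte units
    return images

def has_efi_image(rom: bytes) -> bool:
    """True if at least one image in the ROM is marked as EFI."""
    return "EFI" in list_rom_images(rom)
```

A card whose dump yields only "x86 legacy BIOS" would have nothing for a UEFI firmware to run, which is the case the blog post above is about.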



> > > -GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
> > > +GPU UUID: Unknown Error
> >
> >   That implies some type of h/w access isn't working, either MMIO
> > registers or response from a DMA command.
>
> I have a feeling it's something to do with DMA that's
> not getting configured correctly for data transfers,
> and returns wrong data (or good data to wrong location).

 Yes.

> >   A general issue with PCI passthrough is that often MMIO from the
> > guest works, since that is just VT-x remapping, but DMA doesn't work
> > due to issues with IOMMU programming (or incorrect mappings being
> > used). This gives a device that partially works in that registers can
> > be read, but data transfer doesn't work.
>
> Didn't we verify that the BARs are programmed correctly?

 The BAR values you see are fictional and are created by bhyve. The 
actual physical BAR values are those set up by the host BIOS. bhyve uses 
EPT mappings to translate between the 'fake' value and the real value.

> So you're saying that bhyve has a bug in that it doesn't
> program the IOMMU right to match guest's memory-mapped
> address regions to host's addresses?

 There isn't a known bug, but the 64-bit BAR region hasn't been tested 
for a long time so it's possible there is an issue with it.

> > > BTW, is it [generally] safe to decrease the BAR base address further?
> > > My workstation has a CPU with just 36 address bits...
> >
> >   Yes. The only potential conflict is with the top of guest RAM, and 36
> > bits is a lot of RAM :)
>
> 64G of RAM isn't that much these days, how incredible is that :)
> But you're saying there's nothing else in between the top of
> guest's RAM and the BAR base? In that case it's nothing to
> worry about at all, as a guest will always have less RAM than
> the host's CPU can address.

 Right - the 64-bit PCI decode region would be set dynamically based on 
the phys address bits, rather than being a hard-coded value.
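
To illustrate the arithmetic only (this is not bhyve's code, and the function and parameter names are hypothetical): a window for 64-bit BARs could be placed at the top of the addressable space, provided it stays above guest RAM.

```python
def pick_pci64_window(phys_addr_bits: int, guest_ram_top: int, window_size: int) -> int:
    """Return a base address for a 64-bit PCI decode window.

    The window is placed at the very top of the physical address space
    and must not overlap guest RAM.
    """
    if window_size <= 0 or window_size & (window_size - 1):
        raise ValueError("window_size must be a power of two")
    limit = 1 << phys_addr_bits   # first address the CPU cannot reach
    base = limit - window_size    # aligned to window_size, as both are powers of two
    if base < guest_ram_top:
        raise ValueError("window would overlap guest RAM")
    return base
```

With 36 address bits the limit is 64 GiB, the figure mentioned above, so a 16 GiB window would start at 48 GiB and only conflict with a guest given more RAM than that.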


later,

Peter.


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-13 Thread soralx


> I suspect this is a failure to run the BIOS code that
> enables the secondary power connector so you can come
> out of slot only power mode.

Well, that Quadro does not have a power connector, but
I imagine card BIOS routines would be similar between
all cards in a family, including those that require the
extra power.

Does bhyve not execute peripheral cards' option ROMs?
I guess it doesn't. This could explain a lot of strange
behaviour seen resulting from running in a VM.

How does UEFI work in this regard? My guess is that cards
have to explicitly support the new boot method (UEFI)?
So passthrough with newer cards may be easier? This could
explain why the newer RX 480 worked right away, and the
older Quadro 2000 (and a lot of other nVidia cards without
manufacturer's support for VMs) had no chance -- UEFI cards
are somehow more "autonomous".

It all is just speculation on my side, I know nothing about
this UEFI stuff.

Could you summarize in couple sentences what's the deal between
bhyve and UEFI (if there is any), or future plans?

> The general rule on other platforms is that ATI/AMD cards tend to just
> work, whereas the NVidia cards are very picky, and unless officially
> listed as known to work when passed through, you will fight problems.
> Very few cards are listed as known to work, the most popular being the
> Quadro 2000 and the Quadro FX3800.  Many cards are listed as known to
> NOT work.

Yeah, messing with nVidia for this reason (and because of the
closed driver) seems to me like a huge time sink. I don't have
the time, so decided to try with AMD for now. I am only interested
in nVidia because it's best choice for a Windows VM I hope to run
(SolidWorks, Altium, etc). OpenCL on AMD is a priority for me
personally right now anyway.

> GOOD WORK on getting as far as you have as quickly as you have!
> Note that the https://wiki.freebsd.org/bhyve/pci_passthru has
> had a small update to reflect we know that VGA passthrough is
> not working at this time.  Also a note about AMD IOmmu/AMD-Vi
> was added, hopefully saving someone from duplicate work.

Perhaps a note could be added about ATI/AMD cards partially
working, to encourage others to play?

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-13 Thread Rodney W. Grimes

...
> > > -Performance State   : P0
> > > +Performance State   : P8  
> > 
> >   Not sure what's happening here.
> 
> Driver not kicking the card's BIOS into the right mode
> to switch to dynamic power state selection?

I suspect this is a failure to run the BIOS code that
enables the secondary power connector so you can come
out of slot only power mode.

> > > Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0:
> > > Display engine push buffer channel allocation failed Jan 11 11:34:49
> > > fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate
> > > display engine core DMA push buffer  
> > 
> >   Not sure what's happening with those.
> > 
> >   Would it be possible to try the nouveau driver ? At least the source 
> > is available, so it may be easier to determine what is broken.
> 
> I could, but for now I'd like to focus more on AMD card
> (which also has an open-source driver).

The general rule on other platforms is that ATI/AMD cards tend to just
work, whereas the NVidia cards are very picky, and unless officially
listed as known to work when passed through, you will fight problems.
Very few cards are listed as known to work, the most popular being the
Quadro 2000 and the Quadro FX3800.  Many cards are listed as known to
NOT work.

...

GOOD WORK on getting as far as you have as quickly as you have!
Note that the https://wiki.freebsd.org/bhyve/pci_passthru has
had a small update to reflect we know that VGA passthrough is
not working at this time.  Also a note about AMD IOmmu/AMD-Vi
was added, hopefully saving someone from duplicate work.

-- 
Rod Grimes rgri...@freebsd.org

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-13 Thread soralx


>   -- VDPAU works, but I suspect it's not using the GPU [3][4];
>  I haven't figured out how to force the use of the GPU. Also,
>  the main window with text looks OK most of the time (when
>  doing the video test and in the end, in particular), but
>  shows a smaller black rectangle in the top left corner of the
>  screen instead of the video samples;
>   -- it almost feels like the DMA and framebuffers aren't always
>  correctly configured, but still are transferring data [from
>  somewhere to somewhere sometimes].

It is possible to run a VNC server, vino, on top of the Xorg session.
I can send screenshots of how the display looks to anyone
interested.

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-13 Thread soralx


> >  First, `nvidia-smi -q` output diff [0] is interesting. It suggests
> > that the card may be in some incompletely initialized state: notice
> > the "Unknown Error" instead of real UUID, and the P8 power state.
> > Could it be that the driver doesn't put the card's BIOS in the right
> > state?  
> 
>   That is extremely likely. bhyve itself doesn't have a BIOS, though 
> bhyve/UEFI could be modified to handle options ROMs (see 
> http://awilliam.github.io/presentations/KVM-Forum-2014/#/)

Hm, interesting. I wonder if a card that's not designed for use
with UEFI is destined not to work well/at_all with bhyve...
I'll read the presentation later.

> > -GPU UUID:
> > GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
> > +GPU UUID: Unknown Error  
> 
>   That implies some type of h/w access isn't working, either MMIO 
> registers or response from a DMA command.

I have a feeling it's something to do with DMA that's
not getting configured correctly for data transfers,
and returns wrong data (or good data to wrong location).

> > -Board ID: 0x100
> > +Board ID: 0x4  
> 
>   The same ?

I'm quite sure it was the same card.

> >  PCIe Generation
> >  Max : 2
> > -Current : 2
> > +Current : 1  
> 
>   bhyve's emulated PCI hostbridge only advertises gen-1 - that could be 
> easily changed to gen2. That could make a difference for some of the 
> clock issues below
>   (source is pci_emul.c:pci_emul_add_pciecap())

I doubt the generation number matters. But yeah,
wouldn't hurt to change it to '2'.


> >  Link Width
> >  Max : 16x
> >  Current : 16x  
>   That's a bit unexpected since the hostbridge only advertises 1x, but 
> the driver is probably exporting the host value here.

Yeah, nVidia is known to like talking directly to the card
in its own, proprietary way.

> > -Performance State   : P0
> > +Performance State   : P8  
> 
>   Not sure what's happening here.

Driver not kicking the card's BIOS into the right mode
to switch to dynamic power state selection?

> >  Clocks
> > -Graphics: 625 MHz
> > -SM  : 1251 MHz
> > -Memory  : 1304 MHz
> > -Video   : 540 MHz
> > +Graphics: 405 MHz
> > +SM  : 810 MHz
> > +Memory  : 324 MHz
> > +Video   : 405 MHz  
> 
>   This may be related to the gen1 vs gen2 issue above.

I doubt it's related to PCIe gen. Most likely because the
card seems to remain in P8 (low power) mode, according to
the same SMI tool. But the frequencies don't look right
anyway; well, I didn't bother to look up what P8 is supposed
to run at.

> > When rebooting, I get this:
> > nvidia-modeset: ERROR: GPU:0: Idling display engine timed out:
> > 0x857d:0:0:0x0040  
> 
>   This may be DMA not working.

Yes, I strongly suspect DMA too, especially when it comes
to DRI stuff.

>   A general issue with PCI passthrough is that often MMIO from the
> guest works, since that is just VT-x remapping, but DMA doesn't work
> due to issues with IOMMU programming (or incorrect mappings being
> used). This gives a device that partially works in that registers can
> be read, but data transfer doesn't work.

Didn't we verify that the BARs are programmed correctly?
So you're saying that bhyve has a bug in that it doesn't
program the IOMMU right to match guest's memory-mapped
address regions to host's addresses?

> > Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0:
> > Display engine push buffer channel allocation failed Jan 11 11:34:49
> > fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate
> > display engine core DMA push buffer  
> 
>   Not sure what's happening with those.
> 
>   Would it be possible to try the nouveau driver ? At least the source 
> is available, so it may be easier to determine what is broken.

I could, but for now I'd like to focus more on AMD card
(which also has an open-source driver).

> > BTW, is it [generally] safe to decrease the BAR base address further?
> > My workstation has a CPU with just 36 address bits...  
>   Yes. The only potential conflict is with the top of guest RAM, and 36 
> bits is a lot of RAM :)

64G of RAM isn't that much these days, how incredible is that :)
But you're saying there's nothing else in between the top of
guest's RAM and the BAR base? In that case it's nothing to
worry about at all, as a guest will always have less RAM than
the host's CPU can address.

> later,
> Peter.

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-12 Thread soralx


Good news, everyone!

 I tried an AMD card, and it is almost working. I have a lot of logs
 and info, but I will try to restrain the length of this message.

 There was no need to do anything special to get the card to work,
 other than figuring out how to deal with Linux, setting up drivers
 and the OpenCL SDK, linking libraries to the right places, compiling
 software, etc.

 First, PCI info [0] and some dmesg bits [1]. AMD drivers:
  amdgpu-pro-16.50-362463.tar.xz
  AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2

 Next, the good news: Xorg starts, and display works!.. Kind of:
  -- `glxgears` window is flickery, has parts of gears missing,
 and does not look good in general;
  -- xterm window has the rectangular cursor shapes plastered
 all over, in random places;
  -- full-screen (1600x1200) `glxgears` is slower than expected,
 and the performance varies suddenly [2];
  -- VDPAU works, but I suspect it's not using the GPU [3][4];
 I haven't figured out how to force the use of the GPU. Also,
 the main window with text looks OK most of the time (when
 doing the video test and in the end, in particular), but
 shows a smaller black rectangle in the top left corner of the
 screen instead of the video samples;
  -- it almost feels like the DMA and framebuffers aren't always
 correctly configured, but still are transferring data [from
 somewhere to somewhere sometimes].
  
 I'm getting lots of messages like [5][6], among others, in
 various cases.

 Of some 3 OpenCL applications I tested, one appeared to complete
 successfully [7]. Running it also produces messages as in [6].
 But the numbers make sense, compared to e.g. tests of the R9 Nano
 (~/mixbench/results/OpenCL/alt_R9-Nano_d1912.5.log) and expectations
 of the GPU chip. This is exciting! Dunno if the benchmark checks
 whether the computations are correct, though.

 `clinfo` result [8] is a bit mixed... e.g., "Max clock frequency:
 555Mhz" is wrong.

 Some more tests [9].
 

[0]
00:04.0 0300: 1002:67df (rev c7) (prog-if 00 [VGA controller])
        Subsystem: 1682:9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: fee03000  Data: 4022
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

00:05.0 0300: 10de:0dd8 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: 10de:084a
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_367, nvidia_367_drm
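
The LnkCap and LnkSta lines for 00:04.0 above differ (8GT/s capable, 2.5GT/s negotiated). A small sketch for spotting that in saved `lspci -vv` output; the function names are mine, and inside a guest a mismatch need not mean the physical link is slow, since the values may reflect what the emulated bridge advertises.

```python
import re

# Captures the kind (Cap or Sta), the speed in GT/s and the lane count.
# "LnkSta2:" is not matched because the colon must follow "Sta" directly.
LINK_RE = re.compile(r"Lnk(Cap|Sta):.*?Speed\s+([\d.]+)GT/s[^,]*,\s+Width\s+x(\d+)")

def link_summary(lspci_text: str) -> dict:
    """Extract (speed in GT/s, lane count) for the first LnkCap and LnkSta found."""
    result = {}
    for kind, speed, width in LINK_RE.findall(lspci_text):
        result.setdefault(kind, (float(speed), int(width)))
    return result

def is_downtrained(lspci_text: str) -> bool:
    """True when the negotiated link is slower or narrower than the capability."""
    links = link_summary(lspci_text)
    if "Cap" not in links or "Sta" not in links:
        return False
    (cap_speed, cap_width), (sta_speed, sta_width) = links["Cap"], links["Sta"]
    return sta_speed < cap_speed or sta_width < cap_width
```

The input should be the listing for a single device, since only the first LnkCap and LnkSta lines are used.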


[1]
[0.617109] pci :00:04.0: can't claim BAR 6 [mem 0xf684-0xf685 pref]: no compatible bridge window
[0.617806] pci :00:05.0: can't claim BAR 6 [mem 0xf600-0xf607 pref]: no compatible bridge window
[0.618496] pci :00:05.0: BAR 6: assigned [mem 0xc008-0xc00f pref]
[0.619000] pci :00:04.0: BAR 6: assigned [mem 0xc002-0xc003 pref]
[0.619508] pci :00:01.0: BAR 6: assigned [mem 0xc0004000-0xc00047ff pref]
[0.620011] pci :00:02.0: BAR 6: assigned [mem 0xc0004800-0xc0004fff pref]
[0.620513] pci :

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-12 Thread Peter Grehan


Hi,


> BTW, is it [generally] safe to decrease the BAR base address further?
> My workstation has a CPU with just 36 address bits...

 Yes. The only potential conflict is with the top of guest RAM, and 36 
bits is a lot of RAM :)


later,

Peter.

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-12 Thread Peter Grehan


Hi,


>  First, `nvidia-smi -q` output diff [0] is interesting. It suggests that
>  the card may be in some incompletely initialized state: notice the
>  "Unknown Error" instead of real UUID, and the P8 power state. Could it
>  be that the driver doesn't put the card's BIOS in the right state?


 That is extremely likely. bhyve itself doesn't have a BIOS, though 
bhyve/UEFI could be modified to handle option ROMs (see 
http://awilliam.github.io/presentations/KVM-Forum-2014/#/)



>  The command was run in both host and guest without Xorg loaded.


 Thanks for the diff; this is very useful.


> -GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
> +GPU UUID: Unknown Error


 That implies some type of h/w access isn't working, either MMIO 
registers or response from a DMA command.



> -Board ID: 0x100
> +Board ID: 0x4


 The same ?


>  PCIe Generation
>  Max : 2
> -Current : 2
> +Current : 1


 bhyve's emulated PCI hostbridge only advertises gen-1 - that could be 
easily changed to gen2. That could make a difference for some of the 
clock issues below


 (source is pci_emul.c:pci_emul_add_pciecap())


>  Link Width
>  Max : 16x
>  Current : 16x


 That's a bit unexpected since the hostbridge only advertises 1x, but 
the driver is probably exporting the host value here.



> -Performance State   : P0
> +Performance State   : P8


 Not sure what's happening here.


>  Clocks
> -Graphics: 625 MHz
> -SM  : 1251 MHz
> -Memory  : 1304 MHz
> -Video   : 540 MHz
> +Graphics: 405 MHz
> +SM  : 810 MHz
> +Memory  : 324 MHz
> +Video   : 405 MHz


 This may be related to the gen1 vs gen2 issue above.


> When rebooting, I get this:
> nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x857d:0:0:0x0040


 This may be DMA not working.

 A general issue with PCI passthrough is that often MMIO from the guest 
works, since that is just VT-x remapping, but DMA doesn't work due to 
issues with IOMMU programming (or incorrect mappings being used). This 
gives a device that partially works in that registers can be read, but 
data transfer doesn't work.



> Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed
> Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer


 Not sure what's happening with those.

 Would it be possible to try the nouveau driver ? At least the source 
is available, so it may be easier to determine what is broken.


later,

Peter.

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-12 Thread Rodney W. Grimes

...
> 
> Incidentally, could someone put a note about that hardcoded BAR base
> on the bhyve PCI passthrough page [0] if it won't be fixed soon, so
> many others can play with VGA passthrough meanwhile?

I am working with Michael Dexter to get changes made
to this wiki page to reflect your work here in getting a step closer to
working VGA passthrough, along with a note that we know it does not work
at the present time.

> [0] https://wiki.freebsd.org/bhyve/pci_passthru
> 
> -- 
> [SorAlx]  ridin' VN2000 Classic LT

-- 
Rod Grimes rgri...@freebsd.org

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> This gives me the idea to try a different driver version in Linux...

Tried the same driver version in Linux as in FreeBSD. The driver seems
to talk to the card now, but not sure whether I can call this progress:

[0.536988] PCI host bridge to bus :00
[0.537291] pci_bus :00: root bus resource [io  0x-0x0cf7 window]
[0.537776] pci_bus :00: root bus resource [io  0x0d00-0x1fff window]
[0.538248] pci_bus :00: root bus resource [io  0x2000-0x211f window]
[0.538722] pci_bus :00: root bus resource [mem 0xc000-0xc40f window]
[0.539244] pci_bus :00: root bus resource [mem 0x34-0x340c0f window]
[0.539791] pci_bus :00: root bus resource [bus 00]
[0.540204] pci :00:00.0: [1275:1275] type 00 class 0x06
[0.540402] pci :00:00.0: reg 0x30: [mem 0x-0x07ff pref]
[0.540557] pci :00:01.0: [8086:7000] type 00 class 0x060100
[0.540826] pci :00:01.0: reg 0x30: [mem 0x-0x07ff pref]
[0.540923] pci :00:02.0: [1af4:1001] type 00 class 0x01
[0.541052] pci :00:02.0: reg 0x10: [io  0x2000-0x203f]
[0.541090] pci :00:02.0: reg 0x14: [mem 0xc000-0xc0001fff]
[0.541273] pci :00:02.0: reg 0x30: [mem 0x-0x07ff pref]
[0.541442] pci :00:03.0: [1af4:1000] type 00 class 0x02
[0.541568] pci :00:03.0: reg 0x10: [io  0x2040-0x205f]
[0.541605] pci :00:03.0: reg 0x14: [mem 0xc0002000-0xc0003fff]
[0.541786] pci :00:03.0: reg 0x30: [mem 0x-0x07ff pref]
[0.541992] pci :00:04.0: [10de:0dd8] type 00 class 0x03
[0.542136] pci :00:04.0: reg 0x10: [mem 0xc200-0xc3ff]
[0.542198] pci :00:04.0: reg 0x14: [mem 0x34-0x3407ff 64bit pref]
[0.542259] pci :00:04.0: reg 0x1c: [mem 0x340800-0x340bff 64bit pref]
[0.542302] pci :00:04.0: reg 0x24: [io  0x2080-0x20ff]
[0.542346] pci :00:04.0: reg 0x30: [mem 0xf600-0xf607 pref]

[0.549031] vgaarb: setting as boot device: PCI::00:04.0
[0.549430] vgaarb: device added: PCI::00:04.0,decodes=io+mem,owns=io+mem,locks=none
[0.549995] vgaarb: loaded
[0.550190] vgaarb: bridge control possible :00:04.0

[0.616082] pci :00:04.0: can't claim BAR 6 [mem 0xf600-0xf607 pref]: no compatible bridge window
[0.616775] pci :00:04.0: BAR 6: assigned [mem 0xc008-0xc00f pref]
[0.617281] pci :00:01.0: BAR 6: assigned [mem 0xc0004000-0xc00047ff pref]
[0.617789] pci :00:02.0: BAR 6: assigned [mem 0xc0004800-0xc0004fff pref]
[0.618303] pci :00:03.0: BAR 6: assigned [mem 0xc0005000-0xc00057ff pref]
[0.618807] pci_bus :00: resource 4 [io  0x-0x0cf7 window]
[0.618808] pci_bus :00: resource 5 [io  0x0d00-0x1fff window]
[0.618809] pci_bus :00: resource 6 [io  0x2000-0x211f window]
[0.618810] pci_bus :00: resource 7 [mem 0xc000-0xc40f window]
[0.618811] pci_bus :00: resource 8 [mem 0x34-0x340c0f window]

[1.669308] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[1.676499] AVX2 version of gcm_enc/dec engaged.
[1.676844] nvidia :00:04.0: can't derive routing for PCI INT A
[1.676845] nvidia :00:04.0: PCI INT A: no GSI
[1.676904] vgaarb: device changed decodes: PCI::00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
[1.676983] nvidia-nvlink: Nvlink Core is being initialized, major device number 248
[1.676991] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
[1.683125] AES CTR mode by8 optimization enabled
[1.687576] [drm] Initialized drm 1.1.0 20060810
[1.706991] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  367.57  Mon Oct  3 20:32:57 PDT 2016
[1.708732] [drm] [nvidia-drm] [GPU ID 0x0004] Loading driver

After starting Xorg:
[   23.762260] divide error:  [#1] SMP 
[   23.762271] Modules linked in: nvidia_uvm(POE) mac_hid 8250_fintek ib_iser 
rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
multipath linear nvidia_drm(POE) nvidia_modeset(POE) drm_kms_helper 
crct10dif_pclmul syscopyarea sysfillrect crc32_pclmul ghash_clmulni_intel 
sysimgblt fb_sys_fops drm aesni_intel aes_x86_64 lrw nvidia(POE) gf128mul 
glue_helper ablk_helper cryptd fjes
[   23.762273] CPU: 2 PID: 1423 Comm: Xorg Tainted: P   OE   
4.4.0-59-generic #80-Ubuntu
[   23.762273] Hardware name:   BHYVE, BIOS 1.00 03/14/2014
[   23.762274] task: 880005129c00 ti: 880006b08000 task.ti: 
880006b08000
[   23.762373] RIP: 0010:[]  [] 
_nv008359rm+0xdb/0x150 [nvidia]
[   23.762374] RSP: 0018:880006b0b990  EFLAGS: 00010246
[   23.762374] RAX: 00

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


>   That's a different issue - it's unlikely, if not impossible, to 
> configure bhyve with enough RAM to hit 37 bits worth where that would 
> become a problem. No need to worry about that.

Well, there may be peripheral cards that support fewer address bits...
Anyway, I see what you mean: the memory manager can always remap the
DMA regions to lower virtual addresses. Good to know it's not an issue.

>   There are lots of BIOS/UEFI implementations out there that have the 
> same restriction. In general, there should be no need for a guest to 
> reprogram device BARs.

There are all sorts of situations... But anyway, if it is too much
work, then we can forget about this until someone needs it again.

>   After changing the 64-bit BAR base address, did you still need the 
> pci=nocrs option for Linux ? I'd hope this would be no longer necessary.

No. As expected, the option is no longer needed.

BTW, is it [generally] safe to decrease the BAR base address further?
My workstation has a CPU with just 36 address bits...

>   The problem is the knowledge set of graphics/GPU knowhow and
> equipment access, and bhyve/PCI programming, are disjoint. The time
> I've spent on it has been the inverse, where I feel that I've spent a
> half-day doing things that anyone who knew about graphics could get
> done in a half-hour :)
>
>   For these type of issues, joint work is best to leverage the
> knowledge of both sides. From my point-of-view, the work you've done
> has been very helpful.

Yeah, we could benefit from more information exchange, I agree. I am
trying to do my part to share what I learned; I started with no
knowledge of bhyve, PCI, or GPU passthrough a week ago -- so for now
my efforts appear useful, but soon the progress will stop, and we
will need someone with real knowledge to step in.

Incidentally, could someone put a note about that hardcoded BAR base
on the bhyve PCI passthrough page [0] if it won't be fixed soon, so
many others can play with VGA passthrough meanwhile?

[0] https://wiki.freebsd.org/bhyve/pci_passthru

> later,
> Peter.

-- 
[SorAlx]  ridin' VN2000 Classic LT
___
freebsd-virtualization@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> > Removing another signature of detecting virtualization and increasing
> > compatibility would be negligible gain? Just asking...  
> I don't think we are going to try and defeat the NVidia virtualization
> checks, and I can probably assure you that they would patch them as
> fast as we bypassed them.

No-no, that's not what I meant. It is not about nVidia, just general
quality of the hypervisor. Ideally, running in the VM should be
indistinguishable from running on bare metal, except things like
CPUID, peripheral ID strings, etc (that can be easily changed by
recompiling). For instance, suppose someone is doing security
research -- studying clever malware for example; you'd want to run
in VM, but want to have as few ways as possible for the subject to
find out that it is running in a VM. Or Internet security... Just
some examples. FreeBSD was always (as far as I remember, anyway) in
the lead in research (as a platform) and quality, so I figured it'd
be nice to keep it this way.

> This is officially an item on my plate.  Functional hardware that works
> doing VGA passthrough under ESXi was brought up about 2 weeks ago so I
> am past that stage.  I have a Quadro FX3800 on a Supermicro X9DAi.

Great! I'm glad to see there is interest in GPU passthrough on FreeBSD.

> You have helped some in uncovering the next set of issues, but my
> plate is very large, and seems to have grown appendages that are
> holding all sorts of things :-)

Yeah, it's a dream to have nice, flat plates for projects :)

> I am still coming up to speed on the bhyve code, so it won't be
> a half hour fix by me, but it will get fixed.

Count me in for help. I haven't got lots of time to play with this
(or at least shouldn't be spending so much time), but can aid with
testing, and also can provide access to a machine with an AMD card
(or both nVidia & AMD).

> Rod Grimes

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> I hope you keep it up or at least figure out what the driver is doing.

Not in my plans at the moment. I prefer AMD GPUs over nV for OpenCL.
nVidia has served me well (and still does) for the last 10 years with their
excellent FreeBSD graphics driver: I had very few problems with it;
it's stable and well documented, and has the performance. Huge thanks
and respect to them for that [I just bought another low-end nVidia
card for my home computer].

But times a-changin'. We have things like OpenCL and virtualization
these days that are of interest and available to everyone (not just
the "professionals").

> If they haven't explicitly put in the license terms that virtualization
> is forbidden for consumer cards, there's nothing wrong with hot
> patching the driver ...

Why do you care about some license? It's your card, your computer,
so you're free to do what _you_ want with it. How can one "forbid"
virtualization? It's impossible. Only way is to design hardware and
firmware that is not amenable to VT -- but nobody would do that
nowadays.

> assuming that they don't do things like Skype
> does where it repeatedly checksums the memory image.

There are ways around things. But it is easier to buy AMD than to
fight someone's stupidity or incompetence.
 
> Good hunting.
> 
> -M

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


>  Xorg log bits [2] show that X is up. But the monitor stays in sleep

BTW, this is what happens to Xorg:
  USER  PID  %CPU %MEM      VSZ   RSS TT  STAT STARTED    TIME COMMAND
  root  638 100.0  0.8 50481584 50612 u0  R<   12:21   4:33.05 |-- /usr/local/bin/X :0 -auth /root/.serverauth.624 (Xorg)

It cannot be killed, and gdb hangs when trying to attach to the process.

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread K. Macy

I hope you keep it up or at least figure out what the driver is doing. If
they haven't explicitly put in the license terms that virtualization is
forbidden for consumer cards, there's nothing wrong with hot patching the
driver ... assuming that they don't do things like Skype does where it
repeatedly checksums the memory image.

Good hunting.

-M
On Wed, Jan 11, 2017 at 19:54  wrote:

>
>  I had a bit more play with nVidia and FreeBSD guest.
>
>  First, `nvidia-smi -q` output diff [0] is interesting. It suggests that
>  the card may be in some incompletely initialized state: notice the
>  "Unknown Error" instead of real UUID, and the P8 power state. Could it
>  be that the driver doesn't put the card's BIOS in the right state?
>  The command was run in both host and guest without Xorg loaded.
>
>  Second, I was able to start Xorg by disabling rendering acceleration
>  (Option "NoAccel"). Now nVidia's Xorg module does not fail to allocate
>  DMA (I guess it does not try?), but oddly, reserves 48 GB (!?) of virtual
>  memory instead. Sadly there is still no display for some reason.
>
>  Relevant dmesg bits are below [1]. Of particular interest is the line
>  "nvidia-modeset: Allocated GPU:0 () @ PCI::00:00.0" -- the PCI
>  address is obviously incorrect.
>
>  Xorg log bits [2] show that X is up. But the monitor stays in sleep mode.
>  With more options [3], I get this: [4]. Edit: actually, host reboot
>  made it behave the same as just with "NoAccel", maybe.
>
>  Clearly the driver is able to talk to the card: e.g., it attaches and
>  responds to `nvidia-smi` [with the exception of UUID], reads EDID from
>  the monitor. But some channel of communication is clearly missing or
>  not working right. Any ideas how to go about finding out which one?
>
> [0]
>  ==NVSMI LOG==
>
> -Timestamp   : Wed Jan 11 19:40:54 2017
> +Timestamp   : Wed Jan 11 11:08:40 2017
>  Driver Version  : 367.44
>
>  Attached GPUs   : 1
> -GPU :01:00.0
> +GPU :00:04.0
>  Product Name: Quadro 2000
>  Product Brand   : Quadro
>  Display Mode: Enabled
> @@ -17,11 +17,11 @@
>  Current : N/A
>  Pending : N/A
>  Serial Number   : N/A
> -GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
> +GPU UUID: Unknown Error
>  Minor Number: 0
>  VBIOS Version   : 70.06.0D.00.02
>  MultiGPU Board  : No
> -Board ID: 0x100
> +Board ID: 0x4
>  GPU Part Number : N/A
>  Inforom Version
>  Image Version   : N/A
> @@ -34,16 +34,16 @@
>  GPU Virtualization Mode
>  Virtualization mode : None
>  PCI
> -Bus : 0x01
> -Device  : 0x00
> +Bus : 0x00
> +Device  : 0x04
>  Domain  : 0x
>  Device Id   : 0x0DD810DE
> -Bus Id  : :01:00.0
> +Bus Id  : :00:04.0
>  Sub System Id   : 0x084A10DE
>  GPU Link Info
>  PCIe Generation
>  Max : 2
> -Current : 2
> +Current : 1
>  Link Width
>  Max : 16x
>  Current : 16x
> @@ -54,7 +54,7 @@
>  Tx Throughput   : N/A
>  Rx Throughput   : N/A
>  Fan Speed   : 30 %
> -Performance State   : P0
> +Performance State   : P8
>  Clocks Throttle Reasons : N/A
>  FB Memory Usage
>  Total   : 963 MiB
> @@ -113,7 +113,7 @@
>  Double Bit ECC  : N/A
>  Pending : N/A
>  Temperature
> -GPU Current Temp: 38 C
> +GPU Current Temp: 35 C
>  GPU Shutdown Temp   : N/A
>  GPU Slowdown Temp   : N/A
>  Power Readings
> @@ -125,10 +125,10 @@
>  Min Power Limit : N/A
>  Max Power Limit : N/A
>  Clocks
> -Graphics: 625 MHz
> -SM  : 1251 MHz
> -Memory  : 1304 MHz
> -Video   : 540 MHz
> +Graphics

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


 I had a bit more play with nVidia and FreeBSD guest.

 First, `nvidia-smi -q` output diff [0] is interesting. It suggests that
 the card may be in some incompletely initialized state: notice the
 "Unknown Error" instead of real UUID, and the P8 power state. Could it
 be that the driver doesn't put the card's BIOS in the right state?
 The command was run in both host and guest without Xorg loaded.

 Second, I was able to start Xorg by disabling rendering acceleration
 (Option "NoAccel"). Now nVidia's Xorg module does not fail to allocate
 DMA (I guess it does not try?), but oddly, reserves 48 GB (!?) of virtual
 memory instead. Sadly there is still no display for some reason.

 Relevant dmesg bits are below [1]. Of particular interest is the line
 "nvidia-modeset: Allocated GPU:0 () @ PCI::00:00.0" -- the PCI
 address is obviously incorrect.

 Xorg log bits [2] show that X is up. But the monitor stays in sleep mode.
 With more options [3], I get this: [4]. Edit: actually, host reboot
 made it behave the same as just with "NoAccel", maybe.

 Clearly the driver is able to talk to the card: e.g., it attaches and
 responds to `nvidia-smi` [with the exception of UUID], reads EDID from
 the monitor. But some channel of communication is clearly missing or
 not working right. Any ideas how to go about finding out which one?

[0]
 ==NVSMI LOG==
 
-Timestamp   : Wed Jan 11 19:40:54 2017
+Timestamp   : Wed Jan 11 11:08:40 2017
 Driver Version  : 367.44
 
 Attached GPUs   : 1
-GPU :01:00.0
+GPU :00:04.0
 Product Name: Quadro 2000
 Product Brand   : Quadro
 Display Mode: Enabled
@@ -17,11 +17,11 @@
 Current : N/A
 Pending : N/A
 Serial Number   : N/A
-GPU UUID: GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
+GPU UUID: Unknown Error
 Minor Number: 0
 VBIOS Version   : 70.06.0D.00.02
 MultiGPU Board  : No
-Board ID: 0x100
+Board ID: 0x4
 GPU Part Number : N/A
 Inforom Version
 Image Version   : N/A
@@ -34,16 +34,16 @@
 GPU Virtualization Mode
 Virtualization mode : None
 PCI
-Bus : 0x01
-Device  : 0x00
+Bus : 0x00
+Device  : 0x04
 Domain  : 0x
 Device Id   : 0x0DD810DE
-Bus Id  : :01:00.0
+Bus Id  : :00:04.0
 Sub System Id   : 0x084A10DE
 GPU Link Info
 PCIe Generation
 Max : 2
-Current : 2
+Current : 1
 Link Width
 Max : 16x
 Current : 16x
@@ -54,7 +54,7 @@
 Tx Throughput   : N/A
 Rx Throughput   : N/A
 Fan Speed   : 30 %
-Performance State   : P0
+Performance State   : P8
 Clocks Throttle Reasons : N/A
 FB Memory Usage
 Total   : 963 MiB
@@ -113,7 +113,7 @@
 Double Bit ECC  : N/A
 Pending : N/A
 Temperature
-GPU Current Temp: 38 C
+GPU Current Temp: 35 C
 GPU Shutdown Temp   : N/A
 GPU Slowdown Temp   : N/A
 Power Readings
@@ -125,10 +125,10 @@
 Min Power Limit : N/A
 Max Power Limit : N/A
 Clocks
-Graphics: 625 MHz
-SM  : 1251 MHz
-Memory  : 1304 MHz
-Video   : 540 MHz
+Graphics: 405 MHz
+SM  : 810 MHz
+Memory  : 324 MHz
+Video   : 405 MHz
 Applications Clocks
 Graphics: N/A
 Memory  : N/A
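
A side note on reading the diff above: the "Device Id" and "Sub System Id" fields pack two 16-bit PCI identifiers into one 32-bit value, with the vendor ID in the low half. The following is only a small sketch of that decoding; the function name is made up for illustration and is not part of any nVidia tool.

```python
def split_pci_id(combined):
    """Split a 32-bit nvidia-smi style ID into (vendor, device).

    Assumes the layout seen in the log: device/subsystem ID in the
    upper 16 bits, vendor ID in the lower 16 bits.
    """
    device = (combined >> 16) & 0xFFFF
    vendor = combined & 0xFFFF
    return vendor, device


vendor, device = split_pci_id(0x0DD810DE)
# Same vendor:device notation that the Linux guest's dmesg uses for the card.
print(f"{vendor:04x}:{device:04x}")
```

Decoded this way, 0x0DD810DE is vendor 10de, device 0dd8, which is the same pair the Linux guest reports for the card at 00:04.0.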


[1]
nvidia0:  on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 269 to local APIC 3 vector 51
vgapci0: using IRQ 269 for MSI
vgapci0: child nvidia0 requested pci_enable_io
nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  
367.44  Wed Aug 17 22:05:09 PDT 2016
acquiring duplicate lock of same type: "os.lock_sx"
 1st os.lock_sx @ nvidia_os.c:599
 2nd os.lock_sx @ nvidia_os.c:599
stack backtrace:
#

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread Rodney W. Grimes

> 
> > Hi,
> > 
> > >> The problem appears to be in the area of assigning memory-mapped
> > >> I/O ranges by bhyve for the VGA card to a region outside of the
> > >> CPU's addressable space; i.e., bhyve does not check CPUID's
> > >> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> > >> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> > >> chunks, which requires 40 address bits).  
> > 
> >   That's correct - it's a bug in bhyve.
> 
> Baking a proper fix will be more complicated by the fact that PCIe
> cards themselves may have limitations. For example, most nVidia GPUs
> > have 40 bits DMA addressing capability, some 39, and a few (still
> modern) ones -- just 37 [ref. nVidia "README" in the driver package].
> 
> >   PCI passthru doesn't allow the BAR values to be modified (this could 
> > be changed, but it's a lot of work for little gain).
> 
> Removing another signature of detecting virtualization and increasing
> compatibility would be negligible gain? Just asking...

I don't think we are going to try and defeat the NVidia virtualization
checks, and I can probably assure you that they would patch them as
fast as we bypassed them.

> > > But:
> > >   # ./nvidia-smi
> > >   No devices were found
> > > dmesg:
> > >   [  173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
> > >   [  173.499115] NVRM: rm_init_adapter failed for device bearing
> > > minor number 0  
> > 
> >   Looks like you're getting close :)
> 
> Hmm, I'm not seeing myself getting much closer here. Do you know
> something I don't? ;) I really hope bhyve developers can spare a
> bit of time on getting GPU passthrough to work... I know nothing
> about these things, and where I waste half a day messing around,
> the problem could be fixed in half an hour by someone who knows.

This is officially an item on my plate.  Functional hardware that works
doing VGA passthrough under ESXi was brought up about 2 weeks ago so I
am past that stage.  I have a Quadro FX3800 on a Supermicro X9DAi.

You have helped some in uncovering the next set of issues, but my
plate is very large, and seems to have grown appendages that are
holding all sorts of things :-)

I am still coming up to speed on the bhyve code, so it won't be
a half hour fix by me, but it will get fixed.

> > later,
> > Peter.
> 
> -- 
> [SorAlx]  ridin' VN2000 Classic LT

-- 
Rod Grimes rgri...@freebsd.org

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread Peter Grehan


Hi,

>>   That's correct - it's a bug in bhyve.
>
> Baking a proper fix will be more complicated by the fact that PCIe
> cards themselves may have limitations. For example, most nVidia GPUs
> have 40 bits DMA addressing capability, some 39, and a few (still
> modern) ones -- just 37 [ref. nVidia "README" in the driver package].

 That's a different issue - it's unlikely, if not impossible, to
configure bhyve with enough RAM to hit 37 bits worth where that would
become a problem. No need to worry about that.

>>   PCI passthru doesn't allow the BAR values to be modified (this could
>> be changed, but it's a lot of work for little gain).
>
> Removing another signature of detecting virtualization and increasing
> compatibility would be negligible gain? Just asking...

 There are lots of BIOS/UEFI implementations out there that have the
same restriction. In general, there should be no need for a guest to
reprogram device BARs.

 After changing the 64-bit BAR base address, did you still need the
pci=nocrs option for Linux? I'd hope this would be no longer necessary.

>>> But:
>>>   # ./nvidia-smi
>>>   No devices were found
>>> dmesg:
>>>   [  173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
>>>   [  173.499115] NVRM: rm_init_adapter failed for device bearing
>>> minor number 0
>>
>>   Looks like you're getting close :)
>
> Hmm, I'm not seeing myself getting much closer here. Do you know
> something I don't? ;) I really hope bhyve developers can spare a
> bit of time on getting GPU passthrough to work... I know nothing
> about these things, and where I waste half a day messing around,
> the problem could be fixed in half an hour by someone who knows.

 The problem is that the knowledge sets of graphics/GPU knowhow and
equipment access, and bhyve/PCI programming, are disjoint. The time I've
spent on it has been the inverse, where I feel that I've spent a half-day
doing things that anyone who knew about graphics could get done in a
half-hour :)

 For these types of issues, joint work is best to leverage the knowledge
of both sides. From my point-of-view, the work you've done has been very
helpful.

later,

Peter.


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> Is the VM checking documented in the driver notes somewhere? I have a

It's not in their driver's "README" file.

> Titan X that I need to run CUDA on and would be much happier if I
> didn't have to actually switch back and forth between FreeBSD and
> Ubuntu on my desktop. Are we now fairly certain that this won't work?

Not certain. The idea that nVidia artificially limits the use of
the non-pro cards in VMs in their drivers is only speculation.
There is a possibility that certain BIOS and/or hardware features
are missing in the gaming cards.

> (Yet another reason to go with AMD if they ever deliver on ROCm)

Yeah, AMD are pretty good for computing. And they don't seem to
limit floating-point performance as severely as nVidia for
non-"professional" cards.

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> Hi,
> 
> >> The problem appears to be in the area of assigning memory-mapped
> >> I/O ranges by bhyve for the VGA card to a region outside of the
> >> CPU's addressable space; i.e., bhyve does not check CPUID's
> >> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> >> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> >> chunks, which requires 40 address bits).  
> 
>   That's correct - it's a bug in bhyve.

Baking a proper fix will be more complicated by the fact that PCIe
cards themselves may have limitations. For example, most nVidia GPUs
have 40 bits DMA addressing capability, some 39, and a few (still
modern) ones -- just 37 [ref. nVidia "README" in the driver package].
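
To make the constraint concrete, here is a rough sketch (not bhyve code) of the check being discussed: an address is usable only if it fits within both the CPU's physical address width and the card's DMA addressing width. The function names are hypothetical, the 39-bit CPU width and the 40/39/37-bit GPU widths are the figures from this thread, and the 64-bit BAR base of 0xD000000000 is an assumption about the hardcoded value under discussion.

```python
def bits_needed(addr):
    """Number of address bits required to represent addr."""
    return max(addr.bit_length(), 1)


def fits(addr, cpu_bits, dev_bits):
    """True if addr is reachable by both the CPU and the device."""
    return bits_needed(addr) <= min(cpu_bits, dev_bits)


# Assumed 64-bit BAR base; a stand-in for the hardcoded value, not read
# from any bhyve source here.
BAR_BASE = 0xD000000000

print(bits_needed(BAR_BASE))                      # bits the base requires
print(fits(BAR_BASE, cpu_bits=39, dev_bits=40))   # CPU with 39 address bits
print(fits(0x1000000000, cpu_bits=39, dev_bits=37))
```

With these assumptions the base needs 40 bits, so it falls outside a 39-bit CPU's range, while a lower base such as 0x1000000000 (37 bits) would suit even the 37-bit cards. A real fix would also have to check the end of each BAR range, not just its base.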

>   PCI passthru doesn't allow the BAR values to be modified (this could 
> be changed, but it's a lot of work for little gain).

Removing another signature of detecting virtualization and increasing
compatibility would be negligible gain? Just asking...

> > But:
> >   # ./nvidia-smi
> >   No devices were found
> > dmesg:
> >   [  173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
> >   [  173.499115] NVRM: rm_init_adapter failed for device bearing
> > minor number 0  
> 
>   Looks like you're getting close :)

Hmm, I'm not seeing myself getting much closer here. Do you know
something I don't? ;) I really hope bhyve developers can spare a
bit of time on getting GPU passthrough to work... I know nothing
about these things, and where I waste half a day messing around,
the problem could be fixed in half an hour by someone who knows.

> later,
> Peter.

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> IIRC the 367.44 version of the nvidia drivers do NOT support the
> Quadro 2000, you need to be using the 340.xx version of them.  I
> ran into problems on native hardware.

I pulled the Quadro 2000 out of my workstation [and put the 600 in],
which is running fine with the latest driver from ports (367.44).

> Also before you attempt to get VGA passthrough working it is best
> to make sure you can run native, have you tried running your guest
> on the host in a native configuration?

Yes, I just installed the nVidia driver on the host, and it works
fine.

> I have fought this on other platforms many times only to find out
> that what I was trying would not ever run native, let alone in a
> virtualized environment.

This gives me the idea to try a different driver version in Linux...

-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> As far as I can tell it's the Hypervisor extension flags list. The lack 
> of these extensions/optimisations might explain why your FreeBSD VM
> runs slow 

The guest isn't slow, actually -- just the `nvidia-smi` tool was
much slower than normal to produce output. CPU speed in the guest
is less than 2% slower than bare hardware. Memory bandwidth is
over 2.5 times slower. [0]  Still, compiling world and ports feels
quite snappy. Disk I/O is not bad, too! (but latencies are a bit
high) [1]

> However, even with reapplying the changes to vmm.ko to hide/remove the 
> 0x40000000 CPUID support and CPUID2_HV, I still have the same 
> "RmInitAdapter failed" issue.

Well, I, too, couldn't get the nVidia driver to work at all in Linux.
The driver loaded, but when trying to use it, gave the "RmInitAdapter
failed! (0x53:0x3:1856)" errors. Is it the same in your case? Or
does the driver not load at all (like I experienced with Quadro 600)?
Maybe I should try the same driver version in Linux as in FreeBSD...

> Allegedly[0] nVidia VM checking came in with driver version 337.88,
> with more checking after version 344.11. I couldn't install version 319
> as it failed to build the Linux kernel module. I currently have 370.28 
> installed which supports both my GT610 and my GTX960.

`nvidia-smi -q` thinks it's running on bare hardware [when in VM]:
GPU Virtualization Mode
Virtualization mode : None
Yet we know the driver refuses to load for Quadro 600 and your GTX.
(that two-faced bastard nVidia!) So there must be multiple checks.

> Maybe the next thing for me to try is to replicate your tests with a 
> FreeBSD VM.

Yes, give it a try. I'm about to give up, for nVidia card at least
(I was only using it for testing, as I didn't have AMD GPU handy
until recently).


[0]
# Host with idle FreeBSD VM guest running
# ubench 
Unix Benchmark Utility v.0.3
FreeBSD 10.3-STABLE FreeBSD 10.3-STABLE #0 r311343M: Thu Jan  5 02:31:50 PST 
2017 xxx@yyy:/usr/obj/usr/src/sys/SORALX amd64
Ubench CPU:  2076542
Ubench MEM:  1296221

Ubench AVG:  1686381

# Guest
# ubench
Unix Benchmark Utility v.0.3
FreeBSD 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r311659: Tue Jan 10 14:18:44 PST 
2017 xxx@zzz:/usr/obj/usr/src/sys/GENERIC amd64
Ubench CPU:  2053983
Ubench MEM:   489953

Ubench AVG:  1271968
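
For reference, the "less than 2% slower" and "over 2.5 times slower" figures can be rechecked from the ubench scores above. This is just a small sketch of the arithmetic; the helper names are made up for illustration.

```python
# ubench scores copied from the host and guest runs above.
host = {"cpu": 2076542, "mem": 1296221}
guest = {"cpu": 2053983, "mem": 489953}


def slowdown_percent(host_score, guest_score):
    """How much lower the guest score is, as a percentage of the host score."""
    return (host_score - guest_score) / host_score * 100.0


def slowdown_factor(host_score, guest_score):
    """How many times higher the host score is than the guest score."""
    return host_score / guest_score


print(f"CPU: {slowdown_percent(host['cpu'], guest['cpu']):.2f}% slower")
print(f"MEM: {slowdown_factor(host['mem'], guest['mem']):.2f}x slower")
```

This works out to roughly 1.09% for CPU and about 2.65x for memory, in line with the numbers quoted in the message.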


[1]
# Host
# diskinfo -tv /dev/ada0
/dev/ada0
        512             # sectorsize
        250059350016    # mediasize in bytes (233G)
        488397168       # mediasize in sectors
        4096            # stripesize

Seek times:
        Full stroke:      250 iter in   0.029146 sec =    0.117 msec
        Half stroke:      250 iter in   0.029950 sec =    0.120 msec
        Quarter stroke:   500 iter in   0.057272 sec =    0.115 msec
        Short forward:    400 iter in   0.045711 sec =    0.114 msec
        Short backward:   400 iter in   0.045748 sec =    0.114 msec
        Seq outer:       2048 iter in   0.072799 sec =    0.036 msec
        Seq inner:       2048 iter in   0.072256 sec =    0.035 msec

Transfer rates:
        outside:       102400 kbytes in   0.222412 sec =   460407 kbytes/sec
        middle:        102400 kbytes in   0.222026 sec =   461207 kbytes/sec
        inside:        102400 kbytes in   0.222842 sec =   459518 kbytes/sec

# Guest
# diskinfo -tv /dev/vtbd0
/dev/vtbd0
        512             # sectorsize
        22548644864     # mediasize in bytes (21G)
        44040322        # mediasize in sectors
        32768           # stripesize

Seek times:
        Full stroke:      250 iter in   0.112714 sec =    0.451 msec
        Half stroke:      250 iter in   0.100388 sec =    0.402 msec
        Quarter stroke:   500 iter in   0.132808 sec =    0.266 msec
        Short forward:    400 iter in   0.074504 sec =    0.186 msec
        Short backward:   400 iter in   0.096154 sec =    0.240 msec
        Seq outer:       2048 iter in   0.208239 sec =    0.102 msec
        Seq inner:       2048 iter in   0.061853 sec =    0.030 msec

Transfer rates:
        outside:       102400 kbytes in   0.201375 sec =   508504 kbytes/sec
        middle:        102400 kbytes in   0.196044 sec =   522332 kbytes/sec
        inside:        102400 kbytes in   0.197697 sec =   517964 kbytes/sec
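
The "latencies are a bit high" remark can be quantified from the seek lines above: each msec figure is simply the elapsed time divided by the iteration count. A minimal sketch of that calculation, with a made-up helper name:

```python
def per_iter_msec(iterations, seconds):
    """Average time per seek in milliseconds, as diskinfo reports it."""
    return seconds / iterations * 1000.0


# "Full stroke" lines from the host (ada0) and guest (vtbd0) runs above.
host_full = per_iter_msec(250, 0.029146)
guest_full = per_iter_msec(250, 0.112714)

print(f"host  full stroke: {host_full:.3f} msec")
print(f"guest full stroke: {guest_full:.3f} msec")
print(f"guest/host ratio:  {guest_full / host_full:.2f}")
```

For the full-stroke case the guest comes out a little under four times slower per seek than the host, even though the sequential transfer rates are comparable.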


-- 
[SorAlx]  ridin' VN2000 Classic LT

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread Peter Grehan


Hi,

>>> There doesn't seem to be support for CPUID 0x40000001 in bhyve either.
>>
>> What is it supposed to do?
>
> As far as I can tell it's the Hypervisor extension flags list. The lack
> of these extensions/optimisations might explain why your FreeBSD VM runs
> slow but their presence also causes the nVidia driver to refuse to run.

 That leaf is KVM-only. bhyve doesn't have any additional hypervisor
leaves beyond 0x40000000.

 (the spec for this is https://lwn.net/Articles/301888/)

later,

Peter.

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread K. Macy

Is the VM checking documented in the driver notes somewhere? I have a Titan
X that I need to run CUDA on and would be much happier if I didn't have to
actually switch back and forth between FreeBSD and Ubuntu on my desktop.
Are we now fairly certain that this won't work? (Yet another reason to go
with AMD if they ever deliver on ROCm)


On Wed, Jan 11, 2017 at 05:53 Dom  wrote:

> On 11/01/2017 02:01, sor...@cydem.org wrote:
> > Dom wrote:
> >> There doesn't seem to be support for CPUID 0x40000001 in bhyve either.
> > What is it supposed to do?
>
> As far as I can tell it's the Hypervisor extension flags list. The lack
> of these extensions/optimisations might explain why your FreeBSD VM runs
> slow but their presence also causes the nVidia driver to refuse to run.
> (Can't remember where I read this, sorry)
>
> With your change to PCI_EMUL_MEMBASE64 I can boot a CentOS VM without
> the "pci=nocrs" kernel option and nVidia card is assigned BARs without
> issue.
>
> However, even with reapplying the changes to vmm.ko to hide/remove the
> 0x40000000 CPUID support and CPUID2_HV, I still have the same
> "RmInitAdapter failed" issue.
>
> Allegedly[0] nVidia VM checking came in with driver version 337.88, with
> more checking after version 344.11. I couldn't install version 319 as it
> failed to build the Linux kernel module. I currently have 370.28
> installed which supports both my GT610 and my GTX960.
>
> Maybe the next thing for me to try is to replicate your tests with a
> FreeBSD VM.
>
> [0] https://ubuntuforums.org/showthread.php?t=2266916 search for
> "337.88"

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread Dom


On 11/01/2017 02:01, sor...@cydem.org wrote:

> Dom wrote:
>> There doesn't seem to be support for CPUID 0x40000001 in bhyve either.
>
> What is it supposed to do?


As far as I can tell it's the Hypervisor extension flags list. The lack 
of these extensions/optimisations might explain why your FreeBSD VM runs 
slow but their presence also causes the nVidia driver to refuse to run. 
(Can't remember where I read this, sorry)


With your change to PCI_EMUL_MEMBASE64 I can boot a CentOS VM without 
the "pci=nocrs" kernel option and nVidia card is assigned BARs without 
issue.


However, even with reapplying the changes to vmm.ko to hide/remove the 
0x40000000 CPUID support and CPUID2_HV, I still have the same 
"RmInitAdapter failed" issue.


Allegedly[0] nVidia VM checking came in with driver version 337.88, with 
more checking after version 344.11. I couldn't install version 319 as it 
failed to build the Linux kernel module. I currently have 370.28 
installed which supports both my GT610 and my GTX960.


Maybe the next thing for me to try is to replicate your tests with a 
FreeBSD VM.


[0] https://ubuntuforums.org/showthread.php?t=2266916 search for 
"337.88"


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread Rodney W. Grimes

IIRC the 367.44 version of the nvidia driver does NOT support the
Quadro 2000; you need to be using the 340.xx version.  I ran into
problems with this on native hardware.

Also, before you attempt to get VGA passthrough working, it is best
to make sure you can run native. Have you tried running your guest
on the host in a native configuration?

I have fought this on other platforms many times only to find out
that what I was trying would not ever run native, let alone in a
virtualized environment.

> > The problem appears to be in the area of assigning memory-mapped
> > I/O ranges by bhyve for the VGA card to a region outside of the
> > CPU's addressable space; i.e., bhyve does not check CPUID's
> > 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> > bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> > chunks, which requires 40 address bits). At least this is my
> > understanding of why VGA passthrough does not work.
> 
> To test this, I tried writing to PCI BARs in FreeBSD guest using
> `pciconf -w`. Not much use that was: I could read back the values
> written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
> but `pciconf -lvb` still showed the same huge base addresses --
> they did not want to change.
> 
> OK, I had enough of that. So I went to dig in the source, and
> changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL'
> to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c. Recompiled
> bhyve, booted up FreeBSD, and:
>   # pciconf -lvb
>   [...]
>   vgapci0@pci0:0:4:0: class=0x030000 card=0x084a10de chip=0x0dd810de rev=0xa1 hdr=0x00
>       vendor     = 'NVIDIA Corporation'
>       device     = 'GF106GL [Quadro 2000]'
>       class      = display
>       subclass   = VGA
>       bar   [10] = type Memory, range 32, base 0xc2000000, size 33554432, enabled
>       bar   [14] = type Prefetchable Memory, range 64, base 0x3400000000, size 134217728, enabled
>       bar   [1c] = type Prefetchable Memory, range 64, base 0x3408000000, size 67108864, enabled
>       bar   [24] = type I/O Port, range 32, base 0x2080, size 128, enabled
> 
> ...a-a-and:
>   # kldload nvidia-modeset
>   Linux ELF exec handler installed
>   nvidia0:  on vgapci0
>   vgapci0: child nvidia0 requested pci_enable_io
>   vgapci0: attempting to allocate 1 MSI vectors (1 supported)
>   msi: routing MSI IRQ 269 to local APIC 3 vector 51
>   vgapci0: using IRQ 269 for MSI
>   vgapci0: child nvidia0 requested pci_enable_io
>   random: harvesting attach, 8 bytes (4 bits) from nvidia0
>   # nvidia-smi
>   acquiring duplicate lock of same type: "os.lock_sx"
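>    1st os.lock_sx @ nvidia_os.c:599
>    2nd os.lock_sx @ nvidia_os.c:599
>   stack backtrace:
>   #0 0xffffffff80aa6780 at witness_debugger+0x70
>   #1 0xffffffff80aa6683 at witness_checkorder+0xde3
>   #2 0xffffffff80a4fac2 at _sx_xlock+0x72
>   #3 0xffffffff82a515c2 at os_acquire_mutex+0x32
>   #4 0xffffffff82a21068 at _nv016673rm+0x18
>   Tue Jan 10 17:06:48 2017
>   +-----------------------------------------------------------------------------+
>   | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
>   |-------------------------------+----------------------+----------------------+
>   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>   |===============================+======================+======================|
>   |   0  Quadro 2000         Off  | 0000:00:04.0     Off |                  N/A |
>   | 30%   35C    P8    N/A /  N/A |      0MiB /   963MiB |      0%      Default |
>   +-------------------------------+----------------------+----------------------+
>
>   +-----------------------------------------------------------------------------+
>   | Processes:                                                       GPU Memory |
>   |  GPU       PID  Type  Process name                               Usage      |
>   |=============================================================================|
>   |  No running processes found                                                 |
>   +-----------------------------------------------------------------------------+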
> 
> Beauty! It's very slow to execute, though. And Xorg is not in a hurry
> to start working:
>   [   204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 
> 0xc2000000/33554432, 0x3400000000/134217728, 0x3408000000/67108864, I/O @ 
> 0x00002080/128, BIOS @ 0x????????/65536
>   [...]
>   [   204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
>   [   204.736] (==) NVIDIA(0): RGB weight 888
>   [   204.736] (==) NVIDIA(0): Default visual is TrueColor
>   [   204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
>   [   204.738] (**) NVIDIA(0): Enabling 2D acceleration
>

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread Peter Grehan


Hi,


> The problem appears to be in the area of assigning memory-mapped
> I/O ranges by bhyve for the VGA card to a region outside of the
> CPU's addressable space; i.e., bhyve does not check CPUID's
> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> chunks, which requires 40 address bits).

 That's correct - it's a bug in bhyve.

> To test this, I tried writing to PCI BARs in FreeBSD guest using
> `pciconf -w`. Not much use that was: I could read back the values
> written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
> but `pciconf -lvb` still showed the same huge base addresses --
> they did not want to change.

 PCI passthru doesn't allow the BAR values to be modified (this could 
be changed, but it's a lot of work for little gain).

> OK, I had enough of that. So I went to dig in the source, and
> changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL'
> to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c.

 Yep, that's a good way to test.

> But:
>   # ./nvidia-smi
>   No devices were found
> dmesg:
>   [  173.498953] NVRM: RmInitAdapter failed! (0x53:0x3:1856)
>   [  173.499115] NVRM: rm_init_adapter failed for device bearing minor number 0


 Looks like you're getting close :)

later,

Peter.


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-11 Thread soralx


> The problem appears to be in the area of assigning memory-mapped
> I/O ranges by bhyve for the VGA card to a region outside of the
> CPU's addressable space; i.e., bhyve does not check CPUID's
> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> chunks, which requires 40 address bits). At least this is my
> understanding of why VGA passthrough does not work.

To test this, I tried writing to PCI BARs in FreeBSD guest using
`pciconf -w`. Not much use that was: I could read back the values
written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
but `pciconf -lvb` still showed the same huge base addresses --
they did not want to change.

OK, I had enough of that. So I went to dig in the source, and
changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL'
to '0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c. Recompiled
bhyve, booted up FreeBSD, and:
  # pciconf -lvb
  [...]
  vgapci0@pci0:0:4:0: class=0x030000 card=0x084a10de chip=0x0dd810de rev=0xa1 hdr=0x00
      vendor     = 'NVIDIA Corporation'
      device     = 'GF106GL [Quadro 2000]'
      class      = display
      subclass   = VGA
      bar   [10] = type Memory, range 32, base 0xc2000000, size 33554432, enabled
      bar   [14] = type Prefetchable Memory, range 64, base 0x3400000000, size 134217728, enabled
      bar   [1c] = type Prefetchable Memory, range 64, base 0x3408000000, size 67108864, enabled
      bar   [24] = type I/O Port, range 32, base 0x2080, size 128, enabled
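
One quick sanity check on relocated BARs like these is to confirm that each range ends below the CPU's physical address limit and is naturally aligned. The following is a minimal sketch, assuming 39 physical address bits and bases of 0x3400000000 and 0x3408000000 for the two prefetchable BARs. `check_bars` and `SAMPLE` are illustrative names, not part of pciconf or bhyve.

```python
import re

# Matches the "base 0x..., size N" part of a `pciconf -lvb` BAR line.
BAR_RE = re.compile(r"base (0x[0-9a-fA-F]+), size (\d+)")


def check_bars(pciconf_text, phys_bits):
    """Return (base, last_byte, fits, aligned) for every BAR found in the text."""
    limit = 1 << phys_bits
    results = []
    for base_hex, size_dec in BAR_RE.findall(pciconf_text):
        base, size = int(base_hex, 16), int(size_dec)
        last = base + size - 1
        # fits: the whole range is below 2**phys_bits
        # aligned: the base is a multiple of the BAR size
        results.append((base, last, last < limit, base % size == 0))
    return results


# Assumed sample: the two prefetchable BARs, one per line.
SAMPLE = """\
bar   [14] = type Prefetchable Memory, range 64, base 0x3400000000, size 134217728, enabled
bar   [1c] = type Prefetchable Memory, range 64, base 0x3408000000, size 67108864, enabled
"""

for base, last, fits, aligned in check_bars(SAMPLE, phys_bits=39):
    print(f"{base:#x}-{last:#x} fits={fits} aligned={aligned}")
```

Feeding the same function a BAR line with base 0xd000000000 and `phys_bits=39` reports `fits=False`, which matches the failure described earlier in this message.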

...a-a-and:
  # kldload nvidia-modeset
  Linux ELF exec handler installed
  nvidia0:  on vgapci0
  vgapci0: child nvidia0 requested pci_enable_io
  vgapci0: attempting to allocate 1 MSI vectors (1 supported)
  msi: routing MSI IRQ 269 to local APIC 3 vector 51
  vgapci0: using IRQ 269 for MSI
  vgapci0: child nvidia0 requested pci_enable_io
  random: harvesting attach, 8 bytes (4 bits) from nvidia0
  # nvidia-smi
  acquiring duplicate lock of same type: "os.lock_sx"
   1st os.lock_sx @ nvidia_os.c:599
   2nd os.lock_sx @ nvidia_os.c:599
  stack backtrace:
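  #0 0xffffffff80aa6780 at witness_debugger+0x70
  #1 0xffffffff80aa6683 at witness_checkorder+0xde3
  #2 0xffffffff80a4fac2 at _sx_xlock+0x72
  #3 0xffffffff82a515c2 at os_acquire_mutex+0x32
  #4 0xffffffff82a21068 at _nv016673rm+0x18
  Tue Jan 10 17:06:48 2017
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |===============================+======================+======================|
  |   0  Quadro 2000         Off  | 0000:00:04.0     Off |                  N/A |
  | 30%   35C    P8    N/A /  N/A |      0MiB /   963MiB |      0%      Default |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                       GPU Memory |
  |  GPU       PID  Type  Process name                               Usage      |
  |=============================================================================|
  |  No running processes found                                                 |
  +-----------------------------------------------------------------------------+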

Beauty! It's very slow to execute, though. And Xorg is not in a hurry
to start working:
  [   204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 
0xc2000000/33554432, 0x3400000000/134217728, 0x3408000000/67108864, I/O @ 
0x00002080/128, BIOS @ 0x????????/65536
  [...]
  [   204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
  [   204.736] (==) NVIDIA(0): RGB weight 888
  [   204.736] (==) NVIDIA(0): Default visual is TrueColor
  [   204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
  [   204.738] (**) NVIDIA(0): Enabling 2D acceleration
  [   213.674] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0
  [   213.674] (--) NVIDIA(0): CRT-0
  [   213.674] (--) NVIDIA(0): DFP-0 (boot)
  [   213.674] (--) NVIDIA(0): DFP-1
  [   213.674] (--) NVIDIA(0): DFP-2
  [   213.674] (--) NVIDIA(0): DFP-3
  [   213.675] (--) NVIDIA(0): DFP-4
  [   213.698] (--) NVIDIA(0): CRT-0: disconnected
  [   213.698] (--) NVIDIA(0): CRT-0: 400.0 MHz maximum pixel clock
  [   213.698] (--) NVIDIA(0): 
  [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): connected
  [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): Internal TMDS
  [   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): 330.0 MHz maximum pixel 
clock
  [...

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-10 Thread soralx


> Found my original attempt by modifying /usr/src/sys/amd64/vmm/x86.c
> Unified diff follows, but this didn't work for me.
> ("bhyve_id[]" commented out to prevent compiler complaints)

Who knows what sort of trickery nVidia's driver is up to besides
CPUID when determining the presence of virtualization.

Regardless of that, VGA PCIe passthrough does not work in bhyve
even with Quadro 2000 (which Xen people have had success with).

The problem appears to be in the area of assigning memory-mapped
I/O ranges by bhyve for the VGA card to a region outside of the
CPU's addressable space; i.e., bhyve does not check CPUID's
0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
bhyve assigns 0xd000000000 & above for the large Prefetch Memory
chunks, which requires 40 address bits). At least this is my
understanding of why VGA passthrough does not work.

This seems easy to fix. Could someone who knows better have a look?
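
To make the idea concrete, here is a minimal sketch of the kind of check being asked for: keep the default 64-bit BAR base when the whole window is CPU-addressable, otherwise fall back to a lower one. This is not bhyve code. `pick_membase64`, `WINDOW` and the fallback policy are assumptions made for illustration, and guest RAM placement is ignored.

```python
DEFAULT_MEMBASE64 = 0xD000000000  # PCI_EMUL_MEMBASE64 value discussed in this thread


def pick_membase64(phys_bits, window_size, default=DEFAULT_MEMBASE64):
    """Pick a base for the 64-bit BAR window that the CPU can address.

    Hypothetical helper: keeps `default` when the whole window fits below
    2**phys_bits, otherwise returns the highest window_size-aligned base
    that still fits.
    """
    limit = 1 << phys_bits
    if window_size <= 0 or window_size > limit:
        raise ValueError("window does not fit in the physical address space")
    if default + window_size <= limit:
        return default
    return (limit - window_size) // window_size * window_size


WINDOW = 1 << 35  # 32 GiB, an arbitrary size chosen for illustration

print(hex(pick_membase64(39, WINDOW)))  # 39-bit CPU: default is out of range
print(hex(pick_membase64(40, WINDOW)))  # 40-bit CPU: default is kept
```

With 39 address bits the default base is rejected and a lower one is returned. With 40 bits the default is kept unchanged.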

Unlike Linux, FreeBSD has no problem assigning a BAR range outside the
addressable range, and then panics when trying to write to these
virtual memory addresses. See [0] below.

> There doesn't seem to be support for CPUID 0x40000001 in bhyve either.

What is it supposed to do?


[0]
Linux dmesg:
[    0.204799] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
[    0.205474] PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in ACPI motherboard resources
[    0.206080] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.207306] ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00])
[    0.207724] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.208291] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.208759] acpi PNP0A03:00: host bridge window [mem 0xd000000000-0xd00c0fffff window] (ignored, not CPU addressable)
[    0.209517] PCI host bridge to bus 0000:00
[    0.209808] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.210281] pci_bus 0000:00: root bus resource [io  0x0d00-0x1fff window]
[    0.210752] pci_bus 0000:00: root bus resource [io  0x2000-0x211f window]
[    0.211224] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xc40fffff window]
[    0.211743] pci_bus 0000:00: root bus resource [bus 00]
[...]
[    0.223902] PCI: Using ACPI for IRQ routing
[    0.265987] pci 0000:00:03.0: can't claim BAR 1 [mem 0xd000000000-0xd007ffffff 64bit pref]: no compatible bridge window
[    0.266735] pci 0000:00:03.0: can't claim BAR 3 [mem 0xd008000000-0xd00bffffff 64bit pref]: no compatible bridge window
[    0.284717] pci 0000:00:03.0: can't claim BAR 6 [mem 0xf6000000-0xf607ffff pref]: no compatible bridge window
[...]
[    0.285407] pci 0000:00:03.0: BAR 1: no space for [mem size 0x08000000 64bit pref]
[    0.285933] pci 0000:00:03.0: BAR 1: trying firmware assignment [mem 0xd000000000-0xd007ffffff 64bit pref]
[    0.286599] pci 0000:00:03.0: BAR 1: [mem 0xd000000000-0xd007ffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0x7fffffffff]
[    0.287419] pci 0000:00:03.0: BAR 1: failed to assign [mem size 0x08000000 64bit pref]
[    0.287968] pci 0000:00:03.0: BAR 3: no space for [mem size 0x04000000 64bit pref]
[    0.288506] pci 0000:00:03.0: BAR 3: trying firmware assignment [mem 0xd008000000-0xd00bffffff 64bit pref]
[    0.289173] pci 0000:00:03.0: BAR 3: [mem 0xd008000000-0xd00bffffff 64bit pref] conflicts with PCI mem [mem 0x00000000-0x7fffffffff]
[    0.289992] pci 0000:00:03.0: BAR 3: failed to assign [mem size 0x04000000 64bit pref]
[    0.290539] pci 0000:00:03.0: BAR 6: assigned [mem 0xc0080000-0xc00fffff pref]
[    0.291039] pci 0000:00:01.0: BAR 6: assigned [mem 0xc0002000-0xc00027ff pref]
[    0.291540] pci 0000:00:02.0: BAR 6: assigned [mem 0xc0002800-0xc0002fff pref]

Cannot get output from Linux's `lspci -vvn` booted with "pci=nocrs" kernel
option, as it hangs now close to the end of boot process (not sure why, was
able to finish booting before).

Another machine:
vgapci0@pci0:1:0:0: class=0x030000 card=0x083510de chip=0x0df810de rev=0xa1 hdr=0x00
    vendor     = 'nVidia Corporation'
    device     = 'GF108 [Quadro 600]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base 0xfa000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base 0xe8000000, size 134217728, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base 0xf0000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base 0xe000, size 128, enabled
hdac0@pci0:1:0:1:   class=0x040300 card=0x083510de chip=0x0bea10de rev=0xa1 hdr=0x00
    vendor     = 'nVidia Corporation'
    device     = 'GF108 High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 32, base 0xfb080000, size 16384, enabled

Host:
ppt0@pci0:1:0:0:    class=0x030000 card=0x084a10de chip=0x0dd810de rev=0xa

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-10 Thread Dom


Found my original attempt by modifying /usr/src/sys/amd64/vmm/x86.c
Unified diff follows, but this didn't work for me.
("bhyve_id[]" commented out to prevent compiler complaints)
There doesn't seem to be support for CPUID 0x40000001 in bhyve either.

--- x86.c.orig  2016-09-11 14:40:22.410462000 +0100
+++ x86.c   2016-09-11 15:53:14.182186000 +0100
@@ -52,7 +52,7 @@

 #define CPUID_VM_HIGH   0x40000000

-static const char bhyve_id[12] = "bhyve bhyve ";
+/* static const char bhyve_id[12] = "bhyve bhyve "; */

 static uint64_t bhyve_xcpuids;
 SYSCTL_ULONG(_hw_vmm, OID_AUTO, bhyve_xcpuids, CTLFLAG_RW, &bhyve_xcpuids, 0,

@@ -236,7 +236,7 @@
regs[2] &= ~(CPUID2_VMX | CPUID2_EST | CPUID2_TM2);
regs[2] &= ~(CPUID2_SMX);

-   regs[2] |= CPUID2_HV;
+   /* regs[2] |= CPUID2_HV; */

if (x2apic_state != X2APIC_DISABLED)
regs[2] |= CPUID2_X2APIC;
@@ -463,12 +463,15 @@
}
break;

+   /*
+* Don't expose KVM to guest
case 0x4000:
regs[0] = CPUID_VM_HIGH;
bcopy(bhyve_id, &regs[1], 4);
bcopy(bhyve_id + 4, &regs[2], 4);
bcopy(bhyve_id + 8, &regs[3], 4);
break;
+   */

default:
/*
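
For readers following the diff, the two things it hides can be illustrated from the guest's point of view: the "hypervisor present" bit in CPUID leaf 1 ECX, and the 12-byte vendor signature that leaf 0x40000000 returns in EBX, ECX and EDX. The sketch below only models the bit test and the register packing; it does not execute CPUID. The function names are made up for illustration, and the value of `CPUID2_HV` is taken to be bit 31, as in the FreeBSD headers.

```python
import struct

CPUID2_HV = 0x80000000  # bit 31 of CPUID leaf 1 ECX ("hypervisor present")


def hypervisor_bit_set(leaf1_ecx):
    """True when the guest would see the hypervisor-present flag."""
    return bool(leaf1_ecx & CPUID2_HV)


def signature_to_regs(signature):
    """Split a 12-byte vendor signature into (ebx, ecx, edx), little-endian,
    which is how three 4-byte copies into the registers lay it out on x86."""
    if len(signature) != 12:
        raise ValueError("signature must be exactly 12 bytes")
    return struct.unpack("<III", signature)


def regs_to_signature(ebx, ecx, edx):
    """Reassemble the vendor string the way a guest driver would read it."""
    return struct.pack("<III", ebx, ecx, edx).decode("ascii")


ebx, ecx, edx = signature_to_regs(b"bhyve bhyve ")
print(repr(regs_to_signature(ebx, ecx, edx)))
print(hypervisor_bit_set(0x80000000), hypervisor_bit_set(0x00000000))
```

A driver that wants to detect virtualization can use either signal, which is why the diff removes both. As noted elsewhere in the thread, it may well use other signals too.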


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-10 Thread Dom

With QEMU, they have the "kvm=off" option which hides hypervisor info 
from the guest.

See: https://www.redhat.com/archives/libvir-list/2014-August/msg00512.html

I did try to replicate this a while back but didn't have much success - 
maybe I missed a flag?
The QEMU diff seems relatively small, see: 
http://lists.gnu.org/archive/html/qemu-devel/2014-06/msg00302.html


Having another go at doing this is on my to-do list, but not very near 
the top!



Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2017-01-10 Thread soralx


Howdy, virtualization zealots!

 This is in reply to maillist thread [0].

 It so happens that I have to get GPU-accelerated OpenCL working on
 my machine, so I had a play with bhyve & PCI-e passthrough for VGA.
 I was using nVidia Quadro 600 (GF108) for testing (planning to use
 AMD/ATI for OpenCL, of course).

 I tried a Linux guest with the proprietary nVidia driver, and the
 result was that the driver couldn't init the VGA during boot:
  [    1.394726] nvidia: module license 'NVIDIA' taints kernel.
  [    1.395140] Disabling lock debugging due to kernel taint
  [    1.412132] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [    1.419359] nvidia 0000:00:04.0: can't derive routing for PCI INT A
  [    1.419807] nvidia 0000:00:04.0: PCI INT A: no GSI
  [    1.420157] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
  [    1.420157] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:00:04.0)
  [    1.421023] NVRM: The system BIOS may have misconfigured your GPU.
  [    1.421476] nvidia: probe of 0000:00:04.0 failed with error -1
  [    1.437301] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
  [    1.440094] NVRM: The NVIDIA probe routine failed for 1 device(s).
  [    1.440530] NVRM: None of the NVIDIA graphics adapters were initialized!

 After adding the "pci=nocrs" Linux boot option (which, from what I
 understand, magically helps to [partially] work around bhyve assigning
 addresses beyond host CPU's physically addressable space for PCIe
 memory-mapped registers), the guest couldn't finish booting, because
 bhyve would segfault.

 Turns out that which peripherals are used, and their order on the
 command line, are important. Edit: actually, looks like it's the
 number of CPUs (the '-c' flag's argument) that makes the difference;
 the machine has a CPU with 4 cores, no multithreading.

 This didn't work (segfault):
   `bhyve -A -H -P -s 0:0,hostbridge -s 1:0,lpc -s 2:0,virtio-net,tap0 \
  -s 3:0,virtio-blk,./bhyve_lunix.img \
  -s 4:0,ahci-cd,./ubuntu-16.04.1-server-amd64.iso \
  -s 5:0,passthru,1/0/0 -l com1,stdio -c 4 -m 1024M -S lunixguest`
  [...]
  [  OK  ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
  [  OK  ] Reached target Swap.
  Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function 
passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 850.
Abort (core dumped)

 But this worked, finally:
   `bhyve -c 1 -m 1024M -S -A -H -P -s 0:0,hostbridge -s 1:0,lpc \
  -s 2:0,virtio-net,tap0 -s 3:0,virtio-blk,./bhyve_lunix.img \
  -s 4:0,passthru,1/0/0 -l com1,stdio lunixguest`

 So, the guest booted, and didn't complain about non-addressable-by-CPU
 BARs anymore. However, the same fate befell me as Dom
 had in this thread -- the driver loaded:
  [    1.691216] nvidia: module verification failed: signature and/or required key missing - tainting kernel
  [    1.696641] nvidia 0000:00:04.0: can't derive routing for PCI INT A
  [    1.698093] nvidia 0000:00:04.0: PCI INT A: no GSI
  [    1.699277] vgaarb: device changed decodes: PCI:0000:00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
  [    1.701461] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
  [    1.702649] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.26  Thu Dec  8 18:36:43 PST 2016 (using threaded interrupts)
  [    1.705481] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.26  Thu Dec  8 18:04:14 PST 2016
  [    1.708941] [drm] [nvidia-drm] [GPU ID 0x00000004] Loading driver
 but couldn't talk to the card:
  [lost the log, but it was the same as Dom's: "NVRM: rm_init_adapter failed"].

 So I decided to try testing in a FreeBSD 10.3-STABLE guest.

 With an older driver, or just loading 'nvidia' without modesetting,
 I got guest kernel panics [1]. When I loaded 'nvidia-modeset', there
 was more success:
   Linux ELF exec handler installed
   Linux x86-64 ELF exec handler installed
   nvidia0:  on vgapci0
   vgapci0: child nvidia0 requested pci_enable_io
   vgapci0: attempting to allocate 1 MSI vectors (1 supported)
   msi: routing MSI IRQ 269 to local APIC 2 vector 51
   vgapci0: using IRQ 269 for MSI
   vgapci0: child nvidia0 requested pci_enable_io
   nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 
 367.44  Wed Aug 17 22:05:09 PDT 2016

 But:
   # nvidia-smi 
   NVRM: Xid (PCI:0000:00:04): 62, !2369()
   NVRM: RmInitAdapter failed! (0x26:0x65:1072)
   nvidia0: NVRM: rm_init_adapter() failed!
   No devices were found
 It also panicked after starting Xorg.

 After stumbling upon some Xen forums, I found the explanation: nVidia
 crippled the driver so that it detects a virtualization environment,
 and refuses to attach to anything but high-end pro cards! Those
 bastards [if the speculation is true]! GTX960 didn't work. Quadro
 600 didn't work. So I tried with a Quadro 2000:
  root@fbsd12tst:~

Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2016-09-09 Thread Dom


It looks like there may not be an issue with MSI after all.

The nvidia driver is issued an IRQ when first used, not at boot time.
If I run the CUDA "deviceQuery" sample then this appears in dmesg:

[   67.207929] nvidia 0000:00:06.0: irq 29 for MSI/MSI-X
[   67.646207] NVRM: RmInitAdapter failed! (0x24:0x1f:1356)
[   67.646570] NVRM: rm_init_adapter failed for device bearing minor number 0
[   67.647214] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5


Maybe the IRQ is deallocated immediately so doesn't appear in the output 
of /proc/interrupts?


I guess I'll need to research the NVRM error above now but at least this 
thread might be useful regarding the BAR allocation.


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2016-09-09 Thread Dom


Hi Peter,

Thanks for getting back to me. Here's the info you requested:


> [    0.163085] acpi PNP0A03:00: host bridge window
> [0xd000000000-0xd0100fffff] (ignored, not CPU addressable)
>
>  That one is most likely a bug in bhyve, where the space used for 64-bit
> BAR placement isn't tested against the max physaddr width of the host
> CPU.
>
>  To confirm, would you be able to report on this value on your system ?
>
> # sudo pkg install cpuid
> # cpuid | grep ^80000008

On my Intel i7-4790K CPU:

# cpuid | grep ^80000008
80000008 00003027 00000000 00000000


> The device has an MSI capability, but the nvidia driver may not use it.
> bhyve PCI passthrough requires the use of MSI/MSI-x interrupts, and
> doesn't support using legacy interrupts.
>
> This could be confirmed from the output of /proc/interrupts when
> booting Linux on the system.

Output of /proc/interrupts:

           CPU0
  0:        137   IO-APIC-edge      timer
  1:          9   IO-APIC-edge      i8042
  4:        965   IO-APIC-edge      serial
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 12:        138   IO-APIC-edge      i8042
 17:          0   IO-APIC-fasteoi   snd_hda_intel
 24:          0   PCI-MSI-edge      virtio0-config
 25:       8535   PCI-MSI-edge      virtio0-req.0
 26:          0   PCI-MSI-edge      virtio1-config
 27:        123   PCI-MSI-edge      virtio1-input.0
 28:          1   PCI-MSI-edge      virtio1-output.0
NMI:          0   Non-maskable interrupts
LOC:       6050   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:       2484   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:          1   Machine check polls
ERR:          0
MIS:          0

I guess the lack of a line containing PCI-MSI-* here indicates the 
nvidia driver isn't using an MSI/MSI-x interrupt?
However, searching the web suggests the Linux nvidia driver does use MSI 
interrupts. This taken from a working non-VM Linux dmesg:


[    4.330536] nvidia 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    4.330542] nvidia 0000:05:00.0: setting latency timer to 64

Source: https://bugzilla.kernel.org/show_bug.cgi?id=20432#c2
(Thread also mentions disabling MSI)
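
The check done by eye above can be scripted. This is a minimal sketch that lists which devices own a PCI-MSI line in `/proc/interrupts` text; `msi_users` and `SAMPLE` are illustrative names, and the parser assumes the classic layout in which the interrupt type is a single field starting with `PCI-MSI`.

```python
def msi_users(interrupts_text):
    """Return the device names attached to PCI-MSI lines of /proc/interrupts."""
    users = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        for i, field in enumerate(fields):
            if field.startswith("PCI-MSI"):
                # Everything after the interrupt type is the owner name.
                users.append(" ".join(fields[i + 1:]))
                break
    return users


# A few lines in the same layout as the output quoted in this message.
SAMPLE = """\
           CPU0
  0:        137   IO-APIC-edge      timer
 17:          0   IO-APIC-fasteoi   snd_hda_intel
 24:          0   PCI-MSI-edge      virtio0-config
 25:       8535   PCI-MSI-edge      virtio0-req.0
"""

owners = msi_users(SAMPLE)
print(owners)
print(any("nvidia" in name for name in owners))
```

An empty result for `nvidia` only shows that no MSI vector was allocated at the moment the file was read. It does not show that the driver cannot use MSI.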

I'll try some Linux boot options and reordering the devices when calling 
bhyve to see if that changes anything.


Thanks,

Dom


Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2016-09-07 Thread Peter Grehan


Hi Dom,


> Bhyve's ACPI table produces this error linux-side regardless of
> "pci=" setting:
>
> [    0.163085] acpi PNP0A03:00: host bridge window
> [0xd000000000-0xd0100fffff] (ignored, not CPU addressable)


 That one is most likely a bug in bhyve, where the space used for 64-bit
BAR placement isn't tested against the max physaddr width of the host CPU.

 To confirm, would you be able to report on this value on your system ?

# sudo pkg install cpuid
# cpuid | grep ^80000008
80000008 00003027 00000000 00000000
               ^^
This is the output from a Xeon E3-1220 v3: 0x27 == 39 bits of phys
address (0x8000000000 max)

 0xd000000000 requires >= 40 bits.
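
The arithmetic above can be checked in a few lines. This is a minimal sketch, assuming the second column of the `cpuid` output is EAX of leaf 0x80000008, whose low byte holds the physical address width and whose next byte holds the linear address width. The function names are illustrative.

```python
def decode_leaf_80000008_eax(eax):
    """Return (physical_bits, linear_bits) from CPUID leaf 0x80000008 EAX."""
    physical_bits = eax & 0xFF        # EAX[7:0]
    linear_bits = (eax >> 8) & 0xFF   # EAX[15:8]
    return physical_bits, linear_bits


def is_cpu_addressable(addr, physical_bits):
    """True when `addr` is below 2**physical_bits."""
    return addr < (1 << physical_bits)


phys, linear = decode_leaf_80000008_eax(0x00003027)
print(phys, linear)                              # widths in bits
print(hex(1 << phys))                            # first address out of range
print((0xD000000000).bit_length())               # bits needed for the BAR base
print(is_cpu_addressable(0xD000000000, phys))
```

For EAX = 0x00003027 this gives 39 physical bits, so 0x8000000000 is the first address out of range, and 0xd000000000, which needs 40 bits, is not addressable.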


> 2. "can't derive routing for PCI INT"
>
> Linux-side dmesg related output:
>
> [    1.677168] nvidia 0000:00:06.0: can't derive routing for PCI INT A
> [    1.677600] nvidia 0000:00:06.0: PCI INT A: no GSI

 ...

> Host-side info (when GTX960 is NOT configured as a pass-thru dev):

...

> cap 05[68] = MSI supports 1 message, 64 bit enabled with 1 message


 The device has an MSI capability, but the nvidia driver may not use it.
bhyve PCI passthrough requires the use of MSI/MSI-x interrupts, and 
doesn't support using legacy interrupts.


 This could be confirmed from the output of /proc/interrupts when 
booting Linux on the system.


later,

Peter.


Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)

2016-09-02 Thread Dom


Hello,

Setup:

nvidia GTX960 in PCIe slot
intel i7-4790K CPU
FreeBSD 11-RC2 host
CentOS 7 guest with kernel 3.10.0-327.28.3.el7.x86_64
Using vm-bhyve port

I've hit two issues:

1. BAR allocation

Workaround (for me) is adding "pci=nocrs" to linux guest's kernel 
command line.
Without "pci=nocrs" (or with "pci=use_crs"), the GTX960 doesn't get its 
256MB block allocated.


Bhyve's ACPI table produces this error linux-side regardless of "pci=" 
setting:


[    0.163085] acpi PNP0A03:00: host bridge window [0xd000000000-0xd0100fffff] (ignored, not CPU addressable)


which then leads to this:

[    0.215369] pci 0000:00:06.0: can't claim BAR 1 [mem 0xd000000000-0xd00fffffff 64bit pref]: no compatible bridge window


and then, with "pci=use_crs" (i.e. use ACPI host bridge windows):

[    0.164030] pci_bus 0000:00: root bus resource [bus 00]
[    0.164379] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
[    0.164799] pci_bus 0000:00: root bus resource [io 0x0d00-0x1fff]
[    0.165206] pci_bus 0000:00: root bus resource [io 0x2000-0x211f]
[    0.165623] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xc41fffff]

...
[    0.231762] pci 0000:00:06.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[    0.232263] pci 0000:00:06.0: BAR 1: trying firmware assignment [mem size 0x10000000 64bit pref]
[    0.232855] pci 0000:00:06.0: BAR 1: [mem size 0x10000000 64bit pref] conflicts with PCI mem [mem 0x00000000-0x7fffffffff]
[    0.233579] pci 0000:00:06.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]


but with "pci=nocrs" (i.e. ignore ACPI host bridge windows):

[    0.163967] pci_bus 0000:00: root bus resource [bus 00]
[    0.164323] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[    0.164745] pci_bus 0000:00: root bus resource [mem 0x00000000-0x7fffffffff]

...
[    0.230203] pci 0000:00:06.0: BAR 1: assigned [mem 0x140000000-0x14fffffff 64bit pref]
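
The difference between the two boots comes down to whether any root bus memory window is large enough for a naturally aligned 256 MB BAR. Here is a minimal sketch of that test, assuming a 32-bit window of 0xc0000000-0xc41fffff with "pci=use_crs" and a window of 0x00000000-0x7fffffffff with "pci=nocrs". `window_can_hold` is an illustrative helper, not the kernel's allocator, and it ignores RAM and other devices already placed in the window.

```python
def window_can_hold(size, lo, hi):
    """True when [lo, hi] contains a `size`-aligned block of `size` bytes."""
    base = -(-lo // size) * size   # round `lo` up to a multiple of `size`
    return base + size - 1 <= hi


BAR1_SIZE = 0x10000000                       # 256 MB prefetchable BAR

USE_CRS_WINDOW = (0xC0000000, 0xC41FFFFF)    # assumed window with pci=use_crs
NOCRS_WINDOW = (0x00000000, 0x7FFFFFFFFF)    # assumed window with pci=nocrs

print(window_can_hold(BAR1_SIZE, *USE_CRS_WINDOW))
print(window_can_hold(BAR1_SIZE, *NOCRS_WINDOW))
print(0x140000000 % BAR1_SIZE == 0)          # the assigned base is aligned
```

The use_crs window is only about 66 MB, so the 256 MB BAR cannot fit, whereas the nocrs window covers the whole 39-bit address space.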




2. "can't derive routing for PCI INT"

Linux-side dmesg related output:

[    1.677168] nvidia 0000:00:06.0: can't derive routing for PCI INT A
[    1.677600] nvidia 0000:00:06.0: PCI INT A: no GSI

Host-side info (when GTX960 is NOT configured as a pass-thru dev):

vgapci0@pci0:1:0:0: class=0x030000 card=0x19623842 chip=0x140110de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GM206 [GeForce GTX 960]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base 0xf6000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base 0xe0000000, size 268435456, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base 0xf0000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base 0xe000, size 128, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[78] = PCI-Express 2 legacy endpoint max data 128(256) RO
                 link x4(x16) speed 2.5(8.0)
    ecap 0002[100] = VC 1 max VC0
    ecap 001e[258] = unknown 1
    ecap 0004[128] = Power Budgeting 1
    ecap 0001[420] = AER 2 0 fatal 0 non-fatal 4 corrected
    ecap 000b[600] = Vendor 1 ID 1
    ecap 0019[900] = PCIe Sec 1 lane errors 0xf

Linux-side info (when GTX960 IS configured as a pass-thru dev, also with 
"pci=nocrs"):


00:06.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 
960] (rev a1) (prog-if 00 [VGA controller])

Subsystem: eVga.com. Corp. Device 1962
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
SERR- 
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at c1000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 140000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at c2000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 2080 [size=128]
Expansion ROM at f7000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)

Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address:   Data: 
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s 
unlimited, L1 <64us

ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
Unsupported-

RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ 
AuxPwr- TransPend-
LnkCap: Port
