On Tue, 29 Aug 2017 18:41:44 +0800 Bob Chen <a175818...@gmail.com> wrote:

> The topology already has all GPUs directly attached to root bus 0. In
> this situation you can't see the LnkSta attribute in any capabilities.

Right, this is why I suggested viewing the physical device lspci info
from the host.  I haven't seen the stuck link issue with devices on the
root bus, but it may be worth double checking.  Thanks,

Alex

> The other way, using an emulated switch, does show this attribute at
> 8 GT/s, although the real bandwidth is low as usual.
>
> 2017-08-23 2:06 GMT+08:00 Michael S. Tsirkin <m...@redhat.com>:
>
> > On Tue, Aug 22, 2017 at 10:56:59AM -0600, Alex Williamson wrote:
> > > On Tue, 22 Aug 2017 15:04:55 +0800
> > > Bob Chen <a175818...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I got a spec from Nvidia which illustrates how to enable GPU p2p
> > > > in a virtualization environment. (See attached)
> > >
> > > Neat, looks like we should implement a new QEMU vfio-pci option,
> > > something like nvidia-gpudirect-p2p-id=.  I don't think I'd want to
> > > code the policy of where to enable it into QEMU or the kernel, so
> > > we'd push it up to management layers or users to decide.
> > >
> > > > The key is to append the legacy PCI capabilities list when
> > > > setting up the hypervisor, with an Nvidia-customized capability
> > > > config.
> > > >
> > > > I added some hack in hw/vfio/pci.c and managed to implement that.
> > > >
> > > > Then I found the GPU was able to recognize its peer, and the
> > > > latency has dropped. ✅
> > > >
> > > > However the bandwidth didn't improve, but decreased instead. ❌
> > > >
> > > > Any suggestions?
> > >
> > > What's the VM topology?  I've found that in a Q35 configuration
> > > with GPUs downstream of an emulated root port, the NVIDIA driver in
> > > the guest will downshift the physical link rate to 2.5GT/s and
> > > never increase it back to 8GT/s.  I believe this is because the
> > > virtual downstream port only advertises Gen1 link speeds.
> >
> > Fixing that would be nice, and it's great that you now actually have
> > a reproducer that can be used to test it properly.
> >
> > Exposing higher link speeds is a bit of work since there are now all
> > kinds of corner cases to cover, as guests may play with link speeds
> > and we must pretend to change them accordingly.  An especially
> > interesting question is what to do with the assigned device when the
> > guest tries to play with the port link speed.  It's kind of similar
> > to AER in that respect.
> >
> > I guess we can just ignore it for starters.
> >
> > > If the GPUs are on the root complex (i.e. pcie.0), the physical
> > > link will run at 2.5GT/s when the GPU is idle and upshift to 8GT/s
> > > under load.  This also happens if the GPU is exposed in a
> > > conventional PCI topology to the VM.  Another interesting data
> > > point is that an older Kepler GRID card does not have this issue,
> > > dynamically shifting the link speed under load regardless of the VM
> > > PCI/e topology, while a new M60 using the same driver experiences
> > > this problem.  I've filed a bug with NVIDIA as this seems to be a
> > > regression, but it appears (untested) that the hypervisor should
> > > take the approach of exposing full, up-to-date PCIe link
> > > capabilities and reporting a link status matching the downstream
> > > devices.
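
As an aside, the capability hack Bob describes above amounts to chaining
one more vendor-specific capability into the device's legacy capability
list.  A rough standalone sketch of that chaining is below; the 8-byte
payload layout, the "P2P" signature bytes and the clique-ID field are my
assumptions from the description, not the documented NVIDIA format.

/*
 * Minimal sketch: append a vendor-specific capability (cap ID 0x09) to a
 * PCI config space image and link it into the legacy capability list.
 * The payload layout (signature "P2P" plus a clique ID) is an assumption
 * for illustration only.  Compile with: cc -o capdemo capdemo.c
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10
#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_VNDR       0x09

/* Chain a new capability at 'offset' onto the head of the cap list. */
static uint8_t add_capability(uint8_t *cfg, uint8_t offset, uint8_t cap_id)
{
    cfg[offset]     = cap_id;                    /* Capability ID          */
    cfg[offset + 1] = cfg[PCI_CAPABILITY_LIST];  /* Next -> old list head  */
    cfg[PCI_CAPABILITY_LIST] = offset;           /* New list head          */
    cfg[PCI_STATUS] |= PCI_STATUS_CAP_LIST;      /* Advertise cap list     */
    return offset;
}

int main(void)
{
    uint8_t cfg[256] = { 0 };   /* Stand-in for emulated config space */
    uint8_t clique_id = 0;      /* GPUs in the same clique may do p2p */

    /* 0xC8 is an assumed free gap; real code must find one. */
    uint8_t pos = add_capability(cfg, 0xC8, PCI_CAP_ID_VNDR);

    cfg[pos + 2] = 8;                  /* Vendor-specific cap length    */
    memcpy(&cfg[pos + 3], "P2P", 3);   /* Assumed signature bytes       */
    cfg[pos + 6] = clique_id;          /* Assumed clique-ID field       */

    printf("cap list head now at 0x%02x, ID 0x%02x\n",
           cfg[PCI_CAPABILITY_LIST], cfg[cfg[PCI_CAPABILITY_LIST]]);
    return 0;
}

In QEMU itself this would presumably go through the existing config
space emulation in hw/vfio/pci.c rather than a raw buffer.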
> > >
> > > I'd suggest that during your testing you watch the lspci info for
> > > the GPU from the host, noting the behavior of LnkSta (Link Status)
> > > to check whether the device gets stuck at 2.5GT/s in your VM
> > > configuration, and adjust the topology until it works, likely
> > > placing the GPUs on pcie.0 for a Q35 based machine.  Thanks,
> > >
> > > Alex
> >
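
For the LnkSta check, lspci -vvv from the host is the easy way, but the
same field can also be read straight out of sysfs config space if you
want to script it.  A rough sketch, assuming the GPU's BDF (the
0000:04:00.0 below is just a placeholder) and run as root so the full
config space is readable:

/*
 * Sketch: report the current PCIe link speed and width (what lspci shows
 * as LnkSta) by walking the config-space capability list via sysfs.
 * The BDF below is a placeholder.  Compile with: cc -o lnksta lnksta.c
 */
#include <stdint.h>
#include <stdio.h>

#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_ID_EXP       0x10   /* PCI Express capability             */
#define PCI_EXP_LNKSTA       0x12   /* Link Status offset within that cap */

int main(void)
{
    /* Placeholder BDF: substitute the GPU's real address on the host. */
    const char *path = "/sys/bus/pci/devices/0000:04:00.0/config";
    uint8_t cfg[4096] = { 0 };
    FILE *f = fopen(path, "rb");

    if (!f) {
        perror(path);
        return 1;
    }
    size_t n = fread(cfg, 1, sizeof(cfg), f);
    fclose(f);
    if (n < 256) {
        fprintf(stderr, "short read (%zu bytes), run as root?\n", n);
        return 1;
    }

    /* Walk the legacy capability list looking for the PCIe capability. */
    for (uint8_t pos = cfg[PCI_CAPABILITY_LIST]; pos; pos = cfg[pos + 1]) {
        if (cfg[pos] != PCI_CAP_ID_EXP) {
            continue;
        }
        uint16_t lnksta = cfg[pos + PCI_EXP_LNKSTA] |
                          (cfg[pos + PCI_EXP_LNKSTA + 1] << 8);
        static const char *speed[] = { "?", "2.5", "5", "8", "16" };
        unsigned cur = lnksta & 0xf;            /* bits 3:0, current speed */
        unsigned width = (lnksta >> 4) & 0x3f;  /* bits 9:4, link width    */

        printf("LnkSta: %sGT/s, x%u\n", cur < 5 ? speed[cur] : "?", width);
        return 0;
    }

    fprintf(stderr, "no PCIe capability found\n");
    return 1;
}

A value of 1 in the low four bits means the link is sitting at 2.5GT/s
and 3 means 8GT/s, so a device stuck at Gen1 under load is easy to spot.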