Re: [vfio-users] [QEMU+OVMF+VIFO] Memory Prealloc

2022-11-22 Thread Alex Williamson
On Tue, 22 Nov 2022 12:35:56 -1000
Bryan Angelo  wrote:

> Related follow up.
> 
> When I add memory to a running VM via hotplug, QEMU preallocates this
> memory too (as expected based on your explanation).  When I subsequently
> remove memory added to the VM via hotplug, QEMU does not always appear to
> free the underlying memory.
> 
> For example:
> 
> -m 8G,slots=1,maxmem=12G
> 
> QEMU using 8G, VM shows 8G total.
> 
> object_add memory-backend-ram,id=mem1,size=4G
> device_add pc-dimm,id=dimm1,memdev=mem1
> 
> QEMU using 12G, VM shows 12G total.
> 
> After using the VM for a bit:
> 
> device_del dimm1
> object_del mem1
> 
> QEMU using 12G, VM shows 8G total.
> 
> Does it just so happen that the VFIO device is using memory that QEMU
> allocated/pinned for the hotplug device and therefore QEMU cannot free it?
> Or is there something else going on here?

Is this a Windows guest?  Hot-adding memory is easy, hot-removing it is
much more difficult.  I'm under the impression that Windows doesn't
actually support hot-remove.  For example, what happens if you try to
add it again?  Is the slot still in use?  Does memory size increase
by another 4G?  Thanks,

Alex
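
For illustration, the questions above can be checked from the QEMU monitor
by retrying the hot-add with the same ids used earlier; this is only a
sketch reusing those ids, not something from the original report:

  (qemu) info memory-devices
  (qemu) info memory_size_summary
  (qemu) object_add memory-backend-ram,id=mem1,size=4G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

If the guest never actually released the DIMM, re-adding it should fail
(duplicate id or slot still in use), which answers the questions above.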

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] [QEMU+OVMF+VIFO] Memory Prealloc

2022-11-22 Thread Alex Williamson
On Tue, 22 Nov 2022 17:57:37 +1100
Ivan Volosyuk  wrote:

> Is there something special about the pinning step? When I start a new
> VM with 16G in dedicated hugepages my system becomes quite
> unresponsive for several seconds, significant packet loss and random
> hang device oopses if I use preemptive kernel?

Pinning is blocking in the ioctl, but this should only affect the QEMU
process, not other host tasks; there are various schedule calls to
prevent this.  A 16GB hugepage VM certainly shouldn't cause such
problems.  Are you sure the VM is really using pre-allocated hugepages?
This sounds more like a host system that's being forced to swap to
accommodate the VM page pinning.  Thanks,

Alex
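
As a quick sanity check (the hugepage size and paths below are just the
usual defaults, not taken from this report), the host counters show whether
the guest RAM really comes out of the preallocated pool:

  grep Huge /proc/meminfo
  cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

With something like -mem-path /dev/hugepages -mem-prealloc (or a
memory-backend-file with prealloc=on), HugePages_Free should drop by the
full guest size as soon as QEMU starts; if it doesn't, the VM is backed by
ordinary anonymous memory and the pinning can easily push the host into
swap as described above.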

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] [QEMU+OVMF+VIFO] Memory Prealloc

2022-11-20 Thread Alex Williamson
On Sun, 20 Nov 2022 16:36:58 -0800
Bryan Angelo  wrote:

> When passing-through via vfio-pci using QEMU 7.1.0 and OVMF, it appears
> that qemu preallocates all guest system memory.
> 
> qemu-system-x86_64 \
> -no-user-config \
> -nodefaults \
> -nographic \
> -rtc base=utc \
> -boot strict=on \
> -machine pc,accel=kvm,dump-guest-core=off \
> -cpu host,migratable=off \
> -smp 8 \
> -m size=8G \
> -overcommit mem-lock=off \
> -device vfio-pci,host=03:00.0 \
> ...
> 
>   PID USER  PR  NI VIRT  RES  %CPU  %MEM TIME+ S COMMAND
>  4151 root  20   0 13560.8m  *8310.8m* 100.0  52.6   0:25.06 S
> qemu-system-x86_64
> 
> 
> If I remove just the vfio-pci device argument, it appears that qemu no
> longer preallocates all guest system memory.
> 
>   PID USER  PR  NI VIRT  RES  %CPU  %MEM TIME+ S COMMAND
>  5049 root  20   0 13414.0m   *762.4m*   0.0   4.8   0:27.06 S
> qemu-system-x86_64
> 
> 
> I am curious if anyone has any context on or experience with this
> functionality.  Does anyone know if preallocation is a requirement for VFIO
> with QEMU or if preallocation can be disabled?
> 
> I am speculating that QEMU is actually preallocating as opposed to the
> guest touching every page of system memory.


This is currently a necessary artifact of device assignment.  Any memory
that can potentially be a DMA target for the assigned device needs to be
pinned in the host.  By default, all guest memory is potentially a DMA
target, therefore all of guest memory is pinned.  A vIOMMU in the guest
can reduce the memory footprint, but the guest will still initially pin
all memory since the vIOMMU is disabled at guest boot/reboot, and this
also trades VM memory footprint for latency, as dynamic mappings
through a vIOMMU to the host IOMMU are a long path.

Eventually, devices supporting Page Request Interface capabilities can
help to alleviate this, by essentially faulting DMA pages, much like
the processor does for memory.  Support for this likely requires new
hardware and software though.  Thanks,

Alex
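
As a sketch of the vIOMMU trade-off mentioned above (the q35 machine type,
split irqchip, and option ordering here are assumptions based on QEMU's
intel-iommu requirements, not taken from the original command line):

  qemu-system-x86_64 \
    -machine q35,accel=kvm,kernel-irqchip=split \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=03:00.0 \
    ...

caching-mode=on is what lets vfio shadow the guest's IOMMU mappings; even
then, all of guest memory is still pinned across boot/reboot while the
guest IOMMU is disabled, as noted above.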

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] higher power usage with vfio-pci then with amdgpu driver

2022-10-04 Thread Alex Williamson
On Sun, 2 Oct 2022 15:03:04 +0200
Pim  wrote:

> Hey,
> 
> After noticing this commit:
> https://github.com/torvalds/linux/commit/7ab5e10eda02da1d9562ffde562c51055d368e9c
> and because of high energy prices here, I did some tests with an energy
> monitor to see what the power consumption is for my AMD RX480 when it's
> not used. The RX480 is normally only used to pass through to a VM, but
> this VM is mostly switched off so the RX480 is not used.

In order to get the full benefit from the above commit, you'll need to
have a system that supports D3cold, otherwise you'll only see an
incremental improvement of also placing root ports in D3hot in addition
to the device from the previous D3hot support.  I can't really offer any
advice regarding systems that support this; I assume it's common in
laptops but not so much in desktops.  As the patch series describes, we
can't simply use PCI config space registers to enter D3cold, we need
hardware and firmware support that's generally invoked through ACPI in
order to remove power to the slot.

Certainly do check to make sure that all the devices under the root
port and the root port itself are also in D3hot.  If any devices under
the bridge remain in a running state, then runtime PM cannot enter
D3cold.

> The result is that when I boot my system with vfio-pci enabled it uses
> ~90Watt (RX480 driver = vfio-pci), but when I boot my system with
> vfio-pci disabled (RX480 driver = amdgpu) then my system uses ~80Watt.
> Both times the RX480 is in D3hot power state.
> 
> Any idea why using the vfio-pci driver results in ~10Watt more power
> consumption?

Probably because the native driver can power down more parts of the
device than we can via the generic PCI interface that gets us to D3hot.
vfio-pci does not have code to reach a device-specific low power state.
If slot-level power control were available, the referenced commit should
entirely power off the device.  Thanks,

Alex
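
For checking the states described above, the runtime-PM sysfs nodes are the
easiest place to look; the PCI addresses here are only placeholders for the
RX480's functions and their upstream root port:

  cat /sys/bus/pci/devices/0000:03:00.0/power/runtime_status
  cat /sys/bus/pci/devices/0000:03:00.0/power_state   # D0/D3hot/D3cold on recent kernels
  cat /sys/bus/pci/devices/0000:00:01.0/power/runtime_status
  echo auto > /sys/bus/pci/devices/0000:03:00.1/power/control

Every function under the root port (including the GPU's audio function)
needs power/control set to auto before the root port itself can be runtime
suspended; if any of them stays active, D3cold can't be reached.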

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Accessing the PCI config space from both the VM and the host

2022-08-02 Thread Alex Williamson
On Fri, 29 Jul 2022 21:43:44 +0300
Cosmin Chenaru  wrote:

> Hi,
> 
> I have an Intel network card inside a VM using PCI-Passthrough and I would
> like to write one register (the Physical Hardware Clock) but from inside
> the host, as writing it from inside the VM is complicated with all the
> unstable vTSC or other time sensitive tools which don't work best from
> inside a VM.
> 
> I know that I cannot write it from the host's network driver, as the driver
> will unbind the card when starting the VM, so let's say I write the
> register from another place (either hacking the network driver or from the
> VFIO driver). Do you think this would be possible?
> 
> Also, should I worry about locking the access so that the host and VM don't
> write to the PCI config space at the same time? I watched Alex's VFIO
> presentations and I understand that through Memory Mapped IO the VM can
> directly write the PCI config space, so the host will not be able to
> intercept and, in my case, to do the locking.
> 
> And my apologies if these questions were asked before. I did do a quick
> search but could not find anything relevant.

If you want special handling of the device in the host then you could
write a vfio-pci variant driver, see for instance the mlx5 one recently
added, and do the register write at an opportune time.  I'm a bit
dubious why this can't be written in the guest though.

If you want the simplest means to poke the register from the host, see
setpci(8).  I don't think you need any special locking; all config space
accesses come in through read/write operations to the config space
region and go through the same accessor functions that setpci would
use.  Thanks,

Alex
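
For reference, a setpci(8) config space write looks like the following; the
address and offset are placeholders only, the real register offset would
have to come from the device documentation:

  setpci -s 0000:3b:00.0 0x40.L=0x12345678   # write a 32-bit value at config offset 0x40
  setpci -s 0000:3b:00.0 0x40.L              # read it back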

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] SR-IOV PF Passthrough

2022-06-03 Thread Alex Williamson
On Fri, 3 Jun 2022 23:13:39 +0900
Tydus  wrote:

> Hi list,
> 
> I'm trying to passthrough an SR-IOV capable device (PF) into a VM and 
> make it spawn VFs into it and had no luck.
> 
> Digging into qemu vfio code I found 
> https://github.com/qemu/qemu/commit/e37dac06dc4e85a2f46c24261c0dfdf2a30b50e3 
> which gives me some insights about the situation at that time. However I 
> wonder if this situation had changed since. It seems vIOMMU has been 
> implemented and VFIO adds SR-IOV support these years (but it might be 
> not related).
> 
> Sorry if I'm wrong but I'm not that familiar with VFIO and PCIE 
> internals. Any help or information is appreciated.

Hi Tydus,

It's a common misconception that a VM owned PF can enable SR-IOV and
the VFs will simply appear in the VM.  This is not how PCI device
assignment works.  The PF is virtualized into the VM and VF devices
created by the PF must also be created on the host and virtualized into
the VM.  The kernel vfio-pci driver has support for enabling SR-IOV, but
the QEMU and above stack does not.  Effectively a VM interaction with
the PF SR-IOV capability would need to be trapped and signaled to a
management tool like libvirt to perform the SR-IOV configuration in the
host, after which the new host VFs would be collected and attached to
the VM to emulate the bare metal process.  All of this latter process is
currently unimplemented.  Thanks,

Alex
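
For comparison, the host-side steps such a management tool would have to
orchestrate look roughly like this today (PF/VF addresses and the VF count
are made up for illustration):

  echo 0 > /sys/bus/pci/devices/0000:3b:00.0/sriov_drivers_autoprobe
  echo 2 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs
  echo vfio-pci > /sys/bus/pci/devices/0000:3b:02.0/driver_override
  echo 0000:3b:02.0 > /sys/bus/pci/drivers/vfio-pci/bind
  qemu-system-x86_64 ... -device vfio-pci,host=3b:02.0

That is, VFs are created and bound on the host and then assigned to the VM
as ordinary vfio-pci devices, which is the bare metal process the VM-driven
flow would have to emulate.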

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] crackling audio in guest, after upgrade from Ivy Bridge to Epyc Milan

2022-03-08 Thread Alex Williamson
On Tue, 08 Mar 2022 20:09:02 +
"Bronek Kozicki"  wrote:
> On Tue, 8 Mar 2022, at 7:35 PM, Bronek Kozicki wrote:
> > root@gdansk ~ # lscpu
> > Architecture:x86_64
> > CPU op-mode(s):  32-bit, 64-bit
> > Address sizes:   48 bits physical, 48 bits virtual
> > Byte Order:  Little Endian
> > CPU(s):  96
> > On-line CPU(s) list: 0-95
> > Vendor ID:   AuthenticAMD
> > BIOS Vendor ID:  Advanced Micro Devices, Inc.
> > Model name:  AMD EPYC 7413 24-Core Processor
> > BIOS Model name: AMD EPYC 7413 24-Core Processor
> > 
> > CPU family:  25
> > Model:   1
> > Thread(s) per core:  2
> > Core(s) per socket:  24
> > Socket(s):   2
> > Stepping:1
> > Frequency boost: enabled
> > CPU max MHz: 3630.8101
> > CPU min MHz: 1500.
> > BogoMIPS:5300.12
> > Flags:   fpu vme de pse tsc msr pae mce cx8 
> > apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht 
> > syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl 
> > nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor 
> > ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c 
> > rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
> > 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb 
> > bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate 
> > ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid 
> > cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec 
> > xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero 
> > irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock 
> > nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter 
> > pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes 
> > vpclmulqdq rdpid overflow_recov succor smca
> > Virtualization:  AMD-V
> > L1d cache:   1.5 MiB (48 instances)
> > L1i cache:   1.5 MiB (48 instances)
> > L2 cache:24 MiB (48 instances)
> > L3 cache:256 MiB (8 instances)
> > NUMA node(s):2
> > NUMA node0 CPU(s):   0-23,48-71
> > NUMA node1 CPU(s):   24-47,72-95  
> 
> 
> OK so it seems avic is not listed in the CPU flags, which is puzzling,
> since it is an almost 10-year-old CPU feature and has been supported in
> kernels since 5.6 if I read that right. I have sent a query to the
> motherboard manufacturer (Supermicro) in case it needs to be enabled
> in the BIOS, or maybe conflicts with existing settings.

Hmm, yeah.  I'm not sure what's required to get that enabled in the CPU
flags.  Both AMD systems I have access to show it, a Ryzen 7 4800H and
an EPYC 7601.

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] crackling audio in guest, after upgrade from Ivy Bridge to Epyc Milan

2022-03-07 Thread Alex Williamson
On Mon, 07 Mar 2022 22:13:01 +
"Bronek Kozicki"  wrote:

> I know, this is such an old topic ...
> 
> I have today upgraded my hypervisor from intel Ivy-Bridge to AMD Epyc
> Milan and, after making the necessary adjustments in vfio
> configuration to make my virtual machines work again, I found that
> there's an immense amount of crackling (like, many times per second) in
> the audio, both in Linux and in Windows guests. The audio is a USB DAC,
> and the USB controller is on a passed-through PCI device (dedicated
> to the guest).
> 
> The host has two AMD 7413 CPUs, with lots of cores dedicated to
> guests:
> 
> video=efifb:off iommu=pt amd_iommu=on add_efi_memmap
> nohz_full=6-35,54-83 rcu_nocbs=6-35,54-83 isolcpus=6-35,54-83
> nvme.poll_queues=12 
>

Getting interrupt bypass to work on AMD (AVIC) is rather more finicky
than the equivalent on Intel (APICv).  Have a look at this pointer and
see if you can get it working:

https://forum.level1techs.com/t/svm-avic-iommu-avic-improves-interrupt-performance-on-zen-2-etc-based-processors/155226

When working correctly, you should essentially stop seeing interrupt
counts increase in /proc/interrupts on the host for any vfio MSI
vectors.  Thanks,

Alex
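
Two quick checks along those lines (the module parameter name is the usual
kvm_amd one; how the value is reported varies between kernel versions):

  cat /sys/module/kvm_amd/parameters/avic    # must be enabled, e.g. via
                                             # "options kvm_amd avic=1" in modprobe.d
  watch -n1 'grep vfio /proc/interrupts'     # counts should stay flat while the guest runs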

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] gtx750 TI

2021-08-25 Thread Alex Williamson
On Tue, 24 Aug 2021 19:07:11 -0400
Roger Lawhorn  wrote:

> Hello,
> I have a friend using hypervisor and he cannot get his gtx750 TI passed 
> through.
> He is getting the code 43.
> He says he thinks it's the card being too old.
> Any ideas?

I test with a GTX750; Maxwell is not too old.

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] semantics of VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP

2021-06-01 Thread Alex Williamson
On Tue, 1 Jun 2021 13:48:22 +
Thanos Makatos  wrote:

> (sending here as I can't find a relevant list in
> http://vger.kernel.org/vger-lists.html)

$ ./scripts/get_maintainer.pl include/uapi/linux/vfio.h 
Alex Williamson  (maintainer:VFIO DRIVER)
Cornelia Huck  (reviewer:VFIO DRIVER)
k...@vger.kernel.org (open list:VFIO DRIVER)
linux-ker...@vger.kernel.org (open list)

> I'm trying to understand the semantics of
> VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP. My (very rough) understanding
> so far is that once a page gets pinned then it's considered dirty and
> if the page is still pinned then it remains dirty even after we're
> done serving VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP. Is my
> understanding correct?

This is the current type1 implementation, but the semantics only
require that a page is reported dirty if it's actually been written.
Without support for tracking DMA writes, we assume that any page
accessible to the device is constantly dirty.  This will be refined
over time as software and hardware support improves, but we currently
err on the side of assuming all pinned pages are always dirty.
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://listman.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] [PATCH V3 00/22] Live Update

2021-05-17 Thread Alex Williamson
On Mon, 17 May 2021 12:40:43 +0100
Stefan Hajnoczi  wrote:

> On Fri, May 14, 2021 at 11:15:18AM -0400, Steven Sistare wrote:
> > On 5/14/2021 7:53 AM, Stefan Hajnoczi wrote:  
> > > On Thu, May 13, 2021 at 04:21:15PM -0400, Steven Sistare wrote:  
> > >> On 5/12/2021 12:42 PM, Stefan Hajnoczi wrote:  
> > >>> On Fri, May 07, 2021 at 05:24:58AM -0700, Steve Sistare wrote:  
> >  Provide the cprsave and cprload commands for live update.  These save and
> >  restore VM state, with minimal guest pause time, so that qemu may be updated
> >  to a new version in between.
> > 
> >  cprsave stops the VM and saves vmstate to an ordinary file.  It supports two
> >  modes: restart and reboot.  For restart, cprsave exec's the qemu binary (or
> >  /usr/bin/qemu-exec if it exists) with the same argv.  qemu restarts in a
> >  paused state and waits for the cprload command.  
> > >>>
> > >>> I think cprsave/cprload could be generalized by using QMP to stash the
> > >>> file descriptors. The 'getfd' QMP command already exists and QEMU code
> > >>> already opens fds passed using this mechanism.
> > >>>
> > >>> I haven't checked but it may be possible to drop some patches by reusing
> > >>> QEMU's monitor file descriptor passing since the code already knows how
> > >>> to open from 'getfd' fds.
> > >>>
> > >>> The reason why using QMP is interesting is because it eliminates the
> > >>> need for execve(2). QEMU may be unable to execute a program due to
> > >>> chroot, seccomp, etc.
> > >>>
> > >>> QMP would enable cprsave/cprload to work both with and without
> > >>> execve(2).
> > >>>
> > >>> One tricky thing with this approach might be startup ordering: how to
> > >>> get fds via the QMP monitor in the new process before processing the
> > >>> entire command-line.  
> > >>
> > >> Early on I experimented with a similar approach.  Old qemu passed descriptors to an
> > >> escrow process and exited; new qemu started and retrieved the descriptors from escrow.
> > >> vfio mostly worked after I hacked the kernel to suppress the original-pid owner check.
> > >> I suspect my recent vfio extensions would smooth the rough edges.  
> > > 
> > > I wonder about the reason for VFIO's pid limitation, maybe because it
> > > pins pages from the original process?  
> > 
> > The dma unmap code verifies that the requesting task is the same as the task that mapped
> > the pages.  We could add an ioctl that passes ownership to a new task.  We would also need
> > to fix locked memory accounting, which is associated with the mm of the original task.
> >   
> > > Is this VFIO pid limitation the main reason why you chose to make QEMU
> > > execve(2) the new binary?  
> > 
> > That is one.  Plus, re-attaching to named shared memory for pc.ram causes the vfio conflict
> > errors I mentioned in the previous email.  We would need to suppress redundant dma map calls,
> > but allow legitimate dma maps and unmaps in response to the ongoing address space changes and
> > diff callbacks caused by some drivers. It would be messy and fragile. In general, it felt like
> > I was working against vfio rather than with it.
> > 
> > Another big reason is a requirement to preserve anonymous memory for legacy qemu updates (via
> > code injection which I briefly mentioned in KVM forum).  If we extend cpr to allow updates
> > without exec, I still need the exec option.
> >   
> > >> However, the main issue is that guest ram must be backed by named shared memory, and
> > >> we would need to add code to support shared memory for all the secondary memory objects.
> > >> That makes it less interesting for us at this time; we care about updating legacy qemu
> > >> instances with anonymous guest memory.  
> > > 
> > > Thanks for explaining this more in the other sub-thread. The secondary
> > > memory objects you mentioned are relatively small so I don't think
> > > saving them in the traditional way is a problem.
> > > 
> > > Two approaches for zero-copy memory migration fit into QEMU's existing
> > > migration infrastructure:
> > > 
> > > - Marking RAM blocks that are backed by named memory (tmpfs, hugetlbfs,
> > >   etc) so they are not saved into the savevm file. The existing --object
> > >   memory-backend-file syntax can be used.
> > > 
> > > - Extending the live migration protocol to detect when file descriptor
> > >   passing is available (i.e. UNIX domain socket migration) and using
> > >   that for memory-backend-* objects that have fds.
> > > 
> > > Either of these approaches would handle RAM with existing savevm/migrate
> > > commands.  
> > 
> > Yes, but the vfio issues would still need to be solved, and we would need new
> > command line options to back existing and future secondary memory objects with
> > named shared memory.
> >   
> > > The remaining issue is how to migrate 

Re: [vfio-users] group not bound

2021-01-05 Thread Alex Williamson
On Tue, 5 Jan 2021 11:20:20 -0500
Roger Lawhorn  wrote:

> Hello,
> 
> I recently had to reinstall my OS, but kept all my personal files.
> One of the things that needs to be resetup partially is qemu.
> 
> I am getting the following error when running my qemu4.0 script to start 
> win10:
> ./seabios-rtx2080.sh 24
> pid 2558's current affinity list: 0-23
> pid 2558's new affinity list: 0-7
> qemu4.0-system-x86_64: -device 
> vfio-pci,host=0a:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on,romfile=/media/dad/QEMU-SSD/qemu-wd/rtx.rom:
>  
> vfio :0a:00.0: group 16 is not viable
> Please ensure all devices within the iommu_group are bound to their vfio 
> bus driver.
> pid 2558's current affinity list: 0-7
> pid 2558's new affinity list: 0-23
> 
> I had this before but cant remember how i Resolved it.
> 
> Here is my iommu group 16:
> for a in /sys/kernel/iommu_groups/*; do find $a -type l; done
> 
> /sys/kernel/iommu_groups/16/devices/:0a:00.2
> /sys/kernel/iommu_groups/16/devices/:0a:00.0
> /sys/kernel/iommu_groups/16/devices/:0a:00.3
> /sys/kernel/iommu_groups/16/devices/:0a:00.1
> 
> 
> 
> group 16 is my evga rtx 2080 video card
> it has 4 devices. video, audio, usb, and serial bus.
> 
> here is my cmdline:
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-5.0.0-32-generic root=/dev/mapper/vg-root ro 
> vfio-pci.ids=10de:1e04,10de:10f7 mitigations=off
> 
> 
> I only hold the video and audio as the usb and serial won't hold.
> lspci -nnk
> 0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device 
> [10de:1e04] (rev a1)
>      Subsystem: eVga.com. Corp. Device [3842:2281]
>      Kernel driver in use: vfio-pci
>      Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
> 0a:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
>      Subsystem: eVga.com. Corp. Device [3842:2281]
>      Kernel driver in use: vfio-pci
>      Kernel modules: snd_hda_intel
> 0a:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] 
> (rev a1)
>      Subsystem: eVga.com. Corp. Device [3842:2281]
>      Kernel driver in use: xhci_hcd
> 0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 
> [10de:1ad7] (rev a1)
>      Subsystem: eVga.com. Corp. Device [3842:2281]
>      Kernel driver in use: nvidia-gpu
>      Kernel modules: i2c_nvidia_gpu

Ultimately all these devices need to be bound to vfio-pci, or unused
ones can be bound to pci-stub; default host kernel drivers are not an
option.  Have you tried driverctl as a means to bind them to vfio-pci?
The pci-stub driver is usually built statically into the kernel,
allowing it to claim devices before modules, ie.
pci-stub.ids=10de:1ad6,10de:1ad7.  The only way you could have
previously left them bound to default host drivers would have been
forcing group separation with the ACS override patch (not recommended).
Thanks,

Alex
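
As a sketch of both options, using the addresses and IDs from the listing
above (the 0000 domain prefix is assumed):

  driverctl set-override 0000:0a:00.2 vfio-pci
  driverctl set-override 0000:0a:00.3 vfio-pci

or, on the kernel command line:

  vfio-pci.ids=10de:1e04,10de:10f7 pci-stub.ids=10de:1ad6,10de:1ad7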

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] igd single gvt-d passthrough

2020-12-15 Thread Alex Williamson


[There were a bunch of bounces from gmail accounts for this message, so
let me add some comments in hopes that whatever the issue was has been
resolved and more folks will see this thread.]

On Tue, 15 Dec 2020 00:08:27 +0100
boit sanssoif  wrote:

> Hi,
> 
> I'm trying to configure kvm to single passthrough IGD (i5-7200U with HD620)
> in legacy mode on a headless laptop.
> I extracted the vbios.rom and checked device ID (5916) and checksum with
> rom-parser tools
> 
> intel i915 modules are blacklisted and vfio options are set like this:
> options vfio-pci ids=8086:5916 disable_vga=1


Note that you're intentionally disabling VGA support here while
docs/igd-assign.txt says:

* vfio VGA support very likely needs to be enabled in the host kernel

The reservation in making this an absolute directive is that a register
bit in IGD can indicate that VGA is disabled on the device, in which
case we don't require VGA support.  If you're seeing the below error
message, then VGA is not disabled on the IGD device and the vfio-pci
VGA support which is being disabled by these module options is required.

> I tried to fulfill all the conditions of the igd-assign.txt legacy mode
> from the qemu project but I still failing to success, monitor stays black
> with no output.
> 
> using qemu with pc-i440fx-5.1 :
> 
> qemu-system-x86_64  \
> -machine pc \
> -cpu host,kvm=off -smp cores=2,threads=1 -enable-kvm\
> -boot d -cdrom /tmp/debian-10.6.0-amd64-netinst.iso   \
> -m 4G   \
> -vga none -nographic\
> -nodefaults \
> -device
> vfio-pci-igd-lpc-bridge,id=vfio-pci-igd-lpc-bridge0,bus=pci.0,addr=1f.0 \
> -device
> vfio-pci,host=00:02.0,x-igd-gms=2,id=hostdev0,bus=pci.0,addr=0x2,x-igd-opregion=on,romfile=/tmp/vbios_8086_5916.rom
> 
> 
> here are the errors that I cannot fix :
> 
> qemu-system-x86_64: -device
> vfio-pci,host=00:02.0,x-igd-gms=6,id=hostdev0,bus=pci.0,addr=0x2,x-igd-opregion=on,romfile=/home/user/vbios_8086_5916_origin.rom:
> vfio :00:02.0: failed getting region info for VGA region index 8:
> Invalid argument
> => is VGA region 8 valid ?  


This is directly caused by the disable_vga=1 module option.

> qemu-system-x86_64: -device
> vfio-pci,host=00:02.0,x-igd-gms=6,id=hostdev0,bus=pci.0,addr=0x2,x-igd-opregion=on,romfile=/home/user/vbios_8086_5916_origin.rom:
> IGD device :00:02.0 failed to enable VGA access, legacy mode disabled


And as a result we've dropped out of legacy mode setup.

> And syslog shows errors  :
> vfio-pci :00:02.0: Invalid PCI ROM header signature: expecting 0xaa55,
> got 0x
> resource sanity check: requesting [mem 0x000c-0x000d], which spans
> more than PCI Bus :00 [mem 0x000c-0x000c3fff window]
> caller pci_map_rom+0x7c/0x1d0 mapping multiple BARs

The kernel tries to map the ROM to determine whether it's really there
as part of the standard device initialization, regardless of whether
QEMU is using a romfile option.  I assume that's what's happening here.

> What am I doing wrong ? single gpu passthrough from a headless laptop seems
> to be possible.

Remove the disable_vga=1 option and see where/if it fails next.  Thanks,

Alex
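
A minimal way to act on that (the config option name is the kernel's, the
file paths are examples):

  grep CONFIG_VFIO_PCI_VGA /boot/config-$(uname -r)   # must be =y on the host

and in the modprobe options, drop the VGA disable:

  options vfio-pci ids=8086:5916 disable_vga=0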

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] vfio_pci_mmap failure : Hitting mmaps overlapping the MSI-X table

2020-12-08 Thread Alex Williamson
On Wed, 9 Dec 2020 01:46:11 +0530
Vikas Aggarwal  wrote:

> Alex,
> Thanks !
> A follow up question:
> Do i need to backport 3 kernel patches as mentioned in following dpdk
> patch submission comment
> https://mails.dpdk.org/archives/dev/2018-July/109039.html

I think you'll find that two of those are already in your v4.14 kernel.
It appears to me that you'd only need one pre-patch for a clean backport:

a32295c612c5 vfio-pci: Allow mapping MSIX BAR
dda01f787df9 vfio: Simplify capability helper

Thanks,
Alex
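
If backporting, a minimal sketch, assuming a v4.14-based tree and that both
commits apply cleanly (they may need fixups):

  git cherry-pick dda01f787df9   # vfio: Simplify capability helper (prerequisite)
  git cherry-pick a32295c612c5   # vfio-pci: Allow mapping MSIX BAR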

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] vfio_pci_mmap failure : Hitting mmaps overlapping the MSI-X table

2020-12-07 Thread Alex Williamson
On Mon, 7 Dec 2020 19:19:20 +0530
Vikas Aggarwal  wrote:

> Hello list vfio-users,
> Can someone help me understand reason that why mmap of requested address
> overlaps with MSI-X table during mmap-ing of PCIe resources.
> 
> Platform  :  ARM64 architecture (Marvell OcteonTX2)
> 
> Linux kernel:  4.14.76-22.0.0 aarch64,  Page Size 64K
> 
> Application :  Userspace DPDK+SPDK doing  mmap-ing of PCIe resources  via
> pci_vfio_map_resource_primary( )
> 
> http://code.dpdk.org/dpdk/v19.11/source/lib/librte_pci/rte_pci.c#L140
>   mapaddr  = mmap(0x20208004, 0x3000,
> PROT_READ | PROT_WRITE, MAP_SHARED, 35, 0x0);
> 
> Device 0003:0d:00.0  :   Samsung SSD
> 
> Failure: mapaddr  returned is all 0x, errno is set
> to EINVAL
>  EAL: pci_map_resource(): cannot mmap(36,
> 0x2020801e, 0x2000, 0x0): Invalid argument (0x)
>  EAL: Failed to map pci BAR0
>  EAL:   0003:0d:00.0 mapping BAR0 failed: Invalid
> argument
>  EAL: Requested device 0003:0d:00.0 cannot be used
> 
> Cause from kernel mmap handler:
> EINVAL is returned by vfio_pci_mmap()  in-kernel handler :
> https://elixir.bootlin.com/linux/v4.14.76/source/drivers/vfio/pci/vfio_pci.c#L1142
>   if (index == vdev->msix_bar) {
>           /*
>            * Disallow mmaps overlapping the MSI-X table; users don't
>            * get to touch this directly.  We could find somewhere
>            * else to map the overlap, but page granularity is only
>            * a recommendation, not a requirement, so the user needs
>            * to know which bits are real.  Requiring them to mmap
>            * around the table makes that clear.
>            */
> 
>           /* If neither entirely above nor below, then it overlaps */
>           if (!(req_start >= vdev->msix_offset + vdev->msix_size ||
>                 req_start + req_len <= vdev->msix_offset))
>                   return -EINVAL;   <= Hitting this
>   }
> From Debug prints:
>   req_start = 0; vdev->msix_offset = 8192; vdev->msix_size=144;  req_len=65536,  vdev->msix_offset=8192;
> 
> Can someone explain me how come this overlap situation is coming and how
> can I fix it.


The 'why' is exactly per your $Subject: previous kernels didn't allow
mmaps over the MSI-X table, which means that for a 64k PAGE_SIZE you'd
be precluded from mmap'ing anything in the first 64K of the BAR.  This
restriction was removed way back in a32295c612c5 ("vfio-pci: Allow
mapping MSIX BAR"), which appeared in kernel v4.16... unfortunately
still two kernel releases newer than the ancient kernel you're based
on.  We decided instead that interrupt remapping needs to protect the
system against the user possibly misprogramming the vector table via an
mmap, specifically for page size restrictions like this.  I'd advise
upgrading your kernel or backporting the change, otherwise outside of
running a 4K PAGE_SIZE kernel, there's nothing that's going to let you
mmap closer to the MSI-X vector table.  Clearly userspace tools could
be fixed to use read/write accesses within the page that contains the
vector table (QEMU should already do this), but it comes at a
performance loss that might be unacceptable.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] [External Email] Re: Regarding Direct Assignment NIC to VM

2020-11-02 Thread Alex Williamson
On Mon, 2 Nov 2020 11:29:14 -0500
Roja Eswaran  wrote:

> Hello Alex,
> Thank you so much for your insight. Are there any other drivers other
> than VFIO which you are aware of which could be used for a direct
> assignment using qemu ?

Nope.  Is it really vital to your project to directly assign the NIC
versus using something like virtio?  A virtio-vhost solution should be
able to easily achieve line rate for a gigabit NIC, at perhaps a small
latency overhead versus direct assignment.  RTL NICs aren't really the
best candidates for assignment even in PCI form.  Thanks,

Alex
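
For reference, a virtio/vhost setup of the kind suggested above looks
roughly like this (the tap name and MAC are placeholders):

  qemu-system-x86_64 ... \
    -netdev tap,id=net0,ifname=tap0,vhost=on,script=no,downscript=no \
    -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56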

> On Sun, Nov 1, 2020 at 9:52 PM Alex Williamson
>  wrote:
> >
> >
> > [Please try to send plain text emails to mailing lists if possible,
> > trying to extract the content below...]
> >
> > On Sun, 1 Nov 2020 20:56:31 -0500
> > Roja malarvathi  wrote:
> >  
> > > First of all, Thank you so much in advance for your time and help.
> > >
> > > I am using Jetson Xavier NX which integrates a Realtek RTL8211FDI
> > > Gigabit Ethernet controller.  The on-module Ethernet controller
> > > supports: 10/100/1000 Gigabit Ethernet IEEE 802.3u Media Access
> > > Controller (MAC)
> > >
> > > I am trying to assign Non-PCI NIC mentioned above directly to the
> > > Vanilla VM. As it's not a PCI device, I have no idea how I can
> > > achieve this. Any insights or comments are really appreciated. Thank
> > > you again!  
> >
> > There is a vfio-platform driver, but platform devices generally require
> > device specific support in the kernel for things like device reset.
> > I'm not aware that we have support for any RTL devices, so unless
> > you're interested in developing and contributing such support, it may
> > not be possible to assign it to a VM with vfio.  Thanks,
> >
> > Alex
> >  
> 

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Regarding Direct Assignment NIC to VM

2020-11-01 Thread Alex Williamson


[Please try to send plain text emails to mailing lists if possible,
trying to extract the content below...]

On Sun, 1 Nov 2020 20:56:31 -0500
Roja malarvathi  wrote:

> First of all, Thank you so much in advance for your time and help.
>
> I am using Jetson Xavier NX which integrates a Realtek RTL8211FDI
> Gigabit Ethernet controller.  The on-module Ethernet controller
> supports: 10/100/1000 Gigabit Ethernet IEEE 802.3u Media Access
> Controller (MAC)
>
> I am trying to assign Non-PCI NIC mentioned above directly to the
> Vanilla VM. As it's not a PCI device, I have no idea how I can
> achieve this. Any insights or comments are really appreciated. Thank
> you again!

There is a vfio-platform driver, but platform devices generally require
device specific support in the kernel for things like device reset.
I'm not aware that we have support for any RTL devices, so unless
you're interested in developing and contributing such support, it may
not be possible to assign it to a VM with vfio.  Thanks,

Alex
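
For completeness, binding a platform device to vfio-platform looks roughly
like the sketch below, but as noted above it only works if the kernel has a
reset driver for that device; the device name is a placeholder, and the
reset_required=0 module parameter waives the reset requirement at your own
risk:

  modprobe vfio-platform                # optionally: reset_required=0
  echo 2490000.ethernet > /sys/bus/platform/devices/2490000.ethernet/driver/unbind
  echo vfio-platform > /sys/bus/platform/devices/2490000.ethernet/driver_override
  echo 2490000.ethernet > /sys/bus/platform/drivers_probe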

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] location of regions VFIO_REGION_TYPE_MIGRATION/VFIO_REGION_SUBTYPE_MIGRATION

2020-10-02 Thread Alex Williamson
On Fri, 2 Oct 2020 12:54:28 +
Thanos Makatos  wrote:

> According to linux/include/uapi/linux/vfio.h, for a device to support 
> migration
> it must provide a VFIO capability of type VFIO_REGION_INFO_CAP_TYPE and set
> .type/.subtype to VFIO_REGION_TYPE_MIGRATION/VFIO_REGION_SUBTYPE_MIGRATION.
> 
> What confuses me is the following:
> 
> "The structure vfio_device_migration_info is placed at the 0th offset of
> the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
> migration information."
> 
> Where do regions VFIO_REGION_TYPE_MIGRATION and VFIO_REGION_SUBTYPE_MIGRATION
> live on the device's address space? Is just like any other region, except that
> in the case of a PCI device it's index must be equal to, or larger than,
> VFIO_PCI_NUM_REGIONS, and set arbitrarily by the device implementation? If
> so, then I suppose this index must appear in struct 
> vfio_device_info.num_regions?

Yes, the user gets a device fd via VFIO_GROUP_GET_DEVICE_FD, they then
use the VFIO_DEVICE_GET_INFO ioctl to query the device type, number of
IRQs and number of regions.  For a vfio-pci device, some of the
reported region indexes are used for a fixed purpose, these are where
the VFIO_PCI_BAR0_REGION_INDEX conventions come into play.  For region
indexes beyond that fixed enum (VFIO_PCI_NUM_REGIONS), the user relies on
the capability chain reported in the VFIO_DEVICE_GET_REGION_INFO ioctl
to determine the purpose of the region.  Userspace would walk the
capability chain in that ioctl return buffer to find a
VFIO_REGION_INFO_CAP_TYPE.  This is what would identify the region as
providing migration support.  The vfio_device_migration_info structure
is then accessible at the zero'th offset within that region, ie.
vfio_region_info.offset within the vfio device fd.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] P2P DMA between endpoint devices inside a VM

2020-09-23 Thread Alex Williamson
On Wed, 23 Sep 2020 15:32:15 -0700
Maran Wilson  wrote:

> On Wed, Sep 23, 2020 at 2:19 PM Alex Williamson 
> wrote:
> 
> > On Wed, 23 Sep 2020 13:08:10 -0700
> > Maran Wilson  wrote:
> >  
> > > Just wanted to wrap up this thread by confirming what Alex said is true (in
> > > case anyone else is interested in this topic in the future). After enabling
> > > IOMMU tracing on the host I was able to confirm that IOMMU mappings were,
> > > in fact, being created properly to map the gPA to hPA of both devices' BAR
> > > resources.
> > >
> > > It turns out that our hardware device provides a backdoor way of reading
> > > PCI config space via BAR mapped register space.  The driver inside the VM
> > > was using that and thereby reading back the hPA of the BAR (and using that
> > > to program the DMA controller). This sort of breaks the whole pass-through
> > > model so I'll have to sort that out on the driver/device side to close that
> > > loophole somehow so that the driver inside the VM is forced to use standard
> > > Linux APIs to read PCI config space. That way KVM/Qemu can properly
> > > intercept the access and return the gPA values.  
> >
> > Thanks for the follow-up!  It sounds like another option for you might
> > be to virtualize those backdoor accesses like we do for GPUs that have
> > similar features.  That could allow existing drivers to work
> > unmodified.  If you're interested, take a look at hw/vfio/pci-quirks.c
> > in QEMU.  We have generic support for both a VFIOConfigWindowQuirk,
> > where access to config space is through separate data and offset
> > registers, and a VFIOConfigMirrorQuirk, where a range of MMIO space
> > maps to config space.  We just need to know the parameters to apply to
> > your device.  The only downside to the virtualization is that we trap
> > MMIO accesses at page size granularity, so MMIO accesses within that
> > shared page would fault into QEMU to do a read or write rather than
> > make use of the direct access provided through an mmap.  Thanks,
> >
> > Alex
> >  
> 
> Oh yeah. That's exactly what we have going on with our hardware.  So if I'm
> understanding properly, we would just have to run with a patched version of
> Qemu on the host and the rest of the SW stack (kernel, vfio-pci driver,
> etc) can run as-is right?

Yup, QEMU virtualizes the backdoor to expose the emulated config space
rather than the bare metal config space, everything else just works.
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] P2P DMA between endpoint devices inside a VM

2020-09-23 Thread Alex Williamson
On Wed, 23 Sep 2020 13:08:10 -0700
Maran Wilson  wrote:

> Just wanted to wrap up this thread by confirming what Alex said is true (in
> case anyone else is interested in this topic in the future). After enabling
> IOMMU tracing on the host I was able to confirm that IOMMU mappings were,
> in fact, being created properly to map the gPA to hPA of both devices' BAR
> resources.
> 
> It turns out that our hardware device provides a backdoor way of reading
> PCI config space via BAR mapped register space.  The driver inside the VM
> was using that and thereby reading back the hPA of the BAR (and using that
> to program the DMA controller). This sort of breaks the whole pass-through
> model so I'll have to sort that out on the driver/device side to close that
> loophole somehow so that the driver inside the VM is forced to use standard
> Linux APIs to read PCI config space. That way KVM/Qemu can properly
> intercept the access and return the gPA values.

Thanks for the follow-up!  It sounds like another option for you might
be to virtualize those backdoor accesses like we do for GPUs that have
similar features.  That could allow existing drivers to work
unmodified.  If you're interested, take a look at hw/vfio/pci-quirks.c
in QEMU.  We have generic support for both a VFIOConfigWindowQuirk,
where access to config space is through separate data and offset
registers, and a VFIOConfigMirrorQuirk, where a range of MMIO space
maps to config space.  We just need to know the parameters to apply to
your device.  The only downside to the virtualization is that we trap
MMIO accesses at page size granularity, so MMIO accesses within that
shared page would fault into QEMU to do a read or write rather than
make use of the direct access provided through an mmap.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] P2P DMA between endpoint devices inside a VM

2020-09-08 Thread Alex Williamson
On Tue, 8 Sep 2020 11:31:42 -0700
Maran Wilson  wrote:

> On Tue, Sep 8, 2020 at 10:22 AM Alex Williamson 
> wrote:
> 
> > On Tue, 8 Sep 2020 09:59:46 -0700
> > Maran Wilson  wrote:
> >  
> > > I'm trying to use the vfio-pci driver to pass-through two PCIe endpoint
> > > devices into a VM. On the host, each of these PCIe endpoint devices is in
> > > its own IOMMU group. From inside the VM, I would like to perform P2P DMA
> > > operations. So basically, programming the DMA engine of one of the devices
> > > to write directly to a BAR mapped region of the other device.
> > >
> > >
> > > Is this something that is supported by the vfio driver, working with Qemu?
> > > Are there any VM configuration gotchas I need to keep in mind for this
> > > particular use-case? I'm on an AMD Rome server, FWIW.
> > >
> > >
> > > This works on the host (when I'm not using VMs) with IOMMU disabled. And it
> > > also works on the host with the IOMMU enabled as long as I add the
> > > appropriate IOMMU mapping of the other device's BAR mapped address to the
> > > appropriate IOMMU group.
> > >
> > >
> > > But from what I can tell, when the endpoint devices are passed through to
> > > the VM, it doesn't appear that any IOMMU mappings are created on the host
> > > to translate gPA of the other endpoint's BAR mapped address. But of course
> > > DMA to/from host DRAM does work in that same configuration, so I know IOMMU
> > > mappings are being created to translate gPA of DRAM.  
> >
> > As long as we can mmap the endpoint BAR (>= PAGE_SIZE) then the BAR GPA
> > should be mapped through the IOMMU to enable p2p within the VM.  You
> >  
> 
> Thanks Alex. Does it require the endpoints having a common root port inside
> the VM? Or does that part not matter?

There could be platform specific topology requirements, but since you
indicate it works on the host with the IOMMU enabled, I assume we're ok.

> If you happen to know the routine name or two in the driver and/or qemu
> that handles this, it would help me get my bearings sooner and allow me to
> instrument from the kernel side too to see why my experiment is not working.

In QEMU all vfio mappings are handled through a MemoryListener where
vfio_listener_region_{add,del} are the callbacks.  vfio_dma_map() and
vfio_dma_unmap() are the wrappers for the ioctl into the kernel call.
BARs mappings will report true for memory_region_is_ram_device(), so we
won't consider it a fatal error when we can't map them, but we'll
certainly try to map them.

On the kernel side, you'd be using the type1 IOMMU backend, where the
ioctl will go through vfio_iommu_type1_ioctl() and should land in
vfio_dma_do_map() for the mapping case.

> > should be able to see this with tracing enabled in QEMU for vfio*.
> >  
> 
> I will try that too, thanks!
> 
> By the way, is this functionality present as far back as the 4.15 kernel?

It's essentially always been present.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] P2P DMA between endpoint devices inside a VM

2020-09-08 Thread Alex Williamson
On Tue, 8 Sep 2020 09:59:46 -0700
Maran Wilson  wrote:

> I'm trying to use the vfio-pci driver to pass-through two PCIe endpoint
> devices into a VM. On the host, each of these PCIe endpoint devices is in
> its own IOMMU group. From inside the VM, I would like to perform P2P DMA
> operations. So basically, programming the DMA engine of one of the devices
> to write directly to a BAR mapped region of the other device.
> 
> 
> Is this something that is supported by the vfio driver, working with Qemu?
> Are there any VM configuration gotchas I need to keep in mind for this
> particular use-case? I'm on an AMD Rome server, FWIW.
> 
> 
> This works on the host (when I'm not using VMs) with IOMMU disabled. And it
> also works on the host with the IOMMU enabled as long as I add the
> appropriate IOMMU mapping of the other device's BAR mapped address to the
> appropriate IOMMU group.
> 
> 
> But from what I can tell, when the endpoint devices are passed through to
> the VM, it doesn't appear that any IOMMU mappings are created on the host
> to translate gPA of the other endpoint's BAR mapped address. But of course
> DMA to/from host DRAM does work in that same configuration, so I know IOMMU
> mappings are being created to translate gPA of DRAM.

As long as we can mmap the endpoint BAR (>= PAGE_SIZE) then the BAR GPA
should be mapped through the IOMMU to enable p2p within the VM.  You
should be able to see this with tracing enabled in QEMU for vfio*.
Thanks,

Alex
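
For the tracing mentioned above, something like this works with a
tracing-enabled QEMU build (the exact trace backend depends on how QEMU was
configured):

  qemu-system-x86_64 ... -trace 'vfio*' -D /tmp/vfio-trace.log

then look for vfio_listener_region_add / vfio_dma_map entries covering the
other endpoint's BAR GPA.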

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Not enough IRQs in 82093AA IOAPIC

2020-08-28 Thread Alex Williamson
On Thu, 27 Aug 2020 17:22:59 -0700
Micah Morton  wrote:

> Hi Alex/Paolo,
> 
> We talked a few months ago (some of which can be seen here
> https://www.spinics.net/lists/kvm/msg217578.html) about adding
> platform IRQ forwarding for platform devices behind PCI
> controller/adapter devices (for use when the PCI parent device gets
> assigned to a VM guest through vfio-pci). Thinking through how it
> would work, I've run into the problem that the IOAPIC
> (https://pdos.csail.mit.edu/6.828/2018/readings/ia32/ioapic.pdf) that
> kvm/QEMU emulate for forwarding platform/PCI interrupts to the guest
> is quite limited in terms of free/unused IRQs available for use
> (section 2.4 in the ioapic.pdf above gives descriptions).
> 
> I think I'm not the first one to encounter a lack of legacy IRQ slots
> on x86 kvm/QEMU, as here's an example of QEMU's emulated TPM choosing
> to use polling instead of interrupts for what I think is the same
> reason: https://www.qemu.org/docs/master/specs/tpm.html#acpi-interface
> 
> A few questions:
> 1) Are you aware of anyone ever looking at modernizing the IOAPIC? I
> assume this isn't a priority for VFIO since most high
> speed/performance HW devices don't use legacy interrupts. I guess it's
> not a big issue for kvm/QEMU either since there are not many new
> platform devices coming along that people want to emulate for VM
> guests (TPM being an exception here).

Not only are they typically for "legacy" use cases, but PCI only
supports four interrupt lines and these are usually swizzled across the
buses to spread out how many devices can share an interrupt.  And of
course we can generally share interrupts for PCI devices, so it's more
of an efficiency issue than an exhaustion issue, so long as the
device/driver doesn't require an exclusive interrupt.  So no, I don't
know of any efforts to modernize it; it's not a limitation worth
addressing for vfio-pci.

> 2) If one was to add more IRQs to the IOAPIC would it be a requirement
> to find a datasheet from some newer hardware and perfectly emulate
> that newer hardware for a more modernized IOAPIC? Or would something
> like hacking the 82093AA emulation code to support more than 24 IRQs
> be within the realm of possibility? Is it even meaningful for guest
> VMs to "think" they're talking to an ancient IOAPIC from 1996?

I think that extending an existing real device is potentially
problematic.  We don't know what dependencies closed source OSes might
have or what they'd infer from the device.  If you can't find a
specific device with widespread support to emulate, maybe it would be
possible to implement a generic device.  Does this really solve the
problem though, or is it just a stopgap?  How would we size the ioapic
to match the VM hardware configuration?

> 3) Was there ever any consideration of making the IOAPIC a
> virtio/paravirtualized device rather than an emulated device?

Dunno, the qemu-devel and kvm mailing list would be a much more
appropriate place to get feedback and history than the vfio-users list.

> I figure if there's no reasonable hope of running an x86 VM with
> QEMU/kvm with more than 24 IRQ slots then I may look at repurposing
> PCI INTx/MSI to forward platform interrupts that come from devices
> behind PCI controllers (if the platform IRQ is level triggered use
> INTx and edge triggered use MSI). I'm sure that's a whole new can of
> worms though.

This is sounding like quite a kludge; how does the touchpad i2c device
indicate that it interrupts through the assigned PCI i2c controller?
Doesn't that controller use interrupts itself?  Trying to multiplex the
PCI controller interrupt and the i2c device interrupt through INTx
sounds messy.  You'd also need to mask off MSI/X capabilities on the
i2c controller if supported, since we can't use INTx and MSI/X at the
same time.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] pci passthrough of Intel igp

2020-08-27 Thread Alex Williamson
On Thu, 27 Aug 2020 23:17:52 +0200
daggs  wrote:

> Greetings Alex,
> 
> > Sent: Wednesday, August 26, 2020 at 8:02 PM
> > From: "Alex Williamson" 
> > To: "daggs" 
> > Cc: "Patrick O'Callaghan" , vfio-users@redhat.com
> > Subject: Re: [vfio-users] pci passthrough of Intel igp
> >
> > On Wed, 26 Aug 2020 07:27:59 +0200
> > daggs  wrote:
> >  
> > > Greetings Alex,
> > >  
> > > > Sent: Wednesday, August 26, 2020 at 12:54 AM
> > > > From: "Alex Williamson" 
> > > > To: "daggs" 
> > > > Cc: "Patrick O'Callaghan" , vfio-users@redhat.com
> > > > Subject: Re: [vfio-users] pci passthrough of Intel igp
> > > >
> > > > On Tue, 25 Aug 2020 23:34:48 +0200
> > > > daggs  wrote:
> > > >  
> > > > > Greetings Alex,
> > > > >  
> > > > > > Sent: Wednesday, August 12, 2020 at 8:04 PM
> > > > > > From: "Alex Williamson" 
> > > > > > To: "Patrick O'Callaghan" 
> > > > > > Cc: vfio-users@redhat.com, "daggs" 
> > > > > > Subject: Re: [vfio-users] pci passthrough of Intel igp
> > > > > >
> > > > > > On Wed, 12 Aug 2020 17:46:33 +0100
> > > > > > "Patrick O'Callaghan"  wrote:
> > > > > >  
> > > > > > > On Wed, 2020-08-12 at 18:02 +0200, daggs wrote:  
> > > > > > > > Greetings,
> > > > > > > >
> > > > > > > > I have a machine with an Intel igp of HD Graphics 610 
> > > > > > > > [8086:5902].
> > > > > > > > I found several discussions on the subject stating that it 
> > > > > > > > isn't possible but all of them are several years old.
> > > > > > > > so I wanted to know if it is possible to pass it to a vm?
> > > > > > > > I'm using kernel 5.4.43, libvirt-6.6.0 and qemu-5.0.0.  
> > > > > > >
> > > > > > > If this is your only GPU, it doesn't make much sense. The idea of
> > > > > > > passthrough is to let the VM control an additional GPU, not the 
> > > > > > > main
> > > > > > > one.  
> > > > > >
> > > > > > There are plenty of people trying to assign their primary graphics
> > > > > > device, it makes perfect sense for someone that doesn't intend to 
> > > > > > run a
> > > > > > graphical environment on the host.  Assigning the primary GPU can be
> > > > > > more challenging, but that doesn't mean it isn't done.
> > > > > >
> > > > > > For daggs, I can only say try it yourself, I don't know of any 
> > > > > > specific
> > > > > > reason it wouldn't work, but direct assignment of IGD is a fair bit 
> > > > > > of
> > > > > > luck anyway since the hardware is constantly changing and we don't
> > > > > > really keep up with it.  You might need to play with the x-igd-gms
> > > > > > value on the vfio-pci device in QEMU, several people have found that
> > > > > > x-igd-gms=1 is necessary on some versions of hardware.  Thanks,
> > > > > >
> > > > > > Alex
> > > > > >
> > > > > >  
> > > > >
> > > > > I'm trying to boot the vm up with the igd pass-through, here is my 
> > > > > xml: https://dpaste.com/CNDHRLNXC
> > > > > the boot ends up with no signal and this is visible in dmesg:
> > > > > [   36.181635] DMAR: DRHD: handling fault status reg 3
> > > > > [   36.181641] DMAR: [DMA Read] Request device [00:02.0] PASID 
> > > > >  fault addr c600 [fault reason 06] PTE Read access is not 
> > > > > set
> > > > > [   36.182298] DMAR: DRHD: handling fault status reg 3
> > > > > [   36.182304] DMAR: [DMA Read] Request device [00:02.0] PASID 
> > > > >  fault addr c603d000 [fault reason 06] PTE Read access is not 
> > > > > set
> > > > > [   36.183459] DMAR: DRHD: handling fault status reg 3
> > > > > [   36.183464] DMAR: [DMA Read] Request device [00:02.0] PASID 
> > > > >  fault addr c603e000 [fault reason 06] PTE Read access is not 
> > > > > set

Re: [vfio-users] pci passthrough of Intel igp

2020-08-26 Thread Alex Williamson
On Wed, 26 Aug 2020 07:27:59 +0200
daggs  wrote:

> Greetings Alex,
> 
> > Sent: Wednesday, August 26, 2020 at 12:54 AM
> > From: "Alex Williamson" 
> > To: "daggs" 
> > Cc: "Patrick O'Callaghan" , vfio-users@redhat.com
> > Subject: Re: [vfio-users] pci passthrough of Intel igp
> >
> > On Tue, 25 Aug 2020 23:34:48 +0200
> > daggs  wrote:
> >  
> > > Greetings Alex,
> > >  
> > > > Sent: Wednesday, August 12, 2020 at 8:04 PM
> > > > From: "Alex Williamson" 
> > > > To: "Patrick O'Callaghan" 
> > > > Cc: vfio-users@redhat.com, "daggs" 
> > > > Subject: Re: [vfio-users] pci passthrough of Intel igp
> > > >
> > > > On Wed, 12 Aug 2020 17:46:33 +0100
> > > > "Patrick O'Callaghan"  wrote:
> > > >  
> > > > > On Wed, 2020-08-12 at 18:02 +0200, daggs wrote:  
> > > > > > Greetings,
> > > > > >
> > > > > > I have a machine with an Intel igp of HD Graphics 610 [8086:5902].
> > > > > > I found several discussions on the subject stating that it isn't 
> > > > > > possible but all of them are several years old.
> > > > > > so I wanted to know if it is possible to pass it to a vm?
> > > > > > I'm using kernel 5.4.43, libvirt-6.6.0 and qemu-5.0.0.  
> > > > >
> > > > > If this is your only GPU, it doesn't make much sense. The idea of
> > > > > passthrough is to let the VM control an additional GPU, not the main
> > > > > one.  
> > > >
> > > > There are plenty of people trying to assign their primary graphics
> > > > device, it makes perfect sense for someone that doesn't intend to run a
> > > > graphical environment on the host.  Assigning the primary GPU can be
> > > > more challenging, but that doesn't mean it isn't done.
> > > >
> > > > For daggs, I can only say try it yourself, I don't know of any specific
> > > > reason it wouldn't work, but direct assignment of IGD is a fair bit of
> > > > luck anyway since the hardware is constantly changing and we don't
> > > > really keep up with it.  You might need to play with the x-igd-gms
> > > > value on the vfio-pci device in QEMU, several people have found that
> > > > x-igd-gms=1 is necessary on some versions of hardware.  Thanks,
> > > >
> > > > Alex
> > > >
> > > >  
> > >
> > > I'm trying to boot the vm up with the igd pass-through, here is my xml: 
> > > https://dpaste.com/CNDHRLNXC
> > > the boot ends up with no signal and this is visible in dmesg:
> > > [   36.181635] DMAR: DRHD: handling fault status reg 3
> > > [   36.181641] DMAR: [DMA Read] Request device [00:02.0] PASID  
> > > fault addr c600 [fault reason 06] PTE Read access is not set
> > > [   36.182298] DMAR: DRHD: handling fault status reg 3
> > > [   36.182304] DMAR: [DMA Read] Request device [00:02.0] PASID  
> > > fault addr c603d000 [fault reason 06] PTE Read access is not set
> > > [   36.183459] DMAR: DRHD: handling fault status reg 3
> > > [   36.183464] DMAR: [DMA Read] Request device [00:02.0] PASID  
> > > fault addr c603e000 [fault reason 06] PTE Read access is not set
> > > [   36.184614] DMAR: DRHD: handling fault status reg 3
> > > [   36.721979] vfio-pci :00:02.0: vfio_ecap_init: hiding ecap 
> > > 0x1b@0x100
> > >
> > > I've dumped the rom, do I need to run the fixer on it? if so, what is the 
> > > vid and did?
> > > than I need to place this  in the 
> > > hostdev section?
> > > I've noticed most of the reports say it works only on i440fx but they
> > > are all a few years old, is that still the case?
> > >
> > > as part of this contexts, I have this in the kernel cmdline:
> > > pcie_acs_override=id:8086:a170 does it means device 8086:a170 is
> > > "broken" out of the iommu table?  
> >
> > If you dump the ROM and provide it via a  tag in the xml, then yes
> > you need to fix the device ID and checksum, if QEMU loads it from the
> > device it will do this itself.  You can find the vendor ID (vid) and
> > device ID (did) with 'lspci -nns :00:02.0", they will be the
> > numbers in the last set of brackets, ex: [8086:a170].  They're also
> > available in sysfs under the device:
> >
> > $ cat /sys/bus/pci/devi

Re: [vfio-users] pci passthrough of Intel igp

2020-08-25 Thread Alex Williamson
On Tue, 25 Aug 2020 23:34:48 +0200
daggs  wrote:

> Greetings Alex,
> 
> > Sent: Wednesday, August 12, 2020 at 8:04 PM
> > From: "Alex Williamson" 
> > To: "Patrick O'Callaghan" 
> > Cc: vfio-users@redhat.com, "daggs" 
> > Subject: Re: [vfio-users] pci passthrough of Intel igp
> >
> > On Wed, 12 Aug 2020 17:46:33 +0100
> > "Patrick O'Callaghan"  wrote:
> >  
> > > On Wed, 2020-08-12 at 18:02 +0200, daggs wrote:  
> > > > Greetings,
> > > >
> > > > I have a machine with an Intel igp of HD Graphics 610 [8086:5902].
> > > > I found several discussions on the subject stating that it isn't 
> > > > possible but all of them are several years old.
> > > > so I wanted to know if it is possible to pass it to a vm?
> > > > I'm using kernel 5.4.43, libvirt-6.6.0 and qemu-5.0.0.  
> > >
> > > If this is your only GPU, it doesn't make much sense. The idea of
> > > passthrough is to let the VM control an additional GPU, not the main
> > > one.  
> >
> > There are plenty of people trying to assign their primary graphics
> > device, it makes perfect sense for someone that doesn't intend to run a
> > graphical environment on the host.  Assigning the primary GPU can be
> > more challenging, but that doesn't mean it isn't done.
> >
> > For daggs, I can only say try it yourself, I don't know of any specific
> > reason it wouldn't work, but direct assignment of IGD is a fair bit of
> > luck anyway since the hardware is constantly changing and we don't
> > really keep up with it.  You might need to play with the x-igd-gms
> > value on the vfio-pci device in QEMU, several people have found that
> > x-igd-gms=1 is necessary on some versions of hardware.  Thanks,
> >
> > Alex
> >
> >  
> 
> I'm trying to boot the vm up with the igd pass-through, here is my xml: 
> https://dpaste.com/CNDHRLNXC
> the boot ends up with no signal and this is visible in dmesg:
> [   36.181635] DMAR: DRHD: handling fault status reg 3
> [   36.181641] DMAR: [DMA Read] Request device [00:02.0] PASID  fault 
> addr c600 [fault reason 06] PTE Read access is not set
> [   36.182298] DMAR: DRHD: handling fault status reg 3
> [   36.182304] DMAR: [DMA Read] Request device [00:02.0] PASID  fault 
> addr c603d000 [fault reason 06] PTE Read access is not set
> [   36.183459] DMAR: DRHD: handling fault status reg 3
> [   36.183464] DMAR: [DMA Read] Request device [00:02.0] PASID  fault 
> addr c603e000 [fault reason 06] PTE Read access is not set
> [   36.184614] DMAR: DRHD: handling fault status reg 3
> [   36.721979] vfio-pci :00:02.0: vfio_ecap_init: hiding ecap 0x1b@0x100
> 
> I've dumped the rom, do I need to run the fixer on it? if so, what is the vid 
> and did?
> than I need to place this  in the hostdev 
> section?
> I've noticed most of the reports say it works only on i440fx but they
> are all a few years old, is that still the case?
> 
> as part of this contexts, I have this in the kernel cmdline:
> pcie_acs_override=id:8086:a170 does it means device 8086:a170 is
> "broken" out of the iommu table?

If you dump the ROM and provide it via a  tag in the xml, then yes
you need to fix the device ID and checksum, if QEMU loads it from the
device it will do this itself.  You can find the vendor ID (vid) and
device ID (did) with 'lspci -nns 0000:00:02.0', they will be the
numbers in the last set of brackets, ex: [8086:a170].  They're also
available in sysfs under the device:

$ cat /sys/bus/pci/devices/0000\:00\:02.0/vendor
$ cat /sys/bus/pci/devices/0000\:00\:02.0/device
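
For reference, one common way to dump the ROM through sysfs (a sketch,
assuming the IGD really is at 0000:00:02.0; writing 1/0 enables and
disables reading of the rom file):

# echo 1 > /sys/bus/pci/devices/0000:00:02.0/rom
# cat /sys/bus/pci/devices/0000:00:02.0/rom > igd.rom
# echo 0 > /sys/bus/pci/devices/0000:00:02.0/rom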

Legacy IGD assignment still requires 440FX due to Q35 placing its own
device at a conflicting address to the one needed by IGD.  The ACS
override patch should never be needed for IGD, though it probably tells
us the vendor and device IDs that you'd need to patch into the ROM.
IGD is always a single function device on the root complex and
therefore will always be in its own IOMMU group.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] dev_WARN for hotplugging to live VFIO group

2020-08-21 Thread Alex Williamson
On Fri, 21 Aug 2020 15:36:40 -0500
Shawn Anastasio  wrote:

> Hello,
> 
> While developing the userspace VFIO components of libkvmchan[1], I've
> run into a dev_WARN in VFIO when hotplugging devices on the same pci
> host bridge as other VFIO-managed devices:
> 
> [  111.220260][ T6281] pci 0001:00:01.0: Device added to live group 1!
> [  111.220390][ T6281] WARNING: CPU: 51 PID: 6281 at 
> drivers/vfio/vfio.c:709 vfio_iommu_group_notifier+0x250/0x360 [vfio]
> 
> In spite of this warning, everything seems to be working fine. The
> daemon that manages these PCI devices and owns the VFIO group fd
> is able to connect to the new devices and use them just fine.
> 
> My question is: what exactly is the reason this warning was added?
> Is there a subtle issue that could be triggered by hotplugging to
> live groups, and if so, what needs to be done to resolve this?
> I'd be happy to try to resolve any issues and submit patches if necessary.
> 
> For reference, this is all on a pseries/ppc64 guest on a POWER9 host,
> with VFIO operating in the VFIO_SPAPR_TCE_IOMMU mode. In this config,
> all PCI devices on a bus exist in the same IOMMU group (in this case #1).

When a device is added to a live group there's a risk that it will be
auto-probed by a host driver, if that occurs then isolation of the
group has been violated and vfio code will BUG_ON to halt the system.
The warning is effectively just a notification that we're in a risky
situation where the wrong driver binding could escalate the issue.

There is a ToDo in the code at that point to prevent driver probing,
but ISTR at that time we may not have had a good way to do that.  I'm
not sure if we do now either.  We have the driver_override field for
the device that we could write into, but at this point we're looking at
a generic device, we don't even know that it's a PCI device.  We could
determine that, but even then it's not clear that the kernel should set
the policy to define that it should be bound to the vfio-pci driver,
potentially versus other vfio drivers that could legitimately manage
the device safely.  If we write a random string to the driver_override
field we could prevent automatic binding to any driver, but then we put
a barrier to making use of the device, which seems like it has support
issues as well.  I'm not sure what the best approach is... that's why
we currently generate a warning and hope it doesn't happen.

On a truly bare metal platform, I don't think this should ever occur in
practice without manually removing and re-scanning devices.  We'd
expect PCIe hotplug to occur on the slot level with isolation to the
downstream port providing that slot.  Without that isolation, or the
increasingly unlikely chance of encountering this with conventional PCI
hotplug, we'd probably hand wave the system as inappropriate for the
task.  Here I think you have a bare metal hypervisor exposing portions
of devices to the "host" in unusual ways that can trigger this and are
expected to be supported.  Sorry, I don't have a good proposal to
resolve how we should handle group composition changing while the group
is in use... thus we currently just whine about it with a warning.
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] pci passthrough of Intel igp

2020-08-12 Thread Alex Williamson
On Wed, 12 Aug 2020 15:21:31 -0700
kram threetwoone  wrote:

> Sure Alex, thanks for taking a look.  Let me know if there's anything else
> you want to see.  It is a shell script and I get the same errors if I run
> as user or root.
> 
> #!/bin/bash
> qemu-system-x86_64 \
> -enable-kvm \
> -machine q35 \
> -cpu host \
> -nographic \
> -vga none \
> -device intel-iommu \
  ^  Remove this line

> -device vfio-pci,host=00:02.0,multifunction=on \
   ^  ??

multifunction=on doesn't do much unless an addr= value is specified for
the device and another device to appear in the same slot.

> -device vfio-pci,host=00:0e.0 \
> -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd
> \
> -drive if=pflash,format=raw,file=/home/admin/myvars.fd \
> -device vfio-pci,host=00:02.0 \
  ^^^

IGD is listed twice, that plus intel-iommu might explain the error.

> -net nic \
> /home/admin/kodi_drive.img

In order to trigger legacy mode IGD assignment, the IGD device needs to
be placed into addr=2.0  I thought it was also still the case that
legacy mode doesn't work with q35 or OVMF.  UPT (universal passthrough
mode) was originally meant to be only a secondary graphics, maybe
that's what you're going for?  Probably need that i915 OVMF ROM that
Jeff pointed to for using it as primary VM graphics (I've never tried
it).  Thanks,

Alex
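
Pulling the corrections above together, a hypothetical minimal sketch
for legacy-mode IGD assignment (untested illustration only: i440fx with
SeaBIOS rather than q35/OVMF, no intel-iommu device, and the IGD listed
once at addr=2.0):

qemu-system-x86_64 \
-enable-kvm \
-machine pc \
-cpu host \
-vga none \
-nographic \
-device vfio-pci,host=00:02.0,addr=2.0,x-vga=on \
/home/admin/kodi_drive.img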

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] pci passthrough of Intel igp

2020-08-12 Thread Alex Williamson
On Wed, 12 Aug 2020 14:56:08 -0700
kram threetwoone  wrote:

> I have not gotten the VM to boot, there is always the multiple address
> spaces error.  I don't think this is an ACS patch situation; the GPU sits
> in it's own vfio group with no other devices.

I can't think how a group with a single device would get the multiple
address spaces error, but I also can't think how we'd get the multiple
address spaces error at all unless the VM is configured with a vIOMMU,
ex. including an intel-iommu device.  Can you share the VM config?
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] pci passthrough of Intel igp

2020-08-12 Thread Alex Williamson
On Wed, 12 Aug 2020 17:46:33 +0100
"Patrick O'Callaghan"  wrote:

> On Wed, 2020-08-12 at 18:02 +0200, daggs wrote:
> > Greetings,
> > 
> > I have a machine with an Intel igp of HD Graphics 610 [8086:5902].
> > I found several discussions on the subject stating that it isn't possible 
> > but all of them are several years old.
> > so I wanted to know if it is possible to pass it to a vm?
> > I'm using kernel 5.4.43, libvirt-6.6.0 and qemu-5.0.0.  
> 
> If this is your only GPU, it doesn't make much sense. The idea of
> passthrough is to let the VM control an additional GPU, not the main
> one.

There are plenty of people trying to assign their primary graphics
device, it makes perfect sense for someone that doesn't intend to run a
graphical environment on the host.  Assigning the primary GPU can be
more challenging, but that doesn't mean it isn't done.

For daggs, I can only say try it yourself, I don't know of any specific
reason it wouldn't work, but direct assignment of IGD is a fair bit of
luck anyway since the hardware is constantly changing and we don't
really keep up with it.  You might need to play with the x-igd-gms
value on the vfio-pci device in QEMU, several people have found that
x-igd-gms=1 is necessary on some versions of hardware.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] pin guest memory for vfio pass-through device

2020-07-12 Thread Alex Williamson
On Sun, Jul 12, 2020 at 6:36 PM Yv Lin  wrote:

> After more thoughts, I guess that
> 1) normally ppl don't enable vIOMMU unless they need to use a nested
> guest, as vIOMMU is slow and the memory accounting issue you just mentioned.
>

vIOMMU w/ device assignment is more often used for DPDK in a guest than for
nested guests.


> 2) host IOMMU driver actually can do io page fault and on-demanding
> pinning/mapping for ATS/PRI-capable device, but currently qemu doesn't tell
> if a pass-through device and host IOMMU can do it or not. If this is true,
> maybe we can remove the pinning of all guest memory for this type of
> device??
>

To some extent this is under development with the work Intel and others are
contributing for SVA and SIOV support.  The primary focus is to support
nested paging with PASID support, there are page fault interfaces
proposed.  Thanks,

Alex
___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] pin guest memory for vfio pass-through device

2020-07-12 Thread Alex Williamson
On Sun, Jul 12, 2020 at 6:16 PM Yv Lin  wrote:

>
> Here are some summaries that I learned from what you told.
> 1) If a device is passed through to guestOS via vfio, and there is no
> IOMMU present in guestOS. all memory regions within the device address
> space will be pinned down. if IOMMU is presented in guestOS, qemu could
> only pin and map the needed pages (specified by dma_map_page() called in
> guestOS device driver), but as vIOMMU is emulated, the performance is not
> good.
> 2) Even if a pcie device can support ATS/PRI capability and it's passed
> through to guestOS, the above statement is still true, the IO page fault
> and demanding page won't be utilized anyway.
>

Correct, also note that a vIOMMU is never enabled during early boot on
x86/64, therefore all guest memory will be pinned initially.  Also a vIOMMU
introduces locked memory accounting issues as each device address space
makes use of a separate VFIO container, which does accounting separately.
And finally, ATS implies that we honor devices making use of
"pre-translated" DMA, which implies a degree of trust that the user/device
cannot make use of this as a vector to exploit the host.  Thanks,

Alex
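
As an aside on the locked-memory accounting point above: what the
accounting counts against is the process's RLIMIT_MEMLOCK, which can be
inspected and raised roughly like this (a sketch; the user name and
values are placeholders):

$ ulimit -l                  # current memlock limit for this shell, KiB
# in /etc/security/limits.conf, something like:
#   qemu  soft  memlock  unlimited
#   qemu  hard  memlock  unlimited
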
___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] pin guest memory for vfio pass-through device

2020-07-12 Thread Alex Williamson
On Sun, Jul 12, 2020 at 5:38 PM Yv Lin  wrote:

>
>
> On Sun, Jul 12, 2020 at 1:59 PM Alex Williamson <
> alex.l.william...@gmail.com> wrote:
>
>> On Sun, Jul 12, 2020 at 12:25 PM Yv Lin  wrote:
>>
>>> Btw, IOMMUv2 can support peripheral page request (PPR) so in theory if
>>> an end point pcie device can support ATS/PRI, pinning down all memory is
>>> not necessary, does current vfio driver or qemu has corresponding support
>>> to save pinned memory?
>>>
>>
>> I think you're very much over estimating the difference between
>> VFIO_TYPE1_IOMMU and VFIO_TYPE1v2_IOMMU, if this is what you're referring
>> to.  The difference is only subtle unmapping semantics, none of what you
>> mention above.
>>
>
> I was referring to AMD iommuv2 (drivers/iommu/amd_iommu_v2.c in linux
> kernel tree).  If the host machine bears a AMD iommuv2 which has PPR
> capability, does it help for vfio/qemu no pinning down all memories?
>

No, there are not yet any interfaces to handle PPR through VFIO and I'm not
even sure AMD vIOMMU works with VFIO.  Thanks,

Alex

>
___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] pin guest memory for vfio pass-through device

2020-07-12 Thread Alex Williamson
On Sun, Jul 12, 2020 at 12:25 PM Yv Lin  wrote:

> Btw, IOMMUv2 can support peripheral page request (PPR) so in theory if an
> end point pcie device can support ATS/PRI, pinning down all memory is not
> necessary, does current vfio driver or qemu has corresponding support to
> save pinned memory?
>

I think you're very much over estimating the difference between
VFIO_TYPE1_IOMMU and VFIO_TYPE1v2_IOMMU, if this is what you're referring
to.  The difference is only subtle unmapping semantics, none of what you
mention above.


> On Sun, Jul 12, 2020 at 11:03 AM Yv Lin  wrote:
>
>> Hi Alex,
>> thanks for the detailed explanation. it does clarify more to me. I read
>> the vfio_listener_region_add() more carefully. It seems check every
>> memory region against container's host window, for IOMMUv1 vfio device, the
>> host window is always 64bit full range (vfio_host_win_add(container, 0,
>> (hwaddr)-1, info.iova_pgsizes); in vfio_connect_container()), so basically
>> mean all memory region will be pinned and mapped to host IOMMU, is the
>> understanding right?
>>
>
The listener maps everything within the address space of the device, the
extent of that address space depends on whether a vIOMMU is present and
active.  When there is no vIOMMU, the full address space of the VM is
mapped.  Thanks,

Alex
___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] pin guest memory for vfio pass-through device

2020-07-12 Thread Alex Williamson
vfio_dma_map() is the exclusive means that QEMU uses to insert translations
for an assigned device.  It is not only used by AMD vIOMMU, in fact that's
probably one of the less tested use vectors, it's used when QEMU
establishes any sort of memory mapping for the VM.  Any mapping that could
possibly be a DMA target for the device should filter through the
MemoryListener and result in a call to vfio_dma_map().  This includes all
memory that is considered RAM by the VM, as well as possibly direct mapped
peer-to-peer DMA ranges.  When the device is backed by an IOMMU,
vfio_dma_map() will pin pages to establish a fixed IOMMU translation.
vfio_pin_pages() comes into play when the device we're assigning is not
backed by the IOMMU, which can be the case with a mediated device (mdev).
Interactions with these devices are mediated by a vendor driver in the host
kernel where the vendor driver provides device isolation and translation by
acting as an interposer between the user and the device.  The
vfio_pin_pages() interface allows the vendor driver in the host kernel to
request page pinning such that the mapping is fixed while DMA is performed
by the physical backing device.  Thanks,

Alex
___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] nvidia

2020-07-07 Thread Alex Williamson
On Tue, Jul 7, 2020 at 4:33 PM Roger Lawhorn  wrote:

> Hello,
> I have an nvidia 980 ti oc 6gb card.
> I cannot use it with qemu as a passthrough card.
> I have had to passthrough my amd cards only.
> I have read of nvidia making it impossible to use some of their cards in
> virtual machines.
> Is this true? Is there a workaround?
> I had a laptop with a nvidia 880m optimus and it worked fine.
>
> The card stops at the option rom (if you use debug mode you can see this).
> So, this is not where I get to windows and then have an issue.
> Windows cannot even start booting.



Have you followed any guides or howtos for this?  Keywords like "vfio",
"nvidia", "geforce" turn up thousands of hits on google.  I've done a talk
on it, that plus more hands on videos from others are also on youtube.
Optimus is generally much harder to get working than discrete cards.
Stopping at the option ROM might mean we're not reading it properly from
the GPU, it might need to be provided separately.  There are all sorts of
guides online to do that as well.  Thanks,

Alex
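
For illustration, providing the ROM separately usually means pointing
the vfio-pci device at a dumped copy via the romfile property (a sketch
with a hypothetical host address and path):

-device vfio-pci,host=0000:01:00.0,romfile=/path/to/gtx980ti.rom
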
___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] two gpus

2020-06-23 Thread Alex Williamson
On Tue, 23 Jun 2020 23:33:22 -0400
Roger Lawhorn  wrote:

> I have the answer.
> 
> I had to change my lines from:
> -device 
> vfio-pci,host=0c:00.0,bus=root_port1,addr=00.0,multifunction=on,x-vga=on \
> -device vfio-pci,host=0c:00.1,bus=root_port2,addr=00.1,multifunction=on \
> -device vfio-pci,host=0d:00.0,bus=root_port3,addr=01.0,multifunction=on \
> 
> to:
> -device 
> vfio-pci,host=0c:00.0,bus=root_port1,addr=00.0,multifunction=on,x-vga=on \
> -device vfio-pci,host=0c:00.1,bus=root_port2,addr=00.0,multifunction=on \
> -device vfio-pci,host=0d:00.0,bus=root_port3,addr=00.0,multifunction=on \
> 
> 
> Apparently, this is a known bug.
> All cards must be on the same addr.
> 
> I have both gpus and 8gb of vmem instead of just 4gb.
> 
> Nice to know you CAN have more than one video card with gpu passthrough.

That's not a bug, that's how PCI function number and PCIe slot numbering
works.  You can't have a function number >0 without a function 0 with
multifunction=on in the same slot and you can't have a slot number >0
under a PCIe downstream port.

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] two gpus

2020-06-21 Thread Alex Williamson
On Sun, 21 Jun 2020 18:58:40 -0400
Roger Lawhorn  wrote:

> ok,
> third try to get a new thread
> 
> Anyway,
> I tried your global switch.
> I still get code 10 device failed to start on the 2nd gpu.
> I tried 8GB.
> Am I actually allocating 8GB for the pci hole or is that just a size?
> I did a full reinstall and factory reset on the video driver.


AFAICT you've got 32GB between these GPUs, that would probably be the
minimum I'd try, maybe even 64GB.  
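
For example (illustrative value only, based on the 32-64GB estimate
above):

-global q35-pcihost.pci-hole64-size=64G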

> 
> On 6/21/20 6:05 PM, Alex Williamson wrote:
> Use:
> 
> -global q35-pcihost.pci-hole64-size=
> 
> ___
> vfio-users mailing list
> vfio-users@redhat.com
> https://www.redhat.com/mailman/listinfo/vfio-users

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] two gpus

2020-06-21 Thread Alex Williamson
[intentionally de-threaded]

On Sun, 21 Jun 2020 17:02:44 -0400
Roger Lawhorn  wrote:
> Hello,
> 
> I have a video card with two gpus.
> The Radeon Pro Duo.
> 
> I can get only one of the gpus passed off to windows 10.
> If I pass off the second one I am told by windows that there is not 
> enough resources.
> 
> So, does the q35 pc not support enough resouces for a 2nd video card?

Use:

-global q35-pcihost.pci-hole64-size=

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Only 8 devices functional on i7 system

2020-06-20 Thread Alex Williamson
On Sat, 20 Jun 2020 09:46:00 +0200
Kjeld Borch Egevang  wrote:

> Hi VFIO users,
> 
> I am working with a PCIe card that supports up to 255 VFs. In order to 
> get the SR-IOV stuff to work I added some of the latest patches to the 
> vfio-pci driver.
> 
> I have two servers.
> 
> The first one is a Centos 8 based system with many cores and a Xeon E5 
> CPU. I enable VFIO with:
> 
> insmod /lib/modules/$(uname -r)/kernel/drivers/vfio/vfio.ko.xz
> insmod /lib/modules/$(uname -r)/kernel/drivers/vfio/vfio_iommu_type1.ko.xz
> insmod /lib/modules/$(uname -r)/kernel/drivers/vfio/vfio_virqfd.ko.xz
> insmod /root/vfio-pci.ko enable_sriov=1 # The patched module
> echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
> echo 15 > /sys/bus/pci/drivers/vfio-pci/*/sriov_numvfs
> 
> This works and I get a PF and 15 fully functional VFs.
> 
> My other server is also a Centos 8 based system but with an i7 CPU. The 
> motherboard is an Asrock Z370M Pro 4.
> 
> When I enable VFIO only the PF and the first 7 VFs are fully functional. 
> If I enter "lspci -s 01:00.1 -x" I get a nice dump of the config space 
> for VF0. But if I enter "lspci -s 01:01.0 -x" I get:
> 
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
> My guess is that this is either due to a limitation in the PCI bridge or 
> a driver issue. What is causing this problem?

There might be an SR-IOV option in the BIOS that should be turned on,
but in general I'd expect consumer boards are pretty bad about
allocating bridge resources for SR-IOV.  This might be all that the
system can support under minimum apertures.  Add the following to the
kernel command line to have Linux do the SR-IOV allocation itself:

pci=realloc,assign-busses
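
On a CentOS 8 host that would typically mean something like (a sketch;
the grub config path differs between BIOS and UEFI installs):

# append to GRUB_CMDLINE_LINUX in /etc/default/grub, then:
# grub2-mkconfig -o /boot/grub2/grub.cfg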

Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] empty iommu_groups?

2020-06-15 Thread Alex Williamson
On Mon, 15 Jun 2020 15:41:54 -0600
"Edmund F. Nadolski"  wrote:

> Hi,
> 
> I'm a noob to VFIO so hopefully this is not to lame a question.
> 
> I'm looking to set up a Linux guest VM with a direct-assigned nvme ssd, 
> that I can control by a usermode driver with VFIO.  I enable nested 
> virtualization in KVM and set up the iommu parameters on the command line:
> 
> Host:
> # dmesg | grep Command
> [0.00] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.7.0_0611+ 
> root=UUID= ro resume=UUID= console=tty0 console=ttyS4,115200 
> intel_iommu=on iommu=pt
> # modprobe kvm_intel nested=1
> # ls /sys/kernel/iommu_groups
> 0  1  10  11  12  13  14  15  16  17  18  19  2  3  4  5  6  7  8  9
> 
> 
> The direct assignment works and I can see the drive in the guest, but 
> vfio does not create any iommu groups:
> 
> Guest:
> # lspci | grep Non
> 07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe 
> SSD Controller SM981/PM981/PM983
> # dmesg | grep -i -e DMAR -e IOMMU
> [0.00] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.7.0 
> root=/dev/mapper/fedora_localhost--live-root ro 
> resume=/dev/mapper/fedora_localhost--live-swap 
> rd.lvm.lv=fedora_localhost-live/root 
> rd.lvm.lv=fedora_localhost-live/swap intel_iommu=on iommu=pt
> [0.064051] Kernel command line: 
> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.7.0 
> root=/dev/mapper/fedora_localhost--live-root ro 
> resume=/dev/mapper/fedora_localhost--live-swap 
> rd.lvm.lv=fedora_localhost-live/root 
> rd.lvm.lv=fedora_localhost-live/swap intel_iommu=on iommu=pt
> [0.064102] DMAR: IOMMU enabled
> [0.383193] iommu: Default domain type: Passthrough (set via kernel 
> command line)
> [1.467722] intel_iommu=on
> # ls -la /sys/kernel/iommu_groups/
> total 0
> drwxr-xr-x.  2 root root 0 Jun 15 15:16 .
> drwxr-xr-x. 15 root root 0 Jun 15 15:12 ..
> #
> 
> Clearly I'm missing something in my setup/config, but I'm not sure that 
> that could be.  Can anyone please advise?

There needs to be an intel-iommu device in the VM configuration or else
the intel_iommu=on option to the guest kernel does nothing more than
print that "DMAR: IOMMU enabled" line.

https://wiki.qemu.org/Features/VT-d
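
For reference, a minimal command-line sketch that adds the vIOMMU
(assumptions: q35 machine type, and per the wiki above interrupt
remapping wants a split irqchip):

qemu-system-x86_64 \
-machine q35,accel=kvm,kernel-irqchip=split \
-device intel-iommu,intremap=on \
...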

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Passthrough for non-DMA-masters on x86

2020-04-20 Thread Alex Williamson
On Fri, 17 Apr 2020 09:34:49 -0700
Micah Morton  wrote:

> Hi Alex,
> 
> I've been looking at device passthrough for platform devices on x86
> that are not behind an IOMMU by virtue of not being DMA masters. I
> think on some level this is an explicit non-goal of VFIO
> (https://www.spinics.net/lists/linux-renesas-soc/msg26153.html ,
> https://blog.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_IOMMU.txt)?

Mostly that's correct.  We do have a no-iommu mode, which was added to
avoid introducing MSI/X support to uio_pci_generic.  No-iommu mode
implements the device interface, including interrupts, but the user is
on their own for any other kind of DMA.  It also taints the kernel
since we're giving a user access to a device without protection of an
IOMMU.
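
For reference, that mode is opt-in via a module parameter (a sketch; as
noted above, it taints the kernel):

# modprobe vfio enable_unsafe_noiommu_mode=1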

> From my understanding VFIO is mostly about IOMMU management. I have a
> few questions however:
> 
> 1) Are interrupt forwarding, IOMMU mgmt, and PCI config space
> virtualization the main 3 things that VFIO does (plus some hacks to
> get GPUs working in guests)? Would you add any other aspects of VFIO
> that I'm missing?

The entire device is accessed through vfio, including all memory and
I/O ranges.  There are also interfaces for device resets.

> 2) If you can forward interrupts to a guest without VFIO (say with
> something like this patch:
> https://www.spinics.net/lists/kvm/msg207949.html), then it should be
> pretty simple to configure the VMM to make the MMIO regions of the
> platform device available to the guest. Is VFIO in the loop at all for
> actually giving the guest access to the MMIOs or is that just done by
> mappings in the VMM?

Yes, vfio is in the loop.  A file descriptor is used to access the
device.  Each memory or I/O region of the device is mapped through the
VMM via offsets on that fd.

> *I don't think I care about VFIO virtualizing PCI BARs for the guest
> since I would be telling the guest about the platform devices through
> ACPI -- so the guest wouldn't be looking to the PCI config space for
> that info anyway. I guess one thing to worry about here would be any
> dependencies the assigned platform device has on any other platform
> devices in the system that don't get assigned to the guest.

You're aware of vfio-platform, right?  Is vfio-platform with
enable_unsafe_noiommu_mode=1 on the vfio module what you're trying to
do?  Of course if you have a non-DMA device, you could also create a
host driver that wraps it via mdev.  You could even make the device
expose a vfio-pci rather than vfio-platform API and invent a fake
config space for it so you don't need to mess with ACPI (assuming
there's a driver in the guest that could bind to a PCI version of the
device).

> 3) Are PCI devices always DMA masters, or at least are they always put
> in an IOMMU group? Have you seen cases of PCI devices that were not
> assignable to a guest through vfio-pci because they weren't in an
> IOMMU group and/or weren't DMA masters?

Non-DMA-master PCI devices are not a set that gets any special handling.
AFAIK, there's really no way to define a PCI device as non-DMA.
Perhaps the bus-master bit could be hard-coded to zero, but I think
that would be ad-hoc, not really defined by the spec.  Whether a PCI
device is placed into an IOMMU group depends on the topology, if it's
downstream of an IOMMU, then it's placed into an IOMMU group,
regardless of DMA capabilities.  A system could be constructed where
only a subset of devices are downstream of an IOMMU, but I've never
seen such a configuration.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Q on vfio-pci driver usage on Host

2020-04-13 Thread Alex Williamson
On Mon, 13 Apr 2020 10:33:21 -0700
Ravi Kerur  wrote:

> On Mon, Apr 13, 2020 at 8:36 AM Alex Williamson 
> wrote:
> 
> > On Sun, 12 Apr 2020 09:10:49 -0700
> > Ravi Kerur  wrote:
> >  
> > > Hi,
> > >
> > > I use Intel NICs for PF and VF devices. VFs are assigned to virtual
> > > machines and PF is used on the Host. I have intel-iommu=on on GRUB which
> > > enables DMAR and IOMMU capabilities (checked via 'dmesg | grep -e IOMMU  
> > -e  
> > > DMAR) and use DPDK for datapath acceleration.
> > >
> > > Couple of clarifications I need in terms of vfio-pci driver usage
> > >
> > > (1) with intel-iommu=pt (Passthrough mode), PF device on host can bind to
> > > either igb_uio or vfio-pci driver and similarly VF devices assigned to  
> > each  
> > > VM can bind to either igb_uio or vfio-pci driver via Qemu  
> >
> > Note that the actual option is 'intel_iommu=on iommu=pt'.
> >  
> 
> My mistake,
> 
> >  
> > > (2) with intel-iommu=on (IOMMU enabled), PF device on host must bind to
> > > vfio-pci driver and similarly VF devices assigned to each VM much bind to
> > > vfio-pci driver. When IOMMU is enabled, only vfio-pci should be used?  
> >
> > When an IOMMU is present, we refer to the address space through which a
> > device performs DMA as the I/O Virtual Address space, or IOVA.  When
> > the IOMMU is in passthrough mode, we effectively create an identity
> > mapping of physical addresses through the IOVA space.  Therefore to
> > program a device to perform a DMA to user memory, the user only needs
> > to perform a virtual to physical translation on the address and the
> > device can DMA directly with that physical address thanks to the
> > identity map.  When we're not in passthrough mode, we need to actually
> > create a mapping through the IOMMU to allow the device to access that
> > physical memory.  VFIO is the only userspace driver interface that I'm
> > aware of that provides this latter functionality.  Therefore, yes, if
> > you have the IOMMU enabled and not in passthrough mode, your userspace
> > driver needs support for programming the IOMMU, which vfio-pci provides.
> >
> > Also, having both the PF and VFs owned by userspace drivers presents
> > some additional risks, for example the PF may have access to the data
> > accessed by the VF, or at least be able to deny service to the VF.
> > There have been various hacks around this presented by the DPDK
> > community, essentially enabling SR-IOV underneath vfio-pci, without the
> > driver's knowledge.  These are very much dis-recommended, IMO.
> > However, we have added SR-IOV support to the vfio-pci driver in kernel
> > v5.7 and DPDK support is under development, which represents this trust
> > and collaboration between PF and VF drivers using a new VF token
> > concept.  I'd encourage you to look for this if your configuration does
> > require both PF and VF drivers in userspace.  A much more normal
> > configuration to this point has been that the PF makes use of a host
> > driver (ex. igb, ixgbe, i40e, etc.) while the VF is bound to vfio-pci
> > for userspace drivers.  In this configuration the host driver is
> > considered trusted and we don't need to invent new mechanisms to
> > indicate collaboration between userspace drivers.  Thanks,
> >  
> 
> Thanks for the information. Clearly understand what I need to do. Where can
> I find information on vfio-pci sr-iov support (writeup/design)?

https://lore.kernel.org/kvm/158396044753.5601.14804870681174789709.st...@gimli.home/

http://mails.dpdk.org/archives/dev/2020-April/163095.html

Thanks,
Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Q on vfio-pci driver usage on Host

2020-04-13 Thread Alex Williamson
On Sun, 12 Apr 2020 09:10:49 -0700
Ravi Kerur  wrote:

> Hi,
> 
> I use Intel NICs for PF and VF devices. VFs are assigned to virtual
> machines and PF is used on the Host. I have intel-iommu=on on GRUB which
> enables DMAR and IOMMU capabilities (checked via 'dmesg | grep -e IOMMU -e
> DMAR) and use DPDK for datapath acceleration.
> 
> Couple of clarifications I need in terms of vfio-pci driver usage
> 
> (1) with intel-iommu=pt (Passthrough mode), PF device on host can bind to
> either igb_uio or vfio-pci driver and similarly VF devices assigned to each
> VM can bind to either igb_uio or vfio-pci driver via Qemu

Note that the actual option is 'intel_iommu=on iommu=pt'.
 
> (2) with intel-iommu=on (IOMMU enabled), PF device on host must bind to
> vfio-pci driver and similarly VF devices assigned to each VM much bind to
> vfio-pci driver. When IOMMU is enabled, only vfio-pci should be used?

When an IOMMU is present, we refer to the address space through which a
device performs DMA as the I/O Virtual Address space, or IOVA.  When
the IOMMU is in passthrough mode, we effectively create an identity
mapping of physical addresses through the IOVA space.  Therefore to
program a device to perform a DMA to user memory, the user only needs
to perform a virtual to physical translation on the address and the
device can DMA directly with that physical address thanks to the
identity map.  When we're not in passthrough mode, we need to actually
create a mapping through the IOMMU to allow the device to access that
physical memory.  VFIO is the only userspace driver interface that I'm
aware of that provides this latter functionality.  Therefore, yes, if
you have the IOMMU enabled and not in passthrough mode, your userspace
driver needs support for programming the IOMMU, which vfio-pci provides.
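
For completeness, binding a VF to vfio-pci for a userspace driver is
plain sysfs plumbing along these lines (a sketch with a hypothetical VF
address; the unbind step only applies if a host driver is currently
bound):

# echo vfio-pci > /sys/bus/pci/devices/0000:03:10.0/driver_override
# echo 0000:03:10.0 > /sys/bus/pci/devices/0000:03:10.0/driver/unbind
# echo 0000:03:10.0 > /sys/bus/pci/drivers_probe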

Also, having both the PF and VFs owned by userspace drivers presents
some additional risks, for example the PF may have access to the data
accessed by the VF, or at least be able to deny service to the VF.
There have been various hacks around this presented by the DPDK
community, essentially enabling SR-IOV underneath vfio-pci, without the
driver's knowledge.  These are very much dis-recommended, IMO.
However, we have added SR-IOV support to the vfio-pci driver in kernel
v5.7 and DPDK support is under development, which represents this trust
and collaboration between PF and VF drivers using a new VF token
concept.  I'd encourage you to look for this if your configuration does
require both PF and VF drivers in userspace.  A much more normal
configuration to this point has been that the PF makes use of a host
driver (ex. igb, ixgbe, i40e, etc.) while the VF is bound to vfio-pci
for userspace drivers.  In this configuration the host driver is
considered trusted and we don't need to invent new mechanisms to
indicate collaboration between userspace drivers.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] NIC driver

2020-04-01 Thread Alex Williamson
On Wed, 1 Apr 2020 11:33:21 +
"McLeod, Dennis"  wrote:

> I am working on a project in which we have a driver that does the
> necessary register_netdev() stuff at the kernel level. If I wanted to
> take advantage of the vfio framework .. how would the device get
> registered as a network device (with the kernel) if I have to unbind
> my driver to bind vfio-pci? Very new to vfio, so apologies in advance
> if my understanding is way off the mark.

vfio is for writing userspace drivers, not for writing kernel drivers
in userspace.  If you want a netdev, write a kernel driver.  If you
want to bypass the kernel network stack with a driver entirely in
userspace (ex. DPDK), use vfio.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] GPU passthrough: device does not support requested feature x-vga

2020-03-31 Thread Alex Williamson
On Tue, 31 Mar 2020 21:35:33 +0300
Артем Семенов  wrote:

> Hello!
> 
> I try to passthrough GPU to the virtual machine (qemu). I've tried 
> different variants:
> 
> -device vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on
> 
> or
> 
> -device vfio-pci,host=02:00.0,x-vga=on
> 
> and other but in any case I get error:
> 
> vfio :02:00.0: failed getting region info for VGA region index 8: 
> Invalid argument
> device does not support requested feature x-vga
> 
> dmesg contains:
> 
> [  844.599821] vfio_ecap_init: :02:00.0 hiding ecap 0x1e@0x258
> [  844.599860] vfio_ecap_init: :02:00.0 hiding ecap 0x19@0x900
> 
> If I remove x-vga=on option then qemu works but there is no signal from 
> GPU (black screen).
> 
> It looks like http://vfio.blogspot.com/2014/08/vfiovga-faq.html
> 
> "Question 3:
> 
> I have Intel host graphics, when I start the VM I don't get any output 
> on the assigned VGA monitor... "
> 
> I've tried to use 2 versions of the kernel: 4.19 and 5.4 - no difference.
> 
> Moterboard: ASUS PRIME H370-A.
> GPU: ASUS Turbo GeForce RTX 2060 Super.
> 
> What could be the cause of this problem?

Is your host kernel built with CONFIG_VFIO_PCI_VGA=y?  Are you
disabling that support with the disable_vga=1 module option of
vfio-pci?  Thanks,

Alex
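
Two quick ways to check those (a sketch; the config file location
varies by distro and the parameter only shows up once the module is
loaded):

$ grep VFIO_PCI_VGA /boot/config-$(uname -r)
$ cat /sys/module/vfio_pci/parameters/disable_vga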

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] missing VFIO_IOMMU_NOTIFY_DMA_UNMAP event when driver hasn't called vfio_pin_pages

2020-03-05 Thread Alex Williamson


Probably better to ask on the vfio devel list (kvm) rather than the
vfio user's list...

On Fri, 28 Feb 2020 17:20:20 +
Thanos Makatos  wrote:

> > Drivers that handle DMA region registration events without having to call
> > vfio_pin_pages (e.g. in muser we inject the fd backing that VMA to a
> > userspace
> > process and then ask it to memory map that fd) aren't notified at all when
> > that
> > region is unmapped.  Because of this, we get duplicate/overlapping DMA
> > regions
> > that can be problematic to properly handle (e.g. we can implicitly unmap the
> > existing region etc.). Would it make sense for VFIO to always send the DMA
> > unregistration event to a driver (or at least conditionally, if the driver
> > requests it with some flag during driver registration time), even if it 
> > doesn't
> > currently have any pages pinned? I think this could be beneficial for 
> > drivers
> > other than muser, e.g. some driver set up some bookkeeping data structure
> > in
> > response to the DMA_MAP event but it didn't happen to have to pin any
> > page. By
> > receiving the DMA_UNMAP event it could release that data
> > structure/memory.
> > 
> > I've experimented with a proof of concept which seems to work:
> > 
> > # git diff --cached
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c
> > index d864277ea16f..2aaa88f64c67 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -875,6 +875,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu
> > *iommu,
> > struct vfio_dma *dma, *dma_last = NULL;
> > size_t unmapped = 0;
> > int ret = 0, retries = 0;
> > +   bool advised = false;
> > 
> > mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > 
> > @@ -944,9 +945,11 @@ static int vfio_dma_do_unmap(struct vfio_iommu
> > *iommu,
> > if (dma->task->mm != current->mm)
> > break;
> > 
> > -   if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
> > +   if (!RB_EMPTY_ROOT(&dma->pfn_list) || !advised) {
> > struct vfio_iommu_type1_dma_unmap nb_unmap;
> > 
> > +   advised = true;
> > +

The while() loop iterates over every overlapping mapping, but this
would only advise on the first overlap.  I think instead of this
advised bool logic you'd only unlock the iommu->lock and take the
again: goto if there were pinned pages.

> > if (dma_last == dma) {
> > BUG_ON(++retries > 10);
> > } else {  
> 
> I have also experimented with sending two overlapping DMA regions to VFIO and
> vfio_dma_do_map explicitly fails this operation with -EEXIST. Therefore I 
> could
> assume that if my driver receives two overlapping DMA regions then the 
> existing
> one can be safely unmapped. However, there is still a possibility of resource
> leak since there is no guarantee that at least part of an unmapped DMA region
> will be clobbered by another one. This could be partially mitigated by
> introducing some timeout to unmap the fd of a DMA region that hasn't been
> accessed for some time (and then mmap it on demand if necessary), but it's 
> still
> not ideal.

Seems like the approach you'd take only after exhausting all options to
get an unmap notification, which we certainly haven't yet.
 
> I still think giving the option to mdev drivers to request to be notified
> for DMA unmap events is the best way to solve this problem. Are there other
> alternatives?

I don't have an immediate problem with using the existing notifier
regardless of whether pages are pinned.  If we had to we could use
another event bit to select all unmaps versus only unmaps overlapping
pinned pages.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] vfio not working with vanilla kernel 5.4.22

2020-02-24 Thread Alex Williamson
On Mon, 24 Feb 2020 10:40:39 +
"Bronek Kozicki"  wrote:

> Heads up to anyone running the latest vanilla kernels - after upgrade
> from 5.4.21 to 5.4.22 one of my VMs lost access to a vfio1
> passed-through GPU. This was restored when I downgraded to 5.4.21 so
> the problem seems related to some patch in version 5.4.22
> 
> Also, when starting the VM, I noticed the hypervisor log flooded with
> messages "BAR 3: can't reserve" like:
> 
> Feb 24 09:49:38 gdansk.lan.incorrekt.net kernel: vfio-pci
> :03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258 Feb 24 09:49:38
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0:
> vfio_ecap_init: hiding ecap 0x19@0x900 Feb 24 09:49:38
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0: BAR 3: can't
> reserve [mem 0xc000-0xc1ff 64bit pref] Feb 24 09:49:38
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0: No more image
> in the PCI ROM Feb 24 09:51:43 gdansk.lan.incorrekt.net kernel:
> vfio-pci :03:00.0: BAR 3: can't reserve [mem
> 0xc000-0xc1ff 64bit pref] Feb 24 09:51:43
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0: BAR 3: can't
> reserve [mem 0xc000-0xc1ff 64bit pref] Feb 24 09:51:43
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0: BAR 3: can't
> reserve [mem 0xc000-0xc1ff 64bit pref] Feb 24 09:51:43
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0: BAR 3: can't
> reserve [mem 0xc000-0xc1ff 64bit pref] Feb 24 09:51:43
> gdansk.lan.incorrekt.net kernel: vfio-pci :03:00.0: BAR 3: can't
> reserve [mem 0xc000-0xc1ff 64bit pref]
> 
> journalctl -b-2 | grep "vfio-pci :03:00.0: BAR 3: can't reserve"
> | wc -l 2609
> 
> Finally, when shutting down the VM I observed kernel panic on the
> hypervisor:
> 
> [  873.831301] Kernel panic - not syncing: Timeout: Not all CPUs
> entered broadcast exception handler [  874.874008] Shutting down cpus
> with NMI [  874.888189] Kernel Offset: 0x0 from 0x8100
> (relocation range: 0x8000-0xbfff) [
> 875.074319] Rebooting in 30 seconds..

Tried v5.4.22, not getting anything similar.  Potentially there's a
driver activated in this kernel that wasn't previously on your system
and it's attached itself to part of your device.  Look in /proc/iomem
to see what it might be and disable it.  Thanks,

Alex 
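
For example, something along these lines shows which driver, if any,
has claimed the GPU's memory regions (a sketch; run as root so the
address ranges aren't shown as zeros):

# grep -A2 '0000:03:00.0' /proc/iomem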

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Problems passing through Creative X-Fi PCIe sound card

2020-02-13 Thread Alex Williamson
On Fri, 14 Feb 2020 00:17:43 +1100
Michael Slade  wrote:

> Adding nointxmask=1 worked!  With no issues at all.  I think because all 
> the devices are getting their own interrupts (on the host) anyway.
> 
> So do you want me to try to add the card to quirks.c?  I could probably 
> manage it, just I haven't compiled a kernel in ~50 years.

Great!  I think the below should work for a quirk, if you can manage to
build a kernel and try it (removing the nointxmask option), it would be
much appreciated.  Thanks,

Alex

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 29f473ebf20f..3fce64ec6d63 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3385,6 +3385,13 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x37d0, 
quirk_broken_intx_masking);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x37d1, 
quirk_broken_intx_masking);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x37d2, 
quirk_broken_intx_masking);
 
+/*
+ * Creative Labs EMU20k2
+ * https://www.redhat.com/archives/vfio-users/2020-February/msg1.html
+ */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CREATIVE, PCI_DEVICE_ID_CREATIVE_20K2,
+   quirk_broken_intx_masking);
+
 static u16 mellanox_broken_intx_devs[] = {
PCI_DEVICE_ID_MELLANOX_HERMON_SDR,
PCI_DEVICE_ID_MELLANOX_HERMON_DDR,

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] VFIO and coherency with DMA

2020-02-13 Thread Alex Williamson
On Thu, 13 Feb 2020 10:02:26 +
"Stark, Derek"  wrote:

> Hello,
> 
> I've been experimenting with VFIO with one of our FPGA cards using a
> Xilinx part and XDMA IP core. It's been smooth progress so far and
> I've had no issues with bar access and also DMA mapping/transfers to
> and from the card. All in all, I'm finding it a very nice userspace
> driver framework.
> 
> I'm hoping someone can help clarify my understanding of how VFIO
> works for DMA in terms of coherence. I'm on a standard x86_64 Intel
> Xeon platform.
> 
> In the  code I see:
> VFIO
> /*
> * IOMMU enforces DMA cache coherence (ex. PCIe NoSnoop stripping).
> This
> * capability is subject to change as groups are added or removed.
> */
> #define VFIO_DMA_CC_IOMMU   4
> 
> 
> Which implies that IOMMU sets the mappings up as coherent is this
> understanding correct?

No, this is a mechanism for reporting the cache coherency of the IOMMU.
For example, KVM uses this to understand whether it needs to emulate
wbinvd instructions for cases where the DMA is not coherent.  There's
nothing vfio can specifically do in the IOMMU mapping to make a DMA
coherent afaik.

> I'm more used to having scatter gather based DMAs where you need to
> sync for the CPU or the device depending upon who owns/accesses the
> memory.
> 
> The use case I am specifically looking at is if a DMA mapping is
> setup through VFIO and then left open whilst data is transferred from
> the device to host memory and then the CPU is processing this data.
> The pinned/mapped data buffer is reused repeatedly as part of a ring
> of buffers. It's only at the point of closing down this application
> that the buffer would be unmapped in vfio.
> 
> Is there any sync type functions or equivalents I need to be aware of
> in this case? Can VFIO DMA mapped memory buffers be safely used in
> this way?

It can, but you need to test that cache coherence extension above to
know whether the processor is coherent with DMA.  If it's not then you
need to invalidate the processor cache before you pull in new data from
the device or else you might just be re-reading stale data from the
cache.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Problems passing through Creative X-Fi PCIe sound card

2020-02-12 Thread Alex Williamson
On Thu, 13 Feb 2020 13:41:44 +1100
Michael Slade  wrote:

> Hi everyone,
> 
> I'll attempt to start with enough info to describe my situation without 
> pasting a complete `lspci -vvvxxx` output etc.
> 
> My special sound card doesn't want to work when passed through to a guest.
> 
> The card is a Creative "X-Fi Titanium Fatal1ty Pro" (1102:000b), and the 
> mobo is an Asus "TUF Gaming X570 Plus (Wi-Fi)".
> 
> I have already had much success passing through multiple other PCIe 
> devices, including the primary GPU with its HDMI audio, 2 USB 
> controllers and a SATA controller, so I think I have the general process 
> down.  The VM's base is pc-i440fx-3.1, running under libvirt.
> 
> So when I attempt to use the card in the guest, within the first few 
> minutes of running the host goes:
> 
> Feb 13 11:48:01 mickpc-bullseye kernel: [13186.228134] irq 98: nobody 
> cared (try booting with the "irqpoll" option)
> [snip]
> Feb 13 11:48:01 mickpc-bullseye kernel: [13186.228199] handlers:
> Feb 13 11:48:01 mickpc-bullseye kernel: [13186.228202] 
> [<9b052715>] vfio_intx_handler [vfio_pci]
> Feb 13 11:48:01 mickpc-bullseye kernel: [13186.228203] Disabling IRQ #98
> 
> Then sound stops working for *some* clients on the guest (including 
> pulseaudio).  The sound goes choppy, as if it's playing each 0.2s of 
> audio 4-5 times.
> 
> I think i tried irqpoll a while back with no luck (From my understanding 
> of what it does, it won't help here anyway)
> 
> I have tried multiple kernel versions on both host and guest (haven't 
> tried 5.5 yet though).
> 
> /proc/interrupts on the host says:
>    98:  0  0  0  0 0  0 
> 40  0  IR-IO-APIC    7-fasteoi vfio-intx(:04:00.0)
> 
> And on the guest:
>   11:  0   7110  0  0 0  0  
> 0  0   IO-APIC  11-fasteoi virtio3, uhci_hcd:usb1, snd_ctxfi
> 
> This is the only device whose interrupt is MSI on the host and not on 
> the guest, and also the only device which is sharing interrupts with 
> other devices on the guest.

VFIO doesn't run in different modes between host and guest; what you
show here is INTx in both the host and the guest.
 
> Can anyone could shed some light on what is actually happening here and 
> how it could be fixed?

I'd guess the device probes OK for DisINTx support, but it doesn't
actually work, the interrupt continues to fire but vfio-pci says "not
my interrupt, my device is masked", when actually it is.  There's a
vfio-pci module option, nointxmask=1 you can use to disable this PCI
2.3 required feature and mask at the IOAPIC instead.  The difficulty is
that vfio needs to be able to get an exclusive interrupt for the
device when using IOAPIC masking, which might mean you need to unbind
anything sharing the interrupt in the host.  The extra bummer is that
it's a global option, so you'll need to do the same for all other
assigned devices.  If it works, we can specifically blacklist the device
in the kernel (drivers/pci/quirks.c) using the quirk_broken_intx_masking
function in the fixup so that the vfio-pci module option is not
required and you'll only need to make sure the audio card has an
exclusive interrupt. TL;DR, the device probes ok for an interrupt
masking feature it doesn't support and interrupts for the card die.
Thanks,

Alex
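
Setting that option usually looks like one of these (a sketch; note it
is global and affects every device bound to vfio-pci):

# echo "options vfio-pci nointxmask=1" > /etc/modprobe.d/vfio.conf
(or add vfio-pci.nointxmask=1 to the kernel command line, which also
covers the case where vfio-pci is built in or loaded from the initramfs)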

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] DMA Error -14 with I2C controller passthrough

2020-01-13 Thread Alex Williamson
On Mon, 13 Jan 2020 22:17:29 +0100
Davide Miola  wrote:

> Hi,
> 
> I'm trying to passthrough one of the two I2C controllers my laptop has to
> get the touchpad working in a VM (Disclaimer: I know it's not as simple as
> passing through the controller, but I've got to start somewhere).
> 
> I saw this
>  thread,
> so I think it should be possible after all, but so far I'm facing a lot of
> trouble:
> basically I can't get any of the two I2C controllers of my laptop to
> passthrough, because, after binding them to vfio-pci and starting a VM (I'm
> using a Fedora Workstation iso for that), I get this error:
> 
> qemu-system-x86_64: VFIO_MAP_DMA: -14
> qemu-system-x86_64: vfio_dma_map(0x7f1cccdf0fc0, 0xfebf1000, 0x1000,
> 0x7f1ccf493000) = -14 (Wrong address)
> 
> I get that once when SeaBIOS loads and then another six times in a burst
> when the guest os loads (the amount of times I get that error seems to
> change when using libvirt).
> Dmesg says nothing both on the host and the guest.
> This happens both using libvirt and without, but I've also tried:
>  - 5.4 instead of 5.3 kernel;
>  - Manjaro and Fedora host;
>  - booting in legacy instead of UEFI;
>  - binding to vfio-pci via kernel parameters instead of doing it after boot;
>  - using different QEMU machines;
>  - using OVMF or SeaBIOS;
>  - using KVM or not.
> 
> None of this made any difference.
> 
> I should clarify that that error does not prevent the VM from booting, in
> fact it seems to boot just fine, and lspci on the guest even reports the
> device as being present (same name and vendor/device ids as on the host),
> but it won't bind to intel-lpss, that's the real problem.
> 
> I should also mention that I have no problems passing through other devices
> on this laptop, in fact I've successfully passed through the IGPU (legacy
> passthrough), the USB controller, and chipset audio, all without a problem.
> 
> It goes without saying that I'd be very interested in a possible future
> upstream implementation of something that would allow this kind of
> application, as was proposed in the thread I linked above.
> 
> I have considered using Evdev passthrough, but I find it very unreliable (I
> got it to work for a mouse only via libvirt, keyboard works fine), but I
> just couldn't get it to work for the touchpad: with plain QEMU the touchpad
> kinda works but not as you'd expect, as the mouse teleports to where I
> touch on the touchpad (as if there were a 1:1 mapping between the touchpad
> and the screen), it's very weird and very not usable; and with libvirt I
> couldn't get any mouse movement. But even if I could get it to work, it is
> my understanding that Evdev passthrough just emulates a PS2 mouse, so that
> means I'd lose any multitouch feature, which is not ideal.
> 
> Anyway, my laptop is running a 9th gen CoffeeLake Refresh H-series CPU on
> Manjaro with Linux 5.3 as the main system.

The faulting IOVA (0xfebf1000) looks like it's probably MMIO space of
the device itself, i.e. we're trying to establish peer-to-peer DMA
mappings.  This is also the only type of mapping that QEMU treats as
non-fatal, allowing the VM to continue running, because such mappings
are rarely used.  I'm curious why this is failing, but I don't think it's
the source of the device not working in the guest.  It looks like the
vfio driver would only generate an -EFAULT for an error in the ioctl
args, so the error might come from deeper in the IOMMU driver in the
host.  Not sure.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Passing arbitrary IRQ to guest?

2019-12-11 Thread Alex Williamson
On Wed, 11 Dec 2019 14:40:56 -0800
Micah Morton  wrote:

> On Wed, Dec 11, 2019 at 10:44 AM Alex Williamson
>  wrote:
> >
> > On Wed, 11 Dec 2019 09:37:57 -0800
> > Micah Morton  wrote:
> >  
> > > On Tue, Dec 10, 2019 at 4:00 PM Alex Williamson
> > >  wrote:  
> > > >
> > > > On Mon, 9 Dec 2019 14:18:50 -0800
> > > > Micah Morton  wrote:
> > > >  
> > > > > On Thu, Sep 5, 2019 at 12:22 PM Micah Morton  
> > > > > wrote:  
> > > > > >
> > > > > > On Wed, Aug 28, 2019 at 3:22 PM Alex Williamson
> > > > > >  wrote:  
> > > > > > >
> > > > > > > On Wed, 28 Aug 2019 09:39:57 -0700
> > > > > > > Micah Morton  wrote:
> > > > > > >  
> > > > > > > > On Mon, Aug 5, 2019 at 11:14 PM Gerd Hoffmann 
> > > > > > > >  wrote:  
> > > > > > > > >
> > > > > > > > > On Mon, Aug 05, 2019 at 12:50:00PM -0700, Micah Morton wrote: 
> > > > > > > > >  
> > > > > > > > > > On Thu, Aug 1, 2019 at 10:36 PM Gerd Hoffmann 
> > > > > > > > > >  wrote:  
> > > > > > > > > > >
> > > > > > > > > > >   Hi,
> > > > > > > > > > >  
> > > > > > > > > > > > From my perspective, as a low-speed device where we 
> > > > > > > > > > > > don't really need
> > > > > > > > > > > > the benefits of an IOMMU, I'd be more inclined to look 
> > > > > > > > > > > > at why it
> > > > > > > > > > > > doesn't work with evdev.  We already have a tablet 
> > > > > > > > > > > > device in QEMU,
> > > > > > > > > > > > what's it take to connect that to evdev?  Cc'ing Gerd 
> > > > > > > > > > > > as maybe he's
> > > > > > > > > > > > already though about touchpad support.  Thanks,  
> > > > > > > > > > >
> > > > > > > > > > > It's not clear why the touchpad doesn't work.  Possibly 
> > > > > > > > > > > using libinput
> > > > > > > > > > > helps, 
> > > > > > > > > > > https://git.kraxel.org/cgit/qemu/log/?h=sirius/display-drm
> > > > > > > > > > >  has
> > > > > > > > > > > some code.  Wiring up to input-linux isn't done yet 
> > > > > > > > > > > though, only the
> > > > > > > > > > > drm ui uses libinput support so far.  
> > > > > > > > > >
> > > > > > > > > > To be clear are you saying that its a known issue that the 
> > > > > > > > > > touchpad
> > > > > > > > > > doesn't work in VM guest with QEMU and evdev?  
> > > > > > > > >
> > > > > > > > > There are other reports of touchpad problems.  I don't know 
> > > > > > > > > whenever
> > > > > > > > > that is a general problem or specific to some devices.
> > > > > > > > >
> > > > > > > > > libinput knows quirks for lots of input devices.  When 
> > > > > > > > > passing through
> > > > > > > > > the evdev to the guest as virtio device libinput can't see 
> > > > > > > > > the device
> > > > > > > > > identity and thus can't apply quirks.  Which might be the 
> > > > > > > > > reason the
> > > > > > > > > touchpad doesn't work.  Using libinput on the host side might 
> > > > > > > > > fix this.
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > >   Gerd
> > > > > > > > >  
> > > > > > > >
> > > > > > > > I was able to get physical passthrough of the touchpad working 
> > > > > > > > in the
> > > > > > > > VM guest by forwarding the IRQ to the guest using the 
> >

Re: [vfio-users] Passing arbitrary IRQ to guest?

2019-12-11 Thread Alex Williamson
On Wed, 11 Dec 2019 09:37:57 -0800
Micah Morton  wrote:

> On Tue, Dec 10, 2019 at 4:00 PM Alex Williamson
>  wrote:
> >
> > On Mon, 9 Dec 2019 14:18:50 -0800
> > Micah Morton  wrote:
> >  
> > > On Thu, Sep 5, 2019 at 12:22 PM Micah Morton  
> > > wrote:  
> > > >
> > > > On Wed, Aug 28, 2019 at 3:22 PM Alex Williamson
> > > >  wrote:  
> > > > >
> > > > > On Wed, 28 Aug 2019 09:39:57 -0700
> > > > > Micah Morton  wrote:
> > > > >  
> > > > > > On Mon, Aug 5, 2019 at 11:14 PM Gerd Hoffmann  
> > > > > > wrote:  
> > > > > > >
> > > > > > > On Mon, Aug 05, 2019 at 12:50:00PM -0700, Micah Morton wrote:  
> > > > > > > > On Thu, Aug 1, 2019 at 10:36 PM Gerd Hoffmann 
> > > > > > > >  wrote:  
> > > > > > > > >
> > > > > > > > >   Hi,
> > > > > > > > >  
> > > > > > > > > > From my perspective, as a low-speed device where we don't 
> > > > > > > > > > really need
> > > > > > > > > > the benefits of an IOMMU, I'd be more inclined to look at 
> > > > > > > > > > why it
> > > > > > > > > > doesn't work with evdev.  We already have a tablet device 
> > > > > > > > > > in QEMU,
> > > > > > > > > > what's it take to connect that to evdev?  Cc'ing Gerd as 
> > > > > > > > > > maybe he's
> > > > > > > > > > already though about touchpad support.  Thanks,  
> > > > > > > > >
> > > > > > > > > It's not clear why the touchpad doesn't work.  Possibly using 
> > > > > > > > > libinput
> > > > > > > > > helps, 
> > > > > > > > > https://git.kraxel.org/cgit/qemu/log/?h=sirius/display-drm has
> > > > > > > > > some code.  Wiring up to input-linux isn't done yet though, 
> > > > > > > > > only the
> > > > > > > > > drm ui uses libinput support so far.  
> > > > > > > >
> > > > > > > > To be clear are you saying that its a known issue that the 
> > > > > > > > touchpad
> > > > > > > > doesn't work in VM guest with QEMU and evdev?  
> > > > > > >
> > > > > > > There are other reports of touchpad problems.  I don't know 
> > > > > > > whenever
> > > > > > > that is a general problem or specific to some devices.
> > > > > > >
> > > > > > > libinput knows quirks for lots of input devices.  When passing 
> > > > > > > through
> > > > > > > the evdev to the guest as virtio device libinput can't see the 
> > > > > > > device
> > > > > > > identity and thus can't apply quirks.  Which might be the reason 
> > > > > > > the
> > > > > > > touchpad doesn't work.  Using libinput on the host side might fix 
> > > > > > > this.
> > > > > > >
> > > > > > > cheers,
> > > > > > >   Gerd
> > > > > > >  
> > > > > >
> > > > > > I was able to get physical passthrough of the touchpad working in 
> > > > > > the
> > > > > > VM guest by forwarding the IRQ to the guest using the kvm/qemu/vfio
> > > > > > framework.
> > > > > >
> > > > > > So basically I wrote extensions to kvm/qemu/vfio to allow for
> > > > > > forwarding arbitrary IRQs to the guest (the IRQ doesn't have to be
> > > > > > associated with any vfio-pci or vfio-platform device). I could clean
> > > > > > up the patches and upstream them (or think about it) if you folks
> > > > > > think anyone else might want to use this functionality? Then again 
> > > > > > as
> > > > > > Alex said before you still need to communicate to the VM which IRQ 
> > > > > > to
> > > > > > use for this device (in my case I did this by modifying ACPI stuff 
> > > > > > in
> > > > > > SeaBIOS, not sure how it could be incorporated into vfio)

Re: [vfio-users] VFIO_MAP_DMA error EINVAL

2019-12-11 Thread Alex Williamson
On Wed, 11 Dec 2019 13:17:18 +
cprt  wrote:

> Hello,
> I am using VFIO with QEMU trying to passthrough my audio device.
> 
> I successfully did this operation with my previous system, with a 7th 
> generation intel and an older kernel.
> Now I am using a 10th generation intel and a newer kernel (5.4), and I can no 
> longer make this work.
> 
> Looking at the QEMU code and errors, I can see that this call is failing:
> ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map)
> with error 22 (EINVAL).
> 
> This is the qemu log:
> qemu-system-x86_64: -device vfio-pci,host=00:1f.3: 
> vfio_dma_map(0x7f86fdc7c480, 0xc, 0x7ff4, 0x7f83f20c) = -22 
> (Invalid argument)
> 
> I have setup my system as follows:
> $ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-linux root=UUID=5b780644-8382-11e9-a363-1b71c3bf36e9 rw 
> loglevel=3 i915.enable_psr=0 intel_iommu=on iommu=pt i915.enable_gvt=1 
> kvm.ignore_msrs=1 vfio-pci.ids=8086:0284,8086:02c8,8086:02a3,8086:02a4
> 
> And this is the information of my PCI layout:
> $ lspci -s 00:1f
> 00:1f.0 ISA bridge: Intel Corporation Device 0284
> 00:1f.3 Audio device: Intel Corporation Device 02c8
> 00:1f.4 SMBus: Intel Corporation Device 02a3
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Comet Lake SPI 
> (flash) Controller
> $ lspci -s 00:1f -n
> 00:1f.0 0601: 8086:0284
> 00:1f.3 0403: 8086:02c8
> 00:1f.4 0c05: 8086:02a3
> 00:1f.5 0c80: 8086:02a4
> 
> I tried (just as an experiment) to bypass the error check in the QEMU code, 
> and the virtualized audio device works as expected for a while, then it stops.
> 
> Do you know what could generate the problem?

I think this is a result of the reserved region support added in v5.4
which intends to prevent userspace from mapping ranges it shouldn't.
On my system I have:

$ lspci -nns 00:1f.
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point LPC Controller/eSPI 
Controller [8086:9d4e] (rev 21)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC 
[8086:9d21] (rev 21)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-LP HD Audio 
[8086:9d71] (rev 21)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 
21)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (4) 
I219-LM [8086:15d7] (rev 21)

$ readlink -f /sys/bus/pci/devices/:00:1f.3/iommu_group
/sys/kernel/iommu_groups/9

$ find /sys/kernel/iommu_groups/9/devices -type l
/sys/kernel/iommu_groups/9/devices/:00:1f.2
/sys/kernel/iommu_groups/9/devices/:00:1f.0
/sys/kernel/iommu_groups/9/devices/:00:1f.3
/sys/kernel/iommu_groups/9/devices/:00:1f.6
/sys/kernel/iommu_groups/9/devices/:00:1f.4

$ cat /sys/kernel/iommu_groups/9/reserved_regions 
0x 0x00ff direct
0xfee0 0xfeef msi

This direct range seems to be the trouble; your error indicates the
problem occurs when QEMU tries to map the GPA range 0xc-0x7ff4,
which clearly overlaps 0x0-0xff.  It seems this reserved range
comes from this code:

drivers/iommu/intel-iommu.c:
#ifdef CONFIG_INTEL_IOMMU_FLOPPY_WA
if (dev_is_pci(device)) {
struct pci_dev *pdev = to_pci_dev(device);

if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA) {
reg = iommu_alloc_resv_region(0, 1UL << 24, 0,
  IOMMU_RESV_DIRECT);
if (reg)
list_add_tail(&reg->list, head);
}
}
#endif /* CONFIG_INTEL_IOMMU_FLOPPY_WA */

We can see above that we do have an ISA bridge in the same IOMMU group
as the audio device, so this is effectively fallout from the poor
grouping of these onboard devices.
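
As a rough illustration only (this is not QEMU's code), a userspace VFIO
client could read that same reserved_regions attribute and refuse any
DMA mapping that overlaps a reserved entry; the sketch below assumes the
sysfs layout shown above:

#include <stdio.h>

/*
 * Sketch only: return non-zero if [iova, iova + size) overlaps any entry
 * in /sys/kernel/iommu_groups/<group>/reserved_regions.
 */
static int iova_overlaps_reserved(int group, unsigned long long iova,
				  unsigned long long size)
{
	char path[128], type[16];
	unsigned long long start, end;
	FILE *f;
	int hit = 0;

	snprintf(path, sizeof(path),
		 "/sys/kernel/iommu_groups/%d/reserved_regions", group);
	f = fopen(path, "r");
	if (!f)
		return 0;

	/* each line is "<start> <end> <type>", e.g. "0x... 0x... direct" */
	while (fscanf(f, "%llx %llx %15s", &start, &end, type) == 3) {
		if (iova <= end && iova + size - 1 >= start) {
			hit = 1;
			break;
		}
	}

	fclose(f);
	return hit;
}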

The intel-iommu code quoted above was introduced in v5.3 via:

commit d850c2ee5fe2259968e3889624ad22ea15cb4a38
Author: Lu Baolu 
Date:   Sat May 25 13:41:24 2019 +0800

iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions

To support mapping ISA region via iommu_group_create_direct_mappings,
make sure its exposed by iommu_get_resv_regions.

Signed-off-by: James Sewart 
Signed-off-by: Lu Baolu 
Signed-off-by: Joerg Roedel 

The original Kconfig option this relates to was added long ago in:

commit 49a0429e53f29109cbf1eadd89497286ba81f1ae
Author: Keshavamurthy, Anil S 
Date:   Sun Oct 21 16:41:57 2007 -0700

Intel IOMMU: Iommu floppy workaround

This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device 
so
that the floppy device which does not use DMA api's will continue to work.

Once the floppy driver starts using DMA api's this config option can be turn
off or this patch can be yanked out of kernel at that time.

So AIUI the original floppy workaround created an identity map for the
range 0-16MB.  The reserved region attempts to reflect that reservation
to userspace; however, I believe this is a software-imposed reserved
region for the benefit of 

Re: [vfio-users] Passing though an USB card problems

2019-11-23 Thread Alex Williamson
On Sat, 23 Nov 2019 15:34:21 +0100
Ede Wolf  wrote:

> Hello,
> 
> I am trying to pass through a PCIe USB card to a guest, instead of just 
> the ports, due to very sensitive USB devices.
> Despite the unbind being reported as successful, the booting of the 
> guest fails with an error:
> 
> qemu-system-x86_64: -device vfio-pci,host=09:04.0,x-vga=off: vfio 
> :09:04.0: Failed to set up TRIGGER eventfd signaling for interrupt 
> INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
> 
> While I have not been able to find much about this error, I do have one 
> additional device in the same iommu group, that I suspect to be the 
> reason for the failure. However, somehow this device lacks a "driver" 
> folder and therefor the "unbind" file, so I cannot unbind it.
> 
> I am not sure wether this additional device may be a bridge on the card 
> itself or a sensitive part of the mainboard, however blindly executing an
> 
> echo 12d8 e111 > /sys/bus/pci/drivers/vfio-pci/new_id'
> 
> does not change the behaviour.
> 
> Any ideas on what I may be missing or how to possibly resolve this? Any 
> information that I may have been missing to provide?

The USB device you're trying to assign actually appears to be
conventional PCI, not PCIe.  The other device in the group is a
PCIe-to-PCI/X bridge.  If this is a PCIe plugin card, it must be an old
one or one attempting to use old conventional PCI stock of the USB host
controller chip.  In any case, there's probably an error in dmesg
concurrent with the QEMU error that indicates that the device cannot be
configured with an exclusive interrupt.  I suspect the problem is that
the USB host controller failed the test for disabling INTx, which means
that we can't use a shared interrupt and failed to get an exclusive
interrupt.  It's not easy to configure the host to allow the device an
exclusive interrupt, it might be possible by moving the device to a
different slot or disabling drivers for other devices that share the
same interrupt line.  It's usually easier to get a new device to assign
that isn't broken in this way.  Thanks,

Alex


> Here the information about the other device in the same iommu group:
> 
> # lspci -n -s :08:00.0 -v
> 08:00.0 0604: 12d8:e111 (rev 02) (prog-if 00 [Normal decode])
>  Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
>  Bus: primary=08, secondary=09, subordinate=09, sec-latency=32
>  I/O behind bridge: 8000-8fff [size=4K]
>  Memory behind bridge: bf60-bf6f [size=1M]
>  Prefetchable memory behind bridge: None
>  Capabilities: [80] PCI-X bridge device
>  Capabilities: [90] Power Management version 3
>  Capabilities: [a8] Subsystem: :
>  Capabilities: [b0] Express PCI-Express to PCI/PCI-X Bridge, MSI 00
>  Capabilities: [f0] MSI: Enable- Count=1/1 Maskable- 64bit+
>  Capabilities: [100] Advanced Error Reporting
>  Capabilities: [150] Virtual Channel
> 
> 
> The first three folders are the ids of the USB card in question:
> 
> # ls -l /sys/bus/pci/devices/\:08\:00.0/
> 
> drwxr-xr-x 4 root root0 23. Nov 15:16 :09:04.0
> drwxr-xr-x 4 root root0 23. Nov 15:16 :09:04.1
> drwxr-xr-x 4 root root0 23. Nov 15:16 :09:04.2
> -r--r--r-- 1 root root 4096 23. Nov 15:16 aer_dev_correctable
> -r--r--r-- 1 root root 4096 23. Nov 15:16 aer_dev_fatal
> -r--r--r-- 1 root root 4096 23. Nov 15:16 aer_dev_nonfatal
> -r--r--r-- 1 root root 4096 23. Nov 15:16 ari_enabled
> -rw-r--r-- 1 root root 4096 23. Nov 15:16 broken_parity_status
> -r--r--r-- 1 root root 4096 23. Nov 15:16 class
> -rw-r--r-- 1 root root 4096 23. Nov 15:16 config
> -r--r--r-- 1 root root 4096 23. Nov 15:16 consistent_dma_mask_bits
> -r--r--r-- 1 root root 4096 23. Nov 15:16 current_link_speed
> -r--r--r-- 1 root root 4096 23. Nov 15:16 current_link_width
> -rw-r--r-- 1 root root 4096 23. Nov 15:16 d3cold_allowed
> -r--r--r-- 1 root root 4096 23. Nov 15:16 device
> -r--r--r-- 1 root root 4096 23. Nov 15:16 devspec
> -r--r--r-- 1 root root 4096 23. Nov 15:16 dma_mask_bits
> -rw-r--r-- 1 root root 4096 23. Nov 15:16 driver_override
> -rw-r--r-- 1 root root 4096 23. Nov 15:16 enable
> 
> And the succesfull unbind, also verifyable by lsub, that lacks 3 devices 
> after unbind:
> 
> 
> # systemctl status kvm_virtio_prepare.service
> ● kvm_virtio_prepare.service - Preparation for PCI Passthru
> Loaded: loaded (/etc/systemd/system/kvm_virtio_prepare.service; 
> static; vendor preset: disabled)
>Process: 963 ExecStart=/usr/bin/sh -c echo ":09:04.0" > 
> /sys/bus/pci/devices/:09:04.0/driver/unbind (code=exited, 
> status=0/SUCCESS)
>Process: 964 ExecStart=/usr/bin/sh -c echo ":09:04.1" > 
> /sys/bus/pci/devices/:09:04.1/driver/unbind (code=exited, 
> status=0/SUCCESS)
>Process: 965 ExecStart=/usr/bin/sh -c echo ":09:04.2" > 
> /sys/bus/pci/devices/:09:04.2/driver/unbind (code=exited, 
> 

Re: [vfio-users] No IOMMU Groups seen in /sys/kernel/iommu_groups/

2019-11-22 Thread Alex Williamson
On Fri, 22 Nov 2019 17:12:19 +
Venumadhav Josyula  wrote:

> Hi Alex,
> 
> So I had power cycled it multiple number of times.

I'm only guessing, contact your hardware vendor.  Software can't enable
VT-d until the BIOS provides the necessary firmware tables.  Thanks,

Alex

> ____
> From: Alex Williamson 
> Sent: Friday, 22 November, 2019, 10:30 pm
> To: Venumadhav Josyula
> Cc: vfio-users@redhat.com; Venumadhav Josyula
> Subject: Re: [vfio-users] No IOMMU Groups seen in /sys/kernel/iommu_groups/
> 
> On Fri, 22 Nov 2019 22:13:32 +0530
> Venumadhav Josyula  wrote:
> 
> > So in the bios I need I check for ACPI :DMAR ?  
> 
> It's likely not represented that way, what I'm saying is that the list
> of ACPI tables from your dmesg below should include one named "DMAR".
> It currently does not and until it does, VT-d is not going to work.  A
> BIOS setting to enable the "IOMMU" or "VT-d" is typically required to
> enable support for this table, but I don't know exactly how your BIOS
> exposes it.  If such an option is already enabled in your BIOS, a power
> cycle may be required to complete the process.  Thanks,
> 
> Alex
> 
> [0.00] ACPI: RSDP 7b7fe014 00024 (v02 HP)
> [0.00] ACPI: XSDT 7b7e8188 000EC (v01 HP ProLiant 
> 0001  0113)
> [0.00] ACPI: FACP 7b7f5000 0010C (v05 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: DSDT 7b7e4000 03577 (v02 HP DSDT 
> 0002 HPAG 0002)
> [0.00] ACPI: FACS 7b5dd000 00040
> [0.00] ACPI: UEFI 7b5f3000 00042 (v01 HP ProLiant 
>   )
> [0.00] ACPI: MCEJ 7b7fc000 00130 (v01 HP ProLiant 
> 0001 INTL 010D)
> [0.00] ACPI: SSDT 7b7fb000 00064 (v02 HP SpsNvs   
> 0002 INTL 20130328)
> [0.00] ACPI: HEST 7b7fa000 000A8 (v01 HP ProLiant 
> 0001 INTL 0001)
> [0.00] ACPI: BERT 7b7f9000 00030 (v01 HP ProLiant 
> 0001 INTL 0001)
> [0.00] ACPI: ERST 7b7f8000 00230 (v01 HP ProLiant 
> 0001 INTL 0001)
> [0.00] ACPI: EINJ 7b7f7000 00150 (v01 HP ProLiant 
> 0001 INTL 0001)
> [0.00] ACPI: BGRT 7b7f6000 00038 (v01 HP ProLiant 
> 0002 HP   0113)
> [0.00] ACPI: HPET 7b7f4000 00038 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: PMCT 7b7f3000 00064 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: WDDT 7b7f2000 00040 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: APIC 7b7f1000 0014A (v03 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: MCFG 7b7f 0003C (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: SLIT 7b7ef000 00030 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: SRAT 7b7ee000 00140 (v03 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: SPMI 7b7ed000 00041 (v05 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: RASF 7b7ec000 00030 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: SPCR 7b7eb000 00050 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: MSCT 7b7ea000 00064 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: BDAT 7b7e9000 00030 (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: PCCT 7b7fd000 0006E (v01 HP ProLiant 
> 0001 HP   0001)
> [0.00] ACPI: SSDT 7b7dd000 0657A (v02 HP PCISSDT  
> 0002 HPAG 0002)
> [0.00] ACPI: SSDT 7b7dc000 001CB (v02 HP TIMESSDT 
> 0002 HPAG 0002)
> [0.00] ACPI: SSDT 7b7db000 002F2 (v01 HP pmab 
> 0001 INTL 20130328)
> 
> 

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] No IOMMU Groups seen in /sys/kernel/iommu_groups/

2019-11-22 Thread Alex Williamson
On Fri, 22 Nov 2019 22:13:32 +0530
Venumadhav Josyula  wrote:

> So in the bios I need I check for ACPI :DMAR ?

It's likely not represented that way; what I'm saying is that the list
of ACPI tables from your dmesg below should include one named "DMAR".
It currently does not and until it does, VT-d is not going to work.  A
BIOS setting to enable the "IOMMU" or "VT-d" is typically required to
enable support for this table, but I don't know exactly how your BIOS
exposes it.  If such an option is already enabled in your BIOS, a power
cycle may be required to complete the process.  Thanks,

Alex

[0.00] ACPI: RSDP 7b7fe014 00024 (v02 HP)
[0.00] ACPI: XSDT 7b7e8188 000EC (v01 HP ProLiant 0001  
0113)
[0.00] ACPI: FACP 7b7f5000 0010C (v05 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: DSDT 7b7e4000 03577 (v02 HP DSDT 0002 
HPAG 0002)
[0.00] ACPI: FACS 7b5dd000 00040
[0.00] ACPI: UEFI 7b5f3000 00042 (v01 HP ProLiant   
)
[0.00] ACPI: MCEJ 7b7fc000 00130 (v01 HP ProLiant 0001 
INTL 010D)
[0.00] ACPI: SSDT 7b7fb000 00064 (v02 HP SpsNvs   0002 
INTL 20130328)
[0.00] ACPI: HEST 7b7fa000 000A8 (v01 HP ProLiant 0001 
INTL 0001)
[0.00] ACPI: BERT 7b7f9000 00030 (v01 HP ProLiant 0001 
INTL 0001)
[0.00] ACPI: ERST 7b7f8000 00230 (v01 HP ProLiant 0001 
INTL 0001)
[0.00] ACPI: EINJ 7b7f7000 00150 (v01 HP ProLiant 0001 
INTL 0001)
[0.00] ACPI: BGRT 7b7f6000 00038 (v01 HP ProLiant 0002 
HP   0113)
[0.00] ACPI: HPET 7b7f4000 00038 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: PMCT 7b7f3000 00064 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: WDDT 7b7f2000 00040 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: APIC 7b7f1000 0014A (v03 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: MCFG 7b7f 0003C (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: SLIT 7b7ef000 00030 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: SRAT 7b7ee000 00140 (v03 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: SPMI 7b7ed000 00041 (v05 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: RASF 7b7ec000 00030 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: SPCR 7b7eb000 00050 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: MSCT 7b7ea000 00064 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: BDAT 7b7e9000 00030 (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: PCCT 7b7fd000 0006E (v01 HP ProLiant 0001 
HP   0001)
[0.00] ACPI: SSDT 7b7dd000 0657A (v02 HP PCISSDT  0002 
HPAG 0002)
[0.00] ACPI: SSDT 7b7dc000 001CB (v02 HP TIMESSDT 0002 
HPAG 0002)
[0.00] ACPI: SSDT 7b7db000 002F2 (v01 HP pmab 0001 
INTL 20130328)

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] No IOMMU Groups seen in /sys/kernel/iommu_groups/

2019-11-22 Thread Alex Williamson
On Fri, 22 Nov 2019 21:59:28 +0530
Venumadhav Josyula  wrote:

> Hi Alex,
> 
> Pl find the dmesg & cpu model attached.

The CPU supports VT-d (E5-2637v3), but the system firmware does not
seem to provide a DMAR table, which is required for enabling VT-d.
There should be an "ACPI: DMAR ..." line providing location and
information about this table, as shown for the other ACPI tables.  I can only
conclude that VT-d support is not fully or properly enabled in the
BIOS.  Please check the BIOS settings again.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] No IOMMU Groups seen in /sys/kernel/iommu_groups/

2019-11-22 Thread Alex Williamson
On Fri, 22 Nov 2019 15:07:30 +0530
Venumadhav Josyula  wrote:

> Hi All,
> We are trying to use vfio-pci. We have following
> - intel_iommu=on in bios
> - it shows in /proc/cmdline
> 
> [root@vflac2-kvm ~]# cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-3.10.0-957.1.3.el7.x86_64 root=/dev/mapper/c7--kvm-root
> ro crashkernel=auto rd.lvm.lv=c7-kvm/root rd.lvm.lv=c7-kvm/swap rhgb
> quiet *intel_iommu=on
> *isolcpus=2,3,6,7
> [root@vflac2-kvm ~]# dmesg | grep IOM
> *[0.00] DMAR: IOMMU enabled*

As explained in [1] this only indicates the processing of the
intel_iommu=on command line option.  Full dmesg might be required to
understand the issue.

> [root@vflac2-kvm ~]# uname -mrs
> Linux 3.10.0-957.1.3.el7.x86_64 x86_64
> [root@vflac2-kvm ~]# ls /sys/kernel/iommu_groups/
> 
> *[root@vflac2-kvm ~]# ls -al /sys/kernel/iommu_groups/total 0*
> 
> 
> 
> 04:00.0 Ethernet controller: Intel Corporation Ethernet Controller
> 10-Gigabit X540-AT2 (rev 01)
> 04:00.1 Ethernet controller: Intel Corporation Ethernet Controller
> 10-Gigabit X540-AT2 (rev 01)
> 04:10.0 Ethernet controller: Intel Corporation X540 Ethernet Controller
> Virtual Function (rev 01)
> 04:10.2 Ethernet controller: Intel Corporation X540 Ethernet Controller
> Virtual Function (rev 01)
> 05:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
> 10GbE SFP+ (rev 02)
> 05:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for
> 10GbE SFP+ (rev 02)
> 
> Can u suggest what could wrong ? We  are seeing no iommu_groups getting
> created.

What's the CPU model?  There are some that don't have IOMMU support.
Thanks,

Alex

[1]http://vfio.blogspot.com/2016/09/intel-iommu-enabled-it-doesnt-mean-what.html

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Nested VFIO with QEMU

2019-11-05 Thread Alex Williamson
On Wed, 6 Nov 2019 00:29:52 +0100
Samuel Ortiz  wrote:

> On Tue, Nov 05, 2019 at 01:21:48PM -0700, Alex Williamson wrote:
> > On Fri, 18 Oct 2019 05:48:49 +
> > "Boeuf, Sebastien"  wrote:
> >   
> > > Hi folks,
> > > 
> > > I have been recently working with VFIO, and particularly trying to
> > > achieve device passthrough through multiple layers of virtualization.
> > > 
> > > I wanted to assess QEMU's performances with nested VFIO, using the
> > > emulated Intel IOMMU device. Unfortunately, I cannot make any of my
> > > physical device work when I pass them through, attached to the emulated
> > > Intel IOMMU. Using regular VFIO works properly, but as soon as I enable
> > > the virtual IOMMU, the driver fails to probe (I tried on two different
> > > machines with different types of NIC).
> > > 
> > > So I was wondering if someone was aware of any issue with using both
> > > VFIO and virtual Intel IOMMU with QEMU? I'm sure I might be missing
> > > something obvious but I couldn't find it so far.  
> > 
> > It's not something I test regularly, but I'm under the impression that
> > nested device assignment does work.  When you say the driver fails to
> > probe, which driver is that, the endpoint driver in the L2 guest or
> > vfio-pci in the L1 guest?  Perhaps share your XML or command line?  
> 
> This is fixed now. Apparently the iommu device needs to be passed
> _before_ the other devices on the command line. We managed to make it
> work as expected.

Good news!

> Sebastien and Yi Liu figured this out but for some reasons the
> thread moved to vfio-users-boun...@redhat.com.

Yes, I see some uncaught bounce notifications; it looks like Yi's
initial reply was to vfio-users-bounces.  Yi, you might want to
check out your mailer configuration.  For posterity/follow-up, I'll
paste the final message from the bounce notification below.  Thanks,

Alex

On Mon, 28 Oct 2019 08:13:23 +
"Liu, Yi L"  wrote:

> Hi Sebastien,
> 
> That’s great it works for you. I remember there was an effort
> to fix it in community. But I cannot recall if it was documented.
> If not, I think I can co-work with community to make it clear.
> 
> Regards,
> Yi Liu
> 
> From: Boeuf, Sebastien
> Sent: Friday, October 25, 2019 7:17 PM
> To: Liu, Yi L 
> Cc: Ortiz, Samuel ; vfio-users-boun...@redhat.com; 
> Bradford, Robert 
> Subject: Re: [vfio-users] Nested VFIO with QEMU
> 
> Hi Yi Liu,
> 
> Yes that was it :)
> Thank you very much for your help!
> 
> Is it documented somewhere that parameters order matters?
> 
> Thanks,
> Sebastien
> 
> On Fri, 2019-10-25 at 09:52 +0800, Liu, Yi L wrote:
> Hi Sebastien,
> 
> I guess the cmdline is cause. You should put the intel-iommu exposure prior 
> to other devices as below.
> 
> -drive if=none,id=drive0,format=raw,file=/home/sebastien/clear-kvm.img \
> -device intel-iommu,intremap=on,caching-mode=on
> -device virtio-blk-pci,drive=drive0,scsi=off \
> -device virtio-rng-pci \
> -device vfio-pci,host=00:19.0 \
> 
> Regards,
> Yi Liu
> 
> From: Boeuf, Sebastien
> Sent: Friday, October 25, 2019 7:14 AM
> To: Liu, Yi L <yi.l@intel.com>
> Cc: Ortiz, Samuel <samuel.or...@intel.com>; 
> vfio-users-boun...@redhat.com; 
> Bradford, Robert <robert.bradf...@intel.com>
> Subject: Re: [vfio-users] Nested VFIO with QEMU
> 
> Hi Yi Liu,
> 
> On Tue, 2019-10-22 at 11:01 +0800, Liu, Yi L wrote:
> 
> Hi Sebastien,
> 
> 
> 
> > From: vfio-users-boun...@redhat.com [mailto:vfio-users-boun...@redhat.com] On Behalf Of Boeuf, Sebastien  
> 
> > Sent: Friday, October 18, 2019 1:49 PM  
> 
> > To: vfio-users@redhat.com  
> 
> > Cc: Ortiz, Samuel <samuel.or...@intel.com>; 
> > Bradford, Robert <robert.bradf...@intel.com>  
> 
> > Subject: [vfio-users] Nested VFIO with QEMU  
> 
> >  
> 
> > Hi folks,  
> 
> >  
> 
> > I have been recently working with VFIO, and particularly trying to  
> 
> > achieve device passthrough through multiple layers of virtualization.  
> 
> >  
> 
> > I wanted to assess QEMU's performances with nested VFIO, using the  
> 
> > emulated Intel IOMMU device. Unfortunately, I cannot make any of my  
> 
> > physical device work when I pass them through, attached to the  
> 
> > emulated Intel

Re: [vfio-users] Posted interrupts for nested virtualization?

2019-11-05 Thread Alex Williamson
On Fri, 18 Oct 2019 06:08:31 +
"Boeuf, Sebastien"  wrote:

> Hi folks,
> 
> We have been recently implementing a nested VFIO solution for our Cloud
> Hypervisor VMM. Thanks to virtio-iommu, we can now pass a device
> through nested virtualization.
> 
> After some performances testing, we realized the device in L2 was
> slightly less performant than the same device in L1. The DMAR has
> nothing to do with it since it is programmed only at boot time with the
> entire guest RAM mapping for each device, but the problem comes from
> the interrupts. When using polling instead of interrupts, we can
> clearly see the device in L2 behaving very closely to what is expected
> in L1.
> 
> So basically, because the interrupt has to bounce through each layer of
> virtualization, the more layers the less performance we will get out of
> this device, right?

Yes

> We were wondering if there was any on going work to allow posted
> interrupts to be delivered directly to the last layer of virtualization
> where the device is actually used. I might be missing something but in
> theory, it seems feasible, the problem seems to be that VFIO does not
> update the IRT of the IOMMU, so I don't know how we could manually
> update it, the same way we can update the DMAR with VFIO_MAP_DMA ioctl
> for instance.
> And the second part is about knowing which interrupt needs to be
> updated in the IRT, which means the virtio-iommu device should be able
> to provide those kind of information.
> 
> It seems like there are some possibilities but I clearly don't have the
> whole picture in mind, and I'm interested in learning more about any
> existing work on this topic.

I agree with your assessment; it seems like it should be possible, but
I don't know of any work to support it.  Note that vfio doesn't
directly manipulate posted interrupts, it's done via the irqbypass
producer/consumer mechanism, so perhaps this largely needs to be done
in KVM.  I can't say I've spent many cycles thinking about it either
though.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Nested VFIO with QEMU

2019-11-05 Thread Alex Williamson
On Fri, 18 Oct 2019 05:48:49 +
"Boeuf, Sebastien"  wrote:

> Hi folks,
> 
> I have been recently working with VFIO, and particularly trying to
> achieve device passthrough through multiple layers of virtualization.
> 
> I wanted to assess QEMU's performances with nested VFIO, using the
> emulated Intel IOMMU device. Unfortunately, I cannot make any of my
> physical device work when I pass them through, attached to the emulated
> Intel IOMMU. Using regular VFIO works properly, but as soon as I enable
> the virtual IOMMU, the driver fails to probe (I tried on two different
> machines with different types of NIC).
> 
> So I was wondering if someone was aware of any issue with using both
> VFIO and virtual Intel IOMMU with QEMU? I'm sure I might be missing
> something obvious but I couldn't find it so far.

It's not something I test regularly, but I'm under the impression that
nested device assignment does work.  When you say the driver fails to
probe, which driver is that, the endpoint driver in the L2 guest or
vfio-pci in the L1 guest?  Perhaps share your XML or command line?
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users



Re: [vfio-users] Passing arbitrary IRQ to guest?

2019-08-28 Thread Alex Williamson
On Wed, 28 Aug 2019 09:39:57 -0700
Micah Morton  wrote:

> On Mon, Aug 5, 2019 at 11:14 PM Gerd Hoffmann  wrote:
> >
> > On Mon, Aug 05, 2019 at 12:50:00PM -0700, Micah Morton wrote:  
> > > On Thu, Aug 1, 2019 at 10:36 PM Gerd Hoffmann  wrote:  
> > > >
> > > >   Hi,
> > > >  
> > > > > From my perspective, as a low-speed device where we don't really need
> > > > > the benefits of an IOMMU, I'd be more inclined to look at why it
> > > > > doesn't work with evdev.  We already have a tablet device in QEMU,
> > > > > what's it take to connect that to evdev?  Cc'ing Gerd as maybe he's
> > > > > already though about touchpad support.  Thanks,  
> > > >
> > > > It's not clear why the touchpad doesn't work.  Possibly using libinput
> > > > helps, https://git.kraxel.org/cgit/qemu/log/?h=sirius/display-drm has
> > > > some code.  Wiring up to input-linux isn't done yet though, only the
> > > > drm ui uses libinput support so far.  
> > >
> > > To be clear are you saying that its a known issue that the touchpad
> > > doesn't work in VM guest with QEMU and evdev?  
> >
> > There are other reports of touchpad problems.  I don't know whenever
> > that is a general problem or specific to some devices.
> >
> > libinput knows quirks for lots of input devices.  When passing through
> > the evdev to the guest as virtio device libinput can't see the device
> > identity and thus can't apply quirks.  Which might be the reason the
> > touchpad doesn't work.  Using libinput on the host side might fix this.
> >
> > cheers,
> >   Gerd
> >  
> 
> I was able to get physical passthrough of the touchpad working in the
> VM guest by forwarding the IRQ to the guest using the kvm/qemu/vfio
> framework.
> 
> So basically I wrote extensions to kvm/qemu/vfio to allow for
> forwarding arbitrary IRQs to the guest (the IRQ doesn't have to be
> associated with any vfio-pci or vfio-platform device). I could clean
> up the patches and upstream them (or think about it) if you folks
> think anyone else might want to use this functionality? Then again as
> Alex said before you still need to communicate to the VM which IRQ to
> use for this device (in my case I did this by modifying ACPI stuff in
> SeaBIOS, not sure how it could be incorporated into vfio).

This seems like something that's not too difficult to hack together,
but quite a lot harder to generalize into something that's useful
beyond this specific hardware.  There's a path to do so via the vfio
API, using a device specific interrupt to expose this IRQ and a
capability to convey how that IRQ is associated so that QEMU could
automatically create some AML.  Defining that interaction is far from
trivial, but before we even approach that, how does vfio-pci learn to
associate this IRQ with a device without growing a full software stack
specific to the PCI device, or class of PCI devices?  We have some
hacks in vfio, but they're usually for devices that can work on any
system, not specific devices on specific systems.  I wouldn't be
willing to support that unless it's at least got some obvious
extensibility to work elsewhere.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Passing arbitrary IRQ to guest?

2019-07-31 Thread Alex Williamson
On Wed, 31 Jul 2019 09:05:54 -0700
Micah Morton  wrote:

> Hi Alex,
> 
> I've noticed that when doing device passthrough with VFIO, if an IRQ
> in the host machine is associated with a PCI device that's being
> passed to the guest, then the IRQ is automatically forwarded into the
> guest by kvm/vfio. Is is possible to tell qemu to forward arbitrary
> IRQs to the guest as well?

Nope :-\
 
> My situation is that I am trying to make a touchpad device work in the
> VM guest. This device sits behind an i2c controller on the host. When
> I pass through the i2c controller (PCI device), its IRQ automatically
> gets passed to the guest as well, so everything works on this level.
> My problem is that the touchpad device behind the i2c controller is
> wired up to use a specific IRQ for interrupts, and since qemu has no
> idea this needs to be forwarded along with the i2c controller, it
> doesn't get forwarded. Everything else with the i2c controller and
> touchpad works as intended in the guest, including aspects of the
> touchpad functionality that don't require interrupts.
> 
> I was hoping there was an easy way to tell qemu to forward this IRQ to
> the guest (maybe in the same code path where the IRQ for the i2c
> controller is forwarded?) -- even though it is not associated with any
> PCI device in the host. If not, maybe you could point me to where in
> the code I would need to hack to pass an extra IRQ to the guest? Or is
> it much more complicated and intractable than I'm making it out to be?

Hmm, I think it's actually pretty tricky.  The PCI i2c device really
just provides you with access to an entirely separate bus.  The access
to that bus works ok because the interface is fully exposed within the
PCI endpoint, but devices on the i2c bus are not.  I'd guess there's an
ACPI object describing the interrupt for that device, so even if we
could transport the IRQ into the VM, you'd likely also need to add an
ACPI blob to relay that interrupt association to the device in the
guest.  I'd also wonder if there are things on the i2c bus that we
don't want a user to have access to and whether assigning the full
controller to the user is really a wise idea.  I'd also expect that the
IOMMU can only provide isolation and translation at the PCI requester
ID level, so as soon as the IOMMU becomes involved we can't really
manage user DMA vs host DMA at the same time.  Too bad the touchpad
isn't a USB endpoint.

In checking whether QEMU has i2c passthrough support, I did find this:

https://wiki.qemu.org/Google_Summer_of_Code_2019#I2C_Passthrough

I'm not sure if anyone signed up for that, but maybe it includes some
breadcrumbs.

It seems that your touchpad is really more of a platform device that
happens to be exposed via i2c, where the i2c controller happens to live
on PCI.  That doesn't necessarily make vfio-pci a great target for
assignment, you'll only get assignment of the i2c interface and not
external dependencies of devices on that i2c bus.  A USB passthrough or
maybe a vfio mediated device like approach might be a better option,
allowing the i2c bus to be owned by a host driver, but exposing
specific endpoints to a VM.

OTOH, do you really need to expose the i2c device in the guest, or
would relaying through the evdev interface be sufficient?

https://passthroughpo.st/using-evdev-passthrough-seamless-vm-input/
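
For reference, the QEMU side of that approach is just an input-linux
object pointing at the touchpad's event node (eventN below is a
placeholder for whatever node the touchpad actually exposes):

-object input-linux,id=touchpad0,evdev=/dev/input/eventN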

Thanks,
Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Fwd: [systemd-devel] Proper way for systemd service to wait mdev gvt device initialization

2019-06-21 Thread Alex Williamson
On Mon, 27 May 2019 08:20:08 +0300
gnido...@ya.ru wrote:

> Alex, thanks for the utlity!
> For now we just have used systemd.path helper [1] to wait for gvt device 
> availability and fire up the service when its ready as was suggested by 
> Jordan Glover.
> 
> [1] https://www.freedesktop.org/software/systemd/man/systemd.path.html
> 
> 26.05.2019, 22:54, "Alex Williamson" :
> > On Sun, 26 May 2019 21:28:36 +0300
> > Alex Ivanov  wrote:
> >  
> >>  Could Intel fix that?  
> >
> > I won't claim that mdev-core is bug free in this area, but it's
> > probably worth noting that mdev support isn't necessarily a fundamental
> > feature of the parent device, it could theoretically be enabled much
> > later than the add uevent for the parent device itself. We may instead
> > want to trigger a change uevent when it becomes available.
> >
> > Also in this space, I've been working on a very preliminary mediated
> > device management and persistence utility:
> >
> > https://github.com/awilliam/mdevctl
> >
> > It works around this issue by simply waiting a short time for mdev
> > support to appear after the device is added. This also could be much
> > more deterministic with a change uevent. Perhaps it's useful for what
> > you're trying to achieve though. Thanks,

I could use some systemd advice here.  I'm working on adding a change
uevent when a parent device is registered or unregistered with mdev.
For example, I add the following udev rules for the mdev devices
themselves:

ACTION=="add", SUBSYSTEM=="mdev", TAG+="systemd"
ACTION=="remove", SUBSYSTEM=="mdev", TAG+="systemd"

example:
  
sys-devices-pci:00-:00:02.0-b0a3989f\x2d8138\x2d4d49\x2db63a\x2d59db28ec8b48.device
   loaded active plugged   
/sys/devices/pci:00/:00:02.0/b0a3989f-8138-4d49-b63a-59db28ec8b48

These seem to be necessary to get device units created and removed and
then I can use BindsTo= in the unit service template such that the
service is automatically stopped when an mdev device is removed.

To start mdev devices configured as automatic, I use the following rule
keying on the change event that I'm adding (this would be called on
the parent device, ex. :00:02.0:

ACTION=="change", ENV{MDEV_STATE}=="registered", 
TEST=="/etc/mdevctl.d/$kernel", PROGRAM="/usr/libexec/mdevctl get-systemd-mdevs 
$kernel" TAG+="systemd", ENV{SYSTEMD_WANTS}="$result"

This looks for the specific MDEV_STATE uevent, tests whether we have a
configuration for the device, gets a space separated list of mdev
devices to automatically start via the template mentioned above, ex:

mdev@sys-devices-pci:00-:00:02.0-b0a3989f\x2d8138\x2d4d49\x2db63a\x2d59db28ec8b48.service

(The service includes BindsTo=%i.device)
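
For context, the template unit is roughly the following sketch; the
mdevctl start/stop verbs here are placeholders rather than the
utility's final CLI:

[Unit]
Description=Manage persistent mdev %i
BindsTo=%i.device
After=%i.device

[Service]
Type=oneshot
RemainAfterExit=yes
# start-mdev/stop-mdev are placeholder verbs, not mdevctl's final CLI
ExecStart=/usr/libexec/mdevctl start-mdev %i
ExecStop=/usr/libexec/mdevctl stop-mdev %i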

The rule adds the systemd tag and ENV{SYSTEMD_WANTS}= for the above
service unit.

So far so good, I end up with the device unit above and service
unit running when the parent registers with mdev:

  
mdev@sys-devices-pci:00-:00:02.0-b0a3989f\x2d8138\x2d4d49\x2db63a\x2d59db28ec8b48.service
 loaded active exitedManage persistent mdev 
sys/devices/pci:00/:00:02.0/b0a3989f-8138-4d49-b63a-59db28ec8b48

If I unload the kvmgt module, which unregisters the parent device with
mdev, both the .service and the .device are automatically removed,
which is exactly what I want.

Then I reload the kvmgt module... and nothing happens.

I read here:

https://www.freedesktop.org/software/systemd/man/systemd.device.html

  Note that systemd will only act on Wants= dependencies when a device
  first becomes active. It will not act on them if they are added to
  devices that are already active. Use SYSTEMD_READY= (see below) to
  configure when a udev device shall be considered active, and thus
  when to trigger the dependencies.

I assume this is why it only works once, and maybe I only get that once
because I'm lucky that no previous rule has made systemd consider the
device active.  I can make this work reliably if I add
ENV{SYSTEMD_READY}="1" on the change event where the parent device is
registered and ="0" when unregistered, but I'm afraid I'm just setting
myself up for a bad time by doing that.
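
In rule form, that workaround is roughly the following sketch (the
"unregistered" value name is assumed here, matching the uevent I'm
adding):

ACTION=="change", ENV{MDEV_STATE}=="registered",   ENV{SYSTEMD_READY}="1"
ACTION=="change", ENV{MDEV_STATE}=="unregistered", ENV{SYSTEMD_READY}="0"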

What would be the proper way to trigger SYSTEMD_WANTS repeatedly based
on a change event like this?  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about iommu groups

2019-06-18 Thread Alex Williamson
On Tue, 18 Jun 2019 16:47:58 +0100
James Courtier-Dutton  wrote:

> On Tue, 18 Jun 2019 at 16:18, Alex Williamson 
> wrote:
> 
> > [cc +vfio-users]
> >
> > You need a version of the hot reset unit test that accepts multiple
> > devices since each is in a separate group.  The grouping on the Asus
> > system you provided is preferred, it's not a problem.  Thanks,
> >
> >  
> Do you have such a version, or shall I craft one ?

I think the attached one works, pass:

   <groupid> <ssss:bb:dd.f> [<groupid> <ssss:bb:dd.f>]...

Thanks,

Alex
/* header list reconstructed -- the archive stripped the original
 * <...> include names; these are the headers this test needs */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

#include <linux/vfio.h>

void usage(char *name)
{
	printf("usage: %s  \n", name);
}

#define false 0
#define true 1

int main(int argc, char **argv)
{
	int i, j, ret, container, *pfd;
	char path[PATH_MAX];

	struct vfio_group_status group_status = {
		.argsz = sizeof(group_status)
	};

	struct vfio_pci_hot_reset_info *reset_info;
	struct vfio_pci_dependent_device *devices;
	struct vfio_pci_hot_reset *reset;

	struct reset_dev {
		int groupid;
		int seg;
		int bus;
		int dev;
		int func;
		int fd;
		int group;
	} *reset_devs;

	if (argc < 3) {
		usage(argv[0]);
		return -1;
	}

	printf("Expect %d group/device pairs\n", (argc - 1)/2);

	reset_devs = calloc((argc - 1)/2, sizeof(struct reset_dev));
	if (!reset_devs)
		return -1;

	for (i = 0; i < (argc - 1)/2; i++) {
		ret = sscanf(argv[i*2 + 1], "%d", &reset_devs[i].groupid);
		if (ret != 1) {
			usage(argv[0]);
			return -1;
		}

		ret = sscanf(argv[i*2 + 2], "%04x:%02x:%02x.%d",
			 &reset_devs[i].seg, &reset_devs[i].bus,
			 &reset_devs[i].dev, &reset_devs[i].func);
		if (ret != 4) {
			usage(argv[0]);
			return -1;
		}

		printf("Using PCI device %04x:%02x:%02x.%d in group %d "
	   "for hot reset test\n", reset_devs[i].seg,
		   reset_devs[i].bus, reset_devs[i].dev,
		   reset_devs[i].func, reset_devs[i].groupid);
	}

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0) {
		printf("Failed to open /dev/vfio/vfio, %d (%s)\n",
		   container, strerror(errno));
		return container;
	}

	for (i = 0; i < (argc - 1)/2; i++) {
		snprintf(path, sizeof(path), "/dev/vfio/%d",
			 reset_devs[i].groupid);
		reset_devs[i].group = open(path, O_RDWR);
		if (reset_devs[i].group < 0) {
			printf("Failed to open %s, %d (%s)\n",
			path, reset_devs[i].group, strerror(errno));
			return reset_devs[i].group;
		}

		ret = ioctl(reset_devs[i].group, VFIO_GROUP_GET_STATUS,
			&group_status);
		if (ret) {
			printf("ioctl(VFIO_GROUP_GET_STATUS) failed\n");
			return ret;
		}

		if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
			printf("Group not viable, are all devices attached to vfio?\n");
			return -1;
		}

		ret = ioctl(reset_devs[i].group, VFIO_GROUP_SET_CONTAINER,
			&container);
		if (ret) {
			printf("Failed to set group container\n");
			return ret;
		}

		if (i == 0) {
			ret = ioctl(container, VFIO_SET_IOMMU,
				    VFIO_TYPE1_IOMMU);
			if (ret) {
				printf("Failed to set IOMMU\n");
				return ret;
			}
		}

		snprintf(path, sizeof(path), "%04x:%02x:%02x.%d",
			 reset_devs[i].seg, reset_devs[i].bus,
			 reset_devs[i].dev, reset_devs[i].func);

		reset_devs[i].fd = ioctl(reset_devs[i].group,
	 VFIO_GROUP_GET_DEVICE_FD, path);
		if (reset_devs[i].fd < 0) {
			printf("Failed to get device %s\n", path);
			return -1;
		}
	}

	reset_info = malloc(sizeof(*reset_info));
	if (!reset_info) {
		printf("Failed to alloc info struct\n");
		return -ENOMEM;
	}

	reset_info->argsz = sizeof(*reset_info);

	ret = ioctl(reset_devs[0].fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
		reset_info);
	if (ret && errno == ENODEV) {
		printf("Device does not support hot reset\n");
		return 0;
	}
	if (!ret || errno != ENOSPC) {
		printf("Expected fail/-ENOSPC, got %d/%d\n", ret, -errno);
		return -1;
	}

	printf("Dependent device count: %d\n", reset_info->count);

	reset_info = realloc(reset_info, sizeof(*reset_info) +
			 (reset_info->count * sizeof(*devices)));
	if (!reset_info) {
		printf("Failed to re-alloc info struct\n");
		return -ENOMEM;
	}

	reset_info->argsz = sizeof(*reset_info) +
 (reset_info->count * sizeof(*devices));
	ret = ioctl(reset_devs[0].fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
		reset_info);
	if (ret) {
		printf("Reset Info error\n");
		return ret;
	}

	devices = &reset_info->devices[0];

	for (i = 0; i < reset_info->count; i++)
		printf("%d: %04x:%02x:%02x.%d group %d\n", i,
		   devices[i].segment, devices[i].bus,
		   devices[i].devfn >> 3, devices[i].devfn & 7,
		   devices[i].group_id);

	printf("Attempting reset: ");
	fflush(stdout);

	reset 

Re: [vfio-users] Question about iommu groups

2019-06-18 Thread Alex Williamson
[cc +vfio-users]

On Tue, 18 Jun 2019 15:33:36 +0100
James Courtier-Dutton  wrote:

> Hi,
> 
> I could not see anywhere it mentioning ACS in the dmesg logs, so I don't
> think it is using ACS.
> 
> Is there some way to tell from the logs that ACS is involved.

See previous reply.

> I have 2 AMD Threadripper 1950 systems.
> One has a Gigabyte motherboard, and the IOMMU groups are OK and what I
> expect.
> One has an ASUS motherboard, and the IOMMU groups are problematic.
> I have run the same kernel on both, so I am beginning to suspect a BIOS
> problem.
> 
> Note also, your "vfio-pci-hot-reset.c"  test program works on the Gigabyte,
> but not the ASUS.
> The kernel used in 5.1.5 from mainline, so no ACS patch applied.
> It has my vfio slots patch applied, but nothing else.
> The IOMMU problem also happens with mainline kernel 5.1.11 with no patches
> applied.

You need a version of the hot reset unit test that accepts multiple
devices since each is in a separate group.  The grouping on the Asus
system you provided is preferred, it's not a problem.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about iommu groups

2019-06-18 Thread Alex Williamson
[cc +vfio-users]

On Tue, 18 Jun 2019 15:23:08 +0100
James Courtier-Dutton  wrote:

> Hi,
> 
> Attaching dmesg and lspci -nnvvv

Vega 10 seems to have ACS, great.  The root ports and the downstream
switch ports also support ACS, great.  The grouping seems correct from
the bits I checked.  System board for reference for others:

DMI: System manufacturer System Product Name/ROG STRIX X399-E GAMING, BIOS 1002 
02/15/2019

Some decoding below, most of the relevant I/O devices are nicely
separated except for the hierarchy below 01:00.x which are all grouped
together due to lack of ACS between the functions at 01:00.x
(ASMedia :-P)

00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h 
(Models 00h-0fh) PCIe GPP Bridge [1022:1453] (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=00, secondary=01, subordinate=08, sec-latency=0
Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ 
EgressCtrl- DirectTrans+
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ 
EgressCtrl- DirectTrans-

01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X399 
Series Chipset USB 3.1 xHCI Controller [1022:43ba] (rev 02) (prog-if 30 [XHCI])
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] X399 
Series Chipset SATA Controller [1022:43b6] (rev 02) (prog-if 01 [AHCI 1.0])
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] X399 
Series Chipset PCIe Bridge [1022:43b1] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=01, secondary=02, subordinate=08, sec-latency=0
Capabilities: [80] Express (v2) Upstream Port, MSI 00

02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
300 Series Chipset PCIe Port [1022:43b4] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=02, secondary=03, subordinate=03, 
sec-latency=0
Capabilities: [80] Express (v2) Downstream Port 
(Slot+), MSI 00

03:00.0 Network controller [0280]: Realtek 
Semiconductor Co., Ltd. RTL8822BE 802.11a/b/g/n/ac WiFi adapter [10ec:b822]

02:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
300 Series Chipset PCIe Port [1022:43b4] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=02, secondary=04, subordinate=04, 
sec-latency=0
Capabilities: [80] Express (v2) Downstream Port 
(Slot+), MSI 00

02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
300 Series Chipset PCIe Port [1022:43b4] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=02, secondary=05, subordinate=05, 
sec-latency=0
Capabilities: [80] Express (v2) Downstream Port 
(Slot+), MSI 00

05:00.0 Ethernet controller [0200]: Intel Corporation 
I211 Gigabit Network Connection [8086:1539] (rev 03)

02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
300 Series Chipset PCIe Port [1022:43b4] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=02, secondary=06, subordinate=06, 
sec-latency=0
Capabilities: [80] Express (v2) Downstream Port 
(Slot+), MSI 00

02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
300 Series Chipset PCIe Port [1022:43b4] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=02, secondary=07, subordinate=07, 
sec-latency=0
Capabilities: [80] Express (v2) Downstream Port 
(Slot+), MSI 00

02:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 
300 Series Chipset PCIe Port [1022:43b4] (rev 02) (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=02, secondary=08, subordinate=08, 
sec-latency=0
Capabilities: [80] Express (v2) Downstream Port 
(Slot+), MSI 00

08:00.0 USB controller [0c03]: ASMedia Technology Inc. 
ASM2142 USB 3.1 Host Controller [1b21:2142] (prog-if 30 [XHCI])

00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h 
(Models 00h-0fh) PCIe GPP Bridge [1022:1453] (prog-if 00 [Normal decode])
NUMA node: 0
Bus: primary=00, secondary=09, subordinate=09, sec-latency=0
Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ 
EgressCtrl- DirectTrans+
ACSCtl: SrcValid+ 

Re: [vfio-users] about vfio interrupt performance

2019-06-18 Thread Alex Williamson
On Tue, 18 Jun 2019 19:43:39 +0800
James  wrote:

> Hi Alex:
> 
> Many thanks for your detailed feedback and great helps!
> 
> 1, yes, make our drive into upstream will also solve this problem :)
> 
> 2, got it, checking device's some status register mapped via
> vfio persistently will be a better solution compare with eventfd if
> interrupt rate is high.
> 
> 3, "1, Some device’s extend configuration space will have problem when
> accessing by random."
> It means I remeber some guy reported that when they try to access extend
> configuration space via vfio framwork, sometimes they'll get access error,
> not all device have this problem(it only happen to extend configuration
> space), it happen rarely.
> I forget the link of this issue, not sure if you have some comments to this
> kind of issue, so sorry to mislead you and waster your time..

I don't recall issues here.  Config space sometimes returns an error
due to a failed bus reset at the device or root port (typically AMD
platforms or GPUs); otherwise there are VM machine-type restrictions on
extended config space (conventional vs. express machines), but I don't
know of any persistent issues beyond that.

> 4, and to "2, When try to access the device’s space which in the same iommu
> groups at the same time, it will trigger issue by random"
> You mean if we can not sperate the device into different iommu groups, we'd
> better not access two device which in the same groups at the same time.

An IOMMU group is the minimum unit of ownership for a user.  A user can
make use of all endpoints within an IOMMU group, but devices within the
group cannot be split between multiple users or between user and host
use cases.  The IOMMU group represents the smallest unit of isolation.
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about iommu groups

2019-06-18 Thread Alex Williamson
On Tue, 18 Jun 2019 11:43:58 +0100
James Courtier-Dutton  wrote:

> Hi,
> 
> In the following list of iommu groups, I am wondering why sub-functions on
> the same PCIe card are not being given the same IOMMU group as I would
> expect.

I can't provide any specifics without further details, a full 'sudo
lspci -nnvvv' listing would be required to determine if the grouping is
correct.  It is possible for PCI functions to be in separate groups if
ACS is provided to indicate isolation between the functions or if the
vendor has verified isolation exists and quirks are included in the
kernel.
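You can check for ACS yourself in the capability listing of the port or
function in question; something along these lines, where the address is
only a placeholder:

sudo lspci -s 00:01.2 -vvv | grep -A3 'Access Control Services'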

> For example, I would have expected the GPU and the HMDI Audio for that GPU
> to be in the same IOMMU group.

I have yet to see a GPU that supports ACS, so my guess would be that
this is induced by the unsupported ACS override patch.  Please provide
full dmesg.
 
> I am asking, because, with the current IOMMU groups, a vfio bus reset fails.

Bus resets across multiple IOMMU groups are possible, but all of the
affected devices need to be bound to vfio-pci and either owned by the
same user or unused by other users.  If assigning all the functions to
a single VM, bus reset should work regardless of the IOMMU grouping.
Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] about vfio interrupt performance

2019-06-17 Thread Alex Williamson
On Mon, 17 Jun 2019 16:00:42 +0800
James  wrote:

> Hi Experts:
> 
> Sorry to disturb you.
> 
> 
> 
> I failed to find any valid data about vfio interrupt performance in
> the community, so I am sending this mail to you boldly.
> 
> 
> 
> We have a pcie device working on an x86 platform, and no VM in our env.  I plan
> to replace the kernel-side device driver with the vfio framework, reimplementing
> it in user space after enabling vfio/vfio_pci/vfio_iommu_type1 in the kernel. The
> original intention is just to get rid of the dependence on the kernel, letting our
> application which needs to access our pcie device be a pure application,
> so it can run on other linux distributions (no custom kernel driver needed).

Wouldn't getting your driver upstream also solve some of these issues?

> Our pcie device have the following character:
> 
> 1, it generates a great deal of interrupts when working
> 
> 2, and it also has high demands on interrupt processing speed.

There will be more interrupt latency for a vfio userspace driver, the
interrupt is received on the host and signaled to the user via an
eventfd.  Hardware accelerators like APICv and Posted Interrupts are
not available outside of a VM context.  Whether the overhead is
acceptable is something you'll need to determine.  It may be beneficial
to switch to polling mode at high interrupt rate as network devices
tend to do.  DPDK is a userspace driver that makes use of vfio for
device access, but typically uses polling rather than interrupt-driven
data transfer AIUI.
 
> 3, it will need to access almost all bar space after mapping.

This is not an issue.

> Here I want to check with you: compared with the previous kernel-side device
> driver, will there be a huge decrease in interrupt processing speed when
> the number of interrupts is huge in a short time?
> 
> What are your comments on this attempt? Is it valuable to move the
> driver to userspace in this kind of situation (no vm, huge interrupt numbers,
> etc.)?

The description implies you're trying to avoid open sourcing your
device driver by moving it to a userspace driver.  While I'd rather run
an untrusted driver in vfio as a userspace driver, this potentially
makes it inaccessible to users where the hardware or lack of isolation
provided by the platform prevents them from making use of your device.

> BTW, I found there are some random issues when using vfio in the community, such
> as:
> 
> 1, Some device’s extend configuration space will have problem when
> accessing by random.
> 
> 2, When try to access the device’s space which in the same iommu groups at
> the same time, it will trigger issue by random.
> 
> 
> 
> If this kind of issue is related to IOMMU hardware limitations, can
> we bypass it via some method for now?

The questions are not well worded to understand the issues you're
trying to note here.  Some portions of config space are emulated or
virtualized by the vfio kernel driver, some by QEMU.  Since you won't
be using QEMU, you don't have the latter.  The QEMU machine type and
VM PCI topology also determines the availability of extended config
space, these are VM specific issues.  The IOMMU grouping is definitely
an issue.  IOMMU groups cannot be shared therefore usage of the device
might be restricted to physical configurations where IOMMU isolation is
provided.  The ACS override patch that some people here use is not and
will not be upstreamed, so it should not be considered as a requirement
for the availability of your device.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] [PATCH] Passthrough of one GPU with a PC with 2 identical GPUs installed

2019-06-17 Thread Alex Williamson
On Sat, 15 Jun 2019 14:30:49 +0100
James  wrote:

> Hi,
> 
> Please find attached a kernel patch. This is based from a very old patch 
> that never made it into the kernel in 2014. 
> https://lkml.org/lkml/2014/10/20/295. I am not sure who else I should be 
> adding to the Signed-off-by section.
> 
> I have modified it and tested it, so that it works against kernel 5.1.10.
> 
> Summary below:
> 
> 
>     PCI: Introduce new device binding path using pci_dev.driver_override
> 
>      In order to bind PCI slots to specific drivers use:
> pci=driver[:xx:xx.x]=foo,driver[:xx:xx.x]=bar,...
> 
>      The main use case for this is in relation to qemu passthrough
>      of a pci device using IOMMU and vfio-pci.
>      Example:
>      The host has two identical devices. E.g. 2 AMD Vega GPUs,
>      The user wishes to passthrough only one GPU.
>      This new feature allows the user to select which GPU to passthrough.
> 
>      Signed-off-by: James Courtier-Dutton 

vfio-users is not a development list, patches should not be posted here
with the intent of upstream inclusion, especially patches outside of
the vfio driver itself.

Upstream patches should be posted inline, not as attachments.  Messages
with this sort of attachment are likely to get rejected by upstream
lists.

This is simply a reposting of patch that was thoroughly discussed
upstream in the link you provided.  None of the issues raised in that
thread have been addressed in this reposting.

While I like the idea of a driver_override command line option, there
are existing mechanisms to deal with two identical cards with different
driver binding requirements.  I'd suggest using a modprobe.d install
command to perform the driver_override on the device instance intended
for use with vfio-pci, for example:

install amdgpu echo vfio-pci > /sys/bus/pci/devices/:02:00.0/driver_override; modprobe --ignore-install amdgpu
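If you'd rather not hand-edit modprobe.d, the driverctl utility can set
an equivalent persistent override; roughly (the address below is a
placeholder for the GPU you want vfio-pci to claim):

driverctl set-override 0000:02:00.0 vfio-pci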

Thanks,
Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-06-12 Thread Alex Williamson
On Wed, 12 Jun 2019 08:49:36 -0700
Micah Morton  wrote:

> Hi Alex,
> 
> Thanks for the help on this earlier, I was able to get IGD passthrough
> working on my device (In case you're interested, crbug.com/970820 has
> further details on the changes we needed to make to the kernel/i915
> driver to get things working).
> 
> I have a couple follow up questions that hopefully you can get to when
> you get a chance:
> 
> 1) I see in https://patchwork.kernel.org/patch/9115161/ that you wrote
> a patch for SeaBIOS to reserve stolen memory for the IGD device. It
> seems that it is hard-coded to reserve 8MB of stolen memory. I wanted
> to reserve 64MB, but wasn't able to locate the "etc/igd-bdsm-size"
> file that should be able to configure this. Where does this file live?
> Its easy enough for me to hard-code SeaBIOS to 64MB, but I was curious
> if there's a cleaner way to set this.

There's a QEMU vfio-pci device option x-igd-gms= which (according to
the commit log[1], because I've forgotten how this works 3yrs later)
specifies the number of 32MB chunks of additional stolen memory
allocated.  So theoretically you could just add x-igd-gms=2 for 64MB.
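On the QEMU command line that would be something along these lines (a
sketch; the IGD address and any other device options are assumptions,
and x-igd-gms is experimental, hence the x- prefix):

-device vfio-pci,host=00:02.0,x-igd-gms=2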


> 2) Even when SeaBIOS reserves a region for stolen memory, the VM
> kernel has a hard time realizing the region is there and available. So
> far I just hard-coded the kernel/i915 driver since I know the
> address/range of the memory region that SeaBIOS will reserve for
> stolen memory in my case. It requires some hard-coding both in the
> kernel 
> (https://elixir.bootlin.com/linux/v4.14.114/source/arch/x86/kernel/early-quirks.c#L436)
> and in the driver
> (https://elixir.bootlin.com/linux/v4.14.114/source/drivers/gpu/drm/i915/i915_gem_stolen.c#L404).
> Are you aware of any discussions around this problem? I wanted to see
> if it has already been discussed before looking at proposing some kind
> of patch to the kernel/driver.

The fact that there are genX versions of those sorts of sizing and init
routines is exactly part of the problem with assigning IGD.  There's
no standard and the hardware folks change things as they please.  Maybe
the QEMU code doesn't do quite the right thing for your hardware
generation.  IIRC I developed the QEMU quirks on Broadwell and older
hardware and I don't have the time or hardware access to tweak it for
every new chip.  Intel also can't seem to maintain a consistent story
about whether they're reducing or increasing their dependencies on
external components like chipset registers, stolen memory, and firmware
tables.  I expect the best hope for reliable IGD in a VM with physical
display output might come through GVT-g (ie. vGPU), but the physical
display part of that isn't supported yet and I don't see a strong
commitment to enable GVT-g across the product line.  Thanks,

Alex

[1]https://git.qemu.org/?p=qemu.git;a=commit;h=c4c45e943e519f5ac220f7af1afb2a0025d03c54

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-06-03 Thread Alex Williamson
On Mon, 3 Jun 2019 14:38:49 -0700
Micah Morton  wrote:

> Hi Alex,
> 
> Could you remind me whether there is a minimum recommended kernel
> version to be running in the VM guest when doing GPU passthrough?
> 
> I'm fine running 4.14 in the host, but was looking to see if I could
> run 4.4 in the guest and couldn't remember if it is advised to use a
> newer kernel in the guest or if there is any reason to have the guest
> kernel match the host kernel version?

I don't recall any specific guest version dependency.  I've got an old
4.9 guest around and that's about the only direct IGD assignment VM I
test on a semi-regular basis.  I do keep that host relatively updated,
so it's currently a 5.0/4.9 combination.  No reason to keep them in sync
that I know of.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Passthru problems with a Vega GPU

2019-05-30 Thread Alex Williamson
On Thu, 30 May 2019 19:24:03 +0100
James Courtier-Dutton  wrote:

> On Thu, 30 May 2019 at 18:54, James Courtier-Dutton 
> wrote:
> 
> > lspci -vvv on host:
> > 43:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> > Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3) (prog-if 00 [VGA controller])
> > Subsystem: ASUSTeK Computer Inc. Vega 10 XT [Radeon RX Vega 64]
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0, Cache Line Size: 64 bytes
> > Interrupt: pin A routed to IRQ 107
> > NUMA node: 1
> > Region 0: Memory at 8000 (64-bit, prefetchable) [size=256M]
> > Region 2: Memory at 9000 (64-bit, prefetchable) [size=2M]
> > Region 4: I/O ports at 3000 [size=256]
> > Region 5: Memory at 9fe0 (32-bit, non-prefetchable) [size=512K]
> > Expansion ROM at 9fe8 [disabled] [size=128K]
> >
> > lspci -vvv on guest VM:
> > 07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> > Vega 10 XT [Radeon RX Vega 64] (rev c3) (prog-if 00 [VGA controller])
> > Subsystem: ASUSTeK Computer Inc. Vega 10 XL/XT [Radeon RX Vega 56/64]
> > Physical Slot: 0-6
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > Stepping- SERR+ FastB2B- DisINTx+
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0, Cache Line Size: 64 bytes
> > Interrupt: pin A routed to IRQ 42
> > Region 0: Memory at 4 (64-bit, prefetchable) [size=8G]
> > Region 2: Memory at 3 (64-bit, prefetchable) [size=2M]
> > Region 4: I/O ports at d000 [size=256]
> > Region 5: Memory at fdc0 (32-bit, non-prefetchable) [size=512K]
> > Expansion ROM at fdc8 [disabled] [size=128K]
> >
> >
> > For "Region 0", why is it "size=8G" in guest, but "size=256M" on host
> >
> > When I try to use the card in the guest, I get errors like:
> > [  137.290586] amdgpu :07:00.0: [gfxhub] no-retry page fault (src_id:0
> > ring:158 vmid:8 pasid:32769, for process  pid 0 thread  pid 0)
> > [  137.292141] amdgpu :07:00.0:   in page starting at address
> > 0x7f983d952000 from 27
> > [  137.293164] amdgpu :07:00.0:
> > VM_L2_PROTECTION_FAULT_STATUS:0x0080093C
> > [  137.294065] Evicting PASID 32769 queues
> >
> > The same linux kernel 5.1.5 is on the host and the guest.
> >
> > Now, the GPU has 8GB of RAM, but I don't see how the guest memory window
> > can be larger than the host can do.
> >
> > Can anyone help?
> >
> > Kind Regards
> >
> > James
> >
> >
> >  
> Some more info.
> The guest, before the amdgpu driver is loaded:
> Region 0: Memory at d000 (64-bit, prefetchable) [size=256M]
> The guest, after the amdgpu driver is loaded:
> Region 0: Memory at 4 (64-bit, prefetchable) [size=8G]
> 
> So, the amdgpu driver is trying to expand the window to 8G, and it succeeds
> in the guest, and it re-programs the bridge and the gpu pci windows to
> match.
> But, it does not manage to do the same to the host PCI device.
> 
> Is there any test I can do to force a limit of 256M in the guest?

Hmm, resizable BAR support on the GPU?  (lspci -vvv)  If it's done
through the PCIe capability, it should be read-only.  Try giving the
QEMU patch below a try for QEMU to entirely hide the capability on the
device.  I'm confused how the guest driver could be activating such a
change, but the window through which we access the device doesn't
change dynamically from this, so it's not surprising that the device
wouldn't work correctly if the driver thinks it has direct access to
that memory.  I haven't personally seen devices with resizeable BARs to
implement the support.  If you load an unload the amdgpu driver on the
host, by chance will it also increase the BAR size and maybe leave it
increased?  Thanks,

Alex

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8cecb53d5cf6..ac9ef9323ef4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2118,6 +2118,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
         case 0: /* kernel masked capability */
         case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
         case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
+        case PCI_EXT_CAP_ID_REBAR:
             trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
             break;
         default:

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] [PATCH] PCI: Mark Intel bridge on SuperMicro Atom C3xxx motherboards to avoid bus reset

2019-05-29 Thread Alex Williamson
On Wed, 29 May 2019 17:03:07 -0500
Bjorn Helgaas  wrote:

> [+cc Alex]
> 
> On Fri, May 24, 2019 at 05:31:18PM +0200, Maik Broemme wrote:
> > The Intel PCI bridge on SuperMicro Atom C3xxx motherboards do not
> > successfully complete a bus reset when used with certain child devices.
> > After the reset, config accesses to the child may fail. If assigning
> > such device via VFIO it will immediately fail with:
> > 
> >   vfio-pci :01:00.0: Failed to return from FLR
> >   vfio-pci :01:00.0: timed out waiting for pending transaction;
> >   performing function level reset anyway  
> 
> I guess these messages are from v4.13 or earlier, since the "Failed to
> return from FLR" text was removed by 821cdad5c46c ("PCI: Wait up to 60
> seconds for device to become ready after FLR"), which appeared in
> v4.14.
> 
> I suppose a current kernel would fail similarly, but could you try it?
> I think a current kernel would give more informative messages like:
> 
>   not ready XXms after FLR, giving up
>   not ready XXms after bus reset, giving up
> 
> I don't understand the connection here: the messages you quote are
> related to FLR, but the quirk isn't related to FLR.  The quirk
> prevents a secondary bus reset.  So is it the case that we try FLR
> first, it fails, then we try a secondary bus reset (does this succeed?
> you don't mention an error from it), and the device remains
> unresponsive and VFIO assignment fails?
> 
> And with the quirk, I assume we still try FLR, and it still fails.
> But we *don't* try a secondary bus reset, and the device magically
> works?  That's confusing to me.

As a counter point, I found a system with this root port in our test
environment.  It's not ideal as this root port has a PCIe-to-PCI bridge
downstream of it with a Matrox graphics downstream of that.  I can't
use vfio-pci to reset this hierarchy, but I can use setpci, ex:

# lspci -nnvs 00:09.0
00:09.0 PCI bridge [0604]: Intel Corporation Device [8086:19a4] (rev 11) 
(prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 26
Memory at 108000 (64-bit, non-prefetchable) [size=128K]
Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
I/O behind bridge: None
Memory behind bridge: 8400-848f [size=9M]
Prefetchable memory behind bridge: 8200-83ff [size=32M]
# lspci -nnvs 03:00.0
03:00.0 PCI bridge [0604]: Texas Instruments XIO2000(A)/XIO2200A PCI 
Express-to-PCI Bridge [104c:8231] (rev 03) (prog-if 00 [Normal decode])
Flags: fast devsel
Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
I/O behind bridge: -0fff [size=4K]
Memory behind bridge: -000f [size=1M]
Prefetchable memory behind bridge: -000f [size=1M]

(resources are reset from previous experiments)

# setpci -s 00:09.0 3e.w=40:40
# lspci -nnvs 03:00.0
03:00.0 PCI bridge [0604]: Texas Instruments XIO2000(A)/XIO2200A PCI 
Express-to-PCI Bridge [104c:8231] (rev ff) (prog-if ff)
!!! Unknown header type 7f

(bus in reset, config space unavailable, EXPECTED)

# setpci -s 00:09.0 3e.w=0:40
[root@intel-harrisonville-01 devices]# lspci -nnvs 03:00.0
03:00.0 PCI bridge [0604]: Texas Instruments XIO2000(A)/XIO2200A PCI 
Express-to-PCI Bridge [104c:8231] (rev 03) (prog-if 00 [Normal decode])
Flags: fast devsel
Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
I/O behind bridge: -0fff [size=4K]
Memory behind bridge: -000f [size=1M]
Prefetchable memory behind bridge: -000f [size=1M]

(bus out of reset, downstream config space is available again)

I'm also confused about the description of this device:

On Fri, 24 May 2019 20:41:13 +0200 Maik Broemme  wrote:
> Also I've tried a PCI-E switch from PLX technology, sold by MikroTik, the
> RouterBoard RB14eU. It exports 4 Mini PCI ports in one PCI-E port and
> I tried it with one card and multiple cards.
> 
> All these devices start to work once I enabled the bus reset quirk. The
> RB14eU even allows to assign the individual Mini PCI-E ports to
> different VMs and survive independent resets behind the PLX bridge.

To me this describes a topology like:

[RP]---[US]-+-[DS]--[EP]
            +-[DS]--[EP]
            +-[DS]--[EP]
            \-[DS]--[EP]

(RootPort/UpstreamSwitch/DownstreamSwitch/EndPoint)

We can only assign endpoints to VMs through vfio, therefore if we
need to reset the EP via a bus reset, that reset would occur at the
downstream switch point, not the root port.  It doesn't make sense that
a quirk at the RP would resolve anything about this use case.

Also, per the Intel datasheet, this is not the only root port in this
processor and presumably they'd all work the same way, so handling one
ID as a special case seems wrong regardless.  Thanks,

Alex

> > Device will disappear from PCI device list:
> 

Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-29 Thread Alex Williamson
On Wed, 29 May 2019 09:25:59 -0700
Micah Morton  wrote:

> So as I mentioned, the ChromeOS firmware writes the location of the
> OpRegion to the ASLS PCI config register
> (https://github.com/coreboot/coreboot/blob/master/src/drivers/intel/gma/opregion.c#L88).
> The i915 driver then gets the address for the OpRegion from that
> register here: 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/intel_opregion.c#L910.
> This all works for Chrome OS, but when we run a VM with SeaBIOS the
> ASLS PCI config register doesn't get written with the location of the
> OpRegion.:
> [0.263640] in i915_driver_init_hw (I added this)
> ...
> [0.263922] in intel_opregion_setup (and this)
> [0.263954] graphic opregion physical addr: 0x0 <-- This is
> supposed to point to the OpRegion, not be zero.
> [0.263954] ACPI OpRegion not supported!
> ...
> [0.267727] Failed to find VBIOS tables (VBT)
> 
> I'm also not sure if the OpRegion is actually in VM memory or not. Do
> you think I need to find a way to put the OpRegion in VM memory as we
> have seen coreboot (Chrome OS firmware) do above? Or should using
> "x-igd-opregion=on" somehow ensure that the OpRegion makes it into VM
> memory? Clearly I at least need to find a way to set that ASLS PCI
> config register in the VM or modify the i915 driver that runs in the
> guest so it can find the OpRegion.

In QEMU, vfio_pci_igd_opregion_init() adds the opregion to a fw_cfg
file "etc/igd-opregion" and makes the (virtual) ASLS register
writable.  Then in SeaBIOS, any Intel vendor ID, PCI class VGA device
will trigger the intel_igd_setup() function, which looks for the fw_cfg
file, allocates space for it, and writes the GPA back to the ASLS
register.  That's at least how it's supposed to work, which again
reminds me for the umpteenth time that x-igd-opregion only works with
SeaBIOS as OVMF has rejected this support in favor of an option ROM
based solution, which Intel never provided.  I think you're using
SeaBIOS though, so as long as that's not an ancient version it
should do the little dance here.  The ASLS is writable though, we don't
do any write-once tricks, so something could blindly stomp on it.  You
might enable logging in SeaBIOS, it will emit some spew for the
OpRegion support.  You could also enable tracing to see the write of
the ASLS into QEMU.  Thanks,

Alex
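As a quick sanity check from inside the guest, you can read back the
ASLS register; assuming it sits at its usual 0xfc config offset on IGD,
a non-zero value means the firmware stored an OpRegion address there:

setpci -s 00:02.0 fc.l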

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-28 Thread Alex Williamson
On Tue, 28 May 2019 09:35:16 -0700
Micah Morton  wrote:

> Ah ok thanks!
> 
> The qemu command line i was using is here: `qemu-system-x86_64
> -chardev stdio,id=seabios -device
> isa-debugcon,iobase=0x402,chardev=seabios -m 2G -smp 2 -M pc -vga none
> -usbdevice tablet -cpu host,-invpcid,-tsc-deadline,check -drive
> 'file=/path/to/image.bin,index=0,media=disk,cache=unsafe,format=raw'
> -enable-kvm -device
> vfio-pci,x-igd-opregion=on,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2,rombar=0
> -device 'virtio-net,netdev=eth0' -netdev
> 'user,id=eth0,net=10.0.2.0/27,hostfwd=tcp:127.0.0.1:9222-:22'`
> 
> It didn't work, but now at least I know why:
> [0.316117] i915 :00:02.0: No more image in the PCI ROM
> [0.316261] [drm] Failed to find VBIOS tables (VBT)
> 
> If I can expose the VBT to the VM maybe it will work :)

Hmm, looking at i915 it seems it didn't find this VBT thing in the
OpRegion so tried to look at the ROM, which comments indicate would
only be the VBT location on an older device.  QEMU should fail if
x-igd-opregion=on is specified but the host kernel didn't provide an
OpRegion at all, so we've at least done some minimal sanity checking at
the host kernel before exposing it, but maybe the OpRegion is missing
some things on this chrome device vs a standard pc?  Maybe Chrome OS
uses a modified i915 driver that doesn't depend on it so the firmware
guys stripped it?  You could write a minimal vfio driver to dump
the opregion data if you want to parse it by hand.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Fwd: [systemd-devel] Proper way for systemd service to wait mdev gvt device initialization

2019-05-26 Thread Alex Williamson
On Sun, 26 May 2019 21:28:36 +0300
Alex Ivanov  wrote:

> Could Intel fix that?

I won't claim that mdev-core is bug free in this area, but it's
probably worth noting that mdev support isn't necessarily a fundamental
feature of the parent device, it could theoretically be enabled much
later than the add uevent for the parent device itself.  We may instead
want to trigger a change uevent when it becomes available.

Also in this space, I've been working on a very preliminary mediated
device management and persistence utility:

https://github.com/awilliam/mdevctl

It works around this issue by simply waiting a short time for mdev
support to appear after the device is added.  This also could be much
more deterministic with a change uevent.  Perhaps it's useful for what
you're trying to achieve though.  Thanks,

Alex

>  Forwarded message 
> 20.05.2019, 15:18, "Andrei Borzenkov" :
> 
> On Mon, May 20, 2019 at 10:08 AM Mantas Mikulėnas  wrote:
> >  On Sun, May 19, 2019 at 9:50 PM Alex Ivanov  wrote:  
> >>  Hello.
> >>  What is the proper way to do that? I have a unit that creates gvt device 
> >> in the system
> >>
> >>  ExecStart = "sh -c 'echo a297db4a-f4c2-11e6-90f6-d3b88d6c9525 > 
> >> /sys/bus/pci/devices/:00:02.0/mdev_supported_types/i915-GVTg_V5_8/create'";
> >>  ExecStop = "sh -c 'echo 1 > 
> >> /sys/bus/pci/devices/:00:02.0/a297db4a-f4c2-11e6-90f6-d3b88d6c9525/remove'";
> >>   
> >
> >  Personally, I would use an udev rule:
> >
> >  ACTION=="add", SUBSYSTEM=="pci", ENV{PCI_SLOT_NAME}==":00:02.0", 
> > ATTR{mdev_supported_types/i915-GVTg_V5_8/create}="a297db4a-f4c2-11e6-90f6-d3b88d6c9525"
> >   
> 
> There is a race condition here, driver creates
> .../mdev_supported_types after it has registered device so udev may
> process event before directory is available.
> 
> >  Though on the other hand, a service is a good choice if you want to 
> > `systemctl stop` it later on.
> >
> >  ACTION=="add", SUBSYSTEM=="pci", ENV{PCI_SLOT_NAME}==":00:02.0", 
> > ENV{SYSTEMD_WANTS}+="create-gvt.service"
> >  
> >>  Ideally I would to like to start this service when :00:02.0 device 
> >> appears in the system, but the problem is that 
> >> /sys/bus/pci/devices/:00:02.0/mdev_supported_types/ tree is populated 
> >> later, so my service will fail.
> >>
> >>  So the question what is the proper way to fix that.  
> >
> >  If the driver doesn't populate its sysfs entries in time, maybe it at 
> > least generates 'change' uevents? (udevadm --monitor)  
> 
> I would tentatively say this is driver bug. This directory is created
> during initial device setup, not in response to some event later. From
> https://github.com/torvalds/linux/blob/master/Documentation/driver-model/device.txt:
> 
> --><--  
> As explained in Documentation/kobject.txt, device attributes must be
> created before the KOBJ_ADD uevent is generated.
> --><--  
> 
> Note that some drivers even disable KOBJ_ADD notification during
> device_register() and trigger it manually later, after sysfs layout is
> complete. I cannot evaluate whether this directory can be created and
> populated before device_register().
> 
> >  If there are no uevents either, well, there's nothing you can do from 
> > systemd's side. (Other than making a script that loops repeatedly checking 
> > "is it there yet? is it there yet?")  
> 
> Should really be fixed on kernel side.
>  End of forwarded message 
> 
> ___
> vfio-users mailing list
> vfio-users@redhat.com
> https://www.redhat.com/mailman/listinfo/vfio-users


___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-24 Thread Alex Williamson
On Fri, 24 May 2019 14:10:03 -0700
Micah Morton  wrote:

> On Fri, May 24, 2019 at 12:50 PM Alex Williamson
>  wrote:
> >
> > On Fri, 24 May 2019 11:12:41 -0700
> > Micah Morton  wrote:
> >  
> > > I’ve been working with an Intel Chrome OS device to see if integrated
> > > GPU passthrough works. The device is a 7th Generation (Kaby Lake)
> > > Intel Core i5-7Y57 with HD Graphics 615. So no discrete GPU. I think
> > > iGPU passthrough should work on this device.
> > >
> > > Initializing the graphics hardware has proven to be the trickiest part
> > > in all of this, and I have a question I thought someone on this list
> > > might be able to answer:
> > >
> > > Why does vfio enforce that I _need_ a VGA rom to be available in order
> > > to boot the guest in legacy passthrough mode
> > > (https://github.com/qemu/qemu/blob/master/hw/vfio/pci-quirks.c#L1616)?  
> >
> > "Legacy" mode is largely about setting up components to make the VGA
> > ROM work.  Some of the components of legacy mode are faking stolen
> > memory (which really only works for the ROM, not the OS driver), adding
> > the LPC bridge (not sure of the OS driver dependency on this, but I
> > think it can be manually added if needed), and adding the OpRegion
> > (which can also be manually added).  If you only care about OS level
> > graphics initialization then you probably only need the OpRegion
> > support, which Intel vGPU also needs but Intel can't provide a
> > consistent story about how it should work.  So why do you think you
> > need legacy assignment mode?  Granted UPT mode is just about a
> > forgotten ideal at Intel, but legacy mode doesn't really solve the
> > stolen memory dependencies for anyone but the VGA ROM.  
> 
> Ah ok. Yeah I have no reason to think I need "legacy" mode apart from
> the fact that I wanted the GPU to be the "primary and exclusive
> graphics device in the VM". Is the only alternative UPT mode, which
> "does not support direct video output"
> (https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/docs/igd-assign.txt#14)?
> So I can definitely pass through the device to the guest and get the
> i915 driver in the guest to attach to it on some level:
> 
> ~ # lspci -nnk (in the guest)
> ...
> 00:02.0 VGA compatible controller [0300]: Intel Corporation Device
> [8086:591e] (rev 02)
> Subsystem: Intel Corporation Device [8086:2212]
> Kernel driver in use: i915
> 
> I guess the question for me now is whether I need OpRegion support in
> the guest (or anything else) for the guest driver to successfully use
> the GPU (I just see a black screen right now and haven't yet enabled
> much in the way of meaningful logs for i915). Do you have any pointers
> to resources on how the OpRegion can be added? Or am I better off
> contacting Intel engineers?

If you're using QEMU by command line, just add x-igd-opregion=on to the
set of device parameters for the vfio-pci device.  libvirt does not
provide a supported way to add this (the 'x-' prefix indicates an
experimental option and is used here because it's rather a hack), but
you can use <qemu:commandline> elements to pass it, such as documented here:
http://vfio.blogspot.com/2016/09/passing-qemu-command-line-options.html
It's entirely possible that this alone will be enough to light up the
display.  Thanks,

Alex
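On the plain QEMU command line that boils down to one extra device
property, something like (host address assumed):

-device vfio-pci,host=00:02.0,x-igd-opregion=on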

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Question about integrated GPU passthrough and initialization

2019-05-24 Thread Alex Williamson
On Fri, 24 May 2019 11:12:41 -0700
Micah Morton  wrote:

> I’ve been working with an Intel Chrome OS device to see if integrated
> GPU passthrough works. The device is a 7th Generation (Kaby Lake)
> Intel Core i5-7Y57 with HD Graphics 615. So no discrete GPU. I think
> iGPU passthrough should work on this device.
> 
> Initializing the graphics hardware has proven to be the trickiest part
> in all of this, and I have a question I thought someone on this list
> might be able to answer:
> 
> Why does vfio enforce that I _need_ a VGA rom to be available in order
> to boot the guest in legacy passthrough mode
> (https://github.com/qemu/qemu/blob/master/hw/vfio/pci-quirks.c#L1616)?

"Legacy" mode is largely about setting up components to make the VGA
ROM work.  Some of the components of legacy mode are faking stolen
memory (which really only works for the ROM, not the OS driver), adding
the LPC bridge (not sure of the OS driver dependency on this, but I
think it can be manually added if needed), and adding the OpRegion
(which can also be manually added).  If you only care about OS level
graphics initialization then you probably only need the OpRegion
support, which Intel vGPU also needs but Intel can't provide a
consistent story about how it should work.  So why do you think you
need legacy assignment mode?  Granted UPT mode is just about a
forgotten ideal at Intel, but legacy mode doesn't really solve the
stolen memory dependencies for anyone but the VGA ROM.

> The Intel device I’m working with normally doesn’t initialize the GPU
> at all until the kernel is running, at which point the i915 driver
> does the initialization. So there is never actually any VGA rom
> anywhere on the system at any time for me to grab. I suppose I maybe
> could get one for this hardware from Intel or the web somewhere, but
> seems like if the device can normally initialize the GPU hardware from
> the kernel driver then it wouldn’t be too unreasonable to pass through
> the GPU and let the guest (with the same i915 driver) initialize the
> GPU without using a VGA rom. Do you think it would be reasonable for
> me to find a way to patch qemu/vfio/SeaBIOS to allow for this
> scenario?

I'm still not sure what you're missing versus the options we provide
otherwise.  If your dependency is actually on stolen memory, then
trying to re-enable the hack for legacy mode is likely not to work for
the OS driver.  The fix for that would mean having the host dictate to
the VM an address space hole such that stolen memory could be identity
mapped into the VM (plus safety things like clearing that memory
between users to avoid leaking data).  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] [PATCH] PCI: Mark Intel bridge on SuperMicro Atom C3xxx motherboards to avoid bus reset

2019-05-24 Thread Alex Williamson
On Fri, 24 May 2019 17:31:18 +0200
Maik Broemme  wrote:

> The Intel PCI bridge on SuperMicro Atom C3xxx motherboards do not
> successfully complete a bus reset when used with certain child devices.

What are these 'certain child devices'?  We can't really regression
test to know if/when the problem might be resolved if we don't know
what to test.  Do these devices reset properly in other systems?  Are
there any devices that can do a bus reset properly on this system?  We'd
really only want to blacklist bus reset on this root port(?) if this is
a systemic problem with the root port, which is not clearly proven
here.  Thanks,

Alex

> After the reset, config accesses to the child may fail. If assigning
> such device via VFIO it will immediately fail with:
> 
>   vfio-pci :01:00.0: Failed to return from FLR
>   vfio-pci :01:00.0: timed out waiting for pending transaction;
>   performing function level reset anyway
> 
> Device will disappear from PCI device list:
> 
>   !!! Unknown header type 7f
>   Kernel driver in use: vfio-pci
>   Kernel modules: ddbridge
> 
> The attached patch will mark the root port as incapable of doing a
> bus level reset. After that all my tested devices survive a VFIO
> assignment and several VM reboot cycles.
> 
> Signed-off-by: Maik Broemme 
> ---
>  drivers/pci/quirks.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 0f16acc323c6..86cd42872708 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3433,6 +3433,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 
> 0x0034, quirk_no_bus_reset);
>   */
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
>  
> +/*
> + * Root port on some SuperMicro Atom C3xxx motherboards do not successfully
> + * complete a bus reset when used with certain child devices. After the
> + * reset, config accesses to the child may fail.
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x19a4, quirk_no_bus_reset);
> +
>  static void quirk_no_pm_reset(struct pci_dev *dev)
>  {
>   /*

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Passthru of one GPU with a PC with 2 identical GPUs installed

2019-05-22 Thread Alex Williamson
On Wed, 22 May 2019 21:47:20 +0100
James Courtier-Dutton  wrote:

> On Wed, 22 May 2019 at 00:11, Alex Williamson 
> wrote:
> 
> >
> > I think a better approach would be to extend the pci= kernel command
> > line option to include driver_override support, perhaps something like:
> >
> > pci=...,driver_overrides=<device>=<driver>;<device>=<driver>,...
> >
> >
> > OK, lets assume for a moment, that I will try to implement the above.  
> Mindful of the fact that if there is a bug in this code, it might upset a
> far wider amount of people than my previous suggestion that would only
> affect vfio-pci users.
> I understand that rather than the driver selectively ignoring a device
> probe for a specific device, this method would restrict the probe call
> itself, to only call the mentioned driver.
> 
> Use Case 1: I can understand how the pci=...  line can go in the kernel
> boot command line and kind of force the driver_binding of pci_dev and its
> driver.
> Use Case 2: What I am not so sure about is how this behavior can be
> modified after boot. Say, after boot, someone wished to un-bind it from one
> driver, and then bind it to a different one, how would we undo the
> driver_overrides so that it could then call a device probe on a different
> driver?

PCI devices expose driver_override in sysfs, this mechanism would
simply provide an initial value for that rather than null.  The device
can be returned to the normal driver by unbinding, writing an empty
string to the driver_override for the device, and forcing a driver
probe.  The driverctl utility already facilitates this, see 'driverctl
unset-override'.  What cannot be undone afaik is should the device be
removed and re-added, the override would be put back in place.  Seems
like a narrow use case for that though.
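Spelled out by hand, that return-to-default sequence looks roughly like
this (a sketch; the address is a placeholder):

echo 0000:43:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind       # release from vfio-pci
echo > /sys/bus/pci/devices/0000:43:00.0/driver_override       # clear the override
echo 0000:43:00.0 > /sys/bus/pci/drivers_probe                 # let the default driver claim it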

> Use Case 3: A pci_dev is set on the command line, but it is not yet present
> in the system, but we wish the setting to be there for when that device is
> hot-plugged.

Should work.  The override should be part of the device discovery
process before any driver would have a chance to claim it, it shouldn't
matter whether the device exists from boot or is added later.
 
> Are there any other use cases I should consider?
> Might we wish to consider an option so the user can, for a specific device,
> unbind it from one driver and bind it to a different one.

Already exists, this is exactly why driver_override was added to the
kernel.

> I have also noticed this interface:
> https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci
> Where :00:19.0 format is used for the bind/unbind methods, and not your
> device path suggestion.
> For consistency, would we need to add support for device path to the
> bind/unbind methods?

It could be added, but I don't think it's necessary or a prerequisite.
The bind/unbind interfaces are not persistent, the device has already
been given this address in the current environment, that's not the case
at boot time.

This has been something on my todo list for a while, so I appreciate
you taking a swing at it.  I'm willing to help or maybe take over if
it becomes too much.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Passthru of one GPU with a PC with 2 identical GPUs installed

2019-05-21 Thread Alex Williamson
On Tue, 21 May 2019 23:45:22 +0100
James Courtier-Dutton  wrote:

> On Sun, 19 May 2019 at 09:30, James Courtier-Dutton 
> wrote:
> 
> > Hi,
> >
> > I have a PC with two identical GPUs.
> > One I wish to hand over to vfio and do passthru with, the other I wish the
> > host to use.
> > I know about commands like:
> > echo 1002 687f  >/sys/bus/pci/drivers/vfio-pci/new_id
> >
> > But those will cause both GPUs to be claimed by vfio.
> > I would prefer to do it by slot, e.g. echo 09:00.0  
> > >/sys/bus/pci/drivers/vfio-pci/new_id  
> > But, I cannot see how to do it by slot instead of PCI-ID.
> >
> >  
> How about something like the attached:
> It lets you further filter down what vfio-pci will claim by entering them as
> module options.

For submitting patches, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst

I think a better approach would be to extend the pci= kernel command
line option to include driver_override support, perhaps something like:

pci=...,driver_overrides=<device>=<driver>;<device>=<driver>,...

This would follow the disable_acs_redir= option as an example where the
device is specified by the path to the device, such that it identifies a
persistent device regardless of bus number changes (see lspci -P).  It
would also make use of the existing driver_override functionality for
all PCI devices rather than unique to the vfio-pci driver.  In fact,
several bus types support driver_override, so we might want to consider
whether the format should be something like:

driver_overrides=pci:<device>=<driver>,pci:<device>=<driver>
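For reference, the existing per-device driver_override file this would
feed can already be driven by hand today; a rough sketch, with a
placeholder address and no persistence across re-enumeration:

echo vfio-pci > /sys/bus/pci/devices/0000:09:00.0/driver_override
echo 0000:09:00.0 > /sys/bus/pci/drivers_probe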

I find having two levels of filtering, like in the proposed patch,
confusing and difficult to use (also it ignores PCI domains, see lspci
-D).  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] PXE boot failed if GPU mounted.

2019-05-20 Thread Alex Williamson
On Tue, 21 May 2019 10:00:46 +0800
Eddie Yen  wrote:

> Yes. The QEMU version is 2.11.2 (qemu-2.11.2-4.fc28)
> How can I doing the separate test by using QEMU 4.0?

QEMU 4.0 is available for fc28 in the virt preview repo:

https://fedoraproject.org/wiki/Virtualization_Preview_Repository

Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] PXE boot failed if GPU mounted.

2019-05-20 Thread Alex Williamson
On Tue, 21 May 2019 09:07:34 +0800
Eddie Yen  wrote:

> Hi Alex,
> 
> Here's VM profile.
> Basically this VM is created from virt-manager, and guest OS is Ubuntu so
> no need to add capabilities for Windows environment.
> 
> https://pastebin.com/VZM63ZsC
> 
> And GPU for this VM is Tesla P100. But we also found the same issue on
> another GPU which is Tesla P4.

The VM is QEMU 2.11 machine type, are you using QEMU 2.11?  If so does
this also occur on QEMU 4.0?  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] PXE boot failed if GPU mounted.

2019-05-20 Thread Alex Williamson
On Mon, 20 May 2019 15:52:56 +0800
Eddie Yen  wrote:

> Hi everyone,
> 
> I'm not sure it's VFIO or pure KVM issue on here.
> 
> Now we have one GPU server which contains few Tesla GPUs. Installed Fedora
> 28 and using VFIO to passthrough GPU into VM.
> Everything is OK, except one annoying thing: we use PXE to install the
> guest OS, but PXE can't get any DHCP IP if the GPU is mounted. We have to un-mount
> the GPU first, then re-mount it once OS installation is completed.
> 
> For PXE, we use host bridge to connect host VLAN, then let PXE boot vnet
> going to connect that bridge.
> I'm not sure the root cause, probably from PCI setup inside VM.
> Does anyone have idea?

Can you post the VM config?  What model Tesla?  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Avoiding VFIO NOIOMMU taint in safe situations

2019-04-04 Thread Alex Williamson
On Wed, 3 Apr 2019 23:31:22 -0500
Shawn Anastasio  wrote:

> On 4/3/19 10:23 PM, Alex Williamson wrote:
> > On Wed, 3 Apr 2019 22:01:14 -0500
> > Shawn Anastasio  wrote:
> >   
> >> Hello all,
> >>
> >> I'm currently writing an application that makes use of Qemu's ivshmem
> >> shared memory mechanism, which exposes shared memory regions from the
> >> host via PCI-E BARs. MSI-X interrupts that are tied to host eventfds are
> >> also exposed.
> >>
> >> Since ivshmem doesn't have an in-tree kernel driver, I have been using
> >> VFIO's NOIOMMU mode to interface with the device. This works wonderfully
> >> for both BAR mapping and MSI-X interrupts. Unfortunately though, binding
> >> the ivshmem device to vfio_pci to use it in this way results in a kernel
> >> taint. I understand that this is because without an IOMMU, VFIO/Linux
> >> has no way of preventing devices from performing malicious access to
> >> other system memory. In the case of ivshmem though, the device does not
> >> have any DMA capabilities.  
> > 
> > The MSI-X interrupt is a DMA.  
> I hadn't realized this. That means then without an IOMMU, an
> MSI-X capable device is capable of reading/writing arbitrary
> memory?

Writing at least, this is why even with an IOMMU there's an opt-in if
that IOMMU lacks interrupt remapping support.
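That opt-in is the allow_unsafe_interrupts parameter of
vfio_iommu_type1, e.g. as a modprobe.d line (only appropriate if you
accept the interrupt spoofing risk it implies):

options vfio_iommu_type1 allow_unsafe_interrupts=1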

> >> This has created a situation in which the
> >> safest possible way to access the device (a kernel driver would be
> >> inherently less safe, UIO can't access the MSI-X functionality of the
> >> device) results in a kernel taint, when other, less safe methods don't.  
> > 
> > MSI-X support in UIO was rejected because MSI-X is a DMA and UIO does
> > not support devices that do DMA.  Vfio-noiommu was a compromise to
> > allow using the vfio API, but recognizing that it's inherently unsafe.
> >   
> >> In light of this, I propose a change to the VFIO framework that would
> >> allow use cases such as this without a kernel taint. One solution I see
> >> is only tainting when PCI devices with DMA capabilities are bound to
> >> VFIO. It is my understanding that a device's DMA capability can be
> >> determined by checking the Bus Mastering flag in the device's PCI
> >> configuration space, so something like this should be feasible.  
> > 
> > The bus master bit is not a capability for probing, enabling bus master
> > allows a device to perform DMA, including signaling via MSI
> > interrupts.  No bus master, no MSI.
> >   
> >> Perhaps an additional NOIOMMU mode could be introduced which only allows
> >> devices which meet this criteria, too (VFIO_NOIOMMU_NODMA_IOMMU?).
> >> Along with a separate Kconfig option, this would allow users to enable
> >> this safe usage at kernel build time, while still preventing the
> >> possibility of an unsafe DMA capable device from being used.
> >>
> >> I'm curious to hear feedback on this. If this is something that can be
> >> merged, I'd be more than happy to write a patch.  
> > 
> > Add a vIOMMU to your VM configuration (ie. intel-iommu) and use proper
> > vfio in the guest.  Thanks,  
> I had looked into this, but my application also targets ppc64, and a
> cross-platform is therefore necessary.
> 
> Strangely enough when booting a VM on ppc64, the kernel /does/ report
> an IOMMU, but there's only 1 group that contains all devices, so it
> doesn't seem usable.

Yes, AIUI ppc64 PAPR machines always have an IOMMU and there is a
SPAPR IOMMU model in vfio.  Maybe work with QEMU ppc64 developers to
figure out how the ivshmem device can be in its own group.  This
probably requires configuring the VM with another PCI host bridge and
attaching the ivshmem device under it.

> I guess it all boils down to this - does this usage of VFIO-NOIOMMU
> with an MSI-X device constitute a security risk? If so, it seems
> I'll have no choice but to write a kernel driver for a cross-platform
> solution.

There is no property we can detect about a PCI device to determine that
it doesn't support DMA.  All PCI devices have DMA available to them.
Clearly we can't simply enforce that bus master is never enabled
because that breaks your use case of needing MSI interrupts and
presumes devices actually honor that bit and don't have more nefarious
ways of enabling it.  So if we have no way to know the device
capabilities or the intention of the user, or exploitability of the
user, I don't see how we can create a policy that singles out this use
case as trusted.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Avoiding VFIO NOIOMMU taint in safe situations

2019-04-03 Thread Alex Williamson
On Wed, 3 Apr 2019 22:01:14 -0500
Shawn Anastasio  wrote:

> Hello all,
> 
> I'm currently writing an application that makes use of Qemu's ivshmem
> shared memory mechanism, which exposes shared memory regions from the
> host via PCI-E BARs. MSI-X interrupts that are tied to host eventfds are
> also exposed.
> 
> Since ivshmem doesn't have an in-tree kernel driver, I have been using
> VFIO's NOIOMMU mode to interface with the device. This works wonderfully
> for both BAR mapping and MSI-X interrupts. Unfortunately though, binding
> the ivshmem device to vfio_pci to use it in this way results in a kernel
> taint. I understand that this is because without an IOMMU, VFIO/Linux
> has no way of preventing devices from performing malicious access to
> other system memory. In the case of ivshmem though, the device does not
> have any DMA capabilities.

The MSI-X interrupt is a DMA.

> This has created a situation in which the
> safest possible way to access the device (a kernel driver would be
> inherently less safe, UIO can't access the MSI-X functionality of the
> device) results in a kernel taint, when other, less safe methods don't.

MSI-X support in UIO was rejected because MSI-X is a DMA and UIO does
not support devices that do DMA.  Vfio-noiommu was a compromise to
allow using the vfio API, but recognizing that it's inherently unsafe.

> In light of this, I propose a change to the VFIO framework that would
> allow use cases such as this without a kernel taint. One solution I see
> is only tainting when PCI devices with DMA capabilities are bound to
> VFIO. It is my understanding that a device's DMA capability can be
> determined by checking the Bus Mastering flag in the device's PCI
> configuration space, so something like this should be feasible.

The bus master bit is not a capability for probing, enabling bus master
allows a device to perform DMA, including signaling via MSI
interrupts.  No bus master, no MSI.

> Perhaps an additional NOIOMMU mode could be introduced which only allows
> devices which meet this criteria, too (VFIO_NOIOMMU_NODMA_IOMMU?).
> Along with a separate Kconfig option, this would allow users to enable
> this safe usage at kernel build time, while still preventing the
> possibility of an unsafe DMA capable device from being used.
> 
> I'm curious to hear feedback on this. If this is something that can be
> merged, I'd be more than happy to write a patch.

Add a vIOMMU to your VM configuration (ie. intel-iommu) and use proper
vfio in the guest.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] W10: installing RX Vega 56 drivers locks up VM

2019-03-28 Thread Alex Williamson
On Fri, 29 Mar 2019 10:16:19 +0900
小川寿人  wrote:
> 
> "model name=ioh3420" is Intel emulated pcie-root-port.
> defalut "model name=pcie-root-port" is QEMU Paravirtualized pcie-root-port.

There's nothing paravirtualized about pcie-root-port, it's just a
generic emulated root port rather than one that tries to emulate a
specific real device.  I'd suggest the generic one over the
ioh3420, but in either case, a PCIe root port attached to a 440FX host
bridge is an abomination.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] about vfio lpc bridge

2019-03-21 Thread Alex Williamson
On Thu, 21 Mar 2019 18:00:23 +0800 (CST)
fulaiyang   wrote:

> Hi Alex:
> 
> Recently I have become interested in igd passthrough. In the 'igd-assign' text, I
> don't understand why the lpc bridge needs to be created?

To satisfy some versions of the guest driver.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] lspci and vfio_pci_release deadlock when destroy a pci passthrough VM

2019-03-20 Thread Alex Williamson
On Wed, 20 Mar 2019 13:32:33 +
"Wuzongyong (Euler Dept)"  wrote:

> Hi Alex,
> 
> I notice a patch you pushed in https://lkml.org/lkml/2019/2/18/1315
> You said the previous commit you pushed may be prone to deadlock; could you
> please share the details about how to reproduce the deadlock scenario if you
> know it.
> I hit a similar issue where all lspci commands went into D state and
> libvirtd went into Z state when destroying a VM with a GPU passthrough. The stack
> looks like this:
> 
> INFO: task ps:112058 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ps  D  0 112058  1 0x0004
> Call Trace:
>  [] schedule_preempt_disabled+0x29/0x70
>  [] __mutex_lock_slowpath+0xe1/0x170
>  [] mutex_lock+0x1f/0x2f
>  [] pci_bus_save_and_disable+0x37/0x70
>  [] pci_try_reset_bus+0x38/0x80
>  [] vfio_pci_release+0x3d5/0x430 [vfio_pci]
>  [] ? vfio_pci_rw+0xc0/0xc0 [vfio_pci]
>  [] vfio_device_fops_release+0x22/0x40 [vfio]
>  [] __fput+0xec/0x260
>  [] fput+0xe/0x10
>  [] task_work_run+0xaa/0xe0
>  [] do_notify_resume+0x92/0xb0
>  [] int_signal+0x12/0x17
> INFO: task lspci:139540 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> lspci   D  0 139540 139539 0x
> Call Trace:
>  [] schedule+0x29/0x70
>  [] pci_wait_cfg+0xa0/0x110
>  [] ? wake_up_state+0x20/0x20
>  [] pci_user_read_config_dword+0x105/0x110
>  [] pci_read_config+0x114/0x2c0
>  [] ? __kmalloc+0x55/0x240
>  [] read+0xde/0x1f0
>  [] vfs_read+0x9f/0x170
>  [] SyS_pread64+0x92/0xc0
>  [] system_call_fastpath+0x1c/0x21
> 
> It seems that lspci and vfio_pci_release are in deadlock.

pci_dev_lock() will also block PCI config access to the user, but you
don't indicate whether you're running a kernel with the fix above.  In
the case of that fix, the deadlock scenario I'm familiar with is a bus
reset occurring while the device is being released and is also being
unbound from the vfio-pci driver.  For example, echo'ing the device to
the vfio-pci driver unbind, which will take the device lock and block
until the device is released by the user, but when the vfio device file
is closed by the user it triggers a bus reset to return the device to
its initial state, which also tries to take the device lock.  If vfio
was in this deadlock scenario, lspci would also get blocked, but it's
not obvious how lspci and vfio-pci alone might get deadlocked with each
other, if that's the situation you're proposing here.  Thanks,

Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Igd passthrough not working on w10, works with ubuntu

2019-03-11 Thread Alex Williamson
[re-adding vfio-users]

On Mon, 11 Mar 2019 18:23:14 +0100
Cor Saelmans  wrote:

> Thank you Alex for your reply.
> 
> I tested earlier today with a fresh install of my ubuntu host in legacy
> mode and this worked directly without any further customization.
> 
> Just reinstalled my host back to EFI and tried your suggestion, but without
> any result.
> 
> I did not know display output is not supported with upt.
> 
> I reinstalled my host again in legacy mode.

The legacy mode I'm referring to is specifically for the VM, not the
host, see:

https://git.qemu.org/?p=qemu.git;a=blob;f=docs/igd-assign.txt;hb=HEAD

The host can make a difference, though it's usually in the
availability of the vBIOS.  For instance, since we really only support
SeaBIOS in the guest with physical IGD assignment, it may be necessary
to boot the host in legacy mode to dump a legacy-compatible ROM image
for the device.  The ROM image file can then be used for the VM
regardless of whether the host boots via legacy or EFI.  The entire idea of
"support" with physical IGD assignment is a bit of a misnomer; Intel
seems to have mostly abandoned the idea of Universal Passthrough Mode,
and IGD legacy mode is barely supportable by its very nature and the lack
of standards within the hardware.  Intel vGPU is closer to supported,
but does not currently have an option for controlling the physical
display output.  There is QEMU "display" support though, which can make
local graphics fairly efficient, but the output is still controlled by
the host drivers.  Thanks,

Alex
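
A minimal sketch of what a legacy-mode IGD assignment can look like on the
QEMU command line, per the igd-assign.txt document above; the ROM path is a
placeholder, and the full requirements (440FX machine type, SeaBIOS, IGD as
the primary VGA device at guest address 02.0) are described in that document:

qemu-system-x86_64 \
  -machine pc,accel=kvm \
  -vga none \
  -device vfio-pci,host=00:02.0,addr=02.0,romfile=/path/to/igd-legacy.rom \
  ...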

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] "Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff" when attempting to dump PCI ROM

2019-03-11 Thread Alex Williamson
On Mon, 11 Mar 2019 01:46:03 -0400
Nicolas Roy-Renaud  wrote:

> Hey, Alex, thanks for replying.
> 
> It seems like you're right on the Mem- part.
> 
> [user@OCCAM ~]$ lspci -s 07:00.0 -vvv
> 07:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] 
> (rev a1) (prog-if 00 [VGA controller])
>   Subsystem: ASUSTeK Computer Inc. GM204 [GeForce GTX 970]
>   Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR+ FastB2B- DisINTx-
> 
> For comparison, here's the result from last boot after firing up and 
> shutting down a VM using that device:
> 
> [user@OCCAM ~]$ lspci -s 07:00.0 -vvv
> 07:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] 
> (rev a1) (prog-if 00 [VGA controller])
>   Subsystem: ASUSTeK Computer Inc. GM204 [GeForce GTX 970]
>   Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR+ FastB2B- DisINTx-
> 
> Yet when I try enabling the device, I get this :
> 
> [root@OCCAM user]# echo 1 > /sys/bus/pci/devices/0000:07:00.0/enable
> bash: echo: write error: Device or resource busy

You probably have a driver attached to the device; it looks like this
interface won't work in that case.  You could unbind the device from
any driver first.  Alternatively you could manually manipulate the
memory enable bit with setpci:

setpci -s 07:00.0 COMMAND=2:2

Clear after with:

setpci -s 07:00.0 COMMAND=0:2

This won't bring the device out of D3 power state like the enable file
will, so if you still have problems and it's bound to vfio-pci, you
might want to boot with vfio-pci.disable_idle_d3=1 on the kernel
command line.  Thanks,

Alex
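
Combining this with the ROM-dump sequence from the reply further below, one
possible end-to-end sequence (assuming no driver is bound to 07:00.0, or that
it has been unbound first) would be:

setpci -s 07:00.0 COMMAND=2:2                     # set the memory space enable bit
echo 1 > /sys/bus/pci/devices/0000:07:00.0/rom    # make the ROM BAR readable
cat /sys/bus/pci/devices/0000:07:00.0/rom > gpu.rom
echo 0 > /sys/bus/pci/devices/0000:07:00.0/rom
setpci -s 07:00.0 COMMAND=0:2                     # clear the bit again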

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] "Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff" when attempting to dump PCI ROM

2019-03-10 Thread Alex Williamson
On Sun, 10 Mar 2019 18:06:37 -0400
Nicolas Roy-Renaud  wrote:

> I've seen a lot of people recommend that VFIO newcomers flash their 
> GPU if they couldn't get their passthrough working right away, and 
> since I know how potentially risky and avoidable this sort of procedure 
> is (QEMU lets you just pass your own ROM file to the VM to be used 
> instead), I tried to go through the steps myself 
> (even though I've had a working VFIO setup for years) and got something 
> unexpected.
> 
> Attempting to dump the ROM from my guest card freshly after a 
> reboot results in the following error message:
> 
> cat: '/sys/bus/pci/devices/0000:07:00.0/rom': Input/output error
> 
> Accompanied by the following line in dmesg:
> 
> [ 1734.316429] vfio-pci 0000:07:00.0: Invalid PCI ROM header signature: 
> expecting 0xaa55, got 0xffff

If lspci for the device reports:

Control: I/O- Mem- BusMaster- ...

(specifically Mem-), this could be the reason it's failing.  The PCI
ROM BAR is a memory region and memory decode needs to be enabled on the
device in order to get access to it.  Also if you have the device
already bound to vfio-pci, the device might be in a D3 low power state,
which could make that memory region unavailable.  You can use the
'enable' file in sysfs to fix both of these, so your sequence would
look like this:

echo 1 > /sys/bus/pci/devices/0000:07:00.0/enable
echo 1 > /sys/bus/pci/devices/0000:07:00.0/rom
cat /sys/bus/pci/devices/0000:07:00.0/rom > gpu.rom
echo 0 > /sys/bus/pci/devices/0000:07:00.0/rom
echo 0 > /sys/bus/pci/devices/0000:07:00.0/enable

Thanks,
Alex

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] Igd passthrough not working on w10, works with ubuntu

2019-03-10 Thread Alex Williamson
On Sat, 9 Mar 2019 23:22:49 +0100
Cor Saelmans  wrote:

> I have Ubuntu 18 as host.
> 
> 
> Created two guests with virt-manager, setup vfio settings like described in
> several guides.
> 
> 
> GPU: Intel Graphics 655
> 
> 
> guest 1: Ubuntu: Passthrough is working. VM boots and after several seconds
> the Ubuntu boot screen shows up and it is working.
> 
> 
> 
> guest 2: Windows 10: Passthrough is not working. When booting the VM the
> monitor screen flashes and goes to "no signal input". When I look with
> teamviewer on the machine the graphics pci device is working correctly and
> drivers are installed.
> 
> 
> What is going wrong?

Display output is not officially supported in "universal passthrough
mode", but you can try to enable it with the x-igd-opregion=on option for
the device.  The Windows drivers depend on this.  Thanks,

Alex
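
For reference, on the QEMU command line this is just an extra experimental
property on the assigned IGD device; a minimal sketch (the host address
00:02.0 is the usual IGD location and is only an example, and with
virt-manager/libvirt it would typically have to be passed through a
qemu:commandline override):

-device vfio-pci,host=00:02.0,x-igd-opregion=on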

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] (good) working GPU for passthrough?

2019-02-11 Thread Alex Williamson
On Mon, 11 Feb 2019 20:39:07 -0500
Kyle Marek  wrote:

> On 2/10/19 8:59 PM, Kash Pande wrote:
> > On 2019-02-10 5:25 p.m., Kyle Marek wrote:  
> >> When I quit in the QEMU monitor, the image stays on the screen, and no
> >> further host dmesg output is produced.
> >>  
> > You must do a full reset in the guest.  
> 
> Hmmm... other cards (GTX 780, GTX 1060) are reset when quitting from
> the monitor, so I don't have to rely on a graceful guest shutdown to
> avoid rebooting my host.

NVIDIA cards reset properly from a bus reset.
 
> Do I need to wait for a reset quirk to be implemented for my card to
> reset on quit/boot? It shouldn't be a condition to usage that the guest
> needs to gracefully shutdown for the card to not get stuck.

Don't hold your breath.
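
As a quick check on any given card, the kernel only exposes a per-device
reset attribute when it believes some reset mechanism is available; the
address below is a placeholder, and the attribute being present does not
guarantee the reset actually recovers the card:

ls /sys/bus/pci/devices/0000:0a:00.0/reset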

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] (good) working GPU for passthrough?

2019-02-11 Thread Alex Williamson
On Mon, 11 Feb 2019 22:06:36 +0100
Tobias Geiger  wrote:

> > On Thu, Jan 17, 2019 at 11:03:03PM +0100, Tobias Geiger wrote:  
> >> Hello!
> >>
> >> after nearly 5 years of passing through my Radeon HD7800 - it feels old and
> >> slow when used with newer games and 1GB of RAM also doesn't feel right
> >> anymore...
> >>
> >> I tried a VEGA 64 - i was able to live with the ACS patch needed here (Z170
> >> Chipset... not needed with the old HD7800, but whatever...) - but i 
> >> couldn't
> >> stand the reset/FLR/Bug which forces you at least to suspend/resume the 
> >> host
> >> when you want to reboot only the guest...
> >>
> >> So my question is - AMD or NVIDIA, i dont care, cheap or superexpensive 
> >> (ok,
> >> not the quadros) - what do you recommend these days for a hassle-free
> passthrough experience (well, at least mostly - ACS patch needed would be
> >> ok, reset bug not so... it just makes it unhandy to use in day-to-day
> >> scenarios...)  
> > Wouldn't needing the ACS patch depend on your mobo as opposed to your GPU?  
> 
> 
> I thought so, too! But then this: for my old HD7850 I do NOT need the ACS 
> patch to get it working flawlessly in my Z170 board - it even works with a 
> standard Debian kernel;
> 
> The Vega 64 - in the same slot, same board - needs the ACS patch...
> 
> don't ask me why... I can only guess it might have to do with the PCI 
> bridges "within" the Vega 64... but that's not more than a very uneducated 
> guess

Generally the ACS patch should only depend on the motherboard unless
you're trying to split the individual functions of a card to separate
VMs.  It shouldn't make much difference whether those functions are
separate devices downstream of an on-card switch or functions within a
multi-function device when assigned to a single VM.  The motherboard
ACS override would control things like whether separate physical slots
are grouped together.  We can only speculate without more data though.
Thanks,

Alex
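
To see how a particular board actually groups things, a quick way to list the
IOMMU groups and their member devices (a minimal sketch):

for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done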

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users


Re: [vfio-users] How to spoof device (sub)class ID for passthrough devices?

2019-02-11 Thread Alex Williamson
On Sun, 10 Feb 2019 20:01:47 +0100
Björn Ruytenberg  wrote:

> Hi Alex,
> 
> Thanks for your quick response and the patch!
> 
> I am looking into passing through a muxless GeForce GPU to a Windows guest.
> 
> Having been through several resources, passing through muxed and desktop
> cards seems quite straightforward. Either no configuration is necessary,
> or exposing the (UEFI GOP) VBIOS through the ACPI _ROM method will do
> the trick. From what I gather, the latter will also work with the
> proprietary NVIDIA driver on Linux. However, on Windows guests, it will
> simply bail out with error 43.
> 
> I have been doing some ACPI debugging on Windows (using windbg and QEMU,
> which is excellent for this :-)), and it looks like the NVIDIA driver
> does several _DSM calls instead. I'm not entirely sure what these
> methods do. One method contains a number of magic strings such as
> `NVIDIA Certified Optimus Ready Motherboard`, which presumably lets the
> driver verify it's not running in a VM.
> 
> Rather than trying to (partially) replicate the ACPI table from the host
> in the guest, I figured it might be possible to trick the NVIDIA driver
> into detecting a muxed/desktop card. For this I'll need to:
> 
>   1. Find a VBIOS with a UEFI GOP header from a non-muxless GPU, ideally
> one that is the same model (muxed/desktop) or similar (Quadro).
>   2. Spoof the PCI sub vendor and sub device id, or patch the VBIOS to
> have these match my own card.
>   3. Spoof the PCI device class, changing it from 0302 (3D controller,
> i.e. muxless card) to 0300 (VGA device).
> 
> Now that your patch enables the last, I'll try and see if this works. If
> you are interested, I'd be happy to report back the results.

I'm certainly curious to see what you find, I imagine others are too.
When I looked at Optimus on a Thinkpad it looked like some of the _DSM
calls were hooking into SMI services, so they're beyond obfuscated.
Good luck!  Thanks,

Alex
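
For the ID side of that plan, QEMU's vfio-pci device already carries
experimental override properties; a sketch with placeholder values (the
device-class override discussed in this thread comes from Alex's patch, so
its property name is not shown here):

-device vfio-pci,host=01:00.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x13c2,x-pci-sub-vendor-id=0x1043,x-pci-sub-device-id=0x8508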

___
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users

