Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Kay, Allen M


> -Original Message-
> From: Tian, Kevin
> Sent: Monday, February 01, 2016 11:08 PM
> To: Kay, Allen M; Alex Williamson; Gerd Hoffmann; qemu-de...@nongnu.org
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> Subject: RE: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> > From: Kay, Allen M
> > Sent: Saturday, January 30, 2016 5:58 AM
> >
> > First of all, I would like to clarify I'm talking about general IGD
> > passthrough case - not specific to KVMGT.  In IGD passthrough
> > configuration, one of the following will happen when the driver accesses
> OpRegion:
> >
> > 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by
> > pre-map it (i.e. Xen) or map it during EPT page fault (i.e. KVM),
> > guest can successfully read the content of the OpRegion and check the ID
> string.  In this case, everything works fine.
> >
> > 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at all,
> > then guest driver's attempt to setup GVA/GPA mapping will fail, which
> > causes the driver to fail.  In this case, guest driver won't have the
> > opportunity to look into the content of OpRegion memory and check the ID
> string.
> >
> 
> Guest mapping of GVA->GPA can always succeed regardless of whether
> GPA->HPA is valid. Failure will happen only when the GVA is actually
> accessed by guest.
> 

That is the data from the team that debugged IGD passthrough on a closed-source 
hypervisor that does not map the OpRegion with EPT.  The end result is the same: 
the driver cannot access the contents of the OpRegion without failing.

> I don't understand 2). If hypervisor doesn't want to setup mapping, there is
> no chance for guest driver to get opregion content, right?

That was precisely the point I was trying to make.  As a result, the guest driver 
needs some indication from the hypervisor that the address at 0xFC contains a GPA 
that can be safely accessed without causing an unrecoverable failure; hypervisors 
that do not map the OpRegion would indicate this by leaving the HPA at 0xFC.
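For reference, the check the driver performs boils down to comparing the first 16 bytes of the mapped region against the OpRegion signature.  A minimal sketch, not taken from any actual driver; the buffer is assumed to be whatever got mapped from the GPA found at 0xFC:

```c
#include <string.h>

/* 16-byte signature at the start of the ACPI OpRegion. */
#define IGD_OPREGION_SIGNATURE "IntelGraphicsMem"

/* Return 1 if the mapped buffer starts with the OpRegion signature,
 * 0 otherwise.  A driver would bail out instead of trusting the mapping. */
int opregion_signature_ok(const void *base)
{
    return memcmp(base, IGD_OPREGION_SIGNATURE,
                  sizeof(IGD_OPREGION_SIGNATURE) - 1) == 0;
}
```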

Allen
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Alex Williamson
On Tue, 2016-02-02 at 19:10 +, Kay, Allen M wrote:
> 
> > -Original Message-
> > From: Tian, Kevin
> > Sent: Monday, February 01, 2016 11:08 PM
> > To: Kay, Allen M; Alex Williamson; Gerd Hoffmann; qemu-de...@nongnu.org
> > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> > Subject: RE: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > > From: Kay, Allen M
> > > Sent: Saturday, January 30, 2016 5:58 AM
> > > 
> > > First of all, I would like to clarify I'm talking about general IGD
> > > passthrough case - not specific to KVMGT.  In IGD passthrough
> > > configuration, one of the following will happen when the driver accesses
> > OpRegion:
> > > 
> > > 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by
> > > pre-map it (i.e. Xen) or map it during EPT page fault (i.e. KVM),
> > > guest can successfully read the content of the OpRegion and check the ID
> > string.  In this case, everything works fine.
> > > 
> > > 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at all,
> > > then guest driver's attempt to setup GVA/GPA mapping will fail, which
> > > causes the driver to fail.  In this case, guest driver won't have the
> > > opportunity to look into the content of OpRegion memory and check the ID
> > string.
> > > 
> > 
> > Guest mapping of GVA->GPA can always succeed regardless of whether
> > GPA->HPA is valid. Failure will happen only when the GVA is actually
> > accessed by guest.
> > 

Hi Allen,

> That is the data from the team that debugged IGD passthrough on a closed-source 
> hypervisor that does not map the OpRegion with EPT.  The end result is the same: 
> the driver cannot access the contents of the OpRegion without
> failing.

Define "failing".

> > I don't understand 2). If hypervisor doesn't want to setup mapping, there is
> > no chance for guest driver to get opregion content, right?
> 
> That was precisely the point I was trying to make.  As a result, the guest driver 
> needs some indication from the hypervisor that the address at 0xFC contains 
> a GPA that can be safely accessed without causing
> an unrecoverable failure; hypervisors that do not map the OpRegion would 
> indicate this by leaving the HPA at 0xFC.

I think the thing that doesn't make sense to everyone here is that it's
common practice for x86 systems, especially legacy OSes, to probe
memory, get back -1 and move on.  A hypervisor should support that.  So
if there's a bogus address in the ASL Storage register and the driver
tries to read from the GPA indicated by that address, the VM should at
worst get back -1 or a memory space that doesn't contain the graphics
signature.  If there's a super strict hypervisor that doesn't handle the
VM faulting outside of its address space, that's very prone to exploit.
If a driver wants to avoid it anyway, perhaps they should be doing
standard things like checking whether the ASL Storage address falls
within a reserved memory region rather than coming up with ad-hoc
register content based solutions.  Thanks,

Alex
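To make the suggestion above concrete, a sketch of such a reserved-range check for the ASL Storage address, against an e820-style map (the struct layout and type value here are simplified assumptions, not any kernel's actual API):

```c
#include <stdint.h>

struct e820_entry {
    uint64_t base;
    uint64_t len;
    uint32_t type;          /* 2 == reserved in the e820 interface */
};

#define E820_RESERVED 2

/* Return 1 if [addr, addr+size) lies entirely inside a reserved e820
 * entry -- the kind of sanity check a driver could apply to the ASL
 * Storage address before trusting it. */
int range_is_reserved(const struct e820_entry *map, int n,
                      uint64_t addr, uint64_t size)
{
    for (int i = 0; i < n; i++) {
        if (map[i].type == E820_RESERVED &&
            addr >= map[i].base &&
            addr + size <= map[i].base + map[i].len)
            return 1;
    }
    return 0;
}
```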




Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Kay, Allen M


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, February 02, 2016 11:37 AM
> To: Kay, Allen M; Tian, Kevin; Gerd Hoffmann; qemu-de...@nongnu.org
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Tue, 2016-02-02 at 19:10 +, Kay, Allen M wrote:
> >
> > > -Original Message-
> > > From: Tian, Kevin
> > > Sent: Monday, February 01, 2016 11:08 PM
> > > To: Kay, Allen M; Alex Williamson; Gerd Hoffmann;
> > > qemu-de...@nongnu.org
> > > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> > > Subject: RE: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > > > From: Kay, Allen M
> > > > Sent: Saturday, January 30, 2016 5:58 AM
> > > >
> > > > First of all, I would like to clarify I'm talking about general
> > > > IGD passthrough case - not specific to KVMGT.  In IGD passthrough
> > > > configuration, one of the following will happen when the driver
> > > > accesses
> > > OpRegion:
> > > >
> > > > 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by
> > > > pre-map it (i.e. Xen) or map it during EPT page fault (i.e. KVM),
> > > > guest can successfully read the content of the OpRegion and check
> > > > the ID
> > > string.  In this case, everything works fine.
> > > >
> > > > 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at
> > > > all, then guest driver's attempt to setup GVA/GPA mapping will
> > > > fail, which causes the driver to fail.  In this case, guest driver
> > > > won't have the opportunity to look into the content of OpRegion
> > > > memory and check the ID
> > > string.
> > > >
> > >
> > > Guest mapping of GVA->GPA can always succeed regardless of whether
> > > GPA->HPA is valid. Failure will happen only when the GVA is actually
> > > accessed by guest.
> > >
> 
> Hi Allen,
> 
> > That is the data from the team that debugged IGD passthrough on a closed-source
> > hypervisor that does not map the OpRegion with EPT.  The end result is the
> > same: the driver cannot access the contents of the OpRegion without failing.
> 
> Define "failing".
> 

Hi Alex,

The reported behavior is that the OpRegion mapping in the guest fails, which causes 
the driver to fail to load.  However, I think what you described below is reasonable.  
I will take a closer look at it after I get my KVM environment set up.

Allen

> > > I don't understand 2). If hypervisor doesn't want to setup mapping,
> > > there is no chance for guest driver to get opregion content, right?
> >
> > That was precisely the point I was trying to make.  As a result, the guest
> > driver needs some indication from the hypervisor that the address at 0xFC
> > contains a GPA that can be safely accessed without causing an unrecoverable
> > failure; hypervisors that do not map the OpRegion would indicate this by
> > leaving the HPA at 0xFC.
> 
> I think the thing that doesn't make sense to everyone here is that it's
> common practice for x86 systems, especially legacy OSes, to probe memory,
> get back -1 and move on.  A hypervisor should support that.  So if there's a
> bogus address in the ASL Storage register and the driver tries to read from
> the GPA indicated by that address, the VM should at worst get back -1 or a
> memory space that doesn't contain the graphics signature.  
> If there's a super strict hypervisor that doesn't handle the VM faulting 
> outside of its address
> space, that's very prone to exploit.
> If a driver wants to avoid it anyway, perhaps they should be doing standard
> things like checking whether the ASL Storage address falls within a reserved
> memory region rather than coming up with ad-hoc register content based
> solutions.  Thanks,
> 
> Alex



Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Tian, Kevin
> From: Gerd Hoffmann [mailto:kra...@redhat.com]
> Sent: Tuesday, February 02, 2016 4:56 PM
> 
>   Hi,
> 
> > > I'd have qemu copy the data on 0xfc write then, so things continue to
> > > work without updating seabios.  So, the firmware has to allocate space,
> > > reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> > > sure the opregion appears at the address written by the firmware, by
> > > whatever method it prefers.
> >
> > Yup. It's Qemu's responsibility to expose opregion content.
> >
> > btw, I'd prefer to do copying here. It's pointless to allow writes from the
> > guest side. One write example is the SWSCI mailbox, through which the gfx
> > driver can trigger SCI events to communicate with the BIOS (specifically
> > ACPI methods), mostly for monitor operations. However, it's not right for
> > the guest to trigger a host SCI and thus invoke host ACPI methods.
> 
> Thanks.
> 
> So, the question again is how we best do that.  Option one is the mmap way,
> i.e. basically what the patches posted by Alex are doing.  Option two is
> the fw_cfg way, i.e. place an opregion copy in fw_cfg and have
> seabios not only set 0xfc, but also store the opregion there by copying
> from fw_cfg.
> 
> The advantage of option one is that we keep the option to do things
> differently in the future, without breaking the guest/qemu interface.
> 
> The disadvantage is that it'll cause hugepage mappings to be split.
> 

That depends on where you pick the gfn at which to map or copy the opregion. On
physical hardware it's usually close to the mmio region, where several other
reserved e820 entries also exist. If we do the same for the virtual opregion,
it shouldn't impact hugepages.

Thanks
Kevin


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Gerd Hoffmann
  Hi,

> > I'd have qemu copy the data on 0xfc write then, so things continue to
> > work without updating seabios.  So, the firmware has to allocate space,
> > reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> > sure the opregion appears at the address written by the firmware, by
> > whatever method it prefers.
> 
> Yup. It's Qemu's responsibility to expose opregion content. 
> 
btw, I'd prefer to do copying here. It's pointless to allow writes from the
guest side. One write example is the SWSCI mailbox, through which the gfx
driver can trigger SCI events to communicate with the BIOS (specifically
ACPI methods), mostly for monitor operations. However, it's not right for
the guest to trigger a host SCI and thus invoke host ACPI methods.

Thanks.

So, the question again is how we best do that.  Option one is the mmap way,
i.e. basically what the patches posted by Alex are doing.  Option two is
the fw_cfg way, i.e. place an opregion copy in fw_cfg and have seabios
not only set 0xfc, but also store the opregion there by copying from fw_cfg.

The advantage of option one is that we keep the option to do things
differently in the future, without breaking the guest/qemu interface.

The disadvantage is that it'll cause hugepage mappings to be split.

Hmm.
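A toy model of option two, just to make the flow concrete (the struct and function names are invented for illustration; real seabios would use its own allocator and config-space accessors such as pci_config_writel):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* "Firmware" copies the opregion blob (as it would read it from fw_cfg)
 * into freshly allocated memory standing in for reserved guest RAM, then
 * records that address in the 0xfc register.  Everything here is a
 * simulation, not seabios code. */
struct fake_igd {
    uintptr_t cfg_fc;       /* ASL Storage register (config offset 0xfc) */
    uint8_t *guest_copy;    /* stands in for the reserved guest pages */
};

int fw_copy_opregion(struct fake_igd *dev, const void *blob, size_t size)
{
    dev->guest_copy = malloc(size);
    if (!dev->guest_copy)
        return -1;
    memcpy(dev->guest_copy, blob, size);
    /* In a real guest this would be the GPA of the reserved region. */
    dev->cfg_fc = (uintptr_t)dev->guest_copy;
    return 0;
}
```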

cheers,
  Gerd





Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Alex Williamson
On Tue, 2016-02-02 at 00:04 +, Kay, Allen M wrote:
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Sunday, January 31, 2016 9:42 AM
> > To: Kay, Allen M; Gerd Hoffmann; David Woodhouse
> > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; qemu-de...@nongnu.org; Cao jin; vfio-
> > us...@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > On Sat, 2016-01-30 at 01:18 +, Kay, Allen M wrote:
> > > 
> > > > -Original Message-
> > > > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > > > Williamson
> > > > Sent: Friday, January 29, 2016 10:00 AM
> > > > To: Gerd Hoffmann
> > > > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo
> > > > Habkost; Stefano Stabellini; qemu-de...@nongnu.org; Cao jin; vfio-
> > > > us...@redhat.com
> > > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > > chipset tweaks
> > > > 
> > > > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
> > > > testing for any Intel VGA device, but I wonder if I should only be
> > > > enabling anything opregion if it also appears at a specific address.
> > > > 
> > > 
> > > No.  Both Windows and Linux IGD driver should work at any PCI slot.  We
> > have seen 0:5.0 in the guest and the driver works.
> > 
> > Thanks Allen.  Another question, when I boot a VM with an assigned HD
> > P4000 GPU, my console streams with IOMMU faults, like:
> > 
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> > DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> > 
> > All of these fall within the host RMRR range for the device:
> > 
> > DMAR: Setting RMRR:
> > DMAR: Setting identity map for device :00:02.0 [0x9f80 - 0xaf9f]
> 
> Hi Alex,
> 
> Do you configure IGD as primary or secondary display in your KVM setup?   If 
> primary, are you running Intel vBIOS as part of guest boot?
> 
> On BDW/SKL systems, we have started to configure IGD as secondary and QEMU 
> VGA as primary.  In this setup, we are no longer running vBIOS in the guest 
> which avoids some complications.  vBIOS uses
> stolen memory for display buffers which requires RMRR mapping.  We have been 
> using similar setup (IGD as secondary) on other hypervisors and have not seen 
> IOMMU faults.
> 
> I will setup a KVM configuration on my SKL and see if I can duplicate your 
> problem here.   I will try to call into Don's Thursday meeting to discuss 
> this (I'm on call for jury duty this week).  I
> will give you a heads up on Wednesday evening.

Hi Allen,

I'm currently trying to run as primary, but I don't get any output until
well into the guest boot, so clearly the Intel vBIOS is not happy,
regardless of whether I provide VGA region access.  When I try to run as
secondary I don't get any output at all on the assigned device and the
FC23 Live CD I'm booting doesn't appear to see the IGD output.  I've
only just started playing with actually using it though, so perhaps I
haven't dialed it in just yet.  I will note though that the DMAR faults
are well after the vBIOS would have been run, I see the i915 driver
reads the stolen memory base from config register 0x5c.  Emulating this
register as returning 0x0 avoids the DMAR faults and fixes corruption of
the framebuffer, so this doesn't appear to be exclusive to the vBIOS.
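The workaround described above amounts to a config-space read filter.  A minimal sketch (the 0x5c stolen-memory base register is taken from the discussion here; the function name is hypothetical, not QEMU/vfio code):

```c
#include <stdint.h>

#define IGD_BDSM 0x5c   /* stolen-memory base config register, per the thread */

/* Hypothetical read filter a VMM could apply to IGD config space: pass
 * everything through except the stolen-memory base, which is forced to 0
 * so the guest driver never touches the host's RMRR range. */
uint32_t igd_cfg_read_filtered(uint32_t offset, uint32_t host_value)
{
    return offset == IGD_BDSM ? 0 : host_value;
}
```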

Regardless of which we intend to support, device assignment is an
advanced topic for most users and I think we need to do something to
protect users from having their VM memory stomped on by an IGD device
writing framebuffer data over RAM.  Thanks,

Alex




Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread Alex Williamson
On Tue, 2016-02-02 at 11:50 +, David Woodhouse wrote:
> On Tue, 2016-02-02 at 06:42 +, Tian, Kevin wrote:
> > > From: Kay, Allen M
> > > Sent: Tuesday, February 02, 2016 8:04 AM
> > > > 
> > > > David notes in the latter commit above:
> > > > 
> > > > "We should be able to successfully assign graphics devices to guests 
> > > > too, as
> > > > long as the initial handling of stolen memory is reconfigured 
> > > > appropriately."
> > > > 
> > > > What code is supposed to be doing that reconfiguration when a device is
> > > > assigned?  Clearly we don't have it yet, making assignment of these 
> > > > devices
> > > > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs 
> > > > device
> > > > specific code to clear these settings to make it safe for userspace, 
> > > > then
> > > > perhaps VM BIOS support to reallocate.  Is there any consistency across 
> > > > IGD
> > > > revisions for doing this?  Is there a spec?
> > > > Thanks,
> 
> I haven't ever successfully assigned an IGD device to a VM myself, but
> my understanding was that it *has* been done. So when the code was
> changed to prevent assignment of devices afflicted by RMRRs (except USB
> where we know it's OK), I just added the integrated graphics to that
> same exception as USB, to preserve the status quo ante.

It had been successfully done on /Xen/, not on anything that actually made
use of that exclusion, so there was no status quo to preserve.

> > I don't think stolen memory should be handled explicitly. If it must be, it
> > should be listed as an RMRR region so the general RMRR setup will cover it.
> > But as Allen pointed out, the whole RMRR becomes unnecessary if we target
> > IGD only as a secondary device.
> 
> Perhaps the best option is *not* to have special cases in the IOMMU
> code for "those devices which can safely be assigned despite RMRRs".
> 
> Instead, let's let the device driver — or whatever — tell the IOMMU
> code when it's *stopped* the firmware from (ab)using the device's DMA
> facilities.
> 
> So when the USB code does the handoff thing to quiesce the firmware's
> access to USB and take over in the OS, it would call the IOMMU function
> to revoke the RMRR for the USB controller.
> 
> And if/when the graphics driver resets its device into a state where
> it's no longer accessing stolen memory and can be assigned to a VM, it
> can also call that 'RMRR revoke' function.
> 
> Likewise, if we teach device drivers to cancel whatever abominations
> the HP firmware tends to set up behind the OS's back on other PCI
> devices, we can cancel the RMRRs for those too.
> 
> Then the IOMMU code has a simple choice and no special cases — we can
> assign a device iff it has no active RMRR.

At first glance I like it, but there's a problem: it assumes there is a
host driver for the device that will permanently release the device from
the RMRR even after the device is unbound.  Currently we don't have a
requirement that the user must first bind the device to a native host
driver, unbind it, and only then is it eligible for device assignment.
In fact with GPUs we often blacklist the native driver or attach them
directly to a stub driver to avoid the host driver.  Maybe that issue
works itself out since the IOMMU won't allow access to the device
without this step, but it means that i915 needs to be better than most
graphics drivers when it comes to unbinding the device (which is not a
very high bar).  Of course as I've shown on IGD, it's not simply a
matter of declaring the RMRR unused, some reconfiguration of the device
is necessary such that the guest driver doesn't try to start using that
same reserved range.  Thanks,

Alex




Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread David Woodhouse
On Tue, 2016-02-02 at 07:54 -0700, Alex Williamson wrote:
> 
> At first glance I like it, but there's a problem: it assumes there is a
> host driver for the device that will permanently release the device from
> the RMRR even after the device is unbound.  Currently we don't have a
> requirement that the user must first bind the device to a native host
> driver, unbind it, and only then is it eligible for device assignment.

It doesn't *have* to be a full native driver. It can be a PCI quirk
(the USB controllers could potentially do it that way, although they
don't). Or a stub 'shut it down' driver, potentially even done somehow
through VFIO.

But fundamentally, in all of these cases you have to do *something* to
stop the BIOS-controlled DMA. Otherwise the RMRR shouldn't have been
there in the first place, surely?

But for the gfx case... what *do* we have to do? Does the VMM (and the
VM's BIOS, between them) have to provision a "stolen" region of guest
memory and point the gfx framebuffer at that?

Once we have a proper handle on precisely what needs to happen, we can
have a better conversation about where/how to do that...

-- 
David Woodhouse                      Open Source Technology Centre
david.woodho...@intel.com  Intel Corporation





Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread David Woodhouse
On Tue, 2016-02-02 at 06:42 +, Tian, Kevin wrote:
> > From: Kay, Allen M
> > Sent: Tuesday, February 02, 2016 8:04 AM
> > > 
> > > David notes in the latter commit above:
> > > 
> > > "We should be able to successfully assign graphics devices to guests too, 
> > > as
> > > long as the initial handling of stolen memory is reconfigured 
> > > appropriately."
> > > 
> > > What code is supposed to be doing that reconfiguration when a device is
> > > assigned?  Clearly we don't have it yet, making assignment of these 
> > > devices
> > > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > > specific code to clear these settings to make it safe for userspace, then
> > > perhaps VM BIOS support to reallocate.  Is there any consistency across 
> > > IGD
> > > revisions for doing this?  Is there a spec?
> > > Thanks,

I haven't ever successfully assigned an IGD device to a VM myself, but
my understanding was that it *has* been done. So when the code was
changed to prevent assignment of devices afflicted by RMRRs (except USB
where we know it's OK), I just added the integrated graphics to that
same exception as USB, to preserve the status quo ante.

> I don't think stolen memory should be handled explicitly. If it must be, it
> should be listed as an RMRR region so the general RMRR setup will cover it.
> But as Allen pointed out, the whole RMRR becomes unnecessary if we target
> IGD only as a secondary device.

Perhaps the best option is *not* to have special cases in the IOMMU
code for "those devices which can safely be assigned despite RMRRs".

Instead, let's let the device driver — or whatever — tell the IOMMU
code when it's *stopped* the firmware from (ab)using the device's DMA
facilities.

So when the USB code does the handoff thing to quiesce the firmware's
access to USB and take over in the OS, it would call the IOMMU function
to revoke the RMRR for the USB controller.

And if/when the graphics driver resets its device into a state where
it's no longer accessing stolen memory and can be assigned to a VM, it
can also call that 'RMRR revoke' function.

Likewise, if we teach device drivers to cancel whatever abominations
the HP firmware tends to set up behind the OS's back on other PCI
devices, we can cancel the RMRRs for those too.

Then the IOMMU code has a simple choice and no special cases — we can
assign a device iff it has no active RMRR.
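The bookkeeping for the rule above is tiny.  A toy sketch (all names illustrative, not an actual kernel API):

```c
#include <stdbool.h>

/* Toy model: a device starts with its RMRR active (firmware may still be
 * doing DMA); a driver that has quiesced the firmware calls the revoke
 * hook; the IOMMU layer then allows assignment iff no RMRR is active. */
struct pci_dev_state {
    bool rmrr_active;
};

void iommu_rmrr_revoke(struct pci_dev_state *dev)
{
    dev->rmrr_active = false;
}

bool iommu_device_assignable(const struct pci_dev_state *dev)
{
    return !dev->rmrr_active;
}
```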

-- 
David Woodhouse                      Open Source Technology Centre
david.woodho...@intel.com  Intel Corporation





Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-01 Thread Kay, Allen M


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Sunday, January 31, 2016 9:42 AM
> To: Kay, Allen M; Gerd Hoffmann; David Woodhouse
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; qemu-de...@nongnu.org; Cao jin; vfio-
> us...@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Sat, 2016-01-30 at 01:18 +, Kay, Allen M wrote:
> >
> > > -Original Message-
> > > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Friday, January 29, 2016 10:00 AM
> > > To: Gerd Hoffmann
> > > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; qemu-de...@nongnu.org; Cao jin; vfio-
> > > us...@redhat.com
> > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
> > > testing for any Intel VGA device, but I wonder if I should only be
> > > enabling anything opregion if it also appears at a specific address.
> > >
> >
> > No.  Both Windows and Linux IGD driver should work at any PCI slot.  We
> have seen 0:5.0 in the guest and the driver works.
> 
> Thanks Allen.  Another question, when I boot a VM with an assigned HD
> P4000 GPU, my console streams with IOMMU faults, like:
> 
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> 
> All of these fall within the host RMRR range for the device:
> 
> DMAR: Setting RMRR:
> DMAR: Setting identity map for device :00:02.0 [0x9f80 - 0xaf9f]

Hi Alex,

Do you configure IGD as primary or secondary display in your KVM setup?   If 
primary, are you running Intel vBIOS as part of guest boot?

On BDW/SKL systems, we have started to configure IGD as secondary and QEMU VGA 
as primary.  In this setup, we are no longer running vBIOS in the guest, which 
avoids some complications.  vBIOS uses stolen memory for display buffers which 
requires RMRR mapping.  We have been using similar setup (IGD as secondary) on 
other hypervisors and have not seen IOMMU faults.

I will set up a KVM configuration on my SKL and see if I can duplicate your 
problem here.   I will try to call into Don's Thursday meeting to discuss this 
(I'm on call for jury duty this week).  I will give you a heads up on Wednesday 
evening.

Allen

> 
> A while back, we excluded devices using RMRRs from participating in IOMMU
> API domains because they may continue to DMA to these reserved regions
> after assignment, possibly corrupting VM memory (c875d2c1b808).  Intel
> later decided this exclusion shouldn't apply to graphics devices
> (18436afdc11a).  Don't the above IOMMU faults reveal that exactly the
> problem we're trying to prevent by the general exclusion of RMRR-encumbered
> devices from the IOMMU API is actually occurring?  If I were to have VM
> memory within the RMRR address range, I wouldn't be seeing these faults,
> I'd be having the GPU corrupt my VM memory.
> 
> David notes in the latter commit above:
> 
> "We should be able to successfully assign graphics devices to guests too, as
> long as the initial handling of stolen memory is reconfigured appropriately."
> 
> What code is supposed to be doing that reconfiguration when a device is
> assigned?  Clearly we don't have it yet, making assignment of these devices
> very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> specific code to clear these settings to make it safe for userspace, then
> perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> revisions for doing this?  Is there a spec?
> Thanks,
> 
> Alex



Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-01 Thread Tian, Kevin
> From: Kay, Allen M
> Sent: Tuesday, February 02, 2016 8:04 AM
> >
> > David notes in the latter commit above:
> >
> > "We should be able to successfully assign graphics devices to guests too, as
> > long as the initial handling of stolen memory is reconfigured 
> > appropriately."
> >
> > What code is supposed to be doing that reconfiguration when a device is
> > assigned?  Clearly we don't have it yet, making assignment of these devices
> > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > specific code to clear these settings to make it safe for userspace, then
> > perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> > revisions for doing this?  Is there a spec?
> > Thanks,

I don't think stolen memory should be handled explicitly. If it must be, it
should be listed as an RMRR region so the general RMRR setup will cover it.
But as Allen pointed out, the whole RMRR becomes unnecessary if we target
IGD only as a secondary device.

Thanks
Kevin


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-01 Thread Tian, Kevin
> From: Kay, Allen M
> Sent: Saturday, January 30, 2016 5:58 AM
>
> First of all, I would like to clarify I'm talking about general IGD 
> passthrough case - not
> specific to KVMGT.  In IGD passthrough configuration, one of the following 
> will happen
> when the driver accesses OpRegion:
> 
> 1) If the hypervisor sets up OpRegion GPA/HPA mapping, either by pre-map it 
> (i.e. Xen)
> or map it during EPT page fault (i.e. KVM), guest can successfully read the 
> content of the
> OpRegion and check the ID string.  In this case, everything works fine.
> 
> 2) if the hypervisor does not setup OpRegion GPA/HPA mapping at all, then 
> guest driver's
> attempt to setup GVA/GPA mapping will fail, which causes the driver to fail.  
> In this case,
> guest driver won't have the opportunity to look into the content of OpRegion 
> memory and
> check the ID string.
> 

Guest mapping of GVA->GPA can always succeed regardless of whether
GPA->HPA is valid. Failure will happen only when the GVA is actually
accessed by the guest.

I don't understand 2). If the hypervisor doesn't want to set up the mapping,
there is no chance for the guest driver to get the opregion content, right?
Or do you mean that some hypervisor wants to emulate the opregion access?
But even in that case there's no failure per se, just a slower path.

Thanks
Kevin


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-01 Thread Tian, Kevin
> From: Gerd Hoffmann
> Sent: Monday, February 01, 2016 8:49 PM
> 
>   Hi,
> 
> > Thanks for the tip that seabios allocated pages automatically become
> > e820 reserved, that simplifies things a bit.
> 
> It's common practice for all firmware.  The e820 table from qemu is just
> a starting point, it is not passed on to the guest os as-is.  All
> permanent allocations (acpi tables, smbios tables, seabios driver data
> such as virtio rings, ...) are taken away from RAM and added to
> RESERVED, and IIRC seabios also takes care to reserve the bios and
> option rom regions in real mode address space.

Agreed. It's cleaner to have SeaBIOS do the allocation; otherwise it's prone
to conflicts if vfio-pci picks an address at random (though we could add
special tweaks within SeaBIOS to skip that address by reading 0xFC).

> 
> > > Maybe we should define the interface as "guest writes 0xfc to pick
> > > address, qemu takes care to place opregion there".  That gives us the
> > > freedom to change the qemu implementation (either copy host opregion or
> > > map the host opregion) without breaking things.
> >
> > Ok, so seabios allocates two pages, writes the base address of those
> > pages to 0xfc and looks to see whether the signature appears at that
> > address due to qemu mapping.  It verifies the size and does a
> > free/realloc if not the right size.
> 
> I think seabios first needs to reserve something big enough for a
> temporary mapping, to check signature + size, otherwise the opregion
> might scratch data structures beyond opregion in case it happens to be
> larger than 8k.
> 
> How likely is it that the opregion size ever changes?  Should we better
> be prepared to handle it?  Or would it be ok to have a ...
> 
>if (opregion_size > 8k)
>   panic();
> 
> ... style sanity check?

The above sanity check should be enough. We use 8K in KVMGT too.

> 
> > If the graphics signature does not
> > appear, free those pages and assume no opregion support.
> 
> Yes.
> 
> > If we later
> > decide to use a copy, we'd need to disable the 0xfc automagic mapping
> > and probably pass the data via fw_cfg.  Sound right?
> 
> I'd have qemu copy the data on 0xfc write then, so things continue to
> work without updating seabios.  So, the firmware has to allocate space,
> reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> sure the opregion appears at the address written by the firmware, by
> whatever method it prefers.

Yup. It's QEMU's responsibility to expose the OpRegion content.

Btw, I prefer copying here. It's pointless to allow writes from the guest
side. One example of a write path is the SWSCI mailbox, through which the
gfx driver can trigger SCI events to communicate with the BIOS
(specifically ACPI methods), mostly for monitor operations. However, it's
not right for the guest to trigger a host SCI and thereby kick off host
ACPI methods.

> 
> > > lpc bridge is no problem, only pci id fields are copied over and
> > > unprivileged access is allowed for them.
> > >
> > > Copying the gfx registers of the host bridge is a problem indeed.
> >
> > I would argue that both are really a problem, libvirt wants to put QEMU
> > in a container that prevents access to any host system files other than
> > those explicitly allowed.  Therefore libvirt needs to grant the process
> > access to the lpc sysfs config file even though it only needs user
> > visible register values.
> 
> Yes, correct.  We want svirt be as strict as possible.
> 

That is the trickiest part of IGD pass-through, which is why Intel decided
to remove those dependencies to make pass-through easier.

Thanks
Kevin



Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-31 Thread Alex Williamson
On Sat, 2016-01-30 at 01:18 +, Kay, Allen M wrote:
> 
> > -Original Message-
> > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > Williamson
> > Sent: Friday, January 29, 2016 10:00 AM
> > To: Gerd Hoffmann
> > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; qemu-de...@nongnu.org; Cao jin; vfio-
> > us...@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently testing
> > for any Intel VGA device, but I wonder if I should only be enabling anything
> > opregion if it also appears at a specific address.
> > 
> 
No.  Both the Windows and Linux IGD drivers should work at any PCI slot.  We
have seen 0:5.0 in the guest and the driver works.

Thanks Allen.  Another question: when I boot a VM with an assigned HD
P4000 GPU, my console streams with IOMMU faults, like:

DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 
DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3 

All of these fall within the host RMRR range for the device:

DMAR: Setting RMRR:
DMAR: Setting identity map for device :00:02.0 [0x9f80 - 0xaf9f]

A while back, we excluded devices using RMRRs from participating in
IOMMU API domains because they may continue to DMA to these reserved
regions after assignment, possibly corrupting VM memory
(c875d2c1b808).  Intel later decided this exclusion shouldn't apply to
graphics devices (18436afdc11a).  Don't the above IOMMU faults reveal
that exactly the problem we're trying to prevent with the general
exclusion of RMRR-encumbered devices from the IOMMU API is actually
occurring?  If I had VM memory within the RMRR address range, I wouldn't
be seeing these faults; the GPU would be corrupting my VM memory.

David notes in the latter commit above:

"We should be able to successfully assign graphics devices to guests
too, as long as the initial handling of stolen memory is reconfigured
appropriately."

What code is supposed to be doing that reconfiguration when a device is
assigned?  Clearly we don't have it yet, making assignment of these
devices very unsafe.  It seems like vfio or IOMMU code in the kernel
needs device-specific code to clear these settings to make it safe for
userspace, then perhaps VM BIOS support to reallocate.  Is there any
consistency across IGD revisions for doing this?  Is there a spec?
Thanks,

Alex




Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-29 Thread Kay, Allen M


> -Original Message-
> From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> Williamson
> Sent: Friday, January 29, 2016 10:00 AM
> To: Gerd Hoffmann
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; qemu-de...@nongnu.org; Cao jin; vfio-
> us...@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently testing
> for any Intel VGA device, but I wonder if I should only be enabling anything
> opregion if it also appears at a specific address.
> 

No.  Both the Windows and Linux IGD drivers should work at any PCI slot.  We
have seen 0:5.0 in the guest and the driver works.

Allen


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-29 Thread Kay, Allen M


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, January 28, 2016 6:55 PM
> To: Kay, Allen M; Gerd Hoffmann; qemu-de...@nongnu.org
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Fri, 2016-01-29 at 02:22 +, Kay, Allen M wrote:
> >
> > > -Original Message-
> > > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Thursday, January 28, 2016 11:36 AM
> > > To: Gerd Hoffmann; qemu-de...@nongnu.org
> > > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > >
> > > 1) The OpRegion MemoryRegion is mapped into system_memory
> through
> > > programming of the 0xFC config space register.
> > >  a) vfio-pci could pick an address to do this as it is realized.
> > >  b) SeaBIOS/OVMF could program this.
> > >
> > > Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need
> > > to pick an address and mark it as e820 reserved.  I'm not sure how
> > > to pick that address.  We'd probably want to make the 0xFC config
> > > register read- only.  1.b) has the issue you mentioned where in most
> > > cases the OpRegion will be 8k, but the BIOS won't know how much
> > > address space it's mapping into system memory when it writes the
> > > 0xFC register.  I don't know how much of a problem this is since the
> > > BIOS can easily determine the size once mapped and re-map it
> somewhere there's sufficient space.
> > > Practically, it seems like it's always going to be 8K.  This of
> > > course requires modification to every BIOS.  It also leaves the 0xFC
> > > register as a mapping control rather than a pointer to the OpRegion
> > > in RAM, which doesn't really match real hardware.  The BIOS would need
> to pick an address in this case.
> > >
> > > 2) Read-only mappings version of 1)
> > >
> > > Discussion: Really nothing changes from the issues above, just
> > > prevents any possibility of the guest modifying anything in the
> > > host.  Xen apparently allows write access to the host page already.
> > >
> > > 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> > >
> > > Discussion: No benefit that I can see over above other than maybe
> > > allowing write access that doesn't affect the host.
> > >
> > > 4) Copy contents into a guest RAM location, mark it reserved, point
> > > to it via 0xFC config as scratch register.
> > >  a) Done by QEMU (vfio-pci)
> > >  b) Done by SeaBIOS/OVMF
> > >
> > > Discussion: This is the most like real hardware.  4.a) has the usual
> > > issue of how to pick an address, but the benefit of not requiring
> > > BIOS changes (simply mark the RAM reserved via existing methods).
> > > 4.b) would require passing a buffer containing the contents of the
> > > OpRegion via fw_cfg and letting the BIOS do the setup.  The latter
> > > of course requires modifying each BIOS for this support.
> > >
> > > Of course none of these support hotplug nor really can they since
> > > reserved memory regions are not dynamic in the architecture.
> > >
> > > In all cases, some piece of software needs to know where it can
> > > place the OpRegion in guest memory.  It seems like there are
> > > advantages or disadvantages whether that's done by QEMU or the BIOS,
> > > but we only need to do it once if it's QEMU.  Suggestions, comments,
> preferences?
> > >
> >
> > Hi Alex, another thing to consider is how to communicate to the guest
> > driver that the address at 0xFC contains a valid GPA that can be
> > accessed by the driver without causing an EPT fault - since the same
> > driver will be used on other hypervisors, which may not EPT-map
> > OpRegion memory.  One idea proposed by the display driver team is to
> > set bit0 of the address to 1 to indicate that OpRegion memory can be
> > safely accessed by the guest driver.
> 
> Hi Allen,
> 
> Why is that any different than a guest accessing any other memory area that
> it shouldn't?  The OpRegion starts with a 16-byte ID string, so if the guest
> finds that it should feel fairly confident the OpRegion data is valid.  The
> published spec also seems to define all bits of 0xfc as valid, not implying 
> any
> sort of alignment requirements, and the i915 driver does a memremap
> directly on the value read from 0xfc.  So I'm not sure whether there's really 
> a
> need to or ability to define any of those bits in an adhoc way to indicate
> mapping.  If we do things right, shouldn't the guest driver not even know it's
> running in a VM, at least for the KVMGT-d case, so we need to be compatible
> with physical hardware.  Thanks,
> 

First of all, I would like to clarify I'm talking about general IGD 

Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-28 Thread Jike Song
On 01/29/2016 10:54 AM, Alex Williamson wrote:
> On Fri, 2016-01-29 at 02:22 +, Kay, Allen M wrote:
>>  
>>> -Original Message-
>>> From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
>>> Williamson
>>> Sent: Thursday, January 28, 2016 11:36 AM
>>> To: Gerd Hoffmann; qemu-de...@nongnu.org
>>> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
>>> Stefano Stabellini; Cao jin; vfio-us...@redhat.com
>>> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
>>> tweaks
>>>  
>>>  
>>> 1) The OpRegion MemoryRegion is mapped into system_memory through
>>> programming of the 0xFC config space register.
>>>  a) vfio-pci could pick an address to do this as it is realized.
>>>  b) SeaBIOS/OVMF could program this.
>>>  
>>> Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to pick
>>> an address and mark it as e820 reserved.  I'm not sure how to pick that
>>> address.  We'd probably want to make the 0xFC config register read-
>>> only.  1.b) has the issue you mentioned where in most cases the OpRegion
>>> will be 8k, but the BIOS won't know how much address space it's mapping
>>> into system memory when it writes the 0xFC register.  I don't know how
>>> much of a problem this is since the BIOS can easily determine the size once
>>> mapped and re-map it somewhere there's sufficient space.
>>> Practically, it seems like it's always going to be 8K.  This of course 
>>> requires
>>> modification to every BIOS.  It also leaves the 0xFC register as a mapping
>>> control rather than a pointer to the OpRegion in RAM, which doesn't really
>>> match real hardware.  The BIOS would need to pick an address in this case.
>>>  
>>> 2) Read-only mappings version of 1)
>>>  
>>> Discussion: Really nothing changes from the issues above, just prevents any
>>> possibility of the guest modifying anything in the host.  Xen apparently 
>>> allows
>>> write access to the host page already.
>>>  
>>> 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
>>>  
>>> Discussion: No benefit that I can see over above other than maybe allowing
>>> write access that doesn't affect the host.
>>>  
>>> 4) Copy contents into a guest RAM location, mark it reserved, point to it 
>>> via
>>> 0xFC config as scratch register.
>>>  a) Done by QEMU (vfio-pci)
>>>  b) Done by SeaBIOS/OVMF
>>>  
>>> Discussion: This is the most like real hardware.  4.a) has the usual issue 
>>> of
>>> how to pick an address, but the benefit of not requiring BIOS changes 
>>> (simply
>>> mark the RAM reserved via existing methods).  4.b) would require passing a
>>> buffer containing the contents of the OpRegion via fw_cfg and letting the
>>> BIOS do the setup.  The latter of course requires modifying each BIOS for 
>>> this
>>> support.
>>>  
>>> Of course none of these support hotplug nor really can they since reserved
>>> memory regions are not dynamic in the architecture.
>>>  
>>> In all cases, some piece of software needs to know where it can place the
>>> OpRegion in guest memory.  It seems like there are advantages or
>>> disadvantages whether that's done by QEMU or the BIOS, but we only need
>>> to do it once if it's QEMU.  Suggestions, comments, preferences?
>>>  
>>  
>> Hi Alex, another thing to consider is how to communicate to the guest driver 
>> that the address at 0xFC contains a valid GPA that can be accessed by the 
>> driver without causing an EPT fault - since
>> the same driver will be used on other hypervisors, which may not EPT-map 
>> OpRegion memory.  One idea proposed by the display driver team is to set bit0 
>> of the address to 1 to indicate that OpRegion memory
>> can be safely accessed by the guest driver.
> 
> Hi Allen,
> 
> Why is that any different than a guest accessing any other memory area
> that it shouldn't?  The OpRegion starts with a 16-byte ID string, so if
> the guest finds that it should feel fairly confident the OpRegion data
> is valid.  The published spec also seems to define all bits of 0xfc as
> valid, not implying any sort of alignment requirements, and the i915
> driver does a memremap directly on the value read from 0xfc.  So I'm not
> sure whether there's really a need to or ability to define any of those
> bits in an adhoc way to indicate mapping.  If we do things right,
> shouldn't the guest driver not even know it's running in a VM, at least
> for the KVMGT-d case, so we need to be compatible with physical
> hardware.  Thanks,
> 

I agree. An EPT page fault on guest OpRegion access is fine, as long as
KVM finds a proper PFN for that GPA during page-fault handling.
That's exactly what is expected for 'normal' memory.

> Alex
> 

--
Thanks,
Jike



Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-28 Thread Kay, Allen M


> -Original Message-
> From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> Williamson
> Sent: Thursday, January 28, 2016 11:36 AM
> To: Gerd Hoffmann; qemu-de...@nongnu.org
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> 
> 1) The OpRegion MemoryRegion is mapped into system_memory through
> programming of the 0xFC config space register.
>  a) vfio-pci could pick an address to do this as it is realized.
>  b) SeaBIOS/OVMF could program this.
> 
> Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to pick
> an address and mark it as e820 reserved.  I'm not sure how to pick that
> address.  We'd probably want to make the 0xFC config register read-
> only.  1.b) has the issue you mentioned where in most cases the OpRegion
> will be 8k, but the BIOS won't know how much address space it's mapping
> into system memory when it writes the 0xFC register.  I don't know how
> much of a problem this is since the BIOS can easily determine the size once
> mapped and re-map it somewhere there's sufficient space.
> Practically, it seems like it's always going to be 8K.  This of course 
> requires
> modification to every BIOS.  It also leaves the 0xFC register as a mapping
> control rather than a pointer to the OpRegion in RAM, which doesn't really
> match real hardware.  The BIOS would need to pick an address in this case.
> 
> 2) Read-only mappings version of 1)
> 
> Discussion: Really nothing changes from the issues above, just prevents any
> possibility of the guest modifying anything in the host.  Xen apparently 
> allows
> write access to the host page already.
> 
> 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> 
> Discussion: No benefit that I can see over above other than maybe allowing
> write access that doesn't affect the host.
> 
> 4) Copy contents into a guest RAM location, mark it reserved, point to it via
> 0xFC config as scratch register.
>  a) Done by QEMU (vfio-pci)
>  b) Done by SeaBIOS/OVMF
> 
> Discussion: This is the most like real hardware.  4.a) has the usual issue of
> how to pick an address, but the benefit of not requiring BIOS changes (simply
> mark the RAM reserved via existing methods).  4.b) would require passing a
> buffer containing the contents of the OpRegion via fw_cfg and letting the
> BIOS do the setup.  The latter of course requires modifying each BIOS for this
> support.
> 
> Of course none of these support hotplug nor really can they since reserved
> memory regions are not dynamic in the architecture.
> 
> In all cases, some piece of software needs to know where it can place the
> OpRegion in guest memory.  It seems like there are advantages or
> disadvantages whether that's done by QEMU or the BIOS, but we only need
> to do it once if it's QEMU.  Suggestions, comments, preferences?
> 

Hi Alex, another thing to consider is how to communicate to the guest driver 
that the address at 0xFC contains a valid GPA that can be accessed by the 
driver without causing an EPT fault - since the same driver will be used on 
other hypervisors, which may not EPT-map OpRegion memory.  One idea proposed 
by the display driver team is to set bit0 of the address to 1 to indicate 
that OpRegion memory can be safely accessed by the guest driver.

> 
> Another thing I notice in this series is the access to PCI config space of 
> both
> the host bridge and the LPC bridge.  This prevents unprivileged use cases and
> is a barrier to libvirt support since it will need to provide access to the 
> pci-
> sysfs files for the process.  Should vfio add additional device specific 
> regions
> to expose the config space of these other devices?  I don't see that there's
> any write access necessary, so these would be read-only.  The comment in
> the kernel regarding why an unprivileged user can only access standard
> config space indicates that some devices lockup if unimplemented config
> space is accessed.  It seems like that's probably not an issue for recent-ish
> Intel host bridges and LPC devices.  If OpRegion, host bridge config, and LPC
> config were all provided through vfio, would there be any need for igd-
> passthrough switches on the machine type?  It seems like the QEMU vfio-pci
> driver could enable the necessary features and pre-fill the host and LPC
> bridge config items on demand when parsing an IGD device.  Thanks,
> 
> Alex
> 
Allen

> ___
> iGVT-g mailing list
> igv...@lists.01.org
> https://lists.01.org/mailman/listinfo/igvt-g


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-01-28 Thread Alex Williamson
On Fri, 2016-01-29 at 02:22 +, Kay, Allen M wrote:
> 
> > -Original Message-
> > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > Williamson
> > Sent: Thursday, January 28, 2016 11:36 AM
> > To: Gerd Hoffmann; qemu-de...@nongnu.org
> > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> > Stefano Stabellini; Cao jin; vfio-us...@redhat.com
> > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> > tweaks
> > 
> > 
> > 1) The OpRegion MemoryRegion is mapped into system_memory through
> > programming of the 0xFC config space register.
> >  a) vfio-pci could pick an address to do this as it is realized.
> >  b) SeaBIOS/OVMF could program this.
> > 
> > Discussion: 1.a) Avoids any BIOS dependency, but vfio-pci would need to pick
> > an address and mark it as e820 reserved.  I'm not sure how to pick that
> > address.  We'd probably want to make the 0xFC config register read-
> > only.  1.b) has the issue you mentioned where in most cases the OpRegion
> > will be 8k, but the BIOS won't know how much address space it's mapping
> > into system memory when it writes the 0xFC register.  I don't know how
> > much of a problem this is since the BIOS can easily determine the size once
> > mapped and re-map it somewhere there's sufficient space.
> > Practically, it seems like it's always going to be 8K.  This of course 
> > requires
> > modification to every BIOS.  It also leaves the 0xFC register as a mapping
> > control rather than a pointer to the OpRegion in RAM, which doesn't really
> > match real hardware.  The BIOS would need to pick an address in this case.
> > 
> > 2) Read-only mappings version of 1)
> > 
> > Discussion: Really nothing changes from the issues above, just prevents any
> > possibility of the guest modifying anything in the host.  Xen apparently 
> > allows
> > write access to the host page already.
> > 
> > 3) Copy OpRegion contents into buffer and do either 1) or 2) above.
> > 
> > Discussion: No benefit that I can see over above other than maybe allowing
> > write access that doesn't affect the host.
> > 
> > 4) Copy contents into a guest RAM location, mark it reserved, point to it 
> > via
> > 0xFC config as scratch register.
> >  a) Done by QEMU (vfio-pci)
> >  b) Done by SeaBIOS/OVMF
> > 
> > Discussion: This is the most like real hardware.  4.a) has the usual issue 
> > of
> > how to pick an address, but the benefit of not requiring BIOS changes 
> > (simply
> > mark the RAM reserved via existing methods).  4.b) would require passing a
> > buffer containing the contents of the OpRegion via fw_cfg and letting the
> > BIOS do the setup.  The latter of course requires modifying each BIOS for 
> > this
> > support.
> > 
> > Of course none of these support hotplug nor really can they since reserved
> > memory regions are not dynamic in the architecture.
> > 
> > In all cases, some piece of software needs to know where it can place the
> > OpRegion in guest memory.  It seems like there are advantages or
> > disadvantages whether that's done by QEMU or the BIOS, but we only need
> > to do it once if it's QEMU.  Suggestions, comments, preferences?
> > 
> 
> Hi Alex, another thing to consider is how to communicate to the guest driver 
> that the address at 0xFC contains a valid GPA that can be accessed by the 
> driver without causing an EPT fault - since
> the same driver will be used on other hypervisors, which may not EPT-map 
> OpRegion memory.  One idea proposed by the display driver team is to set bit0 
> of the address to 1 to indicate that OpRegion memory
> can be safely accessed by the guest driver.

Hi Allen,

Why is that any different than a guest accessing any other memory area
that it shouldn't?  The OpRegion starts with a 16-byte ID string, so if
the guest finds that it should feel fairly confident the OpRegion data
is valid.  The published spec also seems to define all bits of 0xfc as
valid, not implying any sort of alignment requirements, and the i915
driver does a memremap directly on the value read from 0xfc.  So I'm not
sure whether there's really a need to or ability to define any of those
bits in an adhoc way to indicate mapping.  If we do things right,
shouldn't the guest driver not even know it's running in a VM, at least
for the KVMGT-d case, so we need to be compatible with physical
hardware.  Thanks,

Alex

