Re: [Xen-devel] RMRR Fix Design for Xen

2015-01-05 Thread George Dunlap
On Fri, Dec 19, 2014 at 1:21 AM, Tiejun Chen tiejun.c...@intel.com wrote:
 RMRR Fix Design for Xen

 This design aims to fix RMRR handling in Xen. It includes four sections as
 follows:

 * Background
 * What is RMRR
 * Current RMRR Issues
 * Design Overview

 We hope this helps us understand the current problem and then figure out a
 clean, better solution that everyone can agree on to go forward.

 Background
 ==========

 We first identified this RMRR defect when trying to pass through an IGD device,
 which can be fixed simply by adding an identity mapping in the shared EPT
 table case. However, in the course of the community discussion, it boiled down
 to a more general RMRR problem, i.e. the identity mapping is brute-force added
 in the hypervisor without considering whether it conflicts with existing guest
 PFN ranges. As a general solution we need to invent a new mechanism so that
 reserved ranges allocated by the hypervisor can be exported to the user space
 toolstack and hvmloader, so that conflicts can be detected when constructing
 the guest PFN layout, with best-effort avoidance policies to help further.
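
To make the conflict detection idea concrete, here is a minimal sketch in Python (not Xen code). The function name find_conflicts, the tuple layout and the example guest ranges are assumptions made purely for illustration; all ranges use inclusive end addresses.

```python
def find_conflicts(guest_ranges, reserved_ranges):
    """Return (guest_range_name, reserved_range) pairs that overlap.

    guest_ranges:    iterable of (name, start, end), end inclusive
    reserved_ranges: iterable of (start, end), end inclusive
    """
    conflicts = []
    for name, g_start, g_end in guest_ranges:
        for r_start, r_end in reserved_ranges:
            # Two inclusive ranges overlap when each starts before the other ends.
            if g_start <= r_end and r_start <= g_end:
                conflicts.append((name, (r_start, r_end)))
    return conflicts


# Hypothetical guest layout: 3GB of low RAM plus an MMIO hole.
layout = [("ram", 0x00000000, 0xBFFFFFFF), ("mmio", 0xF0000000, 0xFBFFFFFF)]
reserved = [(0xAB80A000, 0xAB81DFFF)]
conflicts = find_conflicts(layout, reserved)
```

A toolstack could run a check like this before populating guest memory and then apply whatever avoidance policy is agreed on.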

 What is RMRR
 ============

 RMRR is an acronym for Reserved Memory Region Reporting.

 BIOS may report each such reserved memory region through the RMRR structures,
 along with the devices that require access to the specified reserved memory
 region. Reserved memory ranges that are either not DMA targets, or memory
 ranges that may be the target of BIOS-initiated DMA only during the pre-boot
 phase (such as from a boot disk drive), must not be included in the reserved
 memory region reporting. The base address of each RMRR region must be 4KB
 aligned and the size must be an integer multiple of 4KB. BIOS must report the
 RMRR-reported memory addresses as reserved in the system memory map returned
 through methods such as INT15, EFI GetMemoryMap etc. The reserved memory region
 reporting structures are optional. If there are no RMRR structures, the system
 software concludes that the platform does not have any reserved memory ranges
 that are DMA targets.
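
The alignment rules above can be expressed as a small check. This is only a sketch in Python, not code from Xen; the name rmrr_is_well_formed is an assumption, and the end address is taken to be the inclusive last byte of the region.

```python
PAGE_SIZE = 4096  # 4KB, the granularity required for RMRR base and size


def rmrr_is_well_formed(base, end):
    """Check one RMRR region; `end` is the inclusive last byte."""
    if end < base:
        return False
    size = end - base + 1
    # Both the base address and the size must be multiples of 4KB.
    return base % PAGE_SIZE == 0 and size % PAGE_SIZE == 0
```

A parser could use such a check to reject malformed entries before acting on them.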

 The RMRR regions are expected to be used for legacy usages (such as USB, UMA
 Graphics, etc.) requiring reserved memory. Platform designers should avoid or
 limit use of reserved memory regions since these require system software to
 create holes in the DMA virtual address range available to system software and
 its drivers.

 The following is grabbed from my BDW:

 (XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR:
 (XEN) [VT-D]dmar.c:679:   RMRR region: base_addr ab80a000 end_address ab81dfff
 (XEN) [VT-D]dmar.c:834: found ACPI_DMAR_RMRR:
 (XEN) [VT-D]dmar.c:679:   RMRR region: base_addr ad00 end_address af7f

 Here USB occupies 0xab80a000:0xab81dfff, IGD owns 0xad00:0xaf7f.
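
For anyone who wants to collect these ranges from a Xen boot log, the following Python sketch extracts them. It assumes the line format shown in the log above; the function name parse_rmrr_regions is made up for this example.

```python
import re

# Matches the "RMRR region" lines printed by dmar.c in the log above.
RMRR_LINE = re.compile(
    r"RMRR region: base_addr ([0-9a-fA-F]+) end_address ([0-9a-fA-F]+)"
)


def parse_rmrr_regions(log_text):
    """Return a list of (base, end) integer pairs found in the log text."""
    regions = []
    for match in RMRR_LINE.finditer(log_text):
        base = int(match.group(1), 16)
        end = int(match.group(2), 16)
        regions.append((base, end))
    return regions
```

The function only reads the text it is given and returns the addresses exactly as logged, without validating them.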

 Note that a given platform has zero or more Reserved Memory Region Reporting
 (RMRR) structures, and multiple devices may share one RMRR range. Additionally,
 an RMRR can be located anywhere in the address space.

Tiejun,

Thanks for this document -- such a document is really helpful in
figuring out the best way to architect the solution to a problem.

I hope you don't mind me asking a few additional questions here.
You've said that:
* RMRR is a range used by devices (typically legacy devices such as
USB, but apparently also newer devices like IGD)
* RMRR ranges are reported by BIOSes
* RMRR ranges should be avoided by the guest.

I'm still missing a few things, however.

* In the case of passing through a virtual device, how does the
range apply wrt gpfn space and mfn space?  I assume in the example
above, the range [ab80a000,ab81dfff] corresponds to an mfn range.
When passing through this device to the guest, do pfns
[ab80a000,ab81dfff] need to be mapped to the same mfn range (i.e., 1-1
mapping), or can they be mapped from somewhere else in pfn space?

* You've described the range, but later on you talk about Xen
creating RMRR mappings.  What does this mean?  Are there registers
that need to be written?  Do the ept / IOMMU tables need some kind of
special flags?

Thanks,
 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] RMRR Fix Design for Xen

2014-12-21 Thread Chen, Tiejun

Jan,

Thanks for your time, but I'm not going to address your comments here,
because I heard this design does not meet your expectations at all.
But it really was reviewed through several revisions by Kevin and Yang
before being sent out in public...


Anyway, I guess the only thing I can do is let Kevin and Yang, or
other appropriate people, finish this design as you expect. So for now
I'd better not say anything, to avoid causing any inconvenience.


Tiejun

On 2014/12/19 23:13, Jan Beulich wrote:

On 19.12.14 at 02:21, tiejun.c...@intel.com wrote:

#4 Something like USB is still restricted by the current RMRR implementation. We
should work out this case.


This can mean all or nothing. My understanding is that right now code
assumes that USB devices won't use their RMRR-specified memory
regions post-boot (kind of contrary to your earlier statement that in
such a case the regions shouldn't be listed in RMRRs in the first place).


Design Overview
===============

First of all we need to make sure no resources overlap RMRR. Then, in the
shared ept case, we can set these identity entries. We will also group all
devices associated with the same RMRR entry, and make sure all devices in a
group are assigned to the same VM.
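
The grouping rule can be illustrated with a short Python sketch. This is not Xen code; the function name, the dictionary layout and the SBDF strings are assumptions for illustration. Devices that are not assigned to any domain are simply ignored here.

```python
def conflicting_rmrr_groups(rmrr_devices, assignment):
    """Return the RMRR ranges whose devices are spread over several domains.

    rmrr_devices: {(base, end): [sbdf, ...]}  devices sharing each RMRR
    assignment:   {sbdf: domid}               current device assignment
    """
    bad = []
    for rmrr, devices in rmrr_devices.items():
        # Collect the set of domains that own at least one device of the group.
        owners = {assignment[d] for d in devices if d in assignment}
        if len(owners) > 1:
            bad.append(rmrr)
    return bad


# Hypothetical example: two devices share one RMRR but go to different VMs.
groups = {(0xAB80A000, 0xAB81DFFF): ["0000:00:14.0", "0000:00:1a.0"]}
bad = conflicting_rmrr_groups(groups, {"0000:00:14.0": 1, "0000:00:1a.0": 2})
```

An assignment path could refuse the request whenever this list would become non-empty.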

1. Setup RMRR identity mapping

current status:
 * identity mapping is only set up in the non-shared ept case

proposal:

In the non-shared ept case, the IOMMU code always sets those entries and RMRR
is already marked reserved in host, so that is fine.


Is it? Where? Or am I misunderstanding the whole statement, likely
due to me silently replacing host by guest (since reservation in
host address spaces is of no interest here afaict)?


But in the shared ept case, we need to check for any conflict, so we should
proceed as follows:

   - gfn space unoccupied
       - insert mapping: success.
         gfn:_mfn(gfn), PAGE_ORDER_4K, p2m_mmio_direct, p2m_access_rw
   - gfn space already occupied by 1:1 RMRR mapping
       - do nothing; success.
   - gfn space already occupied by other mapping
       - fail.
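
The three cases above can be sketched as follows. This is a toy model in Python, not the Xen p2m code: the p2m is modeled as a plain dictionary from gfn to (mfn, type), the names classify_gfn and map_rmrr_identity are assumptions, and the access permission (p2m_access_rw) is left out for brevity. The sketch checks the whole range first so that a failure leaves the table untouched, which is a choice made here rather than something the proposal specifies.

```python
PAGE_SHIFT = 12  # 4K pages, matching PAGE_ORDER_4K in the proposal
P2M_MMIO_DIRECT = "p2m_mmio_direct"


def classify_gfn(p2m, gfn):
    """Return 'insert', 'keep' or 'conflict' for one guest frame number."""
    entry = p2m.get(gfn)
    if entry is None:
        return "insert"        # gfn space unoccupied
    if entry == (gfn, P2M_MMIO_DIRECT):
        return "keep"          # already a 1:1 RMRR mapping
    return "conflict"          # occupied by some other mapping


def map_rmrr_identity(p2m, base, end):
    """Identity-map [base, end] (end inclusive); return True on success."""
    gfns = range(base >> PAGE_SHIFT, (end >> PAGE_SHIFT) + 1)
    if any(classify_gfn(p2m, gfn) == "conflict" for gfn in gfns):
        return False
    for gfn in gfns:
        # setdefault leaves existing 1:1 entries alone and fills the gaps.
        p2m.setdefault(gfn, (gfn, P2M_MMIO_DIRECT))
    return True
```

Calling map_rmrr_identity twice on the same range succeeds both times, which matches the "do nothing; success" case.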

expectation:
 * only devices w/ non-conflicting RMRR can be assigned
 * fortunately this achieves the very initial intention to support IGD
   pass-through on BDW


Are you trying to say here that doing the above is all you need for
your specific machine? If so, that's clearly not something to go into
a design document.

Also there's clearly an alternative proposal: Drop support for sharing
page tables. Your colleagues will surely have told you that we've
been considering this for quite some time, and had actually hoped
for them to do the necessary VT-d side work to allow for this without
causing performance regressions.


2.1 Expose RMRR to user space

current status:
 * Xen always records RMRR info in one list, acpi_rmrr_units, while parsing
   ACPI, so we can retrieve this info by looking up that list.

proposal:
 * RMRR would be exposed by a new hypercall, which Jan has already finished in
   the current version, but it exposes all RMRR info unconditionally.
 * Furthermore we can expose RMRR on demand to reduce how much the guest
   RAM/MMIO space shrinks.
 * So we will introduce a new parameter, 'rdm_forcecheck', which works together
   with SBDFs to control which RMRRs should be exposed:

 - We can set this parameter in the .cfg file like,

 rdm_forcecheck = 1 => Of course this should be 0 by default.

 '1' means we should force a check to reserve all ranges unconditionally,
 and if that fails the VM won't be created. This also gives the user a
 chance to work well with later hotplug, even if no device is assigned
 while creating the VM.

 If 0, we just check those assigned pci devices. As you know we already


assigned? Wasn't the plan to have a separate potentially-to-be-
assigned list? And I can only re-iterate that the name
rdm_forcecheck doesn't really express what you mean. Since
your intention is to check all devices (rather than do a check that
otherwise wouldn't be done), rdm_all or rdm_check_all would
seem closer to the intended behavior.


 have such an existing hypercall to assign PCI devices, so it looks like we can
 work directly under this hypercall to get the necessary SBDF to sort out which
 RMRR should be handled. But obviously, we need to get this info before
 we populate guest memory, to make sure these RMRR ranges are
 excluded from guest memory. Unfortunately, memory population
 takes place before device assignment, so we can't rely on that
 directly.

 But as we discussed, reordering only benefits the statically assigned case
 and does not help the hotplug case. So we have to introduce a new
 DOMCTL to pass that global parameter together with the SBDFs.
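
As a rough illustration of the proposed policy, the selection could look like the Python sketch below. Only the parameter name rdm_forcecheck comes from the proposal; the function name, the data layout and the SBDF strings are assumptions, and this does not model the DOMCTL itself.

```python
def rmrrs_to_expose(rmrr_devices, assigned_sbdfs, rdm_forcecheck=0):
    """Pick the RMRR ranges that should be reserved for a guest.

    rmrr_devices:   {(base, end): [sbdf, ...]}  devices behind each RMRR
    assigned_sbdfs: SBDFs of the devices given to this guest
    rdm_forcecheck: 1 reserves every range, 0 only those of assigned devices
    """
    if rdm_forcecheck:
        return sorted(rmrr_devices)
    wanted = set(assigned_sbdfs)
    # Keep a range if at least one of its devices is assigned to the guest.
    return sorted(r for r, devs in rmrr_devices.items() if wanted & set(devs))
```

With rdm_forcecheck=0 and no assigned devices the result is empty, so the guest layout is not shrunk at all.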

 For example, if we own these two RMRR entries,


own is confusing here, I assume you mean if there are such entries.


 [00:14.0]

Re: [Xen-devel] RMRR Fix Design for Xen

2014-12-21 Thread Tian, Kevin
I'll work out a new design proposal based on the content below and previous
discussions. Thanks Tiejun for your hard work and Jan for your careful
reviews so far.

For the comment below:
  Also there's clearly an alternative proposal: Drop support for sharing
  page tables. Your colleagues will surely have told you that we've
  been considering this for quite some time, and had actually hoped
  for them to do the necessary VT-d side work to allow for this without
  causing performance regressions.

Let's separate it from the RMRR discussion, because RMRR issues are about
the general p2m and thus orthogonal to the implementation difference between
shared and non-shared page tables (though it did lead to different behaviors
w/ the current bogus logic).

Thanks
Kevin


Re: [Xen-devel] RMRR Fix Design for Xen

2014-12-19 Thread Fabio Fantoni

On 19/12/2014 02:21, Tiejun Chen wrote:

 RMRR Fix Design for Xen

Current RMRR Issues
===================

#1 RMRR may conflict with RAM, MMIO or other ranges at the guest physical level.


Sorry if my question is not relevant; I don't have good knowledge of
this. Do Xen domUs require that all memory regions are correctly defined in
hvmloader, or is this done automatically and correctly for any emulated device
assigned to a domU?
I'm unable to get qxl vga working in Linux domUs and have been unable to find
the problem so far. I saw a memory warning in the system logs of a domU with
qxl, which makes me think that perhaps the differences in qxl's memory regions
are not handled properly in hvmloader.

The warning I found in one Fedora domU with qxl is:
ioremap error for 0xfc001000-0xfc002000, requested 0x10, got 0x0 
Here is a post about it, with logs compared against stdvga tests and a kvm
test (with qxl fully working):

http://lists.xen.org/archives/html/xen-devel/2013-10/msg00016.html
Here is the qxl support patch for libxl, updated to the latest Xen, if someone
wants to try it quickly:

https://github.com/Fantu/Xen/commit/fadecf8d6ee0e8c7e421fafba67aa11879e8b8fe

Can someone tell me whether this could be an hvmloader memory problem, or
whether it is unrelated?


Thanks for any reply and sorry for my bad English.



#2 Xen doesn't create the RMRR mapping in the shared ept case, so the assigned
device can't work well.

#3 Xen doesn't consider the case where multiple devices share one RMRR entry.
This can also cause damage between VMs when we assign such devices to
different VMs.

#4 Something like USB is still restricted by the current RMRR implementation. We
should work out this case.

Design Overview
===============

First of all we need to make sure no resources overlap RMRR. Then, in the
shared ept case, we can set these identity entries. We will also group all
devices associated with the same RMRR entry, and make sure all devices in a
group are assigned to the same VM.

1. Setup RMRR identity mapping

current status:
 * identity mapping is only set up in the non-shared ept case

proposal:

In non-shared ept case, IOMMU stuff always set those entries and RMRR is already
marked