Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Paul Durrant
> -Original Message-
> From: Roger Pau Monne
> Sent: 22 March 2018 10:06
> To: Paul Durrant 
> Cc: 'Alexey G' ; xen-devel@lists.xenproject.org;
> Andrew Cooper ; Ian Jackson
> ; Jan Beulich ; Wei Liu
> ; Anthony Perard ;
> Stefano Stabellini 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, Mar 22, 2018 at 09:29:44AM +, Paul Durrant wrote:
> > > The more I think about it, the more I like the existing
> > > map_io_range_to_ioreq_server() approach. :( It works without doing
> > > anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> > > working as expected. There is a problem to make it compatible with
> > > the specific multiple ioreq servers feature, but providing a new
> > > dmop/hypercall (which you suggest is a must have thing to trap
> MMCONFIG
> > > MMIO to give QEMU only the freedom to tell where it is located) allows
> > > to solve this problem in any possible way, either MMIO -> PCI conf
> > > translation or anything else.
> > >
> >
> > I don't think we even want QEMU to have the freedom to say where the
> > MMCONFIG areas are located, do we?
> 
> Sadly this how the chipset works. The PCIEXBAR register contains the
> position of the MCFG area. And this is emulated by QEMU.

So we should be emulating that in Xen, not handing it off to QEMU. Our 
integration with QEMU is already terrible and using QEMU to emulate the PCIe 
chipset will only make it worse.
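For reference, the MMCONFIG (ECAM) layout that whoever emulates PCIEXBAR has
to deal with is fixed by the PCIe spec, so the decode Xen would do before
forwarding a config IOREQ is mechanical. A minimal sketch (helper and type
names are illustrative, not actual Xen code):

#include <stdint.h>

/* Illustrative only: decode an ECAM/MMCONFIG access into bus/dev/fn/reg.
 * The offset from the MMCONFIG base encodes bits 27:20 = bus,
 * 19:15 = device, 14:12 = function, 11:0 = register offset. */
struct ecam_decode {
    uint8_t  bus, dev, fn;
    uint16_t reg;
};

static struct ecam_decode ecam_to_bdf(uint64_t addr, uint64_t mmcfg_base)
{
    uint64_t off = addr - mmcfg_base;

    return (struct ecam_decode){
        .bus = (off >> 20) & 0xff,
        .dev = (off >> 15) & 0x1f,
        .fn  = (off >> 12) & 0x7,
        .reg = off & 0xfff,
    };
}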

> 
> > QEMU is not in charge of the
> > guest memory map and it is not responsible for the building the MCFG
> > table, Xen is.
> 
> Well, the one that builds the MCFG table is hvmloader actually, which
> is the one that initially sets the value of PCIEXBAR and thus the
> initial position of the MCFG.
> 
> > So it should be Xen that decides where the MMCONFIG
> > area goes for each registered PCI device and it should be Xen that
> > adds that to the MCFG table. It should be Xen that handles the
> > MMCONFIG MMIO accesses and these should be forwarded to QEMU as
> PCI
> > config IOREQs.  Now, it may be that we need to introduce a Xen
> > specific mechanism into QEMU to then route those config space
> > transactions to the device models but that would be an improvement
> > over the current cf8/cfc hackery anyway.
> 
> I think we need a way for QEMU to tell Xen the position of the MCFG
> area, and any changes to it.
> 
> I don't think we want to emulate the PCIEXBAR register inside of Xen,
> if we do that then we would likely have to emulate the full Express
> Chipset inside of Xen.
> 

No, that's *exactly* what we should be doing. We should only be using QEMU for 
emulation of discrete peripheral devices.

  Paul

> Thanks, Roger.


[Xen-devel] X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda Items

2018-03-22 Thread Lars Kurth
Hi all,

please find attached
a) Meeting details (just a link with timezones) – the meeting invite will 
follow when we have an agenda
b) Bridge details – will be sent with the meeting invite
   I am thinking of using GotoMeeting, but want to try this with a Linux-only
   user before I commit
c) Call for agenda items

A few suggestions were made, such as XPTI status (if applicable), PVH status
Also we have some left-overs from the last call: see 
https://lists.xenproject.org/archives/html/xen-devel/2018-03/threads.html#01571

Regards
Lars

== Meeting Details ==
Wed April 11, 15:00 - 16:00 UTC

International meeting times: 
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018=4=11=14=0=0=224=24=179=136=37=33

== Agenda Proposal ==
We start with a round-the-table call to establish who is on the call (name and company)

=== A) Coordination and Planning ===
Coordinating who does what, what needs attention, what is blocked, etc.

A1) Short-term
Any urgent issues related to the 4.11 release that need discussing

A2) Long-term, Larger series
Please call out any x86-related series that need attention in the longer term. 
Provide
* Title of series
* Link to series (e.g. on https://lists.xenproject.org/archives/html/xen-devel, 
markmail, …)
* Describe any: Dependencies, Issues, etc. that are relevant

=== B) Design, architecture, feature update related discussions ===
Please highlight any design/architecture discussions that you would like to 
cover. Please describe
* Design, point to any mail discussions
* Describe clearly what you are blocked on: highlight any issues

=== C) Demos, Sharing of Experiences, Sometimes discussion of specific 
issues/bugs/problems/... ===
Please highlight any of the above that you would like to cover. Please describe
* What the issue/experience/demo is that you would like to cover

=== D) AOB ===


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 10:09:16 +
Paul Durrant  wrote:
[...]
>> > I don't think we even want QEMU to have the freedom to say where
>> > the MMCONFIG areas are located, do we?
>> 
>> Sadly this how the chipset works. The PCIEXBAR register contains the
>> position of the MCFG area. And this is emulated by QEMU.

>So we should be emulating that in Xen, not handing it off to QEMU. Our
>integration with QEMU is already terrible and using QEMU to emulate
>the PCIe chipset will only make it worse.  

I guess the QEMU guys will say that it will actually improve. :)
One of the very first observations I made while learning Xen/QEMU was
that Xen and QEMU behave sort of like stepmother and stepdaughter --
they dislike each other but have to live together in one house for now.
I think better interaction would benefit both.

There are some architectural issues (MMIO hole control for passthrough
needs is one of them) which can be solved by actually improving
coordination with QEMU, without sacrificing security in any way.

>> > QEMU is not in charge of the
>> > guest memory map and it is not responsible for the building the
>> > MCFG table, Xen is.
>> 
>> Well, the one that builds the MCFG table is hvmloader actually, which
>> is the one that initially sets the value of PCIEXBAR and thus the
>> initial position of the MCFG.
>> 
>> > So it should be Xen that decides where the MMCONFIG
>> > area goes for each registered PCI device and it should be Xen that
>> > adds that to the MCFG table. It should be Xen that handles the
>> > MMCONFIG MMIO accesses and these should be forwarded to QEMU as
>> PCI
>> > config IOREQs.  Now, it may be that we need to introduce a Xen
>> > specific mechanism into QEMU to then route those config space
>> > transactions to the device models but that would be an improvement
>> > over the current cf8/cfc hackery anyway.
>> 
>> I think we need a way for QEMU to tell Xen the position of the MCFG
>> area, and any changes to it.
>> 
>> I don't think we want to emulate the PCIEXBAR register inside of Xen,
>> if we do that then we would likely have to emulate the full Express
>> Chipset inside of Xen.
>> 
>No, that's *exactly* what we should be doing. We should only be using
>QEMU for emulation of discrete peripheral devices.  

Can an emulated PCIe switch (basically a PCI-PCI bridge) be considered a
discrete peripheral device which can function alone?

If we should emulate the whole PCIe bus, where will the dividing line be
between chipset emulation and PCIe hierarchy emulation?


[Xen-devel] [xen-unstable test] 120988: regressions - FAIL

2018-03-22 Thread osstest service owner
flight 120988 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/120988/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 120943

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 120859
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 120943
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 120943
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 120943
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 120943
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 120943
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 120943
 test-amd64-amd64-xl-pvhv2-intel 12 guest-start fail never pass
 test-amd64-amd64-xl-pvhv2-amd 12 guest-start  fail  never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass

version targeted for testing:
 xen  7a1358bbe73e5f749c3d2f53478dc1f30720f949
baseline version:
 xen  0012ae8afb4a6e76f2847119f2c6850fbf41d9b7

Last test of basis   120943  2018-03-18 21:56:54 Z3 days
Testing same since   120988  2018-03-20 10:55:25 Z2 days1 attempts


People who touched revisions under test:
  Amit Singh Tomar 
  Julien Grall 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  fail
 build-i386   

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 12:56,  wrote:
> I really don't understand why some people have that fear of emulated
> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
> already emulates via map_io_range_to_ioreq_server(). No sensitive
> information exposed. It is related only to emulated PCI conf space which
> QEMU already knows about and use, providing emulated PCI devices for it.

You continue to ignore the routing requirement multiple ioreq
servers impose.

Jan



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 09:57:16 +
Roger Pau Monné  wrote:
[...]
>> Yes, and it is still needed as we have two distinct (and not equal)
>> interfaces to PCI conf space. Apart from 0..FFh range overlapping
>> they can be considered very different interfaces. And whether it is
>> a real system or emulated -- we can use either one of these two
>> interfaces or both.  
>
>The legacy PCI config space accesses and the MCFG config space access
>are just different methods of accessing the PCI configuration space,
>but the data _must_ be exactly the same. I don't see how a device
>would care about where the access to the config space originated.

If they were merely different methods of accessing the same thing, they
could be used interchangeably. When we get a PCI conf ioreq with
offset > 100h, we know we cannot just pass it to the emulated CF8/CFC
ports but have to emulate it specifically.
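The 100h boundary follows directly from the legacy mechanism's encoding: the
CF8h address dword only carries an 8-bit register field, so offsets
100h-FFFh are simply unreachable through it. A rough sketch of that encoding
(illustrative, not QEMU/Xen code):

#include <stdint.h>

/* Illustrative only: the legacy CF8h config address encoding.  Only 8
 * bits of register offset fit, so config space offsets 0x00-0xFF are
 * reachable; extended offsets 0x100-0xFFF need MMCONFIG (or a vendor
 * specific extension). */
static uint32_t cf8_encode(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    return (1u << 31) |            /* enable bit */
           ((uint32_t)bus << 16) |
           ((uint32_t)dev << 11) |
           ((uint32_t)fn  <<  8) |
           (reg & 0xfc);           /* register offset, dword aligned */
}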

>> For QEMU zero changes are needed to support MMCONFIG MMIO accesses if
>> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
>> expects.  
>
>As I said many times in this thread, you seem to be focused around
>what's best for QEMU only, and this is wrong. The IOREQ interface is
>used by QEMU, but it's also used by other device emulators.
>
>I get the feeling that you assume that the correct solution is the one
>that involves less changes to Xen and QEMU. This is simply not true.
>
>> Anyway, for (kind of vague) users of the multiple ioreq servers
>> capability we can enable MMIO translation to PCI conf ioreqs. Note
>> that actually this is an extra step, not forwarding trapped MMCONFIG
>> MMIO accesses to the selected device model as is.
>>  
>> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI
>> >config space access is misleading.  
>> 
>> These are very different accesses, both in transport and
>> capabilities. 
>> >In both cases Xen would have to do the MCFG access decoding in order
>> >to figure out which IOREQ server will handle the request. At which
>> >point the only step that you avoid is the reconstruction of the
>> >memory access from the IOREQ_TYPE_PCI_CONFIG which is trivial.  
>> 
>> The "reconstruction of the memory access" you mentioned won't be easy
>> actually. The thing is, address_space_read/write is not all what we
>> need.
>> 
>> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
>> need to be an involved party, mainly to know where MMCONFIG area is
>> located so we can construct the address within its range from BDF.
>> This piece of information is destroyed in the process of MMIO ioreq
>> translation to PCI conf type.  
>
>QEMU certainly knows the position of the MCFG area (because it's the
>one that tells Xen about it), so I don't understand your concerns
>above.
>> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know
>> anything about the current emulated MMCONFIG state. The correct way
>> to have this info is to participate in its emulation. As we don't
>> participate, we have no other way than trying to gain backdoor
>> access to PCIHost fields via things like object_resolve_*(). This
>> solution is cumbersome and ugly but will work... and may break
>> anytime due to changes in QEMU.   
>
>OK, so you don't want to reconstruct the access, fine.
>
>Then just inject it using pcie_mmcfg_data_{read/write} or some similar
>wrapper. My suggestion was just to try to use the easier way to get
>this injected into QEMU.

QEMU knows its position; the problem is that xen-hvm.c (the ioreq
processor) is rather isolated from the MMCONFIG emulation.

If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in QEMU,
you can see this:

static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
{
    PCIExpressHost *e = opaque;
    ...

We know this 'opaque' when we do MMIO-style MMCONFIG handling, as
pcie_mmcfg_data_read/write are the actual handlers.

But xen-hvm.c needs to gain access to the PCIExpressHost out of nowhere,
which is possible but considered a hack by QEMU. We could also insert
some code into the MMCONFIG emulation which stores the info we need in
global variables to be used across wildly different and unrelated
modules. It will work, but anyone who sees it will have bad thoughts on
their mind.
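Put differently, replaying a PCI conf ioreq as an MMCONFIG access is only
mechanical once the MMCONFIG base is known; the hard part is getting that
base into xen-hvm.c. A sketch of the mechanical half (illustrative helper,
not existing QEMU code):

#include <stdint.h>

/* Illustrative only: rebuild the guest-physical MMCONFIG address for a
 * config space access, assuming the device model already knows the
 * MMCONFIG base currently programmed via PCIEXBAR. */
static uint64_t mmcfg_addr(uint64_t mmcfg_base, uint8_t bus, uint8_t dev,
                           uint8_t fn, uint16_t reg)
{
    return mmcfg_base |
           ((uint64_t)bus << 20) |
           ((uint64_t)dev << 15) |
           ((uint64_t)fn  << 12) |
           (reg & 0xfff);
}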

>> QEMU maintainers will grin while looking at all this I'm afraid --
>> trapped MMIO accesses which are translated to PCI conf accesses which
>> in turn translated back to emulated MMIO accesses upon receiving,
>> along with tedious attempts to gain access to MMCONFIG-related info
>> as we're not invited to the MMCONFIG emulation party.
>>
>> The more I think about it, the more I like the existing
>> map_io_range_to_ioreq_server() approach. :( It works without doing
>> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
>> working as expected. There is a problem to make it compatible with
>> the specific multiple ioreq servers feature, but providing a new
>> dmop/hypercall (which you suggest is a must have thing to trap
>> MMCONFIG MMIO to 

Re: [Xen-devel] [PATCH v5 2/5] x86/msr: add VMX MSRs into HVM_max domain policy

2018-03-22 Thread Andrew Cooper
On 22/03/2018 10:14, Sergey Dyasli wrote:
> On Wed, 2018-03-21 at 20:46 +, Andrew Cooper wrote:
>> On 28/02/2018 16:09, Sergey Dyasli wrote:
>>> +
>>> +dp->vmx.pinbased_ctls.allowed_0.raw = VMX_PINBASED_CTLS_DEFAULT1;
>>> +dp->vmx.pinbased_ctls.allowed_1.raw = VMX_PINBASED_CTLS_DEFAULT1;
>>> +supported = PIN_BASED_EXT_INTR_MASK |
>>> +PIN_BASED_NMI_EXITING   |
>>> +PIN_BASED_PREEMPT_TIMER;
>> Please have a single set of brackets around the entire or statement, so
>> editors will indent new changes correctly.
> Which editors?

Any editor which can read the file annotation stating BSD style, which
results in

supported = PIN_BASED_EXT_INTR_MASK |
    PIN_BASED_NMI_EXITING |

by default.

>  My editor is doing it fine. Anyway, is this what you are
> asking for?
>
> supported = (PIN_BASED_EXT_INTR_MASK |
>  PIN_BASED_NMI_EXITING   |
>  PIN_BASED_PREEMPT_TIMER);
>

Yes.  That's great thanks.

~Andrew


Re: [Xen-devel] [PATCH v3 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system

2018-03-22 Thread Andre Przywara
Hi,

On 22/03/18 08:16, Julien Grall wrote:
> Hi Andre,
> 
> On 03/21/2018 04:32 PM, Andre Przywara wrote:
>> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
>> index 131358a5a1..22c70ff7cd 100644
>> --- a/xen/arch/arm/vgic/vgic.c
>> +++ b/xen/arch/arm/vgic/vgic.c
>> @@ -981,6 +981,16 @@ unsigned int vgic_max_vcpus(const struct domain *d)
>>   return min_t(unsigned int, MAX_VIRT_CPUS, vgic_vcpu_limit);
>>   }
>>   +#ifdef CONFIG_HAS_GICV3
>> +void vgic_v3_setup_hw(paddr_t dbase,
>> +  unsigned int nr_rdist_regions,
>> +  const struct rdist_region *regions,
>> +  unsigned int intid_bits)
>> +{
>> +    /* Dummy implementation to allow building without actual vGICv3
>> support. */
> 
> One major inconvenience with that solution is GICv3 driver is going to
> be initialized but then you hit the BUG_ON() in domain_vgic_register.
> This is really not nice for the user but it is not obvious why the
> BUG_ON() is hit.
> 
> I am ok if you don't want to touch the Kconfig. But I would at least
> implement that helper with a panic("vGICv3 not yet supported with the
> new vGIC");

Yes, that's a good point (and easy to implement!) ;-)

Also I think we should have something saying that we are using the new
VGIC, so we have it in the logs. Just realised this when I was wondering
if my machine is currently using the new or the old VGIC ;-)

I put something in the vgic_v2_setup_hw() implementation.
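A rough sketch of what the two stubs discussed above might end up looking
like (the vgic_v3_setup_hw() signature is taken from the quoted patch; the
rest is illustrative):

/* Sketch only: fail loudly for vGICv3, and announce the new VGIC from
 * the vGICv2 setup path. */
void vgic_v3_setup_hw(paddr_t dbase,
                      unsigned int nr_rdist_regions,
                      const struct rdist_region *regions,
                      unsigned int intid_bits)
{
    panic("vGICv3 not yet supported with the new vGIC");
}

void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
                      paddr_t vbase, uint32_t aliased_offset)
{
    printk(XENLOG_INFO "Using the new VGIC implementation\n");
    /* ... existing setup ... */
}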

Cheers,
Andre.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 09:29:44 +
Paul Durrant  wrote:

>> -Original Message-
[...]
>> >In both cases Xen would have to do the MCFG access decoding in order
>> >to figure out which IOREQ server will handle the request. At which
>> >point the only step that you avoid is the reconstruction of the
>> >memory access from the IOREQ_TYPE_PCI_CONFIG which is trivial.  
>> 
>> The "reconstruction of the memory access" you mentioned won't be easy
>> actually. The thing is, address_space_read/write is not all what we
>> need.
>> 
>> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
>> need to be an involved party, mainly to know where MMCONFIG area is
>> located so we can construct the address within its range from BDF.
>> This piece of information is destroyed in the process of MMIO ioreq
>> translation to PCI conf type.
>> 
>> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know
>> anything about the current emulated MMCONFIG state. The correct way
>> to have this info is to participate in its emulation. As we don't
>> participate, we have no other way than trying to gain backdoor
>> access to PCIHost fields via things like object_resolve_*(). This
>> solution is cumbersome and ugly but will work... and may break
>> anytime due to changes in QEMU.
>> 
>> QEMU maintainers will grin while looking at all this I'm afraid --
>> trapped MMIO accesses which are translated to PCI conf accesses which
>> in turn translated back to emulated MMIO accesses upon receiving,
>> along with tedious attempts to gain access to MMCONFIG-related info
>> as we're not invited to the MMCONFIG emulation party.
>> 
>> The more I think about it, the more I like the existing
>> map_io_range_to_ioreq_server() approach. :( It works without doing
>> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
>> working as expected. There is a problem to make it compatible with
>> the specific multiple ioreq servers feature, but providing a new
>> dmop/hypercall (which you suggest is a must have thing to trap
>> MMCONFIG MMIO to give QEMU only the freedom to tell where it is
>> located) allows to solve this problem in any possible way, either
>> MMIO -> PCI conf translation or anything else.
>>   
>
>I don't think we even want QEMU to have the freedom to say where the
>MMCONFIG areas are located, do we? QEMU is not in charge of the guest
>memory map and it is not responsible for the building the MCFG table,
>Xen is. So it should be Xen that decides where the MMCONFIG area goes
>for each registered PCI device and it should be Xen that adds that to
>the MCFG table. It should be Xen that handles the MMCONFIG MMIO
>accesses and these should be forwarded to QEMU as PCI config IOREQs.
>Now, it may be that we need to introduce a Xen specific mechanism into
>QEMU to then route those config space transactions to the device
>models but that would be an improvement over the current cf8/cfc
>hackery anyway.

Well, MMCONFIG is a chipset-specific thing. We probably can't simply
abstract away its usage by merely providing an ACPI MCFG table for it.

Its layout must correspond to the emulated PCI conf space, where the
majority of devices belong to QEMU. Although we could track all of QEMU's
usage of emulated/PT PCI devices and build this layout ourselves, this
design may introduce multiple issues. For QEMU, handling such PCI conf
ioreqs without knowing anything about MMCONFIG becomes worse --
previously it at least knew that those accesses belong to the MMCONFIG
range it emulates, but with PCI conf ioreqs the situation gets a bit more
complicated -- either the CF8/CFC workaround or a manual lookup of the
target device from the rather isolated xen-hvm.c. Feasible, yes, but it
will look like a dirty hack -- doing part of QEMU's internal job.

These are merely inconveniences; the main problem here at the moment is
OVMF. OVMF relocates MMCONFIG by writing to the PCIEXBAR register it
knows about on Q35, and then uses MMCONFIG at the address it expects.
This is something I want to address in subsequent patches.
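For context, the PCIEXBAR register that OVMF (and hvmloader) programs lives
in the Q35 host bridge's (00:00.0) config space at offset 0x60, with an
enable bit and a window-size field in its low bits. A rough sketch of such a
relocation write, assuming an illustrative pci_writel() helper rather than
any particular firmware's API:

#include <stdint.h>

#define PCIEXBAR     0x60        /* Q35 host bridge config offset */
#define PCIEXBAR_EN  (1u << 0)   /* window enable bit */

/* Assumed helper: 32-bit config write to bus/dev/fn at offset reg. */
void pci_writel(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg,
                uint32_t val);

/* Sketch only: enable a 256MB MMCONFIG window at mmcfg_base.  The high
 * dword is written first and the enable bit last, so the window never
 * appears at a half-updated address. */
static void set_mmconfig_base(uint64_t mmcfg_base)
{
    pci_writel(0, 0, 0, PCIEXBAR + 4, mmcfg_base >> 32);
    pci_writel(0, 0, 0, PCIEXBAR, (uint32_t)mmcfg_base | PCIEXBAR_EN);
}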

>  Paul
>
>> >> We can still route either ioreq
>> >> type to multiple device emulators accordingly.  
>> >
>> >It's exactly the same that's done for IO space PCI config space
>> >addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
>> >space access using do_outp and cpu_ioreq_pio.  
>> 
>> ...And it is completely limited to basic PCI conf space. I don't know
>> the context of this line in xen-hvm.c:
>> 
>> val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0x)
>> << 8) | (req->addr & 0xfc);
>> 
>> but seems like current QEMU versions do not expect anything similar
>> to AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf
>> only. 
>> >If you think using IOREQ_TYPE_COPY for MCFG accesses is such a
>> >benefit for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG
>> >into IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
>> >cpu_ioreq_move?  
>> 
>> Answered above, we need to somehow 

[Xen-devel] [PATCH v18 11/11] tools/libxenctrl: use new xenforeignmemory API to seed grant table

2018-03-22 Thread Paul Durrant
A previous patch added support for priv-mapping guest resources directly
(rather than having to foreign-map, which requires P2M modification for
HVM guests).

This patch makes use of the new API to seed the guest grant table unless
the underlying infrastructure (i.e. privcmd) doesn't support it, in which
case the old scheme is used.

NOTE: The call to xc_dom_gnttab_hvm_seed() in hvm_build_set_params() was
  actually unnecessary, as the grant table has already been seeded
  by a prior call to xc_dom_gnttab_init() made by libxl__build_dom().
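A minimal usage sketch of the consolidated entry point introduced by this
patch (the prototype is the one declared in the xc_dom.h hunk below; the
surrounding error handling and include paths are illustrative):

#include <stdbool.h>
#include <stdio.h>
#include <xenctrl.h>
#include <xc_dom.h>

/* Sketch only: seed the console and xenstore grant entries for an HVM
 * guest through the single xc_dom_gnttab_seed() entry point. */
static int seed_grants(xc_interface *xch, uint32_t guest_domid,
                       xen_pfn_t console_gmfn, xen_pfn_t xenstore_gmfn,
                       uint32_t console_domid, uint32_t xenstore_domid)
{
    int rc = xc_dom_gnttab_seed(xch, guest_domid, true /* is_hvm */,
                                console_gmfn, xenstore_gmfn,
                                console_domid, xenstore_domid);

    if ( rc != 0 )
        fprintf(stderr, "grant table seeding failed for domain %u\n",
                guest_domid);

    return rc;
}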

Signed-off-by: Paul Durrant 
Acked-by: Marek Marczykowski-Górecki 
Reviewed-by: Roger Pau Monné 
Acked-by: Wei Liu 
---
Cc: Ian Jackson 

v18:
 - Trivial re-base.

v13:
 - Re-base.

v10:
 - Use new id constant for grant table.

v4:
 - Minor cosmetic fix suggested by Roger.

v3:
 - Introduced xc_dom_set_gnttab_entry() to avoid duplicated code.
---
 tools/libxc/include/xc_dom.h|   8 +--
 tools/libxc/xc_dom_boot.c   | 114 +---
 tools/libxc/xc_sr_restore_x86_hvm.c |  10 ++--
 tools/libxc/xc_sr_restore_x86_pv.c  |   2 +-
 tools/libxl/libxl_dom.c |   1 -
 tools/python/xen/lowlevel/xc/xc.c   |   6 +-
 6 files changed, 92 insertions(+), 49 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 491cad8114..cee2ac9901 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -332,12 +332,8 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, 
xen_pfn_t pfn,
 int xc_dom_boot_image(struct xc_dom_image *dom);
 int xc_dom_compat_check(struct xc_dom_image *dom);
 int xc_dom_gnttab_init(struct xc_dom_image *dom);
-int xc_dom_gnttab_hvm_seed(xc_interface *xch, uint32_t domid,
-   xen_pfn_t console_gmfn,
-   xen_pfn_t xenstore_gmfn,
-   uint32_t console_domid,
-   uint32_t xenstore_domid);
-int xc_dom_gnttab_seed(xc_interface *xch, uint32_t domid,
+int xc_dom_gnttab_seed(xc_interface *xch, uint32_t guest_domid,
+   bool is_hvm,
xen_pfn_t console_gmfn,
xen_pfn_t xenstore_gmfn,
uint32_t console_domid,
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index 2e5681dc5d..8307ebeaf6 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -256,11 +256,29 @@ static xen_pfn_t xc_dom_gnttab_setup(xc_interface *xch, 
uint32_t domid)
 return gmfn;
 }
 
-int xc_dom_gnttab_seed(xc_interface *xch, uint32_t domid,
-   xen_pfn_t console_gmfn,
-   xen_pfn_t xenstore_gmfn,
-   uint32_t console_domid,
-   uint32_t xenstore_domid)
+static void xc_dom_set_gnttab_entry(xc_interface *xch,
+grant_entry_v1_t *gnttab,
+unsigned int idx,
+uint32_t guest_domid,
+uint32_t backend_domid,
+xen_pfn_t backend_gmfn)
+{
+if ( guest_domid == backend_domid || backend_gmfn == -1)
+return;
+
+xc_dom_printf(xch, "%s: [%u] -> 0x%"PRI_xen_pfn,
+  __FUNCTION__, idx, backend_gmfn);
+
+gnttab[idx].flags = GTF_permit_access;
+gnttab[idx].domid = backend_domid;
+gnttab[idx].frame = backend_gmfn;
+}
+
+static int compat_gnttab_seed(xc_interface *xch, uint32_t domid,
+  xen_pfn_t console_gmfn,
+  xen_pfn_t xenstore_gmfn,
+  uint32_t console_domid,
+  uint32_t xenstore_domid)
 {
 
 xen_pfn_t gnttab_gmfn;
@@ -284,18 +302,10 @@ int xc_dom_gnttab_seed(xc_interface *xch, uint32_t domid,
 return -1;
 }
 
-if ( domid != console_domid  && console_gmfn != -1)
-{
-gnttab[GNTTAB_RESERVED_CONSOLE].flags = GTF_permit_access;
-gnttab[GNTTAB_RESERVED_CONSOLE].domid = console_domid;
-gnttab[GNTTAB_RESERVED_CONSOLE].frame = console_gmfn;
-}
-if ( domid != xenstore_domid && xenstore_gmfn != -1)
-{
-gnttab[GNTTAB_RESERVED_XENSTORE].flags = GTF_permit_access;
-gnttab[GNTTAB_RESERVED_XENSTORE].domid = xenstore_domid;
-gnttab[GNTTAB_RESERVED_XENSTORE].frame = xenstore_gmfn;
-}
+xc_dom_set_gnttab_entry(xch, gnttab, GNTTAB_RESERVED_CONSOLE,
+domid, console_domid, console_gmfn);
+xc_dom_set_gnttab_entry(xch, gnttab, GNTTAB_RESERVED_XENSTORE,
+domid, xenstore_domid, xenstore_gmfn);
 
 if ( munmap(gnttab, PAGE_SIZE) == -1 )
 {
@@ -313,11 +323,11 @@ int xc_dom_gnttab_seed(xc_interface *xch, uint32_t domid,
 return 0;
 

Re: [Xen-devel] [PATCH v5 1/5] x86/msr: add VMX MSRs definitions and populate Raw domain policy

2018-03-22 Thread Sergey Dyasli
On Wed, 2018-03-21 at 19:52 +, Andrew Cooper wrote:
> On 28/02/18 16:09, Sergey Dyasli wrote:
> > 
> > +struct {
> > +/* 0x0480  MSR_IA32_VMX_BASIC */
> > +union {
> > +uint64_t raw;
> > +struct {
> > +uint32_t vmcs_revision_id:31;
> 
> vmcs_rev_id
> 
> > +bool  mbz:1;  /* 31 always zero */
> 
> Is this really mbz?  Isn't this the shadow identifier bit for shadow vmcs's?

Yes, in the VMCS itself it's the shadow bit. However, it is always zero
in the MSR, since the job of the MSR is to report the VMCS revision
identifier.

-- 
Thanks,
Sergey

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 08:49:58AM +1000, Alexey G wrote:
> On Wed, 21 Mar 2018 17:15:04 +
> Roger Pau Monné  wrote:
> [...]
> >> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> >> conf translation is a redundant step. Why not to allow specifying
> >> for DM whether it prefers to receive MMCONFIG accesses as native
> >> (MMIO ones) or as translated PCI conf ioreqs?  
> >
> >You are just adding an extra level of complexity to an interface
> >that's fairly simple. You register a PCI device using
> >XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
> 
> Yes, and it is still needed as we have two distinct (and not equal)
> interfaces to PCI conf space. Apart from 0..FFh range overlapping they
> can be considered very different interfaces. And whether it is a real
> system or emulated -- we can use either one of these two interfaces or
> both.

The legacy PCI config space accesses and the MCFG config space access
are just different methods of accessing the PCI configuration space,
but the data _must_ be exactly the same. I don't see how a device
would care about where the access to the config space originated.

> For QEMU zero changes are needed to support MMCONFIG MMIO accesses if
> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
> expects.

As I said many times in this thread, you seem to be focused around
what's best for QEMU only, and this is wrong. The IOREQ interface is
used by QEMU, but it's also used by other device emulators.

I get the feeling that you assume that the correct solution is the one
that involves less changes to Xen and QEMU. This is simply not true.

> Anyway, for (kind of vague) users of the multiple ioreq servers
> capability we can enable MMIO translation to PCI conf ioreqs. Note that
> actually this is an extra step, not forwarding trapped MMCONFIG MMIO
> accesses to the selected device model as is.
>
> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
> >space access is misleading.
> 
> These are very different accesses, both in transport and capabilities.
> 
> >In both cases Xen would have to do the MCFG access decoding in order
> >to figure out which IOREQ server will handle the request. At which
> >point the only step that you avoid is the reconstruction of the memory
> >access from the IOREQ_TYPE_PCI_CONFIG which is trivial.
> 
> The "reconstruction of the memory access" you mentioned won't be easy
> actually. The thing is, address_space_read/write is not all what we
> need.
> 
> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
> need to be an involved party, mainly to know where MMCONFIG area is
> located so we can construct the address within its range from BDF.
> This piece of information is destroyed in the process of MMIO ioreq
> translation to PCI conf type.

QEMU certainly knows the position of the MCFG area (because it's the
one that tells Xen about it), so I don't understand your concerns
above.

> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
> about the current emulated MMCONFIG state. The correct way to have this
> info is to participate in its emulation. As we don't participate, we
> have no other way than trying to gain backdoor access to PCIHost fields
> via things like object_resolve_*(). This solution is cumbersome and
> ugly but will work... and may break anytime due to changes in QEMU. 

OK, so you don't want to reconstruct the access, fine.

Then just inject it using pcie_mmcfg_data_{read/write} or some similar
wrapper. My suggestion was just to try to use the easier way to get
this injected into QEMU.

> QEMU maintainers will grin while looking at all this I'm afraid --
> trapped MMIO accesses which are translated to PCI conf accesses which
> in turn translated back to emulated MMIO accesses upon receiving, along
> with tedious attempts to gain access to MMCONFIG-related info as we're
> not invited to the MMCONFIG emulation party.
>
> The more I think about it, the more I like the existing
> map_io_range_to_ioreq_server() approach. :( It works without doing
> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> working as expected. There is a problem to make it compatible with
> the specific multiple ioreq servers feature, but providing a new
> dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> MMIO to give QEMU only the freedom to tell where it is located) allows
> to solve this problem in any possible way, either MMIO -> PCI conf
> translation or anything else.

I'm sorry, but I'm getting lost.

You complain that using IOREQ_TYPE_PCI_CONFIG is not a good approach
because QEMU needs to know the position of the MCFG area if we want to
reconstruct and forward the MMIO access. And then you are proposing to
use IOREQ_TYPE_COPY which _requires_ QEMU to know the position of the
MCFG area in order to do the decoding of the PCI config space access.

> >> We can still route 

Re: [Xen-devel] [PATCH v11 10/12] vpci: add a priority parameter to the vPCI register initializer

2018-03-22 Thread Jan Beulich
>>> On 20.03.18 at 16:15,  wrote:
> This is needed for MSI-X, since MSI-X will need to be initialized
> before parsing the BARs, so that the header BAR handlers are aware of
> the MSI-X related holes and make sure they are not mapped in order for
> the trap handlers to work properly.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Jan Beulich 
> ---
> Cc: Stefano Stabellini 
> Cc: Julien Grall 
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> ---
> Changes since v4:
>  - Add a middle priority and add the PCI header to it.
> 
> Changes since v3:
> - Add a numerical suffix to the section used to store the pointer to
>each initializer function, and sort them at link time.
> ---
>  xen/arch/arm/xen.lds.S| 4 ++--

Julien, Stefano?

Thanks, Jan


Re: [Xen-devel] [PATCH v11 04/12] pci: split code to size BARs from pci_add_device

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 04:15:06AM -0600, Jan Beulich wrote:
> >>> On 20.03.18 at 16:15,  wrote:
> > @@ -672,11 +722,16 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> >  unsigned int i;
> >  
> >  BUILD_BUG_ON(ARRAY_SIZE(pdev->vf_rlen) != PCI_SRIOV_NUM_BARS);
> > -for ( i = 0; i < PCI_SRIOV_NUM_BARS; ++i )
> > +for ( i = 0; i < PCI_SRIOV_NUM_BARS; )
> >  {
> >  unsigned int idx = pos + PCI_SRIOV_BAR + i * 4;
> >  u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
> > -u32 hi = 0;
> > +pci_sbdf_t sbdf = {
> > +.seg = seg,
> > +.bus = bus,
> > +.dev = slot,
> > +.func = func,
> > +};
> 
> So I've had everything up to patch 9 applied and ready for pushing,
> when I did my usual secondary compile test on an old system: This
> fails to compile with gcc 4.3 (due to there being an unnamed sub-
> structure). A similar issue exists at least in patch 7. Since the
> structure gets introduced in patch 1 (and hence may need changing

pci_sbdf_t is already in the source tree; it was introduced by
514f58d4468a40b5dd418a5ea1742681930c3f2d back in December.

> there, depending on how this is to be addressed), I'm not going to
> push any part of this series.

No patch in the series changes pci_sbdf_t at all, so in any case this
should be a pre-patch or a post-patch, but not really part of patch 1.

Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 09:29:44AM +, Paul Durrant wrote:
> > The more I think about it, the more I like the existing
> > map_io_range_to_ioreq_server() approach. :( It works without doing
> > anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> > working as expected. There is a problem to make it compatible with
> > the specific multiple ioreq servers feature, but providing a new
> > dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> > MMIO to give QEMU only the freedom to tell where it is located) allows
> > to solve this problem in any possible way, either MMIO -> PCI conf
> > translation or anything else.
> > 
> 
> I don't think we even want QEMU to have the freedom to say where the
> MMCONFIG areas are located, do we?

Sadly this how the chipset works. The PCIEXBAR register contains the
position of the MCFG area. And this is emulated by QEMU.

> QEMU is not in charge of the
> guest memory map and it is not responsible for the building the MCFG
> table, Xen is.

Well, the one that builds the MCFG table is hvmloader actually, which
is the one that initially sets the value of PCIEXBAR and thus the
initial position of the MCFG.

> So it should be Xen that decides where the MMCONFIG
> area goes for each registered PCI device and it should be Xen that
> adds that to the MCFG table. It should be Xen that handles the
> MMCONFIG MMIO accesses and these should be forwarded to QEMU as PCI
> config IOREQs.  Now, it may be that we need to introduce a Xen
> specific mechanism into QEMU to then route those config space
> transactions to the device models but that would be an improvement
> over the current cf8/cfc hackery anyway.

I think we need a way for QEMU to tell Xen the position of the MCFG
area, and any changes to it.

I don't think we want to emulate the PCIEXBAR register inside of Xen,
if we do that then we would likely have to emulate the full Express
Chipset inside of Xen.

Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 22 March 2018 09:55
> To: Jan Beulich 
> Cc: Andrew Cooper ; Anthony Perard
> ; Ian Jackson ; Paul
> Durrant ; Roger Pau Monne
> ; Wei Liu ; Stefano Stabellini
> ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 03:04:16 -0600
> "Jan Beulich"  wrote:
> 
>  On 22.03.18 at 01:31,  wrote:
> >> On Wed, 21 Mar 2018 17:06:28 +
> >> Paul Durrant  wrote:
> >> [...]
>  Well, this might work actually. Although the overall scenario will
>  be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it
>  will look:
> 
>  QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen
> new
>  MMCONFIG address/size -> Xen (re)maps MMIO trapping area ->
> someone
>  is
>  accessing this area -> Xen intercepts this MMIO access
> 
>  But here's what happens next:
> 
>  Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>  DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back
>  to the offset in emulated MMCONFIG range -> DM calls
>  address_space_read/write to trigger MMIO emulation
> 
> >>>
> >>>That would only be true of a dm that cannot handle PCI config ioreqs
> >>>directly.
> >>
> >> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
> >> QEMU).
> >>
> >> It receives these PCI conf ioreqs out of any context. To workaround
> >> this, existing code issues I/O to emulated CF8h/CFCh ports in order
> >> to allow QEMU to find their target. But we can't use the same method
> >> for MMCONFIG accesses -- this works for basic PCI conf space only.
> >
> >I think you want to view this the other way around: No physical
> >device would ever get to see MMCFG accesses (or CF8/CFC port
> >ones). This same layering is what we should have in the
> >virtualized case.
> 
> We have a purely virtual layout of the PCI bus along with a virtual,
> emulated MMCONFIG completely unrelated to the host's -- so what's
> exposed? This emulated MMCONFIG is simply a supplement to the virtual
> PCI bus and its layout corresponds to the virtual PCI bus the
> guest/QEMU sees.
> 
> It's QEMU who controls chipset-specific PCIEXBAR emulation and knows
> about MMCONFIG position and size.

...and I think that is the wrong solution for Xen. We only use QEMU as an 
emulator for peripheral devices; we should not be using it for this kind of 
emulation... that should be brought into the hypervisor.

> QEMU informs Xen about where it is,

No. Xen should not care where QEMU wants to put it, because the MMIO
emulations should not even reach QEMU.

   Paul

> in order to receive events about R/W accesses to this emulated area --
> so why should it receive these events in the form of PCI conf BDF/reg and
> not simply as an MMCONFIG offset directly, if it is basically the same
> thing?


Re: [Xen-devel] [PATCH v4] hvm/svm: Implement Debug events

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 11:46,  wrote:
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -172,6 +172,24 @@ static void svm_enable_msr_interception(struct domain 
> *d, uint32_t msr)
>  svm_intercept_msr(v, msr, MSR_INTERCEPT_WRITE);
>  }
>  
> +static void svm_set_icebp_interception(struct domain *d, bool enable)
> +{
> +struct vcpu *v;

While I agree that the hook's parameter would better not be a
pointer to const, the local variable here surely should be.

> @@ -2656,9 +2663,28 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>  HVMTRACE_0D(SMI);
>  break;
>  
> +case VMEXIT_ICEBP:
>  case VMEXIT_EXCEPTION_DB:
>  if ( !v->domain->debugger_attached )
> -hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
> +{
> +int rc;
> +unsigned int trap_type = exit_reason == VMEXIT_ICEBP ?
> +X86_EVENTTYPE_PRI_SW_EXCEPTION : X86_EVENTTYPE_HW_EXCEPTION;
> +
> +inst_len = 0;
> +
> +if ( trap_type >= X86_EVENTTYPE_SW_INTERRUPT )
> +inst_len = __get_instruction_length(v, INSTR_ICEBP);

>= (rather than ==) implies more than a single type is covered. How
does that fit with passing the unique INSTR_ICEBP to the function?
I don't see the point anyway of setting the type to one of two
possible values and then comparing against a third. Things would
likely be quite a bit more obvious if you had an if/else pair and did
both type and insn len assignments separately for each case.
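A sketch of the restructuring being asked for, reusing the names from the
quoted hunk (not a tested patch):

if ( exit_reason == VMEXIT_ICEBP )
{
    trap_type = X86_EVENTTYPE_PRI_SW_EXCEPTION;
    inst_len = __get_instruction_length(v, INSTR_ICEBP);
}
else
{
    trap_type = X86_EVENTTYPE_HW_EXCEPTION;
    inst_len = 0;
}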

> @@ -581,6 +596,16 @@ static inline bool_t hvm_enable_msr_interception(struct 
> domain *d, uint32_t msr)
>  return 0;
>  }
>  
> +static inline bool hvm_set_icebp_interception(struct domain *d, bool enable)
> +{
> +if( hvm_funcs.set_icebp_interception )

Contrary to what your revision log says, there's still a style issue
here plus ...

> +{
> +hvm_funcs.set_icebp_interception(d, enable);
> +return 1;

... true here and ...

> +}
> +return 0;

... false here.
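With the three comments above applied, the wrapper would read roughly:

static inline bool hvm_set_icebp_interception(struct domain *d, bool enable)
{
    if ( hvm_funcs.set_icebp_interception )
    {
        hvm_funcs.set_icebp_interception(d, enable);
        return true;
    }

    return false;
}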

Jan



Re: [Xen-devel] [PATCH v3 34/39] ARM: new VGIC: vgic-init: register VGIC

2018-03-22 Thread Andre Przywara
Hi,

On 22/03/18 08:00, Julien Grall wrote:
> Hi Andre,
> 
> On 03/21/2018 04:32 PM, Andre Przywara wrote:
>> This patch implements the function which is called by Xen when it wants
>> to register the virtual GIC.
>> This also implements vgic_max_vcpus() for the new VGIC, which reports
>> back the maximum number of VCPUs a certain GIC model supports. Similar
>> to the counterpart in the "old" VGIC, we return some maximum value if
>> the VGIC has not been initialised yet.
>>
>> Signed-off-by: Andre Przywara 
> 
> Thank you for the update. We will have to remove the GIC_INVALID case
> once Andrew's series is merged. If his series is merged before yours, it
> would not be an issue as that case should never be hit.

Yes, for my first reply I didn't originally see that his patch was a 20/20.

So I changed my mind and decided to not rely on this series ;-)
We can indeed fix this up later.

> Reviewed-by: Julien Grall 

Thanks!

Andre.


[Xen-devel] [PATCH v6 1/5] x86/msr: add VMX MSRs definitions and populate Raw domain policy

2018-03-22 Thread Sergey Dyasli
New definitions provide a convenient way of accessing the contents of
VMX MSRs. They are separated into 5 logical blocks based on the
availability conditions of the MSRs in each block:

1. vmx: [VMX_BASIC, VMX_VMCS_ENUM]
2. VMX_PROCBASED_CTLS2
3. VMX_EPT_VPID_CAP
4. vmx_true_ctls: [VMX_TRUE_PINBASED_CTLS, VMX_TRUE_ENTRY_CTLS]
5. VMX_VMFUNC

Every bit value is accessible by its name and bit names match existing
Xen's definitions as close as possible. There is a "raw" 64-bit field
for each MSR as well as "raw" arrays for vmx and vmx_true_ctls blocks.
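The pattern described above is the usual union-with-raw layout (the actual
definitions live in the xen/include/asm-x86/msr.h hunk, which this digest
truncates); roughly:

/* Illustrative sketch of the union-with-raw pattern used for the VMX
 * MSRs; names follow the VMX_BASIC fragment quoted in the v5 review
 * earlier in this digest, remaining bits elided. */
union vmx_basic {
    uint64_t raw;
    struct {
        uint32_t vmcs_rev_id:31;
        bool     mbz:1;       /* bit 31: always zero in the MSR */
        /* ... vmcs region size, memory type, default1_zero, ... */
    };
};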

Add calculate_raw_vmx_policy() which fills Raw policy with H/W values
of VMX MSRs. Host policy will contain a copy of these values (for now).

Signed-off-by: Sergey Dyasli 
---
v5 --> v6:
- Removed "_bits" and "_based" from union names
- Removed "_exiting" suffixes from control bit names
- Various shortenings of control bit names
---
 xen/arch/x86/msr.c  | 118 ++
 xen/include/asm-x86/msr.h   | 330 
 xen/include/asm-x86/x86-defns.h |  54 +++
 3 files changed, 502 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 369b4754ce..87239e151e 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -34,10 +34,65 @@ struct msr_domain_policy __read_mostly 
raw_msr_domain_policy,
 struct msr_vcpu_policy __read_mostly hvm_max_msr_vcpu_policy,
__read_mostly  pv_max_msr_vcpu_policy;
 
+static bool vmx_procbased_ctls2_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx.procbased_ctls.allowed_1.secondary;
+}
+
+static bool vmx_ept_vpid_cap_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx_procbased_ctls2.allowed_1.ept ||
+   dp->vmx_procbased_ctls2.allowed_1.vpid;
+}
+
+static bool vmx_true_ctls_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx.basic.default1_zero;
+}
+
+static bool vmx_vmfunc_available(const struct msr_domain_policy *dp)
+{
+return dp->vmx_procbased_ctls2.allowed_1.vmfunc;
+}
+
+static void __init calculate_raw_vmx_policy(struct msr_domain_policy *dp)
+{
+unsigned int i, start_msr, end_msr;
+
+if ( !cpu_has_vmx )
+return;
+
+start_msr = MSR_IA32_VMX_BASIC;
+end_msr = MSR_IA32_VMX_VMCS_ENUM;
+for ( i = start_msr; i <= end_msr; i++ )
+rdmsrl(i, dp->vmx.raw[i - start_msr]);
+
+if ( vmx_procbased_ctls2_available(dp) )
+rdmsrl(MSR_IA32_VMX_PROCBASED_CTLS2, dp->vmx_procbased_ctls2.raw);
+
+if ( vmx_ept_vpid_cap_available(dp) )
+rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, dp->vmx_ept_vpid_cap.raw);
+
+if ( vmx_true_ctls_available(dp) )
+{
+start_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS;
+end_msr = MSR_IA32_VMX_TRUE_ENTRY_CTLS;
+for ( i = start_msr; i <= end_msr; i++ )
+rdmsrl(i, dp->vmx_true_ctls.raw[i - start_msr]);
+}
+
+if ( vmx_vmfunc_available(dp) )
+rdmsrl(MSR_IA32_VMX_VMFUNC, dp->vmx_vmfunc.raw);
+}
+
 static void __init calculate_raw_policy(void)
 {
+struct msr_domain_policy *dp = &raw_msr_domain_policy;
+
 /* 0x00ce  MSR_INTEL_PLATFORM_INFO */
 /* Was already added by probe_cpuid_faulting() */
+
+calculate_raw_vmx_policy(dp);
 }
 
 static void __init calculate_host_policy(void)
@@ -284,6 +339,69 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
 return X86EMUL_EXCEPTION;
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct msr_domain_policy dp;
+
+BUILD_BUG_ON(sizeof(dp.vmx.basic) !=
+ sizeof(dp.vmx.basic.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.pinbased_ctls) !=
+ sizeof(dp.vmx.pinbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.procbased_ctls) !=
+ sizeof(dp.vmx.procbased_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.exit_ctls) !=
+ sizeof(dp.vmx.exit_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.entry_ctls) !=
+ sizeof(dp.vmx.entry_ctls.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.misc) !=
+ sizeof(dp.vmx.misc.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed0) !=
+ sizeof(dp.vmx.cr0_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr0_fixed1) !=
+ sizeof(dp.vmx.cr0_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed0) !=
+ sizeof(dp.vmx.cr4_fixed0.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.cr4_fixed1) !=
+ sizeof(dp.vmx.cr4_fixed1.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.vmcs_enum) !=
+ sizeof(dp.vmx.vmcs_enum.raw));
+BUILD_BUG_ON(sizeof(dp.vmx.raw) !=
+ sizeof(dp.vmx.basic) +
+ sizeof(dp.vmx.pinbased_ctls) +
+ sizeof(dp.vmx.procbased_ctls) +
+ sizeof(dp.vmx.exit_ctls) +
+ sizeof(dp.vmx.entry_ctls) +
+ sizeof(dp.vmx.misc) +
+ sizeof(dp.vmx.cr0_fixed0) +
+ 

[Xen-devel] [PATCH v6 3/5] x86/cpuid: update signature of hvm_cr4_guest_valid_bits()

2018-03-22 Thread Sergey Dyasli
With the new cpuid infrastructure there is a domain-wide struct cpuid
policy and there is no need to pass a separate struct vcpu * into
hvm_cr4_guest_valid_bits() anymore. Make the function accept struct
domain * instead and update callers.

Signed-off-by: Sergey Dyasli 
Reviewed-by: Andrew Cooper 
---
v5 --> v6:
- Added brackets to expression in vmx.c and replaced 0 with false
- Added Reviewed-by
---
 xen/arch/x86/hvm/domain.c   | 3 ++-
 xen/arch/x86/hvm/hvm.c  | 7 +++
 xen/arch/x86/hvm/svm/svmdebug.c | 4 ++--
 xen/arch/x86/hvm/vmx/vmx.c  | 4 ++--
 xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
 xen/include/asm-x86/hvm/hvm.h   | 2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 60474649de..ce15ce0470 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -111,6 +111,7 @@ static int check_segment(struct segment_register *reg, enum 
x86_segment seg)
 /* Called by VCPUOP_initialise for HVM guests. */
 int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
 {
+const struct domain *d = v->domain;
 struct cpu_user_regs *uregs = &v->arch.user_regs;
 struct segment_register cs, ds, ss, es, tr;
 const char *errstr;
@@ -272,7 +273,7 @@ int arch_set_info_hvm_guest(struct vcpu *v, const 
vcpu_hvm_context_t *ctx)
 if ( v->arch.hvm_vcpu.guest_efer & EFER_LME )
 v->arch.hvm_vcpu.guest_efer |= EFER_LMA;
 
-if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( v->arch.hvm_vcpu.guest_cr[4] & ~hvm_cr4_guest_valid_bits(d, false) )
 {
 gprintk(XENLOG_ERR, "Bad CR4 value: %#016lx\n",
 v->arch.hvm_vcpu.guest_cr[4]);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5759c73dd4..fe253034f2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -931,9 +931,8 @@ const char *hvm_efer_valid(const struct vcpu *v, uint64_t 
value,
 X86_CR0_CD | X86_CR0_PG)))
 
 /* These bits in CR4 can be set by the guest. */
-unsigned long hvm_cr4_guest_valid_bits(const struct vcpu *v, bool restore)
+unsigned long hvm_cr4_guest_valid_bits(const struct domain *d, bool restore)
 {
-const struct domain *d = v->domain;
 const struct cpuid_policy *p;
 bool mce, vmxe;
 
@@ -1000,7 +999,7 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
hvm_domain_context_t *h)
 return -EINVAL;
 }
 
-if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(v, 1) )
+if ( ctxt.cr4 & ~hvm_cr4_guest_valid_bits(d, true) )
 {
 printk(XENLOG_G_ERR "HVM%d restore: bad CR4 %#" PRIx64 "\n",
d->domain_id, ctxt.cr4);
@@ -2350,7 +2349,7 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
 struct vcpu *v = current;
 unsigned long old_cr;
 
-if ( value & ~hvm_cr4_guest_valid_bits(v, 0) )
+if ( value & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 {
 HVM_DBG_LOG(DBG_LEVEL_1,
 "Guest attempts to set reserved bit in CR4: %lx",
diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index 091c58fa1b..6c215d19fe 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -121,9 +121,9 @@ bool svm_vmcb_isvalid(const char *from, const struct 
vmcb_struct *vmcb,
(cr3 >> v->domain->arch.cpuid->extd.maxphysaddr))) )
 PRINTF("CR3: MBZ bits are set (%#"PRIx64")\n", cr3);
 
-if ( cr4 & ~hvm_cr4_guest_valid_bits(v, false) )
+if ( cr4 & ~hvm_cr4_guest_valid_bits(v->domain, false) )
 PRINTF("CR4: invalid bits are set (%#"PRIx64", valid: %#"PRIx64")\n",
-   cr4, hvm_cr4_guest_valid_bits(v, false));
+   cr4, hvm_cr4_guest_valid_bits(v->domain, false));
 
 if ( vmcb_get_dr6(vmcb) >> 32 )
 PRINTF("DR6: bits [63:32] are not zero (%#"PRIx64")\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c5cc96339e..847c314a08 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1598,8 +1598,8 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned 
int cr,
  * Update CR4 host mask to only trap when the guest tries to set
  * bits that are controlled by the hypervisor.
  */
-v->arch.hvm_vmx.cr4_host_mask = HVM_CR4_HOST_MASK | X86_CR4_PKE |
-~hvm_cr4_guest_valid_bits(v, 0);
+v->arch.hvm_vmx.cr4_host_mask = (HVM_CR4_HOST_MASK | X86_CR4_PKE |
+   ~hvm_cr4_guest_valid_bits(v->domain, 
false));
 v->arch.hvm_vmx.cr4_host_mask |= v->arch.hvm_vmx.vmx_realmode ?
  X86_CR4_VME : 0;
 v->arch.hvm_vmx.cr4_host_mask |= !hvm_paging_enabled(v) ?
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index dcd3b28f86..43f7297c04 100644
--- 

[Xen-devel] [PATCH v6 4/5] x86/msr: update domain policy on CPUID policy changes

2018-03-22 Thread Sergey Dyasli
Availability of some MSRs depends on certain CPUID bits. Add function
recalculate_msr_policy() which updates availability of MSRs based on
the current domain's CPUID policy. This function is called when the
CPUID policy is changed from the toolstack.

Add recalculate_vmx_msr_policy() which changes availability of VMX
MSRs based on the domain's nested virt settings. If it's enabled, then
the domain receives a copy of the HVM_max VMX policy with allowed CR4
bits adjusted by the CPUID policy.

Signed-off-by: Sergey Dyasli 
Reviewed-by: Andrew Cooper 
---
v5 --> v6:
- Updated recalculate_msr_policy() comment and commit message
- Added Reviewed-by
---
 xen/arch/x86/domctl.c |  1 +
 xen/arch/x86/msr.c| 35 +++
 xen/include/asm-x86/msr.h |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..5bde1a22b7 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -125,6 +125,7 @@ static int update_domain_cpuid_info(struct domain *d,
 }
 
 recalculate_cpuid_policy(d);
+recalculate_msr_policy(d);
 
 switch ( ctl->input[0] )
 {
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 01a5b52f95..26d987098b 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DEFINE_PER_CPU(uint32_t, tsc_aux);
 
@@ -283,6 +284,39 @@ void __init init_guest_msr_policy(void)
 calculate_pv_max_policy();
 }
 
+static void vmx_copy_policy(const struct msr_domain_policy *src,
+  struct msr_domain_policy *dst)
+{
+memcpy(dst->vmx.raw, src->vmx.raw, sizeof(dst->vmx.raw));
+dst->vmx_procbased_ctls2.raw = src->vmx_procbased_ctls2.raw;
+dst->vmx_ept_vpid_cap.raw = src->vmx_ept_vpid_cap.raw;
+memcpy(dst->vmx_true_ctls.raw, src->vmx_true_ctls.raw,
+   sizeof(dst->vmx_true_ctls.raw));
+dst->vmx_vmfunc.raw = src->vmx_vmfunc.raw;
+}
+
+static void recalculate_vmx_msr_policy(struct domain *d)
+{
+struct msr_domain_policy *dp = d->arch.msr;
+
+if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
+{
+vmx_clear_policy(dp);
+
+return;
+}
+
+vmx_copy_policy(_max_msr_domain_policy, dp);
+
+/* Get allowed CR4 bits from CPUID policy */
+dp->vmx.cr4_fixed1.allowed_1.raw = hvm_cr4_guest_valid_bits(d, false);
+}
+
+void recalculate_msr_policy(struct domain *d)
+{
+recalculate_vmx_msr_policy(d);
+}
+
 int init_domain_msr_policy(struct domain *d)
 {
 struct msr_domain_policy *dp;
@@ -303,6 +337,7 @@ int init_domain_msr_policy(struct domain *d)
 }
 
 d->arch.msr = dp;
+recalculate_msr_policy(d);
 
 return 0;
 }
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 5fdf82860e..41433fea94 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -641,6 +641,9 @@ int init_vcpu_msr_policy(struct vcpu *v);
 int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val);
 int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val);
 
+/* Updates availability of MSRs based on CPUID policy */
+void recalculate_msr_policy(struct domain *d);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_MSR_H */
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v6 5/5] x86/msr: handle VMX MSRs with guest_rd/wrmsr()

2018-03-22 Thread Sergey Dyasli
Now that each domain has a correct view of VMX MSRs in its per-domain
MSR policy, it's possible to handle a guest's RD/WRMSR accesses with
the new handlers. Do so and remove the old nvmx_msr_read_intercept()
and associated bits.

There is no functional change to what a guest sees in its VMX MSRs.

Signed-off-by: Sergey Dyasli 
Reviewed-by: Andrew Cooper 
---
v5 --> v6:
- Moved VMX MSRs case to the read-only block in guest_wrmsr()
- Added Reviewed-by
---
 xen/arch/x86/hvm/vmx/vmx.c |   6 --
 xen/arch/x86/hvm/vmx/vvmx.c| 178 -
 xen/arch/x86/msr.c |  32 +++
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 4 files changed, 32 insertions(+), 186 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 847c314a08..ba5b78a9c2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2869,10 +2869,6 @@ static int vmx_msr_read_intercept(unsigned int msr, 
uint64_t *msr_content)
 if ( nestedhvm_enabled(curr->domain) )
 *msr_content |= IA32_FEATURE_CONTROL_ENABLE_VMXON_OUTSIDE_SMX;
 break;
-case MSR_IA32_VMX_BASIC...MSR_IA32_VMX_VMFUNC:
-if ( !nvmx_msr_read_intercept(msr, msr_content) )
-goto gp_fault;
-break;
 case MSR_IA32_MISC_ENABLE:
 rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
 /* Debug Trace Store is not supported. */
@@ -3126,8 +3122,6 @@ static int vmx_msr_write_intercept(unsigned int msr, 
uint64_t msr_content)
 break;
 }
 case MSR_IA32_FEATURE_CONTROL:
-case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
-/* None of these MSRs are writeable. */
 goto gp_fault;
 
 case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 43f7297c04..5a1d9c8fc5 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1980,184 +1980,6 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 return X86EMUL_OKAY;
 }
 
-#define __emul_value(enable1, default1) \
-((enable1 | default1) << 32 | (default1))
-
-#define gen_vmx_msr(enable1, default1, host_value) \
-(((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
-((uint32_t)(__emul_value(enable1, default1) | host_value)))
-
-/*
- * Capability reporting
- */
-int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
-{
-struct vcpu *v = current;
-struct domain *d = v->domain;
-u64 data = 0, host_data = 0;
-int r = 1;
-
-/* VMX capablity MSRs are available only when guest supports VMX. */
-if ( !nestedhvm_enabled(d) || !d->arch.cpuid->basic.vmx )
-return 0;
-
-/*
- * These MSRs are only available when flags in other MSRs are set.
- * These prerequisites are listed in the Intel 64 and IA-32
- * Architectures Software Developer’s Manual, Vol 3, Appendix A.
- */
-switch ( msr )
-{
-case MSR_IA32_VMX_PROCBASED_CTLS2:
-if ( !cpu_has_vmx_secondary_exec_control )
-return 0;
-break;
-
-case MSR_IA32_VMX_EPT_VPID_CAP:
-if ( !(cpu_has_vmx_ept || cpu_has_vmx_vpid) )
-return 0;
-break;
-
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_EXIT_CTLS:
-case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
-if ( !(vmx_basic_msr & VMX_BASIC_DEFAULT1_ZERO) )
-return 0;
-break;
-
-case MSR_IA32_VMX_VMFUNC:
-if ( !cpu_has_vmx_vmfunc )
-return 0;
-break;
-}
-
-rdmsrl(msr, host_data);
-
-/*
- * Remove unsupport features from n1 guest capability MSR
- */
-switch (msr) {
-case MSR_IA32_VMX_BASIC:
-{
-const struct vmcs_struct *vmcs =
-map_domain_page(_mfn(PFN_DOWN(v->arch.hvm_vmx.vmcs_pa)));
-
-data = (host_data & (~0ul << 32)) |
-   (vmcs->vmcs_revision_id & 0x7fff);
-unmap_domain_page(vmcs);
-break;
-}
-case MSR_IA32_VMX_PINBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-/* 1-settings */
-data = PIN_BASED_EXT_INTR_MASK |
-   PIN_BASED_NMI_EXITING |
-   PIN_BASED_PREEMPT_TIMER;
-data = gen_vmx_msr(data, VMX_PINBASED_CTLS_DEFAULT1, host_data);
-break;
-case MSR_IA32_VMX_PROCBASED_CTLS:
-case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
-{
-u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
-/* 1-settings */
-data = CPU_BASED_HLT_EXITING |
-   CPU_BASED_VIRTUAL_INTR_PENDING |
-   CPU_BASED_CR8_LOAD_EXITING |
-   CPU_BASED_CR8_STORE_EXITING |
-   CPU_BASED_INVLPG_EXITING |
-   CPU_BASED_CR3_LOAD_EXITING |
-   CPU_BASED_CR3_STORE_EXITING |
-   CPU_BASED_MONITOR_EXITING |
-   CPU_BASED_MWAIT_EXITING |
- 

[Xen-devel] [PATCH v6 0/5] VMX MSRs policy for Nested Virt: part 1

2018-03-22 Thread Sergey Dyasli
The end goal of having a VMX MSRs policy is to be able to manage
L1 VMX features. This patch series is the first part of that work.
There is no functional change to what L1 sees in VMX MSRs at this
point, but each domain will have a policy object which can be sensibly
queried for the VMX features the domain has. This will unblock some
other nested virtualization work items.

Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

The above makes the L1 VMX feature set inconsistent across different
H/W, and there is no way to control which features are available to L1.
The overall set of issues has much in common with CPUID policy.
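
For illustration, the per-MSR derivation that nvmx_msr_read_intercept()
effectively performs today boils down to something like this (a
simplified sketch with made-up helper names, not the actual Xen code):

/*
 * Illustrative only: the L1 view of a VMX control MSR is an
 * intersection of the bits Xen can emulate with the bits the host
 * MSR allows, on top of the architectural default-1 bits.
 */
static uint64_t l1_vmx_ctls(uint32_t xen_supported, uint32_t default1,
                            uint64_t host_msr)
{
    /* allowed-1: only bits both Xen and the hardware support. */
    uint32_t allowed_1 = (xen_supported | default1) & (host_msr >> 32);
    /* allowed-0: the default-1 bits plus whatever the host requires. */
    uint32_t allowed_0 = default1 | (uint32_t)host_msr;

    return ((uint64_t)allowed_1 << 32) | allowed_0;
}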

Part 1 adds VMX MSRs into struct msr_domain_policy and initializes them
during domain creation based on CPUID policy. In the future it should be
possible to independently configure values of VMX MSRs for each domain.

v5 --> v6:
- Various shortenings of control bit names
- Added Reviewed-by: Andrew Cooper to pathes 3,4 and 5
- Other changes are provided on per-patch basis

Sergey Dyasli (5):
  x86/msr: add VMX MSRs definitions and populate Raw domain policy
  x86/msr: add VMX MSRs into HVM_max domain policy
  x86/cpuid: update signature of hvm_cr4_guest_valid_bits()
  x86/msr: update domain policy on CPUID policy changes
  x86/msr: handle VMX MSRs with guest_rd/wrmsr()

 xen/arch/x86/domctl.c  |   1 +
 xen/arch/x86/hvm/domain.c  |   3 +-
 xen/arch/x86/hvm/hvm.c |   7 +-
 xen/arch/x86/hvm/svm/svmdebug.c|   4 +-
 xen/arch/x86/hvm/vmx/vmx.c |  10 +-
 xen/arch/x86/hvm/vmx/vvmx.c| 178 
 xen/arch/x86/msr.c | 320 +++
 xen/include/asm-x86/hvm/hvm.h  |   2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |   2 -
 xen/include/asm-x86/msr.h  | 333 +
 xen/include/asm-x86/x86-defns.h|  54 ++
 11 files changed, 718 insertions(+), 196 deletions(-)

-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v18 08/11] tools/libxenforeignmemory: add support for resource mapping

2018-03-22 Thread Paul Durrant
A previous patch introduced a new HYPERVISOR_memory_op to acquire guest
resources for direct priv-mapping.

This patch adds new functionality into libxenforeignmemory to make use
of a new privcmd ioctl [1] that uses the new memory op to make such
resources available via mmap(2).

[1] 
http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce59a05e6712
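
For illustration, a caller of the new API could look roughly like this
(a hypothetical sketch against the interface added below; the resource
type and id values are placeholders supplied by the caller):

#include <sys/mman.h>
#include <xenforeignmemory.h>

/* Hypothetical example: map two frames of a guest resource, use them,
 * then unmap. Error handling is minimal for brevity. */
static int map_resource_example(domid_t domid, unsigned int type,
                                unsigned int id)
{
    xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
    xenforeignmemory_resource_handle *fres;
    void *addr = NULL; /* let the kernel pick the mapping address */
    int rc = -1;

    if ( !fmem )
        return -1;

    fres = xenforeignmemory_map_resource(fmem, domid, type, id,
                                         0 /* frame */, 2 /* nr_frames */,
                                         &addr, PROT_READ | PROT_WRITE, 0);
    if ( fres )
    {
        /* ... access the resource through addr ... */
        rc = xenforeignmemory_unmap_resource(fmem, fres);
    }

    xenforeignmemory_close(fmem);
    return rc;
}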

Signed-off-by: Paul Durrant 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Wei Liu 
---
Cc: Ian Jackson 

v4:
 - Fixed errno and removed single-use label
 - The unmap call now returns a status
 - Use C99 initialization for ioctl struct

v2:
 - Bump minor version up to 3.
---
 tools/include/xen-sys/Linux/privcmd.h  | 11 +
 tools/libs/foreignmemory/Makefile  |  2 +-
 tools/libs/foreignmemory/core.c| 53 ++
 .../libs/foreignmemory/include/xenforeignmemory.h  | 41 +
 tools/libs/foreignmemory/libxenforeignmemory.map   |  5 ++
 tools/libs/foreignmemory/linux.c   | 45 ++
 tools/libs/foreignmemory/private.h | 31 +
 7 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index 732ff7c15a..9531b728f9 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -86,6 +86,15 @@ typedef struct privcmd_dm_op {
const privcmd_dm_op_buf_t __user *ubufs;
 } privcmd_dm_op_t;
 
+typedef struct privcmd_mmap_resource {
+   domid_t dom;
+   __u32 type;
+   __u32 id;
+   __u32 idx;
+   __u64 num;
+   __u64 addr;
+} privcmd_mmap_resource_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: _hypercall_t
@@ -103,5 +112,7 @@ typedef struct privcmd_dm_op {
_IOC(_IOC_NONE, 'P', 5, sizeof(privcmd_dm_op_t))
 #define IOCTL_PRIVCMD_RESTRICT \
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
+#define IOCTL_PRIVCMD_MMAP_RESOURCE\
+   _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
 
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
diff --git a/tools/libs/foreignmemory/Makefile 
b/tools/libs/foreignmemory/Makefile
index cbe815fce8..ee5c3fd67e 100644
--- a/tools/libs/foreignmemory/Makefile
+++ b/tools/libs/foreignmemory/Makefile
@@ -2,7 +2,7 @@ XEN_ROOT = $(CURDIR)/../../..
 include $(XEN_ROOT)/tools/Rules.mk
 
 MAJOR= 1
-MINOR= 2
+MINOR= 3
 SHLIB_LDFLAGS += -Wl,--version-script=libxenforeignmemory.map
 
 CFLAGS   += -Werror -Wmissing-prototypes
diff --git a/tools/libs/foreignmemory/core.c b/tools/libs/foreignmemory/core.c
index 7c8562ae74..63f12e2450 100644
--- a/tools/libs/foreignmemory/core.c
+++ b/tools/libs/foreignmemory/core.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 
+#include 
+
 #include "private.h"
 
 static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) {
@@ -135,6 +137,57 @@ int xenforeignmemory_restrict(xenforeignmemory_handle 
*fmem,
 return osdep_xenforeignmemory_restrict(fmem, domid);
 }
 
+xenforeignmemory_resource_handle *xenforeignmemory_map_resource(
+xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+unsigned int id, unsigned long frame, unsigned long nr_frames,
+void **paddr, int prot, int flags)
+{
+xenforeignmemory_resource_handle *fres;
+int rc;
+
+/* Check flags only contains POSIX defined values */
+if ( flags & ~(MAP_SHARED | MAP_PRIVATE) )
+{
+errno = EINVAL;
+return NULL;
+}
+
+fres = calloc(1, sizeof(*fres));
+if ( !fres )
+{
+errno = ENOMEM;
+return NULL;
+}
+
+fres->domid = domid;
+fres->type = type;
+fres->id = id;
+fres->frame = frame;
+fres->nr_frames = nr_frames;
+fres->addr = *paddr;
+fres->prot = prot;
+fres->flags = flags;
+
+rc = osdep_xenforeignmemory_map_resource(fmem, fres);
+if ( rc )
+{
+free(fres);
+fres = NULL;
+} else
+*paddr = fres->addr;
+
+return fres;
+}
+
+int xenforeignmemory_unmap_resource(
+xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres)
+{
+int rc = osdep_xenforeignmemory_unmap_resource(fmem, fres);
+
+free(fres);
+return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/include/xenforeignmemory.h 
b/tools/libs/foreignmemory/include/xenforeignmemory.h
index f4814c390f..d594be8df0 100644
--- a/tools/libs/foreignmemory/include/xenforeignmemory.h
+++ b/tools/libs/foreignmemory/include/xenforeignmemory.h
@@ -138,6 +138,47 @@ int xenforeignmemory_unmap(xenforeignmemory_handle *fmem,
 int xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
   domid_t domid);
 
+typedef struct xenforeignmemory_resource_handle 
xenforeignmemory_resource_handle;
+
+/**
+ * This 

Re: [Xen-devel] [PATCH v4 7/8] x86: also NOP out xen_cr3 restores of XPTI

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:41:11AM -0600, Jan Beulich wrote:
> ... despite quite likely the gain being rather limited.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 5/8] x86/XPTI: reduce .text.entry

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:40:12AM -0600, Jan Beulich wrote:
> This exposes less code pieces and at the same time reduces the range
> covered from slightly above 3 pages to a little below 2 of them.
> 
> The code being moved is unchanged, except for the removal of trailing
> blanks, insertion of blanks between operands, and a pointless q suffix
> from "retq".
> 
> A few more small pieces could be moved, but it seems better to me to
> leave them where they are to not make it overly hard to follow code
> paths.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 6/8] x86: enable interrupts earlier with XPTI disabled

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:40:50AM -0600, Jan Beulich wrote:
> The STI instances were moved (or added in the INT80 case) to meet TLB
> flush requirements. When XPTI is disabled, they can be put back where
> they were (or omitted in the INT80 case).
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] drm/xen-front: Add support for Xen PV display frontend

2018-03-22 Thread Oleksandr Andrushchenko

On 03/22/2018 03:14 AM, Boris Ostrovsky wrote:



On 03/21/2018 10:58 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Add support for Xen para-virtualized frontend display driver.
Accompanying backend [1] is implemented as a user-space application
and its helper library [2], capable of running as a Weston client
or DRM master.
Configuration of both backend and frontend is done via
Xen guest domain configuration options [3].



I won't claim that I really understand what's going on here as far as 
DRM stuff is concerned but I didn't see any obvious issues with Xen bits.


So for that you can tack on my
Reviewed-by: Boris Ostrovsky 


Thank you

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 03:04:16 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 01:31,  wrote:  
>> On Wed, 21 Mar 2018 17:06:28 +
>> Paul Durrant  wrote:
>> [...]  
 Well, this might work actually. Although the overall scenario will
 be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it
 will look:
 
 QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
 MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
 is
 accessing this area -> Xen intercepts this MMIO access
 
 But here's what happens next:
 
 Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
 DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back
 to the offset in emulated MMCONFIG range -> DM calls
 address_space_read/write to trigger MMIO emulation
 
>>>
>>>That would only be true of a dm that cannot handle PCI config ioreqs
>>>directly.  
>> 
>> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
>> QEMU).
>> 
>> It receives these PCI conf ioreqs out of any context. To workaround
>> this, existing code issues I/O to emulated CF8h/CFCh ports in order
>> to allow QEMU to find their target. But we can't use the same method
>> for MMCONFIG accesses -- this works for basic PCI conf space only.  
>
>I think you want to view this the other way around: No physical
>device would ever get to see MMCFG accesses (or CF8/CFC port
>ones). This same layering is what we should have in the
>virtualized case.

We have a purely virtual layout of the PCI bus along with a virtual,
emulated MMCONFIG that is completely unrelated to the host's -- so what
is exposed? The emulated MMCONFIG is simply a supplement to the virtual
PCI bus, and its layout corresponds to the virtual PCI bus the
guest/QEMU sees.

It's QEMU which controls the chipset-specific PCIEXBAR emulation and
knows the MMCONFIG position and size. QEMU informs Xen where it is in
order to receive events about R/W accesses to this emulated area -- so
why should it receive these events in the form of PCI conf BDF/reg and
not simply as an MMCONFIG offset directly, if it is basically the same
thing?

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 04/12] pci: split code to size BARs from pci_add_device

2018-03-22 Thread Jan Beulich
>>> On 20.03.18 at 16:15,  wrote:
> @@ -672,11 +722,16 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  unsigned int i;
>  
>  BUILD_BUG_ON(ARRAY_SIZE(pdev->vf_rlen) != PCI_SRIOV_NUM_BARS);
> -for ( i = 0; i < PCI_SRIOV_NUM_BARS; ++i )
> +for ( i = 0; i < PCI_SRIOV_NUM_BARS; )
>  {
>  unsigned int idx = pos + PCI_SRIOV_BAR + i * 4;
>  u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
> -u32 hi = 0;
> +pci_sbdf_t sbdf = {
> +.seg = seg,
> +.bus = bus,
> +.dev = slot,
> +.func = func,
> +};

So I've had everything up to patch 9 applied and ready for pushing,
when I did my usual secondary compile test on an old system: This
fails to compile with gcc 4.3 (due to there being an unnamed sub-
structure). A similar issue exists at least in patch 7. Since the
structure gets introduced in patch 1 (and hence may need changing
there, depending on how this is to be addressed), I'm not going to
push any part of this series.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/5] x86/msr: add VMX MSRs into HVM_max domain policy

2018-03-22 Thread Sergey Dyasli
On Wed, 2018-03-21 at 20:46 +, Andrew Cooper wrote:
> On 28/02/2018 16:09, Sergey Dyasli wrote:
> > +
> > +dp->vmx.pinbased_ctls.allowed_0.raw = VMX_PINBASED_CTLS_DEFAULT1;
> > +dp->vmx.pinbased_ctls.allowed_1.raw = VMX_PINBASED_CTLS_DEFAULT1;
> > +supported = PIN_BASED_EXT_INTR_MASK |
> > +PIN_BASED_NMI_EXITING   |
> > +PIN_BASED_PREEMPT_TIMER;
> 
> Please have a single set of brackets around the entire or statement, so
> editors will indent new changes correctly.

Which editors? My editor is doing it fine. Anyway, is this what you are
asking for?

supported = (PIN_BASED_EXT_INTR_MASK |
 PIN_BASED_NMI_EXITING   |
 PIN_BASED_PREEMPT_TIMER);

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda Items

2018-03-22 Thread Paul Durrant
De-htmling...

-
From: Lars Kurth 
Sent: 22 March 2018 10:22
To: xen-de...@lists.xensource.com
Cc: committ...@xenproject.org; Juergen Gross ; Janakarajan 
Natarajan ; Tamas K Lengyel ; Wei Liu 
; Andrew Cooper ; Daniel Kiper 
; Roger Pau Monné ; Christopher 
Clark ; Rich Persaud ; Paul 
Durrant ; Jan Beulich' ; Brian 
Woods ; intel-...@intel.com
Subject: X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda 
Items

Hi all,
please find attached 
a) Meeting details (just a link with timezones) – the meeting invite will 
follow when we have an agenda
   Bridge details – will be sent with the meeting invite
   I am thinking of using GotoMeeting, but want to try this with a Linux only 
user before I commit
c) Call for agenda items
A few suggestions were made, such as XPTI status (if applicable), PVH status
Also we have some left-overs from the last call: see 
https://lists.xenproject.org/archives/html/xen-devel/2018-03/threads.html#01571 
 
Regards
Lars
== Meeting Details ==
Wed April 11, 15:00 - 16:00 UTC
International meeting times: 
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018=4=11=14=0=0=224=24=179=136=37=33
 
== Agenda Proposal ==
We start with a round the table call as to who is on the call (name and company)
=== A) Coordination and Planning ===
Coordinating who does what, what needs attention, what is blocked, etc. 
A1) Short-term
Any urgent issues related to the 4.11 release that need discussing 
A2) Long-term, Larger series
Please call out any x86 related series, that need attention in the longer term. 
Provide
* Title of series
* Link to series (e.g. on https://lists.xenproject.org/archives/html/xen-devel, 
markmail, …)
* Describe any: Dependencies, Issues, etc. that are relevant
=== B) Design, architecture, feature eupdates related discussions ===
Please highlight any design/architecture discussions that you would like to 
cover. Please describe
* Design, point to any mail discussions
* Describe clearly what you are blocked on: highlight any issues
=== C) Demos, Sharing of Experiences, Sometimes discussion of specific 
issues/bugs/problems/... ===
Please highlight any of the above that you would like to cover. Please describe
* What the issue/experience/demo is that you would like to cover
=== D) AOB ===
-

I think we need to discuss PCI emulation and our future direction. Our current 
hybrid with QEMU is becoming increasingly problematic.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11 04/12] pci: split code to size BARs from pci_add_device

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 11:31,  wrote:
> On Thu, Mar 22, 2018 at 04:15:06AM -0600, Jan Beulich wrote:
>> >>> On 20.03.18 at 16:15,  wrote:
>> > @@ -672,11 +722,16 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> >  unsigned int i;
>> >  
>> >  BUILD_BUG_ON(ARRAY_SIZE(pdev->vf_rlen) != PCI_SRIOV_NUM_BARS);
>> > -for ( i = 0; i < PCI_SRIOV_NUM_BARS; ++i )
>> > +for ( i = 0; i < PCI_SRIOV_NUM_BARS; )
>> >  {
>> >  unsigned int idx = pos + PCI_SRIOV_BAR + i * 4;
>> >  u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
>> > -u32 hi = 0;
>> > +pci_sbdf_t sbdf = {
>> > +.seg = seg,
>> > +.bus = bus,
>> > +.dev = slot,
>> > +.func = func,
>> > +};
>> 
>> So I've had everything up to patch 9 applied and ready for pushing,
>> when I did my usual secondary compile test on an old system: This
>> fails to compile with gcc 4.3 (due to there being a unnamed sub-
>> structure). A similar issue exists at least in patch 7. Since the
>> structure gets introduced in patch 1 (and hence may need changing
> 
> pci_sbdf_t is already in the source tree, it was introduced by
> 514f58d4468a40b5dd418a5ea1742681930c3f2d back in December.

Oh, I guess it's the test harness instance that I've mistakenly seen
in the grep output here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v6 2/5] x86/msr: add VMX MSRs into HVM_max domain policy

2018-03-22 Thread Sergey Dyasli
Currently, when nested virt is enabled, the set of L1 VMX features
is fixed and calculated by nvmx_msr_read_intercept() as an intersection
between the full set of Xen's supported L1 VMX features, the set of
actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
features that Xen uses.

Add calculate_hvm_max_vmx_policy() which will save the end result of
nvmx_msr_read_intercept() on current H/W into HVM_max domain policy.
There will be no functional change to what L1 sees in VMX MSRs. But the
actual use of HVM_max domain policy will happen later, when VMX MSRs
are handled by guest_rd/wrmsr().

Signed-off-by: Sergey Dyasli 
---
v5 --> v6:
- Replaced !cpu_has_vmx check with !hvm_max_cpuid_policy.basic.vmx
- Added a TODO reminder
- Added brackets around bit or expressions
---
 xen/arch/x86/msr.c | 135 +
 1 file changed, 135 insertions(+)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 87239e151e..01a5b52f95 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -106,6 +106,139 @@ static void __init calculate_host_policy(void)
 dp->plaform_info.cpuid_faulting = cpu_has_cpuid_faulting;
 }
 
+static void vmx_clear_policy(struct msr_domain_policy *dp)
+{
+memset(dp->vmx.raw, 0, sizeof(dp->vmx.raw));
+dp->vmx_procbased_ctls2.raw = 0;
+dp->vmx_ept_vpid_cap.raw = 0;
+memset(dp->vmx_true_ctls.raw, 0, sizeof(dp->vmx_true_ctls.raw));
+dp->vmx_vmfunc.raw = 0;
+}
+
+static void __init calculate_hvm_max_vmx_policy(struct msr_domain_policy *dp)
+{
+const struct msr_domain_policy *hp = _msr_domain_policy;
+uint32_t supported;
+
+if ( !hvm_max_cpuid_policy.basic.vmx )
+return;
+
+vmx_clear_policy(dp);
+
+ /* TODO: actually make vmx features selection sane */
+dp->vmx.basic.raw = hp->vmx.basic.raw;
+
+dp->vmx.pinbased_ctls.allowed_0.raw = VMX_PINBASED_CTLS_DEFAULT1;
+dp->vmx.pinbased_ctls.allowed_1.raw = VMX_PINBASED_CTLS_DEFAULT1;
+supported = (PIN_BASED_EXT_INTR_MASK |
+ PIN_BASED_NMI_EXITING   |
+ PIN_BASED_PREEMPT_TIMER);
+dp->vmx.pinbased_ctls.allowed_1.raw |= supported;
+dp->vmx.pinbased_ctls.allowed_1.raw &= hp->vmx.pinbased_ctls.allowed_1.raw;
+
+dp->vmx.procbased_ctls.allowed_0.raw = VMX_PROCBASED_CTLS_DEFAULT1;
+dp->vmx.procbased_ctls.allowed_1.raw = VMX_PROCBASED_CTLS_DEFAULT1;
+supported = (CPU_BASED_HLT_EXITING  |
+ CPU_BASED_VIRTUAL_INTR_PENDING |
+ CPU_BASED_CR8_LOAD_EXITING |
+ CPU_BASED_CR8_STORE_EXITING|
+ CPU_BASED_INVLPG_EXITING   |
+ CPU_BASED_MONITOR_EXITING  |
+ CPU_BASED_MWAIT_EXITING|
+ CPU_BASED_MOV_DR_EXITING   |
+ CPU_BASED_ACTIVATE_IO_BITMAP   |
+ CPU_BASED_USE_TSC_OFFSETING|
+ CPU_BASED_UNCOND_IO_EXITING|
+ CPU_BASED_RDTSC_EXITING|
+ CPU_BASED_MONITOR_TRAP_FLAG|
+ CPU_BASED_VIRTUAL_NMI_PENDING  |
+ CPU_BASED_ACTIVATE_MSR_BITMAP  |
+ CPU_BASED_PAUSE_EXITING|
+ CPU_BASED_RDPMC_EXITING|
+ CPU_BASED_TPR_SHADOW   |
+ CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
+dp->vmx.procbased_ctls.allowed_1.raw |= supported;
+dp->vmx.procbased_ctls.allowed_1.raw &=
+hp->vmx.procbased_ctls.allowed_1.raw;
+
+dp->vmx.exit_ctls.allowed_0.raw = VMX_EXIT_CTLS_DEFAULT1;
+dp->vmx.exit_ctls.allowed_1.raw = VMX_EXIT_CTLS_DEFAULT1;
+supported = (VM_EXIT_ACK_INTR_ON_EXIT   |
+ VM_EXIT_IA32E_MODE |
+ VM_EXIT_SAVE_PREEMPT_TIMER |
+ VM_EXIT_SAVE_GUEST_PAT |
+ VM_EXIT_LOAD_HOST_PAT  |
+ VM_EXIT_SAVE_GUEST_EFER|
+ VM_EXIT_LOAD_HOST_EFER |
+ VM_EXIT_LOAD_PERF_GLOBAL_CTRL);
+dp->vmx.exit_ctls.allowed_1.raw |= supported;
+dp->vmx.exit_ctls.allowed_1.raw &= hp->vmx.exit_ctls.allowed_1.raw;
+
+dp->vmx.entry_ctls.allowed_0.raw = VMX_ENTRY_CTLS_DEFAULT1;
+dp->vmx.entry_ctls.allowed_1.raw = VMX_ENTRY_CTLS_DEFAULT1;
+supported = (VM_ENTRY_LOAD_GUEST_PAT|
+ VM_ENTRY_LOAD_GUEST_EFER   |
+ VM_ENTRY_LOAD_PERF_GLOBAL_CTRL |
+ VM_ENTRY_IA32E_MODE);
+dp->vmx.entry_ctls.allowed_1.raw |= supported;
+dp->vmx.entry_ctls.allowed_1.raw &= hp->vmx.entry_ctls.allowed_1.raw;
+
+dp->vmx.misc.raw = hp->vmx.misc.raw;
+/* Do not support CR3-target feature now */
+dp->vmx.misc.cr3_target = false;
+
+/* PG, PE bits must be 1 in VMX operation */
+dp->vmx.cr0_fixed0.allowed_0.pe = true;
+dp->vmx.cr0_fixed0.allowed_0.pg = true;
+
+/* allow 0-settings for all bits */
+dp->vmx.cr0_fixed1.allowed_1.raw = 

[Xen-devel] [PATCH v18 06/11] x86/hvm/ioreq: add a new mappable resource type...

2018-03-22 Thread Paul Durrant
... XENMEM_resource_ioreq_server

This patch adds support for a new resource type that can be mapped using
the XENMEM_acquire_resource memory op.

If an emulator makes use of this resource type then, instead of mapping
gfns, the IOREQ server will allocate pages from the emulating domain's
heap. These pages will never be present in the P2M of the guest at any
point (and are not even shared with the guest) and so are not vulnerable to
any direct attack by the guest.

NOTE: Use of the new resource type is not compatible with use of
  XEN_DMOP_get_ioreq_server_info unless the XEN_DMOP_no_gfns flag is
  set.
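
As a rough sketch of how an emulator might consume this (hypothetical
code, not part of the patch; the frame index semantics are defined by
the public header changes below, 0 here is only a placeholder):

/* Map a single ioreq server page straight from Xen, bypassing the
 * guest P2M. */
static void *map_ioreq_page(xenforeignmemory_handle *fmem, domid_t domid,
                            ioservid_t id,
                            xenforeignmemory_resource_handle **fres)
{
    void *addr = NULL;

    *fres = xenforeignmemory_map_resource(fmem, domid,
                                          XENMEM_resource_ioreq_server, id,
                                          0 /* frame */, 1 /* nr_frames */,
                                          &addr, PROT_READ | PROT_WRITE, 0);

    return *fres ? addr : NULL;
}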

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: George Dunlap 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: Ian Jackson 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Julien Grall 

v18:
 - Revert largely back to v14, but use ioreq server emulator rather
   than current->domain.
 - Add missing checks spotted by Jan.
 - Re-base.

v17:
 - The use of xenheap pages means that freeing needs to be deferred until
   domain destruction. Add an explanatory paragraph to the commit comment.

v15:
 - Use xenheap pages rather than domheap pages and assign ownership to
   target domain.

v14:
 - Addressed more comments from Jan.

v13:
 - Introduce an arch_acquire_resource() as suggested by Julien (and have
   the ARM varient simply return -EOPNOTSUPP).
 - Check for ioreq server id truncation as requested by Jan.
 - Not added Jan's R-b due to substantive change from v12.

v12:
 - Addressed more comments from Jan.
 - Dropped George's A-b and Wei's R-b because of material change.

v11:
 - Addressed more comments from Jan.

v10:
 - Addressed comments from Jan.

v8:
 - Re-base on new boilerplate.
 - Adjust function signature of hvm_get_ioreq_server_frame(), and test
   whether the bufioreq page is present.

v5:
 - Use get_ioreq_server() function rather than indexing array directly.
 - Add more explanation into comments to state than mapping guest frames
   and allocation of pages for ioreq servers are not simultaneously
   permitted.
 - Add a comment into asm/ioreq.h stating the meaning of the index
   value passed to hvm_get_ioreq_server_frame().
---
 xen/arch/x86/hvm/ioreq.c| 164 
 xen/arch/x86/mm.c   |  47 
 xen/common/memory.c |   3 +-
 xen/include/asm-arm/mm.h|   8 ++
 xen/include/asm-x86/hvm/ioreq.h |   2 +
 xen/include/asm-x86/mm.h|   5 ++
 xen/include/public/hvm/dm_op.h  |   4 +
 xen/include/public/memory.h |   9 +++
 8 files changed, 241 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index ce53d883e9..ca02e6da10 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -259,6 +259,19 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
 struct hvm_ioreq_page *iorp = buf ? >bufioreq : >ioreq;
 int rc;
 
+if ( iorp->page )
+{
+/*
+ * If a page has already been allocated (which will happen on
+ * demand if hvm_get_ioreq_server_frame() is called), then
+ * mapping a guest frame is not permitted.
+ */
+if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+return -EPERM;
+
+return 0;
+}
+
 if ( d->is_dying )
 return -EINVAL;
 
@@ -281,6 +294,67 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
 return rc;
 }
 
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+struct hvm_ioreq_page *iorp = buf ? >bufioreq : >ioreq;
+
+if ( iorp->page )
+{
+/*
+ * If a guest frame has already been mapped (which may happen
+ * on demand if hvm_get_ioreq_server_info() is called), then
+ * allocating a page is not permitted.
+ */
+if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
+return -EPERM;
+
+return 0;
+}
+
+/*
+ * Allocated IOREQ server pages are assigned to the emulating
+ * domain, not the target domain. This is safe because the emulating
+ * domain cannot be destroyed until the ioreq server is destroyed.
+ * Also we must use MEMF_no_refcount otherwise page allocation
+ * could fail if the emulating domain has already reached its
+ * maximum allocation.
+ */
+iorp->page = alloc_domheap_page(s->emulator, MEMF_no_refcount);
+
+if ( !iorp->page )
+return -ENOMEM;
+
+if ( !get_page_type(iorp->page, PGT_writable_page) )
+goto fail;
+
+iorp->va = __map_domain_page_global(iorp->page);
+if ( !iorp->va )
+goto fail;
+
+clear_page(iorp->va);
+return 0;
+
+ fail:
+put_page_and_type(iorp->page);
+iorp->page = NULL;

[Xen-devel] [PATCH v18 04/11] x86/hvm/ioreq: defer mapping gfns until they are actually requested

2018-03-22 Thread Paul Durrant
A subsequent patch will introduce a new scheme to allow an emulator to
map ioreq server pages directly from Xen rather than the guest P2M.

This patch lays the groundwork for that change by deferring mapping of
gfns until their values are requested by an emulator. To that end, the
pad field of the xen_dm_op_get_ioreq_server_info structure is re-purposed
to a flags field, and a new flag, XEN_DMOP_no_gfns, is defined which modifies the
behaviour of XEN_DMOP_get_ioreq_server_info to allow the caller to avoid
requesting the gfn values.
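
For example, an emulator that only needs the buffered ioreq event
channel port can pass NULL gfn pointers; the library then sets
XEN_DMOP_no_gfns on its behalf (a sketch of the calling convention,
based on the libxendevicemodel change below):

static evtchn_port_t get_bufioreq_port(xendevicemodel_handle *dmod,
                                       domid_t domid, ioservid_t id)
{
    evtchn_port_t port = 0;

    /* NULL gfn pointers => XEN_DMOP_no_gfns, so Xen will not map the
     * ioreq pages into the guest P2M on our behalf. */
    if ( xendevicemodel_get_ioreq_server_info(dmod, domid, id,
                                              NULL, NULL, &port) )
        return 0;

    return port;
}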

Signed-off-by: Paul Durrant 
Reviewed-by: Roger Pau Monné 
Acked-by: Wei Liu 
Reviewed-by: Jan Beulich 
---
Cc: Ian Jackson 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Chao Gao 

v18:
 - Trivial re-base.

v17:
 - Fix typo in commit comment.

v16:
 - Leave call to map pages in hvm_ioreq_server_init() for default ioreq
   server instance, as pointed out by Chao (cc-ed). This is small and
   obvious change which reduces the size of the patch, so I have left
   existent R-bs and A-bs in place.

v8:
 - For safety make all of the pointers passed to
   hvm_get_ioreq_server_info() optional.
 - Shrink bufioreq_handling down to a uint8_t.

v3:
 - Updated in response to review comments from Wei and Roger.
 - Added a HANDLE_BUFIOREQ macro to make the code neater.
 - This patch no longer introduces a security vulnerability since there
   is now an explicit limit on the number of ioreq servers that may be
   created for any one domain.
---
 tools/libs/devicemodel/core.c   |  8 
 tools/libs/devicemodel/include/xendevicemodel.h |  6 +--
 xen/arch/x86/hvm/dm.c   |  9 +++--
 xen/arch/x86/hvm/ioreq.c| 49 -
 xen/include/asm-x86/hvm/domain.h|  2 +-
 xen/include/public/hvm/dm_op.h  | 32 +---
 6 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 23924e9a38..f76e3d305e 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -204,6 +204,14 @@ int xendevicemodel_get_ioreq_server_info(
 
 data->id = id;
 
+/*
+ * If the caller is not requesting gfn values then instruct the
+ * hypercall not to retrieve them as this may cause them to be
+ * mapped.
+ */
+if (!ioreq_gfn && !bufioreq_gfn)
+data->flags |= XEN_DMOP_no_gfns;
+
 rc = xendevicemodel_op(dmod, domid, 1, , sizeof(op));
 if (rc)
 return rc;
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h 
b/tools/libs/devicemodel/include/xendevicemodel.h
index 7629c35df7..08cb0d4374 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -61,11 +61,11 @@ int xendevicemodel_create_ioreq_server(
  * @parm domid the domain id to be serviced
  * @parm id the IOREQ Server id.
  * @parm ioreq_gfn pointer to a xen_pfn_t to receive the synchronous ioreq
- *  gfn
+ *  gfn. (May be NULL if not required)
  * @parm bufioreq_gfn pointer to a xen_pfn_t to receive the buffered ioreq
- *gfn
+ *gfn. (May be NULL if not required)
  * @parm bufioreq_port pointer to a evtchn_port_t to receive the buffered
- * ioreq event channel
+ * ioreq event channel. (May be NULL if not required)
  * @return 0 on success, -1 on failure.
  */
 int xendevicemodel_get_ioreq_server_info(
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 96b0d13f2f..ce18754442 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -420,16 +420,19 @@ static int dm_op(const struct dmop_args *op_args)
 {
 struct xen_dm_op_get_ioreq_server_info *data =
 _ioreq_server_info;
+const uint16_t valid_flags = XEN_DMOP_no_gfns;
 
 const_op = false;
 
 rc = -EINVAL;
-if ( data->pad )
+if ( data->flags & ~valid_flags )
 break;
 
 rc = hvm_get_ioreq_server_info(d, data->id,
-   >ioreq_gfn,
-   >bufioreq_gfn,
+   (data->flags & XEN_DMOP_no_gfns) ?
+   NULL : >ioreq_gfn,
+   (data->flags & XEN_DMOP_no_gfns) ?
+   NULL : >bufioreq_gfn,
>bufioreq_port);
 break;
 }
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d5f0e24b98..ce53d883e9 100644
--- 

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
> On Thu, 22 Mar 2018 09:57:16 +
> Roger Pau Monné  wrote:
> [...]
> >> Yes, and it is still needed as we have two distinct (and not equal)
> >> interfaces to PCI conf space. Apart from 0..FFh range overlapping
> >> they can be considered very different interfaces. And whether it is
> >> a real system or emulated -- we can use either one of these two
> >> interfaces or both.  
> >
> >The legacy PCI config space accesses and the MCFG config space access
> >are just different methods of accessing the PCI configuration space,
> >but the data _must_ be exactly the same. I don't see how a device
> >would care about where the access to the config space originated.
> 
> If they were different methods of accessing the same thing, they
> could've been used interchangeably. When we've got a PCI conf ioreq
> which has offset>100h we know we cannot just pass it to emulated
> CF8/CFC but have to emulate this specifically.

This is already not the best approach for dispatching PCI config space
accesses in QEMU. I think the interface in QEMU should be:

pci_conf_space_{read/write}(sbdf, register, size, data)

And this would go directly into the device. But I assume implementing
this involves a non-trivial amount of work, hence xen-hvm.c's use of
the I/O port access replay.
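
Something along these lines, purely as an illustration of the shape of
such an interface (these prototypes do not exist in QEMU today):

/* Illustrative prototypes only. */
uint64_t pci_conf_space_read(uint32_t sbdf, unsigned int reg,
                             unsigned int size);
void pci_conf_space_write(uint32_t sbdf, unsigned int reg,
                          unsigned int size, uint64_t data);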

> >OK, so you don't want to reconstruct the access, fine.
> >
> >Then just inject it using pcie_mmcfg_data_{read/write} or some similar
> >wrapper. My suggestion was just to try to use the easier way to get
> >this injected into QEMU.
> 
> QEMU knows its position, the problem it that xen-hvm.c (ioreq
> processor) is rather isolated from MMCONFIG emulation.
> 
> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in QEMU,
> you can see this:
> 
> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
> {
> PCIExpressHost *e = opaque;
> ...
> 
> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
> pcie_mmcfg_data_read/write are actual handlers.
> 
> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
> which is possible but considered a hack by QEMU. We can also insert
> some code to MMCONFIG emulation which will store info we need to some
> global variables to be used across wildly different and unrelated
> modules. It will work, but anyone who see it will have bad thoughts on
> his mind.

Since you need to notify Xen of the MCFG area address, why not just
store the MCFG address while doing that operation? You could do this
with a helper in xen-hvm.c, and keep the variable local to that file.
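
A minimal sketch of such a helper (hypothetical; the names and
placement are only a suggestion):

/* xen-hvm.c: remember where the emulated MCFG area currently lives so
 * that PCI config ioreqs can be related back to MMCONFIG offsets. */
static hwaddr mcfg_base;
static uint64_t mcfg_size;

static void xen_set_mcfg_location(hwaddr base, uint64_t size)
{
    mcfg_base = base;
    mcfg_size = size;
    /* ...and notify Xen of the new location via the proposed dmop... */
}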

In any case, this is a QEMU implementation detail. IMO the IOREQ
interface is clear and should not be bent like this just because
'this is easier to implement in QEMU'.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 21 March 2018 22:50
> To: Roger Pau Monne 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Ian Jackson ; Jan
> Beulich ; Wei Liu ; Paul Durrant
> ; Anthony Perard ;
> Stefano Stabellini 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 17:15:04 +
> Roger Pau Monné  wrote:
> [...]
> >> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> >> conf translation is a redundant step. Why not to allow specifying
> >> for DM whether it prefers to receive MMCONFIG accesses as native
> >> (MMIO ones) or as translated PCI conf ioreqs?
> >
> >You are just adding an extra level of complexity to an interface
> >that's fairly simple. You register a PCI device using
> >XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
> 
> Yes, and it is still needed as we have two distinct (and not equal)
> interfaces to PCI conf space. Apart from 0..FFh range overlapping they
> can be considered very different interfaces. And whether it is a real
> system or emulated -- we can use either one of these two interfaces or
> both.
> 
> For QEMU zero changes are needed to support MMCONFIG MMIO accesses
> if
> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
> expects.
> Anyway, for (kind of vague) users of the multiple ioreq servers
> capability we can enable MMIO translation to PCI conf ioreqs. Note that
> actually this is an extra step, not forwarding trapped MMCONFIG MMIO
> accesses to the selected device model as is.
> 
> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI
> config
> >space access is misleading.
> 
> These are very different accesses, both in transport and capabilities.
> 
> >In both cases Xen would have to do the MCFG access decoding in order
> >to figure out which IOREQ server will handle the request. At which
> >point the only step that you avoid is the reconstruction of the memory
> >access from the IOREQ_TYPE_PCI_CONFIG which is trivial.
> 
> The "reconstruction of the memory access" you mentioned won't be easy
> actually. The thing is, address_space_read/write is not all what we
> need.
> 
> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
> need to be an involved party, mainly to know where MMCONFIG area is
> located so we can construct the address within its range from BDF.
> This piece of information is destroyed in the process of MMIO ioreq
> translation to PCI conf type.
> 
> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
> about the current emulated MMCONFIG state. The correct way to have this
> info is to participate in its emulation. As we don't participate, we
> have no other way than trying to gain backdoor access to PCIHost fields
> via things like object_resolve_*(). This solution is cumbersome and
> ugly but will work... and may break anytime due to changes in QEMU.
> 
> QEMU maintainers will grin while looking at all this I'm afraid --
> trapped MMIO accesses which are translated to PCI conf accesses which
> in turn translated back to emulated MMIO accesses upon receiving, along
> with tedious attempts to gain access to MMCONFIG-related info as we're
> not invited to the MMCONFIG emulation party.
> 
> The more I think about it, the more I like the existing
> map_io_range_to_ioreq_server() approach. :( It works without doing
> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> working as expected. There is a problem to make it compatible with
> the specific multiple ioreq servers feature, but providing a new
> dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> MMIO to give QEMU only the freedom to tell where it is located) allows
> to solve this problem in any possible way, either MMIO -> PCI conf
> translation or anything else.
> 

I don't think we even want QEMU to have the freedom to say where the MMCONFIG 
areas are located, do we? QEMU is not in charge of the guest memory map and it 
is not responsible for building the MCFG table, Xen is. So it should be Xen 
that decides where the MMCONFIG area goes for each registered PCI device and it 
should be Xen that adds that to the MCFG table. It should be Xen that handles 
the MMCONFIG MMIO accesses and these should be forwarded to QEMU as PCI config 
IOREQs.
Now, it may be that we need to introduce a Xen specific mechanism into QEMU to 
then route those config space transactions to the device models but that would 
be an improvement over the current cf8/cfc hackery anyway.

  Paul

> >> We can still route either ioreq
> >> type to multiple device emulators accordingly.
> >
> >It's exactly the same that's 

[Xen-devel] [PATCH v4] hvm/svm: Implement Debug events

2018-03-22 Thread Alexandru Isaila
At this moment, Debug events on the AMD architecture are not
forwarded to the monitor layer.

This patch adds the Debug event to the common capabilities, adds the
VMEXIT_ICEBP intercept and then forwards the event to the monitor layer.

Chapter 2: SVM Processor and Platform Extensions: "Note: A vector 1
exception generated by the single byte INT1
instruction (also known as ICEBP) does not trigger the #DB
intercept. Software should use the dedicated ICEBP
intercept to intercept ICEBP"

Signed-off-by: Alexandru Isaila 

---
Changes since V3:
- Merge disable/enable hooks into set_icebp_interception
- Address style comments.
---
 xen/arch/x86/hvm/svm/emulate.c|  1 +
 xen/arch/x86/hvm/svm/svm.c| 60 +--
 xen/arch/x86/monitor.c|  3 ++
 xen/include/asm-x86/hvm/hvm.h | 25 +++
 xen/include/asm-x86/hvm/svm/emulate.h |  1 +
 xen/include/asm-x86/monitor.h |  4 +--
 6 files changed, 76 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index e1a1581..535674e 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -65,6 +65,7 @@ static const struct {
 } opc_tab[INSTR_MAX_COUNT] = {
 [INSTR_PAUSE]   = { X86EMUL_OPC_F3(0, 0x90) },
 [INSTR_INT3]= { X86EMUL_OPC(   0, 0xcc) },
+[INSTR_ICEBP]   = { X86EMUL_OPC(   0, 0xf1) },
 [INSTR_HLT] = { X86EMUL_OPC(   0, 0xf4) },
 [INSTR_XSETBV]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 2, 1) },
 [INSTR_VMRUN]   = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 0) },
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index c34f5b5..affd8da 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -172,6 +172,24 @@ static void svm_enable_msr_interception(struct domain *d, 
uint32_t msr)
 svm_intercept_msr(v, msr, MSR_INTERCEPT_WRITE);
 }
 
+static void svm_set_icebp_interception(struct domain *d, bool enable)
+{
+struct vcpu *v;
+
+for_each_vcpu ( d, v )
+{
+struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
+uint32_t intercepts = vmcb_get_general2_intercepts(vmcb);
+
+if ( enable )
+intercepts |= GENERAL2_INTERCEPT_ICEBP;
+else
+intercepts &= ~GENERAL2_INTERCEPT_ICEBP;
+
+vmcb_set_general2_intercepts(vmcb, intercepts);
+}
+}
+
 static void svm_save_dr(struct vcpu *v)
 {
 struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
@@ -1109,7 +1127,8 @@ static void noreturn svm_do_resume(struct vcpu *v)
 {
 struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
 bool debug_state = (v->domain->debugger_attached ||
-v->domain->arch.monitor.software_breakpoint_enabled);
+v->domain->arch.monitor.software_breakpoint_enabled ||
+v->domain->arch.monitor.debug_exception_enabled);
 bool_t vcpu_guestmode = 0;
 struct vlapic *vlapic = vcpu_vlapic(v);
 
@@ -2438,19 +2457,6 @@ static bool svm_get_pending_event(struct vcpu *v, struct 
x86_event *info)
 return true;
 }
 
-static void svm_propagate_intr(struct vcpu *v, unsigned long insn_len)
-{
-struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
-struct x86_event event = {
-.vector = vmcb->eventinj.fields.type,
-.type = vmcb->eventinj.fields.type,
-.error_code = vmcb->exitinfo1,
-};
-
-event.insn_len = insn_len;
-hvm_inject_event();
-}
-
 static struct hvm_function_table __initdata svm_function_table = {
 .name = "SVM",
 .cpu_up_prepare   = svm_cpu_up_prepare,
@@ -2490,6 +2496,7 @@ static struct hvm_function_table __initdata 
svm_function_table = {
 .msr_read_intercept   = svm_msr_read_intercept,
 .msr_write_intercept  = svm_msr_write_intercept,
 .enable_msr_interception = svm_enable_msr_interception,
+.set_icebp_interception = svm_set_icebp_interception,
 .set_rdtsc_exiting= svm_set_rdtsc_exiting,
 .set_descriptor_access_exiting = svm_set_descriptor_access_exiting,
 .get_insn_bytes   = svm_get_insn_bytes,
@@ -2656,9 +2663,28 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 HVMTRACE_0D(SMI);
 break;
 
+case VMEXIT_ICEBP:
 case VMEXIT_EXCEPTION_DB:
 if ( !v->domain->debugger_attached )
-hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+{
+int rc;
+unsigned int trap_type = exit_reason == VMEXIT_ICEBP ?
+X86_EVENTTYPE_PRI_SW_EXCEPTION : X86_EVENTTYPE_HW_EXCEPTION;
+
+inst_len = 0;
+
+if ( trap_type >= X86_EVENTTYPE_SW_INTERRUPT )
+inst_len = __get_instruction_length(v, INSTR_ICEBP);
+
+rc = hvm_monitor_debug(regs->rip,
+   HVM_MONITOR_DEBUG_EXCEPTION,
+   trap_type, inst_len);
+if ( rc < 0 )
+

[Xen-devel] [PATCH v3a 00/39] (0/3) Fixups for the new VGIC(-v2) implementation

2018-03-22 Thread Andre Przywara
Hi,

this is just an update of the three patches which didn't get any review
tags so far.
The fixes for the new versions of 03/39 and 39/39 are pretty
straightforward, but 14/39 is more of a beast. I sent a diff to the original
patch [1] separately to give an idea of the changes.

I added the R-b: and A-b: tags, along with the nit fixes, to my tree
and will later push a branch with those tags and fixes in a somewhat
final version.
Look out for the vgic-new/v3a branch appearing at
http://www.linux-arm.org/git?p=xen-ap.git

Cheers,
Andre

[1] https://lists.xen.org/archives/html/xen-devel/2018-03/msg02680.html

---
Changelog v3 ... v3a: (copied from the patches' changelog)
03/39:
- always set/clear _IRQ_INPROGRESS bit (not only for guest IRQs)
- add comments
14/39:
- take hardware IRQ lock in vgic_v2_fold_lr_state()
- fix last remaining u32 usage
- print message when using new VGIC
- add TODO about racy _IRQ_INPROGRESS setting
39/39:
- print panic when trying to run on GICv3 hardware

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 10:06:09 +
Paul Durrant  wrote:

>> -Original Message-
>> From: Alexey G [mailto:x19...@gmail.com]
>> Sent: 22 March 2018 09:55
>> To: Jan Beulich 
>> Cc: Andrew Cooper ; Anthony Perard
>> ; Ian Jackson ;
>> Paul Durrant ; Roger Pau Monne
>> ; Wei Liu ; Stefano
>> Stabellini ; xen-devel@lists.xenproject.org
>> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate
>> MMCONFIG area in the MMIO hole + minor code refactoring
>> 
>> On Thu, 22 Mar 2018 03:04:16 -0600
>> "Jan Beulich"  wrote:
>>   
>>  On 22.03.18 at 01:31,  wrote:  
>> >> On Wed, 21 Mar 2018 17:06:28 +
>> >> Paul Durrant  wrote:
>> >> [...]  
>>  Well, this might work actually. Although the overall scenario
>>  will be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is
>>  how it will look:
>> 
>>  QEMU receives PCIEXBAR update -> calls the new dmop to tell
>>  Xen  
>> new  
>>  MMCONFIG address/size -> Xen (re)maps MMIO trapping area ->  
>> someone  
>>  is
>>  accessing this area -> Xen intercepts this MMIO access
>> 
>>  But here's what happens next:
>> 
>>  Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>>  DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info
>>  back to the offset in emulated MMCONFIG range -> DM calls
>>  address_space_read/write to trigger MMIO emulation
>>   
>> >>>
>> >>>That would only be true of a dm that cannot handle PCI config
>> >>>ioreqs directly.  
>> >>
>> >> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
>> >> QEMU).
>> >>
>> >> It receives these PCI conf ioreqs out of any context. To
>> >> workaround this, existing code issues I/O to emulated CF8h/CFCh
>> >> ports in order to allow QEMU to find their target. But we can't
>> >> use the same method for MMCONFIG accesses -- this works for basic
>> >> PCI conf space only.  
>> >
>> >I think you want to view this the other way around: No physical
>> >device would ever get to see MMCFG accesses (or CF8/CFC port
>> >ones). This same layering is what we should have in the
>> >virtualized case.  
>> 
>> We have purely virtual layout of the PCI bus along with virtual,
>> emulated and completely unrelated to host's MMCONFIG -- so what's
>> exposed? This emulated MMCONFIG simply a supplement to virtual PCI
>> bus and its layout correspond to the virtual PCI bus guest/QEMU see.
>> 
>> It's QEMU who controls chipset-specific PCIEXBAR emulation and knows
>> about MMCONFIG position and size.  
>
>...and I think that it the wrong solution for Xen. We only use QEMU as
>an emulator for peripheral devices; we should not be using it for this
>kind of emulation... that should be brought into the hypervisor.
>
>> QEMU informs Xen about where it is,  
>
>No. Xen should not care where QEMU wants to put it because the MMIO
>emulations should not even read QEMU.

QEMU does a lot of MMIO emulation, so what's so special about the
emulated MMCONFIG? It has absolutely nothing to do with the host's
MMCONFIG, neither in address/size nor in internal layout. None of the
host MMCONFIG-related facilities are touched in any way. It is a purely
virtual thing.

I really don't understand why some people have that fear of emulated
MMCONFIG -- it's really the same thing as any other MMIO range QEMU
already emulates via map_io_range_to_ioreq_server(). No sensitive
information is exposed. It relates only to the emulated PCI conf space,
which QEMU already knows about and uses, providing emulated PCI devices
for it.

>   Paul
>
>> in order to receive events about R/W accesses to this emulated area
>> -- so, why he should receive these events in a form of PCI conf
>> BDF/reg and not simply as MMCONFIG offset directly if it is
>> basically the same thing?  


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system

2018-03-22 Thread Andre Przywara
Now that we have both the old VGIC prepared to cope with a sibling and
the code for the new VGIC in place, let's add a Kconfig option to enable
the new code and wire it into the Xen build system.
This adds a compile-time option to use either the "old" or the "new"
VGIC.
At the moment this is restricted to a vGIC-v2. To keep the build system
happy, we provide a temporary dummy implementation of
vgic_v3_setup_hw() to allow building for now.

Signed-off-by: Andre Przywara 
---
Changelog v3 ... v3a:
- print panic when trying to run on GICv3 hardware

Changelog v2 ... v3:
- fix indentation of Kconfig entry
- select NEEDS_LIST_SORT
- drop unconditional list_sort.o inclusion

Changelog v1 ... v2:
- add Kconfig help text
- use separate Makefile in vgic/ directory
- protect compilation without GICV3 support
- always include list_sort() in build

 xen/arch/arm/Kconfig   | 18 +-
 xen/arch/arm/Makefile  |  5 -
 xen/arch/arm/vgic/Makefile |  5 +
 xen/arch/arm/vgic/vgic.c   | 11 +++
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/vgic/Makefile

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2782ee6589..8174c0c635 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -48,7 +48,23 @@ config HAS_GICV3
 config HAS_ITS
 bool
 prompt "GICv3 ITS MSI controller support" if EXPERT = "y"
-depends on HAS_GICV3
+depends on HAS_GICV3 && !NEW_VGIC
+
+config NEW_VGIC
+   bool
+   prompt "Use new VGIC implementation"
+   select NEEDS_LIST_SORT
+   ---help---
+
+   This is an alternative implementation of the ARM GIC interrupt
+   controller emulation, based on the Linux/KVM VGIC. It has a better
+   design and fixes many shortcomings of the existing GIC emulation in
+   Xen. It will eventually replace the existing/old VGIC.
+   However at the moment it lacks support for Dom0 using the ITS for
+   using MSIs.
+   Say Y if you want to help testing this new code or if you experience
+   problems with the standard emulation.
+   At the moment this implementation is not security supported.
 
 config SBSA_VUART_CONSOLE
bool "Emulated SBSA UART console support"
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 41d7366527..a9533b107e 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -16,7 +16,6 @@ obj-y += domain_build.o
 obj-y += domctl.o
 obj-$(EARLY_PRINTK) += early_printk.o
 obj-y += gic.o
-obj-y += gic-vgic.o
 obj-y += gic-v2.o
 obj-$(CONFIG_HAS_GICV3) += gic-v3.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
@@ -47,10 +46,14 @@ obj-y += sysctl.o
 obj-y += time.o
 obj-y += traps.o
 obj-y += vcpreg.o
+subdir-$(CONFIG_NEW_VGIC) += vgic
+ifneq ($(CONFIG_NEW_VGIC),y)
+obj-y += gic-vgic.o
 obj-y += vgic.o
 obj-y += vgic-v2.o
 obj-$(CONFIG_HAS_GICV3) += vgic-v3.o
 obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o
+endif
 obj-y += vm_event.o
 obj-y += vtimer.o
 obj-$(CONFIG_SBSA_VUART_CONSOLE) += vpl011.o
diff --git a/xen/arch/arm/vgic/Makefile b/xen/arch/arm/vgic/Makefile
new file mode 100644
index 00..806826948e
--- /dev/null
+++ b/xen/arch/arm/vgic/Makefile
@@ -0,0 +1,5 @@
+obj-y += vgic.o
+obj-y += vgic-v2.o
+obj-y += vgic-mmio.o
+obj-y += vgic-mmio-v2.o
+obj-y += vgic-init.o
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index f9a5088285..ac18cab6f3 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -981,6 +981,17 @@ unsigned int vgic_max_vcpus(const struct domain *d)
 return min_t(unsigned int, MAX_VIRT_CPUS, vgic_vcpu_limit);
 }
 
+#ifdef CONFIG_HAS_GICV3
+/* Dummy implementation to allow building without actual vGICv3 support. */
+void vgic_v3_setup_hw(paddr_t dbase,
+  unsigned int nr_rdist_regions,
+  const struct rdist_region *regions,
+  unsigned int intid_bits)
+{
+panic("New VGIC implementation does not yet support GICv3.");
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ

2018-03-22 Thread Andre Przywara
When playing around with hardware mapped, level triggered virtual IRQs,
there is the need to explicitly set the active or pending state of an
interrupt at some point.
To prepare the GIC for that, we introduce a set_active_state() and a
set_pending_state() function to let the VGIC manipulate the state of
an associated hardware IRQ.
This takes care of properly setting the _IRQ_INPROGRESS bit.

Signed-off-by: Andre Przywara 
---
Changelog v3 ... v3a:
- always set/clear _IRQ_INPROGRESS bit (not only for guest IRQs)
- add comments

Changelog v2 ... v3:
- extend comments to note preliminary nature of vgic_get_lpi()

Changelog v1 ... v2:
- reorder header file inclusion

 xen/arch/arm/gic-v2.c | 41 +
 xen/arch/arm/gic-v3.c | 37 +
 xen/include/asm-arm/gic.h | 24 
 3 files changed, 102 insertions(+)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index aa0fc6c1a1..7374686235 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -243,6 +243,45 @@ static void gicv2_poke_irq(struct irq_desc *irqd, uint32_t 
offset)
 writel_gicd(1U << (irqd->irq % 32), offset + (irqd->irq / 32) * 4);
 }
 
+/*
+ * This is forcing the active state of an interrupt, somewhat circumventing
+ * the normal interrupt flow and the GIC state machine. So use with care
+ * and only if you know what you are doing. For this reason we also have to
+ * tinker with the _IRQ_INPROGRESS bit here, since the normal IRQ handler
+ * will not be involved.
+ */
+static void gicv2_set_active_state(struct irq_desc *irqd, bool active)
+{
+ASSERT(spin_is_locked(&irqd->lock));
+
+if ( active )
+{
+set_bit(_IRQ_INPROGRESS, &irqd->status);
+gicv2_poke_irq(irqd, GICD_ISACTIVER);
+}
+else
+{
+clear_bit(_IRQ_INPROGRESS, &irqd->status);
+gicv2_poke_irq(irqd, GICD_ICACTIVER);
+}
+}
+
+static void gicv2_set_pending_state(struct irq_desc *irqd, bool pending)
+{
+ASSERT(spin_is_locked(&irqd->lock));
+
+if ( pending )
+{
+/* The _IRQ_INPROGRESS bit will be set when the interrupt fires. */
+gicv2_poke_irq(irqd, GICD_ISPENDR);
+}
+else
+{
+/* The _IRQ_INPROGRESS remains unchanged. */
+gicv2_poke_irq(irqd, GICD_ICPENDR);
+}
+}
+
 static void gicv2_set_irq_type(struct irq_desc *desc, unsigned int type)
 {
 uint32_t cfg, actual, edgebit;
@@ -1278,6 +1317,8 @@ const static struct gic_hw_operations gicv2_ops = {
 .eoi_irq = gicv2_eoi_irq,
 .deactivate_irq  = gicv2_dir_irq,
 .read_irq= gicv2_read_irq,
+.set_active_state= gicv2_set_active_state,
+.set_pending_state   = gicv2_set_pending_state,
 .set_irq_type= gicv2_set_irq_type,
 .set_irq_priority= gicv2_set_irq_priority,
 .send_SGI= gicv2_send_SGI,
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index cb41844af2..a5105ac9e7 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -477,6 +477,41 @@ static unsigned int gicv3_read_irq(void)
 return irq;
 }
 
+/*
+ * This is forcing the active state of an interrupt, somewhat circumventing
+ * the normal interrupt flow and the GIC state machine. So use with care
+ * and only if you know what you are doing. For this reason we also have to
+ * tinker with the _IRQ_INPROGRESS bit here, since the normal IRQ handler
+ * will not be involved.
+ */
+static void gicv3_set_active_state(struct irq_desc *irqd, bool active)
+{
+ASSERT(spin_is_locked(&irqd->lock));
+
+if ( active )
+{
+set_bit(_IRQ_INPROGRESS, &irqd->status);
+gicv3_poke_irq(irqd, GICD_ISACTIVER, false);
+}
+else
+{
+clear_bit(_IRQ_INPROGRESS, &irqd->status);
+gicv3_poke_irq(irqd, GICD_ICACTIVER, false);
+}
+}
+
+static void gicv3_set_pending_state(struct irq_desc *irqd, bool pending)
+{
+ASSERT(spin_is_locked(&irqd->lock));
+
+if ( pending )
+/* The _IRQ_INPROGRESS bit will be set when the interrupt fires. */
+gicv3_poke_irq(irqd, GICD_ISPENDR, false);
+else
+/* The _IRQ_INPROGRESS bit will remain unchanged. */
+gicv3_poke_irq(irqd, GICD_ICPENDR, false);
+}
+
 static inline uint64_t gicv3_mpidr_to_affinity(int cpu)
 {
  uint64_t mpidr = cpu_logical_map(cpu);
@@ -1769,6 +1804,8 @@ static const struct gic_hw_operations gicv3_ops = {
 .eoi_irq = gicv3_eoi_irq,
 .deactivate_irq  = gicv3_dir_irq,
 .read_irq= gicv3_read_irq,
+.set_active_state= gicv3_set_active_state,
+.set_pending_state   = gicv3_set_pending_state,
 .set_irq_type= gicv3_set_irq_type,
 .set_irq_priority= gicv3_set_irq_priority,
 .send_SGI= gicv3_send_sgi,
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 3079387e06..2aca243ac3 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -345,6 +345,10 

[Xen-devel] [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend

2018-03-22 Thread Andre Przywara
Processing maintenance interrupts and accessing the list registers
are dependent on the host's GIC version.
Introduce vgic-v2.c to contain GICv2 specific functions.
Implement the GICv2 specific code for syncing the emulation state
into the VGIC registers.
This also adds the hook to let Xen setup the host GIC addresses.

This is based on Linux commit 140b086dd197, written by Marc Zyngier.

Signed-off-by: Andre Przywara 
---
Changelog v3 ... v3a:
- take hardware IRQ lock in vgic_v2_fold_lr_state()
- fix last remaining u32 usage
- print message when using new VGIC
- add TODO about racy _IRQ_INPROGRESS setting

Changelog v2 ... v3:
- remove no longer needed asm/io.h header
- replace 0/1 with false/true for bool's
- clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
- fix indentation and w/s issues

Changelog v1 ... v2:
- remove v2 specific underflow function (now generic)
- re-add Linux code to properly handle acked level IRQs

 xen/arch/arm/vgic/vgic-v2.c | 259 
 xen/arch/arm/vgic/vgic.c|   6 +
 xen/arch/arm/vgic/vgic.h|   9 ++
 3 files changed, 274 insertions(+)
 create mode 100644 xen/arch/arm/vgic/vgic-v2.c

diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
new file mode 100644
index 00..1773503cfb
--- /dev/null
+++ b/xen/arch/arm/vgic/vgic-v2.c
@@ -0,0 +1,259 @@
+/*
+ * Copyright (C) 2015, 2016 ARM Ltd.
+ * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vgic.h"
+
+static struct {
+bool enabled;
+paddr_t dbase;  /* Distributor interface address */
+paddr_t cbase;  /* CPU interface address & size */
+paddr_t csize;
+paddr_t vbase;  /* Virtual CPU interface address */
+
+/* Offset to add to get an 8kB contiguous region if GIC is aliased */
+uint32_t aliased_offset;
+} gic_v2_hw_data;
+
+void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
+  paddr_t vbase, uint32_t aliased_offset)
+{
+gic_v2_hw_data.enabled = true;
+gic_v2_hw_data.dbase = dbase;
+gic_v2_hw_data.cbase = cbase;
+gic_v2_hw_data.csize = csize;
+gic_v2_hw_data.vbase = vbase;
+gic_v2_hw_data.aliased_offset = aliased_offset;
+
+printk("Using the new VGIC implementation.\n");
+}
+
+/*
+ * transfer the content of the LRs back into the corresponding ap_list:
+ * - active bit is transferred as is
+ * - pending bit is
+ *   - transferred as is in case of edge sensitive IRQs
+ *   - set to the line-level (resample time) for level sensitive IRQs
+ */
+void vgic_v2_fold_lr_state(struct vcpu *vcpu)
+{
+struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
+unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
+unsigned long flags;
+unsigned int lr;
+
+if ( !used_lrs )/* No LRs used, so nothing to sync back here. */
+return;
+
+gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
+
+for ( lr = 0; lr < used_lrs; lr++ )
+{
+struct gic_lr lr_val;
+uint32_t intid;
+struct vgic_irq *irq;
+struct irq_desc *desc = NULL;
+bool have_desc_lock = false;
+
+gic_hw_ops->read_lr(lr, &lr_val);
+
+/*
+ * TODO: Possible optimization to avoid reading LRs:
+ * Read the ELRSR to find out which of our LRs have been cleared
+ * by the guest. We just need to know the IRQ number for those, which
+ * we could save in an array when populating the LRs.
+ * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
+ * but requires some more code to save the IRQ number and to handle
+ * those finished IRQs according to the algorithm below.
+ * We need some numbers to justify this: chances are that we don't
+ * have many LRs in use most of the time, so we might not save much.
+ */
+gic_hw_ops->clear_lr(lr);
+
+intid = lr_val.virq;
+irq = vgic_get_irq(vcpu->domain, vcpu, intid);
+
+local_irq_save(flags);
+spin_lock(&irq->irq_lock);
+
+/* The locking order forces us to drop and re-take the locks here. */
+if ( irq->hw )
+{
+spin_unlock(&irq->irq_lock);
+
+desc = irq_to_desc(irq->hwintid);
+spin_lock(&desc->lock);
+

[Xen-devel] [PATCH v18 10/11] common: add a new mappable resource type: XENMEM_resource_grant_table

2018-03-22 Thread Paul Durrant
This patch allows grant table frames to be mapped using the
XENMEM_acquire_resource memory op.

NOTE: This patch expands the on-stack mfn_list array in acquire_resource()
  but it is still small enough to remain on-stack.
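
As a rough illustration (not part of this patch), a tools domain could
then map, say, the first shared grant table frame of a guest using the
xenforeignmemory interface added earlier in this series; the helper name
and the id/frame values below are example assumptions only:

    #include <sys/mman.h>
    #include <xenforeignmemory.h>
    #include <xen/memory.h>       /* XENMEM_resource_grant_table */

    /* Map the first shared grant table frame of 'domid' (sketch only). */
    static void *map_gnttab_frame0(xenforeignmemory_handle *fmem,
                                   domid_t domid)
    {
        xenforeignmemory_resource_handle *fres;
        void *addr = NULL;

        /* id 0, frame 0, one frame: example encoding only. */
        fres = xenforeignmemory_map_resource(
            fmem, domid, XENMEM_resource_grant_table, 0, 0, 1,
            &addr, PROT_READ | PROT_WRITE, 0);

        /*
         * 'fres' must be kept and later passed to
         * xenforeignmemory_unmap_resource() to tear the mapping down;
         * it is leaked here for brevity.
         */
        return fres ? addr : NULL;
    }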

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 

v18:
 - Non-trivial re-base of grant table code.
 - Dropped Jan's R-b because of the grant table changes.

v13:
 - Re-work the internals to avoid using the XENMAPIDX_grant_table_status
   hack.

v12:
 - Dropped limit checks as requested by Jan.

v10:
 - Addressed comments from Jan.

v8:
 - The functionality was originally incorporated into the earlier patch
   "x86/mm: add HYPERVISOR_memory_op to acquire guest resources".
---
 xen/common/grant_table.c  | 71 +++
 xen/common/memory.c   | 45 ++-
 xen/include/public/memory.h   |  9 --
 xen/include/xen/grant_table.h |  4 +++
 4 files changed, 113 insertions(+), 16 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 18201912e4..c8c3661b19 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -3863,6 +3863,35 @@ int mem_sharing_gref_to_gfn(struct grant_table *gt, 
grant_ref_t ref,
 }
 #endif
 
+/* caller must hold read or write lock */
+static int gnttab_get_status_frame_mfn(struct domain *d,
+   unsigned long idx, mfn_t *mfn)
+{
+struct grant_table *gt = d->grant_table;
+
+if ( idx >= nr_status_frames(gt) )
+return -EINVAL;
+
+*mfn = _mfn(virt_to_mfn(gt->status[idx]));
+return 0;
+}
+
+/* caller must hold write lock */
+static int gnttab_get_shared_frame_mfn(struct domain *d,
+   unsigned long idx, mfn_t *mfn)
+{
+struct grant_table *gt = d->grant_table;
+
+if ( (idx >= nr_grant_frames(gt)) && (idx < gt->max_grant_frames) )
+gnttab_grow_table(d, idx + 1);
+
+if ( idx >= nr_grant_frames(gt) )
+return -EINVAL;
+
+*mfn = _mfn(virt_to_mfn(gt->shared_raw[idx]));
+return 0;
+}
+
 int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn,
  mfn_t *mfn)
 {
@@ -3880,21 +3909,11 @@ int gnttab_map_frame(struct domain *d, unsigned long 
idx, gfn_t gfn,
 {
 idx &= ~XENMAPIDX_grant_table_status;
 status = true;
-if ( idx < nr_status_frames(gt) )
-*mfn = _mfn(virt_to_mfn(gt->status[idx]));
-else
-rc = -EINVAL;
-}
-else
-{
-if ( (idx >= nr_grant_frames(gt)) && (idx < gt->max_grant_frames) )
-gnttab_grow_table(d, idx + 1);
 
-if ( idx < nr_grant_frames(gt) )
-*mfn = _mfn(virt_to_mfn(gt->shared_raw[idx]));
-else
-rc = -EINVAL;
+rc = gnttab_get_status_frame_mfn(d, idx, mfn);
 }
+else
+rc = gnttab_get_shared_frame_mfn(d, idx, mfn);
 
 if ( !rc && paging_mode_translate(d) &&
  !gfn_eq(gnttab_get_frame_gfn(gt, status, idx), INVALID_GFN) )
@@ -3909,6 +3928,32 @@ int gnttab_map_frame(struct domain *d, unsigned long 
idx, gfn_t gfn,
 return rc;
 }
 
+int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
+mfn_t *mfn)
+{
+struct grant_table *gt = d->grant_table;
+int rc;
+
+grant_write_lock(gt);
+rc = gnttab_get_shared_frame_mfn(d, idx, mfn);
+grant_write_unlock(gt);
+
+return rc;
+}
+
+int gnttab_get_status_frame(struct domain *d, unsigned long idx,
+mfn_t *mfn)
+{
+struct grant_table *gt = d->grant_table;
+int rc;
+
+grant_read_lock(gt);
+rc = gnttab_get_status_frame_mfn(d, idx, mfn);
+grant_read_unlock(gt);
+
+return rc;
+}
+
 static void gnttab_usage_print(struct domain *rd)
 {
 int first = 1;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c09ef179e8..bc570167bb 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -967,6 +968,43 @@ static long xatp_permission_check(struct domain *d, 
unsigned int space)
 return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
 }
 
+static int acquire_grant_table(struct domain *d, unsigned int id,
+   unsigned long frame,
+   unsigned int nr_frames,
+   xen_pfn_t mfn_list[])
+{
+unsigned int i = nr_frames;
+
+/* Iterate backwards in case table needs to grow */
+while ( i-- != 0 )
+{
+mfn_t mfn = INVALID_MFN;
+int rc;
+
+

[Xen-devel] [xen-unstable-smoke test] 121056: trouble: blocked/broken/pass

2018-03-22 Thread osstest service owner
flight 121056 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/121056/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf  broken
 build-armhf   4 host-install(4)broken REGR. vs. 121043

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  6161d9f27fcb6c48021e6928bb240dfa39d9f1d3
baseline version:
 xen  8df3821c08d024684a6c83659d8d794b565067f9

Last test of basis   121043  2018-03-21 21:04:22 Z0 days
Testing same since   121056  2018-03-22 10:01:22 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Doug Goldstein 
  Jan Beulich 
  Joe Jin 
  Tim Deegan 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  broken  
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job build-armhf broken
broken-step build-armhf host-install(4)

Not pushing.

(No revision log; it would be 318 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 1/8] x86: NOP out XPTI entry/exit code when it's not in use

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:37:54AM -0600, Jan Beulich wrote:
> Introduce a synthetic feature flag to use alternative instruction
> patching to NOP out all code on entry/exit paths. Having NOPs here is
> generally better than using conditional branches.
> 
> Also change the limit on the number of bytes we can patch in one go to
> that resulting from the encoding in struct alt_instr - there's no point
> reducing it below that limit, and without a check being in place that
> the limit isn't actually exceeded, such an artificial boundary is a
> latent risk.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 4/8] x86/XPTI: use %r12 to write zero into xen_cr3

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:39:34AM -0600, Jan Beulich wrote:
> Now that we zero all registers early on all entry paths, use that to
> avoid a couple of immediates here.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 3/8] x86: log XPTI enabled status

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:39:04AM -0600, Jan Beulich wrote:
> At the same time also report the state of the two defined
> ARCH_CAPABILITIES MSR bits. To avoid further complicating the
> conditional around that printk(), drop it (it's a debug level one only
> anyway).
> 
> Issue the main message without any XENLOG_*, and also drop XENLOG_INFO
> from the respective BTI message, to make sure they're visible at default
> log level also in release builds.
> 
> Signed-off-by: Jan Beulich 
> Tested-by: Juergen Gross 
> Reviewed-by: Juergen Gross 
> Reviewed-by: Andrew Cooper 

Reviewed-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 15/39] ARM: new VGIC: Implement vgic_vcpu_pending_irq

2018-03-22 Thread Andre Przywara
Hi,

On 22/03/18 03:52, Julien Grall wrote:
> Hi Andre,
> 
> On 03/21/2018 04:32 PM, Andre Przywara wrote:
>> Tell Xen whether a particular VCPU has an IRQ that needs handling
>> in the guest. This is used to decide whether a VCPU is runnable or
>> if a hypercall should be preempted to let the guest handle the IRQ.
>>
>> This is based on Linux commit 90eee56c5f90, written by Eric Auger.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>> Changelog v2 ... v3:
>> - adjust vgic_vcpu_pending_irq() to return integers, not false/true
> 
> I would have preferred to have the return switch to bool instead. I
> guess this can be done on a follow-up. With one comment below:

I did that originally, but then you meanwhile merged that first patch
already. So I didn't want to add another patch to this series.
I am fine with changing this afterwards, probably as part of a fixup series.

> Reviewed-by: Julien Grall 

Thanks!

Cheers,
Andre.

>>
>> Changelog v1 ... v2:
>> - adjust to new vgic_vcpu_pending_irq() prototype, drop wrapper
>>
>>   xen/arch/arm/vgic/vgic.c | 37 +
>>   1 file changed, 37 insertions(+)
>>
>> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
>> index 2fa595f4f7..925cda4580 100644
>> --- a/xen/arch/arm/vgic/vgic.c
>> +++ b/xen/arch/arm/vgic/vgic.c
>> @@ -647,6 +647,43 @@ void vgic_sync_to_lrs(void)
>>   gic_hw_ops->update_hcr_status(GICH_HCR_EN, 1);
>>   }
>>   +/**
>> + * vgic_vcpu_pending_irq() - determine if interrupts need to be injected
>> + * @vcpu: The vCPU on which to check for interrupts.
>> + *
>> + * Checks whether there is an interrupt on the given VCPU which needs
>> + * handling in the guest. This requires at least one IRQ to be pending
>> + * and enabled.
>> + *
>> + * Returns: 1 if the guest should run to handle interrupts, 0 otherwise.
> 
> NIT: Because of "ret = irq_is_pending(irq) && irq->enabled", you will
> return a non-zero value if the guest should run to handle interrupts.
> 
>> + */
>> +int vgic_vcpu_pending_irq(struct vcpu *vcpu)
>> +{
>> +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
>> +    struct vgic_irq *irq;
>> +    unsigned long flags;
>> +    int ret = 0;
>> +
>> +    if ( !vcpu->domain->arch.vgic.enabled )
>> +    return 0;
>> +
>> +    spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
>> +
>> +    list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list)
>> +    {
>> +    spin_lock(&irq->irq_lock);
>> +    ret = irq_is_pending(irq) && irq->enabled;
>> +    spin_unlock(&irq->irq_lock);
>> +
>> +    if ( ret )
>> +    break;
>> +    }
>> +
>> +    spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
>> +
>> +    return ret;
>> +}
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>>
> 
> Cheers,
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v18 00/11] x86: guest resource mapping

2018-03-22 Thread Paul Durrant
This series introduces support for direct mapping of guest resources.
The resources are:
 - IOREQ server pages
 - Grant tables

v18:
 - Re-base
 - Use the now-reference-counted emulating domain to host ioreq pages

v17:
 - Make sure ioreq page free-ing is done at domain destruction

v16:
 - Fix default ioreq server code and verified with qemu trad

v15:
 - Correct page ownership of ioreq pages

v14:
 - Responded to more comments from Jan.

v13:
 - Responded to more comments from Jan and Julien.
 - Build-tested using ARM cross-compilation.

v12:
 - Responded to more comments from Jan.

v11:
 - Responded to more comments from Jan.

v10:
 - Responded to comments from Jan.

v9:
 - Change to patch #1 only.

v8:
 - Re-ordered series and dropped two patches that have already been
   committed.

v7:
 - Fixed assertion failure hit during domain destroy.

v6:
 - Responded to missed comments from Roger.

v5:
 - Responded to review comments from Wei.

v4:
 - Responded to further review comments from Roger.

v3:
 - Dropped original patch #1 since it is covered by Juergen's patch.
 - Added new xenforeignmemorycleanup patch (#4).
 - Replaced the patch introducing the ioreq server 'is_default' flag with
   one that changes the ioreq server list into an array (#8).

Paul Durrant (11):
  x86/hvm/ioreq: maintain an array of ioreq servers rather than a list
  x86/hvm/ioreq: simplify code and use consistent naming
  x86/hvm/ioreq: use gfn_t in struct hvm_ioreq_page
  x86/hvm/ioreq: defer mapping gfns until they are actually requested
  x86/mm: add HYPERVISOR_memory_op to acquire guest resources
  x86/hvm/ioreq: add a new mappable resource type...
  x86/mm: add an extra command to HYPERVISOR_mmu_update...
  tools/libxenforeignmemory: add support for resource mapping
  tools/libxenforeignmemory: reduce xenforeignmemory_restrict code
footprint
  common: add a new mappable resource type: XENMEM_resource_grant_table
  tools/libxenctrl: use new xenforeignmemory API to seed grant table

 tools/flask/policy/modules/xen.if  |   4 +-
 tools/include/xen-sys/Linux/privcmd.h  |  11 +
 tools/libs/devicemodel/core.c  |   8 +
 tools/libs/devicemodel/include/xendevicemodel.h|   6 +-
 tools/libs/foreignmemory/Makefile  |   2 +-
 tools/libs/foreignmemory/core.c|  53 ++
 tools/libs/foreignmemory/freebsd.c |   7 -
 .../libs/foreignmemory/include/xenforeignmemory.h  |  41 +
 tools/libs/foreignmemory/libxenforeignmemory.map   |   5 +
 tools/libs/foreignmemory/linux.c   |  45 ++
 tools/libs/foreignmemory/minios.c  |   7 -
 tools/libs/foreignmemory/netbsd.c  |   7 -
 tools/libs/foreignmemory/private.h |  43 +-
 tools/libs/foreignmemory/solaris.c |   7 -
 tools/libxc/include/xc_dom.h   |   8 +-
 tools/libxc/xc_dom_boot.c  | 114 ++-
 tools/libxc/xc_sr_restore_x86_hvm.c|  10 +-
 tools/libxc/xc_sr_restore_x86_pv.c |   2 +-
 tools/libxl/libxl_dom.c|   1 -
 tools/python/xen/lowlevel/xc/xc.c  |   6 +-
 xen/arch/x86/hvm/dm.c  |   9 +-
 xen/arch/x86/hvm/ioreq.c   | 884 -
 xen/arch/x86/mm.c  |  60 +-
 xen/arch/x86/mm/p2m.c  |   3 +-
 xen/common/compat/memory.c | 100 +++
 xen/common/grant_table.c   |  71 +-
 xen/common/memory.c| 137 
 xen/include/asm-arm/mm.h   |   8 +
 xen/include/asm-arm/p2m.h  |  10 +
 xen/include/asm-x86/hvm/domain.h   |  15 +-
 xen/include/asm-x86/hvm/ioreq.h|   2 +
 xen/include/asm-x86/mm.h   |   5 +
 xen/include/asm-x86/p2m.h  |   3 +
 xen/include/public/hvm/dm_op.h |  36 +-
 xen/include/public/memory.h|  69 +-
 xen/include/public/xen.h   |  12 +-
 xen/include/xen/grant_table.h  |   4 +
 xen/include/xlat.lst   |   1 +
 xen/include/xsm/dummy.h|   6 +
 xen/include/xsm/xsm.h  |   6 +
 xen/xsm/dummy.c|   1 +
 xen/xsm/flask/hooks.c  |   6 +
 xen/xsm/flask/policy/access_vectors|   2 +
 43 files changed, 1320 insertions(+), 517 deletions(-)
---
Cc: Daniel De Graaf 
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano 

[Xen-devel] [PATCH v18 02/11] x86/hvm/ioreq: simplify code and use consistent naming

2018-03-22 Thread Paul Durrant
This patch re-works much of the ioreq server initialization and teardown
code:

- The hvm_map/unmap_ioreq_gfn() functions are expanded to call through
  to hvm_alloc/free_ioreq_gfn() rather than expecting them to be called
  separately by outer functions.
- Several functions now test the validity of the hvm_ioreq_page gfn value
  to determine whether they need to act. This means they can be safely
  called for the bufioreq page even when it is not used.
- hvm_add/remove_ioreq_gfn() simply return in the case of the default
  IOREQ server so callers no longer need to test before calling.
- hvm_ioreq_server_setup_pages() is renamed to hvm_ioreq_server_map_pages()
  to mirror the existing hvm_ioreq_server_unmap_pages().

All of this significantly shortens the code.

Signed-off-by: Paul Durrant 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Wei Liu 
Acked-by: Jan Beulich 
---
Cc: Andrew Cooper 

v18:
 - Trivial re-base.

v3:
 - Re-based on top of 's->is_default' to 'IS_DEFAULT(s)' changes.
 - Minor updates in response to review comments from Roger.
---
 xen/arch/x86/hvm/ioreq.c | 182 ++-
 1 file changed, 69 insertions(+), 113 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d8d4e96a80..bd141db0d5 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -210,63 +210,75 @@ bool handle_hvm_io_completion(struct vcpu *v)
 return true;
 }
 
-static int hvm_alloc_ioreq_gfn(struct domain *d, unsigned long *gfn)
+static unsigned long hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
 {
+struct domain *d = s->target;
 unsigned int i;
-int rc;
 
-rc = -ENOMEM;
+ASSERT(!IS_DEFAULT(s));
+
 for ( i = 0; i < sizeof(d->arch.hvm_domain.ioreq_gfn.mask) * 8; i++ )
 {
if ( test_and_clear_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask) )
-{
-*gfn = d->arch.hvm_domain.ioreq_gfn.base + i;
-rc = 0;
-break;
-}
+return d->arch.hvm_domain.ioreq_gfn.base + i;
 }
 
-return rc;
+return gfn_x(INVALID_GFN);
 }
 
-static void hvm_free_ioreq_gfn(struct domain *d, unsigned long gfn)
+static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s,
+   unsigned long gfn)
 {
+struct domain *d = s->target;
 unsigned int i = gfn - d->arch.hvm_domain.ioreq_gfn.base;
 
-if ( gfn != gfn_x(INVALID_GFN) )
-set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask);
+ASSERT(!IS_DEFAULT(s));
+ASSERT(gfn != gfn_x(INVALID_GFN));
+
+set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask);
 }
 
-static void hvm_unmap_ioreq_page(struct hvm_ioreq_server *s, bool buf)
+static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 {
struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
+if ( iorp->gfn == gfn_x(INVALID_GFN) )
+return;
+
destroy_ring_for_helper(&iorp->va, iorp->page);
+iorp->page = NULL;
+
+if ( !IS_DEFAULT(s) )
+hvm_free_ioreq_gfn(s, iorp->gfn);
+
+iorp->gfn = gfn_x(INVALID_GFN);
 }
 
-static int hvm_map_ioreq_page(
-struct hvm_ioreq_server *s, bool buf, unsigned long gfn)
+static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 {
 struct domain *d = s->target;
struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-struct page_info *page;
-void *va;
 int rc;
 
-if ( (rc = prepare_ring_for_helper(d, gfn, &page, &va)) )
-return rc;
-
-if ( (iorp->va != NULL) || d->is_dying )
-{
-destroy_ring_for_helper(&va, page);
+if ( d->is_dying )
 return -EINVAL;
-}
 
-iorp->va = va;
-iorp->page = page;
-iorp->gfn = gfn;
+if ( IS_DEFAULT(s) )
+iorp->gfn = buf ?
+d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] :
+d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+else
+iorp->gfn = hvm_alloc_ioreq_gfn(s);
 
-return 0;
+if ( iorp->gfn == gfn_x(INVALID_GFN) )
+return -ENOMEM;
+
+rc = prepare_ring_for_helper(d, iorp->gfn, &iorp->page, &iorp->va);
+
+if ( rc )
+hvm_unmap_ioreq_gfn(s, buf);
+
+return rc;
 }
 
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
@@ -279,8 +291,7 @@ bool is_ioreq_server_page(struct domain *d, const struct 
page_info *page)
 
 FOR_EACH_IOREQ_SERVER(d, id, s)
 {
-if ( (s->ioreq.va && s->ioreq.page == page) ||
- (s->bufioreq.va && s->bufioreq.page == page) )
+if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
 {
 found = true;
 break;
@@ -292,20 +303,30 @@ bool is_ioreq_server_page(struct domain *d, const struct 
page_info *page)
 return found;
 }
 
-static void hvm_remove_ioreq_gfn(
-struct domain *d, struct hvm_ioreq_page *iorp)
+static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server 

[Xen-devel] [PATCH v18 05/11] x86/mm: add HYPERVISOR_memory_op to acquire guest resources

2018-03-22 Thread Paul Durrant
Certain memory resources associated with a guest are not necessarily
present in the guest P2M.

This patch adds the boilerplate for a new memory op to allow such a resource
to be priv-mapped directly, by either a PV or HVM tools domain.

NOTE: Whilst the new op is not intrinsically specific to the x86 architecture,
  I have no means to test it on an ARM platform and so cannot verify
  that it functions correctly.

Signed-off-by: Paul Durrant 
Acked-by: Daniel De Graaf 
---
Cc: Jan Beulich 
Cc: George Dunlap 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Julien Grall 

v18:
 - Allow the resource page owner to be specified by a returned flag.
 - Drop Jan's R-b due to change.

v14:
 - Addressed more comments from Jan.

v13:
 - Use xen_pfn_t for mfn_list.
 - Addressed further comments from Jan and Julien.

v12:
 - Addressed more comments form Jan.
 - Removed #ifdef CONFIG_X86 from common code and instead introduced a
   stub set_foreign_p2m_entry() in asm-arm/p2m.h returning -EOPNOTSUPP.
 - Restricted mechanism for querying implementation limit on nr_frames
   and simplified compat code.

v11:
 - Addressed more comments from Jan.

v9:
 - Addressed more comments from Jan.

v8:
 - Move the code into common as requested by Jan.
 - Make the gmfn_list handle a 64-bit type to avoid limiting the MFN
   range for a 32-bit tools domain.
 - Add missing pad.
 - Add compat code.
 - Make this patch deal with purely boilerplate.
 - Drop George's A-b and Wei's R-b because the changes are non-trivial,
   and update Cc list now the boilerplate is common.

v5:
 - Switched __copy_to/from_guest_offset() to copy_to/from_guest_offset().
---
 tools/flask/policy/modules/xen.if   |   4 +-
 xen/arch/x86/mm/p2m.c   |   3 +-
 xen/common/compat/memory.c  | 100 
 xen/common/memory.c |  93 +
 xen/include/asm-arm/p2m.h   |  10 
 xen/include/asm-x86/p2m.h   |   3 ++
 xen/include/public/memory.h |  55 +++-
 xen/include/xlat.lst|   1 +
 xen/include/xsm/dummy.h |   6 +++
 xen/include/xsm/xsm.h   |   6 +++
 xen/xsm/dummy.c |   1 +
 xen/xsm/flask/hooks.c   |   6 +++
 xen/xsm/flask/policy/access_vectors |   2 +
 13 files changed, 286 insertions(+), 4 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if 
b/tools/flask/policy/modules/xen.if
index 459880bb01..7aefd0061e 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -52,7 +52,8 @@ define(`create_domain_common', `
settime setdomainhandle getvcpucontext set_misc_info };
allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
set_max_evtchn set_vnumainfo get_vnumainfo cacheflush
-   psr_cmt_op psr_alloc soft_reset set_gnttab_limits };
+   psr_cmt_op psr_alloc soft_reset set_gnttab_limits
+   resource_map };
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
@@ -152,6 +153,7 @@ define(`device_model', `
allow $1 $2_target:domain { getdomaininfo shutdown };
allow $1 $2_target:mmu { map_read map_write adjust physmap target_hack 
};
allow $1 $2_target:hvm { getparam setparam hvmctl dm };
+   allow $1 $2_target:domain2 resource_map;
 ')
 
 # make_device_model(priv, dm_dom, hvm_dom)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 48e50fb5d8..55693eba59 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1132,8 +1132,7 @@ static int set_typed_p2m_entry(struct domain *d, unsigned 
long gfn_l,
 }
 
 /* Set foreign mfn in the given guest's p2m table. */
-static int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
- mfn_t mfn)
+int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
 return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
p2m_get_hostp2m(d)->default_access);
diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 35bb259808..13fd64ddf5 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -71,6 +71,7 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 struct xen_remove_from_physmap *xrfp;
 struct xen_vnuma_topology_info *vnuma;
 struct 

[Xen-devel] [PATCH v18 03/11] x86/hvm/ioreq: use gfn_t in struct hvm_ioreq_page

2018-03-22 Thread Paul Durrant
This patch adjusts the ioreq server code to use type-safe gfn_t values
where possible. No functional change.

Signed-off-by: Paul Durrant 
Reviewed-by: Roger Pau Monné 
Reviewed-by: Wei Liu 
Acked-by: Jan Beulich 
---
Cc: Andrew Cooper 

v18:
 - Trivial re-base.
---
 xen/arch/x86/hvm/ioreq.c | 46 
 xen/include/asm-x86/hvm/domain.h |  2 +-
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index bd141db0d5..d5f0e24b98 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -210,7 +210,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
 return true;
 }
 
-static unsigned long hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
+static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
 {
 struct domain *d = s->target;
 unsigned int i;
@@ -220,20 +220,19 @@ static unsigned long hvm_alloc_ioreq_gfn(struct 
hvm_ioreq_server *s)
 for ( i = 0; i < sizeof(d->arch.hvm_domain.ioreq_gfn.mask) * 8; i++ )
 {
if ( test_and_clear_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask) )
-return d->arch.hvm_domain.ioreq_gfn.base + i;
+return _gfn(d->arch.hvm_domain.ioreq_gfn.base + i);
 }
 
-return gfn_x(INVALID_GFN);
+return INVALID_GFN;
 }
 
-static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s,
-   unsigned long gfn)
+static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
 {
 struct domain *d = s->target;
-unsigned int i = gfn - d->arch.hvm_domain.ioreq_gfn.base;
+unsigned int i = gfn_x(gfn) - d->arch.hvm_domain.ioreq_gfn.base;
 
 ASSERT(!IS_DEFAULT(s));
-ASSERT(gfn != gfn_x(INVALID_GFN));
+ASSERT(!gfn_eq(gfn, INVALID_GFN));
 
 set_bit(i, >arch.hvm_domain.ioreq_gfn.mask);
 }
@@ -242,7 +241,7 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
 {
struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
-if ( iorp->gfn == gfn_x(INVALID_GFN) )
+if ( gfn_eq(iorp->gfn, INVALID_GFN) )
 return;
 
destroy_ring_for_helper(&iorp->va, iorp->page);
@@ -251,7 +250,7 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
 if ( !IS_DEFAULT(s) )
 hvm_free_ioreq_gfn(s, iorp->gfn);
 
-iorp->gfn = gfn_x(INVALID_GFN);
+iorp->gfn = INVALID_GFN;
 }
 
 static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
@@ -264,16 +263,17 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
 return -EINVAL;
 
 if ( IS_DEFAULT(s) )
-iorp->gfn = buf ?
-d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] :
-d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN];
+iorp->gfn = _gfn(buf ?
+ d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] :
+ d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN]);
 else
 iorp->gfn = hvm_alloc_ioreq_gfn(s);
 
-if ( iorp->gfn == gfn_x(INVALID_GFN) )
+if ( gfn_eq(iorp->gfn, INVALID_GFN) )
 return -ENOMEM;
 
-rc = prepare_ring_for_helper(d, iorp->gfn, >page, >va);
-rc = prepare_ring_for_helper(d, iorp->gfn, &iorp->page, &iorp->va);
+rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
+ &iorp->va);
 if ( rc )
 hvm_unmap_ioreq_gfn(s, buf);
@@ -309,10 +309,10 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server 
*s, bool buf)
 struct domain *d = s->target;
struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
-if ( IS_DEFAULT(s) || iorp->gfn == gfn_x(INVALID_GFN) )
+if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
 return;
 
-if ( guest_physmap_remove_page(d, _gfn(iorp->gfn),
+if ( guest_physmap_remove_page(d, iorp->gfn,
_mfn(page_to_mfn(iorp->page)), 0) )
 domain_crash(d);
 clear_page(iorp->va);
@@ -324,15 +324,15 @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, 
bool buf)
struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 int rc;
 
-if ( IS_DEFAULT(s) || iorp->gfn == gfn_x(INVALID_GFN) )
+if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
 return 0;
 
 clear_page(iorp->va);
 
-rc = guest_physmap_add_page(d, _gfn(iorp->gfn),
+rc = guest_physmap_add_page(d, iorp->gfn,
 _mfn(page_to_mfn(iorp->page)), 0);
 if ( rc == 0 )
-paging_mark_pfn_dirty(d, _pfn(iorp->gfn));
+paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
 
 return rc;
 }
@@ -595,8 +595,8 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
INIT_LIST_HEAD(&s->ioreq_vcpu_list);
spin_lock_init(&s->bufioreq_lock);
 
-s->ioreq.gfn = gfn_x(INVALID_GFN);
-s->bufioreq.gfn = gfn_x(INVALID_GFN);
+s->ioreq.gfn = INVALID_GFN;
+s->bufioreq.gfn = INVALID_GFN;
 
 rc 

Re: [Xen-devel] X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda Items

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 10:27:35AM +, Paul Durrant wrote:
> De-htmling...
> 
> -
> From: Lars Kurth 
> Sent: 22 March 2018 10:22
> To: xen-de...@lists.xensource.com
> Cc: committ...@xenproject.org; Juergen Gross ; Janakarajan 
> Natarajan ; Tamas K Lengyel ; Wei Liu 
> ; Andrew Cooper ; Daniel 
> Kiper ; Roger Pau Monné ; 
> Christopher Clark ; Rich Persaud 
> ; Paul Durrant ; Jan Beulich' 
> ; Brian Woods ; intel-...@intel.com
> Subject: X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda 
> Items
> 
> Hi all,
> please find attached 
> a) Meeting details (just a link with timezones) – the meeting invite will 
> follow when we have an agenda
>    Bridge details – will be sent with the meeting invite
>    I am thinking of using GotoMeeting, but want to try this with a Linux only 
> user before I commit
> c) Call for agenda items
> A few suggestions were made, such as XPTI status (if applicable), PVH status
> Also we have some left-overs from the last call: see 
> https://lists.xenproject.org/archives/html/xen-devel/2018-03/threads.html#01571
>   
> Regards
> Lars
> == Meeting Details ==
> Wed April 11, 15:00 - 16:00 UTC
> International meeting times: 
> https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018=4=11=14=0=0=224=24=179=136=37=33
>  
> == Agenda Proposal ==
> We start with a round the table call as to who is on the call (name and 
> company)
> === A) Coordination and Planning ===
> Coordinating who does what, what needs attention, what is blocked, etc. 
> A1) Short-term
> Any urgent issues related to the 4.11 release that need discussing 
> A2) Long-term, Larger series
> Please call out any x86 related series, that need attention in the longer 
> term. Provide
> * Title of series
> * Link to series (e.g. on 
> https://lists.xenproject.org/archives/html/xen-devel, markmail, …)
> * Describe any: Dependencies, Issues, etc. that are relevant
> === B) Design, architecture, feature updates related discussions ===
> Please highlight any design/architecture discussions that you would like to 
> cover. Please describe
> * Design, point to any mail discussions
> * Describe clearly what you are blocked on: highlight any issues
> === C) Demos, Sharing of Experiences, Sometimes discussion of specific 
> issues/bugs/problems/... ===
> Please highlight any of the above that you would like to cover. Please 
> describe
> * What the issue/experience/demo is that you would like to cover
> === D) AOB ===
> -
> 
> I think we need to discuss PCI emulation and our future direction. Our 
> current hybrid with QEMU is becoming increasingly problematic.

+1

I can also give an update on the PVH work.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v18 01/11] x86/hvm/ioreq: maintain an array of ioreq servers rather than a list

2018-03-22 Thread Paul Durrant
A subsequent patch will remove the current implicit limitation on creation
of ioreq servers which is due to the allocation of gfns for the ioreq
structures and buffered ioreq ring.

It will therefore be necessary to introduce an explicit limit and, since
this limit should be small, it simplifies the code to maintain an array of
that size rather than using a list.

Also, by reserving an array slot for the default server and populating
array slots early in create, the need to pass an 'is_default' boolean
to sub-functions can be avoided.

Some function return values are changed by this patch: Specifically, in
the case where the id of the default ioreq server is passed in, -EOPNOTSUPP
is now returned rather than -ENOENT.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 

v18:
 - non-trivial re-base.
 - small modification to FOR_EACH... macro to iterate backwards, to main-
   tain a previous undocumented but useful semantic that secondary
   emulators are selected in favour of qemu.
 - dropped R-b's because of change.

v10:
 - modified FOR_EACH... macro as suggested by Jan.
 - check for NULL in IS_DEFAULT macro as suggested by Jan.

v9:
 - modified FOR_EACH... macro as requested by Andrew.

v8:
 - Addressed various comments from Jan.

v7:
 - Fixed assertion failure found in testing.

v6:
 - Updated according to comments made by Roger on v4 that I'd missed.

v5:
 - Switched GET/SET_IOREQ_SERVER() macros to get/set_ioreq_server()
   functions to avoid possible double-evaluation issues.

v4:
 - Introduced more helper macros and relocated them to the top of the
   code.

v3:
 - New patch (replacing "move is_default into struct hvm_ioreq_server") in
   response to review comments.
---
 xen/arch/x86/hvm/ioreq.c | 539 +++
 xen/include/asm-x86/hvm/domain.h |  11 +-
 2 files changed, 265 insertions(+), 285 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 44d029499d..d8d4e96a80 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -33,6 +33,37 @@
 
 #include 
 
+static void set_ioreq_server(struct domain *d, unsigned int id,
+ struct hvm_ioreq_server *s)
+{
+ASSERT(id < MAX_NR_IOREQ_SERVERS);
+ASSERT(!s || !d->arch.hvm_domain.ioreq_server.server[id]);
+
+d->arch.hvm_domain.ioreq_server.server[id] = s;
+}
+
+#define GET_IOREQ_SERVER(d, id) \
+(d)->arch.hvm_domain.ioreq_server.server[id]
+
+static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
+ unsigned int id)
+{
+if ( id >= MAX_NR_IOREQ_SERVERS )
+return NULL;
+
+return GET_IOREQ_SERVER(d, id);
+}
+
+#define IS_DEFAULT(s) \
+((s) && (s) == GET_IOREQ_SERVER((s)->target, DEFAULT_IOSERVID))
+
+/* Iterate over all possible ioreq servers */
+#define FOR_EACH_IOREQ_SERVER(d, id, s) \
+for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
+if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
+continue; \
+else
+
 static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
 shared_iopage_t *p = s->ioreq.va;
@@ -47,10 +78,9 @@ bool hvm_io_pending(struct vcpu *v)
 {
 struct domain *d = v->domain;
 struct hvm_ioreq_server *s;
+unsigned int id;
 
-list_for_each_entry ( s,
-  &d->arch.hvm_domain.ioreq_server.list,
-  list_entry )
+FOR_EACH_IOREQ_SERVER(d, id, s)
 {
 struct hvm_ioreq_vcpu *sv;
 
@@ -127,10 +157,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
 struct hvm_ioreq_server *s;
 enum hvm_io_completion io_completion;
+unsigned int id;
 
-  list_for_each_entry ( s,
-  &d->arch.hvm_domain.ioreq_server.list,
-  list_entry )
+FOR_EACH_IOREQ_SERVER(d, id, s)
 {
 struct hvm_ioreq_vcpu *sv;
 
@@ -243,13 +272,12 @@ static int hvm_map_ioreq_page(
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
 {
 const struct hvm_ioreq_server *s;
+unsigned int id;
 bool found = false;
 
spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
-list_for_each_entry ( s,
-  &d->arch.hvm_domain.ioreq_server.list,
-  list_entry )
+FOR_EACH_IOREQ_SERVER(d, id, s)
 {
 if ( (s->ioreq.va && s->ioreq.page == page) ||
  (s->bufioreq.va && s->bufioreq.page == page) )
@@ -302,7 +330,7 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server 
*s,
 }
 
 static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
- bool is_default, struct vcpu *v)
+ struct vcpu *v)
 {
 struct hvm_ioreq_vcpu *sv;
 int rc;
@@ -316,7 +344,8 @@ static int 

[Xen-devel] [PATCH v18 09/11] tools/libxenforeignmemory: reduce xenforeignmemory_restrict code footprint

2018-03-22 Thread Paul Durrant
By using a static inline stub in private.h for OS where this functionality
is not implemented, the various duplicate stubs in the OS-specific source
modules can be avoided.

Signed-off-by: Paul Durrant 
Reviewed-by: Roger Pau Monné 
Acked-by: Wei Liu 
---
Cc: Ian Jackson 

v4:
 - Removed extraneous freebsd code.

v3:
 - Patch added in response to review comments.
---
 tools/libs/foreignmemory/freebsd.c |  7 ---
 tools/libs/foreignmemory/minios.c  |  7 ---
 tools/libs/foreignmemory/netbsd.c  |  7 ---
 tools/libs/foreignmemory/private.h | 12 +---
 tools/libs/foreignmemory/solaris.c |  7 ---
 5 files changed, 9 insertions(+), 31 deletions(-)

diff --git a/tools/libs/foreignmemory/freebsd.c 
b/tools/libs/foreignmemory/freebsd.c
index dec447485a..6e6bc4b11f 100644
--- a/tools/libs/foreignmemory/freebsd.c
+++ b/tools/libs/foreignmemory/freebsd.c
@@ -95,13 +95,6 @@ int osdep_xenforeignmemory_unmap(xenforeignmemory_handle 
*fmem,
 return munmap(addr, num << PAGE_SHIFT);
 }
 
-int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
-domid_t domid)
-{
-errno = -EOPNOTSUPP;
-return -1;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/minios.c 
b/tools/libs/foreignmemory/minios.c
index 75f340122e..43341ca301 100644
--- a/tools/libs/foreignmemory/minios.c
+++ b/tools/libs/foreignmemory/minios.c
@@ -58,13 +58,6 @@ int osdep_xenforeignmemory_unmap(xenforeignmemory_handle 
*fmem,
 return munmap(addr, num << PAGE_SHIFT);
 }
 
-int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
-domid_t domid)
-{
-errno = -EOPNOTSUPP;
-return -1;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/netbsd.c 
b/tools/libs/foreignmemory/netbsd.c
index 9bf95ef4f0..54a418ebd6 100644
--- a/tools/libs/foreignmemory/netbsd.c
+++ b/tools/libs/foreignmemory/netbsd.c
@@ -100,13 +100,6 @@ int osdep_xenforeignmemory_unmap(xenforeignmemory_handle 
*fmem,
 return munmap(addr, num*XC_PAGE_SIZE);
 }
 
-int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
-domid_t domid)
-{
-errno = -EOPNOTSUPP;
-return -1;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/private.h 
b/tools/libs/foreignmemory/private.h
index b191000b49..b06ce12583 100644
--- a/tools/libs/foreignmemory/private.h
+++ b/tools/libs/foreignmemory/private.h
@@ -35,9 +35,6 @@ void *osdep_xenforeignmemory_map(xenforeignmemory_handle 
*fmem,
 int osdep_xenforeignmemory_unmap(xenforeignmemory_handle *fmem,
  void *addr, size_t num);
 
-int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
-domid_t domid);
-
 #if defined(__NetBSD__) || defined(__sun__)
 /* Strictly compat for those two only only */
 void *compat_mapforeign_batch(xenforeignmem_handle *fmem, uint32_t dom,
@@ -57,6 +54,13 @@ struct xenforeignmemory_resource_handle {
 };
 
 #ifndef __linux__
+static inline int osdep_xenforeignmemory_restrict(xenforeignmemory_handle 
*fmem,
+  domid_t domid)
+{
+errno = EOPNOTSUPP;
+return -1;
+}
+
 static inline int osdep_xenforeignmemory_map_resource(
 xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres)
 {
@@ -70,6 +74,8 @@ static inline int osdep_xenforeignmemory_unmap_resource(
 return 0;
 }
 #else
+int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
+domid_t domid);
 int osdep_xenforeignmemory_map_resource(
 xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres);
 int osdep_xenforeignmemory_unmap_resource(
diff --git a/tools/libs/foreignmemory/solaris.c 
b/tools/libs/foreignmemory/solaris.c
index a33decb4ae..ee8aae4fbd 100644
--- a/tools/libs/foreignmemory/solaris.c
+++ b/tools/libs/foreignmemory/solaris.c
@@ -97,13 +97,6 @@ int osdep_xenforeignmemory_unmap(xenforeignmemory_handle 
*fmem,
 return munmap(addr, num*XC_PAGE_SIZE);
 }
 
-int osdep_xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
-domid_t domid)
-{
-errno = -EOPNOTSUPP;
-return -1;
-}
-
 /*
  * Local variables:
  * mode: C
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v18 07/11] x86/mm: add an extra command to HYPERVISOR_mmu_update...

2018-03-22 Thread Paul Durrant
...to allow the calling domain to prevent translation of a specified l1e
value.

Despite what the comment in public/xen.h might imply, specifying a
command value of MMU_NORMAL_PT_UPDATE will not simply update an l1e with
the specified value. Instead, mod_l1_entry() tests whether foreign_dom
has PG_translate set in its paging mode and, if it does, assumes that
the pfn value in the l1e is a gfn rather than an mfn.

To allow a PV tools domain to map mfn values from a previously issued
HYPERVISOR_memory_op:XENMEM_acquire_resource, there needs to be a way
to tell HYPERVISOR_mmu_update that the specific l1e value does not
require translation regardless of the paging mode of foreign_dom. This
patch therefore defines a new command value, MMU_PT_UPDATE_NO_TRANSLATE,
which has the same semantics as MMU_NORMAL_PT_UPDATE except that the
paging mode of foreign_dom is ignored and the l1e value is used verbatim.
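
As a sketch of the intended use (illustrative only; the helper name, the
machine address 'l1e_ma', the mfn and the flag choice below are all
assumptions rather than anything defined by this patch), a PV tools
domain would issue something like:

    /*
     * Sketch: write a foreign mfn (obtained via XENMEM_acquire_resource)
     * into one of our own l1 entries without translation.  'l1e_ma' is
     * the machine address of the l1 entry.
     */
    static int map_untranslated(uint64_t l1e_ma, unsigned long mfn,
                                domid_t foreigndom)
    {
        struct mmu_update u = {
            .ptr = l1e_ma | MMU_PT_UPDATE_NO_TRANSLATE,
            .val = ((uint64_t)mfn << PAGE_SHIFT) | _PAGE_PRESENT | _PAGE_RW,
        };

        return HYPERVISOR_mmu_update(&u, 1, NULL, foreigndom);
    }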

Signed-off-by: Paul Durrant 
Reviewed-by: Jan Beulich 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 

v13:
 - Re-base.

v8:
 - New in this version, replacing "allow a privileged PV domain to map
   guest mfns".
---
 xen/arch/x86/mm.c| 13 -
 xen/include/public/xen.h | 12 +---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 846cc61935..8e3be1f263 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1901,9 +1901,10 @@ void page_unlock(struct page_info *page)
 
 /* Update the L1 entry at pl1e to new value nl1e. */
 static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
-unsigned long gl1mfn, int preserve_ad,
+unsigned long gl1mfn, unsigned int cmd,
 struct vcpu *pt_vcpu, struct domain *pg_dom)
 {
+bool preserve_ad = (cmd == MMU_PT_UPDATE_PRESERVE_AD);
 l1_pgentry_t ol1e;
 struct domain *pt_dom = pt_vcpu->domain;
 int rc = 0;
@@ -1925,7 +1926,8 @@ static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t 
nl1e,
 }
 
 /* Translate foreign guest address. */
-if ( paging_mode_translate(pg_dom) )
+if ( cmd != MMU_PT_UPDATE_NO_TRANSLATE &&
+ paging_mode_translate(pg_dom) )
 {
 p2m_type_t p2mt;
 p2m_query_t q = l1e_get_flags(nl1e) & _PAGE_RW ?
@@ -3617,6 +3619,7 @@ long do_mmu_update(
  */
 case MMU_NORMAL_PT_UPDATE:
 case MMU_PT_UPDATE_PRESERVE_AD:
+case MMU_PT_UPDATE_NO_TRANSLATE:
 {
 p2m_type_t p2mt;
 
@@ -3676,8 +3679,7 @@ long do_mmu_update(
 {
 case PGT_l1_page_table:
 rc = mod_l1_entry(va, l1e_from_intpte(req.val), mfn,
-  cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
-  pg_owner);
+  cmd, v, pg_owner);
 break;
 
 case PGT_l2_page_table:
@@ -3988,7 +3990,8 @@ static int __do_update_va_mapping(
 goto out;
 }
 
-rc = mod_l1_entry(pl1e, val, mfn_x(gl1mfn), 0, v, pg_owner);
+rc = mod_l1_entry(pl1e, val, mfn_x(gl1mfn), MMU_NORMAL_PT_UPDATE, v,
+  pg_owner);
 
 page_unlock(gl1pg);
 put_page(gl1pg);
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 308109f176..fb1df8f293 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -268,6 +268,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
  * As MMU_NORMAL_PT_UPDATE above, but A/D bits currently in the PTE are ORed
  * with those in @val.
  *
+ * ptr[1:0] == MMU_PT_UPDATE_NO_TRANSLATE:
+ * As MMU_NORMAL_PT_UPDATE above, but @val is not translated though FD
+ * page tables.
+ *
  * @val is usually the machine frame number along with some attributes.
  * The attributes by default follow the architecture defined bits. Meaning that
  * if this is a X86_64 machine and four page table layout is used, the layout
@@ -334,9 +338,11 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
  *
  * PAT (bit 7 on) --> PWT (bit 3 on) and clear bit 7.
  */
-#define MMU_NORMAL_PT_UPDATE  0 /* checked '*ptr = val'. ptr is MA.  */
-#define MMU_MACHPHYS_UPDATE   1 /* ptr = MA of frame to modify entry for */
-#define MMU_PT_UPDATE_PRESERVE_AD 2 /* atomically: *ptr = val | (*ptr&(A|D)) */
+#define MMU_NORMAL_PT_UPDATE   0 /* checked '*ptr = val'. ptr is MA.  
*/
+#define MMU_MACHPHYS_UPDATE1 /* ptr = MA of frame to modify entry for 
*/
+#define MMU_PT_UPDATE_PRESERVE_AD  2 /* atomically: *ptr = val | (*ptr&(A|D)) 
*/
+#define MMU_PT_UPDATE_NO_TRANSLATE 3 /* checked '*ptr = val'. ptr is MA.  
*/
+  

Re: [Xen-devel] [for-4.11][PATCH v6 16/16] xen: Convert page_to_mfn and mfn_to_page to use typesafe MFN

2018-03-22 Thread Tim Deegan
Hi,

At 04:47 + on 21 Mar (1521607657), Julien Grall wrote:
> Most of the users of page_to_mfn and mfn_to_page are either overriding
> the macros to make them work with mfn_t or use mfn_x/_mfn because the
> rest of the function use mfn_t.
> 
> So make page_to_mfn and mfn_to_page return mfn_t by default. The __*
> version are now dropped as this patch will convert all the remaining
> non-typesafe callers.
> 
> Only reasonable clean-ups are done in this patch. The rest will use
> _mfn/mfn_x for the time being.
> 
> Lastly, domain_page_to_mfn is also converted to use mfn_t given that
> most of the callers are now switched to _mfn(domain_page_to_mfn(...)).
> 
> Signed-off-by: Julien Grall 
> Acked-by: Razvan Cojocaru 
> Reviewed-by: Paul Durrant 
> Reviewed-by: Boris Ostrovsky 
> Reviewed-by: Kevin Tian 
> Reviewed-by: Wei Liu 
> Acked-by: Jan Beulich 
> Reviewed-by: George Dunlap 

Thought I'd already acked this for the shadow code, but clearly not.
Sorry for the delay, and:

Acked-by: Tim Deegan 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 3/7] xen/x86: support per-domain flag for xpti

2018-03-22 Thread Juergen Gross
On 22/03/18 16:26, Jan Beulich wrote:
 On 21.03.18 at 13:51,  wrote:
>> +void xpti_domain_init(struct domain *d)
>> +{
>> +if ( !is_pv_domain(d) || is_pv_32bit_domain(d) )
>> +return;
> 
> As you rely on the zero-initialization of the field here, ...
> 
>> +switch ( opt_xpti )
>> +{
>> +case XPTI_OFF:
>> +d->arch.pv_domain.xpti = false;
> 
> ... this could go away as well.

I wanted to make the switch statement complete. No problem to drop
setting of xpti here if you like that better.

> 
>> @@ -1050,8 +1050,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>>  panic("Error %d setting up PV root page table\n", rc);
>>  if ( per_cpu(root_pgt, 0) )
>>  {
>> -get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
>> -
>> +get_cpu_info()->pv_cr3 = 0;
>>  /*
>>   * All entry points which may need to switch page tables have to 
>> start
>>   * with interrupts off. Re-write what pv_trap_init() has put there.
> 
> Please don't drop the blank line.

Okay.

> 
>> @@ -36,7 +38,8 @@ static inline void pv_vcpu_destroy(struct vcpu *v) {}
>>  static inline int pv_vcpu_initialise(struct vcpu *v) { return -EOPNOTSUPP; }
>>  static inline void pv_domain_destroy(struct domain *d) {}
>>  static inline int pv_domain_initialise(struct domain *d) { return 
>> -EOPNOTSUPP; }
>> -
>> +static inline void xpti_init(void) {}
>> +static inline void xpti_domain_init(struct domain *d) {}
>>  #endif  /* CONFIG_PV */
> 
> Same here. With that
> Reviewed-by: Jan Beulich 

Thanks,


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 4/7] xen/x86: use invpcid for flushing the TLB

2018-03-22 Thread Jan Beulich
>>> On 21.03.18 at 13:51,  wrote:
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1380,6 +1380,14 @@ Because responsibility for APIC setup is shared 
> between Xen and the
>  domain 0 kernel this option is automatically propagated to the domain
>  0 command line.
>  
> +### noinvpcid (x86)
> +> `= `
> +
> +Disable using the INVPCID instruction for flushing TLB entries.
> +This should only be used in case of known issues on the current platform
> +with that instruction. Disabling INVPCID will normally result in a slightly
> +degraded performance.

At the first glance this looks as if it wants to be a cpuid=
sub-option. However, that would disable use by both Xen and
(HVM) guests. Andrew, what are your plans here as to
distinguishing the "Xen uses a feature" from the "disable use of
a feature altogether"?

If we stay with a separate option, then please make this a
normal boolean one (i.e. drop the "no" prefix), as "no-noinvpcid"
is rather ugly.

> @@ -457,7 +472,6 @@ static void generic_set_all(void)
>   set_bit(count, &smp_changes_mask);
>   mask >>= 1;
>   }
> - 
>  }
>  
>  static void generic_set_mtrr(unsigned int reg, unsigned long base,

I don't mind this line being dropped, but in general please avoid
stray changes which aren't assimilated into changes you do anyway.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda Items

2018-03-22 Thread Razvan Cojocaru
On 03/22/2018 12:22 PM, Lars Kurth wrote:
> Hi all,
> 
> please find attached 
> a) Meeting details (just a link with timezones) – the meeting invite
> will follow when we have an agenda
>    Bridge details – will be sent with the meeting invite
>    I am thinking of using GotoMeeting, but want to try this with a Linux
> only user before I commit
> c) Call for agenda items

Using GotoMeeting would be great (as a Linux user I recall being able to
hear / see GotoMeeting sessions, though I've not tried it recently).

It's definitely more convenient than a phone meeting.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 05/12] pci: add support to size ROM BARs to pci_size_mem_bar

2018-03-22 Thread Roger Pau Monne
Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
Changes since v6:
 - Remove the rom local variable.

Changes since v5:
 - Use the flags field.
 - Introduce a mask local variable.
 - Simplify return.

Changes since v4:
 - New in this version.
---
 xen/drivers/passthrough/pci.c | 28 ++--
 xen/include/xen/pci.h |  1 +
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c0846e8ebb..1db69d5b99 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -610,11 +610,16 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
 uint32_t hi = 0, bar = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev,
sbdf.func, pos);
 uint64_t size;
-
-ASSERT((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_MEMORY);
+bool is64bits = !(flags & PCI_BAR_ROM) &&
+(bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
+uint32_t mask = (flags & PCI_BAR_ROM) ? (uint32_t)PCI_ROM_ADDRESS_MASK
+  : (uint32_t)PCI_BASE_ADDRESS_MEM_MASK;
+
+ASSERT(!((flags & PCI_BAR_VF) && (flags & PCI_BAR_ROM)));
+ASSERT((flags & PCI_BAR_ROM) ||
+   (bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_MEMORY);
 pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos, ~0);
-if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
- PCI_BASE_ADDRESS_MEM_TYPE_64 )
+if ( is64bits )
 {
 if ( flags & PCI_BAR_LAST )
 {
@@ -628,10 +633,9 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
 hi = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos + 4);
 pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos + 4, ~0);
 }
-size = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos) &
-   PCI_BASE_ADDRESS_MEM_MASK;
-if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
- PCI_BASE_ADDRESS_MEM_TYPE_64 )
+size = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func,
+   pos) & mask;
+if ( is64bits )
 {
 size |= (uint64_t)pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev,
   sbdf.func, pos + 4) << 32;
@@ -643,14 +647,10 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
 size = -size;
 
 if ( paddr )
-*paddr = (bar & PCI_BASE_ADDRESS_MEM_MASK) | ((uint64_t)hi << 32);
+*paddr = (bar & mask) | ((uint64_t)hi << 32);
 *psize = size;
 
-if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
- PCI_BASE_ADDRESS_MEM_TYPE_64 )
-return 2;
-
-return 1;
+return is64bits ? 2 : 1;
 }
 
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 2f171a8dcc..4cfa774615 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -191,6 +191,7 @@ const char *parse_pci_seg(const char *, unsigned int *seg, 
unsigned int *bus,
 
 #define PCI_BAR_VF  (1u << 0)
 #define PCI_BAR_LAST(1u << 1)
+#define PCI_BAR_ROM (1u << 2)
 unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
   uint64_t *paddr, uint64_t *psize,
   unsigned int flags);
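
As a usage illustration only (everything except pci_size_mem_bar, PCI_ROM_ADDRESS and PCI_BAR_ROM is made up): sizing the expansion ROM BAR of a device for which a pci_sbdf_t has already been constructed could look like this.

    uint64_t rom_addr, rom_size;

    /* Returns the number of BAR slots consumed; always 1 for a ROM BAR,
     * since ROM BARs are 32-bit only. */
    pci_size_mem_bar(sbdf, PCI_ROM_ADDRESS, &rom_addr, &rom_size,
                     PCI_BAR_ROM);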
-- 
2.16.2


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 02/12] x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas

2018-03-22 Thread Roger Pau Monne
Introduce a set of handlers for the accesses to the MMCFG areas. Those
areas are set up based on the contents of the hardware MMCFG tables,
and the list of handled MMCFG areas is stored inside of the hvm_domain
struct.

The read/writes are forwarded to the generic vpci handlers once the
address is decoded in order to obtain the device and register the
guest is trying to access.
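
As a rough sketch only (the patch uses its own macro for this; the helper below is made up): the usual MMCFG/ECAM layout encodes the target in the offset from the area base, which is all the decode step has to undo.

    static unsigned int mmcfg_decode(paddr_t addr, paddr_t base,
                                     unsigned int *reg)
    {
        paddr_t off = addr - base;

        *reg = off & 0xfff;      /* register within the 4K config space */
        return off >> 12;        /* BDF: bus[15:8] dev[7:3] func[2:0] */
    }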

Signed-off-by: Roger Pau Monné 
Reviewed-by: Paul Durrant 
Reviewed-by: Jan Beulich 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Paul Durrant 
---
Changes since v7:
 - Add check for end_bus >= start_bus to register_vpci_mmcfg_handler.
 - Protect destroy_vpci_mmcfg with the mmcfg_lock.

Changes since v6:
 - Move allocation of mmcfg outside of the locked region.
 - Do proper overlap checks when adding mmcfg regions.
 - Return X86EMUL_RETRY if the mcfg region cannot be found in the read/write
   handlers. This means the mcfg area has been removed between the
   accept and the read/write calls.

Changes since v5:
 - Switch to use pci_sbdf_t.
 - Switch to the new per vpci locks.
 - Move the mmcfg related external definitions to asm-x86/pci.h.

Changes since v4:
 - Change the attribute of pvh_setup_mmcfg to __hwdom_init.
 - Try to add as many MMCFG regions as possible, even if one fails to
   add.
 - Change some fields of the hvm_mmcfg struct: turn size into a
   unsigned int, segment into uint16_t and bus into uint8_t.
 - Convert some address parameters from unsigned long to paddr_t for
   consistency.
 - Make vpci_mmcfg_decode_addr return the decoded register in the
   return of the function.
 - Introduce a new macro to convert a MMCFG address into a BDF, and
   use it in vpci_mmcfg_decode_addr to clarify the logic.
 - In vpci_mmcfg_{read/write} unify the logic for 8B accesses and
   smaller ones.
 - Add the __hwdom_init attribute to register_vpci_mmcfg_handler.
 - Test that reg + size doesn't cross a device boundary.

Changes since v3:
 - Propagate changes from previous patches: drop xen_ prefix for vpci
   functions, pass slot and func instead of devfn and fix the error
   paths of the MMCFG handlers.
 - s/ecam/mmcfg/.
 - Move the destroy code to a separate function, so the hvm_mmcfg
   struct can be private to hvm/io.c.
 - Constify the return of vpci_mmcfg_find.
 - Use d instead of v->domain in vpci_mmcfg_accept.
 - Allow 8byte accesses to the mmcfg.

Changes since v1:
 - Added locking.
---
 xen/arch/x86/hvm/dom0_build.c|  21 +
 xen/arch/x86/hvm/hvm.c   |   4 +
 xen/arch/x86/hvm/io.c| 184 ++-
 xen/arch/x86/x86_64/mmconfig.h   |   4 -
 xen/include/asm-x86/hvm/domain.h |   4 +
 xen/include/asm-x86/hvm/io.h |   7 ++
 xen/include/asm-x86/pci.h|   6 ++
 7 files changed, 225 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 1c70416af4..259814d95d 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -1055,6 +1056,24 @@ static int __init pvh_setup_acpi(struct domain *d, 
paddr_t start_info)
 return 0;
 }
 
+static void __hwdom_init pvh_setup_mmcfg(struct domain *d)
+{
+unsigned int i;
+int rc;
+
+for ( i = 0; i < pci_mmcfg_config_num; i++ )
+{
+rc = register_vpci_mmcfg_handler(d, pci_mmcfg_config[i].address,
+ pci_mmcfg_config[i].start_bus_number,
+ pci_mmcfg_config[i].end_bus_number,
+ pci_mmcfg_config[i].pci_segment);
+if ( rc )
+printk("Unable to setup MMCFG handler at %#lx for segment %u\n",
+   pci_mmcfg_config[i].address,
+   pci_mmcfg_config[i].pci_segment);
+}
+}
+
 int __init dom0_construct_pvh(struct domain *d, const module_t *image,
   unsigned long image_headroom,
   module_t *initrd,
@@ -1096,6 +1115,8 @@ int __init dom0_construct_pvh(struct domain *d, const 
module_t *image,
 return rc;
 }
 
+pvh_setup_mmcfg(d);
+
 panic("Building a PVHv2 Dom0 is not yet supported.");
 return 0;
 }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 26f6335854..346e11f2d6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -584,8 +584,10 @@ int hvm_domain_initialise(struct domain *d)
 spin_lock_init(&d->arch.hvm_domain.irq_lock);
 spin_lock_init(&d->arch.hvm_domain.uc_lock);
 spin_lock_init(&d->arch.hvm_domain.write_map.lock);
+rwlock_init(&d->arch.hvm_domain.mmcfg_lock);
 INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
 INIT_LIST_HEAD(&d->arch.hvm_domain.g2m_ioport_list);
+INIT_LIST_HEAD(&d->arch.hvm_domain.mmcfg_regions);
 
 rc = 

[Xen-devel] [PATCH v12 07/12] vpci: add header handlers

2018-03-22 Thread Roger Pau Monne
Introduce a set of handlers that trap accesses to the PCI BARs and the
command register, in order to snoop BAR sizing and BAR relocation.

The command handler is used to detect changes to bit 2 (response to
memory space accesses), and maps/unmaps the BARs of the device into
the guest p2m. A rangeset is used in order to figure out which memory
to map/unmap. This makes it easier to keep track of the possible
overlaps with other BARs, and will also simplify MSI-X support, where
certain regions of a BAR might be used for the MSI-X table or PBA.

The BAR register handlers are used to detect attempts by the guest to
size or relocate the BARs.

Note that the long running BAR mapping and unmapping operations are
deferred to be performed by hvm_io_pending, so that they can be safely
preempted.
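
A minimal sketch of the command register idea, with assumed helper signatures (the handlers in the patch differ in detail): compare the written value against the current one, and only when bit 2 changes queue the deferred BAR (un)mapping; otherwise just forward the write.

    static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
                          uint32_t cmd, void *data)
    {
        uint16_t current_cmd = pci_conf_read16(pdev->seg, pdev->bus,
                                               PCI_SLOT(pdev->devfn),
                                               PCI_FUNC(pdev->devfn), reg);

        if ( (cmd ^ current_cmd) & PCI_COMMAND_MEMORY )
            /* Queues the map/unmap; completed later from hvm_io_pending(). */
            modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
        else
            pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
                             PCI_FUNC(pdev->devfn), reg, cmd);
    }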

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
[IO]
Reviewed-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Paul Durrant 
---
Changes since v11:
 - Fix initialization of sbdf with gcc 4.3.

Changes since v10:
 - Fix indirect function call in map_range.
 - Use rom->addr instead of fetching it from the ROM BAR register in
   modify_decoding.
 - Remove ternary operator from modify_decoding.
 - Simply apply_map to have a single return.
 - Constify pci_dev parameter of apply_map.
 - Remove references to maybe_defer_map.
 - Use pdev (const) or dev (non-const) consistently in modify_bars.
 - Invert part of the logic in rom_write to remove one indentation
   level.
 - Add comments in rom_write to clarify why rom->addr is updated in
   two different places.
 - Use lx to print frame numbers in modify_bars.
 - Add start/end local variables in the first modify_bars loop.

Changes since v9:
 - Expand comments to clarify the code.
 - Rename rom to rom_only in the vpci_cpu struct.
 - Change definition style of dummy vpci_cpu.
 - Replace incorrect usage of PFN_UP.
 - Use system_state in order to check if the mapping functions are
   being called from Dom0 builder context.
 - Split the maybe_defer_map into two functions and place the Dom0
   builder one in the init section.

Changes since v8:
 - Do not pretend to support ARM in the map_range function. Explain
   the required changes in the comment.
 - Introduce PCI_HEADER_{NORMAL/BRIDGE}_NR_BARS defines.
 - Rename 'rom' boolean variable to 'rom_only', which is more
   descriptive of it's meaning.
 - Introduce vpci_remove_device which removes all handlers for a
   device.
 - Simplify error handling when modifying BARs mapping. Any error will
   cause the device to be unplugged (by calling vpci_remove_device).
 - Return an error code in modify_bars. Add comments describing why
   the error is sometimes ignored.

Changes since v7:
 - Order includes.
 - Add newline between switch cases.
 - Fix typo in comment (hopping).
 - Wrap ternary conditional in parentheses.
 - Remove CONFIG_HAS_PCI gueard from sched.h vpci_vcpu usage.
 - Add comment regarding vpci_vcpu usage.
 - Move rom_enabled from BAR struct to header.
 - Do not protect vpci_vcpu with __XEN__ guards.

Changes since v6:
 - s/vpci_check_pending/vpci_process_pending/.
 - Improve error handling in vpci_process_pending.
 - Add a comment that explains how vpci_check_bar_overlap works.
 - Add error messages to vpci_modify_bars and vpci_modify_rom.
 - Introduce vpci_hw_read16/32, in order to passthrough reads to
   the underlying hw.
 - Print BAR number on error in vpci_bar_write.
 - Place the CONFIG_HAS_PCI guards inside the vpci.h header and
   provide an empty vpci_vcpu structure for the !CONFIG_HAS_PCI case.
 - Define CONFIG_HAS_PCI in the test harness emul.h header before
   including vpci.h
 - Add ARM TODOs and an ARM-specific bodge to vpci_map_range due to
   the lack of preemption in {un}map_mmio_regions.
 - Make vpci_maybe_defer_map void.
 - Set rom_enabled in vpci_init_bars.
 - Defer enabling/disabling the memory decoding (or the ROM enable
   bit) until the memory has been mapped/unmapped.
 - Remove vpci_ prefix from static functions.
 - Use the same code in order to map the general BARs and the ROM
   BARs.
 - Remove the seg/bus local variables and use pdev->{seg,bus} instead.
 - Convert the bools in the BAR related structs into bool bitfields.
 - Add the must_check attribute to vpci_process_pending.
 - Open code check_bar_overlap inside modify_bars, which was it's only
   user.

Changes since v5:
 - Switch to the new handler type.
 - Use pci_sbdf_t to size the BARs.
 - Use a single return for vpci_modify_bar.
 - Do not return an error code from vpci_modify_bars, just log the
   

[Xen-devel] [PATCH v12 06/12] xen: introduce rangeset_consume_ranges

2018-03-22 Thread Roger Pau Monne
This function allows iterating over a rangeset while removing the
processed regions.

This will be used in order to split processing of large memory areas
when mapping them into the guest p2m.
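
A hedged usage sketch, not taken from this series: a callback that processes a bounded chunk per invocation, reports how much of the range it consumed, and returns -ERESTART so the caller can arrange a continuation.

    static int map_cb(unsigned long s, unsigned long e, void *ctxt,
                      unsigned long *c)
    {
        unsigned long nr = min(e - s + 1, 64UL);

        /* ... map [s, s + nr) into the guest p2m using ctxt ... */
        *c = nr;

        return s + nr - 1 < e ? -ERESTART : 0;
    }

    /* rc = rangeset_consume_ranges(mem, map_cb, &data); */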

Signed-off-by: Roger Pau Monné 
Reviewed-by: Wei Liu 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
Changes since v6:
 - Expand commit message.
 - Add a comment to describe the expected function behavior.
 - Fix indentation.

Changes since v5:
 - New in this version.
---
 xen/common/rangeset.c  | 28 
 xen/include/xen/rangeset.h | 10 ++
 2 files changed, 38 insertions(+)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index ade34f6a50..bb68ce62e4 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -350,6 +350,34 @@ int rangeset_claim_range(struct rangeset *r, unsigned long 
size,
 return 0;
 }
 
+int rangeset_consume_ranges(struct rangeset *r,
+int (*cb)(unsigned long s, unsigned long e, void *,
+  unsigned long *c),
+void *ctxt)
+{
+int rc = 0;
+
+write_lock(&r->lock);
+while ( !rangeset_is_empty(r) )
+{
+unsigned long consumed = 0;
+struct range *x = first_range(r);
+
+rc = cb(x->s, x->e, ctxt, &consumed);
+
+ASSERT(consumed <= x->e - x->s + 1);
+x->s += consumed;
+if ( x->s > x->e )
+destroy_range(r, x);
+
+if ( rc )
+break;
+}
+write_unlock(&r->lock);
+
+return rc;
+}
+
 int rangeset_add_singleton(
 struct rangeset *r, unsigned long s)
 {
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 1f83b1f44b..583b72bb0c 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -70,6 +70,16 @@ int rangeset_report_ranges(
 struct rangeset *r, unsigned long s, unsigned long e,
 int (*cb)(unsigned long s, unsigned long e, void *), void *ctxt);
 
+/*
+ * Note that the consume function can return an error value apart from
+ * -ERESTART, and that no cleanup is performed (ie: the user should call
+ * rangeset_destroy if needed).
+ */
+int rangeset_consume_ranges(struct rangeset *r,
+int (*cb)(unsigned long s, unsigned long e,
+  void *, unsigned long *c),
+void *ctxt);
+
 /* Add/remove/query a single number. */
 int __must_check rangeset_add_singleton(
 struct rangeset *r, unsigned long s);
-- 
2.16.2


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ

2018-03-22 Thread Julien Grall

Hi Andre,

On 03/22/2018 11:56 AM, Andre Przywara wrote:

When playing around with hardware mapped, level triggered virtual IRQs,
there is the need to explicitly set the active or pending state of an
interrupt at some point.
To prepare the GIC for that, we introduce a set_active_state() and a
set_pending_state() function to let the VGIC manipulate the state of
an associated hardware IRQ.
This takes care of properly setting the _IRQ_INPROGRESS bit.
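
Shape only, with an assumed gic_poke_irq() helper that sets the IRQ's bit in the given distributor register (the committed code may differ): the point is that the distributor write and the _IRQ_INPROGRESS bookkeeping stay together.

    static void gicv2_set_active_state(struct irq_desc *irqd, bool active)
    {
        ASSERT(spin_is_locked(&irqd->lock));

        if ( active )
        {
            set_bit(_IRQ_INPROGRESS, &irqd->status);
            gic_poke_irq(irqd, GICD_ISACTIVER);
        }
        else
        {
            clear_bit(_IRQ_INPROGRESS, &irqd->status);
            gic_poke_irq(irqd, GICD_ICACTIVER);
        }
    }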

Signed-off-by: Andre Przywara 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 15:34,  wrote:
> On Thu, 22 Mar 2018 07:20:00 -0600
> "Jan Beulich"  wrote:
> 
> On 22.03.18 at 14:05,  wrote:  
>>> On Thu, 22 Mar 2018 06:09:44 -0600
>>> "Jan Beulich"  wrote:
>>>   
>>> On 22.03.18 at 12:56,  wrote:
> I really don't understand why some people have that fear of
> emulated MMCONFIG -- it's really the same thing as any other MMIO
> range QEMU already emulates via map_io_range_to_ioreq_server(). No
> sensitive information exposed. It is related only to emulated PCI
> conf space which QEMU already knows about and use, providing
> emulated PCI devices for it.

You continue to ignore the routing requirement multiple ioreq
servers impose.  
>>> 
>>> If the emulated MMCONFIG approach will be modified to become
>>> fully compatible with multiple ioreq servers (whatever they used
>>> for), I assume there will be no objections that emulated MMCONFIG
>>> can't be used?
>>> I just want to clarify this moment -- why people think that
>>> a completely emulated MMIO range, not related in any
>>> way to host's MMCONFIG may compromise something.  
>>
>>Compromise? All that was said so far - afair - was that this is the
>>wrong way round design wise.
> 
> I assume it's all about emulating some real system for HVM, for other
> goals PV/PVH are available. What is a proper, design-wise way to
> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
> 
> Here is what I've heard so far in this thread:
> 
> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen where
> emulated MMCONFIG MMIO area is located and in the same time map it for
> MMIO trapping to intercept accesses. Latter action is the same what
> map_io_range_to_ioreq_server() does, but let's ignore it for now
> because there was opinion that we need to stick to a distinct hypercall.
> 
> 2. Upon trapping accesses to this emulated range, Xen will pretend that
> QEMU didn't just told him about MMCONFIG location and size and instead
> convert MMIO access into PCI conf one and send the ioreq to QEMU or
> some other DM.
> 
> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
> MMCONFIG usage, but we must later teach him non-QEMU manners), QEMU must
> immediately inform Xen about any changes in MMCONFIG location/status.
> 
> 4. QEMU receives PCI conf access while expecting the MMIO address, so
> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG base
> and recreating emulated MMIO access from BDF/reg or doing the dirty work
> of finding PCIBus/PCIDevice target itself as it cannot use emulated
> CF8/CFC ports due to legacy PCI conf size limitation.
> 
> Please confirm that it is a preferable solution or if something missing.

I'm afraid this is only part of the picture, as you've been told by
others before. We first of all need to settle on who emulates
the core chipset registers. Depending on that will be how Xen
would learn about the MCFG location inside the guest.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86emul: fix #XM delivery typo

2018-03-22 Thread Andrew Cooper
On 22/03/18 14:41, Roger Pau Monné wrote:
> On Thu, Mar 22, 2018 at 08:40:04AM -0600, Jan Beulich wrote:
>> This clearly wasn't meant the way it was originally written.
>>
>> Reported-by: Roger Pau Monné 
>> Signed-off-by: Jan Beulich 
> Reviewed-by: Roger Pau Monné 

Acked-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 1/7] x86/xpti: avoid copying L4 page table contents when possible

2018-03-22 Thread Juergen Gross
On 22/03/18 15:31, Jan Beulich wrote:
 On 21.03.18 at 13:51,  wrote:
>> --- a/xen/arch/x86/flushtlb.c
>> +++ b/xen/arch/x86/flushtlb.c
>> @@ -158,6 +158,9 @@ unsigned int flush_area_local(const void *va, unsigned 
>> int flags)
>>  }
>>  }
>>  
>> +if ( flags & FLUSH_ROOT_PGTBL )
>> +get_cpu_info()->root_pgt_changed = true;
>> +
>>  local_irq_restore(irqfl);
>>  
>>  return flags;
> 
> Does this really need to sit inside the interrupts disabled section?

Hmm, no, I don't think so. I'll move it below local_irq_restore().

> Thinking about it I even wonder whether the cache flush part needs
> to be. Even for the INVLPG portion of the TLB flush part I can't
> seem to see a need for IRQs to be off. I think it's really just the
> pre_flush() / post_flush() pair which needs to be inside such a
> section. I'll prepare a patch (for after 4.11). I think some of the
> changes later in your series will actually further ease this.
> 
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -499,10 +499,15 @@ void free_shared_domheap_page(struct page_info *page)
>>  void make_cr3(struct vcpu *v, mfn_t mfn)
>>  {
>>  v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
>> +if ( v == current && this_cpu(root_pgt) && is_pv_vcpu(v) &&
>> + !is_pv_32bit_vcpu(v) )
>> +get_cpu_info()->root_pgt_changed = true;
>>  }
> 
> As this doesn't actually update CR3, setting the flag shouldn't
> generally be necessary if the caller then invokes write_ptbase().
> Isn't setting the flag here needed solely in the case of
> _toggle_guest_pt() being up the call tree? In which case it would
> perhaps better be set there (and in turn some or even all of the
> conditional around it could be dropped)?

Yes, you are right.

> 
>>  void write_ptbase(struct vcpu *v)
>>  {
>> +if ( this_cpu(root_pgt) && is_pv_vcpu(v) && !is_pv_32bit_vcpu(v) )
>> +get_cpu_info()->root_pgt_changed = true;
>>  write_cr3(v->arch.cr3);
> 
> When you come here from e.g. __sync_local_execstate(), you
> don't really need to set the flag. Of course you'll come here again
> before the next 64-bit PV vCPU will make it to restore_all_guest,
> so by the time we make it there the flag will be set anyway.
> However, if you already use such a subtlety, then there's also
> no point excluding 32-bit vCPU-s here (nor in make_cr3()), as
> those will never make it to restore_all_guest. Same then for
> excluding HVM vCPU-s. And I then wonder whether (here or
> more likely in a later patch) the root_pgt check couldn't go away
> as well.

I'm not sure this is worth it. Patch 3 will re-introduce a conditional
here and it will look rather different (e.g. without the root_pgt
check). So micro-optimizing this patch barely makes any sense.

> 
>> @@ -3698,18 +3703,29 @@ long do_mmu_update(
>>  break;
>>  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>>cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
>> -/*
>> - * No need to sync if all uses of the page can be 
>> accounted
>> - * to the page lock we hold, its pinned status, and 
>> uses on
>> - * this (v)CPU.
>> - */
>> -if ( !rc && !cpu_has_no_xpti &&
>> - ((page->u.inuse.type_info & PGT_count_mask) >
>> -  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
>> -   (pagetable_get_pfn(curr->arch.guest_table) == 
>> mfn) 
>> +
>> -   (pagetable_get_pfn(curr->arch.guest_table_user) 
>> ==
>> -mfn))) )
>> -sync_guest = true;
>> +if ( !rc && !cpu_has_no_xpti )
>> +{
>> +bool local_in_use = false;
>> +
>> +if ( (pagetable_get_pfn(curr->arch.guest_table) ==
>> +  mfn) ||
>> + 
>> (pagetable_get_pfn(curr->arch.guest_table_user) ==
>> +  mfn) )
>> +{
>> +local_in_use = true;
>> +get_cpu_info()->root_pgt_changed = true;
>> +}
> 
> The conditional causes root_pgt_changed to get set even in cases
> where what CR3 points to doesn't actually change (if it's the user
> page tables that get modified). I think you want to check
> curr->arch.cr3 here, or only curr->arch.guest_table (as user mode
> can't invoke hypercalls).

I'll go with curr->arch.guest_table.

> 
>> +/*
>> + * No need to sync if all uses of the page can be
>> + * accounted to the page lock we hold, its pinned
>> + * status, and uses on this (v)CPU.
>> + */
>> +

Re: [Xen-devel] possible I/O emulation state machine issue

2018-03-22 Thread Andrew Cooper
On 22/03/18 15:12, Jan Beulich wrote:
> Paul,
>
> our PV driver person has found a reproducible crash with ws2k8,
> triggered by one of the WHQL tests. The guest get crashed because
> the re-issue check of an ioreq close to the top of hvmemul_do_io()
> fails. I've handed him a first debugging patch, output of which
> suggests that we're dealing with a completely new request, which
> in turn would mean that we've run into stale STATE_IORESP_READY
> state:
>
> (XEN) d2v3: t=0/1 a=3c4/fed000f0 s=2/4 c=1/1 d=0/1 f=0/0 p=0/0 
> v=100/831873f27a30
> (XEN) [ Xen-4.10.0_15-0  x86_64  debug=n   Tainted:  C   ]

Irrespective of the issue at hand, can testing be tried with a debug
build to see if any of the assertions are hit?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 14:05,  wrote:
> On Thu, 22 Mar 2018 06:09:44 -0600
> "Jan Beulich"  wrote:
> 
> On 22.03.18 at 12:56,  wrote:  
>>> I really don't understand why some people have that fear of emulated
>>> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
>>> already emulates via map_io_range_to_ioreq_server(). No sensitive
>>> information exposed. It is related only to emulated PCI conf space
>>> which QEMU already knows about and use, providing emulated PCI
>>> devices for it.  
>>
>>You continue to ignore the routing requirement multiple ioreq
>>servers impose.
> 
> If the emulated MMCONFIG approach will be modified to become
> fully compatible with multiple ioreq servers (whatever they used for), I
> assume there will be no objections that emulated MMCONFIG can't be
> used?
> I just want to clarify this moment -- why people think that
> a completely emulated MMIO range, not related in any
> way to host's MMCONFIG may compromise something.

Compromise? All that was said so far - afair - was that this is the
wrong way round design wise.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 8/8] x86: avoid double CR3 reload when switching to guest user mode

2018-03-22 Thread Wei Liu
On Mon, Mar 19, 2018 at 07:41:42AM -0600, Jan Beulich wrote:
> When XPTI is active, the CR3 load in restore_all_guest is sufficient
> when switching to user mode, improving in particular system call and
> page fault exit paths for the guest.
> 
> Signed-off-by: Jan Beulich 
> Tested-by: Juergen Gross 
> Reviewed-by: Juergen Gross 
> ---
> v2: Add ASSERT(!in_irq()).
> 
> --- a/xen/arch/x86/pv/domain.c
> +++ b/xen/arch/x86/pv/domain.c
> @@ -219,10 +219,22 @@ int pv_domain_initialise(struct domain *
>  return rc;
>  }
>  
> -static void _toggle_guest_pt(struct vcpu *v)
> +static void _toggle_guest_pt(struct vcpu *v, bool force_cr3)
>  {
> +ASSERT(!in_irq());
> +
>  v->arch.flags ^= TF_kernel_mode;
>  update_cr3(v);
> +
> +/*
> + * There's no need to load CR3 here when it is going to be loaded on the
> + * way out to guest mode again anyway, and when the page tables we're
> + * currently on are the kernel ones (whereas when switching to kernel
> + * mode we need to be able to write a bounce frame onto the kernel 
> stack).
> + */

Not sure I follow the comment. If you're talking about
create_bounce_frame, it wouldn't call this function in the first place,
right?

> +if ( !force_cr3 && !(v->arch.flags & TF_kernel_mode) )

Also, it takes a bit of mental power to see !(v->arch.flags &
TF_kernel_mode) means the mode Xen is using. Can you maybe just use a
variable at the beginning like

   bool kernel_mode = v->arch.flags & TF_kernel_mode;

and then use it here?

> +return;
> +
>  /* Don't flush user global mappings from the TLB. Don't tick TLB clock. 
> */
>  asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
>  
> @@ -252,13 +264,13 @@ void toggle_guest_mode(struct vcpu *v)
>  }
>  asm volatile ( "swapgs" );
>  
> -_toggle_guest_pt(v);
> +_toggle_guest_pt(v, cpu_has_no_xpti);
>  }
>  
>  void toggle_guest_pt(struct vcpu *v)
>  {
>  if ( !is_pv_32bit_vcpu(v) )
> -_toggle_guest_pt(v);
> +_toggle_guest_pt(v, true);
>  }
>  
>  /*
> 
> 
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 05/14] x86/HVM: eliminate custom #MF/#XM handling

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 15, 2018 at 07:06:36AM -0600, Jan Beulich wrote:
> @@ -8478,7 +8411,8 @@ x86_emulate(
>  }
>  
>   complete_insn: /* Commit shadow register state. */
> -put_fpu(&fic, false, state, ctxt, ops);
> +put_fpu(fpu_type, false, state, ctxt, ops);
> +fpu_type = X86EMUL_FPU_none;
>  
>  /* Zero the upper 32 bits of %rip if not in 64-bit mode. */
>  if ( !mode_64bit() )
> @@ -8502,13 +8436,22 @@ x86_emulate(
>  ctxt->regs->eflags &= ~X86_EFLAGS_RF;
>  
>   done:
> -put_fpu(&fic, fic.insn_bytes > 0 && dst.type == OP_MEM, state, ctxt, ops);
> +put_fpu(fpu_type, insn_bytes > 0 && dst.type == OP_MEM, state, ctxt, ops);
>  put_stub(stub);
>  return rc;
>  #undef state
>  
>  #ifdef __XEN__
>   emulation_stub_failure:
> +generate_exception_if(stub_exn.info.fields.trapnr == EXC_MF, EXC_MF);
> +if ( stub_exn.info.fields.trapnr == EXC_XM )
> +{
> +unsigned long cr4;
> +
> +if ( !ops->read_cr || !ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY )

Is the second expression in the above line missing parentheses:

if ( !ops->read_cr || !(ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY) )

Or should this be:

if ( !ops->read_cr || ops->read_cr(4, &cr4, ctxt) != X86EMUL_OKAY )

clang complains with:

In file included from x86_emulate.c:44:
./x86_emulate/x86_emulate.c:8665:31: error: logical not is only applied to the 
left hand side of
  this comparison [-Werror,-Wlogical-not-parentheses]
if ( !ops->read_cr || !ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY )
  ^~~
./x86_emulate/x86_emulate.c:8665:31: note: add parentheses after the '!' to 
evaluate the comparison
  first
if ( !ops->read_cr || !ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY )
  ^
   (  )
./x86_emulate/x86_emulate.c:8665:31: note: add parentheses around left hand 
side expression to
  silence this warning
if ( !ops->read_cr || !ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY )
  ^
  (   )
1 error generated.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 1/7] x86/xpti: avoid copying L4 page table contents when possible

2018-03-22 Thread Jan Beulich
>>> On 21.03.18 at 13:51,  wrote:
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -158,6 +158,9 @@ unsigned int flush_area_local(const void *va, unsigned 
> int flags)
>  }
>  }
>  
> +if ( flags & FLUSH_ROOT_PGTBL )
> +get_cpu_info()->root_pgt_changed = true;
> +
>  local_irq_restore(irqfl);
>  
>  return flags;

Does this really need to sit inside the interrupts disabled section?

Thinking about it I even wonder whether the cache flush part needs
to be. Even for the INVLPG portion of the TLB flush part I can't
seem to see a need for IRQs to be off. I think it's really just the
pre_flush() / post_flush() pair which needs to be inside such a
section. I'll prepare a patch (for after 4.11). I think some of the
changes later in your series will actually further ease this.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -499,10 +499,15 @@ void free_shared_domheap_page(struct page_info *page)
>  void make_cr3(struct vcpu *v, mfn_t mfn)
>  {
>  v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
> +if ( v == current && this_cpu(root_pgt) && is_pv_vcpu(v) &&
> + !is_pv_32bit_vcpu(v) )
> +get_cpu_info()->root_pgt_changed = true;
>  }

As this doesn't actually update CR3, setting the flag shouldn't
generally be necessary if the caller then invokes write_ptbase().
Isn't setting the flag here needed solely in the case of
_toggle_guest_pt() being up the call tree? In which case it would
perhaps better be set there (and in turn some or even all of the
conditional around it could be dropped)?

>  void write_ptbase(struct vcpu *v)
>  {
> +if ( this_cpu(root_pgt) && is_pv_vcpu(v) && !is_pv_32bit_vcpu(v) )
> +get_cpu_info()->root_pgt_changed = true;
>  write_cr3(v->arch.cr3);

When you come here from e.g. __sync_local_execstate(), you
don't really need to set the flag. Of course you'll come here again
before the next 64-bit PV vCPU will make it to restore_all_guest,
so by the time we make it there the flag will be set anyway.
However, if you already use such a subtlety, then there's also
no point excluding 32-bit vCPU-s here (nor in make_cr3()), as
those will never make it to restore_all_guest. Same then for
excluding HVM vCPU-s. And I then wonder whether (here or
more likely in a later patch) the root_pgt check couldn't go away
as well.

> @@ -3698,18 +3703,29 @@ long do_mmu_update(
>  break;
>  rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
>cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
> -/*
> - * No need to sync if all uses of the page can be 
> accounted
> - * to the page lock we hold, its pinned status, and uses 
> on
> - * this (v)CPU.
> - */
> -if ( !rc && !cpu_has_no_xpti &&
> - ((page->u.inuse.type_info & PGT_count_mask) >
> -  (1 + !!(page->u.inuse.type_info & PGT_pinned) +
> -   (pagetable_get_pfn(curr->arch.guest_table) == 
> mfn) 
> +
> -   (pagetable_get_pfn(curr->arch.guest_table_user) ==
> -mfn))) )
> -sync_guest = true;
> +if ( !rc && !cpu_has_no_xpti )
> +{
> +bool local_in_use = false;
> +
> +if ( (pagetable_get_pfn(curr->arch.guest_table) ==
> +  mfn) ||
> + (pagetable_get_pfn(curr->arch.guest_table_user) 
> ==
> +  mfn) )
> +{
> +local_in_use = true;
> +get_cpu_info()->root_pgt_changed = true;
> +}

The conditional causes root_pgt_changed to get set even in cases
where what CR3 points to doesn't actually change (if it's the user
page tables that get modified). I think you want to check
curr->arch.cr3 here, or only curr->arch.guest_table (as user mode
can't invoke hypercalls).

> +/*
> + * No need to sync if all uses of the page can be
> + * accounted to the page lock we hold, its pinned
> + * status, and uses on this (v)CPU.
> + */
> +if ( (page->u.inuse.type_info & PGT_count_mask) >
> + (1 + !!(page->u.inuse.type_info & PGT_pinned) +
> +  local_in_use) )

The boolean local_in_use evaluates to 1 here, when previously the
value could have been 1 or 2 (I agree that's highly theoretical, but
anyway). Of course this will be addressed implicitly if you check
(only) curr->arch.guest_table above and move the
curr->arch.guest_table_user check here.

Jan



Re: [Xen-devel] [PATCH v3 2/7] x86/xpti: don't flush TLB twice when switching to 64-bit pv context

2018-03-22 Thread Jan Beulich
>>> On 21.03.18 at 13:51,  wrote:
> When switching to a 64-bit pv context the TLB is flushed twice today:
> the first time when switching to the new address space in
> write_ptbase(), the second time when switching to guest mode in
> restore_to_guest.
> 
> Avoid the first TLB flush in that case.
> 
> Signed-off-by: Juergen Gross 
> ---
> V3:
> - omit setting root_pgt_changed to false (Jan Beulich)
> ---
>  xen/arch/x86/mm.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 352600ad73..8c944b33c9 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -123,6 +123,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -507,8 +508,14 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
>  void write_ptbase(struct vcpu *v)
>  {
>  if ( this_cpu(root_pgt) && is_pv_vcpu(v) && !is_pv_32bit_vcpu(v) )
> +{
>  get_cpu_info()->root_pgt_changed = true;
> -write_cr3(v->arch.cr3);
> +asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
> +}
> +else
> +{
> +write_cr3(v->arch.cr3);
> +}

Unnecessary braces. With that
Reviewed-by: Jan Beulich 
(This could be taken care of while committing, but the patch
depends on patch 1 anyway, which may see further
transformation.)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 08:42:09 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 15:34,  wrote:  
>> On Thu, 22 Mar 2018 07:20:00 -0600
>> "Jan Beulich"  wrote:
>>   
>> On 22.03.18 at 14:05,  wrote:
 On Thu, 22 Mar 2018 06:09:44 -0600
 "Jan Beulich"  wrote:
 
 On 22.03.18 at 12:56,  wrote:  
>> I really don't understand why some people have that fear of
>> emulated MMCONFIG -- it's really the same thing as any other MMIO
>> range QEMU already emulates via map_io_range_to_ioreq_server().
>> No sensitive information exposed. It is related only to emulated
>> PCI conf space which QEMU already knows about and use, providing
>> emulated PCI devices for it.  
>
>You continue to ignore the routing requirement multiple ioreq
>servers impose.
 
 If the emulated MMCONFIG approach will be modified to become
 fully compatible with multiple ioreq servers (whatever they used
 for), I assume there will be no objections that emulated MMCONFIG
 can't be used?
 I just want to clarify this moment -- why people think that
 a completely emulated MMIO range, not related in any
 way to host's MMCONFIG may compromise something.
>>>
>>>Compromise? All that was said so far - afair - was that this is the
>>>wrong way round design wise.  
>> 
>> I assume it's all about emulating some real system for HVM, for other
>> goals PV/PVH are available. What is a proper, design-wise way to
>> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
>> 
>> Here is what I've heard so far in this thread:
>> 
>> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen
>> where emulated MMCONFIG MMIO area is located and in the same time
>> map it for MMIO trapping to intercept accesses. Latter action is the
>> same what map_io_range_to_ioreq_server() does, but let's ignore it
>> for now because there was opinion that we need to stick to a
>> distinct hypercall.
>> 
>> 2. Upon trapping accesses to this emulated range, Xen will pretend
>> that QEMU didn't just told him about MMCONFIG location and size and
>> instead convert MMIO access into PCI conf one and send the ioreq to
>> QEMU or some other DM.
>> 
>> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
>> MMCONFIG usage, but we must later teach him non-QEMU manners), QEMU
>> must immediately inform Xen about any changes in MMCONFIG
>> location/status.
>> 
>> 4. QEMU receives PCI conf access while expecting the MMIO address, so
>> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG base
>> and recreating emulated MMIO access from BDF/reg or doing the dirty
>> work of finding PCIBus/PCIDevice target itself as it cannot use
>> emulated CF8/CFC ports due to legacy PCI conf size limitation.
>> 
>> Please confirm that it is a preferable solution or if something
>> missing.  
>
>I'm afraid this is only part of the picture, as you've been told by
>others before. We first of all need to settle on who emulates
>the core chipset registers. Depending on that will be how Xen
>would learn about the MCFG location inside the guest.

Few related thoughts:

1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
other x86 systems it may be HECBASE or else. So we can assume it is
bound to the emulated machine

2. We rely on QEMU to emulate different machines for us.

3. There are users which touch chipset-specific PCIEXBAR directly if
they see a Q35 system (OVMF so far)

Seems like we're pretty limited in freedom of choice in these
conditions, I'm afraid.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 3/7] xen/x86: support per-domain flag for xpti

2018-03-22 Thread Jan Beulich
>>> On 21.03.18 at 13:51,  wrote:
> +void xpti_domain_init(struct domain *d)
> +{
> +if ( !is_pv_domain(d) || is_pv_32bit_domain(d) )
> +return;

As you rely on the zero-initialization of the field here, ...

> +switch ( opt_xpti )
> +{
> +case XPTI_OFF:
> +d->arch.pv_domain.xpti = false;

... this could go away as well.

> @@ -1050,8 +1050,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  panic("Error %d setting up PV root page table\n", rc);
>  if ( per_cpu(root_pgt, 0) )
>  {
> -get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
> -
> +get_cpu_info()->pv_cr3 = 0;
>  /*
>   * All entry points which may need to switch page tables have to 
> start
>   * with interrupts off. Re-write what pv_trap_init() has put there.

Please don't drop the blank line.

> @@ -36,7 +38,8 @@ static inline void pv_vcpu_destroy(struct vcpu *v) {}
>  static inline int pv_vcpu_initialise(struct vcpu *v) { return -EOPNOTSUPP; }
>  static inline void pv_domain_destroy(struct domain *d) {}
>  static inline int pv_domain_initialise(struct domain *d) { return 
> -EOPNOTSUPP; }
> -
> +static inline void xpti_init(void) {}
> +static inline void xpti_domain_init(struct domain *d) {}
>  #endif   /* CONFIG_PV */

Same here. With that
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend

2018-03-22 Thread Julien Grall

Hi Andre,

On 03/22/2018 11:56 AM, Andre Przywara wrote:

+/* The locking order forces us to drop and re-take the locks here. */
+if ( irq->hw )
+{
+spin_unlock(&irq->irq_lock);
+
+desc = irq_to_desc(irq->hwintid);
+spin_lock(&desc->lock);
+spin_lock(&irq->irq_lock);
+
+/* This h/w IRQ should still be assigned to the virtual IRQ. */
+ASSERT(irq->hw && desc->irq == irq->hwintid);
+
+have_desc_lock = true;
+}


I am a bit concerned of this dance in fold_lr_state(). This looks 
awfully complex but I don't have better solution here. I will have a 
think during the night.


However, this is not going to solve the race condition I mentioned 
between clearing _IRQ_INPROGRESS here and setting _IRQ_INPROGRESS in 
do_IRQ. This is because you don't know the order they are going to be 
executed.


I wanted to make sure you didn't intend to solve that one. Am I correct?

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 05/14] x86/HVM: eliminate custom #MF/#XM handling

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 15:12,  wrote:
> On Thu, Mar 15, 2018 at 07:06:36AM -0600, Jan Beulich wrote:
>> @@ -8478,7 +8411,8 @@ x86_emulate(
>>  }
>>  
>>   complete_insn: /* Commit shadow register state. */
>> -put_fpu(&fic, false, state, ctxt, ops);
>> +put_fpu(fpu_type, false, state, ctxt, ops);
>> +fpu_type = X86EMUL_FPU_none;
>>  
>>  /* Zero the upper 32 bits of %rip if not in 64-bit mode. */
>>  if ( !mode_64bit() )
>> @@ -8502,13 +8436,22 @@ x86_emulate(
>>  ctxt->regs->eflags &= ~X86_EFLAGS_RF;
>>  
>>   done:
>> -put_fpu(&fic, fic.insn_bytes > 0 && dst.type == OP_MEM, state, ctxt, ops);
>> +put_fpu(fpu_type, insn_bytes > 0 && dst.type == OP_MEM, state, ctxt, ops);
>>  put_stub(stub);
>>  return rc;
>>  #undef state
>>  
>>  #ifdef __XEN__
>>   emulation_stub_failure:
>> +generate_exception_if(stub_exn.info.fields.trapnr == EXC_MF, EXC_MF);
>> +if ( stub_exn.info.fields.trapnr == EXC_XM )
>> +{
>> +unsigned long cr4;
>> +
>> +if ( !ops->read_cr || !ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY )
> 
> Is the second expression in the above line missing parentheses:
> 
> if ( !ops->read_cr || !(ops->read_cr(4, &cr4, ctxt) == X86EMUL_OKAY) )
> 
> Or should this be:
> 
> if ( !ops->read_cr || ops->read_cr(4, &cr4, ctxt) != X86EMUL_OKAY )

Oops, yes indeed, the latter. Thanks for the report.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 12:44:02 +
Roger Pau Monné  wrote:

>On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
>> On Thu, 22 Mar 2018 09:57:16 +
>> Roger Pau Monné  wrote:
>> [...]  
>> >> Yes, and it is still needed as we have two distinct (and not
>> >> equal) interfaces to PCI conf space. Apart from 0..FFh range
>> >> overlapping they can be considered very different interfaces. And
>> >> whether it is a real system or emulated -- we can use either one
>> >> of these two interfaces or both.
>> >
>> >The legacy PCI config space accesses and the MCFG config space
>> >access are just different methods of accessing the PCI
>> >configuration space, but the data _must_ be exactly the same. I
>> >don't see how a device would care about where the access to the
>> >config space originated.  
>> 
>> If they were different methods of accessing the same thing, they
>> could've been used interchangeably. When we've got a PCI conf ioreq
>> which has offset>100h we know we cannot just pass it to emulated
>> CF8/CFC but have to emulate this specifically.  
>
>This is already not the best approach to dispatch PCI config space
>access in QEMU. I think the interface in QEMU should be:
>
>pci_conf_space_{read/write}(sbdf, register, size , data)
>
>And this would go directly into the device. But I assume this involves
>a non-trivial amount of work to be implemented. Hence xen-hvm.c usage
>of the IO port access replay.

Yes, it's a helpful shortcut. The only bad thing that we can't use
it for PCI extended config accesses, a memory address within emulated
MMCONFIG much more preferable in current architecture.

>> >OK, so you don't want to reconstruct the access, fine.
>> >
>> >Then just inject it using pcie_mmcfg_data_{read/write} or some
>> >similar wrapper. My suggestion was just to try to use the easier
>> >way to get this injected into QEMU.  
>> 
>> QEMU knows its position, the problem it that xen-hvm.c (ioreq
>> processor) is rather isolated from MMCONFIG emulation.
>> 
>> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in
>> QEMU, you can see this:
>> 
>> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
>> {
>> PCIExpressHost *e = opaque;
>> ...
>> 
>> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
>> pcie_mmcfg_data_read/write are actual handlers.
>> 
>> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
>> which is possible but considered a hack by QEMU. We can also insert
>> some code to MMCONFIG emulation which will store info we need to some
>> global variables to be used across wildly different and unrelated
>> modules. It will work, but anyone who see it will have bad thoughts
>> on his mind.  
>
>Since you need to notify Xen the MCFG area address, why not just store
>the MCFG address while doing this operation? You could do this with a
>helper in xen-hvm.c, and keep the variable locally to that file.
>
>In any case, this is a QEMU implementation detail. IMO the IOREQ
>interface is clear and should not be bended like this just because
>'this is easier to implement in QEMU'.

A bit of hack too, but might work. Anyway, it's an extra work we can
avoid if we simply skip PCI conf translation for MMCONFIG MMIO ioreqs
targeting QEMU. I completely agree that we need to translate these
accesses into PCI conf ioreqs for device DMs, but for QEMU it is an
unwanted and redundant step.

AFAIK (Paul might correct me here) the multiple device emulators
feature already makes use of the primary (aka default) DM and
device-specific DM distinction, so in theory it should be possible to
provide that translation only for device-specific DMs (which function
apart from the emulated machine and cannot use its facilities).

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 121061: trouble: blocked/broken/pass

2018-03-22 Thread osstest service owner
flight 121061 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/121061/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf  broken
 build-armhf   4 host-install(4)broken REGR. vs. 121043

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  6161d9f27fcb6c48021e6928bb240dfa39d9f1d3
baseline version:
 xen  8df3821c08d024684a6c83659d8d794b565067f9

Last test of basis   121043  2018-03-21 21:04:22 Z0 days
Testing same since   121056  2018-03-22 10:01:22 Z0 days2 attempts


People who touched revisions under test:
  Andrew Cooper 
  Doug Goldstein 
  Jan Beulich 
  Joe Jin 
  Tim Deegan 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  broken  
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job build-armhf broken
broken-step build-armhf host-install(4)

Not pushing.

(No revision log; it would be 318 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 8/8] x86: avoid double CR3 reload when switching to guest user mode

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 14:20,  wrote:
> On Mon, Mar 19, 2018 at 07:41:42AM -0600, Jan Beulich wrote:
>> --- a/xen/arch/x86/pv/domain.c
>> +++ b/xen/arch/x86/pv/domain.c
>> @@ -219,10 +219,22 @@ int pv_domain_initialise(struct domain *
>>  return rc;
>>  }
>>  
>> -static void _toggle_guest_pt(struct vcpu *v)
>> +static void _toggle_guest_pt(struct vcpu *v, bool force_cr3)
>>  {
>> +ASSERT(!in_irq());
>> +
>>  v->arch.flags ^= TF_kernel_mode;
>>  update_cr3(v);
>> +
>> +/*
>> + * There's no need to load CR3 here when it is going to be loaded on the
>> + * way out to guest mode again anyway, and when the page tables we're
>> + * currently on are the kernel ones (whereas when switching to kernel
>> + * mode we need to be able to write a bounce frame onto the kernel 
>> stack).
>> + */
> 
> Not sure I follow the comment. If you're talking about
> create_bounce_frame, it wouldn't call this function in the first place,
> right?

Right. The comment is talking about what may happen after we
return from here.

>> +if ( !force_cr3 && !(v->arch.flags & TF_kernel_mode) )
> 
> Also, it takes a bit of mental power to see !(v->arch.flags &
> TF_kernel_mode) means the mode Xen is using. Can you maybe just use a
> variable at the beginning like
> 
>bool kernel_mode = v->arch.flags & TF_kernel_mode;
> 
> and then use it here?

Except for the (how I would say) clutter by the extra local variable
I don't see much of a difference.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda Items

2018-03-22 Thread Julien Grall

Hi,

On 03/22/2018 11:55 AM, Roger Pau Monné wrote:

On Thu, Mar 22, 2018 at 10:27:35AM +, Paul Durrant wrote:

De-htmling...

-
From: Lars Kurth
Sent: 22 March 2018 10:22
To: xen-de...@lists.xensource.com
Cc: committ...@xenproject.org; Juergen Gross ; Janakarajan Natarajan ; Tamas K Lengyel 
; Wei Liu ; Andrew Cooper ; Daniel Kiper 
; Roger Pau Monné ; Christopher Clark ; Rich Persaud 
; Paul Durrant ; Jan Beulich' ; Brian Woods 
; intel-...@intel.com
Subject: X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda 
Items

Hi all,
please find attached
a) Meeting details (just a link with timezones) – the meeting invite will 
follow when we have an agenda
    Bridge details – will be sent with the meeting invite
    I am thinking of using GotoMeeting, but want to try this with a Linux only 
user before I commit
c) Call for agenda items
A few suggestions were made, such as XPTI status (if applicable), PVH status
Also we have some left-overs from the last call: see 
https://lists.xenproject.org/archives/html/xen-devel/2018-03/threads.html#01571
Regards
Lars
== Meeting Details ==
Wed April 11, 15:00 - 16:00 UTC
International meeting times: 
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018=4=11=14=0=0=224=24=179=136=37=33
== Agenda Proposal ==
We start with a round the table call as to who is on the call (name and company)
=== A) Coordination and Planning ===
Coordinating who does what, what needs attention, what is blocked, etc.
A1) Short-term
Any urgent issues related to the 4.11 release that need discussing
A2) Long-term, Larger series
Please call out any x86 related series, that need attention in the longer term. 
Provide
* Title of series
* Link to series (e.g. on https://lists.xenproject.org/archives/html/xen-devel, 
markmail, …)
* Describe any: Dependencies, Issues, etc. that are relevant
=== B) Design, architecture, feature update related discussions ===
Please highlight any design/architecture discussions that you would like to 
cover. Please describe
* Design, point to any mail discussions
* Describe clearly what you are blocked on: highlight any issues
=== C) Demos, Sharing of Experiences, Sometimes discussion of specific 
issues/bugs/problems/... ===
Please highlight any of the above that you would like to cover. Please describe
* What the issue/experience/demo is that you would like to cover
=== D) AOB ===
-

I think we need to discuss PCI emulation and our future direction. Our current 
hybrid with QEMU is becoming increasingly problematic.


+1


I think it would be worth for Stefano and I to join this discussion. 
Ideally, we want to use a common solution between Arm and x86.


Not sure the time will fit for Stefano, though.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 14/39] ARM: new VGIC: Add GICv2 world switch backend

2018-03-22 Thread Julien Grall

Hi Andre,

On 03/22/2018 11:04 AM, Andre Przywara wrote:

This is a "patch to the patch" mentioned above, to make it clear what
changed:
We now take the desc lock in vgic_v2_fold_lr_state() when we are dealing
with a hardware IRQ. This is a bit complicated, because we have to obey
the existing locking order, so do our infamous "drop-take-retake" dance.
Also I print a message about using the new VGIC and fix that last
remaining "u32" usage.

Please note that I had to initialise "desc" to NULL because my compiler
(GCC 5.3) is not smart enough to see that we only use it with irq->hw
set and it's safe. Please let me know if it's me not being smart enough
here instead ;-)


I would not be surprised that even recent compilers can't deal with that. 
It would require quite some work from the compiler to know that desc is 
only used when irq->hw is set.


I will comment the code on 3a.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] X86 Community Call - Wed Apr 11, 14:00 - 15:00 UTC - Call for Agenda Items

2018-03-22 Thread Lars Kurth
Removing the non-working Intel alias

@John: once this alias actually works, let me know. 
The start of the thread is at 
https://lists.xenproject.org/archives/html/xen-devel/2018-03/threads.html#02672

@All:
To summarize in terms of higher level discussions:
* Discuss PCI emulation and our future direction. Our current hybrid with QEMU 
is becoming increasingly problematic (leader: Paul)
* Update on PVH work (leader: Royger)

It would be good if leaders could do some preparation and send out a short 
description of anything that they think may help others follow the discussion.

We should probably also summarize quickly any developments on NVDIMM, depending 
on progress.

I would say: maybe use the first 15-30 minutes for more operational stuff. The 
second half for bigger ticket items.

On 22/03/2018, 14:49, "Julien Grall"  wrote:

>> -
>>
>> I think we need to discuss PCI emulation and our future direction. Our 
current hybrid with QEMU is becoming increasingly problematic.
> 
> +1

I think it would be worth for Stefano and I to join this discussion. 
Ideally, we want to use a common solution between Arm and x86.

Not sure the time will fit for Stefano, though.

It's at 7am Pacific, which is a little early for Stefano. I can't really move 
the call: it was quite hard to agree on a time-slot.
But we could aim to schedule this discussion for, say, 7:30 or 7:45, which 
makes this easier for Stefano.

Regards
Lars 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [for-4.11][PATCH v6 16/16] xen: Convert page_to_mfn and mfn_to_page to use typesafe MFN

2018-03-22 Thread Julien Grall



On 03/22/2018 12:24 PM, Tim Deegan wrote:

Hi,


Hi Tim,


At 04:47 + on 21 Mar (1521607657), Julien Grall wrote:

Most of the users of page_to_mfn and mfn_to_page are either overriding
the macros to make them work with mfn_t or use mfn_x/_mfn because the
rest of the function uses mfn_t.

So make page_to_mfn and mfn_to_page return mfn_t by default. The __*
versions are now dropped, as this patch converts all the remaining
non-typesafe callers.

Only reasonable clean-ups are done in this patch. The rest will use
_mfn/mfn_x for the time being.

Lastly, domain_page_to_mfn is also converted to use mfn_t given that
most of the callers are now switched to _mfn(domain_page_to_mfn(...)).

Signed-off-by: Julien Grall 
Acked-by: Razvan Cojocaru 
Reviewed-by: Paul Durrant 
Reviewed-by: Boris Ostrovsky 
Reviewed-by: Kevin Tian 
Reviewed-by: Wei Liu 
Acked-by: Jan Beulich 
Reviewed-by: George Dunlap 


Thought I'd already acked this for the shadow code, but clearly not.


You acked the first version. That patch was heavily reworked just after 
that to drop __mfn_to_page and __page_to_mfn. Hence I dropped the ack.



Sorry for the delay, and:

Acked-by: Tim Deegan 


Thank you!

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 07:20:00 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 14:05,  wrote:  
>> On Thu, 22 Mar 2018 06:09:44 -0600
>> "Jan Beulich"  wrote:
>>   
>> On 22.03.18 at 12:56,  wrote:
 I really don't understand why some people have that fear of
 emulated MMCONFIG -- it's really the same thing as any other MMIO
 range QEMU already emulates via map_io_range_to_ioreq_server(). No
 sensitive information exposed. It is related only to emulated PCI
 conf space which QEMU already knows about and use, providing
 emulated PCI devices for it.
>>>
>>>You continue to ignore the routing requirement multiple ioreq
>>>servers impose.  
>> 
>> If the emulated MMCONFIG approach will be modified to become
>> fully compatible with multiple ioreq servers (whatever they used
>> for), I assume there will be no objections that emulated MMCONFIG
>> can't be used?
>> I just want to clarify this moment -- why people think that
>> a completely emulated MMIO range, not related in any
>> way to host's MMCONFIG may compromise something.  
>
>Compromise? All that was said so far - afair - was that this is the
>wrong way round design wise.

I assume it's all about emulating some real system for HVM; for other
goals PV/PVH are available. What, in your view, is the proper,
design-wise way to emulate the MMIO-based MMCONFIG range Q35 provides?

Here is what I've heard so far in this thread:

1. Add a completely new dmop/hypercall so that QEMU can tell Xen where
the emulated MMCONFIG MMIO area is located and at the same time map it
for MMIO trapping to intercept accesses. The latter action is the same
as what map_io_range_to_ioreq_server() does, but let's ignore that for
now because the opinion was that we need to stick to a distinct
hypercall.

2. Upon trapping accesses to this emulated range, Xen will pretend that
QEMU didn't just tell it about the MMCONFIG location and size, and
instead convert the MMIO access into a PCI conf one and send the ioreq
to QEMU or some other DM.

3. If there is a PCIEXBAR relocation (OVMF currently does this for
MMCONFIG usage, but we must later teach it non-QEMU manners), QEMU must
immediately inform Xen about any changes in MMCONFIG location/status.

4. QEMU receives a PCI conf access while expecting an MMIO address, so
xen-hvm.c has to deal with it somehow: either obtain the MMCONFIG base
and recreate the emulated MMIO access from BDF/reg, or do the dirty
work of finding the target PCIBus/PCIDevice itself, as it cannot use
the emulated CF8/CFC ports due to the legacy PCI conf size limitation.

Please confirm that this is the preferred solution, or point out anything
that is missing.
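
For reference, here is a minimal sketch (in C, with illustrative names and
types only; this is not actual Xen code) of the MMIO-to-PCI-conf translation
that step 2 would have to perform, assuming the standard ECAM layout of 4KiB
per function, 8 functions per device and 32 devices per bus:

    /*
     * Decode an address within an emulated MMCONFIG window into
     * bus/dev/func/reg, assuming the standard ECAM layout. Illustrative only.
     */
    static void mmcfg_decode(paddr_t addr, paddr_t base, unsigned int start_bus,
                             unsigned int *bus, unsigned int *dev,
                             unsigned int *func, unsigned int *reg)
    {
        paddr_t off = addr - base;

        *bus  = start_bus + (off >> 20); /* 1MiB per bus */
        *dev  = (off >> 15) & 0x1f;      /* 32KiB per device */
        *func = (off >> 12) & 0x7;       /* 4KiB per function */
        *reg  = off & 0xfff;             /* offset within the function */
    }

The resulting BDF/reg would then be placed into a PCI conf ioreq and routed
to whichever ioreq server has claimed that device.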

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] possible I/O emulation state machine issue

2018-03-22 Thread Jan Beulich
Paul,

our PV driver person has found a reproducible crash with ws2k8,
triggered by one of the WHQL tests. The guest gets crashed because
the re-issue check of an ioreq close to the top of hvmemul_do_io()
fails. I've handed him a first debugging patch, the output of which
suggests that we're dealing with a completely new request, which
in turn would mean that we've run into a stale STATE_IORESP_READY
state:

(XEN) d2v3: t=0/1 a=3c4/fed000f0 s=2/4 c=1/1 d=0/1 f=0/0 p=0/0 
v=100/831873f27a30
(XEN) [ Xen-4.10.0_15-0  x86_64  debug=n   Tainted:  C   ]
(XEN) CPU:39
(XEN) RIP:e008:[] emulate.c#hvmemul_do_io+0x1b1/0x640
(XEN) RFLAGS: 00010292   CONTEXT: hypervisor (d2v3)
(XEN) rax: 8308797d802c   rbx: 0004   rcx: 
(XEN) rdx: 831873f27fff   rsi: 000a   rdi: 82d0804433b8
(XEN) rbp: 830007d28000   rsp: 831873f27728   r8:  0027
(XEN) r9:  0010   r10: 0400   r11: 82d08035bd40
(XEN) r12: 0001   r13:    r14: 0001
(XEN) r15: 831873f278e0   cr0: 80050033   cr4: 26e0
(XEN) cr3: 003794f02000   cr2: fa6000fae10e
(XEN) fsb:    gsb:    gss: 07fdd000
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen code around  (emulate.c#hvmemul_do_io+0x1b1/0x640):
(XEN)  54 24 70 e8 cf 87 f7 ff <0f> 0b 48 8d 3d 16 b6 0b 00 48 8d 35 88 f8 0c 00
(XEN) Xen stack trace from rsp=831873f27728:
(XEN)0002 0004 0001 0001
(XEN) 0001  
(XEN)  0100 831873f27a30
(XEN)83283fe74010 83284ad22000  0001
(XEN)831873f277d8 831873f277e0 03c4 0100
(XEN)00020001  8317f8e5b000 0004
(XEN)0001 831873f27a30 831873f27a30 fed000f0
(XEN)830007d289c8 82d0802d578e  831873f27a30
(XEN) 0004 0004 
(XEN)831873f27a30 82d0802d64dd 831873f27a30 831873f27d10
(XEN)fed000f0 831873f27a30 0103 
(XEN)831873f278e0 ffd070f0 00040004 0004
(XEN)0001 831873f27c78 831873f278d8 831873f278d0
(XEN)831873f27938 fed000f0 0001 0001
(XEN)82d080350ecb 0004 0001 831873f27c78
(XEN)831873f27a30 0002 830007d28000 82d0802d69f1
(XEN)0001 82d0802a313d ffd070f0 0001
(XEN) 00f0 82d080350ecb 831873f27aa0
(XEN) 831873f27c78 831873f27a28 830007d28a60
(XEN)82d0803a7620 82d0802a4aad 831873f279c8 831873f27ac0
(XEN) Xen call trace:
(XEN)[] emulate.c#hvmemul_do_io+0x1b1/0x640
(XEN)[] emulate.c#hvmemul_do_io_buffer+0x2e/0x70
(XEN)[] emulate.c#hvmemul_linear_mmio_access+0x24d/0x540
(XEN)[] common_interrupt+0x9b/0x120
(XEN)[] emulate.c#__hvmemul_read+0x221/0x230
(XEN)[] x86_emulate.c#x86_decode+0xe2d/0x1e50
(XEN)[] common_interrupt+0x9b/0x120
(XEN)[] x86_emulate+0x94d/0x19150
(XEN)[] __get_gfn_type_access+0x101/0x290
(XEN)[] emulate.c#_hvm_emulate_one+0x4a/0x1e0
(XEN)[] vmx.c#vmx_get_interrupt_shadow+0/0x10
(XEN)[] hvm_emulate_init_once+0x7e/0xb0
(XEN)[] hvm_emulate_one_insn+0x3b/0x120
(XEN)[] x86_insn_is_mem_access+0/0xc0
(XEN)[] hvm_hap_nested_page_fault+0x138/0x710
(XEN)[] timer.c#add_entry+0x50/0xc0
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0x9f/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0x9f/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_vmexit_handler+0x8ae/0x1960
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0x9f/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0x9f/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0x9f/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0x9f/0x240
(XEN)[] vmx_asm_vmexit_handler+0xab/0x240
(XEN)[] vmx_asm_vmexit_handler+0xe2/0x240
(XEN) 
(XEN) domain_crash called from emulate.c:171
(XEN) Domain 2 (vcpu#3) crashed on cpu#39:
(XEN) [ Xen-4.10.0_15-0  x86_64  debug=n   Tainted:  C   ]
(XEN) CPU:39
(XEN) RIP:0010:[]
(XEN) RFLAGS: 00010286   CONTEXT: hvm guest (d2v3)
(XEN) rax: ffd07000   rbx: 0003   rcx: 

[Xen-devel] [PATCH v12 08/12] x86/pt: mask MSI vectors on unbind

2018-03-22 Thread Roger Pau Monne
When a MSI device with per-vector masking capabilities is detected or
added to Xen all the vectors are masked when initializing it. This
implies that the first time the interrupt is bound to a domain it's
masked.

This however only applies to the first time the interrupt is bound
because neither the unbind nor the pirq unmap will mask the vector
again. In order to fix this re-mask the interrupt when unbinding it
from a guest. This makes sure that pairs of bind/unbind will always
get the same masking state.

Note that no issues have been reported regarding this behavior because
QEMU always uses the newly introduced XEN_PT_GFLAGSSHIFT_UNMASKED when
binding interrupts, so it's always unmasked.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Cc: Jan Beulich 
---
Changes since v7:
 - New in this version.
---
 xen/drivers/passthrough/io.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 8f16e6c0a5..bab3aa349a 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -645,7 +645,22 @@ int pt_irq_destroy_bind(
 }
 break;
 case PT_IRQ_TYPE_MSI:
+{
+unsigned long flags;
+struct irq_desc *desc = domain_spin_lock_irq_desc(d, machine_gsi,
+  );
+
+if ( !desc )
+return -EINVAL;
+/*
+ * Leave the MSI masked, so that the state when calling
+ * pt_irq_create_bind is consistent across bind/unbinds.
+ */
+guest_mask_msi_irq(desc, true);
+spin_unlock_irqrestore(>lock, flags);
 break;
+}
+
 default:
 return -EOPNOTSUPP;
 }
-- 
2.16.2


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 10/12] vpci: add a priority parameter to the vPCI register initializer

2018-03-22 Thread Roger Pau Monne
This is needed for MSI-X, since MSI-X will need to be initialized
before parsing the BARs, so that the header BAR handlers are aware of
the MSI-X related holes and make sure they are not mapped in order for
the trap handlers to work properly.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Tim Deegan 
Cc: Wei Liu 
---
Changes since v4:
 - Add a middle priority and add the PCI header to it.

Changes since v3:
 - Add a numerial suffix to the section used to store the pointer to
   each initializer function, and sort them at link time.
---
 xen/arch/arm/xen.lds.S| 4 ++--
 xen/arch/x86/xen.lds.S| 4 ++--
 xen/drivers/vpci/header.c | 2 +-
 xen/drivers/vpci/msi.c| 2 +-
 xen/include/xen/vpci.h| 8 ++--
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 49cae2af71..245a0e0e85 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -69,7 +69,7 @@ SECTIONS
 #if defined(CONFIG_HAS_VPCI) && defined(CONFIG_LATE_HWDOM)
. = ALIGN(POINTER_ALIGN);
__start_vpci_array = .;
-   *(.data.vpci)
+   *(SORT(.data.vpci.*))
__end_vpci_array = .;
 #endif
   } :text
@@ -182,7 +182,7 @@ SECTIONS
 #if defined(CONFIG_HAS_VPCI) && !defined(CONFIG_LATE_HWDOM)
. = ALIGN(POINTER_ALIGN);
__start_vpci_array = .;
-   *(.data.vpci)
+   *(SORT(.data.vpci.*))
__end_vpci_array = .;
 #endif
   } :text
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 7bd6fb51c3..70afedd31d 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -139,7 +139,7 @@ SECTIONS
 #if defined(CONFIG_HAS_VPCI) && defined(CONFIG_LATE_HWDOM)
. = ALIGN(POINTER_ALIGN);
__start_vpci_array = .;
-   *(.data.vpci)
+   *(SORT(.data.vpci.*))
__end_vpci_array = .;
 #endif
   } :text
@@ -246,7 +246,7 @@ SECTIONS
 #if defined(CONFIG_HAS_VPCI) && !defined(CONFIG_LATE_HWDOM)
. = ALIGN(POINTER_ALIGN);
__start_vpci_array = .;
-   *(.data.vpci)
+   *(SORT(.data.vpci.*))
__end_vpci_array = .;
 #endif
   } :text
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 25d8ec0507..9fa07992cc 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -532,7 +532,7 @@ static int init_bars(struct pci_dev *pdev)
 
 return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true, false) : 0;
 }
-REGISTER_VPCI_INIT(init_bars);
+REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index c3c69ec453..de4ddf562e 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -267,7 +267,7 @@ static int init_msi(struct pci_dev *pdev)
 
 return 0;
 }
-REGISTER_VPCI_INIT(init_msi);
+REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 116b93f519..7266c17679 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -15,9 +15,13 @@ typedef void vpci_write_t(const struct pci_dev *pdev, 
unsigned int reg,
 
 typedef int vpci_register_init_t(struct pci_dev *dev);
 
-#define REGISTER_VPCI_INIT(x)   \
+#define VPCI_PRIORITY_HIGH  "1"
+#define VPCI_PRIORITY_MIDDLE"5"
+#define VPCI_PRIORITY_LOW   "9"
+
+#define REGISTER_VPCI_INIT(x, p)\
   static vpci_register_init_t *const x##_entry  \
-   __used_section(".data.vpci") = x
+   __used_section(".data.vpci." p) = x
 
 /* Add vPCI handlers to device. */
 int __must_check vpci_add_handlers(struct pci_dev *dev);
-- 
2.16.2


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 01/12] vpci: introduce basic handlers to trap accesses to the PCI config space

2018-03-22 Thread Roger Pau Monne
This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch set up the basic functionality required in order to trap
accesses to the PCI config space, and allow decoding the address and
finding the corresponding handler that should handle the access
(although no handlers are implemented).

Note that the traps for the PCI IO port registers (0xcf8/0xcfc) are
set up inside an x86 HVM file, since that's not shared with other
arches.

A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers; this
is only enabled for PVH Dom0 at the moment.

A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has proven
quite helpful during development, since the logic to handle partial
accesses or accesses that span multiple registers is not
trivial.

The handlers for the registers are added to a linked list that's kept
sorted at all times. Both the read and write handlers support accesses
that span multiple emulated registers and contain non-emulated gaps.
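
As an illustration of the partial-access handling mentioned above, here is a
hedged sketch of a merge helper in the spirit of the merge_result mentioned
in the changelog below (the exact in-tree signature and implementation may
differ):

    /*
     * Merge the value read from one emulated register ('new', covering
     * 'size' bytes at byte 'offset' of the overall access) into the
     * accumulated result 'data'. Sketch only.
     */
    static uint32_t merge_result(uint32_t data, uint32_t new,
                                 unsigned int size, unsigned int offset)
    {
        uint32_t mask = size < 4 ? (1u << (size * 8)) - 1 : ~0u;

        return (data & ~(mask << (offset * 8))) |
               ((new & mask) << (offset * 8));
    }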

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
[IO parts]
Reviewed-by: Paul Durrant 
[ARM]
Acked-by: Julien Grall 
[Tools]
Acked-by: Wei Liu 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Julien Grall 
Cc: Paul Durrant 
---
Changes since v9:
 - Remove vpci/Kconfig and use drivers/Kconfig instead.
 - Remove depends on HAS_PCI.

Changes since v8:
 - Introduce HAS_VPCI Kconfig option.
 - Drop Jan and Wei's RB (keep Paul's since the HAS_VPCI addition
   doesn't change IO code).
 - Rebase on top of XSA-256.

Changes since v7:
 - Constify d in vpci_portio_read.
 - ASSERT the correctness of the address in the read/write handlers.
 - Add newlines between non-fallthrough case statements.

Changes since v6:
 - Align the vpci handlers in the linker script.
 - Switch add/remove register functions to take a vpci parameter
   instead of a pci_dev.
 - Expand comment of merge_result.
 - Return X86EMUL_UNHANDLEABLE if accessing cfc and cf8 is disabled.

Changes since v5:
 - Use a spinlock per pci device.
 - Use the recently introduced pci_sbdf_t type.
 - Fix test harness to use the right handler type and the newly
   introduced lock.
 - Move the position of the vpci sections in the linker scripts.
 - Constify domain and pci_dev in vpci_{read/write}.
 - Fix typos in comments.
 - Use _XEN_VPCI_H_ as header guard.

Changes since v4:
* User-space test harness:
 - Do not redirect the output of the test.
 - Add main.c and emul.h as dependencies of the Makefile target.
 - Use the same rule to modify the vpci and list headers.
 - Remove underscores from local macro variables.
 - Add _check suffix to the test harness multiread function.
 - Change the value written by every different size in the multiwrite
   test.
 - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
 - Perform some of the read checks with the local variable directly.
 - Expand some comments.
 - Implement a dummy rwlock.
* Hypervisor code:
 - Guard the linker script changes with CONFIG_HAS_PCI.
 - Rename vpci_access_check to vpci_access_allowed and make it return
   bool.
 - Make hvm_pci_decode_addr return the register as return value.
 - Use ~3 instead of 0xfffc to remove the register offset when
   checking accesses to IO ports.
 - s/head/prev in vpci_add_register.
 - Add parentheses around & in vpci_add_register.
 - Fix register removal.
 - Change the BUGs in vpci_{read/write}_hw helpers to
   ASSERT_UNREACHABLE.
 - Make merge_result static and change the computation of the mask to
   avoid using a uint64_t.
 - Modify vpci_read to only read from hardware the not-emulated gaps.
 - Remove the vpci_val union and use a uint32_t instead.
 - Change handler read type to return a uint32_t instead of modifying
   a variable passed by reference.
 - Constify the data opaque parameter of read handlers.
 - Change the size parameter of the vpci_{read/write} functions to
   unsigned int.
 - Place the array of initialization handlers in init.rodata or
   .rodata depending on whether late-hwdom is enabled.
 - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
   remove the device while trying to access it.
 - Change the recursive spinlock into a rw lock for performance
   reasons.

Changes since v3:
* User-space test harness:
 - Fix spaces in container_of macro.
 - Implement a dummy locking functions.

[Xen-devel] [PATCH v12 00/12] vpci: PCI config space emulation

2018-03-22 Thread Roger Pau Monne
Hello,

The following series contains an implementation of handlers for the PCI
configuration space inside of Xen. This allows Xen to detect accesses
to the PCI configuration space and react accordingly.

Why is this needed? IMHO, there are two main points to doing all this
emulation inside of Xen. The first one is to prevent adding a bunch of
duplicated Xen PV specific code to each OS we want to support in PVH
mode. This just promotes Xen code duplication amongst OSes, which
leads to a higher maintenance burden.

The second reason would be that this code (or its functionality, to be
more precise) already exists in QEMU (and pciback to a degree), and
it's code that we already support and maintain. By moving it into the
hypervisor itself every guest type can make use of it, and it can be
shared between them all. I know that the code in this series is not
yet suitable for DomU HVM guests in its current state, but it should
be in due time.

As usual, each patch contains a changeset summary between versions,
I'm not going to copy the list of changes here.

The branch containing the patches can be found at:

git://xenbits.xen.org/people/royger/xen.git vpci_v11

Note that this is only safe to use for the hardware domain (which is
trusted); any non-trusted domain will need a lot more handlers before it
can freely access the PCI configuration space.

Roger Pau Monne (12):
  vpci: introduce basic handlers to trap accesses to the PCI config
space
  x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas
  x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0
  pci: split code to size BARs from pci_add_device
  pci: add support to size ROM BARs to pci_size_mem_bar
  xen: introduce rangeset_consume_ranges
  vpci: add header handlers
  x86/pt: mask MSI vectors on unbind
  vpci/msi: add MSI handlers
  vpci: add a priority parameter to the vPCI register initializer
  vpci/msix: add MSI-X handlers
  vpci: do not expose unneeded functions to the user-space test harness

 .gitignore|   3 +
 tools/libxl/libxl_x86.c   |   2 +-
 tools/tests/Makefile  |   1 +
 tools/tests/vpci/Makefile |  33 +++
 tools/tests/vpci/emul.h   | 134 +
 tools/tests/vpci/main.c   | 309 +
 xen/arch/arm/xen.lds.S|  14 +
 xen/arch/x86/Kconfig  |   1 +
 xen/arch/x86/domain.c |   6 +-
 xen/arch/x86/hvm/dom0_build.c |  23 +-
 xen/arch/x86/hvm/hvm.c|   7 +
 xen/arch/x86/hvm/hypercall.c  |   5 +
 xen/arch/x86/hvm/io.c | 293 
 xen/arch/x86/hvm/ioreq.c  |   4 +
 xen/arch/x86/hvm/vmsi.c   | 246 +
 xen/arch/x86/msi.c|   3 +
 xen/arch/x86/physdev.c|  11 +
 xen/arch/x86/setup.c  |   2 +-
 xen/arch/x86/x86_64/mmconfig.h|   4 -
 xen/arch/x86/xen.lds.S|  14 +
 xen/common/rangeset.c |  28 ++
 xen/drivers/Kconfig   |   3 +
 xen/drivers/Makefile  |   1 +
 xen/drivers/passthrough/io.c  |  15 +
 xen/drivers/passthrough/pci.c | 104 ---
 xen/drivers/vpci/Makefile |   1 +
 xen/drivers/vpci/header.c | 564 ++
 xen/drivers/vpci/msi.c| 349 +++
 xen/drivers/vpci/msix.c   | 458 +++
 xen/drivers/vpci/vpci.c   | 482 
 xen/include/asm-x86/domain.h  |   1 +
 xen/include/asm-x86/hvm/domain.h  |   7 +
 xen/include/asm-x86/hvm/io.h  |  20 ++
 xen/include/asm-x86/msi.h |   3 +
 xen/include/asm-x86/pci.h |   6 +
 xen/include/public/arch-x86/xen.h |   5 +-
 xen/include/xen/irq.h |   1 +
 xen/include/xen/pci.h |   9 +
 xen/include/xen/pci_regs.h|   8 +
 xen/include/xen/rangeset.h|  10 +
 xen/include/xen/sched.h   |   4 +
 xen/include/xen/vpci.h| 225 +++
 42 files changed, 3373 insertions(+), 46 deletions(-)
 create mode 100644 tools/tests/vpci/Makefile
 create mode 100644 tools/tests/vpci/emul.h
 create mode 100644 tools/tests/vpci/main.c
 create mode 100644 xen/drivers/vpci/Makefile
 create mode 100644 xen/drivers/vpci/header.c
 create mode 100644 xen/drivers/vpci/msi.c
 create mode 100644 xen/drivers/vpci/msix.c
 create mode 100644 xen/drivers/vpci/vpci.c
 create mode 100644 xen/include/xen/vpci.h

-- 
2.16.2


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 11/12] vpci/msix: add MSI-X handlers

2018-03-22 Thread Roger Pau Monne
Add handlers for accesses to the MSI-X message control field on the
PCI configuration space, and traps for accesses to the memory region
that contains the MSI-X table and PBA. These traps detect attempts from
the guest to configure MSI-X interrupts and properly set them up.

Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
BIR are not trapped by Xen at the moment.

Finally, turn the panic in the Dom0 PVH builder into a warning.
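
For background, the MSI-X table trapped here consists of 16-byte entries as
defined by the PCI spec; below is a hedged sketch of how a trapped table
access can be decoded into an entry index and field (the names are
illustrative, not the in-tree ones):

    /* Standard MSI-X table entry layout (PCI spec): 16 bytes per entry. */
    #define MSIX_ENTRY_SIZE        16
    #define MSIX_ENTRY_ADDR_LO     0x0
    #define MSIX_ENTRY_ADDR_HI     0x4
    #define MSIX_ENTRY_DATA        0x8
    #define MSIX_ENTRY_VECTOR_CTRL 0xc /* bit 0 is the per-entry mask bit */

    static void msix_table_decode(unsigned long off, unsigned int *entry,
                                  unsigned int *field)
    {
        *entry = off / MSIX_ENTRY_SIZE; /* which vector the access targets */
        *field = off % MSIX_ENTRY_SIZE; /* which field within that entry */
    }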

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
[IO]
Reviewed-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Paul Durrant 
---
Changes since v10:
 - Do not continue to print msix entries if the MSIX struct has
   changed it's address while processing softirqs.
 - Use unsigned long to store the frame numbers in modify_bars.
 - Use lu to print frame values in modify_bars.

Changes since v9:
 - Unlock/lock when calling process_pending_softirqs.
 - Change vpci_msix_arch_print to return int in order to signal
   failure to continue after having processed softirqs.
 - Use a power of 2 to do the modulo.
 - Use PFN_DOWN in order to calculate the end of the MSI-X memory
   areas for the rangeset.

Changes since v8:
 - Call process_pending_softirqs between printing MSI-X entries.
 - Free msix struct in vpci_add_handlers.
 - Print only MSI or MSI-X if they are enabled.
 - Fix comment in update_entry.

Changes since v7:
 - Switch vpci.h macros to inline functions.
 - Change vpci_msix_arch_print_entry into vpci_msix_arch_print and
   make it print all the entries.
 - Add a log message if rangeset_remove_range fails to remove the BAR
   MSI-related range.
 - Introduce a new update_entry to disable and enable a MSIX entry in
   order to either update or set it up. This removes open coding it in
   two different places.
 - Unify access checks in access_allowed.
 - Add newlines between switch cases.
 - Expand max_entries to 12 bits.

Changes since v6:
 - Reduce the output of the debug keys.
 - Fix comments and code to match in vpci_msix_control_write.
 - Optimize size of the MSIX structure.
 - Convert 'tables[]' to a uint32_t in order to reduce the size of
   vpci_msix. Introduce some macros to make it easier to get the MSIX
   tables related data.
 - Limit size of the bool fields to 1 bit.
 - Remove the 'nr' field of vpci_msix_entry. The position can be
   calculated from the base of the entries array.
 - Drop the 'vpci_' prefix from the functions in msix.c, they are all
   static.
 - Remove the val local variable in control_read.
 - Initialize new_masked and new_enabled at declaration.
 - Recalculate the msix control value before writing it.
 - Remove the seg and bus local variables and use pdev->seg and
   pdev->bus instead.
 - Initialize msix at declaration in msix_{write/read}.
 - Add the must_check attribute to
   vpci_msix_arch_{enable/disable}_entry.

Changes since v5:
 - Update lock usage.
 - Unbind/unmap PIRQs when MSIX is disabled.
 - Share the arch-specific MSIX code with the MSI functions.
 - Do not reference the MSIX memory areas from the PCI BARs fields,
   instead fetch the BIR and offset each time needed.
 - Add the '_entry' suffix to the MSIX arch functions.
 - Prefix the vMSIX macros with 'V'.
 - s/gdprintk/gprintk/ in msix.c
 - Make vpci_msix_access_check return bool, and change it's name to
   vpci_msix_access_allowed.
 - Join the first two ifs in vpci_msix_{read/write} into a single one.
 - Allow Dom0 to write to the PBA area.
 - Add a note that reads from the PBA area will need to be translated
    if the PBA is not identity mapped.

Changes since v4:
 - Remove parentheses around offsetof.
 - Add "being" to MSI-X enabling comment.
 - Use INVALID_PIRQ.
 - Add a simple sanity check to vpci_msix_arch_enable in order to
   detect wrong MSI-X entries more quickly.
 - Constify vpci_msix_arch_print entry argument.
 - s/cpu/fixed/ in vpci_msix_arch_print.
 - Dump the MSI-X info together with the MSI info.
 - Fix vpci_msix_control_write to take into account changes to the
   address and data fields when switching the function mask bit.
 - Only disable/enable the entries if the address or data fields have
   been updated.
 - Use the BAR enable field to check if a BAR is mapped or not
   (instead of reading the command register for each device).
 - Fix error path in vpci_msix_read to set the return data to ~0.
 - Simplify mask usage in vpci_msix_write.
 - Cast data to uint64_t when shifting it 32 bits.
 - Fix writes to the table entry control register to take into account
   if the mask-all bit is set.
 - Add some 

[Xen-devel] [PATCH v12 03/12] x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0

2018-03-22 Thread Roger Pau Monne
So that MMCFG regions not present in the MCFG ACPI table can be added
at run time by the hardware domain.
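
For illustration, a hedged sketch of how the hardware domain's kernel might
report such a region at run time (the structure and flag come from the
public PHYSDEVOP_pci_mmcfg_reserved interface; the exact names should be
checked against xen/include/public/physdev.h, and the hypercall wrapper
follows Linux's naming):

    /* Dom0-side sketch: report an MMCFG region discovered at run time. */
    struct physdev_pci_mmcfg_reserved r = {
        .address   = mmcfg_base,  /* physical base of the region */
        .segment   = segment,
        .start_bus = start_bus,
        .end_bus   = end_bus,
        .flags     = XEN_PCI_MMCFG_RESERVED,
    };

    rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);

With this patch, a PVH Dom0 issuing the above also gets the region
registered with the vPCI MMCFG handler.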

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
Reviewed-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Paul Durrant 
---
Changes since v7:
 - Add newline in hvm_physdev_op for non-fallthrough case.

Changes since v6:
 - Do not return EEXIST if the same exact region is already tracked by
   Xen.

Changes since v5:
 - Check for has_vpci before calling register_vpci_mmcfg_handler
   instead of checking for is_hvm_domain.

Changes since v4:
 - Change the hardware_domain check in hvm_physdev_op to a vpci check.
 - Only register the MMCFG area, but don't scan it.

Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/hypercall.c |  5 +
 xen/arch/x86/hvm/io.c| 16 +++-
 xen/arch/x86/physdev.c   | 11 +++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 5742dd1797..85eacd7d33 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -89,6 +89,11 @@ static long hvm_physdev_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !has_pirq(curr->domain) )
 return -ENOSYS;
 break;
+
+case PHYSDEVOP_pci_mmcfg_reserved:
+if ( !has_vpci(curr->domain) )
+return -ENOSYS;
+break;
 }
 
 if ( !curr->hcall_compat )
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 04425c064b..556810c126 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -507,10 +507,9 @@ static const struct hvm_mmio_ops vpci_mmcfg_ops = {
 .write = vpci_mmcfg_write,
 };
 
-int __hwdom_init register_vpci_mmcfg_handler(struct domain *d, paddr_t addr,
- unsigned int start_bus,
- unsigned int end_bus,
- unsigned int seg)
+int register_vpci_mmcfg_handler(struct domain *d, paddr_t addr,
+unsigned int start_bus, unsigned int end_bus,
+unsigned int seg)
 {
 struct hvm_mmcfg *mmcfg, *new = xmalloc(struct hvm_mmcfg);
 
@@ -535,9 +534,16 @@ int __hwdom_init register_vpci_mmcfg_handler(struct domain 
*d, paddr_t addr,
 if ( new->addr < mmcfg->addr + mmcfg->size &&
  mmcfg->addr < new->addr + new->size )
 {
+int ret = -EEXIST;
+
+if ( new->addr == mmcfg->addr &&
+ new->start_bus == mmcfg->start_bus &&
+ new->segment == mmcfg->segment &&
+ new->size == mmcfg->size )
+ret = 0;
 write_unlock(>arch.hvm_domain.mmcfg_lock);
 xfree(new);
-return -EEXIST;
+return ret;
 }
 
 if ( list_empty(>arch.hvm_domain.mmcfg_regions) )
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 380d36f6b9..984491c3dc 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -557,6 +557,17 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 ret = pci_mmcfg_reserved(info.address, info.segment,
  info.start_bus, info.end_bus, info.flags);
+if ( !ret && has_vpci(currd) )
+{
+/*
+ * For HVM (PVH) domains try to add the newly found MMCFG to the
+ * domain.
+ */
+ret = register_vpci_mmcfg_handler(currd, info.address,
+  info.start_bus, info.end_bus,
+  info.segment);
+}
+
 break;
 }
 
-- 
2.16.2


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 09/12] vpci/msi: add MSI handlers

2018-03-22 Thread Roger Pau Monne
Add handlers for the MSI control, address, data and mask fields in
order to detect accesses to them and set up the interrupts as requested
by the guest.

Note that the pending register is not trapped, and the guest can
freely read/write to it.
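
For background, the fields handled here live in the standard MSI capability;
a hedged sketch of the relevant offsets for the 64-bit, per-vector-masking
variant follows (per the PCI spec; the names are illustrative rather than
Xen's own):

    /* MSI capability layout (64-bit address + per-vector masking variant),
     * offsets relative to the capability position in config space. */
    #define MSI_CTRL     0x02 /* message control: enable, multi-message */
    #define MSI_ADDR_LO  0x04 /* message address, lower 32 bits */
    #define MSI_ADDR_HI  0x08 /* message address, upper 32 bits */
    #define MSI_DATA     0x0c /* message data */
    #define MSI_MASK     0x10 /* per-vector mask bits (trapped) */
    #define MSI_PENDING  0x14 /* per-vector pending bits (not trapped) */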

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
[IO]
Reviewed-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Paul Durrant 
---
Changes since v8:
 - Add a FIXME about the lack of testing and a comment regarding the
   lack of cleaning done in the init_msi error path.
 - Free msi struct when cleaning up if an init function failed.
 - Remove the 'error' label of init_msi, the caller will already
   perform the cleaning.

Changes since v7:
 - Don't store pci segment/bus on local variables.
 - Add an error label to init_msi.
 - Don't trap accesses to the PBA.
 - Fix msi_pending_bits_reg macro so it matches coding style.
 - Move the position of vectors in the vpci_msi struct.
 - Add a comment to clarify the expected state of vectors after
   pt_irq_create_bind and use XEN_DOMCTL_VMSI_X86_UNMASKED.

Changes since v6:
 - Use domain_spin_lock_irq_desc instead of open coding it.
 - Reduce the size of printed debug messages.
 - Constify domain in vpci_dump_msi.
 - Lock domlist_read_lock before iterating over the list of domains.
 - Make max_vectors and vectors uint8_t.
 - Drop the vpci_ prefix from the static functions in msi.c.
 - Turn the booleans in vpci_msi into bitfields.
 - Apply the mask bits to all vectors when enabling msi.
 - Remove the pos field.
 - Remove the usage of __msi_set_{enable/disable}.
 - Update the bindings when the message or data fields are updated.
 - Make vpci_msi_arch_disable return void, it wasn't returning any
   error.
 - Prevent the guest from writing to the pending bits field, it's read
   only as defined in the spec.
 - Add the must_check attribute to vpci_msi_arch_enable.

Changes since v5:
 - Update to new lock usage.
 - Change handlers to match the new type.
 - s/msi_flags/msi_gflags/, remove the local variables and use the new
   DOMCTL_VMSI_* defines.
 - Change the MSI arch function to take a vpci_msi instead of a
   vpci_arch_msi as parameter.
 - Fix the calculation of the guest vector for MSI injection to take
   into account the number of bits that can be modified.
 - Use INVALID_PIRQ everywhere.
 - Simplify exit path of vpci_msi_disable.
 - Remove the conditional when setting address64 and masking fields.
 - Add a process_pending_softirqs to the MSI dump loop.
 - Place the prototypes for the MSI arch-specific functions in
   xen/vpci.h.
 - Add parentheses around the INVALID_PIRQ definition.

Changes since v4:
 - Fix commit message.
 - Change the ASSERTs in vpci_msi_arch_mask into ifs.
 - Introduce INVALID_PIRQ.
 - Destroy the partially created bindings in case of failure in
   vpci_msi_arch_enable.
 - Just take the pcidevs lock once in vpci_msi_arch_disable.
 - Print an error message in case of failure of pt_irq_destroy_bind.
 - Make vpci_msi_arch_init return void.
 - Constify the arch parameter of vpci_msi_arch_print.
 - Use fixed instead of cpu for msi redirection.
 - Separate the header includes in vpci/msi.c between xen and asm.
 - Store the number of configured vectors even if MSI is not enabled
   and always return it in vpci_msi_control_read.
 - Fix/add comments in vpci_msi_control_write to clarify intended
   behavior.
 - Simplify usage of masks in vpci_msi_address_{upper_}write.
 - Add comment to vpci_msi_mask_{read/write}.
 - Don't use MASK_EXTR in vpci_msi_mask_write.
 - s/msi_offset/pos/ in vpci_init_msi.
 - Move control variable setup closer to it's usage.
 - Use d%d in vpci_dump_msi.
 - Fix printing of bitfield mask in vpci_dump_msi.
 - Fix definition of MSI_ADDR_REDIRECTION_MASK.
 - Shuffle the layout of vpci_msi to minimize gaps.
 - Remove the error label in vpci_init_msi.

Changes since v3:
 - Propagate changes from previous versions: drop xen_ prefix, drop
   return value from handlers, use the new vpci_val fields.
 - Use MASK_EXTR.
 - Remove the usage of GENMASK.
 - Add GFLAGS_SHIFT_DEST_ID and use it in msi_flags.
 - Add "arch" to the MSI arch specific functions.
 - Move the dumping of vPCI MSI information to dump_msi (key 'M').
 - Remove the guest_vectors field.
 - Allow the guest to change the number of active vectors without
   having to disable and enable MSI.
 - Check the number of active vectors when parsing the disable
   mask.
 - Remove the debug messages from vpci_init_msi.
 - Move the arch-specific part of the dump handler to x86/hvm/vmsi.c.
 - Use trylock 

Re: [Xen-devel] [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system

2018-03-22 Thread Julien Grall

Hi Andre,

On 03/22/2018 11:56 AM, Andre Przywara wrote:

Now that we have both the old VGIC prepared to cope with a sibling and
the code for the new VGIC in place, let's add a Kconfig option to enable
the new code and wire it into the Xen build system.
This will add a compile time option to use either the "old" or the "new"
VGIC.
At the moment this is restricted to a vGIC-v2. To make the build system
happy, we provide a temporary dummy implementation of
vgic_v3_setup_hw() to allow building for now.

Signed-off-by: Andre Przywara 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v12 04/12] pci: split code to size BARs from pci_add_device

2018-03-22 Thread Roger Pau Monne
So that it can be called from outside in order to get the size of regular PCI
BARs. This will be required in order to map the BARs from PCI devices into PVH
Dom0 p2m.
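
A hedged sketch of how an outside caller might use the newly exported helper
to walk a device's header BARs (the loop structure and flag handling mirror
the hunk below; anything not visible in the patch is an assumption):

    /* Sketch: size the six type-0 header BARs of a device. */
    static void size_header_bars(pci_sbdf_t sbdf)
    {
        unsigned int i;

        for ( i = 0; i < 6; ) /* 6 BARs in a type-0 header */
        {
            unsigned int pos = PCI_BASE_ADDRESS_0 + i * 4;
            uint32_t bar = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev,
                                           sbdf.func, pos);
            uint64_t addr, size;

            if ( (bar & PCI_BASE_ADDRESS_SPACE) !=
                 PCI_BASE_ADDRESS_SPACE_MEMORY )
            {
                i++; /* IO BARs are not handled by pci_size_mem_bar() */
                continue;
            }

            /* Returns 1 for a 32-bit BAR, 2 for a 64-bit one. */
            i += pci_size_mem_bar(sbdf, pos, &addr, &size,
                                  i == 5 ? PCI_BAR_LAST : 0);

            /* addr/size can now be used, e.g. to map the BAR into the p2m. */
        }
    }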

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
Changes since v11:
 - Fix initialization of sbdf with gcc 4.3.

Changes since v7:
 - Do not return error from pci_size_mem_bar in order to keep previous
   behavior.

Changes since v6:
 - Remove the vf and addr local variables.
 - Change the way flags are declared.
 - Move the last bool parameter to the flags field.

Changes since v5:
 - Introduce a flags field for pci_size_mem_bar.
 - Use pci_sbdf_t.

Changes since v4:
 - Restore printing whether the BAR is from a vf.
 - Make the psize pointer parameter not optional.
 - s/u64/uint64_t.
 - Remove some unneeded parentheses.
 - Assert the return value is never 0.
 - Use the newly introduced pci_sbdf_t type.

Changes since v3:
 - Rename function to size BARs to pci_size_mem_bar.
 - Change the parameters passed to the function. Pass the position and
   whether the BAR is the last one, instead of the (base, max_bars,
   *index) tuple.
 - Make the function return the number of BARs consumed (1 for 32b, 2
   for 64b BARs).
 - Change the dprintk back to printk.
 - Do not log another error message in pci_add_device in case
   pci_size_mem_bar fails.
---
 xen/drivers/passthrough/pci.c | 94 +++
 xen/include/xen/pci.h |  5 +++
 2 files changed, 65 insertions(+), 34 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e65c7faa6f..c0846e8ebb 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -603,6 +603,56 @@ static int iommu_add_device(struct pci_dev *pdev);
 static int iommu_enable_device(struct pci_dev *pdev);
 static int iommu_remove_device(struct pci_dev *pdev);
 
+unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
+  uint64_t *paddr, uint64_t *psize,
+  unsigned int flags)
+{
+uint32_t hi = 0, bar = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev,
+   sbdf.func, pos);
+uint64_t size;
+
+ASSERT((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_MEMORY);
+pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos, ~0);
+if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+ PCI_BASE_ADDRESS_MEM_TYPE_64 )
+{
+if ( flags & PCI_BAR_LAST )
+{
+printk(XENLOG_WARNING
+   "%sdevice %04x:%02x:%02x.%u with 64-bit %sBAR in last 
slot\n",
+   (flags & PCI_BAR_VF) ? "SR-IOV " : "", sbdf.seg, sbdf.bus,
+   sbdf.dev, sbdf.func, (flags & PCI_BAR_VF) ? "vf " : "");
+*psize = 0;
+return 1;
+}
+hi = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos + 4);
+pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos + 4, ~0);
+}
+size = pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos) &
+   PCI_BASE_ADDRESS_MEM_MASK;
+if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+ PCI_BASE_ADDRESS_MEM_TYPE_64 )
+{
+size |= (uint64_t)pci_conf_read32(sbdf.seg, sbdf.bus, sbdf.dev,
+  sbdf.func, pos + 4) << 32;
+pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos + 4, hi);
+}
+else if ( size )
+size |= (uint64_t)~0 << 32;
+pci_conf_write32(sbdf.seg, sbdf.bus, sbdf.dev, sbdf.func, pos, bar);
+size = -size;
+
+if ( paddr )
+*paddr = (bar & PCI_BASE_ADDRESS_MEM_MASK) | ((uint64_t)hi << 32);
+*psize = size;
+
+if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+ PCI_BASE_ADDRESS_MEM_TYPE_64 )
+return 2;
+
+return 1;
+}
+
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
const struct pci_dev_info *info, nodeid_t node)
 {
@@ -672,11 +722,13 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 unsigned int i;
 
 BUILD_BUG_ON(ARRAY_SIZE(pdev->vf_rlen) != PCI_SRIOV_NUM_BARS);
-for ( i = 0; i < PCI_SRIOV_NUM_BARS; ++i )
+for ( i = 0; i < PCI_SRIOV_NUM_BARS; )
 {
 unsigned int idx = pos + PCI_SRIOV_BAR + i * 4;
 u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
-u32 hi = 0;
+pci_sbdf_t sbdf = {
+.sbdf = PCI_SBDF3(seg, bus, devfn),
+};
 

Re: [Xen-devel] [PATCH v3 14/39] ARM: new VGIC: Add GICv2 world switch backend

2018-03-22 Thread Andre Przywara
This is a "patch to the patch" mentioned above, to make it clear what
changed:
We now take the desc lock in vgic_v2_fold_lr_state() when we are dealing
with a hardware IRQ. This is a bit complicated, because we have to obey
the existing locking order, so do our infamous "drop-take-retake" dance.
Also I print a message about using the new VGIC and fix that last
remaining "u32" usage.

Please note that I had to initialise "desc" to NULL because my compiler
(GCC 5.3) is not smart enough to see that we only use it with irq->hw
set and it's safe. Please let me know if it's me not being smart enough
here instead ;-)

Signed-off-by: Andre Przywara 
---
Hi,

will send a proper, merged v3a version of the patch separately.

Cheers,
Andre

 xen/arch/arm/vgic/vgic-v2.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
index 5516a8534f..3424a4a66f 100644
--- a/xen/arch/arm/vgic/vgic-v2.c
+++ b/xen/arch/arm/vgic/vgic-v2.c
@@ -43,6 +43,8 @@ void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t 
csize,
 gic_v2_hw_data.csize = csize;
 gic_v2_hw_data.vbase = vbase;
 gic_v2_hw_data.aliased_offset = aliased_offset;
+
+printk("Using the new VGIC implementation.\n");
 }
 
 /*
@@ -69,6 +71,8 @@ void vgic_v2_fold_lr_state(struct vcpu *vcpu)
 struct gic_lr lr_val;
 uint32_t intid;
 struct vgic_irq *irq;
+struct irq_desc *desc = NULL;
+bool have_desc_lock = false;
 
 gic_hw_ops->read_lr(lr, _val);
 
@@ -88,18 +92,30 @@ void vgic_v2_fold_lr_state(struct vcpu *vcpu)
 intid = lr_val.virq;
 irq = vgic_get_irq(vcpu->domain, vcpu, intid);
 
-spin_lock_irqsave(>irq_lock, flags);
+local_irq_save(flags);
+spin_lock(>irq_lock);
+
+/* The locking order forces us to drop and re-take the locks here. */
+if ( irq->hw )
+{
+spin_unlock(>irq_lock);
+
+desc = irq_to_desc(irq->hwintid);
+spin_lock(>lock);
+spin_lock(>irq_lock);
+
+/* This h/w IRQ should still be assigned to the virtual IRQ. */
+ASSERT(irq->hw && desc->irq == irq->hwintid);
+
+have_desc_lock = true;
+}
 
 /*
  * If a hardware mapped IRQ has been handled for good, we need to
  * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
  */
 if ( irq->hw && !lr_val.active && !lr_val.pending )
-{
-struct irq_desc *irqd = irq_to_desc(irq->hwintid);
-
-clear_bit(_IRQ_INPROGRESS, >status);
-}
+clear_bit(_IRQ_INPROGRESS, >status);
 
 /* Always preserve the active bit */
 irq->active = lr_val.active;
@@ -132,18 +148,19 @@ void vgic_v2_fold_lr_state(struct vcpu *vcpu)
  */
 if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
 {
-struct irq_desc *irqd;
-
 ASSERT(irq->hwintid >= VGIC_NR_PRIVATE_IRQS);
 
-irqd = irq_to_desc(irq->hwintid);
-irq->line_level = gic_read_pending_state(irqd);
+irq->line_level = gic_read_pending_state(desc);
 
 if ( !irq->line_level )
-gic_set_active_state(irqd, false);
+gic_set_active_state(desc, false);
 }
 
-spin_unlock_irqrestore(>irq_lock, flags);
+spin_unlock(>irq_lock);
+if ( have_desc_lock )
+spin_unlock(>lock);
+local_irq_restore(flags);
+
 vgic_put_irq(vcpu->domain, irq);
 }
 
@@ -184,7 +201,7 @@ void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq 
*irq, int lr)
 
 if ( vgic_irq_is_sgi(irq->intid) )
 {
-u32 src = ffs(irq->source);
+uint32_t src = ffs(irq->source);
 
 BUG_ON(!src);
 lr_val.virt.source = (src - 1);
-- 
2.14.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ

2018-03-22 Thread Andre Przywara
Hi,

On 22/03/18 01:51, Julien Grall wrote:
> Hi Andre,
> 
> On 03/21/2018 04:31 PM, Andre Przywara wrote:
>> When playing around with hardware mapped, level triggered virtual IRQs,
>> there is the need to explicitly set the active or pending state of an
>> interrupt at some point.
>> To prepare the GIC for that, we introduce a set_active_state() and a
>> set_pending_state() function to let the VGIC manipulate the state of
>> an associated hardware IRQ.
>> This takes care of properly setting the _IRQ_INPROGRESS bit.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>> Changelog v2 ... v3:
>> - rework setting _IRQ_INPROGRESS bit:
>>    - no change when changing active state
>>    - unconditional set/clear on changing pending state
>> - drop introduction of gicv[23]_peek_irq() (only needed in the next
>> patch now)
>>
>> Changelog v1 ... v2:
>> - properly set _IRQ_INPROGRESS bit
>> - add gicv[23]_peek_irq() (pulled in from later patch)
>> - move wrappers functions into gic.h
>>
>>   xen/arch/arm/gic-v2.c | 36 
>>   xen/arch/arm/gic-v3.c | 32 
>>   xen/include/asm-arm/gic.h | 24 
>>   3 files changed, 92 insertions(+)
>>
>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> index aa0fc6c1a1..d1f1578c05 100644
>> --- a/xen/arch/arm/gic-v2.c
>> +++ b/xen/arch/arm/gic-v2.c
>> @@ -243,6 +243,40 @@ static void gicv2_poke_irq(struct irq_desc *irqd,
>> uint32_t offset)
>>   writel_gicd(1U << (irqd->irq % 32), offset + (irqd->irq / 32) * 4);
>>   }
>>   +static void gicv2_set_active_state(struct irq_desc *irqd, bool active)
>> +{
>> +    ASSERT(spin_is_locked(>lock));
>> +
>> +    if ( active )
>> +    {
>> +    if ( test_bit(_IRQ_GUEST, >status) )
> 
> I don't understand why you only set/clear INPROGRESS bit for interrupt
> routed to guest. This will matter when releasing interrupt used by Xen
> (see release_irq).

D'oh, indeed! Seems like I am too focused on the _V_GIC these days ;-)

Fixed.

Cheers,
Andre.

> Note that I don't expect this helper to be call on Xen IRQ, but I think
> we should make
> 
> Other than same remark on GICv3 code, the pending implementation looks
> good to me now.
> 
> Cheers,
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
