Re: [Xen-devel] [PATCH RFC v1 56/74] xen/pvshim: add grant table operations

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 19:34,  wrote:
> On Mon, Jan 08, 2018 at 10:19:39AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06,  wrote:
>> > +{
>> > +struct gnttab_query_size op;
>> > +int rc;
>> > +
>> > +if ( unlikely(copy_from_guest(&op, uop, 1)) )
>> > +{
>> > +rc = -EFAULT;
>> > +break;
>> > +}
>> > +
>> > +rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
>> > +if ( rc )
>> > +break;
>> > +
>> > +if ( copy_to_guest(uop, &op, 1) )
>> 
>> __copy_to_guest() (assuming this copying in and out is necessary
>> in the first place).
> 
> I guess this could be bypassed by just using uop instead of op in the
> hypercall?

That's my impression, but you doing the copying made me assume
you might have found a case where things don't work without
copying.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen Project Spectre/Meltdown FAQ

2018-01-09 Thread Juergen Gross
On 10/01/18 04:58, Peter wrote:
> On 2018-01-09 15:04, Stefano Stabellini wrote:
>> On Sun, 7 Jan 2018, Marek Marczykowski-Górecki wrote:
>>> On Fri, Jan 05, 2018 at 07:05:56PM +, Andrew Cooper wrote:
>>> > On 05/01/18 18:16, Rich Persaud wrote:
>>> > >> On Jan 5, 2018, at 06:35, Lars Kurth wrote:
>>> > >> Linux’s KPTI series is designed to address SP3 only.  For Xen guests,
>>> > >> only 64-bit PV guests are affected by SP3. A KPTI-like approach was
>>> > >> explored initially, but required significant ABI changes.  
>>>
>>> Is some partial KPTI-like approach feasible? Like unmapping memory owned
>>> by other guests, but keeping Xen areas mapped? This will still allow
>>> leaking Xen memory, but there are very few secrets there (vCPUs state,
>>> anything else?), so overall impact will be much lower.
>>
>> +1
>>
> 
> I believe
> https://blog.xenproject.org/2018/01/04/xen-project-spectremeltdown-faq/
> is clear re VMs attacking/accessing the host/dom0/hypervisor and the
> mitigations for that.
> 
> However the page seems ambiguous about whether 64 bit VMs running as
> PVHv2 with host provided kernels are protected or not (from each VM's
> own processes).

From the hypervisor's point of view, PVHv2 uses exactly the same runtime
environment as HVM. So a guest running as PVHv2 needs a PTI-like approach
in its own kernel, just as an HVM guest does.

> Can the page be updated to be more explicit and perhaps describe how the
> VM kernel or how the PVHv2 virtualization provides that protection.  And
> ideally how that could be checked from the VM itself.  e.g. grep pti
> /proc/cpuinfo?

As this is really guest specific this information can't be provided by
Xen.

> e.g. the page says: "Guest kernels running in 64-bit PV mode are not
> directly vulnerable to attack using SP3, because 64-bit PV guests
> already run in a KPTI-like mode." but it does not mention PVHv2 for
> that.  Is it protected under PVHv2?  Does it depend on the kernel?  If
> so, what is the patchset/option/mechanism that protects the VM from its
> own processes?

This question should have been answered above already.


Juergen


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Juergen Gross
On 09/01/18 23:11, Hans van Kranenburg wrote:
> On 01/09/2018 07:22 PM, Rich Persaud wrote:
 On Jan 9, 2018, at 12:56, Stefano Stabellini  
 wrote:

 On Tue, 9 Jan 2018, Doug Goldstein wrote:
 On 1/9/18 11:33 AM, Jan Beulich wrote:
 On 09.01.18 at 18:23,  wrote:
>> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
>>  wrote:
> On Tue, 9 Jan 2018, George Dunlap wrote:
> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  
> wrote:
> On a similarly pragmatic note: would a variation of Anthony's vixen 
> patch
>> series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are 
>> currently 
>> documented as security-supported (Oct 2018 - July 2020).

 Hmm, Ian's mail seems to be focusing on the idea of checking in a
 non-polished series to 4.10, rather than exactly what the content of
 that series would be.

 In the IRL conversation that preceded this mail, the new short-term
 target we discussed was:
 1. A 4.10-based shim that could boot either under HVM or PVH
 2. A script that would take an existing PV config, and spit out a) a
 bootable ISO with the shim & whatever was needed, and b) a new config
 that would boot the same VM, but in HVM mode with the shim

 The script + a 4.10 shim binary *should* allow most PV guests to boot
 without any changes whatsoever for most older versions of Xen.

 There are a number of people for whom this won't work; I think we also
 need to provide a way to transparently change PV guests into PVshim
 guests.  But that will necessarily involve significant toolstack
 functionality, at which point you might as well backport PVH as well.
>>>
>>> Yes, there will be a number of people that won't be covered by this fix,
>>> including those that can't use HVM/PVH mode because VT-x isn't available
>>> at all in their environment. That is the only reason to run PV today.
>>> Providing a way to transparently change PV guests into PVshim guests
>>> won't cover any of these cases. A more complete workaround to SP3 is
>>> along the lines of https://marc.info/?l=xen-devel=151509740625690.
>>>
>>> That said, I realize that we are only trying to do the best we can in a
>>> very difficult situation, with very little time in our hands. I agree
>>> with Ian that we should commit something unpolished and only partially
>>> reviewed soon, even though it doesn't cover a good chunk of the userbase
>>> for one reason or another. Even if migration doesn't work, it will still
>>> help all that don't require it. It is only a partial fix by nature
>>> anyway.
>>
>> Can people be a bit more explicit about what they think should be done 
>> here?
>>
>> I'm happy to redirect effort to PVH shim if that's what the solution
>> is going to be.
>>
>> I obviously prefer the HVM approach as it works on a broad range of Xen 
>> versions
>> without modification but I'm keen to get something done quickly and
>> don't want to
>> waste effort.
>
> From what I've read today, I have no reason to believe the PVH
> shim won't work in HVM mode. How would the HVM-only approach
> be better in that case?
>
> Jan

 I feel like I should state the obvious here. It's tested over a large
 data set.
>>>
>>> Right: if we are going to commit something unpolished and unreviewed,
>>> let it be at least very well tested by the submitter. Honest question:
>>> how much more dev do we need on PVShim before we get it to similar
>>> levels of confidence?
>>
> 
>> Since the primary audience for security fixes are production
>> deployments of Xen where customer assets are at risk, is there an
>> estimate for the percentage/size of Xen deployments where PVH (not
>> only Xen 4.10) has already been deployed for production customers?
>> That could give other customers more confidence in deploying PVH in
>> production.
> +1
> 
> I have been hearing mostly-very-positive stories around, except for the
> missing pvgrub2 support. :)

https://lists.xen.org/archives/html/xen-devel/2017-11/msg01795.html

Juergen


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread pedro

On 2018-01-10 11:11, Hans van Kranenburg wrote:




Since the primary audience for security fixes are production
deployments of Xen where customer assets are at risk, is there an
estimate for the percentage/size of Xen deployments where PVH (not
only Xen 4.10) has already been deployed for production customers?
That could give other customers more confidence in deploying PVH in
production.

+1

I have been hearing mostly-very-positive stories around, except for the
missing pvgrub2 support. :)

But as a sysadmin who's also strongly considering to jump to 4.10 and
PVH I'd definitely like to hear more stories.

Hans


I deployed Xen 4.10 PVHv2 (from Xen 4.7/4.9 PV) to a good 
number of hosts and VMs.  Not seeing anything too alarming as yet.


Some issues per email thread 'DomU not starting under pvhv2' where some 
CPUs don't appear to work for me.  Seems like these older CPUs are 
giving us problems:

  cat /proc/cpuinfo | egrep -qai ' E4600 | E3110 | E5310 | E5320 | E5410 | E5420 | X3220 |Processor 4284|Processor 2212$| Q6600 | E4500 ' && echo "CPU not supporting PVHv2"
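The egrep check above could also be written as a small C helper, for instance inside a provisioning tool. This is only a sketch under stated assumptions: the function names are made up for illustration, the model tokens are copied from the pattern above, and the match is a plain case-insensitive substring test, so it is slightly looser than the original regex (it ignores the surrounding spaces and the `$` anchor).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Model tokens copied from the egrep pattern in this mail. */
static const char *const pvh_problem_models[] = {
    "E4600", "E3110", "E5310", "E5320", "E5410", "E5420", "X3220",
    "Processor 4284", "Processor 2212", "Q6600", "E4500",
};

/* Case-insensitive substring search (strcasestr() is not portable). */
int contains_nocase(const char *haystack, const char *needle)
{
    size_t nlen = strlen(needle);

    for (; *haystack; haystack++) {
        size_t i = 0;

        while (i < nlen && haystack[i] &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == nlen)
            return 1;
    }
    return nlen == 0;
}

/* Hypothetical helper: 1 if a "model name" line names a listed CPU. */
int cpu_model_is_listed(const char *model_line)
{
    size_t i;
    size_t n = sizeof(pvh_problem_models) / sizeof(pvh_problem_models[0]);

    for (i = 0; i < n; i++)
        if (contains_nocase(model_line, pvh_problem_models[i]))
            return 1;
    return 0;
}

/* Hypothetical helper: scan a cpuinfo-style file, 1 if any line matches. */
int cpuinfo_has_listed_model(const char *path)
{
    char line[512];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    while (!found && fgets(line, sizeof(line), f))
        found = cpu_model_is_listed(line);
    fclose(f);
    return found;
}
```

Calling `cpuinfo_has_listed_model("/proc/cpuinfo")` would then play the role of the one-liner; the list itself is only what was observed on these hosts, not an authoritative compatibility table.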


Still needing/wanting a solution for VMs currently running pv-grub or PV 
DomUs stuck on old distros that fail with pvh compatible kernels 
(4.??+).



Re: [Xen-devel] [PATCH v3 00/24] Vixen: A PV-in-HVM shim

2018-01-09 Thread Sarah Newman
On 01/09/2018 09:07 PM, Anthony Liguori wrote:
> On Tue, Jan 9, 2018 at 8:46 PM, Sarah Newman  wrote:
>> vixen-upstream-v2 hangs for us after dumping the e820 map. We're able to 
>> build and run vixen-upstream-v1.
> 
> Can you give me more details about your guest config?  I'm happy to take a
> look and debug.

The HVM pertinent items:

pae = 1
nx = 0
acpi = 1
viridian = 0
xen_platform_pci = 1
apic = 1
boot = 'dc'
sdl = 0
usb = 0
vnc = 0
nographic = 1
vga = "none"
serial = 'pty'

> 
>> My company needs serial input. It looks like that wasn't implemented. If so, 
>> and nobody else is working on patches to enable serial input, I believe
>> we can come up with something in the next day or so. We'll also disable 
>> switching to the xen dom0 console at the same time.
> 
> Yeah, it's not there yet.  There's something in liuw's pvshim branch
> that you might want to look at.

Assuming console input works on that branch, it looks fairly straightforward to 
integrate. Thanks.

--Sarah


Re: [Xen-devel] [PATCH v3 00/24] Vixen: A PV-in-HVM shim

2018-01-09 Thread Anthony Liguori
On Tue, Jan 9, 2018 at 8:46 PM, Sarah Newman  wrote:
> vixen-upstream-v2 hangs for us after dumping the e820 map. We're able to 
> build and run vixen-upstream-v1.

Can you give me more details about your guest config?  I'm happy to take a
look and debug.

> My company needs serial input. It looks like that wasn't implemented. If so, 
> and nobody else is working on patches to enable serial input, I believe
> we can come up with something in the next day or so. We'll also disable 
> switching to the xen dom0 console at the same time.

Yeah, it's not there yet.  There's something in liuw's pvshim branch
that you might want to look at.

Regards,

Anthony Liguori

> --Sarah


[Xen-devel] [qemu-mainline test] 117732: tolerable FAIL - PUSHED

2018-01-09 Thread osstest service owner
flight 117732 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/117732/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  14 saverestore-support-check  fail  like 117335
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop  fail  like 117335
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop  fail  like 117335
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check  fail  like 117335
 test-amd64-amd64-xl-qemuu-ws16-amd64  17 guest-stop  fail  like 117335
 test-armhf-armhf-libvirt-xsm  14 saverestore-support-check  fail  like 117335
 test-amd64-amd64-xl-pvhv2-intel  12 guest-start  fail  never pass
 test-amd64-amd64-xl-pvhv2-amd  12 guest-start  fail  never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt  13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  14 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd  17 debian-hvm-install/l1/l2  fail  never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  17 guest-stop  fail  never pass
 test-amd64-amd64-xl-qemuu-win10-i386  10 windows-install  fail  never pass
 test-amd64-i386-xl-qemuu-win10-i386  10 windows-install  fail  never pass

version targeted for testing:
 qemuu                4124ea4f5bd367ca6412fb2dfe7ac4d80e1504d9
baseline version:
 qemuu                eaefea537b476cb853e2edbdc68e969ec777e4bb

Last test of basis   117335  2017-12-19 12:50:51 Z   21 days
Failing since        117534  2018-01-02 17:34:07 Z    7 days    6 attempts
Testing same since   117732  2018-01-08 18:52:04 Z    1 days    1 attempts


People who touched revisions under test:
  Alex Bennée 
  Aurelien Jarno 
  Chen Hanxiao 
  Cornelia Huck 
  Daniel Henrique Barboza 
  Daniel P. Berrange 
  David Hildenbrand 
  Doug Gale 
  Dr. David Alan Gilbert 
  Ed Swierk 
  Ed Swierk via Qemu-devel 
  Edgar Kaziakhmedov 
  Evgeny Yakovlev 
  Fam Zheng 
  Jason Wang 

Re: [Xen-devel] Xen Project Spectre/Meltdown FAQ

2018-01-09 Thread Peter

On 2018-01-09 15:04, Stefano Stabellini wrote:

On Sun, 7 Jan 2018, Marek Marczykowski-Górecki wrote:

On Fri, Jan 05, 2018 at 07:05:56PM +, Andrew Cooper wrote:
> On 05/01/18 18:16, Rich Persaud wrote:
> >> On Jan 5, 2018, at 06:35, Lars Kurth wrote:
> >> Linux’s KPTI series is designed to address SP3 only.  For Xen guests,
> >> only 64-bit PV guests are affected by SP3. A KPTI-like approach was
> >> explored initially, but required significant ABI changes.  

Is some partial KPTI-like approach feasible? Like unmapping memory owned
by other guests, but keeping Xen areas mapped? This will still allow
leaking Xen memory, but there are very few secrets there (vCPUs state,
anything else?), so overall impact will be much lower.


+1



I believe 
https://blog.xenproject.org/2018/01/04/xen-project-spectremeltdown-faq/ 
is clear re VMs attacking/accessing the host/dom0/hypervisor and the 
mitigations for that.


However the page seems ambiguous about whether 64 bit VMs running as 
PVHv2 with host provided kernels are protected or not (from each VM's 
own processes).


Can the page be updated to be more explicit and perhaps describe how the 
VM kernel or how the PVHv2 virtualization provides that protection.  And 
ideally how that could be checked from the VM itself.  e.g. grep pti 
/proc/cpuinfo?


e.g. the page says: "Guest kernels running in 64-bit PV mode are not 
directly vulnerable to attack using SP3, because 64-bit PV guests 
already run in a KPTI-like mode." but it does not mention PVHv2 for 
that.  Is it protected under PVHv2?  Does it depend on the kernel?  If 
so, what is the patchset/option/mechanism that protects the VM from its 
own processes?





Re: [Xen-devel] [RFC PATCH 1/8] x86/domctl: introduce a pair of hypercall to set and get cpu topology

2018-01-09 Thread Chao Gao
On Tue, Jan 09, 2018 at 11:47:54PM +, Andrew Cooper wrote:
>On 08/01/18 04:01, Chao Gao wrote:
>> Define interface, structures and hypercalls for toolstack to build
>> cpu topology and for guest that will retrieve it [1].
>> Two subop hypercalls introduced by this patch:
>> XEN_DOMCTL_set_cpu_topology to define cpu topology information per domain
>> and XENMEM_get_cpu_topology to retrieve cpu topology information.
>>
>> [1]: during guest creation, this information helps hvmloader to build ACPI.
>>
>> Signed-off-by: Chao Gao 
>
>I'm sorry, but this is going in the wrong direction.  Details like this
>should be contained and communicated exclusively in the CPUID policy.
>
>Before the spectre/meltdown fire started, I had a prototype series
>introducing a toolstack interface for getting and setting a full CPUID
>policy at once, rather than piecewise.  I will be continuing with this

Is the new interface able to set the CPUID policy for each vCPU, rather
than for each domain as is currently the case? Otherwise I can't see how
to set the APIC_ID for each vCPU except by introducing a new interface.

>work once the dust settles.
>
>In particular, we should not have multiple ways of conveying the same
>information, or duplication of the same data inside the hypervisor.
>
>If you rearrange your series to put the struct cpuid_policy changes
>first, then patch 2 will become far more simple.  HVMLoader should
>derive its topology information from the CPUID instruction, just as is
>expected on native hardware.

Good point. It seems that in HVMLoader the BSP should boot the APs in a
broadcast fashion, collect the topology information via CPUID, and then
build the MADT/SRAT from it.

Thanks
Chao


Re: [Xen-devel] [PATCH 2/2] xen/gntdev: Fix partial gntdev_mmap() cleanup

2018-01-09 Thread Boris Ostrovsky



On 01/09/2018 07:10 AM, Ross Lagerwall wrote:

When cleaning up after a partially successful gntdev_mmap(), unmap the
successfully mapped grant pages otherwise Xen will kill the domain if
in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will
kill the process and emit "BUG: Bad page map in process" if Xen is in
release mode.

This is only needed when use_ptemod is true because gntdev_put_map()
will unmap grant pages itself when use_ptemod is false.

Signed-off-by: Ross Lagerwall 


Reviewed-by: Boris Ostrovsky 

although I wonder whether it may be possible to have gntdev_put_map() 
figure out whether to unmap the pages if use_ptemod is set.



---
  drivers/xen/gntdev.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index d3391a1..bd56653 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1071,8 +1071,10 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 out_unlock_put:
 	mutex_unlock(&priv->lock);
 out_put_map:
-	if (use_ptemod)
+	if (use_ptemod) {
 		map->vma = NULL;
+		unmap_grant_pages(map, 0, map->count);
+	}
 	gntdev_put_map(priv, map);
 	return err;
 }
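
The cleanup rule described in the commit message can be modelled outside the kernel. A minimal sketch, assuming a hypothetical `toy_map` with one flag per page; none of these names exist in gntdev, and the real driver unmaps through grant-table operations rather than by flipping flags.

```c
#define TOY_PAGES 4

/* Hypothetical stand-in for a grant mapping: one "mapped" flag per page. */
struct toy_map {
    int mapped[TOY_PAGES];
    int count;
};

/* Map pages in order; fail on reaching index fail_at (-1: never fail). */
int toy_map_pages(struct toy_map *map, int fail_at)
{
    int i;

    for (i = 0; i < map->count; i++) {
        if (i == fail_at)
            return -1;      /* partial success: pages [0, i) stay mapped */
        map->mapped[i] = 1;
    }
    return 0;
}

/* Unmap every page in [offset, offset + pages). */
void toy_unmap_pages(struct toy_map *map, int offset, int pages)
{
    int i;

    for (i = offset; i < offset + pages && i < map->count; i++)
        map->mapped[i] = 0;
}

/* mmap-like entry point: on error, undo whatever was already mapped. */
int toy_mmap(struct toy_map *map, int fail_at)
{
    int err = toy_map_pages(map, fail_at);

    if (err)
        toy_unmap_pages(map, 0, map->count);    /* the step the patch adds */
    return err;
}

/* Count the pages that are still mapped, to check the cleanup. */
int toy_mapped_count(const struct toy_map *map)
{
    int i, n = 0;

    for (i = 0; i < map->count; i++)
        n += map->mapped[i];
    return n;
}
```

Without the unmap call in the error path, a failure at index 2 would leave two pages mapped behind a mapping that is about to be released, which is the situation the commit message warns about.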




Re: [Xen-devel] [PATCH 1/2] xen/gntdev: Fix off-by-one error when unmapping with holes

2018-01-09 Thread Boris Ostrovsky



On 01/09/2018 07:10 AM, Ross Lagerwall wrote:

If the requested range has a hole, the calculation of the number of
pages to unmap is off by one. Fix it.

Signed-off-by: Ross Lagerwall 


Reviewed-by: Boris Ostrovsky 
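
The patch body is not quoted in this mail, so the following is not the driver's code. It is a hedged sketch of hole-aware unmapping in general, assuming that an already-unmapped entry is marked with a handle of -1; `toy_unmap_with_holes`, `toy_unmap_range` and `toy_unmap_log` are hypothetical names. The point it illustrates is that the length of each contiguous run must count every valid entry before the hole, with no extra adjustment by one.

```c
#define TOY_INVALID_HANDLE (-1)

/* Hypothetical record of what was passed to the low-level unmap. */
struct toy_unmap_log {
    int calls;      /* number of contiguous ranges unmapped */
    int pages;      /* total pages unmapped across all ranges */
};

/* Stand-in for the low-level unmap of one hole-free range. */
static void toy_unmap_range(int *handles, int offset, int range,
                            struct toy_unmap_log *log)
{
    int i;

    for (i = 0; i < range; i++)
        handles[offset + i] = TOY_INVALID_HANDLE;
    log->calls++;
    log->pages += range;
}

/* Unmap [offset, offset + pages), skipping entries that are holes. */
void toy_unmap_with_holes(int *handles, int offset, int pages,
                          struct toy_unmap_log *log)
{
    while (pages > 0) {
        int range = 0;

        /* Skip entries that are already unmapped. */
        while (pages > 0 && handles[offset] == TOY_INVALID_HANDLE) {
            offset++;
            pages--;
        }

        /* Count the run of valid entries up to, not including, a hole. */
        while (range < pages &&
               handles[offset + range] != TOY_INVALID_HANDLE)
            range++;

        if (range > 0)
            toy_unmap_range(handles, offset, range, log);
        offset += range;
        pages -= range;
    }
}
```

With handles `{10, 11, -1, 12}` this issues two range unmaps covering three pages in total; shortening the first run by one would leave entry 1 mapped until a later pass.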


Re: [Xen-devel] [RFC PATCH 1/8] x86/domctl: introduce a pair of hypercall to set and get cpu topology

2018-01-09 Thread Chao Gao
On Tue, Jan 09, 2018 at 12:18:13PM -0500, Daniel De Graaf wrote:
>On 01/09/2018 04:06 AM, Chao Gao wrote:
>> On Mon, Jan 08, 2018 at 01:14:44PM -0500, Daniel De Graaf wrote:
>> > On 01/07/2018 11:01 PM, Chao Gao wrote:
>> > > Define interface, structures and hypercalls for toolstack to build
>> > > cpu topology and for guest that will retrieve it [1].
>> > > Two subop hypercalls introduced by this patch:
>> > > XEN_DOMCTL_set_cpu_topology to define cpu topology information per domain
>> > > and XENMEM_get_cpu_topology to retrieve cpu topology information.
>> > > 
>> > > [1]: during guest creation, this information helps hvmloader to build 
>> > > ACPI.
>> > > 
>> > > Signed-off-by: Chao Gao 
>> > 
>> > When adding new XSM controls for use by device models, you also
>> > need to add the permissions to the device_model macro defined in
>> > tools/flask/policy/modules/xen.if.  If domains need to call this
>> > function on themselves (is this only true for get?), you will also
>> > need to add it to declare_domain_common.
>> > 
>> 
>> Hi, Daniel.
>> 
>> Yes. XENMEM_get_cpu_topology will be called by the domain itself.
>> And both get and set will be called by dom0 when creating one domain.
>> So I need:
>> 1. add *set* and *get* to create_domain_common.
>> 2. add *set* to declare_domain_common.
>> 
>> Is it right?
>> 
>> Thanks
>> Chao
>
>It sounds like you need to add get to declare_domain_common (not set)
>because the domain only needs to invoke this on itself.  If the device
>model doesn't need to use these hypercalls (would guest cpu hotplug or
>similar things need them?), then that's all you need to add.

Got it. I will first determine whether the device model needs these
hypercalls. If so, I will change the device_model macro accordingly.

Thanks
chao


[Xen-devel] [qemu-upstream-unstable test] 117731: trouble: broken/fail/pass

2018-01-09 Thread osstest service owner
flight 117731 qemu-upstream-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/117731/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-amd64-pvgrub  broken
 test-amd64-i386-libvirt  broken
 test-amd64-i386-libvirt  4 host-install(4)  broken  REGR. vs. 116133

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-amd64-pvgrub  4 host-install(4)  broken  like 115739
 test-armhf-armhf-libvirt  14 saverestore-support-check  fail  like 116133
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop  fail  like 116133
 test-armhf-armhf-libvirt-xsm  14 saverestore-support-check  fail  like 116133
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop  fail  like 116133
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check  fail  like 116133
 test-amd64-amd64-xl-pvhv2-intel  12 guest-start  fail  never pass
 test-amd64-amd64-libvirt  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-pvhv2-amd  12 guest-start  fail  never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl  14 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd  17 debian-hvm-install/l1/l2  fail  never pass
 test-armhf-armhf-xl-cubietruck  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  17 guest-stop  fail  never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  17 guest-stop  fail  never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check  fail  never pass
 test-amd64-amd64-xl-qemuu-win10-i386  10 windows-install  fail  never pass
 test-amd64-i386-xl-qemuu-win10-i386  10 windows-install  fail  never pass

version targeted for testing:
 qemuu                2b033e396f4fa0981bae1213cdacd15775655a97
baseline version:
 qemuu                b79708a8ed1b3d18bee67baeaf33b3fa529493e2

Last test of basis   116133  2017-11-13 06:47:58 Z   57 days
Testing same since   117731  2018-01-08 18:12:06 Z    1 days    1 attempts


People who touched revisions under test:
  "Daniel P. Berrange" 
  Aaron Lindsay 
  Alberto Garcia 
  Aleksandr Bezzubikov 
  Alex Bennée 
  Alex Williamson 
  Alexander Graf 
  Alexey Kardashevskiy 
  Alexey Perevalov 
  Alistair Francis 

[Xen-devel] [PATCH v3 18/24] vixen: Introduce ECS_PROXY for event channel proxying

2018-01-09 Thread Anthony Liguori
From: Jan H. Schönherr 

Previously, we would keep proxied event channels around as
ECS_INTERDOMAIN channels. This works for most things, but has the
problem that EVTCHNOP_status is broken, and that EVTCHNOP_close does
not mark an event channel as free.

Introduce a separate ECS_PROXY to denote event channels that are
forwarded to the hypervisor we're running under.

This makes the code more readable in many places.

Signed-off-by: Jan H. Schönherr 
Signed-off-by: Anthony Liguori 
---
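A rough illustration of why a dedicated state helps, written as a toy model rather than as hypervisor code. All `toy_*` names and the numeric status values are hypothetical; only the state names mirror the patch. The sketch shows the two behaviours the description mentions: a status query on a proxied channel is answered by the outer hypervisor, and closing a proxied channel marks it free.

```c
/* Toy channel states; only the names mirror the patch. */
enum toy_ecs { TOY_ECS_FREE, TOY_ECS_INTERDOMAIN, TOY_ECS_PROXY };

struct toy_evtchn {
    enum toy_ecs state;
    int outer_status;   /* what the outer hypervisor would report */
};

/* Stand-in for forwarding a status query to the outer hypervisor. */
static int toy_outer_status(const struct toy_evtchn *chn)
{
    return chn->outer_status;
}

/* Status query: proxied channels are answered by the outer hypervisor. */
int toy_evtchn_status(const struct toy_evtchn *chn, int *status)
{
    switch ( chn->state )
    {
    case TOY_ECS_FREE:
        *status = 0;
        return 0;
    case TOY_ECS_INTERDOMAIN:
        *status = 1;            /* local view only */
        return 0;
    case TOY_ECS_PROXY:
        *status = toy_outer_status(chn);
        return 0;
    }
    return -1;
}

/* Close: a proxied channel has no local peer to tear down. */
int toy_evtchn_close(struct toy_evtchn *chn)
{
    switch ( chn->state )
    {
    case TOY_ECS_FREE:
        return -1;              /* nothing to close */
    case TOY_ECS_INTERDOMAIN:
        /* A real implementation would also disconnect the remote end. */
        chn->state = TOY_ECS_FREE;
        return 0;
    case TOY_ECS_PROXY:
        chn->state = TOY_ECS_FREE;
        return 0;
    }
    return -1;
}
```

Keeping the proxied case out of the interdomain branch means neither path has to guess whether a remote end exists locally.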
 xen/common/event_channel.c | 87 --
 xen/include/xen/event.h|  3 ++
 xen/include/xen/sched.h|  1 +
 3 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c69f9db..85ff7e0 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define ERROR_EXIT(_errno)  \
 do {\
@@ -156,25 +157,25 @@ static void free_evtchn_bucket(struct domain *d, struct evtchn *bucket)
 xfree(bucket);
 }
 
-static int get_free_port(struct domain *d)
+static int allocate_port(struct domain *d, int port)
 {
 struct evtchn *chn;
 struct evtchn **grp;
-int port;
 
 if ( d->is_dying )
 return -EINVAL;
 
-for ( port = 0; port_is_valid(d, port); port++ )
+if ( port_is_valid(d, port) )
 {
 if ( port > d->max_evtchn_port )
 return -ENOSPC;
 if ( evtchn_from_port(d, port)->state == ECS_FREE
  && !evtchn_port_is_busy(d, port) )
 return port;
+return -EINVAL;
 }
 
-if ( port == d->max_evtchns || port > d->max_evtchn_port )
+if ( port >= d->max_evtchns || port > d->max_evtchn_port )
 return -ENOSPC;
 
 if ( !group_from_port(d, port) )
@@ -185,16 +186,59 @@ static int get_free_port(struct domain *d)
 group_from_port(d, port) = grp;
 }
 
-chn = alloc_evtchn_bucket(d, port);
-if ( !chn )
-return -ENOMEM;
-bucket_from_port(d, port) = chn;
+while ( d->valid_evtchns <= port )
+{
+chn = alloc_evtchn_bucket(d, d->valid_evtchns);
+if ( !chn )
+return -ENOMEM;
+bucket_from_port(d, d->valid_evtchns) = chn;
 
-write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
+write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
+}
 
 return port;
 }
 
+static int get_free_port(struct domain *d)
+{
+int port;
+
+for ( port = 0; port_is_valid(d, port); port++ )
+{
+if ( port > d->max_evtchn_port )
+return -ENOSPC;
+if ( evtchn_from_port(d, port)->state == ECS_FREE
+ && !evtchn_port_is_busy(d, port) )
+break;
+}
+
+return allocate_port(d, port);
+}
+
+int evtchn_alloc_proxy(struct domain *d, int port, u8 ecs)
+{
+struct evtchn *chn;
+int rc;
+
+if ( !is_vixen() )
+return -ENOSYS;
+
+rc = allocate_port(d, port);
+if ( rc < 0 )
+return rc;
+
+chn = evtchn_from_port(d, port);
+spin_lock(&chn->lock);
+chn->state = ECS_PROXY;
+evtchn_port_init(d, chn);
+
+if ( ecs == ECS_INTERDOMAIN )
+evtchn_port_set_pending(d, chn->notify_vcpu_id, chn);
+spin_unlock(&chn->lock);
+
+return 0;
+}
+
 static void free_evtchn(struct domain *d, struct evtchn *chn)
 {
 /* Clear pending event to avoid unexpected behavior on re-bind. */
@@ -628,6 +672,9 @@ static long evtchn_close(struct domain *d1, int port1, bool_t guest)
 
 goto out;
 
+case ECS_PROXY:
+break;
+
 default:
 BUG();
 }
@@ -690,6 +737,14 @@ int evtchn_send(struct domain *ld, unsigned int lport)
 case ECS_UNBOUND:
 /* silently drop the notification */
 break;
+case ECS_PROXY:
+ret = -EINVAL;
+if ( is_vixen() )
+{
+struct evtchn_send send = { .port = lport };
+ret = HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
+}
+break;
 default:
 ret = -EINVAL;
 }
@@ -892,6 +947,10 @@ static long evtchn_status(evtchn_status_t *status)
 case ECS_IPI:
 status->status = EVTCHNSTAT_ipi;
 break;
+case ECS_PROXY:
+BUG_ON(!is_vixen());
+rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, status);
+break;
 default:
 BUG();
 }
@@ -944,6 +1003,14 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
 case ECS_INTERDOMAIN:
 chn->notify_vcpu_id = vcpu_id;
 break;
+case ECS_PROXY:
+if ( is_vixen() && vixen_has_per_cpu_notifications() )
+{
+struct evtchn_bind_vcpu bind = { .port = port, .vcpu = vcpu_id };
+HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, 

[Xen-devel] [PATCH v3 09/24] vixen: modify the e820 table to advertise HVM special pages as RAM

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

In order to be able to assign the Xenstore page into the Vixen guest,
we need struct page_info's to exist.  We do this by modifying the
e820 table early in boot and then using the badpages handling to
prevent these pages from being added to the xenheap.

Since these pages exist in a somewhat weird state in Xen, we need
to relax permission checking too in order to be able to assign them
to the guest.
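
As a rough illustration of the e820 step described above, the sketch below
retypes reserved entries inside a given range as RAM.  It is a simplified
stand-in and not the code in this patch: the struct, the constants and
demo_retype_range() are hypothetical names, and unlike a real implementation
it does not split entries that only partially overlap the range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes for this sketch only. */
#define DEMO_E820_RAM      1
#define DEMO_E820_RESERVED 2

/* Hypothetical, simplified memory map entry. */
struct demo_e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Retype every entry of type 'from' that lies entirely inside [start, end).
 * Returns the number of entries changed.  Entries that only partially
 * overlap the range are left untouched in this sketch.
 */
static unsigned int demo_retype_range(struct demo_e820_entry *map, size_t nr,
                                      uint64_t start, uint64_t end,
                                      uint32_t from, uint32_t to)
{
    unsigned int changed = 0;
    size_t i;

    for ( i = 0; i < nr; i++ )
    {
        uint64_t e_start = map[i].addr;
        uint64_t e_end = map[i].addr + map[i].size;

        if ( map[i].type != from || e_start < start || e_end > end )
            continue;

        map[i].type = to;
        changed++;
    }

    return changed;
}
```

Once the entries are typed as RAM they get struct page_info's like any other
page, which is why the patch additionally has to keep them out of the heap.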

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - lay the groundwork to use hvm_info_table to determine range
---
 xen/arch/x86/e820.c   | 11 +++
 xen/arch/x86/guest/vixen.c|  9 +
 xen/arch/x86/mm.c |  3 ++-
 xen/common/page_alloc.c   |  7 +++
 xen/include/asm-x86/guest/vixen.h |  2 ++
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 7c572ba..78ab8db 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * opt_mem: Limit maximum address of physical RAM.
@@ -698,6 +699,16 @@ unsigned long __init init_e820(const char *str, struct e820map *raw)
 print_e820_memory_map(raw->map, raw->nr_map);
 }
 
+if ( is_vixen() )
+{
+unsigned long start_pfn, end_pfn;
+
+vixen_get_reserved_mem(&start_pfn, &end_pfn);
+
+/* Pretend that passed through special pages are RAM */
+e820_change_range_type(raw, start_pfn << XEN_PAGE_SHIFT,
+   end_pfn << XEN_PAGE_SHIFT, E820_RESERVED, E820_RAM);
+}
 machine_specific_memory_setup(raw);
 
 printk("%s RAM map:\n", str);
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index c0a81dd..cacbe69 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -23,6 +23,7 @@
 
 static int in_vixen;
 static int vixen_domid = 1;
+static uint32_t vixen_reserved_mem_pgstart = 0xfeff;
 
 integer_param("vixen_domid", vixen_domid);
 
@@ -35,3 +36,11 @@ int vixen_get_domid(void)
 {
 return vixen_domid;
 }
+
+void vixen_get_reserved_mem(unsigned long *start_pfn, unsigned long *end_pfn)
+{
+*start_pfn = vixen_reserved_mem_pgstart >> XEN_PAGE_SHIFT;
+
+/* This is part of the Xen ABI */
+*end_pfn   = 0x10;
+}
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a56f875..f0260ea 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -122,6 +122,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -945,7 +946,7 @@ get_page_from_l1e(
 case 0:
 break;
 case 1:
-if ( !is_hardware_domain(l1e_owner) )
+if ( !is_vixen() && !is_hardware_domain(l1e_owner) )
 break;
 /* fallthrough */
 case -1:
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index c0c2d82..35a433d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -303,6 +303,13 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
 badpage++;
 }
 }
+
+if ( is_vixen() ) {
+unsigned long start_pfn, end_pfn;
+
+vixen_get_reserved_mem(&start_pfn, &end_pfn);
+bootmem_region_zap(start_pfn, end_pfn);
+}
 #endif
 
 /* Check new pages against the bad-page list. */
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index 4e80b76..fb8e871 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -72,4 +72,6 @@ bool is_vixen(void);
 
 int vixen_get_domid(void);
 
+void vixen_get_reserved_mem(unsigned long *start_pfn, unsigned long *end_pfn);
+
 #endif
-- 
1.9.1



[Xen-devel] [PATCH v3 16/24] vixen: pass grant table operations through to the outer Xen

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

The grant table is a region of guest memory that contains GMFNs
which in PV are MFNs but are PFNs in HVM.  Since a Vixen guest MFN
is an HVM PFN, we can pass this table directly through to the outer
Xen which cuts down considerably on overhead.

We do not forward most of the hypercalls since we only intend Vixen
to be used for normal guests, not driver domains.
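
The dispatch idea can be sketched in isolation as follows.  This is only an
illustration under assumptions: the command enum, demo_do_grant_table_op()
and demo_outer_hypervisor() are hypothetical stand-ins for the real hypercall
plumbing, and the forwarding callback simply records what it was asked to do.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers for this sketch only. */
enum demo_gnttab_cmd {
    DEMO_SETUP_TABLE,
    DEMO_QUERY_SIZE,
    DEMO_MAP_GRANT_REF,
};

typedef long (*demo_forward_fn)(unsigned int cmd, void *arg,
                                unsigned int count);

/* Records the last command that reached the (pretend) outer hypervisor. */
static int demo_last_forwarded = -1;

/* Stand-in for the hypercall into the outer hypervisor. */
static long demo_outer_hypervisor(unsigned int cmd, void *arg,
                                  unsigned int count)
{
    (void)arg;
    (void)count;
    demo_last_forwarded = (int)cmd;
    return 0;
}

/*
 * Forward only the operations a normal guest needs; everything else is
 * refused with -ENOSYS instead of being emulated locally.
 */
static long demo_do_grant_table_op(unsigned int cmd, void *arg,
                                   unsigned int count,
                                   demo_forward_fn forward)
{
    switch ( cmd )
    {
    case DEMO_SETUP_TABLE:
    case DEMO_QUERY_SIZE:
        if ( count != 1 )
            return -EINVAL;
        return forward(cmd, arg, count);

    default:
        return -ENOSYS;
    }
}
```

Keeping the allowlist in one dispatch function means the existing call sites
do not need to change, which matches the v1 -> v2 note below.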

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - move to using reserved memory space for grant table instead of heap
 - use a dispatch function instead of modifying all calls
---
 xen/arch/x86/guest/vixen.c |   4 ++
 xen/common/grant_table.c   | 101 +
 2 files changed, 105 insertions(+)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 7c886a2..2437c92 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -22,10 +22,14 @@
 #include 
 #include 
 #include 
+#include 
+
+#define PCI_DEVICE_ID_XENSOURCE_PLATFORM   0x0001
 
 #define X86_HVM_END_SPECIAL_REGION  0xff000u
 
 #define SHARED_INFO_PFN    (X86_HVM_END_SPECIAL_REGION + 0)
+#define GRANT_TABLE_PFN_0  (X86_HVM_END_SPECIAL_REGION + 1)
 
 static int in_vixen;
 static int vixen_domid = 1;
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 250450b..60a7941 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Per-domain grant information. */
 struct grant_table {
@@ -1801,6 +1802,56 @@ grant_table_init(struct domain *d, struct grant_table *gt,
 }
 
 static long
+vixen_gnttab_setup_table(
+XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
+{
+long rc;
+
+struct gnttab_setup_table op;
+xen_pfn_t *frame_list = NULL;
+XEN_GUEST_HANDLE(xen_pfn_t) old_frame_list;
+
+if ( count != 1 )
+return -EINVAL;
+
+if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
+{
+gdprintk(XENLOG_INFO, "Fault while reading gnttab_setup_table_t.\n");
+return -EFAULT;
+}
+
+if ( op.nr_frames > 0 ) {
+frame_list = xzalloc_array(xen_pfn_t, op.nr_frames);
+if ( frame_list == NULL )
+return -ENOMEM;
+}
+
+old_frame_list = op.frame_list;
+op.frame_list.p = frame_list;
+
+rc = HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &op, count);
+op.frame_list = old_frame_list;
+
+if ( rc >= 0 ) {
+if ( op.status == 0 && op.nr_frames &&
+ copy_to_guest(old_frame_list, frame_list, op.nr_frames) != 0 ) {
+rc = -EFAULT;
+goto out;
+}
+
+if ( unlikely(copy_to_guest(uop, &op, 1) != 0) ) {
+rc = -EFAULT;
+goto out;
+}
+}
+
+ out:
+xfree(frame_list);
+
+return rc;
+}
+
+static long
 gnttab_setup_table(
 XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count,
 unsigned int limit_max)
@@ -1892,6 +1943,26 @@ gnttab_setup_table(
 }
 
 static long
+vixen_gnttab_query_size(
+XEN_GUEST_HANDLE_PARAM(gnttab_query_size_t) uop, unsigned int count)
+{
+struct gnttab_query_size op;
+int rc;
+
+if ( count != 1 )
+return -EINVAL;
+
+if ( unlikely(copy_from_guest(&op, uop, 1) != 0) )
+return -EFAULT;
+
+rc = HYPERVISOR_grant_table_op(GNTTABOP_query_size, &op, count);
+if ( rc == 0 && unlikely(__copy_to_guest(uop, &op, 1)) )
+rc = -EFAULT;
+
+return rc;
+}
+
+static long
 gnttab_query_size(
 XEN_GUEST_HANDLE_PARAM(gnttab_query_size_t) uop, unsigned int count)
 {
@@ -3311,6 +3382,33 @@ gnttab_cache_flush(XEN_GUEST_HANDLE_PARAM(gnttab_cache_flush_t) uop,
 return 0;
 }
 
+static long
+vixen_do_grant_table_op(
+unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop, unsigned int count)
+{
+long rc;
+
+rc = -EFAULT;
+switch ( cmd )
+{
+case GNTTABOP_setup_table:
+rc = vixen_gnttab_setup_table(
+guest_handle_cast(uop, gnttab_setup_table_t), count);
+break;
+
+case GNTTABOP_query_size:
+rc = vixen_gnttab_query_size(
+guest_handle_cast(uop, gnttab_query_size_t), count);
+break;
+
+default:
+rc = -ENOSYS;
+break;
+}
+
+return rc;
+ }
+
 long
 do_grant_table_op(
 unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop, unsigned int count)
@@ -3324,6 +3422,9 @@ do_grant_table_op(
 if ( (cmd &= GNTTABOP_CMD_MASK) != GNTTABOP_cache_flush && opaque_in )
 return -EINVAL;
 
+if ( is_vixen() )
+return vixen_do_grant_table_op(cmd, uop, count);
+
 rc = -EFAULT;
 switch ( cmd )
 {
-- 
1.9.1



[Xen-devel] [PATCH v3 13/24] vixen: Use SCHEDOP_shutdown to shutdown the machine

2018-01-09 Thread Anthony Liguori
From: Jan H. Schönherr 

While hwdom_shutdown() is able to reboot the system, it fails to
properly power it off. With SCHEDOP_shutdown, we delegate the problem
to the parent hypervisor.

Signed-off-by: Jan H. Schönherr 
---
 xen/common/domain.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4d679e..ede377c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Linux config option: propageted to domain0 */
 /* xen_processor_pmbits: xen control Cx, Px, ... */
@@ -693,6 +694,17 @@ void __domain_crash_synchronous(void)
 }
 
 
+static void vixen_shutdown(u8 reason)
+{
+struct sched_shutdown sched_shutdown = { .reason = reason };
+
+if (!opt_noreboot)
+HYPERVISOR_sched_op(SCHEDOP_shutdown, &sched_shutdown);
+
+/* Fallback, in case the hypercall fails */
+hwdom_shutdown(reason);
+}
+ 
 void domain_shutdown(struct domain *d, u8 reason)
 {
 struct vcpu *v;
@@ -703,6 +715,8 @@ void domain_shutdown(struct domain *d, u8 reason)
 d->shutdown_code = reason;
 reason = d->shutdown_code;
 
+if ( is_vixen() )
+vixen_shutdown(reason);
 if ( is_hardware_domain(d) )
 hwdom_shutdown(reason);
 
-- 
1.9.1



[Xen-devel] [PATCH v3 24/24] xen/pvshim: memory hotplug

2018-01-09 Thread Anthony Liguori
From: Roger Pau Monne 

Signed-off-by: Roger Pau Monné 
Signed-off-by: Anthony Liguori 
---
v2 -> v3
 - adapted for Vixen
---
 xen/arch/x86/guest/vixen.c| 110 ++
 xen/common/memory.c   |  14 +
 xen/include/asm-x86/guest/vixen.h |   4 ++
 3 files changed, 128 insertions(+)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 7e367ef..4d59cd8 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -48,6 +48,9 @@ static volatile struct xencons_interface *vixen_xencons_iface;
 static uint16_t vixen_xencons_port;
 static spinlock_t vixen_xencons_lock;
 
+static PAGE_LIST_HEAD(balloon);
+static DEFINE_SPINLOCK(balloon_lock);
+
 integer_param("vixen_domid", vixen_domid);
 boolean_param("vixen_ptver", vixen_ptver);
 
@@ -558,3 +561,110 @@ vixen_transform(struct domain *dom0,
share_xen_page_with_guest(mfn_to_page(xatp.gpfn), dom0, 
XENSHARE_writable);
 }
 }
+
+static unsigned long batch_memory_op(int cmd, struct page_list_head *list)
+{
+struct xen_memory_reservation xmr = {
+.domid = DOMID_SELF,
+};
+unsigned long pfns[64];
+struct page_info *pg;
+unsigned long done = 0;
+
+set_xen_guest_handle(xmr.extent_start, pfns);
+page_list_for_each ( pg, list )
+{
+pfns[xmr.nr_extents++] = page_to_mfn(pg);
+if ( xmr.nr_extents == ARRAY_SIZE(pfns) || !page_list_next(pg, list) )
+{
+long nr = xen_hypercall_memory_op(cmd, &xmr);
+
+done += nr > 0 ? nr : 0;
+if ( nr != xmr.nr_extents )
+break;
+xmr.nr_extents = 0;
+}
+}
+
+return done;
+}
+
+void vixen_online_memory(unsigned int nr, unsigned int order)
+{
+struct page_info *page, *tmp;
+PAGE_LIST_HEAD(list);
+
+spin_lock(&balloon_lock);
+page_list_for_each_safe ( page, tmp, &balloon )
+{
+if ( page->v.free.order != order )
+continue;
+
+page_list_del(page, &balloon);
+page_list_add_tail(page, &list);
+if ( !--nr )
+break;
+}
+spin_unlock(&balloon_lock);
+
+if ( nr )
+gprintk(XENLOG_WARNING,
+"failed to allocate %u extents of order %u for onlining\n",
+nr, order);
+
+nr = batch_memory_op(XENMEM_populate_physmap, &list);
+while ( nr-- )
+{
+BUG_ON((page = page_list_remove_head(&list)) == NULL);
+free_domheap_pages(page, order);
+}
+
+if ( !page_list_empty(&list) )
+{
+gprintk(XENLOG_WARNING,
+"failed to online some of the memory regions\n");
+spin_lock(&balloon_lock);
+while ( (page = page_list_remove_head(&list)) != NULL )
+page_list_add_tail(page, &balloon);
+spin_unlock(&balloon_lock);
+}
+}
+
+void vixen_offline_memory(unsigned int nr, unsigned int order)
+{
+struct page_info *page;
+PAGE_LIST_HEAD(list);
+
+while ( nr-- )
+{
+page = alloc_domheap_pages(NULL, order, 0);
+if ( !page )
+break;
+
+page_list_add_tail(page, &list);
+page->v.free.order = order;
+}
+
+if ( nr + 1 )
+gprintk(XENLOG_WARNING,
+"failed to reserve %u extents of order %u for offlining\n",
+nr + 1, order);
+
+
+nr = batch_memory_op(XENMEM_decrease_reservation, &list);
+spin_lock(&balloon_lock);
+while ( nr-- )
+{
+BUG_ON((page = page_list_remove_head(&list)) == NULL);
+page_list_add_tail(page, &balloon);
+}
+spin_unlock(&balloon_lock);
+
+if ( !page_list_empty(&list) )
+{
+gprintk(XENLOG_WARNING,
+"failed to offline some of the memory regions\n");
+while ( (page = page_list_remove_head(&list)) != NULL )
+free_domheap_pages(page, order);
+}
+}
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 5a1508a..5a7ecf0 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -29,6 +29,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_X86
+#include 
+#endif
+
 struct memop_args {
 /* INPUT */
 struct domain *domain; /* Domain to be affected. */
@@ -993,6 +997,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return start_extent;
 }
 
+#ifdef CONFIG_X86
+if ( is_vixen() && op != XENMEM_decrease_reservation && !args.nr_done )
+vixen_online_memory(args.nr_extents, args.extent_order);
+#endif
+
 switch ( op )
 {
 case XENMEM_increase_reservation:
@@ -1015,6 +1024,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 __HYPERVISOR_memory_op, "lh",
 op | (rc << MEMOP_EXTENT_SHIFT), arg);
 
+#ifdef CONFIG_X86
+if ( is_vixen() && op == XENMEM_decrease_reservation )
+vixen_offline_memory(args.nr_extents, args.extent_order);
+#endif
+
 break;
 
 case XENMEM_exchange:
diff --git 

[Xen-devel] [PATCH v3 17/24] vixen: setup infrastructure to receive event channel notifications

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

This patch registers an interrupt handler using either an INTx
interrupt from the platform PCI device, CALLBACK_IRQ vector
delivery, or evtchn_upcall_vector depending on what the parent
hypervisor supports.

The event channel polling code comes from Linux but uses the
internal infrastructure for delivery.

Finally, this infrastructure has to be initialized per-VCPU, so
hook the appropriate place for that.
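
For readers unfamiliar with the polling scheme, here is a minimal sketch of a
two-level scan: a per-VCPU selector word says which pending words are worth
looking at, and an event is active when it is pending and not masked.  All
names (demo_shared_info, demo_poll(), demo_record()) are hypothetical, the
word count is arbitrary, and the sketch leaves out both the atomic exchange
and the fairness logic (resuming from the last scanned word and bit) that the
code in this patch has.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_WORDS         4
#define DEMO_BITS_PER_WORD 64

/* Hypothetical, simplified stand-in for the shared info bitmaps. */
struct demo_shared_info {
    uint64_t pending[DEMO_WORDS];
    uint64_t mask[DEMO_WORDS];
};

typedef void (*demo_deliver_fn)(unsigned int port);

/* Example delivery callback: remembers the last port it was given. */
static unsigned int demo_last_port;

static void demo_record(unsigned int port)
{
    demo_last_port = port;
}

/*
 * Scan the words flagged in *pending_sel and deliver every event that is
 * pending and not masked.  Returns the number of events delivered.
 */
static unsigned int demo_poll(struct demo_shared_info *s,
                              uint64_t *pending_sel,
                              demo_deliver_fn deliver)
{
    uint64_t sel = *pending_sel;
    unsigned int delivered = 0;
    unsigned int w, b;

    /* Real code must fetch and clear the selector atomically. */
    *pending_sel = 0;

    for ( w = 0; w < DEMO_WORDS; w++ )
    {
        uint64_t active;

        if ( !(sel & ((uint64_t)1 << w)) )
            continue;

        active = s->pending[w] & ~s->mask[w];

        for ( b = 0; b < DEMO_BITS_PER_WORD; b++ )
        {
            uint64_t bit = (uint64_t)1 << b;

            if ( !(active & bit) )
                continue;

            /* Acknowledge the event before handing it on. */
            s->pending[w] &= ~bit;
            deliver(w * DEMO_BITS_PER_WORD + b);
            delivered++;
        }
    }

    return delivered;
}
```

Masked events stay pending, so they are picked up by a later scan once the
guest unmasks them.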

Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Jan H. Schönherr 
Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - coding style
 - introduce #defines for the PCI vendor and device id
 - initialize grant table memory space
---
 xen/arch/x86/domain.c |   3 +
 xen/arch/x86/guest/vixen.c| 302 ++
 xen/arch/x86/setup.c  |   3 +
 xen/include/asm-x86/guest/vixen.h |   6 +
 xen/include/xen/pci_ids.h |   2 +
 5 files changed, 316 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index da1bf1a..3e9c5be 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1147,6 +1147,9 @@ int arch_set_info_guest(
 
 update_cr3(v);
 
+if ( is_vixen() )
+vixen_vcpu_initialize(v);
+
  out:
 if ( flags & VGCF_online )
 clear_bit(_VPF_down, &v->pause_flags);
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 2437c92..59faa0c 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -23,6 +23,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #define PCI_DEVICE_ID_XENSOURCE_PLATFORM   0x0001
 
@@ -36,6 +39,10 @@ static int vixen_domid = 1;
 static uint32_t vixen_reserved_mem_pgstart = 0xfeff;
 static shared_info_t *global_si;
 static bool vixen_ptver;
+static bool vixen_per_cpu_notifications = true;
+static uint8_t vixen_evtchn_vector;
+static bool vixen_needs_apic_ack = true;
+struct irqaction vixen_irqaction;
 
 integer_param("vixen_domid", vixen_domid);
 boolean_param("vixen_ptver", vixen_ptver);
@@ -153,3 +160,298 @@ bool vixen_passthru_version(void)
 {
 return is_vixen() && vixen_ptver;
 }
+
+/*
+ * Make a bitmask (i.e. unsigned long *) of a xen_ulong_t
+ * array. Primarily to avoid long lines (hence the terse name).
+ */
+#define BM(x) (unsigned long *)(x)
+/* Find the first set bit in a evtchn mask */
+#define EVTCHN_FIRST_BIT(w) find_first_bit(BM(&(w)), BITS_PER_XEN_ULONG)
+
+/*
+ * Mask out the i least significant bits of w
+ */
+#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i))
+
+static DEFINE_PER_CPU(unsigned int, current_word_idx);
+static DEFINE_PER_CPU(unsigned int, current_bit_idx);
+
+static inline xen_ulong_t active_evtchns(unsigned int cpu,
+ shared_info_t *sh,
+ unsigned int idx)
+{
+return sh->native.evtchn_pending[idx] &
+   ~sh->native.evtchn_mask[idx];
+}
+
+static void vixen_evtchn_poll_one(size_t cpu)
+{
+shared_info_t *s = global_si;
+struct vcpu_info *vcpu_info = &s->native.vcpu_info[cpu];
+xen_ulong_t pending_words;
+xen_ulong_t pending_bits;
+int start_word_idx, start_bit_idx;
+int word_idx, bit_idx, i;
+
+/*
+ * Master flag must be cleared /before/ clearing
+ * selector flag. xchg_xen_ulong must contain an
+ * appropriate barrier.
+ */
+pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
+
+start_word_idx = this_cpu(current_word_idx);
+start_bit_idx = this_cpu(current_bit_idx);
+
+word_idx = start_word_idx;
+
+for ( i = 0; pending_words != 0; i++ )
+{
+xen_ulong_t words;
+
+words = MASK_LSBS(pending_words, word_idx);
+
+/*
+ * If we masked out all events, wrap to beginning.
+ */
+if ( words == 0 )
+{
+word_idx = 0;
+bit_idx = 0;
+continue;
+}
+word_idx = EVTCHN_FIRST_BIT(words);
+
+pending_bits = active_evtchns(cpu, s, word_idx);
+bit_idx = 0; /* usually scan entire word from start */
+/*
+ * We scan the starting word in two parts.
+ *
+ * 1st time: start in the middle, scanning the
+ * upper bits.
+ *
+ * 2nd time: scan the whole word (not just the
+ * parts skipped in the first pass) -- if an
+ * event in the previously scanned bits is
+ * pending again it would just be scanned on
+ * the next loop anyway.
+ */
+if ( word_idx == start_word_idx )
+{
+if ( i == 0 )
+bit_idx = start_bit_idx;
+}
+
+do
+{
+struct evtchn *chn;
+xen_ulong_t bits;
+int port;
+
+bits = MASK_LSBS(pending_bits, bit_idx);
+
+/* If we masked out all events, move on. */
+if ( bits == 0 )
+

[Xen-devel] [PATCH v3 22/24] vixen: dom0 builder support

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

The dom0 builder requires a number of modifications in order to be
able to launch unprivileged guests.  The console and store pages
must be mapped in a specific location within the guest's initial
page table.

We also have to set up the start info to be what's expected for
unprivileged guests and suppress the normal logic to give dom0
increased permissions.

We have to pass around the console and store pages which involves
touching a number of places including the PVH builder.

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - panic in the event of errors
---
 xen/arch/x86/dom0_build.c |  7 +++-
 xen/arch/x86/guest/vixen.c| 64 +++-
 xen/arch/x86/hvm/dom0_build.c |  4 +-
 xen/arch/x86/pv/dom0_build.c  | 77 ++-
 xen/arch/x86/setup.c  | 12 +-
 xen/include/asm-x86/dom0_build.h  |  8 +++-
 xen/include/asm-x86/guest/vixen.h |  5 ++-
 xen/include/asm-x86/setup.h   |  4 +-
 8 files changed, 161 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 88810db..df9d3f8 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -464,7 +464,9 @@ int __init dom0_setup_permissions(struct domain *d)
 int __init construct_dom0(struct domain *d, const module_t *image,
   unsigned long image_headroom, module_t *initrd,
   void *(*bootstrap_map)(const module_t *),
-  char *cmdline)
+  char *cmdline,
+  xen_pfn_t store_mfn, uint32_t store_evtchn,
+  xen_pfn_t console_mfn, uint32_t console_evtchn)
 {
 int rc;
 
@@ -484,7 +486,8 @@ int __init construct_dom0(struct domain *d, const module_t *image,
 #endif
 
 rc = (is_hvm_domain(d) ? dom0_construct_pvh : dom0_construct_pv)
- (d, image, image_headroom, initrd, bootstrap_map, cmdline);
+ (d, image, image_headroom, initrd, bootstrap_map, cmdline,
+  store_mfn, store_evtchn, console_mfn, console_evtchn);
 if ( rc )
 return rc;
 
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index d871218..7e367ef 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -345,6 +345,23 @@ bool vixen_ring_process(uint16_t port)
 return true;
 }
 
+static int hvm_get_parameter(int idx, uint64_t *value)
+{
+struct xen_hvm_param xhv;
+int r;
+
+xhv.domid = DOMID_SELF;
+xhv.index = idx;
+r = HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+if (r < 0) {
+printk("Cannot get hvm parameter %d: %d!\n",
+   idx, r);
+return r;
+}
+*value = xhv.value;
+return r;
+}
+
 static int hvm_set_parameter(int idx, uint64_t value)
 {
 struct xen_hvm_param xhv;
@@ -468,10 +485,55 @@ bool vixen_has_per_cpu_notifications(void)
 }
 
 void __init
-vixen_transform(struct domain *dom0)
+vixen_transform(struct domain *dom0,
+xen_pfn_t *pstore_mfn, uint32_t *pstore_evtchn,
+xen_pfn_t *pconsole_mfn, uint32_t *pconsole_evtchn)
 {
 struct xen_add_to_physmap xatp;
 int i;
+uint64_t v = 0;
+long rc;
+struct evtchn_unmask unmask;
+struct evtchn_alloc_unbound alloc;
+
+/* Setup Xenstore */
+hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
+*pstore_evtchn = unmask.port = v;
+HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
+
+hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
+*pstore_mfn = v;
+
+printk("Vixen Xenstore evtchn is %d, pfn is 0x%" PRIx64 "\n",
+   *pstore_evtchn, *pstore_mfn);
+
+/* Setup Xencons */
+alloc.dom = DOMID_SELF;
+alloc.remote_dom = DOMID_SELF;
+
+rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
+if ( rc )
+{
+printk("Failed to alloc unbound event channel: %ld\n", rc);
+*pconsole_evtchn = 0;
+*pconsole_mfn = 0;
+}
+else
+{
+void *console_data;
+
+console_data = alloc_xenheap_page();
+
+*pconsole_evtchn = alloc.port;
+*pconsole_mfn = virt_to_mfn(console_data);
+
+memset(console_data, 0, 4096);
+vixen_xencons_iface = console_data;
+vixen_xencons_port = alloc.port;
+}
+
+printk("Vixen Xencons evtchn is %d, pfn is 0x%" PRIx64 "\n",
+   *pconsole_evtchn, *pconsole_mfn);
 
 /* Setup event channel forwarding */
 alloc_direct_apic_vector(&vixen_evtchn_vector, vixen_evtchn_notify);
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 4338965..b2ca64f 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1064,7 +1064,9 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
   unsigned long image_headroom,
   module_t *initrd,
   

[Xen-devel] [PATCH v3 07/24] vixen: introduce is_vixen() to allow altering behavior

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

Vixen (Virtualized Xen) is a paravirtual mode of Xen where
paravirtual I/O is passed through from the parent hypervisor
all the way through the dom0 guest.  The dom0 guest is also
deprivileged and renumbered to give the appearance that it
is running as a normal PV guest.

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - ARM stubs
---
 xen/arch/x86/guest/Makefile   |  1 +
 xen/arch/x86/guest/vixen.c| 30 +++
 xen/include/asm-arm/guest/vixen.h | 81 +++
 xen/include/asm-x86/guest/vixen.h | 73 +++
 4 files changed, 185 insertions(+)
 create mode 100644 xen/arch/x86/guest/vixen.c
 create mode 100644 xen/include/asm-arm/guest/vixen.h
 create mode 100644 xen/include/asm-x86/guest/vixen.h

diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
index c5d5188..1c9cd7d 100644
--- a/xen/arch/x86/guest/Makefile
+++ b/xen/arch/x86/guest/Makefile
@@ -1,2 +1,3 @@
 obj-y += hypercall_page.o
 obj-y += xen.o
+obj-y += vixen.o
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
new file mode 100644
index 000..d82e68f
--- /dev/null
+++ b/xen/arch/x86/guest/vixen.c
@@ -0,0 +1,30 @@
+/**
+ * arch/x86/guest/vixen.c
+ *
+ * Support for detecting and running under Xen HVM.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see .
+ *
+ * Copyright 2017-2018 Amazon.com, Inc. or its affiliates.
+ */
+
+#include 
+
+static int in_vixen;
+
+bool is_vixen(void)
+{
+return in_vixen > 0;
+}
+
diff --git a/xen/include/asm-arm/guest/vixen.h b/xen/include/asm-arm/guest/vixen.h
new file mode 100644
index 000..ade6724
--- /dev/null
+++ b/xen/include/asm-arm/guest/vixen.h
@@ -0,0 +1,81 @@
+/**
+ * include/asm-arm/guest/vixen.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see .
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates.
+ */
+
+#ifndef __ARM_GUEST_VIXEN_H__
+#define __ARM_GUEST_VIXEN_H__
+
+#include 
+
+static inline int
+HYPERVISOR_xen_version(int cmd, void *arg)
+{
+return -ENOSYS;
+}
+
+static inline unsigned long
+HYPERVISOR_hvm_op(int op, void *arg)
+{
+return -ENOSYS;
+}
+
+static inline int
+HYPERVISOR_grant_table_op(unsigned int cmd, void *uop, unsigned int count)
+{
+return -ENOSYS;
+}
+
+static inline long
+HYPERVISOR_memory_op(unsigned int cmd, void *arg)
+{
+return -ENOSYS;
+}
+
+static inline int
+HYPERVISOR_event_channel_op(int cmd, void *arg)
+{
+return -ENOSYS;
+}
+
+static inline int
+HYPERVISOR_sched_op(int cmd, void *arg)
+{
+return -ENOSYS;
+}
+
+static inline int
+HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
+{
+return -ENOSYS;
+}
+
+static inline bool is_vixen(void)
+{
+return false;
+}
+
+static inline bool vixen_has_per_cpu_notifications(void)
+{
+return false;
+}
+
+static inline bool vixen_ring_process(uint16_t port)
+{
+return false;
+}
+
+#endif
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
new file mode 100644
index 000..be90c46
--- /dev/null
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -0,0 +1,73 @@
+/**
+ * include/asm-x86/guest/vixen.h
+ *
+ * Support for detecting and running under Xen HVM.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; 

[Xen-devel] [PATCH v3 03/24] x86/guest: Hypercall support

2018-01-09 Thread Anthony Liguori
From: Andrew Cooper 

Signed-off-by: Andrew Cooper 
---
 xen/arch/x86/guest/Makefile   |  1 +
 xen/arch/x86/guest/hypercall_page.S   | 79 ++
 xen/arch/x86/guest/xen.c  |  5 ++
 xen/arch/x86/xen.lds.S|  1 +
 xen/include/asm-x86/guest.h   |  1 +
 xen/include/asm-x86/guest/hypercall.h | 92 +++
 6 files changed, 179 insertions(+)
 create mode 100644 xen/arch/x86/guest/hypercall_page.S
 create mode 100644 xen/include/asm-x86/guest/hypercall.h

diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
index 7f67396..c5d5188 100644
--- a/xen/arch/x86/guest/Makefile
+++ b/xen/arch/x86/guest/Makefile
@@ -1 +1,2 @@
+obj-y += hypercall_page.o
 obj-y += xen.o
diff --git a/xen/arch/x86/guest/hypercall_page.S b/xen/arch/x86/guest/hypercall_page.S
new file mode 100644
index 000..fdd2e72
--- /dev/null
+++ b/xen/arch/x86/guest/hypercall_page.S
@@ -0,0 +1,79 @@
+#include 
+#include 
+#include 
+
+.section ".text.page_aligned", "ax", @progbits
+.p2align PAGE_SHIFT
+
+GLOBAL(hypercall_page)
+ /* Poisoned with `ret` for safety before hypercalls are set up. */
+.fill PAGE_SIZE, 1, 0xc3
+.type hypercall_page, STT_OBJECT
+.size hypercall_page, PAGE_SIZE
+
+/*
+ * Identify a specific hypercall in the hypercall page
+ * @param name Hypercall name.
+ */
+#define DECLARE_HYPERCALL(name)                                             \
+.globl HYPERCALL_ ## name;                                                  \
+.set   HYPERCALL_ ## name, hypercall_page + __HYPERVISOR_ ## name * 32;     \
+.type  HYPERCALL_ ## name, STT_FUNC;                                        \
+.size  HYPERCALL_ ## name, 32
+
+DECLARE_HYPERCALL(set_trap_table)
+DECLARE_HYPERCALL(mmu_update)
+DECLARE_HYPERCALL(set_gdt)
+DECLARE_HYPERCALL(stack_switch)
+DECLARE_HYPERCALL(set_callbacks)
+DECLARE_HYPERCALL(fpu_taskswitch)
+DECLARE_HYPERCALL(sched_op_compat)
+DECLARE_HYPERCALL(platform_op)
+DECLARE_HYPERCALL(set_debugreg)
+DECLARE_HYPERCALL(get_debugreg)
+DECLARE_HYPERCALL(update_descriptor)
+DECLARE_HYPERCALL(memory_op)
+DECLARE_HYPERCALL(multicall)
+DECLARE_HYPERCALL(update_va_mapping)
+DECLARE_HYPERCALL(set_timer_op)
+DECLARE_HYPERCALL(event_channel_op_compat)
+DECLARE_HYPERCALL(xen_version)
+DECLARE_HYPERCALL(console_io)
+DECLARE_HYPERCALL(physdev_op_compat)
+DECLARE_HYPERCALL(grant_table_op)
+DECLARE_HYPERCALL(vm_assist)
+DECLARE_HYPERCALL(update_va_mapping_otherdomain)
+DECLARE_HYPERCALL(iret)
+DECLARE_HYPERCALL(vcpu_op)
+DECLARE_HYPERCALL(set_segment_base)
+DECLARE_HYPERCALL(mmuext_op)
+DECLARE_HYPERCALL(xsm_op)
+DECLARE_HYPERCALL(nmi_op)
+DECLARE_HYPERCALL(sched_op)
+DECLARE_HYPERCALL(callback_op)
+DECLARE_HYPERCALL(xenoprof_op)
+DECLARE_HYPERCALL(event_channel_op)
+DECLARE_HYPERCALL(physdev_op)
+DECLARE_HYPERCALL(hvm_op)
+DECLARE_HYPERCALL(sysctl)
+DECLARE_HYPERCALL(domctl)
+DECLARE_HYPERCALL(kexec_op)
+DECLARE_HYPERCALL(tmem_op)
+DECLARE_HYPERCALL(xc_reserved_op)
+DECLARE_HYPERCALL(xenpmu_op)
+
+DECLARE_HYPERCALL(arch_0)
+DECLARE_HYPERCALL(arch_1)
+DECLARE_HYPERCALL(arch_2)
+DECLARE_HYPERCALL(arch_3)
+DECLARE_HYPERCALL(arch_4)
+DECLARE_HYPERCALL(arch_5)
+DECLARE_HYPERCALL(arch_6)
+DECLARE_HYPERCALL(arch_7)
+
+/*
+ * Local variables:
+ * tab-width: 8
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 9446a46..c5b4341 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include 
@@ -29,6 +30,7 @@
 bool xen_guest;
 
 static uint32_t xen_cpuid_base;
+extern char hypercall_page[];
 
 static void __init find_xen_leaves(void)
 {
@@ -61,6 +63,9 @@ void __init probe_hypervisor(void)
 if ( !xen_cpuid_base )
 return;
 
+/* Fill the hypercall page. */
+wrmsrl(cpuid_ebx(xen_cpuid_base + 2), __pa(hypercall_page));
+
 xen_guest = true;
 }
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index d5e8821..dd0e1c5 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -59,6 +59,7 @@ SECTIONS
   .text : {
 _stext = .;/* Text and read-only data */
*(.text)
+   *(.text.page_aligned)
*(.text.cold)
*(.text.unlikely)
*(.fixup)
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
index eb08434..70250b7 100644
--- a/xen/include/asm-x86/guest.h
+++ b/xen/include/asm-x86/guest.h
@@ -19,6 +19,7 @@
 #ifndef __X86_GUEST_H__
 #define __X86_GUEST_H__
 
+#include 
 #include 
 
 #endif /* __X86_GUEST_H__ */
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
new file mode 100644
index 000..c460f59
--- /dev/null
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -0,0 +1,92 @@

[Xen-devel] [PATCH v3 19/24] vixen: Fix Vixen adaptation of send_global_virq()

2018-01-09 Thread Anthony Liguori
From: Jan H. Schönherr 

The function originally did the following unconditionally:

   send_guest_global_virq(global_virq_handlers[virq] ?: hardware_domain, virq);

The new variant should reflect the non-Vixen case correctly.
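
The intended target selection can be written down as a small pure function.
This is only a sketch: demo_domain and demo_virq_target() are hypothetical
names, and a NULL return stands for "send nothing".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct domain. */
struct demo_domain {
    int id;
};

/*
 * Pick the domain that should receive a global VIRQ:
 *  - a registered handler always wins;
 *  - otherwise fall back to the hardware domain, except under Vixen,
 *    where no notification is sent (NULL).
 */
static const struct demo_domain *
demo_virq_target(const struct demo_domain *handler,
                 const struct demo_domain *hardware_domain,
                 bool vixen)
{
    if ( handler )
        return handler;

    return vixen ? NULL : hardware_domain;
}
```

With vixen == false this returns exactly what the original
"handler ?: hardware_domain" expression evaluated to.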

Signed-off-by: Jan H. Schönherr 
Signed-off-by: Anthony Liguori 
---
 xen/common/event_channel.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 85ff7e0..3dee73b 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -840,7 +840,10 @@ void send_global_virq(uint32_t virq)
 ASSERT(virq < NR_VIRQS);
 ASSERT(virq_is_global(virq));
 
-send_guest_global_virq(global_virq_handlers[virq] ?: hardware_domain, virq);
+if ( global_virq_handlers[virq] )
+send_guest_global_virq(global_virq_handlers[virq], virq);
+else if ( !is_vixen() )
+send_guest_global_virq(hardware_domain, virq);
 }
 
 int set_global_virq_handler(struct domain *d, uint32_t virq)
-- 
1.9.1
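
The target selection described in this patch can be sketched outside of Xen as follows. This is only an illustration under stated assumptions: struct demo_domain and demo_virq_target() are hypothetical names, and the shim flag stands in for is_vixen(). It shows why the GNU "a ?: b" shorthand had to be split into explicit branches: the fallback to the hardware domain must be skipped in shim mode, while a registered handler is still honoured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct domain. */
struct demo_domain { int id; };

/*
 * Pick the domain that should receive a global VIRQ.
 *  - A registered handler always wins.
 *  - Without a handler, fall back to the hardware domain,
 *    but only when not running as a shim.
 *  - Otherwise nobody receives the VIRQ (NULL).
 */
static const struct demo_domain *
demo_virq_target(const struct demo_domain *handler,
                 const struct demo_domain *hwdom, bool shim)
{
    if (handler)
        return handler;
    if (!shim)
        return hwdom;
    return NULL;
}
```

With no handler registered, the non-shim case returns the hardware domain, which matches the original unconditional behaviour.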



[Xen-devel] [PATCH v3 21/24] vixen: provide Xencons implementation

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

Our initial approach exposed the console ring directly to guests
which worked well except for the fact that very old versions of Xen
did not support console ring for HVM guests.  It also proved to
be complicated from a management tool perspective since both the
serial console and the paravirt console for HVM guests produced
output.

Having a simple xencons implementation simplifies using Vixen, as a
management tool no longer needs to care whether or not this mode is
enabled.

In order to output to the console without the '(Xen)' adornment,
we introduce a new entry point into the console code too.

Signed-off-by: Anthony Liguori 
---
 xen/arch/x86/guest/vixen.c| 41 +++
 xen/common/event_channel.c|  5 -
 xen/drivers/char/console.c| 16 +++
 xen/include/asm-x86/guest/vixen.h |  2 ++
 xen/include/xen/lib.h |  1 +
 5 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 59faa0c..d871218 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define PCI_DEVICE_ID_XENSOURCE_PLATFORM   0x0001
 
@@ -43,6 +44,9 @@ static bool vixen_per_cpu_notifications = true;
 static uint8_t vixen_evtchn_vector;
 static bool vixen_needs_apic_ack = true;
 struct irqaction vixen_irqaction;
+static volatile struct xencons_interface *vixen_xencons_iface;
+static uint16_t vixen_xencons_port;
+static spinlock_t vixen_xencons_lock;
 
 integer_param("vixen_domid", vixen_domid);
 boolean_param("vixen_ptver", vixen_ptver);
@@ -89,6 +93,8 @@ void __init init_vixen(void)
}
 }
 
+spin_lock_init(&vixen_xencons_lock);
+
 in_vixen = 1;
 }
 
@@ -304,6 +310,41 @@ static void vixen_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 vixen_upcall(smp_processor_id());
 }
 
+bool vixen_ring_process(uint16_t port)
+{
+volatile struct xencons_interface *r = vixen_xencons_iface;
+char buffer[128];
+size_t n;
+
+if (r == NULL || port != vixen_xencons_port) {
+return false;
+}
+
+spin_lock(&vixen_xencons_lock);
+
+n = 0;
+while (r->out_prod != r->out_cons) {
+char ch = r->out[MASK_XENCONS_IDX(r->out_cons, r->out)];
+if (n == sizeof(buffer) - 1) {
+buffer[n] = 0;
+guest_puts(hardware_domain, buffer);
+n = 0;
+}
+buffer[n++] = ch;
+rmb();
+r->out_cons++;
+}
+
+if (n) {
+buffer[n] = 0;
+guest_puts(hardware_domain, buffer);
+}
+
+spin_unlock(&vixen_xencons_lock);
+
+return true;
+}
+
 static int hvm_set_parameter(int idx, uint64_t value)
 {
 struct xen_hvm_param xhv;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 54ea720..6d060a5 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1241,7 +1241,10 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 struct evtchn_send send;
 if ( copy_from_guest(&send, arg, 1) != 0 )
 return -EFAULT;
-rc = evtchn_send(current->domain, send.port);
+if ( vixen_ring_process(send.port) )
+rc = 0;
+else
+rc = evtchn_send(current->domain, send.port);
 break;
 }
 
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 3eb130d..4be34d4 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -775,6 +775,22 @@ void guest_printk(const struct domain *d, const char *fmt, ...)
 va_end(args);
 }
 
+void guest_puts(const struct domain *d, const char *kbuf)
+{
+spin_lock_irq(&console_lock);
+
+sercon_puts(kbuf);
+video_puts(kbuf);
+
+if ( opt_console_to_ring )
+{
+conring_puts(kbuf);
+tasklet_schedule(&notify_dom0_con_ring_tasklet);
+}
+
+spin_unlock_irq(&console_lock);
+}
+
 void __init console_init_preirq(void)
 {
 char *p;
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index 140645c..4b59cc7 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -88,4 +88,6 @@ void vixen_vcpu_initialize(struct vcpu *v);
 
 void __init vixen_transform(struct domain *dom0);
 
+bool vixen_ring_process(uint16_t port);
+
 #endif
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index ed00ae1..de84638 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -92,6 +92,7 @@ extern void printk(const char *format, ...)
 __attribute__ ((format (printf, 1, 2)));
 extern void guest_printk(const struct domain *d, const char *format, ...)
 __attribute__ ((format (printf, 2, 3)));
+extern void guest_puts(const struct domain *d, const char *message);
 extern void noreturn panic(const char *format, ...)
 __attribute__ ((format (printf, 1, 2)));
 extern long 
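
For readers following vixen_ring_process() in this patch, the drain loop can be sketched as a standalone program. This is a minimal single-threaded illustration, not the Xen code: struct demo_ring, demo_puts(), demo_ring_write() and the ring size are all hypothetical, and the memory barriers a real shared ring needs between reading the data and publishing out_cons are deliberately left out. It shows the two points the patch relies on: producer and consumer indices are free-running and only masked on access, and output is flushed in NUL-terminated chunks whenever the local buffer fills.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 16u                          /* must be a power of two */
#define RING_MASK(idx) ((idx) & (RING_SIZE - 1))

/* Hypothetical stand-in for the output half of a console ring. */
struct demo_ring {
    char out[RING_SIZE];
    uint32_t out_cons;   /* free-running consumer index */
    uint32_t out_prod;   /* free-running producer index */
};

/* Hypothetical sink standing in for guest_puts(): appends to a log. */
static char demo_log[256];
static unsigned int demo_flushes;

static void demo_puts(const char *s)
{
    strncat(demo_log, s, sizeof(demo_log) - strlen(demo_log) - 1);
    demo_flushes++;
}

/* Producer side: copy a string into the ring while there is room. */
static size_t demo_ring_write(struct demo_ring *r, const char *s)
{
    size_t written = 0;

    while (*s && (r->out_prod - r->out_cons) < RING_SIZE) {
        r->out[RING_MASK(r->out_prod)] = *s++;
        r->out_prod++;
        written++;
    }
    return written;
}

/* Consumer side: drain everything pending, flushing in small chunks. */
static size_t demo_ring_drain(struct demo_ring *r)
{
    char buffer[8];      /* small on purpose, to exercise the flush path */
    size_t n = 0, total = 0;

    while (r->out_cons != r->out_prod) {
        if (n == sizeof(buffer) - 1) {
            buffer[n] = '\0';
            demo_puts(buffer);
            n = 0;
        }
        buffer[n++] = r->out[RING_MASK(r->out_cons)];
        r->out_cons++;
        total++;
    }

    if (n) {
        buffer[n] = '\0';
        demo_puts(buffer);
    }
    return total;
}
```

Because the indices are unsigned and the ring size is a power of two, the loop keeps working when the 32-bit counters wrap around.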

[Xen-devel] [PATCH v3 14/24] vixen: forward VCPUOP_register_runstate_memory_area to outer Xen

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

This allows for proper accounting of steal time within the guest.

Signed-off-by: Anthony Liguori 
---
 xen/common/domain.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index ede377c..780f8ff 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1414,6 +1414,12 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !guest_handle_okay(area.addr.h, 1) )
 break;
 
+if ( is_vixen() ) {
+rc = HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
+vcpuid, &area);
+break;
+}
+
 rc = 0;
 runstate_guest(v) = area.addr.h;
 
-- 
1.9.1



[Xen-devel] [PATCH v3 12/24] vixen: paravirtualization TSC frequency calculation

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

Otherwise, when time-sharing a physical CPU, the calibrated value can
be bogus, resulting in time drift for the guest due to an incorrect
frequency within pvclock.

Signed-off-by: Anthony Liguori 
---
 xen/arch/x86/guest/vixen.c| 22 ++
 xen/arch/x86/time.c   |  9 -
 xen/include/asm-x86/guest/vixen.h |  2 ++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index 1ad5bd7..a1614e0 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -120,3 +120,25 @@ void vixen_get_reserved_mem(unsigned long *start_pfn, unsigned long *end_pfn)
 /* This is part of the Xen ABI */
 *end_pfn   = 0x10;
 }
+
+u64 vixen_get_cpu_freq(void)
+{
+volatile vcpu_time_info_t *timep = &global_si->native.vcpu_info[0].time;
+vcpu_time_info_t time;
+uint32_t version;
+u64 imm;
+
+do {
+   version = timep->version;
+   rmb();
+   time = *timep;
+} while ((version & 1) || version != time.version);
+
+imm = (1000000000ULL << 32) / time.tsc_to_system_mul;
+
+if (time.tsc_shift < 0) {
+   return imm << -time.tsc_shift;
+} else {
+   return imm >> time.tsc_shift;
+}
+}
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 2a87950..04c0fbb 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -36,6 +36,7 @@
 #include 
 #include  /* for early_time_init */
 #include 
+#include 
 
 /* opt_clocksource: Force clocksource to one of: pit, hpet, acpi. */
 static char __initdata opt_clocksource[10];
@@ -1687,6 +1688,12 @@ void __init early_time_init(void)
 
 preinit_pit();
 tmp = init_platform_timer();
+
+/* We cannot trust calibrated values when running under
+ * a hypervisor. */
+if ( is_vixen() )
+tmp = vixen_get_cpu_freq();
+
 plt_tsc.frequency = tmp;
 
 set_time_scale(&t->tsc_scale, tmp);
@@ -2014,7 +2021,7 @@ void tsc_set_info(struct domain *d,
   uint32_t tsc_mode, uint64_t elapsed_nsec,
   uint32_t gtsc_khz, uint32_t incarnation)
 {
-if ( is_idle_domain(d) || is_hardware_domain(d) )
+if ( is_idle_domain(d) || is_vixen() || is_hardware_domain(d) )
 {
 d->arch.vtsc = 0;
 return;
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index 0c040ee..e6b64f2 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -78,4 +78,6 @@ void __init init_vixen(void);
 
 void __init early_vixen_init(void);
 
+u64 vixen_get_cpu_freq(void);
+
 #endif
-- 
1.9.1
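
The arithmetic in vixen_get_cpu_freq() can be checked in isolation. The sketch below assumes the usual pvclock scaling, where elapsed nanoseconds are ((tsc_delta << tsc_shift) * tsc_to_system_mul) >> 32, so the numerator is 10^9 << 32. struct demo_time_info and demo_tsc_freq_hz() are hypothetical names, and the version-based retry loop used to read a consistent snapshot is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-vCPU time info fields. */
struct demo_time_info {
    uint32_t tsc_to_system_mul;  /* 32.32 fixed point: ns per shifted tick */
    int8_t   tsc_shift;          /* ticks are shifted by this before scaling */
};

/* Invert the pvclock scale factors to recover the TSC frequency in Hz. */
static uint64_t demo_tsc_freq_hz(const struct demo_time_info *t)
{
    uint64_t hz;

    assert(t->tsc_to_system_mul != 0);

    /* 10^9 ns per second, in 32.32 fixed point, divided by ns per tick. */
    hz = (1000000000ULL << 32) / t->tsc_to_system_mul;

    /* A negative shift means ticks were halved first, so the real
     * frequency is higher; a positive shift means the opposite. */
    if (t->tsc_shift < 0)
        return hz << -t->tsc_shift;
    return hz >> t->tsc_shift;
}
```

As a worked case, a multiplier of 0x80000000 is 0.5 ns per tick, which corresponds to 2 GHz with a shift of 0.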



[Xen-devel] [PATCH v3 23/24] vixen: use default position for the m2p mappings

2018-01-09 Thread Anthony Liguori
From: Roger Pau Monne 

When running a 32bit kernel as Dom0 on a 64bit hypervisor the
hypervisor will try to shrink the hypervisor hole to the minimum
needed, and thus requires the Dom0 to use XENMEM_machphys_mapping in
order to fetch the position of the start of the hypervisor virtual
mappings.

Disable this feature when running as a PV shim, since some DomU
kernels don't implement XENMEM_machphys_mapping and break if the m2p
doesn't begin at the default address.

NB: support for the XENMEM_machphys_mapping was added in Linux by
commit 7e7750.

Signed-off-by: Roger Pau Monné 
Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - adapted for Vixen
---
 xen/arch/x86/pv/dom0_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index a554629..2bc6339 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -398,7 +398,8 @@ int __init dom0_construct_pv(struct domain *d,
 if ( parms.pae == XEN_PAE_EXTCR3 )
 set_bit(VMASST_TYPE_pae_extended_cr3, &d->vm_assist);
 
-if ( (parms.virt_hv_start_low != UNSET_ADDR) && elf_32bit() )
+if ( !is_vixen() && (parms.virt_hv_start_low != UNSET_ADDR) &&
+ elf_32bit() )
 {
 unsigned long mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
 value = (parms.virt_hv_start_low + mask) & ~mask;
-- 
1.9.1
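
The branch that this patch disables rounds virt_hv_start_low up to an L2 page table boundary. A small sketch of that rounding, assuming a shift of 21 (2 MiB entries); DEMO_L2_SHIFT and demo_round_up_l2() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_L2_SHIFT 21   /* assumed: one L2 entry maps 2 MiB */

/* Round an address up to the next L2 boundary (no-op if already aligned). */
static uint64_t demo_round_up_l2(uint64_t addr)
{
    const uint64_t mask = (UINT64_C(1) << DEMO_L2_SHIFT) - 1;

    return (addr + mask) & ~mask;
}
```

An already aligned address such as 0xF5800000 is returned unchanged, while 0xF5800001 moves up to 0xF5A00000.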



[Xen-devel] [PATCH v3 11/24] vixen: early initialization of Vixen including shared_info mapping

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

We split initialization of Vixen into two parts.  The first part
just detects the presence of an HVM hypervisor so that we can
figure out whether to modify the e820 table.

The later initialization is used to actually map the shared_info
structure from the parent hypervisor into Xen.

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - allow disabling vixen by specifying vixen_domid=-1 on command line
 - use hvm_info_table for reserved region if still valid
 - use reserved region for shared_info instead of BSS
---
 xen/arch/x86/guest/vixen.c| 76 +++
 xen/arch/x86/setup.c  |  5 +++
 xen/include/asm-x86/guest/vixen.h |  4 +++
 3 files changed, 85 insertions(+)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index cacbe69..1ad5bd7 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -20,13 +20,89 @@
  */
 
 #include 
+#include 
+#include 
+
+#define X86_HVM_END_SPECIAL_REGION  0xff000u
+
+#define SHARED_INFO_PFN (X86_HVM_END_SPECIAL_REGION + 0)
 
 static int in_vixen;
 static int vixen_domid = 1;
 static uint32_t vixen_reserved_mem_pgstart = 0xfeff;
+static shared_info_t *global_si;
 
 integer_param("vixen_domid", vixen_domid);
 
+void __init init_vixen(void)
+{
+int major, minor, version;
+struct hvm_info_table *hvm_info;
+
+if ( !xen_guest )
+{
+printk("Disabling Vixen because we are not running under Xen\n");
+in_vixen = -1;
+return;
+}
+
+if ( vixen_domid < 0 )
+{
+printk("Disabling Vixen due to user request\n");
+in_vixen = -1;
+return;
+}
+
+version = HYPERVISOR_xen_version(XENVER_version, NULL);
+major = version >> 16;
+minor = version & 0xffff;
+
+printk("Vixen running under Xen %d.%d\n", major, minor);
+
+hvm_info = maddr_to_virt(HVM_INFO_PADDR);
+if ( strncmp(hvm_info->signature, "HVM INFO", 8) == 0 &&
+hvm_info->length >= sizeof(struct hvm_info_table) &&
+hvm_info->length < (PAGE_SIZE - HVM_INFO_OFFSET) )
+{
+   uint8_t sum;
+   uint32_t i;
+
+   for ( i = 0, sum = 0; i < hvm_info->length; i++ )
+   sum += ((uint8_t *)hvm_info)[i];
+
+   if ( sum == 0 )
+   {
+   vixen_reserved_mem_pgstart = hvm_info->reserved_mem_pgstart << XEN_PAGE_SHIFT;
+   }
+}
+
+in_vixen = 1;
+}
+
+void __init early_vixen_init(void)
+{
+struct xen_add_to_physmap xatp;
+long rc;
+
+if ( !is_vixen() )
+   return;
+
+global_si = mfn_to_virt(SHARED_INFO_PFN);
+
+/* Setup our own shared info area */
+xatp.domid = DOMID_SELF;
+xatp.idx = 0;
+xatp.space = XENMAPSPACE_shared_info;
+xatp.gpfn = virt_to_mfn(global_si);
+
+rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
+if ( rc < 0 )
+printk("Setting shared info page failed: %ld\n", rc);
+
+memset(&global_si->native.evtchn_mask[0], 0x00,
+   sizeof(global_si->native.evtchn_mask));
+}
+
 bool is_vixen(void)
 {
 return in_vixen > 0;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f9d087e..07239c0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -869,6 +869,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 else
 panic("Bootloader provided no memory information.");
 
+/* Vixen must be initialized before init_e820() */
+init_vixen();
+
 /* Sanitise the raw E820 map to produce a final clean version. */
 max_page = raw_max_page = init_e820(memmap_type, &e820_raw);
 
@@ -1516,6 +1519,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
 rcu_init();
 
+early_vixen_init();
+
 early_time_init();
 
 arch_init_memory();
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index fb8e871..0c040ee 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -74,4 +74,8 @@ int vixen_get_domid(void);
 
 void vixen_get_reserved_mem(unsigned long *start_pfn, unsigned long *end_pfn);
 
+void __init init_vixen(void);
+
+void __init early_vixen_init(void);
+
 #endif
-- 
1.9.1
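
The hvm_info_table check in init_vixen() accepts the table only if the signature matches, the length is sane, and all bytes sum to zero modulo 256. A standalone sketch of that validation follows. struct demo_info_table is a hypothetical, simplified layout rather than the real hvm_info_table, and demo_table_seal() is an illustrative helper showing how a producer would choose the checksum byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified table layout (16 bytes, no padding). */
struct demo_info_table {
    char     signature[8];
    uint32_t length;
    uint8_t  checksum;
    uint8_t  payload[3];
};

/* Sum all bytes of a region modulo 256. */
static uint8_t demo_byte_sum(const void *p, size_t len)
{
    const uint8_t *b = p;
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += b[i];
    return sum;
}

/* Producer side: pick the checksum so the whole table sums to zero. */
static void demo_table_seal(struct demo_info_table *t)
{
    t->checksum = 0;
    t->checksum = (uint8_t)-demo_byte_sum(t, t->length);
}

/* Consumer side: signature, length and checksum must all agree. */
static int demo_table_valid(const struct demo_info_table *t)
{
    return memcmp(t->signature, "HVM INFO", 8) == 0 &&
           t->length == sizeof(*t) &&
           demo_byte_sum(t, t->length) == 0;
}
```

Any single-byte change after sealing makes the sum non-zero, so the consumer falls back to its default instead of trusting a stale table.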



[Xen-devel] [PATCH v3 15/24] vixen: pass through version hypercalls to parent Xen

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

This is necessary to trigger event channel upcalls but it is also
useful to pass through the full version information such that the
guest believes it is running on the parent Xen.

Signed-off-by: Matt Wilson 
Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - don't pass through version by default
 - introduce vixen_ptver parameter to enable version passthrough.
---
 xen/arch/x86/guest/vixen.c|  7 +++
 xen/common/kernel.c   | 89 +--
 xen/include/asm-arm/guest/vixen.h |  5 +++
 xen/include/asm-x86/guest/vixen.h |  2 +
 4 files changed, 91 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index a1614e0..7c886a2 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -31,8 +31,10 @@ static int in_vixen;
 static int vixen_domid = 1;
 static uint32_t vixen_reserved_mem_pgstart = 0xfeff;
 static shared_info_t *global_si;
+static bool vixen_ptver;
 
 integer_param("vixen_domid", vixen_domid);
+boolean_param("vixen_ptver", vixen_ptver);
 
 void __init init_vixen(void)
 {
@@ -142,3 +144,8 @@ u64 vixen_get_cpu_freq(void)
return imm >> time.tsc_shift;
 }
 }
+
+bool vixen_passthru_version(void)
+{
+return is_vixen() && vixen_ptver;
+}
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 8d137c5..ac85bea 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 
@@ -311,14 +312,39 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 switch ( cmd )
 {
 case XENVER_version:
-return (xen_major_version() << 16) | xen_minor_version();
+if ( vixen_passthru_version() )
+return HYPERVISOR_xen_version(XENVER_version, NULL);
+else
+{
+/* This hypercall is used to force event channel injections
+   after re-enabling interrupts so if we're Vixen, we need
+   to invoke the parent. */
+if ( is_vixen() )
+(void)HYPERVISOR_xen_version(0, NULL);
+return (xen_major_version() << 16) | xen_minor_version();
+}
 
 case XENVER_extraversion:
 {
 xen_extraversion_t extraversion;
+int rc;
 
 memset(extraversion, 0, sizeof(extraversion));
-safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
+if ( vixen_passthru_version() )
+{
+if ( deny )
+safe_strcpy(extraversion, xen_deny());
+else
+{
+rc = HYPERVISOR_xen_version(XENVER_extraversion, &extraversion);
+if ( rc )
+return rc;
+}
+}
+else
+{
+safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
+}
 if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
 return -EFAULT;
 return 0;
@@ -327,12 +353,22 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case XENVER_compile_info:
 {
 xen_compile_info_t info;
+int rc;
 
 memset(&info, 0, sizeof(info));
-safe_strcpy(info.compiler,   deny ? xen_deny() : xen_compiler());
-safe_strcpy(info.compile_by, deny ? xen_deny() : xen_compile_by());
-safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
-safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
+if ( vixen_passthru_version() )
+{
+rc = HYPERVISOR_xen_version(XENVER_compile_info, &info);
+if ( rc )
+return rc;
+}
+else
+{
+safe_strcpy(info.compiler,   deny ? xen_deny() : xen_compiler());
+safe_strcpy(info.compile_by, deny ? xen_deny() : xen_compile_by());
+safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
+safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
+}
 if ( copy_to_guest(arg, &info, 1) )
 return -EFAULT;
 return 0;
@@ -366,9 +402,24 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case XENVER_changeset:
 {
 xen_changeset_info_t chgset;
+int rc;
 
 memset(chgset, 0, sizeof(chgset));
-safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
+if ( vixen_passthru_version() )
+{
+if ( deny )
+safe_strcpy(chgset, xen_deny());
+else
+{
+rc = HYPERVISOR_xen_version(XENVER_changeset, &chgset);
+if ( rc )
+return rc;
+}
+}
+else
+{
+safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
+}
 if ( copy_to_guest(arg, chgset, 

[Xen-devel] [PATCH v3 02/24] x86/entry: Probe for Xen early during boot

2018-01-09 Thread Anthony Liguori
From: Andrew Cooper 

Signed-off-by: Andrew Cooper 
---
v1 -> v2
 - ARM stubs
---
 xen/arch/x86/Makefile   |  1 +
 xen/arch/x86/guest/Makefile |  1 +
 xen/arch/x86/guest/xen.c| 75 +
 xen/arch/x86/setup.c|  4 +++
 xen/include/asm-arm/guest.h | 22 
 xen/include/asm-x86/guest.h | 34 +++
 xen/include/asm-x86/guest/xen.h | 47 ++
 7 files changed, 184 insertions(+)
 create mode 100644 xen/arch/x86/guest/Makefile
 create mode 100644 xen/arch/x86/guest/xen.c
 create mode 100644 xen/include/asm-arm/guest.h
 create mode 100644 xen/include/asm-x86/guest.h
 create mode 100644 xen/include/asm-x86/guest/xen.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d5d58a2..c1977d1 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -1,6 +1,7 @@
 subdir-y += acpi
 subdir-y += cpu
 subdir-y += genapic
+subdir-$(CONFIG_XEN_GUEST) += guest
 subdir-$(CONFIG_HVM) += hvm
 subdir-y += mm
 subdir-$(CONFIG_XENOPROF) += oprofile
diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
new file mode 100644
index 0000000..7f67396
--- /dev/null
+++ b/xen/arch/x86/guest/Makefile
@@ -0,0 +1 @@
+obj-y += xen.o
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
new file mode 100644
index 0000000..9446a46
--- /dev/null
+++ b/xen/arch/x86/guest/xen.c
@@ -0,0 +1,75 @@
+/**
+ * arch/x86/guest/xen.c
+ *
+ * Support for detecting and running under Xen.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+bool xen_guest;
+
+static uint32_t xen_cpuid_base;
+
+static void __init find_xen_leaves(void)
+{
+uint32_t eax, ebx, ecx, edx, base;
+
+for ( base = XEN_CPUID_FIRST_LEAF;
+  base < XEN_CPUID_FIRST_LEAF + 0x10000; base += 0x100 )
+{
+cpuid(base, &eax, &ebx, &ecx, &edx);
+
+if ( (ebx == XEN_CPUID_SIGNATURE_EBX) &&
+ (ecx == XEN_CPUID_SIGNATURE_ECX) &&
+ (edx == XEN_CPUID_SIGNATURE_EDX) &&
+ ((eax - base) >= 2) )
+{
+xen_cpuid_base = base;
+break;
+}
+}
+}
+
+void __init probe_hypervisor(void)
+{
+/* Too early to use cpu_has_hypervisor */
+if ( !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) )
+return;
+
+find_xen_leaves();
+
+if ( !xen_cpuid_base )
+return;
+
+xen_guest = true;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2e10c6b..7627c3f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -51,6 +51,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool __initdata opt_nosmp;
@@ -704,6 +706,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
  * allocing any xenheap structures wanted in lower memory. */
 kexec_early_calculations();
 
+probe_hypervisor();
+
 parse_video_info();
 
 rdmsrl(MSR_EFER, this_cpu(efer));
diff --git a/xen/include/asm-arm/guest.h b/xen/include/asm-arm/guest.h
new file mode 100644
index 0000000..4d143d7
--- /dev/null
+++ b/xen/include/asm-arm/guest.h
@@ -0,0 +1,22 @@
+/**
+ * include/asm-arm/guest.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates.
+ */
+
+#ifndef __ARM_GUEST_H__
+#define __ARM_GUEST_H__

[Xen-devel] [PATCH v3 05/24] char: optionally redirect {, g}printk output to QEMU debug log

2018-01-09 Thread Anthony Liguori
From: Matt Wilson 

When using Vixen, it is helpful to get the Xen messages in a
separate channel from the console output.  Add an option to
output to the QEMU backdoor logging port.

Signed-off-by: Matt Wilson 
---
v1 -> v2
 - #ifdef for !x86_64
---
 xen/drivers/char/console.c | 35 ---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 19d0e74..0f85707 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -85,6 +85,11 @@ static int __read_mostly sercon_handle = -1;
 
 static DEFINE_SPINLOCK(console_lock);
 
+/* send all printk output to QEMU debug log. Input does not change,
+ * nor does dom0 output.
+ */
+static bool_t __read_mostly qemu_debug = false;
+
 /*
  * To control the amount of printing, thresholds are added.
  * These thresholds correspond to the XENLOG logging levels.
@@ -560,14 +565,36 @@ long do_console_io(int cmd, int count, XEN_GUEST_HANDLE_PARAM(char) buffer)
 
 static bool_t console_locks_busted;
 
+#if defined(__x86_64__)
+static void qemu_putstr(const char *str)
+{
+char c;
+while ( (c = *str++) != '\0' )
+{
+outb(c, 0x12);
+}
+}
+#else
+static void qemu_putstr(const char *str)
+{
+}
+#endif
+
 static void __putstr(const char *str)
 {
 ASSERT(spin_is_locked(&console_lock));
 
-sercon_puts(str);
-video_puts(str);
+if ( qemu_debug )
+{
+qemu_putstr(str);
+}
+else
+{
+sercon_puts(str);
+video_puts(str);
 
-conring_puts(str);
+conring_puts(str);
+}
 
 if ( !console_locks_busted )
 tasklet_schedule(&notify_dom0_con_ring_tasklet);
@@ -762,6 +789,8 @@ void __init console_init_preirq(void)
 p++;
 if ( !strncmp(p, "vga", 3) )
 video_init();
+else if ( !strncmp(p, "qemu", 4) )
+qemu_debug = true;
 else if ( !strncmp(p, "none", 4) )
 continue;
 else if ( (sh = serial_parse_handle(p)) >= 0 )
-- 
1.9.1



[Xen-devel] [PATCH v3 06/24] console: do not print banner if below info log threshold

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

Only print the banner if the log threshold is at least info.

For Vixen guests, we want the console output to be exactly what the
PV guest would show on its own.  That means the inner Xen banner
can potentially break automation that assumes a specific type of
console output.

Signed-off-by: Anthony Liguori 
---
 xen/drivers/char/console.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 0f85707..3eb130d 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -812,9 +812,12 @@ void __init console_init_preirq(void)
 serial_set_rx_handler(sercon_handle, serial_rx);
 
 /* HELLO WORLD --- start-of-day banner text. */
-spin_lock(&console_lock);
-__putstr(xen_banner());
-spin_unlock(&console_lock);
+if ( 2 < xenlog_lower_thresh ) {
+/* Only display at XENLOG_INFO level */
+spin_lock(&console_lock);
+__putstr(xen_banner());
+spin_unlock(&console_lock);
+}
 printk("Xen version %d.%d%s (%s@%s) (%s) debug=%c " gcov_string " %s\n",
xen_major_version(), xen_minor_version(), xen_extra_version(),
xen_compile_by(), xen_compile_domain(),
-- 
1.9.1



[Xen-devel] [PATCH v3 08/24] vixen: allow dom0 to be created with a domid != 0

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

Some older guests special case domid=0 instead of checking the
shared info flags so in order to get PV drivers loaded properly,
we need to make the guest always appear with a domid != 0.

While the Vixen domain is the hardware domain, we don't want it
to behave that way so we also modify the is_hardware_domain()
check.

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - allow vixen domain id to be overridden via the Xen command line
---
 xen/arch/x86/dom0_build.c | 2 +-
 xen/arch/x86/guest/vixen.c| 7 +++
 xen/arch/x86/setup.c  | 2 +-
 xen/common/domain.c   | 4 ++--
 xen/include/asm-arm/guest/vixen.h | 5 +
 xen/include/asm-x86/guest/vixen.h | 2 ++
 xen/include/xen/sched.h   | 6 +-
 7 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index bf992fe..88810db 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -469,7 +469,7 @@ int __init construct_dom0(struct domain *d, const module_t *image,
 int rc;
 
 /* Sanity! */
-BUG_ON(d->domain_id != 0);
+BUG_ON(d->domain_id != dom0_domid);
 BUG_ON(d->vcpu[0] == NULL);
 BUG_ON(d->vcpu[0]->is_initialised);
 
diff --git a/xen/arch/x86/guest/vixen.c b/xen/arch/x86/guest/vixen.c
index d82e68f..c0a81dd 100644
--- a/xen/arch/x86/guest/vixen.c
+++ b/xen/arch/x86/guest/vixen.c
@@ -22,9 +22,16 @@
 #include 
 
 static int in_vixen;
+static int vixen_domid = 1;
+
+integer_param("vixen_domid", vixen_domid);
 
 bool is_vixen(void)
 {
 return in_vixen > 0;
 }
 
+int vixen_get_domid(void)
+{
+return vixen_domid;
+}
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 7627c3f..f9d087e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1591,7 +1591,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 }
 
 /* Create initial domain 0. */
-dom0 = domain_create(0, domcr_flags, 0, &config);
+dom0 = domain_create(dom0_domid, domcr_flags, 0, &config);
 if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
 panic("Error creating domain 0");
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7af8d12..b4d679e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -202,7 +202,7 @@ static int late_hwdom_init(struct domain *d)
 struct domain *dom0;
 int rv;
 
-if ( d != hardware_domain || d->domain_id == 0 )
+if ( d != hardware_domain || d->domain_id == dom0_domid )
 return 0;
 
 rv = xsm_init_hardware_domain(XSM_HOOK, d);
@@ -310,7 +310,7 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
 else
 d->guest_type = guest_type_pv;
 
-if ( domid == 0 || domid == hardware_domid )
+if ( domid == dom0_domid || domid == hardware_domid )
 {
 if ( hardware_domid < 0 || hardware_domid >= DOMID_FIRST_RESERVED )
 panic("The value of hardware_dom must be a valid domain ID");
diff --git a/xen/include/asm-arm/guest/vixen.h b/xen/include/asm-arm/guest/vixen.h
index ade6724..cb51698 100644
--- a/xen/include/asm-arm/guest/vixen.h
+++ b/xen/include/asm-arm/guest/vixen.h
@@ -78,4 +78,9 @@ static inline bool vixen_ring_process(uint16_t port)
 return false;
 }
 
+static inline int vixen_get_domid(void)
+{
+return 0;
+}
+
 #endif
diff --git a/xen/include/asm-x86/guest/vixen.h b/xen/include/asm-x86/guest/vixen.h
index be90c46..4e80b76 100644
--- a/xen/include/asm-x86/guest/vixen.h
+++ b/xen/include/asm-x86/guest/vixen.h
@@ -70,4 +70,6 @@ HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
 
 bool is_vixen(void);
 
+int vixen_get_domid(void);
+
 #endif
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 002ba29..5ddf6a2 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #ifdef CONFIG_COMPAT
 #include 
@@ -54,6 +56,8 @@ extern domid_t hardware_domid;
 #define hardware_domid 0
 #endif
 
+#define dom0_domid (is_vixen() ? vixen_get_domid() : 0)
+
 #ifndef CONFIG_COMPAT
 #define BITS_PER_EVTCHN_WORD(d) BITS_PER_XEN_ULONG
 #else
@@ -873,7 +877,7 @@ void watchdog_domain_destroy(struct domain *d);
  *(that is, this would not be suitable for a driver domain)
  *  - There is never a reason to deny the hardware domain access to this
  */
-#define is_hardware_domain(_d) ((_d) == hardware_domain)
+#define is_hardware_domain(_d) (!is_vixen() && ((_d) == hardware_domain))
 
 /* This check is for functionality specific to a control domain */
 #define is_control_domain(_d) ((_d)->is_privileged)
-- 
1.9.1



[Xen-devel] [PATCH v3 10/24] vixen: do not permit access to physical IRQs if in Vixen mode

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

Our intention is for the Vixen guest to be deprivileged so we need
to avoid permitting access to each IRQ even though it is technically
the hardware domain.

Signed-off-by: Anthony Liguori 
---
 xen/arch/x86/irq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 87ef2e8..bd75108 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int parse_irq_vector_map_param(const char *s);
 
@@ -190,7 +191,7 @@ int create_irq(nodeid_t node)
 desc->arch.used = IRQ_UNUSED;
 irq = ret;
 }
-else if ( hardware_domain )
+else if ( !is_vixen() && hardware_domain )
 {
 ret = irq_permit_access(hardware_domain, irq);
 if ( ret )
-- 
1.9.1



[Xen-devel] [PATCH v3 00/24] Vixen: A PV-in-HVM shim

2018-01-09 Thread Anthony Liguori
From: Anthony Liguori 

CVE-2017-5754 is problematic for paravirtualized x86 domUs because it
appears to be very difficult to isolate the hypervisor's page tables
from PV domUs while maintaining ABI compatibility.  Instead of trying
to make a KPTI-like approach work for Xen PV, it seems reasonable to
run a copy of Xen within an HVM (or PVH) domU to provide backwards
compatibility with guests as mentioned in XSA-254 [1].

This patch series adds a new mode to Xen called Vixen (Virtualized
Xen) which provides a PV-compatible interface while gaining
CVE-2017-5754 protection for the host provided by hardware
virtualization.  Vixen supports running a single unprivileged PV
domain (a dom1) that is constructed by the dom0 domain builder.

Please note the Xen page table configuration fundamental to the
current PV ABI makes it impossible for an operating system to mitigate
CVE-2017-5754 through mechanisms like Kernel Page Table Isolation
(KPTI).  In order for an operating system to mitigate CVE-2017-5754 it
must run directly in a HVM or PVH domU.

This series is very similar to the PVH series posted by Wei and we
have been discussing how to merge efforts.  We were hoping to have
more time to work this out.  I am posting this because I'm fairly
confident that this series is complete (all PV instances in EC2 are
using this) and others might find it useful.  I also wanted to have
more of a discussion about the best way to merge and some of the
differences in designs.

This series is also available at:

 git clone https://github.com/aliguori/xen.git vixen-upstream-v2

Changelog:
v1 -> v2
 - fix ARM build
 - add vixen_domid command line parameter
 - make version pass through optional
 - pull in p2m mapping fix from sidewinder
 - panic if dom0_construct_pv fails
 - #defines for the vendor/device id of platform device
 - coding style for event channel polling
 - reserve even more in the e820 table based on hvm_info_table
 - moved shared info to special page range
 - make grant table frames come from special page range
 - refactor grant tables to use single dispatch function

v2 -> v3
 - ballooning support cherry picked from the pvshim branch

Not in this version:
 - Avoiding vixen domain == hardware domain

Regards,

Anthony Liguori

[1] https://xenbits.xen.org/xsa/advisory-254.html


[Xen-devel] [PATCH v3 01/24] ---- x86/Kconfig: Options for Xen and PVH support

2018-01-09 Thread Anthony Liguori
From: Andrew Cooper 

Signed-off-by: Andrew Cooper 
---
 xen/arch/x86/Kconfig | 17 +
 1 file changed, 17 insertions(+)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 7c45829..07530bf 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -117,6 +117,23 @@ config TBOOT
  Technology (TXT)
 
  If unsure, say Y.
+
+config XEN_GUEST
+   def_bool y
+   prompt "Xen Guest"
+   ---help---
+ Support for Xen detecting when it is running under Xen.
+
+ If unsure, say Y.
+
+config PVH_GUEST
+   def_bool n
+   prompt "PVH Guest"
+   depends on XEN_GUEST
+   ---help---
+ Support booting using the PVH ABI.
+
+ If unsure, say N.
 endmenu
 
 source "common/Kconfig"
-- 
1.9.1



[Xen-devel] [PATCH v3 04/24] x86: Don't use potentially incorrect CPUID values for topology information

2018-01-09 Thread Anthony Liguori
From: Jan H. Schönherr 

Intel says for CPUID leaf 0Bh:

  "Software must not use EBX[15:0] to enumerate processor
   topology of the system. This value in this field
   (EBX[15:0]) is only intended for display/diagnostic
   purposes. The actual number of logical processors
   available to BIOS/OS/Applications may be different from
   the value of EBX[15:0], depending on software and platform
   hardware configurations."

And yet, we're using them to derive the number of cores in a package
and the number of siblings in a core.

Derive the number of siblings and cores from EAX instead, which is
intended for that.
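
As a rough sketch of the arithmetic (not the patch itself): EAX[4:0] of each leaf 0Bh sub-leaf gives the number of bits to shift the x2APIC ID right to reach the next topology level, so the sibling counts follow from powers of two.  The struct and function names below are hypothetical, and the results are the sizes of the ID space at each level, which is what the patched code computes.

```c
#include <stdint.h>

/* Hypothetical container for the derived counts. */
struct topo_counts {
    unsigned int threads_per_core;    /* 1 << SMT-level shift */
    unsigned int threads_per_package; /* 1 << core-level shift */
    unsigned int cores_per_package;   /* ratio of the two */
};

/* eax_smt:  EAX returned by CPUID leaf 0Bh, sub-leaf 0 (SMT level).
 * eax_core: EAX returned by the sub-leaf whose level type is "core".
 * Only EAX[4:0] (the shift width) is used; EBX[15:0] is ignored. */
static struct topo_counts topo_from_shifts(uint32_t eax_smt, uint32_t eax_core)
{
    unsigned int ht_mask_width = eax_smt & 0x1f;
    unsigned int core_plus_mask_width = eax_core & 0x1f;
    struct topo_counts t;

    t.threads_per_core = 1u << ht_mask_width;
    t.threads_per_package = 1u << core_plus_mask_width;
    t.cores_per_package = t.threads_per_package / t.threads_per_core;
    return t;
}
```

For example, shift widths of 1 (SMT) and 5 (core) describe room for 2 threads per core and 32 threads per package, i.e. 16 cores.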

Signed-off-by: Jan H. Schönherr 
---
 xen/arch/x86/cpu/common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e9588b3..22f392f 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -479,8 +479,8 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
initial_apicid = edx;
 
/* Populate HT related information from sub-leaf level 0 */
-   core_level_siblings = c->x86_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+   core_level_siblings = c->x86_num_siblings = 1 << ht_mask_width;
 
sub_index = 1;
do {
@@ -488,8 +488,8 @@ void detect_extended_topology(struct cpuinfo_x86 *c)
 
/* Check for the Core type in the implemented sub leaves */
if ( LEAFB_SUBTYPE(ecx) == CORE_TYPE ) {
-   core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+   core_level_siblings = 1 << core_plus_mask_width;
break;
}
 
-- 
1.9.1



Re: [Xen-devel] [RFC PATCH 1/8] x86/domctl: introduce a pair of hypercall to set and get cpu topology

2018-01-09 Thread Andrew Cooper
On 08/01/18 04:01, Chao Gao wrote:
> Define interface, structures and hypercalls for toolstack to build
> cpu topology and for guest that will retrieve it [1].
> Two subop hypercalls introduced by this patch:
> XEN_DOMCTL_set_cpu_topology to define cpu topology information per domain
> and XENMEM_get_cpu_topology to retrieve cpu topology information.
>
> [1]: during guest creation, this information helps hvmloader to build ACPI.
>
> Signed-off-by: Chao Gao 

I'm sorry, but this is going in the wrong direction.  Details like this
should be contained and communicated exclusively in the CPUID policy.

Before the spectre/meltdown fire started, I had a prototype series
introducing a toolstack interface for getting and setting a full CPUID
policy at once, rather than piecewise.  I will be continuing with this
work once the dust settles.

In particular, we should not have multiple ways of conveying the same
information, or duplication of the same data inside the hypervisor.

If you rearrange your series to put the struct cpuid_policy changes
first, then patch 2 will become far simpler.  HVMLoader should
derive its topology information from the CPUID instruction, just as is
expected on native hardware.

~Andrew


Re: [Xen-devel] [PATCH FAIRLY-RFC 00/44] x86: Prerequisite work for a Xen KAISER solution

2018-01-09 Thread Stefano Stabellini
On Fri, 5 Jan 2018, Juergen Gross wrote:
> On 04/01/18 21:21, Andrew Cooper wrote:
> > This work was developed as an SP3 mitigation, but shelved when it became
> > clear that it wasn't viable to get done in the timeframe.
> > 
> > To protect against SP3 attacks, most mappings need to be flushed while in
> > user context.  However, to protect against all cross-VM attacks, it is
> > necessary to ensure that the Xen stacks are not mapped in any other CPU's
> > address space, or an attacker can still recover at least the GPR state of
> > separate VMs.
> 
> Above statement is too strict: it would be sufficient if no stacks of
> other domains are mapped.
> 
> I'm just working on a proof of concept using dedicated per-vcpu stacks
> for 64 bit pv domains. Those stacks would be mapped in the per-domain
> region of the address space. I hope to have a RFC version of the patches
> ready next week.
> 
> This would allow removing the per physical cpu mappings in the guest
> visible address space when doing page table isolation.
> 
> In order to avoid SP3 attacks to other vcpu's stacks of the same guest
> we could extend the pv ABI to mark a guest's user L4 page table as
> "single use", i.e. not allowed to be active on multiple vcpus at the
> same time (introducing that ABI modification in the Linux kernel would
> be simple, as the Linux kernel currently lacks support for cross-cpu
> stack exploits and when that support is being added by per-cpu L4 user
> page tables we could just chime in). A L4 page table marked as "single
> use" would map the local vcpu stacks only.

Regardless of what we do as a stop-gap now (vixen for example), I think
we need to continue pursuing this solution because it is the only one
that can mitigate SP3 when VT-x is not available.

I have several users exactly in this condition, and this is the only
hope for them.

I think this series should be a blocker for 4.11.


Re: [Xen-devel] [BUG] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit()) (fwd)

2018-01-09 Thread Martin Cerveny

Hello.

On Tue, 9 Jan 2018, Jan Beulich wrote:

On 08.01.18 at 17:07,  wrote:

On Mon, 8 Jan 2018, Jan Beulich wrote:

On 07.01.18 at 13:34,  wrote:

(XEN) [ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]


The -vgpu tag makes me wonder whether you have any patches in
your tree on top of plain 4.10.0 (or 4.10-staging). Also the debug=n
above ...


4.10.0 + 11 patches to make nvidia/vgpu work
(https://github.com/xenserver/xen-4.7.pg).
debug=n because of the modified Xen debug build process.


(XEN)[] __find_next_bit+0x10/0x80
(XEN)[] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
(XEN)[] mwait-idle.c#mwait_idle+0x23e/0x340
(XEN)[] domain.c#idle_loop+0x86/0xc0


... makes this call trace unreliable. But even with a reliable call
trace, analysis of the crash would be helped if you made
available the xen-syms (or xen.efi, depending on how you boot)
somewhere.


xen-syms - http://www.uschovna.cz/en/zasilka/UDP5LVE2679CGBIS-4YV/


Thanks. Looks to be a race between a timer in the governor and
the CPUs being brought down. In general the governor is supposed
to be disabled in the course of CPUs being brought down, so first
of all I wonder whether you're having some daemon in use which
sends management requests to the CPUfreq driver in Xen. Such a
daemon should of course be disabled by the system shutdown
scripts. Otherwise please try the attached debugging patch -
maybe we can see something from its output.


I suppose nothing should be running, because the Dom0 kernel has already
ended (see the last two messages from the dom0 kernel). Or how can I check it?


Patch added.
- no "dbs:" in output (grep "dbs:" ...)
- examples of shutdown output (1* OK + 2* fail):

-

[  632.439402] ACPI: Preparing to enter system sleep state S5
[  632.486728] reboot: Power down
(XEN) Preparing system for ACPI S5 state.
(XEN) Disabling non-boot CPUs ...
(XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
(XEN) Broke affinity for irq 140
(XEN) cpufreq: del CPU2 (1,4,1,4)
(XEN) Broke affinity for irq 139
(XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
(XEN) Broke affinity for irq 83
(XEN) cpufreq: del CPU4 (1,10,1,10)
(XEN) Broke affinity for irq 137
(XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
(XEN) cpufreq: del CPU6 (1,40,1,40)
(XEN) Broke affinity for irq 141
(XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
(XEN) cpufreq: del CPU8 (1,100,1,100)
(XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
(XEN) cpufreq: del CPU10 (1,400,1,400)
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) cpufreq: del CPU12 (1,1000,1,1000)
(XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
(XEN) cpufreq: del CPU14 (1,4000,1,4000)
(XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
(XEN) cpufreq: del CPU16 (1,ff0001,1,1)
(XEN) cpufreq: del CPU17 (1,fe0001,1,2)
(XEN) cpufreq: del CPU18 (1,fc0001,1,4)
(XEN) cpufreq: del CPU19 (1,f80001,1,8)
(XEN) cpufreq: del CPU20 (1,f1,1,10)
(XEN) cpufreq: del CPU21 (1,e1,1,20)
(XEN) cpufreq: del CPU22 (1,c1,1,40)
(XEN) cpufreq: del CPU23 (1,81,1,80)
(XEN) Broke affinity for irq 72
(XEN) cpufreq: del CPU0 (1,1,1,1)
(XEN) Entering ACPI S5 state.

---

[  669.171396] ACPI: Preparing to enter system sleep state S5
[  669.218637] reboot: Power down
(XEN) Preparing system for ACPI S5 state.
(XEN) Disabling non-boot CPUs ...
(XEN) cpufreq: del CPU1 (1,ffaaab,1,2)
(XEN) Broke affinity for irq 138
(XEN) cpufreq: del CPU2 (1,4,1,4)
(XEN) Broke affinity for irq 141
(XEN) cpufreq: del CPU3 (1,ffaaa9,1,8)
(XEN) cpufreq: del CPU4 (1,10,1,10)
(XEN) cpufreq: del CPU5 (1,ffaaa1,1,20)
(XEN) Broke affinity for irq 140
(XEN) cpufreq: del CPU6 (1,40,1,40)
(XEN) Broke affinity for irq 139
(XEN) cpufreq: del CPU7 (1,ffaa81,1,80)
(XEN) Broke affinity for irq 137
(XEN) cpufreq: del CPU8 (1,100,1,100)
(XEN) cpufreq: del CPU9 (1,ffaa01,1,200)
(XEN) cpufreq: del CPU10 (1,400,1,400)
(XEN) cpufreq: del CPU11 (1,ffa801,1,800)
(XEN) cpufreq: del CPU12 (1,1000,1,1000)
(XEN) cpufreq: del CPU13 (1,ffa001,1,2000)
(XEN) cpufreq: del CPU14 (1,4000,1,4000)
(XEN) cpufreq: del CPU15 (1,ff8001,1,8000)
(XEN) cpufreq: del CPU16 (1,ff0001,1,1)
(XEN) cpufreq: del CPU17 (1,fe0001,1,2)
(XEN) cpufreq: del CPU18 (1,fc0001,1,4)
(XEN) cpufreq: del CPU19 (1,f80001,1,8)
(XEN) cpufreq: del CPU20 (1,f1,1,10)
(XEN) cpufreq: del CPU21 (1,e1,1,20)
(XEN) cpufreq: del CPU22 (1,c1,1,40)
(XEN) cpufreq: del CPU23 (1,81,1,80)
(XEN) [ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]
(XEN) CPU:23
(XEN) RIP:e008:[] __find_next_bit+0x10/0x80
(XEN) RFLAGS: 00010206   CONTEXT: hypervisor
(XEN) rax:    rbx: 830879db0400   rcx: 0018
(XEN) rdx: 0018   rsi: 0018   rdi: 
(XEN) rbp: 061c6652   rsp: 83104eaafdd8   r8:  0018
(XEN) r9:  830879db6d70   r10: 830879db28e8   r11: 009df890a1e7

[Xen-devel] [qemu-upstream-4.10-testing test] 117730: regressions - FAIL

2018-01-09 Thread osstest service owner
flight 117730 qemu-upstream-4.10-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/117730/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit2 16 guest-start/debian.repeat fail REGR. vs. 117345

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail REGR. vs. 117345

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvhv2-intel  12  guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm  13  migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt  13  migrate-support-check  fail  never pass
 test-amd64-amd64-xl-pvhv2-amd  12  guest-start  fail  never pass
 test-amd64-i386-libvirt-xsm  13  migrate-support-check  fail  never pass
 test-amd64-i386-libvirt  13  migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  13  migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  14  saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  13  migrate-support-check  fail  never pass
 test-arm64-arm64-xl  14  saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  13  migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  14  saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  13  migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  14  saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11  migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd  12  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  14  saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11  migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt  13  migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-xl  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl  14  saverestore-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd  17  debian-hvm-install/l1/l2  fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64  17  guest-stop  fail  never pass
 test-armhf-armhf-xl-cubietruck  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  13  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  13  migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  14  saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  12  migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  13  saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64  17  guest-stop  fail  never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  17  guest-stop  fail  never pass
 test-armhf-armhf-libvirt-raw  12  migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw  13  saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  17  guest-stop  fail  never pass
 test-amd64-amd64-xl-qemuu-win10-i386  10  windows-install  fail  never pass
 test-amd64-i386-xl-qemuu-win10-i386  10  windows-install  fail  never pass

version targeted for testing:
 qemuu  bb019fb2cbbe23e2419e07bf347f45415360677d
baseline version:
 qemuu  a4166a0a50dda967f30c9d85fa8aa2ea2539798e

Last test of basis   117345  2017-12-19 18:48:50 Z   21 days
Testing same since   117730  2018-01-08 17:14:22 Z    1 days    1 attempts


People who touched revisions under test:
  Alex Williamson 
  Alexey Kardashevskiy 
  Anthony PERARD 
  Daniel Henrique Barboza 
  Daniel P. Berrange 
  David Gibson 
  Eric Auger 
  Eric Blake 
  Gerd Hoffmann 
  Greg 

Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Hans van Kranenburg
On 01/09/2018 07:22 PM, Rich Persaud wrote:
>>> On Jan 9, 2018, at 12:56, Stefano Stabellini  wrote:
>>>
>>> On Tue, 9 Jan 2018, Doug Goldstein wrote:
>>> On 1/9/18 11:33 AM, Jan Beulich wrote:
>>> On 09.01.18 at 18:23,  wrote:
> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
>  wrote:
 On Tue, 9 Jan 2018, George Dunlap wrote:
 On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
 On a similarly pragmatic note: would a variation of Anthony's vixen 
 patch
> series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are 
> currently 
> documented as security-supported (Oct 2018 - July 2020).
>>>
>>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
>>> non-polished series to 4.10, rather than exactly what the content of
>>> that series would be.
>>>
>>> In the IRL conversation that preceded this mail, the new short-term
>>> target we discussed was:
>>> 1. A 4.10-based shim that could boot either under HVM or PVH
>>> 2. A script that would take an existing PV config, and spit out a) a
>>> bootable ISO with the shim & whatever was needed, and b) a new config
>>> that would boot the same VM, but in HVM mode with the shim
>>>
>>> The script + a 4.10 shim binary *should* allow most PV guests to boot
>>> without any changes whatsoever for most older versions of Xen.
>>>
>>> There are a number of people for whom this won't work; I think we also
>>> need to provide a way to transparently change PV guests into PVshim
>>> guests.  But that will necessarily involve significant toolstack
>>> functionality, at which point you might as well backport PVH as well.
>>
>> Yes, there will be a number of people that won't be covered by this fix,
>> including those that can't use HVM/PVH mode because VT-x isn't available
>> at all in their environment. That is the only reason to run PV today.
>> Providing a way to transparently change PV guests into PVshim guests
>> won't cover any of these cases. A more complete workaround to SP3 is
>> along the lines of https://marc.info/?l=xen-devel=151509740625690.
>>
>> That said, I realize that we are only trying to do the best we can in a
>> very difficult situation, with very little time in our hands. I agree
>> with Ian that we should commit something unpolished and only partially
>> reviewed soon, even though it doesn't cover a good chunk of the userbase
>> for one reason or another. Even if migration doesn't work, it will still
>> help all that don't require it. It is only a partial fix by nature
>> anyway.
>
> Can people be a bit more explicit about what they think should be done 
> here?
>
> I'm happy to redirect effort to PVH shim if that's what the solution
> is going to be.
>
> I obviously prefer the HVM approach as it works on a broad range of Xen 
> versions
> without modification but I'm keen to get something done quickly and
> don't want to
> waste effort.

 From what I've read today, I have no reason to believe the PVH
 shim won't work in HVM mode. How would the HVM-only approach
 be better in that case?

 Jan
>>>
>>> I feel like I should state the obvious here. It's tested over a large
>>> data set.
>>
>> Right: if we are going to commit something unpolished and unreviewed,
>> let it be at least very well tested by the submitter. Honest question:
>> how much more dev do we need on PVShim before we get it to similar
>> levels of confidence?
> 

> Since the primary audience for security fixes are production
> deployments of Xen where customer assets are at risk, is there an
> estimate for the percentage/size of Xen deployments where PVH (not
> only Xen 4.10) has already been deployed for production customers?
> That could give other customers more confidence in deploying PVH in
> production.
+1

I have been hearing mostly-very-positive stories around, except for the
missing pvgrub2 support. :)

But as a sysadmin who's also strongly considering jumping to 4.10 and
PVH I'd definitely like to hear more stories.

Hans


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Matt Wilson
On Tue, Jan 09, 2018 at 05:58:46PM +, Wei Liu wrote:
> 
> Ian has been busy writing the sidecar script and Roger and I have been
> working on cleaning up the branch.  We want to post a new version as
> soon as possible (tomorrow or even tonight).

Ian,

Let me know if you need any help with the sidecar script. Generally
it's straightforward enough to build so I'm sure you won't have any
trouble. Here's one that I used for local testing on my laptop in a
CentOS-ish chroot (we have other bits responsible for this in
EC2). Please excuse the cruft, including the use of legacy GRUB.

-- 8< ---
#!/bin/bash
if [ $# -lt 2 ]; then
echo "usage: $0 xen.gz kernel.gz [initrd.img]"
exit 1
fi

if [ $# -eq 3 ]; then
INITRD=$3
fi

if [ ! -f /usr/share/grub/x86_64-redhat/stage2_eltorito ]; then
echo "/usr/share/grub/x86_64-redhat/stage2_eltorito not found."
echo "Install grub RPM?"
exit 1
fi

if [ ! -f $1 ]; then
echo "$1 is not a file"
exit 1
fi

if [ ! -f $2 ]; then
echo "$2 is not a file"
exit 1
fi

TMPDIR=$(mktemp -d)
cat >> $TMPDIR/menu.lst <> $TMPDIR/menu.lst
INITRD_GRAFT=boot/initrd.img=$INITRD
fi
cp $TMPDIR/menu.lst $TMPDIR/grub.conf
cp /usr/share/grub/x86_64-redhat/stage2_eltorito $TMPDIR/

mkisofs -output vixen.iso \
-input-charset utf-8 \
-joliet -rational-rock \
-translation-table \
-eltorito-boot boot/grub/stage2_eltorito \
-no-emul-boot \
-boot-load-size 4 \
-boot-info-table \
-graft-points \
boot/grub/stage2_eltorito=$TMPDIR/stage2_eltorito \
boot/xen.gz=$1 \
boot/kernel.gz=$2 \
$INITRD_GRAFT \
boot/grub/menu.lst=$TMPDIR/menu.lst \
boot/grub/grub.conf=$TMPDIR/grub.conf

rm -rf $TMPDIR
-- 8< ---

Everything but ancient xend based Xen can sideload while avoiding a PV
domU block device by passing in device_model_args like this:

...
boot="d"
device_model_args=[ '-drive', 'file=/path/to/vixen.iso,media=cdrom,format=raw' ]
...

For xend versions we craft a qemu-dm wrapper script and change
device_model to use it.

--msw


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Anthony Liguori
On Tue, Jan 9, 2018 at 11:43 AM, Wei Liu  wrote:
> On Mon, Jan 08, 2018 at 05:45:32PM +, Ian Jackson wrote:
>> AIUI we have a series for pv-in-pvh shim which is nearing completion
>> in the sense that it will have been well-tested (especially the
>> hypervisor parts) and has good functionality.  (Wei is handling the
>> assembly of this series.)
>>
>> The series, however, needs proper review and tidying up.
>> Specifically, it needs the kind of tidying up that fixes code
>> structure and style issues that will hinder future Xen development.
>> I.e. the kind of technical debt which does not directly cause bugs now
>> but will cause trouble (including bugs) in the future.
>>
>> IMO that kind of tidying up is definitely essential for
>> xen.git#master.  However, it is much less of an issue for Xen 4.10.
>> Xen 4.10, as a stable branch, will get much more limited further
>> development.  Failure to tidy things up there will make backporting
>> other changes more awkward but the overall impact is both lower and
>> time-bound.
>>
>> Currently the Xen Project has no published resolution for PV guests
>> that can't be booted as, or converted to, PVH or HVM.  (And HVM guests
>> bring their own problems.)  We need to provide our users with more
>> good options as quickly as possible.
>>
>> I would like to suggest that a good way of doing this would be to ship
>> the shim series as 4.10.1 within the next very few days.  It needs
>> some minor bugfixing (build breakage etc.) but is basically ready for
>> use.
>>
>> Speaking as a sysadmin (even, a very conservative sysadmin many of
>> whose systems are running Debian oldstable), I have already taken a
>> decision to rapidly advance to new software, in one context, because
>> of these vulnerabilities - and take and fix whatever impact that has.
>> I think many of our users would like to make the same choice.
>>
>> Releasing 4.10.1 this week with pv-in-pvh support would give many of
>> our users with PV guests an immediately deployable update, even though
>> of course the version bump to get to 4.10 may be disruptive.
>>
>> Doing this would be a departure from our usual non-security-bug
>> process of committing changes to xen.git#staging, and then backporting
>> only after the patches have been sitting in xen.git#master for some
>> time.  It's also a departure from our usual security-bug process of
>> developing and testing and committing patches for all supported
>> versions in parallel.
>>
>> But this is not a usual situation.  This time, we don't have the time
>> to wait.
>>
>> Opinions ?
>>
>
> Anthony and others joined #xendevel to express their findings and
> opinions.
>
> Converging the PVH and HVM solution is doable and essential in the long
> run, but merging the two series in two or three days (if we want to make
> something ready this week) is not possible. It all comes down to which
> series should we use for the temporary solution.
>
> We discussed the test coverage of both series. It seems that the PV in
> PVH series has had in depth testing done on 4.7 and 4.10, while PV in
> HVM series has had testing done from Xen 3.4 onward with various old and
> new guests. Anthony also pointed out that PV in PVH shim won't work for
> some configurations -- there are far too many subtleties to fix without
> time and testing resources (both of which upstream lacks). These are
> rather strong arguments for the PV in HVM series, because being able to
> run on older versions of Xen and older versions of guest kernels
> provides our users with the maximum coverage.
>
> An argument for PV in PVH series is that it has more functionalities,
> but I think migration etc are just nice-to-have's in the context of this
> security fix series.
>
> I think providing a well tested solution to our users as soon as
> possible, even if the solution has reduced functionality, is better than
> delaying for the perfect solution.  I suggest we go with Amazon's series
> first and produce something this week, then we seek to merge the two
> solutions. Anthony has agreed to be on the hook to review future
> patches. ;-)

Thanks Wei.

I merged in ballooning support and that seems to be working okay.

Unfortunately, vcpu hotplug crashes during SMP boot up because we're
passing through runstate registration to make steal time accounting work
but there seems to be an incompatibility with that and how hotplug works
in PVShim.

I'm going to port over the migration bits next and then I'll send out a v3.

Regards,

Anthony Liguori

> Wei.
>

Re: [Xen-devel] [PATCH] xen/efi: Avoid EFI stub using absolute symbols

2018-01-09 Thread Julien Grall

Hi Jan,

On 01/02/2018 04:35 PM, Jan Beulich wrote:

On 21.12.17 at 15:55,  wrote:

The EFI image should be relocatable. At the moment, the whole stub is
relocatable except for one place.


Do you really mean relocatable here? Based on ...


Hmm yes position independent.




On both Arm64 and x86-64 (from a quick glance), the compiler will generate
absolute pointers in the ErrCodeToStr array. Those values are based on Xen's
view of the virtual memory and may not be the same as EFI's.


... this I'm wondering whether you don't instead mean position
independent. xen.efi (on x86) wouldn't work right if there were
no relocations for this array.


When I compiled the snippet on x86 and Arm, no relocation is available 
for the pointers to string in the array in the final binary. Yet they 
are available in the object.


Indeed the relocations seem to be absolute (e.g. R_X86_64_64) and
disappeared at linking. That is why I suggested a compiler bug, because the
code should be PIE and that would not even work if the binary is
randomized on Linux.


So I am wondering how this works on x86? Note that this code is only used
in the error path.





For instance, at least on Arm64, EFI will do a 1:1 mappings of the Stub.


I'm afraid it is not clear to me what "1:1 mapping" in this context
means.


I meant VA = PA.


Isn't your problem rather that on ARM64 xen.efi's
.reloc section is empty (which presumably is a result of how it is
being built)?


See above.




--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -342,7 +342,7 @@ static void __init noreturn blexit(const CHAR16 *str)
  /* generic routine for printing error messages */
  static void __init PrintErrMesg(const CHAR16 *mesg, EFI_STATUS ErrCode)
  {
-static const CHAR16* const ErrCodeToStr[] __initconstrel = {
+static const CHAR16 ErrCodeToStr[][25] __initconst  = {
  [~EFI_ERROR_MASK & EFI_NOT_FOUND]   = L"Not found",
  [~EFI_ERROR_MASK & EFI_NO_MEDIA]= L"The device has no media",
  [~EFI_ERROR_MASK & EFI_MEDIA_CHANGED]   = L"Media changed",


If we really wanted (needed) to go this route, at least a comment
would be needed to prevent someone later "correcting" the obvious
oddity by switching back to what we have now. But I'd prefer if this
code was left alone.


That's my preferred way too. But at the moment, I can't see how to avoid
changing the array.
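
To make the difference between the two layouts concrete, here is a small stand-alone sketch.  It is not the Xen code: char16_t stands in for EFI's CHAR16, the tables are indexed 0..2 rather than by `~EFI_ERROR_MASK & code`, and stored_inline() is a hypothetical helper.  A table of pointers stores one absolute address per entry, so every entry must be relocated if the image runs at a different address; a table of fixed-size rows stores the characters themselves, so an entry is found purely by offset from the table base.

```c
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* Layout 1: array of pointers.  Each element holds the address of a
 * string literal that lives elsewhere in read-only data. */
static const char16_t *const err_by_pointer[] = {
    u"Not found",
    u"The device has no media",
    u"Media changed",
};

/* Layout 2: array of 25-element rows.  The characters are stored in
 * the table itself, so no pointer (and no relocation) is involved. */
static const char16_t err_by_value[][25] = {
    u"Not found",
    u"The device has no media",
    u"Media changed",
};

/* Returns 1 if 'str' points into the storage of 'table'. */
static int stored_inline(const void *table, size_t table_size,
                         const void *str)
{
    uintptr_t base = (uintptr_t)table;
    uintptr_t p = (uintptr_t)str;

    return p >= base && p < base + table_size;
}
```

The cost of the second layout is padding: every row takes 25 code units regardless of the message length, which is presumably why the row size was chosen to just fit the longest message.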


Cheers,

--
Julien Grall


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Wei Liu
On Mon, Jan 08, 2018 at 05:45:32PM +, Ian Jackson wrote:
> AIUI we have a series for pv-in-pvh shim which is nearing completion
> in the sense that it will have been well-tested (especially the
> hypervisor parts) and has good functionality.  (Wei is handling the
> assembly of this series.)
> 
> The series, however, needs proper review and tidying up.
> Specifically, it needs the kind of tidying up that fixes code
> structure and style issues that will hinder future Xen development.
> I.e. the kind of technical debt which does not directly cause bugs now
> but will cause trouble (including bugs) in the future.
> 
> IMO that kind of tidying up is definitely essential for
> xen.git#master.  However, it is much less of an issue for Xen 4.10.
> Xen 4.10, as a stable branch, will get much more limited further
> development.  Failure to tidy things up there will make backporting
> other changes more awkward but the overall impact is both lower and
> time-bound.
> 
> Currently the Xen Project has no published resolution for PV guests
> that can't be booted as, or converted to, PVH or HVM.  (And HVM guests
> bring their own problems.)  We need to provide our users with more
> good options as quickly as possible.
> 
> I would like to suggest that a good way of doing this would be to ship
> the shim series as 4.10.1 within the next very few days.  It needs
> some minor bugfixing (build breakage etc.) but is basically ready for
> use.
> 
> Speaking as a sysadmin (even, a very conservative sysadmin many of
> whose systems are running Debian oldstable), I have already taken a
> decision to rapidly advance to new software, in one context, because
> of these vulnerabilities - and take and fix whatever impact that has.
> I think many of our users would like to make the same choice.
> 
> Releasing 4.10.1 this week with pv-in-pvh support would give many of
> our users with PV guests an immediately deployable update, even though
> of course the version bump to get to 4.10 may be disruptive.
> 
> Doing this would be a departure from our usual non-security-bug
> process of committing changes to xen.git#staging, and then backporting
> only after the patches have been sitting in xen.git#master for some
> time.  It's also a departure from our usual security-bug process of
> developing and testing and committing patches for all supported
> versions in parallel.
> 
> But this is not a usual situation.  This time, we don't have the time
> to wait.
> 
> Opinions ?
> 

Anthony and others joined #xendevel to express their findings and
opinions.

Converging the PVH and HVM solution is doable and essential in the long
run, but merging the two series in two or three days (if we want to make
something ready this week) is not possible. It all comes down to which
series we should use for the temporary solution.

We discussed the test coverage of both series. It seems that the PV in
PVH series has had in-depth testing done on 4.7 and 4.10, while the PV in
HVM series has had testing done from Xen 3.4 onward with various old and
new guests. Anthony also pointed out that PV in PVH shim won't work for
some configurations -- there are far too many subtleties to fix without
time and testing resources (both of which upstream lacks). These are
rather strong arguments for the PV in HVM series, because being able to
run on older versions of Xen and older versions of guest kernels
provides our users with the maximum coverage.

An argument for the PV in PVH series is that it has more functionality,
but I think migration etc. are just nice-to-haves in the context of this
security fix series.

I think providing a well tested solution to our users as soon as
possible, even if the solution has reduced functionality, is better than
delaying for the perfect solution.  I suggest we go with Amazon's series
first and produce something this week, then we seek to merge the two
solutions. Anthony has agreed to be on the hook to review future
patches. ;-)

Wei.


[Xen-devel] [xen-unstable test] 117727: FAIL

2018-01-09 Thread osstest service owner
flight 117727 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/117727/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt broken  in 117696

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt 4 host-install(4) broken in 117696 pass in 117727
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat  fail pass in 117696

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  14 saverestore-support-check  fail  like 117311
 test-armhf-armhf-libvirt-xsm  14 saverestore-support-check  fail  like 117311
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop  fail  like 117311
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop  fail  like 117311
 test-amd64-amd64-xl-qemut-win7-amd64  17 guest-stop  fail  like 117311
 test-amd64-amd64-xl-qemuu-ws16-amd64  17 guest-stop  fail  like 117311
 test-amd64-i386-xl-qemut-win7-amd64  17 guest-stop  fail  like 117311
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check  fail  like 117311
 test-amd64-i386-xl-qemuu-ws16-amd64  17 guest-stop  fail  like 117311
 test-amd64-amd64-xl-qemut-ws16-amd64  17 guest-stop  fail  like 117311
 test-amd64-amd64-xl-pvhv2-intel  12 guest-start  fail  never pass
 test-amd64-amd64-xl-pvhv2-amd  12 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt  13 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  14 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail  never pass
 test-amd64-amd64-qemuu-nested-amd  17 debian-hvm-install/l1/l2  fail  never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-xsm  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-cubietruck  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check  fail  never pass
 test-amd64-i386-xl-qemut-ws16-amd64  17 guest-stop  fail  never pass
 test-amd64-i386-xl-qemuu-win10-i386  10 windows-install  fail  never pass
 test-amd64-amd64-xl-qemuu-win10-i386  10 windows-install  fail  never pass
 test-amd64-amd64-xl-qemut-win10-i386  10 windows-install  fail  never pass
 test-amd64-i386-xl-qemut-win10-i386  10 windows-install  fail  never pass

version targeted for testing:
 xen  2d1c82261d966735e82e5971eddb63ba3c565a37
baseline version:
 xen  ec320542e4f4de12305551ef5e3cd4d2ced85771

Last test of basis   117311  2017-12-19 02:35:18 Z   21 days
Failing since 

Re: [Xen-devel] [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains

2018-01-09 Thread Andrew Cooper
(sorry for the top-post. I'm on my phone) 

I can see you are using ltr, but I don't see anywhere where you are 
changing the content of the TSS, or the top-of-stack content.

It is very complicated to safely switch IST stacks when you might be taking 
interrupts. 

~Andrew 

From: Juergen Gross [jgr...@suse.com]
Sent: 09 January 2018 17:40
To: Andrew Cooper; xen-devel@lists.xenproject.org
Cc: Ian Jackson; konrad.w...@oracle.com; jbeul...@suse.com
Subject: Re: [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains

On 09/01/18 18:01, Andrew Cooper wrote:
> On 09/01/18 14:27, Juergen Gross wrote:
>> Instead of using the TSS and stacks of the physical processor allocate
>> them per vcpu, map them in the per domain area, and use those.
>>
>> Signed-off-by: Juergen Gross 
>
> I don't see anything here which updates the fields in the TSS across
> context switch.  Without it, you'll be taking NMIs/MCEs/DF's on the
> wrong stack.

No, I'm doing ltr() with a TSS referencing the per-vcpu stacks. TSS is
per vcpu, too.
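
As a rough illustration of the point being argued here, the sketch below shows the architectural 64-bit TSS layout and how a per-vcpu TSS could carry its own stack pointers, so that loading it with ltr switches RSP0 and the IST entries together. This is a sketch under stated assumptions, not the code from the patch: struct tss64, tss_init_for_vcpu(), DEMO_IST_STACKS and the stack area layout (one primary stack followed by three IST stacks of equal size) are hypothetical, the packed attribute is a GCC/Clang extension, and I/O bitmap handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural layout of the 64-bit TSS (104 bytes). */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: stacks used on privilege change */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: dedicated stacks, e.g. NMI/MCE/#DF */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
} __attribute__((packed));

/* Hypothetical: number of IST stacks following the primary stack. */
#define DEMO_IST_STACKS 3

/*
 * Fill a TSS for one vcpu from a contiguous per-vcpu stack area.
 * Stacks grow down, so each entry is set to the top of its slot:
 * slot 0 is the primary stack, slots 1..DEMO_IST_STACKS are IST stacks.
 */
void tss_init_for_vcpu(struct tss64 *tss, uint64_t area_base,
                       uint64_t stack_size)
{
    unsigned int i;

    *tss = (struct tss64){ 0 };
    tss->rsp[0] = area_base + stack_size;

    for (i = 0; i < DEMO_IST_STACKS; i++)
        tss->ist[i] = area_base + (i + 2) * stack_size;
}
```

With this arrangement nothing inside a TSS changes at context switch; only the task register is reloaded to point at the next vcpu's TSS. Whether that reload can be made safe against NMIs and MCEs arriving in the middle of the switch is exactly the concern raised in this thread, and the sketch does not address it.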



> I still don't see how your plan is viable in the first place, and is
> adding substantially more complexity to an answer which doesn't need it.
>
> I'm afraid I'm on the verge of a nack unless you can explain how this is
> intended to be safe, and better than what we currently have.

It is laying the groundwork for a KAISER solution needing no mapping of
per physical cpu areas in the user guest tables, so isolating the guests
from each other.


Juergen


[Xen-devel] [linux-linus bisection] complete test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm

2018-01-09 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm
testid xen-boot

Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
  Bug introduced:  b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d
  Bug not present: 1c9dbd4615fd751e5e0b99807a3c7c8612e28e20
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/117757/


  (Revision log too long, omitted.)


For bisection revision-tuple graph see:
   http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-linus/test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm.xen-boot.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-linus/test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm.xen-boot --summary-out=tmp/117757.bisection-summary --basis-template=115643 --blessings=real,real-bisect linux-linus test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm xen-boot
Searching for failure / basis pass:
 117721 fail [host=huxelrebe0] / 117305 [host=baroque0] 117251 [host=godello0] 117205 [host=italia1] 117143 [host=elbling1] 116136 [host=elbling0] 116119 [host=huxelrebe1] 116103 [host=baroque0] 115718 ok.
Failure / basis pass flights: 117721 / 115718
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d c530a75c1e6a472b0eb9558310b518f0dfcd8860 c8ea0457495342c417c3dc033bba25148b279f60 b79708a8ed1b3d18bee67baeaf33b3fa529493e2 ec320542e4f4de12305551ef5e3cd4d2ced85771
Basis pass 1c9dbd4615fd751e5e0b99807a3c7c8612e28e20 c530a75c1e6a472b0eb9558310b518f0dfcd8860 c8ea0457495342c417c3dc033bba25148b279f60 5cd7ce5dde3f228b3b669ed9ca432f588947bd40 ff93dc55431517ed29c70dbff6721c6b0803acf9
Generating revisions with ./adhoc-revtuple-generator  
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git#1c9dbd4615fd751e5e0b99807a3c7c8612e28e20-b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#c8ea0457495342c417c3dc033bba25148b279f60-c8ea0457495342c417c3dc033bba25148b279f60
 
git://xenbits.xen.org/qemu-xen.git#5cd7ce5dde3f228b3b669ed9ca432f588947bd40-b79708a8ed1b3d18bee67baeaf33b3fa529493e2
 
git://xenbits.xen.org/xen.git#ff93dc55431517ed29c70dbff6721c6b0803acf9-ec320542e4f4de12305551ef5e3cd4d2ced85771
adhoc-revtuple-generator: tree discontiguous: linux-2.6
Loaded 2006 nodes in revision graph
Searching for test results:
 115599 [host=rimava1]
 115543 [host=nobling0]
 115573 [host=godello1]
 115615 [host=fiano0]
 115628 [host=nocera0]
 115643 [host=chardonnay0]
 115678 [host=pinot0]
 115690 [host=nobling1]
 115718 pass 1c9dbd4615fd751e5e0b99807a3c7c8612e28e20 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
c8ea0457495342c417c3dc033bba25148b279f60 
5cd7ce5dde3f228b3b669ed9ca432f588947bd40 
ff93dc55431517ed29c70dbff6721c6b0803acf9
 116103 [host=baroque0]
 116152 []
 116119 [host=huxelrebe1]
 116136 [host=elbling0]
 116164 []
 116182 []
 116215 []
 116226 [host=merlot0]
 116268 [host=merlot0]
 116316 [host=merlot0]
 116343 [host=merlot0]
 116433 [host=merlot0]
 116461 [host=merlot0]
 116514 [host=merlot0]
 116536 [host=merlot0]
 116550 [host=merlot0]
 116577 [host=merlot0]
 116592 [host=merlot0]
 116628 [host=merlot0]
 116775 [host=merlot0]
 116735 [host=merlot0]
 116810 [host=merlot0]
 116840 [host=merlot0]
 116876 [host=merlot0]
 116921 [host=merlot0]
 116947 [host=merlot0]
 117205 [host=italia1]
 117143 [host=elbling1]
 117251 [host=godello0]
 117305 [host=baroque0]
 117359 fail irrelevant
 117655 fail irrelevant
 117747 fail b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
c8ea0457495342c417c3dc033bba25148b279f60 
b79708a8ed1b3d18bee67baeaf33b3fa529493e2 
ec320542e4f4de12305551ef5e3cd4d2ced85771
 117694 fail irrelevant
 117740 pass 1c9dbd4615fd751e5e0b99807a3c7c8612e28e20 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
c8ea0457495342c417c3dc033bba25148b279f60 
b79708a8ed1b3d18bee67baeaf33b3fa529493e2 
b95f7be32d668fa4b09300892ebe19636ecebe36
 117742 pass 1c9dbd4615fd751e5e0b99807a3c7c8612e28e20 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 

Re: [Xen-devel] [PATCH RFC v1 56/74] xen/pvshim: add grant table operations

2018-01-09 Thread Roger Pau Monné
On Mon, Jan 08, 2018 at 10:19:39AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06,  wrote:
> > @@ -30,11 +31,17 @@
> >  #include 
> >  #include 
> >  
> > +#include 
> 
> Interesting: The event channel patch gave me the impression that
> it is not intended to deal with 32-bit guests.

AFAICT the event channel didn't need any explicit compat stuff. That's
not the case with grant tables however...

> > @@ -360,6 +367,173 @@ void pv_shim_inject_evtchn(unsigned int port)
> >  }
> >  }
> >  
> > +long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
> > +unsigned int count, bool compat)
> > +{
> > +struct domain *d = current->domain;
> > +long rc = 0;
> > +
> > +if ( count != 1 )
> > +return -EINVAL;
> > +
> > +switch ( cmd )
> > +{
> > +case GNTTABOP_setup_table:
> > +{
> > +struct gnttab_setup_table nat;
> > +struct compat_gnttab_setup_table cmp;
> > +unsigned int i;
> > +
> > +if ( unlikely(compat ? copy_from_guest(&cmp, uop, 1)
> > + : copy_from_guest(&nat, uop, 1)) ||
> > + unlikely(compat ? !compat_handle_okay(cmp.frame_list,
> > +   cmp.nr_frames)
> > + : !guest_handle_okay(nat.frame_list,
> > +  nat.nr_frames)) )
> > +{
> > +rc = -EFAULT;
> > +break;
> > +}
> > +if ( compat )
> > +#define XLAT_gnttab_setup_table_HNDL_frame_list(d, s)
> > +XLAT_gnttab_setup_table(&nat, &cmp);
> > +#undef XLAT_gnttab_setup_table_HNDL_frame_list
> > +
> > +nat.status = GNTST_okay;
> > +
> > +spin_lock(_lock);
> > +if ( !nr_grant_list )
> > +{
> > +struct gnttab_query_size query_size = {
> > +.dom = DOMID_SELF,
> > +};
> > +
> > +rc = xen_hypercall_grant_table_op(GNTTABOP_query_size,
> > +  &query_size, 1);
> > +if ( rc )
> > +{
> > +spin_unlock(_lock);
> > +break;
> > +}
> > +
> > +ASSERT(!grant_frames);
> > +grant_frames = xzalloc_array(unsigned long,
> > + query_size.max_nr_frames);
> 
> Hmm, such runtime allocations (especially when the amount can
> be large) are a fundamental problem. I think this needs setting
> up before the guest is started.

The shim already sets some memory apart for its own usage. It could
be moved to some shim-start function, but it will likely have to be
freed and allocated again on migration, since the number of grant
table frames can change when migrating from one host to another.
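
A minimal sketch of what "freed and allocated again on migration" could look like, in plain C with the standard allocator standing in for the hypervisor one. Everything here is an assumption made for illustration: struct frame_table, frame_table_resize() and the decision to start from a zeroed table after migration are hypothetical, and locking is omitted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical container for the shim's view of the grant table frames. */
struct frame_table {
    unsigned long *frames;   /* one entry per grant table frame */
    unsigned int nr;         /* number of entries in frames[] */
};

/*
 * Replace the table with a zeroed one sized for the new host's
 * max_nr_frames. The old contents are dropped on purpose: after a
 * migration the frames have to be set up again anyway.
 * Returns 0 on success or a negative errno value, leaving the old
 * table untouched on failure.
 */
int frame_table_resize(struct frame_table *t, unsigned int new_nr)
{
    unsigned long *fresh;

    if (new_nr == 0)
        return -EINVAL;

    fresh = calloc(new_nr, sizeof(*fresh));
    if (!fresh)
        return -ENOMEM;

    free(t->frames);
    t->frames = fresh;
    t->nr = new_nr;

    return 0;
}
```

Allocating the new table before freeing the old one means an allocation failure during migration leaves the previous state intact, at the price of briefly holding both tables.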

> > +{
> > +struct gnttab_query_size op;
> > +int rc;
> > +
> > +if ( unlikely(copy_from_guest(&op, uop, 1)) )
> > +{
> > +rc = -EFAULT;
> > +break;
> > +}
> > +
> > +rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
> > +if ( rc )
> > +break;
> > +
> > +if ( copy_to_guest(uop, &op, 1) )
> 
> __copy_to_guest() (assuming this copying in and out is necessary
> in the first place).

I guess this could be bypassed by just using uop instead of op in the
hypercall?

> > +{
> > +rc = -EFAULT;
> > +break;
> > +}
> > +
> > +break;
> > +}
> > +default:
> > +rc = -ENOSYS;
> 
> -EOPNOTSUPP again please. Plus - what about other sub-ops?

They are not yet implemented. I think this is the bare minimum needed to
boot a PV DomU, we can expand this later on.
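
For reference, the overall shape being discussed (handle a small set of sub-ops, reject the rest with -EOPNOTSUPP rather than -ENOSYS) can be sketched as below. This is a stand-alone illustration, not the shim code: the enum, its values and demo_grant_table_op() are hypothetical, and the handled cases simply return 0 where the real code would do the work.

```c
#include <errno.h>

/* Hypothetical sub-op numbers, used only for this sketch. */
enum demo_gnttab_op {
    DEMO_OP_SETUP_TABLE,
    DEMO_OP_QUERY_SIZE,
    DEMO_OP_COPY,          /* stands in for any sub-op not implemented yet */
};

/*
 * Dispatch a grant table sub-op. Only single-element requests are
 * accepted, mirroring the count != 1 check in the quoted patch.
 */
long demo_grant_table_op(unsigned int cmd, unsigned int count)
{
    if (count != 1)
        return -EINVAL;

    switch (cmd) {
    case DEMO_OP_SETUP_TABLE:
    case DEMO_OP_QUERY_SIZE:
        return 0;            /* handled by the shim */

    default:
        return -EOPNOTSUPP;  /* known interface, sub-op not supported */
    }
}
```

Returning -EOPNOTSUPP for the default case lets a guest distinguish "this hypercall exists but the sub-op is not available" from "no such hypercall", which is what -ENOSYS conventionally signals.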

Thanks, Roger.


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Rich Persaud
>> On Jan 9, 2018, at 12:56, Stefano Stabellini  wrote:
>> 
>> On Tue, 9 Jan 2018, Doug Goldstein wrote:
>> On 1/9/18 11:33 AM, Jan Beulich wrote:
>> On 09.01.18 at 18:23,  wrote:
 On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
  wrote:
>>> On Tue, 9 Jan 2018, George Dunlap wrote:
>>> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
>>> On a similarly pragmatic note: would a variation of Anthony's vixen 
>>> patch
 series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are 
 currently 
 documented as security-supported (Oct 2018 - July 2020).
>> 
>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
>> non-polished series to 4.10, rather than exactly what the content of
>> that series would be.
>> 
>> In the IRL conversation that preceded this mail, the new short-term
>> target we discussed was:
>> 1. A 4.10-based shim that could boot either under HVM or PVH
>> 2. A script that would take an existing PV config, and spit out a) a
>> bootable ISO with the shim & whatever was needed, and b) a new config
>> that would boot the same VM, but in HVM mode with the shim
>> 
>> The script + a 4.10 shim binary *should* allow most PV guests to boot
>> without any changes whatsoever for most older versions of Xen.
>> 
>> There are a number of people for whom this won't work; I think we also
>> need to provide a way to transparently change PV guests into PVshim
>> guests.  But that will necessarily involve significant toolstack
>> functionality, at which point you might as well backport PVH as well.
> 
> Yes, there will be a number of people that won't be covered by this fix,
> including those that can't use HVM/PVH mode because VT-x isn't available
> at all in their environment. That is the only reason to run PV today.
> Providing a way to transparently change PV guests into PVshim guests
> won't cover any of these cases. A more complete workaround to SP3 is
> along the lines of https://marc.info/?l=xen-devel=151509740625690.
> 
> That said, I realize that we are only trying to do the best we can in a
> very difficult situation, with very little time in our hands. I agree
> with Ian that we should commit something unpolished and only partially
> reviewed soon, even though it doesn't cover a good chunk of the userbase
> for one reason or another. Even if migration doesn't work, it will still
> help all that don't require it. It is only a partial fix by nature
> anyway.
 
 Can people be a bit more explicit about what they think should be done 
 here?
 
 I'm happy to redirect effort to PVH shim if that's what the solution
 is going to be.
 
 I obviously prefer the HVM approach as it works on a broad range of Xen 
 versions
 without modification but I'm keen to get something done quickly and
 don't want to
 waste effort.
>>> 
>>> From what I've read today, I have no reason to believe the PVH
>>> shim won't work in HVM mode. How would the HVM-only approach
>>> be better in that case?
>>> 
>>> Jan
>> 
>> I feel like I should state the obvious here. It's tested over a large
>> data set.
> 
> Right: if we are going to commit something unpolished and unreviewed,
> let it be at least very well tested by the submitter. Honest question:
> how much more dev do we need on PVShim before we get it to similar
> levels of confidence?

Since the primary audience for security fixes is production deployments of Xen 
where customer assets are at risk, is there an estimate for the percentage/size 
of Xen deployments where PVH (not only Xen 4.10) has already been deployed for 
production customers?  That could give other customers more confidence in 
deploying PVH in production.

Rich

Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Doug Goldstein
On 1/8/18 11:45 AM, Ian Jackson wrote:
> But this is not a usual situation.  This time, we don't have the time
> to wait.
> 
> Opinions ?

I'm going to follow up with a top post with my feelings and info from
various parts of the thread.

We have 2 versions of PV shim, the Citrix version and the Amazon
version. The proposal here is to go with the Citrix version. I'll
compare the two how I see them.

Citrix Version
--------------
- based on PVH which means 4.10 only
  - supported on 4.8 and 4.9 if we backport changes to the toolstack
- no solution for 4.7 and prior versions
- no backports currently available
- supports *most* Xen PV features
  - no support for PCI pass through
- confirmed to not work on HVM [1] [2]

Amazon Version
--------------
- based on HVM
- tested and deployed across Amazon's large fleet
- works from Xen 3.4 and up
- backported to 4.9 and 4.10 [3]
- supports *most* Xen PV features
  - ballooning is broken but Anthony has committed to providing a v3
with this support. [4]
  - migration is currently untested

If the primary driver for getting these patches in is end users and
consumers of Xen, a large portion of whom have not yet deployed Xen
4.10, then why are we moving forward with an approach that requires them
to potentially upgrade or change a lot more of their environment? This
seems downright hostile to their concerns and needs.


[1]
https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00711.html
[2]
https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00778.html
[3]
https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00751.html
[4]
https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00736.html
-- 
Doug Goldstein




Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Wei Liu
On Tue, Jan 09, 2018 at 11:59:01AM -0600, Doug Goldstein wrote:
> On 1/9/18 5:50 AM, Wei Liu wrote:
> > 
> > We haven't tested booting the series I posted in HVM mode, but off the
> > top of my head it should work in HVM mode as well -- the multiboot path
> > is left intact.
> > 
> 
> Can we actually do this before committing to this series? I've seen a
> number of "this should work" in this thread and other threads but no
> actual confirmation.
> 

Oops, people are so quick to reply to this thread -- see my reply a few
minutes ago to Anthony.

Wei.


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Wei Liu
On Tue, Jan 09, 2018 at 09:23:03AM -0800, Anthony Liguori wrote:
> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
>  wrote:
> > On Tue, 9 Jan 2018, George Dunlap wrote:
> >> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
> >> > On a similarly pragmatic note: would a variation of Anthony's vixen 
> >> > patch series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are 
> >> > currently documented as security-supported (Oct 2018 - July 2020).
> >>
> >> Hmm, Ian's mail seems to be focusing on the idea of checking in a
> >> non-polished series to 4.10, rather than exactly what the content of
> >> that series would be.
> >>
> >> In the IRL conversation that preceded this mail, the new short-term
> >> target we discussed was:
> >> 1. A 4.10-based shim that could boot either under HVM or PVH
> >> 2. A script that would take an existing PV config, and spit out a) a
> >> bootable ISO with the shim & whatever was needed, and b) a new config
> >> that would boot the same VM, but in HVM mode with the shim
> >>
> >> The script + a 4.10 shim binary *should* allow most PV guests to boot
> >> without any changes whatsoever for most older versions of Xen.
> >>
> >> There are a number of people for whom this won't work; I think we also
> >> need to provide a way to transparently change PV guests into PVshim
> >> guests.  But that will necessarily involve significant toolstack
> >> functionality, at which point you might as well backport PVH as well.
> >
> > Yes, there will be a number of people that won't be covered by this fix,
> > including those that can't use HVM/PVH mode because VT-x isn't available
> > at all in their environment. That is the only reason to run PV today.
> > Providing a way to transparently change PV guests into PVshim guests
> > won't cover any of these cases. A more complete workaround to SP3 is
> > along the lines of https://marc.info/?l=xen-devel=151509740625690.
> >
> > That said, I realize that we are only trying to do the best we can in a
> > very difficult situation, with very little time in our hands. I agree
> > with Ian that we should commit something unpolished and only partially
> > reviewed soon, even though it doesn't cover a good chunk of the userbase
> > for one reason or another. Even if migration doesn't work, it will still
> > help all that don't require it. It is only a partial fix by nature
> > anyway.
> 
> Can people be a bit more explicit about what they think should be done here?
> 
> I'm happy to redirect effort to PVH shim if that's what the solution
> is going to be.
> 
> I obviously prefer the HVM approach as it works on a broad range of Xen 
> versions
> without modification but I'm keen to get something done quickly and
> don't want to
> waste effort.
> 

Ian, George, Roger and I had discussions yesterday and today to see what
we can do in the short term and we think the HVM approach is very
attractive. And we certainly appreciate your effort and willingness to help.

After going through the PV in PVH work we thought it should work in HVM
mode the same way as it does in PVH. So today we tested our PV in PVH
branch, which booted fine in an HVM guest (turned out only one small fix
is needed!), and everything which worked under PVH mode works in HVM
mode as well.

So basically we've been working on your idea of running PV in HVM the
whole day -- to make it work with our branch and to provide a sidecar
generation mechanism.

Ian has been busy writing the sidecar script and Roger and I have been
working on cleaning up the branch.  We want to post a new version as
soon as possible (tomorrow or even tonight).

All in all: yes, we like the idea and we're working on it. Code-wise,
we start from the PV in PVH branch because it is more functionally
complete.  I want to take in some of the code from Amazon later when
necessary (for example I like the ECS_PROXY state but haven't had time
to think deeply about it). The final shim is going to be able to run in
HVM and PVH.  When running in HVM, users need to use the sidecar
mechanism, and this is only the short term solution. The same shim is
going to be able to run in PVH, so users can smoothly upgrade to a new
PVH-capable version of Xen when required.

Ian, George and Roger please correct me if I'm wrong.

Anthony, you are welcome to join #xendevel to have a quick chat about
your ideas / concerns / whatever. It is far easier to grab our attention
there. :-)

Wei.


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Doug Goldstein
On 1/9/18 5:50 AM, Wei Liu wrote:
> 
> We haven't tested booting the series I posted in HVM mode, but off the
> top of my head it should work in HVM mode as well -- the multiboot path
> is left intact.
> 

Can we actually do this before committing to this series? I've seen a
number of "this should work" in this thread and other threads but no
actual confirmation.

-- 
Doug Goldstein




Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Stefano Stabellini
On Tue, 9 Jan 2018, Doug Goldstein wrote:
> On 1/9/18 11:33 AM, Jan Beulich wrote:
> >>>> On 09.01.18 at 18:23,  wrote:
> >> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
> >>  wrote:
> >>> On Tue, 9 Jan 2018, George Dunlap wrote:
> >>>> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
> >>>> > On a similarly pragmatic note: would a variation of Anthony's vixen patch
> >>>> > series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are currently
> >>>> > documented as security-supported (Oct 2018 - July 2020).
> >>>>
> >>>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
> >>>> non-polished series to 4.10, rather than exactly what the content of
> >>>> that series would be.
> >>>>
> >>>> In the IRL conversation that preceded this mail, the new short-term
> >>>> target we discussed was:
> >>>> 1. A 4.10-based shim that could boot either under HVM or PVH
> >>>> 2. A script that would take an existing PV config, and spit out a) a
> >>>> bootable ISO with the shim & whatever was needed, and b) a new config
> >>>> that would boot the same VM, but in HVM mode with the shim
> >>>>
> >>>> The script + a 4.10 shim binary *should* allow most PV guests to boot
> >>>> without any changes whatsoever for most older versions of Xen.
> >>>>
> >>>> There are a number of people for whom this won't work; I think we also
> >>>> need to provide a way to transparently change PV guests into PVshim
> >>>> guests.  But that will necessarily involve significant toolstack
> >>>> functionality, at which point you might as well backport PVH as well.
> >>>
> >>> Yes, there will be a number of people that won't be covered by this fix,
> >>> including those that can't use HVM/PVH mode because VT-x isn't available
> >>> at all in their environment. That is the only reason to run PV today.
> >>> Providing a way to transparently change PV guests into PVshim guests
> >>> won't cover any of these cases. A more complete workaround to SP3 is
> >>> along the lines of https://marc.info/?l=xen-devel=151509740625690.
> >>>
> >>> That said, I realize that we are only trying to do the best we can in a
> >>> very difficult situation, with very little time in our hands. I agree
> >>> with Ian that we should commit something unpolished and only partially
> >>> reviewed soon, even though it doesn't cover a good chunk of the userbase
> >>> for one reason or another. Even if migration doesn't work, it will still
> >>> help all that don't require it. It is only a partial fix by nature
> >>> anyway.
> >>
> >> Can people be a bit more explicit about what they think should be done
> >> here?
> >>
> >> I'm happy to redirect effort to PVH shim if that's what the solution
> >> is going to be.
> >>
> >> I obviously prefer the HVM approach as it works on a broad range of Xen
> >> versions without modification but I'm keen to get something done quickly
> >> and don't want to waste effort.
> > 
> > From what I've read today, I have no reason to believe the PVH
> > shim won't work in HVM mode. How would the HVM-only approach
> > be better in that case?
> > 
> > Jan
> 
> I feel like I should state the obvious here. It's tested over a large
> data set.

Right: if we are going to commit something unpolished and unreviewed,
let it be at least very well tested by the submitter. Honest question:
how much more dev do we need on PVShim before we get it to similar
levels of confidence?

This is about an emergency stop-gap; we can work on a nice and shiny new
fix for the next release. We can even revert Anthony's series entirely
and start from scratch again, if that is required.


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Doug Goldstein
On 1/8/18 3:44 PM, Anthony Liguori wrote:
> 
> It's not particularly hard to plumb through I think but if you are
> using PCI passthrough for PV, then you really shouldn't worry about
> Spectre/Meltdown.  That PV guest can already read all of physical
> memory (since no IOMMU is used) and they can also write to all
> physical memory which is far worse than what you can do with
> Spectre/Meltdown.
> 

That's certainly not true. The IOMMU is used by default with PV if it's
available since Xen 4.0.1. Prior to that there was an option,
"iommu=pv", which was not the default for 4.0.0. It's certainly possible
that's true for Xen 3.4, however.

-- 
Doug Goldstein




Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Doug Goldstein
On 1/9/18 11:33 AM, Jan Beulich wrote:
>>>> On 09.01.18 at 18:23,  wrote:
>> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
>>  wrote:
>>> On Tue, 9 Jan 2018, George Dunlap wrote:
>>>> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
>>>> > On a similarly pragmatic note: would a variation of Anthony's vixen patch
>>>> > series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are currently
>>>> > documented as security-supported (Oct 2018 - July 2020).
>>>>
>>>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
>>>> non-polished series to 4.10, rather than exactly what the content of
>>>> that series would be.
>>>>
>>>> In the IRL conversation that preceded this mail, the new short-term
>>>> target we discussed was:
>>>> 1. A 4.10-based shim that could boot either under HVM or PVH
>>>> 2. A script that would take an existing PV config, and spit out a) a
>>>> bootable ISO with the shim & whatever was needed, and b) a new config
>>>> that would boot the same VM, but in HVM mode with the shim
>>>>
>>>> The script + a 4.10 shim binary *should* allow most PV guests to boot
>>>> without any changes whatsoever for most older versions of Xen.
>>>>
>>>> There are a number of people for whom this won't work; I think we also
>>>> need to provide a way to transparently change PV guests into PVshim
>>>> guests.  But that will necessarily involve significant toolstack
>>>> functionality, at which point you might as well backport PVH as well.
>>>
>>> Yes, there will be a number of people that won't be covered by this fix,
>>> including those that can't use HVM/PVH mode because VT-x isn't available
>>> at all in their environment. That is the only reason to run PV today.
>>> Providing a way to transparently change PV guests into PVshim guests
>>> won't cover any of these cases. A more complete workaround to SP3 is
>>> along the lines of https://marc.info/?l=xen-devel=151509740625690.
>>>
>>> That said, I realize that we are only trying to do the best we can in a
>>> very difficult situation, with very little time in our hands. I agree
>>> with Ian that we should commit something unpolished and only partially
>>> reviewed soon, even though it doesn't cover a good chunk of the userbase
>>> for one reason or another. Even if migration doesn't work, it will still
>>> help all that don't require it. It is only a partial fix by nature
>>> anyway.
>>
>> Can people be a bit more explicit about what they think should be done here?
>>
>> I'm happy to redirect effort to PVH shim if that's what the solution
>> is going to be.
>>
>> I obviously prefer the HVM approach as it works on a broad range of Xen
>> versions without modification but I'm keen to get something done quickly
>> and don't want to waste effort.
> 
> From what I've read today, I have no reason to believe the PVH
> shim won't work in HVM mode. How would the HVM-only approach
> be better in that case?
> 
> Jan

I feel like I should state the obvious here. It's tested over a large
data set.
-- 
Doug Goldstein




Re: [Xen-devel] [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU

2018-01-09 Thread Anthony Liguori
On Mon, Jan 8, 2018 at 8:05 AM, Jan Beulich  wrote:
>>>> On 04.01.18 at 14:06,  wrote:
>> From: Roger Pau Monne 
>>
>> Note that the unmask and the virq operations are handled by the shim
>> itself, and that FIFO event channels are not exposed to the guest.
>>
>> Signed-off-by: Anthony Liguori 
>> Signed-off-by: Roger Pau Monné 
>> Signed-off-by: Sergey Dyasli 
>
> In RFC state this certainly doesn't matter yet, but generally I'd
> expect From: to match the first S-o-b.
>
>> @@ -155,11 +156,31 @@ static void set_vcpu_id(void)
>>  static void xen_evtchn_upcall(struct cpu_user_regs *regs)
>>  {
>>  struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
>> +unsigned long pending;
>>
>>  vcpu_info->evtchn_upcall_pending = 0;
>> -xchg(&vcpu_info->evtchn_pending_sel, 0);
>> +pending = xchg(&vcpu_info->evtchn_pending_sel, 0);
>>
>> -pv_console_rx(regs);
>> +while ( pending )
>> +{
>> +unsigned int l1 = ffsl(pending) - 1;
>
> find_first_set_bit() would look to be the better match here (and
> below), not the least because it translates (on capable hardware)
> to TZCNT instead of BSF.
>
>> +unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
>> +
>> +__clear_bit(l1, &pending);
>> +evtchn &= ~XEN_shared_info->evtchn_mask[l1];
>> +while ( evtchn )
>> +{
>> +unsigned int port = ffsl(evtchn) - 1;
>> +
>> +__clear_bit(port, &evtchn);
>> +port += l1 * BITS_PER_LONG;
>
> What about a 32-bit client? If that's not intended to be supported,
> building of such a guest should be prevented (in dom0_build.c).

Note that we discarded this approach in the Vixen series because it
wasn't working reliably for injecting remote event channel
notifications.
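
For readers following the loop under review, here is a minimal standalone
sketch of the same two-level scan in C. It is an illustration only, not
code from the series: struct evt_state, lowest_bit() and scan_events() are
made-up names, plain loads and stores stand in for the atomic xchg() in
the quoted hunk, and __builtin_ctzl() (a GCC/Clang builtin) plays the role
of the find_first_set_bit() suggested in review.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS  WORD_BITS   /* one selector bit per pending word */

/* Hypothetical stand-in for the shared-info fields used by the hunk. */
struct evt_state {
    unsigned long pending_sel;        /* level 1: which words have work */
    unsigned long pending[NR_WORDS];  /* level 2: one bit per port */
    unsigned long mask[NR_WORDS];     /* set bit = port is masked */
};

/* Index of the lowest set bit; the caller guarantees w != 0. */
static unsigned int lowest_bit(unsigned long w)
{
    return (unsigned int)__builtin_ctzl(w);
}

/*
 * Drain every pending, unmasked port. Port numbers are stored in out[]
 * (up to max entries); the total number of ports found is returned.
 * Like the quoted hunk, the whole pending word is cleared before the
 * mask is applied.
 */
static size_t scan_events(struct evt_state *s, unsigned int *out, size_t max)
{
    size_t n = 0;
    unsigned long sel = s->pending_sel;

    s->pending_sel = 0;               /* the patch uses xchg() here */
    while (sel) {
        unsigned int l1 = lowest_bit(sel);
        unsigned long word = s->pending[l1];

        s->pending[l1] = 0;           /* the patch uses xchg() here */
        sel &= sel - 1;               /* clear the bit just handled */
        word &= ~s->mask[l1];
        while (word) {
            unsigned int port = lowest_bit(word) + l1 * (unsigned int)WORD_BITS;

            word &= word - 1;
            if (n < max)
                out[n] = port;
            n++;
        }
    }
    return n;
}
```

A caller would fill in pending_sel, pending[] and mask[], call
scan_events() with a small output array, and forward each returned port.
The sketch says nothing about the reliability problem with remote
notifications mentioned above.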

Regards,

Anthony Liguori


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Anthony Liguori
On Tue, Jan 9, 2018 at 9:33 AM, Jan Beulich  wrote:
>>>> On 09.01.18 at 18:23,  wrote:
>> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
>>  wrote:
>>> On Tue, 9 Jan 2018, George Dunlap wrote:
>>>> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
>>>> > On a similarly pragmatic note: would a variation of Anthony's vixen patch
>>>> > series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are currently
>>>> > documented as security-supported (Oct 2018 - July 2020).
>>>>
>>>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
>>>> non-polished series to 4.10, rather than exactly what the content of
>>>> that series would be.
>>>>
>>>> In the IRL conversation that preceded this mail, the new short-term
>>>> target we discussed was:
>>>> 1. A 4.10-based shim that could boot either under HVM or PVH
>>>> 2. A script that would take an existing PV config, and spit out a) a
>>>> bootable ISO with the shim & whatever was needed, and b) a new config
>>>> that would boot the same VM, but in HVM mode with the shim
>>>>
>>>> The script + a 4.10 shim binary *should* allow most PV guests to boot
>>>> without any changes whatsoever for most older versions of Xen.
>>>>
>>>> There are a number of people for whom this won't work; I think we also
>>>> need to provide a way to transparently change PV guests into PVshim
>>>> guests.  But that will necessarily involve significant toolstack
>>>> functionality, at which point you might as well backport PVH as well.
>>>
>>> Yes, there will be a number of people that won't be covered by this fix,
>>> including those that can't use HVM/PVH mode because VT-x isn't available
>>> at all in their environment. That is the only reason to run PV today.
>>> Providing a way to transparently change PV guests into PVshim guests
>>> won't cover any of these cases. A more complete workaround to SP3 is
>>> along the lines of https://marc.info/?l=xen-devel=151509740625690.
>>>
>>> That said, I realize that we are only trying to do the best we can in a
>>> very difficult situation, with very little time in our hands. I agree
>>> with Ian that we should commit something unpolished and only partially
>>> reviewed soon, even though it doesn't cover a good chunk of the userbase
>>> for one reason or another. Even if migration doesn't work, it will still
>>> help all that don't require it. It is only a partial fix by nature
>>> anyway.
>>
>> Can people be a bit more explicit about what they think should be done here?
>>
>> I'm happy to redirect effort to PVH shim if that's what the solution
>> is going to be.
>>
>> I obviously prefer the HVM approach as it works on a broad range of Xen
>> versions without modification but I'm keen to get something done quickly
>> and don't want to waste effort.
>
> From what I've read today, I have no reason to believe the PVH
> shim won't work in HVM mode. How would the HVM-only approach
> be better in that case?

PVShim doesn't work on HVM.  I haven't debugged it but I get an early
panic when constructing dom0.

There also isn't adequate compatibility in the series to support
anything but very recent Xen versions (for event channels at least).

The HVM-only approach is known to work on a wide set of Xen versions.

Regards,

Anthony Liguori

> Jan
>


Re: [Xen-devel] [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 17:45,  wrote:
> On Tue, Jan 09, 2018 at 01:00:10AM -0700, Jan Beulich wrote:
>> >>> On 08.01.18 at 17:22,  wrote:
>> > On Mon, Jan 08, 2018 at 09:05:40AM -0700, Jan Beulich wrote:
>> >> >>> On 04.01.18 at 14:06,  wrote:
>> >> > +unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
>> >> > +
>> >> > +__clear_bit(l1, &pending);
>> >> > +evtchn &= ~XEN_shared_info->evtchn_mask[l1];
>> >> > +while ( evtchn )
>> >> > +{
>> >> > +unsigned int port = ffsl(evtchn) - 1;
>> >> > +
>> >> > +__clear_bit(port, &evtchn);
>> >> > +port += l1 * BITS_PER_LONG;
>> >> 
>> >> What about a 32-bit client? If that's not intended to be supported,
>> >> building of such a guest should be prevented (in dom0_build.c).
>> > 
>> > 32bit client? You mean building a shim that runs in 32bit mode? If so
>> > I haven't really thought of it, but in any case BITS_PER_LONG would be
>> > OK also in that case?
>> 
>> No, by "client" I mean the (sole) guest of the shim, in the 32-bit
>> case of which you'd need to use BITS_PER_EVTCHN_WORD() here.
>> But since 32-bit PV guests are not a problem wrt SP3, I can see
>> why we wouldn't want/need to support that case. Yet if so, I'd
>> prefer if we did that uniformly, by e.g. also avoiding the compat
>> complications in the new grant table wrapper.
> 
> Hm, I'm afraid I'm not following. Xen is 64bits, and this is the
> shared_info page of the shim (Xen), so the size is BITS_PER_LONG.

Oh, in that case I'm sorry for being the one being confused here.
I was certainly under the impression that this is the page shared
with the client domain.
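
The confusion resolved above comes down to which word width is used when
turning a (word, bit) pair into a port number. A small sketch of the
arithmetic in C follows. The helper names are hypothetical, and the 32/64
split simply mirrors the distinction drawn in this thread between a 32-bit
guest's view and the shim's own 64-bit shared_info page. It is not the
definition of BITS_PER_EVTCHN_WORD().

```c
#include <stdbool.h>

/* Hypothetical helper: bits per event-channel word for a given ABI. */
static unsigned int evtchn_word_bits(bool is_32bit)
{
    return is_32bit ? 32u : 64u;
}

/* Port number for bit `bit` of pending word `word`. */
static unsigned int evtchn_port(unsigned int word, unsigned int bit,
                                bool is_32bit)
{
    return word * evtchn_word_bits(is_32bit) + bit;
}

/* Inverse mapping: split a port number into word index and bit index. */
static void evtchn_locate(unsigned int port, bool is_32bit,
                          unsigned int *word, unsigned int *bit)
{
    unsigned int bits = evtchn_word_bits(is_32bit);

    *word = port / bits;
    *bit = port % bits;
}
```

With these helpers, bit 3 of word 1 is port 67 when words are 64 bits wide
but port 35 when they are 32 bits wide, which is why using the wrong width
on a page laid out for the other ABI would address the wrong port.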

>> >> > +case EVTCHNOP_unmask: {
>> >> > +struct evtchn_unmask unmask;
>> >> > +
>> >> > +if ( copy_from_guest(&unmask, arg, 1) != 0 )
>> >> > +return -EFAULT;
>> >> > +
>> >> > +/* Unmask is handled in L1 */
>> >> > +rc = evtchn_unmask(unmask.port);
>> >> > +
>> >> > +break;
>> >> > +}
>> >> 
>> >> Is this really sufficient, without handing anything through to L0?
>> >> Perhaps it's fine as long as there's no pass-through support here.
>> > 
>> > For the unmask operation? I think so, if there was a pending event the
>> > shim will already take care of injecting it to the guest.
>> 
>> Well, as the Linux code (evtchn_2l_unmask()) tells us certain
>> unmasks have to go through the hypervisor. I would assume
>> that in the case of the shim this means that L2 requests need
>> to also be handed through to L0 whenever they're not being
>> handled entirely locally to L1.
> 
> I'm not sure any L2 unmask needs to go through L0. If we perform the
> unmask in L1 and there's an event pending L1 will already inject an
> interrupt into L2, and AFAIK that's the point of using EVTCHNOP_unmask
> (get an interrupt after unmask if an event is pending).

Possible, but to be honest I'm not sure: If getting an event was
all that's wanted in Linux, I don't think it would need to be done
by issuing a hypercall. Otoh maybe that code just isn't optimally
written. IOW - as long as things work, I'm fine here.
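
The behaviour Roger describes (an unmask handled locally in L1 re-raises
an already pending event towards L2) can be sketched roughly as below.
This is an assumption-laden illustration in C, not the shim's
evtchn_unmask(): struct l1_state and l1_unmask() are invented names, and
no atomics or L0 interaction are modelled.

```c
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS  WORD_BITS

/* Hypothetical model of the state L1 keeps on behalf of its guest. */
struct l1_state {
    unsigned long pending_sel;
    unsigned long pending[NR_WORDS];
    unsigned long mask[NR_WORDS];
    bool upcall_pending;              /* "inject an interrupt into L2" */
};

/*
 * Clear the mask bit for `port`. If the port was already pending, mark
 * its selector bit and request an upcall so the guest sees the event.
 * Returns true when an upcall was requested.
 */
static bool l1_unmask(struct l1_state *s, unsigned int port)
{
    unsigned int word = port / (unsigned int)WORD_BITS;
    unsigned long bit = 1UL << (port % WORD_BITS);

    if (word >= NR_WORDS)
        return false;                 /* out-of-range port: ignore */

    s->mask[word] &= ~bit;
    if (s->pending[word] & bit) {
        s->pending_sel |= 1UL << word;
        s->upcall_pending = true;
        return true;
    }
    return false;
}
```

Whether some unmasks additionally need to reach L0, as the Linux
evtchn_2l_unmask() path suggests, is exactly the open question in this
exchange and is not answered by the sketch.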

Jan



Re: [Xen-devel] [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains

2018-01-09 Thread Juergen Gross
On 09/01/18 18:01, Andrew Cooper wrote:
> On 09/01/18 14:27, Juergen Gross wrote:
>> Instead of using the TSS and stacks of the physical processor allocate
>> them per vcpu, map them in the per domain area, and use those.
>>
>> Signed-off-by: Juergen Gross 
> 
> I don't see anything here which updates the fields in the TSS across
> context switch.  Without it, you'll be taking NMIs/MCEs/DF's on the
> wrong stack.

No, I'm doing ltr() with a TSS referencing the per-vcpu stacks. TSS is
per vcpu, too.

> I still don't see how your plan is viable in the first place, and is
> adding substantially more complexity to an answer which doesn't need it.
> 
> I'm afraid I'm on the verge of a nack unless you can explain how this is
> intended to be safe, and better than what we currently have.

It is laying the groundwork for a KAISER solution needing no mapping of
per physical cpu areas in the user guest tables, so isolating the guests
from each other.


Juergen


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 18:23,  wrote:
> On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
>  wrote:
>> On Tue, 9 Jan 2018, George Dunlap wrote:
>>> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
>>> > On a similarly pragmatic note: would a variation of Anthony's vixen patch
>>> > series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are currently
>>> > documented as security-supported (Oct 2018 - July 2020).
>>>
>>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
>>> non-polished series to 4.10, rather than exactly what the content of
>>> that series would be.
>>>
>>> In the IRL conversation that preceded this mail, the new short-term
>>> target we discussed was:
>>> 1. A 4.10-based shim that could boot either under HVM or PVH
>>> 2. A script that would take an existing PV config, and spit out a) a
>>> bootable ISO with the shim & whatever was needed, and b) a new config
>>> that would boot the same VM, but in HVM mode with the shim
>>>
>>> The script + a 4.10 shim binary *should* allow most PV guests to boot
>>> without any changes whatsoever for most older versions of Xen.
>>>
>>> There are a number of people for whom this won't work; I think we also
>>> need to provide a way to transparently change PV guests into PVshim
>>> guests.  But that will necessarily involve significant toolstack
>>> functionality, at which point you might as well backport PVH as well.
>>
>> Yes, there will be a number of people that won't be covered by this fix,
>> including those that can't use HVM/PVH mode because VT-x isn't available
>> at all in their environment. That is the only reason to run PV today.
>> Providing a way to transparently change PV guests into PVshim guests
>> won't cover any of these cases. A more complete workaround to SP3 is
>> along the lines of https://marc.info/?l=xen-devel=151509740625690.
>>
>> That said, I realize that we are only trying to do the best we can in a
>> very difficult situation, with very little time in our hands. I agree
>> with Ian that we should commit something unpolished and only partially
>> reviewed soon, even though it doesn't cover a good chunk of the userbase
>> for one reason or another. Even if migration doesn't work, it will still
>> help all that don't require it. It is only a partial fix by nature
>> anyway.
> 
> Can people be a bit more explicit about what they think should be done here?
> 
> I'm happy to redirect effort to PVH shim if that's what the solution
> is going to be.
> 
> I obviously prefer the HVM approach as it works on a broad range of Xen
> versions without modification but I'm keen to get something done quickly
> and don't want to waste effort.

From what I've read today, I have no reason to believe the PVH
shim won't work in HVM mode. How would the HVM-only approach
be better in that case?

Jan



Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Anthony Liguori
On Tue, Jan 9, 2018 at 8:52 AM, Stefano Stabellini
 wrote:
> On Tue, 9 Jan 2018, George Dunlap wrote:
>> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
>> > On a similarly pragmatic note: would a variation of Anthony's vixen patch 
>> > series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are 
>> > currently documented as security-supported (Oct 2018 - July 2020).
>>
>> Hmm, Ian's mail seems to be focusing on the idea of checking in a
>> non-polished series to 4.10, rather than exactly what the content of
>> that series would be.
>>
>> In the IRL conversation that preceded this mail, the new short-term
>> target we discussed was:
>> 1. A 4.10-based shim that could boot either under HVM or PVH
>> 2. A script that would take an existing PV config, and spit out a) a
>> bootable ISO with the shim & whatever was needed, and b) a new config
>> that would boot the same VM, but in HVM mode with the shim
>>
>> The script + a 4.10 shim binary *should* allow most PV guests to boot
>> without any changes whatsoever for most older versions of Xen.
>>
>> There are a number of people for whom this won't work; I think we also
>> need to provide a way to transparently change PV guests into PVshim
>> guests.  But that will necessarily involve significant toolstack
>> functionality, at which point you might as well backport PVH as well.
>
> Yes, there will be a number of people that won't be covered by this fix,
> including those that can't use HVM/PVH mode because VT-x isn't available
> at all in their environment. That is the only reason to run PV today.
> Providing a way to transparently change PV guests into PVshim guests
> won't cover any of these cases. A more complete workaround to SP3 is
> along the lines of https://marc.info/?l=xen-devel=151509740625690.
>
> That said, I realize that we are only trying to do the best we can in a
> very difficult situation, with very little time in our hands. I agree
> with Ian that we should commit something unpolished and only partially
> reviewed soon, even though it doesn't cover a good chunk of the userbase
> for one reason or another. Even if migration doesn't work, it will still
> help all that don't require it. It is only a partial fix by nature
> anyway.

Can people be a bit more explicit about what they think should be done here?

I'm happy to redirect effort to PVH shim if that's what the solution
is going to be.

I obviously prefer the HVM approach as it works on a broad range of Xen
versions without modification but I'm keen to get something done quickly
and don't want to waste effort.

Where are people's heads at?

Regards,

Anthony Liguori



Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1

2018-01-09 Thread Sander Eikelenboom
On 09/01/18 17:38, Boris Ostrovsky wrote:
> On 01/09/2018 11:31 AM, Sander Eikelenboom wrote:
>> On 09/01/18 17:16, Pavel Tatashin wrote:
>>> Hi Juergen,
>>>
>>> Do you have this patch applied:
>>>
>>> https://github.com/torvalds/linux/commit/e8c24773d6b2cd9bc8b36bd6e60beff599be14be
>> Seems this hasn't made it to Linus yet ?

Hmm, that was a stupid remark, since the link actually is to Linus's
GitHub repo :p (though not his git.kernel.org repo).

>> I will give it a test and report back, thanks !

Testing shows the patch helps and dom0 boots fine now.
Thanks !

> 
> 
> BTW, I assume this problem goes away if you don't specify dom0_mem?

Haven't tested, since I need the dom0_mem for pci-passthrough.

> -boris
> 

--
Sander


>>
>>> Thank you,
>>> Pavel
>>>
>>> On 01/09/2018 11:10 AM, Juergen Gross wrote:
 On 09/01/18 16:29, Sander Eikelenboom wrote:
> Since it's already rc7:
> "Give me a subtle ping, Vasili. One subtle ping only, please."
 I like that film :-)
>> :)
>>
>> --
>> Sander
>>
 Pavel, can you please comment? Do you have an idea how to repair the
 issue or should we revert your patch in 4.15?


 Juergen

> On 04/01/18 21:02, Sander Eikelenboom wrote:
>> On 04/01/18 12:44, Juergen Gross wrote:
>>> On 04/01/18 11:17, Sander Eikelenboom wrote:
 Hi Boris / Juergen,

 First of all best wishes for a quite turbulent starting new year.

 Now the holidays are over I finally gotten to test a linux 4.15-rc6 
 kernel
 and experienced a crash in early dom0 boot on my system (AMD phenom 
 x6).

 I tested some earlier linux 4.15 rc's but experienced crashes then as 
 well,
 but didn't have time to setup serial console to send them in
 (and waited to see if the issue Boris fixed with AMD PCI 64bit bar's 
 could be it).

 But since that patch went in before 4.15 rc6, that doesn't seem to be 
 the issue.
 So it could be that the culprit went in pretty earlier in the 4.15 
 cycle.

 The 4.15-rc6 kernel boots fine on bare metal, as does a 4.14.6 kernel 
 on xen-unstable.

 Hopefully you have a pointer to what is wrong, if not i can try to do 
 a bisect.
>>> A bisect would be very welcome.
>> Hi Juergen / Boris / Pavel,
>>
>> Bisection result is:
>>
>> a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b is the first bad commit
>> commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
>> Author: Pavel Tatashin 
>> Date:   Wed Nov 15 17:36:31 2017 -0800
>>
>>  mm: zero reserved and unavailable struct pages
>>  
>>  Some memory is reserved but unavailable: not present in 
>> memblock.memory
>>  (because not backed by physical pages), but present in 
>> memblock.reserved.
>>  Such memory has backing struct pages, but they are not initialized 
>> by
>>  going through __init_single_page().
>>  
>>  In some cases these struct pages are accessed even if they do not
>>  contain any data.  One example is page_to_pfn() might access 
>> page->flags
>>  if this is where section information is stored (CONFIG_SPARSEMEM,
>>  SECTION_IN_PAGE_FLAGS).
>>  
>>  One example of such memory: trim_low_memory_range() unconditionally
>>  reserves from pfn 0, but e820__memblock_setup() might provide the
>>  existing memory from pfn 1 (i.e.  KVM).
>>  
>>  Since struct pages are zeroed in __init_single_page(), and not 
>> during
>>  allocation time, we must zero such struct pages explicitly.
>>  
>>  The patch involves adding a new memblock iterator:
>>  for_each_resv_unavail_range(i, p_start, p_end)
>>  
>>  Which iterates through reserved && !memory lists, and we zero 
>> struct pages
>>  explicitly by calling mm_zero_struct_page().
>>  
>>  ===
>>  
>>  Here is more detailed example of problem that this patch is 
>> addressing:
>>  
>>  Run tested on qemu with the following arguments:
>>  
>>  -enable-kvm -cpu kvm64 -m 512 -smp 2
>>  
>>  This patch reports that there are 98 unavailable pages.
>>  
>>  They are: pfn 0 and pfns in range [159, 255].
>>  
>>  Note, trim_low_memory_range() reserves only pfns in range [0, 15], 
>> it does
>>  not reserve [159, 255] ones.
>>  
>>  e820__memblock_setup() reports linux that the following physical 
>> ranges are
>>  available:
>>  [1 , 158]
>>  [256, 130783]
>>  
>>  Notice, that exactly unavailable pfns are missing!
>>  
>>  Now, lets check what we have in zone 0: [1, 

Re: [Xen-devel] [RFC PATCH 1/8] x86/domctl: introduce a pair of hypercall to set and get cpu topology

2018-01-09 Thread Daniel De Graaf

On 01/09/2018 04:06 AM, Chao Gao wrote:
> On Mon, Jan 08, 2018 at 01:14:44PM -0500, Daniel De Graaf wrote:
>> On 01/07/2018 11:01 PM, Chao Gao wrote:
>>> Define interface, structures and hypercalls for toolstack to build
>>> cpu topology and for guest that will retrieve it [1].
>>> Two subop hypercalls introduced by this patch:
>>> XEN_DOMCTL_set_cpu_topology to define cpu topology information per domain
>>> and XENMEM_get_cpu_topology to retrieve cpu topology information.
>>>
>>> [1]: during guest creation, those information helps hvmloader to build ACPI.
>>>
>>> Signed-off-by: Chao Gao 
>>
>> When adding new XSM controls for use by device models, you also
>> need to add the permissions to the device_model macro defined in
>> tools/flask/policy/modules/xen.if.  If domains need to call this
>> function on themselves (is this only true for get?), you will also
>> need to add it to declare_domain_common.
>
> Hi, Daniel.
>
> Yes. XENMEM_get_cpu_topology will be called by the domain itself.
> And Both get and set will be called by dom0 when creating one domain.
> So I need:
> 1. add *set* and *get* to create_domain_common.
> 2. add *set* to declare_domain_common.
>
> Is it right?
>
> Thanks
> Chao


It sounds like you need to add get to declare_domain_common (not set)
because the domain only needs to invoke this on itself.  If the device
model doesn't need to use these hypercalls (would guest cpu hotplug or
similar things need them?), then that's all you need to add.


Re: [Xen-devel] [PATCH RFC v1 21/74] x86/entry: Early PVH boot code

2018-01-09 Thread Wei Liu
On Tue, Jan 09, 2018 at 09:41:51AM -0700, Jan Beulich wrote:
> >>> On 09.01.18 at 16:45,  wrote:
> > On Fri, Jan 05, 2018 at 06:32:56AM -0700, Jan Beulich wrote:
> >> > +pvh_mbi.mods_count = pvh_info->nr_modules;
> >> > +pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
> >> > +
> >> > +mod = pvh_mbi_mods;
> >> > +entry = __va(pvh_info->modlist_paddr);
> >> 
> >> How come __va() already works at this point in time? And what about
> >> this address being beyond 4Gb?
> >> 
> > 
> > The original code uses __va at the beginning of __start_xen so this is
> > no more erroneous than what we originally have.
> 
> Well, I was assuming that these uses of __va() here are the
> reason why you need to extend the initial mapping in another
> patch. The original ones early in __start_xen() all deal with the
> MBI which we've relocated to a place where __va() can be used.

I see -- I thought everything was relocated automatically, which in
hindsight looks very stupid. That's probably why Andrew wrote that patch
to extend the mapping. We can certainly relocate pvh info as well, but
then that would delay the work.  We can add that as a blocker for the
proper solution later.
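
As a rough illustration of the point about __va(): a direct-map style
translation is only meaningful for physical addresses that the early
mapping actually covers, so anything outside it (for example a module
list placed above 4Gb) has to be relocated or mapped first. The sketch
below models that check in C with invented names (struct early_map,
early_va()). It does not reproduce Xen's __va() or its memory layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the single range mapped at early boot. */
struct early_map {
    uint64_t phys_start;       /* first physical address covered */
    uint64_t size;             /* bytes covered */
    unsigned char *virt_base;  /* where phys_start is mapped */
};

/*
 * Translate [paddr, paddr + len) to a pointer, or return NULL when any
 * part of the range falls outside the early mapping.
 */
static void *early_va(const struct early_map *m, uint64_t paddr, uint64_t len)
{
    uint64_t off;

    if (paddr < m->phys_start)
        return NULL;

    off = paddr - m->phys_start;
    if (off > m->size || len > m->size - off)
        return NULL;

    return m->virt_base + off;
}
```

A caller that gets NULL back would need to copy the data somewhere
covered by the mapping, or extend the mapping, before touching it.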

Wei.


Re: [Xen-devel] [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains

2018-01-09 Thread Andrew Cooper
On 09/01/18 14:27, Juergen Gross wrote:
> Instead of using the TSS and stacks of the physical processor allocate
> them per vcpu, map them in the per domain area, and use those.
>
> Signed-off-by: Juergen Gross 

I don't see anything here which updates the fields in the TSS across
context switch.  Without it, you'll be taking NMIs/MCEs/DF's on the
wrong stack.

I still don't see how your plan is viable in the first place, and is
adding substantially more complexity to an answer which doesn't need it.

I'm afraid I'm on the verge of a nack unless you can explain how this is
intended to be safe, and better than what we currently have.

~Andrew


Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Stefano Stabellini
On Tue, 9 Jan 2018, George Dunlap wrote:
> On Mon, Jan 8, 2018 at 9:01 PM, Rich Persaud  wrote:
> > On a similarly pragmatic note: would a variation of Anthony's vixen patch 
> > series be suitable for pre-PVH Xen 4.6 - 4.9?  These versions are currently 
> > documented as security-supported (Oct 2018 - July 2020).
> 
> Hmm, Ian's mail seems to be focusing on the idea of checking in a
> non-polished series to 4.10, rather than exactly what the content of
> that series would be.
> 
> In the IRL conversation that preceded this mail, the new short-term
> target we discussed was:
> 1. A 4.10-based shim that could boot either under HVM or PVH
> 2. A script that would take an existing PV config, and spit out a) a
> bootable ISO with the shim & whatever was needed, and b) a new config
> that would boot the same VM, but in HVM mode with the shim
> 
> The script + a 4.10 shim binary *should* allow most PV guests to boot
> without any changes whatsoever for most older versions of Xen.
> 
> There are a number of people for whom this won't work; I think we also
> need to provide a way to transparently change PV guests into PVshim
> guests.  But that will necessarily involve significant toolstack
> functionality, at which point you might as well backport PVH as well.

Yes, there will be a number of people that won't be covered by this fix,
including those that can't use HVM/PVH mode because VT-x isn't available
at all in their environment. That is the only reason to run PV today.
Providing a way to transparently change PV guests into PVshim guests
won't cover any of these cases. A more complete workaround to SP3 is
along the lines of https://marc.info/?l=xen-devel&m=151509740625690.

That said, I realize that we are only trying to do the best we can in a
very difficult situation, with very little time on our hands. I agree
with Ian that we should commit something unpolished and only partially
reviewed soon, even though it doesn't cover a good chunk of the userbase
for one reason or another. Even if migration doesn't work, it will still
help all that don't require it. It is only a partial fix by nature
anyway.


Re: [Xen-devel] [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU

2018-01-09 Thread Roger Pau Monné
On Tue, Jan 09, 2018 at 01:00:10AM -0700, Jan Beulich wrote:
> >>> On 08.01.18 at 17:22,  wrote:
> > On Mon, Jan 08, 2018 at 09:05:40AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:06,  wrote:
> >> > +unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
> >> > +
> >> > +__clear_bit(l1, );
> >> > +evtchn &= ~XEN_shared_info->evtchn_mask[l1];
> >> > +while ( evtchn )
> >> > +{
> >> > +unsigned int port = ffsl(evtchn) - 1;
> >> > +
> >> > +__clear_bit(port, &evtchn);
> >> > +port += l1 * BITS_PER_LONG;
> >> 
> >> What about a 32-bit client? If that's not intended to be supported,
> >> building of such a guest should be prevented (in dom0_build.c).
> > 
> > 32bit client? You mean building a shim that runs in 32bit mode? If so
> > I haven't really thought of it, but in any case BITS_PER_LONG would be
> > OK also in that case?
> 
> No, by "client" I mean the (sole) guest of the shim, in the 32-bit
> case of which you'd need to use BITS_PER_EVTCHN_WORD() here.
> But since 32-bit PV guests are not a problem wrt SP3, I can see
> why we wouldn't want/need to support that case. Yet if so, I'd
> prefer if we did that uniformly, by e.g. also avoiding the compat
> complications in the new grant table wrapper.

Hm, I'm afraid I'm not following. Xen is 64bits, and this is the
shared_info page of the shim (Xen), so the size is BITS_PER_LONG.

32bit PV guests have been tested and seem to work fine. Whether
someone would want to convert them or not I don't know, but it's
almost no extra effort to provide a shim that works for both
bitness.
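
For readers following the word-size question, here is a minimal sketch of
the two-level scan the quoted hunk performs. Everything named demo_* or
DEMO_* is hypothetical, the arrays are far smaller than a real shared_info
page, the accesses are not atomic (the patch uses xchg()), and
__builtin_ctzl assumes GCC or Clang. It only illustrates why the port
number is computed as word index times the word width of whoever owns the
page.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_WORDS      4   /* hypothetical; the real arrays are larger */

/* Hypothetical stand-in for the pending/mask arrays of a shared_info page. */
struct demo_shared_info {
    unsigned long evtchn_pending[DEMO_NR_WORDS];
    unsigned long evtchn_mask[DEMO_NR_WORDS];
};

/*
 * Walk the selector word 'sel'. For each flagged word, take and clear its
 * pending bits, drop the masked ones, and record the remaining port numbers.
 * Returns the number of ports written to 'ports'.
 */
static size_t demo_scan(struct demo_shared_info *s, unsigned long sel,
                        unsigned int *ports, size_t max)
{
    size_t n = 0;

    while ( sel )
    {
        unsigned int l1 = (unsigned int)__builtin_ctzl(sel);
        unsigned long evtchn;

        sel &= sel - 1;                 /* clear the lowest set bit */
        assert(l1 < DEMO_NR_WORDS);     /* selector must stay in range */

        evtchn = s->evtchn_pending[l1]; /* non-atomic stand-in for xchg() */
        s->evtchn_pending[l1] = 0;
        evtchn &= ~s->evtchn_mask[l1];

        while ( evtchn && n < max )
        {
            unsigned int bit = (unsigned int)__builtin_ctzl(evtchn);

            evtchn &= evtchn - 1;
            ports[n++] = bit + l1 * (unsigned int)DEMO_BITS_PER_LONG;
        }
    }

    return n;
}
```

With a 64-bit owner of the page, bit 1 of word 1 is port 65; a page laid
out with 32-bit words would make the same bit port 33, which is the
distinction being discussed.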

> >> > +case EVTCHNOP_unmask: {
> >> > +struct evtchn_unmask unmask;
> >> > +
> >> > +if ( copy_from_guest(&unmask, arg, 1) != 0 )
> >> > +return -EFAULT;
> >> > +
> >> > +/* Unmask is handled in L1 */
> >> > +rc = evtchn_unmask(unmask.port);
> >> > +
> >> > +break;
> >> > +}
> >> 
> >> Is this really sufficient, without handing anything through to L0?
> >> Perhaps it's fine as long as there's no pass-through support here.
> > 
> > For the unmask operation? I think so, if there was a pending event the
> > shim will already take care of injecting it to the guest.
> 
> Well, as the Linux code (evtchn_2l_unmask()) tells us certain
> unmasks have to go through the hypervisor. I would assume
> that in the case of the shim this means that L2 requests need
> to also be handed through to L0 whenever they're not being
> handled entirely locally to L1.

I'm not sure any L2 unmask needs to go through L0. If we perform the
unmask in L1 and there's an event pending L1 will already inject an
interrupt into L2, and AFAIK that's the point of using EVTCHNOP_unmask
(get an interrupt after unmask if an event is pending).
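
A rough sketch of the unmask semantics described above, purely for
illustration: the structure and function names are hypothetical and the
bit operations are not atomic, unlike the real evtchn_unmask(). The point
is only that clearing the mask bit and checking the pending bit is enough
for L1 to decide whether an upcall is due.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_WORDS      2   /* hypothetical size */

/* Hypothetical per-guest event channel state. */
struct demo_evtchn_state {
    unsigned long pending[DEMO_NR_WORDS];
    unsigned long mask[DEMO_NR_WORDS];
};

/*
 * Clear the mask bit for 'port'. Returns true when an event was already
 * pending, i.e. when the caller should inject an upcall into the guest.
 */
static bool demo_evtchn_unmask(struct demo_evtchn_state *s, unsigned int port)
{
    unsigned int word = port / DEMO_BITS_PER_LONG;
    unsigned long bit = 1ul << (port % DEMO_BITS_PER_LONG);

    if ( word >= DEMO_NR_WORDS )
        return false;               /* out-of-range port: nothing to do */

    s->mask[word] &= ~bit;
    return (s->pending[word] & bit) != 0;
}
```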

> >> > @@ -1030,6 +1055,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >> >  {
> >> >  long rc;
> >> >  
> >> > +#ifdef CONFIG_X86
> >> > +if ( pv_shim )
> >> > +return pv_shim_event_channel_op(cmd, arg);
> >> > +#endif
> >> 
> >> Patch it right into the hypercall table instead?
> > 
> > That would only work if the shim is a compile time option, but not a
> > run time one, the hypercall table is ro.
> 
> Well, yes and no: See nmi_shootdown_cpus() for a precedent
> of how to do that without removing the r/o attribute. Not having
> the hook sit here would (I assume) allow to avoid compiling the
> entire do_event_channel_op() down the road in the shim-only
> case. The compiler may be able to partially do this (omitting the
> rest of the function), but my experience is that deferring to the
> compiler in this regard often means leaving some traces around.

I see, I could use the write_atomic + directmap trick, but I think I
will leave that for later, since it doesn't seem crucial to me.

Thanks, Roger.


[Xen-devel] [PATCH] x86/ioemul: Account for ioemul_handle_quirk() in stub length check

2018-01-09 Thread Andrew Cooper
The opcode potentially written into ctxt->io_emul_stub[] in the case
that ioemul_handle_quirk() is overriding the default logic isn't
accounted for in the build-time check that the stubs are large enough.

Introduce IOEMUL_QUIRK_STUB_BYTES and use it for both the main and quirk
stub cases.  As a slim optimisation, avoid writing out the default stub
when we know we are going to overwrite it.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/arch/x86/ioport_emulate.c  |  2 ++
 xen/arch/x86/pv/emul-priv-op.c | 27 ---
 xen/include/asm-x86/io.h   |  1 +
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c
index 58d2b53..1f6f794 100644
--- a/xen/arch/x86/ioport_emulate.c
+++ b/xen/arch/x86/ioport_emulate.c
@@ -35,6 +35,8 @@ static void ioemul_handle_proliant_quirk(
 io_emul_stub[8] = 0x9d;
 /*ret */
 io_emul_stub[9] = 0xc3;
+
+BUILD_BUG_ON(IOEMUL_QUIRK_STUB_BYTES < 10);
 }
 
 static int __init proliant_quirk(struct dmi_system_id *d)
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 1041a4c..4087cf2 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -89,19 +89,24 @@ static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
 /* callq *%rcx */
 ctxt->io_emul_stub[10] = 0xff;
 ctxt->io_emul_stub[11] = 0xd1;
-/* data16 or nop */
-ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
-/*  */
-ctxt->io_emul_stub[13] = opcode;
-/* imm8 or nop */
-ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
-/* ret (jumps to guest_to_host_gpr_switch) */
-ctxt->io_emul_stub[15] = 0xc3;
-BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
-
-if ( ioemul_handle_quirk )
+
+if ( likely(!ioemul_handle_quirk) )
+{
+/* data16 or nop */
+ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
+/*  */
+ctxt->io_emul_stub[13] = opcode;
+/* imm8 or nop */
+ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
+/* ret (jumps to guest_to_host_gpr_switch) */
+ctxt->io_emul_stub[15] = 0xc3;
+}
+else
 ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
 
+BUILD_BUG_ON(STUB_BUF_SIZE / 2 < MAX(16, /* Regular stubs */
+ 12 + IOEMUL_QUIRK_STUB_BYTES));
+
 /* Handy function-typed pointer to the stub. */
 return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
 }
diff --git a/xen/include/asm-x86/io.h b/xen/include/asm-x86/io.h
index b156f48..e6bb20c 100644
--- a/xen/include/asm-x86/io.h
+++ b/xen/include/asm-x86/io.h
@@ -51,6 +51,7 @@ __OUT(l,,int)
 extern void (*pv_post_outb_hook)(unsigned int port, u8 value);
 
 /* Function pointer used to handle platform specific I/O port emulation. */
+#define IOEMUL_QUIRK_STUB_BYTES 10
 extern void (*ioemul_handle_quirk)(
 u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs);
 
-- 
2.1.4
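
As a standalone illustration of the size check in the patch above, the
following sketch expresses the same arithmetic with C11 static_assert
instead of Xen's BUILD_BUG_ON. The DEMO_* names and the 128-byte buffer
size are assumptions made for the example, not values taken from the
patch; only the 16, 12 and 10 byte figures come from the diff.

```c
#include <assert.h>

#define DEMO_STUB_BUF_SIZE       128  /* assumed for illustration only */
#define DEMO_REGULAR_STUB_BYTES   16  /* bytes 0..15 of the regular stub */
#define DEMO_QUIRK_OFFSET         12  /* the quirk writes from byte 12 on */
#define IOEMUL_QUIRK_STUB_BYTES   10  /* as introduced by the patch */

#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Worst-case number of bytes either stub variant can occupy. */
#define DEMO_STUB_BYTES_NEEDED \
    DEMO_MAX(DEMO_REGULAR_STUB_BYTES, \
             DEMO_QUIRK_OFFSET + IOEMUL_QUIRK_STUB_BYTES)

/* Compile-time check: the stub must fit in half of the stub buffer. */
static_assert(DEMO_STUB_BUF_SIZE / 2 >= DEMO_STUB_BYTES_NEEDED,
              "I/O emulation stub does not fit in half the stub buffer");

/* Runtime accessor so the computed bound can be inspected. */
static unsigned int demo_stub_bytes_needed(void)
{
    return DEMO_STUB_BYTES_NEEDED;
}
```

The quirk case needs 12 + 10 = 22 bytes, which is more than the 16 bytes
the old check assumed, so the quirk path is the one that sets the bound.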



Re: [Xen-devel] [PATCH RFC v1 21/74] x86/entry: Early PVH boot code

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 16:45,  wrote:
> On Fri, Jan 05, 2018 at 06:32:56AM -0700, Jan Beulich wrote:
>> > +pvh_mbi.mods_count = pvh_info->nr_modules;
>> > +pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
>> > +
>> > +mod = pvh_mbi_mods;
>> > +entry = __va(pvh_info->modlist_paddr);
>> 
>> How come __va() already works at this point in time? And what about
>> this address being beyond 4Gb?
>> 
> 
> The original code uses __va at the beginning of __start_xen so this is
> no more erroneous than what we originally have.

Well, I was assuming that these uses of __va() here are the
reason why you need to extend the initial mapping in another
patch. The original ones early in __start_xen() all deal with the
MBI which we've relocated to a place where __va() can be used.

>> > +for ( i = 0; i < pvh_info->nr_modules; i++ )
>> > +{
>> > +ASSERT(!(entry[i].paddr >> 32));
>> 
>> To relax this condition (in particular to allow huge initrd), how
>> about ...
>> 
>> > +mod[i].mod_start = entry[i].paddr;
>> > +mod[i].mod_end   = entry[i].paddr + entry[i].size;
>> 
>> ... using the EFI approach here and store the PFN in mod_start
>> and the size in mod_end?
> 
> 
> This function turns pvh_info into multiboot info. I'm afraid I don't
> follow your suggestion here. The best approach now is to BUG_ON here and
> consider huge initrd later.

Doing this later is fine of course; what I'm referring to is that
you store paddr of start and end, whereas the early EFI code
stores PFN and size (and the consumer code in __start_xen
knows to tell the cases apart).
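
To make the difference concrete, here is a small sketch of the two
encodings in a pair of 32-bit fields. The structure and helpers are
hypothetical and only modelled on a multiboot module entry; how
__start_xen actually tells the cases apart is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* 4 KiB pages, as on x86 */

/* Hypothetical 32-bit module descriptor. */
struct demo_module {
    uint32_t mod_start;
    uint32_t mod_end;
};

/* Encoding 1: physical start and end addresses. Fails at or above 4GiB. */
static bool demo_encode_paddr(struct demo_module *m,
                              uint64_t paddr, uint64_t size)
{
    if ( (paddr + size) >> 32 )
        return false;

    m->mod_start = (uint32_t)paddr;
    m->mod_end   = (uint32_t)(paddr + size);
    return true;
}

/* Encoding 2: page frame number in mod_start, size in mod_end. */
static bool demo_encode_pfn(struct demo_module *m,
                            uint64_t paddr, uint64_t size)
{
    uint64_t pfn = paddr >> DEMO_PAGE_SHIFT;

    /* Needs a page-aligned start, and both values must fit in 32 bits. */
    if ( (paddr & ((1u << DEMO_PAGE_SHIFT) - 1)) || (pfn >> 32) ||
         (size >> 32) )
        return false;

    m->mod_start = (uint32_t)pfn;
    m->mod_end   = (uint32_t)size;
    return true;
}

/* Recover the physical start address from the PFN encoding. */
static uint64_t demo_pfn_start(const struct demo_module *m)
{
    return (uint64_t)m->mod_start << DEMO_PAGE_SHIFT;
}
```

With 4 KiB pages the PFN form can describe a module starting anywhere
below 2^44 bytes, which is why it would relax the 4GiB assertion.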

Jan



Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1

2018-01-09 Thread Boris Ostrovsky
On 01/09/2018 11:31 AM, Sander Eikelenboom wrote:
> On 09/01/18 17:16, Pavel Tatashin wrote:
>> Hi Juergen,
>>
>> Do you have this patch applied:
>>
>> https://github.com/torvalds/linux/commit/e8c24773d6b2cd9bc8b36bd6e60beff599be14be
> Seems this hasn't made it to Linus yet ?
>
> I will give it a test and report back, thanks !


BTW, I assume this problem goes away if you don't specify dom0_mem?

-boris

>
>> Thank you,
>> Pavel
>>
>> On 01/09/2018 11:10 AM, Juergen Gross wrote:
>>> On 09/01/18 16:29, Sander Eikelenboom wrote:
>>>> Since it's already rc7:
>>>> "Give me a subtle ping, Vasili. One subtle ping only, please."
>>> I like that film :-)
> :)
>
> --
> Sander
>
>>> Pavel, can you please comment? Do you have an idea how to repair the
>>> issue or should we revert your patch in 4.15?
>>>
>>>
>>> Juergen

Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1

2018-01-09 Thread Sander Eikelenboom
On 09/01/18 17:16, Pavel Tatashin wrote:
> Hi Juergen,
> 
> Do you have this patch applied:
> 
> https://github.com/torvalds/linux/commit/e8c24773d6b2cd9bc8b36bd6e60beff599be14be

Seems this hasn't made it to Linus yet ?

I will give it a test and report back, thanks !

> 
> Thank you,
> Pavel
> 
> On 01/09/2018 11:10 AM, Juergen Gross wrote:
>> On 09/01/18 16:29, Sander Eikelenboom wrote:
>>> Since it's already rc7:
>>> "Give me a subtle ping, Vasili. One subtle ping only, please."
>>
>> I like that film :-)
:)

--
Sander

>> Pavel, can you please comment? Do you have an idea how to repair the
>> issue or should we revert your patch in 4.15?
>>
>>
>> Juergen
>>

Re: [Xen-devel] [PATCH RFC v1 21/74] x86/entry: Early PVH boot code

2018-01-09 Thread Wei Liu
On Fri, Jan 05, 2018 at 06:32:56AM -0700, Jan Beulich wrote:
> > +module_t *mod;
> > +unsigned int i;
> > +
> > +ASSERT(pvh_info->magic == XEN_HVM_START_MAGIC_VALUE);
> > +
> > +/*
> > + * Turn hvm_start_info into mbi. Luckily all modules are placed under 
> > 4GB
> > + * boundary on x86.
> 
> ISTR having that discussion relatively recently in another context:
> All the header states is "NB: Xen on x86 will always try to place all
> the data below the 4GiB boundary." Note the "try to". Hence I
> think ...
> 
> > + */
> > +pvh_mbi.flags = MBI_CMDLINE | MBI_MODULES | MBI_LOADERNAME;
> > +
> > +ASSERT(!(pvh_info->cmdline_paddr >> 32));
> 
> ... this, if we don't want to handle the case, should be BUG_ON() or
> panic() (same further down).
> 
> > +pvh_mbi.cmdline = pvh_info->cmdline_paddr;
> > +pvh_mbi.boot_loader_name = __pa(pvh_loader);
> > +
> > +ASSERT(pvh_info->nr_modules < 32);
> 
> ARRAY_SIZE(pvh_mbi_mods) and perhaps again BUG_ON() or
> panic().
> 
> > +pvh_mbi.mods_count = pvh_info->nr_modules;
> > +pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
> > +
> > +mod = pvh_mbi_mods;
> > +entry = __va(pvh_info->modlist_paddr);
> 
> How come __va() already works at this point in time? And what about
> this address being beyond 4Gb?
> 

The original code uses __va at the beginning of __start_xen so this is
no more erroneous than what we originally have.

We shall BUG_ON address beyond 4Gb for the time being.

> > +for ( i = 0; i < pvh_info->nr_modules; i++ )
> > +{
> > +ASSERT(!(entry[i].paddr >> 32));
> 
> To relax this condition (in particular to allow huge initrd), how
> about ...
> 
> > +mod[i].mod_start = entry[i].paddr;
> > +mod[i].mod_end   = entry[i].paddr + entry[i].size;
> 
> ... using the EFI approach here and store the PFN in mod_start
> and the size in mod_end?


This function turns pvh_info into multiboot info. I'm afraid I don't
follow your suggestion here. The best approach now is to BUG_ON here and
consider huge initrd later.

(I will try to fix other comments where I can)

Wei.


Re: [Xen-devel] [PATCH RFC v1 54/74] xen/pvshim: set correct domid value

2018-01-09 Thread Roger Pau Monné
On Mon, Jan 08, 2018 at 07:17:16AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06,  wrote:
> > @@ -576,11 +578,11 @@ static void noinline init_done(void)
> >  
> >  system_state = SYS_STATE_active;
> >  
> > +domain_unpause_by_systemcontroller(dom0);
> > +
> >  /* MUST be done prior to removing .init data. */
> >  unregister_init_virtual_region();
> >  
> > -domain_unpause_by_systemcontroller(hardware_domain);
> 
> Why the re-ordering? Along the lines of the earlier comment,
> using "dom0" as replacement (static) variable isn't very nice.
> Please at least accompany its declaration by a comment.

The 'dom0' variable is in the .init section, so it seems best to do
the unpause first and then remove the init virtual regions.


Re: [Xen-devel] [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 16:43,  wrote:
> On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
>> > > > On 04.01.18 at 14:06,  wrote:
>> > +size_t consoled_guest_rx(void)
>> > +{
>> > +size_t recv = 0, idx = 0;
>> > +XENCONS_RING_IDX cons, prod;
>> > +
>> > +if ( !cons_ring )
>> > +return 0;
>> > +
>> > +spin_lock(_lock);
>> > +
>> > +cons = cons_ring->out_cons;
>> > +prod = ACCESS_ONCE(cons_ring->out_prod);
>> > +ASSERT((prod - cons) <= sizeof(cons_ring->out));
>> > +
>> > +/* Is the ring empty? */
>> > +if ( cons == prod )
>> > +goto out;
>> > +
>> > +/* Update pointers before accessing the ring */
>> > +smp_rmb();
>> 
>> I think this needs to move up ahead of the if(). In the comment
>> perhaps s/Update/Latch/?
> 
> The read/write memory barriers here are between read/write accesses to
> ring->out_prod and ring->out array. So there is no need to move them.
> (the same goes for the input ring)

And there is no multiple-read issue here?
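
For context, one common way to rule out multiple reads is to latch each
shared index into a local exactly once and only use the local afterwards.
The sketch below shows that pattern with C11 atomics; it is not the code
from the patch, the ring layout and names are hypothetical, and the
acquire load stands in for the plain read followed by smp_rmb().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 16u   /* hypothetical; must be a power of two */

/* Hypothetical single-producer, single-consumer output ring. */
struct demo_ring {
    char out[DEMO_RING_SIZE];
    _Atomic uint32_t out_cons;
    _Atomic uint32_t out_prod;
};

/* Copy up to 'len' bytes out of the ring; returns the number copied. */
static size_t demo_ring_rx(struct demo_ring *r, char *buf, size_t len)
{
    /* Latch both indexes once; later code never re-reads the shared copy. */
    uint32_t cons = atomic_load_explicit(&r->out_cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->out_prod, memory_order_acquire);
    size_t n = 0;

    /* Refuse an implausible producer index rather than overrun the ring. */
    if ( prod - cons > DEMO_RING_SIZE )
        return 0;

    while ( cons != prod && n < len )
        buf[n++] = r->out[cons++ & (DEMO_RING_SIZE - 1)];

    /* Publish the new consumer index after the data has been read. */
    atomic_store_explicit(&r->out_cons, cons, memory_order_release);
    return n;
}
```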

Jan



Re: [Xen-devel] [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 17:09,  wrote:
> On Mon, Jan 08, 2018 at 07:06:14AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06,  wrote:
>> > From: Roger Pau Monne 
>> > --- a/xen/arch/x86/pv/dom0_build.c
>> > +++ b/xen/arch/x86/pv/dom0_build.c
>> > @@ -31,9 +31,8 @@
>> >  #define L3_PROT (BASE_PROT|_PAGE_DIRTY)
>> >  #define L4_PROT (BASE_PROT|_PAGE_DIRTY)
>> >  
>> > -static __init void dom0_update_physmap(struct domain *d, unsigned long 
>> > pfn,
>> > -   unsigned long mfn,
>> > -   unsigned long vphysmap_s)
>> > +__init void dom0_update_physmap(struct domain *d, unsigned long pfn,
>> 
>> Please don't re-order type and annotation.
> 
> I'm not re-ordering anything here, just removing "static".

Oops, I'm sorry. Things being mis-ordered simply becomes more
obvious with the "static" gone.

> Do you mean that you prefer "void __init ..."?

Yes.

Jan



Re: [Xen-devel] [BUG] unable to shutdown (page fault in mwait_idle()/do_dbs_timer()/__find_next_bit()) (fwd)

2018-01-09 Thread Jan Beulich
>>> On 08.01.18 at 17:07,  wrote:
> On Mon, 8 Jan 2018, Jan Beulich wrote:
> On 07.01.18 at 13:34,  wrote:
>>> (XEN) [ Xen-4.10.0-vgpu  x86_64  debug=n   Not tainted ]
>>
>> The -vgpu tag makes me wonder whether you have any patches in
>> your tree on top of plain 4.10.0 (or 4.10-staging). Also the debug=n
>> above ...
> 
> 4.10.0 + 11 patches to make nvidia/vgpu work 
> (https://github.com/xenserver/xen-4.7.pg).
> debug=n because xen's modified debug build process.
> 
>>> (XEN)[] __find_next_bit+0x10/0x80
>>> (XEN)[] cpufreq_ondemand.c#do_dbs_timer+0x160/0x220
>>> (XEN)[] mwait-idle.c#mwait_idle+0x23e/0x340
>>> (XEN)[] domain.c#idle_loop+0x86/0xc0
>>
>> ... makes this call trace unreliable. But even with a reliable call
>> trace, analysis of the crash would be helped if you made
>> available the xen-syms (or xen.efi, depending on how you boot)
>> somewhere.
> 
> xen-syms - http://www.uschovna.cz/en/zasilka/UDP5LVE2679CGBIS-4YV/ 

Thanks. Looks to be a race between a timer in the governor and
the CPUs being brought down. In general the governor is supposed
to be disabled in the course of CPUs being brought down, so first
of all I wonder whether you're having some daemon in use which
sends management requests to the CPUfreq driver in Xen. Such a
daemon should of course be disabled by the system shutdown
scripts. Otherwise please try the attached debugging patch -
maybe we can see something from its output.

Jan

--- unstable.orig/xen/drivers/cpufreq/cpufreq.c 2017-09-12 12:39:58.310556379 +0200
+++ unstable/xen/drivers/cpufreq/cpufreq.c  2018-01-09 17:21:09.659208437 +0100
@@ -352,6 +352,8 @@ int cpufreq_del_cpu(unsigned int cpu)
 
 /* for HW_ALL, stop gov for each core of the _PSD domain */
 /* for SW_ALL & SW_ANY, stop gov for the 1st core of the _PSD domain */
+printk("cpufreq: del CPU%u (%u,%lx,%lu,%lx)\n", cpu,//temp
+   hw_all, cpufreq_dom->map->bits[0], perf->domain_info.num_processors, policy->cpus->bits[0]);//temp
 if (hw_all || (cpumask_weight(cpufreq_dom->map) ==
perf->domain_info.num_processors))
 __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
--- unstable.orig/xen/drivers/cpufreq/cpufreq_ondemand.c 2017-09-12 12:39:58.310556379 +0200
+++ unstable/xen/drivers/cpufreq/cpufreq_ondemand.c 2018-01-09 17:16:07.633604995 +0100
@@ -218,6 +218,9 @@ int cpufreq_governor_dbs(struct cpufreq_
 
 switch (event) {
 case CPUFREQ_GOV_START:
+if(system_state > SYS_STATE_active) {//temp
+ printk("dbs: start CPU%u [%pS]\n", cpu, __builtin_return_address(0));
+}
 if ((!cpu_online(cpu)) || (!policy->cur))
 return -EINVAL;
 

Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1

2018-01-09 Thread Pavel Tatashin

Hi Juergen,

Do you have this patch applied:

https://github.com/torvalds/linux/commit/e8c24773d6b2cd9bc8b36bd6e60beff599be14be

Thank you,
Pavel

On 01/09/2018 11:10 AM, Juergen Gross wrote:

On 09/01/18 16:29, Sander Eikelenboom wrote:

Since it's already rc7:
"Give me a subtle ping, Vasili. One subtle ping only, please."


I like that film :-)

Pavel, can you please comment? Do you have an idea how to repair the
issue or should we revert your patch in 4.15?


Juergen



On 04/01/18 21:02, Sander Eikelenboom wrote:

On 04/01/18 12:44, Juergen Gross wrote:

On 04/01/18 11:17, Sander Eikelenboom wrote:

Hi Boris / Juergen,

First of all best wishes for a quite turbulent starting new year.

Now the holidays are over I finally gotten to test a linux 4.15-rc6 kernel
and experienced a crash in early dom0 boot on my system (AMD phenom x6).

I tested some earlier linux 4.15 rc's but experienced crashes then as well,
but didn't have time to setup serial console to send them in
(and waited to see if the issue Boris fixed with AMD PCI 64bit bar's could be 
it).

But since that patch went in before 4.15 rc6, that doesn't seem to be the issue.
So it could be that the culprit went in pretty earlier in the 4.15 cycle.

The 4.15-rc6 kernel boots fine on bare metal, as does a 4.14.6 kernel on 
xen-unstable.

Hopefully you have a pointer to what is wrong, if not i can try to do a bisect.


A bisect would be very welcome.


Hi Juergen / Boris / Pavel,

Bisection result is:

a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b is the first bad commit
commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
Author: Pavel Tatashin 
Date:   Wed Nov 15 17:36:31 2017 -0800

 mm: zero reserved and unavailable struct pages
 
 Some memory is reserved but unavailable: not present in memblock.memory

 (because not backed by physical pages), but present in memblock.reserved.
 Such memory has backing struct pages, but they are not initialized by
 going through __init_single_page().
 
 In some cases these struct pages are accessed even if they do not

 contain any data.  One example is page_to_pfn() might access page->flags
 if this is where section information is stored (CONFIG_SPARSEMEM,
 SECTION_IN_PAGE_FLAGS).
 
 One example of such memory: trim_low_memory_range() unconditionally

 reserves from pfn 0, but e820__memblock_setup() might provide the
 exiting memory from pfn 1 (i.e.  KVM).
 
 Since struct pages are zeroed in __init_single_page(), and not during

 allocation time, we must zero such struct pages explicitly.
 
 The patch involves adding a new memblock iterator:

 for_each_resv_unavail_range(i, p_start, p_end)
 
 Which iterates through reserved && !memory lists, and we zero struct pages

 explicitly by calling mm_zero_struct_page().
 
 ===
 
 Here is more detailed example of problem that this patch is addressing:
 
 Run tested on qemu with the following arguments:
 
 -enable-kvm -cpu kvm64 -m 512 -smp 2
 
 This patch reports that there are 98 unavailable pages.
 
 They are: pfn 0 and pfns in range [159, 255].
 
 Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does

 not reserve [159, 255] ones.
 
 e820__memblock_setup() reports linux that the following physical ranges are

 available:
 [1 , 158]
 [256, 130783]
 
 Notice, that exactly unavailable pfns are missing!
 
 Now, lets check what we have in zone 0: [1, 131039]
 
 pfn 0, is not part of the zone, but pfns [1, 158], are.
 
 However, the bigger problem we have if we do not initialize these struct

 pages is with memory hotplug.  Because, that path operates at 2M
 boundaries (section_nr).  And checks if 2M range of pages is hot
 removable.  It starts with first pfn from zone, rounds it down to 2M
 boundary (sturct pages are allocated at 2M boundaries when vmemmap is
 created), and checks if that section is hot removable.  In this case
 start with pfn 1 and convert it down to pfn 0.  Later pfn is converted
 to struct page, and some fields are checked.  Now, if we do not zero
 struct pages, we get unpredictable results.
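
The arithmetic in the commit message quoted above can be checked with a
short sketch. It assumes 4 KiB pages, so a 2M boundary spans 512 pfns;
the helper names are hypothetical and this is not kernel code.

```c
#define DEMO_PAGE_SHIFT    12                  /* 4 KiB pages (assumption) */
#define DEMO_2M_BYTES      (2ul << 20)         /* the 2M boundary from the text */
#define DEMO_PAGES_PER_2M  (DEMO_2M_BYTES >> DEMO_PAGE_SHIFT)   /* 512 */

/* Round a pfn down to the start of its 2M-aligned block. */
static unsigned long demo_round_down_2m(unsigned long pfn)
{
    return pfn & ~(DEMO_PAGES_PER_2M - 1);
}

/* Number of pfns in the inclusive range [first, last]. */
static unsigned long demo_range_pages(unsigned long first, unsigned long last)
{
    return last - first + 1;
}
```

Rounding pfn 1 down gives pfn 0, which is outside the zone, and pfn 0
plus the 97 pfns in [159, 255] give the 98 unavailable pages reported.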
 
 In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all

 vmemmap memory to ones, the following panic is observed with kernel test
 without this patch applied:
 
   BUG: unable to handle kernel NULL pointer dereference at  (null)

   IP: is_pageblock_removable_nolock+0x35/0x90
   PGD 0 P4D 0
   Oops:  [#1] PREEMPT
   ...
   task: 88001f4e2900 task.stack: c9314000
   RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
   Call Trace:
? is_mem_section_removable+0x5a/0xd0
show_mem_removable+0x6b/0xa0
dev_attr_show+0x1b/0x50
sysfs_kf_seq_show+0xa1/0x100

Re: [Xen-devel] Linux 4.15-rc6 + xen-unstable: BUG: unable to handle kernel NULL pointer dereference at (null), [ 0.000000] IP: zero_resv_unavail+0x8e/0xe1

2018-01-09 Thread Juergen Gross
On 09/01/18 16:29, Sander Eikelenboom wrote:
> Since it's already rc7:
> "Give me a subtle ping, Vasili. One subtle ping only, please."

I like that film :-)

Pavel, can you please comment? Do you have an idea how to repair the
issue or should we revert your patch in 4.15?


Juergen

> 
> On 04/01/18 21:02, Sander Eikelenboom wrote:
>> On 04/01/18 12:44, Juergen Gross wrote:
>>> On 04/01/18 11:17, Sander Eikelenboom wrote:
>>>> Hi Boris / Juergen,
>>>>
>>>> First of all best wishes for a quite turbulent starting new year.
>>>>
>>>> Now the holidays are over I finally got to test a linux 4.15-rc6 kernel
>>>> and experienced a crash in early dom0 boot on my system (AMD phenom x6).
>>>>
>>>> I tested some earlier linux 4.15 rc's but experienced crashes then as well,
>>>> but didn't have time to set up a serial console to send them in
>>>> (and waited to see if the issue Boris fixed with AMD PCI 64bit bar's could
>>>> be it).
>>>>
>>>> But since that patch went in before 4.15 rc6, that doesn't seem to be the
>>>> issue.
>>>> So it could be that the culprit went in pretty early in the 4.15 cycle.
>>>>
>>>> The 4.15-rc6 kernel boots fine on bare metal, as does a 4.14.6 kernel on
>>>> xen-unstable.
>>>>
>>>> Hopefully you have a pointer to what is wrong, if not I can try to do a
>>>> bisect.
>>>
>>> A bisect would be very welcome.
>>
>> Hi Juergen / Boris / Pavel,
>>
>> Bisection result is:
>>
>> a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b is the first bad commit
>> commit a4a3ede2132ae0863e2d43e06f9b5697c51a7a3b
>> Author: Pavel Tatashin 
>> Date:   Wed Nov 15 17:36:31 2017 -0800
>>
>> mm: zero reserved and unavailable struct pages
>> 
>> Some memory is reserved but unavailable: not present in memblock.memory
>> (because not backed by physical pages), but present in memblock.reserved.
>> Such memory has backing struct pages, but they are not initialized by
>> going through __init_single_page().
>> 
>> In some cases these struct pages are accessed even if they do not
>> contain any data.  One example is page_to_pfn() might access page->flags
>> if this is where section information is stored (CONFIG_SPARSEMEM,
>> SECTION_IN_PAGE_FLAGS).
>> 
>> One example of such memory: trim_low_memory_range() unconditionally
>> reserves from pfn 0, but e820__memblock_setup() might provide the
>> existing memory from pfn 1 (i.e.  KVM).
>> 
>> Since struct pages are zeroed in __init_single_page(), and not during
>> allocation time, we must zero such struct pages explicitly.
>> 
>> The patch involves adding a new memblock iterator:
>> for_each_resv_unavail_range(i, p_start, p_end)
>> 
>> Which iterates through reserved && !memory lists, and we zero struct pages
>> explicitly by calling mm_zero_struct_page().
>> 
>> ===
>> 
>> Here is more detailed example of problem that this patch is addressing:
>> 
>> Run tested on qemu with the following arguments:
>> 
>> -enable-kvm -cpu kvm64 -m 512 -smp 2
>> 
>> This patch reports that there are 98 unavailable pages.
>> 
>> They are: pfn 0 and pfns in range [159, 255].
>> 
>> Note, trim_low_memory_range() reserves only pfns in range [0, 15], it does
>> not reserve [159, 255] ones.
>> 
>> e820__memblock_setup() reports to Linux that the following physical ranges are
>> available:
>> [1 , 158]
>> [256, 130783]
>> 
>> Notice that exactly the unavailable pfns are missing!
>> 
>> Now, let's check what we have in zone 0: [1, 131039]
>> 
>> pfn 0, is not part of the zone, but pfns [1, 158], are.
>> 
>> However, the bigger problem we have if we do not initialize these struct
>> pages is with memory hotplug, because that path operates at 2M
>> boundaries (section_nr) and checks if a 2M range of pages is hot
>> removable.  It starts with the first pfn from the zone, rounds it down to a 2M
>> boundary (struct pages are allocated at 2M boundaries when vmemmap is
>> created), and checks if that section is hot removable.  In this case it
>> starts with pfn 1 and rounds it down to pfn 0.  Later the pfn is converted
>> to a struct page, and some fields are checked.  Now, if we do not zero
>> struct pages, we get unpredictable results.
>> 
>> In fact when CONFIG_VM_DEBUG is enabled, and we explicitly set all
>> vmemmap memory to ones, the following panic is observed with kernel test
>> without this patch applied:
>> 
>>   BUG: unable to handle kernel NULL pointer dereference at (null)
>>   IP: is_pageblock_removable_nolock+0x35/0x90
>>   PGD 0 P4D 0
>>   Oops:  [#1] PREEMPT
>>   ...
>>   task: 88001f4e2900 task.stack: c9314000
>>   RIP: 0010:is_pageblock_removable_nolock+0x35/0x90
>>   Call Trace:
>>? is_mem_section_removable+0x5a/0xd0
>>  

Re: [Xen-devel] [RFC PATCH 1/8] x86/domctl: introduce a pair of hypercall to set and get cpu topology

2018-01-09 Thread Chao Gao
On Mon, Jan 08, 2018 at 01:14:44PM -0500, Daniel De Graaf wrote:
>On 01/07/2018 11:01 PM, Chao Gao wrote:
>> Define interface, structures and hypercalls for toolstack to build
>> cpu topology and for guest that will retrieve it [1].
>> Two subop hypercalls introduced by this patch:
>> XEN_DOMCTL_set_cpu_topology to define cpu topology information per domain
>> and XENMEM_get_cpu_topology to retrieve cpu topology information.
>> 
>> [1]: during guest creation, that information helps hvmloader to build ACPI.
>> 
>> Signed-off-by: Chao Gao 
>
>When adding new XSM controls for use by device models, you also
>need to add the permissions to the device_model macro defined in
>tools/flask/policy/modules/xen.if.  If domains need to call this
>function on themselves (is this only true for get?), you will also
>need to add it to declare_domain_common.
>

Hi, Daniel.

Yes. XENMEM_get_cpu_topology will be called by the domain itself.
And both get and set will be called by dom0 when creating a domain.
So I need to:
1. add *set* and *get* to create_domain_common.
2. add *set* to declare_domain_common.

Is it right?

Thanks
Chao

>> ---
>>   xen/arch/x86/domctl.c   | 27 ++
>>   xen/arch/x86/hvm/hvm.c  |  7 ++
>>   xen/arch/x86/mm.c   | 45 
>> +
>>   xen/include/asm-x86/hvm/domain.h| 15 +
>>   xen/include/public/domctl.h | 22 ++
>>   xen/include/public/memory.h | 27 +-
>>   xen/include/xsm/dummy.h |  6 +
>>   xen/xsm/dummy.c |  1 +
>>   xen/xsm/flask/hooks.c   | 10 +
>>   xen/xsm/flask/policy/access_vectors |  4 
>>   10 files changed, 163 insertions(+), 1 deletion(-)
>> 
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 36ab235..4e1bbd5 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -347,6 +347,29 @@ void arch_get_domain_info(const struct domain *d,
>>   info->flags |= XEN_DOMINF_hap;
>>   }
>> +static int arch_set_cpu_topology(struct domain *d,
>> + struct xen_domctl_cpu_topology *topology)
>> +{
>> +if ( !is_hvm_domain(d) ||
>> + !topology->size || topology->size > HVM_MAX_VCPUS )
>> +return -EINVAL;
>> +
>> +if ( !d->arch.hvm_domain.apic_id )
>> +d->arch.hvm_domain.apic_id = xmalloc_array(uint32_t, topology->size);
>> +
>> +if ( !d->arch.hvm_domain.apic_id )
>> +return -ENOMEM;
>> +
>> +if ( copy_from_guest(d->arch.hvm_domain.apic_id, topology->tid,
>> + topology->size) )
>> +return -EFAULT;
>> +
>> +d->arch.hvm_domain.apic_id_size = topology->size;
>> +d->arch.hvm_domain.core_per_socket = topology->core_per_socket;
>> +d->arch.hvm_domain.thread_per_core = topology->thread_per_core;
>> +return 0;
>> +}
>> +
>>   #define MAX_IOPORTS 0x1
>>   long arch_do_domctl(
>> @@ -1555,6 +1578,10 @@ long arch_do_domctl(
>>   recalculate_cpuid_policy(d);
>>   break;
>> +case XEN_DOMCTL_set_cpu_topology:
>> +ret = arch_set_cpu_topology(d, &domctl->u.cpu_topology);
>> +break;
>> +
>>   default:
>>   ret = iommu_do_domctl(domctl, d, u_domctl);
>>   break;
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 71fddfd..b3b3224 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -1509,6 +1509,13 @@ int hvm_vcpu_initialise(struct vcpu *v)
>>   int rc;
>>   struct domain *d = v->domain;
>> +if ( v->vcpu_id > d->arch.hvm_domain.apic_id_size )
>> +{
>> +printk(XENLOG_ERR "d%dv%d's apic id isn't set.\n",
>> +   d->domain_id, v->vcpu_id);
>> +return -ENOENT;
>> +}
>> +
>>   hvm_asid_flush_vcpu(v);
>>   spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index a56f875..b90e663 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4413,6 +4413,51 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>   return rc;
>>   }
>> +case XENMEM_get_cpu_topology:
>> +{
>> +struct domain *d;
>> +struct xen_cpu_topology_info topology;
>> +
>> +if ( copy_from_guest(&topology, arg, 1) )
>> +return -EFAULT;
>> +
>> +if ( topology.pad || topology.pad2 )
>> +return -EINVAL;
>> +
>> +if ( (d = rcu_lock_domain_by_any_id(topology.domid)) == NULL )
>> +return -ESRCH;
>> +
>> +rc = xsm_get_cpu_topology(XSM_TARGET, d);
>> +if ( rc )
>> +goto get_cpu_topology_failed;
>> +
>> +rc = -EOPNOTSUPP;
>> +if ( !is_hvm_domain(d) || !d->arch.hvm_domain.apic_id )
>> +goto get_cpu_topology_failed;
>> +
>> +/* allow the size to be zero for users who don't care apic_id */

Re: [Xen-devel] [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU

2018-01-09 Thread Roger Pau Monné
On Mon, Jan 08, 2018 at 07:06:14AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06,  wrote:
> > From: Roger Pau Monne 
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -31,9 +31,8 @@
> >  #define L3_PROT (BASE_PROT|_PAGE_DIRTY)
> >  #define L4_PROT (BASE_PROT|_PAGE_DIRTY)
> >  
> > -static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
> > -   unsigned long mfn,
> > -   unsigned long vphysmap_s)
> > +__init void dom0_update_physmap(struct domain *d, unsigned long pfn,
> 
> Please don't re-order type and annotation.

I'm not re-ordering anything here, just removing "static". Do you mean
that you prefer "void __init ..."?

Thanks, Roger.

Re: [Xen-devel] [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU

2018-01-09 Thread Sergey Dyasli
On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
> > > > On 04.01.18 at 14:06,  wrote:
> > +size_t consoled_guest_rx(void)
> > +{
> > +size_t recv = 0, idx = 0;
> > +XENCONS_RING_IDX cons, prod;
> > +
> > +if ( !cons_ring )
> > +return 0;
> > +
> > +spin_lock(_lock);
> > +
> > +cons = cons_ring->out_cons;
> > +prod = ACCESS_ONCE(cons_ring->out_prod);
> > +ASSERT((prod - cons) <= sizeof(cons_ring->out));
> > +
> > +/* Is the ring empty? */
> > +if ( cons == prod )
> > +goto out;
> > +
> > +/* Update pointers before accessing the ring */
> > +smp_rmb();
> 
> I think this needs to move up ahead of the if(). In the comment
> perhaps s/Update/Latch/?

The read/write memory barriers here are between read/write accesses to
ring->out_prod and ring->out array. So there is no need to move them.
(the same goes for the input ring)
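
The ordering described above (index first, then the data it covers) can be
sketched in portable C11. This is not the shim's code: it swaps the Xen
smp_rmb()/smp_wmb() barriers for acquire/release atomics, and the ring type,
size and function names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 16u  /* hypothetical; must be a power of two */

struct demo_ring {
    char out[DEMO_RING_SIZE];
    _Atomic uint32_t out_cons;
    _Atomic uint32_t out_prod;
};

/* Producer: write the bytes first, then publish the new out_prod. */
static size_t demo_ring_put(struct demo_ring *r, const char *src, size_t len)
{
    uint32_t prod = atomic_load_explicit(&r->out_prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->out_cons, memory_order_acquire);
    size_t sent = 0;

    while ( sent < len && (prod - cons) < DEMO_RING_SIZE )
        r->out[prod++ & (DEMO_RING_SIZE - 1)] = src[sent++];

    /* Release: the data stores above become visible before the index. */
    atomic_store_explicit(&r->out_prod, prod, memory_order_release);
    return sent;
}

/* Consumer: read out_prod with acquire, then read the bytes it covers. */
static size_t demo_ring_get(struct demo_ring *r, char *dst, size_t len)
{
    uint32_t cons = atomic_load_explicit(&r->out_cons, memory_order_relaxed);
    /* Acquire: the out[] reads below cannot be hoisted above this load. */
    uint32_t prod = atomic_load_explicit(&r->out_prod, memory_order_acquire);
    size_t recv = 0;

    assert((prod - cons) <= DEMO_RING_SIZE);

    while ( cons != prod && recv < len )
        dst[recv++] = r->out[cons++ & (DEMO_RING_SIZE - 1)];

    /* Release: hand the consumed slots back to the producer. */
    atomic_store_explicit(&r->out_cons, cons, memory_order_release);
    return recv;
}
```

In this sketch the empty-ring comparison only looks at the two indices, so the
ordering requirement applies to the out[] reads that follow it.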

-- 
Thanks,
Sergey

Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Anthony Liguori
On Tue, Jan 9, 2018 at 2:49 AM, Ian Jackson  wrote:
> Andrew Cooper writes ("Re: Radical proposal: ship not-fully-tidied shim as 4.10.1"):
>> Does this sound fair?
>
> Everything is on fire.  Your proposal seems much less radical than
> mine.  I doubt it will produce a release to our users tomorrow, let
> alone this week.
>
> If we can't get agreement to commit an under-reviewed and under-tested
> series to staging-4.10, then IMO we should fork 4.10 and make an
> emergency security release off the fork, instead.

My series is on top of staging and I already have a backport to 4.9
and 4.10 stable published.

I will cherry pick as much as I can from Wei's branch this morning and
send out a v3.  I will try to close the migration, hotplug, ballooning gap.

I do think it's useful for folks to review the series deeply.  It's
not that big so it shouldn't take all that long to do so.

I think v3 of my series is likely the closest to something that can be
merged this week.

Regards,

Anthony Liguori

> Ian.

Re: [Xen-devel] [PATCH RFC v1 32/74] x86: don't swallow the first command line item in pvh mode

2018-01-09 Thread Roger Pau Monné
On Thu, Jan 04, 2018 at 01:05:43PM +, Wei Liu wrote:
> Instead, special case GRUB1 rather than assuming that all bootloaders except GRUB2
> need parameter stripping.

The FreeBSD loader also prepends "xen.gz" (or the Xen kernel filename)
to the command line. Hence this change will break it.

Roger.

Re: [Xen-devel] [PATCH v6.5 20/26] x86: Protect unaware domains from meddling hyperthreads

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 15:21,  wrote:
> On 04/01/18 09:59, Jan Beulich wrote:
> On 04.01.18 at 01:15,  wrote:
>>> Signed-off-by: Andrew Cooper 
>> Fundamentally (as before)
>> Reviewed-by: Jan Beulich 
>> However:
>>
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -2027,6 +2027,25 @@ int domain_relinquish_resources(struct domain *d)
>>>   */
>>>  void cpuid_policy_updated(struct vcpu *v)
>>>  {
>>> +const struct cpuid_policy *cp = v->domain->arch.cpuid;
>>> +struct msr_vcpu_policy *vp = v->arch.msr;
>>> +
>>> +/*
>>> + * For guests which know about IBRS but are not told about STIBP running
>>> + * on hardware supporting hyperthreading, the guest doesn't know to
>>> + * protect itself fully.  (Such a guest won't be permitted direct access
>>> + * to the MSR.)  Have Xen fill in the gaps, so an unaware guest can't be
>>> + * interfered with by a meddling guest on an adjacent hyperthread.
>>> + */
>>> +if ( cp->feat.ibrsb )
>>> +{
>>> +if ( !cp->feat.stibp && cpu_has_stibp &&
>>> + !(vp->spec_ctrl.guest & (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) )
>>> +vp->spec_ctrl.host = SPEC_CTRL_STIBP;
>>> +else
>>> +vp->spec_ctrl.host = vp->spec_ctrl.guest;
>> This code is so similar to ...
>>
>>> --- a/xen/arch/x86/msr.c
>>> +++ b/xen/arch/x86/msr.c
>>> @@ -181,7 +181,20 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>>>   (cp->feat.stibp ? SPEC_CTRL_STIBP : 0)) )
>>>  goto gp_fault; /* Rsvd bit set? */
>>>  vp->spec_ctrl.guest = val;
>>> -vp->spec_ctrl.host  = val;
>>> +
>>> +/*
>>> + * For guests which are not told about STIBP, running on hardware
>>> + * supporting hyperthreading, the guest doesn't know to protect itself
>>> + * fully.  (Such a guest won't be permitted direct access to the MSR.)
>>> + * When IBRS is not in force, have Xen fill in the gaps, so an unaware
>>> + * guest can't be interfered with by a meddling guest on an adjacent
>>> + * hyperthread.
>>> + */
>>> +if ( !cp->feat.stibp && cpu_has_stibp &&
>>> + !(val & (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) )
>>> +vp->spec_ctrl.host = SPEC_CTRL_STIBP;
>>> +else
>>> +vp->spec_ctrl.host = val;
>> ... this that I think a helper function would be warranted, unless you
>> have reasons to believe that future changes might break the
>> similarity.
> 
> I don't expect them to diverge, and will pull it out into a separate helper.
> 
>>
>> I'm also a little puzzled by you checking SPEC_CTRL_STIBP there -
>> this bit ought to be clear when !cp->feat.stibp due to the earlier
>> reserved bit check (part of which is even visible in context above).
>> IOW the check is not wrong, but perhaps misleading. You had
>> replied to this remark with
>>
>> "The SPEC_CTRL_STIBP check exists solely because of v3 review which
>>  objected to me implying a link between IBRS and STIBP."
> 
> The original logic was "!cp->feat.stibp && cpu_has_stibp && val ==
> 0", which you argued would go stale as new SPEC_CTRL_ bits got added.

Ah, I recall now. But by just checking !(val & SPEC_CTRL_IBRS) you
would avoid the staleness; you might even consider putting an
ASSERT() in to validate the other bit is clear.
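
A rough sketch of what such a shared helper could look like, written as plain C
so it can be compiled outside Xen. The function name and the boolean parameters
are hypothetical stand-ins for cp->feat.stibp and cpu_has_stibp, and assert()
stands in for Xen's ASSERT(). The bit positions are the architectural ones for
IA32_SPEC_CTRL (IBRS is bit 0, STIBP is bit 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPEC_CTRL_IBRS  (1u << 0)
#define SPEC_CTRL_STIBP (1u << 1)

/*
 * Compute the value to load into the real MSR for a given guest value.
 * guest_has_stibp: whether the guest's CPUID policy advertises STIBP.
 * hw_has_stibp:    whether the hardware supports STIBP.
 */
static uint32_t demo_spec_ctrl_host(uint32_t guest_val, bool guest_has_stibp,
                                    bool hw_has_stibp)
{
    if ( !guest_has_stibp && hw_has_stibp && !(guest_val & SPEC_CTRL_IBRS) )
    {
        /* The reserved-bit check should already have rejected STIBP here. */
        assert(!(guest_val & SPEC_CTRL_STIBP));
        return SPEC_CTRL_STIBP;
    }

    return guest_val;
}
```

Testing only the IBRS bit keeps the condition valid if further SPEC_CTRL bits
are defined later, while the assertion documents the invariant established by
the earlier reserved-bit check.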

Jan


[Xen-devel] [PATCH RFC 2/4] xen/x86: add helper for stack guard

2018-01-09 Thread Juergen Gross
Instead of open coding the calculation of the stack guard page multiple
times, add a helper to do the calculation.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/mm.c| 8 ++--
 xen/include/asm-x86/mm.h | 6 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a56f875d45..b60e79e82e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5517,16 +5517,12 @@ void memguard_unguard_range(void *p, unsigned long l)
 void memguard_guard_stack(void *p)
 {
 BUILD_BUG_ON((PRIMARY_STACK_SIZE + PAGE_SIZE) > STACK_SIZE);
-p = (void *)((unsigned long)p + STACK_SIZE -
- PRIMARY_STACK_SIZE - PAGE_SIZE);
-memguard_guard_range(p, PAGE_SIZE);
+memguard_guard_range(memguard_get_guard_page(p), PAGE_SIZE);
 }
 
 void memguard_unguard_stack(void *p)
 {
-p = (void *)((unsigned long)p + STACK_SIZE -
- PRIMARY_STACK_SIZE - PAGE_SIZE);
-memguard_unguard_range(p, PAGE_SIZE);
+memguard_unguard_range(memguard_get_guard_page(p), PAGE_SIZE);
 }
 
 void arch_dump_shared_mem_info(void)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 4af6b2341a..84e112b830 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -517,6 +517,12 @@ void memguard_unguard_range(void *p, unsigned long l);
 #define memguard_unguard_range(_p,_l)  ((void)0)
 #endif
 
+static inline void *memguard_get_guard_page(void *p)
+{
+return (void *)((unsigned long)p + STACK_SIZE -
+PRIMARY_STACK_SIZE - PAGE_SIZE);
+}
+
 void memguard_guard_stack(void *p);
 void memguard_unguard_stack(void *p);
 
-- 
2.13.6
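
For readers following the arithmetic in the helper above, here is a standalone
sketch with concrete numbers. The constants are assumptions chosen for
illustration (4 KiB pages, an 8-page stack block, an 8 KiB primary stack); the
DEMO_ names are hypothetical and the real values live in Xen's headers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE          4096ul
#define DEMO_STACK_SIZE         (DEMO_PAGE_SIZE << 3)  /* assumed: 8 pages */
#define DEMO_PRIMARY_STACK_SIZE 8192ul                 /* assumed: 2 pages */

/* Same calculation as the helper, on plain integers instead of pointers. */
static inline uintptr_t demo_guard_page(uintptr_t stack_base)
{
    /* Mirrors the BUILD_BUG_ON() in memguard_guard_stack(). */
    _Static_assert(DEMO_PRIMARY_STACK_SIZE + DEMO_PAGE_SIZE <= DEMO_STACK_SIZE,
                   "guard page must fit below the primary stack");

    return stack_base + DEMO_STACK_SIZE -
           DEMO_PRIMARY_STACK_SIZE - DEMO_PAGE_SIZE;
}
```

With these assumed values the guard page is the sixth page of the block
(offset 0x5000), directly below the two pages of the primary stack.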


[Xen-devel] [PATCH RFC 4/4] xen: use per-vcpu TSS and stacks for pv domains

2018-01-09 Thread Juergen Gross
Instead of using the TSS and stacks of the physical processor, allocate
them per vcpu, map them in the per domain area, and use those.

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/domain.c| 45 +++
 xen/arch/x86/pv/domain.c | 72 +---
 xen/arch/x86/x86_64/entry.S  |  4 +++
 xen/include/asm-x86/config.h |  9 +-
 xen/include/asm-x86/mm.h |  5 +++
 5 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index c0cb2cae64..952ed7e121 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1582,7 +1582,12 @@ static void _update_runstate_area(struct vcpu *v)
 
 static inline bool need_full_gdt(const struct domain *d)
 {
-return is_pv_domain(d) && !is_idle_domain(d);
+return is_pv_32bit_domain(d);
+}
+
+static inline bool need_per_vcpu_data(const struct domain *d)
+{
+return is_pv_domain(d) && !is_idle_domain(d) && !is_pv_32bit_domain(d);
 }
 
 static void __context_switch(void)
@@ -1657,8 +1662,19 @@ static void __context_switch(void)
 
 write_ptbase(n);
 
-if ( need_full_gdt(nd) &&
- ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
+if ( need_per_vcpu_data(nd) )
+{
+gdt = (struct desc_struct *)GDT_VIRT_START(n);
+gdt[PER_CPU_GDT_ENTRY].a = cpu;
+
+gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
+gdt_desc.base = GDT_VIRT_START(n);
+
+lgdt(&gdt_desc);
+ltr(TSS_ENTRY << 3);
+}
+else if ( need_full_gdt(nd) &&
+  ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
 {
 gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
 gdt_desc.base = GDT_VIRT_START(n);
@@ -1673,8 +1689,8 @@ static void __context_switch(void)
 per_cpu(curr_vcpu, cpu) = n;
 }
 
-static void context_switch_irqoff(struct vcpu *prev, struct vcpu *next,
-  unsigned int cpu)
+void context_switch_irqoff(struct vcpu *prev, struct vcpu *next,
+   unsigned int cpu)
 {
 const struct domain *prevd = prev->domain, *nextd = next->domain;
 
@@ -1764,7 +1780,24 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
 set_current(next);
 
-context_switch_irqoff(prev, next, cpu);
+if ( is_pv_domain(prevd) && !is_pv_32bit_domain(prevd) )
+{
+struct desc_struct *gdt = this_cpu(compat_gdt_table) -
+  FIRST_RESERVED_GDT_ENTRY;
+const struct desc_ptr gdtr = {
+.base = (unsigned long)gdt,
+.limit = LAST_RESERVED_GDT_BYTE,
+};
+void *stack = (struct cpu_info *)(stack_base[cpu] + STACK_SIZE) - 1;
+
+/* Switch to global accessible gdt and tss. */
+lgdt(&gdtr);
+ltr(TSS_ENTRY << 3);
+
+context_switch_irqoff_stack(prev, next, cpu, stack);
+}
+else
+context_switch_irqoff(prev, next, cpu);
 }
 
 void continue_running(struct vcpu *same)
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 74e9e667d2..6692aa6922 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -96,10 +96,32 @@ int switch_compat(struct domain *d)
 
 static int pv_create_gdt_ldt_l1tab(struct vcpu *v)
 {
-return create_perdomain_mapping(v->domain, GDT_VIRT_START(v),
-1U << GDT_LDT_VCPU_SHIFT,
-v->domain->arch.pv_domain.gdt_ldt_l1tab,
-NULL);
+int rc;
+
+rc = create_perdomain_mapping(v->domain, GDT_VIRT_START(v),
+  1U << GDT_LDT_VCPU_SHIFT,
+  v->domain->arch.pv_domain.gdt_ldt_l1tab,
+  NULL);
+if ( !rc && !is_pv_32bit_vcpu(v) )
+{
+struct desc_struct *gdt;
+
+gdt = (struct desc_struct *)GDT_VIRT_START(v) +
+  FIRST_RESERVED_GDT_ENTRY;
+rc = create_perdomain_mapping(v->domain, (unsigned long)gdt,
+  NR_RESERVED_GDT_BYTES,
+  NULL, NIL(struct page_info *));
+if ( !rc )
+{
+memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_BYTES);
+_set_tssldt_desc(gdt + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
+ TSS_START(v),
+ offsetof(struct tss_struct, __cacheline_filler) - 1,
+ SYS_DESC_tss_avail);
+}
+}
+
+return rc;
 }
 
 static void pv_destroy_gdt_ldt_l1tab(struct vcpu *v)
@@ -119,6 +141,46 @@ void pv_vcpu_destroy(struct vcpu *v)
 pv_destroy_gdt_ldt_l1tab(v);
 xfree(v->arch.pv_vcpu.trap_ctxt);
 v->arch.pv_vcpu.trap_ctxt = NULL;
+
+if ( !is_pv_32bit_vcpu(v) )
+destroy_perdomain_mapping(v->domain, STACKS_START(v),
+  STACK_SIZE + PAGE_SIZE);
+}
+
+static int pv_vcpu_init_tss_stacks(struct vcpu *v)
+{
+   

[Xen-devel] [PATCH RFC 1/4] xen/x86: use dedicated function for tss initialization

2018-01-09 Thread Juergen Gross
Carve out the TSS initialization from load_system_tables().

Signed-off-by: Juergen Gross 
---
 xen/arch/x86/cpu/common.c| 56 
 xen/include/asm-x86/system.h |  1 +
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e9588b3c0d..8c0d3181d0 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -634,6 +634,35 @@ void __init early_cpu_init(void)
early_cpu_detect();
 }
 
+void tss_init(struct tss_struct *tss, unsigned long stack_bottom)
+{
+   unsigned long stack_top = stack_bottom & ~(STACK_SIZE - 1);
+
+   *tss = (struct tss_struct){
+   /* Main stack for interrupts/exceptions. */
+   .rsp0 = stack_bottom,
+
+   /* Ring 1 and 2 stacks poisoned. */
+   .rsp1 = 0x8600ul,
+   .rsp2 = 0x8600ul,
+
+   /*
+* MCE, NMI and Double Fault handlers get their own stacks.
+* All others poisoned.
+*/
+   .ist = {
+   [IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
+   [IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
+   [IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
+
+   [IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
+   0x8600ul,
+   },
+
+   .bitmap = IOBMP_INVALID_OFFSET,
+   };
+}
+
 /*
  * Sets up system tables and descriptors.
  *
@@ -645,8 +674,7 @@ void __init early_cpu_init(void)
 void load_system_tables(void)
 {
unsigned int cpu = smp_processor_id();
-   unsigned long stack_bottom = get_stack_bottom(),
-   stack_top = stack_bottom & ~(STACK_SIZE - 1);
+   unsigned long stack_bottom = get_stack_bottom();
 
struct tss_struct *tss = &this_cpu(init_tss);
struct desc_struct *gdt =
@@ -663,29 +691,7 @@ void load_system_tables(void)
.limit = (IDT_ENTRIES * sizeof(idt_entry_t)) - 1,
};
 
-   *tss = (struct tss_struct){
-   /* Main stack for interrupts/exceptions. */
-   .rsp0 = stack_bottom,
-
-   /* Ring 1 and 2 stacks poisoned. */
-   .rsp1 = 0x8600ul,
-   .rsp2 = 0x8600ul,
-
-   /*
-* MCE, NMI and Double Fault handlers get their own stacks.
-* All others poisoned.
-*/
-   .ist = {
-   [IST_MCE - 1] = stack_top + IST_MCE * PAGE_SIZE,
-   [IST_DF  - 1] = stack_top + IST_DF  * PAGE_SIZE,
-   [IST_NMI - 1] = stack_top + IST_NMI * PAGE_SIZE,
-
-   [IST_MAX ... ARRAY_SIZE(tss->ist) - 1] =
-   0x8600ul,
-   },
-
-   .bitmap = IOBMP_INVALID_OFFSET,
-   };
+   tss_init(tss, stack_bottom);
 
_set_tssldt_desc(
gdt + TSS_ENTRY,
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 8ac170371b..2cf50d1d49 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -230,6 +230,7 @@ static inline int local_irq_is_enabled(void)
 
 void trap_init(void);
 void init_idt_traps(void);
+void tss_init(struct tss_struct *tss, unsigned long stack_bottom);
 void load_system_tables(void);
 void percpu_traps_init(void);
 void subarch_percpu_traps_init(void);
-- 
2.13.6
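
The IST slot arithmetic in tss_init() can be tried out in isolation with the
sketch below. Every constant is an assumption for illustration: 4 KiB pages, an
8-page stack block, IST indices DF=1, NMI=2, MCE=3 with IST_MAX=3, and a
placeholder poison value (the patch's own poison constant is not reproduced
here). The DEMO_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE  4096ull
#define DEMO_STACK_SIZE (DEMO_PAGE_SIZE << 3)  /* assumed: 8 pages */
#define DEMO_IST_DF     1                      /* assumed index values */
#define DEMO_IST_NMI    2
#define DEMO_IST_MCE    3
#define DEMO_IST_MAX    3
#define DEMO_IST_SLOTS  7                      /* x86-64 TSS has IST1..IST7 */
#define DEMO_POISON     0xdeadbeefull          /* placeholder poison value */

static void demo_fill_ist(uint64_t ist[DEMO_IST_SLOTS], uint64_t stack_bottom)
{
    /* Round down to the start of the STACK_SIZE-aligned stack block. */
    uint64_t stack_top = stack_bottom & ~(DEMO_STACK_SIZE - 1);
    size_t i;

    /* Each dedicated handler stack is addressed by its IST index in pages. */
    ist[DEMO_IST_MCE - 1] = stack_top + DEMO_IST_MCE * DEMO_PAGE_SIZE;
    ist[DEMO_IST_DF  - 1] = stack_top + DEMO_IST_DF  * DEMO_PAGE_SIZE;
    ist[DEMO_IST_NMI - 1] = stack_top + DEMO_IST_NMI * DEMO_PAGE_SIZE;

    /* Remaining slots are unused and get poisoned. */
    for ( i = DEMO_IST_MAX; i < DEMO_IST_SLOTS; i++ )
        ist[i] = DEMO_POISON;
}
```

Note that the value named stack_top in the patch is the lowest address of the
aligned block, so the IST pointers land in the first few pages of that block.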


[Xen-devel] [PATCH 4/4] efi: Rename efi_get_secureboot() to __efi_get_secureboot() and make it static

2018-01-09 Thread Daniel Kiper
This may help the compiler to do some function call optimization.

This is rather cosmetic. If you like this patch, apply it.
If you do not, you may ignore it.

Signed-off-by: Daniel Kiper 
---
 arch/x86/xen/efi.c |2 +-
 drivers/firmware/efi/libstub/secureboot-core.c |2 +-
 drivers/firmware/efi/libstub/secureboot.c  |5 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/efi.c b/arch/x86/xen/efi.c
index 5ad2b8f..d45677f 100644
--- a/arch/x86/xen/efi.c
+++ b/arch/x86/xen/efi.c
@@ -141,7 +141,7 @@ void __init xen_efi_init(void)
boot_params.efi_info.efi_systab = (__u32)__pa(efi_systab_xen);
boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32);
 
-   boot_params.secure_boot = efi_get_secureboot(efi_systab_xen);
+   boot_params.secure_boot = __efi_get_secureboot(efi_systab_xen);
 
set_bit(EFI_BOOT, &efi.flags);
set_bit(EFI_PARAVIRT, &efi.flags);
diff --git a/drivers/firmware/efi/libstub/secureboot-core.c b/drivers/firmware/efi/libstub/secureboot-core.c
index d503ee4..07526a6 100644
--- a/drivers/firmware/efi/libstub/secureboot-core.c
+++ b/drivers/firmware/efi/libstub/secureboot-core.c
@@ -28,7 +28,7 @@
 /*
  * Determine whether we're in secure boot mode.
  */
-enum __sb_init efi_secureboot_mode efi_get_secureboot(efi_system_table_t *sys_table_arg)
+static enum __sb_init efi_secureboot_mode __efi_get_secureboot(efi_system_table_t *sys_table_arg)
 {
u32 attr;
u8 secboot, setupmode, moksbstate;
diff --git a/drivers/firmware/efi/libstub/secureboot.c b/drivers/firmware/efi/libstub/secureboot.c
index 1142170..f872afd 100644
--- a/drivers/firmware/efi/libstub/secureboot.c
+++ b/drivers/firmware/efi/libstub/secureboot.c
@@ -23,3 +23,8 @@
 __VA_ARGS__);
 
 #include "secureboot-core.c"
+
+enum efi_secureboot_mode efi_get_secureboot(efi_system_table_t *sys_table_arg)
+{
+   return __efi_get_secureboot(sys_table_arg);
+}
-- 
1.7.10.4


[Xen-devel] [PATCH 0/4] x86/xen/efi: Initialize UEFI secure boot state during dom0 boot

2018-01-09 Thread Daniel Kiper
Hi,

Initialize UEFI secure boot state during dom0 boot. Otherwise the kernel
may not even know that it runs on a secure boot enabled platform.

Daniel

 arch/x86/xen/Makefile  |4 +++-
 arch/x86/xen/efi.c |   14 +
 drivers/firmware/efi/libstub/secureboot-core.c |   77 
+
 drivers/firmware/efi/libstub/secureboot.c  |   66 
+--
 4 files changed, 99 insertions(+), 62 deletions(-)

Daniel Kiper (4):
  efi/stub: Extract efi_get_secureboot() to separate file
  x86/xen/efi: Initialize boot_params.secure_boot in xen_efi_init()
  efi: Tweak efi_get_secureboot() and its data section assignment
  efi: Rename efi_get_secureboot() to __efi_get_secureboot() and make it static


[Xen-devel] [PATCH 1/4] efi/stub: Extract efi_get_secureboot() to separate file

2018-01-09 Thread Daniel Kiper
We have to call efi_get_secureboot() from early Xen dom0 boot code to properly
initialize boot_params.secure_boot. Sadly it lives in the EFI stub. Hence, it is
not readily reachable from the kernel proper. So, move efi_get_secureboot() to
a separate file which can be included from the core kernel code. A subsequent
patch will add the efi_get_secureboot() call to the Xen dom0 boot code.

There is no functional change.
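
As a reading aid for the function being moved, its decision logic can be
summarized as a pure function over the three variable lookups. This is only a
sketch: the DEMO_ types and names are hypothetical, the EFI calls are replaced
by pre-fetched results, and only the runtime-access attribute bit (0x4 in the
UEFI specification) is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EFI_VARIABLE_RUNTIME_ACCESS 0x4u

enum demo_sb_mode { DEMO_SB_UNKNOWN, DEMO_SB_DISABLED, DEMO_SB_ENABLED };
enum demo_lookup  { DEMO_FOUND, DEMO_NOT_FOUND, DEMO_ERROR };

/* Result of one variable lookup: status, one-byte value, attributes. */
struct demo_var {
	enum demo_lookup status;
	uint8_t value;
	uint32_t attr;
};

static enum demo_sb_mode demo_secureboot_mode(struct demo_var secboot,
					      struct demo_var setupmode,
					      struct demo_var moksbstate)
{
	/* No SecureBoot variable at all: secure boot is not in use. */
	if (secboot.status == DEMO_NOT_FOUND)
		return DEMO_SB_DISABLED;
	if (secboot.status != DEMO_FOUND)
		return DEMO_SB_UNKNOWN;

	if (setupmode.status != DEMO_FOUND)
		return DEMO_SB_UNKNOWN;

	if (secboot.value == 0 || setupmode.value == 1)
		return DEMO_SB_DISABLED;

	/* If the shim variable cannot be read, default to secure. */
	if (moksbstate.status != DEMO_FOUND)
		return DEMO_SB_ENABLED;

	/* Shim put into insecure mode by the user (boot-time variable only). */
	if (!(moksbstate.attr & DEMO_EFI_VARIABLE_RUNTIME_ACCESS) &&
	    moksbstate.value == 1)
		return DEMO_SB_DISABLED;

	return DEMO_SB_ENABLED;
}
```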

Signed-off-by: Daniel Kiper 
---
 drivers/firmware/efi/libstub/secureboot-core.c |   77 
 drivers/firmware/efi/libstub/secureboot.c  |   66 +---
 2 files changed, 78 insertions(+), 65 deletions(-)
 create mode 100644 drivers/firmware/efi/libstub/secureboot-core.c

diff --git a/drivers/firmware/efi/libstub/secureboot-core.c b/drivers/firmware/efi/libstub/secureboot-core.c
new file mode 100644
index 000..11a4feb
--- /dev/null
+++ b/drivers/firmware/efi/libstub/secureboot-core.c
@@ -0,0 +1,77 @@
+/*
+ * Secure boot handling.
+ *
+ * Copyright (C) 2013,2014 Linaro Limited
+ * Roy Franz 
+ * Copyright (C) 2013 Red Hat, Inc.
+ * Mark Salter 
+ *
+ * This file is part of the Linux kernel, and is made available under the
+ * terms of the GNU General Public License version 2.
+ */
+
+/* BIOS variables */
+static const efi_guid_t efi_variable_guid = EFI_GLOBAL_VARIABLE_GUID;
+static const efi_char16_t efi_SecureBoot_name[] = {
+   'S', 'e', 'c', 'u', 'r', 'e', 'B', 'o', 'o', 't', 0
+};
+static const efi_char16_t efi_SetupMode_name[] = {
+   'S', 'e', 't', 'u', 'p', 'M', 'o', 'd', 'e', 0
+};
+
+/* SHIM variables */
+static const efi_guid_t shim_guid = EFI_SHIM_LOCK_GUID;
+static const efi_char16_t shim_MokSBState_name[] = {
+   'M', 'o', 'k', 'S', 'B', 'S', 't', 'a', 't', 'e', 0
+};
+
+/*
+ * Determine whether we're in secure boot mode.
+ */
+enum efi_secureboot_mode efi_get_secureboot(efi_system_table_t *sys_table_arg)
+{
+   u32 attr;
+   u8 secboot, setupmode, moksbstate;
+   unsigned long size;
+   efi_status_t status;
+
+   size = sizeof(secboot);
+   status = get_efi_var(efi_SecureBoot_name, &efi_variable_guid,
+NULL, &size, &secboot);
+   if (status == EFI_NOT_FOUND)
+   return efi_secureboot_mode_disabled;
+   if (status != EFI_SUCCESS)
+   goto out_efi_err;
+
+   size = sizeof(setupmode);
+   status = get_efi_var(efi_SetupMode_name, &efi_variable_guid,
+NULL, &size, &setupmode);
+   if (status != EFI_SUCCESS)
+   goto out_efi_err;
+
+   if (secboot == 0 || setupmode == 1)
+   return efi_secureboot_mode_disabled;
+
+   /*
+* See if a user has put the shim into insecure mode. If so, and if the
+* variable doesn't have the runtime attribute set, we might as well
+* honor that.
+*/
+   size = sizeof(moksbstate);
+   status = get_efi_var(shim_MokSBState_name, &shim_guid,
+&attr, &size, &moksbstate);
+
+   /* If it fails, we don't care why. Default to secure */
+   if (status != EFI_SUCCESS)
+   goto secure_boot_enabled;
+   if (!(attr & EFI_VARIABLE_RUNTIME_ACCESS) && moksbstate == 1)
+   return efi_secureboot_mode_disabled;
+
+secure_boot_enabled:
+   pr_efi(sys_table_arg, "UEFI Secure Boot is enabled.\n");
+   return efi_secureboot_mode_enabled;
+
+out_efi_err:
+   pr_efi_err(sys_table_arg, "Could not determine UEFI Secure Boot status.\n");
+   return efi_secureboot_mode_unknown;
+}
diff --git a/drivers/firmware/efi/libstub/secureboot.c b/drivers/firmware/efi/libstub/secureboot.c
index 959777e..4a6159f 100644
--- a/drivers/firmware/efi/libstub/secureboot.c
+++ b/drivers/firmware/efi/libstub/secureboot.c
@@ -14,73 +14,9 @@
 
 #include "efistub.h"
 
-/* BIOS variables */
-static const efi_guid_t efi_variable_guid = EFI_GLOBAL_VARIABLE_GUID;
-static const efi_char16_t efi_SecureBoot_name[] = {
-   'S', 'e', 'c', 'u', 'r', 'e', 'B', 'o', 'o', 't', 0
-};
-static const efi_char16_t efi_SetupMode_name[] = {
-   'S', 'e', 't', 'u', 'p', 'M', 'o', 'd', 'e', 0
-};
-
-/* SHIM variables */
-static const efi_guid_t shim_guid = EFI_SHIM_LOCK_GUID;
-static efi_char16_t const shim_MokSBState_name[] = {
-   'M', 'o', 'k', 'S', 'B', 'S', 't', 'a', 't', 'e', 0
-};
-
 #define get_efi_var(name, vendor, ...) \
efi_call_runtime(get_variable, \
 (efi_char16_t *)(name), (efi_guid_t *)(vendor), \
 __VA_ARGS__);
 
-/*
- * Determine whether we're in secure boot mode.
- */
-enum efi_secureboot_mode efi_get_secureboot(efi_system_table_t *sys_table_arg)
-{
-   u32 attr;
-   u8 secboot, setupmode, moksbstate;
-   unsigned long size;
-   efi_status_t status;
-
-   size = sizeof(secboot);
-   status = get_efi_var(efi_SecureBoot_name, &efi_variable_guid,
-                        NULL, &size, &secboot);
-   if 

Re: [Xen-devel] [PATCH v6.5 20/26] x86: Protect unaware domains from meddling hyperthreads

2018-01-09 Thread Andrew Cooper
On 04/01/18 09:59, Jan Beulich wrote:
>>>> On 04.01.18 at 01:15,  wrote:
>> Signed-off-by: Andrew Cooper 
> Fundamentally (as before)
> Reviewed-by: Jan Beulich 
> However:
>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -2027,6 +2027,25 @@ int domain_relinquish_resources(struct domain *d)
>>   */
>>  void cpuid_policy_updated(struct vcpu *v)
>>  {
>> +const struct cpuid_policy *cp = v->domain->arch.cpuid;
>> +struct msr_vcpu_policy *vp = v->arch.msr;
>> +
>> +/*
>> + * For guests which know about IBRS but are not told about STIBP running
>> + * on hardware supporting hyperthreading, the guest doesn't know to
>> + * protect itself fully.  (Such a guest won't be permitted direct access
>> + * to the MSR.)  Have Xen fill in the gaps, so an unaware guest can't be
>> + * interfered with by a meddling guest on an adjacent hyperthread.
>> + */
>> +if ( cp->feat.ibrsb )
>> +{
>> +if ( !cp->feat.stibp && cpu_has_stibp &&
>> + !(vp->spec_ctrl.guest & (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) )
>> +vp->spec_ctrl.host = SPEC_CTRL_STIBP;
>> +else
>> +vp->spec_ctrl.host = vp->spec_ctrl.guest;
> This code is so similar to ...
>
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -181,7 +181,20 @@ int guest_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
>>   (cp->feat.stibp ? SPEC_CTRL_STIBP : 0)) )
>>  goto gp_fault; /* Rsvd bit set? */
>>  vp->spec_ctrl.guest = val;
>> -vp->spec_ctrl.host  = val;
>> +
>> +/*
>> + * For guests which are not told about STIBP, running on hardware
>> + * supporting hyperthreading, the guest doesn't know to protect itself
>> + * fully.  (Such a guest won't be permitted direct access to the MSR.)
>> + * When IBRS is not in force, have Xen fill in the gaps, so an unaware
>> + * guest can't be interfered with by a meddling guest on an adjacent
>> + * hyperthread.
>> + */
>> +if ( !cp->feat.stibp && cpu_has_stibp &&
>> + !(val & (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) )
>> +vp->spec_ctrl.host = SPEC_CTRL_STIBP;
>> +else
>> +vp->spec_ctrl.host = val;
> ... this that I think a helper function would be warranted, unless you
> have reasons to believe that future changes might break the
> similarity.

I don't expect them to diverge, and will pull it out into a separate helper.
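
As a rough illustration of the helper discussed above, here is a minimal sketch of how the duplicated STIBP fallback might be factored out. The name spec_ctrl_host_val() and its boolean parameters are assumptions made for this example, standing in for cp->feat.stibp and cpu_has_stibp; it is not necessarily the helper that ended up in the series. The bit positions follow the architectural layout of MSR_SPEC_CTRL (IBRS is bit 0, STIBP is bit 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in MSR_SPEC_CTRL. */
#define SPEC_CTRL_IBRS  (UINT64_C(1) << 0)
#define SPEC_CTRL_STIBP (UINT64_C(1) << 1)

/*
 * Hypothetical helper: compute the value Xen loads into hardware for a
 * given guest-visible value.
 *
 * guest_has_stibp stands in for cp->feat.stibp, hw_has_stibp for
 * cpu_has_stibp.  If the guest has not been told about STIBP but the
 * hardware supports it, and the guest has neither IBRS nor STIBP set,
 * Xen sets STIBP on the guest's behalf.  Otherwise the guest value is
 * used unchanged.
 */
static inline uint64_t spec_ctrl_host_val(bool guest_has_stibp,
                                          bool hw_has_stibp,
                                          uint64_t guest_val)
{
    if ( !guest_has_stibp && hw_has_stibp &&
         !(guest_val & (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) )
        return SPEC_CTRL_STIBP;

    return guest_val;
}

/* Small self-check covering the two interesting cases. */
static inline void spec_ctrl_self_check(void)
{
    /* Unaware guest on STIBP-capable hardware: Xen fills the gap. */
    assert(spec_ctrl_host_val(false, true, 0) == SPEC_CTRL_STIBP);
    /* Guest already uses IBRS: its value is passed through. */
    assert(spec_ctrl_host_val(false, true, SPEC_CTRL_IBRS) == SPEC_CTRL_IBRS);
}
```

Both call sites (cpuid_policy_updated() and guest_wrmsr()) could then assign the result to vp->spec_ctrl.host, which keeps the two copies from drifting apart.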

>
> I'm also a little puzzled by you checking SPEC_CTRL_STIBP there -
> this bit ought to be clear when !cp->feat.stibp due to the earlier
> reserved bit check (part of which is even visible in context above).
> IOW the check is not wrong, but perhaps misleading. You had
> replied to this remark with
>
> "The SPEC_CTRL_STIBP check exists solely because of v3 review which
>  objected to me implying a link between IBRS and STIBP."

The original logic was "!cp->feat.stibp && cpu_has_stibp && val ==
0", which you argued would go stale as new SPEC_CTRL_ bits got added.

~Andrew

Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Roger Pau Monné
On Tue, Jan 09, 2018 at 06:08:53AM -0800, Anthony Liguori wrote:
> On Jan 9, 2018 2:59 AM, "Ian Jackson"  wrote:
> 
> George Dunlap writes ("Re: Radical proposal: ship not-fully-tidied shim as
> 4.10.1"):
> > On 01/09/2018 10:53 AM, Ian Jackson wrote:
> > > And as my other mail suggests, I don't think we should allow this work
> > > to be blocked by outstanding reviews.  IMO we should ship what we
> > > have ASAP.
> >
> > Will "what we have" boot under HVM?
> 
> No, so that does need to be fixed.  We could ship Amazon's series but
> that has no migration and no ballooning.
> 
> 
> Why do we think migration doesn't work?  I haven't tested but I cannot
> imagine why it wouldn't.

You need something like the following commit to make migration work:

http://xenbits.xen.org/gitweb/?p=people/liuw/xen.git;a=commit;h=ded375c74b435e6f03d6dbcaa11257a2568e7740

Roger.

Re: [Xen-devel] Radical proposal: ship not-fully-tidied shim as 4.10.1

2018-01-09 Thread Anthony Liguori
On Jan 9, 2018 2:59 AM, "Ian Jackson"  wrote:

George Dunlap writes ("Re: Radical proposal: ship not-fully-tidied shim as
4.10.1"):
> On 01/09/2018 10:53 AM, Ian Jackson wrote:
> > And as my other mail suggests, I don't think we should allow this work
> > to be blocked by outstanding reviews.  IMO we should ship what we
> > have ASAP.
>
> Will "what we have" boot under HVM?

No, so that does need to be fixed.  We could ship Amazon's series but
that has no migration and no ballooning.


Why do we think migration doesn't work?  I haven't tested but I cannot
imagine why it wouldn't.

Ballooning is trivial.  I can send a V3 this morning with ballooning.

Regards,

Anthony Liguori


I was hoping someone would have an opinion about how hard it would be
to take Amazon's early boot approach and stuff it into Citrix's
more-comprehensive shim series.

Ian.

Re: [Xen-devel] [PATCH v6.5 19/26] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 14:34,  wrote:
> On 09/01/18 13:28, Jan Beulich wrote:
>>>>> On 09.01.18 at 13:03,  wrote:
>>> On 04/01/18 09:52, Jan Beulich wrote:
>>>>> --- a/xen/arch/x86/msr.c
>>>>> +++ b/xen/arch/x86/msr.c
>>>>> @@ -132,7 +132,8 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>>>>>  case MSR_SPEC_CTRL:
>>>>>  if ( !cp->feat.ibrsb )
>>>>>  goto gp_fault;
>>>>> -*val = vp->spec_ctrl.guest;
>>>>> +*val = (vp->spec_ctrl.direct_access
>>>>> +? vp->spec_ctrl.host : vp->spec_ctrl.guest);
>>>>>  break;
>>>> To recap, I had asked whether this is valid ahead of later changes,
>>>> which you replied to saying this won't have any "by not permitting
>>>> the guest any access until patch 25". In which case at the very
>>>> least the patch title is misleading. Yet I don't even agree with what
>>>> you say - patch 25 only fiddles with CPUID bits. Did you perhaps
>>>> mean to say "By not permitting a well behaved guest any access
>>>> until patch 25," as one trying to access the MSRs without consulting
>>>> the CPUID bits would be able to starting with the patch here aiui?
>>> The guest access bit being clear in cpufeatureset.h means that the
>>> maximum featureset calculations for guests will guarantee that
>>> cp->feat.ibrsb is currently false.
>> Well, that was the point of my reply: You mean well behaved
>> guests (ones consulting CPUID), but you don't say so, and - as
>> said - I think ones trying to access the MSRs anyway will observe
>> the accesses to work as of this patch, yet as it seems not fully
>> correctly (until that later patch is in place).
>>
>> As pointed out before - I'm fine with things not working fully right
>> until that later patch, but the situation should be stated clearly.
> 
> But it still functions correctly.  A guest which ignores CPUID will
> still find two MSRs which unconditionally #GP when poked.  The logic to
> allow passthrough is also derived from the cpuid policy, which disallows
> the passthrough until patch 25.

Oh, right - I keep not realizing that the variable named "cp" is
the cpuid policy of the guest.

Jan


Re: [Xen-devel] [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH

2018-01-09 Thread Wei Liu
On Mon, Jan 08, 2018 at 09:42:54AM -0700, Jan Beulich wrote:
> >>> On 08.01.18 at 16:59,  wrote:
> > On Fri, Jan 05, 2018 at 04:39:33AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:05,  wrote:
> >> > --- a/xen/arch/x86/Makefile
> >> > +++ b/xen/arch/x86/Makefile
> >> > @@ -75,6 +75,8 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
> >> >-O $(BASEDIR)/include/xen/compile.h ]; then \
> >> >   echo '$(TARGET).efi'; fi)
> >> >  
> >> > +shim-$(CONFIG_PVH_GUEST) := $(TARGET)-shim
> >> > +
> >> >  ifneq ($(build_id_linker),)
> >> >  notes_phdrs = --notes
> >> >  else
> >> > @@ -93,7 +95,7 @@ endif
> >> >  syms-warn-dup-y := --warn-dup
> >> >  syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
> >> >  
> >> > -$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
> >> > +$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 $(shim-y)
> >> >  ./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
> >> > `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
> >> 
> >> Hmm, so you mean to build shim and "normal" Xen at the same time,
> >> with all the same objects? That's rather unexpected following the
> >> earlier exchange Andrew and I had. I would expect the shim to not
> >> require quite a few bits and pieces, and hence wanting to be built
> >> independently.
> >> 
> > 
> > There is a later patch in this series to link xen under tools/firmware/
> > to build the shim there, which would need build system patch like this.
> > 
> > This can be cleaned up somehow. At the time I wasn't sure how best to
> > proceed (and certainly didn't take part in the discussion between Andrew
> > and you).
> > 
> > Suggestions welcome.
> 
> Well, when I had discussed this with Andrew, my view on the
> outcome was that we'd build either xen-shim or the pair of
> xen.gz and xen.efi in a single build invocation (hence to build
> all three, a second make would be needed, which would seem
> to be at least along the lines of what that later patch is doing).
> 
> The above dependency, otoh, suggests that you want to
> build both xen.gz and xen-shim.

Removing the dependency should be easy. It was added mostly for
convenience of development.

Wei.

Re: [Xen-devel] [PATCH v6.5 19/26] x86/hvm: Permit guests direct access to MSR_{SPEC_CTRL, PRED_CMD}

2018-01-09 Thread Andrew Cooper
On 09/01/18 13:28, Jan Beulich wrote:
>>>> On 09.01.18 at 13:03,  wrote:
>> On 04/01/18 09:52, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/msr.c
>>>> +++ b/xen/arch/x86/msr.c
>>>> @@ -132,7 +132,8 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>>>>  case MSR_SPEC_CTRL:
>>>>  if ( !cp->feat.ibrsb )
>>>>  goto gp_fault;
>>>> -*val = vp->spec_ctrl.guest;
>>>> +*val = (vp->spec_ctrl.direct_access
>>>> +? vp->spec_ctrl.host : vp->spec_ctrl.guest);
>>>>  break;
>>> To recap, I had asked whether this is valid ahead of later changes,
>>> which you replied to saying this won't have any "by not permitting
>>> the guest any access until patch 25". In which case at the very
>>> least the patch title is misleading. Yet I don't even agree with what
>>> you say - patch 25 only fiddles with CPUID bits. Did you perhaps
>>> mean to say "By not permitting a well behaved guest any access
>>> until patch 25," as one trying to access the MSRs without consulting
>>> the CPUID bits would be able to starting with the patch here aiui?
>> The guest access bit being clear in cpufeatureset.h means that the
>> maximum featureset calculations for guests will guarantee that
>> cp->feat.ibrsb is currently false.
> Well, that was the point of my reply: You mean well behaved
> guests (ones consulting CPUID), but you don't say so, and - as
> said - I think ones trying to access the MSRs anyway will observe
> the accesses to work as of this patch, yet as it seems not fully
> correctly (until that later patch is in place).
>
> As pointed out before - I'm fine with things not working fully right
> until that later patch, but the situation should be stated clearly.

But it still functions correctly.  A guest which ignores CPUID will
still find two MSRs which unconditionally #GP when poked.  The logic to
allow passthrough is also derived from the cpuid policy, which disallows
the passthrough until patch 25.

(In fact, before this patch, a guest which poked at the MSRs wouldn't
find a #GP everywhere, because our MSR infrastructure is leaky.)
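
The gating described above can be sketched as follows. This is a simplified, self-contained model rather than the Xen code itself: the *_sketch types and the return codes are hypothetical stand-ins for struct cpuid_policy, struct msr_vcpu_policy and Xen's fault handling, and only the ibrsb check and the direct_access selection from the quoted hunk are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest's CPUID policy ("cp"). */
struct cpuid_policy_sketch {
    bool ibrsb;             /* is IBRS/IBPB advertised to the guest? */
};

/* Hypothetical stand-in for vp->spec_ctrl. */
struct spec_ctrl_sketch {
    uint64_t guest;         /* value the guest last wrote */
    uint64_t host;          /* value Xen actually loads into hardware */
    bool direct_access;     /* is the MSR passed through to the guest? */
};

/* Return codes invented for this sketch. */
enum { RDMSR_OKAY = 0, RDMSR_GP_FAULT = 1 };

/*
 * Model of the MSR_SPEC_CTRL read path.  The guest's CPUID policy is
 * checked first, so a guest that ignores CPUID and pokes the MSR
 * anyway still takes #GP while the feature is hidden from it.
 */
static inline int rdmsr_spec_ctrl_sketch(const struct cpuid_policy_sketch *cp,
                                         const struct spec_ctrl_sketch *sc,
                                         uint64_t *val)
{
    if ( !cp->ibrsb )
        return RDMSR_GP_FAULT;  /* *val is left untouched */

    *val = sc->direct_access ? sc->host : sc->guest;
    return RDMSR_OKAY;
}
```

Since the policy check comes before anything else, hiding the feature bit in the guest's policy is enough to keep the MSR inaccessible, whether or not the guest consults CPUID.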

~Andrew

Re: [Xen-devel] [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io

2018-01-09 Thread Jan Beulich
>>> On 09.01.18 at 12:26,  wrote:
> On Tue, Jan 09, 2018 at 04:03:25AM -0700, Jan Beulich wrote:
>> >>> On 09.01.18 at 10:25,  wrote:
>> > On Mon, Jan 08, 2018 at 06:49:21AM -0700, Jan Beulich wrote:
>> >> >>> On 04.01.18 at 14:06,  wrote:
>> >> > +void __init hypervisor_init_memory(void)
>> >> > +{
>> >> > +uint64_t pfn = 0;
>> >> > +long rc;
>> >> > +
>> >> > +if ( !xen_guest )
>> >> > +return;
>> >> > +
>> >> > +#define SHARE_PARAM(p) ({ \
>> >> > +rc = xen_hypercall_hvm_get_param(p, &pfn); \
>> >> > +if ( rc ) \
>> >> > +panic("Unable to get " #p); \
>> >> > +share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \
>> >> 
>> >> Why dom_io rather than the client domain?
>> > 
>> > The client domain is not yet created at this point. This is exactly
>> > the same that Xen does for the low 1MiB for example.
>> 
>> The low 1Mb is being treated as MMIO, hence remains assigned
>> to dom_io.
>> 
>> >> All the more so since dom_io
>> >> pages can only be mapped by privileged guests (and hence I
>> >> assume you need another tweak somewhere to handle this).
>> > 
>> > I just use unshare_xen_page and share it again with the guest.
>> 
>> And there is no option of simply doing the sharing here later,
>> when the domain is already in existence?
> 
> I'm afraid that if I don't add the pages to dom_io at this point they
> would be added to the free memory pool, and thus might be used for
> anything. Maybe I'm missing something, but I didn't find any other way
> to deal with this given the short time.

The first thing you do is mark these pages as E820 RAM. If that
wasn't done, I don't think they'd end up in the allocator, and
hence could be shared later. I guess there may nevertheless be
a reason to do that early E820 manipulation, but with the patch
having no description I cannot guess what that reason might be
(and hence I can't think of alternatives).

Anyway - in the interest of quick progress I'm fine with this being
left as is for now, as long as revisiting it is being put on a todo
list.

Jan

