Re: [Xen-devel] [RFC] ARM PCI Passthrough design document

2017-05-29 Thread Manish Jaggi

Hi Julien,

On 5/29/2017 11:44 PM, Julien Grall wrote:



On 05/29/2017 03:30 AM, Manish Jaggi wrote:

Hi Julien,


Hello Manish,


On 5/26/2017 10:44 PM, Julien Grall wrote:
PCI pass-through allows the guest to receive full control of 
physical PCI
devices. This means the guest will have full and direct access to 
the PCI

device.

ARM is supporting a kind of guest that exploits as much as possible
virtualization support in hardware. The guest will rely on PV driver 
only

for IO (e.g block, network) and interrupts will come through the
virtualized
interrupt controller, therefore there are no big changes required
within the
kernel.

As a consequence, it would be possible to replace PV drivers by
assigning real
devices to the guest for I/O access. Xen on ARM would therefore be
able to
run unmodified operating system.

To achieve this goal, it looks more sensible to go towards emulating 
the

host bridge (there will be more details later).

IIUC this means that domU would have an emulated host bridge and dom0
will see the actual host bridge?


You don't want the hardware domain and Xen access the configuration 
space at the same time. So if Xen is in charge of the host bridge, 
then an emulated host bridge should be exposed to the hardware.
I believe in x86 case dom0 and Xen do access the config space. In the 
context of pci device add hypercall.

Thats when the pci_config_XXX functions in xen are called.


Although, this is depending on who is in charge of the the host 
bridge. As you may have noticed, this design document is proposing two 
ways to handle configuration space access. At the moment any generic 
host bridge (see the definition in the design document) will be 
handled in Xen and the hardware domain will have an emulated host bridge.


So in case of generic hb, xen will manage the config space and provide a 
emulated I/f to dom0, and accesses would be trapped by Xen.
Essentially the goal is to scan all pci devices and register them with 
Xen (which in turn will configure the smmu).
For a  generic hb, this can be done either in dom0/xen. The only doubt 
here is what extra benefit the emulated hb give in case of dom0.


If your host bridges is not a generic one, then the hardware domain 
will be  in charge of the host bridges, any configuration access from 
Xen will be forward to the hardware domain.


At the moment, as part of the first implementation, we are only 
looking to implement a generic host bridge in Xen. We will decide on 
case by case basis for all the other host bridges whether we want to 
have the driver in Xen.

agreed.


[...]


## IOMMU

The IOMMU will be used to isolate the PCI device when accessing the
memory (e.g
DMA and MSI Doorbells). Often the IOMMU will be configured using a
MasterID
(aka StreamID for ARM SMMU)  that can be deduced from the SBDF with
the help
of the firmware tables (see below).

Whilst in theory, all the memory transactions issued by a PCI device
should
go through the IOMMU, on certain platforms some of the memory
transaction may
not reach the IOMMU because they are interpreted by the host bridge. 
For

instance, this could happen if the MSI doorbell is built into the PCI
host
bridge or for P2P traffic. See [6] for more details.

XXX: I think this could be solved by using direct mapping (e.g GFN ==
MFN),
this would mean the guest memory layout would be similar to the host
one when
PCI devices will be pass-throughed => Detail it.
In the example given in the IORT spec, for pci devices not behind an 
SMMU,

how would the writes from the device be protected.


I realize the XXX paragraph is quite confusing. I am not trying to 
solve the problem where PCI devices are not protected behind an SMMU 
but platform where some transactions (e.g P2P or MSI doorbell access) 
are by-passing the SMMU.


You may still want to allow PCI passthrough in that case, because you 
know that P2P cannot be done (or potentially disabled) and MSI 
doorbell access is protected (for instance a write in the ITS doorbell 
will be tagged with the device by the hardware). In order to support 
such platform you need to direct map the doorbel (e.g GFN == MFN) and 
carve out the P2P region from the guest memory map. Hence the 
suggestion to re-use the host memory layout for the guest.


Note that it does not mean the RAM region will be direct mapped. It is 
only there to ease carving out memory region by-passed by the SMMU.


[...]


## ACPI

### Host bridges

The static table MCFG (see 4.2 in [1]) will describe the host bridges
available
at boot and supporting ECAM. Unfortunately, there are platforms out 
there

(see [2]) that re-use MCFG to describe host bridge that are not fully
ECAM
compatible.

This means that Xen needs to account for possible quirks in the host
bridge.
The Linux community are working on a patch series for this, see [2]
and [3],
where quirks will be detected with:
 * OEM ID
 * OEM Table ID
 * OEM Revision
 * PCI Segment
 * PCI bus number range 

Re: [Xen-devel] [PATCH 2/2] xen/input: add multi-touch support

2017-05-29 Thread Dmitry Torokhov
On Fri, Apr 21, 2017 at 09:40:36AM +0300, Oleksandr Andrushchenko wrote:
> Hi, Dmitry!
> 
> On 04/21/2017 05:10 AM, Dmitry Torokhov wrote:
> >Hi Oleksandr,
> >
> >On Thu, Apr 13, 2017 at 02:38:04PM +0300, Oleksandr Andrushchenko wrote:
> >>From: Oleksandr Andrushchenko 
> >>
> >>Extend xen_kbdfront to provide multi-touch support
> >>to unprivileged domains.
> >>
> >>Signed-off-by: Oleksandr Andrushchenko 
> >>---
> >>  drivers/input/misc/xen-kbdfront.c | 142 
> >> +-
> >>  1 file changed, 140 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/drivers/input/misc/xen-kbdfront.c 
> >>b/drivers/input/misc/xen-kbdfront.c
> >>index 01c27b4c3288..e5d064aaa237 100644
> >>--- a/drivers/input/misc/xen-kbdfront.c
> >>+++ b/drivers/input/misc/xen-kbdfront.c
> >>@@ -17,6 +17,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >>+#include 
> >>  #include 
> >>  #include 
> >>@@ -34,11 +35,14 @@
> >>  struct xenkbd_info {
> >>struct input_dev *kbd;
> >>struct input_dev *ptr;
> >>+   struct input_dev *mtouch;
> >>struct xenkbd_page *page;
> >>int gref;
> >>int irq;
> >>struct xenbus_device *xbdev;
> >>char phys[32];
> >>+   /* current MT slot/contact ID we are injecting events in */
> >>+   int mtouch_cur_contact_id;
> >>  };
> >>  enum { KPARAM_X, KPARAM_Y, KPARAM_CNT };
> >>@@ -47,6 +51,12 @@ module_param_array(ptr_size, int, NULL, 0444);
> >>  MODULE_PARM_DESC(ptr_size,
> >>"Pointing device width, height in pixels (default 800,600)");
> >>+enum { KPARAM_MT_X, KPARAM_MT_Y, KPARAM_MT_CNT };
> >>+static int mtouch_size[KPARAM_MT_CNT] = { XENFB_WIDTH, XENFB_HEIGHT };
> >>+module_param_array(mtouch_size, int, NULL, 0444);
> >>+MODULE_PARM_DESC(ptr_size,
> >>+   "Multi-touch device width, height in pixels (default 800,600)");
> >>+
> >Why do you need separate module parameters for multi-touch device?
> please see below
> >
> >>  static int xenkbd_remove(struct xenbus_device *);
> >>  static int xenkbd_connect_backend(struct xenbus_device *, struct 
> >> xenkbd_info *);
> >>  static void xenkbd_disconnect_backend(struct xenkbd_info *);
> >>@@ -100,6 +110,60 @@ static irqreturn_t input_handler(int rq, void *dev_id)
> >>input_report_rel(dev, REL_WHEEL,
> >> -event->pos.rel_z);
> >>break;
> >>+   case XENKBD_TYPE_MTOUCH:
> >>+   dev = info->mtouch;
> >>+   if (unlikely(!dev))
> >>+   break;
> >>+   if (unlikely(event->mtouch.contact_id !=
> >>+   info->mtouch_cur_contact_id)) {
> >Why is this unlikely? Does contact ID changes once in 1000 packets or
> >even less?
> Mu assumption was that regardless of the fact that we are multi-touch
> device still single touches will come in more frequently
> But I can remove *unlikely* if my assumption is not correct

I think the normal expectation is that "unlikely" is supposed for
something that happens once in a blue moon, so I'd rather remove it.

> >>+   info->mtouch_cur_contact_id =
> >>+   event->mtouch.contact_id;
> >>+   input_mt_slot(dev, event->mtouch.contact_id);
> >>+   }
> >>+   switch (event->mtouch.event_type) {
> >>+   case XENKBD_MT_EV_DOWN:
> >>+   input_mt_report_slot_state(dev, MT_TOOL_FINGER,
> >>+  true);

Should we establish tool event? We have MT_TOOL_PEN, etc.

> >>+   input_event(dev, EV_ABS, ABS_MT_POSITION_X,
> >>+   event->mtouch.u.pos.abs_x);
> >>+   input_event(dev, EV_ABS, ABS_MT_POSITION_Y,
> >>+   event->mtouch.u.pos.abs_y);
> >>+   input_event(dev, EV_ABS, ABS_X,
> >>+   event->mtouch.u.pos.abs_x);
> >>+   input_event(dev, EV_ABS, ABS_Y,
> >>+   event->mtouch.u.pos.abs_y);
> >>+   break;
> >>+   case XENKBD_MT_EV_UP:
> >>+   input_mt_report_slot_state(dev, MT_TOOL_FINGER,
> >>+  false);
> >>+   break;
> >>+   case XENKBD_MT_EV_MOTION:
> >>+   input_event(dev, EV_ABS, ABS_MT_POSITION_X,
> >>+   event->mtouch.u.pos.abs_x);
> >>+   input_event(dev, EV_ABS, ABS_MT_POSITION_Y,
> >>+   event->mtouch.u.pos.abs_y);
> >>+   input_event(dev, EV_ABS, ABS_X,
> >>+   

[Xen-devel] [linux-4.1 test] 109834: regressions - FAIL

2017-05-29 Thread osstest service owner
flight 109834 linux-4.1 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109834/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt-xsm 17 guest-start/debian.repeat fail REGR. vs. 106776
 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
106776
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
106776
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
106776

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeatfail  like 106756
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 106776
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 106776
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 106776
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl   6 xen-boot fail   never pass
 test-arm64-arm64-libvirt-qcow2  6 xen-boot fail never pass
 test-arm64-arm64-libvirt-xsm  6 xen-boot fail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-arm64-arm64-xl-multivcpu  6 xen-boot fail  never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-examine  6 reboot   fail   never pass
 test-arm64-arm64-xl-credit2   6 xen-boot fail   never pass
 test-arm64-arm64-libvirt  6 xen-boot fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-rtds  6 xen-boot fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm   6 xen-boot fail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass

version targeted for testing:
 linux56d847e3ef9433d7ac92376e4ba49d3cf3cb70d2
baseline version:
 linuxd9e0350d2575a20ee7783427da9bd6b6107eb983

Last test of basis   106776  2017-03-19 14:16:43 Z   71 days
Testing same since   109834  2017-05-29 10:21:14 Z0 days1 attempts


People who touched revisions under test:
  "Eric W. Biederman" 
  

[Xen-devel] [linux-next test] 109833: regressions - FAIL

2017-05-29 Thread osstest service owner
flight 109833 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109833/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail REGR. vs. 
109821
 build-armhf-pvops 5 kernel-build fail REGR. vs. 109821

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-rumprun-i386 16 rumprun-demo-xenstorels/xenstorels.repeat fail 
REGR. vs. 109821

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop  fail blocked in 109821
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 109809
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 109821
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass

version targeted for testing:
 linux62d5d7921010983f4fff9912aba8942ad0032f42
baseline version:
 linux249f1efd8e3d58f86469fc305b0eda39db18d7ce

Last test of basis  (not found) 
Failing since   (not found) 
Testing same since   109833  2017-05-29 09:22:00 Z0 days1 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  

[Xen-devel] [ovmf baseline-only test] 71456: tolerable FAIL

2017-05-29 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 71456 ovmf real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/71456/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 build-amd64-libvirt   5 libvirt-buildfail   like 71453
 build-i386-libvirt5 libvirt-buildfail   like 71453

version targeted for testing:
 ovmf b9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a
baseline version:
 ovmf f4d3ba87bb8f5d82d3b80532ea4c83b7bbca41c0

Last test of basis71453  2017-05-28 14:49:57 Z1 days
Testing same since71456  2017-05-29 20:19:57 Z0 days1 attempts


People who touched revisions under test:
  Laszlo Ersek 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


commit b9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a
Author: Laszlo Ersek 
Date:   Thu May 18 14:48:13 2017 +0200

QuarkPlatformPkg/SpiFvbServices: correct NumOfLba vararg type in 
EraseBlocks()

According to the PI spec, Volume 3,
EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.EraseBlocks():

> The variable argument list is a list of tuples. Each tuple describes a
> range of LBAs to erase and consists of the following:
> * An EFI_LBA that indicates the starting LBA
> * A UINTN that indicates the number of blocks to erase

(NB, in edk2, EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL is a typedef to
EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.)

In this driver, the NumOfLba local variable is defined with type UINTN,
but the TYPE argument passed to VA_ARG() is UINT32. Fix the mismatch.

Cc: Jordan Justen 
Cc: Kelly Steele 
Cc: Michael D Kinney 
Reported-by: Jordan Justen 
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek 
Reviewed-by: Jordan Justen 
Reviewed-by: Michael D Kinney 

commit d0d7289cce487eabc2180bb5b065886a37a257bd
Author: Laszlo Ersek 
Date:   Thu May 18 14:48:13 2017 +0200

Nt32Pkg/FvbServicesRuntimeDxe: correct NumOfLba vararg type in EraseBlocks()

According to the PI spec, Volume 3,
EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.EraseBlocks():

> The variable argument list is a list of tuples. Each tuple describes a
> range of LBAs to erase and consists of the following:
> * An EFI_LBA that indicates the starting LBA
> * A UINTN that indicates the number of blocks to erase

(NB, in edk2, EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL is a typedef to
EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.)

In this driver, the NumOfLba local variable is defined with type UINTN,
but the TYPE argument passed to VA_ARG() is UINT32. Fix the mismatch.

Cc: Jordan Justen 
Cc: Ruiyu Ni 
Reported-by: Jordan Justen 
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek 
Reviewed-by: Jordan Justen 
Reviewed-by: Hao Wu 

commit d98e939f4fd72d3f426e0005beba554de5fb9bd0
Author: Laszlo Ersek 
Date:   Thu May 18 14:48:13 2017 +0200

DuetPkg/FvbRuntimeService: correct NumOfLba vararg type in EraseBlocks()

According to the PI spec, Volume 3,
EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL.EraseBlocks():

> The variable argument list is a list of tuples. Each tuple describes a
> range of LBAs to erase and consists of the following:
> * An EFI_LBA that indicates the starting LBA

[Xen-devel] [linux-linus test] 109832: regressions - FAIL

2017-05-29 Thread osstest service owner
flight 109832 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109832/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt  7 host-ping-check-xen  fail REGR. vs. 109656
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail 
REGR. vs. 109656
 test-amd64-amd64-xl 19 guest-start/debian.repeat fail REGR. vs. 109656
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-saverestore.2 fail REGR. vs. 
109656
 test-amd64-amd64-xl-qcow2   18 guest-start/debian.repeat fail REGR. vs. 109656

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-rtds   15 guest-start/debian.repeat fail blocked in 109656
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 109656
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail like 109656
 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail like 109656
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 109656
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 109656
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 109656
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-amd64-i386-xl-qemuu-win10-i386  9 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386  9 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386  9 windows-installfail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  9 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  9 windows-install fail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   

[Xen-devel] [ovmf test] 109835: all pass - PUSHED

2017-05-29 Thread osstest service owner
flight 109835 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109835/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf b9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a
baseline version:
 ovmf f4d3ba87bb8f5d82d3b80532ea4c83b7bbca41c0

Last test of basis   109816  2017-05-28 12:17:40 Z1 days
Testing same since   109835  2017-05-29 12:48:58 Z0 days1 attempts


People who touched revisions under test:
  Laszlo Ersek 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=b9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 
b9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a
+ branch=ovmf
+ revision=b9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable
+ prevxenbranch=xen-4.8-testing
+ '[' xb9036ebee9ddaf26afc9fbe3236c3d03f83c1b0a = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
++ : 

Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression

2017-05-29 Thread Juergen Gross
On 26/05/17 21:01, Stefano Stabellini wrote:
> On Fri, 26 May 2017, Juergen Gross wrote:
>> On 26/05/17 18:19, Ian Jackson wrote:
>>> Juergen Gross writes ("HVM guest performance regression"):
 Looking for the reason of a performance regression of HVM guests under
 Xen 4.7 against 4.5 I found the reason to be commit
 c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl: remove freemem_slack")
 in Xen 4.6.

 The problem occurred when dom0 had to be ballooned down when starting
 the guest. The performance of some micro benchmarks dropped by about
 a factor of 2 with above commit.

 Interesting point is that the performance of the guest will depend on
 the amount of free memory being available at guest creation time.
 When there was barely enough memory available for starting the guest
 the performance will remain low even if memory is being freed later.

 I'd like to suggest we either revert the commit or have some other
 mechanism to try to have some reserve free memory when starting a
 domain.
>>>
>>> Oh, dear.  The memory accounting swamp again.  Clearly we are not
>>> going to drain that swamp now, but I don't like regressions.
>>>
>>> I am not opposed to reverting that commit.  I was a bit iffy about it
>>> at the time; and according to the removal commit message, it was
>>> basically removed because it was a piece of cargo cult for which we
>>> had no justification in any of our records.
>>>
>>> Indeed I think fixing this is a candidate for 4.9.
>>>
>>> Do you know the mechanism by which the freemem slack helps ?  I think
>>> that would be a prerequisite for reverting this.  That way we can have
>>> an understanding of why we are doing things, rather than just
>>> flailing at random...
>>
>> I wish I would understand it.
>>
>> One candidate would be 2M/1G pages being possible with enough free
>> memory, but I haven't proofed this yet. I can have a try by disabling
>> big pages in the hypervisor.
> 
> Right, if I had to bet, I would put my money on superpages shattering
> being the cause of the problem.

Creating the domains with

xl -vvv create ...

showed the numbers of superpages and normal pages allocated for the
domain.

The following allocation pattern resulted in a slow domain:

xc: detail: PHYSICAL MEMORY ALLOCATION:
xc: detail:   4KB PAGES: 0x0600
xc: detail:   2MB PAGES: 0x03f9
xc: detail:   1GB PAGES: 0x

And this one was fast:

xc: detail: PHYSICAL MEMORY ALLOCATION:
xc: detail:   4KB PAGES: 0x0400
xc: detail:   2MB PAGES: 0x03fa
xc: detail:   1GB PAGES: 0x

I ballooned dom0 down in small steps to be able to create those
test cases.

I believe the main reason is that some data needed by the benchmark
is located near the end of domain memory resulting in a rather high
TLB miss rate in case of not all (or nearly all) memory available in
form of 2MB pages.

>> What makes the whole problem even more mysterious is that the
>> regression was detected first with SLE12 SP3 (guest and dom0, Xen 4.9
>> and Linux 4.4) against older systems (guest and dom0). While trying
>> to find out whether the guest or the Xen version are the culprit I
>> found that the old guest (based on kernel 3.12) showed the mentioned
>> performance drop with above commit. The new guest (based on kernel
>> 4.4) shows the same bad performance regardless of the Xen version or
>> amount of free memory. I haven't found the Linux kernel commit yet
>> being responsible for that performance drop.

And this might be result of a different memory usage of more recent
kernels: I suspect the critical data is now at the very end of the
domain's memory. As there are always some pages allocated in 4kB
chunks the last pages of the domain will never be part of a 2MB page.

Looking at meminit_hvm() in libxc doing the allocation of the memory
I realized it is kind of sub-optimal: shouldn't it try to allocate
the largest pages first and the smaller pages later?

Would it be possible to make memory holes larger sometimes to avoid
having to use 4kB pages (with the exception of the first 2MB of the
domain, of course)?

Maybe it would even make sense to be able to tweak the allocation
pattern depending on the guest type: preferring large pages either
at the top or at the bottom of the domain's physical address space.

And what should be done with the "freemem_slack" patch? With my
findings I don't think we can define a fixed percentage of the memory
which should be free. I could imagine some kind of mechanism using
dom0 ballooning more dynamically: As long as enough memory is unused
in dom0 balloon it down in case of an allocation failure of a large
page (1GB or 2MB). After all memory for the new domain has been
allocated balloon dom0 up again (but not more than before starting
creation of the new domain, of course).

Thoughts?


Juergen

___
Xen-devel mailing list

Re: [Xen-devel] [PATCH 03/12 v3] xen/arm: vpl011: Add pl011 uart emulation in Xen

2017-05-29 Thread Julien Grall



On 05/29/2017 08:13 AM, Bhupinder Thakur wrote:

Hi Julien,


Hi Bhupinder,


On 26 May 2017 at 19:12, Bhupinder Thakur  wrote:

+#ifndef _VPL011_H_
+
+#define _VPL011_H_
+
+#include 
+#include 
+
+DEFINE_XEN_FLEX_RING(vpl011);



I am sure someone already said it in a previous version. The vpl011 is the
console ring. So why are we defining our own internally?


This macro only defines standard functions to operate on the console
ring. Stefano suggested to use the standard functions to operate on
the ring buffer.


I don't want things to be mixed up like that, this is a call to trouble 
later on if someone decide to update console.h.


If you need to introduce standard functions, they should be defined in 
console.h and not vpl011.h.





At least this should have been used by xenconsole, but this is not the
case... So we should really avoid defining our own ring here and re-use
public/io/console.h.


I am using the console ring definition as defined in
xen/include/public/io/console.h.


See above.

Cheers,

--
Julien Grall


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC] ARM PCI Passthrough design document

2017-05-29 Thread Julien Grall



On 05/29/2017 03:30 AM, Manish Jaggi wrote:

Hi Julien,


Hello Manish,


On 5/26/2017 10:44 PM, Julien Grall wrote:

PCI pass-through allows the guest to receive full control of physical PCI
devices. This means the guest will have full and direct access to the PCI
device.

ARM is supporting a kind of guest that exploits as much as possible
virtualization support in hardware. The guest will rely on PV driver only
for IO (e.g block, network) and interrupts will come through the
virtualized
interrupt controller, therefore there are no big changes required
within the
kernel.

As a consequence, it would be possible to replace PV drivers by
assigning real
devices to the guest for I/O access. Xen on ARM would therefore be
able to
run unmodified operating system.

To achieve this goal, it looks more sensible to go towards emulating the
host bridge (there will be more details later).

IIUC this means that domU would have an emulated host bridge and dom0
will see the actual host bridge?


You don't want the hardware domain and Xen access the configuration 
space at the same time. So if Xen is in charge of the host bridge, then 
an emulated host bridge should be exposed to the hardware.


Although, this is depending on who is in charge of the the host bridge. 
As you may have noticed, this design document is proposing two ways to 
handle configuration space access. At the moment any generic host bridge 
(see the definition in the design document) will be handled in Xen and 
the hardware domain will have an emulated host bridge.


If your host bridges is not a generic one, then the hardware domain will 
be  in charge of the host bridges, any configuration access from Xen 
will be forward to the hardware domain.


At the moment, as part of the first implementation, we are only looking 
to implement a generic host bridge in Xen. We will decide on case by 
case basis for all the other host bridges whether we want to have the 
driver in Xen.


[...]


## IOMMU

The IOMMU will be used to isolate the PCI device when accessing the
memory (e.g
DMA and MSI Doorbells). Often the IOMMU will be configured using a
MasterID
(aka StreamID for ARM SMMU)  that can be deduced from the SBDF with
the help
of the firmware tables (see below).

Whilst in theory, all the memory transactions issued by a PCI device
should
go through the IOMMU, on certain platforms some of the memory
transaction may
not reach the IOMMU because they are interpreted by the host bridge. For
instance, this could happen if the MSI doorbell is built into the PCI
host
bridge or for P2P traffic. See [6] for more details.

XXX: I think this could be solved by using direct mapping (e.g GFN ==
MFN),
this would mean the guest memory layout would be similar to the host
one when
PCI devices will be pass-throughed => Detail it.

In the example given in the IORT spec, for pci devices not behind an SMMU,
how would the writes from the device be protected.


I realize the XXX paragraph is quite confusing. I am not trying to solve 
the problem where PCI devices are not protected behind an SMMU but 
platform where some transactions (e.g P2P or MSI doorbell access) are 
by-passing the SMMU.


You may still want to allow PCI passthrough in that case, because you 
know that P2P cannot be done (or potentially disabled) and MSI doorbell 
access is protected (for instance a write in the ITS doorbell will be 
tagged with the device by the hardware). In order to support such 
platform you need to direct map the doorbel (e.g GFN == MFN) and carve 
out the P2P region from the guest memory map. Hence the suggestion to 
re-use the host memory layout for the guest.


Note that it does not mean the RAM region will be direct mapped. It is 
only there to ease carving out memory region by-passed by the SMMU.


[...]


## ACPI

### Host bridges

The static table MCFG (see 4.2 in [1]) will describe the host bridges
available
at boot and supporting ECAM. Unfortunately, there are platforms out there
(see [2]) that re-use MCFG to describe host bridge that are not fully
ECAM
compatible.

This means that Xen needs to account for possible quirks in the host
bridge.
The Linux community are working on a patch series for this, see [2]
and [3],
where quirks will be detected with:
 * OEM ID
 * OEM Table ID
 * OEM Revision
 * PCI Segment
 * PCI bus number range (wildcard allowed)

Based on what Linux is currently doing, there are two kind of quirks:
 * Accesses to the configuration space of certain sizes are not
allowed
 * A specific driver is necessary for driving the host bridge

The former is straightforward to solve but the latter will require
more thought.
Instantiation of a specific driver for the host controller can be
easily done
if Xen has the information to detect it.

So Xen would parse the MCFG to find a hb, then map the config space in
dom0 stage2 ?
and then provide the same MCFG to dom0?


This is implementation details. I have been really careful so far to 
leave 

[Xen-devel] [PATCH 1/1] xl man page cleanup and fixes

2017-05-29 Thread Armando Vega
---
 docs/man/xl.pod.1.in | 362 +--
 1 file changed, 179 insertions(+), 183 deletions(-)

diff --git a/docs/man/xl.pod.1.in b/docs/man/xl.pod.1.in
index 78bf884af2..326c5fef5b 100644
--- a/docs/man/xl.pod.1.in
+++ b/docs/man/xl.pod.1.in
@@ -1,6 +1,6 @@
 =head1 NAME
 
-XL - Xen management tool, based on LibXenlight
+xl - Xen management tool, based on LibXenlight
 
 =head1 SYNOPSIS
 
@@ -82,7 +82,7 @@ though it is unsafe.
 
 =item B<-t>
 
-Always use carriage-return-based overwriting for printing progress
+Always use carriage-return-based overwriting for displaying progress
 messages without scrolling the screen.  Without -t, this is done only
 if stderr is a tty.
 
@@ -97,17 +97,17 @@ previously, most commands take I as the first 
parameter.
 
 =item B I I
 
-I in preference>
+I instead.>
 
-Indicate an ACPI button press to the domain. I is may be 'power' or
+Indicate an ACPI button press to the domain, where I can be 'power' or
 'sleep'. This command is only available for HVM domains.
 
 =item B [I] [I]
 
-The create subcommand takes a config file as first argument: see
-L for full details of that file format and possible options.
-If I is missing B creates the domain starting from the
-default value for every option.
+The create subcommand takes a config file as its first argument: see
+L for full details of the file format and possible options.
+If I is missing B creates the domain assuming the default 
+values for every option.
 
 I has to be an absolute path to a file.
 
@@ -144,7 +144,7 @@ Attach to domain's VNC server, forking a vncviewer process.
 
 =item B<-A>, B<--vncviewer-autopass>
 
-Pass VNC password to vncviewer via stdin.
+Pass the VNC password to vncviewer via stdin.
 
 =item B<-c>
 
@@ -187,7 +187,7 @@ cpus 0-3, and passes through two PCI devices.
 
 =back
 
-=item B B [I] [I]
+=item B I [I] [I]
 
 Update the saved configuration for a running domain. This has no
 immediate effect but will be applied when the guest is next
@@ -195,7 +195,7 @@ restarted. This command is useful to ensure that runtime 
modifications
 made to the guest will be preserved when the guest is restarted.
 
 Since Xen 4.5 xl has improved capabilities to handle dynamic domain
-configuration changes and will preserve any changes made a runtime
+configuration changes and will preserve any changes made at runtime
 when necessary. Therefore it should not normally be necessary to use
 this command any more.
 
@@ -221,11 +221,11 @@ 

[Xen-devel] [PATCH 0/1] xl man page cleanup and fixes proposal

2017-05-29 Thread Armando Vega
Hey everyone!
As we've been doing some crazy stuff with cpu masking on our systems we turned
to the documentation a lot and I noticed that it could use some improvements.
I've cleaned xl(1) up a bit to make it all adhere to one style, fixed all the 
typos I found, fixed some incorrect information that was obviously copied but 
never edited to reflect the true nature of the other command, and hopefully 
made some things that seemed really strangely worded more clear.

If you accept this patch I would gladly donate my time to fixing the rest of 
the man pages, as I did notice the same problems in some of them.

I have a personal fork of this repo on https://github.com/twizted/xen that I 
can use to get all the commits in one place in the future, if that would be 
preffered.

Hopefully you find this helpful.

If needed please contact me directly on my company mail as I'm not following 
the devel mailing list.

kind regards,
Armando Vega

Armando Vega (1):
  xl man page cleanup and fixes

 docs/man/xl.pod.1.in | 362 +--
 1 file changed, 179 insertions(+), 183 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.9] docs: remove PVHv1 document

2017-05-29 Thread Roger Pau Monne
The current misc/pvh.markdown document refers to PVHv1, remove it to
avoid confusion with PVHv2 since the PVHv1 code has already been
removed.

Signed-off-by: Roger Pau Monné 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Julien Grall 
---
 docs/misc/pvh.markdown | 377 -
 1 file changed, 377 deletions(-)
 delete mode 100644 docs/misc/pvh.markdown

diff --git a/docs/misc/pvh.markdown b/docs/misc/pvh.markdown
deleted file mode 100644
index 52d8e743e7..00
--- a/docs/misc/pvh.markdown
+++ /dev/null
@@ -1,377 +0,0 @@
-# PVH Specification #
-
-## Rationale ##
-
-PVH is a new kind of guest that has been introduced on Xen 4.4 as a DomU, and
-on Xen 4.5 as a Dom0. The aim of PVH is to make use of the hardware
-virtualization extensions present in modern x86 CPUs in order to
-improve performance.
-
-PVH is considered a mix between PV and HVM, and can be seen as a PV guest
-that runs inside of an HVM container, or as a PVHVM guest without any emulated
-devices. The design goal of PVH is to provide the best performance possible and
-to reduce the amount of modifications needed for a guest OS to run in this mode
-(compared to pure PV).
-
-This document tries to describe the interfaces used by PVH guests, focusing
-on how an OS should make use of them in order to support PVH.
-
-## Early boot ##
-
-PVH guests use the PV boot mechanism, that means that the kernel is loaded and
-directly launched by Xen (by jumping into the entry point). In order to do this
-Xen ELF Notes need to be added to the guest kernel, so that they contain the
-information needed by Xen. Here is an example of the ELF Notes added to the
-FreeBSD amd64 kernel in order to boot as PVH:
-
-ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,   .asciz, "FreeBSD")
-ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION,  .asciz, 
__XSTRING(__FreeBSD_version))
-ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,.asciz, "xen-3.0")
-ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE,  .quad,  KERNBASE)
-ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET,   .quad,  KERNBASE)
-ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,  .quad,  xen_start)
-ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad,  hypercall_page)
-ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW,   .quad,  HYPERVISOR_VIRT_START)
-ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,   .asciz, 
"writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
-ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,   .asciz, "yes")
-ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,   .long,  PG_V, PG_V)
-ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
-ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long,  0)
-ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes")
-
-On the Linux side, the above can be found in `arch/x86/xen/xen-head.S`.
-
-It is important to highlight the following notes:
-
-  * `XEN_ELFNOTE_ENTRY`: contains the virtual memory address of the kernel 
entry
-point.
-  * `XEN_ELFNOTE_HYPERCALL_PAGE`: contains the virtual memory address of the
-hypercal page inside of the guest kernel (this memory region will be filled
-by Xen prior to booting).
-  * `XEN_ELFNOTE_FEATURES`: contains the list of features supported by the 
kernel.
-In the example above the kernel is only able to boot as a PVH guest, but
-those options can be mixed with the ones used by pure PV guests in order to
-have a kernel that supports both PV and PVH (like Linux). The list of
-options available can be found in the `features.h` public header. Note that
-in the example above `hvm_callback_vector` is in `XEN_ELFNOTE_FEATURES`.
-Older hypervisors will balk at this being part of it, so it can also be put
-in `XEN_ELFNOTE_SUPPORTED_FEATURES` which older hypervisors will ignore.
-
-Xen will jump into the kernel entry point defined in `XEN_ELFNOTE_ENTRY` with
-paging enabled (either long mode or protected mode with paging turned on
-depending on the kernel bitness) and some basic page tables setup. An important
-distinction for a 64bit PVH is that it is launched at privilege level 0 as
-opposed to a 64bit PV guest which is launched at privilege level 3.
-
-Also, the `rsi` (`esi` on 32bits) register is going to contain the virtual
-memory address where Xen has placed the `start_info` structure. The `rsp` 
(`esp`
-on 32bits) will point to the top of an initial single page stack, that can be
-used by the guest kernel. The `start_info` structure contains all the info the
-guest needs in order to initialize. More information about the contents can be
-found in the `xen.h` public header.
-
-### Initial amd64 control registers 

Re: [Xen-devel] Guidance for PVHv2 usage?

2017-05-29 Thread PGNet Dev
On 5/29/17 10:03 AM, Roger Pau Monné wrote:
>> IIUC, Xen 4.9 + kernel-default > v4.11 should now fully support PVHv2?
> 
> For DomUs yes, for Dom0 not yet.

thx

> Forget about this document, I should have removed it as part of the
> PVHv1 code removal. In any case, those where PVHv1 internal details,
> that are to no use to the end user.

noted

>> Is that the case already?  Or is there a transition state in effect, where 
>> both are available?
> 
> Almost, Xen is still currently transitioning from PVHv1 to PVHv2.
> PVHv1 is not available anymore, and PVHv2 is still experimental (and
> not finished in the Dom0 case).

Aha.  The still-experimental state wasn't immediately clear from the Phoronix 
article's "... Lands ..." title.

>>  GRUB_CMDLINE_XEN="dom0=hvm dom0_mem=4G,max:4G dom0_max_vcpus=4 ..."
>>
> 
> I guess this is in the Dom0 grub config file?

Yes / Sry, should've been clearer on that.

> In any case, please
> hold off before trying Dom0, it's still not in a working state.

Does "hold off" -- in this fuzzy interim -- simply mean -- do NOT specify any 
dom0=XXX?  Or to specify dom0=pvh, or somesuch?

>> and in Guest.cfg
>>
>>  builder = 'linux'
>>  kernel = '/usr/lib/grub2/x86_64-xen/grub.xen'
>>  xen_platform_pci = 1
>>  device_model_version = 'qemu-xen'
>>  ...
> 
> I'm not really sure what this guest is, PV I assume?

Heh.  Well, that's a good question.

It WAS hvm, and now it's TRYING to be PVHv2.  It's booted atm as SOMETHING ... 
I suppose it's 'still' PVH.

That^ config IS what I *used* to use for PVH on Opensuse ...

> builder = 'linux' is not a valid option [0].

It is -- or at least, WAS -- for Opensuse's Xen:

   https://lists.opensuse.org/opensuse-virtual/2016-01/msg4.html

Also referenced here:

   https://patchwork.kernel.org/patch/9368403/


I've been running only HVM guests for awhile, so I am NOT certain that hasn't 
changed ...

In any case, the Guest *IS* launching with builder=linux, with no complaint at 
the moment.

Unclear, then, what that means if it is, in fact, NOT a valid option.

> Also grub.xen (which I imagine is pvgrub)

It's from grub2

 rpm -q --whatprovides /usr/lib/grub2/x86_64-xen/grub.xen
  grub2-x86_64-xen-2.02-3.3.x86_64

which I thought makes it pvgrub2?  Not that it matters since, apparently

> is not going to work for PVHv2,


> you will have to
> direct boot into the kernel you wish to use. For example a suitable
> config file for PVHv2 would be:
> 
> builder='hvm'
> device_model_version='none'
> 
> kernel='/path/to/kernel'
> ramdisk='/path/to/initrd'


so, just to be clear, for PVHv2, we'll use =HVM, not PVH (=generic), builder? 
Even given that PVHv1 is not available anymore, and it _sounds_ like PVHv2 will 
"replace" PVH?

> Note that this is a temporary interface, this is subject to change.

sure.  hence, 'testing' ...

> Since the interface is not yet finished, no, there are no current docs
> available, sorry.

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Guidance for PVHv2 usage?

2017-05-29 Thread Roger Pau Monné
On Mon, May 29, 2017 at 08:58:04AM -0700, PGNet Dev wrote:
> Starting with
> 
>   Xen Changes For Linux 4.11: Lands PVHv2 Guest Support
>   
> https://www.phoronix.com/scan.php?page=news_item=Linux-4.11-Xen-Changes
> 
>   [GIT PULL] xen: features and fixes for 4.11-rc0
>   http://lkml.iu.edu/hypermail/linux/kernel/1702.2/03209.html
> 
> IIUC, Xen 4.9 + kernel-default > v4.11 should now fully support PVHv2?

For DomUs yes, for Dom0 not yet.

> I'm running Xen 4.9.0 host on kernel 4.11.3
> 
>   dmesg | egrep -i "linux version|xen version"
>   [0.00] Linux version 4.11.3-2.g7262353-default 
> (geeko@buildhost) (gcc version 7.1.1 20170517 [gcc-7-branch revision 248152] 
> (SUSE Linux) ) #1 SMP PREEMPT Thu May 25 17:55:04 UTC 2017 (7262353)
>   [0.00] Xen version: 4.9.0_07-503 (preserve-AD)
> 
> , both installed from pkgs built for an Opensuse Leap 42.2 env.
> 
> The docs @
> 
>   https://xenbits.xen.org/docs/unstable/misc/hvmlite.html
>   
> https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Support-for-Paravirtualisation-of-HVM-Guests
> 
> are not clear on usage.
> 
> This
> 
>   https://xenbits.xen.org/docs/unstable/misc/pvh.html
> 
> suggests that for
> 
>   PVH 
>   ...
>   ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
>   ...

Forget about this document, I should have removed it as part of the
PVHv1 code removal. In any case, those where PVHv1 internal details,
that are to no use to the end user.

> the .cfg spec'n should be
> 
>   builder='generic'
> 
> but, there's no mention of which value for specific PVHv2 usage.
> 
> Poking around, I've found discussion that PVH may be eventually retired, and 
> replaced with PVHv2, with same usage.
> 
> Is that the case already?  Or is there a transition state in effect, where 
> both are available?

Almost, Xen is still currently transitioning from PVHv1 to PVHv2.
PVHv1 is not available anymore, and PVHv2 is still experimental (and
not finished in the Dom0 case).

> Atm, I do have a 'converted' HVM guest booting into _some- form of PVH for 
> testing.
> 
> My current setup -- cobbled together from old PVH notes and some older posts, 
> contains, in Grub
> 
>   GRUB_CMDLINE_XEN="dom0=hvm dom0_mem=4G,max:4G dom0_max_vcpus=4 ..."
> 

I guess this is in the Dom0 grub config file? In any case, please
hold off before trying Dom0, it's still not in a working state.

> and in Guest.cfg
> 
>   builder = 'linux'
>   kernel = '/usr/lib/grub2/x86_64-xen/grub.xen'
>   xen_platform_pci = 1
>   device_model_version = 'qemu-xen'
>   ...

I'm not really sure what this guest is, PV I assume?

builder = 'linux' is not a valid option [0]. Also grub.xen (which I
imagine is pvgrub) is not going to work for PVHv2, you will have to
direct boot into the kernel you wish to use. For example a suitable
config file for PVHv2 would be:

builder='hvm'
device_model_version='none'

kernel='/path/to/kernel'
ramdisk='/path/to/initrd'

Note that this is a temporary interface, this is subject to change.

> 
> With that^, the guest *IS* up & , as far as I've tested, functional.
> 
> At current host & guest loglevels, I've not yet seen any errors  -- I'm just 
> not yet clear whether it's successfully, or completely & correctly, PVHv2 ...
> 
> Bottom line -- 
> 
> What *IS* the config/usage for PVHv2?  (Any current docs available?)

Since the interface is not yet finished, no, there are no current docs
available, sorry.

Roger.

[0] http://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Selecting-Guest-Type

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 22/22] x86: clean up traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:10,  wrote:
> Replace bool_t with bool. Delete trailing white spaces. Fix some coding
> style issues.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 

Acked-by: Jan Beulich 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 19/22] x86: clean up pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:10,  wrote:
> Fix coding style issues. No functional change.

Same general comment as on the earlier cleanup patch.

> @@ -318,17 +325,19 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, 
> unsigned int trap_nr)
>  return -EBUSY;
>  
>  /* We are called by the machine check (exception or polling) handlers
> - * on the physical CPU that reported a machine check error. */
> + * on the physical CPU that reported a machine check error.
> + */

Please correct the comment as a whole, not just one of its ends.

> @@ -448,27 +458,31 @@ static long register_guest_callback(struct 
> callback_register *reg)
>  switch ( reg->type )
>  {
>  case CALLBACKTYPE_event:
> -v->arch.pv_vcpu.event_callback_eip= reg->address;
> +v->arch.pv_vcpu.event_callback_eip = reg->address;
>  break;
>  
>  case CALLBACKTYPE_failsafe:
>  v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
> +
>  if ( reg->flags & CALLBACKF_mask_events )
>  set_bit(_VGCF_failsafe_disables_events,
>  >arch.vgc_flags);
>  else
>  clear_bit(_VGCF_failsafe_disables_events,
>>arch.vgc_flags);
> +
>  break;
>  
>  case CALLBACKTYPE_syscall:
>  v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
> +
>  if ( reg->flags & CALLBACKF_mask_events )
>  set_bit(_VGCF_syscall_disables_events,
>  >arch.vgc_flags);
>  else
>  clear_bit(_VGCF_syscall_disables_events,
>>arch.vgc_flags);
> +
>  break;

Some of the earlier additions of blank lines were already
questionable imo, but especially the ones you add here before the
break statements go to far afaic.

> @@ -674,13 +688,16 @@ void compat_show_guest_stack(struct vcpu *v, const 
> struct cpu_user_regs *regs,
>  printk(" %08x", addr);
>  stack++;
>  }
> +
>  if ( mask == PAGE_SIZE )
>  {
>  BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
>  unmap_domain_page(stack);
>  }
> +
>  if ( i == 0 )
>  printk("Stack empty.");
> +
>  printk("\n");
>  }

Same for at least the last one here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:10,  wrote:
> Export hypercall_page_initialise_ring1_kernel as the code is moved.

This one again wants to go into hypercall.c. compat_iret() wants to go
next to do_iret() (wherever that ends up), and similarly I think other
functions would better go next to their non-compat counterparts
instead of creating a compat section in pv/traps.c.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> And export it via pv/domain.h.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 
> ---
>  xen/arch/x86/pv/traps.c | 36 

I'm pretty certain hypercall.c would be a better fit here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 16/22] x86/traps: move callback_op code

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> No functional change.
> 
> Signed-off-by: Wei Liu 
> ---
>  xen/arch/x86/pv/traps.c | 148 +++

callback.c ?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> No functional change.
> 
> Signed-off-by: Wei Liu 

Acked-by: Jan Beulich 

Albeit putting it into its own file (iret.c) would also be an option. Not
sure how fine grained we want things to be.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> No functional change.
> 
> Signed-off-by: Wei Liu 

Acked-by: Jan Beulich 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> Move from x86_64/traps.c to pv/traps.c.

This again doesn't necessarily fit into traps.c (but it's an option).
Perhaps we want some misc.c or util.c?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] Guidance for PVHv2 usage?

2017-05-29 Thread PGNet Dev
Starting with

Xen Changes For Linux 4.11: Lands PVHv2 Guest Support

https://www.phoronix.com/scan.php?page=news_item=Linux-4.11-Xen-Changes

[GIT PULL] xen: features and fixes for 4.11-rc0
http://lkml.iu.edu/hypermail/linux/kernel/1702.2/03209.html

IIUC, Xen 4.9 + kernel-default > v4.11 should now fully support PVHv2?

I'm running Xen 4.9.0 host on kernel 4.11.3

dmesg | egrep -i "linux version|xen version"
[0.00] Linux version 4.11.3-2.g7262353-default 
(geeko@buildhost) (gcc version 7.1.1 20170517 [gcc-7-branch revision 248152] 
(SUSE Linux) ) #1 SMP PREEMPT Thu May 25 17:55:04 UTC 2017 (7262353)
[0.00] Xen version: 4.9.0_07-503 (preserve-AD)

, both installed from pkgs built for an Opensuse Leap 42.2 env.

The docs @

https://xenbits.xen.org/docs/unstable/misc/hvmlite.html

https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html#Support-for-Paravirtualisation-of-HVM-Guests

are not clear on usage.

This

https://xenbits.xen.org/docs/unstable/misc/pvh.html

suggests that for

PVH 
...
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
...

the .cfg spec'n should be

builder='generic'

but, there's no mention of which value for specific PVHv2 usage.

Poking around, I've found discussion that PVH may be eventually retired, and 
replaced with PVHv2, with same usage.

Is that the case already?  Or is there a transition state in effect, where both 
are available?

Atm, I do have a 'converted' HVM guest booting into _some- form of PVH for 
testing.

My current setup -- cobbled together from old PVH notes and some older posts, 
contains, in Grub

GRUB_CMDLINE_XEN="dom0=hvm dom0_mem=4G,max:4G dom0_max_vcpus=4 ..."
  

and in Guest.cfg

builder = 'linux'
kernel = '/usr/lib/grub2/x86_64-xen/grub.xen'
xen_platform_pci = 1
device_model_version = 'qemu-xen'
...

With that^, the guest *IS* up & , as far as I've tested, functional.

At current host & guest loglevels, I've not yet seen any errors  -- I'm just 
not yet clear whether it's successfully, or completely & correctly, PVHv2 ...

Bottom line -- 

What *IS* the config/usage for PVHv2?  (Any current docs available?)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 12/22] x86/traps: move send_guest_trap to pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:

As said on patch 10(?), this shouldn't be moved alone. And whether
we want to move it in the first place depends on what the PVH
plans here are.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> --- a/xen/arch/x86/pv/traps.c
> +++ b/xen/arch/x86/pv/traps.c
> @@ -264,6 +264,24 @@ long unregister_guest_nmi_callback(void)
>  return 0;
>  }
>  
> +int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,

bool and all pointers (including the local variables) can be const
afaict (albeit I question the value of both of the local variables,
as each is being used just once). And let's please avoid uint16_t
here when it doesn't really need to be other than unsigned int.

> +unsigned int trap_nr)
> +{
> +struct vcpu *v;
> +struct trap_info *t;
> +
> +BUG_ON(d == NULL);
> +BUG_ON(vcpuid >= d->max_vcpus);
> +
> +/* Sanity check - XXX should be more fine grained. */
> +BUG_ON(trap_nr >= NR_VECTORS);
> +
> +v = d->vcpu[vcpuid];
> +t = >arch.pv_vcpu.trap_ctxt[trap_nr];
> +
> +return (t->address != 0);

With the return type being bool, the != 0 (and the already
pointless parentheses) could then be dropped too.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 10/22] x86/traps: delcare percpu softirq_trap

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> It needs to be non-static when we split PV specific code out.

I don't follow here: The variable is used by send_guest_trap() to
communicate with its helper nmi_mce_softirq(). How would you
move one without the other? The main cleanup potential I see
here would be for the struct softirq_trap to be moved from the
header to the C file (as it's used in only one of them).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> +/*
> + * Called from asm to set up the MCE trapbounce info.
> + * Returns 0 if no callback is set up, else 1.
> + */

The comments suggest both of them want to actually return bool.

> +int set_guest_machinecheck_trapbounce(void)
> +{
> +struct vcpu *v = current;

Again "curr" please if at all possible.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 07/22] x86/traps: move pv_inject_event to pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:

> @@ -128,6 +132,75 @@ unsigned long do_get_debugreg(int reg)
>  return -EINVAL;
>  }
>  
> +void pv_inject_event(const struct x86_event *event)
> +{
> +struct vcpu *v = current;

Could you take the opportunity and name this curr?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> The following handlers are moved:
> 1. do_set_trap_table

This one makes sense to move to pv/traps.c, but ...

> 2. do_set_debugreg
> 3. do_get_debugreg
> 4. do_fpu_taskswitch

... none of these do. I could see them go into pv/hypercall.c,
but I could also see that file dealing intentionally only with
everything hypercall related except individual handlers. Andrew,
do you have any opinion or thoughts here?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 05/22] x86/pv: clean up emulate.c

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> Fix coding style issues. Replace bool_t with bool. Add spaces around
> binary ops.

For these it would probably be fine to do them while moving the code.
But you did the extra work to put this into a separate patch, so I'm
not going to object to this approach.

> @@ -209,43 +209,45 @@ static int guest_io_okay(
>  /* fallthrough */
>  case 0:  break;
>  }
> -TOGGLE_MODE();
>  
> -if ( (x.mask & (((1< -return 1;
> +if ( user_mode )
> +toggle_guest_mode(v);
> +
> +if ( (x.mask & (((1 << bytes)-1) << (port & 7))) == 0 )

You've caught the << and & here, but missed the - .

> @@ -369,7 +372,7 @@ static unsigned int check_guest_io_breakpoint(struct vcpu 
> *v,
>  }
>  
>  if ( (start < (port + len)) && ((start + width) > port) )
> -match |= 1 << i;
> +match |= 1u << i;

Ah, I guess that's what "Use unsigned integer for shifting" refers to.
The wording first made me assume you talk about the shift count...

> @@ -921,11 +925,11 @@ static int priv_op_read_msr(unsigned int reg, uint64_t 
> *val,
>  *val = curr->arch.pv_vcpu.gs_base_user;
>  return X86EMUL_OKAY;
>  
> -/*
> - * In order to fully retain original behavior, defer calling
> - * pv_soft_rdtsc() until after emulation. This may want/need to be
> - * reconsidered.
> - */
> +/*
> + * In order to fully retain original behavior, defer calling
> + * pv_soft_rdtsc() until after emulation. This may want/need to be
> + * reconsidered.
> + */
>  case MSR_IA32_TSC:
>  poc->tsc |= TSC_BASE;
>  goto normal;

This comment was intentionally indented that way - the deeper
indentation would imo only be suitable if it followed the case label.

> @@ -1745,17 +1748,17 @@ void emulate_gate_op(struct cpu_user_regs *regs)
>  {
>  unsigned int ss, esp, *stkp;
>  int rc;
> -#define push(item) do \
> -{ \
> ---stkp; \
> -esp -= 4; \
> -rc = __put_user(item, stkp); \
> -if ( rc ) \
> -{ \
> -pv_inject_page_fault(PFEC_write_access, \
> - (unsigned long)(stkp + 1) - rc); \
> -return; \
> -} \
> +#define push(item) do   \
> +{   \
> +--stkp; \
> +esp -= 4;   \
> +rc = __put_user(item, stkp);\
> +if ( rc )   \
> +{   \
> +pv_inject_page_fault(PFEC_write_access, \
> + (unsigned long)(stkp + 1) - rc);   \
> +return; \
> +}   \
>  } while ( 0 )

I know it's a matter of taste, and I could imagine others having
differing opinions, but I dislike this moving out to the far right of
the backslashes. In particular it makes later adding a line longer
than all current ones ugly - either one has to touch all other lines
or one needs to accept the one new line standing out.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space

2017-05-29 Thread Jan Beulich
>>> On 29.05.17 at 17:05,  wrote:
> On Mon, May 29, 2017 at 08:16:41AM -0600, Jan Beulich wrote:
>> >>> On 29.05.17 at 14:57,  wrote:
>> > On Fri, May 19, 2017 at 05:35:47AM -0600, Jan Beulich wrote:
>> >> >>> On 27.04.17 at 16:35,  wrote:
>> >> > +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t 
>> >> > read_handler,
>> 
>> Btw., I only now notice this further strange xen_ prefix here.
> 
> I assume this should be vpci_*, (dropping the xen_ prefix uniformly).

Yes please.

>> >> > +  vpci_write_t write_handler, unsigned int 
>> >> > offset,
>> >> > +  unsigned int size, void *data)
>> >> > +{
>> >> > +struct rb_node **new, *parent;
>> >> > +struct vpci_register *r;
>> >> > +
>> >> > +/* Some sanity checks. */
>> >> > +if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||
>> >> 
>> >> Off by one again in the offset check.
>> > 
>> > Fixed to be > 0xfff. Should this maybe be added to pci_regs.h as
>> > PCI_MAX_REGISTER? 
>> 
>> Could be done, but then we need one for base and one for
>> extended config space. May want to check whether Linux has
>> invented some good names for these by now.
> 
> pci_regs.h from Linux now has:
> 
> /*
>  * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
>  * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
>  * configuration space.
>  */
> #define PCI_CFG_SPACE_SIZE256
> #define PCI_CFG_SPACE_EXP_SIZE4096
> 
> At the top. Do you think those defines are fine for importing?

Sure.

> (this
> was introduced by cc10385b6fde3, but I don't think importing this in a
> more formal way makes sense).

Agreed.

>> >> > +/* Helper macros for the read/write handlers. */
>> >> > +#define GENMASK_BYTES(e, s) GENMASK((e) * 8, (s) * 8)
>> >> 
>> >> What do e and s stand for here?
>> > 
>> > e = end, s = start (in bytes).
>> 
>> And where do you start counting. Having seen the rest of the
>> series I'm actually rather unconvinced use these macros results
>> in better code - to me, plain hex numbers are quite a bit easier
>> to read as long as the number of digits doesn't go meaningfully
>> beyond 10 or so.
>> 
>> >> > +#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8
>> >> 
>> >> And at least o here?
>> > 
>> > o = offset (in bytes)
>> 
>> I think simply writhing the shift expression is once again more
>> clear to the reader than using a macro which is longer to read
>> and type and which has semantics which aren't immediately
>> clear from its name.
>> 
>> > I can rename ADD_RESULT to APPEND_RESULT or something more descriptive
>> > if you think it's going to make it easier to understand.
>> 
>> I'd prefer if the name "merge" appeared in that name - I don't see
>> this being usable strictly only to append to either side of a value.
> 
> OK MERGE_RESULT or MERGE_REGISTER maybe? (and the rest removed).

Either name is fine with me, with a slight preference to the former.

>> >> > +static int vpci_write_helper(struct pci_dev *pdev,
>> >> > + const struct vpci_register *r, unsigned 
>> >> > int size,
>> >> > + unsigned int offset, uint32_t data)
>> >> > +{
>> >> > +union vpci_val val = { .double_word = data };
>> >> > +int rc;
>> >> > +
>> >> > +ASSERT(size <= r->size);
>> >> > +if ( size != r->size )
>> >> > +{
>> >> > +rc = r->read(pdev, r->offset, , r->priv_data);
>> >> > +if ( rc )
>> >> > +return rc;
>> >> > +val.double_word &= ~GENMASK_BYTES(size + offset, offset);
>> >> > +data &= GENMASK_BYTES(size, 0);
>> >> > +val.double_word |= data << (offset * 8);
>> >> > +}
>> >> > +
>> >> > +return r->write(pdev, r->offset, val, r->priv_data);
>> >> > +}
>> >> 
>> >> I'm not sure that writing back the value read is correct in all cases
>> >> (think of write-only or rw1c registers or even offsets where reads
>> >> and writes access different registers altogether). I think the write
>> >> handlers will need to be made capable of dealing with partial writes.
>> > 
>> > That seems to be what pciback does fro writes: read, merge value,
>> > write back (drivers/xen/xen-pciback/conf_space.c
>> > xen_pcibk_config_write):
>> > 
>> > err = conf_space_read(dev, cfg_entry, field_start,
>> >  _val);
>> > if (err)
>> >break;
>> > 
>> > tmp_val = merge_value(tmp_val, value, get_mask(size),
>> >  offset - field_start);
>> > 
>> > err = conf_space_write(dev, cfg_entry, field_start,
>> >   tmp_val);
>> > 
>> > I would prefer to do it this way in order to avoid adding more
>> > complexity to the handlers themselves. So far I haven't found any such
>> > registers (rw1c) in the PCI config space, do you have references to
>> > any of them?
>> 
>> The status register.
> 
> But the status register is not going to be trapped, not by Dom0 or
> 

Re: [Xen-devel] [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> And remove the now unused instruction_done in x86/traps.c.

Hmm, that's another candidate for moving early and making non-static
then, I guess.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> ---
>  xen/arch/x86/pv/emulate.c  | 20 

And this one, together with emulate_forced_invalid_op(), perhaps
invalid.c?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 02/22] x86/traps: move gate op emulation code

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:09,  wrote:
> ---
>  xen/arch/x86/pv/emulate.c  | 403 +

Along the lines of comments to patch 1, gate-op.c?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 19:28,  wrote:
> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
> From: Wei Liu 
> Date: Thu, 18 May 2017 16:18:56 +0100
> Subject: [PATCH] x86/traps: move privilege instruction emulation code

privileged

> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
> pv/traps.h.

A name of "emulate.c" sounds like a container for all sorts of cruft.
I'd prefer if we could use the opportunity of this re-org to see about
not having overly large files. Therefore e.g. "emul-priv.c" or
"priv-emul.c" or some such?

> Note that read_descriptor is duplicated in emulate.c. The duplication
> will be gone once all emulation code is moved.

That's not very desirable; we can only hope to have things
committed together then? However, together with the naming
issue above quite likely the function will want to become non-
static anyway, so perhaps this should then be done right away
instead of cloning it.

> Also move gpr_switch.S to pv/ because the code in that file is only
> used by privilege instruction emulation.
> 
> No functional change.

For this size of a change this is too weak a statement for my taste:
I don't really mean to review the 1.5k of lines you move, so I'd hope
for a statement clarifying that perhaps other than formatting the
code is being moved unchanged.

> +int emulate_privileged_op(struct cpu_user_regs *regs)

Does this perhaps want to gain a pv_ prefix?

> +#ifdef CONFIG_PV
> +
> +#include 
> +
> +int emulate_privileged_op(struct cpu_user_regs *regs);
> +
> +#else  /* !CONFIG_PV */
> +
> +#include 
> +
> +int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }

The function does not return -E... values.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [linux-4.9 test] 109830: regressions - FAIL

2017-05-29 Thread osstest service owner
flight 109830 linux-4.9 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109830/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit2   6 xen-boot fail REGR. vs. 107358
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-start/win.repeat fail in 109749 
REGR. vs. 107358

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail in 
109749 pass in 109830
 test-amd64-i386-xl-qemut-debianhvm-amd64 9 debian-hvm-install fail in 109749 
pass in 109830
 test-amd64-i386-xl-qemuu-ovmf-amd64 9 debian-hvm-install fail in 109749 pass 
in 109830
 test-arm64-arm64-xl-multivcpu 15 guest-start/debian.repeat fail in 109749 pass 
in 109830
 test-amd64-amd64-rumprun-amd64 16 rumprun-demo-xenstorels/xenstorels.repeat 
fail in 109749 pass in 109830
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 109796 pass in 109749
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail in 109796 pass in 
109830
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail in 109817 
pass in 109830
 test-amd64-amd64-xl-xsm 19 guest-start/debian.repeat fail in 109817 pass in 
109830
 test-amd64-i386-xl-raw   9 debian-di-install fail in 109817 pass in 109830
 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-install fail in 109817 pass in 
109830
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail pass in 
109796
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail pass in 
109817

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  9 debian-install   fail REGR. vs. 107358

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-start/win.repeat fail blocked in 
107358
 test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail in 109817 
blocked in 107358
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-start/win.repeat fail in 109817 
never pass
 test-armhf-armhf-xl-xsm   6 xen-boot fail  like 107358
 test-armhf-armhf-xl-rtds  6 xen-boot fail  like 107358
 test-armhf-armhf-xl-multivcpu  6 xen-boot fail like 107358
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-start/win.repeat fail like 107358
 test-armhf-armhf-libvirt-raw  6 xen-boot fail  like 107358
 test-armhf-armhf-libvirt-xsm  6 xen-boot fail  like 107358
 test-armhf-armhf-xl   6 xen-boot fail  like 107358
 test-armhf-armhf-xl-vhd   6 xen-boot fail  like 107358
 test-armhf-armhf-libvirt  6 xen-boot fail  like 107358
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale   6 xen-boot fail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-armhf-armhf-examine  6 reboot   fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt 12 

Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space

2017-05-29 Thread Roger Pau Monne
On Mon, May 29, 2017 at 08:16:41AM -0600, Jan Beulich wrote:
> >>> On 29.05.17 at 14:57,  wrote:
> > On Fri, May 19, 2017 at 05:35:47AM -0600, Jan Beulich wrote:
> >> >>> On 27.04.17 at 16:35,  wrote:
> >> > +/* Read/write of unset register. */
> >> > +VPCI_READ_CHECK(8, 4, 0x);
> >> > +VPCI_READ_CHECK(8, 2, 0x);
> >> > +VPCI_READ_CHECK(8, 1, 0xff);
> >> > +VPCI_WRITE(10, 2, 0xbeef);
> >> > +VPCI_READ_CHECK(10, 2, 0x);
> >> > +
> >> > +/* Read of multiple registers */
> >> > +VPCI_CHECK_REG(7, 1, 0xbd);
> >> > +VPCI_READ_CHECK(4, 4, 0xbdbabaff);
> >> 
> >> I think a variant accessing mixed size registers would also be
> >> desirable here. Perhaps it would be best to exhaustively test
> >> all possible variations (there aren't that many after all). Same
> >> for writes and partial accesses (below) then.
> > 
> > So you mean to scan the whole space (from 0 to 128 on this test) using
> > all possible register sizes for both read and write? That would indeed
> > be feasible.
> 
> Not sure what the "from 0 to 128" is meant to apply to. What I mean
> is to test all combinations of (mix of) register sizes and access sizes.
> I.e. all combinations making up a single 4-byte field ((1,1,1,1),
> (1,1,2), (2,1,1), (2,2), (4)) and all four 1-byte accesses, both 2-byte
> ones, and the sole possible 4-byte one.

OK, thanks for the clarification, now I get it.

> >> > @@ -256,6 +257,152 @@ void register_g2m_portio_handler(struct domain *d)
> >> >  handler->ops = _portio_ops;
> >> >  }
> >> >  
> >> > +/* Do some sanity checks. */
> >> > +static int vpci_access_check(unsigned int reg, unsigned int len)
> >> > +{
> >> > +/* Check access size. */
> >> > +if ( len != 1 && len != 2 && len != 4 )
> >> > +{
> >> > +gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
> >> > + reg, len);
> >> 
> >> I think many of such gdprintk()s want to go away before this series
> >> gets committed.
> > 
> > OK, I've found them useful while developing, but I guess they are not
> > really useful outside from that context. I guess there's no way to
> > leave them in place, maybe a Kconfig option?
> 
> That seems overkill. I wouldn't so much mind the messages (they
> get compiled out for non-debug builds anyway), but the clutter
> they introduce to code: In some cases half of your functions are
> the invocation of gdprintk() ...

Let me try to prune some of them.

> >> > +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> >> > +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
> >> 
> >> Plain bool please.
> > 
> > Sadly struct hvm_io_ops (which is where this function is used) expects
> > a bool_t as return.
> 
> I don't follow - bool_t is simply a typedef of bool nowadays, and
> typedefs are all equivalent as far as the C type system goes.

Clearly my bad, I assumed they where actually different types.

> >> Again the question - what's the bare hardware equivalent of
> >> returning X86EMUL_UNHANDLEABLE here?
> > 
> > All 1's I assume (or other random garbage). Would you be OK with me
> > adding a "fail" label that sets data to all 1's and returns X86EMUL_OKAY?
> 
> That would probably be okay (despite my dislike of labels and goto-s).

Maybe the goto is not needed after all if vpci_read returns the data
filled with 1's and no error code.

> >> > +#include 
> >> > +#include 
> >> > +
> >> > +extern const vpci_register_init_t __start_vpci_array[], 
> >> > __end_vpci_array[];
> >> > +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> >> > +#define vpci_init __start_vpci_array
> >> 
> >> What is this last one good for?
> > 
> > It's used by xen_vpci_add_handlers in order to call the init
> > functions, I can drop it and call __start_vpci_array[i](...) if that's
> > better.
> 
> Well, if there were several users, I could see the benefit of an
> abbreviating #define. But for a single user the #define adds
> more code / clutter than is being saved on the use site.

Ack.

> >> > +struct rb_node node;
> >> > +};
> >> > +
> >> > +int xen_vpci_add_handlers(struct pci_dev *pdev)
> >> 
> >> __hwdom_init (I notice setup_one_hwdom_device() wrongly isn't
> >> annotated so).
> > 
> > OK, so you really want the init handlers to be inside of the
> > .init.rodata section then.
> 
> Only if that's correct, and it is correct as long as all possible call trees
> root in a __hwdom_init function. (To avoid misunderstanding: This
> clearly can't be .init.rodata uniformly, as __hwdom_init isn't always
> an alias of __init).

OK.

> >> > +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t 
> >> > read_handler,
> 
> Btw., I only now notice this further strange xen_ prefix here.

I assume this should be vpci_*, (dropping the xen_ prefix uniformly).

> >> > +  vpci_write_t write_handler, unsigned int 
> >> > offset,
> >> > +  

Re: [Xen-devel] [PATCH 1/2] x86/monitor: add masking support for write_ctrlreg events

2017-05-29 Thread Jan Beulich
>>> On 26.05.17 at 15:41,  wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -37,7 +37,7 @@
>  #include "hvm/save.h"
>  #include "memory.h"
>  
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x000d
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x000e
>  
>  /*
>   * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> @@ -1107,6 +1107,10 @@ struct xen_domctl_monitor_op {
>  uint8_t sync;
>  /* Send event only on a change of value */
>  uint8_t onchangeonly;
> +/* Send event only if the changed bit in the control register
> + * is not masked
> + */
> +unsigned long bitmask;

You can't use "unsigned long" in public headers. Also beware of
alignment issues between 32-bit and 64-bit callers (I didn't check
whether there actually is any, I'm merely hinting towards the
use of uint64_aligned_t).

> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -155,6 +155,15 @@
>  #define VM_EVENT_X86_CR42
>  #define VM_EVENT_X86_XCR0   3
>  
> +/* vm_event_write_ctrlreg default bit mask
> + * If the changed bit in the control register is masked the event is
> + * not triggered
> + */

Comment style.

> +#define VM_EVENT_X86_CR0_DEFAULT_MASK0x
> +#define VM_EVENT_X86_CR3_DEFAULT_MASK0x
> +#define VM_EVENT_X86_CR4_DEFAULT_MASK0x0080
> +#define VM_EVENT_X86_XCR0_DEFAULT_MASK   0x

These are unused - what are they good for? And why would
you want to special case CR4.PGE (and even without any
comment)?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: preserve native TSC speed during migration between identical hosts

2017-05-29 Thread Jan Beulich
>>> On 24.05.17 at 16:25,  wrote:
> @@ -2024,6 +2029,13 @@ void tsc_set_info(struct domain *d,
>  d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
>  d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
>  set_time_scale(>arch.vtsc_to_ns, d->arch.tsc_khz * 1000);
> +if (!opt_vtsc_tolerance) {
> +tolerated = d->arch.tsc_khz == cpu_khz;
> +} else {

Leaving aside the question of whether we want anything like this,
there are multiple coding style issues here (braces on their own
lines, blanks inside the parentheses of control statements). 

> +khz_diff = cpu_khz > d->arch.tsc_khz ?
> +   cpu_khz - d->arch.tsc_khz : d->arch.tsc_khz - cpu_khz;
> +tolerated = khz_diff <= opt_vtsc_tolerance;
> +}

These assignments to tolerated also suggest it wants to be bool.

Finally I don't think a host wide option will do. If it is to be of use
(considering that it used wrong may break applications), it needs
to be per-domain, and its value needs to be migrated (perhaps
in the form of a low/high pair of TSC frequency values).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: preserve native TSC speed during migration between identical hosts

2017-05-29 Thread Jan Beulich
>>> On 24.05.17 at 17:33,  wrote:
> On Wed, May 24, 2017 at 05:25:03PM +0200, Olaf Hering wrote:
>> On Wed, May 24, Konrad Rzeszutek Wilk wrote:
>> 
>> > How can that be determined? As in how can the guest (domU) be within
>> > the range? Is there some way to determine that? Is there some
>> > matrix of the various OS-es that can tolerate this?
>> 
>> Just cycle through all dom0s and look for the cpu_khz values:
>>  # xl dmesg | grep -w MHz
>> (XEN) Detected 2494.018 MHz processor.
>> 
>> What I have seen are ranges up to 200khz, even if /proc/cpuinfo says
>> "2.50GHz" . Some dom0s calibrated themselves to exactly 2500.000 MHz,
>> they do not need a tolerance.
>> 
>> The expected frequency of a given domU can be seen in 'dump softtsc
>> stats' (s). How various guest kernels deal with the slightly different
>> frequency, no idea.
> 
> Right, so that answers how one would find the values (which I think
> should be in the docs?).
> 
> But it does not help customers to figure out if this is OK for them?
> 
> As in, how can customers be assured that 1% jitter is OK for their
> kernel? That time won't go backwards?
> 
> Is there some form of tests that they can run to verify and test
> that this is safe? Or perhaps this is something that is based on the kernel
> versions? Like 4.11 are safe, but 3.18 is not?

Well, no, what jitter may be acceptable depends on the
applications running inside the guest. I.e. you can only know
for yourself or ask the application vendor(s). I think such an
option, if we really want to have it, would need to be
prominently documented as unsupported - after all we can't
help it if people use it and then find their applications break.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] x86/altp2m: Add a hvmop for setting the suppress #VE bit

2017-05-29 Thread Jan Beulich
>>> On 18.05.17 at 17:07,  wrote:
> --- a/xen/arch/x86/mm/mem_access.c
> +++ b/xen/arch/x86/mm/mem_access.c
> @@ -466,6 +466,54 @@ int p2m_get_mem_access(struct domain *d, gfn_t gfn, 
> xenmem_access_t *access)
>  }
>  
>  /*
> + * Set/clear the #VE suppress bit for a page.  Only available on VMX.
> + */
> +int p2m_set_suppress_ve(struct domain *d, gfn_t gfn, uint8_t suppress_ve,

suppress_ve presumably is meant to be boolean.

> +unsigned int altp2m_idx)
> +{
> +struct p2m_domain *host_p2m = p2m_get_hostp2m(d);
> +struct p2m_domain *ap2m = NULL;
> +struct p2m_domain *p2m = NULL;

Pointless initializer.

> +mfn_t mfn;
> +p2m_access_t a;
> +p2m_type_t t;
> +unsigned long gfn_l;

Please avoid this local variable and use gfn_x() in the two places
you need to.

> +int rc = 0;

Pointless initializer again.

> +
> +if ( !cpu_has_vmx )
> +return -EOPNOTSUPP;

Is this enough? Wouldn't it be better to signal the caller whenever
hardware (or even software) isn't going to honor the request?

> +if ( altp2m_idx > 0 )
> +{
> +if ( altp2m_idx >= MAX_ALTP2M ||
> +d->arch.altp2m_eptp[altp2m_idx] == mfn_x(INVALID_MFN) )

Indentation.

> +return -EINVAL;
> +
> +p2m = ap2m = d->arch.altp2m_p2m[altp2m_idx];
> +}
> +else
> +{
> +p2m = host_p2m;
> +}

Unnecessary braces.

> +p2m_lock(host_p2m);
> +if ( ap2m )
> +p2m_lock(ap2m);
> +
> +gfn_l = gfn_x(gfn);
> +mfn = p2m->get_entry(p2m, gfn_l, , , 0, NULL, NULL);
> +if ( !mfn_valid(mfn) )
> +return -ESRCH;
> +rc = p2m->set_entry(p2m, gfn_l, mfn, PAGE_ORDER_4K, t, a,
> +suppress_ve);
> +if ( ap2m )
> +p2m_unlock(ap2m);
> +p2m_unlock(host_p2m);

To fiddle with a single gfn, this looks to be very heavy locking.
While for now gfn_lock() is the same as p2m_lock(), from an
abstract perspective I'd expect gfn_lock() to suffice here at 
least in the non-altp2m case.

And then there are two general questions: Without a libxc layer
function, how is one supposed to use this new sub-op? Is it
really intended to permit a guest to call this for itself?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space

2017-05-29 Thread Jan Beulich
>>> On 29.05.17 at 14:57,  wrote:
> On Fri, May 19, 2017 at 05:35:47AM -0600, Jan Beulich wrote:
>> >>> On 27.04.17 at 16:35,  wrote:
>> > +/* Read/write of unset register. */
>> > +VPCI_READ_CHECK(8, 4, 0x);
>> > +VPCI_READ_CHECK(8, 2, 0x);
>> > +VPCI_READ_CHECK(8, 1, 0xff);
>> > +VPCI_WRITE(10, 2, 0xbeef);
>> > +VPCI_READ_CHECK(10, 2, 0x);
>> > +
>> > +/* Read of multiple registers */
>> > +VPCI_CHECK_REG(7, 1, 0xbd);
>> > +VPCI_READ_CHECK(4, 4, 0xbdbabaff);
>> 
>> I think a variant accessing mixed size registers would also be
>> desirable here. Perhaps it would be best to exhaustively test
>> all possible variations (there aren't that many after all). Same
>> for writes and partial accesses (below) then.
> 
> So you mean to scan the whole space (from 0 to 128 on this test) using
> all possible register sizes for both read and write? That would indeed
> be feasible.

Not sure what the "from 0 to 128" is meant to apply to. What I mean
is to test all combinations of (mix of) register sizes and access sizes.
I.e. all combinations making up a single 4-byte field ((1,1,1,1),
(1,1,2), (2,1,1), (2,2), (4)) and all four 1-byte accesses, both 2-byte
ones, and the sole possible 4-byte one.

>> > @@ -256,6 +257,152 @@ void register_g2m_portio_handler(struct domain *d)
>> >  handler->ops = _portio_ops;
>> >  }
>> >  
>> > +/* Do some sanity checks. */
>> > +static int vpci_access_check(unsigned int reg, unsigned int len)
>> > +{
>> > +/* Check access size. */
>> > +if ( len != 1 && len != 2 && len != 4 )
>> > +{
>> > +gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
>> > + reg, len);
>> 
>> I think many of such gdprintk()s want to go away before this series
>> gets committed.
> 
> OK, I've found them useful while developing, but I guess they are not
> really useful outside from that context. I guess there's no way to
> leave them in place, maybe a Kconfig option?

That seems overkill. I wouldn't so much mind the messages (they
get compiled out for non-debug builds anyway), but the clutter
they introduce to code: In some cases half of your functions are
the invocation of gdprintk() ...

>> > +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
>> > +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
>> 
>> Plain bool please.
> 
> Sadly struct hvm_io_ops (which is where this function is used) expects
> a bool_t as return.

I don't follow - bool_t is simply a typedef of bool nowadays, and
typedefs are all equivalent as far as the C type system goes.

>> Again the question - what's the bare hardware equivalent of
>> returning X86EMUL_UNHANDLEABLE here?
> 
> All 1's I assume (or other random garbage). Would you be OK with me
> adding a "fail" label that sets data to all 1's and returns X86EMUL_OKAY?

That would probably be okay (despite my dislike of labels and goto-s).

>> > +#include 
>> > +#include 
>> > +
>> > +extern const vpci_register_init_t __start_vpci_array[], 
>> > __end_vpci_array[];
>> > +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>> > +#define vpci_init __start_vpci_array
>> 
>> What is this last one good for?
> 
> It's used by xen_vpci_add_handlers in order to call the init
> functions, I can drop it and call __start_vpci_array[i](...) if that's
> better.

Well, if there were several users, I could see the benefit of an
abbreviating #define. But for a single user the #define adds
more code / clutter than is being saved on the use site.

>> > +struct rb_node node;
>> > +};
>> > +
>> > +int xen_vpci_add_handlers(struct pci_dev *pdev)
>> 
>> __hwdom_init (I notice setup_one_hwdom_device() wrongly isn't
>> annotated so).
> 
> OK, so you really want the init handlers to be inside of the
> .init.rodata section then.

Only if that's correct, and it is correct as long as all possible call trees
root in a __hwdom_init function. (To avoid misunderstanding: This
clearly can't be .init.rodata uniformly, as __hwdom_init isn't always
an alias of __init).

>> > +struct vpci_register r = {
>> > +.offset = reg,
>> > +.size = size,
>> > +};
>> > +
>> > +ASSERT(vpci_locked(pdev->domain));
>> > +
>> > +node = pdev->vpci->handlers.rb_node;
>> > +while ( node )
>> > +{
>> > +struct vpci_register *t =
>> 
>> const
> 
> Making both of them const means the return must also be const, and
> that's not suitable by some of the consumers (ie:
> xen_vpci_remove_register for example).

In that case there's no choice then, okay.

>> > +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,

Btw., I only now notice this further strange xen_ prefix here.

>> > +  vpci_write_t write_handler, unsigned int offset,
>> > +  unsigned int size, void *data)
>> > +{
>> > +struct rb_node **new, *parent;
>> > +struct vpci_register 

Re: [Xen-devel] [PATCH v3 0/9] vpci: PCI config space emulation

2017-05-29 Thread Roger Pau Monne
On Mon, May 29, 2017 at 07:38:14AM -0600, Jan Beulich wrote:
> >>> On 27.04.17 at 16:35,  wrote:
> > The following series contain an implementation of handlers for the PCI
> > configuration space inside of Xen. This allows Xen to detect accesses to the
> > PCI configuration space and react accordingly.
> > 
> > Although there hasn't been a lot of review on the previous version, I send 
> > this
> > new version because I will be away for > 1 week, and I would rather have 
> > review
> > on this version than the old one. As usual, each patch contains a changeset
> > summary between versions.
> > 
> > Patch 1 implements the generic handlers for accesses to the PCI 
> > configuration
> > space together with a minimal user-space test harness that I've used during
> > development. Currently a per-device red-back tree is used in order to store 
> > the
> > list of handlers, and they are indexed based on their offset inside of the
> > configuration space. Patch 1 also adds the x86 port IO traps and wires them
> > into the newly introduced vPCI dispatchers. Patch 2 adds handlers for the 
> > ECAM
> > areas (as found on the MMCFG ACPI table). Patches 3 and 4 are mostly code
> > moment/refactoring in order to implement support for BAR mapping in patch 5.
> > Patch 6 allows Xen to mask certain PCI capabilities on-demand, which is 
> > used in
> > order to mask MSI and MSI-X.
> > 
> > Finally patches 8 and 9 implement support in order to emulate the MSI/MSI-X
> > capabilities inside of Xen, so that the interrupts are transparently routed 
> > to
> > the guest.
> 
> While the code looks reasonable for this early stage, it's still quite
> large a chunk of new logic.

Thanks for the detailed review, it's greatly appreciated (it's a very
large amount of code so I assume this is not trivial for you at
all).

I'm still going over the comments, I hope I will be able to send a new
version before the end of the week.

> Therefore I think that if already there
> was no prior design discussion, some reasoning behind the decisions
> you've taken should be provided here. In particular I'm quite
> unhappy about this huge amount of intercepting and emulation,
> none of which we require for PV Dom0. IOW I continue to be
> unconvinced that putting all the burden on Xen while not para-
> virtualizing any of this in the Dom0 kernel is the right choice.

IMHO, there are two main points of doing all this emulation inside of
Xen, the first one is to prevent adding a bunch of duplicated Xen PV
specific code to each OS we want to support in PVH mode. This just
promotes Xen code duplication amongst OSes, which leads to more
maintainership burden.

The second reason would be that this code (or it's functionality to be
more precise) already exists in QEMU (and pciback to a degree), and
it's code that we already support and maintain. By moving it into the
hypervisor itself every guest type can make use of it, and should be
shared between them all (I know that the code in this series is not
yet suitable for DomU HVM guests yet).

> It
> certainly would be if we were talking about HVM Dom0, but this is
> PVH, and the "PV" part is first there for a reason.

Well, I've been mostly forced into using the PVH name for historical
reasons, but when I started working on this I called it HVMlite,
because I think it's more similar to HVM than PV by a long shot (and
the PVH Dom0 builder functions were using the "hvm" prefix in the
firsts iterations of that series).

Since PVH was never finished, the PVH name was reused in order to
prevent us the shame of announcing something that was never finished,
and to prevent adding more confusion to users.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/HAP: avoid using bogus/misleading locking

2017-05-29 Thread Andrew Cooper
On 29/05/2017 13:37, Jan Beulich wrote:
> hap_teardown() unconditionally releases the paging lock and is always
> being called without the lock held: Lock acquire should then be
> unconditional too.
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 0/9] vpci: PCI config space emulation

2017-05-29 Thread Jan Beulich
>>> On 27.04.17 at 16:35,  wrote:
> The following series contain an implementation of handlers for the PCI
> configuration space inside of Xen. This allows Xen to detect accesses to the
> PCI configuration space and react accordingly.
> 
> Although there hasn't been a lot of review on the previous version, I send 
> this
> new version because I will be away for > 1 week, and I would rather have 
> review
> on this version than the old one. As usual, each patch contains a changeset
> summary between versions.
> 
> Patch 1 implements the generic handlers for accesses to the PCI configuration
> space together with a minimal user-space test harness that I've used during
> development. Currently a per-device red-back tree is used in order to store 
> the
> list of handlers, and they are indexed based on their offset inside of the
> configuration space. Patch 1 also adds the x86 port IO traps and wires them
> into the newly introduced vPCI dispatchers. Patch 2 adds handlers for the ECAM
> areas (as found on the MMCFG ACPI table). Patches 3 and 4 are mostly code
> moment/refactoring in order to implement support for BAR mapping in patch 5.
> Patch 6 allows Xen to mask certain PCI capabilities on-demand, which is used 
> in
> order to mask MSI and MSI-X.
> 
> Finally patches 8 and 9 implement support in order to emulate the MSI/MSI-X
> capabilities inside of Xen, so that the interrupts are transparently routed to
> the guest.

While the code looks reasonable for this early stage, it's still quite
large a chunk of new logic. Therefore I think that if already there
was no prior design discussion, some reasoning behind the decisions
you've taken should be provided here. In particular I'm quite
unhappy about this huge amount of intercepting and emulation,
none of which we require for PV Dom0. IOW I continue to be
unconvinced that putting all the burden on Xen while not para-
virtualizing any of this in the Dom0 kernel is the right choice. It
certainly would be if we were talking about HVM Dom0, but this is
PVH, and the "PV" part is first there for a reason.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 6/9] xen/vpci: trap access to the list of PCI capabilities

2017-05-29 Thread Jan Beulich
>>> On 27.04.17 at 16:35,  wrote:
> +static int vpci_index_capabilities(struct pci_dev *pdev)
> +{
> +uint8_t seg = pdev->seg, bus = pdev->bus;
> +uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +uint8_t pos = PCI_CAPABILITY_LIST;
> +uint16_t status;
> +unsigned int max_cap = 48;
> +struct vpci_capability *cap;
> +int rc;
> +
> +INIT_LIST_HEAD(>vpci->cap_list);
> +
> +/* Check if device has capabilities. */
> +status = pci_conf_read16(seg, bus, slot, func, PCI_STATUS);
> +if ( !(status & PCI_STATUS_CAP_LIST) )
> +return 0;
> +
> +/* Add the root capability pointer. */
> +cap = xzalloc(struct vpci_capability);
> +if ( !cap )
> +return -ENOMEM;
> +
> +cap->offset = pos;
> +list_add_tail(>next, >vpci->cap_list);
> +rc = xen_vpci_add_register(pdev, vpci_cap_read, vpci_cap_write, pos,
> +   1, cap);
> +if ( rc )
> +return rc;
> +
> +/*
> + * Iterate over the list of capabilities present in the device, and
> + * add a handler for each register pointer to the next item
> + * (PCI_CAP_LIST_NEXT).
> + */
> +while ( max_cap-- )
> +{
> +pos = pci_conf_read8(seg, bus, slot, func, pos);
> +if ( pos < 0x40 )
> +break;
> +
> +cap = xzalloc(struct vpci_capability);
> +if ( !cap )
> +return -ENOMEM;
> +
> +cap->offset = pos;
> +list_add_tail(>next, >vpci->cap_list);
> +pos += PCI_CAP_LIST_NEXT;
> +rc = xen_vpci_add_register(pdev, vpci_cap_read, vpci_cap_write, pos,
> +   1, cap);
> +if ( rc )
> +return rc;
> +}
> +
> +return 0;
> +}

Btw., instead of duplicating some of what pci_find_cap_offset()
and pci_find_next_cap() do, perhaps worth making those two
functions capable of dealing with a wildcard ID (0xff) as input.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 9/9] vpci/msix: add MSI-X handlers

2017-05-29 Thread Jan Beulich
>>> On 27.04.17 at 16:35,  wrote:
> +static int vpci_msix_control_write(struct pci_dev *pdev, unsigned int reg,
> +   union vpci_val val, void *data)
> +{
> +uint8_t seg = pdev->seg, bus = pdev->bus;
> +uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +paddr_t table_base = 
> pdev->vpci->header.bars[pdev->vpci->msix->bir].paddr;
> +struct vpci_msix *msix = data;

Wouldn't you better use this also to obtain the array index one line
earlier?

> +bool new_masked, new_enabled;
> +unsigned int i;
> +uint32_t data32;
> +int rc;
> +
> +new_masked = val.word & PCI_MSIX_FLAGS_MASKALL;
> +new_enabled = val.word & PCI_MSIX_FLAGS_ENABLE;
> +
> +if ( new_enabled != msix->enabled && new_enabled )

if ( !msix->enabled && new_enabled )

would likely be easier to read (similar for the "else if" below).

> +{
> +/* MSI-X enabled. */
> +for ( i = 0; i < msix->max_entries; i++ )
> +{
> +if ( msix->entries[i].masked )
> +continue;
> +
> +rc = vpci_msix_enable(>entries[i].arch, pdev,
> +  msix->entries[i].addr, 
> msix->entries[i].data,
> +  msix->entries[i].nr, table_base);
> +if ( rc )
> +{
> +gdprintk(XENLOG_ERR,
> + "%04x:%02x:%02x.%u: unable to update entry %u: 
> %d\n",
> + seg, bus, slot, func, i, rc);
> +return rc;
> +}
> +
> +vpci_msix_mask(>entries[i].arch, false);
> +}
> +}
> +else if ( new_enabled != msix->enabled && !new_enabled )
> +{
> +/* MSI-X disabled. */
> +for ( i = 0; i < msix->max_entries; i++ )
> +{
> +rc = vpci_msix_disable(>entries[i].arch);
> +if ( rc )
> +{
> +gdprintk(XENLOG_ERR,
> + "%04x:%02x:%02x.%u: unable to disable entry %u: 
> %d\n",
> + seg, bus, slot, func, i, rc);
> +return rc;
> +}
> +}
> +}
> +
> +data32 = val.word;
> +if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
> + pci_msi_conf_write_intercept(pdev, reg, 2, ) >= 0 )
> +pci_conf_write16(seg, bus, slot, func, reg, data32);

What's the intermediate variable "data32" good for here? Afaict you
could use val.word in its stead.

> +static struct vpci_msix *vpci_msix_find(struct domain *d, unsigned long addr)
> +{
> +struct vpci_msix *msix;
> +
> +ASSERT(vpci_locked(d));
> +list_for_each_entry ( msix,  >arch.hvm_domain.msix_tables, next )
> +if ( msix->pdev->vpci->header.command & PCI_COMMAND_MEMORY &&

Please parenthesize & within &&.

> + addr >= msix->addr &&
> + addr < msix->addr + msix->max_entries * PCI_MSIX_ENTRY_SIZE )
> +return msix;
> +
> +return NULL;
> +}

Looking ahead I'm getting the impression that you only allow
accesses to the MSI-X table entries, yet in vpci_modify_bars() you
(correctly) prevent mapping entire pages. While most other
registers are disallowed from sharing a page with the table, the PBA
is specifically named as an exception. Hence you need to support
at least reads from the entire range.

> +static int vpci_msix_table_accept(struct vcpu *v, unsigned long addr)
> +{
> +int found;
> +
> +vpci_lock(v->domain);
> +found = !!vpci_msix_find(v->domain, addr);

At the risk of repeating a comment I gave on an earlier patch: Using
"bool" for "found" allows you to avoid the !! .

> +static int vpci_msix_access_check(struct pci_dev *pdev, unsigned long addr,
> +  unsigned int len)
> +{
> +uint8_t seg = pdev->seg, bus = pdev->bus;
> +uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +
> +

No double blank lines please.

> +/* Only allow 32/64b accesses. */
> +if ( len != 4 && len != 8 )
> +{
> +gdprintk(XENLOG_ERR,
> + "%04x:%02x:%02x.%u: invalid MSI-X table access size: %u\n",
> + seg, bus, slot, func, len);
> +return -EINVAL;
> +}
> +
> +/* Do no allow accesses that span across multiple entries. */
> +if ( (addr & (PCI_MSIX_ENTRY_SIZE - 1)) + len > PCI_MSIX_ENTRY_SIZE )
> +{
> +gdprintk(XENLOG_ERR,
> + "%04x:%02x:%02x.%u: MSI-X access crosses entry boundary\n",
> + seg, bus, slot, func);
> +return -EINVAL;
> +}
> +
> +/*
> + * Only allow 64b accesses to the low message address field.
> + *
> + * NB: this is more restrictive than the specification, that allows 64b
> + * accesses to other fields under certain circumstances, so this check 
> and
> + * the code will have to be fixed in order to fully comply with the
> + * specification.
> + 

Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

2017-05-29 Thread Razvan Cojocaru
On 05/26/17 18:38, Jan Beulich wrote:
 On 26.05.17 at 16:37,  wrote:
>> On 05/26/17 17:29, Jan Beulich wrote:
>> On 25.05.17 at 11:40,  wrote:
 I've noticed that, with pages marked NX and vm_event emulation, we can
 end up emulating an ud2, for which hvm_emulate_one() returns
 X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>>
>>> Could you explain what would lead to emulation of UD2?
>>
>> If you mean in which cases does our engine mark pages NX, I'll have to
>> ask and get back to you. If you mean why generally would an UD2 end up
>> being the instruction where RIP causes an execute violation fault, I'll
>> have to check.
> 
> The question was more for the latter, as I don't understand what
> good could come from executing UD2 intentionally, unless the
> entity doing so knows there is an emulator around to do something
> sensible with it.

I owe you an answer here: I've spoken to my introspection engine
colleague Andrei, and they purposefully put an UD2 there to terminate a
malicious process (i.e. the exception is wanted).

I've found this problem while stress-testing Xen 4.9 verifying another
patch, using our in-house user-mode test applications, which simulate
this sort of malicious behaviour.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 109828: tolerable FAIL

2017-05-29 Thread osstest service owner
flight 109828 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109828/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw 14 guest-start/debian.repeat fail in 109764 pass 
in 109828
 test-armhf-armhf-xl-rtds 16 guest-start.2fail in 109790 pass in 109764
 test-armhf-armhf-xl-credit2  11 guest-start  fail in 109790 pass in 109828
 test-armhf-armhf-xl-rtds 11 guest-start  fail in 109813 pass in 109828
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat  fail pass in 109790
 test-arm64-arm64-xl-credit2   6 xen-boot   fail pass in 109813

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-start/win.repeat fail in 109764 
blocked in 109828
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop   fail in 109790 like 109803
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail in 109790 
like 109803
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop  fail in 109813 like 109803
 test-arm64-arm64-xl-credit2 12 migrate-support-check fail in 109813 never pass
 test-arm64-arm64-xl-credit2 13 saverestore-support-check fail in 109813 never 
pass
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-checkfail  like 109813
 test-armhf-armhf-libvirt 13 saverestore-support-checkfail  like 109813
 test-amd64-i386-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail like 109813
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 109813
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 109813
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 109813
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 109813
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 

[Xen-devel] [PATCH] xen: avoid type warning in xchg_xen_ulong

2017-05-29 Thread Arnd Bergmann
The improved type-checking version of container_of() triggers a warning for
xchg_xen_ulong, pointing out that 'xen_ulong_t' is unsigned, but atomic64_t
contains a signed value:

drivers/xen/events/events_2l.c: In function 'evtchn_2l_handle_events':
arch/x86/include/asm/cmpxchg.h:87:21: error: 'pending_words' is used 
uninitialized in this function [-Werror=uninitialized]

This adds a cast to work around the warning

Cc: Ian Abbott 
Fixes: 85323a991d40 ("xen: arm: mandate EABI and use generic atomic 
operations.")
Fixes: daa2ac80834d ("kernel.h: handle pointers to arrays better in 
container_of()")
Signed-off-by: Arnd Bergmann 
---
 arch/arm/include/asm/xen/events.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/xen/events.h 
b/arch/arm/include/asm/xen/events.h
index 71e473d05fcc..620dc75362e5 100644
--- a/arch/arm/include/asm/xen/events.h
+++ b/arch/arm/include/asm/xen/events.h
@@ -16,7 +16,7 @@ static inline int xen_irqs_disabled(struct pt_regs *regs)
return raw_irqs_disabled_flags(regs->ARM_cpsr);
 }
 
-#define xchg_xen_ulong(ptr, val) atomic64_xchg(container_of((ptr), \
+#define xchg_xen_ulong(ptr, val) atomic64_xchg(container_of((long long*)(ptr),\
atomic64_t, \
counter), (val))
 
-- 
2.9.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space

2017-05-29 Thread Roger Pau Monne
On Fri, May 19, 2017 at 05:35:47AM -0600, Jan Beulich wrote:
> >>> On 27.04.17 at 16:35,  wrote:
> > --- a/tools/libxl/libxl_x86.c
> > +++ b/tools/libxl/libxl_x86.c
> > @@ -11,7 +11,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
> >  if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
> >  if (d_config->b_info.device_model_version !=
> >  LIBXL_DEVICE_MODEL_VERSION_NONE) {
> > -xc_config->emulation_flags = XEN_X86_EMU_ALL;
> > +xc_config->emulation_flags = (XEN_X86_EMU_ALL & 
> > ~XEN_X86_EMU_VPCI);
> 
> I can see why you need this, but I'm not sure this is a good model.
> Ideally for ordinary HVM guests you'd never have to change this
> line. Therefore perhaps it might be a better idea to use a "negative"
> flag here.

I would expect that at some point HVM guests are also going to use the
Xen internal PCI emulation for IOREQs (XEN_DMOP_IO_RANGE_PCI), and for
PCI passthrough, so having VPCI disabled is only temporary (long term
HVM guests should again use XEN_X86_EMU_ALL).

> > --- /dev/null
> > +++ b/tools/tests/vpci/Makefile
> > @@ -0,0 +1,45 @@
> > +
> > +XEN_ROOT=$(CURDIR)/../../..
> > +include $(XEN_ROOT)/tools/Rules.mk
> > +
> > +TARGET := test_vpci
> > +
> > +.PHONY: all
> > +all: $(TARGET)
> > +
> > +.PHONY: run
> > +run: $(TARGET)
> > +   ./$(TARGET) > $(TARGET).out
> > +
> > +$(TARGET): vpci.c vpci.h rbtree.c rbtree.h
> > +   $(HOSTCC) -g -o $@ vpci.c main.c rbtree.c
> > +
> > +.PHONY: clean
> > +clean:
> > +   rm -rf $(TARGET) $(TARGET).out *.o *~ vpci.h vpci.c rbtree.c rbtree.h
> > +
> > +.PHONY: distclean
> > +distclean: clean
> > +
> > +.PHONY: install
> > +install:
> > +
> > +vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
> > +   sed -e '/#include/d' <$< >$@
> > +
> > +vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
> > +   # Trick the compiler so it doesn't complain about missing symbols
> > +   sed -e '/#include/d' \
> > +   -e '1s;^;#include "emul.h"\
> > +const vpci_register_init_t __start_vpci_array[1]\;\
> > +const vpci_register_init_t __end_vpci_array[1]\;\
> > +;' <$< >$@
> > +
> > +rbtree.h: $(XEN_ROOT)/xen/include/xen/rbtree.h
> > +   sed -e '/#include/d' <$< >$@
> > +
> > +rbtree.c: $(XEN_ROOT)/xen/common/rbtree.c
> > +   sed -e "/#include/d" \
> > +   -e '1s;^;#include "emul.h"\
> > +;' <$< >$@
> 
> Plain symlinking and __XEN__ conditionals in the files may be the
> easier to follow variant. But I'm no heavily opposed to this one,
> I'm merely afraid that further adjustments may end up becoming
> necessary down the road, resulting in the rules here to become
> more convoluted.

Yes, I'm not opposed to doing that, I just think the code is cleaner
using this rather than adding __XEN__ conditionals, but I agree that
if this becomes too convoluted at some point I would be in favor of
using conditionals instead.

> > --- /dev/null
> > +++ b/tools/tests/vpci/emul.h
> > @@ -0,0 +1,107 @@
> > +/*
> > + * Unit tests for the generic vPCI handler code.
> > + *
> > + * Copyright (C) 2017 Citrix Systems R
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms and conditions of the GNU General Public
> > + * License, version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; If not, see 
> > .
> > + */
> > +
> > +#ifndef _TEST_VPCI_
> > +#define _TEST_VPCI_
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define container_of(ptr, type, member) ({  \
> > +typeof( ((type *)0)->member ) *__mptr = (ptr);  \
> > +(type *)( (char *)__mptr - offsetof(type,member) );})
> 
> There are a couple of stray blanks (immediately inside parentheses)
> here, and a missing one after the comma in offsetof().

Sorry, I've picked this as-is from the Xen headers (kernel.h). I've
changed it to:

#define container_of(ptr, type, member) ({  \
typeof(((type *)0)->member) *__mptr = (ptr);\
(type *)((char *)__mptr - offsetof(type, member));})

> > +#include "rbtree.h"
> > +
> > +struct pci_dev {
> > +struct domain *domain;
> > +struct vpci *vpci;
> > +};
> > +
> > +struct domain {
> > +struct pci_dev pdev;
> > +};
> > +
> > +struct vcpu
> > +{
> > +struct domain *domain;
> > +};
> > +
> > +extern struct vcpu v;
> 
> This is odd. With ...
> 
> > +#define spin_lock(x)
> > +#define spin_unlock(x)
> > +#define spin_is_locked(x) true
> > 

[Xen-devel] [PATCH] x86/HAP: avoid using bogus/misleading locking

2017-05-29 Thread Jan Beulich
hap_teardown() unconditionally releases the paging lock and is always
being called without the lock held: Lock acquire should then be
unconditional too.

Signed-off-by: Jan Beulich 
---
While this is only a cosmetic change afaict I would still like to
explore whether to include this in 4.9.

--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -576,8 +576,7 @@ void hap_teardown(struct domain *d, bool
 ASSERT(d->is_dying);
 ASSERT(d != current->domain);
 
-if ( !paging_locked_by_me(d) )
-paging_lock(d); /* Keep various asserts happy */
+paging_lock(d); /* Keep various asserts happy */
 
 if ( paging_mode_enabled(d) )
 {



x86/HAP: avoid using bogus/misleading locking

hap_teardown() unconditionally releases the paging lock and is always
being called without the lock held: Lock acquire should then be
unconditional too.

Signed-off-by: Jan Beulich 
---
While this is only a cosmetic change afaict I would still like to
explore whether to include this in 4.9.

--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -576,8 +576,7 @@ void hap_teardown(struct domain *d, bool
 ASSERT(d->is_dying);
 ASSERT(d != current->domain);
 
-if ( !paging_locked_by_me(d) )
-paging_lock(d); /* Keep various asserts happy */
+paging_lock(d); /* Keep various asserts happy */
 
 if ( paging_mode_enabled(d) )
 {
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

2017-05-29 Thread Razvan Cojocaru
On 05/29/17 15:11, Andrew Cooper wrote:
> On 29/05/2017 12:46, Razvan Cojocaru wrote:
>> On 05/29/17 14:05, Jan Beulich wrote:
>> On 29.05.17 at 11:20,  wrote:
 On 05/26/17 18:11, Andrew Cooper wrote:
> On 26/05/17 15:29, Jan Beulich wrote:
> On 25.05.17 at 11:40,  wrote:
>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>> Could you explain what would lead to emulation of UD2?
>>
>>> This, in turn, causes a hvm_inject_event() call in the context of
>>> hvm_do_resume(), which can, if there's already a pending event there,
>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>> input (mouse frozen, keyboard unresponsive).
>>>
>>> After much trial and error, I've been able to confirm this by leaving a
>>> guest on for almost a full day with this change:
>>>
>>>  case X86EMUL_EXCEPTION:
>>> -hvm_inject_event();
>>> +if ( !hvm_event_pending(current) )
>>> +hvm_inject_event();
>>>
>>> and checking that there's been no BSOD or loss of input.
>>>
>>> However, just losing the event here, while fine to prove that this is
>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>> way of fixing this is.
>> Much depends on what the other event is: If it's an interrupt, I'd
>> assume there to be an ordering problem (interrupts shouldn't be
>> injected when there is a pending exception, their delivery instead
>> should be attempted on the first instruction of the exception
>> handler [if interrupts remain on] or whenever interrupts get
>> re-enabled).
> I suspect it is an ordering issue, and something has processed and
> interrupt before the emulation occurs as part of the vm_event reply 
> happens.
>
> The interrupt ordering spec indicates that external interrupts take
> precedent over faults raised from executing an instruction, on the basis
> that once the interrupt handler returns, the instruction will generate
> the same fault again.  However, its not obvious how this is intended to
> interact with interrupt windows and vmexits.  I expect we can get away
> with ensuring that external interrupts are the final thing considered
> for injection on the return-to-guest path.
>
> It might be an idea to leave an assert in vmx_inject_event() that an
> event is not already pending, but in the short term, this probably also
> wants debugging by trying to identify what sequence of actions is
> leading us to inject two events in this case (if indeed this is what is
> happening).
 With some patience, I've been able to catch the problem: "(XEN)
 vmx_inject_event(3, 14) but 0, 225 pending".

  63 /*
  64  * x86 event types. This enumeration is valid for:
  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
  67  */
  68 enum x86_event_type {
  69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */
  70 X86_EVENTTYPE_NMI = 2,  /* NMI */
  71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */
  72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */
  73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
  74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */
  75 };

 so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
 X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
>>> So this confirms our suspicion, but doesn't move us closer to a
>>> solution. The question after all is why an external interrupt is
>>> being delivered prior to or while emulating. As Andrew did
>>> explain, proper behavior would be to check for external
>>> interrupts and don't enter emulation if one is pending, or don't
>>> check for external interrupts until the _next_ instruction
>>> boundary. Correct architectural behavior will result either way;
>>> the second variant merely must not continuously defer interrupts
>>> (i.e. there need to be instruction boundaries at which hardware
>>> of software do check for them).
>>>
>>> I'm not that familiar with the sequence of steps when dealing
>>> with emulation requests from an introspection agent, so I would
>>> hope you could go through those code paths to see where
>>> external interrupts are being checked for. Or wait - isn't your
>>> problem that you invoke emulation out of hvm_do_resume() (via
>>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
>>> which happens after all other "normal" processing of a VM exit?
>>> Perhaps emulation should be skipped there if an event is 

Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

2017-05-29 Thread Andrew Cooper
On 29/05/2017 12:46, Razvan Cojocaru wrote:
> On 05/29/17 14:05, Jan Beulich wrote:
> On 29.05.17 at 11:20,  wrote:
>>> On 05/26/17 18:11, Andrew Cooper wrote:
 On 26/05/17 15:29, Jan Beulich wrote:
 On 25.05.17 at 11:40,  wrote:
>> I've noticed that, with pages marked NX and vm_event emulation, we can
>> end up emulating an ud2, for which hvm_emulate_one() returns
>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
> Could you explain what would lead to emulation of UD2?
>
>> This, in turn, causes a hvm_inject_event() call in the context of
>> hvm_do_resume(), which can, if there's already a pending event there,
>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>> input (mouse frozen, keyboard unresponsive).
>>
>> After much trial and error, I've been able to confirm this by leaving a
>> guest on for almost a full day with this change:
>>
>>  case X86EMUL_EXCEPTION:
>> -hvm_inject_event();
>> +if ( !hvm_event_pending(current) )
>> +hvm_inject_event();
>>
>> and checking that there's been no BSOD or loss of input.
>>
>> However, just losing the event here, while fine to prove that this is
>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>> way of fixing this is.
> Much depends on what the other event is: If it's an interrupt, I'd
> assume there to be an ordering problem (interrupts shouldn't be
> injected when there is a pending exception, their delivery instead
> should be attempted on the first instruction of the exception
> handler [if interrupts remain on] or whenever interrupts get
> re-enabled).
 I suspect it is an ordering issue, and something has processed and
 interrupt before the emulation occurs as part of the vm_event reply 
 happens.

 The interrupt ordering spec indicates that external interrupts take
 precedent over faults raised from executing an instruction, on the basis
 that once the interrupt handler returns, the instruction will generate
 the same fault again.  However, its not obvious how this is intended to
 interact with interrupt windows and vmexits.  I expect we can get away
 with ensuring that external interrupts are the final thing considered
 for injection on the return-to-guest path.

 It might be an idea to leave an assert in vmx_inject_event() that an
 event is not already pending, but in the short term, this probably also
 wants debugging by trying to identify what sequence of actions is
 leading us to inject two events in this case (if indeed this is what is
 happening).
>>> With some patience, I've been able to catch the problem: "(XEN)
>>> vmx_inject_event(3, 14) but 0, 225 pending".
>>>
>>>  63 /*
>>>  64  * x86 event types. This enumeration is valid for:
>>>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>>  67  */
>>>  68 enum x86_event_type {
>>>  69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */
>>>  70 X86_EVENTTYPE_NMI = 2,  /* NMI */
>>>  71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */
>>>  72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */
>>>  73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>>  74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */
>>>  75 };
>>>
>>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
>> So this confirms our suspicion, but doesn't move us closer to a
>> solution. The question after all is why an external interrupt is
>> being delivered prior to or while emulating. As Andrew did
>> explain, proper behavior would be to check for external
>> interrupts and don't enter emulation if one is pending, or don't
>> check for external interrupts until the _next_ instruction
>> boundary. Correct architectural behavior will result either way;
>> the second variant merely must not continuously defer interrupts
>> (i.e. there need to be instruction boundaries at which hardware
>> of software do check for them).
>>
>> I'm not that familiar with the sequence of steps when dealing
>> with emulation requests from an introspection agent, so I would
>> hope you could go through those code paths to see where
>> external interrupts are being checked for. Or wait - isn't your
>> problem that you invoke emulation out of hvm_do_resume() (via
>> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
>> which happens after all other "normal" processing of a VM exit?
>> Perhaps emulation should be skipped there if an event is already
>> pending injection, as emulation not having started means we still
>> are on an instruction boundary?
> Indeed, we are 

Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

2017-05-29 Thread Razvan Cojocaru
On 05/29/17 14:05, Jan Beulich wrote:
 On 29.05.17 at 11:20,  wrote:
>> On 05/26/17 18:11, Andrew Cooper wrote:
>>> On 26/05/17 15:29, Jan Beulich wrote:
>>> On 25.05.17 at 11:40,  wrote:
> I've noticed that, with pages marked NX and vm_event emulation, we can
> end up emulating an ud2, for which hvm_emulate_one() returns
> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
 Could you explain what would lead to emulation of UD2?

> This, in turn, causes a hvm_inject_event() call in the context of
> hvm_do_resume(), which can, if there's already a pending event there,
> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
> input (mouse frozen, keyboard unresponsive).
>
> After much trial and error, I've been able to confirm this by leaving a
> guest on for almost a full day with this change:
>
>  case X86EMUL_EXCEPTION:
> -hvm_inject_event();
> +if ( !hvm_event_pending(current) )
> +hvm_inject_event();
>
> and checking that there's been no BSOD or loss of input.
>
> However, just losing the event here, while fine to prove that this is
> indeed the problem, is not OK. But I'm not sure what an elegant / robust
> way of fixing this is.
 Much depends on what the other event is: If it's an interrupt, I'd
 assume there to be an ordering problem (interrupts shouldn't be
 injected when there is a pending exception, their delivery instead
 should be attempted on the first instruction of the exception
 handler [if interrupts remain on] or whenever interrupts get
 re-enabled).
>>>
>>> I suspect it is an ordering issue, and something has processed and
>>> interrupt before the emulation occurs as part of the vm_event reply happens.
>>>
>>> The interrupt ordering spec indicates that external interrupts take
>>> precedent over faults raised from executing an instruction, on the basis
>>> that once the interrupt handler returns, the instruction will generate
>>> the same fault again.  However, its not obvious how this is intended to
>>> interact with interrupt windows and vmexits.  I expect we can get away
>>> with ensuring that external interrupts are the final thing considered
>>> for injection on the return-to-guest path.
>>>
>>> It might be an idea to leave an assert in vmx_inject_event() that an
>>> event is not already pending, but in the short term, this probably also
>>> wants debugging by trying to identify what sequence of actions is
>>> leading us to inject two events in this case (if indeed this is what is
>>> happening).
>>
>> With some patience, I've been able to catch the problem: "(XEN)
>> vmx_inject_event(3, 14) but 0, 225 pending".
>>
>>  63 /*
>>  64  * x86 event types. This enumeration is valid for:
>>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>>  67  */
>>  68 enum x86_event_type {
>>  69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */
>>  70 X86_EVENTTYPE_NMI = 2,  /* NMI */
>>  71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */
>>  72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */
>>  73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>>  74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */
>>  75 };
>>
>> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
>> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.
> 
> So this confirms our suspicion, but doesn't move us closer to a
> solution. The question after all is why an external interrupt is
> being delivered prior to or while emulating. As Andrew did
> explain, proper behavior would be to check for external
> interrupts and don't enter emulation if one is pending, or don't
> check for external interrupts until the _next_ instruction
> boundary. Correct architectural behavior will result either way;
> the second variant merely must not continuously defer interrupts
> (i.e. there need to be instruction boundaries at which hardware
> of software do check for them).
> 
> I'm not that familiar with the sequence of steps when dealing
> with emulation requests from an introspection agent, so I would
> hope you could go through those code paths to see where
> external interrupts are being checked for. Or wait - isn't your
> problem that you invoke emulation out of hvm_do_resume() (via
> hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
> which happens after all other "normal" processing of a VM exit?
> Perhaps emulation should be skipped there if an event is already
> pending injection, as emulation not having started means we still
> are on an instruction boundary?

Indeed, we are emulating after all the vmexit processing has already
happened, in the hvm_do_resume() path you've mentioned.

Not emulating if an event 

Re: [Xen-devel] [Crash-utility] [PATCH] xen: Add support for domU with Linux kernel 3.19 and newer

2017-05-29 Thread Daniel Kiper
On Wed, May 24, 2017 at 11:56:28AM -0400, Dave Anderson wrote:
> - Original Message -
> > crash patch c3413456599161cabc4e910a0ae91dfe5eec3c21 (xen: Add support for
> > dom0 with Linux kernel 3.19 and newer) from Daniel made crash utility
> > support xen dom0 vmcores after linux kernel commit
> > 054954eb051f35e74b75a566a96fe756015352c8 (xen: switch to linear virtual
> > mapped sparse p2m list).
> >
> > This patch can be deemed as a subsequent and make this utility support Xen
> > PV domU dumpfiles again.
> >
> > Basically speaking, readmem() can't be used to read xen_p2m_addr associate
> > memory directly during m2p translation. It introduces infinite recursion.
> > Following call sequence shows the scenario, it comes from a section of
> > backtrace with only kvaddr, machine addr and mfn left as parameter:
> >
> > module_init()
> >
> > /* The first readmem() from module_init(). */
> > readmem(addr=0xa02fe4a0)
> >
> > /* readmem() needs physical address, so calls kvtop(). */
> > kvtop(kvaddr=0xa02fe4a0)
> > x86_64_kvtop(kvaddr=a02fe4a0)
> >
> > /* Calculate physical address by traversing page tables. */
> > x86_64_kvtop_xen_wpt(kvaddr=0xa02fe4a0)
> >
> > /*
> >  * x86_64_kvtop_xen_wpt() is going to traverse the page table to
> >  * get the physical address for 0xa02fe4a0. So, at first it
> >  * is needed to translate the pgd from machine address to physical
> >  * address. So invoke xen_m2p() here to do the translation. 0x58687f000
> >  * is the pgd machine address in x86_64_kvtop_xen_wpt() and is needed
> >  * to be translated to its physical address.
> >  */
> > xen_m2p(machine=0x58687f000)
> > __xen_m2p(machine=0x58687f000, mfn=0x58687f)
> >
> > /*
> >  * __xen_m2p() is going to search mfn 0x58687f in p2m VMA which starts
> >  * at VMA 0xc91cf000. It compares every mfn stored in it with
> >  * 0x58687f. Once it's proved 0x58687f is one mfn in the p2m, its offset
> >  * will be used to calculate the pfn.
> >  *
> >  * readmem() is invoked by __xen_m2p() to read the page from VMA
> >  * 0xc91cf000 here.
> >  */
> > readmem(addr=0xc91cf000)
> >
> > /*
> >  * readmem() needs physical address of 0xc91cf000 to make the
> >  * reading done. So it invokes kvtop() to get the physical address.
> >  */
> > kvtop(kvaddr=0xc91cf000)
> > x86_64_kvtop(kvaddr=0xc91cf000)
> >
> > /* It needs to calculate physical address by traversing page tables. */
> > x86_64_kvtop_xen_wpt(kvaddr=0xc91cf000)
> >
> > /*
> >  * 0x581b7e000 is the machine address of pgd need to be translated here.
> >  * The mfn is calculated in this way at x86_64_kvtop_xen_wpt():
> >  *
> >  * pml4 = ((ulong *)machdep->machspec->pml4) + pml4_index(kvaddr);
> >  * pgd_paddr = (*pml4) & PHYSICAL_PAGE_MASK;
> >  *
> >  * The kvaddr 0xc91cf000 here is quite different from the one
> >  * above, so the machine address of pgd is not the same one. And this
> >  * pgd is the one we use to access the VMA of p2m table.
> >  */
> > xen_m2p(machine=0x581b7e000)
> > __xen_m2p(machine=0x581b7e000, mfn=0x581b7e)
> >
> > /*
> >  * Looking for mfn 0x581b7e in the range of p2m page which starts at
> >  * VMA 0xc91f5000.
> >  */
> > readmem(addr=0xc91f5000)
> >
> > /* Need physical address of VMA 0xc91f5000 as same reason above. */
> > kvtop(kvaddr=0xc91f5000)
> > x86_64_kvtop(kvaddr=0xc91f5000)
> >
> > /* Need to traverse page tables to calculate physical address for it. */
> > x86_64_kvtop_xen_wpt(kvaddr=0xc91f5000)
> >
> > /*
> >  * Unfortunately, machine address 0x581b7e000 have to be translated again.
> >  * Endless loop starts from here.
> >  */
> > xen_m2p(machine=0x581b7e000)
> > __xen_m2p(machine=0x581b7e000, mfn=0x581b7e)
> > readmem(addr=0xc91f5000)
> >
> > Fortunately, PV domU p2m mapping is also stored at xd->xfd + 
> > xch_index_offset
> > and organized as struct xen_dumpcore_p2m. We have a chance to read the p2m
> > stuff directly from there, and then we avoid the loop above.
> >
> > So, this patch implements a special reading function read_xc_p2m() to 
> > extract
> > the mfns from xd->xfd + xch_index_offset. This function does not need to 
> > read
> > mfns from p2m VMA like readmem() does, so, we avoid the endless loop 
> > introduced
> > by the address translation.
> >
> > Signed-off-by: Honglei Wang 
> > Reviewed-by: Daniel Kiper 
>
> Queued for crash-7.2.0:
>
>   
> https://github.com/crash-utility/crash/commit/5c52842a58a2602dba81de71831af98b2b53c6e0

Wow, Dave, you are fast! Thanks a lot!

Honglei, congrats! Thanks for doing the work!

Daniel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

2017-05-29 Thread Jan Beulich
>>> On 29.05.17 at 11:20,  wrote:
> On 05/26/17 18:11, Andrew Cooper wrote:
>> On 26/05/17 15:29, Jan Beulich wrote:
>> On 25.05.17 at 11:40,  wrote:
 I've noticed that, with pages marked NX and vm_event emulation, we can
 end up emulating an ud2, for which hvm_emulate_one() returns
 X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>>> Could you explain what would lead to emulation of UD2?
>>>
 This, in turn, causes a hvm_inject_event() call in the context of
 hvm_do_resume(), which can, if there's already a pending event there,
 cause a 101 BSOD (timer-related, if I understand correctly) or loss of
 input (mouse frozen, keyboard unresponsive).

 After much trial and error, I've been able to confirm this by leaving a
 guest on for almost a full day with this change:

  case X86EMUL_EXCEPTION:
 -hvm_inject_event();
 +if ( !hvm_event_pending(current) )
 +hvm_inject_event();

 and checking that there's been no BSOD or loss of input.

 However, just losing the event here, while fine to prove that this is
 indeed the problem, is not OK. But I'm not sure what an elegant / robust
 way of fixing this is.
>>> Much depends on what the other event is: If it's an interrupt, I'd
>>> assume there to be an ordering problem (interrupts shouldn't be
>>> injected when there is a pending exception, their delivery instead
>>> should be attempted on the first instruction of the exception
>>> handler [if interrupts remain on] or whenever interrupts get
>>> re-enabled).
>> 
>> I suspect it is an ordering issue, and something has processed and
>> interrupt before the emulation occurs as part of the vm_event reply happens.
>> 
>> The interrupt ordering spec indicates that external interrupts take
>> precedent over faults raised from executing an instruction, on the basis
>> that once the interrupt handler returns, the instruction will generate
>> the same fault again.  However, its not obvious how this is intended to
>> interact with interrupt windows and vmexits.  I expect we can get away
>> with ensuring that external interrupts are the final thing considered
>> for injection on the return-to-guest path.
>> 
>> It might be an idea to leave an assert in vmx_inject_event() that an
>> event is not already pending, but in the short term, this probably also
>> wants debugging by trying to identify what sequence of actions is
>> leading us to inject two events in this case (if indeed this is what is
>> happening).
> 
> With some patience, I've been able to catch the problem: "(XEN)
> vmx_inject_event(3, 14) but 0, 225 pending".
> 
>  63 /*
>  64  * x86 event types. This enumeration is valid for:
>  65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
>  66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
>  67  */
>  68 enum x86_event_type {
>  69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */
>  70 X86_EVENTTYPE_NMI = 2,  /* NMI */
>  71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */
>  72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */
>  73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
>  74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */
>  75 };
> 
> so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
> X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.

So this confirms our suspicion, but doesn't move us closer to a
solution. The question after all is why an external interrupt is
being delivered prior to or while emulating. As Andrew did
explain, proper behavior would be to check for external
interrupts and don't enter emulation if one is pending, or don't
check for external interrupts until the _next_ instruction
boundary. Correct architectural behavior will result either way;
the second variant merely must not continuously defer interrupts
(i.e. there need to be instruction boundaries at which hardware
of software do check for them).

I'm not that familiar with the sequence of steps when dealing
with emulation requests from an introspection agent, so I would
hope you could go through those code paths to see where
external interrupts are being checked for. Or wait - isn't your
problem that you invoke emulation out of hvm_do_resume() (via
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()),
which happens after all other "normal" processing of a VM exit?
Perhaps emulation should be skipped there if an event is already
pending injection, as emulation not having started means we still
are on an instruction boundary?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [distros-debian-sid test] 71454: tolerable trouble: blocked/broken/fail/pass

2017-05-29 Thread Platform Team regression test user
flight 71454 distros-debian-sid real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/71454/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-i386-sid-netboot-pvgrub 10 guest-start   fail blocked in 71388
 test-amd64-amd64-amd64-sid-netboot-pvgrub 10 guest-start fail blocked in 71388
 test-armhf-armhf-armhf-sid-netboot-pygrub  9 debian-di-install fail like 71388

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-armhf-sid-netboot-pygrub  1 build-check(1)blocked n/a
 build-arm64-pvops 2 hosts-allocate   broken never pass
 build-arm64   2 hosts-allocate   broken never pass
 build-arm64   3 capture-logs broken never pass
 build-arm64-pvops 3 capture-logs broken never pass

baseline version:
 flight   71388

jobs:
 build-amd64  pass
 build-arm64  broken  
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-arm64-pvopsbroken  
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-sid-netboot-pvgrubfail
 test-amd64-i386-i386-sid-netboot-pvgrub  fail
 test-amd64-i386-amd64-sid-netboot-pygrub pass
 test-arm64-arm64-armhf-sid-netboot-pygrubblocked 
 test-armhf-armhf-armhf-sid-netboot-pygrubfail
 test-amd64-amd64-i386-sid-netboot-pygrub pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Interrupt issues with hvm_emulate_one_vm_event()

2017-05-29 Thread Razvan Cojocaru
On 05/26/17 18:11, Andrew Cooper wrote:
> On 26/05/17 15:29, Jan Beulich wrote:
> On 25.05.17 at 11:40,  wrote:
>>> I've noticed that, with pages marked NX and vm_event emulation, we can
>>> end up emulating an ud2, for which hvm_emulate_one() returns
>>> X86EMUL_EXCEPTION in hvm_emulate_one_vm_event().
>> Could you explain what would lead to emulation of UD2?
>>
>>> This, in turn, causes a hvm_inject_event() call in the context of
>>> hvm_do_resume(), which can, if there's already a pending event there,
>>> cause a 101 BSOD (timer-related, if I understand correctly) or loss of
>>> input (mouse frozen, keyboard unresponsive).
>>>
>>> After much trial and error, I've been able to confirm this by leaving a
>>> guest on for almost a full day with this change:
>>>
>>>  case X86EMUL_EXCEPTION:
>>> -hvm_inject_event();
>>> +if ( !hvm_event_pending(current) )
>>> +hvm_inject_event();
>>>
>>> and checking that there's been no BSOD or loss of input.
>>>
>>> However, just losing the event here, while fine to prove that this is
>>> indeed the problem, is not OK. But I'm not sure what an elegant / robust
>>> way of fixing this is.
>> Much depends on what the other event is: If it's an interrupt, I'd
>> assume there to be an ordering problem (interrupts shouldn't be
>> injected when there is a pending exception, their delivery instead
>> should be attempted on the first instruction of the exception
>> handler [if interrupts remain on] or whenever interrupts get
>> re-enabled).
> 
> I suspect it is an ordering issue, and something has processed and
> interrupt before the emulation occurs as part of the vm_event reply happens.
> 
> The interrupt ordering spec indicates that external interrupts take
> precedent over faults raised from executing an instruction, on the basis
> that once the interrupt handler returns, the instruction will generate
> the same fault again.  However, its not obvious how this is intended to
> interact with interrupt windows and vmexits.  I expect we can get away
> with ensuring that external interrupts are the final thing considered
> for injection on the return-to-guest path.
> 
> It might be an idea to leave an assert in vmx_inject_event() that an
> event is not already pending, but in the short term, this probably also
> wants debugging by trying to identify what sequence of actions is
> leading us to inject two events in this case (if indeed this is what is
> happening).

With some patience, I've been able to catch the problem: "(XEN)
vmx_inject_event(3, 14) but 0, 225 pending".

 63 /*
 64  * x86 event types. This enumeration is valid for:
 65  *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
 66  *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
 67  */
 68 enum x86_event_type {
 69 X86_EVENTTYPE_EXT_INTR, /* External interrupt */
 70 X86_EVENTTYPE_NMI = 2,  /* NMI */
 71 X86_EVENTTYPE_HW_EXCEPTION, /* Hardware exception */
 72 X86_EVENTTYPE_SW_INTERRUPT, /* Software interrupt (CD nn) */
 73 X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
 74 X86_EVENTTYPE_SW_EXCEPTION, /* INT3 (CC), INTO (CE) */
 75 };

so an X86_EVENTTYPE_EXT_INTR is pending when we're trying to inject an
X86_EVENTTYPE_HW_EXCEPTION, as a result of the code quoted above.

This is the patch that has produced the above output:

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c8ef18a..3b88eca 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1807,6 +1807,12 @@ void vmx_inject_nmi(void)
X86_EVENT_NO_EC);
 }

+static bool hvm_get_pending_event(struct vcpu *v, struct x86_event *info)
+{
+info->cr2 = v->arch.hvm_vcpu.guest_cr[2];
+return hvm_funcs.get_pending_event(v, info);
+}
+
 /*
  * Generate a virtual event in the guest.
  * NOTES:
@@ -1821,6 +1827,15 @@ static void vmx_inject_event(const struct
x86_event *event)
 struct vcpu *curr = current;
 struct x86_event _event = *event;

+if ( hvm_event_pending(current) )
+{
+   struct x86_event ev;
+   hvm_get_pending_event(current, );
+
+   printk("vmx_inject_event(%d, %d) but %d, %d pending\n",
+  _event.type, _event.vector, ev.type, ev.vector);
+}
+
 switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
 {
 case TRAP_debug:


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches

2017-05-29 Thread Jan Beulich
>>> On 29.05.17 at 11:03,  wrote:
> On 29/05/2017 09:58, Jan Beulich wrote:
> On 26.05.17 at 19:03,  wrote:
>>> --- a/xen/arch/x86/mm/guest_walk.c
>>> +++ b/xen/arch/x86/mm/guest_walk.c
>>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain 
>>> *p2m,
>>>  ASSERT(!(walk & PFEC_implicit) ||
>>> !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>>  
>>> -/*
>>> - * PFEC_insn_fetch is only used as an input to pagetable walking if NX 
>>> or
>>> - * SMEP are enabled.  Otherwise, instruction fetches are 
>>> indistinguishable
>>> - * from data reads.
>>> - *
>>> - * This property can be demonstrated on real hardware by having NX and
>>> - * SMEP inactive, but SMAP active, and observing that EFLAGS.AC 
>>> determines
>>> - * whether a pagefault occures for supervisor execution on user 
>>> mappings.
>>> - */
>>> -if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>>> -walk &= ~PFEC_insn_fetch;
>>> -
>>>  perfc_incr(guest_walk);
>>>  memset(gw, 0, sizeof(*gw));
>>>  gw->va = va;
>>> -gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | 
>>> PFEC_write_access);
>>> +gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>>> +
>>> +/*
>>> + * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  
>>> Hardware
>>> + * still distingueses instruction fetches during determination of 
>>> access
>>> + * rights.
>>> + */
>>> +if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>>> +gw->pfec |= (walk & PFEC_insn_fetch);
>>>  
>>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
>> Don't you another adjustment to
>>
>> if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
>> /* Requested an instruction fetch and found NX? Fail. */
>> goto out;
>>
>> I can't see anything that would keep _PAGE_NX_BIT out of
>> ar if NX is not enabled.
> 
> _PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
> guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.

Ah, right. But perhaps worth having a respective ASSERT()
here, at once serving as documentation? In any event
Reviewed-by: Jan Beulich 

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen: don't print error message in case of missing Xenstore entry

2017-05-29 Thread Juergen Gross
When registering for the Xenstore watch of the node control/sysrq the
handler will be called at once. Don't issue an error message if the
Xenstore node isn't there, as it will be created only when an event
is being triggered.

Signed-off-by: Juergen Gross 
---
 drivers/xen/manage.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index c1ec8ee80924..7ddd0803da23 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -277,8 +277,11 @@ static void sysrq_handler(struct xenbus_watch *watch, 
const char *path,
err = xenbus_transaction_start();
if (err)
return;
-   if (xenbus_scanf(xbt, "control", "sysrq", "%c", _key) < 0) {
-   pr_err("Unable to read sysrq code in control/sysrq\n");
+   err = xenbus_scanf(xbt, "control", "sysrq", "%c", _key);
+   if (err < 0) {
+   if (err != -ENOENT)
+   pr_err("Error %d reading sysrq code in control/sysrq\n",
+  err);
xenbus_transaction_end(xbt, 1);
return;
}
-- 
2.12.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches

2017-05-29 Thread Andrew Cooper
On 29/05/2017 09:58, Jan Beulich wrote:
 On 26.05.17 at 19:03,  wrote:
>> --- a/xen/arch/x86/mm/guest_walk.c
>> +++ b/xen/arch/x86/mm/guest_walk.c
>> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain 
>> *p2m,
>>  ASSERT(!(walk & PFEC_implicit) ||
>> !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>>  
>> -/*
>> - * PFEC_insn_fetch is only used as an input to pagetable walking if NX 
>> or
>> - * SMEP are enabled.  Otherwise, instruction fetches are 
>> indistinguishable
>> - * from data reads.
>> - *
>> - * This property can be demonstrated on real hardware by having NX and
>> - * SMEP inactive, but SMAP active, and observing that EFLAGS.AC 
>> determines
>> - * whether a pagefault occures for supervisor execution on user 
>> mappings.
>> - */
>> -if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
>> -walk &= ~PFEC_insn_fetch;
>> -
>>  perfc_incr(guest_walk);
>>  memset(gw, 0, sizeof(*gw));
>>  gw->va = va;
>> -gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | 
>> PFEC_write_access);
>> +gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
>> +
>> +/*
>> + * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
>> + * still distingueses instruction fetches during determination of access
>> + * rights.
>> + */
>> +if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
>> +gw->pfec |= (walk & PFEC_insn_fetch);
>>  
>>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
> Don't you another adjustment to
>
> if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
> /* Requested an instruction fetch and found NX? Fail. */
> goto out;
>
> I can't see anything that would keep _PAGE_NX_BIT out of
> ar if NX is not enabled.

_PAGE_NX_BIT is reserved if NX is not enabled, and is accounted for in
guest_rsvd_bits() in guest_pt.h, and we never hit the access rights logic.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] x86/pagewalk: Fix pagewalk's handling of instruction fetches

2017-05-29 Thread Jan Beulich
>>> On 26.05.17 at 19:03,  wrote:
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -114,22 +114,18 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain 
> *p2m,
>  ASSERT(!(walk & PFEC_implicit) ||
> !(walk & (PFEC_insn_fetch | PFEC_user_mode)));
>  
> -/*
> - * PFEC_insn_fetch is only used as an input to pagetable walking if NX or
> - * SMEP are enabled.  Otherwise, instruction fetches are 
> indistinguishable
> - * from data reads.
> - *
> - * This property can be demonstrated on real hardware by having NX and
> - * SMEP inactive, but SMAP active, and observing that EFLAGS.AC 
> determines
> - * whether a pagefault occures for supervisor execution on user mappings.
> - */
> -if ( !(guest_nx_enabled(v) || guest_smep_enabled(v)) )
> -walk &= ~PFEC_insn_fetch;
> -
>  perfc_incr(guest_walk);
>  memset(gw, 0, sizeof(*gw));
>  gw->va = va;
> -gw->pfec = walk & (PFEC_insn_fetch | PFEC_user_mode | PFEC_write_access);
> +gw->pfec = walk & (PFEC_user_mode | PFEC_write_access);
> +
> +/*
> + * PFEC_insn_fetch is only reported if NX or SMEP are enabled.  Hardware
> + * still distingueses instruction fetches during determination of access
> + * rights.
> + */
> +if ( guest_nx_enabled(v) || guest_smep_enabled(v) )
> +gw->pfec |= (walk & PFEC_insn_fetch);
>  
>  #if GUEST_PAGING_LEVELS >= 3 /* PAE or 64... */
>  #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */

Don't you another adjustment to

if ( (walk & PFEC_insn_fetch) && (ar & _PAGE_NX_BIT) )
/* Requested an instruction fetch and found NX? Fail. */
goto out;

I can't see anything that would keep _PAGE_NX_BIT out of
ar if NX is not enabled.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"

2017-05-29 Thread Jan Beulich
>>> On 26.05.17 at 19:03,  wrote:
> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
> 
> When determining Access Rights, Protection Keys only take effect when CR4.PKE
> it set, and 4-level paging is active.  All other circumstances (notibly, 32bit
> PAE paging) skip the Protection Key control mechanism.
> 
> Therefore, we do not need to clear CR4.PKE behind the back of a guest which is
> not using paging, as such a guest is necesserily running with EFER.LME
> disabled.

DYM EFER.LMA here?

> The {RD,WR}PKRU instructions are specified as being legal for use in any
> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind the
> back of an unpaged guest, these instructions yield #UD despite the guest
> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

I would like to get clarification from Huaitong, however, on the
reason for the original change: According to the reasoning here,
it shouldn't have been an observed failure of some kind, but
merely the thinking that something may be wrong (but really
wasn't).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] tools/firmware: pass EXTRAVERSION to seabios build

2017-05-29 Thread Olaf Hering
On Fri, May 26, Ian Jackson wrote:

> This seems like a real problem which should be improved.  But maybe we
> should use Xen's EXTRAVERSION by default ?

After thinking about it, why does the tools build not just enforce a
fixed string? There is no point for scripts/buildversion.py to put
current time and buildhost into the binary. This breaks reproducible
builds. A simple, untested change like this should be enough:

diff --git a/tools/firmware/Makefile b/tools/firmware/Makefile
index 8562f547bc..c2b5985dc7 100644
--- a/tools/firmware/Makefile
+++ b/tools/firmware/Makefile
@@ -22,6 +22,8 @@ ovmf-dir:
 seabios-dir:
GIT=$(GIT) $(XEN_ROOT)/scripts/git-checkout.sh $(SEABIOS_UPSTREAM_URL) 
$(SEABIOS_UPSTREAM_REVISION) seabios-dir
cp seabios-config seabios-dir/.config;
+   rm -f seabios-dir/.version
+   echo '$(SEABIOS_UPSTREAM_REVISION)' > seabios-dir/.version
$(MAKE) -C seabios-dir olddefconfig
 
 .PHONY: all
@@ -35,7 +37,7 @@ ifeq ($(CONFIG_ROMBIOS),y)
false ; \
fi
 endif
-   $(MAKE) $(LD32BIT-y) CC=$(CC) PYTHON=$(PYTHON) subdirs-$@
+   $(MAKE) $(LD32BIT-y) CC=$(CC) PYTHON=$(PYTHON) EXTRAVERSION="-Xen" 
subdirs-$@
 
 
 .PHONY: install

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [linux-linus test] 109821: regressions - FAIL

2017-05-29 Thread osstest service owner
flight 109821 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/109821/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-i386-pvgrub 21 leak-check/check fail in 109809 REGR. vs. 
109656

Tests which are failing intermittently (not blocking):
 test-amd64-i386-libvirt-pair 11 host-ping-check-xen/src_host fail in 109809 
pass in 109821
 test-amd64-i386-libvirt-xsm 7 host-ping-check-xen fail in 109809 pass in 109821
 test-amd64-i386-libvirt-pair 12 host-ping-check-xen/dst_host fail in 109809 
pass in 109821
 test-amd64-i386-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail in 109809 
pass in 109821
 test-armhf-armhf-libvirt-xsm  7 host-ping-check-xenfail pass in 109809
 test-armhf-armhf-libvirt  7 host-ping-check-xenfail pass in 109809
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 14 guest-saverestore.2 fail pass 
in 109809
 test-amd64-amd64-i386-pvgrub 18 guest-start/debian.repeat  fail pass in 109809

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop  fail blocked in 109656
 test-armhf-armhf-xl-rtds   15 guest-start/debian.repeat fail blocked in 109656
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail in 109809 like 
109656
 test-armhf-armhf-libvirt 13 saverestore-support-check fail in 109809 like 
109656
 test-amd64-amd64-xl-qemut-win7-amd64 15 guest-localmigrate/x10 fail in 109809 
like 109656
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail in 109809 never pass
 test-armhf-armhf-libvirt12 migrate-support-check fail in 109809 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 15 guest-localmigrate/x10 fail like 109656
 test-armhf-armhf-libvirt-raw 12 saverestore-support-checkfail  like 109656
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 109656
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 109656
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 109656
 test-amd64-amd64-xl-qemut-ws16-amd64  9 windows-installfail never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  9 windows-installfail never pass
 test-arm64-arm64-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-arm64-arm64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-rtds 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-rtds 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 12 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  12 migrate-support-checkfail   never pass
 

Re: [Xen-devel] boot message "xen:manage: Unable to read sysrq code in control/sysrq" ?

2017-05-29 Thread Juergen Gross
On 28/05/17 21:55, PGNet Dev wrote:
> Working with Xen 4.9.0 host on kernel 4.11.3
> 
>   dmesg | egrep -i "linux version|xen version"
>   [0.00] Linux version 4.11.3-2.g7262353-default 
> (geeko@buildhost) (gcc version 7.1.1 20170517 [gcc-7-branch revision 248152] 
> (SUSE Linux) ) #1 SMP PREEMPT Thu May 25 17:55:04 UTC 2017 (7262353)
>   [0.00] Xen version: 4.9.0_07-503 (preserve-AD)
> 
> I currently see in host boot
> 
>   dmesg | grep sysrq
>   [   20.905146] xen:manage: Unable to read sysrq code in 
> control/sysrq
> 
> That seems to originate at
> 
>   https://lkml.org/lkml/2016/11/8/134
>   ...
>   +   if (xenbus_scanf(xbt, "control", "sysrq", "%c", _key) < 
> 0) {
>   pr_err("Unable to read sysrq code in control/sysrq\n");
>   xenbus_transaction_end(xbt, 1);
>   return;
>   ...
> 
> Is this a config issue -- wrong or missing -- on my end?  Or a problem that 
> needs some more diagnostic info here?

Its just a message which shouldn't be printed at all, as this is no
error. I'll send a patch.

Thanks for the notice!


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/12 v3] xen/arm: vpl011: Add pl011 uart emulation in Xen

2017-05-29 Thread Bhupinder Thakur
Hi Julien,

On 26 May 2017 at 19:12, Bhupinder Thakur  wrote:
>>> +#ifndef _VPL011_H_
>>> +
>>> +#define _VPL011_H_
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +DEFINE_XEN_FLEX_RING(vpl011);
>>
>>
>> I am sure someone already said it in a previous version. The vpl011 is the
>> console ring. So why are we defining our own internally?
>>
This macro only defines standard functions to operate on the console
ring. Stefano suggested to use the standard functions to operate on
the ring buffer.

>> At least this should have been used by xenconsole, but this is not the
>> case... So we should really avoid defining our own ring here and re-use
>> public/io/console.h.

I am using the console ring definition as defined in
xen/include/public/io/console.h.

Regards,
Bhupinder

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel